I’m trying to integrate the OpenAI PHP SDK into my Laravel 11 project to generate long-form SEO blog posts.
The integration works fine for short snippets, but when I send a prompt for a 2,000-word article, the request just hangs and eventually throws a 408 Request Timeout from my Nginx proxy. I know I shouldn’t run this directly in the controller, so I moved it to a Laravel Queue Job, but now I’m hitting a new wall.
My Problem:
Even inside the queue, the job is being marked as “failed” after 60 seconds because of the default retry_after setting. OpenAI sometimes takes 90+ seconds to stream a long response, and the worker thinks the job died and tries to restart it, causing a loop of half-finished API calls.
What I’ve tried:
Increased max_execution_time in php.ini (didn’t help the worker).
Tried using openai-php/client directly instead of the Laravel wrapper.
Set $timeout = 120 in my GenerateArticle job class, but the worker still kills it.
public function handle(): void
{
    // This part takes forever...
    $response = OpenAI::chat()->create([
        'model' => 'gpt-4-turbo',
        'messages' => [['role' => 'user', 'content' => $this->prompt]],
    ]);

    $this->post->update(['content' => $response->choices[0]->message->content]);
}
How do I properly handle these long-running AI tasks without the worker timing out or me hitting a 504 on the frontend? Should I be using Laravel AI SDK streaming or Laravel Reverb to push the content back to the UI piece-by-piece?
You’re hitting a classic mismatch between how Laravel handles queues and how LLMs behave in the real world. By 2026 standards, GPT-4-Turbo is actually considered “slow” for 2,000-word outputs, so your architectural approach is more important than just bumping a timeout number.
Here is how you fix the loop and handle the UI.
1. The “Worker vs. Connection” Timeout Fix
The reason your $timeout = 120 isn’t working is likely because your Queue Connection retry_after setting in config/queue.php is lower than your Job’s $timeout.
If retry_after is 60 seconds and your job takes 90, the queue manager thinks the worker died and releases the job back to another worker while the first one is still running.
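For reference, that setting lives on the queue connection in config/queue.php (sketched here for the Redis driver; a fresh install ships with 90 seconds):

// config/queue.php
'redis' => [
    'driver' => 'redis',
    'connection' => env('REDIS_QUEUE_CONNECTION', 'default'),
    'queue' => env('REDIS_QUEUE', 'default'),
    'retry_after' => 360, // must stay higher than the job's $timeout
    'block_for' => null,
],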
In your Job class, set these explicitly:
public $timeout = 300; // 5 minutes for safety
public $failOnTimeout = true;

public function handle(): void
{
    // Important: tell the OpenAI HTTP client itself not to time out.
    // The per-request timeout belongs on the underlying Guzzle client,
    // not inside the create() payload (that array is the API request body).
    // This builds the client via openai-php/client's factory rather than the Laravel facade.
    $client = OpenAI::factory()
        ->withApiKey(config('openai.api_key'))
        ->withHttpClient(new \GuzzleHttp\Client(['timeout' => 300]))
        ->make();

    $response = $client->chat()->create([
        'model' => 'gpt-4-turbo',
        'messages' => [['role' => 'user', 'content' => $this->prompt]],
    ]);

    $this->post->update(['content' => $response->choices[0]->message->content]);
}
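If you'd rather stay on the openai-php/laravel wrapper instead of wiring up the client yourself, newer versions of that package also expose a request_timeout option in config/openai.php that serves the same purpose (worth checking that the version you have installed ships this key):

// config/openai.php
'request_timeout' => env('OPENAI_REQUEST_TIMEOUT', 300),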
2. Update your Worker Command
When you run your worker, you need to make sure the global timeout flag allows for these long runs:
php artisan queue:work --timeout=305
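While you're still chasing the loop of half-finished calls, it can also help to cap attempts so a timed-out job isn't immediately re-run against the API (optional, and it assumes you're happy retrying genuinely failed jobs by hand):
php artisan queue:work --timeout=305 --tries=1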
3. The “2026 Way”: Streaming + Reverb
Waiting 90 seconds for a page to refresh is a terrible UX. Since you’re on Laravel 11/12+, you should absolutely be using Laravel Reverb (WebSockets) to stream the text to the UI as it’s generated.
Instead of a single create() call, use createStreamed():
– Job: Dispatches a PartialContentGenerated event every time a chunk of text comes in (a sketch of that event class follows the job logic below).
– Event: Broadcasts via Reverb to a private channel for that specific Post ID.
– Frontend: A simple Livewire or Vue component listens for the broadcast and appends the text to the screen in real-time.
Example Job Logic:
$stream = OpenAI::chat()->createStreamed([
    'model' => 'gpt-4-turbo',
    'messages' => [['role' => 'user', 'content' => $this->prompt]],
]);

foreach ($stream as $response) {
    $text = $response->choices[0]->delta->content;

    if ($text) {
        // Broadcast this chunk via Reverb
        PartialContentGenerated::dispatch($this->post, $text);
    }
}
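For completeness, the PartialContentGenerated event referenced above might look like this. It's a minimal sketch: the class name comes from the example, but the channel name and payload shape are assumptions, not something from your code.

// app/Events/PartialContentGenerated.php
use App\Models\Post;
use Illuminate\Broadcasting\InteractsWithSockets;
use Illuminate\Broadcasting\PrivateChannel;
use Illuminate\Contracts\Broadcasting\ShouldBroadcast;
use Illuminate\Foundation\Events\Dispatchable;

class PartialContentGenerated implements ShouldBroadcast
{
    use Dispatchable, InteractsWithSockets;

    public function __construct(
        public Post $post,
        public string $chunk,
    ) {}

    // One private channel per post, so only that post's page receives chunks.
    public function broadcastOn(): PrivateChannel
    {
        return new PrivateChannel('posts.'.$this->post->id);
    }
}

You'll also need a matching authorization callback for the posts.{id} channel in routes/channels.php, and if the extra broadcast queue hop adds too much latency you can implement ShouldBroadcastNow instead.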
Summary Checklist:
config/queue.php: Set retry_after to at least 360 (must be higher than your job timeout).
Job Class: Set public $timeout = 300.
OpenAI Client: Set the HTTP client's timeout (via a custom Guzzle client or the wrapper's request_timeout config), not a parameter inside the create() payload.
UI: Use Reverb. Even if the job takes 2 minutes, the user sees progress immediately, which prevents them from clicking “refresh” and triggering even more API calls.
Have you checked your Nginx proxy_read_timeout? If you eventually move back to a synchronous approach for any reason, that’s usually the culprit for the 408/504 errors.
How are you handling the database connection during that long 90-second wait? If you have a low max_connections, long-running jobs can sometimes hog them.
The first answer is correct regarding the retry_after vs. $timeout mismatch, but there is a deeper architectural “gotcha” here that will cause your workers to crash even if you bump those numbers to 10 minutes.
The “Zombie Connection” Problem
When you run a synchronous OpenAI request that takes 90+ seconds, your PHP worker is sitting idle waiting for an I/O response. In many default configurations:
1. The Database Connection times out: If your wait_timeout in MySQL is low, the worker will lose its connection while waiting for OpenAI. When it finally gets the response and tries to $post->update(), you’ll get a General error: 2006 MySQL server has gone away (a mitigation is sketched after this list).
2. The “Zombie” Job: If the worker is killed by an external process (like an OOM killer or a deployment script), and your retry_after is set too high, that job stays in “reserved” limbo and won’t be retried for a very long time.
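For point 1, a cheap mitigation is to reconnect before the write (a minimal sketch assuming MySQL and the stock database config; plenty of stacks never hit this):

use Illuminate\Support\Facades\DB;

public function handle(): void
{
    // The long, blocking call
    $response = OpenAI::chat()->create([
        'model' => 'gpt-4-turbo',
        'messages' => [['role' => 'user', 'content' => $this->prompt]],
    ]);

    // MySQL may have dropped the connection (wait_timeout) while we were
    // blocked on the API, so re-establish it before writing the result.
    DB::reconnect();

    $this->post->update(['content' => $response->choices[0]->message->content]);
}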
The “Best Practice” Architecture for 2026
Stop trying to make a single Job wait for the whole article. Instead, use a Job Chain or Batch approach:
1. Dispatcher Job: This job calls OpenAI, but you should use Streaming (as mentioned in the first answer).
2. The “Chunk” Pattern: Instead of one massive prompt, break the SEO article into sections (Intro, Body, Conclusion).
– Dispatch a Batch of jobs (see the sketch after this list).
– Each job handles one section.
– This keeps each job under 15–20 seconds, well within the “safe” zone for default Laravel workers.
3. The “Status” Table: Instead of hoping the job finishes, create a generation_tasks table. The UI should poll this (or use Reverb) to show a progress bar (e.g., “Intro generated…”, “Images being researched…”).
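Here's a minimal sketch of the batch dispatch mentioned in step 2. GenerateSection is a hypothetical job name (one short OpenAI call per section), not something from the thread:

use App\Models\Post;
use Illuminate\Support\Facades\Bus;

$postId = $post->id;

Bus::batch([
    new GenerateSection($postId, 'intro'),
    new GenerateSection($postId, 'body'),
    new GenerateSection($postId, 'conclusion'),
])->then(function () use ($postId) {
    // Runs once every section job has finished successfully.
    Post::whereKey($postId)->update(['status' => 'completed']);
})->dispatch();

Batches need the job_batches table, so run the batches migration first if you haven't already.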
A Note on the Laravel AI SDK
If you are already on Laravel 11/12, you should be using the first-party laravel/ai SDK (or openai-php/laravel). It handles the streaming interface much more cleanly than raw Guzzle calls:
// In your Job
public function handle(): void
{
    // The SDK handles the timeout and streaming overhead
    $result = AI::withTimeout(300)->stream('gpt-4-turbo', $this->prompt);

    $fullContent = '';

    foreach ($result as $chunk) {
        $fullContent .= $chunk;

        // Cache the accumulated text and broadcast the new chunk via Reverb
        Cache::put("post_{$this->post->id}_content", $fullContent, 600);
        PostContentUpdated::dispatch($this->post, $chunk);
    }
}
TL;DR: Bumping timeouts is a “band-aid.” For 2,000-word articles, stream the response and decouple your UI from the Job completion via Reverb.
Update: Solved
Thanks to @conradm and @admin for the help. It was a combination of two things: the hidden queue timeout mismatch and the way the OpenAI client handles its own internal Guzzle requests.
The “looping” job was definitely caused by retry_after in my config/queue.php being lower than my job timeout. The worker was timing out, the job was released, and then another worker picked it up while the first was still actually talking to OpenAI.
What worked for me:
1. Updated Queue Config: I bumped retry_after to 400 seconds (way higher than the actual generation time) so the queue manager gives the worker enough room.
2. Switched to Streaming: I abandoned the synchronous create() call. It’s just not viable for 2,000 words. I’m now using the laravel/ai SDK to stream chunks via Laravel Reverb.
3. Client-Side Timeout: I had to explicitly pass the timeout to the underlying HTTP client.
Final working Job logic:
public $timeout = 350;

public function handle(): void
{
    // Use the AI SDK to stream the response
    $stream = AI::withTimeout(300)->stream('gpt-4-turbo', $this->prompt);

    $fullContent = "";

    foreach ($stream as $chunk) {
        $fullContent .= $chunk;

        // Push to Reverb so the user sees progress
        ArticleProgressUpdated::dispatch($this->post, $chunk);
    }

    $this->post->update([
        'content' => $fullContent,
        'status' => 'completed',
    ]);
}
Also, even if you fix the PHP timeouts, check your Nginx/Apache config. I had to bump proxy_read_timeout to 300 as well, or the frontend would still throw a 504 while waiting for the initial stream headers.
The UI feels 100x faster now because the text starts appearing in 2 seconds instead of the user staring at a spinner for a minute and a half.
