I’m building a chat application in Laravel where I’m using the openai-php/laravel package to interact with the ChatGPT API. The AI responses can be quite large, and I want to stream them to the frontend in real-time for a better user experience.
I know Laravel supports streaming responses with return response()->stream(...), and there are packages for frontend integration like stream-react or stream-view.
The issue is: I’m struggling to properly implement the streaming logic within my Laravel controller and consume it effectively on the client side (using Vue 3 and Inertia.js).
- How do I correctly set up the response()->stream() function to receive chunks from the OpenAI API and pass them to the client?
- What is the best way to handle the connection and ensure all data is sent without the request timing out in Laravel?
- Are there any specific headers I need to set for this to work with an Inertia/Vue frontend?
// Current (problematic) Controller Code
public function generateResponse(Request $request)
{
    $prompt = $request->input('prompt');

    // This code waits for the full response, not ideal for streaming
    $result = OpenAI::chat()->create([
        'model' => 'gpt-3.5-turbo',
        'messages' => [
            ['role' => 'user', 'content' => $prompt],
        ],
        'stream' => true, // I want to use streaming but don't know how to implement it
    ]);

    // How to stream $result chunks to the client?
    return response()->json(['response' => $result['choices'][0]['message']['content']]);
}
Any guidance or code examples for the controller and frontend logic would be greatly appreciated!
You want to stream ChatGPT responses from Laravel to a Vue 3 + Inertia frontend using openai-php/laravel, instead of waiting for the full response.
Your current implementation waits for the entire response and returns JSON, which defeats streaming.
Important clarification (Inertia limitation)
Inertia does not support streamed responses.
Inertia expects a single JSON payload and buffers the response. Token-by-token streaming will not work through an Inertia visit.
Correct approach:
- Use Inertia for page loads only
- Expose a separate streaming endpoint
- Consume it directly from the Vue component using fetch()
Trying to stream through an Inertia response will fail or silently buffer.
Correct way to stream OpenAI responses in Laravel
Use createStreamed(), not create()
Setting stream => true does not automatically stream the response. You must use the streamed API method:
OpenAI::chat()->createStreamed([...]);
This returns an iterator that yields delta tokens, not full messages.
Laravel Controller Example
use Illuminate\Http\Request;
use OpenAI\Laravel\Facades\OpenAI;

public function generateResponse(Request $request)
{
    $prompt = $request->input('prompt');

    return response()->stream(function () use ($prompt) {
        // Disable output buffering so chunks reach the client immediately
        while (ob_get_level() > 0) {
            ob_end_flush();
        }

        set_time_limit(0);

        $stream = OpenAI::chat()->createStreamed([
            'model' => 'gpt-4o-mini',
            'messages' => [
                ['role' => 'user', 'content' => $prompt],
            ],
        ]);

        foreach ($stream as $response) {
            $delta = $response->choices[0]->delta->content ?? null;

            if ($delta !== null) {
                echo $delta;
                flush();
            }
        }
    }, 200, [
        'Content-Type' => 'text/plain; charset=utf-8',
        'Cache-Control' => 'no-cache',
        'X-Accel-Buffering' => 'no', // Required for Nginx
    ]);
}
Preventing buffering and timeouts
PHP
- Streaming keeps the request alive
- Disable execution time limits:
set_time_limit(0);
Nginx (required)
proxy_buffering off;
proxy_cache off;
Without this, streaming will not work even if PHP is correct.
Vue 3 Client (do NOT use Axios or Inertia)
Axios buffers responses. Use the Fetch API with streams:
async function streamChat(prompt) {
  const response = await fetch('/chat/stream', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'X-CSRF-TOKEN': document.querySelector('meta[name="csrf-token"]').content,
    },
    body: JSON.stringify({ prompt }),
  })

  const reader = response.body.getReader()
  const decoder = new TextDecoder()
  let output = ''

  while (true) {
    const { value, done } = await reader.read()
    if (done) break

    output += decoder.decode(value, { stream: true })
    chatOutput.value = output
  }
}
Required headers (server side)
| Header | Reason |
|---|---|
| Content-Type: text/plain | Prevent JSON buffering |
| X-Accel-Buffering: no | Disable Nginx buffering |
| Cache-Control: no-cache | Avoid proxy caching |
| Connection: keep-alive | Long-lived stream |
Summary
- Inertia cannot handle streamed responses
- Use a dedicated streaming endpoint
- Use createStreamed() from openai-php
- Disable PHP and Nginx buffering
- Consume the stream with fetch() + ReadableStream
- Avoid Axios and Inertia for streaming
- gpt-3.5-turbo should be considered legacy
Recommended next steps
- Switch to SSE (text/event-stream) for better browser semantics
- Add abort/cancel support
- Move to WebSockets (Laravel Reverb) for production chat apps
This is the minimal correct setup for real-time token streaming in Laravel.
I hope this helped, let me know if there’s anything that wasn’t clear 🙂
This is a very good question.
Short version: no, Laravel does not automatically stop your streaming loop, and yes, you should explicitly handle client disconnects if you care about cost, resource usage, or correctness.
Below is the practical, PHP-level explanation.
What actually happens when the client disconnects
When you stream a response with response()->stream():
- PHP keeps executing your callback until it finishes
- OpenAI will continue streaming tokens
- Your loop will keep running unless you stop it
- The fact that the browser closed the page does not magically stop PHP
Laravel itself does not manage this for you. Once the request is handed off to PHP-FPM, it’s your responsibility.
In other words:
If you do nothing, you will keep paying for tokens no one sees.
How to detect a disconnected client in PHP
PHP provides a built-in function, connection_aborted(), which returns non-zero once the client has disconnected, whether because:
- the browser navigates away
- the tab is closed
- the fetch request is aborted
However, you must explicitly check it.
Correct way to handle this in your streaming loop
Update the loop to check connection_aborted() on every iteration. Once it returns true:
- you stop reading from OpenAI
- the loop exits immediately
- the PHP request terminates cleanly
This is the single most important missing piece in most examples.
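A minimal version of that check, reusing the streaming loop from the earlier controller example (connection_aborted() is a standard PHP function; note it is only updated when PHP attempts to send output, which is another reason to flush() every chunk):

```php
foreach ($stream as $response) {
    // Stop pulling tokens from OpenAI as soon as the client is gone
    if (connection_aborted()) {
        break;
    }

    $delta = $response->choices[0]->delta->content ?? null;

    if ($delta !== null) {
        echo $delta;
        flush(); // the output attempt is what refreshes connection_aborted()
    }
}
```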
Do PHP-FPM or Nginx handle this for you?
No.
- Nginx will close the client socket
- PHP-FPM will keep executing until your script exits
- Laravel does not wrap or intercept the streaming loop
Unless you break the loop, OpenAI keeps streaming.
What about ignore_user_abort()?
By default, PHP behaves as if ignore_user_abort(false) were set: the script is aborted the next time it tries to send output to a disconnected client. That abort can happen mid-iteration, which is why you still need flush() calls (so an output attempt actually occurs) and an explicit connection_aborted() check if you want to exit the loop cleanly and run any teardown logic.
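In code, the relevant combination looks like this (a sketch; $stream is the iterator from createStreamed() as in the earlier examples):

```php
ignore_user_abort(true); // keep the script alive after the client disconnects

foreach ($stream as $response) {
    // With ignore_user_abort(true), PHP will NOT kill the script for us,
    // so we must detect the disconnect ourselves on every iteration.
    if (connection_aborted()) {
        // e.g. log partial token usage here before exiting
        break;
    }

    echo $response->choices[0]->delta->content ?? '';
    flush(); // an output attempt is what updates connection_aborted()
}
```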
Frontend: how this ties back to fetch()
When you cancel the request on the client:
- the TCP connection closes
- connection_aborted() returns true in PHP
- your loop exits (if you check it)
Without the server-side check, aborting on the client does nothing useful.
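On the Vue side, cancellation is done with the standard AbortController API. A sketch of wiring a hypothetical "Stop" button into the fetch-based reader from the first answer (the /chat/stream endpoint and streamChat name are carried over from those examples):

```javascript
const controller = new AbortController()

async function streamChat(prompt, signal) {
  const response = await fetch('/chat/stream', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt }),
    signal, // aborting the controller cancels the request and the body stream
  })

  const reader = response.body.getReader()
  const decoder = new TextDecoder()

  while (true) {
    const { value, done } = await reader.read()
    if (done) break
    console.log(decoder.decode(value, { stream: true }))
  }
}

// In the component:
// streamChat('Hello', controller.signal)
// stopButton.onclick = () => controller.abort()
```

Calling controller.abort() closes the connection, which is exactly what makes connection_aborted() start returning true on the server.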
Why this matters more with OpenAI streaming
OpenAI streaming is pull-based:
- Your loop keeps pulling tokens
- There is no automatic “stop” signal
- The SDK won’t stop unless you stop iterating
So client disconnect handling is mandatory in production.
SSE / WebSockets note
- SSE: same issue, same solution (connection_aborted())
- WebSockets: easier, because disconnect events are explicit
- If you expect frequent cancels, WebSockets are cleaner
But even SSE does not solve this automatically.
Bottom line
- Laravel does not stop streamed responses automatically
- PHP continues executing after client disconnect
- You must check connection_aborted() inside the loop
- Without this, you waste tokens and server resources
- This is not optional for production systems
Thanks!

I have this working now using response()->stream() and createStreamed() and I’m consuming it on the frontend with fetch() and getReader(), so the tokens are coming through fine.
One thing I don’t fully understand from your answer is what actually happens when the user stops reading or navigates away. For example, if the user clicks a “stop” button or closes the page while the stream is still running, does Laravel automatically stop the loop, or will it keep generating tokens from OpenAI until it finishes?
Right now I’m just looping over the streamed response and echoing chunks. I’m not sure how (or if) I should detect that the client disconnected and break out of the loop, or whether PHP/FPM or Nginx will handle that for me.
Do I need to explicitly check for a disconnected client inside the streaming loop, and if so what’s the correct way to do that in Laravel? Or is this something that only becomes an issue when moving to SSE or WebSockets?