I’m building a chat application in Laravel where I’m using the openai-php/laravel package to interact with the ChatGPT API. The AI responses can be quite large, and I want to stream them to the frontend in real-time for a better user experience.
I know Laravel supports streaming responses with `return response()->stream(...)`, and there are packages for frontend integration like stream-react or stream-view.
The issue is: I’m struggling to properly implement the streaming logic within my Laravel controller and consume it effectively on the client side (using Vue 3 and Inertia.js).
- How do I correctly set up the `response()->stream()` function to receive chunks from the OpenAI API and pass them to the client?
- What is the best way to handle the connection and ensure all data is sent without the request timing out in Laravel?
- Are there any specific headers I need to set for this to work with an Inertia/Vue frontend?
```php
// Current (problematic) controller code
public function generateResponse(Request $request)
{
    $prompt = $request->input('prompt');

    // This code waits for the full response, not ideal for streaming
    $result = OpenAI::chat()->create([
        'model' => 'gpt-3.5-turbo',
        'messages' => [
            ['role' => 'user', 'content' => $prompt],
        ],
        'stream' => true, // I want to use streaming but don't know how to implement it
    ]);

    // How to stream $result chunks to the client?
    return response()->json(['response' => $result['choices'][0]['message']['content']]);
}
```
Any guidance or code examples for the controller and frontend logic would be greatly appreciated!
You want to stream ChatGPT responses from Laravel to a Vue 3 + Inertia frontend using openai-php/laravel, instead of waiting for the full response.
Your current implementation waits for the entire response and returns JSON, which defeats streaming.
Important clarification (Inertia limitation)
Inertia does not support streamed responses.
Inertia expects a single JSON payload and buffers the response. Token-by-token streaming will not work through an Inertia visit.
Correct approach:
- Use Inertia for page loads only
- Expose a separate streaming endpoint
- Consume it directly from the Vue component using `fetch()`
Trying to stream through an Inertia response will fail or silently buffer.
Correct way to stream OpenAI responses in Laravel
Use `createStreamed()`, not `create()`

Setting `'stream' => true` does not automatically stream the response. You must use the streamed API method:

```php
OpenAI::chat()->createStreamed([...]);
```

This returns an iterator that yields delta tokens, not full messages.
Laravel Controller Example
```php
use Illuminate\Http\Request;
use OpenAI\Laravel\Facades\OpenAI;

public function generateResponse(Request $request)
{
    $prompt = $request->input('prompt');

    return response()->stream(function () use ($prompt) {
        // Disable output buffering
        while (ob_get_level() > 0) {
            ob_end_flush();
        }
        set_time_limit(0);

        $stream = OpenAI::chat()->createStreamed([
            'model' => 'gpt-4o-mini',
            'messages' => [
                ['role' => 'user', 'content' => $prompt],
            ],
        ]);

        foreach ($stream as $response) {
            $delta = $response->choices[0]->delta->content ?? null;
            if ($delta !== null) {
                echo $delta;
                flush();
            }
        }
    }, 200, [
        'Content-Type' => 'text/plain; charset=utf-8',
        'Cache-Control' => 'no-cache',
        'X-Accel-Buffering' => 'no', // Required for Nginx
    ]);
}
```
Preventing buffering and timeouts
PHP

- Streaming keeps the request alive for the full duration of the response
- Disable the execution time limit: `set_time_limit(0);`

Nginx (required)

```nginx
proxy_buffering off;
proxy_cache off;
```

Without this, Nginx buffers the output and streaming will not work even if the PHP side is correct.
Vue 3 Client (do NOT use Axios or Inertia)
Axios buffers responses. Use the Fetch API with streams:
```javascript
// chatOutput is a Vue ref defined in the component, e.g. const chatOutput = ref('')
async function streamChat(prompt) {
  const response = await fetch('/chat/stream', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'X-CSRF-TOKEN': document.querySelector('meta[name="csrf-token"]').content,
    },
    body: JSON.stringify({ prompt }),
  })

  const reader = response.body.getReader()
  const decoder = new TextDecoder()
  let output = ''

  while (true) {
    const { value, done } = await reader.read()
    if (done) break
    output += decoder.decode(value, { stream: true })
    chatOutput.value = output
  }
}
```
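If you later add a "stop" button, you can extend the pattern above with an `AbortController`. This is a sketch (the `makeStoppableStream` helper and the `/chat/stream` endpoint name are illustrative, not part of the original answer): aborting rejects the in-flight `fetch`/`read()` promises with an `AbortError`.

```javascript
// Sketch: a stoppable streaming helper. makeStoppableStream is a hypothetical
// name; the abort mechanics (AbortController + signal) are standard Fetch API.
function makeStoppableStream() {
  const controller = new AbortController()

  async function start(prompt, onChunk) {
    try {
      const response = await fetch('/chat/stream', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ prompt }),
        signal: controller.signal, // abort() cancels the request and the reads
      })
      const reader = response.body.getReader()
      const decoder = new TextDecoder()
      while (true) {
        const { value, done } = await reader.read()
        if (done) break
        onChunk(decoder.decode(value, { stream: true }))
      }
    } catch (err) {
      // An abort is the expected outcome of pressing "stop"
      if (err.name !== 'AbortError') throw err
    }
  }

  // Wire stop() to a "stop" button in the component
  const stop = () => controller.abort()

  return { start, stop }
}
```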
Required headers (server side)
| Header | Reason |
|---|---|
| `Content-Type: text/plain` | Prevent JSON buffering |
| `X-Accel-Buffering: no` | Disable Nginx buffering |
| `Cache-Control: no-cache` | Avoid proxy caching |
| `Connection: keep-alive` | Long-lived stream |
Summary
- Inertia cannot handle streamed responses
- Use a dedicated streaming endpoint
- Use `createStreamed()` from openai-php
- Disable PHP and Nginx buffering
- Consume the stream with `fetch()` + `ReadableStream`
- Avoid Axios and Inertia for streaming
- `gpt-3.5-turbo` should be considered legacy
Recommended next steps
- Switch to SSE (`text/event-stream`) for better browser semantics
- Add abort/cancel support
- Move to WebSockets (Laravel Reverb) for production chat apps
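To make the SSE suggestion concrete: with `text/event-stream`, the server emits `data: <payload>` lines terminated by a blank line, and a client reading over `fetch()` has to split the byte stream back into frames. The sketch below is a simplified parser (the `parseSSE` name is illustrative; real SSE also allows `event:`, `id:`, retry fields, and comments). With a GET endpoint you could use `EventSource` instead and the browser does this for you.

```javascript
// Sketch: minimal text/event-stream frame parser. Handles only "data:" lines;
// returns parsed events plus the incomplete remainder to prepend to the next chunk.
function parseSSE(buffer) {
  const events = []
  let idx
  // Each complete frame ends with a blank line ("\n\n")
  while ((idx = buffer.indexOf('\n\n')) !== -1) {
    const frame = buffer.slice(0, idx)
    buffer = buffer.slice(idx + 2)
    for (const line of frame.split('\n')) {
      if (line.startsWith('data: ')) events.push(line.slice(6))
    }
  }
  return { events, rest: buffer }
}
```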
This is the minimal correct setup for real-time token streaming in Laravel.
I hope this helped, let me know if there’s anything that wasn’t clear 🙂

I have this working now using `response()->stream()` and `createStreamed()`, and I'm consuming it on the frontend with `fetch()` and `getReader()`, so the tokens are coming through fine.
One thing I don’t fully understand from your answer is what actually happens when the user stops reading or navigates away. For example, if the user clicks a “stop” button or closes the page while the stream is still running, does Laravel automatically stop the loop, or will it keep generating tokens from OpenAI until it finishes?
Right now I’m just looping over the streamed response and echoing chunks. I’m not sure how (or if) I should detect that the client disconnected and break out of the loop, or whether PHP/FPM or Nginx will handle that for me.
Do I need to explicitly check for a disconnected client inside the streaming loop, and if so what’s the correct way to do that in Laravel? Or is this something that only becomes an issue when moving to SSE or WebSockets?