I’m building a chat application in Laravel where I’m using the openai-php/laravel package to interact with the ChatGPT API. The AI responses can be quite large, and I want to stream them to the frontend in real-time for a better user experience.
I know Laravel supports streaming responses with `return response()->stream(...)`, and there are packages for frontend integration like stream-react or stream-view.
The issue is: I’m struggling to properly implement the streaming logic within my Laravel controller and consume it effectively on the client side (using Vue 3 and Inertia.js).
- How do I correctly set up the `response()->stream()` function to receive chunks from the OpenAI API and pass them to the client?
- What is the best way to handle the connection and ensure all data is sent without the request timing out in Laravel?
- Are there any specific headers I need to set for this to work with an Inertia/Vue frontend?
```php
// Current (problematic) controller code
public function generateResponse(Request $request)
{
    $prompt = $request->input('prompt');

    // This code waits for the full response, not ideal for streaming
    $result = OpenAI::chat()->create([
        'model' => 'gpt-3.5-turbo',
        'messages' => [
            ['role' => 'user', 'content' => $prompt],
        ],
        'stream' => true, // I want to use streaming but don't know how to implement it
    ]);

    // How to stream $result chunks to the client?
    return response()->json(['response' => $result['choices'][0]['message']['content']]);
}
```
Any guidance or code examples for the controller and frontend logic would be greatly appreciated!
You want to stream ChatGPT responses from Laravel to a Vue 3 + Inertia frontend using openai-php/laravel, instead of waiting for the full response.
Your current implementation waits for the entire response and returns JSON, which defeats streaming.
Important clarification (Inertia limitation)
Inertia does not support streamed responses.
Inertia expects a single JSON payload and buffers the response. Token-by-token streaming will not work through an Inertia visit.
Correct approach:
- Use Inertia for page loads only
- Expose a separate streaming endpoint
- Consume it directly from the Vue component using `fetch()`
Trying to stream through an Inertia response will fail or silently buffer.
Correct way to stream OpenAI responses in Laravel
Use `createStreamed()`, not `create()`

Setting `'stream' => true` does not automatically stream the response. You must use the streamed API method:

```php
OpenAI::chat()->createStreamed([...]);
```

This returns an iterator that yields delta tokens, not full messages.
Laravel Controller Example
```php
use Illuminate\Http\Request;
use OpenAI\Laravel\Facades\OpenAI;

public function generateResponse(Request $request)
{
    $prompt = $request->input('prompt');

    return response()->stream(function () use ($prompt) {
        // Disable output buffering
        while (ob_get_level() > 0) {
            ob_end_flush();
        }
        set_time_limit(0);

        $stream = OpenAI::chat()->createStreamed([
            'model' => 'gpt-4o-mini',
            'messages' => [
                ['role' => 'user', 'content' => $prompt],
            ],
        ]);

        foreach ($stream as $response) {
            $delta = $response->choices[0]->delta->content ?? null;
            if ($delta !== null) {
                echo $delta;
                flush();
            }
        }
    }, 200, [
        'Content-Type' => 'text/plain; charset=utf-8',
        'Cache-Control' => 'no-cache',
        'X-Accel-Buffering' => 'no', // Required for Nginx
    ]);
}
```
Preventing buffering and timeouts
PHP

- Streaming keeps the request alive for the full duration of the response
- Disable the execution time limit: `set_time_limit(0);`

Nginx (required)

```nginx
proxy_buffering off;
proxy_cache off;
```

Without this, Nginx buffers the output and streaming will not work even if the PHP side is correct.
Vue 3 Client (do NOT use Axios or Inertia)
Axios buffers responses. Use the Fetch API with streams:
```javascript
// chatOutput is a Vue ref defined in the component, e.g. const chatOutput = ref('')
async function streamChat(prompt) {
  const response = await fetch('/chat/stream', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'X-CSRF-TOKEN': document.querySelector('meta[name="csrf-token"]').content,
    },
    body: JSON.stringify({ prompt }),
  })

  const reader = response.body.getReader()
  const decoder = new TextDecoder()
  let output = ''

  while (true) {
    const { value, done } = await reader.read()
    if (done) break
    output += decoder.decode(value, { stream: true })
    chatOutput.value = output
  }
}
```
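If you later add a "stop" button, you can extend the pattern above with an `AbortController`. This is a sketch (the `makeStoppableStream` helper and the `/chat/stream` endpoint name are illustrative, not part of the original answer): aborting rejects the in-flight `fetch`/`read()` promises with an `AbortError`.

```javascript
// Sketch: a stoppable streaming helper. makeStoppableStream is a hypothetical
// name; the abort mechanics (AbortController + signal) are standard Fetch API.
function makeStoppableStream() {
  const controller = new AbortController()

  async function start(prompt, onChunk) {
    try {
      const response = await fetch('/chat/stream', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ prompt }),
        signal: controller.signal, // abort() cancels the request and the reads
      })
      const reader = response.body.getReader()
      const decoder = new TextDecoder()
      while (true) {
        const { value, done } = await reader.read()
        if (done) break
        onChunk(decoder.decode(value, { stream: true }))
      }
    } catch (err) {
      // An abort is the expected outcome of pressing "stop"
      if (err.name !== 'AbortError') throw err
    }
  }

  // Wire stop() to a "stop" button in the component
  const stop = () => controller.abort()

  return { start, stop }
}
```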
Required headers (server side)
| Header | Reason |
|---|---|
| `Content-Type: text/plain` | Prevent JSON buffering |
| `X-Accel-Buffering: no` | Disable Nginx buffering |
| `Cache-Control: no-cache` | Avoid proxy caching |
| `Connection: keep-alive` | Long-lived stream |
Summary
- Inertia cannot handle streamed responses
- Use a dedicated streaming endpoint
- Use `createStreamed()` from openai-php
- Disable PHP and Nginx buffering
- Consume the stream with `fetch()` + `ReadableStream`
- Avoid Axios and Inertia for streaming
- `gpt-3.5-turbo` should be considered legacy
Recommended next steps
- Switch to SSE (`text/event-stream`) for better browser semantics
- Add abort/cancel support
- Move to WebSockets (Laravel Reverb) for production chat apps
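To make the SSE suggestion concrete: with `text/event-stream`, the server emits `data: <payload>` lines terminated by a blank line, and a client reading over `fetch()` has to split the byte stream back into frames. The sketch below is a simplified parser (the `parseSSE` name is illustrative; real SSE also allows `event:`, `id:`, retry fields, and comments). With a GET endpoint you could use `EventSource` instead and the browser does this for you.

```javascript
// Sketch: minimal text/event-stream frame parser. Handles only "data:" lines;
// returns parsed events plus the incomplete remainder to prepend to the next chunk.
function parseSSE(buffer) {
  const events = []
  let idx
  // Each complete frame ends with a blank line ("\n\n")
  while ((idx = buffer.indexOf('\n\n')) !== -1) {
    const frame = buffer.slice(0, idx)
    buffer = buffer.slice(idx + 2)
    for (const line of frame.split('\n')) {
      if (line.startsWith('data: ')) events.push(line.slice(6))
    }
  }
  return { events, rest: buffer }
}
```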
This is the minimal correct setup for real-time token streaming in Laravel.
I hope this helped, let me know if there’s anything that wasn’t clear 🙂

I have this working now using `response()->stream()` and `createStreamed()`, and I'm consuming it on the frontend with `fetch()` and `getReader()`, so the tokens are coming through fine.
One thing I don’t fully understand from your answer is what actually happens when the user stops reading or navigates away. For example, if the user clicks a “stop” button or closes the page while the stream is still running, does Laravel automatically stop the loop, or will it keep generating tokens from OpenAI until it finishes?
Right now I’m just looping over the streamed response and echoing chunks. I’m not sure how (or if) I should detect that the client disconnected and break out of the loop, or whether PHP/FPM or Nginx will handle that for me.
Do I need to explicitly check for a disconnected client inside the streaming loop, and if so what’s the correct way to do that in Laravel? Or is this something that only becomes an issue when moving to SSE or WebSockets?