I’m building a chat application in Laravel where I’m using the openai-php/laravel package to interact with the ChatGPT API. The AI responses can be quite large, and I want to stream them to the frontend in real-time for a better user experience.
I know Laravel supports streaming responses with return response()->stream(...), and there are packages for frontend integration like stream-react or stream-view.
The issue is: I’m struggling to properly implement the streaming logic within my Laravel controller and consume it effectively on the client side (using Vue 3 and Inertia.js).
- How do I correctly set up the response()->stream() function to receive chunks from the OpenAI API and pass them to the client?
- What is the best way to handle the connection and ensure all data is sent without the request timing out in Laravel?
- Are there any specific headers I need to set for this to work with an Inertia/Vue frontend?
// Current (problematic) Controller Code
public function generateResponse(Request $request)
{
    $prompt = $request->input('prompt');

    // This code waits for the full response, not ideal for streaming
    $result = OpenAI::chat()->create([
        'model' => 'gpt-3.5-turbo',
        'messages' => [
            ['role' => 'user', 'content' => $prompt],
        ],
        'stream' => true, // I want to use streaming but don't know how to implement it
    ]);

    // How to stream $result chunks to the client?
    return response()->json(['response' => $result['choices'][0]['message']['content']]);
}
Any guidance or code examples for the controller and frontend logic would be greatly appreciated!
You want to stream ChatGPT responses from Laravel to a Vue 3 + Inertia frontend using openai-php/laravel, instead of waiting for the full response.
Your current implementation waits for the entire response and returns JSON, which defeats streaming.
Important clarification (Inertia limitation)
Inertia does not support streamed responses.
Inertia expects a single JSON payload and buffers the response. Token-by-token streaming will not work through an Inertia visit.
Correct approach:
- Use Inertia for page loads only
- Expose a separate streaming endpoint
- Consume it directly from the Vue component using fetch()
Trying to stream through an Inertia response will fail or silently buffer.
Correct way to stream OpenAI responses in Laravel
Use createStreamed(), not create()
Setting stream => true does not automatically stream the response. You must use the streamed API method:
OpenAI::chat()->createStreamed([...]);
This returns an iterator that yields delta tokens, not full messages.
Laravel Controller Example
use Illuminate\Http\Request;
use OpenAI\Laravel\Facades\OpenAI;

public function generateResponse(Request $request)
{
    $prompt = $request->input('prompt');

    return response()->stream(function () use ($prompt) {
        // Disable output buffering so chunks reach the client immediately
        while (ob_get_level() > 0) {
            ob_end_flush();
        }

        set_time_limit(0);

        $stream = OpenAI::chat()->createStreamed([
            'model' => 'gpt-4o-mini',
            'messages' => [
                ['role' => 'user', 'content' => $prompt],
            ],
        ]);

        foreach ($stream as $response) {
            $delta = $response->choices[0]->delta->content ?? null;

            if ($delta !== null) {
                echo $delta;
                flush();
            }
        }
    }, 200, [
        'Content-Type' => 'text/plain; charset=utf-8',
        'Cache-Control' => 'no-cache',
        'X-Accel-Buffering' => 'no', // Required for Nginx
    ]);
}
Preventing buffering and timeouts
PHP
- Streaming keeps the request alive
- Disable execution time limits:
set_time_limit(0);
Nginx (required)
proxy_buffering off;
proxy_cache off;
Without this, streaming will not work even if PHP is correct.
Vue 3 Client (do NOT use Axios or Inertia)
Axios buffers responses. Use the Fetch API with streams:
async function streamChat(prompt) {
  const response = await fetch('/chat/stream', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'X-CSRF-TOKEN': document.querySelector('meta[name="csrf-token"]').content,
    },
    body: JSON.stringify({ prompt }),
  })

  const reader = response.body.getReader()
  const decoder = new TextDecoder()
  let output = ''

  while (true) {
    const { value, done } = await reader.read()
    if (done) break

    output += decoder.decode(value, { stream: true })
    chatOutput.value = output
  }
}
Required headers (server side)
| Header | Reason |
|---|---|
| Content-Type: text/plain | Prevent JSON buffering |
| X-Accel-Buffering: no | Disable Nginx buffering |
| Cache-Control: no-cache | Avoid proxy caching |
| Connection: keep-alive | Long-lived stream |
Summary
- Inertia cannot handle streamed responses
- Use a dedicated streaming endpoint
- Use createStreamed() from openai-php
- Disable PHP and Nginx buffering
- Consume the stream with fetch() + ReadableStream
- Avoid Axios and Inertia for streaming
- gpt-3.5-turbo should be considered legacy
Recommended next steps
- Switch to SSE (text/event-stream) for better browser semantics
- Add abort/cancel support
- Move to WebSockets (Laravel Reverb) for production chat apps
This is the minimal correct setup for real-time token streaming in Laravel.
I hope this helped, let me know if there’s anything that wasn’t clear 🙂
This is a very good question.
Short version: no, Laravel does not automatically stop your streaming loop, and yes, you should explicitly handle client disconnects if you care about cost, resource usage, or correctness.
Below is the practical, PHP-level explanation.
What actually happens when the client disconnects
When you stream a response with response()->stream():
- PHP keeps executing your callback until it finishes
- OpenAI will continue streaming tokens
- Your loop will keep running unless you stop it
- The fact that the browser closed the page does not magically stop PHP
Laravel itself does not manage this for you. Once the request is handed off to PHP-FPM, it’s your responsibility.
In other words:
If you do nothing, you will keep paying for tokens no one sees.
How to detect a disconnected client in PHP
PHP provides a built-in function, connection_aborted(), which returns non-zero once the client has disconnected, whether because:
- the browser navigates away
- the tab is closed
- the fetch request is aborted
However, you must explicitly check it.
Correct way to handle this in your streaming loop
Update the loop to check connection_aborted() on every iteration. Once it returns true:
- you stop reading from OpenAI
- the loop exits immediately
- the PHP request terminates cleanly
This is the single most important missing piece in most examples.
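A minimal version of that check, reusing the streaming loop from the earlier controller example (connection_aborted() is a standard PHP function; note it is only updated when PHP attempts to send output, which is another reason to flush() every chunk):

```php
foreach ($stream as $response) {
    // Stop pulling tokens from OpenAI as soon as the client is gone
    if (connection_aborted()) {
        break;
    }

    $delta = $response->choices[0]->delta->content ?? null;

    if ($delta !== null) {
        echo $delta;
        flush(); // the output attempt is what refreshes connection_aborted()
    }
}
```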
Do PHP-FPM or Nginx handle this for you?
No.
- Nginx will close the client socket
- PHP-FPM will keep executing until your script exits
- Laravel does not wrap or intercept the streaming loop
Unless you break the loop, OpenAI keeps streaming.
What about ignore_user_abort()?
By default, PHP behaves as if ignore_user_abort(false) were set: the script is aborted the next time it tries to send output to a disconnected client. That abort can happen mid-iteration, which is why you still need flush() calls (so an output attempt actually occurs) and an explicit connection_aborted() check if you want to exit the loop cleanly and run any teardown logic.
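In code, the relevant combination looks like this (a sketch; $stream is the iterator from createStreamed() as in the earlier examples):

```php
ignore_user_abort(true); // keep the script alive after the client disconnects

foreach ($stream as $response) {
    // With ignore_user_abort(true), PHP will NOT kill the script for us,
    // so we must detect the disconnect ourselves on every iteration.
    if (connection_aborted()) {
        // e.g. log partial token usage here before exiting
        break;
    }

    echo $response->choices[0]->delta->content ?? '';
    flush(); // an output attempt is what updates connection_aborted()
}
```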
Frontend: how this ties back to fetch()
When you cancel the request on the client:
- the TCP connection closes
- connection_aborted() returns true in PHP
- your loop exits (if you check it)
Without the server-side check, aborting on the client does nothing useful.
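On the Vue side, cancellation is done with the standard AbortController API. A sketch of wiring a hypothetical "Stop" button into the fetch-based reader from the first answer (the /chat/stream endpoint and streamChat name are carried over from those examples):

```javascript
const controller = new AbortController()

async function streamChat(prompt, signal) {
  const response = await fetch('/chat/stream', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt }),
    signal, // aborting the controller cancels the request and the body stream
  })

  const reader = response.body.getReader()
  const decoder = new TextDecoder()

  while (true) {
    const { value, done } = await reader.read()
    if (done) break
    console.log(decoder.decode(value, { stream: true }))
  }
}

// In the component:
// streamChat('Hello', controller.signal)
// stopButton.onclick = () => controller.abort()
```

Calling controller.abort() closes the connection, which is exactly what makes connection_aborted() start returning true on the server.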
Why this matters more with OpenAI streaming
OpenAI streaming is pull-based:
- Your loop keeps pulling tokens
- There is no automatic “stop” signal
- The SDK won’t stop unless you stop iterating
So client disconnect handling is mandatory in production.
SSE / WebSockets note
- SSE: same issue, same solution (connection_aborted())
- WebSockets: easier, because disconnect events are explicit
- If you expect frequent cancels, WebSockets are cleaner
But even SSE does not solve this automatically.
Bottom line
- Laravel does not stop streamed responses automatically
- PHP continues executing after client disconnect
- You must check connection_aborted() inside the loop
- Without this, you waste tokens and server resources
- This is not optional for production systems
Thanks!

I have this working now using response()->stream() and createStreamed() and I’m consuming it on the frontend with fetch() and getReader(), so the tokens are coming through fine.
One thing I don’t fully understand from your answer is what actually happens when the user stops reading or navigates away. For example, if the user clicks a “stop” button or closes the page while the stream is still running, does Laravel automatically stop the loop, or will it keep generating tokens from OpenAI until it finishes?
Right now I’m just looping over the streamed response and echoing chunks. I’m not sure how (or if) I should detect that the client disconnected and break out of the loop, or whether PHP/FPM or Nginx will handle that for me.
Do I need to explicitly check for a disconnected client inside the streaming loop, and if so what’s the correct way to do that in Laravel? Or is this something that only becomes an issue when moving to SSE or WebSockets?