Livewire vs SSE vs WebSockets: Choosing the Right Laravel AI Streaming Transport

The moment you ship your first AI feature, you hit a wall that traditional Laravel CRUD never prepares you for. Your controller calls an API, and instead of a clean response at 200ms, it trickles data back over 8, 12, sometimes 30 seconds. Users stare at a spinner. They refresh. They leave.

Laravel AI streaming is not optional polish. It is the difference between a feature that feels alive and one that feels broken. And the moment you commit to streaming, you are choosing between three fundamentally different transport mechanisms: Livewire, Server-Sent Events, and WebSockets. Each one maps to a different problem. Pick the wrong one and you will spend weeks undoing it — usually right before a deadline.

This guide goes beyond the surface comparison. We implement all three approaches with production-ready Laravel 11/12 code, walk through the edge cases that only appear under load, and give you a decision framework you can use on the next kickoff call.

Why AI Output Breaks Traditional Transport Assumptions

Most web apps are transactional. User does something, server responds, done. Even when the response is slow, the pattern holds: ask, wait, receive. HTTP was built for exactly this shape.

Large language models break it. They do not compute an answer and return it — they reason in pieces, emitting tokens as they go. A full response might be 400 tokens. At typical generation speeds, that is 8–15 seconds of incremental output before you have anything to show.

Showing this process to users is not cosmetic. Research consistently shows that perceived wait time drops significantly when users can see progress. A spinner at second twelve feels broken. A sentence building at second twelve feels like thinking. That psychological distinction translates directly to engagement and retention.

Once you accept that AI output is a process rather than a result, the transport mechanism becomes part of the UX — not an implementation detail you can defer.

Option 1: Livewire — State-First AI Streaming

Livewire is usually the first tool Laravel developers reach for, and that instinct is not wrong. The server owns state, the frontend reflects it, and updates happen automatically. For teams that live in Blade and PHP, it keeps the entire feature in one mental model.

For AI interfaces, the Livewire pattern typically works like this: the user submits a prompt, a queued job calls the AI API and writes tokens to the database incrementally, and the Livewire component polls that record to update the UI. You are not streaming tokens directly into the browser — you are streaming them into your database, and Livewire is reading from there.

This is a meaningful distinction. You are adding a database write loop to the critical path. For most applications, this is an acceptable trade-off. For high-concurrency or cost-sensitive systems, it can become a bottleneck — we will come back to that.

Implementation: Livewire Polling with Queued Streaming

First, the Eloquent model that holds the streaming state:

// database/migrations/xxxx_create_ai_conversations_table.php
Schema::create('ai_conversations', function (Blueprint $table) {
    $table->id();
    $table->foreignId('user_id')->constrained()->cascadeOnDelete();
    $table->text('prompt');
    $table->longText('partial_response')->nullable();
    $table->longText('response')->nullable();
    $table->string('status')->default('pending'); // pending|streaming|complete|failed
    $table->timestamps();
});

The Livewire component, using Livewire 3’s #[Validate] attribute syntax:

<?php
// app/Livewire/AiChat.php

namespace App\Livewire;

use App\Jobs\ProcessAiStream;
use App\Models\AiConversation;
use Livewire\Attributes\Validate;
use Livewire\Component;

class AiChat extends Component
{
    #[Validate('required|string|max:2000')]
    public string $prompt = '';

    public string $partialResponse = '';
    public bool   $isStreaming     = false;
    public ?int   $conversationId  = null;

    public function submit(): void
    {
        $this->validate();

        $conversation = AiConversation::create([
            'user_id' => auth()->id(),
            'prompt'  => $this->prompt,
            'status'  => 'pending',
        ]);

        $this->conversationId  = $conversation->id;
        $this->isStreaming     = true;
        $this->partialResponse = '';
        $this->prompt          = '';

        ProcessAiStream::dispatch($conversation);
    }

    public function poll(): void
    {
        if (! $this->conversationId || ! $this->isStreaming) {
            return;
        }

        $conversation = AiConversation::find($this->conversationId);

        if (! $conversation) {
            $this->isStreaming = false;
            return;
        }

        $this->partialResponse = $conversation->partial_response ?? '';

        if (in_array($conversation->status, ['complete', 'failed'], true)) {
            $this->isStreaming = false;
        }
    }

    public function render()
    {
        return view('livewire.ai-chat');
    }
}

The queued job that calls the API and writes tokens to the database:

<?php
// app/Jobs/ProcessAiStream.php

namespace App\Jobs;

use App\Models\AiConversation;
use Anthropic\Laravel\Facades\Anthropic;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Throwable;

class ProcessAiStream implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public int $timeout = 120;
    public int $tries   = 1; // Do not retry AI streaming jobs automatically

    public function __construct(public AiConversation $conversation) {}

    public function handle(): void
    {
        try {
            $stream = Anthropic::messages()->createStreamed([
                'model'      => 'claude-sonnet-4-6',
                'max_tokens' => 2048,
                'messages'   => [
                    ['role' => 'user', 'content' => $this->conversation->prompt],
                ],
            ]);

            $buffer    = '';
            $charCount = 0;

            foreach ($stream as $event) {
                if ($event->type === 'content_block_delta') {
                    $token      = $event->delta->text ?? '';
                    $buffer    .= $token;
                    $charCount += strlen($token);

                    // Batch writes every ~100 characters to reduce DB pressure
                    if ($charCount >= 100) {
                        $this->conversation->update([
                            'partial_response' => $buffer,
                            'status'           => 'streaming',
                        ]);
                        $charCount = 0;
                    }
                }
            }

            $this->conversation->update([
                'partial_response' => $buffer,
                'response'         => $buffer,
                'status'           => 'complete',
            ]);
        } catch (Throwable $e) {
            $this->conversation->update(['status' => 'failed']);
            report($e);
        }
    }
}

The Blade template for the component:

{{-- resources/views/livewire/ai-chat.blade.php --}}
<div>
    <form wire:submit="submit">
        <textarea
            wire:model="prompt"
            placeholder="Ask anything..."
            rows="4"
            @if($isStreaming) disabled @endif
        ></textarea>

        <button type="submit" wire:loading.attr="disabled" @if($isStreaming) disabled @endif>
            <span wire:loading wire:target="submit">Thinking...</span>
            <span wire:loading.remove wire:target="submit">Send</span>
        </button>
    </form>

    @if($isStreaming || $partialResponse)
        <div wire:poll.750ms="poll" class="ai-response">
            {{ $partialResponse }}
            @if($isStreaming)
                <span class="blinking-cursor" aria-hidden="true">▍</span>
            @endif
        </div>
    @endif
</div>

[Production Pitfall] The polling interval is the lever you will want to tune, and the one that will bite you at scale. At 500ms polling with 200 users concurrently watching responses stream, you are generating 400 database reads per second against the ai_conversations table on top of your normal traffic. Add an index on (user_id, status), consider Redis-backed state for the partial response, and keep the poll interval at 750ms–1s unless your users are watching a typing test. Database polling is pragmatic. It is not free.

If you want to see the Livewire polling approach taken all the way through to a production Claude integration — conversation history, database-backed memory, and the full component lifecycle — the Laravel Livewire Claude API guide covers everything you would wire up before calling this architecture done.

Option 2: Server-Sent Events — Native Laravel AI Streaming

SSE is the transport most developers overlook and, in most AI scenarios, the one they should be using.

A single long-lived HTTP connection from server to client. The server pushes updates as plain-text events. The browser listens. No handshake protocol, no bidirectional overhead, no WebSocket server to run. The SSE spec is part of the HTML standard — every modern browser handles reconnection natively.
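The wire format itself shows why there is so little to run. Each event is an optional event: name followed by a data: line, and a blank line terminates the event. The token and done events used later in this guide look like this on the wire:

```text
event: token
data: {"text":"Once"}

event: token
data: {"text":" upon a time"}

event: done
data: {"finished":true}
```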

This simplicity is precisely why SSE maps so well to AI output. The model speaks, the UI listens. The data flows in one direction, at the pace the AI sets. There is no illusion of interactivity at the transport layer — just clean, progressive delivery.

For interfaces where watching text appear is itself valuable — writing assistants, exploratory chats, educational tools — SSE gives you authenticity that polling cannot match, without the infrastructure overhead of WebSockets.

See the Anthropic Streaming Messages documentation for the full event type reference we consume below.

The SSE Controller

<?php
// app/Http/Controllers/AiStreamController.php

namespace App\Http\Controllers;

use Anthropic\Laravel\Facades\Anthropic;
use Illuminate\Http\Request;
use Illuminate\Http\StreamedResponse;
use Throwable;

class AiStreamController extends Controller
{
    public function __invoke(Request $request): StreamedResponse
    {
        $request->validate([
            'prompt' => 'required|string|max:4000',
        ]);

        $prompt = $request->string('prompt')->toString();

        return response()->stream(
            callback: function () use ($prompt): void {
                $this->streamResponse($prompt);
            },
            status: 200,
            headers: [
                'Content-Type'      => 'text/event-stream',
                'Cache-Control'     => 'no-cache, no-store',
                'X-Accel-Buffering' => 'no',   // Critical for Nginx
                'Connection'        => 'keep-alive',
            ]
        );
    }

    private function streamResponse(string $prompt): void
    {
        try {
            $stream = Anthropic::messages()->createStreamed([
                'model'      => 'claude-sonnet-4-6',
                'max_tokens' => 2048,
                'messages'   => [
                    ['role' => 'user', 'content' => $prompt],
                ],
            ]);

            foreach ($stream as $event) {
                // Bail if the user closed the tab
                if (connection_aborted()) {
                    break;
                }

                if ($event->type === 'content_block_delta') {
                    $token = $event->delta->text ?? '';

                    if ($token !== '') {
                        $this->sendEvent('token', ['text' => $token]);
                    }
                }

                if ($event->type === 'message_stop') {
                    $this->sendEvent('done', ['finished' => true]);
                    break;
                }
            }
        } catch (Throwable $e) {
            report($e);

            $this->sendEvent('error', [
                'message' => 'The stream encountered an error. Please try again.',
            ]);
        }
    }

    private function sendEvent(string $event, array $data): void
    {
        echo "event: {$event}\n";
        echo 'data: ' . json_encode($data) . "\n\n";

        // Guard ob_flush(): it raises a notice when no output buffer is active
        if (ob_get_level() > 0) {
            ob_flush();
        }

        flush();
    }
}

Route registration and custom throttle — note the Laravel 11 bootstrap/app.php pattern, not Kernel.php:

// bootstrap/app.php
use Illuminate\Cache\RateLimiting\Limit;
use Illuminate\Foundation\Application;
use Illuminate\Foundation\Configuration\Middleware;
use Illuminate\Http\Request;
use Illuminate\Support\Facades\RateLimiter;

return Application::configure(basePath: dirname(__DIR__))
    ->withRouting(
        web: __DIR__.'/../routes/web.php',
        commands: __DIR__.'/../routes/console.php',
        health: '/up',
    )
    ->withMiddleware(function (Middleware $middleware) {
        // register any middleware aliases here
    })
    ->booted(function () {
        RateLimiter::for('ai-stream', function (Request $request) {
            return Limit::perMinute(10)
                ->by($request->user()?->id ?: $request->ip())
                ->response(fn () => response()->json(
                    ['error' => 'Too many requests. Please wait before sending another prompt.'],
                    429
                ));
        });
    })
    ->create();

// routes/web.php
use App\Http\Controllers\AiStreamController;

Route::middleware(['auth', 'throttle:ai-stream'])
    ->get('/ai/stream', AiStreamController::class)
    ->name('ai.stream');

Client-Side EventSource

The browser’s native EventSource API handles SSE. No libraries, no build steps, no abstraction.

// resources/js/ai-stream.js

class AiStream {
    #source = null;
    #output = null;

    constructor(outputElement) {
        this.#output = outputElement;
    }

    start(prompt) {
        this.stop(); // Close any existing connection first
        this.#output.textContent = '';

        const url = new URL('/ai/stream', window.location.origin);
        url.searchParams.set('prompt', prompt);

        this.#source = new EventSource(url.toString());

        this.#source.addEventListener('token', (e) => {
            const { text } = JSON.parse(e.data);
            this.#output.textContent += text;
        });

        this.#source.addEventListener('done', () => {
            this.stop();
        });

        this.#source.addEventListener('error', (e) => {
            console.error('SSE error:', e);
            this.#output.textContent += '\n\n[Stream interrupted. Refresh and try again.]';
            this.stop();
        });
    }

    stop() {
        this.#source?.close();
        this.#source = null;
    }
}

// Usage
const output = document.getElementById('ai-output');
const stream = new AiStream(output);

document.getElementById('send-btn').addEventListener('click', () => {
    const prompt = document.getElementById('prompt-input').value.trim();
    if (prompt) stream.start(prompt);
});

Nginx Configuration for SSE

The single most common SSE failure in production has nothing to do with your Laravel code. It is your reverse proxy buffering the response. By default, Nginx buffers upstream responses — your perfectly streaming controller becomes a single flushed response that arrives when the model finishes.

# /etc/nginx/sites-available/your-app

location /ai/stream {
    proxy_pass           http://your-app-upstream;
    proxy_http_version   1.1;
    proxy_set_header     Connection '';
    proxy_buffering      off;
    proxy_cache          off;
    proxy_read_timeout   120s;
    chunked_transfer_encoding on;
}

The X-Accel-Buffering: no header in the controller instructs Nginx to disable buffering for that single response, and in most setups it is enough on its own. Keep the explicit proxy_buffering off block anyway: it protects you when an intermediate proxy strips the header, and it documents the intent where the next engineer will look. Miss both and you will waste an afternoon confirming your code works locally but not on Forge.

[Edge Case Alert] The native EventSource API reconnects automatically when the connection drops, after a retry interval of a few seconds that the server can tune with the retry: field. If your stream ends with a done event and you do not call source.close() immediately, the browser treats the finished stream as a dropped connection, reconnects to your /ai/stream endpoint with the same query string, and starts a second generation. Always close the source explicitly inside your done handler. For unauthenticated endpoints, this reconnect loop is also a denial-of-service vector worth rate-limiting hard.

Option 3: WebSockets with Laravel Reverb

WebSockets are consistently oversold for AI interfaces. Most chat applications that look like they need bidirectional communication actually follow a speak-and-listen pattern — user sends one message, model responds, repeat. That is SSE territory.

WebSockets become the right answer when both sides need to speak unpredictably and concurrently: a user wants to interrupt the model mid-response, a shared session needs to broadcast output to multiple clients simultaneously, or a tool-calling workflow has the client triggering actions while the model is still reasoning. If your feature needs any of those behaviours, WebSockets are the correct choice — not because they are more powerful, but because they model the actual communication pattern accurately.

Laravel Reverb is the first-party WebSocket server for Laravel. It removes the need to run a separate Soketi instance or pay for a Pusher subscription. We use it here with a broadcasting event that the queue job fires per token.

The Broadcasting Event

<?php
// app/Events/AiTokenReceived.php

namespace App\Events;

use Illuminate\Broadcasting\InteractsWithSockets;
use Illuminate\Broadcasting\PrivateChannel;
use Illuminate\Contracts\Broadcasting\ShouldBroadcastNow;
use Illuminate\Foundation\Events\Dispatchable;
use Illuminate\Queue\SerializesModels;

// ShouldBroadcastNow pushes the event to Reverb synchronously from the
// already-queued job. A plain ShouldBroadcast would queue one extra job
// per token, adding latency and doubling queue pressure.
class AiTokenReceived implements ShouldBroadcastNow
{
    use Dispatchable, InteractsWithSockets, SerializesModels;

    public function __construct(
        public readonly int    $conversationId,
        public readonly string $token,
        public readonly bool   $done = false,
    ) {}

    public function broadcastOn(): array
    {
        return [
            new PrivateChannel("ai-response.{$this->conversationId}"),
        ];
    }

    public function broadcastAs(): string
    {
        return 'token.received';
    }

    public function broadcastWith(): array
    {
        return [
            'token' => $this->token,
            'done'  => $this->done,
        ];
    }
}

The queue job that streams and broadcasts per-token:

<?php
// app/Jobs/StreamWithBroadcast.php

namespace App\Jobs;

use App\Events\AiTokenReceived;
use App\Models\AiConversation;
use Anthropic\Laravel\Facades\Anthropic;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Throwable;

class StreamWithBroadcast implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public int $timeout = 120;
    public int $tries   = 1;

    public function __construct(public AiConversation $conversation) {}

    public function handle(): void
    {
        $fullResponse = '';

        try {
            $stream = Anthropic::messages()->createStreamed([
                'model'      => 'claude-sonnet-4-6',
                'max_tokens' => 2048,
                'messages'   => [
                    ['role' => 'user', 'content' => $this->conversation->prompt],
                ],
            ]);

            foreach ($stream as $event) {
                if ($event->type === 'content_block_delta') {
                    $token         = $event->delta->text ?? '';
                    $fullResponse .= $token;

                    broadcast(new AiTokenReceived(
                        conversationId: $this->conversation->id,
                        token: $token,
                    ))->via('reverb');
                }
            }

            // Signal stream completion
            broadcast(new AiTokenReceived(
                conversationId: $this->conversation->id,
                token: '',
                done: true,
            ))->via('reverb');

            $this->conversation->update([
                'response' => $fullResponse,
                'status'   => 'complete',
            ]);
        } catch (Throwable $e) {
            report($e);
            $this->conversation->update(['status' => 'failed']);

            broadcast(new AiTokenReceived(
                conversationId: $this->conversation->id,
                token: '',
                done: true,
            ))->via('reverb');
        }
    }
}

Client-Side Echo Listener

// resources/js/app.js (with Laravel Echo + Reverb)
import Echo from 'laravel-echo';
import Pusher from 'pusher-js';

window.Pusher = Pusher;

const echo = new Echo({
    broadcaster: 'reverb',
    key: import.meta.env.VITE_REVERB_APP_KEY,
    wsHost: import.meta.env.VITE_REVERB_HOST,
    wsPort: import.meta.env.VITE_REVERB_PORT ?? 80,
    wssPort: import.meta.env.VITE_REVERB_PORT ?? 443,
    forceTLS: (import.meta.env.VITE_REVERB_SCHEME ?? 'https') === 'https',
    enabledTransports: ['ws', 'wss'],
});

function listenForTokens(conversationId, outputElement, onComplete) {
    const channel = echo.private(`ai-response.${conversationId}`);

    channel.listen('.token.received', ({ token, done }) => {
        if (done) {
            echo.leave(`ai-response.${conversationId}`);
            onComplete?.();
            return;
        }

        outputElement.textContent += token;
    });
}

[Architect’s Note] One token per broadcast event sounds intuitive, but it means a 300-token response fires 300 separate broadcast events through your broadcasting stack and Reverb server. At load, this is a write amplification problem you need to account for. Batch on the server: accumulate a token-count threshold (10–20 tokens) or a short time window (around 50ms) in the job before firing a single broadcast event, and optionally smooth the larger chunks back out with a small client-side buffer. The UX is indistinguishable to the user, and the infrastructure load drops significantly. The Laravel Broadcasting documentation covers channel configuration and the full Reverb setup if you are starting from scratch.
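As a sketch of the client-side buffering idea, here is a hypothetical TokenBuffer (the class name and thresholds are ours, not part of Echo or Reverb) that coalesces bursts of tokens into fewer DOM writes:

```javascript
// resources/js/token-buffer.js
// Hypothetical helper: accumulates tokens and flushes them in chunks,
// either when a size threshold is hit or after a short quiet interval.
class TokenBuffer {
    constructor(flushFn, { maxChars = 40, intervalMs = 20 } = {}) {
        this.flushFn = flushFn;       // called with the coalesced chunk
        this.maxChars = maxChars;     // flush when this many chars accumulate
        this.intervalMs = intervalMs; // ...or after this many ms of buffering
        this.pending = '';
        this.timer = null;
    }

    push(token) {
        this.pending += token;
        if (this.pending.length >= this.maxChars) {
            this.flush();
        } else if (this.timer === null) {
            this.timer = setTimeout(() => this.flush(), this.intervalMs);
        }
    }

    flush() {
        if (this.timer !== null) {
            clearTimeout(this.timer);
            this.timer = null;
        }
        if (this.pending === '') return;
        this.flushFn(this.pending);
        this.pending = '';
    }
}
```

In the Echo listener above you would call buffer.push(token) instead of appending to outputElement.textContent on every event, and buffer.flush() when the done flag arrives.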

The Decision Matrix

There is no single right answer here — but there are wrong ones. This matrix is opinionated by design.

| Scenario | Recommended Transport | Why |
| --- | --- | --- |
| Internal tool, summariser, one-shot prompt | Livewire + Polling | Simplest implementation, easiest debugging, acceptable latency |
| Chat interface with progressive text reveal | SSE | Direct token delivery, no infrastructure overhead |
| Writing assistant, document generation | SSE | Unidirectional, long-running, no interruption needed |
| Collaborative sessions, multiple users | WebSockets + Reverb | Bidirectional, multi-subscriber channels |
| User can interrupt mid-response | WebSockets + Reverb | Client needs to speak while server is still streaming |
| Tool-calling agent with client triggers | WebSockets + Reverb | Client messages arrive unpredictably during model execution |
| Mobile app or native client | SSE | Simpler protocol, fewer connection management concerns |

The honest meta-point: most AI interfaces only look conversational. At the transport layer, they follow a strict speak-then-listen sequence. If your user sends one message and waits for the model to finish before sending another, you do not need WebSockets. You need SSE and the operational simplicity that comes with it.

Production Considerations: Timeouts, Buffering, and Rate Limiting

All three transports fail in the same environment: long-running requests hitting infrastructure that assumes short ones.

PHP-FPM timeout: The default request_terminate_timeout in PHP-FPM is 0 (no timeout) on some distributions and 60s on others. An AI response that streams for 45 seconds will be killed mid-stream on any server where max_execution_time is set to 30. Set max_execution_time = 0 for the FPM pool that serves your SSE endpoints, or call set_time_limit(0) at the top of the streaming controller.
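As a sketch of the relevant settings, assuming a Debian-style layout (the pool file path varies by distribution and PHP version, and scoping these to a dedicated pool for streaming endpoints is safer than changing them globally):

```ini
; /etc/php/8.3/fpm/pool.d/www.conf  (path is distro-specific)
; 0 disables the hard kill for long-running streaming requests
request_terminate_timeout = 0

; php.ini for the FPM SAPI, or call set_time_limit(0) in the controller
max_execution_time = 0
```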

Queue worker timeout: For the Livewire and WebSocket paths that run through queue jobs, the $timeout property on the job class must exceed your longest expected model response. Set it to 120 as a floor, and configure the matching worker timeout in config/horizon.php or on the php artisan queue:work command.
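On the command line that pairing looks like the following; with Horizon, set the equivalent timeout key on the supervisor in config/horizon.php:

```shell
# Worker timeout should match or exceed the job-level $timeout
# (120s in the jobs above)
php artisan queue:work --timeout=120
```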

Redis for partial state: When you are writing partial responses to the database in the Livewire polling path, consider writing to Redis for the streaming duration and persisting to the database only on completion. This removes the per-poll database read entirely and scales horizontally with far less friction. Use Cache::store('redis')->put("stream:{$conversationId}", $buffer, now()->addMinutes(5)) in the job and read the same key in the Livewire poll method.

Whichever transport you land on, token accounting needs to run at the middleware layer before a single prompt reaches your AI service — the Laravel AI Middleware: Token Tracking & Rate Limiting guide covers per-user tiered limits and cost logging in exactly the production pattern this architecture builds toward.

OpenAI Compatibility

All three implementations above use the Anthropic SDK. If you are targeting OpenAI with gpt-4o, the streaming loop changes slightly:

use OpenAI\Laravel\Facades\OpenAI;

$stream = OpenAI::chat()->createStreamed([
    'model'    => 'gpt-4o',
    'messages' => [
        ['role' => 'user', 'content' => $prompt],
    ],
]);

foreach ($stream as $response) {
    if (connection_aborted()) {
        break;
    }

    $token = $response->choices[0]->delta->content ?? '';

    if ($token !== '') {
        $this->sendEvent('token', ['text' => $token]);
    }

    if ($response->choices[0]->finishReason !== null) {
        $this->sendEvent('done', ['finished' => true]);
        break;
    }
}

The transport layer — SSE headers, WebSocket events, Livewire polling — is identical. Only the SDK streaming API surface differs.

Designing for Flow, Not Features

The strategic question is not which transport is most capable. It is which one fades into the background and lets users focus on what the AI helps them do.

Choose Livewire when state management is the hard problem and streaming latency is acceptable. Choose SSE when token-by-token delivery is the feature. Choose WebSockets when both sides of the conversation need to speak freely and unpredictably.

The engineering trap is optimising for impressiveness rather than fit. Streaming text users cannot meaningfully interact with is spectacle. Delaying output until everything is ready makes intelligent systems feel distant. Bidirectional infrastructure for a unidirectional product is maintenance debt dressed up as ambition.

Most production AI interfaces run on leaner foundations than you expect. Build to the actual communication pattern first. You can always add complexity — you rarely get to remove it cleanly.

[Word to the Wise] The architects who consistently ship stable AI features are not the ones who chose the most sophisticated transport. They are the ones who matched transport complexity to interface complexity and kept the other layers clean. Pick your transport, lock your model versions, build your monitoring, and get the feature in front of users. The engineering is rarely the bottleneck. The feedback loop is.
