
Laravel AI Streaming UX: Typing Indicators, Thought States, and Stream Cancellation

The Gap Between “Working” and “Good”

A streaming AI response that technically works can still feel broken. You know the gap if you have shipped an AI feature in Laravel recently. The backend streams tokens correctly, the tests pass, and then a non-technical stakeholder opens the interface and asks why it looks frozen before anything appears.

Laravel AI streaming UX is not a backend problem. The transport choice is settled territory. Our Module 3 overview of real-time AI delivery patterns covers the available options in context. Choosing the right streaming transport for your architecture is step one. This article addresses step two: what the user actually experiences once transport is working.

Three states drive the entire user experience of an AI streaming interface.

  • Pre-stream (thinking): the silence between user submission and the first token
  • Mid-stream (generating): progressive token rendering without visual degradation
  • Post-stream (complete or cancelled): clean resolution with an accurate final state

Users do not experience your transport layer. They experience the three seconds of silence before the first token arrives, the wall of text that appears with no visual rhythm, and the complete absence of a way to stop a runaway generation. These are solvable problems with specific implementations in the Alpine.js and Prism PHP stack. We will address each one.

None of these problems are exotic. They appear in every AI interface that ships without deliberate UX work on top of the transport layer. The good news is that each has a clean, implementable fix that does not require changes to your backend architecture.

Pre-Stream: Handling the Silence Before the First Token

The gap between a user submitting a request and the first token arriving is where most AI interfaces lose users. On fast, low-complexity completions it is imperceptible. On complex reasoning tasks, cold API starts, or heavily loaded endpoints, it runs two to five seconds. That silence reads as broken, even when the backend is working exactly as intended.

The solution is straightforward: emit a named SSE event immediately when the request arrives, before the Prism call begins. The frontend transitions to a “thinking” state on receipt of that event, not on the first token. The pre-stream silence becomes covered time rather than dead time.
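To make the wire format concrete, here is a minimal sketch of how the frames in this approach are interpreted on the client. parseSseFrames is a hypothetical helper written only for illustration (the browser's EventSource does this parsing internally), and it handles just the event: and data: fields.

```javascript
// Hypothetical helper for illustration: splits raw SSE text into events.
// Handles only the `event:` and `data:` fields; real EventSource parsing
// also covers `id:`, `retry:`, comments, and CRLF line endings.
function parseSseFrames(raw) {
    return raw
        .split('\n\n')                           // a blank line terminates each event
        .filter((frame) => frame.trim() !== '')
        .map((frame) => {
            let type = 'message';                // default type when no `event:` field is sent
            const dataLines = [];
            for (const line of frame.split('\n')) {
                if (line.startsWith('event:')) {
                    type = line.slice(6).trim();
                } else if (line.startsWith('data:')) {
                    dataLines.push(line.slice(5).replace(/^ /, ''));
                }
            }
            return { type, data: dataLines.join('\n') };
        });
}

// The sequence a client would see for a two-token response.
const wire =
    'event: thinking\ndata: {}\n\n' +
    'data: {"text":"Hel"}\n\n' +
    'data: {"text":"lo"}\n\n' +
    'data: [DONE]\n\n';

const events = parseSseFrames(wire);
```

The first event carries the type thinking, so it reaches a listener registered with addEventListener('thinking', ...). The unnamed frames default to the message type and arrive through onmessage.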

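UI State ↔ SSE Stream Lifecycle

[Figure: state diagram. IDLE (input active, submit button ready) moves to THINKING (typing indicator) on submit, when the server sends event: thinking. The first token moves THINKING to STREAMING (output plus Stop button, data chunks). data: [DONE] moves STREAMING to COMPLETE (full render, safe to call source.close()). abort() moves STREAMING to CANCELLED (partial render, stream closed).]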

Mapping Wire Payloads to Component State Mutations

A streaming component cannot rely on visual guesswork. We must explicitly bind incoming Server-Sent Event payloads to a deterministic frontend state machine to prevent layout flashes under high network latency. Each lifecycle phase maps to a specific set of view properties:

  • IDLE (input view active): form field interactive, submission button enabled.
  • THINKING (typing indicator): input frozen, shimmer animation active.
  • STREAMING (token delivery): text chunks micro-buffered, cancel button rendered.
  • COMPLETE (stream closed): markdown finalised, schema validation passed.
  • CANCELLED (partial render): stream stopped at the user's explicit request, partial output kept.
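The phase-to-state mapping described above can be written down as a small transition table. This is a sketch only: the state names come from the lifecycle in this article, while the event names (submit, token, done, cancel, error) and the nextState function are assumptions chosen for the example.

```javascript
// Illustrative transition table. States mirror the article's lifecycle;
// the event names and this reducer are assumptions, not library code.
const TRANSITIONS = {
    idle:      { submit: 'thinking' },
    thinking:  { thinking: 'thinking', token: 'streaming', done: 'complete', cancel: 'cancelled', error: 'idle' },
    streaming: { token: 'streaming', done: 'complete', cancel: 'cancelled', error: 'idle' },
    complete:  { submit: 'thinking' },
    cancelled: { submit: 'thinking' },
};

// Returns the next state, or the current state when the event is not
// valid in that state (for example, a late token after cancellation).
function nextState(state, event) {
    const allowed = TRANSITIONS[state] || {};
    return allowed[event] || state;
}

// Walk a normal request through the machine, recording every state visited.
const trace = ['submit', 'thinking', 'token', 'token', 'done'].reduce(
    (states, event) => [...states, nextState(states[states.length - 1], event)],
    ['idle'],
);
```

Because invalid events fall through to the current state, a token that arrives after cancellation cannot pull the UI back into streaming. The template then only ever needs to read one value.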

Pattern A: Animated Typing Indicator with Alpine.js

The SSE route emits event: thinking immediately, before the Prism call begins. Both steps run inside the same response()->stream() callback, so no second request is needed:

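use Illuminate\Http\Request;
use Illuminate\Support\Facades\Route;
use Prism\Prism\Facades\Prism;

Route::get('/chat/stream', function (Request $request) {
    return response()->stream(function () use ($request) {
        // Write one SSE frame and push it to the client immediately.
        // ob_flush() raises a notice when no output buffer is active, so guard it.
        $send = function (string $frame): void {
            echo $frame;
            if (ob_get_level() > 0) {
                ob_flush();
            }
            flush();
        };

        // Named event: lets the UI show the typing indicator before any token exists.
        $send("event: thinking\ndata: {}\n\n");

        $stream = Prism::text()
            ->using('anthropic', 'claude-sonnet-4-6')
            ->withMaxTokens(2048)
            ->withPrompt($request->string('message')->limit(2000)->value())
            ->asStream();

        foreach ($stream as $chunk) {
            // Stop consuming provider tokens once the client has gone away.
            if (connection_aborted()) {
                break;
            }
            // The chunk property name depends on your Prism version; check its streaming docs.
            $send("data: " . json_encode(['text' => $chunk->text]) . "\n\n");
        }

        if (!connection_aborted()) {
            // Sentinel: tells the client the stream ended cleanly.
            $send("data: [DONE]\n\n");
        }
    }, 200, [
        'Content-Type'      => 'text/event-stream',
        'Cache-Control'     => 'no-cache',
        'X-Accel-Buffering' => 'no',
    ]);
})->middleware('auth:sanctum');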

The Alpine.js component manages state as an explicit string enum rather than a boolean flag. This matters once you add cancellation. An EventSource fires onerror when the connection drops or the server ends the response, and the browser then tries to reconnect unless the handler closes the source. Calling close() from your own code does not fire onerror. A string state lets onerror tell a genuine failure apart from a stream the user has already cancelled, and respond differently. A boolean isLoading cannot make that distinction. All reactive properties are declared in the data object so Alpine tracks them from initialisation:

function chatStream() {
    return {
        output:   '',
        message:  '',
        state:    'idle', // idle | thinking | streaming | complete | cancelled
        thoughts: [],
        buffer:   '',
        rafId:    null,
        source:   null,

        startStream() {
            this.state    = 'thinking';
            this.output   = '';
            this.thoughts = [];
            this.buffer   = '';

            if (this.rafId) {
                cancelAnimationFrame(this.rafId);
                this.rafId = null;
            }
            if (this.source) this.source.close();

            this.source = new EventSource(
                `/chat/stream?message=${encodeURIComponent(this.message)}`
            );

            this.source.addEventListener('thinking', () => {
                this.state = 'thinking';
            });

            this.source.addEventListener('tool_call', (event) => {
                const data = JSON.parse(event.data);
                this.thoughts.push(data.status);
            });

            const flush = () => {
                if (this.buffer) {
                    this.output += this.buffer;
                    this.buffer  = '';
                }
                if (this.state === 'streaming') {
                    this.rafId = requestAnimationFrame(flush);
                }
            };

            this.source.onmessage = (event) => {
                if (event.data === '[DONE]') {
                    this.state = 'complete';
                    if (this.rafId) {
                        cancelAnimationFrame(this.rafId);
                        this.output += this.buffer;
                        this.buffer  = '';
                        this.rafId   = null;
                    }
                    this.source.close();
                    return;
                }
                try {
                    const parsed = JSON.parse(event.data);
                    if (this.state !== 'cancelled') {
                        this.state   = 'streaming';
                        this.buffer += parsed.text ?? '';
                        if (!this.rafId) {
                            this.rafId = requestAnimationFrame(flush);
                        }
                    }
                } catch (e) {
                    console.error('SSE parse error:', e);
                }
            };

            this.source.onerror = () => {
                if (this.state !== 'cancelled') {
                    this.state = 'idle';
                }
                this.source.close();
            };
        },

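        cancel() {
            // Record the intent first so any handler that runs sees 'cancelled'.
            this.state = 'cancelled';
            if (this.source) this.source.close();
            if (this.rafId) {
                cancelAnimationFrame(this.rafId);
                this.rafId = null;
            }
            // Keep whatever was buffered so the partial response stays visible.
            this.output += this.buffer;
            this.buffer  = '';
        }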
    };
}

The state-driven template wires directly to the component:

<div x-data="chatStream()">
    <div x-show="state === 'idle' || state === 'complete' || state === 'cancelled'">
        <input x-model="message" type="text" placeholder="Ask something…">
        <button @click="startStream()">Send</button>
    </div>

    <div x-show="state === 'thinking'" class="typing-indicator">
        <span></span><span></span><span></span>
    </div>

    <div x-show="thoughts.length > 0" class="thought-states">
        <template x-for="(thought, index) in thoughts" :key="index">
            <p x-text="thought" class="thought-item"></p>
        </template>
    </div>

    <div
        x-show="state === 'streaming' || state === 'complete' || state === 'cancelled'"
        x-text="output"
        class="output-area">
    </div>

    <button x-show="state === 'streaming'" @click="cancel()">Stop</button>
</div>

[Architect’s Note] These examples use Prism PHP rather than the first-party laravel/ai SDK. Prism’s streaming API is more mature for agent workflows at the time of writing and offers broader provider coverage. The laravel/ai SDK is the long-term production direction for Laravel teams. Evaluate it for new projects before defaulting to Prism, particularly if your provider set is limited to OpenAI or Anthropic.

Pattern B: Skeleton Loader with Livewire

When the AI response renders into a structured layout such as a card, data table, or generated form preview, a skeleton placeholder is more appropriate than a typing indicator. A typing indicator implies the response is textual and conversational. A skeleton implies structure is incoming. Livewire’s wire:loading directive handles this without additional JavaScript:

<div wire:loading wire:target="generate">
    <div class="skeleton-card">
        <div class="skeleton-line w-3/4"></div>
        <div class="skeleton-line w-full"></div>
        <div class="skeleton-line w-5/6"></div>
    </div>
</div>

<div wire:loading.remove wire:target="generate">
    {{ $response }}
</div>

The Livewire and Claude API real-time chat guide covers the full Livewire component lifecycle for streaming responses, including wire:poll fallbacks for environments that do not support persistent SSE connections.

Mid-Stream: Rendering Tokens Without Browser Lag

Naive token appending (appending directly to the output on every SSE message) causes layout thrashing on long responses. The browser reflows the DOM on every mutation. By token 500 on a complex answer, the interface becomes visibly sluggish on mid-range hardware, and the degradation compounds as the response grows.

The batched DOM update approach using requestAnimationFrame is already integrated into the chatStream() component above. Incoming tokens accumulate in this.buffer, and the flush function drains that buffer into this.output on each animation frame rather than on each SSE event. This caps DOM mutations at the display refresh rate (typically 60 per second) regardless of how fast SSE messages arrive.

[Production Pitfall] Teams running high-token responses (2,000-plus tokens) on mobile hardware observe the degradation before desktop users do. On a fast network with a capable model, raw token appending can attempt several hundred DOM mutations per second. The frame-rate cap applies uniformly: whether a response arrives at 30 tokens per second or 300, the output area updates at the browser’s natural paint cycle.
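The batching idea can be isolated from Alpine to see that many tokens collapse into one write per frame. The sketch below is illustrative: createTokenBatcher, schedule, and write are hypothetical names, with schedule standing in for requestAnimationFrame and write standing in for the DOM update. Unlike the component's loop, it schedules a frame only when tokens are pending.

```javascript
// Sketch of frame-batched rendering with an injectable scheduler.
// `schedule` stands in for requestAnimationFrame; `write` stands in for
// the DOM update. Both names are assumptions for this example.
function createTokenBatcher(schedule, write) {
    let buffer = '';
    let scheduled = false;

    const flush = () => {
        scheduled = false;
        if (buffer !== '') {
            write(buffer);       // one write per frame, however many tokens arrived
            buffer = '';
        }
    };

    return {
        push(token) {
            buffer += token;
            if (!scheduled) {
                scheduled = true;
                schedule(flush);
            }
        },
        flushNow: flush,         // drain synchronously on [DONE] or cancel
    };
}

// Simulate one frame: queue callbacks, then run them all at once.
const frameQueue = [];
const writes = [];
const batcher = createTokenBatcher(
    (cb) => frameQueue.push(cb),
    (text) => writes.push(text),
);

['Hel', 'lo', ', ', 'world'].forEach((token) => batcher.push(token));
const queuedBeforeFrame = frameQueue.length;
frameQueue.splice(0).forEach((cb) => cb());   // the "frame" fires
```

In a browser you would pass (cb) => requestAnimationFrame(cb) as the scheduler and a function that appends to the output element as the writer. Four tokens arriving within one frame produce a single write.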

A second optimisation worth the dependency cost: if your AI output is markdown, incremental rendering during the stream gives a substantially better reading experience than watching raw markdown syntax accumulate. Use a lightweight client-side parser such as marked.js. The tradeoff is real: partial markdown produces malformed HTML mid-stream, so rendering must happen on a debounced interval rather than on every token. A 150ms debounce prevents half-rendered code blocks from reaching the user while keeping the output visually progressive. Parse against the full accumulated string on each debounce tick, not against each individual token.
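A minimal sketch of interval-based rendering follows, under stated assumptions: createThrottledRenderer is a hypothetical helper, render stands in for a parser call such as marked.parse(text), and the timer is injectable so the logic can be exercised without waiting. Note that it is a trailing throttle rather than a strict debounce, because a strict debounce would not fire until tokens paused for 150ms.

```javascript
// Sketch: re-render the full accumulated markdown at most once per interval.
// `render` stands in for a parser call such as marked.parse(text); `setTimer`
// defaults to setTimeout and is injectable so the logic can be tested.
function createThrottledRenderer(render, onHtml, intervalMs = 150, setTimer = setTimeout) {
    let text = '';
    let pending = false;

    const renderNow = () => {
        pending = false;
        onHtml(render(text));    // always parse the whole string, never a single token
    };

    return {
        append(token) {
            text += token;
            if (!pending) {
                pending = true;
                setTimer(renderNow, intervalMs);
            }
        },
        finish: renderNow,       // final render when the stream completes
    };
}

// Demo with a fake timer and a trivial stand-in "parser".
const timers = [];
const rendered = [];
const renderer = createThrottledRenderer(
    (md) => `<p>${md}</p>`,
    (html) => rendered.push(html),
    150,
    (cb) => timers.push(cb),
);

renderer.append('**bo');
renderer.append('ld**');
timers.splice(0).forEach((cb) => cb());   // the interval elapses
```

Both tokens land before the timer fires, so the parser only ever sees the complete **bold** string and the half-open emphasis never reaches the page. Call finish() on [DONE] so the last few tokens are not left waiting for a timer.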

Agentic Thought States

Multi-step agentic workflows carry a specific UX problem that single-turn streaming does not. The model may invoke two or three tools before producing a visible response. From the user’s perspective this is indistinguishable from an unusually long pre-stream silence. The processing is invisible. Users have no signal that anything is happening, no indication of what the system is doing, and no basis for deciding whether to wait or cancel.

The correct approach surfaces each tool invocation as it happens. Each Prism tool callback emits a named SSE event at the start of its execution. The frontend listens for tool_call events and renders them as a transient thought state log, positioned above the main response area and kept visually subordinate:

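use Prism\Prism\Facades\Tool;

$tools = [
    Tool::as('search_knowledge_base')
        ->for('Search the internal knowledge base for relevant articles')
        ->withStringParameter('query', 'The search query')
        ->using(function (string $query): string {
            // Surface the tool invocation to the UI before doing the work.
            echo "event: tool_call\n";
            echo "data: " . json_encode(['status' => "Searching: {$query}"]) . "\n\n";
            if (ob_get_level() > 0) {
                ob_flush(); // ob_flush() raises a notice when no buffer is active
            }
            flush();

            // KnowledgeBase is a placeholder for your own search model or service.
            // If it is a Laravel Scout model, call ->get() before ->toJson().
            return KnowledgeBase::search($query)->take(5)->toJson();
        }),
];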

The tool_call event listener and thoughts array are already part of the chatStream() component. The template section above renders the log. Keep the thought state items small in font size, muted in colour, and separated from the primary output area by visual weight rather than a divider. They are process indicators, not content. Users should be able to read the main response without the thought log competing for attention.

[Production Pitfall] Thought state emission via SSE only works when the tool callback executes within the same PHP output buffer as the streaming response. This is the case for synchronous SSE streams running in a single HTTP request. For agents dispatched as queued jobs to Laravel Horizon, the tool callback runs in a worker process with no HTTP output buffer. In that architecture, broadcast thought states over WebSockets via Reverb instead. The Reverb token-by-token delivery guide covers the broadcast channel approach for long-running queue processes.

For the full Prism tool registration and callback patterns that power these events, Building Agentic Laravel Apps with Prism PHP is the reference implementation.

Stream Cancellation: The Backend Half Nobody Implements

Cancellation has two halves. The frontend half is five lines of JavaScript and most implementations include it. The backend half is where implementations fall apart, and the consequences at scale are not theoretical.

Frontend Cancellation

Close the EventSource and update the state. The cancel() method is already defined in the chatStream() component. The critical detail is the state machine. onerror fires when the connection drops or the server ends the response without the [DONE] sentinel; calling source.close() from your own code does not fire it. The guard in onerror is therefore defensive: because cancel() records the cancelled state and onerror checks for it, no code path can overwrite a deliberate cancellation with idle. Without an explicit cancelled state, the UI has no way to distinguish a user's stop from a network failure, and a cancelled response can snap back to idle unexpectedly.

Backend Cancellation for Direct Streams

When an SSE connection closes, PHP does not terminate the running process straight away. It only detects the disconnect the next time it attempts to send output, so a Prism call that is waiting on the provider or working through tool calls keeps executing on the server, consuming tokens and provider quota, even though no client is receiving output. For short completions this is wasteful. For agentic loops with multiple tool calls, it can be genuinely costly.

[Production Pitfall] On a multi-tenant application under moderate load, unguarded streams accumulate. Each orphaned process holds an open HTTP connection to the LLM provider. Some providers factor concurrent connections into their rate limits, not just requests per minute. The first visible symptom is intermittent 429 errors under load that disappear when traffic drops, making them difficult to attribute to the real cause.

The connection_aborted() guard in the foreach loop shown in the Pre-Stream section above is the correct pattern. Check it on every iteration and break when the client has disconnected. Do not omit the [DONE] sentinel emission on clean completions: the frontend needs it to transition to complete rather than waiting for an onerror event to fire on connection close. The production SSE guide covering reconnects and timeouts addresses additional lifecycle concerns for long-lived SSE connections, including keepalive intervals and multi-tenant stream isolation.

[Edge Case Alert] connection_aborted() relies on PHP detecting that the client has closed the TCP connection. Under some Nginx configurations, the upstream connection to PHP-FPM stays open after the browser disconnects, meaning connection_aborted() never returns true. The fastcgi_ignore_client_abort Nginx directive must be set to off (the default) for PHP to detect disconnects correctly. Verify this in your stack before relying on connection_aborted() as a cancellation mechanism. A quick test: open an SSE stream, close the browser tab, and check whether the PHP process terminates within a few seconds.

Backend Cancellation for Queued Agentic Jobs

For long-running agentic jobs dispatched to Laravel Horizon, connection_aborted() does not apply. The queue worker process runs independently of the HTTP connection. Cancellation requires a Redis flag the job checks at the start of each step.

The cancel endpoint sets the flag scoped to both the user ID and the job ID:

use Illuminate\Http\Request;
use Illuminate\Support\Facades\Cache;

Route::post('/chat/{jobId}/cancel', function (string $jobId, Request $request) {
    $userId   = $request->user()->id;
    $cacheKey = "stream_cancel:{$userId}:{$jobId}";

    Cache::put($cacheKey, true, now()->addMinutes(5));

    return response()->json(['cancelled' => true]);
})->middleware('auth:sanctum');

The job checks the flag before each step and cleans up on exit:

<?php

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Support\Facades\Cache;

class AgentStreamJob implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable;

    public function __construct(
        private readonly string $jobId,
        private readonly int    $userId,
        private readonly array  $agentSteps,
    ) {}

    public function handle(): void
    {
        $cacheKey = "stream_cancel:{$this->userId}:{$this->jobId}";

        foreach ($this->agentSteps as $step) {
            if (Cache::get($cacheKey)) {
                Cache::forget($cacheKey);
                return;
            }
            $this->executeStep($step);
        }
    }

    private function executeStep(mixed $step): void
    {
        // step execution logic
    }
}

[Architect’s Note] Scope the Redis key to both the user ID and the job ID, as shown. A key scoped only to job ID can be set by another user if job IDs are sequential integers or otherwise guessable. Because the endpoint builds the key from the authenticated user's ID, a flag set by one user never matches another user's job. If you key on job ID alone, the cancel endpoint must verify that $jobId belongs to the requesting user before setting the flag. Omitting that check allows any authenticated user to cancel any other user’s running job.
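On the client, cancelling a queued job means calling the cancel endpoint rather than closing an EventSource. The sketch below shows one way to build that request. buildCancelRequest and cancelJob are hypothetical names, and the X-CSRF-TOKEN header is an assumption that applies to cookie-based Sanctum sessions; token-based clients would send an Authorization header instead.

```javascript
// Sketch of the client call for the cancel endpoint shown above.
// buildCancelRequest and cancelJob are hypothetical names; the CSRF header
// is an assumption that applies to cookie-based Sanctum sessions.
function buildCancelRequest(jobId, csrfToken) {
    return {
        url: `/chat/${encodeURIComponent(jobId)}/cancel`,
        options: {
            method: 'POST',
            credentials: 'same-origin',
            headers: {
                'Accept': 'application/json',
                'X-CSRF-TOKEN': csrfToken,
            },
        },
    };
}

// Sends the request and resolves to true when the server confirms the flag.
// fetchImpl is injectable so the function can be tested without a network.
async function cancelJob(jobId, csrfToken, fetchImpl = fetch) {
    const { url, options } = buildCancelRequest(jobId, csrfToken);
    const response = await fetchImpl(url, options);
    if (!response.ok) {
        throw new Error(`Cancel failed with HTTP ${response.status}`);
    }
    const body = await response.json();
    return body.cancelled === true;
}

const request = buildCancelRequest('job 42', 'token-abc');
```

Set the component state to cancelled before awaiting cancelJob() so the UI responds immediately. The job itself stops only when it next checks the flag, at the start of its following step.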

CSS: Typing Indicator and Thought State Styles

Self-contained CSS for both components. No external dependencies. Designed for dark backgrounds; adjust the --color-text-secondary custom property for light themes.

.typing-indicator {
    display: flex;
    gap: 4px;
    padding: 12px 0;
}

.typing-indicator span {
    width: 6px;
    height: 6px;
    border-radius: 50%;
    background: currentColor;
    opacity: 0.4;
    animation: pulse 1.2s infinite;
}

.typing-indicator span:nth-child(2) { animation-delay: 0.2s; }
.typing-indicator span:nth-child(3) { animation-delay: 0.4s; }

@keyframes pulse {
    0%, 80%, 100% { opacity: 0.4; transform: scale(1); }
    40%           { opacity: 1;   transform: scale(1.2); }
}

.thought-states {
    margin: 8px 0;
}

.thought-item {
    font-size: 12px;
    color: var(--color-text-secondary, #888);
    margin: 2px 0;
    padding-left: 8px;
    border-left: 2px solid currentColor;
    opacity: 0.7;
}

The border-left: 2px solid currentColor on .thought-item inherits the text colour, so the accent bar updates automatically if you override --color-text-secondary in a specific context.

Streaming UX as a Product Decision

The three problems this article addresses (pre-stream silence, mid-stream lag, and orphaned backend processes) share a common root: they are invisible during development and obvious in production. On localhost with a fast API response and no concurrent users, none of them surface. Under real conditions, all three do.

That is worth naming explicitly because it changes how you prioritise the work. Typing indicators and thought states are not polish. They are the difference between a feature that users trust and one they abandon after two slow responses. The connection_aborted() guard and the Redis cancellation flag are not edge case handling. They are the mechanism that prevents token spend from silently accumulating on requests nobody is waiting for.

The state machine approach matters here too. Representing stream state as a string enum rather than a collection of boolean flags is a small architectural decision with compounding benefits. Adding a new state (a rate_limited retry state, for example, or a partial state for interrupted long-context responses) requires one new string value and one new handler. With boolean flags, the same addition requires coordinating multiple flags that can contradict each other. Keep the state machine explicit and let the UI derive entirely from it.

[Word to the Wise] The UX patterns in this article are transport-agnostic in principle but SSE-specific in implementation. If your architecture moves to WebSockets via Reverb, the state machine transfers cleanly. The EventSource calls become channel subscriptions, the onerror handler becomes a disconnect listener, and the connection_aborted() loop becomes a Redis flag check. The concepts travel; the APIs do not. Design the state machine first and treat the transport as a detail beneath it.
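One way to keep the transport beneath the state machine is to hide it behind a tiny interface. The sketch below is an assumption-laden illustration, not an Echo or EventSource API: createStreamController and the open(handlers) contract are hypothetical, and a real SSE or WebSocket adapter would be written to match that shape.

```javascript
// Sketch: the state machine depends only on a small transport interface.
// `open(handlers)` must return a function that closes the connection.
// The interface and names here are assumptions for illustration.
function createStreamController(transport) {
    const controller = {
        state: 'idle',
        output: '',
        close: null,
        start() {
            controller.state = 'thinking';
            controller.output = '';
            controller.close = transport.open({
                onToken(text) {
                    if (controller.state === 'cancelled') return;
                    controller.state = 'streaming';
                    controller.output += text;
                },
                onDone() {
                    if (controller.state !== 'cancelled') controller.state = 'complete';
                },
                onError() {
                    if (controller.state !== 'cancelled') controller.state = 'idle';
                },
            });
        },
        cancel() {
            controller.state = 'cancelled';
            if (controller.close) controller.close();
        },
    };
    return controller;
}

// A fake transport; an SSE or WebSocket adapter would expose the same shape.
let handlers = null;
let closed = false;
const fakeTransport = {
    open(h) {
        handlers = h;
        return () => { closed = true; };
    },
};

const stream = createStreamController(fakeTransport);
stream.start();
handlers.onToken('Hi');
stream.cancel();
handlers.onToken(' ignored');   // late token after cancellation is dropped
```

Swapping SSE for a WebSocket channel then means writing a new open() adapter. The states, guards, and template bindings stay as they are.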

One thing this article does not cover: what happens after the stream completes. Persisting the response, associating it with a conversation thread, and making it available for follow-up requests are handled at the application layer, not the streaming layer. That is the right separation. The streaming interface has one job: deliver the response state to the user accurately and without waste. Everything above that is the application’s concern.


Frequently Asked Questions

Does connection_aborted() work reliably across all Laravel hosting configurations?

Not universally. The function depends on PHP detecting a TCP disconnect from the client. With Nginx as a reverse proxy and PHP-FPM, the fastcgi_ignore_client_abort directive must be off (the default) for the disconnect to propagate correctly. On some managed hosting environments or when using Laravel Octane with Swoole or RoadRunner, the connection lifecycle behaves differently. Test explicitly in your stack before treating connection_aborted() as a reliable cancellation signal.

Can I apply this typing indicator pattern with Livewire instead of Alpine.js?

Yes, with a different approach. Livewire does not natively manage an EventSource, so the SSE lifecycle stays in Alpine.js or vanilla JavaScript and you dispatch Livewire events from the JS layer when state changes. Alternatively, use wire:loading for the pre-stream indicator and pair it with a Livewire streaming action. The Livewire and Claude API integration guide demonstrates the full component setup.

What happens to streaming state if the user navigates away mid-stream?

On navigation, the browser closes the EventSource connection. Depending on the browser, the onerror handler may fire during unload; if it does and state is not already cancelled, the handler sets it to idle. The PHP process continues until connection_aborted() detects the disconnect (subject to the Nginx caveat above). For queued jobs, the process continues regardless until the Redis cancellation flag check runs on the next step, which is why the flag expiry window (five minutes in the example above) must be wide enough to cover the maximum step duration.

Does the requestAnimationFrame batching approach work in all target browsers?

requestAnimationFrame is supported in all modern browsers and has been since IE10. For very old Safari versions (pre-2014), a setTimeout fallback at 16ms achieves the same 60fps cap. In practice, if you are shipping a Laravel AI interface, the browsers that support your stack all support requestAnimationFrame without a polyfill.

How should I handle streaming cancellation when using Reverb WebSockets instead of SSE?

The frontend pattern is similar: call channel.unbind() and pusher.unsubscribe() instead of source.close(), then update the state machine. The backend pattern depends on whether the response is generated synchronously inside the broadcasting job or via a queue. In both cases connection_aborted() is not applicable, because no HTTP response is being streamed; poll a Redis flag instead, with the same user-scoped key structure shown for queued agentic jobs above.

Dewald Hugo

A software architect with 15+ years of experience in the PHP and Laravel ecosystem. Dewald created Origin Main to provide the engineering rigour required to integrate AI into professional, high-concurrency production systems. He writes for developers who care less about "getting it to work" and more about "getting it to last".
