Stack: Laravel 11/12, PHP 8.2+, PostgreSQL 16+ with pgvector, Prism PHP

There’s no official Laravel AI SDK. What exists is Prism PHP, a community-built package that gives you a unified interface to OpenAI, Anthropic, Gemini, and Ollama, with tool calling, streaming, and structured output built in. It’s the closest thing to a Laravel-native AI layer we have right now, and it’s production-worthy.

This guide covers installing and configuring Prism PHP with multiple providers, building tool-calling agents with real error handling, implementing RAG with pgvector, and streaming responses over SSE. Code throughout targets Laravel 11/12 syntax with bootstrap/app.php configuration.

Installation

composer require prism-php/prism

php artisan vendor:publish --tag=prism-config

That publishes config/prism.php, where provider API keys are configured. Better still, keep the keys themselves in .env and reference them from the config file with env(). If you plan to register a shared Prism instance through Laravel’s service container, the singleton binding belongs in a service provider’s register() method, and bootstrap/app.php is where you register that provider:

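// bootstrap/app.php
return Application::configure(basePath: dirname(__DIR__))
    // ...withRouting(), withMiddleware(), withExceptions() as generated
    ->withProviders([
        App\Providers\AiServiceProvider::class,
    ])
    ->create();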

Keep provider resolution out of controllers. Bind once, inject everywhere.

Multi-Provider Configuration

Hardcoding a single AI provider is a production risk. OpenAI has outages. Anthropic rate-limits. You want to swap providers with a single config change, without touching application code. The contract-based abstraction layer that makes this swap clean is covered in the production-grade AI architecture guide. Prism is the implementation; that guide is the architectural reasoning.

OPENAI_API_KEY=sk-proj-...
ANTHROPIC_API_KEY=sk-ant-api03-...
GEMINI_API_KEY=...

Switching is one line:

Prism::text()->using('anthropic', 'claude-sonnet-4-6')
Prism::text()->using('openai', 'gpt-4o')
Prism::text()->using('gemini', 'gemini-2.0-flash')

[Architect’s Note] Build your application code against a provider-agnostic interface from day one. Store the active provider and model name in config, not in your call sites. When you inevitably need to run a cost comparison between claude-sonnet-4-6 and gpt-4o in production, you’ll thank yourself for having a single place to change.
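Prism turns the switch into a one-liner, but it won’t decide *when* to switch. Below is a minimal sketch of the Architect’s Note in practice, written in plain PHP so it runs without a Laravel app. The ordered provider/model list lives in config, and a small class falls back to the next pair whenever a call throws. ProviderChain and the config('ai.chain') key are hypothetical names. Neither is part of Prism or Laravel.

```php
<?php

declare(strict_types=1);

/**
 * Sketch: try provider/model pairs in order until one succeeds.
 * The list would come from config (a hypothetical config('ai.chain') entry),
 * never from call sites.
 */
final class ProviderChain
{
    /** @param list<array{provider: string, model: string}> $chain */
    public function __construct(private readonly array $chain)
    {
        if ($chain === []) {
            throw new InvalidArgumentException('Provider chain must not be empty.');
        }
    }

    /**
     * @param callable(string, string): string $request Makes one call for a provider/model pair.
     * @return array{provider: string, model: string, text: string}
     */
    public function run(callable $request): array
    {
        $errors = [];

        foreach ($this->chain as ['provider' => $provider, 'model' => $model]) {
            try {
                return [
                    'provider' => $provider,
                    'model' => $model,
                    'text' => $request($provider, $model),
                ];
            } catch (Throwable $e) {
                // Remember why this provider failed, then fall through to the next one
                $errors[] = "{$provider}/{$model}: {$e->getMessage()}";
            }
        }

        throw new RuntimeException('All providers failed: ' . implode('; ', $errors));
    }
}
```

Inside Laravel, the closure would wrap the Prism call, for example fn (string $p, string $m) => Prism::text()->using($p, $m)->withPrompt($prompt)->asText()->text. Catching every Throwable is deliberately blunt. In production you would probably fall back only on rate-limit, timeout and server errors, not on a malformed request that will fail on every provider.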

Basic Text Generation

use Prism\Prism\Facades\Prism;

$response = Prism::text()
    ->using('anthropic', 'claude-sonnet-4-6')
    ->withSystemPrompt('You are a luxury fashion consultant with access to our product catalogue.')
    ->withPrompt($userMessage)
    ->asText();

return $response->text;

asText() blocks until the full response is complete, so during long-form generation your HTTP connection (and a PHP worker) stays open the whole time. Use it for short, server-side tasks, and stream anything user-facing. The Real-Time Streaming with SSE section below covers that.

Building a Tool-Calling Agent with Laravel Prism PHP

Agents in Prism are not a special class. They’re text requests that include tools. The model decides when to invoke a tool, Prism executes the closure, feeds the result back into the conversation, and the loop continues until the model produces a final answer or withMaxSteps() is reached. Simple, but powerful.
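To make that loop concrete, here is a conceptual sketch in plain PHP. It is not Prism’s implementation: the $model callable stands in for the LLM, and the reply and message array shapes are hypothetical. It shows the three moving parts. The model picks a tool, the tool’s string result is appended to the conversation, and the step cap ends the loop if the model never settles on an answer.

```php
<?php

declare(strict_types=1);

/**
 * Conceptual tool-calling loop (not Prism's source code).
 *
 * @param callable(array): array $model Returns ['tool' => string, 'args' => array] or ['text' => string].
 * @param array<string, callable> $tools Tool name => handler returning a string result.
 */
function runAgent(callable $model, array $tools, string $prompt, int $maxSteps): string
{
    $messages = [['role' => 'user', 'content' => $prompt]];

    for ($step = 1; $step <= $maxSteps; $step++) {
        $reply = $model($messages);

        if (isset($reply['text'])) {
            return $reply['text']; // The model produced a final answer
        }

        $name = $reply['tool'];

        // String-keyed args unpack as named arguments (PHP 8.1+); unknown tools
        // become an error string the model can read instead of an exception
        $result = isset($tools[$name])
            ? $tools[$name](...$reply['args'])
            : "Error: unknown tool {$name}.";

        // Feed the tool result back so the next step can use it
        $messages[] = ['role' => 'tool', 'name' => $name, 'content' => $result];
    }

    return ''; // Step limit reached before the model gave a final answer
}
```

Note that the cap returns an empty string rather than inventing an answer. That is one reason to treat an empty final response as a real case in your calling code.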

The Stripe Refund Tool

use Prism\Prism\Facades\Tool;
use Stripe\StripeClient;

$refundTool = Tool::as('process_refund')
    ->for('Refund the most recent charge for a customer by their email address.')
    ->withStringParameter('email', 'The customer email address.')
    ->using(function (string $email): string {
        try {
            $stripe = new StripeClient(config('services.stripe.secret'));

            $customer = $stripe->customers->all(['email' => $email, 'limit' => 1])->data[0] ?? null;
            if (!$customer) {
                return 'Error: Customer not found.';
            }

            $charge = $stripe->charges->all(['customer' => $customer->id, 'limit' => 1])->data[0] ?? null;
            if (!$charge) {
                return 'Error: No charges found for this account.';
            }

            $stripe->refunds->create(['charge' => $charge->id]);

            return "Success: Refunded charge {$charge->id} for {$email}.";

        } catch (\Stripe\Exception\ApiErrorException $e) {
            \Log::error('Stripe refund failed', ['email' => $email, 'error' => $e->getMessage()]);
            return 'Error: Payment processor unavailable. Please try again later.';
        }
    });

[Production Pitfall] This tool executes a live financial transaction. As written, it fires without any human confirmation step. Never let an LLM trigger a refund autonomously in production. Wire in a confirmation queue, a webhook callback, or at minimum a human-approval flag before this tool runs against real Stripe credentials.
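One way to add that human-approval step, sketched in plain PHP. The tool the agent sees only files a refund request and tells the model it is pending. The real Stripe call runs later, from an action a person triggers. RefundApprovalQueue is a hypothetical class, and its in-memory array stands in for what would be a database table in a real app.

```php
<?php

declare(strict_types=1);

/** Sketch of a human-approval gate between the agent and the refund API. */
final class RefundApprovalQueue
{
    /** @var array<string, array{email: string, status: string}> */
    private array $requests = [];

    /** Called from the Prism tool closure instead of hitting Stripe directly. */
    public function request(string $email): string
    {
        // 'rr_' prefix keeps IDs as string array keys even if the hex is all digits
        $id = 'rr_' . bin2hex(random_bytes(8));
        $this->requests[$id] = ['email' => $email, 'status' => 'pending'];

        return "Pending: refund request {$id} for {$email} awaits staff approval.";
    }

    /** Called from an admin action after a person has reviewed the request. */
    public function approve(string $id, callable $executeRefund): string
    {
        if (($this->requests[$id]['status'] ?? null) !== 'pending') {
            throw new RuntimeException("No pending refund request {$id}.");
        }

        // Runs the real refund logic; if it throws, the request stays pending
        $result = $executeRefund($this->requests[$id]['email']);
        $this->requests[$id]['status'] = 'approved';

        return $result;
    }

    /** @return list<string> */
    public function pendingIds(): array
    {
        return array_keys(array_filter(
            $this->requests,
            fn (array $r): bool => $r['status'] === 'pending'
        ));
    }
}
```

Wired into the refund tool, the using() closure would return $queue->request($email) instead of calling $stripe->refunds->create(). The admin action would then pass a closure containing the Stripe logic to approve().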

The Twilio SMS Tool

use Prism\Prism\Facades\Tool;
use Twilio\Rest\Client;

$smsTool = Tool::as('send_sms')
    ->for('Send an SMS notification to a phone number.')
    ->withStringParameter('to', 'Recipient phone number in E.164 format.')
    ->withStringParameter('message', 'The message body.')
    ->using(function (string $to, string $message): string {
        try {
            $twilio = new Client(
                config('services.twilio.sid'),
                config('services.twilio.token')
            );

            $twilio->messages->create($to, [
                'from' => config('services.twilio.from'),
                'body' => $message,
            ]);

            return "SMS sent to {$to}.";

        } catch (\Twilio\Exceptions\RestException $e) {
            \Log::error('Twilio SMS failed', ['to' => $to, 'error' => $e->getMessage()]);
            return 'Error: Could not deliver SMS. Notification skipped.';
        }
    });
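The model fills in $to, not your code, so it is worth rejecting malformed numbers before Twilio is called. Here is a minimal sketch, assuming you only accept strict E.164 input: a '+', a non-zero first digit, and at most 15 digits in total. isE164() and the $guardedSend closure are hypothetical names, and the closure only shows where the check would sit.

```php
<?php

declare(strict_types=1);

/**
 * True for a strict E.164 number: '+', a non-zero first digit,
 * and 2 to 15 digits in total. \z (not $) rejects a trailing newline.
 */
function isE164(string $number): bool
{
    return preg_match('/^\+[1-9]\d{1,14}\z/', $number) === 1;
}

// How the Twilio tool's closure might use it (hypothetical wiring)
$guardedSend = function (string $to, string $message): string {
    if (!isE164($to)) {
        // An error string the model can read and correct, like the Stripe tool's
        return 'Error: Phone number must be in E.164 format, e.g. +27821234567.';
    }

    // ...the existing Twilio client call goes here...
    return "SMS sent to {$to}.";
};
```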

Running the Agent

$response = Prism::text()
    ->using('anthropic', 'claude-sonnet-4-6')
    ->withSystemPrompt('You are a customer support agent. You can process refunds and send SMS notifications.')
    ->withTools([$refundTool, $smsTool])
    ->withMaxSteps(5)
    ->withPrompt($userMessage)
    ->asText();

return $response->text;

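withMaxSteps(5) is your safety valve. It caps the agent loop at five steps, where each step is one round-trip to the model, so a confused model can’t keep calling tools indefinitely. If the cap is reached while the model is still calling tools, the loop simply stops, and $response->text may be empty or incomplete. Handle that case rather than returning a blank reply. For most support workflows, five steps is more than enough. Raise it only if you have a verified reason to.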

RAG with pgvector

Standard LLMs hallucinate on data they weren’t trained on. RAG solves this by pulling semantically relevant chunks from your own database and injecting them into the prompt before the model ever sees the question. If you’re new to the underlying mechanics, the article on embeddings and vector databases covers the conceptual foundation before you start wiring up pgvector. For direct Claude API integration without the Prism abstraction layer (raw HTTP, streaming, and token accounting), the Claude API integration in Laravel guide covers that approach in full.

Step 1: Enable the Extension

-- Run once against your database
CREATE EXTENSION IF NOT EXISTS vector;

Step 2: Migration

OpenAI’s text-embedding-3-small outputs 1,536 dimensions. Your vector column must match that exactly. Dimensions are not interchangeable across embedding models, and switching models means re-embedding every row.

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;
use Illuminate\Support\Facades\DB;

return new class extends Migration
{
    public function up(): void
    {
        DB::statement('CREATE EXTENSION IF NOT EXISTS vector');

        Schema::create('products', function (Blueprint $table) {
            $table->id();
            $table->string('name');
            $table->text('description');
            $table->string('category');
            $table->decimal('price', 10, 2);
            $table->timestamps();
        });

        // Blueprint doesn't support vector columns natively — add it raw
        DB::statement('ALTER TABLE products ADD COLUMN embedding vector(1536)');

        // HNSW index for fast approximate nearest-neighbour search
        DB::statement(
            'CREATE INDEX products_embedding_idx ON products USING hnsw (embedding vector_cosine_ops)'
        );
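    }

    public function down(): void
    {
        // Dropping the table also removes the embedding column and its HNSW index
        Schema::dropIfExists('products');
    }
};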

Step 3: Generating and Storing Embeddings

Wrap embedding generation in a dedicated service class, not a global function. This makes it injectable, mockable, and swappable.

// app/Services/EmbeddingService.php

namespace App\Services;

use Illuminate\Support\Facades\Http;
use RuntimeException;

class EmbeddingService
{
    public function generate(string $text): array
    {
        // Assumes config/services.php defines 'openai' => ['api_key' => env('OPENAI_API_KEY')]
        $response = Http::withToken(config('services.openai.api_key'))
            ->timeout(30)
            ->post('https://api.openai.com/v1/embeddings', [
                'model' => 'text-embedding-3-small',
                'input' => $text,
            ]);

        if ($response->failed()) {
            throw new RuntimeException(
                'Embedding generation failed: ' . $response->status()
            );
        }

        $embedding = $response->json('data.0.embedding');

        if (!is_array($embedding) || empty($embedding)) {
            throw new RuntimeException('Invalid embedding response from OpenAI.');
        }

        return $embedding;
    }
}

[Efficiency Gain] Embedding generation costs tokens and adds latency. For stored records, always dispatch it as a queued job, never inline during a web request. Only the user’s search query needs to be embedded at request time. If you’re re-embedding a product after minor edits, store a hash of the source text alongside the embedding and skip the API call when the hash hasn’t changed.
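Here is a minimal sketch of that hash check. It assumes a hypothetical nullable embedding_hash column on products. It hashes exactly the text the job sends to the embedding API, plus the model name, so switching embedding models also forces a refresh. The function names are illustrative.

```php
<?php

declare(strict_types=1);

/** The exact text sent to the embedding model (mirrors the job's string). */
function embeddingSourceText(string $name, string $description, string $category): string
{
    return "{$name} {$description} {$category}";
}

function embeddingHash(string $sourceText, string $model = 'text-embedding-3-small'): string
{
    // Including the model name invalidates every stored hash when the model changes
    return hash('sha256', $model . "\n" . $sourceText);
}

/** Re-embed when nothing is stored yet or the source text (or model) changed. */
function needsReembedding(?string $storedHash, string $sourceText, string $model = 'text-embedding-3-small'): bool
{
    return $storedHash === null || $storedHash !== embeddingHash($sourceText, $model);
}
```

In the job, you would build the text, return early when needsReembedding($this->product->embedding_hash, $text) is false, and write the new hash in the same UPDATE that stores the vector.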

When storing a product, dispatch the job instead of calling the service directly:

// In your controller or action class
$product = Product::create([
    'name'        => $name,
    'description' => $description,
    'category'    => $category,
    'price'       => $price,
]);

GenerateProductEmbeddingJob::dispatch($product);

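// app/Jobs/GenerateProductEmbeddingJob.php
namespace App\Jobs;

use App\Models\Product;
use App\Services\EmbeddingService;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Queue\Queueable;
use Illuminate\Support\Facades\DB;

class GenerateProductEmbeddingJob implements ShouldQueue
{
    use Queueable;

    public function __construct(public Product $product) {}

    public function handle(EmbeddingService $embeddingService): void
    {
        $text = "{$this->product->name} {$this->product->description} {$this->product->category}";
        $vector = $embeddingService->generate($text);
        // pgvector parses a '[x,y,z]' text literal; ?::vector casts the bound string
        DB::statement(
            'UPDATE products SET embedding = ?::vector WHERE id = ?',
            ['[' . implode(',', $vector) . ']', $this->product->id]
        );
    }
}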
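The job above builds the pgvector literal with implode(), which trusts the API response completely. It also converts each float using the precision ini setting (14 significant digits by default). Below is a slightly more defensive sketch of that conversion, using a hypothetical toPgVectorLiteral() helper. It refuses arrays whose length doesn’t match the vector(1536) column and rejects non-finite values, so you get a clear error before the query runs. It uses json_encode(), which writes each float in its shortest round-trip form.

```php
<?php

declare(strict_types=1);

/**
 * Sketch: turn an embedding array into the '[x,y,z]' literal that '?::vector'
 * casts expect, refusing anything that would not fit the column.
 */
function toPgVectorLiteral(array $embedding, int $dimensions = 1536): string
{
    if (count($embedding) !== $dimensions) {
        throw new InvalidArgumentException(
            sprintf('Expected %d dimensions, got %d.', $dimensions, count($embedding))
        );
    }

    $parts = [];
    foreach ($embedding as $value) {
        if (!is_int($value) && !is_float($value)) {
            throw new InvalidArgumentException('Embedding values must be numeric.');
        }
        if (!is_finite((float) $value)) {
            throw new InvalidArgumentException('Embedding values must be finite.');
        }
        // json_encode() uses serialize_precision (shortest round-trip by default)
        $parts[] = json_encode((float) $value);
    }

    return '[' . implode(',', $parts) . ']';
}
```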

Step 4: Similarity Search

$queryVector = $embeddingService->generate("What are your best waterproof jackets?");
$vectorString = '[' . implode(',', $queryVector) . ']';

$products = DB::select(
    "SELECT id, name, description, price,
            embedding <=> ?::vector AS distance
     FROM products
     ORDER BY distance
     LIMIT 3",
    [$vectorString]
);
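For intuition about what <=> returns, here is cosine distance computed in plain PHP. It is one minus the cosine of the angle between the two vectors, so 0 means same direction, 1 means unrelated (orthogonal) and 2 means opposite. This sketch is for understanding and testing only. Let PostgreSQL and the HNSW index do the ranking in production.

```php
<?php

declare(strict_types=1);

/**
 * Cosine distance: 1 - (a . b) / (|a| * |b|). Lower is closer.
 *
 * @param list<float|int> $a
 * @param list<float|int> $b
 */
function cosineDistance(array $a, array $b): float
{
    $a = array_values($a);
    $b = array_values($b);

    if ($a === [] || count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must be non-empty and the same length.');
    }

    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    foreach ($a as $i => $x) {
        $dot += $x * $b[$i];
        $normA += $x * $x;
        $normB += $b[$i] * $b[$i];
    }

    if ($normA === 0.0 || $normB === 0.0) {
        throw new InvalidArgumentException('Cosine distance is undefined for a zero vector.');
    }

    $similarity = $dot / (sqrt($normA) * sqrt($normB));

    // Clamp tiny floating-point overshoot so the result stays within [0, 2]
    return 1.0 - max(-1.0, min(1.0, $similarity));
}
```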

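[Edge Case Alert] The <=> operator computes cosine distance, where lower means closer. It can only use an HNSW index built with vector_cosine_ops. An index built with vector_l2_ops serves the <-> operator instead, and the mismatch silently falls back to a sequential scan on large tables, which destroys query performance. Run EXPLAIN on the query to confirm the index is actually used. If results look semantically wrong rather than slow, check that the stored and query embeddings come from the same model.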

Step 5: Inject Context into the Prompt

$context = collect($products)
    ->map(fn($p) => "{$p->name}: {$p->description} — R{$p->price}")
    ->implode("\n");

$response = Prism::text()
    ->using('anthropic', 'claude-sonnet-4-6')
    ->withSystemPrompt('You are a product assistant. Use only the provided product context to answer questions. Do not speculate beyond it.')
    ->withPrompt("Context:\n{$context}\n\nQuestion: What are your best waterproof jackets?")
    ->asText();

Constraining the model to “use only the provided context” is more than prompt hygiene. It’s the difference between a useful RAG system and a confidently wrong one.
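LIMIT 3 returns up to three rows regardless of how close they are. A model told to use only the context will then answer from whatever was retrieved, relevant or not. Here is a minimal sketch that filters the search rows by distance and returns null when nothing is close enough, so you can reply with a fallback message instead of calling the model. buildContext() is a hypothetical helper, and the 0.6 cutoff is an arbitrary placeholder you would tune against your own data.

```php
<?php

declare(strict_types=1);

/**
 * Build prompt context from DB::select() rows (objects with name, description,
 * price and distance), skipping rows that are too far away to be relevant.
 *
 * @param list<object> $rows
 */
function buildContext(array $rows, float $maxDistance = 0.6): ?string
{
    $lines = [];
    foreach ($rows as $row) {
        if ((float) $row->distance > $maxDistance) {
            continue;
        }
        $lines[] = "{$row->name}: {$row->description} (R{$row->price})";
    }

    // null means "nothing relevant": answer with a fallback instead of calling the model
    return $lines === [] ? null : implode("\n", $lines);
}
```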

Real-Time Streaming with SSE

asEventStreamResponse() returns a streamed HTTP response over Server-Sent Events. Before choosing SSE, it’s worth reading the breakdown of Livewire vs SSE vs WebSockets for AI UIs. SSE is the right default for one-way AI output streams, but not every use case fits that pattern.

Backend

use Prism\Prism\Facades\Prism;
use Illuminate\Http\Request;

Route::get('/chat/stream', function (Request $request) {
    $message = $request->string('message')->limit(2000)->value();

    return Prism::text()
        ->using('anthropic', 'claude-sonnet-4-6')
        ->withSystemPrompt('You are a helpful product assistant.')
        ->withPrompt($message)
        ->asEventStreamResponse();
});

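Validate and sanitise the incoming message parameter before it reaches the model. The ->string()->limit() chain above is only a start, and limit() appends an ellipsis when it truncates. For production, reject bad input outright with validation rules such as $request->validate(['message' => ['required', 'string', 'max:2000']]).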

Frontend

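Note the Alpine.js scoping below. chatStream() returns an object, so this.output inside startStream() refers to the component’s reactive state. A standalone function in a plain <script> block that writes to this.output does not bind to the Alpine component. In non-strict code, this points to window, which silently swallows your streamed content. Also confirm the event format your Prism version emits. onmessage only fires for unnamed events, so if the stream sends named events (an event: field), listen for them with addEventListener() and read the payload fields that version actually uses.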

<div x-data="chatStream()">
    <textarea x-model="output" class="w-full h-64 p-4 bg-gray-100 rounded" readonly></textarea>
    <input x-model="message" type="text" placeholder="Ask something..." class="w-full p-2 border rounded mt-2" />
    <button @click="startStream()" class="mt-2 px-4 py-2 bg-blue-600 text-white rounded">Ask</button>
</div>

<script>
    function chatStream() {
        return {
            output: '',
            message: '',
            source: null,
            startStream() {
                if (this.source) this.source.close();
                this.output = '';
                this.source = new EventSource(`/chat/stream?message=${encodeURIComponent(this.message)}`);
                this.source.onmessage = (event) => {
                    if (event.data === '[DONE]') {
                        this.source.close();
                        return;
                    }
                    try {
                        const parsed = JSON.parse(event.data);
                        this.output += parsed.text ?? '';
                    } catch (e) {
                        console.error('SSE parse error:', e);
                    }
                };
                this.source.onerror = () => {
                    this.source.close();
                };
            }
        };
    }
</script>

For pushing AI output to a specific authenticated user, rather than a public GET endpoint, Prism also supports asBroadcast() via Laravel Reverb. That’s the pattern to reach for when you need session-scoped streaming.

Caching Repeated Prompts

AI API calls are expensive. If you’re generating the same output repeatedly (summarising a static policy document, answering an FAQ), cache it with Laravel’s cache layer:

$policy = cache()->remember('ai.return_policy.v1', now()->addDay(), function () {
    return Prism::text()
        ->using('anthropic', 'claude-sonnet-4-6')
        ->withPrompt('Summarise our return policy in three sentences.')
        ->asText()
        ->text;
});

[Word to the Wise] Cache keying strategy matters more than most developers realise until they’ve served stale AI output to the wrong user. Only cache responses generated from prompts with no user-specific context. The moment you interpolate a user ID, a session variable, or any personalised data into the prompt, that response must never be shared across users. Tag your cache keys clearly. ai.policy.v1 is a better key than policy because it lets you bust the cache cleanly when your underlying document changes.
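Here is a minimal sketch of that keying advice, using a hypothetical aiCacheKey() helper. It builds a readable, versioned prefix plus a short hash of the prompt. Editing the prompt therefore produces a new key automatically, and bumping the version busts the cache when the underlying document changes. It is meant only for prompts without user-specific data and does not make personalised output safe to share.

```php
<?php

declare(strict_types=1);

/**
 * Versioned cache key for shared AI output, e.g. "ai.return_policy.v1.<hash>".
 * The naming scheme is illustrative, not a Laravel convention.
 */
function aiCacheKey(string $name, int $version, string $prompt): string
{
    // \z rather than $ so a trailing newline can't sneak into the key
    if (!preg_match('/^[a-z0-9_]+\z/', $name)) {
        throw new InvalidArgumentException("Invalid cache key name: {$name}");
    }

    // First 16 hex chars of the prompt hash: enough to tell prompt revisions apart
    return sprintf('ai.%s.v%d.%s', $name, $version, substr(hash('sha256', $prompt), 0, 16));
}
```

It would slot into the caching example as cache()->remember(aiCacheKey('return_policy', 1, $prompt), now()->addDay(), fn () => ...).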

References & Further Reading

Prism PHP Documentation – Official Prism PHP docs covering all providers, tool schemas, and streaming options.
OpenAI Embeddings API Reference – Model dimensions, pricing, and batching guidance for text-embedding-3-small and text-embedding-3-large.
Laravel Reverb – Laravel’s first-party WebSocket server, relevant if you move beyond SSE to bidirectional AI communication.

Dewald Hugo

Senior Laravel Developer and AI Architect with 10+ years in the trenches. Dewald writes about building resilient, cost-aware AI integrations and modernizing the Laravel developer workflow for the 2026 ecosystem.