
Production-Grade AI Architecture in Laravel: Contracts, Governance & Telemetry

Most Laravel AI tutorials show you how to send a prompt and return a response. That is useful for a proof-of-concept. It is not architecture. For versioning and deploying your prompt definitions under the same discipline as schema migrations, see the guide on prompt migrations.

If you have not yet decided which provider sits inside this architecture, or you are building a multi-provider system, start with the Laravel AI integration decision guide; it covers the provider comparison and system-level tradeoffs that inform everything this guide builds on top of.

The moment AI features power production workloads in Laravel (moderation pipelines, pricing logic, summarization workflows, customer-facing search), the system has a different job. It must be predictable, observable, and financially controlled. A misbehaving prompt should not corrupt domain state. A spiking model latency should not silence your queue workers. A runaway API call should not bankrupt your OpenAI account.

This guide builds the execution layer that makes AI behave like infrastructure. We will design a typed service contract, implement prompt versioning with checksums, enforce token budgets before dispatch, validate structured output, record a full audit trail, and define queue-aware retry patterns that hold under load. We are not covering the first API call. We are covering the thousandth.

For the inference parameter layer beneath this (temperature, max tokens, top-p, and config-driven execution profiles), see the guide on Laravel LLM inference control.

Isolating AI Behind a Service Contract

The first rule: controllers never touch providers directly.

Introduce a contract that represents AI execution at the application boundary. This is the Laravel Service Container doing exactly what it was designed for.

// app/Contracts/AiService.php

namespace App\Contracts;

use App\Models\PromptVersion;
use App\Data\AiResult;

interface AiService
{
    public function execute(PromptVersion $prompt, array $input, int $userId): AiResult;
}

Note the $userId parameter. We pass identity explicitly, not via auth()->user(). That call returns null inside queue workers, and we need this interface to work equally well in HTTP and async contexts.
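
In practice, the boundary resolves identity once and passes the plain integer, whether the call is synchronous or queued. A minimal illustrative sketch (the action and prompt name are hypothetical):

// Illustrative controller action — identity is resolved at the HTTP boundary.
public function summarize(Request $request, AiService $ai)
{
    $prompt = PromptVersion::where('name', 'summarize-article')
        ->latest('version')
        ->firstOrFail();

    // $request->user() exists here; inside a queue worker it would be null,
    // which is exactly why execute() takes the id explicitly.
    $result = $ai->execute($prompt, ['content' => $request->input('content')], $request->user()->id);

    return response()->json(['summary' => $result->raw]);
}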

The response data object:

// app/Data/AiResult.php

namespace App\Data;

class AiResult
{
    public function __construct(
        public readonly string $raw,
        public readonly ?array $structured,
        public readonly int $promptTokens,
        public readonly int $completionTokens,
        public readonly float $latencyMs,
        public readonly string $provider,
        public readonly bool $success,
        public readonly ?string $errorMessage = null,
    ) {}
}

Adding success and errorMessage here means the telemetry recorder receives the full picture: not just the happy path.

Wire the decorator chain in bootstrap/app.php using the application builder's withBindings() method (Laravel 11/12 style; no AppServiceProvider required, though either works):

// bootstrap/app.php

use App\Contracts\AiService;
use App\Services\AI\OpenAiService;
use App\Services\AI\GovernedAiService;

->withBindings([
    AiService::class => fn ($app) => new GovernedAiService(
        provider: $app->make(OpenAiService::class),
        budgetManager: $app->make(\App\Services\AI\AiBudgetManager::class),
        telemetry: $app->make(\App\Services\AI\AiTelemetryRecorder::class),
    ),
])
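
If you prefer the classic approach, the equivalent binding in a service provider works identically — a sketch:

// app/Providers/AppServiceProvider.php (equivalent alternative)

public function register(): void
{
    $this->app->bind(AiService::class, function ($app) {
        return new GovernedAiService(
            provider: $app->make(OpenAiService::class),
            budgetManager: $app->make(\App\Services\AI\AiBudgetManager::class),
            telemetry: $app->make(\App\Services\AI\AiTelemetryRecorder::class),
        );
    });
}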

Your domain services depend on the interface. Not on HTTP. Not on a specific provider.

Provider Adapter (Infrastructure Layer)

The adapter handles transport only. No budget logic. No telemetry. No domain rules.

If you need a lower-level walkthrough of authentication, request signing, and streaming configuration before building this adapter, the complete Laravel OpenAI integration guide covers the transport layer in full.

// app/Services/AI/OpenAiService.php

namespace App\Services\AI;

use App\Contracts\AiService;
use App\Models\PromptVersion;
use App\Data\AiResult;
use Illuminate\Http\Client\RequestException;
use Illuminate\Support\Facades\Http;
use Illuminate\Support\Facades\Log;

class OpenAiService implements AiService
{
    public function execute(PromptVersion $prompt, array $input, int $userId): AiResult
    {
        $compiledPrompt = $this->compilePrompt($prompt, $input);
        $start = microtime(true);

        try {
            $response = Http::withToken(config('services.openai.key'))
                ->timeout(30)
                ->retry(2, 500, fn ($exception) => $exception instanceof RequestException
                    && $exception->response->status() === 429)
                ->post('https://api.openai.com/v1/chat/completions', [
                    'model'       => $prompt->model, // e.g. 'gpt-4o'
                    'temperature' => $prompt->temperature,
                    'max_tokens'  => $prompt->max_tokens,
                    'messages'    => $compiledPrompt,
                ])
                ->throw()
                ->json();

        } catch (RequestException $e) {
            $latency = (microtime(true) - $start) * 1000;

            Log::error('OpenAI request failed', [
                'status'    => $e->response->status(),
                'body'      => $e->response->body(),
                'prompt'    => $prompt->name,
                'version'   => $prompt->version,
                'user_id'   => $userId,
            ]);

            return new AiResult(
                raw: '',
                structured: null,
                promptTokens: 0,
                completionTokens: 0,
                latencyMs: $latency,
                provider: 'openai',
                success: false,
                errorMessage: $e->getMessage(),
            );
        }

        $latency = (microtime(true) - $start) * 1000;

        return new AiResult(
            raw: $response['choices'][0]['message']['content'],
            structured: null,
            promptTokens: $response['usage']['prompt_tokens'],
            completionTokens: $response['usage']['completion_tokens'],
            latencyMs: $latency,
            provider: 'openai',
            success: true,
        );
    }

    protected function compilePrompt(PromptVersion $prompt, array $input): array
    {
        return [
            ['role' => 'system', 'content' => $prompt->system_prompt],
            [
                'role'    => 'user',
                'content' => str_replace(
                    array_map(fn ($k) => "{{{$k}}}", array_keys($input)),
                    array_values($input),
                    $prompt->user_template
                ),
            ],
        ];
    }
}

The retry(2, 500, ...) closure inspects the thrown exception and retries only on 429 rate-limit responses. Other errors surface immediately. Keep this focused: the adapter knows about HTTP; nothing else.

[Production Pitfall] The $when closure passed to retry() receives the thrown exception and the PendingRequest, not a Response. Type-hinting Response as the second parameter (a common copy-paste mistake) throws a TypeError on the first failure. Note also that retry() throws its own RequestException once attempts are exhausted, so the catch block above fires whether or not the trailing ->throw() is chained.
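
Because the adapter converts failures into an AiResult instead of letting exceptions escape, the failure path is easy to verify with Laravel's HTTP fake. A Pest-style sketch, assuming a PromptVersion factory exists:

// tests/Feature/OpenAiServiceFailureTest.php

use App\Models\PromptVersion;
use App\Services\AI\OpenAiService;
use Illuminate\Support\Facades\Http;

it('returns a failed AiResult when the provider keeps rate-limiting', function () {
    // Every request to the provider returns 429, exhausting the retries.
    Http::fake([
        'api.openai.com/*' => Http::response(['error' => ['message' => 'rate limited']], 429),
    ]);

    $prompt = PromptVersion::factory()->create();

    $result = app(OpenAiService::class)->execute($prompt, ['content' => 'hello'], userId: 1);

    expect($result->success)->toBeFalse()
        ->and($result->provider)->toBe('openai');
});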

Prompt Versioning Schema

Prompts drift. A small wording change to a moderation prompt can push a classification boundary in ways that are invisible until production data starts showing anomalies. Version everything. Record checksums.

// database/migrations/xxxx_create_prompt_versions_table.php

Schema::create('prompt_versions', function (Blueprint $table) {
    $table->id();
    $table->string('name');
    $table->unsignedInteger('version');
    $table->string('model')->default('gpt-4o');
    $table->text('system_prompt');
    $table->text('user_template');
    $table->float('temperature')->default(0.2);
    $table->unsignedInteger('max_tokens')->default(1000);
    $table->string('checksum');
    $table->timestamps();

    $table->unique(['name', 'version']);
});

Generate the checksum on save with a saving model event, registered in the model's booted() hook:

// app/Models/PromptVersion.php

protected static function booted(): void
{
    static::saving(function (self $model) {
        $model->checksum = hash(
            'sha256',
            $model->system_prompt . $model->user_template . $model->temperature
        );
    });
}

Every interaction record references a specific prompt_version_id. When a regression surfaces, you know precisely which prompt, temperature, and model produced it. No guesswork.
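
Creating the first version is a one-time seeding step. A sketch, using the content-moderation prompt that the domain service later in this guide queries (assumes these columns are mass assignable on the model):

// database/seeders/PromptVersionSeeder.php (sketch)

PromptVersion::create([
    'name'          => 'content-moderation',
    'version'       => 1,
    'model'         => 'gpt-4o',
    'system_prompt' => 'You are a strict content moderator. Respond only with JSON.',
    'user_template' => 'Classify the following content: {{content}}',
    'temperature'   => 0.0,
    'max_tokens'    => 200,
]);

The saving hook above fills in the checksum automatically. Amending the prompt later means inserting version 2, never editing version 1 in place.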

Governance Layer: Budget Enforcement Before Dispatch

Here is where most architectures skip a step. They enforce limits via middleware on inbound HTTP requests, which is necessary, but not sufficient. Jobs dispatched from background processes bypass HTTP entirely.

The governance wrapper sits at the service layer and intercepts every execution path, regardless of origin.

// app/Services/AI/GovernedAiService.php

namespace App\Services\AI;

use App\Contracts\AiService;
use App\Data\AiResult;
use App\Exceptions\AiBudgetExceededException;
use App\Models\PromptVersion;
use App\Models\User;

class GovernedAiService implements AiService
{
    public function __construct(
        protected AiService $provider,
        protected AiBudgetManager $budgetManager,
        protected AiTelemetryRecorder $telemetry,
    ) {}

    public function execute(PromptVersion $prompt, array $input, int $userId): AiResult
    {
        $user = User::findOrFail($userId);

        $this->budgetManager->assertWithinBudget($user, $prompt, $input);

        $result = $this->provider->execute($prompt, $input, $userId);

        $this->telemetry->record($prompt, $input, $result, $userId);

        return $result;
    }
}

The budget manager:

// app/Services/AI/AiBudgetManager.php

namespace App\Services\AI;

use App\Exceptions\AiBudgetExceededException;
use App\Models\PromptVersion;
use App\Models\User;

class AiBudgetManager
{
    public function assertWithinBudget(User $user, PromptVersion $prompt, array $input): void
    {
        // Rough estimation: ~4 characters per token, covering the prompt
        // templates plus the JSON-encoded input. This is a protective gate,
        // not a billing-grade calculation.
        $estimatedTokens = (int) ceil(
            strlen($prompt->system_prompt . $prompt->user_template . json_encode($input)) / 4
        );

        $dailyUsage = $user->aiInteractions()
            ->whereDate('created_at', today())
            ->sum('total_tokens');

        if (($dailyUsage + $estimatedTokens) > $user->daily_token_limit) {
            throw new AiBudgetExceededException(
                "Daily token budget exceeded for user {$user->id}."
            );
        }
    }
}
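
The exception itself is deliberately thin — sketched here since it is referenced above:

// app/Exceptions/AiBudgetExceededException.php

namespace App\Exceptions;

use RuntimeException;

class AiBudgetExceededException extends RuntimeException
{
    //
}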

If you want to enforce limits at the HTTP layer as well (for example, short-circuiting requests before they even reach the Service Container), our guide on Laravel AI middleware and token tracking covers exactly that pattern. The two approaches are complementary, not redundant.

[Architect’s Note] The token estimation formula (strlen / 4) is intentionally imprecise. Between the 4-characters-per-token rule and the overhead of JSON encoding, it tends to overestimate, which makes it conservative by design: the gate trips early rather than late. When you need billing-grade accuracy, integrate OpenAI’s tiktoken via a PHP wrapper or a sidecar service. For most applications, the protective gate is what matters, not perfect arithmetic.
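
Concretely: 2,000 characters of template plus encoded payload gate at ceil(2000 / 4) = 500 estimated tokens, before a single API call is made.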

Telemetry and Audit Trail

You cannot optimize what you do not observe. Create an ai_interactions table that captures both successful and failed executions.

// database/migrations/xxxx_create_ai_interactions_table.php

Schema::create('ai_interactions', function (Blueprint $table) {
    $table->id();
    $table->foreignId('prompt_version_id')->constrained();
    $table->foreignId('user_id')->nullable()->constrained();
    $table->text('input_payload');
    $table->longText('raw_response')->nullable();
    $table->unsignedInteger('prompt_tokens')->default(0);
    $table->unsignedInteger('completion_tokens')->default(0);
    $table->unsignedInteger('total_tokens')->default(0);
    $table->float('latency_ms');
    $table->boolean('success');
    $table->string('provider');
    $table->string('response_hash')->nullable();
    $table->text('error_message')->nullable();
    $table->timestamps();
});

The recorder:

// app/Services/AI/AiTelemetryRecorder.php

namespace App\Services\AI;

use App\Data\AiResult;
use App\Models\AiInteraction;
use App\Models\PromptVersion;

class AiTelemetryRecorder
{
    public function record(PromptVersion $prompt, array $input, AiResult $result, int $userId): void
    {
        AiInteraction::create([
            'prompt_version_id' => $prompt->id,
            'user_id'           => $userId,
            'input_payload'     => json_encode($input),
            'raw_response'      => $result->raw ?: null,
            'prompt_tokens'     => $result->promptTokens,
            'completion_tokens' => $result->completionTokens,
            'total_tokens'      => $result->promptTokens + $result->completionTokens,
            'latency_ms'        => $result->latencyMs,
            'success'           => $result->success,
            'provider'          => $result->provider,
            'response_hash'     => $result->raw ? hash('sha256', $result->raw) : null,
            'error_message'     => $result->errorMessage,
        ]);
    }
}

Record failures. That is the entire point of an audit trail. Logging only successful interactions gives you cost data; logging both gives you a debuggable system.
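
With both outcomes in one table, operational questions reduce to plain queries. An illustrative daily rollup per provider:

// Example: today's call volume, token spend, and latency by provider.
$rollup = \App\Models\AiInteraction::query()
    ->whereDate('created_at', today())
    ->selectRaw('provider, COUNT(*) AS calls, SUM(total_tokens) AS tokens, AVG(latency_ms) AS avg_latency_ms')
    ->groupBy('provider')
    ->get();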

Structured Output Validation

If you expect structured data back from the model, enforce the schema before it touches domain logic. Do not trust the LLM to format its own output correctly under all conditions.

$result = $aiService->execute($prompt, $input, $userId);

if (! $result->success) {
    // Handle the failure: retry, notify, or fail gracefully.
    throw new \RuntimeException('AI execution failed: ' . $result->errorMessage);
}

$decoded = json_decode($result->raw, true);

if (json_last_error() !== JSON_ERROR_NONE) {
    // The model returned non-JSON. Mark the interaction failed and trigger a retry.
    // (markFailed() is a small helper you would add to AiTelemetryRecorder.)
    $this->telemetry->markFailed($interactionId, 'invalid_json');
    throw new \UnexpectedValueException('AI response was not valid JSON.');
}

validator($decoded, [
    'summary' => 'required|string|max:500',
    'tags'    => 'required|array|min:1',
    'tags.*'  => 'string|max:50',
])->validate();

Never allow unvalidated AI output into Eloquent models or domain services. This is not a style preference; it is the same discipline you apply to user input.

[Edge Case Alert] When using gpt-4o with a strict JSON system prompt, the model occasionally returns a valid JSON block wrapped in markdown fences (```json ... ```). The fenced string fails json_decode() even though the payload inside is perfectly valid, so you discard otherwise usable responses. Strip the fences before decoding: preg_replace('/^```json\s*|\s*```$/m', '', trim($result->raw)).
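
Wrapped as a small helper, the decode step becomes the following — a sketch; stripMarkdownFences() is a name introduced here, not a framework function:

// e.g. in a shared support file (illustrative).
function stripMarkdownFences(string $raw): string
{
    // Remove a leading ```json fence and a trailing ``` fence, if present.
    return preg_replace('/^```json\s*|\s*```$/m', '', trim($raw));
}

$decoded = json_decode(stripMarkdownFences($result->raw), true);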

Queue-Based Execution with Controlled Retries

AI jobs need an explicit retry contract. Blind retries against a rate-limited provider will saturate your queue and compound the failure.

// app/Jobs/GenerateSummaryJob.php

namespace App\Jobs;

use App\Contracts\AiService;
use App\Models\PromptVersion;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;

class GenerateSummaryJob implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public int $tries = 3;
    public array $backoff = [10, 30, 60];

    public function __construct(
        public readonly PromptVersion $prompt,
        public readonly array $input,
        public readonly int $userId,
    ) {}

    public function handle(AiService $aiService): void
    {
        try {
            $aiService->execute($this->prompt, $this->input, $this->userId);
        } catch (\Throwable $e) {
            report($e);
            throw $e;
        }
    }
}

The staggered $backoff schedule (10s, 30s, 60s) gives the provider time to recover between attempts. If all three attempts fail, the job lands in failed_jobs with the full payload preserved. Replay is possible. Context is not lost.
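
Dispatch stays one line from any context, and because identity travels as a plain integer the same call works from a controller, a command, or another job (variable names illustrative):

// e.g. from an artisan command or event listener:
GenerateSummaryJob::dispatch($prompt, ['content' => $article->body], $article->user_id)
    ->onQueue('ai');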

Applying This in a Real Domain Service

Here is how a moderation service looks when it depends on the contract properly, and uses structured output instead of fragile string matching.

// app/Services/ContentModerationService.php

namespace App\Services;

use App\Contracts\AiService;
use App\Models\PromptVersion;

class ContentModerationService
{
    public function __construct(protected AiService $ai) {}

    public function moderate(string $content, int $userId): bool
    {
        $prompt = PromptVersion::where('name', 'content-moderation')
            ->latest('version')
            ->firstOrFail();

        $result = $this->ai->execute($prompt, ['content' => $content], $userId);

        if (! $result->success) {
            // Fail open or closed depending on your risk posture.
            // For moderation, fail closed: treat as unsafe.
            return false;
        }

        $decoded = json_decode($result->raw, true);

        // Expect: {"verdict": "safe"} or {"verdict": "unsafe", "reason": "..."}
        return ($decoded['verdict'] ?? 'unsafe') === 'safe';
    }
}

The domain stays clean. The AI layer remains infrastructure. Governance is centralized. The ContentModerationService has no knowledge of OpenAI, HTTP clients, or token budgets, and it should not.
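
Consumption is equally unremarkable, which is the point. An illustrative controller action (field names are hypothetical):

// Illustrative usage inside a controller.
public function store(Request $request, ContentModerationService $moderation)
{
    $safe = $moderation->moderate($request->input('body'), $request->user()->id);

    abort_unless($safe, 422, 'Content rejected by moderation.');

    // ... persist the content as usual.
}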

What This Architecture Actually Buys You

Let’s be direct. This is significantly more code than a controller with an Http::post() call. The tradeoff is deliberate.

Prompt versioning gives you reproducibility: when a model update changes behavior, you know exactly which prompt version was in play. Budget enforcement prevents a misbehaving feature from generating a four-figure API invoice overnight. Structured output validation stops AI responses from corrupting Eloquent models or triggering downstream logic with unexpected data. Telemetry gives you the audit trail to replay, benchmark, and optimize. Queue-aware retries prevent silent operational drift where failed jobs disappear without a trace. For the full hardening approach to agentic output (schema validation against LLM hallucinations in production pipelines), see the dedicated guide.

Once AI is treated as infrastructure rather than a feature bolted onto a controller, your Laravel system becomes predictable under load, debuggable under failure, and economically controlled at scale.

