
Prompt Migrations: Bringing Determinism to AI in Laravel

You version your database schema. You version your code. You version your infrastructure config.

Your prompts? Almost certainly not.

Most teams are editing them directly — in a config file, a database row, or worse, inline inside application logic. It works, right up until output quality silently degrades, costs spike without explanation, or a regression surfaces and you have no idea which prompt version caused it. Laravel prompt migrations solve this. This guide walks you through a complete, production-ready implementation — from the database schema to the executor — versioned in Git, synced through CI, and fully rollback-safe.

The Problem With Unversioned Prompts

Here’s a scenario you’ve probably lived through.

You refine a summarization prompt because the output is too verbose. You tighten the wording, nudge the temperature. Everything looks better in testing. Two weeks later a customer reports inconsistent output and you start digging.

Which prompt version ran for that request? Was the temperature changed before or after that job executed? Did staging and production share the same definition?

If your prompt lives as mutable database content, you cannot answer those questions with any confidence. The problem isn’t AI unpredictability — that’s a given, and it’s the model vendor’s problem. The problem is deployment unpredictability. That one is entirely on us.

Prompts Should Behave Like Migrations

Laravel developers are already comfortable with the migration pattern. A schema migration represents a controlled, reviewable state transition. You don’t edit tables in production. You write a migration, commit it, and deploy it. Prompt migrations follow the exact same model:

  • Prompt definitions live in version control
  • Each version is explicitly numbered
  • Deployment syncs definitions into the database
  • Historical versions remain fully traceable

You stop editing prompts. You start versioning them.

The Database Schema

Before anything else, we need the table that backs this system. Run:

php artisan make:migration create_prompt_versions_table

Then fill in the generated migration:

<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

return new class extends Migration
{
    public function up(): void
    {
        Schema::create('prompt_versions', function (Blueprint $table) {
            $table->id();
            $table->string('name', 100)->index();
            $table->unsignedSmallInteger('version');
            $table->string('model', 100);
            $table->text('system_prompt');
            $table->text('user_template');
            $table->decimal('temperature', 3, 2)->default(0.7);
            $table->unsignedSmallInteger('max_tokens')->default(1000);
            $table->char('checksum', 64);
            $table->string('environment', 20)->default('production');
            $table->timestamps();

            $table->unique(['name', 'version', 'environment']);
        });

        Schema::create('ai_executions', function (Blueprint $table) {
            $table->id();
            $table->foreignId('prompt_version_id')->constrained('prompt_versions');
            $table->json('input_payload');
            $table->longText('raw_output')->nullable();
            $table->unsignedInteger('prompt_tokens')->default(0);
            $table->unsignedInteger('completion_tokens')->default(0);
            $table->unsignedInteger('latency_ms')->nullable(); // smallint caps at ~65s; a 30s timeout plus retries can exceed that
            $table->char('response_hash', 64)->nullable();
            $table->string('status', 20)->default('success');
            $table->text('error_message')->nullable();
            $table->timestamps();

            $table->index(['prompt_version_id', 'created_at']);
        });
    }

    public function down(): void
    {
        Schema::dropIfExists('ai_executions');
        Schema::dropIfExists('prompt_versions');
    }
};

Two tables, not one. prompt_versions holds your definitions. ai_executions is your audit trail — every AI call is logged against the exact version that produced it. That response_hash column is what lets you detect silent output drift between versions running the same input.

[Architect’s Note] The unique(['name', 'version', 'environment']) constraint is intentional. It prevents duplicate syncs from creating phantom rows during parallel CI deployments and enforces the invariant that a given named prompt can only have one definition per version per environment. Don’t remove it.

Generating a Prompt Definition

We introduce a custom Artisan command:

php artisan make:prompt SummarizeArticle

This generates a timestamped definition file inside app/Prompts (a sketch of the generator command itself closes this section). Here’s a representative example:

<?php

use App\Ai\PromptDefinition;

return new class extends PromptDefinition
{
    public string $name = 'summarize';

    public int $version = 2;

    public string $model = 'gpt-4o-mini';

    public float $temperature = 0.2;

    public int $maxTokens = 800;

    public function system(): string
    {
        return 'You are a precise summarizer. Return a maximum of three sentences.';
    }

    public function user(): string
    {
        return 'Summarize the following article: {{text}}';
    }
};

And the base class it extends, so there’s no black box:

<?php

namespace App\Ai;

abstract class PromptDefinition
{
    public string $name;
    public int $version;
    public string $model;
    public float $temperature = 0.7;
    public int $maxTokens = 1000;

    abstract public function system(): string;
    abstract public function user(): string;

    public function render(array $variables): string
    {
        $template = $this->user();

        foreach ($variables as $key => $value) {
            $template = str_replace('{{' . $key . '}}', $value, $template);
        }

        return $template;
    }
}

Three things matter here. The prompt has a name. It has an explicit version number. It co-locates execution parameters — model, temperature, token ceiling — with the text itself. You can’t change the wording without touching the same file that controls model config. You’re forced to think about the definition as a unit.

You don’t tweak prompts. You increment versions.
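
As promised, here’s a minimal sketch of the make:prompt generator itself. Laravel doesn’t ship this command, so the stub contents and the timestamped file naming are assumptions to adapt:

<?php

namespace App\Console\Commands;

use Illuminate\Console\Command;
use Illuminate\Support\Facades\File;
use Illuminate\Support\Str;

class MakePrompt extends Command
{
    protected $signature   = 'make:prompt {name : The prompt class name, e.g. SummarizeArticle}';
    protected $description = 'Generate a new prompt definition file in app/Prompts.';

    public function handle(): int
    {
        $name = Str::snake($this->argument('name'));

        // Nowdoc, so nothing inside the stub is interpolated.
        $stub = <<<'PHP'
        <?php

        use App\Ai\PromptDefinition;

        return new class extends PromptDefinition
        {
            public string $name = '{{name}}';

            public int $version = 1;

            public string $model = 'gpt-4o-mini';

            public function system(): string
            {
                return '';
            }

            public function user(): string
            {
                return '';
            }
        };
        PHP;

        File::ensureDirectoryExists(app_path('Prompts'));

        $path = app_path('Prompts/' . now()->format('Y_m_d_His') . "_{$name}.php");

        File::put($path, str_replace('{{name}}', $name, $stub));

        $this->info("Created: {$path}");

        return self::SUCCESS;
    }
}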

Syncing Prompts to the Database

Manual database inserts have no place in a production deployment pipeline. The sync command makes prompt state deterministic:

php artisan ai:prompts:sync

Here’s the full implementation — including the error handling the naive version skips:

<?php

namespace App\Console\Commands;

use App\Ai\PromptDefinition;
use App\Models\PromptVersion;
use Illuminate\Console\Command;
use Illuminate\Support\Facades\Log;

class SyncPrompts extends Command
{
    protected $signature   = 'ai:prompts:sync';
    protected $description = 'Sync prompt definitions from app/Prompts into the database.';

    public function handle(): int
    {
        $files = glob(app_path('Prompts/*.php'));

        if ($files === false || empty($files)) {
            $this->warn('No prompt definitions found in app/Prompts.');
            return self::SUCCESS;
        }

        $environment = app()->environment();
        $synced      = 0;
        $skipped     = 0;

        foreach ($files as $file) {
            try {
                $definition = require $file;

                if (! $definition instanceof PromptDefinition) {
                    $this->error("Skipping {$file}: does not return a PromptDefinition instance.");
                    $skipped++;
                    continue;
                }

                $checksum = hash(
                    'sha256',
                    $definition->system() .
                    $definition->user() .
                    (string) $definition->temperature .
                    $definition->model
                );

                PromptVersion::updateOrCreate(
                    [
                        'name'        => $definition->name,
                        'version'     => $definition->version,
                        'environment' => $environment,
                    ],
                    [
                        'model'         => $definition->model,
                        'system_prompt' => $definition->system(),
                        'user_template' => $definition->user(),
                        'temperature'   => $definition->temperature,
                        'max_tokens'    => $definition->maxTokens,
                        'checksum'      => $checksum,
                    ]
                );

                $this->info("Synced: {$definition->name} v{$definition->version} [{$environment}]");
                $synced++;

            } catch (\Throwable $e) {
                $this->error("Failed to sync {$file}: {$e->getMessage()}");
                Log::error('Prompt sync failure', [
                    'file'        => $file,
                    'environment' => $environment,
                    'error'       => $e->getMessage(),
                ]);
                $skipped++;
            }
        }

        $this->line('');
        $this->info("Sync complete. Synced: {$synced} | Skipped: {$skipped}");

        return self::SUCCESS;
    }
}

[Production Pitfall] Without the per-file \Throwable catch, a single malformed definition aborts the entire sync and leaves your prompt_versions table partially updated. Under a rolling deploy, some instances reference the new version and some don’t. That failure looks like a flaky AI response in your logs when it’s actually an infrastructure split-brain. Catch per-file. Log and continue. Always.

Now prompt deployment is part of your release process. Add it to your Forge deploy script or CI pipeline after php artisan migrate:

php artisan migrate --force
php artisan ai:prompts:sync

Prompt drift disappears.

Defining the GovernedExecutor

The executor is the piece most guides leave undefined. Let’s fix that.

The GovernedExecutor has one job: accept a PromptVersion model, render the user template against the provided variables, call the AI provider, and persist the result to ai_executions. It’s a thin service class, not a god object.

<?php

namespace App\Ai;

use App\Models\AiExecution;
use App\Models\PromptVersion;
use Illuminate\Http\Client\RequestException;
use Illuminate\Support\Facades\Http;
use Illuminate\Support\Facades\Log;

class GovernedExecutor
{
    public function run(PromptVersion $prompt, array $variables): AiExecution
    {
        $userMessage = $this->render($prompt->user_template, $variables);

        $start = microtime(true);

        try {
            $response = Http::withToken(config('services.openai.key'))
                ->retry(3, 500, fn ($e) => $e instanceof RequestException && $e->response->status() === 429)
                ->timeout(30)
                ->post('https://api.openai.com/v1/chat/completions', [
                    'model'       => $prompt->model,
                    'temperature' => (float) $prompt->temperature,
                    'max_tokens'  => $prompt->max_tokens,
                    'messages'    => [
                        ['role' => 'system', 'content' => $prompt->system_prompt],
                        ['role' => 'user',   'content' => $userMessage],
                    ],
                ]);

            $response->throw();

            $body           = $response->json();
            $rawOutput      = $body['choices'][0]['message']['content'] ?? '';
            $latencyMs      = (int) ((microtime(true) - $start) * 1000);

            return AiExecution::create([
                'prompt_version_id'  => $prompt->id,
                'input_payload'      => $variables,
                'raw_output'         => $rawOutput,
                'prompt_tokens'      => $body['usage']['prompt_tokens'] ?? 0,
                'completion_tokens'  => $body['usage']['completion_tokens'] ?? 0,
                'latency_ms'         => $latencyMs,
                'response_hash'      => hash('sha256', $rawOutput),
                'status'             => 'success',
            ]);

        } catch (\Throwable $e) {
            Log::error('GovernedExecutor failure', [
                'prompt'  => $prompt->name,
                'version' => $prompt->version,
                'error'   => $e->getMessage(),
            ]);

            return AiExecution::create([
                'prompt_version_id' => $prompt->id,
                'input_payload'     => $variables,
                'status'            => 'error',
                'error_message'     => $e->getMessage(),
            ]);
        }
    }

    private function render(string $template, array $variables): string
    {
        foreach ($variables as $key => $value) {
            $template = str_replace('{{' . $key . '}}', $value, $template);
        }

        return $template;
    }
}

A few deliberate decisions here. The retry(3, 500, ...) lambda targets HTTP 429s specifically — you don’t want to retry a 400 validation error three times. The timeout(30) is a ceiling, not a suggestion; without it, a stalled provider connection can hold a queue worker indefinitely. Every execution — success or failure — writes a row. That’s non-negotiable for an audit trail.
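
For completeness, the two Eloquent models referenced throughout are deliberately thin. A minimal sketch follows; the array cast on input_payload is the one non-obvious requirement, since the executor writes $variables straight into that JSON column. (Shown together for brevity; in practice they live in separate files under app/Models.)

<?php

namespace App\Models;

use Illuminate\Database\Eloquent\Model;
use Illuminate\Database\Eloquent\Relations\BelongsTo;
use Illuminate\Database\Eloquent\Relations\HasMany;

class PromptVersion extends Model
{
    protected $guarded = [];

    public function executions(): HasMany
    {
        return $this->hasMany(AiExecution::class);
    }
}

class AiExecution extends Model
{
    protected $guarded = [];

    // input_payload is a JSON column; casting means arrays round-trip cleanly.
    protected $casts = [
        'input_payload' => 'array',
    ];

    public function promptVersion(): BelongsTo
    {
        return $this->belongsTo(PromptVersion::class);
    }
}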

Register it in a Service Provider so the Service Container resolves it correctly:

// app/Providers/AiServiceProvider.php

$this->app->singleton(GovernedExecutor::class);
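
If the provider doesn’t exist yet, the full file is short. A sketch, assuming Laravel 11’s bootstrap/providers.php registration (older versions list providers in config/app.php):

<?php

namespace App\Providers;

use App\Ai\GovernedExecutor;
use Illuminate\Support\ServiceProvider;

class AiServiceProvider extends ServiceProvider
{
    public function register(): void
    {
        // The executor is stateless, so one shared instance is safe.
        $this->app->singleton(GovernedExecutor::class);
    }
}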

For deeper coverage of the rate-limit and token-tracking middleware that should sit above this executor, the Laravel AI Middleware: Token Tracking & Rate Limiting guide covers that layer in full — it pairs directly with what we’re building here.

Deterministic Execution

With the schema in place and the executor defined, consumption is clean:

$prompt = PromptVersion::where('name', 'summarize')
    ->where('environment', app()->environment())
    ->latest('version')
    ->firstOrFail();

$execution = app(GovernedExecutor::class)
    ->run($prompt, ['text' => $article->body]);

echo $execution->raw_output;

Every AI interaction now records the prompt version ID, input payload, raw output, token usage, latency, and a response hash. When output quality changes, you trace it to the exact version. When a regression surfaces, you replay the execution using the stored input and stored prompt config.
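
Replaying is mechanical once the row exists. A sketch, assuming you’ve already looked up the suspect execution’s ID:

$execution = AiExecution::with('promptVersion')->findOrFail($executionId);

// Re-run the stored input against the exact prompt config that produced it.
$replay = app(GovernedExecutor::class)
    ->run($execution->promptVersion, $execution->input_payload);

// Matching hashes mean stable output; a mismatch quantifies the drift.
$drifted = $replay->response_hash !== $execution->response_hash;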

This is not about eliminating AI variability. It’s about eliminating deployment ambiguity. Those are different problems. Only one of them is yours to fix.

Rollbacks Are Now Safe

Version 3 of a prompt introduces subtle output issues. Here’s your recovery path:

  1. Revert the definition file in Git to version 2
  2. Redeploy
  3. Run php artisan ai:prompts:sync

One honest caveat: the sync command as written is additive. updateOrCreate never deletes rows, so the version 3 record survives the revert, and any consumer that resolves prompts with latest('version') will keep serving it. Either pin versions explicitly at the call site, or extend the sync command to prune rows whose definition files no longer exist for the current environment. With that in place, the system references the previous stable version. No manual database edits. No guesswork. No 2am messages explaining why you’re directly modifying production records. Operational confidence looks unremarkable when it works — which is exactly the point.

Environment-Specific Prompt Isolation

Staging and production should never share prompt state. The environment column we added to the schema handles this automatically. Every sync scopes definitions to app()->environment(), so an aggressive prompt experiment in staging never touches production’s prompt_versions rows.
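
If repeating that environment filter at every call site bothers you, a query scope on PromptVersion centralizes it. A sketch:

// A convenience scope on App\Models\PromptVersion; nothing above requires it.
public function scopeForCurrentEnvironment($query)
{
    return $query->where('environment', app()->environment());
}

// Call sites then shrink to:
$prompt = PromptVersion::forCurrentEnvironment()
    ->where('name', 'summarize')
    ->latest('version')
    ->firstOrFail();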

You now have version control at the code level, isolation at the environment level, and traceability at the execution level. That combination is genuinely rare in Laravel AI integrations. Most teams have none of it.

If you’re thinking about how this fits into a broader governance architecture — contracts, provider abstraction, telemetry — the Production-Grade AI Architecture in Laravel guide covers that full structural layer and is the natural next step after this one.

When This Actually Matters

AI failures aren’t always loud. Sometimes the model produces output that’s syntactically correct but semantically degraded. A prompt change subtly shifts tone and affects user trust in ways that don’t trigger error logs. Summaries become slightly less accurate over three deploys and nobody can point to why.

If prompts are mutable and undocumented, debugging becomes archaeology. You’re querying updated_at timestamps on database rows and hoping someone left a comment.

Prompt migrations shift AI workflows from experimentation into engineering discipline. They align AI with the same operational standards you already apply to Eloquent schema changes and deployment pipelines. The mental model is already there. We’re extending it, not inventing something new.

[Word to the Wise] The teams that fight hardest against this pattern are usually the ones who’ve never had a billing spike they couldn’t explain. Once you’ve spent an afternoon trying to prove to a client that the prompt change on a Tuesday caused the token usage spike on a Wednesday, you’ll never store a prompt in a mutable database row again.

The Larger Picture

Most Laravel AI tutorials stop at calling a provider. That’s step one. The real challenge begins when AI decisions affect billing, AI moderation affects user trust, or AI automation drives workflow correctness.

At that point, prompts are infrastructure. Infrastructure must be versioned. It must be deployable through CI. It must be rollback-safe. And it must leave an audit trail that survives a post-incident review.

Frequently Asked Questions

Can I use this system with Anthropic Claude instead of OpenAI?

Yes, and the architecture is designed for exactly that. The PromptVersion model stores the model name as a plain string — gpt-4o-mini, claude-sonnet-4-6, whatever you need. The only component that cares about the provider is GovernedExecutor. Swap the endpoint and payload shape for Anthropic’s Messages API and everything else — the sync command, the schema, the audit trail — stays identical. If you’re running multiple providers simultaneously, extract the HTTP call behind an interface and bind the correct implementation per model prefix in your Service Provider. We cover that abstraction pattern in detail in the Production-Grade AI Architecture in Laravel guide.
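
A sketch of that extraction. The interface, the factory, and the two concrete provider classes are all assumptions; the routing idea is the point:

<?php

namespace App\Ai;

use App\Models\PromptVersion;

interface AiProvider
{
    /** @return array{output: string, prompt_tokens: int, completion_tokens: int} */
    public function complete(PromptVersion $prompt, string $userMessage): array;
}

class ProviderFactory
{
    // Route on the model-name prefix; extend as you add providers.
    public function for(string $model): AiProvider
    {
        return str_starts_with($model, 'claude')
            ? app(AnthropicProvider::class)  // hypothetical Messages API wrapper
            : app(OpenAiProvider::class);    // hypothetical chat/completions wrapper
    }
}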

What happens if two developers increment the same version number in separate branches?

This is the most common real-world collision with this pattern. The unique(['name', 'version', 'environment']) database constraint catches it at sync time — one branch will fail with a constraint violation rather than silently overwrite the other. That’s intentional. The correct resolution is to treat version numbers like migration timestamps: whoever merges second rebases and increments. A pre-merge CI check that validates no two definition files share the same name + version pair catches it even earlier, before it hits the database. Add that check to your pipeline and the collision becomes a PR comment, not a production incident.
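
That pre-merge check fits in one small command. A sketch; the command name is an assumption:

<?php

namespace App\Console\Commands;

use App\Ai\PromptDefinition;
use Illuminate\Console\Command;

class ValidatePrompts extends Command
{
    protected $signature   = 'ai:prompts:validate';
    protected $description = 'Fail if two definition files share the same name + version pair.';

    public function handle(): int
    {
        $seen = [];

        foreach (glob(app_path('Prompts/*.php')) ?: [] as $file) {
            $definition = require $file;

            if (! $definition instanceof PromptDefinition) {
                continue;
            }

            $key = "{$definition->name}:{$definition->version}";

            if (isset($seen[$key])) {
                $this->error("Duplicate {$key} in {$file} and {$seen[$key]}.");
                return self::FAILURE;
            }

            $seen[$key] = $file;
        }

        $this->info('No duplicate prompt versions found.');

        return self::SUCCESS;
    }
}

Run it as an early CI step, before migrations and tests, and the constraint violation never gets a chance to fire.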

How do I handle prompt variables that contain JSON or structured data?

The {{variable}} string replacement in PromptDefinition::render() works for simple scalar values. For structured inputs — arrays, nested objects, formatted lists — render them to a string before passing them in. json_encode($payload, JSON_PRETTY_PRINT) is usually sufficient for LLM consumption. If you’re building prompts where structure matters significantly, consider adding a format() method to your definition class that accepts typed inputs and returns the fully rendered user message, bypassing the generic template entirely. That approach keeps the rendering logic inside the definition where it belongs, rather than scattered across call sites.
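
A sketch of that format() idea, inside a hypothetical definition that accepts structured input:

// Inside a PromptDefinition subclass: a typed renderer that bypasses
// the generic {{variable}} template entirely.
public function format(array $sections): string
{
    return "Summarize the following sections:\n" .
        json_encode($sections, JSON_PRETTY_PRINT);
}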

Does this work with Laravel queues and background jobs?

It’s actually better suited to queued jobs than synchronous calls. Resolve the PromptVersion before dispatching the job and pass the id rather than the full model — this avoids serializing Eloquent models onto the queue and ensures the worker uses the version that was current at dispatch time, not whatever version exists when the job eventually runs. Inside the job, re-fetch the model by ID and pass it to GovernedExecutor::run() as normal. The ai_executions log then gives you a complete record of which job, which version, and which output — essential for debugging async workflows where the failure is hours removed from the original dispatch.
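
A sketch of that job shape; the class name and payload are illustrative:

<?php

namespace App\Jobs;

use App\Ai\GovernedExecutor;
use App\Models\PromptVersion;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;

class RunSummarization implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable;

    public function __construct(
        public int $promptVersionId,  // resolved at dispatch time
        public array $variables,
    ) {}

    public function handle(GovernedExecutor $executor): void
    {
        // Re-fetch by ID so the worker uses the version current at dispatch.
        $prompt = PromptVersion::findOrFail($this->promptVersionId);

        $executor->run($prompt, $this->variables);
    }
}

Dispatch side: RunSummarization::dispatch($prompt->id, ['text' => $article->body]);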

How do we track cost per prompt version over time?

The ai_executions table already stores prompt_tokens and completion_tokens per execution. Multiply those against the model’s per-token pricing and you have raw cost data scoped to a specific prompt_version_id. A simple Eloquent query groups cost by version and date range. For operational visibility, pipe those aggregates into a scheduled job that writes daily summaries — or push them directly to a logging service. The moment you can answer “version 3 cost 40% more than version 2 for equivalent input volume,” prompt governance stops being an engineering concern and starts being a business conversation. That shift is the whole point.
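
A sketch of that aggregate. The per-token prices are placeholders, not real rates:

use App\Models\AiExecution;

// Placeholder per-million-token pricing; substitute your model's actual rates.
$inputPrice  = 0.15 / 1_000_000;
$outputPrice = 0.60 / 1_000_000;

$costByVersion = AiExecution::query()
    ->where('created_at', '>=', now()->subDays(30))
    ->groupBy('prompt_version_id')
    ->select('prompt_version_id')
    ->selectRaw('SUM(prompt_tokens) AS in_tokens')
    ->selectRaw('SUM(completion_tokens) AS out_tokens')
    ->get()
    ->mapWithKeys(fn ($row) => [
        $row->prompt_version_id => $row->in_tokens * $inputPrice
            + $row->out_tokens * $outputPrice,
    ]);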

Comments

Robert

Honestly, I feel like I’m losing my mind with these prompt strings. I’ve got a SummarizationService that’s basically 400 lines of just heredoc prompts. Every time I tweak one to stop the LLM from being too ‘chatty,’ I end up breaking the formatting for another part of the app.
I saw your point about ‘Prompt Migrations’—is the idea that we actually version these in the DB like we do with tables? How do you handle the actual swap in production without a full deployment?

Robert

Thanks for the quick reply! That “Switch” logic makes sense in theory, but I’m looking at this snippet in your post and I’m a bit stuck:

return Prompt::get('summarizer')->execute($data);

Where is the versioning actually happening there? If I have v1 and v2 in my database for the summarizer key, how do I prevent the code from just grabbing the latest one and breaking everything before I’m ready?

Do I need to hardcode Prompt::get('summarizer', 'v2') everywhere in my services, or is there a way to handle this globally so I don’t have to do a massive find-and-replace every time I update the prompt?
