
Laravel AI Middleware: Token Tracking & Rate Limiting

Integrating Claude or GPT-4o into a Laravel app is deceptively easy. One Http::post() and you feel like a genius. Then a single user loops your endpoint, or a bot scripts it, and your OpenAI bill climbs $300 before you wake up. The answer is a dedicated Laravel AI middleware layer — and most tutorials skip it entirely.

That’s not a hypothetical. It happens.

The fix isn’t smarter prompts — it’s infrastructure. In this guide, we build a proper AI management layer: tiered rate limiting, per-user token tracking with async logging, and a Pest test suite that actually validates the guards are working. We’re not demoing a toy. We’re building something you can ship. For prompt versioning and deployment discipline on the prompt side of that infrastructure, see the prompt migrations guide.

1. The Strategy: Why a Laravel AI Middleware Layer?

Middleware in the Laravel lifecycle is the right place for this because it separates infrastructure concerns (cost, rate limits) from business logic (the prompt itself). This decoupling means you can swap gpt-4o for a self-hosted Llama instance without touching a single Controller.

For the output side of that separation — validating what the model returns before your application acts on it — see the guide on schema validation against LLM hallucinations.

We’re tracking cost using the standard token pricing formula:

$$\text{Total Cost} = \sum_{i=1}^{n} \left( \text{Tokens}_{\text{in},i} \cdot \text{Price}_{\text{in}} + \text{Tokens}_{\text{out},i} \cdot \text{Price}_{\text{out}} \right)$$

Centralise that calculation once. Never scatter it across Controllers.
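As a sketch of that centralisation, here is a standalone helper mirroring the formula. The function name and the per-token rates are illustrative, not part of the codebase we build below:

```php
<?php

// Hypothetical helper centralising the cost formula. In a real app this
// would live in a dedicated service class; the rates below are examples.
function calculateCost(array $requests): float
{
    $total = 0.0;

    foreach ($requests as $usage) {
        $total += ($usage['prompt_tokens'] * $usage['prompt_rate'])
                + ($usage['completion_tokens'] * $usage['completion_rate']);
    }

    return $total;
}

// Two requests at example gpt-4o-style rates.
$cost = calculateCost([
    ['prompt_tokens' => 1000, 'completion_tokens' => 500,  'prompt_rate' => 0.000005, 'completion_rate' => 0.000015],
    ['prompt_tokens' => 2000, 'completion_tokens' => 1000, 'prompt_rate' => 0.000005, 'completion_rate' => 0.000015],
]);

echo $cost; // 0.0375
```

One function, one place to change when a provider reprices.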

2. Database Schema: An Audit Trail, Not Just a Counter

Don’t fall into the trap of incrementing a single total_tokens integer on your users table. You need a row per request so you can audit, refund, and debug.

// database/migrations/xxxx_xx_xx_create_ai_usage_logs_table.php
public function up(): void
{
    Schema::create('ai_usage_logs', function (Blueprint $table) {
        $table->id();
        $table->foreignId('user_id')->constrained()->onDelete('cascade');
        $table->string('model_used');        // e.g., 'gpt-4o', 'claude-sonnet-4-5'
        $table->string('feature_name');      // e.g., 'blog_generator', 'chat'
        $table->integer('prompt_tokens')->default(0);
        $table->integer('completion_tokens')->default(0);
        $table->decimal('estimated_cost', 10, 6)->default(0);
        $table->timestamps();
    });
}

3. Tiered Rate Limiting

Define your tiers in AppServiceProvider. Free users get 5 requests per minute. Premium gets 100. Adjust to whatever your unit economics demand.

// app/Providers/AppServiceProvider.php
use Illuminate\Cache\RateLimiting\Limit;
use Illuminate\Support\Facades\RateLimiter;
use Illuminate\Http\Request;

public function boot(): void
{
    RateLimiter::for('ai-api', function (Request $request) {
        $user = $request->user();

        return $user?->is_premium
            ? Limit::perMinute(100)->by($user->id)
            : Limit::perMinute(5)->by($user->id);
    });
}

Senior Dev Tip: Back this with Redis, not your database. Set CACHE_STORE=redis in .env (the variable is CACHE_DRIVER on Laravel 10 and earlier). If you’re running the default file or database driver, every AI request triggers a disk or DB read-write cycle under the rate limiter — a quiet bottleneck that won’t show up until you’re under load. See the Laravel Cache documentation for driver configuration.
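A minimal .env sketch for that setup, assuming a local Redis on the default port (on Laravel 10 and earlier the first variable is named CACHE_DRIVER):

```shell
CACHE_STORE=redis
REDIS_HOST=127.0.0.1
REDIS_PORT=6379
```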

4. Building the Middleware

Generate the class:

php artisan make:middleware EnsureUserHasAiCredits

This middleware enforces auth and your per-user spending ceiling before the request ever reaches the Controller:

// app/Http/Middleware/EnsureUserHasAiCredits.php
namespace App\Http\Middleware;

use Closure;
use Illuminate\Http\Request;
use Symfony\Component\HttpFoundation\Response;

class EnsureUserHasAiCredits
{
    public function handle(Request $request, Closure $next): Response
    {
        $user = $request->user();

        if (! $user) {
            return response()->json(['error' => 'Unauthorized'], 401);
        }

        if ($user->monthly_spend >= $user->spending_limit) {
            return response()->json([
                'error'   => 'Monthly spending limit reached.',
                'upgrade' => route('billing.upgrade'),
            ], 402);
        }

        return $next($request);
    }
}

Registering the Middleware (Laravel 11+)

Laravel 11 dropped app/Http/Kernel.php. Register your middleware in bootstrap/app.php:

// bootstrap/app.php
->withMiddleware(function (Middleware $middleware) {
    $middleware->appendToGroup('api', [
        \App\Http\Middleware\EnsureUserHasAiCredits::class,
    ]);
})

Then apply the rate limiter on your route:

// routes/api.php
Route::post('/generate', [AiController::class, 'generate'])
    ->middleware('throttle:ai-api');

5. Externalising Model Pricing

Hardcoding 0.000005 per token inside your Job is a maintenance trap. One pricing change from OpenAI and your cost accounting silently breaks. Put rates in config/ai.php:

// config/ai.php
return [
    'models' => [
        'gpt-4o' => [
            'prompt_rate'     => 0.000005,
            'completion_rate' => 0.000015,
        ],
        'claude-sonnet-4-5' => [
            'prompt_rate'     => 0.000003,
            'completion_rate' => 0.000015,
        ],
    ],
];

6. Async Usage Logging with a Queued Job

Once the API responds, log asynchronously. Don’t make the user wait for a DB write.

// app/Jobs/LogAiUsage.php
namespace App\Jobs;

use App\Models\AiUsageLog;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;

class LogAiUsage implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public function __construct(
        public readonly int    $userId,
        public readonly array  $usageData,
    ) {}

    public function handle(): void
    {
        $model  = $this->usageData['model'];
        $rates  = config("ai.models.{$model}", [
            'prompt_rate'     => 0.000005,
            'completion_rate' => 0.000015,
        ]);

        $cost = ($this->usageData['prompt_tokens'] * $rates['prompt_rate'])
              + ($this->usageData['completion_tokens'] * $rates['completion_rate']);

        AiUsageLog::create([
            'user_id'           => $this->userId,
            'model_used'        => $model,
            'feature_name'      => $this->usageData['feature_name'],
            'prompt_tokens'     => $this->usageData['prompt_tokens'],
            'completion_tokens' => $this->usageData['completion_tokens'],
            'estimated_cost'    => $cost,
        ]);
    }
}

Notice feature_name is included in both the Job payload and the create() call. The migration defines that column as NOT NULL, so omitting it from the insert makes the database reject the write; were the column nullable instead, the insert would quietly succeed with a null. Either way, you’ve lost attribution data. Always match your payload to your schema.
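One cheap guard against that drift is asserting the payload carries every key the insert expects before you ever dispatch the Job. A standalone sketch, with a hypothetical assertUsagePayload() helper and an illustrative required-keys list:

```php
<?php

// Illustrative guard: verify a usage payload carries every key the
// ai_usage_logs insert expects, throwing before the job is ever queued.
function assertUsagePayload(array $payload): void
{
    $required = ['model', 'feature_name', 'prompt_tokens', 'completion_tokens'];
    $missing  = array_diff($required, array_keys($payload));

    if ($missing !== []) {
        throw new InvalidArgumentException(
            'Usage payload missing keys: ' . implode(', ', $missing)
        );
    }
}

// A complete payload passes silently.
assertUsagePayload([
    'model'             => 'gpt-4o',
    'feature_name'      => 'chat',
    'prompt_tokens'     => 15,
    'completion_tokens' => 42,
]);

// An incomplete one fails loudly instead of silently losing attribution.
try {
    assertUsagePayload(['model' => 'gpt-4o']);
} catch (InvalidArgumentException $e) {
    echo $e->getMessage();
}
```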

7. The Controller: Gluing It Together

// app/Http/Controllers/AiController.php
namespace App\Http\Controllers;

use App\Jobs\LogAiUsage;
use Illuminate\Http\Request;
use Illuminate\Http\JsonResponse;
use OpenAI\Laravel\Facades\OpenAI;
use Illuminate\Validation\ValidationException;

class AiController extends Controller
{
    public function generate(Request $request): JsonResponse
    {
        $validated = $request->validate([
            'prompt'  => ['required', 'string', 'max:4000'],
            'feature' => ['required', 'string', 'in:blog_generator,chat,summariser'],
        ]);

        try {
            $result = OpenAI::chat()->create([
                'model'    => 'gpt-4o',
                'messages' => [['role' => 'user', 'content' => $validated['prompt']]],
            ]);
        } catch (\OpenAI\Exceptions\TransporterException $e) {
            report($e);
            return response()->json(['error' => 'AI service unavailable. Please try again.'], 503);
        } catch (\OpenAI\Exceptions\ErrorException $e) {
            // Catches 429s, quota exceeded, invalid requests from OpenAI
            report($e);
            return response()->json(['error' => $e->getMessage()], $e->getCode() ?: 500);
        }

        LogAiUsage::dispatch(auth()->id(), [
            'model'             => 'gpt-4o',
            'feature_name'      => $validated['feature'],
            'prompt_tokens'     => $result->usage->promptTokens,
            'completion_tokens' => $result->usage->completionTokens,
        ]);

        return response()->json([
            'content' => $result->choices[0]->message->content,
        ]);
    }
}

If you’re integrating Anthropic’s SDK instead, the pattern is identical — swap the client call and update the model string to claude-sonnet-4-5. See the Anthropic API documentation for the current SDK reference.

8. Testing the Architecture with Pest

A production system is only as solid as the test suite backing it. We need to verify three things independently:

  1. The rate limiter blocks users who’ve exhausted their quota.
  2. The middleware halts users who’ve hit their spending ceiling.
  3. The LogAiUsage Job is dispatched with the correct payload — without hitting the real API.

That last point is where most AI testing tutorials fail you. Bus::fake() intercepts the Job dispatch, but it does nothing about the OpenAI facade. If you don’t mock it, your test suite is burning real API credits.

// tests/Feature/AiGenerationTest.php
use App\Jobs\LogAiUsage;
use App\Models\User;
use Illuminate\Support\Facades\Bus;
use Illuminate\Support\Facades\RateLimiter;
use OpenAI\Laravel\Facades\OpenAI;
use OpenAI\Responses\Chat\CreateResponse;

beforeEach(function () {
    Bus::fake();
});

it('blocks a free user who has hit their rate limit', function () {
    $user = User::factory()->create(['is_premium' => false]);

    // Named limiters hash their cache key inside the throttle middleware,
    // so RateLimiter::hit('ai-api:' . $user->id) would increment a different
    // counter. Exhaust the free-tier ceiling with real (mocked) requests.
    OpenAI::fake(array_fill(0, 5, CreateResponse::fake()));

    foreach (range(1, 5) as $i) {
        $this->actingAs($user)->postJson('/api/generate', [
            'prompt'  => 'Hello AI',
            'feature' => 'chat',
        ])->assertOk();
    }

    $this->actingAs($user)
        ->postJson('/api/generate', [
            'prompt'  => 'Hello AI',
            'feature' => 'chat',
        ])
        ->assertStatus(429);
});

it('blocks a user who has exceeded their monthly spending limit', function () {
    $user = User::factory()->create([
        'monthly_spend'  => 50.00,
        'spending_limit' => 50.00,
    ]);

    $this->actingAs($user)
        ->postJson('/api/generate', [
            'prompt'  => 'Hello AI',
            'feature' => 'chat',
        ])
        ->assertStatus(402)
        ->assertJsonPath('error', 'Monthly spending limit reached.');
});

it('dispatches a LogAiUsage job with correct token data on success', function () {
    $user = User::factory()->create(['is_premium' => true]);

    // Mock the OpenAI facade — never hit the real API in tests
    OpenAI::fake([
        CreateResponse::fake([
            'choices' => [
                ['message' => ['role' => 'assistant', 'content' => 'Mocked response']],
            ],
            'usage' => [
                'prompt_tokens'     => 15,
                'completion_tokens' => 42,
                'total_tokens'      => 57,
            ],
        ]),
    ]);

    $this->actingAs($user)
        ->postJson('/api/generate', [
            'prompt'  => 'Test prompt',
            'feature' => 'chat',
        ])
        ->assertOk()
        ->assertJsonPath('content', 'Mocked response');

    Bus::assertDispatched(LogAiUsage::class, function (LogAiUsage $job) use ($user) {
        return $job->userId === $user->id
            && $job->usageData['prompt_tokens'] === 15
            && $job->usageData['completion_tokens'] === 42
            && $job->usageData['feature_name'] === 'chat';
    });
});

If you’re building out the Eloquent side of this and want to understand how to test model relationships and factory states at scale, the patterns covered in building robust Laravel test factories translate directly to the User and AiUsageLog models here.

9. What You Now Have

Shipping this architecture gives you:

  • Cost protection — tiered rate limiting via the Laravel RateLimiter facade backed by Redis, plus a hard monthly ceiling enforced in middleware.
  • Full audit trail — every token consumed is recorded per user, per feature, and per model by the queued LogAiUsage Job.
  • Zero latency impact — async logging means your users never wait for a DB write.
  • A test suite that doesn’t cheat — the OpenAI facade is mocked, so your CI pipeline stays free and your assertions are actually trustworthy.

The gap between a weekend project and a production system isn’t the AI integration. It’s everything around it. This is that layer.
