Integrating Claude or GPT-4o into a Laravel app is deceptively easy. One Http::post() and you feel like a genius. Then a single user loops your endpoint, or a bot scripts it, and your OpenAI bill climbs $300 before you wake up. The answer is a dedicated Laravel AI middleware layer — and most tutorials skip it entirely.
That’s not a hypothetical. It happens.
The fix isn’t smarter prompts — it’s infrastructure. In this guide, we build a proper AI management layer: tiered rate limiting, per-user token tracking with async logging, and a Pest test suite that actually validates the guards are working. We’re not demoing a toy. We’re building something you can ship. For prompt versioning and deployment discipline on the prompt side of that infrastructure, see the prompt migrations guide.
1. The Strategy: Why a Laravel AI Middleware Layer?
Middleware in the Laravel lifecycle is the right place for this because it separates infrastructure concerns (cost, rate limits) from business logic (the prompt itself). This decoupling means you can swap gpt-4o for a self-hosted Llama instance without touching a single Controller.
For the output side of that separation — validating what the model returns before your application acts on it — see the guide on schema validation against LLM hallucinations.
We’re tracking cost using the standard token pricing formula:

cost = (prompt_tokens × prompt_rate) + (completion_tokens × completion_rate)

where each rate is the provider’s per-token price for that model. Centralise that calculation once. Never scatter it across Controllers.
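In code, the calculation is a one-liner worth isolating. A standalone sketch with a hypothetical helper name (this isn’t a Laravel or OpenAI API; section 6 shows where the real calculation lives in this build):

```php
<?php

// Cost in USD for one request. Rates are per single token,
// e.g. gpt-4o at 0.000005 USD per prompt token.
function estimateCost(
    int $promptTokens,
    int $completionTokens,
    float $promptRate,
    float $completionRate,
): float {
    return ($promptTokens * $promptRate)
        + ($completionTokens * $completionRate);
}

// 1,000 prompt + 500 completion tokens at gpt-4o rates:
// (1000 * 0.000005) + (500 * 0.000015) = 0.0125 USD
```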
2. Database Schema: An Audit Trail, Not Just a Counter
Don’t fall into the trap of incrementing a single total_tokens integer on your users table. You need a row per request so you can audit, refund, and debug.
// database/migrations/xxxx_xx_xx_create_ai_usage_logs_table.php
public function up(): void
{
    Schema::create('ai_usage_logs', function (Blueprint $table) {
        $table->id();
        $table->foreignId('user_id')->constrained()->onDelete('cascade');
        $table->string('model_used');   // e.g., 'gpt-4o', 'claude-sonnet-4-5'
        $table->string('feature_name'); // e.g., 'blog_generator', 'chat'
        $table->integer('prompt_tokens')->default(0);
        $table->integer('completion_tokens')->default(0);
        $table->decimal('estimated_cost', 10, 6)->default(0);
        $table->timestamps();
    });
}
3. Tiered Rate Limiting
Define your tiers in AppServiceProvider. Free users get 5 requests per minute. Premium gets 100. Adjust to whatever your unit economics demand.
// app/Providers/AppServiceProvider.php
use Illuminate\Cache\RateLimiting\Limit;
use Illuminate\Support\Facades\RateLimiter;
use Illuminate\Http\Request;

public function boot(): void
{
    RateLimiter::for('ai-api', function (Request $request) {
        $user = $request->user();

        return $user?->is_premium
            ? Limit::perMinute(100)->by($user->id)
            // Key unauthenticated requests by IP so guests can't
            // trigger an error (or share one bucket) before auth runs.
            : Limit::perMinute(5)->by($user?->id ?: $request->ip());
    });
}
Senior Dev Tip: Back this with Redis, not your database. Set CACHE_STORE=redis in .env (on Laravel 10 and earlier the variable is CACHE_DRIVER). If you’re running the default file or database driver, every AI request triggers a disk or DB read-write cycle under the rate limiter — that’s a quiet bottleneck that won’t show up until you’re under load. See the Laravel Cache documentation for driver configuration.
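Concretely, assuming a stock Redis install on localhost (host and port are illustrative defaults):

```ini
# .env (Laravel 11+ reads CACHE_STORE; Laravel 10 and earlier use CACHE_DRIVER)
CACHE_STORE=redis
REDIS_HOST=127.0.0.1
REDIS_PORT=6379
```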
4. Building the Middleware
Generate the class:
php artisan make:middleware EnsureUserHasAiCredits
This middleware enforces auth and your per-user spending ceiling before the request ever reaches the Controller:
// app/Http/Middleware/EnsureUserHasAiCredits.php
namespace App\Http\Middleware;

use Closure;
use Illuminate\Http\Request;
use Symfony\Component\HttpFoundation\Response;

class EnsureUserHasAiCredits
{
    public function handle(Request $request, Closure $next): Response
    {
        $user = $request->user();

        if (! $user) {
            return response()->json(['error' => 'Unauthorized'], 401);
        }

        if ($user->monthly_spend >= $user->spending_limit) {
            return response()->json([
                'error' => 'Monthly spending limit reached.',
                'upgrade' => route('billing.upgrade'),
            ], 402);
        }

        return $next($request);
    }
}
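One thing the middleware takes on faith: a monthly_spend value on the User model, which we never define. As one possible sketch (the accessor and relationship below are assumptions, not part of the build above), you could derive it from the ai_usage_logs table:

```php
// app/Models/User.php (hypothetical accessor; assumes the ai_usage_logs
// table from section 2 and a spending_limit column on users)
use Illuminate\Database\Eloquent\Casts\Attribute;
use Illuminate\Database\Eloquent\Relations\HasMany;

public function aiUsageLogs(): HasMany
{
    return $this->hasMany(AiUsageLog::class);
}

// Exposed as $user->monthly_spend: sum of this month's estimated costs.
protected function monthlySpend(): Attribute
{
    return Attribute::make(
        get: fn (): float => (float) $this->aiUsageLogs()
            ->where('created_at', '>=', now()->startOfMonth())
            ->sum('estimated_cost'),
    );
}
```

That sum runs a query on every access, so under load you’d cache it per-user or maintain a denormalised column updated by the logging Job.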
Registering the Middleware (Laravel 11+)
Laravel 11 dropped app/Http/Kernel.php. Register your middleware in bootstrap/app.php:
// bootstrap/app.php
->withMiddleware(function (Middleware $middleware) {
    $middleware->appendToGroup('api', [
        \App\Http\Middleware\EnsureUserHasAiCredits::class,
    ]);
})
Then apply the rate limiter on your route:
// routes/api.php
Route::post('/generate', [AiController::class, 'generate'])
    ->middleware('throttle:ai-api');
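Alternatively — a style choice, not something the group registration requires — you can stack everything at the route level so the ordering is explicit (auth:sanctum here is an assumption; use whatever guard your API runs):

```php
// routes/api.php: explicit route-level stacking. Order matters here:
// authenticate first so the throttle can key off the user ID.
use App\Http\Controllers\AiController;
use App\Http\Middleware\EnsureUserHasAiCredits;
use Illuminate\Support\Facades\Route;

Route::post('/generate', [AiController::class, 'generate'])
    ->middleware(['auth:sanctum', 'throttle:ai-api', EnsureUserHasAiCredits::class]);
```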
5. Externalising Model Pricing
Hardcoding 0.000005 per token inside your Job is a maintenance trap. One pricing change from OpenAI and your cost accounting silently breaks. Put rates in config/ai.php:
// config/ai.php
return [
    'models' => [
        'gpt-4o' => [
            'prompt_rate' => 0.000005,     // USD per prompt token
            'completion_rate' => 0.000015, // USD per completion token
        ],
        'claude-sonnet-4-5' => [
            'prompt_rate' => 0.000003,
            'completion_rate' => 0.000015,
        ],
    ],
];
6. Async Usage Logging with a Queued Job
Once the API responds, log asynchronously. Don’t make the user wait for a DB write.
// app/Jobs/LogAiUsage.php
namespace App\Jobs;

use App\Models\AiUsageLog;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;

class LogAiUsage implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public function __construct(
        public readonly int $userId,
        public readonly array $usageData,
    ) {}

    public function handle(): void
    {
        $model = $this->usageData['model'];

        // Fall back to gpt-4o rates if the model isn't in config/ai.php
        $rates = config("ai.models.{$model}", [
            'prompt_rate' => 0.000005,
            'completion_rate' => 0.000015,
        ]);

        $cost = ($this->usageData['prompt_tokens'] * $rates['prompt_rate'])
            + ($this->usageData['completion_tokens'] * $rates['completion_rate']);

        AiUsageLog::create([
            'user_id' => $this->userId,
            'model_used' => $model,
            'feature_name' => $this->usageData['feature_name'],
            'prompt_tokens' => $this->usageData['prompt_tokens'],
            'completion_tokens' => $this->usageData['completion_tokens'],
            'estimated_cost' => $cost,
        ]);
    }
}
Notice that feature_name is included in both the Job payload and the create() call. The migration defines that column as required: omit it from the insert and Laravel throws a NOT NULL constraint violation, or, if you’ve made the column nullable, it silently stores NULL. Either way, you’ve lost attribution data. Always match your payload to your schema, and remember the AiUsageLog model needs these columns in its $fillable array, or create() will fail with a MassAssignmentException.
7. The Controller: Gluing It Together
// app/Http/Controllers/AiController.php
namespace App\Http\Controllers;

use App\Jobs\LogAiUsage;
use Illuminate\Http\JsonResponse;
use Illuminate\Http\Request;
use OpenAI\Laravel\Facades\OpenAI;

class AiController extends Controller
{
    public function generate(Request $request): JsonResponse
    {
        $validated = $request->validate([
            'prompt' => ['required', 'string', 'max:4000'],
            'feature' => ['required', 'string', 'in:blog_generator,chat,summariser'],
        ]);

        try {
            $result = OpenAI::chat()->create([
                'model' => 'gpt-4o',
                'messages' => [['role' => 'user', 'content' => $validated['prompt']]],
            ]);
        } catch (\OpenAI\Exceptions\TransporterException $e) {
            report($e);

            return response()->json(['error' => 'AI service unavailable. Please try again.'], 503);
        } catch (\OpenAI\Exceptions\ErrorException $e) {
            // Catches 429s, quota exceeded, invalid requests from OpenAI
            report($e);

            return response()->json(['error' => $e->getMessage()], $e->getCode() ?: 500);
        }

        LogAiUsage::dispatch(auth()->id(), [
            'model' => 'gpt-4o',
            'feature_name' => $validated['feature'],
            'prompt_tokens' => $result->usage->promptTokens,
            'completion_tokens' => $result->usage->completionTokens,
        ]);

        return response()->json([
            'content' => $result->choices[0]->message->content,
        ]);
    }
}
If you’re integrating Anthropic’s SDK instead, the pattern is identical — swap the client call and update the model string to claude-sonnet-4-5. See the Anthropic API documentation for the current SDK reference.
8. Testing the Architecture with Pest
A production system is only as solid as the test suite backing it. We need to verify three things independently:
- The rate limiter blocks users who’ve exhausted their quota.
- The middleware halts users who’ve hit their spending ceiling.
- The LogAiUsage Job is dispatched with the correct payload — without hitting the real API.
That last point is where most AI testing tutorials fail you. Bus::fake() queues the Job but does nothing about the OpenAI facade. If you don’t mock it, your test suite is burning real API credits.
// tests/Feature/AiGenerationTest.php
use App\Jobs\LogAiUsage;
use App\Models\User;
use Illuminate\Support\Facades\Bus;
use OpenAI\Laravel\Facades\OpenAI;
use OpenAI\Responses\Chat\CreateResponse;

beforeEach(function () {
    Bus::fake();
});

it('blocks a free user who has hit their rate limit', function () {
    $user = User::factory()->create(['is_premium' => false]);

    // Five faked successful requests exhaust the free-tier ceiling.
    // (Hitting the limiter store directly is brittle: the throttle
    // middleware hashes its cache keys internally.)
    OpenAI::fake(array_fill(0, 5, CreateResponse::fake()));

    foreach (range(1, 5) as $i) {
        $this->actingAs($user)->postJson('/api/generate', [
            'prompt' => 'Hello AI',
            'feature' => 'chat',
        ])->assertOk();
    }

    // The sixth request in the same minute is throttled.
    $this->actingAs($user)
        ->postJson('/api/generate', [
            'prompt' => 'Hello AI',
            'feature' => 'chat',
        ])
        ->assertStatus(429);
});
it('blocks a user who has exceeded their monthly spending limit', function () {
    $user = User::factory()->create([
        'monthly_spend' => 50.00,
        'spending_limit' => 50.00,
    ]);

    $this->actingAs($user)
        ->postJson('/api/generate', [
            'prompt' => 'Hello AI',
            'feature' => 'chat',
        ])
        ->assertStatus(402)
        ->assertJsonPath('error', 'Monthly spending limit reached.');
});
it('dispatches a LogAiUsage job with correct token data on success', function () {
    $user = User::factory()->create(['is_premium' => true]);

    // Mock the OpenAI facade — never hit the real API in tests
    OpenAI::fake([
        CreateResponse::fake([
            'choices' => [
                ['message' => ['role' => 'assistant', 'content' => 'Mocked response']],
            ],
            'usage' => [
                'prompt_tokens' => 15,
                'completion_tokens' => 42,
                'total_tokens' => 57,
            ],
        ]),
    ]);

    $this->actingAs($user)
        ->postJson('/api/generate', [
            'prompt' => 'Test prompt',
            'feature' => 'chat',
        ])
        ->assertOk()
        ->assertJsonPath('content', 'Mocked response');

    Bus::assertDispatched(LogAiUsage::class, function (LogAiUsage $job) use ($user) {
        return $job->userId === $user->id
            && $job->usageData['prompt_tokens'] === 15
            && $job->usageData['completion_tokens'] === 42
            && $job->usageData['feature_name'] === 'chat';
    });
});
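One gap worth closing: nothing above exercises the Job’s own arithmetic, because Bus::fake() prevents it from ever running. A unit-level sketch of a companion test (an addition of ours, assuming RefreshDatabase is enabled in your Pest config and the gpt-4o rates from config/ai.php):

```php
// tests/Feature/LogAiUsageTest.php (hypothetical companion test)
use App\Jobs\LogAiUsage;
use App\Models\AiUsageLog;
use App\Models\User;

it('computes and stores the estimated cost', function () {
    $user = User::factory()->create();

    // Run the job synchronously instead of dispatching it.
    (new LogAiUsage($user->id, [
        'model' => 'gpt-4o',
        'feature_name' => 'chat',
        'prompt_tokens' => 1000,
        'completion_tokens' => 500,
    ]))->handle();

    $log = AiUsageLog::firstOrFail();

    // (1000 * 0.000005) + (500 * 0.000015) = 0.0125
    expect((float) $log->estimated_cost)->toEqualWithDelta(0.0125, 0.000001)
        ->and($log->feature_name)->toBe('chat');
});
```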
If you’re building out the Eloquent side of this and want to understand how to test model relationships and factory states at scale, the patterns covered in building robust Laravel test factories translate directly to the User and AiUsageLog models here.
9. What You Now Have
Shipping this architecture gives you:
- Cost protection — tiered rate limiting via the Laravel RateLimiter facade backed by Redis, plus a hard monthly ceiling enforced in middleware.
- Full audit trail — every token consumed is recorded per-user, per-feature, per-model in the ai_usage_logs table by the queued LogAiUsage Job.
- Zero latency impact — async logging means your users never wait for a DB write.
- A test suite that doesn’t cheat — the OpenAI facade is mocked, so your CI pipeline stays free and your assertions are actually trustworthy.
The gap between a weekend project and a production system isn’t the AI integration. It’s everything around it. This is that layer.
Senior Laravel Developer and AI Architect with 10+ years in the trenches. Dewald writes about building resilient, cost-aware AI integrations and modernizing the Laravel developer workflow for the 2026 ecosystem.

