Laravel MCP Server timeout / 504 Gateway Time-out under high concurrency in production

Question

Kimberly Powell · July 20, 2026

I have a Laravel 13 application where I recently implemented the new laravel/mcp package to expose internal tools and database schemas to our enterprise AI clients via HTTP transport.

In development using php artisan mcp:serve --transport=stdio inside Cursor and Claude Desktop, everything works flawlessly. However, after moving to our staging environment—which utilizes the HTTP transport routed via routes/ai.php and runs on a traditional Nginx + PHP-FPM pool—we are running into severe performance degradation and frequent 504 Gateway Time-out errors when multiple agents invoke complex tools concurrently.

Our setup in routes/ai.php:

use App\Mcp\Servers\EnterpriseContextServer;
use Laravel\Mcp\Facades\Mcp;

Mcp::web('/mcp/v1', EnterpriseContextServer::class)
   ->middleware(['auth:api', 'throttle:60,1']);

One of our primary tools handles structured vector lookups and runs complex Eloquent operations. When an LLM client hits the /mcp/v1 endpoint sequentially, it responds fine. But if 3 or 4 users are interacting with the client simultaneously, the PHP-FPM workers max out, response times spike past 30 seconds, and Nginx cuts the connection.

I noticed the documentation mentions a “Dedicated HTTP Server using a high-performance ReactPHP/Octane loop” as an alternative transport layer, but I can’t find a clear implementation layout for handling HTTP-based production traffic under high concurrency.

1. How should I safely decouple long-running tool execution or LLM streaming context from the synchronous HTTP request lifecycle within laravel/mcp?

2. Is it better to migrate the server to a standalone stateful transport layer (like ReactPHP or Laravel Octane), and if so, how do you handle persistent JSON-RPC session states or garbage collection across multiple workers?

Dewald Hugo Answered question 2 days ago

1 Answer

score 2 · Answer 1 · 2026-07-26T01:29:12+00:00

The core issue: you’re exhausting your PHP-FPM worker pool. Every long-running tool invocation (vector searches, heavy Eloquent operations) occupies one FPM worker process for its full duration. If pm.max_children is small, 3–4 concurrent agent requests are enough to starve the pool: new requests queue, and Nginx returns a 504 once its upstream timeout expires.
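A rough way to size the pool is Little's law: the average number of busy workers equals the arrival rate multiplied by the average time each call holds a worker. The sketch below is plain PHP; the function name, the headroom factor, and the sample load are illustrative assumptions, not measurements from this application.

```php
<?php
declare(strict_types=1);

/**
 * Estimate the worker count needed to absorb a given load
 * (Little's law: busy workers = arrival rate x service time).
 *
 * @param float $requestsPerSecond average arrival rate of tool calls (assumed)
 * @param float $avgSeconds        average time one call holds a worker (assumed)
 * @param float $headroom          safety multiplier for bursts (assumed)
 */
function estimateWorkersNeeded(float $requestsPerSecond, float $avgSeconds, float $headroom = 1.5): int
{
    if ($requestsPerSecond < 0 || $avgSeconds < 0 || $headroom < 1.0) {
        throw new InvalidArgumentException('Rate and duration must be non-negative; headroom must be >= 1.');
    }

    return (int) ceil($requestsPerSecond * $avgSeconds * $headroom);
}

// Hypothetical load: 4 agents, each issuing one tool call every 8 seconds
// (0.5 calls/s in total), with each call holding a worker for about 20 seconds.
$needed = estimateWorkersNeeded(0.5, 20.0);
echo "Workers needed: {$needed}\n"; // prints "Workers needed: 15"
```

Compare the result with pm.max_children in your FPM pool config. If the pool is still at a low stock value, a handful of slow tool calls will fill it.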

1. Decouple Execution (Request/Response Cycle)
– Offload Heavy Tools: Dispatch heavy vector/DB processing to background jobs via Laravel queues (monitored with Horizon) when the client doesn’t need the result in the same response. Return a job ID and let the client fetch the result with a follow-up call.
– Vector Offloading: Run similarity search in a dedicated engine (e.g., PostgreSQL with the pgvector extension, or Meilisearch) rather than doing the vector math in PHP/Eloquent inside the web request.
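One way to picture the queue-based decoupling: the tool call returns a job ID immediately, a background worker does the heavy work, and the client polls for the result. The sketch below is framework-free PHP so that it runs on its own. `InMemoryJobStore` and its methods are hypothetical names; in a real app the store would be Redis or the database, and `runNext()` would be a queued job processed by a queue worker.

```php
<?php
declare(strict_types=1);

/** Hypothetical stand-in for a shared store (Redis/DB) plus a queue. */
final class InMemoryJobStore
{
    /** @var array<string, array{status: string, result: mixed}> */
    private array $jobs = [];

    /** @var array<int, array{id: string, work: callable}> */
    private array $queue = [];

    /** Record the job as pending, enqueue the work, and return at once. */
    public function submit(callable $work): string
    {
        $id = bin2hex(random_bytes(8));
        $this->jobs[$id] = ['status' => 'pending', 'result' => null];
        $this->queue[] = ['id' => $id, 'work' => $work];

        return $id;
    }

    /** What a background worker would do: run one queued job and store its result. */
    public function runNext(): bool
    {
        $job = array_shift($this->queue);
        if ($job === null) {
            return false; // nothing queued
        }
        $this->jobs[$job['id']] = ['status' => 'done', 'result' => ($job['work'])()];

        return true;
    }

    /** @return array{status: string, result: mixed} */
    public function status(string $id): array
    {
        return $this->jobs[$id] ?? ['status' => 'unknown', 'result' => null];
    }
}

$store = new InMemoryJobStore();

// "Submit" step: returns right away, so the web worker is freed immediately.
$jobId = $store->submit(fn (): array => ['matches' => ['doc-1', 'doc-7']]);
echo json_encode($store->status($jobId)), "\n"; // status is "pending"

// The background worker picks up the job outside the HTTP request.
$store->runNext();

// "Poll" step: the client asks again later and receives the result.
echo json_encode($store->status($jobId)), "\n"; // status is "done"
```

The trade-off is an extra round trip for the client, in exchange for web workers that are held for milliseconds rather than for the full duration of the lookup.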

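2. Switch to a Long-Lived Worker Runtime (Octane / FrankenPHP)
– Standard FPM boots the framework on every request and ties up one process per in-flight call, which suits it poorly to long-lived, concurrent AI tool loops. Laravel Octane (Swoole, FrankenPHP, or RoadRunner) boots the application once and keeps it in memory, which removes the per-request bootstrap cost. Note that each Octane worker still handles one request at a time, so size the worker count for your expected concurrency; Octane does not make blocking database calls non-blocking.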

Implementation & State Rules

1. Install Octane:

composer require laravel/octane && php artisan octane:install

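2. State in Redis: Store JSON-RPC session state and agent context in Redis, never in PHP memory or singletons. Under Octane, in-memory state persists across requests within a worker but isn’t shared between workers, so it can leak between clients and also go missing when the next request lands on a different worker.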

3. Flushing & Memory: Let Octane recycle workers (e.g., --max-requests=500) so that memory leaked by heavy query graphs is released when the worker restarts.

4. Proxy Nginx to Octane: Point Nginx at the Octane server and raise proxy_read_timeout to cover your longest expected tool run (the Nginx default is 60s):

Nginx

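location /mcp/ {
    # Forward MCP traffic to the Octane server (port 8000 is Octane's default).
    proxy_pass http://127.0.0.1:8000;
    proxy_set_header Host $host;
    # Allow slow tool calls to finish before Nginx gives up on the upstream.
    proxy_read_timeout 300s;
}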
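
For the Redis-backed session state in step 2, the key idea is that every worker reads and writes session data through a shared store keyed by session ID, with a TTL so that abandoned sessions expire on their own. This is a minimal sketch under assumed names (`SessionStore`, `ArraySessionStore`, the `mcp:session:` key prefix). The array-backed class exists only so that the example runs standalone; a real deployment would put a Redis-backed implementation behind the same interface.

```php
<?php
declare(strict_types=1);

/** Hypothetical contract; a Redis-backed class would implement the same methods. */
interface SessionStore
{
    public function put(string $sessionId, array $state, int $ttlSeconds): void;

    public function get(string $sessionId): ?array;
}

/** Array-backed stand-in so the sketch runs without Redis. */
final class ArraySessionStore implements SessionStore
{
    /** @var array<string, array{payload: string, expiresAt: int}> */
    private array $items = [];

    /** @param \Closure(): int $clock returns the current Unix time (injectable for testing) */
    public function __construct(private \Closure $clock)
    {
    }

    public function put(string $sessionId, array $state, int $ttlSeconds): void
    {
        // Serialize to JSON, since an external store holds strings, not PHP objects.
        $this->items["mcp:session:{$sessionId}"] = [
            'payload' => json_encode($state, JSON_THROW_ON_ERROR),
            'expiresAt' => ($this->clock)() + $ttlSeconds,
        ];
    }

    public function get(string $sessionId): ?array
    {
        $key = "mcp:session:{$sessionId}";
        $item = $this->items[$key] ?? null;

        if ($item === null || $item['expiresAt'] <= ($this->clock)()) {
            unset($this->items[$key]); // missing or expired
            return null;
        }

        return json_decode($item['payload'], true, 512, JSON_THROW_ON_ERROR);
    }
}

$now = 1_000;
$store = new ArraySessionStore(function () use (&$now): int {
    return $now;
});

$store->put('abc123', ['lastRequestId' => 7, 'client' => 'agent-a'], 600);
var_dump($store->get('abc123')['client']); // string(7) "agent-a"

$now += 601;                     // move the clock past the TTL
var_dump($store->get('abc123')); // NULL
```

Because the state round-trips through JSON, anything a tool keeps in the session has to be plain data (arrays, strings, numbers), which is also what keeps it safe to share across workers.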