Laravel Gemini Integration: The Complete laravel/ai SDK Guide

🕒 10 Minute Read 📅 Date Published: April 16, 2026

Last reviewed: April 2026

If your Laravel application already talks to OpenAI or Anthropic through the laravel/ai SDK, adding Gemini feels like a five-minute job. Set a new API key. Point to a new provider. Deploy. That assumption is mostly correct, but the gap between “mostly” and “done” is where production systems fall apart.

This is the complete Laravel Gemini integration guide. We cover the full path: authentication, provider configuration using the correct laravel/ai Agent API, streaming via SSE, the Gemini-specific behaviours that differ from OpenAI and Anthropic, error handling you can trust under load, cost awareness at 1M-token scale, and a testing suite that never hits the live API. By the end, Gemini is a first-class citizen in your stack.

Why Gemini: The Honest Case

Three concrete advantages.

Context window. Gemini 2.5 Flash and Pro both ship with a 1M-token context window as standard. GPT-4o is 128K. Claude Sonnet 4.5 is 200K. For document summarisation, RAG over large corpora, or classification pipelines that need to see a lot of context at once, that gap is not a rounding error – it changes your chunking strategy and your retrieval architecture.

Cost at scale. Flash is genuinely cheap for what it delivers.

Model	Input (per 1M tokens)	Output (per 1M tokens)	Context
gemini-2.5-flash	~$0.15	~$0.60	1M
gemini-2.5-pro	~$1.25	~$10.00	1M
gpt-4o	~$2.50	~$10.00	128K
claude-sonnet-4-5	~$3.00	~$15.00	200K

* Verify against current pricing pages before committing to cost projections.

Resilience. A single-provider AI architecture has one failure mode: when your provider rate-limits or goes down, your feature goes dark. Gemini as a hot fallback changes that. This is not hypothetical: we have seen a 2 AM batch job go silent because OpenAI was having a bad night. A properly configured failover would have rerouted automatically.

The laravel/ai SDK was built with provider-agnosticism as a first principle. Gemini is a first-class supported driver. The architecture already expects you to run multiple providers. If you have not yet settled on which providers belong in your stack, the Laravel AI provider comparison and architecture guide covers how Gemini fits alongside OpenAI and Claude across different task types and cost profiles. It is worth reading before you design the adapter layer.

Authentication & Configuration

Gemini supports two authentication models and the distinction matters before you write a line of code.

Google AI Studio API key. The right starting point. Create a key at aistudio.google.com, drop it in .env, done. This works for all development and moderate-traffic production use cases. The limitation is quota: AI Studio keys have default rate limits that you will hit on high-throughput workloads. When you hit that ceiling, you need to request a quota increase via Google Cloud or migrate to Vertex AI.

Vertex AI service account. The right endpoint for high-volume production. Authentication uses a Google Cloud service account JSON credential file rather than a plain API key. The base URL also changes. Planning for this migration before you hit your Studio quota ceiling is worth the hour it takes. You do not want to be doing auth refactoring under production pressure.

For AI Studio (the common path), your .env is simple:

GEMINI_API_KEY=your-google-ai-studio-key

Install the SDK and publish config:

composer require laravel/ai
php artisan vendor:publish --provider="Laravel\Ai\AiServiceProvider"
php artisan migrate

In config/ai.php, register the Gemini provider — note the correct key names:

// config/ai.php
'default' => env('AI_DEFAULT_PROVIDER', 'gemini'),

'providers' => [
    'openai' => [
        'driver' => 'openai',
        'key'    => env('OPENAI_API_KEY'),
    ],

    'anthropic' => [
        'driver' => 'anthropic',
        'key'    => env('ANTHROPIC_API_KEY'),
    ],

    'gemini' => [
        'driver' => 'gemini',
        'key'    => env('GEMINI_API_KEY'),
    ],
],

Two things to notice. The array key is providers, not connections. The credential key is key, not api_key.

If you need a custom base URL (for Vertex AI or a proxy gateway), the SDK supports it cleanly:

'gemini' => [
    'driver' => 'gemini',
    'key'    => env('GEMINI_API_KEY'),
    'url'    => env('GEMINI_BASE_URL'), // for Vertex AI or LiteLLM
],

Model Selection – Configurable, Not Hardcoded

Two models belong in a production Gemini integration.

gemini-2.5-flash is your default. Fast, cost-efficient, 1M-token context window. For text generation, summarisation, classification, and most RAG workflows, it is the correct choice. Hardcoded model strings in your codebase are a maintenance liability: when Google releases Gemini 3.x stable, you want a one-line config change, not a grep exercise.

gemini-2.5-pro is for heavy lifting: complex reasoning chains, code generation over large codebases, long-form synthesis. It is roughly 8x more expensive than Flash on input tokens (about 17x on output at the prices listed above) and slower under concurrent load. Do not reach for it as a default.

gemini-2.0-flash must not be used in any new integration. Google has confirmed shutdown on June 1, 2026. Any production system still pointing at this model string will start throwing errors from that date, with no further warning.

Make model selection configurable through environment:

# .env.production
AI_GEMINI_MODEL=gemini-2.5-flash

# .env.staging
AI_GEMINI_MODEL=gemini-2.5-pro

// config/services.php
'gemini' => [
    'model' => env('AI_GEMINI_MODEL', 'gemini-2.5-flash'),
],

Your agents then read the model name from config('services.gemini.model') instead of from hardcoded strings.
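One way to make that config value safe to consume is to validate it once, at boot. The sketch below is a minimal illustration, not part of laravel/ai: the function name resolveGeminiModel() and the retired-model list are assumptions made for this example.

```php
<?php

declare(strict_types=1);

/**
 * Resolve the Gemini model name from configuration.
 * Hypothetical helper: the name and the retired-model list are illustrative.
 */
function resolveGeminiModel(?string $configured, string $default = 'gemini-2.5-flash'): string
{
    // Models this guide says must not be used in new integrations.
    $retired = ['gemini-2.0-flash'];

    $model = trim((string) $configured);

    // Fall back to the default when the variable is unset or blank.
    if ($model === '') {
        return $default;
    }

    // Fail fast at boot rather than at request time.
    if (in_array($model, $retired, true)) {
        throw new InvalidArgumentException(
            "Model {$model} is scheduled for shutdown; choose a supported model."
        );
    }

    return $model;
}

// Plain-PHP usage; getenv() returns false when the variable is unset.
$model = resolveGeminiModel(getenv('AI_GEMINI_MODEL') ?: null);
```

In a Laravel app you would pass config('services.gemini.model') in place of the getenv() call. If your SDK version lets you choose a model per call, hand the resolved name to it there; check the SDK documentation for the exact mechanism.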

Building a GeminiAgent With the Laravel AI SDK

The laravel/ai SDK is Agent-based. Text generation does not happen through a bare facade call like AI::connection()->text() — that API does not exist in the published SDK. The correct abstraction is an Agent class: a dedicated PHP class that encapsulates instructions, conversation context, tools, and output schema.

Create a Gemini summarisation agent:

php artisan make:agent GeminiSummarizer

<?php

namespace App\Ai\Agents;

use Laravel\Ai\Contracts\Agent;
use Laravel\Ai\Promptable;
use Stringable;

class GeminiSummarizer implements Agent
{
    use Promptable;

    public function instructions(): Stringable|string
    {
        return 'You are a concise technical summariser. Return a plain-text summary
                with no markdown formatting. Three sentences maximum.';
    }
}

The agent picks up the default provider from config/ai.php. With default set to gemini, all agents resolve to Gemini unless overridden.

Prompt it directly:

$response = (new GeminiSummarizer)->prompt($articleBody);
echo (string) $response;

For production use, wrap it in a service class that enforces a typed response contract. This is the pattern the rest of your application depends on — your controllers and jobs should never know which provider answered.

<?php

namespace App\Services\AI;

use App\Ai\Agents\GeminiSummarizer;
use App\DTOs\AiResponse;
use Illuminate\Support\Facades\Log;

class GeminiService
{
    public function __construct(
        private readonly GeminiSummarizer $agent
    ) {}

    public function summarize(string $text, int $maxSentences = 3): AiResponse
    {
        try {
            $response = $this->agent->prompt(
                "Summarise the following in {$maxSentences} sentences:\n\n{$text}"
            );

            return new AiResponse(
                content:      (string) $response,
                model:        config('services.gemini.model'),
                provider:     'gemini',
                inputTokens:  $response->usage?->promptTokens ?? 0,
                outputTokens: $response->usage?->completionTokens ?? 0,
            );

        } catch (\Throwable $e) {
            Log::error('GeminiService::summarize failed', [
                'error'        => $e->getMessage(),
                'text_length'  => strlen($text),
            ]);

            throw $e;
        }
    }
}

The AiResponse DTO keeps the response contract stable regardless of which provider runs:

<?php

namespace App\DTOs;

readonly class AiResponse
{
    public function __construct(
        public string $content,
        public string $model,
        public string $provider,
        public int    $inputTokens  = 0,
        public int    $outputTokens = 0,
    ) {}
}

Bind the service in a Service Provider rather than directly in bootstrap/app.php:

<?php

namespace App\Providers;

use App\Services\AI\GeminiService;
use Illuminate\Support\ServiceProvider;

class AiServiceProvider extends ServiceProvider
{
    public function register(): void
    {
        $this->app->singleton(GeminiService::class);
    }
}

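// bootstrap/app.php, inside the Application::configure() chain.
// This only registers the provider; the binding itself lives in AiServiceProvider.
->withProviders([
    App\Providers\AiServiceProvider::class,
])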

[Architect’s Note] This DTO-as-contract pattern is the same architectural discipline described in the production-grade AI architecture guide for Laravel. Nothing outside the service layer should care whether the response came from Gemini, OpenAI, or a fallback. The moment a Blade template branches on $response->provider, you have an architecture problem, not a template problem.

Streaming Responses

Synchronous AI responses are a UX problem for anything over a second or two. Gemini 2.5 Flash is fast, but a 500-token summary still takes a moment. Token streaming solves the perceived latency problem — the user sees output immediately.

The laravel/ai SDK streams via the Agent’s stream() method. It returns a StreamableAgentResponse that you can return directly from a route — the SDK handles the SSE headers automatically:

<?php

namespace App\Http\Controllers;

use App\Ai\Agents\GeminiSummarizer;
use Illuminate\Http\Request;

class SummaryController extends Controller
{
    public function stream(Request $request): mixed
    {
        $text = $request->validate(['text' => 'required|string|max:50000'])['text'];

        return (new GeminiSummarizer)
            ->stream("Summarise the following:\n\n{$text}");
    }
}

Wire the route:

// routes/web.php
Route::get('/summary/stream', [SummaryController::class, 'stream'])
    ->middleware('auth');

The SDK sets Content-Type: text/event-stream, Cache-Control: no-cache, and X-Accel-Buffering: no for you. You do not write those headers manually.

For cases where you need to process the stream before sending it (logging token chunks, applying content filters, or writing partial results to a database), iterate manually:

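$stream = (new GeminiSummarizer)->stream($prompt);

foreach ($stream as $chunk) {
    // Stop emitting output once the client has disconnected.
    if (connection_aborted()) {
        break;
    }

    $text = (string) $chunk;

    // Emit one SSE frame per chunk.
    echo "data: " . json_encode(['chunk' => $text]) . "\n\n";

    // ob_flush() raises a notice when no output buffer is active.
    if (ob_get_level() > 0) {
        ob_flush();
    }
    flush();
}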
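When you iterate manually, the frame format is your responsibility. A minimal sketch of a frame builder follows; sseFrame() is a hypothetical helper written for this example, not something the SDK provides.

```php
<?php

declare(strict_types=1);

/**
 * Format one Server-Sent Events frame.
 * Hypothetical helper, not part of laravel/ai.
 */
function sseFrame(array $payload, ?string $event = null): string
{
    $frame = '';

    // Optional named event; the browser routes it to a matching listener.
    if ($event !== null) {
        $frame .= "event: {$event}\n";
    }

    // json_encode() without JSON_PRETTY_PRINT emits no raw newlines,
    // so a single data line is enough.
    $frame .= 'data: ' . json_encode($payload, JSON_THROW_ON_ERROR) . "\n";

    // A blank line terminates the frame.
    return $frame . "\n";
}

echo sseFrame(['chunk' => 'Hello']);
echo sseFrame(['done' => true], 'end');
```

Sending a final named event, such as the "end" frame here, gives the client an explicit signal to close its EventSource instead of reconnecting.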

The then() callback fires when the full stream completes — use it for post-stream logging or database writes:

return (new GeminiSummarizer)
    ->stream($prompt)
    ->then(function ($response) use ($articleId) {
        Article::find($articleId)?->update([
            'summary' => $response->text,
            'tokens'  => $response->usage?->totalTokens ?? 0,
        ]);
    });

For a deep-dive on choosing between Livewire, SSE, and WebSockets for your streaming transport layer, the Laravel AI streaming transport comparison covers that decision in detail — including exactly when SSE stops being the right answer.

[Production Pitfall] Streaming and queue workers do not mix. $agent->stream() must run in a web context that can flush output to the client. Never dispatch a streaming call into a queued job: it will silently buffer the entire response and return it as a batch, defeating the purpose. Queue jobs should use $agent->prompt() and store the result. Stream directly from your controller.

Gemini-Specific Quirks

Gemini behaves differently from OpenAI and Anthropic in ways that do not surface during development. They surface in production, at 3 AM, in a way that is hard to debug unless you knew to expect them.

Safety Filters and Blocked Responses

Gemini applies content safety filters by default, across categories including harassment, hate speech, sexually explicit content, and dangerous content. When a request is blocked, the response finishReason is SAFETY rather than STOP. The response text will be empty or null.

Your service layer must handle this:

use App\Exceptions\GeminiContentBlockedException;
use Illuminate\Support\Facades\Log;

public function summarize(string $text): AiResponse
{
    $response = $this->agent->prompt($text);

    // A SAFETY finish reason means the request was blocked; do not retry.
    if (isset($response->finishReason) && $response->finishReason === 'SAFETY') {
        Log::warning('Gemini safety filter triggered', [
            // mb_substr avoids cutting a multibyte character in half.
            'text_preview' => mb_substr($text, 0, 200),
        ]);

        throw new GeminiContentBlockedException(
            'The provided content was blocked by Gemini safety filters.'
        );
    }

    // Compare against '' explicitly: empty() would also reject the string "0".
    if (trim((string) $response) === '') {
        throw new \RuntimeException('Gemini returned an empty response with no error.');
    }

    return new AiResponse(
        content:  (string) $response,
        model:    config('services.gemini.model'),
        provider: 'gemini',
    );
}

[Edge Case Alert] The finish reason RECITATION is distinct from SAFETY. It means the model detected that it was about to recite training data verbatim. You will see this on prompts that ask the model to reproduce copyrighted or well-known text. Handle it as a content policy rejection, not a model error – retrying will not help.
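Rather than scattering string comparisons through the service, you can map finish reasons to a small set of actions in one place. The sketch below is an assumption-laden illustration: FinishAction and classifyFinishReason() are hypothetical names, and only STOP, SAFETY, and RECITATION come from this article. The remaining values follow the Gemini API's finish reason list as we understand it, so verify them against the current API reference.

```php
<?php

declare(strict_types=1);

/** How the service layer should react to a finish reason (hypothetical enum). */
enum FinishAction: string
{
    case Accept = 'accept';        // Normal completion.
    case Truncated = 'truncated';  // Output hit the token limit.
    case PolicyReject = 'policy';  // Content policy; retrying will not help.
    case Unknown = 'unknown';      // Log it and treat it as a failure.
}

function classifyFinishReason(?string $finishReason): FinishAction
{
    return match (strtoupper((string) $finishReason)) {
        'STOP' => FinishAction::Accept,
        'MAX_TOKENS' => FinishAction::Truncated,
        // Values beyond SAFETY and RECITATION are assumptions; check the API docs.
        'SAFETY', 'RECITATION', 'BLOCKLIST', 'PROHIBITED_CONTENT', 'SPII' => FinishAction::PolicyReject,
        default => FinishAction::Unknown,
    };
}
```

The service can then throw GeminiContentBlockedException for every PolicyReject case, which keeps SAFETY and RECITATION on the same no-retry path.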

Multi-Turn Conversation Role Differences

When building conversation history manually (rather than using the SDK’s RemembersConversations trait), Gemini uses "model" as the assistant role name. OpenAI and Anthropic use "assistant". If you are porting conversation history code from an OpenAI integration, every historical assistant message needs its role key changed.

The SDK’s Agent pattern with the Conversational interface handles this transparently. If you are bypassing agents and constructing message arrays directly, stop and use the Agent pattern. The SDK abstracts provider-specific message formats for exactly this reason.
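If you have stored OpenAI-style history that must be migrated once, the role change looks roughly like this. This is a sketch under assumptions: toGeminiContents() is a hypothetical helper, and the output shape mirrors the native Gemini "contents" format rather than anything laravel/ai requires.

```php
<?php

declare(strict_types=1);

/**
 * Convert OpenAI-style chat history into Gemini's native "contents" shape.
 * Hypothetical migration helper, not part of laravel/ai.
 *
 * @param  array<int, array{role: string, content: string}>  $messages
 * @return array{system: ?string, contents: array<int, array<string, mixed>>}
 */
function toGeminiContents(array $messages): array
{
    $system = [];
    $contents = [];

    foreach ($messages as $message) {
        // System prompts live outside the turn list in Gemini.
        if ($message['role'] === 'system') {
            $system[] = $message['content'];
            continue;
        }

        $contents[] = [
            // "assistant" becomes "model". Everything else is treated as
            // "user" here; tool messages would need their own handling.
            'role' => $message['role'] === 'assistant' ? 'model' : 'user',
            'parts' => [['text' => $message['content']]],
        ];
    }

    return [
        'system' => $system === [] ? null : implode("\n\n", $system),
        'contents' => $contents,
    ];
}
```

Run a conversion like this as a one-off data migration, then let the Agent pattern own the message format from that point on.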

The Thinking Budget Parameter

Gemini 2.5 models support a thinking budget (written thinking_budget here; the native API names it thinkingBudget, nested under thinkingConfig) that controls how much internal reasoning the model applies before generating a response. Higher budgets improve quality on complex tasks at the cost of latency and token usage. Pass it via provider options if the SDK exposes that surface on your agent:

// Illustrative only: withOptions() is a placeholder, not a confirmed SDK method.
// Check your SDK version for the exact provider options API.
$response = $this->agent
    ->withOptions(['thinking_budget' => 2048])
    ->prompt($complexPrompt);

For most text generation tasks, the default budget is sufficient. Reserve higher budgets for structured output generation, complex reasoning chains, or tasks where output quality has direct downstream consequences.
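For reference, this is roughly what a thinking budget looks like in a native generateContent request body. The sketch assumes the REST field names generationConfig, thinkingConfig, and thinkingBudget, which match the Gemini API as we understand it; buildThinkingRequest() is a hypothetical helper, and in normal use the SDK builds this payload for you.

```php
<?php

declare(strict_types=1);

/**
 * Build a generateContent request body with a thinking budget.
 * Hypothetical helper for illustration only.
 */
function buildThinkingRequest(string $prompt, int $thinkingBudget): array
{
    return [
        'contents' => [
            ['role' => 'user', 'parts' => [['text' => $prompt]]],
        ],
        'generationConfig' => [
            // Token budget for the model's internal reasoning.
            'thinkingConfig' => ['thinkingBudget' => $thinkingBudget],
        ],
    ];
}

echo json_encode(buildThinkingRequest('Plan a schema migration.', 2048), JSON_PRETTY_PRINT);
```

Seeing the raw shape helps when you debug through a proxy or gateway and need to confirm that the budget actually reached the provider.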

Context Caching Gotcha

Gemini supports context caching for long-context requests: you upload a large document once and reference it by cache ID across multiple requests, paying a reduced rate for the cached tokens instead of the full input price each time. This is genuinely useful, but a cache has a limited lifetime (a TTL), so do not assume it is still warm on a later request. Depending on how the cache was created, an expired cache shows up either as a silent return to full uncached cost or as a failed request that references a missing cache, so handle both. Budget for cache misses.

Error Handling

Three distinct error types need distinct handling strategies.

Rate limits (HTTP 429). Retry with backoff. Transient. Gemini rate limits are per-minute by default on AI Studio keys.

Quota exhaustion. Do not retry indefinitely. Log and alert. This is a billing/capacity signal.

Safety filter rejections. Do not retry. The same input will produce the same rejection. Surface to the user or fall back to a different provider.

<?php

namespace App\Services\AI;

use App\Ai\Agents\GeminiSummarizer;
use App\DTOs\AiResponse;
use App\Exceptions\GeminiContentBlockedException;
use App\Exceptions\GeminiQuotaExhaustedException;
use Illuminate\Http\Client\RequestException;
use Illuminate\Support\Facades\Log;

class GeminiService
{
    private int $maxRetries = 3;

    public function summarize(string $text): AiResponse
    {
        $attempt = 0;

        while ($attempt < $this->maxRetries) {
            try {
                $response = (new GeminiSummarizer)->prompt($text);

                if (isset($response->finishReason) && $response->finishReason === 'SAFETY') {
                    throw new GeminiContentBlockedException('Safety filter triggered.');
                }

                return new AiResponse(
                    content:      (string) $response,
                    model:        config('services.gemini.model'),
                    provider:     'gemini',
                    inputTokens:  $response->usage?->promptTokens ?? 0,
                    outputTokens: $response->usage?->completionTokens ?? 0,
                );

            } catch (GeminiContentBlockedException $e) {
                // Do not retry. Surface immediately.
                throw $e;

            } catch (RequestException $e) {
                $status = $e->response?->status();

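                // Quota exhaustion also arrives as a 429, so check it before the
                // generic rate-limit branch or it is never reached. Matching on
                // the message text is a heuristic; adjust it to your error payloads.
                if ($status === 429 && str_contains(strtolower($e->getMessage()), 'quota')) {
                    Log::critical('Gemini quota exhausted. Check billing dashboard.');
                    throw new GeminiQuotaExhaustedException('API quota exhausted.');
                }

                if ($status === 429) {
                    // Rate limit: retry with exponential backoff.
                    $attempt++;
                    if ($attempt >= $this->maxRetries) {
                        Log::error('Gemini rate limit exhausted after retries.');
                        throw $e;
                    }
                    sleep((int) pow(2, $attempt)); // 2s, then 4s with maxRetries = 3
                    continue;
                }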

                if (in_array($status, [401, 403])) {
                    Log::critical('Gemini authentication failure.', ['status' => $status]);
                    throw new \RuntimeException('Gemini auth failed. Verify GEMINI_API_KEY.');
                }

                throw $e;

            } catch (\Throwable $e) {
                Log::warning('Gemini prompt failed.', ['attempt' => $attempt, 'error' => $e->getMessage()]);
                $attempt++;

                if ($attempt >= $this->maxRetries) {
                    throw $e;
                }
                sleep((int) pow(2, $attempt));
            }
        }

        throw new \RuntimeException('Gemini request failed after all retries.');
    }
}
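The fixed sleep() schedule above is fine for a single worker. With many workers retrying at once, a capped delay with jitter spreads the retries out. The sketch below is one common approach (full jitter), not something the SDK prescribes; backoffDelayMs() and its default values are assumptions.

```php
<?php

declare(strict_types=1);

/**
 * Exponential backoff delay in milliseconds, capped, with optional full jitter.
 * Hypothetical helper; attempt numbers start at 1.
 */
function backoffDelayMs(int $attempt, int $baseMs = 1000, int $capMs = 30000, bool $jitter = true): int
{
    if ($attempt < 1) {
        throw new InvalidArgumentException('Attempt must be 1 or greater.');
    }

    // base * 2^attempt, capped so late attempts do not sleep for minutes.
    $exponent = min($attempt, 20); // Guard against integer overflow.
    $ceiling = min($capMs, $baseMs * (2 ** $exponent));

    // Full jitter: a random delay in [0, ceiling] de-synchronises workers.
    return $jitter ? random_int(0, $ceiling) : $ceiling;
}

// Without jitter this reproduces the 2s, 4s, 8s progression of pow(2, $attempt).
foreach ([1, 2, 3] as $attempt) {
    echo backoffDelayMs($attempt, jitter: false), "\n";
}
```

To use it in the retry loop, replace the sleep() call with usleep(backoffDelayMs($attempt) * 1000).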

For token tracking and rate-limit observability across all your AI providers, the Laravel AI Middleware: Token Tracking & Rate Limiting pattern feeds AiResponse token counts into request-level middleware – a clean complement to the error handling above.

Cost Awareness at Scale

The 1M-token context window is Gemini’s headline feature. It is also a budget trap if you do not model the cost implications before enabling it.

Sending 1M tokens of context with every request costs roughly $0.15 per call at Flash pricing. For a feature that runs 10,000 times a day, that is $1,500 daily – from context alone, before output tokens. The numbers change the architecture conversation.
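The arithmetic is simple enough to keep as a helper next to your usage dashboards. This sketch uses the approximate Flash prices from the table earlier in this guide; estimateRequestCostUsd() is a hypothetical function, and the 20,000-token comparison is an illustrative figure, not a measurement.

```php
<?php

declare(strict_types=1);

/**
 * Estimate the cost of one request in USD.
 * Prices are per one million tokens. Hypothetical helper.
 */
function estimateRequestCostUsd(
    int $inputTokens,
    int $outputTokens,
    float $inputPricePerM,
    float $outputPricePerM
): float {
    return ($inputTokens / 1_000_000) * $inputPricePerM
        + ($outputTokens / 1_000_000) * $outputPricePerM;
}

// Full 1M-token context on every call, input cost only.
$perCall = estimateRequestCostUsd(1_000_000, 0, 0.15, 0.60);
$perDay = $perCall * 10_000;
printf("full context: \$%.2f per call, \$%.2f per day\n", $perCall, $perDay);
// prints "full context: $0.15 per call, $1500.00 per day"

// Illustrative comparison: 20,000 tokens of retrieved context per call.
$retrievedPerDay = estimateRequestCostUsd(20_000, 0, 0.15, 0.60) * 10_000;
printf("retrieved context: \$%.2f per day\n", $retrievedPerDay);
// prints "retrieved context: $30.00 per day"
```

Swap in current prices before relying on the output; the point is the ratio between the two scenarios, not the absolute figures.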

A few concrete strategies.

Count tokens before you send. The $response->usage object gives you prompt and completion token counts after the fact. For cost projections, use the Gemini token counting API endpoint (countTokens) before committing to large context requests in batch workflows.

Use Flash for most things, Pro only when justified. The price gap between Flash and Pro is real: roughly 8x on input tokens and about 17x on output at the prices listed earlier. If you are running Pro for summarisation tasks, you are paying that premium for capability you do not need. Define a COMPLEXITY_THRESHOLD (document length, presence of structured output requirements, or explicit flags) and route to Pro only above it.

Context caching for repeated large-context calls. If your workflow repeatedly sends the same large document (a system prompt, a knowledge base, a code file) across multiple turns, Gemini’s context caching reduces the repeated cost to incremental tokens only. Budget for cache expiry, not just cache hit.

Chunk intelligently, not naively. The large context window does not mean chunking is dead — for production RAG architectures in Laravel, smart chunk selection via vector similarity still beats stuffing 1M raw tokens and hoping the model finds the relevant passage. Use the context window strategically, not as a substitute for retrieval design.
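The Flash-versus-Pro routing rule from the strategies above can be sketched as a small pure function. Everything here is an assumption for illustration: chooseGeminiModel() is a hypothetical name, and the 200,000-token threshold is a placeholder you would tune against your own workloads.

```php
<?php

declare(strict_types=1);

// Placeholder threshold; tune it against real traffic and quality checks.
const COMPLEXITY_THRESHOLD_TOKENS = 200_000;

/**
 * Pick a Gemini model for a task. Hypothetical router.
 */
function chooseGeminiModel(
    int $estimatedInputTokens,
    bool $needsStructuredOutput = false,
    bool $forcePro = false
): string {
    // Any one signal is enough to justify the more expensive model.
    $isComplex = $forcePro
        || $needsStructuredOutput
        || $estimatedInputTokens > COMPLEXITY_THRESHOLD_TOKENS;

    return $isComplex ? 'gemini-2.5-pro' : 'gemini-2.5-flash';
}

echo chooseGeminiModel(5_000), "\n";                       // gemini-2.5-flash
echo chooseGeminiModel(250_000), "\n";                     // gemini-2.5-pro
echo chooseGeminiModel(5_000, forcePro: true), "\n";       // gemini-2.5-pro
```

Keeping the rule in one function means the routing decision is unit-testable and can be logged alongside token counts when you review spend.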

Testing Without Hitting the Live API

This section is where most Gemini integration tutorials stop being useful. We want feature tests that are fast, deterministic, and free.

The laravel/ai SDK ships with built-in fakes for every agent. This is the correct testing approach. Do not mock HTTP clients. Do not stub the Gemini API at the network layer.

Faking Normal Responses

<?php

namespace Tests\Feature;

use App\Ai\Agents\GeminiSummarizer;
use App\Models\Article;
use App\Services\AI\GeminiService;
use Tests\TestCase;

class ArticleSummaryTest extends TestCase
{
    public function test_article_summary_is_stored_after_generation(): void
    {
        GeminiSummarizer::fake([
            'This article covers Eloquent ORM performance patterns. ' .
            'Key topics include eager loading and query scopes. ' .
            'The author recommends profiling before optimising.',
        ]);

        $article = Article::factory()->create();
        $service = app(GeminiService::class);

        $response = $service->summarize($article->body);

        $this->assertStringContainsString('Eloquent ORM', $response->content);
        $this->assertSame('gemini', $response->provider);

        GeminiSummarizer::assertPrompted(fn ($prompt) =>
            str_contains($prompt, $article->body)
        );
    }
}

Asserting Streamed Responses

For streamed agent responses, the fake intercepts the stream and returns the faked content. Test the downstream side effect, not the stream mechanism itself:

public function test_streaming_route_returns_sse_response(): void
{
    GeminiSummarizer::fake(['Streamed summary content here.']);

    $this->get('/summary/stream?text=' . urlencode('Long article body...'))
        ->assertOk()
        ->assertHeader('Content-Type', 'text/event-stream');

    GeminiSummarizer::assertPrompted();
}

Testing Safety Filter Rejection Paths

Fake a safety-blocked response by using a closure that simulates the blocked condition. The exact approach depends on your error handling design – if you check finishReason, fake the response accordingly:

public function test_safety_filter_rejection_is_handled_gracefully(): void
{
    // Simulate a blocked response by having the fake throw
    // the exception your service raises on SAFETY finish_reason
    GeminiSummarizer::fake(function () {
        throw new \App\Exceptions\GeminiContentBlockedException(
            'Safety filter triggered.'
        );
    });

    $this->expectException(\App\Exceptions\GeminiContentBlockedException::class);

    app(GeminiService::class)->summarize('Content that would trigger safety filters.');
}

Then test that your controller or job handles the exception gracefully rather than surfacing a 500:

public function test_api_returns_422_on_safety_rejection(): void
{
    GeminiSummarizer::fake(function () {
        throw new \App\Exceptions\GeminiContentBlockedException('Blocked.');
    });

    $this->postJson('/api/summarize', ['text' => 'test'])
        ->assertStatus(422)
        ->assertJsonFragment(['error' => 'Content blocked by safety filters.']);
}

Mocking Quota Errors

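public function test_quota_exhaustion_triggers_alert_and_re_throws(): void
{
    // Simulate the 429 "quota" response that GeminiService is meant to map
    // to GeminiQuotaExhaustedException. Throwing that exception from the fake
    // directly would bypass the branch that logs the critical alert.
    GeminiSummarizer::fake(function () {
        throw new \Illuminate\Http\Client\RequestException(
            new \Illuminate\Http\Client\Response(
                new \GuzzleHttp\Psr7\Response(429, [], 'quota exceeded')
            )
        );
    });

    \Illuminate\Support\Facades\Log::shouldReceive('critical')
        ->once()
        ->with('Gemini quota exhausted. Check billing dashboard.');

    $this->expectException(\App\Exceptions\GeminiQuotaExhaustedException::class);

    app(GeminiService::class)->summarize('any text');
}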

Preventing Accidental Live Calls

Use preventStray in your test base class or test setup to catch any agent call that does not have a registered fake:

// tests/TestCase.php
use App\Ai\Agents\GeminiSummarizer;

protected function setUp(): void
{
    parent::setUp();
    GeminiSummarizer::preventStray();
}

This ensures that a test which forgets to set up a fake fails loudly rather than silently hitting the live API and charging your account.

[Word to the Wise] Versioning prompts matters the moment you start comparing output quality across model upgrades. When gemini-2.5-flash eventually gives way to whatever Google releases next, you need a way to verify that the new model produces equivalent output on your specific tasks. The prompt migrations pattern for Laravel gives you the tooling to manage that comparison systematically, not by gut feel.

Official Documentation

Two references worth keeping bookmarked:

Laravel AI SDK Documentation (12.x). The canonical reference for provider configuration, agent API, streaming, and built-in fakes. Verify method signatures here, not from community articles written against pre-release API surfaces.
Google Gemini API — OpenAI Compatibility Reference. Documents what is and is not available through the compatibility layer. Essential reading before deciding whether to use the compat route for any production feature.

Frequently Asked Questions

What is the difference between the OpenAI-compatible Gemini endpoint and the native Gemini driver in laravel/ai?

The compatibility endpoint lets you point an existing OpenAI-driver config at Google’s API by swapping the base URL. It requires no driver change but silently drops Gemini-native features: the thinking budget parameter, grounding via Google Search, native multimodal tool use, and fine-grained safety settings. The native Gemini driver in laravel/ai exposes all of these. The compat approach is a reasonable migration shortcut from raw openai-php/client code; it is not a long-term production choice.

Which Gemini model should I default to in production?

gemini-2.5-flash for most workloads. It is fast, cheap relative to its capability, and ships with a 1M-token context window. Reserve gemini-2.5-pro for tasks requiring deeper reasoning or structured output at high complexity. Do not use gemini-2.0-flash in any new integration — it is being shut down on June 1, 2026.

How do I handle Gemini content that gets blocked by safety filters?

Check the response finishReason immediately after calling prompt(). If it equals 'SAFETY', throw a custom exception — do not retry, as the same input will produce the same outcome. Surface a clear, non-alarmist message to the user and log the prompt context for review. In tests, use the agent fake’s closure form to simulate the blocked condition without touching the live API.

Do I need a Google Cloud account, or is an AI Studio key sufficient?

An AI Studio key is sufficient for development and moderate-traffic production. For high-volume workloads, request a quota increase through Google Cloud or migrate to Vertex AI. Vertex AI uses service account credentials (a JSON file, not an API key), so plan for that authentication change before you hit your Studio quota — you do not want to refactor auth under production pressure.

How do I ensure my tests never accidentally hit the live Gemini API?

Call preventStray() on your agent class (GeminiSummarizer::preventStray() in this guide) in your test base class setUp() method. Any agent call without a registered fake will throw immediately rather than fall through to the network. Combine this with GeminiSummarizer::fake(['expected response']) in individual tests, and your CI pipeline will never incur live API costs or become dependent on external availability. Verify the exact method name against the SDK documentation for your installed version.

Dewald Hugo

A software architect with 15+ years of experience in the PHP and Laravel ecosystem. Dewald created Origin Main to provide the engineering rigour required to integrate AI into professional, high-concurrency production systems. He writes for developers who care less about "getting it to work" and more about "getting it to last".