Most Claude chatbot tutorials stop at a single prompt and a single response. That is not a chatbot. That is a stateless API call wearing a chat interface as a costume.

A real chatbot remembers. It tracks what you said two turns ago. It can be corrected, redirected, and built upon over the course of a conversation. Building that in Laravel is not complicated, but it requires deliberate architecture. Wrong decisions made early will cost you in token spend, context leaks, and brittle memory behaviour under load.

In this tutorial, we build a Laravel Claude chatbot with real conversation memory from the ground up. We will use Eloquent for message persistence, the Laravel HTTP client for API calls, a dedicated ClaudeService wired through the Service Container, and a token-aware pruning strategy that respects your cost budget as much as your context window. No third-party AI packages. No magic. Just Laravel doing what Laravel does well.

What Laravel Claude Chatbot Conversation Memory Actually Means

Kill this misconception immediately: Claude does not remember anything between requests. Every call to the Anthropic API is completely stateless. The model has no session, no memory, no awareness that you have spoken before.

Conversation memory is infrastructure you build. You build it by re-sending prior messages as structured context with every new API request. Claude receives the full message history, processes it, and responds as if it remembers, because from its perspective everything it needs is right there in the request payload. This builds on the service layer structure from the complete Laravel Claude API integration guide; if your HTTP client setup differs, reconcile it there first. For streaming responses from Claude in real time (rendering tokens as they arrive rather than waiting for the full response), the streaming chatbot guide covers that implementation.

The practical consequence of this is significant. Your database is the brain. Your pruning strategy is the attention span. And your token budget is the hard ceiling everything must fit inside. Claude is just the inference engine at the end of the pipe.

Every architectural decision in this tutorial flows from that reality.
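To make the statelessness concrete, here is a minimal plain-PHP sketch of what "memory" amounts to on the wire: the stored turns plus the new user message, all packed into a single request body. It assumes history is held as simple role/content arrays, and buildPayload() is a hypothetical helper, not part of any SDK.

```php
<?php
// Minimal sketch: memory is just prior turns re-sent with every request.
// buildPayload() is a hypothetical helper; history rows are assumed to be
// ['role' => 'user'|'assistant', 'content' => string] arrays.

function buildPayload(array $storedHistory, string $newUserMessage, string $model): array
{
    // The API keeps no state, so every earlier turn travels with the new one.
    $messages   = $storedHistory;
    $messages[] = ['role' => 'user', 'content' => $newUserMessage];

    return [
        'model'      => $model,
        'max_tokens' => 1024,
        'messages'   => $messages,
    ];
}

$history = [
    ['role' => 'user', 'content' => 'My name is Priya.'],
    ['role' => 'assistant', 'content' => 'Nice to meet you, Priya.'],
];

$payload = buildPayload($history, 'What is my name?', 'claude-sonnet-4-6');
echo json_encode($payload, JSON_PRETTY_PRINT), PHP_EOL;
```

Drop the first two entries from $history and Claude has no way of answering the final question; that is the whole mechanism the rest of this tutorial builds around.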

Architecture Overview

The system has five moving parts:

A ChatMessage Eloquent model backed by a chat_messages table
A ClaudeService class registered with the Service Container
A ConversationMemoryManager responsible for history retrieval and pruning
A ChatController handling request orchestration
An API route exposing the chatbot endpoint

The request lifecycle:

Request arrives → Controller validates input → Memory manager loads pruned history → Service formats and sends to Claude → Reply stored via Eloquent → JSON response returned

Each component has a single responsibility. This is not over-engineering; it is the minimum required to make this testable, extensible, and maintainable as your requirements grow.

Prerequisites

Laravel 11 or 12
PHP 8.2+
An Anthropic API key in your .env
MySQL or PostgreSQL as your primary storage
Redis (optional, for session-scoped caching of history)

Add your key to .env:

ANTHROPIC_API_KEY=sk-ant-...
ANTHROPIC_API_VERSION=2023-06-01
ANTHROPIC_MODEL=claude-sonnet-4-6

And expose it through config/services.php:

'anthropic' => [
    'key'     => env('ANTHROPIC_API_KEY'),
    'version' => env('ANTHROPIC_API_VERSION', '2023-06-01'),
    'model'   => env('ANTHROPIC_MODEL', 'claude-sonnet-4-6'),
],

Centralising this in services.php means a model upgrade is a one-line .env change, not a grep-and-replace exercise across your codebase.

The Eloquent Model and Migration

Create the migration:

php artisan make:migration create_chat_messages_table
php artisan make:model ChatMessage

The migration:

<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

return new class extends Migration
{
    public function up(): void
    {
        Schema::create('chat_messages', function (Blueprint $table) {
            $table->id();
            $table->string('conversation_id', 64)->index();
            $table->enum('role', ['user', 'assistant', 'system']);
            $table->text('content');
            $table->unsignedInteger('token_estimate')->nullable();
            $table->foreignId('user_id')->nullable()->constrained()->nullOnDelete();
            $table->timestamps();

            $table->index(['conversation_id', 'id']);
        });
    }

    public function down(): void
    {
        Schema::dropIfExists('chat_messages');
    }
};

The token_estimate column stores a cheap approximation at write time, which we use for pruning decisions without making a secondary API call. The compound index on (conversation_id, id) is intentional: every history query filters by conversation and orders by insertion sequence, and you want that to be fast.

The ChatMessage model:

<?php

namespace App\Models;

use Illuminate\Database\Eloquent\Model;
use Illuminate\Database\Eloquent\Builder;

class ChatMessage extends Model
{
    protected $fillable = [
        'conversation_id',
        'role',
        'content',
        'token_estimate',
        'user_id',
    ];

    protected $casts = [
        'token_estimate' => 'integer',
    ];

    public function scopeForConversation(Builder $query, string $conversationId): Builder
    {
        return $query->where('conversation_id', $conversationId);
    }

    public function scopeExcludingSystem(Builder $query): Builder
    {
        return $query->where('role', '!=', 'system');
    }
}

The query scopes are small but they keep your service methods readable. forConversation() and excludingSystem() compose cleanly, and they are trivial to test in isolation.

The ConversationMemoryManager

This class owns one job: giving you the right slice of history for a given conversation. It does not call the API. It does not store messages. It retrieves and prunes.

<?php

namespace App\Services\Chat;

use App\Models\ChatMessage;
use Illuminate\Support\Collection;

class ConversationMemoryManager
{
    private const CHARS_PER_TOKEN = 3.5;

    public function __construct(
        private readonly int $maxTokenBudget = 80_000,
        private readonly int $fetchLimit     = 50,
    ) {}

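    public function getHistory(string $conversationId): Collection
    {
        // Fetch newest-first so the limit drops the OLDEST rows, then restore
        // chronological order before pruning and sending to the API.
        $messages = ChatMessage::forConversation($conversationId)
            ->excludingSystem()
            ->orderByDesc('id')
            ->limit($this->fetchLimit)
            ->get(['id', 'role', 'content', 'token_estimate'])
            ->reverse()
            ->values();

        return $this->pruneToTokenBudget($messages);
    }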

    private function pruneToTokenBudget(Collection $messages): Collection
    {
        $budget = $this->maxTokenBudget;
        $kept   = collect();

        foreach ($messages->reverse() as $message) {
            $tokens = $message->token_estimate ?? $this->estimateTokens($message->content);

            if ($budget - $tokens < 0) {
                break;
            }

            $budget -= $tokens;
            $kept->prepend($message);
        }

        return $kept;
    }

    public function estimateTokens(string $text): int
    {
        return (int) ceil(mb_strlen($text) / self::CHARS_PER_TOKEN);
    }
}

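Walking backwards through the collection and prepending to the result is the correct approach here: you always want to preserve the most recent context and trim from the oldest end. The fetchLimit is a hard cap applied before token counting even begins, a safety net for conversations that have grown very long. For that cap to drop the oldest turns rather than the newest, the query must order by id descending before applying the limit and then restore chronological order. Ordering ascending with a limit returns the first N messages of the conversation and silently omits the latest user message once a conversation outgrows the cap.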
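The same newest-first pruning, stripped of Eloquent, fits in a few lines of plain PHP. This is a sketch under the assumption that each message is an array carrying a precomputed 'tokens' field (standing in for token_estimate); pruneToBudget() is a hypothetical helper, not the class method itself.

```php
<?php
// Plain-PHP sketch of newest-first pruning against a token budget.
// Messages are assumed to be arrays with a precomputed 'tokens' field.

function pruneToBudget(array $messages, int $budget): array
{
    $kept = [];

    // Walk from the newest message back towards the oldest.
    for ($i = count($messages) - 1; $i >= 0; $i--) {
        $tokens = $messages[$i]['tokens'];

        if ($tokens > $budget) {
            break; // the next-oldest message does not fit, so stop here
        }

        $budget -= $tokens;
        array_unshift($kept, $messages[$i]); // prepend to keep chronological order
    }

    return $kept;
}

$messages = [];
for ($n = 1; $n <= 6; $n++) {
    $messages[] = ['id' => $n, 'role' => $n % 2 ? 'user' : 'assistant', 'tokens' => 10];
}

$kept = pruneToBudget($messages, 25);
echo implode(',', array_column($kept, 'id')), PHP_EOL; // prints "5,6"
```

With a 25-token budget and 10 tokens per message, only the two newest turns fit; the third-newest would push the total to 30, so the loop stops.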

[Efficiency Gain] claude-sonnet-4-6 supports a 200,000-token context window. The 80,000-token budget here is not a technical limit; it is a cost management decision. Input tokens are billed per token, so sending 180,000 tokens of conversation history on every request costs about 2.25× as much as sending 80,000 tokens, and because the history is re-sent every turn, that difference compounds across the conversation. Set your budget based on what the conversation genuinely needs, not what the model can theoretically handle.
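The arithmetic behind that ratio is worth seeing once. The sketch below assumes linear per-token input pricing; the $3.00 per million input tokens figure is a placeholder, so check Anthropic's pricing page for current rates. The ratio itself does not depend on the price.

```php
<?php
// Worked arithmetic for the budget decision, assuming linear input pricing.
// $pricePerMillion is an assumed placeholder rate in USD, not an official figure.

function inputCost(int $tokens, float $pricePerMillion): float
{
    return $tokens / 1_000_000 * $pricePerMillion;
}

$pricePerMillion = 3.00; // assumption: USD per million input tokens

$small = inputCost(80_000, $pricePerMillion);
$large = inputCost(180_000, $pricePerMillion);

// Cost per request; multiply by turns per conversation for the running total.
printf('80k: $%.2f, 180k: $%.2f, ratio: %.2fx' . PHP_EOL, $small, $large, $large / $small);
```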

The ClaudeService

This is where the Anthropic API interaction lives. The Service Container will inject this wherever you need it, and it is straightforward to test because Http::fake() intercepts the requests it sends.

<?php

namespace App\Services\Chat;

use Illuminate\Http\Client\PendingRequest;
use Illuminate\Support\Facades\Http;
use Illuminate\Support\Collection;
use RuntimeException;

class ClaudeService
{
    private PendingRequest $client;

    public function __construct()
    {
        $this->client = Http::baseUrl('https://api.anthropic.com/v1')
            ->withHeaders([
                'x-api-key'         => config('services.anthropic.key'),
                'anthropic-version' => config('services.anthropic.version'),
                'Content-Type'      => 'application/json',
            ])
            ->timeout(60)
            // Back off 1s, 2s, then 4s between attempts (up to four attempts in total),
            // retrying only rate-limit (429) and overloaded (529) responses.
            ->retry([1000, 2000, 4000], when: function (\Exception $e, PendingRequest $request) {
                if ($e instanceof \Illuminate\Http\Client\RequestException) {
                    return in_array($e->response->status(), [429, 529], true);
                }
                return false;
            }, throw: true);
    }

    public function chat(Collection $history, string $systemPrompt = ''): string
    {
        $messages = $this->formatMessages($history);

        $payload = [
            'model'      => config('services.anthropic.model'),
            'max_tokens' => 1024,
            'messages'   => $messages,
        ];

        if (!empty($systemPrompt)) {
            $payload['system'] = $systemPrompt;
        }

        $response = $this->client->post('/messages', $payload);

        if ($response->failed()) {
            $error = $response->json('error.message', 'Unknown API error');
            throw new RuntimeException("Claude API error (HTTP {$response->status()}): {$error}");
        }

        return $response->json('content.0.text', '');
    }

    private function formatMessages(Collection $history): array
    {
        return $history
            ->filter(fn($m) => in_array($m->role, ['user', 'assistant'], true))
            ->map(fn($m) => [
                'role'    => $m->role,
                'content' => [
                    ['type' => 'text', 'text' => $m->content]
                ],
            ])
            ->values()
            ->toArray();
    }
}

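Laravel's retry() does not apply exponential back-off on its own. Its second argument is a fixed delay in milliseconds (or a closure that computes one per attempt), and passing an array as the first argument, such as retry([1000, 2000, 4000]), sets an explicit schedule of delays between attempts. A zero delay against a 429 simply hits the same rate limit again, so always configure one. The when callback restricts retries to 429 and 529 status codes (rate limit and overloaded responses) and lets every other 4xx and 5xx error fail immediately. This is the correct behaviour: retrying a 400 (malformed request) is pointless and wasteful. Because throw: true is set, a failed response surfaces as an Illuminate\Http\Client\RequestException, either immediately when the callback declines to retry or after the final attempt, so the $response->failed() branch in chat() is only a fallback.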
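To make the retry policy concrete outside the framework, here is a plain-PHP sketch of the same idea: retry only 429 and 529, sleeping through an explicit schedule of delays between attempts. sendWithRetry() is a hypothetical helper for illustration; the send and sleep steps are injected closures so the behaviour can be checked without a network or real waiting. In the service itself, Laravel's retry() does this work.

```php
<?php
// Framework-free sketch of a retry policy: retry only 429/529, sleeping
// through an explicit back-off schedule. $send returns an HTTP status code;
// $sleep receives the delay in milliseconds (injected so it can be tested).

function sendWithRetry(callable $send, array $delaysMs, callable $sleep): int
{
    $retryable = [429, 529];
    $attempt   = 0;

    while (true) {
        $status   = $send();
        $canRetry = $attempt < count($delaysMs);

        if ($status < 400 || !in_array($status, $retryable, true) || !$canRetry) {
            return $status; // success, non-retryable error, or schedule exhausted
        }

        $sleep($delaysMs[$attempt]);
        $attempt++;
    }
}

$responses = [429, 529, 200];
$slept     = [];

$final = sendWithRetry(
    function () use (&$responses) { return array_shift($responses); },
    [1000, 2000, 4000],
    function (int $ms) use (&$slept) { $slept[] = $ms; },
);

echo $final, ' after sleeping ', implode('ms, ', $slept), 'ms', PHP_EOL;
```

A 400 returns on the first attempt with no sleep at all, which is exactly the "fail fast on malformed requests" behaviour described above.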

[Production Pitfall] The timeout(60) setting is intentionally generous. Claude’s response latency grows with output length. With max_tokens set to 1,024, you will occasionally see responses that take 15–25 seconds on a loaded API. Set your timeout too aggressively (say, 10 seconds), and you will generate false failures on responses that would have succeeded. Monitor your actual P95 API latency before tuning this number down.

If you plan to extend this with per-user token budgets and tiered rate limiting at the middleware layer, which you should for any multi-user application, the Laravel AI Middleware: Token Tracking & Rate Limiting guide covers that architecture in detail and integrates cleanly with the service structure we are building here.

Registering Services With the Service Container

In app/Providers/AppServiceProvider.php:

<?php

namespace App\Providers;

use App\Services\Chat\ClaudeService;
use App\Services\Chat\ConversationMemoryManager;
use Illuminate\Support\ServiceProvider;

class AppServiceProvider extends ServiceProvider
{
    public function register(): void
    {
        $this->app->singleton(ClaudeService::class, fn() => new ClaudeService());

        $this->app->singleton(ConversationMemoryManager::class, fn() => new ConversationMemoryManager(
            maxTokenBudget: 80_000,
            fetchLimit: 50,
        ));
    }
}

Both services are singletons because they carry no per-request state. The HTTP client configuration in ClaudeService is built once and reused. There is no reason to instantiate a new HTTP client on every request.

The ChatController

<?php

namespace App\Http\Controllers;

use App\Http\Requests\ChatMessageRequest;
use App\Models\ChatMessage;
use App\Services\Chat\ClaudeService;
use App\Services\Chat\ConversationMemoryManager;
use Illuminate\Http\JsonResponse;
use Illuminate\Support\Str;

class ChatController extends Controller
{
    public function __construct(
        private readonly ClaudeService           $claude,
        private readonly ConversationMemoryManager $memory,
    ) {}

    public function send(ChatMessageRequest $request): JsonResponse
    {
        $conversationId = $request->input('conversation_id', Str::uuid()->toString());
        $userInput      = $request->input('message');

        // Store the user's message
        ChatMessage::create([
            'conversation_id' => $conversationId,
            'role'            => 'user',
            'content'         => $userInput,
            'token_estimate'  => $this->memory->estimateTokens($userInput),
            'user_id'         => auth()->id(),
        ]);

        // Load and prune conversation history
        $history = $this->memory->getHistory($conversationId);

        // Call Claude
        $reply = $this->claude->chat($history, $this->buildSystemPrompt());

        // Store the assistant reply
        ChatMessage::create([
            'conversation_id' => $conversationId,
            'role'            => 'assistant',
            'content'         => $reply,
            'token_estimate'  => $this->memory->estimateTokens($reply),
        ]);

        return response()->json([
            'conversation_id' => $conversationId,
            'reply'           => $reply,
        ]);
    }

    private function buildSystemPrompt(): string
    {
        return 'You are a knowledgeable, direct assistant for a software development team. '
            . 'Answer technical questions concisely and accurately. '
            . 'If you do not know something, say so clearly. '
            . 'Do not fabricate package names, version numbers, or API specifications.';
    }
}

The system prompt lives in the controller for now. In a real product, you would move it to a dedicated prompt class or a versioned prompt migration, a pattern covered in depth in the Prompt Migrations: Bringing Determinism to AI in Laravel guide, which is worth reading before you start editing prompts in production.
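As a rough illustration of the "dedicated prompt class" option, here is a minimal sketch. DevAssistantPrompt and its VERSION constant are hypothetical names; the idea is simply that the prompt text lives in one place and carries an explicit version you can log alongside each request.

```php
<?php
// Hypothetical prompt class: keeps the system prompt out of the controller
// and gives each revision an explicit version string for logging.

final class DevAssistantPrompt
{
    public const VERSION = 'v1';

    public static function text(): string
    {
        return implode(' ', [
            'You are a knowledgeable, direct assistant for a software development team.',
            'Answer technical questions concisely and accurately.',
            'If you do not know something, say so clearly.',
            'Do not fabricate package names, version numbers, or API specifications.',
        ]);
    }
}

echo DevAssistantPrompt::VERSION, ': ', DevAssistantPrompt::text(), PHP_EOL;
```

The controller's buildSystemPrompt() would then just return DevAssistantPrompt::text(), and bumping VERSION whenever the wording changes lets you correlate response quality with prompt revisions.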

The Form Request

Never trust raw controller input. Laravel’s Form Request classes handle validation cleanly and keep your controllers thin.

<?php

namespace App\Http\Requests;

use Illuminate\Foundation\Http\FormRequest;

class ChatMessageRequest extends FormRequest
{
    public function authorize(): bool
    {
        return true; // Adjust to your auth strategy
    }

    public function rules(): array
    {
        return [
            'message'         => ['required', 'string', 'min:1', 'max:4000'],
            'conversation_id' => ['nullable', 'string', 'max:64', 'regex:/^[a-f0-9\-]+$/'],
        ];
    }

    public function messages(): array
    {
        return [
            'message.max'          => 'Messages cannot exceed 4,000 characters.',
            'conversation_id.regex' => 'Invalid conversation identifier.',
        ];
    }
}

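The 4,000-character cap on message input is deliberate. At the 3.5 characters-per-token heuristic, 4,000 characters is about 1,140 tokens, a small slice of the 80,000-token budget. Without a cap, a single pasted log file could consume half that budget, and that is not something you want to silently allow. Validate at the boundary.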

The Route

// routes/api.php

use App\Http\Controllers\ChatController;
use Illuminate\Support\Facades\Route;

Route::middleware(['auth:sanctum', 'throttle:60,1'])->group(function () {
    Route::post('/chat', [ChatController::class, 'send']);
});

The throttle:60,1 middleware limits each authenticated user to 60 requests per minute at the Laravel level. This is your first line of defence against accidental abuse. It does not replace API-level rate limiting, but it stops a misfiring frontend from hammering your endpoint before a single request reaches Anthropic. Note that Laravel 11 and 12 do not ship routes/api.php or Sanctum by default; run php artisan install:api to add both before registering this route.

Memory Management: Three Strategies in Laravel

The ConversationMemoryManager we built uses token-aware pruning. That is the right default. But there are two other strategies worth understanding.

Strategy 1: Fixed Message Window

The simplest approach. Keep the last N messages, discard everything older.

public function getHistoryWindowed(string $conversationId, int $limit = 20): Collection
{
    return ChatMessage::forConversation($conversationId)
        ->excludingSystem()
        ->orderBy('id', 'desc')
        ->limit($limit)
        ->get()
        ->reverse()
        ->values();
}

Simple. Predictable. The risk is abrupt context loss. A user who established important preferences 25 messages ago finds the model acting confused at message 26. For transactional bots (support widgets, quick Q&A), this is fine. For a long-lived assistant relationship, it is not.

Strategy 2: Token-Aware Pruning (Already Built)

This is what ConversationMemoryManager::getHistory() does. Walk backwards through history, accumulate estimated token counts, and stop when you hit your budget. It always preserves the most recent context and adapts automatically to conversation density: a conversation of one-liners keeps far more turns than one full of dense technical explanations.

Strategy 3: Summarisation with a Queued Job

When a conversation exceeds your token budget, rather than discarding the oldest context, summarise it and store the summary as a high-signal system message. This preserves long-range context at a fraction of the token cost.

<?php

namespace App\Jobs;

use App\Models\ChatMessage;
use App\Services\Chat\ClaudeService;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Support\Collection;

class SummariseConversationHistory implements ShouldQueue
{
    use Dispatchable, Queueable;

    public function __construct(
        private readonly string $conversationId,
        private readonly Collection $messagesToSummarise,
    ) {}

    public function handle(ClaudeService $claude): void
    {
        $historyText = $this->messagesToSummarise
            ->map(fn($m) => strtoupper($m->role) . ': ' . $m->content)
            ->implode("\n");

        $summary = $claude->chat(collect([
            (object) [
                'role'    => 'user',
                'content' => "Summarise the following conversation concisely. "
                    . "Focus on facts, decisions, user preferences, and context established. "
                    . "Omit pleasantries and filler.\n\n" . $historyText,
            ],
        ]));

        // Store the summary as a system-role row. It gets a newer id than the
        // messages it replaces, so retrieval must place it ahead of them explicitly.
        ChatMessage::create([
            'conversation_id' => $this->conversationId,
            'role'            => 'system',
            'content'         => '[CONVERSATION SUMMARY]: ' . $summary,
            'token_estimate'  => (int) ceil(mb_strlen($summary) / 3.5),
        ]);

        // Delete the summarised rows (a hard delete unless ChatMessage uses SoftDeletes).
        // This requires the collection to carry each row's id.
        ChatMessage::forConversation($this->conversationId)
            ->whereIn('id', $this->messagesToSummarise->pluck('id'))
            ->delete();
    }
}

Dispatch this job asynchronously when a conversation’s total estimated token count crosses a threshold. The user sees no latency spike. The next request loads the compact summary plus the recent message window: most of the long-range context at a fraction of the original token cost.
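The dispatch decision itself is a small calculation: total the estimates, and once they cross the threshold, split the history into an older slice to summarise and a recent window to keep verbatim. Here is a plain-PHP sketch; splitForSummary() is a hypothetical helper, and the threshold and window size in the example are illustrative, not recommendations.

```php
<?php
// Sketch of the summarisation trigger: when estimated tokens exceed the
// threshold, split off the oldest messages and keep a recent window.
// Messages are assumed to be arrays with a precomputed 'tokens' field.

function splitForSummary(array $messages, int $thresholdTokens, int $keepRecent): ?array
{
    $keepRecent = max(1, $keepRecent); // always keep at least the newest turn
    $total      = array_sum(array_column($messages, 'tokens'));

    if ($total <= $thresholdTokens || count($messages) <= $keepRecent) {
        return null; // nothing to summarise yet
    }

    return [
        'summarise' => array_slice($messages, 0, -$keepRecent),
        'keep'      => array_slice($messages, -$keepRecent),
    ];
}

$messages = [];
for ($n = 1; $n <= 10; $n++) {
    $messages[] = ['id' => $n, 'tokens' => 1_000];
}

$split = splitForSummary($messages, 8_000, 4);
echo implode(',', array_column($split['summarise'], 'id')), ' | ',
     implode(',', array_column($split['keep'], 'id')), PHP_EOL;
```

In the Laravel version, the 'summarise' slice is what you would pass to SummariseConversationHistory::dispatch().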

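[Architect’s Note] Storing summaries as system-role rows is a clean pattern, but it introduces a retrieval wrinkle: your excludingSystem() scope filters them out, and formatMessages() would drop them anyway, because the Messages API accepts only user and assistant roles in the messages array. Wherever you store the summary, it has to reach Claude through the top-level system parameter (or as text inside a user turn). You have two storage options. Either use a dedicated summary role in your ENUM and retrieve it separately, or store summaries in a separate conversation_summaries table and prepend them to the system prompt before sending. The separate table is cleaner at scale: it gives you a clear audit trail and avoids role-type conflicts in your message schema.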
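Since the summary has to travel in the top-level system parameter, the final step is folding it into the system prompt string. A minimal sketch, assuming the summary has already been loaded from wherever you store it; withSummary() is a hypothetical helper.

```php
<?php
// Sketch: fold a stored conversation summary into the `system` string,
// since the Messages API has no system role inside `messages`.
// withSummary() is a hypothetical helper.

function withSummary(string $basePrompt, ?string $summary): string
{
    if ($summary === null || trim($summary) === '') {
        return $basePrompt; // no summary yet: send the base prompt unchanged
    }

    return $basePrompt
        . "\n\nSummary of earlier conversation (older turns are not included verbatim):\n"
        . trim($summary);
}

echo withSummary('You are a direct technical assistant.', 'User deploys on Laravel 12 with PostgreSQL.'), PHP_EOL;
```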

Redis Caching for Hot Conversations

For high-traffic applications, hitting the database on every message to load conversation history adds latency. Cache the history in Redis with a short TTL for active conversations.

<?php

namespace App\Services\Chat;

use App\Models\ChatMessage;
use Illuminate\Support\Collection;
use Illuminate\Support\Facades\Cache;

class CachedConversationMemoryManager extends ConversationMemoryManager
{
    private const CACHE_TTL_SECONDS = 300; // 5 minutes

    public function getHistory(string $conversationId): Collection
    {
        return Cache::remember(
            key: "chat_history:{$conversationId}",
            ttl: self::CACHE_TTL_SECONDS,
            callback: fn() => parent::getHistory($conversationId),
        );
    }

    public function invalidate(string $conversationId): void
    {
        Cache::forget("chat_history:{$conversationId}");
    }
}

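Call invalidate() after every new message is stored; otherwise the cache serves stale history. Be aware of how this interacts with the send() flow above: every chat turn writes the user's message before reading history, so a pure invalidate-on-write cache misses on every turn. The cache pays off when history is also read between writes (transcript views, polling clients, reconnects), or when you update the cached copy on write instead of deleting it. In those cases, for a chatbot with dozens of concurrent users, it can meaningfully reduce database pressure.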

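[Edge Case Alert] Cache the output of pruneToTokenBudget(), not the raw history, which is what the class above does by wrapping parent::getHistory(). The cached payload stays bounded by your budget, and a cache hit skips pruning entirely. The trade-off is staleness: if you change the token budget, cached entries stay sized to the old budget until they expire (five minutes here) or are invalidated. If you do cache raw rows instead, re-run pruning on every cache hit; serving cached raw rows without that step is how stale, oversized histories reach the API.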
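If you want the cache to survive writes, one option is write-through: append the new message to the cached copy and re-apply the budget, rather than forgetting the key. The sketch below shows the idea with an in-memory array standing in for Redis; ArrayHistoryCache is a hypothetical class, and in Laravel the store would be the Cache facade. Note that it ignores appends on a cache miss, so the next read rebuilds the full history from the database instead of caching a one-message fragment.

```php
<?php
// Framework-free sketch of write-through history caching.
// ArrayHistoryCache is hypothetical; an array stands in for Redis.

final class ArrayHistoryCache
{
    private array $store = [];

    public function __construct(private int $budget) {}

    public function get(string $conversationId): ?array
    {
        return $this->store[$conversationId] ?? null;
    }

    public function put(string $conversationId, array $history): void
    {
        $this->store[$conversationId] = $this->prune($history);
    }

    public function append(string $conversationId, array $message): void
    {
        if (!isset($this->store[$conversationId])) {
            return; // cache miss: let the next read rebuild from the database
        }

        $history   = $this->store[$conversationId];
        $history[] = $message;
        $this->store[$conversationId] = $this->prune($history);
    }

    // Newest-first pruning, identical in spirit to pruneToTokenBudget().
    private function prune(array $history): array
    {
        $budget = $this->budget;
        $kept   = [];

        for ($i = count($history) - 1; $i >= 0; $i--) {
            if ($history[$i]['tokens'] > $budget) {
                break;
            }
            $budget -= $history[$i]['tokens'];
            array_unshift($kept, $history[$i]);
        }

        return $kept;
    }
}

$cache = new ArrayHistoryCache(budget: 25);
$cache->put('conv-1', [
    ['id' => 1, 'tokens' => 10],
    ['id' => 2, 'tokens' => 10],
]);
$cache->append('conv-1', ['id' => 3, 'tokens' => 10]);
$cache->append('conv-2', ['id' => 9, 'tokens' => 10]); // miss: ignored

echo implode(',', array_column($cache->get('conv-1'), 'id')), PHP_EOL; // prints "2,3"
```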

Exception Handling and Graceful Degradation

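The exceptions thrown by ClaudeService need to be caught at the controller level and converted into a user-friendly response: a RequestException for failed API responses, a ConnectionException when the request times out or cannot connect, and the RuntimeException fallback. But do not just catch and swallow them; log them with enough context to debug.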

Update your ChatController::send() method:

try {
    $history = $this->memory->getHistory($conversationId);
    $reply   = $this->claude->chat($history, $this->buildSystemPrompt());
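} catch (\Illuminate\Http\Client\RequestException | \Illuminate\Http\Client\ConnectionException $e) {
    // Failed API responses (after retries), plus timeouts and connection failures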
    report($e);
    return response()->json([
        'error' => 'The AI service is temporarily unavailable. Please try again in a moment.',
    ], 503);
} catch (\RuntimeException $e) {
    report($e);
    return response()->json([
        'error' => 'Could not process your message. Please try again.',
    ], 500);
}

The report() helper passes the exception to Laravel's exception handler, which logs it and forwards it to whatever reporting tool you have configured (Sentry, Bugsnag, or Telescope locally). The user gets a clean, non-technical error message. You get the full stack trace with context in your monitoring tool. Do not conflate those two audiences.

Testing the Chatbot

Testing AI integrations requires discipline. You cannot assert on Claude’s exact output; it is probabilistic. What you can test is the architecture around it.

Unit test: memory pruning

<?php

namespace Tests\Unit;

use App\Services\Chat\ConversationMemoryManager;
use Illuminate\Support\Collection;
use PHPUnit\Framework\TestCase;

class ConversationMemoryManagerTest extends TestCase
{
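    public function test_prunes_to_token_budget(): void
    {
        $manager = new ConversationMemoryManager(maxTokenBudget: 100, fetchLimit: 50);

        // 20 messages at 10 estimated tokens each (35 characters / 3.5)
        $messages = collect(range(1, 20))->map(fn($i) => (object) [
            'id'             => $i,
            'role'           => $i % 2 === 0 ? 'assistant' : 'user',
            'content'        => str_repeat('a', 35),
            'token_estimate' => 10,
        ]);

        // pruneToTokenBudget() is private, so invoke it via reflection instead of
        // going through getHistory(), which would need a database connection.
        $prune   = new \ReflectionMethod($manager, 'pruneToTokenBudget');
        $history = $prune->invoke($manager, $messages);

        // A 100-token budget at 10 tokens per message keeps exactly the 10 newest
        $this->assertCount(10, $history);
        $this->assertSame(11, $history->first()->id);
        $this->assertSame(20, $history->last()->id);
    }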

    public function test_estimates_tokens_from_character_length(): void
    {
        $manager  = new ConversationMemoryManager();
        $estimate = $manager->estimateTokens(str_repeat('a', 350));

        $this->assertEquals(100, $estimate);
    }
}

Feature test: Controller integration (with faked HTTP):

<?php

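namespace Tests\Feature;

use App\Models\User;
use Illuminate\Foundation\Testing\RefreshDatabase;
use Illuminate\Support\Facades\Http;
use Laravel\Sanctum\Sanctum;
use Tests\TestCase;

class ChatControllerTest extends TestCase
{
    use RefreshDatabase;

    protected function setUp(): void
    {
        parent::setUp();

        // The /api/chat route sits behind auth:sanctum, so authenticate first
        Sanctum::actingAs(User::factory()->create());
    }
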
    public function test_returns_conversation_id_and_reply(): void
    {
        Http::fake([
            'https://api.anthropic.com/*' => Http::response([
                'content' => [
                    ['type' => 'text', 'text' => 'Hello! How can I help you?']
                ]
            ], 200),
        ]);

        $response = $this->postJson('/api/chat', [
            'message' => 'Hello there.',
        ]);

        $response->assertStatus(200)
            ->assertJsonStructure(['conversation_id', 'reply'])
            ->assertJsonFragment(['reply' => 'Hello! How can I help you?']);
    }

    public function test_rejects_empty_message(): void
    {
        $response = $this->postJson('/api/chat', ['message' => '']);
        $response->assertStatus(422);
    }

    public function test_rejects_message_exceeding_max_length(): void
    {
        $response = $this->postJson('/api/chat', [
            'message' => str_repeat('a', 4001),
        ]);
        $response->assertStatus(422);
    }
}

Http::fake() is one of Laravel’s genuinely excellent testing utilities. It intercepts outbound HTTP calls and returns your stubbed response: no real API calls, no API spend, full control over response shape. Use it for all your AI integration tests.

Observability: Logging Token Usage

Blind API spend is the silent killer of AI projects. You need to know, per conversation and per user, how many tokens you are consuming. The Anthropic API returns usage data in every response: use it.

Update ClaudeService::chat() to return a structured result object rather than a bare string:

public function chat(Collection $history, string $systemPrompt = ''): ChatResult
{
    // ... (same payload construction as before)

    $json = $response->json();

    return new ChatResult(
        text:           $json['content'][0]['text'] ?? '',
        inputTokens:    $json['usage']['input_tokens'] ?? 0,
        outputTokens:   $json['usage']['output_tokens'] ?? 0,
    );
}

<?php

namespace App\Services\Chat;

readonly class ChatResult
{
    public function __construct(
        public string $text,
        public int    $inputTokens,
        public int    $outputTokens,
    ) {}

    public function totalTokens(): int
    {
        return $this->inputTokens + $this->outputTokens;
    }
}

Log token usage against the authenticated user in your controller:

$result = $this->claude->chat($history, $this->buildSystemPrompt());
$reply  = $result->text; // store and return this exactly as before

// Log to a token_usage table or your monitoring stack.
// TokenUsage is an Eloquent model you define for that table (not shown here).
TokenUsage::create([
    'user_id'         => auth()->id(),
    'conversation_id' => $conversationId,
    'input_tokens'    => $result->inputTokens,
    'output_tokens'   => $result->outputTokens,
    'model'           => config('services.anthropic.model'),
]);

This data feeds your cost dashboards, your per-user rate limiting decisions, and your billing logic if you are building a commercial product. It is also the only way to validate whether your token estimation is anywhere close to accurate.
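One way to do that validation is to compare the sum of your character-based estimates for a request with the input_tokens figure the API reported for it. A minimal sketch follows; both helpers are hypothetical. Keep in mind that input_tokens also counts the system prompt and message framing, so the ratio will overstate the estimator's error somewhat.

```php
<?php
// Sketch: compare character-based estimates with the API-reported input_tokens.
// A ratio well above 1.0 means CHARS_PER_TOKEN = 3.5 is undercounting.
// Both functions are hypothetical helpers for illustration.

function estimateTokens(string $text, float $charsPerToken = 3.5): int
{
    return (int) ceil(mb_strlen($text) / $charsPerToken);
}

function estimationRatio(array $contents, int $reportedInputTokens): float
{
    $estimated = array_sum(array_map('estimateTokens', $contents));

    return $estimated > 0 ? $reportedInputTokens / $estimated : 0.0;
}

$contents = [str_repeat('a', 7), str_repeat('b', 35)]; // estimates: 2 + 10
$ratio    = estimationRatio($contents, 18);

printf('actual/estimated: %.2f' . PHP_EOL, $ratio); // prints "actual/estimated: 1.50"
```

Tracked across many requests, a stable ratio tells you whether to adjust the characters-per-token constant or lower the budget to leave headroom.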

When Not to Use Conversation Memory

Memory adds complexity. Be honest about whether your use case needs it.

Skip memory when requests are structurally independent: document summarisation, text classification, single-turn Q&A endpoints. Skip it when you need strict determinism: memory introduces variability in responses that is genuinely difficult to trace in production debugging. Skip it for high-volume, low-latency features like autocomplete or short-form generation, where loading and pruning history adds latency you cannot afford.

The rule is simple. If the value of the next response depends meaningfully on a previous response, use memory. If it does not, do not. The complexity is real. Deploy it only where it earns its keep.

Official Documentation References

Anthropic Messages API Reference. Authoritative source for request structure, response shape, model identifiers, and usage object format.
Laravel HTTP Client Documentation. Covers retry configuration, faking in tests, and request/response handling.

Dewald Hugo

A software architect with 15+ years of experience in the PHP and Laravel ecosystem. Dewald created Origin Main to provide the engineering rigour required to integrate AI into professional, high-concurrency production systems. He writes for developers who care less about "getting it to work" and more about "getting it to last".