
David Blanc Brioir 8f4c0884cc Add Extended Thinking feature specification

Created comprehensive spec for integrating Claude's Extended Thinking capability
into the Claude.ai Clone project. This feature enables enhanced reasoning for
complex tasks by exposing Claude's step-by-step thought process.

Specification includes:
- Complete architecture (backend + frontend)
- 6-phase implementation plan (12-16h estimated)
- Full code examples for all components
- Streaming thinking deltas handling
- ThinkingBlock React component design
- Settings UI for thinking toggle and budget control
- Database schema modifications for thinking storage
- Token management and pricing considerations
- Tool use compatibility (thinking block preservation)
- Testing checklist and best practices
- User documentation

Key features:
- Collapsible thinking blocks with real-time streaming
- Per-conversation thinking toggle
- Adjustable thinking budget (1K-32K tokens)
- Visual indicators (badges, animations)
- Full compatibility with existing memory tools
- Proper handling of summarized thinking (Claude 4+)
- Support for redacted thinking blocks

Implementation phases:
1. Backend Core (2-3h)
2. Frontend UI (3-4h)
3. Streaming & Real-time (2-3h)
4. Tools Integration (2h)
5. Polish & Optimization (2h)
6. Testing & Deployment (1-2h)

Models supported:
- Claude Sonnet 4.5, 4 (summarized thinking)
- Claude Opus 4.5, 4.1, 4 (summarized + preserved blocks)
- Claude Haiku 4.5 (summarized thinking)

🤖 Generated with Claude Code

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

2025-12-25 19:44:24 +01:00


Extended Thinking Feature Specification

Claude.ai Clone - Enhanced Reasoning Integration

1. Overview

Extended Thinking is a Claude feature that enables enhanced reasoning for complex tasks. Claude generates thinking blocks in which it lays out its internal reasoning step by step before giving its final answer.

How it works

Claude creates thinking blocks containing its internal reasoning
These blocks are followed by text blocks containing the final answer
The thinking is summarized (for Claude 4+) but billed at the full rate
Significantly improves response quality on complex tasks

Use cases

Complex mathematics and calculations
Programming and debugging
In-depth document analysis
Multi-step logical reasoning
Complex problem solving

2. Supported Models

Model	ID	Extended Thinking support
Claude Sonnet 4.5	claude-sonnet-4-5-20250929	✅ Yes
Claude Sonnet 4	claude-sonnet-4-20250514	✅ Yes
Claude Haiku 4.5	claude-haiku-4-5-20251001	✅ Yes
Claude Opus 4.5	claude-opus-4-5-20251101	✅ Yes (with thinking block preservation)
Claude Opus 4.1	claude-opus-4-1-20250805	✅ Yes
Claude Opus 4	claude-opus-4-20250514	✅ Yes
Claude Sonnet 3.7	claude-3-7-sonnet-20250219	✅ Yes (deprecated, full thinking)

Note: Claude 4+ models return summarized thinking. Claude 3.7 returns the full thinking.

3. Backend Architecture

3.1 API Route Changes

`server/routes/claude.js`

Required additions:

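// POST /api/claude/chat - non-streaming, with thinking
router.post('/chat', async (req, res) => {
  const {
    messages,
    model,
    system,
    maxTokens = 4096,
    temperature = 1,
    enableMemoryTools = true,
    // New thinking parameters
    enableThinking = false,
    thinkingBudgetTokens = 10000
  } = req.body;

  // Assumed to come from the existing route logic
  const conversationMessages = [...messages];
  const tools = enableMemoryTools ? getMemoryTools() : [];

  const apiParams = {
    model,
    max_tokens: maxTokens,
    temperature,
    system: buildSystemPrompt(system, enableMemoryTools),
    messages: conversationMessages
  };

  // Add thinking if enabled
  if (enableThinking) {
    apiParams.thinking = {
      type: 'enabled',
      budget_tokens: thinkingBudgetTokens
    };
    // The API requires max_tokens to be greater than budget_tokens
    if (apiParams.max_tokens <= thinkingBudgetTokens) {
      apiParams.max_tokens = thinkingBudgetTokens + maxTokens;
    }
    // Thinking is not compatible with temperature modifications
    delete apiParams.temperature;
  }

  // Add tools if enabled
  if (tools.length > 0) {
    apiParams.tools = tools;
  }

  const response = await anthropic.messages.create(apiParams);
  // ... rest of logic
});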
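
The defaults above (maxTokens = 4096, thinkingBudgetTokens = 10000) would be rejected as sent, because the thinking budget has to stay below max_tokens and cannot be lower than 1,024 tokens. A minimal sketch of how the two values could be reconciled before the request is built; `buildThinkingParams` is a hypothetical helper name, and growing max_tokens by the budget is one possible policy, not the only one:

```javascript
// Hypothetical helper: reconciles the thinking budget with max_tokens.
const MIN_THINKING_BUDGET = 1024; // minimum budget accepted by the API

function buildThinkingParams({ enableThinking, maxTokens, thinkingBudgetTokens }) {
  if (!enableThinking) {
    return { max_tokens: maxTokens };
  }
  // Raise budgets that are too small up to the minimum.
  const budget = Math.max(MIN_THINKING_BUDGET, Math.floor(thinkingBudgetTokens));
  // Keep max_tokens above the budget so the answer still has room.
  const max_tokens = maxTokens > budget ? maxTokens : budget + maxTokens;
  return {
    max_tokens,
    thinking: { type: 'enabled', budget_tokens: budget }
  };
}

const params = buildThinkingParams({
  enableThinking: true,
  maxTokens: 4096,
  thinkingBudgetTokens: 10000
});
console.log(params.max_tokens > params.thinking.budget_tokens); // true
```

The returned object can be spread into the request options, for example `{ model, messages, ...params }`.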

`server/routes/messages.js`

Changes to the streaming endpoints:

// POST /:conversationId/messages - streaming with thinking
router.post('/:conversationId/messages', async (req, res) => {
  // Parse settings with thinking support
  const settings = JSON.parse(conversation.settings || '{}');
  const model = conversation.model || 'claude-sonnet-4-5-20250929';
  const temperature = settings.temperature ?? 1; // ?? keeps an explicit 0
  const maxTokens = settings.maxTokens || 4096;
  const enableThinking = settings.enableThinking || false;
  const thinkingBudgetTokens = settings.thinkingBudgetTokens || 10000;

  // Build request options
  const requestOptions = {
    model,
    max_tokens: maxTokens,
    temperature,
    messages: conversationMessages
  };

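  // Add thinking if enabled
  if (enableThinking) {
    requestOptions.thinking = {
      type: 'enabled',
      budget_tokens: thinkingBudgetTokens
    };
    // The API requires max_tokens to be greater than budget_tokens
    if (requestOptions.max_tokens <= thinkingBudgetTokens) {
      requestOptions.max_tokens = thinkingBudgetTokens + maxTokens;
    }
    // Thinking is not compatible with temperature modifications
    delete requestOptions.temperature;
  }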

  // Add system prompt
  if (systemPrompt) {
    requestOptions.system = systemPrompt;
  }

  // Add tools
  if (tools.length > 0) {
    requestOptions.tools = tools;
  }

  // Create streaming response
  const stream = await anthropic.messages.stream(requestOptions);

  // Handle thinking deltas in stream
  for await (const event of stream) {
    if (event.type === 'content_block_start') {
      if (event.content_block.type === 'thinking') {
        console.log('[Messages] Thinking block started');
        res.write(`data: ${JSON.stringify({
          type: 'thinking_start',
          index: event.index
        })}\n\n`);
      }
    } else if (event.type === 'content_block_delta') {
      if (event.delta.type === 'thinking_delta') {
        fullThinkingContent += event.delta.thinking;
        res.write(`data: ${JSON.stringify({
          type: 'thinking',
          text: event.delta.thinking
        })}\n\n`);
      } else if (event.delta.type === 'text_delta') {
        fullContent += event.delta.text;
        res.write(`data: ${JSON.stringify({
          type: 'content',
          text: event.delta.text
        })}\n\n`);
      }
    }
  }
});

3.2 New Response Types

Response structure with thinking:

{
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me analyze this step by step...",
      "signature": "WaUjzkypQ2mUEVM36O2TxuC06KN8xyfbJwyem2dw3URve..."
    },
    {
      "type": "text",
      "text": "Based on my analysis..."
    }
  ]
}

Streaming events:

// Thinking start event
{
  "type": "thinking_start",
  "index": 0
}

// Thinking delta event
{
  "type": "thinking",
  "text": "Let me analyze..."
}

// Thinking end event (sent when content_block_stop closes a thinking block)
{
  "type": "thinking_stop",
  "index": 0
}

// Regular content events
{
  "type": "content",
  "text": "Based on..."
}
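
The mapping from raw API stream events to these client-facing events can be kept in one pure function, which makes it easy to test without a live stream. A minimal sketch, assuming the event shapes used elsewhere in this spec; `toClientEvent` and `toSseFrame` are hypothetical names:

```javascript
// Translates one API stream event into the client event described above,
// or returns null when nothing should be forwarded.
// `state.blockTypes` remembers each block's type so that content_block_stop
// can be turned into thinking_stop only for thinking blocks.
function toClientEvent(event, state) {
  switch (event.type) {
    case 'content_block_start':
      state.blockTypes[event.index] = event.content_block.type;
      return event.content_block.type === 'thinking'
        ? { type: 'thinking_start', index: event.index }
        : null;
    case 'content_block_delta':
      if (event.delta.type === 'thinking_delta') {
        return { type: 'thinking', text: event.delta.thinking };
      }
      if (event.delta.type === 'text_delta') {
        return { type: 'content', text: event.delta.text };
      }
      return null;
    case 'content_block_stop':
      return state.blockTypes[event.index] === 'thinking'
        ? { type: 'thinking_stop', index: event.index }
        : null;
    default:
      return null;
  }
}

// Serializes a client event as a single SSE frame.
function toSseFrame(payload) {
  return `data: ${JSON.stringify(payload)}\n\n`;
}
```

In the route, each event from the stream would go through `toClientEvent(event, state)` with `state = { blockTypes: {} }`, and non-null results would be written with `res.write(toSseFrame(result))`.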

3.3 Database

Changes to the `conversations` schema

-- Add columns to enable thinking per conversation
ALTER TABLE conversations ADD COLUMN enable_thinking INTEGER DEFAULT 0;
ALTER TABLE conversations ADD COLUMN thinking_budget_tokens INTEGER DEFAULT 10000;

Changes to the `messages` schema

-- Add columns to store thinking content
ALTER TABLE messages ADD COLUMN thinking_content TEXT DEFAULT NULL;
ALTER TABLE messages ADD COLUMN thinking_signature TEXT DEFAULT NULL;

Migration in server/db/index.js:

// Add thinking columns if they don't exist
const hasThinkingColumns = db.prepare(`
  SELECT COUNT(*) as count FROM pragma_table_info('conversations')
  WHERE name IN ('enable_thinking', 'thinking_budget_tokens')
`).get();

if (hasThinkingColumns.count < 2) {
  console.log('Adding thinking columns to conversations table...');
  db.exec(`
    ALTER TABLE conversations ADD COLUMN enable_thinking INTEGER DEFAULT 0;
    ALTER TABLE conversations ADD COLUMN thinking_budget_tokens INTEGER DEFAULT 10000;
  `);
}

const hasMessageThinking = db.prepare(`
  SELECT COUNT(*) as count FROM pragma_table_info('messages')
  WHERE name IN ('thinking_content', 'thinking_signature')
`).get();

if (hasMessageThinking.count < 2) {
  console.log('Adding thinking columns to messages table...');
  db.exec(`
    ALTER TABLE messages ADD COLUMN thinking_content TEXT DEFAULT NULL;
    ALTER TABLE messages ADD COLUMN thinking_signature TEXT DEFAULT NULL;
  `);
}
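
The count-based check above runs both ALTER statements whenever fewer than two columns exist, so a database that already has one of the two columns would fail with a duplicate-column error. A per-column variant avoids that. This is a sketch only; `missingColumnStatements` is a hypothetical helper, and the list of existing column names is assumed to come from a query such as `SELECT name FROM pragma_table_info('messages')`:

```javascript
// Builds one ALTER TABLE statement per column that is not present yet.
function missingColumnStatements(table, existingColumns, wantedColumns) {
  const existing = new Set(existingColumns);
  return wantedColumns
    .filter((col) => !existing.has(col.name))
    .map((col) => `ALTER TABLE ${table} ADD COLUMN ${col.name} ${col.definition};`);
}

const statements = missingColumnStatements(
  'messages',
  ['id', 'content', 'thinking_content'], // names already in the table
  [
    { name: 'thinking_content', definition: 'TEXT DEFAULT NULL' },
    { name: 'thinking_signature', definition: 'TEXT DEFAULT NULL' }
  ]
);
console.log(statements.length); // 1
```

Each returned statement can then be passed to `db.exec`, so re-running the migration is a no-op once all columns exist.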

4. Frontend Architecture

4.1 User Interface

New component: `ThinkingBlock`

File: src/components/ThinkingBlock.jsx

import React, { useState } from 'react';

function ThinkingBlock({ thinking, signature, isStreaming }) {
  const [isExpanded, setIsExpanded] = useState(false);

  return (
    <div className="my-4 rounded-lg border border-blue-200 bg-blue-50 dark:border-blue-800 dark:bg-blue-950">
      {/* Header with toggle */}
      <button
        onClick={() => setIsExpanded(!isExpanded)}
        className="w-full flex items-center justify-between p-3 text-left"
      >
        <div className="flex items-center gap-2">
          {/* Brain/thought icon */}
          <svg className="w-5 h-5 text-blue-600 dark:text-blue-400" fill="currentColor" viewBox="0 0 20 20">
            <path d="M10 2a8 8 0 100 16 8 8 0 000-16zm1 11H9v-2h2v2zm0-4H9V5h2v4z"/>
          </svg>
          <span className="font-medium text-blue-900 dark:text-blue-100">
            {isStreaming ? 'Thinking...' : 'Claude\'s reasoning'}
          </span>
          {isStreaming && (
            <div className="flex gap-1">
              <div className="w-2 h-2 bg-blue-500 rounded-full animate-bounce" style={{animationDelay: '0ms'}}></div>
              <div className="w-2 h-2 bg-blue-500 rounded-full animate-bounce" style={{animationDelay: '150ms'}}></div>
              <div className="w-2 h-2 bg-blue-500 rounded-full animate-bounce" style={{animationDelay: '300ms'}}></div>
            </div>
          )}
        </div>
        <svg
          className={`w-5 h-5 text-blue-600 dark:text-blue-400 transition-transform ${isExpanded ? 'rotate-180' : ''}`}
          fill="none"
          stroke="currentColor"
          viewBox="0 0 24 24"
        >
          <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M19 9l-7 7-7-7" />
        </svg>
      </button>

      {/* Thinking content (collapsible) */}
      {isExpanded && (
        <div className="px-3 pb-3 text-sm text-blue-800 dark:text-blue-200 whitespace-pre-wrap font-mono">
          {thinking || 'Thinking in progress...'}
        </div>
      )}
    </div>
  );
}

export default ThinkingBlock;

Changes in `src/App.jsx`

1. Thinking state in the Message component:

function Message({ message, isStreaming }) {
  const [thinkingContent, setThinkingContent] = useState(message.thinking_content || '');
  const [isThinkingStreaming, setIsThinkingStreaming] = useState(false);

  return (
    <div className="message">
      {/* Show the thinking block if present */}
      {thinkingContent && (
        <ThinkingBlock
          thinking={thinkingContent}
          signature={message.thinking_signature}
          isStreaming={isThinkingStreaming}
        />
      )}

      {/* Regular message content */}
      <div className="message-content">
        {message.content}
      </div>
    </div>
  );
}

2. Settings panel - add thinking controls:

function ConversationSettings({ conversation, onUpdate }) {
  const [settings, setSettings] = useState(JSON.parse(conversation.settings || '{}'));

  return (
    <div className="settings-panel">
      {/* Existing settings */}
      <div className="setting-group">
        <label>Temperature</label>
        <input type="range" ... />
      </div>

      {/* New: Extended Thinking toggle */}
      <div className="setting-group">
        <label className="flex items-center justify-between">
          <span className="flex items-center gap-2">
            <svg className="w-4 h-4" fill="currentColor" viewBox="0 0 20 20">
              <path d="M10 2a8 8 0 100 16 8 8 0 000-16zm1 11H9v-2h2v2zm0-4H9V5h2v4z"/>
            </svg>
            Extended Thinking
          </span>
          <input
            type="checkbox"
            checked={settings.enableThinking || false}
            onChange={(e) => {
              const newSettings = {
                ...settings,
                enableThinking: e.target.checked
              };
              setSettings(newSettings);
              onUpdate(newSettings);
            }}
            className="w-4 h-4"
          />
        </label>
        <p className="text-xs text-gray-500 mt-1">
          Enable enhanced reasoning for complex tasks
        </p>
      </div>

      {/* Thinking budget (shown when thinking is enabled) */}
      {settings.enableThinking && (
        <div className="setting-group">
          <label>Thinking Budget</label>
          <input
            type="range"
            min="1024"
            max="32000"
            step="1024"
            value={settings.thinkingBudgetTokens || 10000}
            onChange={(e) => {
              const newSettings = {
                ...settings,
                thinkingBudgetTokens: parseInt(e.target.value)
              };
              setSettings(newSettings);
              onUpdate(newSettings);
            }}
            className="w-full"
          />
          <div className="flex justify-between text-xs text-gray-500">
            <span>1K tokens</span>
            <span>{(settings.thinkingBudgetTokens || 10000).toLocaleString()} tokens</span>
            <span>32K tokens</span>
          </div>
          <p className="text-xs text-gray-500 mt-1">
            Higher budgets enable more thorough analysis
          </p>
        </div>
      )}
    </div>
  );
}

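3. Streaming handler - handle thinking deltas:

async function sendMessage(content) {
  // ... existing code ...

  // Note: EventSource can only issue GET requests, so this sketch assumes a GET
  // streaming endpoint. The complete example in 10.2 uses fetch() with POST.
  const eventSource = new EventSource(`${API_BASE}/conversations/${conversationId}/messages`);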

  let currentThinking = '';
  let currentContent = '';
  let isInThinkingBlock = false;

  eventSource.addEventListener('message', (event) => {
    const data = JSON.parse(event.data);

    switch (data.type) {
      case 'thinking_start':
        isInThinkingBlock = true;
        currentThinking = '';
        setIsThinkingStreaming(true);
        break;

      case 'thinking':
        currentThinking += data.text;
        // Update thinking content in real-time
        setThinkingContent(currentThinking);
        break;

      case 'thinking_stop':
        isInThinkingBlock = false;
        setIsThinkingStreaming(false);
        break;

      case 'content':
        currentContent += data.text;
        // Update message content
        setMessageContent(currentContent);
        break;

      case 'done':
        eventSource.close();
        // Save message with thinking
        saveMessage({
          content: currentContent,
          thinking_content: currentThinking,
          thinking_signature: data.thinking_signature
        });
        break;
    }
  });
}
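
When the stream is read with fetch() instead of EventSource, the client has to split the byte stream into SSE frames itself, and a frame can be cut across two network chunks. A minimal sketch of a buffering parser for the `data: {...}` frames used in this spec; `createSseParser` is a hypothetical name, and only single-line JSON `data:` fields are handled:

```javascript
// Returns a push(chunk) function that buffers decoded text and calls
// onEvent once per complete "data: ..." frame (frames end with a blank line).
function createSseParser(onEvent) {
  let buffer = '';
  return function push(chunk) {
    buffer += chunk;
    let boundary;
    while ((boundary = buffer.indexOf('\n\n')) !== -1) {
      const frame = buffer.slice(0, boundary);
      buffer = buffer.slice(boundary + 2);
      for (const line of frame.split('\n')) {
        if (line.startsWith('data: ')) {
          onEvent(JSON.parse(line.slice(6)));
        }
      }
    }
  };
}

// Usage: a frame split across two chunks is still delivered once.
const received = [];
const push = createSseParser((evt) => received.push(evt));
push('data: {"type":"thinking","te');
push('xt":"Hi"}\n\ndata: {"type":"done"}\n\n');
console.log(received.length); // 2
```

With a fetch reader, each chunk would be decoded with `decoder.decode(value, { stream: true })` and passed to `push`, and `onEvent` would run the same switch statement as the handler above.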

4.2 Visual Indicators

"Thinking Enabled" badge in the conversation list

function ConversationListItem({ conversation }) {
  const settings = JSON.parse(conversation.settings || '{}');

  return (
    <div className="conversation-item">
      <div className="conversation-title">{conversation.title}</div>

      {/* Thinking badge */}
      {settings.enableThinking && (
        <span className="inline-flex items-center gap-1 px-2 py-0.5 text-xs rounded-full bg-blue-100 text-blue-700 dark:bg-blue-900 dark:text-blue-200">
          <svg className="w-3 h-3" fill="currentColor" viewBox="0 0 20 20">
            <path d="M10 2a8 8 0 100 16 8 8 0 000-16zm1 11H9v-2h2v2zm0-4H9V5h2v4z"/>
          </svg>
          Thinking
        </span>
      )}
    </div>
  );
}

5. Streaming Handling

5.1 Event Sequence

User: "Solve this complex math problem..."

Event 1: thinking_start
  → Show thinking block with loading animation

Event 2-N: thinking deltas
  → Update thinking content incrementally
  → Show typing animation

Event N+1: thinking_stop (implicit with content_block_stop)
  → Stop thinking animation
  → Mark thinking complete

Event N+2: content_start
  → Start showing answer

Event N+3-M: content deltas
  → Stream answer text

Event M+1: done
  → Save complete message with thinking + content
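
The sequence above can be modeled as a small state reducer on the client, which keeps the UI state derivable from the event log alone. A minimal sketch under the event names used in this spec; `reduceStreamEvent` and the state shape are assumptions. Unlike a handler that resets the text on every thinking_start, this version appends, so several thinking blocks in one turn are all kept:

```javascript
const initialStreamState = { thinking: '', content: '', isThinking: false, done: false };

// Pure reducer: returns the next UI state for one client-facing stream event.
function reduceStreamEvent(state, event) {
  switch (event.type) {
    case 'thinking_start':
      return { ...state, isThinking: true };
    case 'thinking':
      return { ...state, thinking: state.thinking + event.text };
    case 'thinking_stop':
      return { ...state, isThinking: false };
    case 'content':
      return { ...state, content: state.content + event.text };
    case 'done':
      return { ...state, isThinking: false, done: true };
    default:
      return state; // unknown events leave the state untouched
  }
}

const finalState = [
  { type: 'thinking_start', index: 0 },
  { type: 'thinking', text: 'Let me ' },
  { type: 'thinking', text: 'analyze...' },
  { type: 'thinking_stop', index: 0 },
  { type: 'content', text: 'Based on...' },
  { type: 'done' }
].reduce(reduceStreamEvent, initialStreamState);
console.log(finalState.thinking); // "Let me analyze..."
```

In a React component the same function could be passed to `useReducer`, with each parsed event dispatched as an action.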

5.2 Error Handling

Thinking timeout:

const THINKING_TIMEOUT = 120000; // 2 minutes

let thinkingTimeout = setTimeout(() => {
  console.warn('[Thinking] Timeout reached');
  res.write(`data: ${JSON.stringify({
    type: 'thinking_timeout',
    message: 'Thinking process is taking longer than expected...'
  })}\n\n`);
}, THINKING_TIMEOUT);

// Clear the timeout when the thinking block completes. This check belongs
// inside the `for await (const event of stream)` loop.
if (event.type === 'content_block_stop') {
  clearTimeout(thinkingTimeout);
}

Redacted thinking blocks:

if (event.content_block.type === 'redacted_thinking') {
  console.log('[Thinking] Redacted thinking block detected');
  res.write(`data: ${JSON.stringify({
    type: 'thinking_redacted',
    message: 'Some reasoning has been encrypted for safety'
  })}\n\n`);
}

6. Tool Compatibility

6.1 Preserving Thinking Blocks

Important: when tools are used together with thinking, the thinking blocks must be passed back to the API unmodified:

// When Claude uses a tool
if (finalMessage.stop_reason === 'tool_use') {
  // Extract all thinking AND tool_use blocks
  const thinkingBlocks = finalMessage.content.filter(b =>
    b.type === 'thinking' || b.type === 'redacted_thinking'
  );
  const toolUseBlocks = finalMessage.content.filter(b =>
    b.type === 'tool_use'
  );

  // Add them to the conversation
  conversationMessages.push({
    role: 'assistant',
    content: [...thinkingBlocks, ...toolUseBlocks]
  });

  // Execute the tools
  const toolResults = await processToolCalls(toolUseBlocks);

  // Continue with the results
  conversationMessages.push({
    role: 'user',
    content: toolResults
  });
}

6.2 Interleaved Thinking (Beta)

To enable thinking between tool calls:

// Add the beta header
const response = await anthropic.messages.create({
  model: 'claude-opus-4-5',
  max_tokens: 21000, // required by the Messages API; illustrative value
  thinking: { type: 'enabled', budget_tokens: 20000 },
  tools: memoryTools,
  messages: conversationMessages
}, {
  headers: {
    'anthropic-beta': 'interleaved-thinking-2025-05-14'
  }
});

7. Pricing & Token Management

7.1 Billing

Summary (Claude 4+):

Input tokens: tokens in the request (excluding previous thinking)
Output tokens (billed): the original, full thinking tokens
Output tokens (visible): the summarized thinking tokens shown in the response
No charge: tokens used to generate the summary

Important: the number of billed tokens ≠ the number of tokens visible in the response.
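
The gap between billed and visible tokens can be made explicit in the usage data shown to the user. A minimal sketch with made-up numbers; `summarizeThinkingUsage` is a hypothetical helper, the billed figure is assumed to come from `usage.output_tokens`, and the visible counts are assumed to be estimates computed by the app:

```javascript
// Compares what was billed with what the response actually displays.
function summarizeThinkingUsage({ billedOutputTokens, visibleThinkingTokens, visibleTextTokens }) {
  const visibleOutputTokens = visibleThinkingTokens + visibleTextTokens;
  return {
    billedOutputTokens,
    visibleOutputTokens,
    // Tokens that were billed but are not shown (full thinking minus summary).
    hiddenTokens: Math.max(0, billedOutputTokens - visibleOutputTokens)
  };
}

// Illustrative values only: 8,500 billed, 1,200 summarized thinking, 500 answer.
const usageSummary = summarizeThinkingUsage({
  billedOutputTokens: 8500,
  visibleThinkingTokens: 1200,
  visibleTextTokens: 500
});
console.log(usageSummary.hiddenTokens); // 6800
```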

7.2 Token Tracking

Backend - detailed logging:

// After a response with thinking
// (calculateTokens is a token-estimation helper not defined in this spec)
console.log('[Thinking Tokens]', {
  input_tokens: response.usage.input_tokens,
  output_tokens: response.usage.output_tokens, // Includes the full thinking
  visible_thinking_tokens: calculateTokens(thinkingContent), // Summarized thinking
  text_output_tokens: calculateTokens(textContent)
});

// Save to usage_tracking
db.prepare(`
  INSERT INTO usage_tracking (
    id, user_id, conversation_id, message_id, model,
    input_tokens, output_tokens, thinking_tokens, created_at
  ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
`).run(
  uuidv4(), 'default', conversationId, messageId, model,
  response.usage.input_tokens,
  response.usage.output_tokens,
  calculateTokens(thinkingContent), // For tracking
  new Date().toISOString()
);

Frontend - display in usage stats:

function UsageStats({ conversation }) {
  return (
    <div className="usage-stats">
      <div className="stat">
        <label>Total Tokens</label>
        <span>{conversation.token_count.toLocaleString()}</span>
      </div>

      {conversation.thinking_tokens > 0 && (
        <>
          <div className="stat text-blue-600">
            <label>Thinking Tokens</label>
            <span>{conversation.thinking_tokens.toLocaleString()}</span>
          </div>
          <p className="text-xs text-gray-500">
            Thinking tokens are summarized but billed at full rate
          </p>
        </>
      )}
    </div>
  );
}

8. Best Practices

8.1 When to enable thinking

Recommended for:

✅ Complex math problems
✅ Code analysis and debugging
✅ Multi-step logical reasoning
✅ In-depth document analysis
✅ Tasks that require planning

Not needed for:

❌ Simple questions
❌ Creative tasks (writing, brainstorming)
❌ Short conversations
❌ Quick answers

8.2 Budget Recommendations

Task type	Recommended budget
Simple calculations	1,024 - 4,096 tokens
Standard analysis	4,096 - 10,000 tokens
Complex problems	10,000 - 16,000 tokens
Very complex tasks	16,000 - 32,000 tokens
In-depth research	32,000+ tokens (batch)

Note: above 32K tokens, use batch processing to avoid timeouts.
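
The table can be encoded as a lookup so that a requested budget is clamped to the range recommended for the task. This is a sketch only: the task-type keys and `recommendBudget` are hypothetical names, and the ranges are copied from the table above:

```javascript
// Recommended [min, max] thinking budgets per task type, from the table above.
const BUDGET_RANGES = {
  simple: [1024, 4096],
  standard: [4096, 10000],
  complex: [10000, 16000],
  veryComplex: [16000, 32000],
  deepResearch: [32000, Infinity]
};
const BATCH_THRESHOLD = 32000; // above this, batch processing is suggested

// Clamps the requested budget into the recommended range for the task type.
function recommendBudget(taskType, requestedTokens) {
  const range = BUDGET_RANGES[taskType];
  if (!range) {
    throw new Error(`Unknown task type: ${taskType}`);
  }
  const [min, max] = range;
  const budget = Math.min(max, Math.max(min, requestedTokens));
  return { budget, useBatch: budget > BATCH_THRESHOLD };
}

console.log(recommendBudget('standard', 20000)); // { budget: 10000, useBatch: false }
```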

8.3 UI/UX Guidelines

Visibility: thinking blocks should be collapsible and collapsed by default
Feedback: show an animation while thinking is streaming
Transparency: clearly indicate when thinking is active
Performance: thinking can increase response time by 2-5x
Settings: allow enabling/disabling per conversation

9. Implementation Plan

Phase 1: Backend Core (2-3h)

Update server/routes/claude.js to support the thinking parameter
Update server/routes/messages.js to stream thinking
Add DB columns for thinking storage
Database migration
API tests with thinking enabled

Phase 2: Frontend UI (3-4h)

Create the ThinkingBlock.jsx component
Integrate thinking display into messages
Add the thinking toggle to settings
Add the thinking budget slider
Visual and UX tests

Phase 3: Streaming & Real-time (2-3h)

Implement thinking_delta handling
Streaming animations
Timeout handling
Error handling for redacted thinking
Streaming tests

Phase 4: Tools Integration (2h)

Preserve thinking blocks with tools
Test thinking + memory tools
Test thinking + other future tools

Phase 5: Polish & Optimization (2h)

Token tracking and logging
Usage analytics for thinking
User documentation
Performance optimization
End-to-end tests

Phase 6: Testing & Deployment (1-2h)

Tests with different models
Tests with different budgets
Edge-case tests (redacted, timeouts)
Commit and push
Final documentation

Total estimated time: 12-16 hours

10. Complete Code Examples

10.1 Complete Backend Example

// server/routes/messages.js - POST /:conversationId/messages

router.post('/:conversationId/messages', async (req, res) => {
  const db = getDatabase();
  const { conversationId } = req.params;
  const { content, images } = req.body;

  // Validate conversation exists
  const conversation = db.prepare('SELECT * FROM conversations WHERE id = ? AND is_deleted = 0')
    .get(conversationId);

  if (!conversation) {
    return res.status(404).json({ error: { message: 'Conversation not found', status: 404 } });
  }

  // Set up SSE headers
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');

  // Parse settings with thinking support
  const settings = JSON.parse(conversation.settings || '{}');
  const model = conversation.model || 'claude-sonnet-4-5-20250929';
  const temperature = settings.temperature ?? 1; // ?? keeps an explicit 0
  const maxTokens = settings.maxTokens || 4096;
  const enableThinking = settings.enableThinking || false;
  const thinkingBudgetTokens = settings.thinkingBudgetTokens || 10000;
  const enableMemoryTools = true;

  // Save user message
  const userMessageId = uuidv4();
  const now = new Date().toISOString();

  db.prepare(`
    INSERT INTO messages (id, conversation_id, role, content, created_at, images)
    VALUES (?, ?, ?, ?, ?, ?)
  `).run(userMessageId, conversationId, 'user', content, now, JSON.stringify(images || []));

  // Get conversation history
  const dbMessages = db.prepare(`
    SELECT role, content, images FROM messages
    WHERE conversation_id = ?
    ORDER BY created_at ASC
  `).all(conversationId);

  // Format messages for Claude API
  const apiMessages = dbMessages.map(m => ({
    role: m.role,
    content: m.content
  }));

  // Get tools and system prompt
  const tools = enableMemoryTools ? getMemoryTools() : [];
  const systemPrompt = buildSystemPrompt(
    getGlobalCustomInstructions(),
    getProjectCustomInstructions(conversation.project_id),
    enableMemoryTools
  );

  // Tracking variables
  const assistantMessageId = uuidv4();
  let fullThinkingContent = '';
  let thinkingSignature = '';
  let fullContent = '';
  let totalInputTokens = 0;
  let totalOutputTokens = 0;

  try {
    // Build request options
    const requestOptions = {
      model,
      max_tokens: maxTokens,
      temperature,
      messages: apiMessages
    };

    if (systemPrompt) {
      requestOptions.system = systemPrompt;
    }

    if (tools.length > 0) {
      requestOptions.tools = tools;
    }

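    // Add thinking if enabled
    if (enableThinking) {
      requestOptions.thinking = {
        type: 'enabled',
        budget_tokens: thinkingBudgetTokens
      };
      // The API requires max_tokens to be greater than budget_tokens
      if (requestOptions.max_tokens <= thinkingBudgetTokens) {
        requestOptions.max_tokens = thinkingBudgetTokens + maxTokens;
      }
      // Thinking is not compatible with temperature modifications
      delete requestOptions.temperature;
      console.log(`[Messages] Extended thinking enabled with budget: ${thinkingBudgetTokens}`);
    }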

    // Create streaming response
    const stream = await anthropic.messages.stream(requestOptions);

    let isInThinkingBlock = false;
    let currentBlockIndex = -1;

    // Stream events to client
    for await (const event of stream) {
      if (event.type === 'content_block_start') {
        currentBlockIndex = event.index;

        if (event.content_block.type === 'thinking') {
          isInThinkingBlock = true;
          console.log('[Messages] Thinking block started');
          res.write(`data: ${JSON.stringify({
            type: 'thinking_start',
            index: currentBlockIndex
          })}\n\n`);
        } else if (event.content_block.type === 'tool_use') {
          console.log(`[Messages] Tool use requested: ${event.content_block.name}`);
          res.write(`data: ${JSON.stringify({
            type: 'tool_use',
            tool: event.content_block.name,
            id: event.content_block.id
          })}\n\n`);
        }
      } else if (event.type === 'content_block_delta') {
        if (event.delta.type === 'thinking_delta') {
          fullThinkingContent += event.delta.thinking;
          res.write(`data: ${JSON.stringify({
            type: 'thinking',
            text: event.delta.thinking
          })}\n\n`);
        } else if (event.delta.type === 'text_delta') {
          fullContent += event.delta.text;
          res.write(`data: ${JSON.stringify({
            type: 'content',
            text: event.delta.text
          })}\n\n`);
        } else if (event.delta.type === 'signature_delta') {
          thinkingSignature += event.delta.signature;
        }
      } else if (event.type === 'content_block_stop') {
        if (isInThinkingBlock) {
          isInThinkingBlock = false;
          console.log('[Messages] Thinking block completed');
          res.write(`data: ${JSON.stringify({
            type: 'thinking_stop',
            index: currentBlockIndex
          })}\n\n`);
        }
      } else if (event.type === 'message_delta') {
        if (event.usage) {
          totalInputTokens += event.usage.input_tokens || 0;
          totalOutputTokens += event.usage.output_tokens || 0;
        }
      }
    }

    // Get final message
    const finalMessage = await stream.finalMessage();
    totalInputTokens = finalMessage.usage?.input_tokens || totalInputTokens;
    totalOutputTokens = finalMessage.usage?.output_tokens || totalOutputTokens;

    // Save assistant message with thinking
    const assistantNow = new Date().toISOString();
    db.prepare(`
      INSERT INTO messages (
        id, conversation_id, role, content,
        thinking_content, thinking_signature,
        created_at, tokens, finish_reason
      ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
    `).run(
      assistantMessageId, conversationId, 'assistant', fullContent,
      fullThinkingContent || null, thinkingSignature || null,
      assistantNow, totalOutputTokens, finalMessage.stop_reason
    );

    // Update conversation
    db.prepare(`
      UPDATE conversations
      SET last_message_at = ?, updated_at = ?,
          message_count = message_count + 2,
          token_count = token_count + ?
      WHERE id = ?
    `).run(assistantNow, assistantNow, totalInputTokens + totalOutputTokens, conversationId);

    // Track usage
    db.prepare(`
      INSERT INTO usage_tracking (
        id, user_id, conversation_id, message_id, model,
        input_tokens, output_tokens, created_at
      ) VALUES (?, ?, ?, ?, ?, ?, ?, ?)
    `).run(
      uuidv4(), 'default', conversationId, assistantMessageId, model,
      totalInputTokens, totalOutputTokens, assistantNow
    );

    // Send done event
    res.write(`data: ${JSON.stringify({
      type: 'done',
      id: assistantMessageId,
      model: finalMessage.model,
      stopReason: finalMessage.stop_reason,
      usage: {
        inputTokens: totalInputTokens,
        outputTokens: totalOutputTokens
      },
      thinkingTokens: fullThinkingContent.length > 0 ?
        Math.ceil(fullThinkingContent.length / 4) : 0
    })}\n\n`);

    res.end();

  } catch (error) {
    console.error('Claude API stream error:', error);
    res.write(`data: ${JSON.stringify({
      type: 'error',
      message: error.message
    })}\n\n`);
    res.end();
  }
});

10.2 Complete Frontend Example

// src/App.jsx - Message Component with Thinking

function Message({ message, isStreaming }) {
  const [thinkingContent, setThinkingContent] = useState(message.thinking_content || '');
  const [isThinkingExpanded, setIsThinkingExpanded] = useState(false);
  const [isThinkingStreaming, setIsThinkingStreaming] = useState(false);

  return (
    <div className={`message ${message.role === 'assistant' ? 'assistant' : 'user'}`}>
      {/* Thinking block (if present) */}
      {thinkingContent && message.role === 'assistant' && (
        <div className="my-4 rounded-lg border border-blue-200 bg-blue-50 dark:border-blue-800 dark:bg-blue-950">
          {/* Header */}
          <button
            onClick={() => setIsThinkingExpanded(!isThinkingExpanded)}
            className="w-full flex items-center justify-between p-3 text-left hover:bg-blue-100 dark:hover:bg-blue-900 transition-colors"
          >
            <div className="flex items-center gap-2">
              {/* Brain Icon */}
              <svg
                className="w-5 h-5 text-blue-600 dark:text-blue-400"
                fill="currentColor"
                viewBox="0 0 20 20"
              >
                <path d="M10 2a6 6 0 00-6 6v3.586l-.707.707A1 1 0 004 14h12a1 1 0 00.707-1.707L16 11.586V8a6 6 0 00-6-6zM10 18a3 3 0 01-3-3h6a3 3 0 01-3 3z"/>
              </svg>
              <span className="font-medium text-blue-900 dark:text-blue-100">
                {isThinkingStreaming ? 'Claude is thinking...' : 'Claude\'s reasoning process'}
              </span>

              {/* Loading dots while streaming */}
              {isThinkingStreaming && (
                <div className="flex gap-1 ml-2">
                  <div
                    className="w-2 h-2 bg-blue-500 rounded-full animate-bounce"
                    style={{animationDelay: '0ms'}}
                  />
                  <div
                    className="w-2 h-2 bg-blue-500 rounded-full animate-bounce"
                    style={{animationDelay: '150ms'}}
                  />
                  <div
                    className="w-2 h-2 bg-blue-500 rounded-full animate-bounce"
                    style={{animationDelay: '300ms'}}
                  />
                </div>
              )}
            </div>

            {/* Chevron */}
            <svg
              className={`w-5 h-5 text-blue-600 dark:text-blue-400 transition-transform duration-200 ${
                isThinkingExpanded ? 'rotate-180' : ''
              }`}
              fill="none"
              stroke="currentColor"
              viewBox="0 0 24 24"
            >
              <path
                strokeLinecap="round"
                strokeLinejoin="round"
                strokeWidth={2}
                d="M19 9l-7 7-7-7"
              />
            </svg>
          </button>

          {/* Thinking Content (collapsible) */}
          {isThinkingExpanded && (
            <div className="px-3 pb-3 border-t border-blue-200 dark:border-blue-800">
              <div className="pt-3 text-sm text-blue-800 dark:text-blue-200 whitespace-pre-wrap font-mono leading-relaxed">
                {thinkingContent || (
                  <div className="italic text-blue-600 dark:text-blue-400">
                    Thinking in progress...
                  </div>
                )}
              </div>

              {/* Stats footer */}
              <div className="mt-3 pt-2 border-t border-blue-200 dark:border-blue-800 flex items-center justify-between text-xs text-blue-600 dark:text-blue-400">
                <span>
                  ~{Math.ceil((thinkingContent?.length ?? 0) / 4)} tokens
                </span>
                <span className="italic">
                  Summarized for display
                </span>
              </div>
            </div>
          )}
        </div>
      )}

      {/* Message Content */}
      <div className="message-content prose dark:prose-invert max-w-none">
        <ReactMarkdown>{message.content}</ReactMarkdown>
      </div>
    </div>
  );
}

// Streaming handler with thinking support
async function sendMessage(conversationId, content) {
  const response = await fetch(`${API_BASE}/conversations/${conversationId}/messages`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ content })
  });

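  // Fail early on HTTP errors instead of trying to parse an error page as SSE
  if (!response.ok || !response.body) {
    throw new Error(`Request failed with status ${response.status}`);
  }

  const reader = response.body.getReader();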
  const decoder = new TextDecoder();

  let currentThinking = '';
  let currentContent = '';
  let isInThinkingBlock = false;
  let messageId = null;

  // Create temporary message
  const tempMessage = {
    id: 'temp-' + Date.now(),
    role: 'assistant',
    content: '',
    thinking_content: '',
    isStreaming: true
  };

  setMessages(prev => [...prev, tempMessage]);

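  let buffer = ''; // holds a partial SSE line carried over between chunks

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // stream: true keeps multi-byte characters intact across chunk boundaries
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop(); // the last element may be an incomplete line

    for (const line of lines) {
      if (!line.startsWith('data: ')) continue;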

      try {
        const data = JSON.parse(line.slice(6));

        switch (data.type) {
          case 'thinking_start':
            isInThinkingBlock = true;
            setMessages(prev => prev.map(m =>
              m.id === tempMessage.id
                ? { ...m, isThinkingStreaming: true }
                : m
            ));
            break;

          case 'thinking':
            currentThinking += data.text;
            setMessages(prev => prev.map(m =>
              m.id === tempMessage.id
                ? { ...m, thinking_content: currentThinking }
                : m
            ));
            break;

          case 'thinking_stop':
            isInThinkingBlock = false;
            setMessages(prev => prev.map(m =>
              m.id === tempMessage.id
                ? { ...m, isThinkingStreaming: false }
                : m
            ));
            break;

          case 'content':
            currentContent += data.text;
            setMessages(prev => prev.map(m =>
              m.id === tempMessage.id
                ? { ...m, content: currentContent }
                : m
            ));
            break;

          case 'done':
            messageId = data.id;
            // Update with final message
            setMessages(prev => prev.map(m =>
              m.id === tempMessage.id
                ? {
                    id: messageId,
                    role: 'assistant',
                    content: currentContent,
                    thinking_content: currentThinking,
                    isStreaming: false,
                    isThinkingStreaming: false,
                    usage: data.usage
                  }
                : m
            ));
            break;

          case 'error':
            console.error('Streaming error:', data.message);
            setMessages(prev => prev.filter(m => m.id !== tempMessage.id));
            alert('Error: ' + data.message);
            break;
        }
      } catch (e) {
        console.error('Error parsing SSE data:', e);
      }
    }
  }
}
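
Network chunks do not necessarily end on SSE line boundaries, so a handler like the one above is easier to test when the line buffering lives in a small helper. The following is a minimal sketch under that assumption; `createSSEParser` and `feed` are hypothetical names, not part of the existing codebase, and the `data: ` prefix handling mirrors the handler above.

```javascript
// Hypothetical helper: turns raw SSE text into parsed `data:` payloads,
// carrying any trailing partial line over to the next call.
function createSSEParser() {
  let buffer = '';

  return function feed(text) {
    buffer += text;
    const lines = buffer.split('\n');
    buffer = lines.pop(); // keep the incomplete last line for the next chunk

    const events = [];
    for (const rawLine of lines) {
      const line = rawLine.replace(/\r$/, ''); // tolerate CRLF line endings
      if (!line.startsWith('data: ')) continue;
      try {
        events.push(JSON.parse(line.slice(6)));
      } catch (e) {
        // Malformed payload: skip it rather than abort the whole stream
      }
    }
    return events;
  };
}
```

Inside the read loop, each decoded chunk would be passed to `feed`, and the returned events handled by the existing `switch (data.type)` logic.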

11. Testing Checklist

11.1 Functional Tests

Thinking enabled for a conversation → thinking blocks appear
Thinking disabled → no thinking blocks
Thinking streams in real time
Thinking toggle in settings works
Budget slider works (1K-32K)
Thinking blocks are collapsible
Thinking blocks persist after a page refresh
Thinking and memory tools work together
Multiple thinking blocks in a single response
Redacted thinking is handled correctly
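
Several of the checks above (real-time streaming, multiple thinking deltas, final message state) can be exercised without a browser if the event handling is written as a pure function. This is a minimal sketch, assuming the event types used by the streaming handler (`thinking_start`, `thinking`, `thinking_stop`, `content`, `done`); `applyStreamEvent` is a hypothetical name introduced here for illustration.

```javascript
// Hypothetical pure reducer: returns the next message state for one stream event.
function applyStreamEvent(message, event) {
  switch (event.type) {
    case 'thinking_start':
      return { ...message, isThinkingStreaming: true };
    case 'thinking':
      return {
        ...message,
        thinking_content: (message.thinking_content || '') + event.text
      };
    case 'thinking_stop':
      return { ...message, isThinkingStreaming: false };
    case 'content':
      return { ...message, content: (message.content || '') + event.text };
    case 'done':
      return {
        ...message,
        id: event.id,
        isStreaming: false,
        isThinkingStreaming: false,
        usage: event.usage
      };
    default:
      return message; // unknown events leave the message untouched
  }
}

// Usage: replay a recorded event sequence against an empty assistant message.
const replayed = [
  { type: 'thinking_start' },
  { type: 'thinking', text: 'Step 1. ' },
  { type: 'thinking', text: 'Step 2.' },
  { type: 'thinking_stop' },
  { type: 'content', text: 'Answer' },
  { type: 'done', id: 42, usage: { output_tokens: 3 } }
].reduce(applyStreamEvent, { id: 'temp-1', role: 'assistant', isStreaming: true });
```

Because the function has no side effects, the same recorded sequence can be reused in a unit test and in the React state updater.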

11.2 Edge Case Tests

Thinking timeout (>2 min) is handled gracefully
Network errors during the thinking stream
Thinking with a very large budget (>32K)
Thinking with a small budget (1K)
Conversation with 100+ messages and thinking
Regenerate with thinking enabled
Edit a message with thinking
Export a conversation with thinking
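
The two budget edge cases above suggest normalizing the requested budget before it reaches the API. A minimal sketch, assuming a 1,024-token floor, a 32,000-token ceiling for the "1K-32K" slider range, and the 10K default mentioned in the user guide; the exact constants and the names `THINKING_BUDGET` and `clampThinkingBudget` are assumptions, not values taken from the existing code.

```javascript
// Assumed limits: adjust to whatever the settings UI and the API actually allow.
const THINKING_BUDGET = { min: 1024, max: 32000, default: 10000 };

// Hypothetical helper: coerces any user input into a usable integer budget.
function clampThinkingBudget(requested, limits = THINKING_BUDGET) {
  const n = Math.floor(Number(requested));
  if (!Number.isFinite(n)) return limits.default; // e.g. undefined or 'abc'
  return Math.min(limits.max, Math.max(limits.min, n));
}
```

With these assumed limits, a request for 50,000 tokens is reduced to 32,000 and a request for 500 is raised to 1,024, so the "very large" and "small" budget tests both have a well-defined expected value.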

11.3 Performance Tests

Response time with thinking vs. without thinking
Memory usage while streaming thinking
Database performance with thinking storage
UI stays responsive during thinking
Multiple simultaneous conversations with thinking

12. User Documentation

Quick Guide

What is Extended Thinking?

Extended Thinking lets Claude "show its work" by exposing its step-by-step reasoning before giving its final answer. It is particularly useful for:

Complex math problems
In-depth code analysis
Multi-step logical reasoning
Planning complex tasks

How do I enable it?

Open the conversation settings (⚙️ icon)
Enable "Extended Thinking"
Adjust the budget if needed (10K by default)
Start chatting

Reading the thinking blocks

🧠 Thinking blocks (blue): Claude's reasoning process
Click to expand or collapse
The displayed content is summarized, but the full thinking tokens are billed
Can increase response time by 2-5x

When should I use it?

✅ YES: Calculations, code, analysis, complex logic
❌ NO: Simple questions, quick chat, creative writing

13. Important Notes

13.1 Limitations

Incompatibilities:
- ❌ Not compatible with a custom temperature or top_k
- ❌ No pre-filled responses with thinking
- ❌ No forced tool use (tool_choice: "any" or a specific tool)
- ✅ Compatible with top_p values between 0.95 and 1
Context Window:
- Previous thinking blocks are removed automatically
- The thinking token budget counts toward max_tokens
- Formula: context = current_input + (thinking + encrypted + output)
Caching:
- Changing the thinking parameters invalidates the message cache
- The system prompt stays cached
- Thinking blocks count as input tokens when read from the cache
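
The incompatibilities listed above can be checked on the server before a request is sent. This is a minimal sketch, assuming the request object uses the Messages API field names (`thinking`, `max_tokens`, `temperature`, `top_k`, `top_p`, `tool_choice`, `messages`); `validateThinkingRequest` is a hypothetical helper, and treating a trailing assistant message as a pre-fill is an assumption of this sketch.

```javascript
// Hypothetical validator: returns a list of human-readable problems (empty if none).
function validateThinkingRequest(params) {
  const errors = [];
  const thinking = params.thinking;
  if (!thinking || thinking.type !== 'enabled') return errors; // nothing to check

  if (params.temperature !== undefined && params.temperature !== 1) {
    errors.push('temperature cannot be customized when thinking is enabled');
  }
  if (params.top_k !== undefined) {
    errors.push('top_k cannot be set when thinking is enabled');
  }
  if (params.top_p !== undefined && (params.top_p < 0.95 || params.top_p > 1)) {
    errors.push('top_p must be between 0.95 and 1 when thinking is enabled');
  }

  const choice = params.tool_choice && params.tool_choice.type;
  if (choice === 'any' || choice === 'tool') {
    errors.push('forced tool use is not supported with thinking');
  }

  // Assumption: a conversation ending on an assistant turn is a pre-fill.
  const messages = params.messages || [];
  const last = messages[messages.length - 1];
  if (last && last.role === 'assistant') {
    errors.push('pre-filled responses are not supported with thinking');
  }

  // The thinking budget counts toward max_tokens, so it must leave room for output.
  if (thinking.budget_tokens >= params.max_tokens) {
    errors.push('budget_tokens must be lower than max_tokens');
  }
  return errors;
}
```

Returning a list rather than throwing on the first problem lets the settings UI show every conflicting option at once.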

13.2 Model-Specific Notes

Claude Opus 4.5 (unique behavior):

Preserves thinking blocks by default
Better cache optimization
Token savings on multi-turn conversations

Claude 3.7 (deprecated):

Returns the FULL thinking output (not summarized)
Visible tokens = billed tokens
Migration to Claude 4+ is recommended

Appendices

A. Complete file structure

generations/my_project/
├── server/
│   ├── routes/
│   │   ├── claude.js          # Modified: thinking support
│   │   └── messages.js         # Modified: thinking streaming
│   ├── db/
│   │   └── index.js            # Modified: thinking columns migration
│   └── config/
│       └── thinkingDefaults.js # New: thinking configuration
├── src/
│   ├── components/
│   │   └── ThinkingBlock.jsx   # New: thinking component
│   ├── App.jsx                 # Modified: thinking UI integration
│   └── utils/
│       └── thinkingHelpers.js  # New: thinking helpers
└── prompts/
    └── extended_thinking_spec.md # This spec

B. Environment variables

No new variables are required. Extended Thinking works with the existing Anthropic credentials.

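C. Browser compatibility

Extended Thinking streaming uses Server-Sent Events (SSE), read through fetch and ReadableStream rather than EventSource because the request is a POST. This is supported by:

✅ Chrome/Edge 79+
✅ Firefox 65+
✅ Safari 13+
❌ IE11 (not supported)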
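
Rather than relying on version numbers alone, the client can detect at runtime whether the APIs used by the streaming handler (`fetch`, `ReadableStream`, `TextDecoder`) are present. A minimal sketch; `supportsStreamingFetch` is a hypothetical helper, and what to do when it returns `false` (for example, a non-streaming request) is left open here.

```javascript
// Hypothetical feature check for the APIs the streaming handler depends on.
// `env` defaults to the global object so the function can be tested with fakes.
function supportsStreamingFetch(env = globalThis) {
  return typeof env.fetch === 'function'
    && typeof env.ReadableStream === 'function'
    && typeof env.TextDecoder === 'function';
}
```

The check could run once at startup so that the thinking toggle is hidden or disabled in browsers that cannot stream the response.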

End of the Extended Thinking specification

Version: 1.0 Date: 2025-12-18 Author: Claude Sonnet 4.5
