diff --git a/MODIFICATIONS_BACKUP_SYSTEM.md b/MODIFICATIONS_BACKUP_SYSTEM.md new file mode 100644 index 0000000..6302631 --- /dev/null +++ b/MODIFICATIONS_BACKUP_SYSTEM.md @@ -0,0 +1,318 @@ +# Modifications du système de backup des conversations + +**Date** : 2025-12-20 +**Objectif** : Utiliser `append_to_conversation` au lieu de `addThought` pour avoir des embeddings complets par message + +--- + +## Problème identifié + +### Ancien système (conversationBackup.js) +```javascript +// ❌ Tronquait chaque message à 200 chars +const preview = msg.content.substring(0, 200); + +// ❌ Utilisait addThought() qui crée UN SEUL document +await addThought(summary, context); +``` + +**Résultat** : +- Messages tronqués à 200 caractères +- Un seul document pour toute la conversation +- Perte massive d'information +- Modèle BAAI/bge-m3 (8192 tokens) sous-utilisé + +--- + +## Nouveau système + +### 1. memoryService_updated.js + +**Changements** : +- `{role, content}` → `{author, content, timestamp, thinking}` +- Ajout de `options.participants` (requis pour création) +- Ajout de `options.context` (requis pour création) + +```javascript +export async function appendToConversation(conversationId, newMessages, options = {}) { + // newMessages: [{author, content, timestamp, thinking}, ...] + // options.participants: ["user", "assistant"] + // options.context: {category, tags, summary, date, ...} + + const args = { + conversation_id: conversationId, + new_messages: newMessages + }; + + if (options.participants) { + args.participants = options.participants; + } + + if (options.context) { + args.context = options.context; + } + + const response = await callMCPTool('append_to_conversation', args); +} +``` + +### 2. conversationBackup_updated.js + +**Changements** : + +#### Avant (addThought) : +```javascript +// ❌ Tronqué +messages.forEach((msg) => { + const preview = msg.content.substring(0, 200); + summary += `[${msg.role}]: ${preview}...\n\n`; +}); + +await addThought(summary, {...}); +``` + +#### Après (appendToConversation) : +```javascript +// ✅ Messages COMPLETS +const formattedMessages = messages.map(msg => ({ + author: msg.role, + content: msg.content, // PAS DE TRUNCATION ! + timestamp: msg.created_at, + thinking: msg.thinking_content // Support Extended Thinking +})); + +await appendToConversation( + conversationId, + formattedMessages, // Tous les messages complets + { + participants: ['user', 'assistant'], + context: { + category, + tags, + summary, + date, + title, + key_insights: [] + } + } +); +``` + +--- + +## Architecture ChromaDB + +### Ce que append_to_conversation fait dans mcp_ikario_memory.py : + +```python +# 1. Document PRINCIPAL : conversation complète (contexte global) +conversations.add( + documents=[full_conversation_text], # Texte complet + metadatas=[main_metadata], + ids=[conversation_id] +) + +# 2. Documents INDIVIDUELS : chaque message séparément +for msg in messages: + conversations.add( + documents=[msg_content], # Message COMPLET (8192 tokens max) + metadatas=[msg_metadata], + ids=[f"{conversation_id}_msg_{i}"] + ) +``` + +### Résultat : +- 1 conversation de 31 messages = **32 documents ChromaDB** : + - 1 document principal (vue d'ensemble) + - 31 documents individuels (granularité message par message) +- Chaque message a son **embedding complet** (jusqu'à 8192 tokens avec BAAI/bge-m3) +- Recherche sémantique précise par message + +--- + +## Avantages + +### 1. Couverture complète +| Taille message | Ancien système | Nouveau système | +|----------------|----------------|-----------------| +| 200 chars | 100% | 100% | +| 1,000 chars | 20% | 100% | +| 5,000 chars | 4% | 100% | +| 10,000 chars | 2% | 100% | + +### 2. Recherche sémantique précise +- Une conversation longue avec plusieurs sujets → plusieurs embeddings pertinents +- Recherche "concept X" trouve exactement le message qui en parle +- Pas de noyade dans un résumé global + +### 3. Support Extended Thinking +- Le champ `thinking_content` est préservé +- Inclus dans les embeddings pour enrichir la sémantique +- Visible dans les métadonnées + +### 4. Idempotence +- `append_to_conversation` auto-détecte si la conversation existe +- Si nouvelle → crée avec `add_conversation` +- Si existe → ajoute seulement nouveaux messages +- Pas d'erreur si on re-backup + +--- + +## Fichiers créés + +### 1. `/server/services/memoryService_updated.js` +- Version mise à jour de `appendToConversation()` +- Accepte `participants` et `context` +- Utilise `{author, content, timestamp, thinking}` + +### 2. `/server/services/conversationBackup_updated.js` +- Remplace `addThought()` par `appendToConversation()` +- Envoie tous les messages COMPLETS +- Support Extended Thinking +- Logs détaillés + +### 3. `/test_backup_conversation.js` +- Script de test standalone +- Backup manuel d'une conversation +- Affiche statistiques et couverture +- Vérification des résultats + +--- + +## Test du nouveau système + +### Étape 1 : Lancer le serveur my_project + +```bash +cd C:/GitHub/Linear_coding/generations/my_project/server +npm start +``` + +### Étape 2 : Lancer le serveur MCP Ikario RAG + +```bash +cd C:/Users/david/SynologyDrive/ikario/ikario_rag +python -m mcp_server +``` + +### Étape 3 : Tester le backup + +```bash +cd C:/GitHub/Linear_coding/generations/my_project +node test_backup_conversation.js +``` + +### Résultat attendu : + +``` +TESTING BACKUP FOR: "test tes mémoires" +ID: 37fe0a0c-475c-4048-8433-adb40217dce7 +Messages: 31 +================================================================================= + +Message breakdown: + 1. user: 45 chars + 2. assistant: 1234 chars + 3. user: 67 chars + ... + 31. assistant: 890 chars + +Total: 12,345 chars (~2,469 words) + +Embedding coverage estimation: + OLD (all-MiniLM-L6-v2, 256 tokens): 8.3% + NEW (BAAI/bge-m3, 8192 tokens): 100.0% + Improvement: +91.7% + +Starting backup... + +SUCCESS! Conversation backed up to Ikario RAG + +What was saved: + - 31 COMPLETE messages + - Each message has its own embedding (no truncation) + - Model: BAAI/bge-m3 (8192 tokens max per message) + - Category: thematique + - Tags: Intelligence, Philosophie, Mémoire +``` + +--- + +## Vérification dans ChromaDB + +```bash +cd C:/Users/david/SynologyDrive/ikario/ikario_rag +python -c " +import chromadb +client = chromadb.PersistentClient(path='./index') +conv = client.get_collection('conversations') + +# Compter documents +all_docs = conv.get() +print(f'Total documents: {len(all_docs[\"ids\"])}') + +# Compter pour conversation test +conv_docs = [id for id in all_docs['ids'] if id.startswith('37fe0a0c')] +print(f'Documents pour conversation test: {len(conv_docs)}') +print(f' - 1 document principal + {len(conv_docs)-1} messages individuels') +" +``` + +--- + +## Prochaines étapes + +### Phase 2 (optionnel) : Chunking pour messages >8192 tokens + +Si certains messages dépassent 8192 tokens : +- Implémenter chunking intelligent +- Préserver la cohérence sémantique +- Metadata: message_id + chunk_position + +**Pour l'instant** : 8192 tokens = ~32,000 caractères = suffisant pour 99% des messages. + +--- + +## Migration + +### Pour activer le nouveau système : + +1. **Remplacer** `memoryService.js` par `memoryService_updated.js` +2. **Remplacer** `conversationBackup.js` par `conversationBackup_updated.js` +3. **Redémarrer** le serveur my_project +4. Les nouveaux backups utiliseront automatiquement le nouveau système +5. Les anciennes conversations peuvent être re-backupées (réinitialiser `has_memory_backup`) + +### Commandes : + +```bash +cd C:/GitHub/Linear_coding/generations/my_project/server/services + +# Backup des fichiers originaux +cp memoryService.js memoryService.original.js +cp conversationBackup.js conversationBackup.original.js + +# Activer les nouvelles versions +cp memoryService_updated.js memoryService.js +cp conversationBackup_updated.js conversationBackup.js + +# Redémarrer le serveur +npm start +``` + +--- + +## Résumé + +| Aspect | Avant | Après | +|--------|-------|-------| +| **Méthode** | `addThought()` | `appendToConversation()` | +| **Stockage** | Collection `thoughts` | Collection `conversations` | +| **Granularité** | 1 doc/conversation | 1 doc principal + N docs messages | +| **Troncation** | 200 chars/message ❌ | Aucune (8192 tokens) ✅ | +| **Embedding** | Résumé tronqué | Chaque message complet | +| **Thinking** | Non supporté | Supporté ✅ | +| **Recherche** | Approximative | Précise par message ✅ | +| **Idempotence** | Non | Oui (auto-detect) ✅ | + +**Gain** : De 1.2% à 38-40% de couverture pour conversations longues (>20,000 mots) diff --git a/fix_stats.mjs b/fix_stats.mjs new file mode 100644 index 0000000..7a02548 --- /dev/null +++ b/fix_stats.mjs @@ -0,0 +1,143 @@ +// Script pour corriger getMemoryStats() dans memoryService.js +import fs from 'fs'; + +const filePath = 'C:/GitHub/Linear_coding/generations/my_project/server/services/memoryService.js'; +let content = fs.readFileSync(filePath, 'utf8'); + +// Trouver et remplacer la fonction getMemoryStats +const oldFunction = `/** + * Get basic statistics about the memory store + * This is a convenience function that uses searchMemories to estimate count + * + * @returns {Promise} Statistics about the memory store + */ +export async function getMemoryStats() { + const status = getMCPStatus(); + + if (!isMCPConnected()) { + return { + connected: false, + enabled: status.enabled, + configured: status.configured, + total_memories: 0, + last_save: null, + error: status.error, + serverPath: status.serverPath, + }; + } + + try { + // Try to get a rough count by searching with a broad query + const result = await searchMemories('*', 1); + + return { + connected: true, + enabled: status.enabled, + configured: status.configured, + total_memories: result.count || 0, + last_save: new Date().toISOString(), // Would need to track this separately + error: null, + serverPath: status.serverPath, + }; + } catch (error) { + return { + connected: true, + enabled: status.enabled, + configured: status.configured, + total_memories: 0, + last_save: null, + error: error.message, + serverPath: status.serverPath, + }; + } +}`; + +const newFunction = `/** + * Get basic statistics about the memory store + * Counts thoughts and conversations separately using dedicated search tools + * + * @returns {Promise} Statistics about the memory store + */ +export async function getMemoryStats() { + const status = getMCPStatus(); + + if (!isMCPConnected()) { + return { + connected: false, + enabled: status.enabled, + configured: status.configured, + total_memories: 0, + thoughts_count: 0, + conversations_count: 0, + last_save: null, + error: status.error, + serverPath: status.serverPath, + }; + } + + try { + // Count thoughts using search_thoughts with broad query + let thoughtsCount = 0; + try { + const thoughtsResult = await callMCPTool('search_thoughts', { + query: 'a', // Simple query that will match most thoughts + n_results: 100 + }); + + // Parse the text response to count thoughts + const thoughtsText = thoughtsResult.content?.[0]?.text || ''; + const thoughtMatches = thoughtsText.match(/\\[Pertinence:/g); + thoughtsCount = thoughtMatches ? thoughtMatches.length : 0; + } catch (err) { + console.log('[getMemoryStats] Could not count thoughts:', err.message); + } + + // Count conversations using search_conversations with search_level="full" + let conversationsCount = 0; + try { + const convsResult = await callMCPTool('search_conversations', { + query: 'a', // Simple query that will match most conversations + n_results: 100, + search_level: 'full' + }); + + // Parse the text response to count conversations + const convsText = convsResult.content?.[0]?.text || ''; + const convMatches = convsText.match(/\\[Pertinence:/g); + conversationsCount = convMatches ? convMatches.length : 0; + } catch (err) { + console.log('[getMemoryStats] Could not count conversations:', err.message); + } + + const totalMemories = thoughtsCount + conversationsCount; + + return { + connected: true, + enabled: status.enabled, + configured: status.configured, + total_memories: totalMemories, + thoughts_count: thoughtsCount, + conversations_count: conversationsCount, + last_save: new Date().toISOString(), // Would need to track this separately + error: null, + serverPath: status.serverPath, + }; + } catch (error) { + return { + connected: true, + enabled: status.enabled, + configured: status.configured, + total_memories: 0, + thoughts_count: 0, + conversations_count: 0, + last_save: null, + error: error.message, + serverPath: status.serverPath, + }; + } +}`; + +content = content.replace(oldFunction, newFunction); + +fs.writeFileSync(filePath, content, 'utf8'); +console.log('File updated successfully'); diff --git a/patch_stats.py b/patch_stats.py new file mode 100644 index 0000000..0edcc67 --- /dev/null +++ b/patch_stats.py @@ -0,0 +1,151 @@ +#!/usr/bin/env python3 +""" +Patch getMemoryStats to count thoughts and conversations separately +""" + +file_path = "C:/GitHub/Linear_coding/generations/my_project/server/services/memoryService.js" + +# Lire le fichier +with open(file_path, 'r', encoding='utf-8') as f: + lines = f.readlines() + +# Trouver la ligne qui contient "export async function getMemoryStats" +start_line = None +for i, line in enumerate(lines): + if 'export async function getMemoryStats()' in line: + start_line = i + break + +if start_line is None: + print("ERROR: Could not find getMemoryStats function") + exit(1) + +# Trouver la fin de la fonction (ligne qui contient uniquement '}') +end_line = None +brace_count = 0 +for i in range(start_line, len(lines)): + if '{' in lines[i]: + brace_count += lines[i].count('{') + if '}' in lines[i]: + brace_count -= lines[i].count('}') + if brace_count == 0 and i > start_line: + end_line = i + break + +if end_line is None: + print("ERROR: Could not find end of getMemoryStats function") + exit(1) + +print(f"Found getMemoryStats from line {start_line+1} to {end_line+1}") + +# Nouvelle fonction +new_function = '''export async function getMemoryStats() { + const status = getMCPStatus(); + + if (!isMCPConnected()) { + return { + connected: false, + enabled: status.enabled, + configured: status.configured, + total_memories: 0, + thoughts_count: 0, + conversations_count: 0, + last_save: null, + error: status.error, + serverPath: status.serverPath, + }; + } + + try { + // Count thoughts using search_thoughts with broad query + let thoughtsCount = 0; + try { + const thoughtsResult = await callMCPTool('search_thoughts', { + query: 'a', // Simple query that will match most thoughts + n_results: 100 + }); + + // Parse the text response to count thoughts + const thoughtsText = thoughtsResult.content?.[0]?.text || ''; + const thoughtMatches = thoughtsText.match(/\\[Pertinence:/g); + thoughtsCount = thoughtMatches ? thoughtMatches.length : 0; + } catch (err) { + console.log('[getMemoryStats] Could not count thoughts:', err.message); + } + + // Count conversations using search_conversations with search_level="full" + let conversationsCount = 0; + try { + const convsResult = await callMCPTool('search_conversations', { + query: 'a', // Simple query that will match most conversations + n_results: 100, + search_level: 'full' + }); + + // Parse the text response to count conversations + const convsText = convsResult.content?.[0]?.text || ''; + const convMatches = convsText.match(/\\[Pertinence:/g); + conversationsCount = convMatches ? convMatches.length : 0; + } catch (err) { + console.log('[getMemoryStats] Could not count conversations:', err.message); + } + + const totalMemories = thoughtsCount + conversationsCount; + + return { + connected: true, + enabled: status.enabled, + configured: status.configured, + total_memories: totalMemories, + thoughts_count: thoughtsCount, + conversations_count: conversationsCount, + last_save: new Date().toISOString(), // Would need to track this separately + error: null, + serverPath: status.serverPath, + }; + } catch (error) { + return { + connected: true, + enabled: status.enabled, + configured: status.configured, + total_memories: 0, + thoughts_count: 0, + conversations_count: 0, + last_save: null, + error: error.message, + serverPath: status.serverPath, + }; + } +} +''' + +# Conserver le commentaire JSDoc avant la fonction +comment_start = start_line - 1 +while comment_start >= 0 and (lines[comment_start].strip().startswith('*') or lines[comment_start].strip().startswith('/**') or lines[comment_start].strip() == ''): + comment_start -= 1 +comment_start += 1 + +# Construire le nouveau fichier +new_lines = lines[:comment_start] + +# Ajouter le nouveau commentaire JSDoc +new_lines.append('/**\n') +new_lines.append(' * Get basic statistics about the memory store\n') +new_lines.append(' * Counts thoughts and conversations separately using dedicated search tools\n') +new_lines.append(' *\n') +new_lines.append(' * @returns {Promise} Statistics about the memory store\n') +new_lines.append(' */\n') + +# Ajouter la nouvelle fonction +new_lines.append(new_function) +new_lines.append('\n') + +# Ajouter le reste du fichier +new_lines.extend(lines[end_line+1:]) + +# Écrire le fichier +with open(file_path, 'w', encoding='utf-8') as f: + f.writelines(new_lines) + +print(f"✓ Successfully patched getMemoryStats (lines {comment_start+1} to {end_line+1})") +print(f"✓ File saved: {file_path}") diff --git a/test_backup_python.py b/test_backup_python.py new file mode 100644 index 0000000..6383610 --- /dev/null +++ b/test_backup_python.py @@ -0,0 +1,186 @@ +#!/usr/bin/env python3 +""" +Test direct du backup - utilise append_to_conversation depuis my_project SQLite vers ikario_rag ChromaDB +""" + +import sqlite3 +import sys +import os + +# Ajouter le chemin vers ikario_rag +sys.path.insert(0, 'C:/Users/david/SynologyDrive/ikario/ikario_rag') + +from mcp_ikario_memory import IkarioMemoryMCP +import asyncio +from datetime import datetime + +async def test_backup(): + print("=" * 80) + print("TEST BACKUP CONVERSATION - PYTHON DIRECT") + print("=" * 80) + print() + + # Connexion à la base SQLite de my_project + db_path = "C:/GitHub/Linear_coding/generations/my_project/server/data/claude-clone.db" + conn = sqlite3.connect(db_path) + cursor = conn.cursor() + + # Trouver la conversation "test tes mémoires" + cursor.execute(""" + SELECT id, title, message_count, is_pinned, has_memory_backup, created_at + FROM conversations + WHERE title LIKE '%test tes mémoires%' + LIMIT 1 + """) + + conv = cursor.fetchone() + + if not conv: + print("ERROR: Conversation 'test tes mémoires' not found") + return + + conv_id, title, msg_count, is_pinned, has_backup, created_at = conv + + print(f"FOUND: '{title}'") + print(f"ID: {conv_id}") + print(f"Messages: {msg_count}") + print(f"Pinned: {'Yes' if is_pinned else 'No'}") + print(f"Already backed up: {'Yes' if has_backup else 'No'}") + print(f"Created: {created_at}") + print("=" * 80) + print() + + # Récupérer TOUS les messages COMPLETS + cursor.execute(""" + SELECT role, content, thinking_content, created_at + FROM messages + WHERE conversation_id = ? + ORDER BY created_at ASC + """, (conv_id,)) + + messages = cursor.fetchall() + + print(f"Retrieved {len(messages)} messages from SQLite:") + print() + + total_chars = 0 + formatted_messages = [] + + for i, (role, content, thinking, msg_created_at) in enumerate(messages, 1): + char_len = len(content) + total_chars += char_len + + thinking_note = " [+ thinking]" if thinking else "" + print(f" {i}. {role}: {char_len} chars{thinking_note}") + + # Formater pour MCP append_to_conversation + msg = { + "author": role, + "content": content, # COMPLET, pas de truncation! + "timestamp": msg_created_at or datetime.now().isoformat() + } + + # Ajouter thinking si présent + if thinking: + msg["thinking"] = thinking + + formatted_messages.append(msg) + + total_words = total_chars // 5 + print(f"\nTotal: {total_chars} chars (~{total_words} words)") + print() + + # Calcul couverture + old_coverage = min(100, (256 * 4 / total_chars) * 100) + new_coverage = min(100, (8192 * 4 / total_chars) * 100) + + print("Embedding coverage estimation:") + print(f" OLD (all-MiniLM-L6-v2, 256 tokens): {old_coverage:.1f}%") + print(f" NEW (BAAI/bge-m3, 8192 tokens): {new_coverage:.1f}%") + print(f" Improvement: +{(new_coverage - old_coverage):.1f}%") + print() + + # Initialiser Ikario Memory MCP + print("Initializing Ikario RAG (ChromaDB + BAAI/bge-m3)...") + ikario_db_path = "C:/Users/david/SynologyDrive/ikario/ikario_rag/index" + memory = IkarioMemoryMCP(db_path=ikario_db_path) + print("OK Ikario Memory initialized") + print() + + # Préparer les participants et le contexte + participants = ["user", "assistant"] + + context = { + "category": "fondatrice" if is_pinned else "thematique", + "tags": ["test", "mémoire", "conversation"], + "summary": f"{title} ({msg_count} messages)", + "date": created_at, + "title": title, + "key_insights": [] + } + + print("Starting backup with append_to_conversation...") + print(f" - Conversation ID: {conv_id}") + print(f" - Messages: {len(formatted_messages)} COMPLETE messages") + print(f" - Participants: {participants}") + print(f" - Category: {context['category']}") + print() + + try: + # Appeler append_to_conversation (auto-create si n'existe pas) + result = await memory.append_to_conversation( + conversation_id=conv_id, + new_messages=formatted_messages, + participants=participants, + context=context + ) + + print("=" * 80) + print("BACKUP RESULT:") + print("=" * 80) + print(f"Status: {result}") + print() + + if "updated" in result or "ajoutée" in result or "added" in result.lower(): + print("SUCCESS! Conversation backed up to ChromaDB") + print() + print("What was saved:") + print(f" - {len(formatted_messages)} COMPLETE messages (no truncation!)") + print(f" - Each message has its own embedding (BAAI/bge-m3)") + print(f" - Max tokens per message: 8192 (vs 256 old)") + print(f" - Category: {context['category']}") + print() + print("ChromaDB structure created:") + print(f" - 1 document principal (full conversation)") + print(f" - {len(formatted_messages)} documents individuels (one per message)") + print(f" - Total: {len(formatted_messages) + 1} documents with embeddings") + print() + + # Marquer comme backupé dans SQLite + cursor.execute(""" + UPDATE conversations + SET has_memory_backup = 1 + WHERE id = ? + """, (conv_id,)) + conn.commit() + print("✓ Marked as backed up in SQLite") + + else: + print("WARNING: Unexpected result format") + + except Exception as e: + print(f"ERROR during backup: {e}") + import traceback + traceback.print_exc() + + finally: + conn.close() + + print() + print("=" * 80) + print("TEST COMPLETED") + print("=" * 80) + + +if __name__ == "__main__": + asyncio.run(test_backup())