Library RAG - Advanced Philosophical Search System

A dual RAG (Retrieval-Augmented Generation) system for philosophical research and conversational memory, powered by a GPU embedder and Weaviate.

🎯 Overview

Library RAG combines two distinct semantic search systems:

📚 Philosophical Library - A document base of philosophical texts (works, chunks, summaries)
🧠 Memory Ikario - A conversational memory system (thoughts and conversations)

Architecture: 5 Weaviate collections + GPU embedder (NVIDIA RTX 4070) + Mistral API

🏗️ Architecture

Weaviate Collections (5)

📦 Philosophical Library (3 collections)
├─ Work           → Metadata for the philosophical works
├─ Chunk          → 5355 text passages (1024-dim vectors)
└─ Summary        → Hierarchical document summaries

🧠 Memory Ikario (2 collections)
├─ Thought        → 104 thoughts (reflections, insights)
└─ Conversation   → 12 conversations with 380 messages

GPU Embedder

Model: BAAI/bge-m3 (1024 dimensions, 8192-token context)
GPU: NVIDIA RTX 4070 Laptop (PyTorch CUDA + FP16)
Performance: 30-70x faster than the Docker text2vec-transformers module
Usage: Manual vectorization for both ingestion and queries

Technical Stack

Component	Technology	Role
Vector DB	Weaviate 1.34.4	Storage + vector search
Embeddings	Python GPU embedder	Vectorization (ingestion + queries)
OCR	Mistral OCR API	Text extraction from PDFs
LLM	Mistral Large / Ollama	RAG answer generation
Web	Flask 3.0 + SSE	Web interface with streaming
Tests	Puppeteer + pytest	Automated validation

🚀 Quick Start

1. Prerequisites

# Python 3.10+
python --version

# CUDA 12.4+ (for the GPU embedder)
nvidia-smi

# Docker (for Weaviate)
docker --version

2. Installation

# Clone the project
git clone <repo-url>
cd linear_coding_library_rag

# Create a virtual environment
cd generations/library_rag
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# PyTorch with CUDA (if not already installed)
pip install torch --index-url https://download.pytorch.org/whl/cu124

3. Configuration

# Copy the example file
cp .env.example .env

# Edit .env with your API keys
nano .env

Required variables:

# Mistral API (OCR + LLM)
MISTRAL_API_KEY=your-mistral-api-key

# Ollama (optional, for a local LLM)
OLLAMA_BASE_URL=http://localhost:11434
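
A missing API key is easier to diagnose at startup than in the middle of an OCR call. The sketch below shows one way to validate the variables listed above; the `load_settings` helper and its constants are hypothetical and not part of the project, and it assumes the variables are already present in the process environment (for example, exported from `.env`).

```python
import os

# Variable names are taken from the README; the helper itself is an illustrative assumption.
REQUIRED_VARS = ("MISTRAL_API_KEY",)
OPTIONAL_DEFAULTS = {"OLLAMA_BASE_URL": "http://localhost:11434"}


def load_settings(env=None):
    """Return a settings dict, raising if a required variable is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    settings = {name: env[name] for name in REQUIRED_VARS}
    # Optional variables fall back to the defaults documented above.
    for name, default in OPTIONAL_DEFAULTS.items():
        settings[name] = env.get(name) or default
    return settings
```

Passing a plain dict instead of `os.environ` makes the check easy to exercise in tests.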

4. Start the Services

# Start Weaviate
docker compose up -d

# Check that Weaviate is ready
curl http://localhost:8080/v1/.well-known/ready

# Start Flask
python flask_app.py
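
Weaviate can take a few seconds to come up after `docker compose up -d`, so the readiness check above may need to be repeated. A minimal standard-library sketch of that polling loop follows; `is_ready`, `wait_until_ready`, and the retry parameters are illustrative assumptions, not project code.

```python
import time
import urllib.error
import urllib.request

READY_URL = "http://localhost:8080/v1/.well-known/ready"  # endpoint from the curl check


def is_ready(url, timeout=2.0):
    """Return True if the endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx response.
        return False


def wait_until_ready(url=READY_URL, attempts=30, delay=1.0,
                     probe=is_ready, sleep=time.sleep):
    """Poll the endpoint up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

The `probe` and `sleep` parameters exist only so the loop can be tested without a running Weaviate instance.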

URLs:

🌐 Flask: http://localhost:5000
🗄️ Weaviate: http://localhost:8080

📖 Usage

Web Interface

Go to http://localhost:5000 for:

Page	URL	Description
Home	`/`	Main dashboard
Search	`/search`	Search the philosophical library
Chat	`/chat`	RAG chat with semantic context
Memories	`/memories`	Search thoughts and messages
Conversations	`/conversations`	Conversation history
Upload	`/upload`	Ingest new PDFs

1. Philosophical Search

Search modes (via /search):

📄 Simple: Direct search over chunks
🌳 Hierarchical: Section-level search with context
📚 Summaries: Search over high-level summaries

Example:

Query: "la conscience selon Turing"
→ 16 relevant results
→ Filtering by author/work
→ GPU embedder: ~17ms/query
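
To make the ranking step concrete, here is a small pure-Python sketch that orders candidate vectors by cosine similarity to a query vector. It is not the project's search code: in the real system the vectors are 1024-dimensional BGE-M3 embeddings and Weaviate performs the nearest-neighbor search. The helper names, the toy 2-dimensional vectors, and the choice of cosine similarity as the metric are assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, candidates, k=3):
    """Return the k (key, score) pairs most similar to query_vec, best first."""
    scored = [(cosine_similarity(query_vec, vec), key) for key, vec in candidates.items()]
    scored.sort(reverse=True)
    return [(key, score) for score, key in scored[:k]]


# Toy example: chunk "a" points the same way as the query, "b" is orthogonal to it.
chunks = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
best = top_k([1.0, 0.0], chunks, k=2)
```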

2. RAG Chat

Features (via /chat):

💬 Long, detailed answers (500-800 words)
📚 Direct citations of source passages
🎯 Filtering by work (18 works available)
🔄 SSE streaming (Server-Sent Events)
📖 Mandatory "Sources utilisées" ("Sources used") section

Example session:

Question: "What is a Turing machine?"
→ Semantic search: 11 chunks across 5 sections
→ LLM generation: ~30 seconds (Mistral Large)
→ Detailed academic answer with sources
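
The SSE streaming mentioned above relies on the `text/event-stream` wire format, in which each event is a group of `field: value` lines terminated by a blank line. The sketch below shows how generated tokens could be framed that way; the `format_sse` and `stream_tokens` helpers and the `token`/`done` event names are hypothetical and do not reflect the actual code in flask_app.py.

```python
import json


def format_sse(data, event=None):
    """Serialize one Server-Sent Event: optional event name, JSON payload, blank line."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # json.dumps escapes newlines, so the payload always fits on one data line.
    lines.append("data: " + json.dumps(data, ensure_ascii=False))
    return "\n".join(lines) + "\n\n"


def stream_tokens(tokens):
    """Yield one SSE frame per token, then a final frame signaling completion."""
    for token in tokens:
        yield format_sse({"token": token}, event="token")
    yield format_sse({"done": True}, event="done")
```

In a Flask view, a generator like this can be returned as `Response(stream_tokens(...), mimetype="text/event-stream")`.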

3. Memory Ikario

Thought search (via /memories):

Query: "test search"
→ 10 relevant thoughts
→ Type: reflection, test, spontaneous
→ Associated concepts

Conversation search:

Query: "philosophie intelligence"
→ Relevant conversations
→ Contextual messages
→ Metadata (category, date)

4. Document Ingestion

Via the web interface (/upload):

Upload a PDF (max 100 MB)
Select options:
- LLM provider (Mistral/Ollama)
- Semantic chunking (optional)
- OCR annotations (optional)
Automatic processing:
- Mistral OCR (~€0.003/page)
- Metadata extraction (author, title, year)
- Intelligent chunking
- GPU vectorization (~15ms/chunk)
- Insertion into Weaviate
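
The per-page and per-chunk figures above make it possible to budget an ingestion run before uploading. The following back-of-the-envelope sketch uses only those two approximate numbers; the `estimate_ingestion` helper is hypothetical, and it ignores LLM metadata extraction and network time, which the README does not quantify.

```python
# Approximate figures quoted in this README; actual costs and timings may differ.
OCR_COST_PER_PAGE_EUR = 0.003
EMBED_SECONDS_PER_CHUNK = 0.015


def estimate_ingestion(pages, chunks):
    """Estimate the OCR cost (EUR) and GPU vectorization time (seconds) for one document."""
    return {
        "ocr_cost_eur": round(pages * OCR_COST_PER_PAGE_EUR, 4),
        "embedding_seconds": round(chunks * EMBED_SECONDS_PER_CHUNK, 3),
    }


# Example: a 100-page PDF that splits into 400 chunks.
estimate = estimate_ingestion(pages=100, chunks=400)
```

Under these assumptions, the 100-page example comes to about €0.30 of OCR and about 6 seconds of vectorization.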

Via Python:

from utils.pdf_pipeline import process_pdf

result = process_pdf(
    pdf_path="document.pdf",
    use_llm=True,
    llm_provider="mistral",
    ingest_to_weaviate=True
)

print(f"Chunks: {result['chunks_count']}")
print(f"Cost: €{result['cost_total']:.4f}")

🧪 Tests

Automated Tests

# GPU ingestion test
python test_gpu_mistral.py

# Semantic search test (Puppeteer)
node test_search_simple.js

# RAG chat test (Puppeteer)
node test_chat_puppeteer.js

# Memories/conversations test (Puppeteer)
node test_memories_conversations.js

Expected results:

✅ Ingestion: 9 chunks in ~1.2s
✅ Search: 16 results in ~2s
✅ Chat: 11 chunks, 5 sections, complete response
✅ Memories: backend API functional

Manual Tests

# Check the GPU embedder (URL quoted so the shell does not interpret "?")
curl "http://localhost:5000/search?q=Turing"

# Check Weaviate
curl http://localhost:8080/v1/meta

# Check the number of chunks
python -c "import weaviate; c=weaviate.connect_to_local(); print(c.collections.get('Chunk').aggregate.over_all()); c.close()"

📊 Performance Metrics

Ingestion

Metric	Before (Docker)	After (GPU)	Improvement
Speed	500-1000ms/chunk	15ms/chunk	30-70x
RAM	10 GB (container)	0 GB	-10 GB
VRAM	0 GB	2.6 GB	+2.6 GB
Architecture	Hybrid	Unified	Simplified
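
The 30-70x figure in the table follows directly from the per-chunk timings. A short worked calculation, using only the numbers from the table (the `speedup_range` helper name is illustrative):

```python
def speedup_range(before_ms, after_ms):
    """Return the (low, high) speedup factors for a (min, max) latency range."""
    low, high = before_ms
    return low / after_ms, high / after_ms


# 500-1000 ms/chunk with the Docker vectorizer versus 15 ms/chunk on the GPU.
low, high = speedup_range((500, 1000), 15)
print(f"{low:.1f}x to {high:.1f}x")  # prints "33.3x to 66.7x"
```

The "30-70x" range in the table is this result rounded outward.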

Search

Operation	Time	Details
Query vectorization	~17ms	GPU embedder (model already loaded)
Weaviate search	~100-500ms	Depends on query complexity
Hierarchical search	~500ms	11 chunks across 5 sections
Full chat	~30s	Includes LLM generation

Resources

VRAM: 2.6 GB peak (RTX 4070, 8 GB available)
Model: BAAI/bge-m3 (1024 dims, FP16 precision)
Batch size: 48 (optimal for the RTX 4070)

🔧 Advanced Configuration

GPU Embedder

File: memory/core/embedding_service.py

class GPUEmbeddingService:
    model_name = "BAAI/bge-m3"
    embedding_dim = 1024
    optimal_batch_size = 48  # Adjust to your GPU

To reduce VRAM usage (on Out of Memory errors):

optimal_batch_size = 24  # Instead of 48
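
The batch size controls how many chunks are sent to the GPU at once, which is why halving it lowers peak VRAM. As a rough illustration of that batching idea (not the actual implementation in embedding_service.py), a list of chunks could be split like this; `iter_batches` is a hypothetical helper.

```python
def iter_batches(items, batch_size=48):
    """Yield consecutive slices of `items`, each holding at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# 100 chunks with the default size give batches of 48, 48 and 4 chunks.
sizes_48 = [len(batch) for batch in iter_batches(list(range(100)))]
# Halving the batch size means more, smaller batches: 24, 24, 24, 24 and 4.
sizes_24 = [len(batch) for batch in iter_batches(list(range(100)), batch_size=24)]
```

Smaller batches trade some throughput for a lower memory peak, since fewer sequences are held on the GPU at the same time.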

Weaviate

File: docker-compose.yml

services:
  weaviate:
    mem_limit: 8g        # Limit RAM
    cpus: 4              # Limit CPU

LLM Chat

File: flask_app.py (line 1272)

# Customize the system prompt
system_instruction = """
Vous êtes un assistant expert en philosophie...
"""

📚 Documentation

Project Structure

generations/library_rag/
├── flask_app.py              # Main Flask application
├── schema.py                 # Weaviate schemas (5 collections)
├── docker-compose.yml        # Weaviate (without text2vec-transformers)
├── requirements.txt          # Python dependencies
├── .env.example              # Example configuration
├── utils/
│   ├── pdf_pipeline.py       # PDF ingestion pipeline
│   ├── weaviate_ingest.py    # Ingestion with GPU vectorization
│   ├── llm_metadata.py       # LLM metadata extraction
│   └── ocr_processor.py      # Mistral OCR
├── memory/
│   └── core/
│       └── embedding_service.py  # GPU embedder
├── templates/                # HTML templates
└── static/                   # CSS, JS, images

docs/
├── migration-gpu/            # GPU embedder migration documentation
│   ├── MIGRATION_GPU_EMBEDDER_SUCCESS.md
│   ├── TESTS_COMPLETS_GPU_EMBEDDER.md
│   └── ...
└── project_progress.md       # Development history

tests/
├── test_gpu_mistral.py       # Ingestion test
├── test_search_simple.js     # Search test
├── test_chat_puppeteer.js    # Chat test
└── test_memories_conversations.js  # Memories test

Detailed Documentation

GPU Embedder Migration - Detailed migration report
Full Tests - Results of all tests
Project Progress - Development history
CHANGELOG - Version history

🐛 Troubleshooting

Problem: "No module named 'memory'"

Solution:

# Check the sys.path setup in weaviate_ingest.py
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent.parent.parent))

Problem: "CUDA not available"

Solution:

# Reinstall PyTorch with CUDA support
pip uninstall torch
pip install torch --index-url https://download.pytorch.org/whl/cu124
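
Before or after reinstalling, it can help to confirm what PyTorch actually sees. The sketch below uses the standard `torch.cuda.is_available()` and `torch.cuda.get_device_name()` calls; the `describe_cuda` wrapper is a hypothetical helper and not part of the project.

```python
def describe_cuda():
    """Return a one-line description of the PyTorch/CUDA state on this machine."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    if not torch.cuda.is_available():
        # Typical causes: a CPU-only wheel or an outdated NVIDIA driver.
        return "PyTorch is installed but CUDA is not available"
    return f"CUDA available: {torch.cuda.get_device_name(0)}"


status = describe_cuda()
print(status)
```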

Problem: "Out of Memory (VRAM)"

Solution:

# Reduce the batch size in embedding_service.py
optimal_batch_size = 24  # Instead of 48

Problem: Weaviate connection failed

Solution:

# Check that Weaviate is running
docker compose ps

# Check the logs
docker compose logs weaviate

# Restart if needed
docker compose restart

Problem: Search returns nothing

Solution:

# Check the number of chunks in Weaviate
python -c "import weaviate; c=weaviate.connect_to_local(); print(f'Chunks: {c.collections.get(\"Chunk\").aggregate.over_all().total_count}'); c.close()"

# Re-ingest the data if needed
python schema.py --recreate-chunk

🔐 Security

.env is listed in .gitignore (never commit API keys)
Mistral API: Usage-based billing (~€0.003/page for OCR)
Weaviate: No authentication (local development only)
Flask: Debug mode (disable in production)

📈 Roadmap

Short Term

Monitor GPU performance in production
Formal benchmarks on large documents (100+ pages)
Unit tests for vectorize_chunks_batch()

Medium Term

Complete REST API (OpenAPI/Swagger)
Multi-user support with authentication
Result export (PDF, Word, citations)

Long Term

Fine-tune BGE-M3 on a philosophical corpus
Support for additional languages (Ancient Greek, Latin)
Automatic clustering of philosophical concepts

🤝 Contributing

Fork the project
Create a branch (git checkout -b feature/amazing)
Commit (git commit -m 'Add amazing feature')
Push (git push origin feature/amazing)
Open a Pull Request

📄 License

MIT License - see LICENSE for details.

🙏 Acknowledgments

Weaviate - Vector database
BAAI - BGE-M3 embedding model
Mistral AI - OCR and LLM API
Anthropic - Claude for assisted development

Generated with: Claude Sonnet 4.5
Last updated: January 2026
Version: 2.0 (GPU Embedder Migration)
