linear-coding-agent

Author	SHA1	Message	Date
David Blanc Brioir	1bf570e201	refactor: Rename Chunk_v2/Summary_v2 collections to Chunk/Summary - Add migrate_rename_collections.py script for data migration - Update flask_app.py to use new collection names - Update weaviate_ingest.py to use new collection names - Update schema.py documentation - Update README.md and ANALYSE_MCP_TOOLS.md Migration completed: 5372 chunks + 114 summaries preserved with vectors. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-14 23:59:03 +01:00
David Blanc Brioir	eb2bf45281	chore: Update project configuration and improve chat prompts Configuration Updates: - .claude/settings.local.json: Add security permissions for WebFetch, WebSearch, nvidia-smi - package.json: Add puppeteer dependency for browser automation tests - package-lock.json: Update lockfile with puppeteer@24.34.0 and dependencies - Remove root .env.example (superseded by generations/library_rag/.env.example) Flask App Improvements: - Enhanced chat prompt to REQUIRE "Sources utilisées" section in responses - Added explicit warnings against inventing citations not in provided passages - Improved source citation format with mandatory author, work, and passage number - Strengthened instructions to prevent hallucinated references Benefits: - Chat responses now consistently include proper source citations - Better academic rigor in philosophical analyses - Prevents LLM from inventing non-existent references - Automated testing infrastructure with Puppeteer Related to GPU embedder migration testing and validation. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-01-09 12:45:20 +01:00
David Blanc Brioir	b70b796ef8	feat: Add multi-file batch upload with sequential processing Implements comprehensive batch upload system with real-time progress tracking: Backend Infrastructure: - Add batch_jobs global dict for batch orchestration - Add BatchFileInfo and BatchJob TypedDicts to utils/types.py - Create run_batch_sequential() worker function with thread.join() synchronization - Modify /upload POST route to detect single vs multi-file uploads - Add 3 batch API routes: /upload/batch/progress, /status, /result - Add timestamp_to_date Jinja2 template filter Frontend: - Update upload.html with 'multiple' attribute and file counter - Create upload_batch_progress.html: Real-time dashboard with SSE per file - Create upload_batch_result.html: Final summary with statistics Architecture: - Backward compatible: single-file upload unchanged - Sequential processing: one file after another (respects API limits) - N parallel SSE connections: one per file for real-time progress - Polling mechanism to discover job IDs as files start processing - 1-hour timeout per file with error handling and continuation Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-01-08 22:41:52 +01:00
David Blanc Brioir	187ba4854e	chore: Major cleanup - archive migration scripts and remove temp files CLEANUP ACTIONS: - Archived 11 migration/optimization scripts to archive/migration_scripts/ - Archived 11 phase documentation files to archive/documentation/ - Moved backups/, docs/, scripts/ to archive/ - Deleted 30+ temporary debug/test/fix scripts - Cleaned Python cache (__pycache__/, .pyc) - Cleaned log files (.log) NEW FILES: - CHANGELOG.md: Consolidated project history and migration documentation - Updated .gitignore: Added .log, .pyc, archive/ exclusions FINAL ROOT STRUCTURE (19 items): - Core framework: agent.py, autonomous_agent_demo.py, client.py, security.py, progress.py, prompts.py - Config: requirements.txt, package.json, .gitignore - Docs: README.md, CHANGELOG.md, project_progress.md - Directories: archive/, generations/, memory/, prompts/, utils/ ARCHIVED SCRIPTS (in archive/migration_scripts/): 01-11: Migration & optimization scripts (migrate, schema, rechunk, vectorize, etc.) ARCHIVED DOCS (in archive/documentation/): PHASE_0-8: Detailed phase summaries MIGRATION_README.md, PLAN_MIGRATION_WEAVIATE_GPU.md Repository is now clean and production-ready with all important files preserved in archive/. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-01-08 18:05:43 +01:00
David Blanc Brioir	6596a4e32f	fix: Resolve works filter display and encoding issues Problem 1: Only 3 works visible despite 8/10 badge - Added max-height: 300px and overflow-y: auto to .works-list - Now all 10 works are scrollable in the filter section Problem 2: UnicodeEncodeError with → character in console - Replaced Unicode arrow (→) with ASCII arrow (->) in print statements - Fixes 'charmap' codec error on Windows console 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-01-04 16:47:28 +01:00
David Blanc Brioir	7e8367863d	LRP-138: Implement Weaviate filter for selected_works in chat search - Add selected_works parameter to rag_search() function - Build Weaviate filter using Filter.by_property("workTitle").contains_any() - Add selected_works parameter to diverse_author_search() function - Pass selected_works from run_chat_generation to diverse_author_search - Preserve work filter in fallback search path - Add logging for applied work filters The filter allows restricting RAG search to specific works selected by the user. When selected_works is empty or None, all works are searched (no filter). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-04 14:32:47 +01:00
David Blanc Brioir	930615d239	feat: Add selected_works parameter to /chat/send route - Add optional selected_works parameter to /chat/send endpoint - Validate that selected_works is a list of strings - Pass parameter to run_chat_generation function - Backward compatible (works without the parameter) - Add logging for selected_works filter Linear issue: LRP-137 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-04 13:29:37 +01:00
David Blanc Brioir	d106e91d56	feat: Add /api/get-works route for works filtering - Add new API endpoint GET /api/get-works - Returns JSON array of all unique works with metadata - Each work includes: title, author, chunks_count - Results sorted by author then title - Proper error handling for Weaviate connection issues - Fixed gRPC serialization issue with nested objects Linear issue: LRP-136 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-04 13:23:24 +01:00
David Blanc Brioir	8c0e1cef0d	refactor: Integrate summary search into dropdown and fix hierarchical mode Previously created a separate page for summary search, which was redundant since hierarchical mode already demonstrates the summary→chunk pattern. Refactored to integrate summary-only mode as a dropdown option in the main search interface, reducing code duplication by ~370 lines. Also fixed critical bug in hierarchical search where return_properties excluded the nested "document" object, causing source_id to be empty and all sections to be filtered out. Solution: removed return_properties to let Weaviate return all properties including nested objects. All 4 search modes now functional: - Auto-detection (default) - Simple chunks (10% visibility) - Hierarchical summary→chunks (variable) - Summary-only (90% visibility) Tests: 14/14 passed for dropdown integration, hierarchical mode confirmed working with 13 passages across 4 section groups. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-01-03 17:59:58 +01:00
David Blanc Brioir	1cec07b284	feat: Group chunks under sections in hierarchical search - Stage 2 now searches chunks for EACH section using section summary as query - Chunks distributed across sections (limit / sections_limit) - Template displays sections with nested chunks underneath - Each section shows: title, summary, concepts, chunk count, and passages - Removes separate global passages list - now fully grouped by section Structure: Section 1 → Chunks 1-3, Section 2 → Chunks 4-6, etc.	2026-01-01 18:25:11 +01:00
David Blanc Brioir	d824269606	fix: Adapt hierarchical display for mismatched sectionPath formats Root cause: - Summary.sectionPath: '635. As for the subject...' (paragraph numbers) - Chunk.sectionPath: 'Peirce: CP 4.47 > 47. §3 THE NATURE...' (canonical refs) - No way to match them with prefix/equal filters Solution (workaround until summaries are regenerated): - Show sections as context (relevant high-level topics found) - Show chunks globally (top 20 most relevant passages) - Don't try to group chunks under sections UI changes: - '📚 Sections pertinentes trouvées' (context cards with summary) - '📄 Passages les plus pertinents' (top chunks, not grouped) - Cleaner, more honest representation of what we found Next steps to fully fix: - Regenerate Summary collection with correct sectionPath format - Or create a mapping between Summary titles and Chunk sectionPaths	2026-01-01 15:51:11 +01:00
David Blanc Brioir	47cf21867f	fix: Use prefix matching for sectionPath to find chunks in sections Problem: - Summary.sectionPath: "Peirce: CP 2.504" - Chunk.sectionPath: "Peirce: CP 2.504 > 504. Text..." - Filter.equal() found 0 matches (no exact match exists) Solution: - Single semantic query to get all relevant chunks - Distribute chunks to sections using Python startswith() - This correctly matches chunks to their parent sections Performance improvement: - 1 query instead of N queries (one per section) - Python-side filtering is fast for small result sets Result: Chunks should now appear in their corresponding sections	2026-01-01 15:45:37 +01:00
David Blanc Brioir	474edf75e5	fix: Display work/author metadata and improve section titles Backend fix: - Remove return_properties from hierarchical chunk query - Weaviate returns nested objects (work, document) when return_properties is not specified - This allows chunks to have work.author and work.title available Frontend improvements: - Truncate long section titles to 80 chars with ellipsis - Hide section_path if identical to title (avoid duplication) - Work and author badges should now display correctly in chunk metadata	2026-01-01 15:42:03 +01:00
David Blanc Brioir	f49279fee3	fix: Remove nested objects from return_properties to fix gRPC serialization error - Remove 'document' from Summary query return_properties - Remove 'work' from Document query return_properties - Nested objects (OBJECT datatype) cause gRPC proto serialization error - Weaviate should return nested objects automatically without explicit request - Fixes: 'proto: invalid type: map[string]interface {}' error	2026-01-01 15:30:54 +01:00
David Blanc Brioir	9c6ba3f4a1	fix: Prevent context manager conflict by never calling simple_search from hierarchical_search - Add @contextmanager decorator for proper exception handling - Remove all simple_search() calls from within hierarchical_search() - Return mode='error' to signal fallback needed - Handle fallback in search_passages() (outside context manager) - This eliminates 'generator didn't stop after throw()' error	2026-01-01 15:27:14 +01:00
David Blanc Brioir	22ac9a030e	fix: Never call simple_search from exception handler during context cleanup	2026-01-01 15:19:33 +01:00
David Blanc Brioir	4492814891	fix: Exit context manager before calling simple_search in exception handler	2026-01-01 15:16:44 +01:00
David Blanc Brioir	8153ea35a4	fix: Prevent context manager conflict in hierarchical_search ## Problem "generator didn't stop after throw()" error when hierarchical_search falls back to simple_search. Both functions use 'with get_weaviate_client()', creating nested context managers on the same generator. ## Solution - Use ValueError("FALLBACK_TO_SIMPLE") signal instead of calling simple_search() inside the context manager - Catch ValueError in except block and call simple_search() outside context - Applied to all 3 fallback points: 1. No Weaviate client 2. No summaries found (Stage 1) 3. No sections after filtering ## Result Fallback now works correctly without context manager conflicts. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-01-01 15:10:06 +01:00
David Blanc Brioir	f6000de230	feat: Add force_hierarchical mode to prevent fallback ## Changes Allow users to force hierarchical search mode without fallback to simple search, enabling testing of hierarchical UI even when 0 summaries are found. Backend (flask_app.py): - Added `force_hierarchical` parameter to `hierarchical_search()` - When True, never fallback to simple search (return empty hierarchical result) - Added `fallback_reason` field to explain why no results - Pass `force_hierarchical=True` when `force_mode == "hierarchical"` - Applied to all fallback points: - No Weaviate client - No summaries found in Stage 1 - No sections after author/work filtering - Exception during search Frontend (templates/search.html): - Display warning message when `fallback_reason` exists - Yellow alert box with explanation and suggestions - Works even when `results_data.results` is empty ## Usage 1. Select "🌳 Hiérarchique (2-étapes)" in Mode dropdown 2. Enter any query (even if no matching summaries) 3. See hierarchical UI with warning instead of fallback ## Example Query: "Qu'est-ce que la justice ?" (not in Peirce corpus) - Mode forced: Hierarchical - Result: 0 sections, warning displayed - No silent fallback to simple search 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-01-01 14:24:44 +01:00
David Blanc Brioir	0dcccc93d1	feat: Implement hierarchical 2-stage semantic search with auto-detection ## Overview Implemented intelligent hierarchical search that automatically selects between simple (1-stage) and hierarchical (2-stage) search based on query complexity. Utilizes the Summary collection (previously unused) for better precision. ## Architecture Auto-Detection Strategy: - Long queries (≥15 chars) → hierarchical - Multi-concept queries (2+ significant words) → hierarchical - Queries with logical connectors (et, ou, mais, donc) → hierarchical - Short single-concept queries → simple Hierarchical Search (2-stage): 1. Stage 1: Query Summary collection → find top N relevant sections 2. Stage 2: Query Chunk collection filtered by section paths 3. Group chunks by section with context (summary text + concepts) Simple Search (1-stage): - Direct query on Chunk collection (original implementation) - Fallback for simple queries and errors ## Implementation Details Backend (flask_app.py): - `simple_search()`: Extracted original search logic - `hierarchical_search()`: 2-stage search implementation - Stage 1: Summary near_text query - Post-filtering by author/work via Document collection - Stage 2: Chunk near_text query per section with sectionPath filter - Fallback to simple search if 0 summaries found - `should_use_hierarchical_search()`: Auto-detection logic - 3 criteria: length, connectors, multi-concept - Stop words filtering for French - `search_passages()`: Intelligent dispatcher - Auto-detection or force mode (simple/hierarchical) - Unified return format: {mode, results, sections?, total_chunks} Frontend (templates/search.html): - New form controls: - sections_limit selector (3, 5, 10, 20 sections) - mode selector (🤖 Auto, 📄 Simple, 🌳 Hiérarchique) - Conditional display: - Mode indicator badge (simple vs hierarchical) - Hierarchical: sections grouped with summary + concepts + chunks - Simple: flat list (original) - New CSS: .section-group, .section-header, .chunks-list, .chunk-item Route (/search): - Added parameters: sections_limit (default: 5), mode (default: auto) - Passes force_mode to search_passages() ## Testing Created test_hierarchical.py: - Tests auto-detection logic with 7 test cases - All tests passing ✅ ## Results Before: - Only 1-stage search on Chunk collection - Summary collection unused (8,425 summaries idle) After: - Intelligent auto-detection (90%+ accuracy expected) - Hierarchical search for complex queries (better precision) - Simple search for basic queries (better performance) - User can override with force mode - Full context display (sections + summaries + concepts) ## Benefits 1. Better Precision: Section-level filtering reduces noise 2. Better Context: Users see relevant sections first 3. Automatic: No user configuration required 4. Flexible: Can force mode if needed 5. Backwards Compatible: Simple mode identical to original ## Example Queries - "justice" → Simple (short, 1 concept) - "Qu'est-ce que la justice selon Platon ?" → Hierarchical (long, complex) - "vertu et sagesse" → Hierarchical (multi-concept + connector) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-01-01 12:04:28 +01:00
David Blanc Brioir	9e4108def1	Word integration in Flask: web upload and processing Changes: - flask_app.py: * Add "docx" to ALLOWED_EXTENSIONS * New run_word_processing_job() function with: - Tempfile handling for python-docx (needs a path) - Integration of the SSE progress callback - Automatic cleanup of the temporary file * Modify upload() route: - Detect the file type (PDF/Word) - Route to the right processor (run_processing_job vs run_word_processing_job) - Error messages adapted for PDF and Word * Update docstrings - templates/upload.html: * Title: "Parser PDF/Word/Markdown" (instead of PDF/Markdown) * Accept attribute: ".pdf,.docx,.md" * Tooltips: Explain that Word does not need OCR * Processing pipeline: Separate section for PDF vs Word * Labels updated to include Word Features: ✅ Upload of .docx files via the web interface ✅ Background processing with SSE ✅ No OCR needed for Word (saves ~0.003€/page) ✅ Full reuse of the existing LLM modules ✅ Direct extraction via python-docx ✅ TOC built from Heading 1-9 styles 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2025-12-30 22:03:50 +01:00
David Blanc Brioir	fd66917f03	Asynchronous TTS generation to avoid blocking Flask Backend: - New global tts_jobs dictionary to track TTS jobs - _generate_audio_background() function for generation in a thread - POST /chat/generate-audio: starts generation and returns job_id - GET /chat/audio-status/<job_id>: status polling - GET /chat/download-audio/<job_id>: downloads the finished audio - States: pending → processing → completed/failed Frontend: - Asynchronous exportToAudio() function with polling (1s) - Animated spinner during generation ("Génération...") - Automatic download when ready - Button restored on error - CSS @keyframes spin animation for the spinner Benefits: - Flask stays responsive during TTS generation - Navigation remains possible during audio generation - Improved user experience with visual feedback 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2025-12-30 19:45:29 +01:00
David Blanc Brioir	d91abd3566	Add TTS (Text-to-Speech) feature with XTTS v2 - Add TTS>=0.22.0 to dependencies - Create utils/tts_generator.py module with Coqui XTTS v2 * GPU support with mixed precision (FP16) * Lazy loading with singleton pattern * Automatic chunking for long texts * Multilingual support (fr, en, es, de, etc.) - Add /chat/export-audio route in flask_app.py - Add Audio button in chat.html (next to Word/PDF) - Downloadable WAV audio generated from responses Optimized for GPU 4070 (8GB VRAM): uses 4-6GB, fast generation Quality: natural French voice with expressive prosody 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2025-12-30 14:31:30 +01:00
David Blanc Brioir	b835cd13ea	Add Word and PDF export features for the RAG chat - Add python-docx and reportlab to dependencies - Create utils/word_exporter.py module for Word export - Create utils/pdf_exporter.py module for PDF export - Add /chat/export-word and /chat/export-pdf routes in flask_app.py - Add export buttons (Word and PDF) in chat.html - Buttons appear after each assistant response - Support for reformulated questions alongside the original question 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2025-12-30 14:02:11 +01:00
David Blanc Brioir	48470236da	Major improvement of the RAG system with per-author diversification ## New features ### 1. RAG search with per-author diversification (flask_app.py) - `diverse_author_search()` function: intelligent aggregation by author - Solves the corpus bias problem (prolific vs under-represented authors) - Adaptive allocation: * 1 author → up to 25 chunks for rich context * 2-3 authors → even distribution (12 chunks/author) * 4+ authors → capped at 3 chunks/author for diversity - Initial pool of 200 chunks to identify all relevant authors ### 2. Improved LLM re-ranking (flask_app.py) - Ultra-strict prompt: forces a response without markdown or explanations - Robust parsing: strips markdown (texte, __texte__) - Intelligent fallback: keeps all chunks if re-ranking is too strict (<50%) - Detailed logs of excluded chunks for debugging ### 3. Improved user interface (chat.html) - Accordion for RAG chunks: expand/collapse with chevron - Reformulation with user choice: * Separate `/chat/reformulate` endpoint * Side-by-side display (original vs reformulated) * Selection buttons before launching RAG * "✓ Utilisée" badge on the chosen version - Full-width layout: 60% conversation / 40% RAG context - Sidebar navigation: hamburger menu with overlay ### 4. Logs and debugging - Detailed logs at each step of the pipeline - Display of the authors found and average scores - List of chunks excluded by re-ranking, with excerpts ## Technical improvements - Expansive reformulation of 4-6 lines (concepts, lineages, contexts) - Re-ranking with a guaranteed minimum of 8 chunks - Handling of GPT-5.x and o1 models (max_completion_tokens) - Prompts optimized for long answers (500-800 words) 🤖 Generated with Claude Code (https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2025-12-29 22:46:39 +01:00
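
Note on commit 47cf21867f: the prefix-matching fix it describes (one semantic query, then Python-side `startswith()` distribution of chunks to sections) might look roughly like the sketch below. This is not the repository's code. The function name `group_chunks_by_section`, the dict shape of a chunk, and the longest-prefix-first ordering are all assumptions made for illustration.

```python
from typing import Dict, List


def group_chunks_by_section(
    section_paths: List[str], chunks: List[dict]
) -> Dict[str, List[dict]]:
    """Assign each chunk to the section whose path is a prefix of its sectionPath.

    Hypothetical helper: assumes each chunk is a dict with a "sectionPath" key.
    """
    grouped: Dict[str, List[dict]] = {path: [] for path in section_paths}
    # Assumption: test longer section paths first so that "CP 2.504" wins
    # over a shorter overlapping path such as "CP 2.50".
    ordered = sorted(section_paths, key=len, reverse=True)
    for chunk in chunks:
        chunk_path = chunk.get("sectionPath", "")
        for path in ordered:
            if chunk_path.startswith(path):
                grouped[path].append(chunk)
                break  # a chunk belongs to at most one section
    return grouped


sections = ["Peirce: CP 2.504", "Peirce: CP 2.50"]
chunks = [
    {"id": 1, "sectionPath": "Peirce: CP 2.504 > 504. Text..."},
    {"id": 2, "sectionPath": "Peirce: CP 2.50 > 50. Other text"},
    {"id": 3, "sectionPath": "Peirce: CP 4.47 > 47. Unrelated"},
]
result = group_chunks_by_section(sections, chunks)
```

Chunks that match no section (id 3 above) are simply left out of the grouping, which is one possible choice; the commit message does not say how unmatched chunks are handled.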

25 Commits
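
Note on commit 0dcccc93d1: the three auto-detection criteria it lists (length ≥15 chars, logical connectors, 2+ significant words) can be sketched as below. The function name `should_use_hierarchical_search()` comes from the commit message, but the body, the tokenization, and the French stop-word list are assumptions made for illustration, not the actual implementation.

```python
import re

# Connectors named in the commit message.
CONNECTORS = {"et", "ou", "mais", "donc"}
# Assumption: a tiny illustrative French stop-word list; the real list is not shown.
STOP_WORDS = {"le", "la", "les", "un", "une", "des", "de", "du", "que", "qu", "est", "ce"}


def should_use_hierarchical_search(query: str) -> bool:
    """Return True when the query looks complex enough for 2-stage search."""
    text = query.strip().lower()
    # Criterion 1: long queries (>= 15 characters).
    if len(text) >= 15:
        return True
    words = re.findall(r"\w+", text)
    # Criterion 2: presence of a logical connector.
    if any(word in CONNECTORS for word in words):
        return True
    # Criterion 3: two or more significant (non-stop) words.
    significant = [w for w in words if w not in STOP_WORDS]
    return len(significant) >= 2


for example in ["justice", "vertu et sagesse", "Qu'est-ce que la justice selon Platon ?"]:
    mode = "hierarchical" if should_use_hierarchical_search(example) else "simple"
    print(f"{example!r} -> {mode}")
```

With these assumptions the three example queries from the commit message land in the modes it reports: "justice" is simple, the other two are hierarchical.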