- Create memory/mcp/unified_tools.py with 4 new handlers:
- search_memories: unified search across Thoughts and Conversations
- trace_concept_evolution: track concept development over time
- check_consistency: verify statement alignment with past content
- update_thought_evolution_stage: update thought maturity stage
- Export new tools from memory/mcp/__init__.py
- Register new tools in mcp_server.py with full docstrings
These tools complete the Ikario memory toolset to match memoryTools.js expectations.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The text2vec-transformers Docker service was removed in Jan 2026,
but retrieval_tools.py still used near_text() which requires it.
retrieval_tools.py now uses the GPU embedder (BGE-M3) with near_vector(), like flask_app.py.
Changes:
- Add GPU embedder singleton (get_gpu_embedder)
- search_chunks_handler: near_text → near_vector + BGE-M3
- search_summaries_handler: near_text → near_vector + BGE-M3
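
A minimal sketch of the lazy-singleton idea behind get_gpu_embedder, under stated assumptions: `FakeEmbedder` and `embed_query` are hypothetical stand-ins, not the project's embedder or the real BGE-M3 model. In real code the returned vector would be passed to near_vector() instead of sending raw text to near_text().

```python
from functools import lru_cache


class FakeEmbedder:
    """Hypothetical stand-in for the BGE-M3 GPU embedder (not the real model)."""

    instances = 0  # counts how many times the "model" was loaded

    def __init__(self) -> None:
        FakeEmbedder.instances += 1

    def embed(self, text: str) -> list[float]:
        # Toy 4-dimensional vector; a real embedder returns a dense model embedding.
        return [float(len(text)), float(text.count(" ")), 0.0, 1.0]


@lru_cache(maxsize=1)
def get_gpu_embedder() -> FakeEmbedder:
    """Load the embedder on first use and return the same object afterwards."""
    return FakeEmbedder()


def embed_query(query: str) -> list[float]:
    # The vector returned here is what a near_vector() search would receive.
    return get_gpu_embedder().embed(query)
```

Caching the loader keeps the expensive model load to one per process, however many handlers call it.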
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The sidebar content was hidden by default (display: none) but
displayContext() never made it visible when chunks arrived.
Added sidebarContent.style.display = 'block' to show the
context panel with all RAG chunks.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove obsolete documentation, examples, and utility scripts
- Remove temporary screenshots and test files from root
- Add test_chat_backend.js for Puppeteer testing of chat RAG
- Update .gitignore
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add migrate_rename_collections.py script for data migration
- Update flask_app.py to use new collection names
- Update weaviate_ingest.py to use new collection names
- Update schema.py documentation
- Update README.md and ANALYSE_MCP_TOOLS.md
Migration completed: 5372 chunks + 114 summaries preserved with vectors.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Fixes issue where LLM was copying placeholder instructions from the
prompt template into actual metadata fields.
Changes:
1. Created fix_work_titles.py script to correct existing bad titles
- Detects patterns like "(si c'est bien...)", "Titre corrigé...", "Auteur à identifier"
- Extracts correct metadata from chunks JSON files
- Updates Work entries and associated chunks (44 chunks updated)
- Fixed 3 Works with placeholder contamination
2. Improved llm_metadata.py prompt to prevent future issues
- Added explicit INTERDIT/OBLIGATOIRE rules with ❌/✅ markers
- Replaced placeholder examples with real concrete examples
- Added two example responses (high confidence + low confidence)
- Final empty JSON template guides structure without placeholders
- Reinforced: use "confidence" field for uncertainty, not annotations
Results:
- "A Cartesian critique... (si c'est bien le titre)" → "A Cartesian critique of the artificial intelligence"
- "Titre corrigé si nécessaire (ex: ...)" → "Computationalism and The Case When the Brain Is Not a Computer"
- "Titre de l'article principal (à identifier)" → "Computationalism in the Philosophy of Mind"
Future document uploads should now extract clean metadata without
LLM commentary or placeholder instructions.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Adds automatic Work object creation to ensure all uploaded documents
appear on the /documents page. Previously, chunks were ingested but
Work entries were missing, causing documents to be invisible in the UI.
Changes:
- Add create_or_get_work() function to weaviate_ingest.py
- Checks for existing Work by sourceId (prevents duplicates)
- Creates new Work with metadata (title, author, year, pages)
- Returns UUID for potential future reference
- Integrate Work creation into ingest_document() flow
- Add helper scripts for retroactive fixes and verification:
- create_missing_works.py: Create Works for already-ingested documents
- reingest_batch_documents.py: Re-ingest documents after bug fixes
- check_batch_results.py: Verify batch upload results in Weaviate
This completes the batch upload feature - documents now properly appear
on /documents page immediately after ingestion.
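
A minimal sketch of the get-or-create behaviour described for create_or_get_work(), assuming an in-memory dict keyed by sourceId as a hypothetical stand-in for the Work collection. The real function talks to Weaviate; the signature below is illustrative only.

```python
import uuid


def create_or_get_work(works: dict[str, dict], source_id: str, metadata: dict) -> str:
    """Return the UUID of the Work for source_id, creating it if absent.

    `works` is a hypothetical in-memory stand-in for the Work collection.
    """
    existing = works.get(source_id)
    if existing is not None:
        # Same sourceId seen before: reuse the Work instead of duplicating it.
        return existing["uuid"]

    work = {
        "uuid": str(uuid.uuid4()),
        "sourceId": source_id,
        "title": metadata.get("title"),
        "author": metadata.get("author"),
        "year": metadata.get("year"),
        "pages": metadata.get("pages"),
    }
    works[source_id] = work
    return work["uuid"]
```

Calling it twice with the same sourceId returns the same UUID and leaves a single Work in the store.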
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixes batch upload ingestion that was failing silently due to schema mismatches:
Schema Fixes:
- Update collection names from "Chunk" to "Chunk_v2"
- Update collection names from "Summary" to "Summary_v2"
Object Structure Fixes:
- Replace nested objects (work: {title, author}) with flat fields
- Use workTitle and workAuthor instead of nested work object
- Add year field to chunks
- Remove document nested object (not used in current schema)
- Disable nested objects validation for flat schema
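
The nested-to-flat conversion above can be sketched as follows. `flatten_chunk` is a hypothetical helper name, and reading `year` from the nested work object is an assumption; the commit only states that a year field was added.

```python
def flatten_chunk(chunk: dict) -> dict:
    """Convert a chunk with nested work/document objects to flat fields."""
    # Drop the nested objects; every other property is copied unchanged.
    flat = {k: v for k, v in chunk.items() if k not in ("work", "document")}

    work = chunk.get("work") or {}
    flat["workTitle"] = work.get("title", "")
    flat["workAuthor"] = work.get("author", "")
    flat["year"] = work.get("year")  # assumption: year comes from the work metadata
    return flat
```

The input dict is not mutated, so the same chunk can still be inspected in its nested form while debugging.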
Impact:
- Batch upload now successfully ingests chunks to Weaviate
- Single-file upload also benefits from fixes
- All new documents will be properly indexed and searchable
Testing:
- Verified with 2-file batch upload (7 + 11 chunks = 18 total)
- Total chunks increased from 5,304 to 5,322
- All chunks properly searchable with workTitle/workAuthor filters
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Implements comprehensive batch upload system with real-time progress tracking:
Backend Infrastructure:
- Add batch_jobs global dict for batch orchestration
- Add BatchFileInfo and BatchJob TypedDicts to utils/types.py
- Create run_batch_sequential() worker function with thread.join() synchronization
- Modify /upload POST route to detect single vs multi-file uploads
- Add 3 batch API routes: /upload/batch/progress, /status, /result
- Add timestamp_to_date Jinja2 template filter
Frontend:
- Update upload.html with 'multiple' attribute and file counter
- Create upload_batch_progress.html: Real-time dashboard with SSE per file
- Create upload_batch_result.html: Final summary with statistics
Architecture:
- Backward compatible: single-file upload unchanged
- Sequential processing: one file after another (respects API limits)
- N parallel SSE connections: one per file for real-time progress
- Polling mechanism to discover job IDs as files start processing
- 1-hour timeout per file with error handling and continuation
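
A minimal sketch of sequential processing with thread.join() and a per-file timeout, as outlined above. The signature and status strings are assumptions for illustration, not the actual run_batch_sequential() in flask_app.py.

```python
import threading
from typing import Callable


def run_batch_sequential(
    files: list[str],
    process_file: Callable[[str], None],
    timeout_s: float = 3600.0,  # 1-hour limit per file
) -> dict[str, str]:
    """Process files one after another; record a status and keep going on failure."""
    statuses: dict[str, str] = {}
    for name in files:
        errors: list[Exception] = []

        def worker(n: str = name, errs: list = errors) -> None:
            try:
                process_file(n)
            except Exception as exc:  # keep the batch alive on any per-file error
                errs.append(exc)

        thread = threading.Thread(target=worker, daemon=True)
        thread.start()
        thread.join(timeout=timeout_s)  # wait for this file before starting the next

        if thread.is_alive():
            statuses[name] = "timeout"
        elif errors:
            statuses[name] = "error"
        else:
            statuses[name] = "done"
    return statuses
```

Note that join() with a timeout only stops waiting; in this sketch a timed-out worker keeps running in the background as a daemon thread.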
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Works filter section: Increase max-height from 250px to 70vh (full screen)
- Context RAG section: Closed by default (display: none)
- Mobile responsive: Adjust works filter to 50vh on mobile
- Enhances visibility of available works at page load
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Implemented chunking optimization to resolve oversized chunks and improve
semantic search quality:
CHUNKING IMPROVEMENTS:
- Added strict 1000-word max limit (vs previous 1500-2000)
- Implemented 100-word overlap between consecutive chunks
- Created llm_chunker_improved.py with overlap functionality
- Added 3 fallback points in llm_chunker.py for robustness
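
The size limit and overlap above can be illustrated with a plain sliding word window. This is only a sketch under that assumption: the project's chunker is LLM-based, and `chunk_words` is a hypothetical helper, not the code in llm_chunker_improved.py.

```python
def chunk_words(text: str, max_words: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into chunks of at most max_words words, each sharing
    `overlap` words with the previous chunk."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be >= 0 and smaller than max_words")

    words = text.split()
    chunks: list[str] = []
    step = max_words - overlap  # how far the window advances each time
    start = 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
        start += step
    return chunks
```

With the defaults, a 2,500-word text yields three chunks starting at words 0, 900 and 1,800, and each chunk begins with the last 100 words of the previous one.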
RE-CHUNKING RESULTS:
- Identified and re-chunked 31 oversized chunks (>2000 tokens)
- Split into 92 optimally-sized chunks (max 1995 tokens)
- Preserved all metadata (workTitle, workAuthor, sectionPath, etc.)
- 0 chunks now exceed 2000 tokens (vs 31 before)
VECTORIZATION:
- Created manual vectorization script for chunks without vectors
- Successfully vectorized all 92 new chunks (100% coverage)
- All 5,304 chunks now have BGE-M3 embeddings
DOCKER CONFIGURATION:
- Exposed text2vec-transformers port 8090 for manual vectorization
- Added cluster configuration to fix "No private IP address found"
- Increased worker timeout to 600s for large chunks
TESTING:
- Created comprehensive search quality test suite
- Tests distribution, overlap detection, and semantic search
- Modified to use near_vector() (Chunk_v2 has no vectorizer)
Scripts:
- 08_fix_summaries_properties.py - Add missing Work metadata to summaries
- 09_rechunk_oversized.py - Re-chunk giant chunks with overlap
- 10_test_search_quality.py - Validate search improvements
- 11_vectorize_missing_chunks.py - Manual vectorization via API
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Add 'summary' field to Chunk collection (vectorized with text2vec)
- Migrate from Dynamic index to HNSW + RQ for both Chunk and Summary
- Add LLM summarizer module (utils/llm_summarizer.py)
- Add migration scripts (migrate_add_summary.py, restore_*.py)
- Add summary generation utilities and progress tracking
- Add testing and cleaning tools (outils_test_and_cleaning/)
- Add comprehensive documentation (ANALYSE_*.md, guides)
- Remove obsolete files (linear_config.py, old test files)
- Update .gitignore to exclude backups and temp files
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Removed max-height: 300px from .works-list
- Keeps only the Unicode encoding fix (→ to ->)
- Avoids having two scrollbars in the works filter section
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Problem 1: Only 3 works visible despite 8/10 badge
- Added max-height: 300px and overflow-y: auto to .works-list
- Now all 10 works are scrollable in the filter section
Problem 2: UnicodeEncodeError with → character in console
- Replaced Unicode arrow (→) with ASCII arrow (->) in print statements
- Fixes 'charmap' codec error on Windows console
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Disable CLAUDE.md confirmation rules for autonomous agent operation
- Add utility scripts: check_linear_status.py, check_meta_issue.py, move_issues_to_todo.py
- Add works filter specification: prompts/app_spec_works_filter.txt
- Update .linear_project.json with works filter issues
- Remove old/stale scripts and documentation files
- Update search.html template
This commit completes the infrastructure for the autonomous agent that
successfully implemented all 13 works filter issues (LRP-136 to LRP-148).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- WORKS_FILTER.md: Complete user documentation in French
- Feature overview and location
- Selection/deselection instructions
- Quick action buttons (Tout/Aucun)
- Badge counter explanation
- Collapse functionality
- Default behavior and localStorage persistence
- Impact on semantic search
- Recommended use cases (comparative study, focus, exclusion)
- Responsive mobile support
- API Reference section:
- GET /api/get-works endpoint documentation
- POST /chat/send selected_works parameter
- Error codes and validation
- Troubleshooting guide:
- No works displayed
- Filter not working
- How to reset selection
- Chunks count explanation
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Test /api/get-works route:
- Unique works extraction with correct chunk counts
- Sorting by author then title
- Connection failure and query exception handling
- Edge cases: empty database, missing title/author
- Test /chat/send selected_works parameter:
- Accepts empty list (search all works)
- Accepts valid work title list
- Rejects non-list types (string, dict)
- Rejects mixed types in list
- Verifies parameter passed to background thread
- Test rag_search works filter:
- No filter when selected_works is empty/None
- Contains_any filter applied when works selected
18 tests, all passing, no real Weaviate calls (fully mocked)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add works filter section HTML above Context RAG sidebar
- Add CSS styles for works filter with checkboxes, badges, and collapse
- Implement JavaScript for loading works from /api/get-works
- Add localStorage persistence for selected works
- Integrate selected_works parameter with /chat/send API call
- Add Tout/Aucun buttons for quick selection
- Add collapsible section with chevron toggle
- Responsive design for mobile screens
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add selected_works parameter to rag_search() function
- Build Weaviate filter using Filter.by_property("workTitle").contains_any()
- Add selected_works parameter to diverse_author_search() function
- Pass selected_works from run_chat_generation to diverse_author_search
- Preserve work filter in fallback search path
- Add logging for applied work filters
The filter allows restricting RAG search to specific works selected by the user.
When selected_works is empty or None, all works are searched (no filter).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add optional selected_works parameter to /chat/send endpoint
- Validate that selected_works is a list of strings
- Pass parameter to run_chat_generation function
- Backward compatible (works without the parameter)
- Add logging for selected_works filter
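
A minimal sketch of the validation described above, assuming the request body has already been parsed into a dict. `validate_selected_works` is a hypothetical helper name; the real endpoint may report errors differently (for example as an HTTP 400 response).

```python
def validate_selected_works(payload: dict) -> list[str]:
    """Return the selected_works list from a request payload.

    A missing parameter is treated as an empty list (search all works),
    which keeps older clients working.
    """
    selected = payload.get("selected_works", [])
    if not isinstance(selected, list):
        raise ValueError("selected_works must be a list")
    if not all(isinstance(title, str) for title in selected):
        raise ValueError("selected_works must contain only strings")
    return selected
```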
Linear issue: LRP-137
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add new API endpoint GET /api/get-works
- Returns JSON array of all unique works with metadata
- Each work includes: title, author, chunks_count
- Results sorted by author then title
- Proper error handling for Weaviate connection issues
- Fixed gRPC serialization issue with nested objects
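
The response shape above can be sketched as a simple aggregation. The input format (one dict per chunk with "title" and "author" keys) and the name `summarize_works` are assumptions for illustration; the real route reads chunks from Weaviate.

```python
from collections import Counter


def summarize_works(chunks: list[dict]) -> list[dict]:
    """Collapse chunks into unique works with a chunk count,
    sorted by author and then by title."""
    counts = Counter(
        (chunk.get("author") or "", chunk.get("title") or "") for chunk in chunks
    )
    # Keys are (author, title) tuples, so sorting them gives author-then-title order.
    return [
        {"title": title, "author": author, "chunks_count": counts[(author, title)]}
        for author, title in sorted(counts)
    ]
```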
Linear issue: LRP-136
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Previously created a separate page for summary search, which was redundant since hierarchical mode already demonstrates the summary→chunk pattern. Refactored to integrate summary-only mode as a dropdown option in the main search interface, reducing code duplication by ~370 lines.
Also fixed critical bug in hierarchical search where return_properties excluded the nested "document" object, causing source_id to be empty and all sections to be filtered out. Solution: removed return_properties to let Weaviate return all properties including nested objects.
All 4 search modes now functional:
- Auto-detection (default)
- Simple chunks (10% visibility)
- Hierarchical summary→chunks (variable)
- Summary-only (90% visibility)
Tests: 14/14 passed for dropdown integration, hierarchical mode confirmed working with 13 passages across 4 section groups.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Lines 2 and 3 have exactly the same CSS style
- Same color (var(--color-accent))
- Same beige background (rgba(125, 110, 88, 0.08))
- Same padding (0.25rem 0.5rem)
- Same border-radius (4px)
- Only difference: icon and content
- Highly consistent visual presentation
- Removed font-weight: 600 from the section title
- Lines 2 and 3 now have exactly the same style
- Default font, no variation in weight
- Simplified, consistent presentation
- Only difference: colors (accent vs text-strong)
- Line 2 (hierarchy): normal font, no font-size
- Line 3 (title): normal font, no font-size or special font-family
- Changed h4 to span for typographic consistency
- Kept font-weight: 600 on the title for slight emphasis
- Result: lines 2 and 3 are visually consistent
Line 1: Author | Work | Similarity | Passage count
Line 2: 🗂️ Hierarchy (chapterTitle)
Line 3: 📂 Section title
More compact, and the hierarchy is easier to see before the title
- Author badge (taken from the first chunk of the section)
- Work badge (taken from the first chunk of the section)
- Full hierarchy with 🗂️ icon (chapterTitle of the first chunk)
E.g. "Peirce: CP 7.316"
- Light beige background for the hierarchy
- Displayed above the section title
Section header structure:
1. Author + Work (badges)
2. Section title with 📂 icon
3. Full hierarchy (chapterTitle)
4. Similarity + passage count
5. LLM summary
6. Concepts
- Section header with a beige gradient background, distinct from the chunks
- 📂 icon + explicit "Section :" label before the title
- Larger section title (1.2em, font-weight 600)
- Passage-count badge in the accent color
- Chunk area with a pure white background for contrast
- Thicker (2px), rounded (10px) section border
- Summary text on a semi-transparent white background for readability
- "Concepts :" label before the list of concepts
Result: very clear visual hierarchy between section and passages
- Stage 2 now searches chunks for EACH section using section summary as query
- Chunks distributed across sections (limit / sections_limit)
- Template displays sections with nested chunks underneath
- Each section shows: title, summary, concepts, chunk count, and passages
- Removes separate global passages list - now fully grouped by section
Structure: Section 1 → Chunks 1-3, Section 2 → Chunks 4-6, etc.
Problem: Sections showed title twice (once as title, once as summary_text)
Cause: summary_text contains same content as title in current data
Solution: Only show summary_text if different from title and section_path
Condition: summary_text != title AND summary_text != section_path
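
The condition above, written as a plain Python predicate for illustration. The real check lives in the Jinja2 template; `should_show_summary` is a hypothetical name, and also skipping an empty summary is an added assumption.

```python
def should_show_summary(summary_text: str, title: str, section_path: str) -> bool:
    """Show the summary only when it adds something beyond the title and path."""
    if not summary_text:
        return False  # assumption: nothing to show for an empty summary
    return summary_text != title and summary_text != section_path
```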
Error: 'Encountered unknown tag else' - endif was closing the if block too early
Fix: Removed extra {% endif %} before {% else %}
- Line 232: Removed incorrect closing tag
- The {% else %} at line 234 is part of the hierarchical/simple mode conditional
- Proper structure: if hierarchical ... else simple ... endif
Tests:
- Template syntax validates ✓
- Search page loads ✓
- Hierarchical mode works ✓
Root cause:
- Summary.sectionPath: '635. As for the subject...' (paragraph numbers)
- Chunk.sectionPath: 'Peirce: CP 4.47 > 47. §3 THE NATURE...' (canonical refs)
- No way to match them with prefix/equal filters
Solution (workaround until summaries are regenerated):
- Show sections as **context** (relevant high-level topics found)
- Show chunks **globally** (top 20 most relevant passages)
- Don't try to group chunks under sections
UI changes:
- '📚 Sections pertinentes trouvées' (context cards with summary)
- '📄 Passages les plus pertinents' (top chunks, not grouped)
- Cleaner, more honest representation of what we found
Next steps to fully fix:
- Regenerate Summary collection with correct sectionPath format
- Or create a mapping between Summary titles and Chunk sectionPaths
Problem:
- Summary.sectionPath: "Peirce: CP 2.504"
- Chunk.sectionPath: "Peirce: CP 2.504 > 504. Text..."
- Filter.equal() found 0 matches (no exact match exists)
Solution:
- Single semantic query to get all relevant chunks
- Distribute chunks to sections using Python startswith()
- This correctly matches chunks to their parent sections
Performance improvement:
- 1 query instead of N queries (one per section)
- Python-side filtering is fast for small result sets
Result: Chunks should now appear in their corresponding sections
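
A minimal sketch of the Python-side grouping described above, assuming chunks are dicts with a "sectionPath" key. `group_chunks_by_section` is a hypothetical helper name, and the longest-prefix rule is an addition not stated in the commit.

```python
def group_chunks_by_section(section_paths: list[str], chunks: list[dict]) -> dict[str, list[dict]]:
    """Assign each chunk to the section whose path is a prefix of the chunk's path."""
    groups: dict[str, list[dict]] = {path: [] for path in section_paths}

    # Try longer section paths first so "CP 2.504" wins over "CP 2.50".
    ordered = sorted(section_paths, key=len, reverse=True)

    for chunk in chunks:
        chunk_path = chunk.get("sectionPath", "")
        for section_path in ordered:
            if chunk_path.startswith(section_path):
                groups[section_path].append(chunk)
                break  # each chunk belongs to at most one section
    return groups
```

Chunks whose path matches no section are left out of every group. A plain startswith() can also match an unrelated section that merely shares a prefix, which is why the sketch checks the most specific path first.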
Backend fix:
- Remove return_properties from hierarchical chunk query
- Weaviate returns nested objects (work, document) when return_properties is not specified
- This allows chunks to have work.author and work.title available
Frontend improvements:
- Truncate long section titles to 80 chars with ellipsis
- Hide section_path if identical to title (avoid duplication)
- Work and author badges should now display correctly in chunk metadata
Hierarchical search improvements:
- Display author and work for each chunk using badge-author and badge-work
- Show section hierarchy (sectionPath) in chunk metadata
- Add 📍 icon for section path in headers
Color alignment with charter:
- Replace Bootstrap colors (#007bff, #28a745, #6c757d) with charter variables
- section-group: border and shadow use accent colors (125,110,88)
- section-header: border uses var(--color-accent)
- chunk-item: border-left uses var(--color-accent-alt)
- Mode badges: hierarchical=accent-alt, simple=accent
- Concept badges: subtle beige background with accent border
- Alert boxes: beige background instead of yellow
Visual improvements:
- Add hover transform effect on chunks (translateX)
- Smoother color transitions using CSS variables
- Add @contextmanager decorator for proper exception handling
- Remove all simple_search() calls from within hierarchical_search()
- Return mode='error' to signal fallback needed
- Handle fallback in search_passages() (outside context manager)
- This eliminates the 'generator didn't stop after throw()' error
## Problem
"generator didn't stop after throw()" error when hierarchical_search
falls back to simple_search. Both functions use 'with get_weaviate_client()',
creating nested context managers on the same generator.
## Solution
- Use ValueError("FALLBACK_TO_SIMPLE") signal instead of calling simple_search()
inside the context manager
- Catch ValueError in except block and call simple_search() outside context
- Applied to all 3 fallback points:
1. No Weaviate client
2. No summaries found (Stage 1)
3. No sections after filtering
## Result
Fallback now works correctly without context manager conflicts.
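
A minimal sketch of the signal-and-fallback pattern described above. All functions here are simplified stand-ins: `get_weaviate_client` only records open/close events instead of connecting, and the search functions return placeholder dicts.

```python
from contextlib import contextmanager

events: list[str] = []  # records when the stand-in client is opened and closed


@contextmanager
def get_weaviate_client():
    """Stand-in for the real client context manager (no network access)."""
    events.append("open")
    try:
        yield {"connected": True}
    finally:
        events.append("close")


def simple_search(query: str) -> dict:
    with get_weaviate_client():
        return {"mode": "simple", "query": query}


def hierarchical_search(query: str, summaries: list[str]) -> dict:
    try:
        with get_weaviate_client():
            if not summaries:
                # Signal the fallback instead of calling simple_search() here.
                raise ValueError("FALLBACK_TO_SIMPLE")
            return {"mode": "hierarchical", "query": query, "sections": summaries}
    except ValueError as exc:
        if str(exc) != "FALLBACK_TO_SIMPLE":
            raise  # unrelated ValueError: do not swallow it
    # The with block has fully exited, so the fallback opens its own client.
    return simple_search(query)
```

The first client is closed before the fallback opens a second one, so the two context managers are never nested.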
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>