8 Commits

Author SHA1 Message Date
2a8098f17a chore: Clean up obsolete files and add Puppeteer chat test
- Remove obsolete documentation, examples, and utility scripts
- Remove temporary screenshots and test files from root
- Add test_chat_backend.js for Puppeteer testing of chat RAG
- Update .gitignore

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-15 21:40:56 +01:00
53f6a92365 feat: Remove Document collection from schema
BREAKING CHANGE: Document collection removed from Weaviate schema

Architecture simplification:
- Removed Document collection (unused by Flask app)
- All metadata now in Work collection or file-based (chunks.json)
- Simplified from 4 collections to 3 (Work, Chunk_v2, Summary_v2)

Schema changes (schema.py):
- Removed create_document_collection() function
- Updated verify_schema() to expect 3 collections
- Updated display_schema() and print_summary()
- Updated documentation to reflect Chunk_v2/Summary_v2

Ingestion changes (weaviate_ingest.py):
- Removed ingest_document_metadata() function
- Removed ingest_document_collection parameter
- Updated IngestResult to use work_uuid instead of document_uuid
- Removed Document deletion from delete_document_chunks()
- Updated DeleteResult TypedDict

Type changes (types.py):
- WeaviateIngestResult: document_uuid → work_uuid

Documentation updates (.claude/CLAUDE.md):
- Updated schema diagram (4 → 3 collections)
- Removed Document references
- Updated to reflect manual GPU vectorization

Database changes:
- Deleted Document collection (13 objects)
- Deleted Chunk collection (0 objects, old schema)

Benefits:
- Simpler architecture (3 collections vs 4)
- No redundant data storage
- All metadata available via Work or file-based storage
- Reduced Weaviate memory footprint

Migration:
- See DOCUMENT_COLLECTION_ANALYSIS.md for detailed analysis
- See migrate_chunk_v2_to_none_vectorizer.py for vectorizer migration

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-09 14:13:51 +01:00
054cc52a76 chore: Clean up temporary test files and update .gitignore
Removed temporary files:
- check_chunks.py, verify_works.py - Ad-hoc verification scripts
- chunk_v2_backup.json, chunks_to_vectorize.json - Temporary data files
- complete_*.py, extract_*.py, stream_extract.py - Migration scripts
- fast_extract.py, quick_vectorize.py, vectorize_remaining.py - Experimental scripts
- migrate_chunk_v2_named_vector.py - Completed migration script
- test_chat_sources.py, test_search_modes.py, test_search_puppeteer.js - Experimental tests
- test_direct_ingestion.py, test_gpu_ingestion.py, test_upload.py - Dev tests
- test_ingestion_log.txt - Temporary log file
- output/ - Temporary output directory
- generations/library_rag/.env copy - Duplicate file

Updated .gitignore:
- Added patterns for test files (test_*.txt, test_ingestion*.py, etc.)
- Added patterns for backup JSON files (*_backup.json)
- Added patterns for temporary migration scripts (migrate_chunk_*.py, etc.)
- Added patterns for experimental scripts (complete_*.py, extract_*.py, etc.)

Kept committed test files:
 test_chat_puppeteer.js - Chat validation
 test_search_simple.js - Search validation
 test_memories_conversations.js - Memories/conversations validation
 test_gpu_mistral.py - GPU ingestion validation
 Screenshots (chat_*.png, search_*.png, etc.)

Result: Clean repository with only production code and validated tests

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-09 12:18:07 +01:00
187ba4854e chore: Major cleanup - archive migration scripts and remove temp files
CLEANUP ACTIONS:
- Archived 11 migration/optimization scripts to archive/migration_scripts/
- Archived 11 phase documentation files to archive/documentation/
- Moved backups/, docs/, scripts/ to archive/
- Deleted 30+ temporary debug/test/fix scripts
- Cleaned Python cache (__pycache__/, *.pyc)
- Cleaned log files (*.log)

NEW FILES:
- CHANGELOG.md: Consolidated project history and migration documentation
- Updated .gitignore: Added *.log, *.pyc, archive/ exclusions

FINAL ROOT STRUCTURE (19 items):
- Core framework: agent.py, autonomous_agent_demo.py, client.py, security.py, progress.py, prompts.py
- Config: requirements.txt, package.json, .gitignore
- Docs: README.md, CHANGELOG.md, project_progress.md
- Directories: archive/, generations/, memory/, prompts/, utils/

ARCHIVED SCRIPTS (in archive/migration_scripts/):
01-11: Migration & optimization scripts (migrate, schema, rechunk, vectorize, etc.)

ARCHIVED DOCS (in archive/documentation/):
PHASE_0-8: Detailed phase summaries
MIGRATION_README.md, PLAN_MIGRATION_WEAVIATE_GPU.md

Repository is now clean and production-ready with all important files preserved in archive/.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-08 18:05:43 +01:00
636ad6206c feat: Add vectorized summary field and migration tools
- Add 'summary' field to Chunk collection (vectorized with text2vec)
- Migrate from Dynamic index to HNSW + RQ for both Chunk and Summary
- Add LLM summarizer module (utils/llm_summarizer.py)
- Add migration scripts (migrate_add_summary.py, restore_*.py)
- Add summary generation utilities and progress tracking
- Add testing and cleaning tools (outils_test_and_cleaning/)
- Add comprehensive documentation (ANALYSE_*.md, guides)
- Remove obsolete files (linear_config.py, old test files)
- Update .gitignore to exclude backups and temp files

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-07 22:56:03 +01:00
48470236da Amélioration majeure du système RAG avec diversification par auteur
## Nouvelles fonctionnalités

### 1. Recherche RAG avec diversification par auteur (flask_app.py)
- Fonction `diverse_author_search()` : agrégation intelligente par auteur
- Résout le problème de biais corpus (auteurs prolifiques vs peu représentés)
- Allocation adaptative :
  * 1 auteur → jusqu'à 25 chunks pour contexte riche
  * 2-3 auteurs → distribution équitable (12 chunks/auteur)
  * 4+ auteurs → limitation à 3 chunks/auteur pour diversité
- Pool initial de 200 chunks pour identifier tous les auteurs pertinents

### 2. Re-ranking LLM amélioré (flask_app.py)
- Prompt ultra-strict : force réponse sans markdown ni explications
- Parsing robuste : nettoie markdown (**texte**, __texte__)
- Fallback intelligent : garde tous les chunks si re-ranking trop strict (<50%)
- Logs détaillés des chunks exclus pour debugging

### 3. Interface utilisateur améliorée (chat.html)
- **Accordéon pour chunks RAG** : expansion/collapse avec chevron
- **Reformulation avec choix utilisateur** :
  * Endpoint `/chat/reformulate` séparé
  * Affichage côte-à-côte (originale vs reformulée)
  * Boutons de sélection avant lancement RAG
  * Badge "✓ Utilisée" sur version choisie
- **Layout full-width** : 60% conversation / 40% contexte RAG
- **Sidebar navigation** : menu hamburger avec overlay

### 4. Logs et debugging
- Logs détaillés à chaque étape du pipeline
- Affichage des auteurs trouvés et scores moyens
- Liste des chunks exclus par re-ranking avec extraits

## Améliorations techniques
- Reformulation expansive 4-6 lignes (concepts, filiations, contextes)
- Re-ranking avec minimum 8 chunks garantis
- Gestion des modèles GPT-5.x et o1 (max_completion_tokens)
- Prompts optimisés pour réponses longues (500-800 mots)

🤖 Generated with Claude Code (https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-29 22:46:39 +01:00
705cd1bfa9 Add time/date access for Ikario and Tavily MCP specification
Major changes:
- Added current date/time to system prompt so Ikario always knows when it is
- Created comprehensive Tavily MCP integration spec (10 features)
- Updated .gitignore to exclude node_modules

Time Access Feature:
- Modified buildSystemPrompt in server/routes/messages.js
- Modified buildSystemPrompt in server/routes/claude.js
- Ikario now receives: date, time, ISO timestamp, timezone
- Added debug logging to verify system prompt

Tavily MCP Spec (app_spec_tavily_mcp.txt):
- Internet access via Tavily search API
- 10 detailed features with implementation steps
- Compatible with existing ikario-memory MCP
- Provides real-time web search and news search

🤖 Generated with Claude Code

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-25 19:52:52 +01:00
a310d4b3cf Initial commit: Linear-integrated autonomous coding agent with Initializer Bis support 2025-12-14 00:45:40 +01:00