Commit Graph

3 Commits

Author SHA1 Message Date
c533f67e2f LRP-146: Add unit tests for works filter backend routes
- Test /api/get-works route:
  - Unique works extraction with correct chunk counts
  - Sorting by author then title
  - Connection failure and query exception handling
  - Edge cases: empty database, missing title/author

- Test /chat/send selected_works parameter:
  - Accepts empty list (search all works)
  - Accepts valid work title list
  - Rejects non-list types (string, dict)
  - Rejects mixed types in list
  - Verifies parameter passed to background thread

- Test rag_search works filter:
  - No filter when selected_works is empty/None
  - Contains_any filter applied when works selected

18 tests, all passing, no real Weaviate calls (fully mocked)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 16:27:41 +01:00
04ee3f9e39 feat: Add data quality verification & cleanup scripts
## Data Quality & Cleanup (Priorities 1-6)

Added comprehensive data quality verification and cleanup system:

**Scripts créés**:
- verify_data_quality.py: Analyse qualité complète œuvre par œuvre
- clean_duplicate_documents.py: Nettoyage doublons Documents
- populate_work_collection.py/clean.py: Peuplement Work collection
- fix_chunks_count.py: Correction chunksCount incohérents
- manage_orphan_chunks.py: Gestion chunks orphelins (3 options)
- clean_orphan_works.py: Suppression Works sans chunks
- add_missing_work.py: Création Work manquant
- generate_schema_stats.py: Génération stats auto
- migrate_add_work_collection.py: Migration sûre Work collection

**Documentation**:
- WEAVIATE_GUIDE_COMPLET.md: Guide consolidé complet (600+ lignes)
- WEAVIATE_SCHEMA.md: Référence schéma rapide
- NETTOYAGE_COMPLETE_RAPPORT.md: Rapport nettoyage session
- ANALYSE_QUALITE_DONNEES.md: Analyse qualité initiale
- rapport_qualite_donnees.txt: Output brut vérification

**Résultats nettoyage**:
- Documents: 16 → 9 (7 doublons supprimés)
- Works: 0 → 9 (peuplé + nettoyé)
- Chunks: 5,404 → 5,230 (174 orphelins supprimés)
- chunksCount: Corrigés (231 → 5,230 déclaré = réel)
- Cohérence parfaite: 9 Works = 9 Documents = 9 œuvres

**Modifications code**:
- schema.py: Ajout Work collection avec vectorisation
- utils/weaviate_ingest.py: Support Work ingestion
- utils/word_pipeline.py: Désactivation concepts (problème .lower())
- utils/word_toc_extractor.py: Métadonnées Word correctes
- .gitignore: Exclusion fichiers temporaires (*.wav, output/*, NUL)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-01 11:57:26 +01:00
d2f7165120 Add Library RAG project and cleanup root directory
- Add complete Library RAG application (Flask + MCP server)
  - PDF processing pipeline with OCR and LLM extraction
  - Weaviate vector database integration (BGE-M3 embeddings)
  - Flask web interface with search and document management
  - MCP server for Claude Desktop integration
  - Comprehensive test suite (134 tests)

- Clean up root directory
  - Remove obsolete documentation files
  - Remove backup and temporary files
  - Update autonomous agent configuration

- Update prompts
  - Enhance initializer bis prompt with better instructions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-30 11:57:12 +01:00