Commit Graph

2 Commits

Author SHA1 Message Date
0c8ea8fa48 fix: Correct Work titles and improve LLM metadata extraction
Fixes an issue where the LLM copied placeholder instructions from the
prompt template into the extracted metadata fields.

Changes:
1. Created fix_work_titles.py script to correct existing bad titles
   - Detects patterns like "(si c'est bien...)", "Titre corrigé...", "Auteur à identifier"
   - Extracts correct metadata from the chunk JSON files
   - Updates Work entries and associated chunks (44 chunks updated)
   - Fixed 3 Works with placeholder contamination

2. Improved llm_metadata.py prompt to prevent future issues
   - Added explicit INTERDIT/OBLIGATOIRE (forbidden/mandatory) rules, each with a marker
   - Replaced placeholder examples with real concrete examples
   - Added two example responses (high confidence + low confidence)
   - Ended the prompt with an empty JSON template that conveys the structure without placeholder text
   - Reinforced that uncertainty belongs in the "confidence" field, not in annotations inside the values
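
The prompt changes in item 2 could look roughly like the sketch below. It is not the code from llm_metadata.py: the field names (`title`, `author`, `confidence`), the rule wording, the example values, and the function `build_metadata_prompt` are all assumptions made for illustration. It only shows the general shape the commit describes: explicit rules, two concrete example responses, and an empty JSON template at the end.

```python
import json

# Hypothetical schema: the real field names in llm_metadata.py are not shown.
EMPTY_TEMPLATE = {"title": "", "author": "", "confidence": 0.0}

# Two concrete examples (high and low confidence) instead of placeholder text.
EXAMPLES = [
    {"title": "Meditations on First Philosophy", "author": "René Descartes", "confidence": 0.95},
    {"title": "Discourse on the Method", "author": "", "confidence": 0.3},
]

# Illustrative wording only; the actual rules are not quoted in the commit.
RULES = (
    "INTERDIT: comments, doubts, or instructions inside any value.\n"
    'OBLIGATOIRE: express uncertainty only through the "confidence" field.'
)


def build_metadata_prompt(excerpt: str) -> str:
    """Assemble a prompt that ends with an empty JSON template."""
    parts = [
        "Extract the bibliographic metadata of the document below as JSON.",
        RULES,
    ]
    for example in EXAMPLES:
        parts.append("Example response:\n" + json.dumps(example, ensure_ascii=False))
    parts.append("Document:\n" + excerpt)
    # The template carries structure only, so there is no text for the model to copy.
    parts.append("Your response (fill in this structure):\n" + json.dumps(EMPTY_TEMPLATE))
    return "\n\n".join(parts)
```

Because the template values are empty strings rather than instructions such as "title to identify", the model has nothing to echo back into the metadata.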

Results:
- "A Cartesian critique... (si c'est bien le titre)" → "A Cartesian critique of the artificial intelligence"
- "Titre corrigé si nécessaire (ex: ...)" → "Computationalism and The Case When the Brain Is Not a Computer"
- "Titre de l'article principal (à identifier)" → "Computationalism in the Philosophy of Mind"
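
The three contaminated titles above make convenient test cases for a placeholder detector. The following is a minimal sketch, not the contents of fix_work_titles.py: the pattern list and the helper `is_contaminated` are hypothetical, and the third pattern is deliberately broadened from "Auteur à identifier" to any "à identifier" so that it also covers the last title.

```python
import re

# Hypothetical patterns modeled on the strings quoted in this commit message.
PLACEHOLDER_PATTERNS = [
    re.compile(r"\(si c'est bien", re.IGNORECASE),   # "(si c'est bien le titre)"
    re.compile(r"^\s*Titre corrigé", re.IGNORECASE),  # "Titre corrigé si nécessaire ..."
    re.compile(r"à identifier", re.IGNORECASE),       # "Auteur à identifier", "(à identifier)"
]


def is_contaminated(value: str) -> bool:
    """Return True if a metadata value contains leaked prompt instructions."""
    return any(pattern.search(value) for pattern in PLACEHOLDER_PATTERNS)


bad_titles = [
    "A Cartesian critique... (si c'est bien le titre)",
    "Titre corrigé si nécessaire (ex: ...)",
    "Titre de l'article principal (à identifier)",
]
fixed_titles = [
    "A Cartesian critique of the artificial intelligence",
    "Computationalism and The Case When the Brain Is Not a Computer",
    "Computationalism in the Philosophy of Mind",
]

flagged = [title for title in bad_titles + fixed_titles if is_contaminated(title)]
```

Here `flagged` ends up holding only the three bad titles; the corrected titles pass through untouched.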

Future document uploads should now yield clean metadata, free of
LLM commentary and placeholder instructions.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-08 23:59:25 +01:00
d2f7165120 Add Library RAG project and cleanup root directory
- Add complete Library RAG application (Flask + MCP server)
  - PDF processing pipeline with OCR and LLM extraction
  - Weaviate vector database integration (BGE-M3 embeddings)
  - Flask web interface with search and document management
  - MCP server for Claude Desktop integration
  - Comprehensive test suite (134 tests)

- Clean up root directory
  - Remove obsolete documentation files
  - Remove backup and temporary files
  - Update autonomous agent configuration

- Update prompts
  - Enhance initializer bis prompt with better instructions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-30 11:57:12 +01:00