Add Library RAG project and clean up root directory

- Add complete Library RAG application (Flask + MCP server)
- PDF processing pipeline with OCR and LLM extraction
- Weaviate vector database integration (BGE-M3 embeddings)
- Flask web interface with search and document management
- MCP server for Claude Desktop integration
- Comprehensive test suite (134 tests)
- Clean up root directory
- Remove obsolete documentation files
- Remove backup and temporary files
- Update autonomous agent configuration
- Update prompts
- Enhance initializer bis prompt with better instructions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-30 11:57:12 +01:00
parent 48470236da
commit d2f7165120
84 changed files with 26517 additions and 2 deletions
--- a/generations/library_rag/docs_techniques/SEARCH_QUALITY_RESULTS.md
+++ b/generations/library_rag/docs_techniques/SEARCH_QUALITY_RESULTS.md
@@ -0,0 +1,113 @@
+# BGE-M3 Search Quality Validation Results
+
+**Generated:** (Run `python test_bge_m3_quality.py --output SEARCH_QUALITY_RESULTS.md` to populate)
+
+**Weaviate Version:** TBD
+
+## Database Statistics
+
+- **Total Documents:** TBD
+- **Total Chunks:** TBD
+- **Vector Dimensions:** TBD (expected: 1024)
+
+## Vector Dimension Verification
+
+Run the validation script to confirm BGE-M3 (1024-dim) vectors are properly configured.
+
+Expected output: **BGE-M3 (1024-dim) vectors confirmed.**
+
+## Test Categories
+
+### 1. Multilingual Queries
+
+Tests the model's ability to understand philosophical terms in multiple languages:
+
+| Language | Test Terms |
+|----------|------------|
+| French | justice, vertu, liberte, verite, connaissance |
+| English | virtue, knowledge, ethics, wisdom, justice |
+| Greek | arete, telos, psyche, logos, eudaimonia |
+| Latin | virtus, sapientia, forma, anima, ratio |
+
+### 2. Semantic Understanding
+
+Tests concept mapping for philosophical questions:
+
+| Query | Expected Topics |
+|-------|----------------|
+| "What is the nature of reality?" | ontology, metaphysics, being |
+| "How should we live?" | ethics, virtue, good life |
+| "What can we know?" | epistemology, knowledge, truth |
+| "What is the meaning of life?" | purpose, existence, value |
+| "What is beauty?" | aesthetics, art, form |
+
+### 3. Long Query Handling
+
+Tests BGE-M3's extended 8192-token context window (vs. 512 tokens for MiniLM-L6):
+
+- Uses a 100+ word query about Plato's Meno
+- Verifies no truncation occurs
+- Measures semantic accuracy of results
+
+### 4. Performance Metrics
+
+Performance targets:
+- **Query Latency:** < 500ms average
+- **Throughput:** Measured across 10 iterations per query
+
+## Running the Tests
+
+```bash
+# Run all tests with verbose output
+python test_bge_m3_quality.py --verbose
+
+# Generate markdown report
+python test_bge_m3_quality.py --output SEARCH_QUALITY_RESULTS.md
+
+# Output as JSON
+python test_bge_m3_quality.py --json
+```
+
+## Prerequisites
+
+1. Weaviate must be running:
+   ```bash
+   docker-compose up -d
+   ```
+
+2. Documents must be ingested with BGE-M3 vectorizer
+
+3. Schema must be created with 1024-dim vectors
+
+## Expected Improvements over MiniLM-L6
+
+| Feature | MiniLM-L6 | BGE-M3 |
+|---------|-----------|--------|
+| Vector Dimensions | 384 | 1024 (about 2.7x more dimensions) |
+| Context Window | 512 tokens | 8192 tokens (16x larger) |
+| Multilingual | Limited | Excellent (Greek, Latin, French, English) |
+| Academic Texts | Good | Superior (trained on research papers) |
+
+## Troubleshooting
+
+### "Connection error: Failed to connect to Weaviate"
+
+Ensure Weaviate is running:
+```bash
+docker-compose up -d
+docker-compose ps  # Check status
+```
+
+### "No vectors found in Chunk collection"
+
+No documents have been ingested yet. Re-ingest them from the cache:
+```bash
+python reingest_from_cache.py
+```
+
+### Vector dimensions show 384 instead of 1024
+
+The stored vectors are still 384-dim (MiniLM-L6), so the BGE-M3 migration is incomplete. Re-run it:
+```bash
+python migrate_to_bge_m3.py
+```
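Note on the checks described in the added report: the vector dimension verification and the latency target can be illustrated with a minimal Python sketch. This is not the contents of `test_bge_m3_quality.py`, which this diff does not show. The function names `diagnose_dimensions` and `average_latency_ms` are hypothetical, and the sketch assumes the caller has already fetched a sample of vectors from the `Chunk` collection and can supply a callable that runs one query. The constants (1024, 384, 500 ms, 10 iterations) are taken from the report itself.

```python
import time
from typing import Callable, Sequence

EXPECTED_DIM = 1024      # BGE-M3, per the comparison table in the report
LEGACY_DIM = 384         # MiniLM-L6, per the comparison table in the report
LATENCY_TARGET_MS = 500  # average query latency target from the report
ITERATIONS = 10          # iterations per query from the report


def diagnose_dimensions(vectors: Sequence[Sequence[float]]) -> str:
    """Classify a sample of stored vectors by their dimensionality."""
    if not vectors:
        return "No vectors found in Chunk collection"
    dims = {len(v) for v in vectors}
    if dims == {EXPECTED_DIM}:
        return "BGE-M3 (1024-dim) vectors confirmed."
    if LEGACY_DIM in dims:
        return "384-dim vectors present: BGE-M3 migration is incomplete"
    return f"Unexpected vector dimensions: {sorted(dims)}"


def average_latency_ms(run_query: Callable[[], object],
                       iterations: int = ITERATIONS) -> float:
    """Average wall-clock latency of run_query over a fixed number of runs."""
    start = time.perf_counter()
    for _ in range(iterations):
        run_query()
    return (time.perf_counter() - start) * 1000 / iterations


if __name__ == "__main__":
    # Stand-in data: a real run would pass vectors and queries from Weaviate.
    sample = [[0.0] * EXPECTED_DIM, [0.0] * EXPECTED_DIM]
    print(diagnose_dimensions(sample))
    latency = average_latency_ms(lambda: sum(range(1000)))
    print(f"avg latency: {latency:.2f} ms (target < {LATENCY_TARGET_MS} ms)")
```

Keeping both helpers free of any database client makes them easy to unit test. A mixed sample containing both 384-dim and 1024-dim vectors is reported as an incomplete migration, which matches the last troubleshooting entry.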