- Add complete Library RAG application (Flask + MCP server) - PDF processing pipeline with OCR and LLM extraction - Weaviate vector database integration (BGE-M3 embeddings) - Flask web interface with search and document management - MCP server for Claude Desktop integration - Comprehensive test suite (134 tests) - Clean up root directory - Remove obsolete documentation files - Remove backup and temporary files - Update autonomous agent configuration - Update prompts - Enhance initializer bis prompt with better instructions 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2.8 KiB
2.8 KiB
BGE-M3 Search Quality Validation Results
Generated: (Run python test_bge_m3_quality.py --output SEARCH_QUALITY_RESULTS.md to populate)
Weaviate Version: TBD
Database Statistics
- Total Documents: TBD
- Total Chunks: TBD
- Vector Dimensions: TBD (expected: 1024)
Vector Dimension Verification
Run the validation script to confirm BGE-M3 (1024-dim) vectors are properly configured.
Expected output: BGE-M3 (1024-dim) vectors confirmed.
Test Categories
1. Multilingual Queries
Tests the model's ability to understand philosophical terms in multiple languages:
| Language | Test Terms |
|---|---|
| French | justice, vertu, liberte, verite, connaissance |
| English | virtue, knowledge, ethics, wisdom, justice |
| Greek | arete, telos, psyche, logos, eudaimonia |
| Latin | virtus, sapientia, forma, anima, ratio |
2. Semantic Understanding
Tests concept mapping for philosophical questions:
| Query | Expected Topics |
|---|---|
| "What is the nature of reality?" | ontology, metaphysics, being |
| "How should we live?" | ethics, virtue, good life |
| "What can we know?" | epistemology, knowledge, truth |
| "What is the meaning of life?" | purpose, existence, value |
| "What is beauty?" | aesthetics, art, form |
3. Long Query Handling
Tests the extended 8192 token context (vs MiniLM-L6's 512 tokens):
- Uses a 100+ word query about Plato's Meno
- Verifies no truncation occurs
- Measures semantic accuracy of results
4. Performance Metrics
Performance targets:
- Query Latency: < 500ms average
- Throughput: Measured across 10 iterations per query
Running the Tests
# Run all tests with verbose output
python test_bge_m3_quality.py --verbose
# Generate markdown report
python test_bge_m3_quality.py --output SEARCH_QUALITY_RESULTS.md
# Output as JSON
python test_bge_m3_quality.py --json
Prerequisites
-
Weaviate must be running:
docker-compose up -d -
Documents must be ingested with BGE-M3 vectorizer
-
Schema must be created with 1024-dim vectors
Expected Improvements over MiniLM-L6
| Feature | MiniLM-L6 | BGE-M3 |
|---|---|---|
| Vector Dimensions | 384 | 1024 (2.7x richer) |
| Context Window | 512 tokens | 8192 tokens (16x larger) |
| Multilingual | Limited | Excellent (Greek, Latin, French, English) |
| Academic Texts | Good | Superior (trained on research papers) |
Troubleshooting
"Connection error: Failed to connect to Weaviate"
Ensure Weaviate is running:
docker-compose up -d
docker-compose ps # Check status
"No vectors found in Chunk collection"
Ensure documents have been ingested:
python reingest_from_cache.py
Vector dimensions show 384 instead of 1024
The BGE-M3 migration is incomplete. Re-run:
python migrate_to_bge_m3.py