Add Library RAG project and clean up root directory

- Add complete Library RAG application (Flask + MCP server)
  - PDF processing pipeline with OCR and LLM extraction
  - Weaviate vector database integration (BGE-M3 embeddings)
  - Flask web interface with search and document management
  - MCP server for Claude Desktop integration
  - Comprehensive test suite (134 tests)
- Clean up root directory
  - Remove obsolete documentation files
  - Remove backup and temporary files
- Update autonomous agent configuration
  - Update prompts
  - Enhance initializer bis prompt with better instructions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

# BGE-M3 Search Quality Validation Results

**Generated:** (Run `python test_bge_m3_quality.py --output SEARCH_QUALITY_RESULTS.md` to populate)

**Weaviate Version:** TBD

## Database Statistics

- **Total Documents:** TBD
- **Total Chunks:** TBD
- **Vector Dimensions:** TBD (expected: 1024)

## Vector Dimension Verification

Run the validation script (`test_bge_m3_quality.py`) to confirm that BGE-M3 (1024-dim) vectors are properly configured.

Expected output: **BGE-M3 (1024-dim) vectors confirmed.**
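
As a rough illustration of what this check involves, the sketch below inspects a sample of vectors that have already been fetched and reports whether they are all 1024-dimensional. It assumes the vectors are plain lists of floats (fetching them from the `Chunk` collection is left out). The function name `check_vector_dimensions` and the dimension-to-model mapping are hypothetical and are not taken from `test_bge_m3_quality.py`.

```python
from collections import Counter
from typing import Sequence

# Hypothetical mapping from vector length to the model that produces it,
# based on the dimensions listed in this report.
KNOWN_DIMENSIONS = {384: "MiniLM-L6", 1024: "BGE-M3"}
EXPECTED_DIM = 1024


def check_vector_dimensions(vectors: Sequence[Sequence[float]]) -> str:
    """Summarise the dimensionality of a sample of vectors (hypothetical helper)."""
    if not vectors:
        return "No vectors found in Chunk collection"
    counts = Counter(len(v) for v in vectors)
    if set(counts) == {EXPECTED_DIM}:
        return "BGE-M3 (1024-dim) vectors confirmed."
    # List every dimension seen, naming the model where it is known.
    found = ", ".join(
        f"{dim} ({KNOWN_DIMENSIONS.get(dim, 'unknown model')}): {n} vector(s)"
        for dim, n in sorted(counts.items())
    )
    return f"Unexpected vector dimensions: {found}"
```

For example, a sample of old 384-dim vectors would produce `Unexpected vector dimensions: 384 (MiniLM-L6): 1 vector(s)` for a single vector, which points at the incomplete migration described under Troubleshooting.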

## Test Categories

### 1. Multilingual Queries

Tests the model's ability to understand philosophical terms in multiple languages:

| Language | Test Terms |
|----------|------------|
| French | justice, vertu, liberte, verite, connaissance |
| English | virtue, knowledge, ethics, wisdom, justice |
| Greek (transliterated) | arete, telos, psyche, logos, eudaimonia |
| Latin | virtus, sapientia, forma, anima, ratio |
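
One way to probe the cross-lingual behaviour this table targets is to embed equivalent terms (for example *vertu*, *virtue*, *arete* and *virtus*) and compare the embeddings with cosine similarity. The sketch below assumes the BGE-M3 embeddings have already been computed and are available as plain float lists. `cosine_similarity` and `mean_pairwise_similarity` are illustrative helpers. Any pass/fail threshold on the score would be your own choice, since this report does not define one.

```python
import math
from itertools import combinations
from typing import Mapping, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def mean_pairwise_similarity(embeddings: Mapping[str, Sequence[float]]) -> float:
    """Average cosine similarity over every pair of terms (e.g. one concept in 4 languages)."""
    pairs = list(combinations(embeddings.values(), 2))
    if not pairs:
        raise ValueError("need at least two embeddings")
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)
```

A well-aligned multilingual model should give noticeably higher scores for translations of the same concept than for unrelated terms.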

### 2. Semantic Understanding

Tests concept mapping for philosophical questions:

| Query | Expected Topics |
|-------|----------------|
| "What is the nature of reality?" | ontology, metaphysics, being |
| "How should we live?" | ethics, virtue, good life |
| "What can we know?" | epistemology, knowledge, truth |
| "What is the meaning of life?" | purpose, existence, value |
| "What is beauty?" | aesthetics, art, form |
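
This report does not say how retrieved results are matched against the expected topics. One simple keyword-based scoring, shown purely as an assumed example, counts how many expected topics appear in at least one retrieved chunk. `topic_hit_rate` is a hypothetical helper, and substring matching is only a crude stand-in for real relevance judgement (for instance, "art" also matches "part").

```python
from typing import Iterable, Sequence


def topic_hit_rate(result_texts: Iterable[str], expected_topics: Sequence[str]) -> float:
    """Fraction of expected topics that appear (case-insensitively) in any result text.

    Plain substring matching is a deliberately crude proxy for relevance.
    """
    if not expected_topics:
        raise ValueError("expected_topics must not be empty")
    texts = [text.lower() for text in result_texts]
    hits = sum(1 for topic in expected_topics if any(topic.lower() in t for t in texts))
    return hits / len(expected_topics)
```

For the query "What is the nature of reality?", chunks that mention *ontology* and *being* but not *metaphysics* would score 2/3.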

### 3. Long Query Handling

Tests the extended 8192-token context window (vs. 512 tokens for MiniLM-L6):

- Uses a 100+ word query about Plato's *Meno*
- Verifies that no truncation occurs
- Measures the semantic accuracy of the results
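
Truncation depends on the query's length in the model's own tokens, not in words. The sketch below compares a token count against each model's limit, using the 512 and 8192 figures quoted above. The `tokenize` callable is an assumption: in practice it would be the tokenizer that ships with the model (for example the Hugging Face tokenizer for `BAAI/bge-m3`). `truncation_report` is a hypothetical helper.

```python
from typing import Callable, Dict, List

# Maximum input lengths quoted in this report, in tokens.
CONTEXT_LIMITS = {"MiniLM-L6": 512, "BGE-M3": 8192}


def truncation_report(query: str, tokenize: Callable[[str], List[str]]) -> Dict[str, bool]:
    """Return, per model, whether `query` would exceed that model's token limit.

    Special tokens the model adds (e.g. start/end markers) are not counted here,
    so treat results very close to a limit with caution.
    """
    n_tokens = len(tokenize(query))
    return {model: n_tokens > limit for model, limit in CONTEXT_LIMITS.items()}
```

A whitespace split such as `str.split` undercounts subword tokens, so it is only suitable for quick experiments. A query of roughly 100 words typically stays well below 512 subword tokens, so a count like this also shows whether a given test query actually reaches beyond the smaller window.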

### 4. Performance Metrics

Performance targets:

- **Query Latency:** < 500 ms average
- **Throughput:** Measured across 10 iterations per query
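
A minimal way to check the average-latency target is to time repeated calls with `time.perf_counter`. In this sketch, `search` stands for whatever function issues the Weaviate query. That is an assumption, not a name from this project. The 10 iterations and the 500 ms threshold mirror the figures above.

```python
import statistics
import time
from typing import Callable

LATENCY_TARGET_MS = 500.0


def average_latency_ms(search: Callable[[str], object], query: str, iterations: int = 10) -> float:
    """Call `search(query)` `iterations` times and return the mean wall-clock latency in ms."""
    if iterations < 1:
        raise ValueError("iterations must be >= 1")
    timings = []
    for _ in range(iterations):
        start = time.perf_counter()
        search(query)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(timings)


def meets_target(avg_ms: float, target_ms: float = LATENCY_TARGET_MS) -> bool:
    """True if the average latency is strictly below the target."""
    return avg_ms < target_ms
```

For a single client issuing queries one after another, throughput in queries per second is then roughly `1000 / avg_ms`.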

## Running the Tests

```bash
# Run all tests with verbose output
python test_bge_m3_quality.py --verbose

# Generate markdown report
python test_bge_m3_quality.py --output SEARCH_QUALITY_RESULTS.md

# Output as JSON
python test_bge_m3_quality.py --json
```

## Prerequisites

1. Weaviate must be running:

   ```bash
   docker-compose up -d
   ```

2. Documents must be ingested with the BGE-M3 vectorizer.

3. The schema must be created with 1024-dim vectors.

## Expected Improvements over MiniLM-L6

| Feature | MiniLM-L6 | BGE-M3 |
|---------|-----------|--------|
| Vector Dimensions | 384 | 1024 (2.7x more dimensions) |
| Context Window | 512 tokens | 8192 tokens (16x larger) |
| Multilingual | Limited | Excellent (Greek, Latin, French, English) |
| Academic Texts | Good | Superior (trained on research papers) |
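
For reference, the multipliers in the first two rows come straight from the raw figures. The 2.7x figure is 8/3 rounded to one decimal place:

```math
\frac{1024}{384} = \frac{8}{3} \approx 2.67, \qquad \frac{8192}{512} = 16
```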

## Troubleshooting

### "Connection error: Failed to connect to Weaviate"

Ensure Weaviate is running:

```bash
docker-compose up -d
docker-compose ps  # Check status
```
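
If `docker-compose ps` shows the container as up but connections still fail, it can help to query Weaviate's readiness endpoint (`/v1/.well-known/ready`, which answers HTTP 200 once the node is ready) directly. The sketch below uses only the standard library. It assumes Weaviate is exposed on `http://localhost:8080`, a common default; check the port mapping in this project's Docker Compose file. `is_weaviate_ready` is a hypothetical helper.

```python
import http.client
import urllib.request

# Assumed default address; adjust to the host/port mapped in your Docker Compose setup.
DEFAULT_WEAVIATE_URL = "http://localhost:8080"


def is_weaviate_ready(base_url: str = DEFAULT_WEAVIATE_URL, timeout: float = 2.0) -> bool:
    """Return True if Weaviate's readiness endpoint answers with HTTP 200."""
    url = base_url.rstrip("/") + "/v1/.well-known/ready"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Covers connection refused, DNS failures, timeouts and non-2xx
        # responses (urllib raises HTTPError, a subclass of OSError).
        return False
```

While the container is still starting, `is_weaviate_ready()` returns `False`. Once it returns `True`, the validation script should be able to connect.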

### "No vectors found in Chunk collection"

Ensure documents have been ingested:

```bash
python reingest_from_cache.py
```

### Vector dimensions show 384 instead of 1024

384-dim vectors come from the previous MiniLM-L6 model, which means the BGE-M3 migration is incomplete. Re-run:

```bash
python migrate_to_bge_m3.py
```