# BGE-M3 Search Quality Validation Results

**Generated:** (Run `python test_bge_m3_quality.py --output SEARCH_QUALITY_RESULTS.md` to populate)

**Weaviate Version:** TBD

## Database Statistics

- **Total Documents:** TBD
- **Total Chunks:** TBD
- **Vector Dimensions:** TBD (expected: 1024)

## Vector Dimension Verification

Run the validation script to confirm BGE-M3 (1024-dim) vectors are properly configured.

Expected output: **BGE-M3 (1024-dim) vectors confirmed.**
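
The dimension check can be sketched as follows. This is a minimal illustration, not the logic of `test_bge_m3_quality.py`: the function name `check_vector_dimensions` and the in-memory sample are hypothetical, and in practice the vectors would be read from the Chunk collection in Weaviate.

```python
from typing import Iterable, Sequence

EXPECTED_DIM = 1024  # BGE-M3 dense vector size, as stated in this document


def check_vector_dimensions(
    vectors: Iterable[Sequence[float]], expected: int = EXPECTED_DIM
) -> str:
    """Return a status message after checking the length of every sampled vector."""
    dims = {len(v) for v in vectors}
    if not dims:
        return "No vectors found in Chunk collection"
    if dims == {expected}:
        return f"BGE-M3 ({expected}-dim) vectors confirmed."
    return f"Unexpected vector dimensions: {sorted(dims)} (expected {expected})"


# Hypothetical sample; real vectors would be fetched from Weaviate.
sample = [[0.0] * 1024, [0.1] * 1024]
print(check_vector_dimensions(sample))
```

A result that mentions 384 would point to vectors left over from MiniLM-L6.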

## Test Categories

### 1. Multilingual Queries

Tests the model's ability to understand philosophical terms in multiple languages:

| Language | Test Terms |
|----------|------------|
| French | justice, vertu, liberte, verite, connaissance |
| English | virtue, knowledge, ethics, wisdom, justice |
| Greek | arete, telos, psyche, logos, eudaimonia |
| Latin | virtus, sapientia, forma, anima, ratio |
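
One way to summarize the multilingual test is a per-language hit rate: the share of terms for which the search returns at least one result. The sketch below assumes a `search` callable that takes a query string and returns a list of results; that callable, `hit_rate_by_language`, and the fake backend are hypothetical stand-ins, and the real script may score this category differently.

```python
from typing import Callable, Dict, List

# Test terms copied from the table above.
TEST_TERMS: Dict[str, List[str]] = {
    "French": ["justice", "vertu", "liberte", "verite", "connaissance"],
    "English": ["virtue", "knowledge", "ethics", "wisdom", "justice"],
    "Greek": ["arete", "telos", "psyche", "logos", "eudaimonia"],
    "Latin": ["virtus", "sapientia", "forma", "anima", "ratio"],
}


def hit_rate_by_language(
    search: Callable[[str], List[object]],
    terms_by_language: Dict[str, List[str]] = TEST_TERMS,
    min_results: int = 1,
) -> Dict[str, float]:
    """Fraction of terms per language that return at least `min_results` results."""
    rates: Dict[str, float] = {}
    for language, terms in terms_by_language.items():
        hits = sum(1 for term in terms if len(search(term)) >= min_results)
        rates[language] = hits / len(terms) if terms else 0.0
    return rates


def fake_search(term: str) -> List[object]:
    """Stand-in for a real Weaviate query: finds everything except 'telos'."""
    return [] if term == "telos" else ["chunk"]


for language, rate in hit_rate_by_language(fake_search).items():
    print(f"{language}: {rate:.0%}")
```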

### 2. Semantic Understanding

Tests concept mapping for philosophical questions:

| Query | Expected Topics |
|-------|----------------|
| "What is the nature of reality?" | ontology, metaphysics, being |
| "How should we live?" | ethics, virtue, good life |
| "What can we know?" | epistemology, knowledge, truth |
| "What is the meaning of life?" | purpose, existence, value |
| "What is beauty?" | aesthetics, art, form |
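
A simple way to score a query against its expected topics is to measure how many of those topics appear in the returned text. The sketch below uses case-insensitive substring matching, which is a deliberately crude assumption; `topic_coverage` is a hypothetical helper and not necessarily how the validation script scores results.

```python
from typing import List


def topic_coverage(result_texts: List[str], expected_topics: List[str]) -> float:
    """Fraction of expected topics that occur (as substrings) in the results."""
    if not expected_topics:
        return 0.0
    combined = " ".join(result_texts).lower()
    found = [topic for topic in expected_topics if topic.lower() in combined]
    return len(found) / len(expected_topics)


# Hypothetical result snippets for the query "How should we live?"
results = ["Ethics concerns the good life.", "Aristotle discusses virtue at length."]
print(topic_coverage(results, ["ethics", "virtue", "good life"]))
```

Substring matching will miss paraphrases and inflected forms, so a low score here is a prompt to inspect the results rather than proof of poor retrieval.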

### 3. Long Query Handling

Tests the extended 8192-token context window (vs. MiniLM-L6's 512 tokens):

- Uses a 100+ word query about Plato's Meno
- Verifies no truncation occurs
- Measures semantic accuracy of results
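
A rough pre-check for truncation is to estimate the query's token count and compare it with the context limit. The sketch below does not use the BGE-M3 tokenizer: the ratio of 1.5 tokens per whitespace-separated word is an illustrative assumption, and `estimate_tokens` and `fits_context` are hypothetical helpers. An exact check would count tokens with the model's own tokenizer.

```python
import math

BGE_M3_MAX_TOKENS = 8192  # context limit stated in this document


def estimate_tokens(text: str, tokens_per_word: float = 1.5) -> int:
    """Very rough token estimate; the tokens-per-word ratio is an assumption."""
    return math.ceil(len(text.split()) * tokens_per_word)


def fits_context(text: str, max_tokens: int = BGE_M3_MAX_TOKENS) -> bool:
    """True if the estimated token count is within the context limit."""
    return estimate_tokens(text) <= max_tokens


# Hypothetical 120-word query standing in for the long Meno query.
long_query = " ".join(["word"] * 120)
print(estimate_tokens(long_query), fits_context(long_query))
```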

### 4. Performance Metrics

Performance targets and measurements:
- **Query Latency:** < 500 ms on average
- **Throughput:** measured across 10 iterations per query
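
The latency measurement can be sketched with the standard library alone. The example assumes a `run_query` callable that executes one search; that callable and the helper names are hypothetical, while the 10 iterations and the 500 ms target come from the list above.

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency_ms(
    run_query: Callable[[str], object], query: str, iterations: int = 10
) -> Dict[str, float]:
    """Run the query repeatedly and report latency statistics in milliseconds."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_query(query)
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "avg_ms": statistics.mean(samples),
        "min_ms": min(samples),
        "max_ms": max(samples),
        "iterations": iterations,
    }


def meets_target(stats: Dict[str, float], target_ms: float = 500.0) -> bool:
    """True if the average latency is below the target."""
    return stats["avg_ms"] < target_ms


# A no-op stands in for a real Weaviate query here.
stats = measure_latency_ms(lambda q: None, "What is virtue?")
print(f"avg {stats['avg_ms']:.3f} ms, target met: {meets_target(stats)}")
```

Reporting the minimum and maximum alongside the average makes cold-start outliers visible, since the first query after startup is often slower than the rest.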

## Running the Tests

```bash
# Run all tests with verbose output
python test_bge_m3_quality.py --verbose

# Generate markdown report
python test_bge_m3_quality.py --output SEARCH_QUALITY_RESULTS.md

# Output as JSON
python test_bge_m3_quality.py --json
```

## Prerequisites

1. Weaviate must be running:
   ```bash
   docker-compose up -d
   ```

2. Documents must be ingested with the BGE-M3 vectorizer.

3. The schema must be created with 1024-dim vectors.

## Expected Improvements over MiniLM-L6

| Feature | MiniLM-L6 | BGE-M3 |
|---------|-----------|--------|
| Vector Dimensions | 384 | 1024 (about 2.7x more dimensions) |
| Context Window | 512 tokens | 8192 tokens (16x larger) |
| Multilingual | Limited | Excellent (Greek, Latin, French, English) |
| Academic Texts | Good | Superior (trained on research papers) |

## Troubleshooting

### "Connection error: Failed to connect to Weaviate"

Ensure Weaviate is running:
```bash
docker-compose up -d
docker-compose ps  # Check status
```
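
If the containers are up but the error persists, Weaviate may still be starting. The sketch below polls Weaviate's readiness endpoint (`/v1/.well-known/ready`) using only the standard library. The base URL `http://localhost:8080` is an assumption about the local setup, and `is_weaviate_ready` is a hypothetical helper, not part of the validation script.

```python
import urllib.error
import urllib.request


def is_weaviate_ready(base_url: str = "http://localhost:8080", timeout: float = 2.0) -> bool:
    """Return True if the readiness endpoint answers with HTTP 200."""
    url = base_url.rstrip("/") + "/v1/.well-known/ready"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status all count as "not ready".
        return False


print("Weaviate ready:", is_weaviate_ready())
```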

### "No vectors found in Chunk collection"

Ensure documents have been ingested:
```bash
python reingest_from_cache.py
```

### Vector dimensions show 384 instead of 1024

The BGE-M3 migration is incomplete. Re-run the migration script:
```bash
python migrate_to_bge_m3.py
```