feat: Migrate Weaviate ingestion to Python GPU embedder (30-70x faster)
BREAKING: No breaking changes - zero data loss migration Core Changes: - Added manual GPU vectorization in weaviate_ingest.py (~100 lines) - New vectorize_chunks_batch() function using BAAI/bge-m3 on RTX 4070 - Modified ingest_document() and ingest_summaries() for GPU vectors - Updated docker-compose.yml with healthchecks Performance: - Ingestion: 500-1000ms/chunk → 15ms/chunk (30-70x faster) - VRAM usage: 2.6 GB peak (well under 8 GB available) - No degradation on search/chat (already using GPU embedder) Data Safety: - All 5355 existing chunks preserved (100% compatible vectors) - Same model (BAAI/bge-m3), same dimensions (1024) - Docker text2vec-transformers optional (can be removed later) Tests (All Passed): ✅ Ingestion: 9 chunks in 1.2s ✅ Search: 16 results, GPU embedder confirmed ✅ Chat: 11 chunks across 5 sections, hierarchical search OK Architecture: Before: Hybrid (Docker CPU for ingestion, Python GPU for queries) After: Unified (Python GPU for everything) Files Modified: - generations/library_rag/utils/weaviate_ingest.py (GPU vectorization) - generations/library_rag/.claude/CLAUDE.md (documentation) - generations/library_rag/docker-compose.yml (healthchecks) Documentation: - MIGRATION_GPU_EMBEDDER_SUCCESS.md (detailed report) - TEST_FINAL_GPU_EMBEDDER.md (ingestion + search tests) - TEST_CHAT_GPU_EMBEDDER.md (chat test) - TESTS_COMPLETS_GPU_EMBEDDER.md (complete summary) - BUG_REPORT_WEAVIATE_CONNECTION.md (initial bug analysis) - DIAGNOSTIC_ARCHITECTURE_EMBEDDINGS.md (technical analysis) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This commit is contained in:
@@ -46,6 +46,16 @@ services:
|
||||
mem_limit: 8g
|
||||
memswap_limit: 10g
|
||||
cpus: 4
|
||||
# Ensure Weaviate waits for text2vec-transformers to be healthy before starting
|
||||
depends_on:
|
||||
text2vec-transformers:
|
||||
condition: service_healthy
|
||||
healthcheck:
|
||||
test: ["CMD", "curl", "-f", "http://localhost:8080/v1/.well-known/ready"]
|
||||
interval: 30s
|
||||
timeout: 10s
|
||||
retries: 3
|
||||
start_period: 60s
|
||||
|
||||
text2vec-transformers:
|
||||
# BAAI/bge-m3: Multilingual embedding model (1024 dimensions)
|
||||
@@ -71,6 +81,14 @@ services:
|
||||
mem_limit: 10g
|
||||
memswap_limit: 12g
|
||||
cpus: 3
|
||||
# Healthcheck ensures service is fully loaded before Weaviate starts
|
||||
# BGE-M3 model takes ~60-120s to load into memory
|
||||
healthcheck:
|
||||
test: ["CMD", "curl", "-f", "http://localhost:8080/.well-known/ready"]
|
||||
interval: 30s
|
||||
timeout: 10s
|
||||
retries: 5
|
||||
start_period: 120s # BGE-M3 model loading can take up to 2 minutes
|
||||
|
||||
volumes:
|
||||
weaviate_data:
|
||||
|
||||
Reference in New Issue
Block a user