File: linear-coding-agent/generations/library_rag/docker-compose.yml
Commit 7045907173 (David Blanc Brioir): feat: Optimize chunk sizes with 1000-word limit and overlap
Implemented chunking optimization to resolve oversized chunks and improve
semantic search quality:

CHUNKING IMPROVEMENTS:
- Added strict 1000-word max limit (vs previous 1500-2000)
- Implemented 100-word overlap between consecutive chunks
- Created llm_chunker_improved.py with overlap functionality
- Added 3 fallback points in llm_chunker.py for robustness
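The overlap strategy above can be sketched as a sliding window over words. This is a minimal illustration, not the actual logic in llm_chunker_improved.py (which the commit does not show); the function name and word-based (rather than token-based) splitting are assumptions for the sake of the example.

```python
def chunk_with_overlap(text, max_words=1000, overlap=100):
    """Split text into chunks of at most max_words words.

    Each chunk after the first repeats the last `overlap` words of the
    previous chunk, so context is preserved across chunk boundaries.
    Hypothetical sketch; the real chunker lives in llm_chunker_improved.py.
    """
    words = text.split()
    if len(words) <= max_words:
        return [" ".join(words)] if words else []
    step = max_words - overlap  # net advance per chunk
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # last window already reached the end of the text
    return chunks
```

With a 2,500-word input this yields three chunks of at most 1,000 words, with the last 100 words of each chunk repeated at the start of the next.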

RE-CHUNKING RESULTS:
- Identified and re-chunked 31 oversized chunks (>2000 tokens)
- Split into 92 optimally-sized chunks (max 1995 tokens)
- Preserved all metadata (workTitle, workAuthor, sectionPath, etc.)
- 0 chunks now exceed 2000 tokens (vs 31 before)

VECTORIZATION:
- Created manual vectorization script for chunks without vectors
- Successfully vectorized all 92 new chunks (100% coverage)
- All 5,304 chunks now have BGE-M3 embeddings
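Manual vectorization presumably talks to the text2vec-transformers container directly on the port exposed in docker-compose.yml (8090). The sketch below assumes the inference container's `POST /vectors` endpoint accepting `{"text": ...}` and returning `{"vector": [...]}`; verify the endpoint against your image before relying on it. This is not the commit's 11_vectorize_missing_chunks.py, just an illustration of the idea.

```python
import json
import urllib.request

# Port 8090 is mapped to the text2vec-transformers container in docker-compose.yml.
VECTORIZER_URL = "http://localhost:8090/vectors"

def build_payload(text):
    """Encode the request body the inference API expects (assumed shape)."""
    return json.dumps({"text": text}).encode("utf-8")

def vectorize(text, url=VECTORIZER_URL):
    """Return the BGE-M3 embedding (1024 floats) for one chunk of text.

    Timeout matches WORKER_TIMEOUT=600 so very large chunks can finish.
    """
    req = urllib.request.Request(
        url,
        data=build_payload(text),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=600) as resp:
        return json.loads(resp.read())["vector"]

# Usage (requires the container to be running):
# vec = vectorize("What is the Form of the Good?")  # list of 1024 floats
```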

DOCKER CONFIGURATION:
- Exposed text2vec-transformers port 8090 for manual vectorization
- Added cluster configuration to fix the "No private IP address found" error
- Increased worker timeout to 600s for large chunks

TESTING:
- Created comprehensive search quality test suite
- Tests distribution, overlap detection, and semantic search
- Modified tests to use near_vector() since Chunk_v2 has no vectorizer configured
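Because Chunk_v2 has no vectorizer, search queries must pass a precomputed BGE-M3 vector. A minimal sketch using Weaviate's GraphQL nearVector operator over plain HTTP (the commit's test script may instead use a Python client's near_vector(); the property names come from the metadata listed above, the endpoint URL is the default local one):

```python
import json
import urllib.request

WEAVIATE_URL = "http://localhost:8080/v1/graphql"

def near_vector_query(vector, limit=5):
    """Build a GraphQL nearVector query against the Chunk_v2 collection."""
    return {
        "query": """
        {
          Get {
            Chunk_v2(nearVector: {vector: %s}, limit: %d) {
              workTitle
              workAuthor
              sectionPath
              _additional { distance }
            }
          }
        }""" % (json.dumps(vector), limit)
    }

def search(vector, limit=5):
    """POST the query to Weaviate and return the decoded JSON response."""
    req = urllib.request.Request(
        WEAVIATE_URL,
        data=json.dumps(near_vector_query(vector, limit)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())

# Usage (requires a running Weaviate and a real 1024-dim query vector):
# results = search(query_vector, limit=5)
```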

Scripts:
- 08_fix_summaries_properties.py - Add missing Work metadata to summaries
- 09_rechunk_oversized.py - Re-chunk giant chunks with overlap
- 10_test_search_quality.py - Validate search improvements
- 11_vectorize_missing_chunks.py - Manual vectorization via API

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-08 17:37:49 +01:00


# Library RAG - Weaviate + BGE-M3 Embeddings
# ===========================================
#
# This docker-compose runs Weaviate with BAAI/bge-m3 embedding model.
#
# BGE-M3 Advantages:
# - 1024 dimensions (vs 384 for MiniLM-L6) - 2.7x richer representation
# - 8192 token context (vs 512) - 16x longer sequences
# - Superior multilingual support (Greek, Latin, French, English)
# - Better trained on academic/philosophical texts
#
# GPU Configuration:
# - ENABLE_CUDA="1" - Uses NVIDIA GPU for faster vectorization
# - ENABLE_CUDA="0" - Uses CPU only (slower but functional)
# - GPU device mapping included for CUDA acceleration
#
# Migration Note (2024-12):
# Migrated from sentence-transformers-multi-qa-MiniLM-L6-cos-v1 (384-dim)
# to BAAI/bge-m3 (1024-dim). All collections were deleted and recreated.
# See MIGRATION_BGE_M3.md for details.
services:
  weaviate:
    image: cr.weaviate.io/semitechnologies/weaviate:1.34.4
    restart: on-failure:0
    ports:
      - "8080:8080"
      - "50051:50051"
    environment:
      QUERY_DEFAULTS_LIMIT: "25"
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: "true"  # OK for dev/local only
      PERSISTENCE_DATA_PATH: "/var/lib/weaviate"
      CLUSTER_HOSTNAME: "node1"
      CLUSTER_GOSSIP_BIND_PORT: "7946"
      CLUSTER_DATA_BIND_PORT: "7947"
      # Fix for "No private IP address found" error
      CLUSTER_JOIN: ""
      DEFAULT_VECTORIZER_MODULE: "text2vec-transformers"
      ENABLE_MODULES: "text2vec-transformers"
      TRANSFORMERS_INFERENCE_API: "http://text2vec-transformers:8080"
      # Limits to prevent OOM crashes
      GOMEMLIMIT: "6GiB"
      GOGC: "100"
    volumes:
      - weaviate_data:/var/lib/weaviate
    mem_limit: 8g
    memswap_limit: 10g
    cpus: 4

  text2vec-transformers:
    # BAAI/bge-m3: multilingual embedding model (1024 dimensions)
    # Superior for philosophical texts (Greek, Latin, French, English)
    # 8192-token context window (16x longer than MiniLM-L6)
    # Using the ONNX version (the only format available in the Weaviate registry)
    #
    # GPU LIMITATION (Dec 2024):
    # - Weaviate only provides an ONNX build of BGE-M3 (no PyTorch)
    # - The ONNX runtime is CPU-optimized (no native CUDA support)
    # - GPU acceleration would require NVIDIA NIM (a different architecture)
    # - Current setup: CPU-only with AVX2 optimization (functional but slower)
    image: cr.weaviate.io/semitechnologies/transformers-inference:baai-bge-m3-onnx-latest
    restart: on-failure:0
    ports:
      - "8090:8080"  # Expose vectorizer API for manual vectorization
    environment:
      # ONNX runtime - CPU only (CUDA is not supported in the ONNX version)
      ENABLE_CUDA: "0"
      # Increased timeout for very long chunks (e.g. Peirce CP 3.403, CP 8.388, Menon chunk 10).
      # Default is 60s; raised to 600s (10 minutes) for exceptionally large texts (CP 8.388: 218k chars).
      WORKER_TIMEOUT: "600"
    mem_limit: 10g
    memswap_limit: 12g
    cpus: 3

volumes:
  weaviate_data: