Update framework configuration and clean up obsolete specs

Configuration updates:
- Added .env.example template for environment variables
- Updated README.md with better setup instructions (.env usage)
- Enhanced .claude/settings.local.json with additional Bash permissions
- Added .claude/CLAUDE.md framework documentation

Spec cleanup:
- Removed obsolete spec files (language_selection, mistral_extensible, template, theme_customization)
- Consolidated app_spec.txt (Claude Clone example)
- Added app_spec_model.txt as reference template
- Added app_spec_library_rag_types_docs.txt
- Added coding_prompt_library.md

Framework improvements:
- Updated agent.py, autonomous_agent_demo.py, client.py with minor fixes
- Enhanced dockerize_my_project.py
- Updated prompts (initializer, initializer_bis) with better guidance
- Added docker-compose.my_project.yml example

This commit consolidates improvements made during development sessions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-25 12:53:14 +01:00
parent bf790b63a0
commit 2e33637dae
27 changed files with 3862 additions and 2378 deletions

prompts/spec_embed_BAAI.txt (new file, 576 lines)
<project_specification>
<project_name>Library RAG - Migration to BGE-M3 Embeddings</project_name>
<overview>
Migrate the Library RAG embedding model from sentence-transformers MiniLM-L6 (384-dim)
to BAAI/bge-m3 (1024-dim) for superior performance on multilingual philosophical texts.
**Why BGE-M3?**
- 1024 dimensions vs 384 (2.7x richer semantic representation)
- 8192 token context vs 512 (16x longer sequences)
- Superior multilingual support (Greek, Latin, French, English)
- Better trained on academic/research texts
- Captures philosophical nuances more effectively
**Scope:**
This is a focused migration that only affects the vectorization layer.
LLM processing (Ollama/Mistral) remains completely unchanged.
**Migration Strategy:**
- Auto-detect GPU availability and configure accordingly
- Delete existing collections (384-dim vectors incompatible with 1024-dim)
- Recreate schema with BGE-M3 vectorizer
- Re-ingest existing 2 documents from cached chunks
- Validate search quality improvements
</overview>
<technology_stack>
<backend>
<weaviate>1.34.4 (no change)</weaviate>
<new_vectorizer>BAAI/bge-m3 via text2vec-transformers</new_vectorizer>
<old_vectorizer>sentence-transformers-multi-qa-MiniLM-L6-cos-v1</old_vectorizer>
<gpu_support>Auto-detect CUDA availability (ENABLE_CUDA="1" if GPU, "0" if CPU)</gpu_support>
</backend>
<unchanged>
<llm>Ollama/Mistral (no impact on LLM processing)</llm>
<ocr>Mistral OCR (no change)</ocr>
<pipeline>PDF pipeline steps 1-9 unchanged</pipeline>
</unchanged>
</technology_stack>
<prerequisites>
<environment_setup>
- Existing Library RAG application (generations/library_rag/)
- Docker and Docker Compose installed
- NVIDIA Docker runtime (if GPU available)
- Only 2 documents currently ingested (will be re-ingested)
- No production data to preserve
- RTX 4070 GPU available (will be auto-detected and used)
</environment_setup>
</prerequisites>
<architecture_impact>
<independent_components>
**LLM Processing (Steps 1-9):**
- OCR extraction (Mistral API)
- Metadata extraction (Ollama/Mistral)
- TOC extraction (Ollama/Mistral)
- Section classification (Ollama/Mistral)
- Semantic chunking (Ollama/Mistral)
- Cleaning and validation (Ollama/Mistral)
→ **None of these are affected by embedding model change**
**Vectorization (Step 10):**
- Text → Vector conversion (text2vec-transformers in Weaviate)
- This is the ONLY component that changes
- Happens automatically during Weaviate ingestion
- No Python code changes required
</independent_components>
<breaking_changes>
**IMPORTANT: Vector dimensions are incompatible**
- Existing collections use 384-dim vectors (MiniLM-L6)
- New model generates 1024-dim vectors (BGE-M3)
- Weaviate cannot mix dimensions in same collection
- All collections must be deleted and recreated
- All documents must be re-ingested
**Why this is safe:**
- Only 2 documents currently ingested
- Source chunks.json files preserved in output/ directory
- No OCR/LLM re-processing needed (reuse existing chunks)
- No additional costs incurred
- Estimated total migration time: 20-25 minutes
</breaking_changes>
</architecture_impact>
<implementation_steps>
<feature_1>
<title>Complete BGE-M3 Setup with GPU Auto-Detection</title>
<description>
Atomic migration: GPU detection → Docker configuration → Schema deletion → Recreation.
This feature must be completed entirely in one session (cannot be partially done).
**Step 1: GPU Auto-Detection**
- Check for NVIDIA GPU availability: nvidia-smi or docker run --gpus all nvidia/cuda
- If GPU detected: Set ENABLE_CUDA="1"
- If no GPU: Set ENABLE_CUDA="0"
- Verify NVIDIA Docker runtime if GPU available
**Step 2: Update Docker Compose**
- Backup current docker-compose.yml to docker-compose.yml.backup
- Update text2vec-transformers service:
* Change image to: cr.weaviate.io/semitechnologies/transformers-inference:sentence-transformers-BAAI-bge-m3
* Set ENABLE_CUDA based on GPU detection
* Add GPU device mapping if CUDA enabled
- Update comments to reflect BGE-M3 model
- Stop containers: docker-compose down
- Remove old transformers image: docker rmi [old-image-name]
- Start new containers: docker-compose up -d
- Verify BGE-M3 loaded: docker-compose logs text2vec-transformers | grep -i "model"
- If GPU enabled, verify GPU usage: nvidia-smi (should show transformers process)
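The updated service block in docker-compose.yml could look roughly like this (a sketch: the service name and surrounding keys are assumptions — adapt to the existing file; the image tag is the one named in Step 2):
```yaml
# Hypothetical service block (service name and key layout are assumptions)
t2v-transformers:
  image: cr.weaviate.io/semitechnologies/transformers-inference:sentence-transformers-BAAI-bge-m3
  environment:
    ENABLE_CUDA: "1"   # set to "0" when no GPU was detected
  deploy:              # GPU device mapping; drop this section on CPU-only hosts
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: 1
            capabilities: [gpu]
```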
**Step 3: Delete Existing Collections**
- Create migrate_to_bge_m3.py script with safety checks
- List all existing collections and object counts
- Confirm deletion prompt: "Delete all collections? (yes/no)"
- Delete all collections: client.collections.delete_all()
- Verify deletion: client.collections.list_all() should return empty
- Log deleted collections and counts for reference
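Step 3's script could be structured along these lines (a sketch assuming the weaviate-client v4 API; per-collection object counts are omitted for brevity):
```python
"""migrate_to_bge_m3.py -- sketch of the safe deletion script (assumes weaviate-client v4)."""

def confirm_deletion(existing, answer: str) -> bool:
    """Proceed only when collections exist and the user explicitly typed 'yes'."""
    return bool(existing) and answer.strip().lower() == "yes"

def run_migration():
    import weaviate  # imported here so confirm_deletion stays importable on its own

    client = weaviate.connect_to_local()
    try:
        existing = list(client.collections.list_all())
        print(f"Existing collections: {existing}")
        if confirm_deletion(existing, input("Delete all collections? (yes/no) ")):
            client.collections.delete_all()
            remaining = list(client.collections.list_all())
            assert not remaining, f"collections still present: {remaining}"
            print(f"Deleted {len(existing)} collections")
        else:
            print("Nothing deleted")
    finally:
        client.close()

# run_migration()  # uncomment to execute against the local Weaviate instance
```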
**Step 4: Recreate Schema with BGE-M3**
- Update schema.py docstring (line 40: MiniLM-L6 → BGE-M3)
- Add migration note at top of schema.py
- Run: python schema.py to recreate all collections
- Weaviate will auto-detect 1024-dim from text2vec-transformers service
- Verify collections created: Work, Document, Chunk, Summary
- Verify vectorizer configured: display_schema() should show text2vec-transformers
- Query text2vec-transformers service to confirm 1024 dimensions
**Validation:**
- All containers running (docker-compose ps)
- BGE-M3 model loaded successfully
- GPU utilized if available (check nvidia-smi)
- All collections exist with empty state
- Vector dimensions = 1024 (query Weaviate schema)
**Rollback if needed:**
- Restore docker-compose.yml.backup
- docker-compose down && docker-compose up -d
- python schema.py to recreate with old model
</description>
<priority>1</priority>
<category>migration</category>
<test_steps>
1. Run GPU detection: nvidia-smi or equivalent
2. Verify ENABLE_CUDA set correctly based on GPU availability
3. Backup docker-compose.yml created
4. Stop containers: docker-compose down
5. Start with BGE-M3: docker-compose up -d
6. Check logs: docker-compose logs text2vec-transformers
7. Verify "BAAI/bge-m3" appears in logs
8. If GPU: verify nvidia-smi shows transformers process
9. Run migrate_to_bge_m3.py and confirm deletion
10. Verify all collections deleted
11. Run schema.py to recreate
12. Verify 4 collections exist: Work, Document, Chunk, Summary
13. Query Weaviate API to confirm vector dimensions = 1024
14. Verify collections are empty (object count = 0)
</test_steps>
</feature_1>
<feature_2>
<title>Document Re-ingestion from Cached Chunks</title>
<description>
Re-ingest the 2 existing documents using their cached chunks.json files.
No OCR or LLM re-processing needed (saves time and cost).
**Process:**
1. Identify existing documents in output/ directory
2. For each document directory:
- Read {document_name}_chunks.json
- Verify chunk structure contains all required fields
- Extract Work metadata (title, author, year, language, genre)
- Extract Document metadata (sourceId, edition, pages, toc, hierarchy)
- Extract Chunk data (text, keywords, sectionPath, etc.)
3. Ingest to Weaviate using utils/weaviate_ingest.py:
- Create Work object (if not exists)
- Create Document object with nested Work reference
- Create Chunk objects with nested Document and Work references
- text2vec-transformers will auto-generate 1024-dim vectors
4. Verify ingestion success:
- Query Weaviate for each document by sourceId
- Verify chunk counts match original
- Check that vectors are 1024 dimensions
- Verify nested Work/Document metadata accessible
**Example code:**
```python
import json
from pathlib import Path

import weaviate
from utils.weaviate_ingest import (
    create_work, create_document, ingest_chunks_to_weaviate
)

client = weaviate.connect_to_local()
output_dir = Path("output")
for doc_dir in output_dir.iterdir():
    if doc_dir.is_dir():
        chunks_file = doc_dir / f"{doc_dir.name}_chunks.json"
        if chunks_file.exists():
            with open(chunks_file) as f:
                data = json.load(f)
            # Create Work
            work_id = create_work(client, data["work_metadata"])
            # Create Document with nested Work reference
            doc_id = create_document(client, data["document_metadata"], work_id)
            # Ingest chunks; text2vec-transformers vectorizes automatically
            ingest_chunks_to_weaviate(client, data["chunks"], doc_id, work_id)
            print(f"✓ Ingested {doc_dir.name}")
client.close()
```
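Before ingesting, each chunks.json can be sanity-checked against the structure described above. A small validator sketch (the required field names are assumptions drawn from the fields listed in this feature):
```python
# Field names are assumptions based on the chunk structure described above.
REQUIRED_TOP_LEVEL = ("work_metadata", "document_metadata", "chunks")
REQUIRED_CHUNK_FIELDS = {"text", "keywords", "sectionPath"}

def validate_chunks_payload(data: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload looks ingestable."""
    problems = [
        f"missing top-level key: {key}"
        for key in REQUIRED_TOP_LEVEL if key not in data
    ]
    for i, chunk in enumerate(data.get("chunks", [])):
        missing = REQUIRED_CHUNK_FIELDS - chunk.keys()
        if missing:
            problems.append(f"chunk {i} missing fields: {sorted(missing)}")
    return problems
```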
**Success criteria:**
- All documents from output/ directory ingested
- Chunk counts match original (verify in Weaviate)
- No vectorization errors in logs
- All vectors are 1024 dimensions
</description>
<priority>1</priority>
<category>data</category>
<test_steps>
1. List all directories in output/
2. For each directory, verify {name}_chunks.json exists
3. Load first chunks.json and inspect structure
4. Run re-ingestion script for all documents
5. Query Weaviate for total Chunk count
6. Verify count matches sum of all original chunks
7. Query a sample chunk and verify:
- Vector dimensions = 1024
- Nested work.title and work.author present
- Nested document.sourceId present
8. Verify no errors in Weaviate logs
9. Check text2vec-transformers logs for vectorization activity
</test_steps>
</feature_2>
<feature_3>
<title>Search Quality Validation and Performance Testing</title>
<description>
Validate that BGE-M3 provides superior search quality for philosophical texts.
Test multilingual capabilities and measure performance improvements.
**Create test script: test_bge_m3_quality.py**
**Test 1: Multilingual Queries**
- Test French philosophical terms: "justice", "vertu", "liberté"
- Test English philosophical terms: "virtue", "knowledge", "ethics"
- Test Greek philosophical terms: "ἀρετή" (arete), "τέλος" (telos), "ψυχή" (psyche)
- Test Latin philosophical terms: "virtus", "sapientia", "forma"
- Verify results are semantically relevant
- Compare with expected passages (if baseline available)
**Test 2: Long Query Handling**
- Test query with 100+ words (BGE-M3 supports 8192 tokens)
- Test query with complex philosophical argument
- Verify no truncation warnings
- Verify semantically appropriate results
**Test 3: Semantic Understanding**
- Query: "What is the nature of reality?"
- Expected: Results about ontology, metaphysics, being
- Query: "How should we live?"
- Expected: Results about ethics, virtue, good life
- Query: "What can we know?"
- Expected: Results about epistemology, knowledge, certainty
**Test 4: Performance Metrics**
- Measure query latency (should be <500ms)
- Measure indexing speed during ingestion
- Monitor GPU utilization (if enabled)
- Monitor memory usage (~2GB for BGE-M3)
- Compare with baseline (MiniLM-L6) if metrics available
**Test 5: Vector Dimension Verification**
- Query Weaviate schema API
- Verify all Chunk vectors are 1024 dimensions
- Verify no 384-dim vectors remain (from old model)
**Example test script:**
```python
import weaviate
import weaviate.classes.query as wvq
import time

client = weaviate.connect_to_local()
chunks = client.collections.get("Chunk")

# Test multilingual
test_queries = [
    ("justice", "French philosophical concept"),
    ("ἀρετή", "Greek virtue/excellence"),
    ("What is the good life?", "Long philosophical query"),
]

for query, description in test_queries:
    start = time.time()
    result = chunks.query.near_text(
        query=query,
        limit=5,
        return_metadata=wvq.MetadataQuery(distance=True),
    )
    latency = (time.time() - start) * 1000
    print(f"\nQuery: {query} ({description})")
    print(f"Latency: {latency:.1f}ms")
    for obj in result.objects:
        similarity = (1 - obj.metadata.distance) * 100
        print(f"  [{similarity:.1f}%] {obj.properties['work']['title']}")
        print(f"    {obj.properties['text'][:150]}...")

client.close()
```
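For Test 4, latency should be averaged over repeated runs rather than a single query. A small measurement helper that works with any query callable (a sketch; wire `run_query` to a wrapper around the near_text call from the script above):
```python
import statistics
import time

def time_queries(run_query, queries, repeats=10):
    """Return {query: {"mean_ms": ..., "max_ms": ...}} over `repeats` runs each.

    `run_query` is any callable taking the query text.
    """
    results = {}
    for q in queries:
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            run_query(q)
            samples.append((time.perf_counter() - start) * 1000)
        results[q] = {"mean_ms": statistics.mean(samples), "max_ms": max(samples)}
    return results
```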
**Document results:**
- Create SEARCH_QUALITY_RESULTS.md with:
* Sample queries and results
* Performance metrics
* Comparison with MiniLM-L6 (if available)
* Notes on quality improvements observed
</description>
<priority>1</priority>
<category>validation</category>
<test_steps>
1. Create test_bge_m3_quality.py script
2. Run multilingual query tests (French, English, Greek, Latin)
3. Verify results are semantically relevant
4. Test long queries (100+ words)
5. Measure average query latency over 10 queries
6. Verify latency <500ms
7. Query Weaviate schema to verify vector dimensions = 1024
8. If GPU enabled, monitor nvidia-smi during queries
9. Document search quality improvements in markdown file
10. Compare results with expected philosophical passages
</test_steps>
</feature_3>
<feature_4>
<title>Documentation Update</title>
<description>
Update all documentation to reflect BGE-M3 migration.
**Files to update:**
1. **docker-compose.yml**
- Update comments to mention BGE-M3
- Note GPU auto-detection logic
- Document ENABLE_CUDA setting
2. **README.md**
- Update "Embedding Model" section
- Change: MiniLM-L6 (384-dim) → BGE-M3 (1024-dim)
- Add benefits: multilingual, longer context, better quality
- Update docker-compose instructions if needed
3. **CLAUDE.md**
- Update schema documentation (line ~35)
- Change vectorizer description
- Update example queries to showcase multilingual
- Add migration notes section
4. **schema.py**
- Update module docstring (line 40)
- Change "MiniLM-L6" references to "BGE-M3"
- Add migration date and rationale in comments
- Update display_schema() output text
5. **Create MIGRATION_BGE_M3.md**
- Document migration process
- Explain why BGE-M3 chosen
- List breaking changes (dimension incompatibility)
- Document rollback procedure
- Include before/after comparison
- Note LLM independence (Ollama/Mistral unaffected)
- Document search quality improvements
6. **MCP_README.md** (if exists)
- Update technical details about embeddings
- Update vector dimension references
**Migration notes template:**
```markdown
# BGE-M3 Migration - [Date]
## Why
- Superior multilingual support (Greek, Latin, French, English)
- 1024-dim vectors (2.7x richer than MiniLM-L6)
- 8192 token context (16x longer than MiniLM-L6)
- Better trained on academic/philosophical texts
## What Changed
- Embedding model: MiniLM-L6 → BAAI/bge-m3
- Vector dimensions: 384 → 1024
- All collections deleted and recreated
- 2 documents re-ingested
## Impact
- LLM processing (Ollama/Mistral): **No impact**
- Search quality: **Significantly improved**
- GPU acceleration: **Auto-enabled** (if available)
- Migration time: ~25 minutes
## Search Quality Improvements
[Insert results from Feature 3 testing]
```
**Verify:**
- Search all files for "MiniLM-L6" references
- Search all files for "384" dimension references
- Replace with "BGE-M3" and "1024" respectively
- Grep for "text2vec" and update comments where needed
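The verification sweep can be a single grep over the relevant file types (review each hit manually — "384" may also appear in unrelated contexts):
```shell
# List remaining references to the old model/dimension across docs and code;
# exits 0 even when nothing is found
grep -rn --include='*.py' --include='*.md' --include='*.yml' \
    -e 'MiniLM' -e '384' . || true
```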
</description>
<priority>2</priority>
<category>documentation</category>
<test_steps>
1. Update docker-compose.yml comments
2. Update README.md embedding section
3. Update CLAUDE.md schema documentation
4. Update schema.py docstring and comments
5. Create MIGRATION_BGE_M3.md with full migration notes
6. Search codebase for "MiniLM-L6" references: grep -r "MiniLM" .
7. Replace all with "BGE-M3"
8. Search for "384" dimension references
9. Replace with "1024" where appropriate
10. Review all updated files for consistency
11. Verify no outdated references remain
</test_steps>
</feature_4>
</implementation_steps>
<deliverables>
<code>
- Updated docker-compose.yml with BGE-M3 and GPU auto-detection
- migrate_to_bge_m3.py script for safe collection deletion
- Updated schema.py with BGE-M3 documentation
- Re-ingestion script (or integration with existing utils)
- test_bge_m3_quality.py for validation
</code>
<documentation>
- MIGRATION_BGE_M3.md with complete migration notes
- Updated README.md with BGE-M3 details
- Updated CLAUDE.md with schema changes
- SEARCH_QUALITY_RESULTS.md with validation results
- Updated inline comments in all affected files
</documentation>
</deliverables>
<success_criteria>
<functionality>
- BGE-M3 model loads successfully in Weaviate
- GPU auto-detected and utilized if available
- All collections recreated with 1024-dim vectors
- Documents re-ingested successfully from cached chunks
- Semantic search returns relevant results
- Multilingual queries work correctly (Greek, Latin, French, English)
</functionality>
<quality>
- Search quality demonstrably improved vs MiniLM-L6
- Greek/Latin philosophical terms properly embedded
- Long queries (>512 tokens) handled correctly
- No vectorization errors in logs
- Vector dimensions verified as 1024 across all collections
</quality>
<performance>
- Query latency acceptable (<500ms average)
- GPU utilized if available (verified via nvidia-smi)
- Memory usage stable (~2GB for text2vec-transformers)
- Indexing throughput acceptable during re-ingestion
- No performance degradation vs MiniLM-L6
</performance>
<documentation>
- All documentation updated to reflect BGE-M3
- No outdated MiniLM-L6 references remain
- Migration process fully documented
- Rollback procedure documented and tested
- Search quality improvements quantified
</documentation>
</success_criteria>
<migration_notes>
<breaking_changes>
**IMPORTANT: This is a destructive migration**
- All existing Weaviate collections must be deleted
- Vector dimensions change: 384 → 1024 (incompatible)
- Weaviate cannot mix dimensions in same collection
- All documents must be re-ingested
**Low impact:**
- Only 2 documents currently ingested
- Source chunks.json files preserved in output/ directory
- No OCR re-processing needed (saves ~0.006€ per doc)
- No LLM re-processing needed (saves time and cost)
- Estimated migration time: 20-25 minutes total
</breaking_changes>
<rollback_plan>
If BGE-M3 causes issues, rollback is straightforward:
1. Stop containers: docker-compose down
2. Restore backup: mv docker-compose.yml.backup docker-compose.yml
3. Start containers: docker-compose up -d
4. Recreate schema: python schema.py
5. Re-ingest documents from output/ directory (same process as Feature 2)
**Time to rollback: ~15 minutes**
**Note:** Backup of docker-compose.yml created automatically in Feature 1
</rollback_plan>
<gpu_auto_detection>
**GPU is NOT optional - it's auto-detected**
The system will automatically detect GPU availability and configure accordingly:
- **If GPU available (RTX 4070 detected):**
* ENABLE_CUDA="1" in docker-compose.yml
* GPU device mapping added to text2vec-transformers service
* Vectorization uses GPU (5-10x faster)
* ~2GB VRAM used (plenty of headroom on 4070)
* Ollama/Qwen can still use remaining VRAM
- **If NO GPU available:**
* ENABLE_CUDA="0" in docker-compose.yml
* Vectorization uses CPU (slower but functional)
* No GPU device mapping needed
**Detection method:**
```bash
# Try nvidia-smi
if command -v nvidia-smi &> /dev/null; then
    GPU_AVAILABLE=true
else
    # Try Docker GPU test
    if docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi &> /dev/null; then
        GPU_AVAILABLE=true
    else
        GPU_AVAILABLE=false
    fi
fi
```
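The same check can be done from Python when scripting the docker-compose update (a sketch that only relies on nvidia-smi being on PATH; it does not cover the Docker fallback above):
```python
import shutil
import subprocess

def gpu_available() -> bool:
    """True when nvidia-smi exists on PATH and exits cleanly."""
    if shutil.which("nvidia-smi") is None:
        return False
    try:
        proc = subprocess.run(["nvidia-smi"], capture_output=True, check=False)
        return proc.returncode == 0
    except OSError:
        return False

def enable_cuda_flag() -> str:
    """Value to write for ENABLE_CUDA in docker-compose.yml."""
    return "1" if gpu_available() else "0"
```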
**User has RTX 4070:** GPU will be detected and used automatically.
</gpu_auto_detection>
<llm_independence>
**Ollama/Mistral are NOT affected by this change**
The embedding model migration ONLY affects Weaviate vectorization (pipeline step 10).
All LLM processing (steps 1-9) remains unchanged:
- OCR extraction (Mistral API)
- Metadata extraction (Ollama/Mistral)
- TOC extraction (Ollama/Mistral)
- Section classification (Ollama/Mistral)
- Semantic chunking (Ollama/Mistral)
- Cleaning and validation (Ollama/Mistral)
**No Python code changes required.**
Weaviate handles vectorization automatically via text2vec-transformers service.
**Ollama can still use GPU:**
BGE-M3 uses ~2GB VRAM. RTX 4070 has 12GB.
Ollama/Qwen can use remaining 10GB without conflict.
</llm_independence>
</migration_notes>
</project_specification>