BREAKING CHANGE: Document collection removed from Weaviate schema

Architecture simplification:
- Removed Document collection (unused by Flask app)
- All metadata now in Work collection or file-based (chunks.json)
- Simplified from 4 collections to 3 (Work, Chunk_v2, Summary_v2)

Schema changes (schema.py):
- Removed create_document_collection() function
- Updated verify_schema() to expect 3 collections
- Updated display_schema() and print_summary()
- Updated documentation to reflect Chunk_v2/Summary_v2

Ingestion changes (weaviate_ingest.py):
- Removed ingest_document_metadata() function
- Removed ingest_document_collection parameter
- Updated IngestResult to use work_uuid instead of document_uuid
- Removed Document deletion from delete_document_chunks()
- Updated DeleteResult TypedDict

Type changes (types.py):
- WeaviateIngestResult: document_uuid → work_uuid

Documentation updates (.claude/CLAUDE.md):
- Updated schema diagram (4 → 3 collections)
- Removed Document references
- Updated to reflect manual GPU vectorization

Database changes:
- Deleted Document collection (13 objects)
- Deleted Chunk collection (0 objects, old schema)

Benefits:
- Simpler architecture (3 collections vs 4)
- No redundant data storage
- All metadata available via Work or file-based storage
- Reduced Weaviate memory footprint

Migration:
- See DOCUMENT_COLLECTION_ANALYSIS.md for detailed analysis
- See migrate_chunk_v2_to_none_vectorizer.py for vectorizer migration

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
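The commit says verify_schema() now expects exactly three collections (Work, Chunk_v2, Summary_v2) and that Document and the old Chunk collection were deleted. The following is a minimal sketch of that kind of check under stated assumptions. It is not the project's verify_schema(). The names EXPECTED_COLLECTIONS, REMOVED_COLLECTIONS, check_collections and schema_is_valid are hypothetical. The sketch takes a plain list of collection names obtained elsewhere, so it assumes nothing about the Weaviate client API.

```python
from typing import Iterable

# Hypothetical constants: the three collections the commit keeps,
# and the two collections it reports as deleted.
EXPECTED_COLLECTIONS = frozenset({"Work", "Chunk_v2", "Summary_v2"})
REMOVED_COLLECTIONS = frozenset({"Document", "Chunk"})


def check_collections(existing: Iterable[str]) -> dict[str, list[str]]:
    """Compare collection names against the 3-collection schema (hypothetical helper).

    `existing` is assumed to be a plain iterable of collection names;
    this function does not connect to Weaviate itself.
    """
    names = set(existing)
    return {
        # Expected collections that are absent.
        "missing": sorted(EXPECTED_COLLECTIONS - names),
        # Collections this commit removed but that still exist.
        "stale": sorted(REMOVED_COLLECTIONS & names),
        # Anything else not accounted for by either set.
        "unexpected": sorted(names - EXPECTED_COLLECTIONS - REMOVED_COLLECTIONS),
    }


def schema_is_valid(existing: Iterable[str]) -> bool:
    """Strict check (an assumption): valid only if exactly the 3 collections exist."""
    report = check_collections(existing)
    return not any(report.values())


if __name__ == "__main__":
    before = ["Work", "Document", "Chunk", "Chunk_v2", "Summary_v2"]
    after = ["Work", "Chunk_v2", "Summary_v2"]
    print(check_collections(before))
    print(schema_is_valid(before), schema_is_valid(after))
```

Treating any extra collection as a failure is a design assumption here. A looser check could report `unexpected` names without rejecting the schema.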
# Agent-generated output directories
generations/*
!generations/library_rag/

# Python cache and compiled files
__pycache__/
*.pyc
*.pyo
*.pyd

# Log files
logs/
*.log

.env
venv

# Node modules (if any)
node_modules/
package-lock.json

# Backup and temporary files
backup_migration_*/
restoration_log.txt
restoration_remaining_log.txt
summary_generation_progress.json
nul

# Test files and temporary scripts (Jan 2026)
test_*.txt
test_ingestion*.py
test_direct*.py
test_upload*.py
*_backup.json
chunks_to_vectorize.json
output/
check_chunks.py
verify_works.py
complete_*.py
extract_*.py
fast_extract.py
stream_extract.py
quick_vectorize.py
vectorize_remaining.py
migrate_chunk_*.py

# Archives (migration scripts moved here)
archive/chunk_v2_backup.json