Adds automatic Work object creation to ensure all uploaded documents
appear on the /documents page. Previously, chunks were ingested but
Work entries were missing, causing documents to be invisible in the UI.

Changes:
- Add create_or_get_work() function to weaviate_ingest.py
  - Checks for existing Work by sourceId (prevents duplicates)
  - Creates new Work with metadata (title, author, year, pages)
  - Returns UUID for potential future reference
- Integrate Work creation into ingest_document() flow
- Add helper scripts for retroactive fixes and verification:
  - create_missing_works.py: Create Works for already-ingested documents
  - reingest_batch_documents.py: Re-ingest documents after bug fixes
  - check_batch_results.py: Verify batch upload results in Weaviate
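
The lookup-then-create behaviour described for create_or_get_work() can be
sketched roughly as follows. This is a minimal illustration, not the code in
weaviate_ingest.py: the InMemoryWorks class is a hypothetical stand-in for the
Work collection (it is not the Weaviate client API), and the function
signature is an assumption based on the metadata fields listed above.

```python
import uuid
from typing import Dict, Optional


class InMemoryWorks:
    """Hypothetical stand-in for the Work collection (not the Weaviate API)."""

    def __init__(self) -> None:
        self._by_uuid: Dict[str, dict] = {}

    def find_by_source_id(self, source_id: str) -> Optional[str]:
        # Linear scan is fine for a sketch; a real store would filter on sourceId.
        for work_uuid, props in self._by_uuid.items():
            if props["sourceId"] == source_id:
                return work_uuid
        return None

    def insert(self, props: dict) -> str:
        work_uuid = str(uuid.uuid4())
        self._by_uuid[work_uuid] = dict(props)
        return work_uuid

    def __len__(self) -> int:
        return len(self._by_uuid)


def create_or_get_work(works, source_id, title, author, year=None, pages=None):
    """Return the UUID of the Work for source_id, creating it if missing.

    The parameter list is assumed for illustration only.
    """
    existing = works.find_by_source_id(source_id)
    if existing is not None:
        return existing  # already present: no duplicate is created
    return works.insert(
        {
            "sourceId": source_id,
            "title": title,
            "author": author,
            "year": year,
            "pages": pages,
        }
    )
```

Calling the function twice with the same sourceId returns the same UUID and
leaves a single Work in the store, which is the property that makes it safe to
call from ingest_document() on every ingestion.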

This completes the batch upload feature: documents now appear on the
/documents page immediately after ingestion.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Fixes batch upload ingestion that was failing silently due to schema
mismatches:

Schema Fixes:
- Update collection name from "Chunk" to "Chunk_v2"
- Update collection name from "Summary" to "Summary_v2"

Object Structure Fixes:
- Replace the nested work object (work: {title, author}) with flat
  workTitle and workAuthor fields
- Add year field to chunks
- Remove the nested document object (not used in the current schema)
- Disable nested-object validation for the flat schema
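
The object structure change can be illustrated with a small sketch that
converts a nested chunk into the flat shape. The helper name flatten_chunk,
the "text" field, and passing year as a separate argument are all assumptions
made for illustration; only workTitle, workAuthor, and year are field names
taken from the list above.

```python
def flatten_chunk(chunk: dict, year=None) -> dict:
    """Return a flat copy of chunk without nested work/document objects.

    Hypothetical helper: the real ingestion code may build chunks differently.
    """
    # Keep every top-level field except the two nested objects.
    flat = {k: v for k, v in chunk.items() if k not in ("work", "document")}

    # Promote the nested work fields to flat properties.
    work = chunk.get("work") or {}
    flat["workTitle"] = work.get("title")
    flat["workAuthor"] = work.get("author")

    # year is supplied by the caller here; where it comes from is an assumption.
    flat["year"] = year
    return flat
```

The input dictionary is left unmodified, so the same nested chunk can still be
inspected or logged after the flat version has been built.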

Impact:
- Batch upload now successfully ingests chunks to Weaviate
- Single-file upload also benefits from the same fixes
- All new documents will be properly indexed and searchable

Testing:
- Verified with 2-file batch upload (7 + 11 chunks = 18 total)
- Total chunk count increased from 5,304 to 5,322
- All chunks properly searchable with workTitle/workAuthor filters

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>