Fix: Correct ingest_document() call for Word

Final fixes to word_pipeline.py:

1. ingest_document() call signature fixed:
   BEFORE:
   - document_source_id=doc_name   (nonexistent parameter)

   AFTER:
   - doc_name=doc_name
   - metadata=metadata
   - language=metadata.get("language", "unknown")
   - toc=toc_flat
   - hierarchy=None  # Word documents have no page-based hierarchy
   - pages=0  # Word documents have no pages

2. Callback message fixed:
   BEFORE:
   - ingestion_result.get('chunks_ingested', 0)   (nonexistent field)

   AFTER:
   - ingestion_result.get('count', 0)   (actual field)
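
To illustrate the two fixes above, here is a minimal sketch. The `ingest_document` defined here is a hypothetical stand-in: only its parameter names and the `count` result key come from this commit. The type hints, defaults, sample data, and the absence of a `**kwargs` catch-all are assumptions.

```python
from typing import Any, Optional


def ingest_document(
    doc_name: str,
    chunks: list[dict[str, Any]],
    metadata: dict[str, Any],
    language: str,
    toc: Optional[list[dict[str, Any]]] = None,
    hierarchy: Optional[dict[str, Any]] = None,
    pages: int = 0,
) -> dict[str, Any]:
    """Hypothetical stand-in for the real function (assumed signature)."""
    return {"count": len(chunks)}


# Sample data, invented for the illustration.
chunks = [{"text": "first paragraph"}, {"text": "second paragraph"}]
metadata = {"title": "Sample", "language": "fr"}

# Old call: document_source_id is not a parameter of the stand-in, so
# Python raises TypeError before the function body runs.
try:
    ingest_document(
        metadata=metadata,
        chunks=chunks,
        toc=[],
        document_source_id="sample",
    )
    old_call_error = None
except TypeError as exc:
    old_call_error = str(exc)

# New call: every keyword matches a parameter name.
result = ingest_document(
    doc_name="sample",
    chunks=chunks,
    metadata=metadata,
    language=metadata.get("language", "unknown"),
    toc=[],
    hierarchy=None,  # Word documents have no page-based hierarchy
    pages=0,  # Word documents have no pages
)

# dict.get() with a default hides a wrong key: the old message always
# reports 0 chunks, even though the ingestion itself succeeded.
old_message = f"Ingested {result.get('chunks_ingested', 0)} chunks"
new_message = f"Ingested {result.get('count', 0)} chunks"
```

The wrong keyword fails loudly, while the wrong result key fails silently because of the `0` default. That is presumably why the second problem only showed up in the callback message.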

Full test passed:
 48 paragraphs extracted
 2 headings detected
 37 chunks created
 37 chunks cleaned
 37 chunks validated
 37 chunks ingested into Weaviate
 OCR cost: €0.0000 (Word documents need no OCR)
 Document indexed and searchable

The Word pipeline now works end to end, from extraction to Weaviate ingestion.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-30 22:49:13 +01:00
parent 0800f74bd7
commit b928352e36


@@ -509,10 +509,13 @@ def process_word(
     callback("Weaviate Ingestion", "running", "Ingesting into Weaviate...")
     ingestion_result = ingest_document(
-        metadata=metadata,
+        doc_name=doc_name,
         chunks=chunks,
+        metadata=metadata,
+        language=metadata.get("language", "unknown"),
         toc=toc_flat,
-        document_source_id=doc_name,
+        hierarchy=None,  # Word documents don't have page-based hierarchy
+        pages=0,  # Word documents don't have pages
     )
     # Save ingestion results
@@ -523,7 +526,7 @@ def process_word(
     callback(
         "Weaviate Ingestion",
         "completed",
-        f"Ingested {ingestion_result.get('chunks_ingested', 0)} chunks",
+        f"Ingested {ingestion_result.get('count', 0)} chunks",
     )
     # ================================================================
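
A keyword mismatch like the one fixed in this diff can be caught without running the full pipeline. The sketch below uses the standard library's `inspect.signature(...).bind()`. `binding_error` and `ingest_document_stub` are hypothetical names introduced for illustration, and the stub's parameter list is assumed from the keywords used in the corrected call.

```python
import inspect
from typing import Any, Callable, Optional


def binding_error(func: Callable[..., Any], **kwargs: Any) -> Optional[str]:
    """Return the TypeError message if func(**kwargs) would not bind, else None."""
    try:
        # bind() checks the arguments against the signature without calling func.
        inspect.signature(func).bind(**kwargs)
    except TypeError as exc:
        return str(exc)
    return None


def ingest_document_stub(doc_name, chunks, metadata, language, toc, hierarchy, pages):
    """Hypothetical stub; parameter names taken from the corrected call."""
    return {"count": len(chunks)}


# Keywords used before the fix.
old_kwargs = dict(metadata={}, chunks=[], toc=[], document_source_id="sample")

# Keywords used after the fix.
new_kwargs = dict(
    doc_name="sample",
    chunks=[],
    metadata={},
    language="unknown",
    toc=[],
    hierarchy=None,
    pages=0,
)

old_problem = binding_error(ingest_document_stub, **old_kwargs)
new_problem = binding_error(ingest_document_stub, **new_kwargs)
```

Here `old_problem` holds an error message and `new_problem` is `None`. A check of this kind could sit in a unit test next to each pipeline, so that a later change to the signature of `ingest_document()` is flagged before a document is processed.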