linear-coding-agent/generations/library_rag/utils/word_pipeline.py at b928352e3626e15418d75dea1fa4827266448afe


David Blanc Brioir b928352e36 Fix: Correct call to ingest_document() for Word

Final fixes to word_pipeline.py:

1. ingest_document() call signature fixed:
   BEFORE:
   - document_source_id=doc_name  ❌ (nonexistent parameter)

   AFTER:
   - doc_name=doc_name
   - metadata=metadata
   - language=metadata.get("language", "unknown")
   - toc=toc_flat
   - hierarchy=None  # Word has no page hierarchy
   - pages=0  # Word has no pages
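
A minimal sketch of the corrected call, assuming a stand-in `ingest_document()` that accepts only the keyword arguments listed above. The real function's full signature and return value are not shown in this commit message, so the stub body, its return dict, and the sample values are hypothetical.

```python
from typing import Any, Optional


def ingest_document(
    *,
    doc_name: str,
    metadata: dict[str, Any],
    language: str,
    toc: list[dict[str, Any]],
    hierarchy: Optional[dict[str, Any]],
    pages: int,
) -> dict[str, Any]:
    """Hypothetical stand-in: echoes its arguments instead of ingesting."""
    return {
        "doc_name": doc_name,
        "language": language,
        "hierarchy": hierarchy,
        "pages": pages,
        "toc_entries": len(toc),
    }


# Hypothetical sample values for illustration only.
doc_name = "example_document"
metadata = {"title": "Example"}  # no "language" key, so the fallback applies
toc_flat = [{"title": "Introduction", "level": 1}]

call_kwargs = dict(
    doc_name=doc_name,
    metadata=metadata,
    language=metadata.get("language", "unknown"),
    toc=toc_flat,
    hierarchy=None,  # Word has no page hierarchy
    pages=0,         # Word has no pages
)
result = ingest_document(**call_kwargs)

# Passing the old keyword raises TypeError because no such parameter exists.
try:
    ingest_document(document_source_id=doc_name, **call_kwargs)
    old_call_rejected = False
except TypeError:
    old_call_rejected = True
```

With a signature like this, the old `document_source_id` keyword fails immediately at call time rather than being silently ignored.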

2. Callback message fixed:
   BEFORE:
   - ingestion_result.get('chunks_ingested', 0)  ❌ (nonexistent field)

   AFTER:
   - ingestion_result.get('count', 0)  ✅ (actual field)
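
A short sketch of why the old lookup went unnoticed, assuming the ingestion result is a plain dict with a `count` key as stated above. The helper name `format_ingestion_message`, the message wording, and the sample dict are hypothetical.

```python
from typing import Any


def format_ingestion_message(ingestion_result: dict[str, Any]) -> str:
    """Hypothetical helper: build the callback message from the result dict."""
    ingested = ingestion_result.get("count", 0)
    return f"{ingested} chunks ingested"


# Hypothetical result dict; only the "count" key is taken from the commit message.
ingestion_result = {"count": 37}

message = format_ingestion_message(ingestion_result)

# The old key is absent, so .get() quietly falls back to the default of 0.
old_value = ingestion_result.get("chunks_ingested", 0)
```

Because `dict.get()` returns its default for a missing key, the wrong field name produced a message reporting 0 chunks instead of raising an error.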

Full test passed:
✅ 48 paragraphs extracted
✅ 2 headings detected
✅ 37 chunks created
✅ 37 chunks cleaned
✅ 37 chunks validated
✅ 37 chunks ingested into Weaviate
✅ OCR cost: €0.0000 (no OCR for Word!)
✅ Document indexed and searchable

The Word pipeline is now fully functional end to end.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

2025-12-30 22:49:13 +01:00

20 KiB
