linear-coding-agent

davebb/linear-coding-agent


Commit Graph


Author: David Blanc Brioir
SHA1: 0c8ea8fa48

fix: Correct Work titles and improve LLM metadata extraction

Fixes an issue where the LLM was copying placeholder instructions from the
prompt template into the actual metadata fields.

Changes:
1. Created a fix_work_titles.py script to correct existing bad titles
   - Detects patterns such as "(si c'est bien...)" ("if that is indeed..."), "Titre corrigé..." ("Corrected title..."), and "Auteur à identifier" ("Author to be identified")
   - Extracts correct metadata from chunks JSON files
   - Updates Work entries and associated chunks (44 chunks updated)
   - Fixed 3 Works with placeholder contamination

2. Improved the llm_metadata.py prompt to prevent future issues
   - Added explicit INTERDIT/OBLIGATOIRE (FORBIDDEN/MANDATORY) rules with ❌/✅ markers
   - Replaced placeholder examples with concrete, real examples
   - Added two example responses (one high-confidence, one low-confidence)
   - Ends with an empty JSON template that guides the output structure without placeholders
   - Reinforced that uncertainty belongs in the "confidence" field, not in inline annotations
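The pattern detection in step 1 can be sketched as a small list of regular expressions checked against each metadata value. This is a minimal sketch assuming plain regex matching on the French markers quoted above; the names `PLACEHOLDER_PATTERNS` and `is_contaminated` are hypothetical and are not taken from fix_work_titles.py.

```python
import re

# Hypothetical patterns modeled on the examples in this commit message;
# the real fix_work_titles.py may use different or additional rules.
PLACEHOLDER_PATTERNS = [
    re.compile(r"\(si c'est bien", re.IGNORECASE),   # "(if that is indeed ...)"
    re.compile(r"^\s*Titre corrigé", re.IGNORECASE),  # "Corrected title ..."
    re.compile(r"à identifier", re.IGNORECASE),       # "to be identified"
]


def is_contaminated(value: str) -> bool:
    """Return True if a metadata value contains prompt-template residue."""
    return any(pattern.search(value) for pattern in PLACEHOLDER_PATTERNS)


if __name__ == "__main__":
    samples = [
        "A Cartesian critique... (si c'est bien le titre)",
        "Titre de l'article principal (à identifier)",
        "Computationalism in the Philosophy of Mind",
    ]
    for title in samples:
        # Prints True for the first two samples, False for the clean title
        print(is_contaminated(title), title)
```

Matching on short, distinctive fragments rather than whole template sentences keeps the check robust when the LLM paraphrases or truncates the instruction text.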

Results:
- "A Cartesian critique... (si c'est bien le titre)" ["if that is indeed the title"] → "A Cartesian critique of the artificial intelligence"
- "Titre corrigé si nécessaire (ex: ...)" ["Corrected title if necessary (e.g.: ...)"] → "Computationalism and The Case When the Brain Is Not a Computer"
- "Titre de l'article principal (à identifier)" ["Title of the main article (to be identified)"] → "Computationalism in the Philosophy of Mind"
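Each correction above has to be written to the Work entry and then propagated to every chunk that carries a copy of its title. A minimal sketch of that propagation step follows; the `Work`/`Chunk` dataclasses, the `work_id` link, and `apply_title_fixes` are hypothetical stand-ins, since the repository's actual storage schema is not shown in this commit.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical record types; the project's real Work/chunk schema is not shown.
@dataclass
class Work:
    work_id: str
    title: str


@dataclass
class Chunk:
    work_id: str      # link back to the parent Work
    work_title: str   # denormalized copy of the Work title
    text: str


def apply_title_fixes(works: list[Work], chunks: list[Chunk],
                      fixes: dict[str, str]) -> int:
    """Apply corrected titles (work_id -> title) and return chunks changed."""
    for work in works:
        if work.work_id in fixes:
            work.title = fixes[work.work_id]

    updated = 0
    for chunk in chunks:
        new_title = fixes.get(chunk.work_id)
        # Only count chunks whose stored title actually changes
        if new_title is not None and chunk.work_title != new_title:
            chunk.work_title = new_title
            updated += 1
    return updated


if __name__ == "__main__":
    bad = "Titre de l'article principal (à identifier)"
    works = [Work("w1", bad)]
    chunks = [Chunk("w1", bad, "..."), Chunk("w1", bad, "..."),
              Chunk("w2", "Some other work", "...")]
    fixes = {"w1": "Computationalism in the Philosophy of Mind"}
    print(apply_title_fixes(works, chunks, fixes))  # 2
    print(works[0].title)  # Computationalism in the Philosophy of Mind
```

Returning the number of chunks that actually changed makes it easy to compare a run against an expected total such as the 44 chunks reported in step 1.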

Future document uploads should now yield clean metadata, free of
LLM commentary and placeholder instructions.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Date: 2026-01-08 23:59:25 +01:00

1 commit