fix: Adapt hierarchical display for mismatched sectionPath formats

Root cause:
- Summary.sectionPath: '635. As for the subject...' (paragraph numbers)
- Chunk.sectionPath: 'Peirce: CP 4.47 > 47. §3 THE NATURE...' (canonical refs)
- The two formats cannot be matched with prefix or equality filters
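
To make the mismatch concrete, here is a minimal sketch using the two example values above. They are kept truncated exactly as shown and are illustrative strings, not a verified matching pair from the data:

```python
# Example values copied from the root-cause list (truncated as in the source).
summary_path = "635. As for the subject..."
chunk_path = "Peirce: CP 4.47 > 47. §3 THE NATURE..."

# Neither an equality filter nor a prefix filter relates the two formats:
# one starts with a paragraph number, the other with a canonical reference.
equal_match = chunk_path == summary_path
prefix_match = chunk_path.startswith(summary_path)

print(equal_match, prefix_match)
```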

Solution (workaround until summaries are regenerated):
- Show sections as **context** (relevant high-level topics found)
- Show chunks **globally** (top 20 most relevant passages)
- Don't try to group chunks under sections

UI changes:
- '📚 Sections pertinentes trouvées' ("Relevant sections found"): context cards with summary
- '📄 Passages les plus pertinents' ("Most relevant passages"): top chunks, not grouped
- Cleaner, more honest representation of what the search actually returned

Next steps to fully fix:
- Regenerate Summary collection with correct sectionPath format
- Or create a mapping between Summary titles and Chunk sectionPaths
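
One possible shape for the mapping option, as a rough sketch only. It assumes (unverified) that the last ' > ' segment of a Chunk.sectionPath begins with the same paragraph number that a Summary.sectionPath begins with; `paragraph_key` and `build_mapping` are hypothetical helper names, not existing functions in the codebase. Paragraph numbers may repeat across volumes, so a real mapping would probably need the volume as well.

```python
import re
from typing import Dict, Iterable, List, Optional

# Leading paragraph number such as "635." or "47." (assumed format).
_LEADING_NUMBER = re.compile(r"^\s*(\d+)\.")


def paragraph_key(section_path: str) -> Optional[str]:
    """Return the leading paragraph number of the last ' > ' segment, if any."""
    last_segment = section_path.split(" > ")[-1]
    match = _LEADING_NUMBER.match(last_segment)
    return match.group(1) if match else None


def build_mapping(
    summary_paths: Iterable[str], chunk_paths: Iterable[str]
) -> Dict[str, List[str]]:
    """Map each summary path to the chunk paths sharing its paragraph number."""
    by_key: Dict[str, List[str]] = {}
    for path in chunk_paths:
        key = paragraph_key(path)
        if key is not None:
            by_key.setdefault(key, []).append(path)
    # Summaries without a recognizable number map to an empty list.
    return {s: by_key.get(paragraph_key(s), []) for s in summary_paths}
```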
2026-01-01 15:51:11 +01:00
parent 47cf21867f
commit d824269606
2 changed files with 63 additions and 67 deletions


@@ -457,22 +457,18 @@ def hierarchical_search(
         for obj in chunks_result.objects
     ]
-    # Distribute chunks to sections using prefix matching
-    all_chunks = []
+    # NOTE: Summary.sectionPath format doesn't match Chunk.sectionPath
+    # This is a data quality issue that needs to be fixed at ingestion
+    # For now, sections provide context, chunks are shown globally
+    print(f"[HIERARCHICAL] Got {len(all_chunks_list)} chunks total")
+    print(f"[HIERARCHICAL] Found {len(sections_data)} relevant sections")
+    all_chunks = all_chunks_list
+    # Clear chunks from sections (they're displayed separately)
     for section in sections_data:
-        section_ref = section["section_path"]  # e.g., "Peirce: CP 2.504"
-        # Find chunks whose sectionPath starts with this reference
-        section_chunks = [
-            chunk for chunk in all_chunks_list
-            if chunk.get("sectionPath", "").startswith(section_ref)
-        ]
-        # Sort by similarity and limit per section
-        section_chunks.sort(key=lambda x: x.get("similarity", 0) or 0, reverse=True)
-        section["chunks"] = section_chunks[:limit]
-        section["chunks_count"] = len(section["chunks"])
-        all_chunks.extend(section["chunks"])
+        section["chunks"] = []
+        section["chunks_count"] = 0
     # Sort all chunks by similarity (descending)
     all_chunks.sort(key=lambda x: x.get("similarity", 0) or 0, reverse=True)
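
The workaround in this hunk can be summarized as a small standalone sketch. `flatten_results` and `top_k` are hypothetical names introduced for illustration; the default of 20 comes from the "top 20" mentioned in the commit message. This version uses `sorted()` so the caller's list is not reordered, which is a design choice of the sketch rather than something the commit does.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def flatten_results(
    sections: List[Record], chunks: List[Record], top_k: int = 20
) -> Tuple[List[Record], List[Record]]:
    """Return sections as context-only cards plus the globally ranked top chunks."""
    # Sections keep their metadata but carry no chunks (displayed separately).
    context = [{**s, "chunks": [], "chunks_count": 0} for s in sections]
    # Rank every chunk by similarity, treating a missing or None score as 0.
    ranked = sorted(
        chunks, key=lambda c: c.get("similarity", 0) or 0, reverse=True
    )
    return context, ranked[:top_k]
```

Calling `flatten_results(sections_data, all_chunks_list)` would yield the two lists that the UI renders as separate blocks.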