fix: Adapt hierarchical display for mismatched sectionPath formats

Root cause:
- Summary.sectionPath: '635. As for the subject...' (paragraph numbers)
- Chunk.sectionPath: 'Peirce: CP 4.47 > 47. §3 THE NATURE...' (canonical refs)
- The two formats share no common prefix, so prefix/equal filters cannot match them

Solution (workaround until summaries are regenerated):
- Show sections as **context** (relevant high-level topics found)
- Show chunks **globally** (top 20 most relevant passages)
- Don't try to group chunks under sections

UI changes:
- '📚 Sections pertinentes trouvées' ("Relevant sections found": context cards with summary)
- '📄 Passages les plus pertinents' ("Most relevant passages": top chunks, not grouped)
- Cleaner, more honest representation of what we found

Next steps to fully fix:
- Regenerate the Summary collection with the correct sectionPath format
- Or create a mapping between Summary titles and Chunk sectionPaths
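
A minimal sketch of the mismatch and the workaround, using the two example paths quoted above. The helper names `prefix_match` and `top_chunks` and the similarity scores are hypothetical, made up for illustration; they are not code or data from flask_app.py.

```python
# Example paths copied from the commit message above.
summary_path = "635. As for the subject..."
chunk_path = "Peirce: CP 4.47 > 47. §3 THE NATURE..."


def prefix_match(chunk_section_path: str, section_ref: str) -> bool:
    """Old grouping rule: a chunk belongs to a section if its
    sectionPath starts with the section's sectionPath."""
    return chunk_section_path.startswith(section_ref)


def top_chunks(chunks: list[dict], limit: int = 20) -> list[dict]:
    """Workaround: rank all chunks globally by similarity, ignoring
    sections. A missing or None similarity is treated as 0."""
    ranked = sorted(
        chunks,
        key=lambda c: c.get("similarity", 0) or 0,
        reverse=True,
    )
    return ranked[:limit]


# Hypothetical chunks; the similarity scores are invented.
chunks = [
    {"sectionPath": chunk_path, "similarity": 0.71},
    {"sectionPath": chunk_path, "similarity": None},
    {"sectionPath": chunk_path, "similarity": 0.93},
]

print(prefix_match(chunk_path, summary_path))                  # False
print([c["similarity"] for c in top_chunks(chunks, limit=2)])  # [0.93, 0.71]
```

Because the prefix test is False for every Summary/Chunk pair in these formats, grouping by section would leave each section empty, which is why the chunks are ranked globally instead.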
2026-01-01 15:51:11 +01:00
parent 47cf21867f
commit d824269606
2 changed files with 63 additions and 67 deletions
--- a/generations/library_rag/flask_app.py
+++ b/generations/library_rag/flask_app.py
@@ -457,22 +457,18 @@ def hierarchical_search(
                for obj in chunks_result.objects
            ]

-            # Distribute chunks to sections using prefix matching
-            all_chunks = []
+            # NOTE: Summary.sectionPath format doesn't match Chunk.sectionPath
+            # This is a data quality issue that needs to be fixed at ingestion
+            # For now, sections provide context, chunks are shown globally
+            print(f"[HIERARCHICAL] Got {len(all_chunks_list)} chunks total")
+            print(f"[HIERARCHICAL] Found {len(sections_data)} relevant sections")
+
+            all_chunks = all_chunks_list
+
+            # Clear chunks from sections (they're displayed separately)
            for section in sections_data:
-                section_ref = section["section_path"]  # e.g., "Peirce: CP 2.504"
-
-                # Find chunks whose sectionPath starts with this reference
-                section_chunks = [
-                    chunk for chunk in all_chunks_list
-                    if chunk.get("sectionPath", "").startswith(section_ref)
-                ]
-
-                # Sort by similarity and limit per section
-                section_chunks.sort(key=lambda x: x.get("similarity", 0) or 0, reverse=True)
-                section["chunks"] = section_chunks[:limit]
-                section["chunks_count"] = len(section["chunks"])
-                all_chunks.extend(section["chunks"])
+                section["chunks"] = []
+                section["chunks_count"] = 0

            # Sort all chunks by similarity (descending)
            all_chunks.sort(key=lambda x: x.get("similarity", 0) or 0, reverse=True)