feat: Implement hierarchical 2-stage semantic search with auto-detection

## Overview

Implement a search dispatcher that automatically selects between simple
(1-stage) and hierarchical (2-stage) search based on query complexity. The
hierarchical path queries the previously unused Summary collection first to
improve precision.

## Architecture

**Auto-Detection Strategy:**
- Long queries (≥15 chars) → hierarchical
- Multi-concept queries (2+ significant words after French stop-word filtering) → hierarchical
- Queries with logical connectors (French: et, ou, mais, donc) → hierarchical
- Short single-concept queries → simple
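
The three criteria above can be sketched as follows. This is a minimal illustration, not the code in flask_app.py: the stop-word list, the `len(w) > 2` filter, and the constant names are assumptions, since the commit message names the criteria but not the exact lists.

```python
import re

# Assumed values: the commit names the criteria, not the exact word lists.
CONNECTORS = {"et", "ou", "mais", "donc"}
STOP_WORDS = {
    "le", "la", "les", "l", "un", "une", "des", "de", "du", "d",
    "que", "qu", "qui", "ce", "est", "selon", "en", "à", "au", "aux",
}
MIN_LONG_QUERY_CHARS = 15


def should_use_hierarchical_search(query: str) -> bool:
    """Return True when the query looks complex enough for 2-stage search."""
    text = query.strip().lower()

    # Criterion 1: long queries.
    if len(text) >= MIN_LONG_QUERY_CHARS:
        return True

    # Criterion 2: logical connectors appearing as whole words.
    words = re.findall(r"\w+", text)
    if any(word in CONNECTORS for word in words):
        return True

    # Criterion 3: two or more significant (non-stop) words.
    significant = [w for w in words if w not in STOP_WORDS and len(w) > 2]
    return len(significant) >= 2
```

With these assumed lists, "la mort" stays in simple mode (one significant word), while "âme immortelle" is 14 characters long but still goes hierarchical because it has two significant words.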

**Hierarchical Search (2-stage):**
1. Stage 1: Query Summary collection → find top N relevant sections
2. Stage 2: Query Chunk collection filtered by section paths
3. Group chunks by section with context (summary text + concepts)
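
A rough sketch of the two stages, with the vector queries injected as plain callables so the control flow can be read without a running database. `search_summaries`, `search_chunks`, `chunks_per_section`, and the `text`/`concepts` field names are hypothetical; only `sectionPath` comes from this commit.

```python
from typing import Callable

# Hypothetical stand-ins for the two vector queries.
SummarySearch = Callable[[str, int], list[dict]]      # (query, limit)
ChunkSearch = Callable[[str, str, int], list[dict]]   # (query, section path, limit)


def hierarchical_search(
    query: str,
    search_summaries: SummarySearch,
    search_chunks: ChunkSearch,
    sections_limit: int = 5,
    chunks_per_section: int = 3,
) -> list[dict]:
    """Two-stage search: sections first, then chunks inside each section."""
    # Stage 1: find the most relevant sections via their summaries.
    summaries = search_summaries(query, sections_limit)

    sections = []
    for summary in summaries:
        # Stage 2: query chunks restricted to this section's path.
        chunks = search_chunks(query, summary["sectionPath"], chunks_per_section)
        if not chunks:
            continue  # skip sections with no matching chunks

        # Group the chunks under their section, keeping the summary context.
        sections.append({
            "section_path": summary["sectionPath"],
            "summary": summary.get("text", ""),
            "concepts": summary.get("concepts", []),
            "chunks": chunks,
        })
    return sections
```

An empty return value is the signal a caller could use to fall back to simple search.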

**Simple Search (1-stage):**
- Direct query on Chunk collection (original implementation)
- Fallback for simple queries and errors

## Implementation Details

**Backend (flask_app.py):**
- `simple_search()`: Extracted original search logic
- `hierarchical_search()`: 2-stage search implementation
  - Stage 1: Summary near_text query
  - Post-filtering by author/work via Document collection
  - Stage 2: Chunk near_text query per section with sectionPath filter
  - Fallback to simple search if 0 summaries found
- `should_use_hierarchical_search()`: Auto-detection logic
  - 3 criteria: length, connectors, multi-concept
  - Stop words filtering for French
- `search_passages()`: Intelligent dispatcher
  - Auto-detection or force mode (simple/hierarchical)
  - Unified return format: {mode, results, sections?, total_chunks}
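
The dispatcher logic might look roughly like the sketch below. The three backends are passed in as callables to keep the example self-contained; that signature, the broad `except`, and the choice to put the flattened chunk list in `results` for hierarchical mode are assumptions, not details confirmed by this commit.

```python
from typing import Callable, Optional


def search_passages(
    query: str,
    detect: Callable[[str], bool],
    simple: Callable[[str], list],
    hierarchical: Callable[[str], list],
    force_mode: Optional[str] = None,
) -> dict:
    """Pick a search mode, run it, and return one unified result shape."""
    # A forced mode wins; anything else means auto-detection.
    if force_mode in ("simple", "hierarchical"):
        mode = force_mode
    else:
        mode = "hierarchical" if detect(query) else "simple"

    if mode == "hierarchical":
        try:
            sections = hierarchical(query)
        except Exception:
            sections = []  # treat a failed 2-stage search like an empty one
        if sections:
            chunks = [c for s in sections for c in s["chunks"]]
            return {
                "mode": "hierarchical",
                "results": chunks,
                "sections": sections,
                "total_chunks": len(chunks),
            }
        # No sections found (or an error): fall through to simple search.

    results = simple(query)
    return {"mode": "simple", "results": results, "total_chunks": len(results)}
```

Because the fallback reports `mode` as `simple`, the frontend badge reflects the search that actually ran, not the one that was requested.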

**Frontend (templates/search.html):**
- New form controls:
  - sections_limit selector (3, 5, 10, 20 sections)
  - mode selector (🤖 Auto, 📄 Simple, 🌳 Hiérarchique)
- Conditional display:
  - Mode indicator badge (simple vs hierarchical)
  - Hierarchical: sections grouped with summary + concepts + chunks
  - Simple: flat list (original)
- New CSS: .section-group, .section-header, .chunks-list, .chunk-item

**Route (/search):**
- Added parameters: sections_limit (default: 5), mode (default: auto)
- Passes force_mode to search_passages()
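
One way the route could turn the two query-string parameters into arguments for `search_passages()` is sketched below. `parse_search_params` is a hypothetical helper, and falling back to the defaults on invalid input is an assumption; in a Flask view, `request.args` could be passed as `args`.

```python
from typing import Mapping, Optional

ALLOWED_SECTION_LIMITS = (3, 5, 10, 20)   # values offered by the form selector
ALLOWED_MODES = ("auto", "simple", "hierarchical")


def parse_search_params(args: Mapping[str, str]) -> tuple[int, Optional[str]]:
    """Return (sections_limit, force_mode) from raw query-string values."""
    try:
        sections_limit = int(args.get("sections_limit", 5))
    except (TypeError, ValueError):
        sections_limit = 5
    if sections_limit not in ALLOWED_SECTION_LIMITS:
        sections_limit = 5  # assumed: unknown values fall back to the default

    mode = args.get("mode", "auto")
    if mode not in ALLOWED_MODES:
        mode = "auto"

    # "auto" means no forced mode, so auto-detection decides.
    force_mode = None if mode == "auto" else mode
    return sections_limit, force_mode
```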

## Testing

Created test_hierarchical.py:
- Tests auto-detection logic with 7 test cases
- All 7 tests passing

## Results

**Before:**
- Only 1-stage search on Chunk collection
- Summary collection unused (8,425 summaries idle)

**After:**
- Automatic mode selection (90%+ accuracy expected; an estimate, not a measured figure)
- Hierarchical search for complex queries (better precision)
- Simple search for basic queries (better performance)
- User can override with force mode
- Full context display (sections + summaries + concepts)

## Benefits

1. **Better Precision**: Section-level filtering reduces noise
2. **Better Context**: Users see relevant sections first
3. **Automatic**: No user configuration required
4. **Flexible**: Can force mode if needed
5. **Backwards Compatible**: Simple mode identical to original

## Example Queries

- "justice" → Simple (short, 1 concept)
- "Qu'est-ce que la justice selon Platon ?" → Hierarchical (long, complex)
- "vertu et sagesse" → Hierarchical (multi-concept + connector)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-01 12:04:28 +01:00
parent 04ee3f9e39
commit 0dcccc93d1
3 changed files with 521 additions and 21 deletions


@@ -0,0 +1,46 @@
#!/usr/bin/env python3
"""Test script for hierarchical search auto-detection."""

import sys

# Fix encoding for Windows console
if sys.platform == "win32" and hasattr(sys.stdout, 'reconfigure'):
    sys.stdout.reconfigure(encoding='utf-8')

from flask_app import should_use_hierarchical_search

print("=" * 60)
print("AUTO-DETECTION TEST: HIERARCHICAL SEARCH")
print("=" * 60)
print()

# Each case: (query, expected result, reason). Queries stay in French
# because the detection logic filters French stop words and connectors.
test_queries = [
    ("justice", False, "Short query, 1 concept"),
    ("Qu'est-ce que la justice selon Platon ?", True, "Long query, ≥15 chars"),
    ("vertu et sagesse", True, "Multi-concept with connector 'et'"),
    ("la mort", False, "Short query with stop words"),
    ("âme immortelle", True, "2+ significant words"),
    ("Peirce", False, "Single proper noun, short"),
    ("Comment atteindre le bonheur ?", True, "Philosophical question, ≥15 chars"),
]

print(f"{'Query':<45} {'Expected':<10} {'Actual':<10} {'Status'}")
print("-" * 75)

all_passed = True
for query, expected, reason in test_queries:
    result = should_use_hierarchical_search(query)
    status = "✅ PASS" if result == expected else "❌ FAIL"
    if result != expected:
        all_passed = False

    print(f"{query:<45} {expected!s:<10} {result!s:<10} {status}")
    print(f"   Reason: {reason}")

print()
print("=" * 60)
if all_passed:
    print("✅ ALL TESTS PASS")
else:
    print("❌ SOME TESTS FAILED")
print("=" * 60)