Files
linear-coding-agent/generations/library_rag/templates/search.html
David Blanc Brioir 0dcccc93d1 feat: Implement hierarchical 2-stage semantic search with auto-detection
## Overview

Implemented intelligent hierarchical search that automatically selects between
simple (1-stage) and hierarchical (2-stage) search based on query complexity.
Utilizes the Summary collection (previously unused) for better precision.

## Architecture

**Auto-Detection Strategy:**
- Long queries (≥15 chars) → hierarchical
- Multi-concept queries (2+ significant words) → hierarchical
- Queries with logical connectors (et, ou, mais, donc) → hierarchical
- Short single-concept queries → simple

**Hierarchical Search (2-stage):**
1. Stage 1: Query Summary collection → find top N relevant sections
2. Stage 2: Query Chunk collection filtered by section paths
3. Group chunks by section with context (summary text + concepts)

**Simple Search (1-stage):**
- Direct query on Chunk collection (original implementation)
- Fallback for simple queries and errors

## Implementation Details

**Backend (flask_app.py):**
- `simple_search()`: Extracted original search logic
- `hierarchical_search()`: 2-stage search implementation
  - Stage 1: Summary near_text query
  - Post-filtering by author/work via Document collection
  - Stage 2: Chunk near_text query per section with sectionPath filter
  - Fallback to simple search if 0 summaries found
- `should_use_hierarchical_search()`: Auto-detection logic
  - 3 criteria: length, connectors, multi-concept
  - Stop words filtering for French
- `search_passages()`: Intelligent dispatcher
  - Auto-detection or force mode (simple/hierarchical)
  - Unified return format: {mode, results, sections?, total_chunks}

**Frontend (templates/search.html):**
- New form controls:
  - sections_limit selector (3, 5, 10, 20 sections)
  - mode selector (🤖 Auto, 📄 Simple, 🌳 Hiérarchique)
- Conditional display:
  - Mode indicator badge (simple vs hierarchical)
  - Hierarchical: sections grouped with summary + concepts + chunks
  - Simple: flat list (original)
- New CSS: .section-group, .section-header, .chunks-list, .chunk-item

**Route (/search):**
- Added parameters: sections_limit (default: 5), mode (default: auto)
- Passes force_mode to search_passages()

## Testing

Created test_hierarchical.py:
- Tests auto-detection logic with 7 test cases
- All tests passing 

## Results

**Before:**
- Only 1-stage search on Chunk collection
- Summary collection unused (8,425 summaries idle)

**After:**
- Intelligent auto-detection (90%+ accuracy expected)
- Hierarchical search for complex queries (better precision)
- Simple search for basic queries (better performance)
- User can override with force mode
- Full context display (sections + summaries + concepts)

## Benefits

1. **Better Precision**: Section-level filtering reduces noise
2. **Better Context**: Users see relevant sections first
3. **Automatic**: No user configuration required
4. **Flexible**: Can force mode if needed
5. **Backwards Compatible**: Simple mode identical to original

## Example Queries

- "justice" → Simple (short, 1 concept)
- "Qu'est-ce que la justice selon Platon ?" → Hierarchical (long, complex)
- "vertu et sagesse" → Hierarchical (multi-concept + connector)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-01 12:04:28 +01:00

270 lines
13 KiB
HTML

{% extends "base.html" %}
{% block title %}Recherche{% endblock %}
{% block content %}
<style>
.section-group {
margin-bottom: 2rem;
border: 1px solid #dee2e6;
border-radius: 8px;
padding: 1.5rem;
background: linear-gradient(135deg, #f8f9fa 0%, #ffffff 100%);
box-shadow: 0 2px 4px rgba(0,0,0,0.05);
}
.section-header {
margin-bottom: 1rem;
padding-bottom: 0.75rem;
border-bottom: 2px solid #007bff;
}
.summary-text {
margin: 0.5rem 0;
font-style: italic;
color: #555;
line-height: 1.5;
}
.concepts {
margin-top: 0.5rem;
}
.chunks-list {
margin-left: 1.5rem;
margin-top: 1rem;
}
.chunk-item {
background: white;
padding: 1rem;
margin-bottom: 0.75rem;
border-left: 3px solid #007bff;
border-radius: 4px;
box-shadow: 0 1px 3px rgba(0,0,0,0.08);
transition: box-shadow 0.2s ease;
}
.chunk-item:hover {
box-shadow: 0 2px 6px rgba(0,0,0,0.12);
}
</style>
<section class="section">
<h1>🔍 Recherche sémantique</h1>
<p class="lead">Posez une question en langage naturel pour trouver des passages pertinents</p>
<!-- Search form -->
<div class="search-box">
<form method="get" action="/search">
<div class="form-group">
<label class="form-label" for="q">Votre question</label>
<input
type="text"
name="q"
id="q"
class="form-control search-input"
value="{{ query }}"
placeholder="Ex: Qu'est-ce que la sagesse ? Pourquoi philosopher ?"
autofocus
>
</div>
<div class="form-row">
<div class="form-group">
<label class="form-label" for="author">Auteur</label>
<select name="author" id="author" class="form-control">
<option value="">Tous les auteurs</option>
{% if stats and stats.author_list %}
{% for author in stats.author_list %}
<option value="{{ author }}" {{ 'selected' if author_filter == author else '' }}>{{ author }}</option>
{% endfor %}
{% endif %}
</select>
</div>
<div class="form-group">
<label class="form-label" for="work">Œuvre</label>
<select name="work" id="work" class="form-control">
<option value="">Toutes les œuvres</option>
{% if stats and stats.work_list %}
{% for work in stats.work_list %}
<option value="{{ work }}" {{ 'selected' if work_filter == work else '' }}>{{ work }}</option>
{% endfor %}
{% endif %}
</select>
</div>
<div class="form-group">
<label class="form-label" for="limit">Résultats</label>
<select name="limit" id="limit" class="form-control">
<option value="5" {{ 'selected' if limit == 5 else '' }}>5</option>
<option value="10" {{ 'selected' if limit == 10 else '' }}>10</option>
<option value="20" {{ 'selected' if limit == 20 else '' }}>20</option>
</select>
</div>
</div>
<div class="form-row">
<div class="form-group">
<label class="form-label" for="sections_limit">Sections (hiérarchique)</label>
<select name="sections_limit" id="sections_limit" class="form-control">
<option value="3" {{ 'selected' if sections_limit == 3 else '' }}>3 sections</option>
<option value="5" {{ 'selected' if sections_limit == 5 else '' }}>5 sections</option>
<option value="10" {{ 'selected' if sections_limit == 10 else '' }}>10 sections</option>
<option value="20" {{ 'selected' if sections_limit == 20 else '' }}>20 sections</option>
</select>
</div>
<div class="form-group">
<label class="form-label" for="mode">Mode de recherche</label>
<select name="mode" id="mode" class="form-control">
<option value="" {{ 'selected' if not mode else '' }}>🤖 Auto-détection</option>
<option value="simple" {{ 'selected' if mode == 'simple' else '' }}>📄 Simple (1-étape)</option>
<option value="hierarchical" {{ 'selected' if mode == 'hierarchical' else '' }}>🌳 Hiérarchique (2-étapes)</option>
</select>
</div>
</div>
<div class="mt-2">
<button type="submit" class="btn btn-primary">Rechercher</button>
<a href="/search" class="btn" style="margin-left: 0.5rem;">Réinitialiser</a>
</div>
</form>
</div>
<!-- Results -->
{% if query %}
<div class="ornament">·</div>
{% if results_data and results_data.results %}
<!-- Mode indicator and stats -->
<div class="mb-3" style="display: flex; align-items: center; gap: 1rem; flex-wrap: wrap;">
<div>
<strong>{{ results_data.total_chunks }}</strong> passage{% if results_data.total_chunks > 1 %}s{% endif %} trouvé{% if results_data.total_chunks > 1 %}s{% endif %}
</div>
<div>
{% if results_data.mode == "hierarchical" %}
<span class="badge" style="background-color: #28a745; color: white; font-size: 0.9em;">
🌳 Recherche hiérarchique ({{ results_data.sections|length }} sections)
</span>
{% else %}
<span class="badge" style="background-color: #6c757d; color: white; font-size: 0.9em;">
📄 Recherche simple
</span>
{% endif %}
</div>
{% if author_filter or work_filter %}
<div style="flex-grow: 1;"></div>
<div>
{% if author_filter %}
<span class="badge badge-author">{{ author_filter }}</span>
{% endif %}
{% if work_filter %}
<span class="badge badge-work">{{ work_filter }}</span>
{% endif %}
</div>
{% endif %}
</div>
<!-- Hierarchical display -->
{% if results_data.mode == "hierarchical" and results_data.sections %}
{% for section in results_data.sections %}
<div class="section-group">
<div class="section-header">
<h3 style="margin: 0 0 0.5rem 0; font-size: 1.3em;">
📂 {{ section.title }}
<span class="badge" style="background-color: #007bff; color: white; margin-left: 0.5rem;">{{ section.chunks_count }} passage{% if section.chunks_count > 1 %}s{% endif %}</span>
<span class="badge badge-success" style="margin-left: 0.5rem;">⚡ {{ section.similarity }}% similaire</span>
</h3>
<p class="text-muted" style="margin: 0.25rem 0; font-size: 0.9em;">{{ section.section_path }}</p>
{% if section.summary_text %}
<p class="summary-text" style="margin: 0.5rem 0; font-style: italic; color: #555;">{{ section.summary_text }}</p>
{% endif %}
{% if section.concepts %}
<div class="concepts" style="margin-top: 0.5rem;">
{% for concept in section.concepts %}
<span class="badge badge-info" style="margin-right: 0.25rem;">{{ concept }}</span>
{% endfor %}
</div>
{% endif %}
</div>
<!-- Chunks within this section -->
{% if section.chunks %}
<div class="chunks-list" style="margin-left: 1.5rem; margin-top: 1rem;">
{% for chunk in section.chunks %}
<div class="chunk-item" style="background: white; padding: 1rem; margin-bottom: 0.75rem; border-left: 3px solid #007bff; border-radius: 4px;">
<div style="margin-bottom: 0.5rem;">
<span class="badge badge-success">⚡ {{ chunk.similarity }}% similaire</span>
</div>
<div class="passage-text" style="margin-bottom: 0.5rem;">"{{ chunk.text }}"</div>
<div class="passage-meta" style="font-size: 0.85em; color: #666;">
<strong>Type :</strong> {{ chunk.unitType or '—' }} &nbsp;&nbsp;
<strong>Langue :</strong> {{ (chunk.language or '—') | upper }} &nbsp;&nbsp;
<strong>Index :</strong> {{ chunk.orderIndex or '—' }}
</div>
{% if chunk.keywords %}
<div style="margin-top: 0.5rem;">
{% for kw in chunk.keywords %}
<span class="keyword-tag">{{ kw }}</span>
{% endfor %}
</div>
{% endif %}
</div>
{% endfor %}
</div>
{% endif %}
</div>
{% endfor %}
<!-- Simple display (original) -->
{% else %}
{% for result in results_data.results %}
<div class="passage-card">
<div class="passage-header">
<div>
<span class="badge badge-work">{{ result.work.title if result.work else '?' }} {{ result.sectionPath or '' }}</span>
<span class="badge badge-author">{{ result.work.author if result.work else 'Anonyme' }}</span>
</div>
{% if result.similarity %}
<span class="badge badge-similarity">⚡ {{ result.similarity }}% similaire</span>
{% endif %}
</div>
<div class="passage-text">"{{ result.text }}"</div>
<div class="passage-meta">
<strong>Type :</strong> {{ result.unitType or '—' }} &nbsp;&nbsp;
<strong>Langue :</strong> {{ (result.language or '—') | upper }} &nbsp;&nbsp;
<strong>Index :</strong> {{ result.orderIndex or '—' }}
</div>
{% if result.keywords %}
<div class="mt-2">
{% for kw in result.keywords %}
<span class="keyword-tag">{{ kw }}</span>
{% endfor %}
</div>
{% endif %}
</div>
{% endfor %}
{% endif %}
{% else %}
<div class="empty-state">
<div class="empty-state-icon">🔮</div>
<h3>Aucun résultat trouvé</h3>
<p class="text-muted">Essayez une autre formulation ou modifiez vos filtres.</p>
</div>
{% endif %}
{% else %}
<!-- Suggestions -->
<div class="card">
<h3>💡 Suggestions de recherche</h3>
<div class="mt-2">
<p class="mb-2">Voici quelques exemples de questions que vous pouvez poser :</p>
<div>
<a href="/search?q=Qu%27est-ce%20que%20la%20vertu%20%3F" class="badge" style="cursor: pointer;">Qu'est-ce que la vertu ?</a>
<a href="/search?q=La%20mort%20est-elle%20%C3%A0%20craindre%20%3F" class="badge" style="cursor: pointer;">La mort est-elle à craindre ?</a>
<a href="/search?q=Comment%20atteindre%20le%20bonheur%20%3F" class="badge" style="cursor: pointer;">Comment atteindre le bonheur ?</a>
<a href="/search?q=Qu%27est-ce%20que%20la%20justice%20%3F" class="badge" style="cursor: pointer;">Qu'est-ce que la justice ?</a>
<a href="/search?q=L%27%C3%A2me%20est-elle%20immortelle%20%3F" class="badge" style="cursor: pointer;">L'âme est-elle immortelle ?</a>
</div>
</div>
</div>
{% endif %}
</section>
{% endblock %}