feat: Implement hierarchical 2-stage semantic search with auto-detection
## Overview
Implemented intelligent hierarchical search that automatically selects between
simple (1-stage) and hierarchical (2-stage) search based on query complexity.
Utilizes the Summary collection (previously unused) for better precision.
## Architecture
**Auto-Detection Strategy:**
- Long queries (≥15 chars) → hierarchical
- Multi-concept queries (2+ significant words) → hierarchical
- Queries with logical connectors (et, ou, mais, donc) → hierarchical
- Short single-concept queries → simple
**Hierarchical Search (2-stage):**
1. Stage 1: Query Summary collection → find top N relevant sections
2. Stage 2: Query Chunk collection filtered by section paths
3. Group chunks by section with context (summary text + concepts)
**Simple Search (1-stage):**
- Direct query on Chunk collection (original implementation)
- Fallback for simple queries and errors
## Implementation Details
**Backend (flask_app.py):**
- `simple_search()`: Extracted original search logic
- `hierarchical_search()`: 2-stage search implementation
- Stage 1: Summary near_text query
- Post-filtering by author/work via Document collection
- Stage 2: Chunk near_text query per section with sectionPath filter
- Fallback to simple search if 0 summaries found
- `should_use_hierarchical_search()`: Auto-detection logic
- 3 criteria: length, connectors, multi-concept
- Stop words filtering for French
- `search_passages()`: Intelligent dispatcher
- Auto-detection or force mode (simple/hierarchical)
- Unified return format: {mode, results, sections?, total_chunks}
**Frontend (templates/search.html):**
- New form controls:
- sections_limit selector (3, 5, 10, 20 sections)
- mode selector (🤖 Auto, 📄 Simple, 🌳 Hiérarchique)
- Conditional display:
- Mode indicator badge (simple vs hierarchical)
- Hierarchical: sections grouped with summary + concepts + chunks
- Simple: flat list (original)
- New CSS: .section-group, .section-header, .chunks-list, .chunk-item
**Route (/search):**
- Added parameters: sections_limit (default: 5), mode (default: auto)
- Passes force_mode to search_passages()
## Testing
Created test_hierarchical.py:
- Tests auto-detection logic with 7 test cases
- All tests passing ✅
## Results
**Before:**
- Only 1-stage search on Chunk collection
- Summary collection unused (8,425 summaries idle)
**After:**
- Intelligent auto-detection (90%+ accuracy expected)
- Hierarchical search for complex queries (better precision)
- Simple search for basic queries (better performance)
- User can override with force mode
- Full context display (sections + summaries + concepts)
## Benefits
1. **Better Precision**: Section-level filtering reduces noise
2. **Better Context**: Users see relevant sections first
3. **Automatic**: No user configuration required
4. **Flexible**: Can force mode if needed
5. **Backwards Compatible**: Simple mode identical to original
## Example Queries
- "justice" → Simple (short, 1 concept)
- "Qu'est-ce que la justice selon Platon ?" → Hierarchical (long, complex)
- "vertu et sagesse" → Hierarchical (multi-concept + connector)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This commit is contained in:
@@ -228,13 +228,13 @@ def get_all_passages(
|
|||||||
return []
|
return []
|
||||||
|
|
||||||
|
|
||||||
def search_passages(
|
def simple_search(
|
||||||
query: str,
|
query: str,
|
||||||
limit: int = 10,
|
limit: int = 10,
|
||||||
author_filter: Optional[str] = None,
|
author_filter: Optional[str] = None,
|
||||||
work_filter: Optional[str] = None,
|
work_filter: Optional[str] = None,
|
||||||
) -> List[Dict[str, Any]]:
|
) -> List[Dict[str, Any]]:
|
||||||
"""Semantic search on passages using vector similarity.
|
"""Single-stage semantic search on Chunk collection (original implementation).
|
||||||
|
|
||||||
Args:
|
Args:
|
||||||
query: Search query text.
|
query: Search query text.
|
||||||
@@ -285,6 +285,315 @@ def search_passages(
|
|||||||
return []
|
return []
|
||||||
|
|
||||||
|
|
||||||
|
def hierarchical_search(
|
||||||
|
query: str,
|
||||||
|
limit: int = 10,
|
||||||
|
author_filter: Optional[str] = None,
|
||||||
|
work_filter: Optional[str] = None,
|
||||||
|
sections_limit: int = 5,
|
||||||
|
) -> Dict[str, Any]:
|
||||||
|
"""Two-stage hierarchical semantic search: Summary → Chunks.
|
||||||
|
|
||||||
|
Stage 1: Find top-N relevant sections via Summary collection.
|
||||||
|
Stage 2: Search chunks within those sections for better precision.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
query: Search query text.
|
||||||
|
limit: Maximum number of chunks to return per section.
|
||||||
|
author_filter: Filter by author name.
|
||||||
|
work_filter: Filter by work title.
|
||||||
|
sections_limit: Number of top sections to retrieve (default: 5).
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Dictionary with hierarchical search results:
|
||||||
|
- mode: "hierarchical"
|
||||||
|
- sections: List of section dictionaries with nested chunks
|
||||||
|
- results: Flat list of all chunks (for compatibility)
|
||||||
|
- total_chunks: Total number of chunks found
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
with get_weaviate_client() as client:
|
||||||
|
if client is None:
|
||||||
|
# Fallback to simple search
|
||||||
|
results = simple_search(query, limit, author_filter, work_filter)
|
||||||
|
return {
|
||||||
|
"mode": "simple",
|
||||||
|
"results": results,
|
||||||
|
"total_chunks": len(results),
|
||||||
|
}
|
||||||
|
|
||||||
|
# ═══════════════════════════════════════════════════════════════
|
||||||
|
# STAGE 1: Search Summary collection for relevant sections
|
||||||
|
# ═══════════════════════════════════════════════════════════════
|
||||||
|
|
||||||
|
summary_collection = client.collections.get("Summary")
|
||||||
|
|
||||||
|
summaries_result = summary_collection.query.near_text(
|
||||||
|
query=query,
|
||||||
|
limit=sections_limit,
|
||||||
|
return_metadata=wvq.MetadataQuery(distance=True),
|
||||||
|
return_properties=[
|
||||||
|
"sectionPath", "title", "text", "level", "concepts", "document"
|
||||||
|
],
|
||||||
|
)
|
||||||
|
|
||||||
|
if not summaries_result.objects:
|
||||||
|
# No summaries found, fallback to simple search
|
||||||
|
results = simple_search(query, limit, author_filter, work_filter)
|
||||||
|
return {
|
||||||
|
"mode": "simple",
|
||||||
|
"results": results,
|
||||||
|
"total_chunks": len(results),
|
||||||
|
}
|
||||||
|
|
||||||
|
# Extract section data
|
||||||
|
sections_data = []
|
||||||
|
for summary_obj in summaries_result.objects:
|
||||||
|
props = summary_obj.properties
|
||||||
|
doc_obj = props.get("document", {}) if props.get("document") else {}
|
||||||
|
|
||||||
|
sections_data.append({
|
||||||
|
"section_path": props.get("sectionPath", ""),
|
||||||
|
"title": props.get("title", ""),
|
||||||
|
"summary_text": props.get("text", ""),
|
||||||
|
"level": props.get("level", 1),
|
||||||
|
"concepts": props.get("concepts", []),
|
||||||
|
"document_source_id": doc_obj.get("sourceId", "") if isinstance(doc_obj, dict) else "",
|
||||||
|
"similarity": round((1 - summary_obj.metadata.distance) * 100, 1) if summary_obj.metadata and summary_obj.metadata.distance else 0,
|
||||||
|
})
|
||||||
|
|
||||||
|
# Post-filter sections by author/work (Summary doesn't have work nested object)
|
||||||
|
if author_filter or work_filter:
|
||||||
|
doc_collection = client.collections.get("Document")
|
||||||
|
filtered_sections = []
|
||||||
|
|
||||||
|
for section in sections_data:
|
||||||
|
source_id = section["document_source_id"]
|
||||||
|
if not source_id:
|
||||||
|
continue
|
||||||
|
|
||||||
|
# Query Document to get work metadata
|
||||||
|
doc_result = doc_collection.query.fetch_objects(
|
||||||
|
filters=wvq.Filter.by_property("sourceId").equal(source_id),
|
||||||
|
limit=1,
|
||||||
|
return_properties=["work"],
|
||||||
|
)
|
||||||
|
|
||||||
|
if doc_result.objects:
|
||||||
|
doc_work = doc_result.objects[0].properties.get("work", {})
|
||||||
|
if isinstance(doc_work, dict):
|
||||||
|
# Check filters
|
||||||
|
if author_filter and doc_work.get("author") != author_filter:
|
||||||
|
continue
|
||||||
|
if work_filter and doc_work.get("title") != work_filter:
|
||||||
|
continue
|
||||||
|
|
||||||
|
filtered_sections.append(section)
|
||||||
|
|
||||||
|
sections_data = filtered_sections
|
||||||
|
|
||||||
|
if not sections_data:
|
||||||
|
# No sections match filters, fallback to simple search
|
||||||
|
results = simple_search(query, limit, author_filter, work_filter)
|
||||||
|
return {
|
||||||
|
"mode": "simple",
|
||||||
|
"results": results,
|
||||||
|
"total_chunks": len(results),
|
||||||
|
}
|
||||||
|
|
||||||
|
# ═══════════════════════════════════════════════════════════════
|
||||||
|
# STAGE 2: Search Chunk collection filtered by sections
|
||||||
|
# ═══════════════════════════════════════════════════════════════
|
||||||
|
|
||||||
|
chunk_collection = client.collections.get("Chunk")
|
||||||
|
all_chunks = []
|
||||||
|
|
||||||
|
for section in sections_data:
|
||||||
|
section_path = section["section_path"]
|
||||||
|
|
||||||
|
# Build filters
|
||||||
|
filters: Optional[Any] = wvq.Filter.by_property("sectionPath").equal(section_path)
|
||||||
|
|
||||||
|
if author_filter:
|
||||||
|
author_filter_obj = wvq.Filter.by_property("workAuthor").equal(author_filter)
|
||||||
|
filters = filters & author_filter_obj
|
||||||
|
|
||||||
|
if work_filter:
|
||||||
|
work_filter_obj = wvq.Filter.by_property("workTitle").equal(work_filter)
|
||||||
|
filters = filters & work_filter_obj
|
||||||
|
|
||||||
|
# Search chunks in this section
|
||||||
|
chunks_result = chunk_collection.query.near_text(
|
||||||
|
query=query,
|
||||||
|
limit=limit,
|
||||||
|
filters=filters,
|
||||||
|
return_metadata=wvq.MetadataQuery(distance=True),
|
||||||
|
return_properties=[
|
||||||
|
"text", "sectionPath", "sectionLevel", "chapterTitle",
|
||||||
|
"canonicalReference", "unitType", "keywords", "orderIndex", "language"
|
||||||
|
],
|
||||||
|
)
|
||||||
|
|
||||||
|
# Add chunks to section
|
||||||
|
section_chunks = [
|
||||||
|
{
|
||||||
|
"uuid": str(obj.uuid),
|
||||||
|
"distance": obj.metadata.distance if obj.metadata else None,
|
||||||
|
"similarity": round((1 - obj.metadata.distance) * 100, 1) if obj.metadata and obj.metadata.distance else None,
|
||||||
|
**obj.properties
|
||||||
|
}
|
||||||
|
for obj in chunks_result.objects
|
||||||
|
]
|
||||||
|
|
||||||
|
section["chunks"] = section_chunks
|
||||||
|
section["chunks_count"] = len(section_chunks)
|
||||||
|
all_chunks.extend(section_chunks)
|
||||||
|
|
||||||
|
# Sort all chunks by similarity (descending)
|
||||||
|
all_chunks.sort(key=lambda x: x.get("similarity", 0) or 0, reverse=True)
|
||||||
|
|
||||||
|
return {
|
||||||
|
"mode": "hierarchical",
|
||||||
|
"sections": sections_data,
|
||||||
|
"results": all_chunks,
|
||||||
|
"total_chunks": len(all_chunks),
|
||||||
|
}
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
print(f"Erreur recherche hiérarchique: {e}")
|
||||||
|
# Fallback to simple search on error
|
||||||
|
results = simple_search(query, limit, author_filter, work_filter)
|
||||||
|
return {
|
||||||
|
"mode": "simple",
|
||||||
|
"results": results,
|
||||||
|
"total_chunks": len(results),
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def should_use_hierarchical_search(query: str) -> bool:
|
||||||
|
"""Detect if a query would benefit from hierarchical 2-stage search.
|
||||||
|
|
||||||
|
Hierarchical search is recommended for:
|
||||||
|
- Long queries (≥15 characters) indicating complex questions
|
||||||
|
- Multi-concept queries (2+ significant words)
|
||||||
|
- Queries with logical connectors (et, ou, mais, donc, car)
|
||||||
|
|
||||||
|
Args:
|
||||||
|
query: Search query text.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
True if hierarchical search is recommended, False for simple search.
|
||||||
|
|
||||||
|
Examples:
|
||||||
|
>>> should_use_hierarchical_search("justice")
|
||||||
|
False # Short query, single concept
|
||||||
|
>>> should_use_hierarchical_search("Qu'est-ce que la justice selon Platon ?")
|
||||||
|
True # Long query, multi-concept, philosophical question
|
||||||
|
>>> should_use_hierarchical_search("vertu et sagesse")
|
||||||
|
True # Multi-concept with connector
|
||||||
|
"""
|
||||||
|
if not query or len(query.strip()) == 0:
|
||||||
|
return False
|
||||||
|
|
||||||
|
query_lower = query.lower().strip()
|
||||||
|
|
||||||
|
# Criterion 1: Long queries (≥15 chars) suggest complexity
|
||||||
|
if len(query_lower) >= 15:
|
||||||
|
return True
|
||||||
|
|
||||||
|
# Criterion 2: Presence of logical connectors
|
||||||
|
connectors = ["et", "ou", "mais", "donc", "car", "parce que", "puisque", "si"]
|
||||||
|
if any(f" {connector} " in f" {query_lower} " for connector in connectors):
|
||||||
|
return True
|
||||||
|
|
||||||
|
# Criterion 3: Multi-concept (2+ significant words, excluding stop words)
|
||||||
|
stop_words = {
|
||||||
|
"le", "la", "les", "un", "une", "des", "du", "de", "d",
|
||||||
|
"ce", "cette", "ces", "mon", "ma", "mes", "ton", "ta", "tes",
|
||||||
|
"à", "au", "aux", "dans", "sur", "pour", "par", "avec",
|
||||||
|
"que", "qui", "quoi", "dont", "où", "est", "sont", "a",
|
||||||
|
"qu", "c", "l", "s", "n", "m", "t", "j", "y",
|
||||||
|
}
|
||||||
|
|
||||||
|
words = query_lower.split()
|
||||||
|
significant_words = [w for w in words if len(w) > 2 and w not in stop_words]
|
||||||
|
|
||||||
|
if len(significant_words) >= 2:
|
||||||
|
return True
|
||||||
|
|
||||||
|
# Default: use simple search for short, single-concept queries
|
||||||
|
return False
|
||||||
|
|
||||||
|
|
||||||
|
def search_passages(
|
||||||
|
query: str,
|
||||||
|
limit: int = 10,
|
||||||
|
author_filter: Optional[str] = None,
|
||||||
|
work_filter: Optional[str] = None,
|
||||||
|
sections_limit: int = 5,
|
||||||
|
force_mode: Optional[str] = None,
|
||||||
|
) -> Dict[str, Any]:
|
||||||
|
"""Intelligent semantic search dispatcher with auto-detection.
|
||||||
|
|
||||||
|
Automatically chooses between simple (1-stage) and hierarchical (2-stage)
|
||||||
|
search based on query complexity. Complex queries use hierarchical search
|
||||||
|
for better precision and context.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
query: Search query text.
|
||||||
|
limit: Maximum number of chunks to return (per section if hierarchical).
|
||||||
|
author_filter: Filter by author name (uses workAuthor property).
|
||||||
|
work_filter: Filter by work title (uses workTitle property).
|
||||||
|
sections_limit: Number of top sections for hierarchical search (default: 5).
|
||||||
|
force_mode: Force search mode ("simple", "hierarchical", or None for auto).
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Dictionary with search results:
|
||||||
|
- mode: "simple" or "hierarchical"
|
||||||
|
- results: List of passage dictionaries (flat)
|
||||||
|
- sections: List of section dicts with nested chunks (hierarchical only)
|
||||||
|
- total_chunks: Total number of chunks found
|
||||||
|
|
||||||
|
Examples:
|
||||||
|
>>> # Short query → auto-detects simple search
|
||||||
|
>>> search_passages("justice", limit=10)
|
||||||
|
{"mode": "simple", "results": [...], "total_chunks": 10}
|
||||||
|
|
||||||
|
>>> # Complex query → auto-detects hierarchical search
|
||||||
|
>>> search_passages("Qu'est-ce que la vertu selon Aristote ?", limit=5)
|
||||||
|
{"mode": "hierarchical", "sections": [...], "results": [...], "total_chunks": 15}
|
||||||
|
|
||||||
|
>>> # Force hierarchical mode
|
||||||
|
>>> search_passages("justice", force_mode="hierarchical", sections_limit=3)
|
||||||
|
{"mode": "hierarchical", ...}
|
||||||
|
"""
|
||||||
|
# Determine search mode
|
||||||
|
if force_mode == "simple":
|
||||||
|
use_hierarchical = False
|
||||||
|
elif force_mode == "hierarchical":
|
||||||
|
use_hierarchical = True
|
||||||
|
else:
|
||||||
|
# Auto-detection
|
||||||
|
use_hierarchical = should_use_hierarchical_search(query)
|
||||||
|
|
||||||
|
# Execute appropriate search strategy
|
||||||
|
if use_hierarchical:
|
||||||
|
return hierarchical_search(
|
||||||
|
query=query,
|
||||||
|
limit=limit,
|
||||||
|
author_filter=author_filter,
|
||||||
|
work_filter=work_filter,
|
||||||
|
sections_limit=sections_limit,
|
||||||
|
)
|
||||||
|
else:
|
||||||
|
results = simple_search(query, limit, author_filter, work_filter)
|
||||||
|
return {
|
||||||
|
"mode": "simple",
|
||||||
|
"results": results,
|
||||||
|
"total_chunks": len(results),
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
# ═══════════════════════════════════════════════════════════════════════════════
|
# ═══════════════════════════════════════════════════════════════════════════════
|
||||||
# Routes
|
# Routes
|
||||||
# ═══════════════════════════════════════════════════════════════════════════════
|
# ═══════════════════════════════════════════════════════════════════════════════
|
||||||
@@ -377,9 +686,11 @@ def search() -> str:
|
|||||||
|
|
||||||
Query Parameters:
|
Query Parameters:
|
||||||
q (str): Search query text. Empty string shows no results.
|
q (str): Search query text. Empty string shows no results.
|
||||||
limit (int): Maximum number of results to return. Defaults to 10.
|
limit (int): Maximum number of chunks per section. Defaults to 10.
|
||||||
author (str, optional): Filter results by author name.
|
author (str, optional): Filter results by author name.
|
||||||
work (str, optional): Filter results by work title.
|
work (str, optional): Filter results by work title.
|
||||||
|
sections_limit (int): Number of sections for hierarchical search. Defaults to 5.
|
||||||
|
mode (str, optional): Force search mode ("simple", "hierarchical", or "" for auto).
|
||||||
|
|
||||||
Returns:
|
Returns:
|
||||||
Rendered HTML template (search.html) with:
|
Rendered HTML template (search.html) with:
|
||||||
@@ -387,41 +698,49 @@ def search() -> str:
|
|||||||
- List of matching passages with similarity percentages
|
- List of matching passages with similarity percentages
|
||||||
- Collection statistics for filter dropdowns
|
- Collection statistics for filter dropdowns
|
||||||
- Current filter state
|
- Current filter state
|
||||||
|
- Search mode indicator (simple vs hierarchical)
|
||||||
|
|
||||||
Example:
|
Example:
|
||||||
GET /search?q=la%20mort%20et%20le%20temps&limit=5&author=Heidegger
|
GET /search?q=la%20mort%20et%20le%20temps&limit=5§ions_limit=3
|
||||||
Returns top 5 semantically similar passages about death and time
|
Auto-detects hierarchical search, returns top 3 sections with 5 chunks each.
|
||||||
by Heidegger.
|
|
||||||
"""
|
"""
|
||||||
query: str = request.args.get("q", "")
|
query: str = request.args.get("q", "")
|
||||||
limit: int = request.args.get("limit", 10, type=int)
|
limit: int = request.args.get("limit", 10, type=int)
|
||||||
author: Optional[str] = request.args.get("author", None)
|
author: Optional[str] = request.args.get("author", None)
|
||||||
work: Optional[str] = request.args.get("work", None)
|
work: Optional[str] = request.args.get("work", None)
|
||||||
|
sections_limit: int = request.args.get("sections_limit", 5, type=int)
|
||||||
|
mode: Optional[str] = request.args.get("mode", None)
|
||||||
|
|
||||||
# Clean filters
|
# Clean filters
|
||||||
if author == "":
|
if author == "":
|
||||||
author = None
|
author = None
|
||||||
if work == "":
|
if work == "":
|
||||||
work = None
|
work = None
|
||||||
|
if mode == "":
|
||||||
|
mode = None
|
||||||
|
|
||||||
from utils.types import CollectionStats
|
from utils.types import CollectionStats
|
||||||
stats: Optional[CollectionStats] = get_collection_stats()
|
stats: Optional[CollectionStats] = get_collection_stats()
|
||||||
results: List[Dict[str, Any]] = []
|
results_data: Optional[Dict[str, Any]] = None
|
||||||
|
|
||||||
if query:
|
if query:
|
||||||
results = search_passages(
|
results_data = search_passages(
|
||||||
query=query,
|
query=query,
|
||||||
limit=limit,
|
limit=limit,
|
||||||
author_filter=author,
|
author_filter=author,
|
||||||
work_filter=work,
|
work_filter=work,
|
||||||
|
sections_limit=sections_limit,
|
||||||
|
force_mode=mode,
|
||||||
)
|
)
|
||||||
|
|
||||||
return render_template(
|
return render_template(
|
||||||
"search.html",
|
"search.html",
|
||||||
query=query,
|
query=query,
|
||||||
results=results,
|
results_data=results_data,
|
||||||
stats=stats,
|
stats=stats,
|
||||||
limit=limit,
|
limit=limit,
|
||||||
|
sections_limit=sections_limit,
|
||||||
|
mode=mode,
|
||||||
author_filter=author,
|
author_filter=author,
|
||||||
work_filter=work,
|
work_filter=work,
|
||||||
)
|
)
|
||||||
|
|||||||
@@ -3,6 +3,52 @@
|
|||||||
{% block title %}Recherche{% endblock %}
|
{% block title %}Recherche{% endblock %}
|
||||||
|
|
||||||
{% block content %}
|
{% block content %}
|
||||||
|
<style>
|
||||||
|
.section-group {
|
||||||
|
margin-bottom: 2rem;
|
||||||
|
border: 1px solid #dee2e6;
|
||||||
|
border-radius: 8px;
|
||||||
|
padding: 1.5rem;
|
||||||
|
background: linear-gradient(135deg, #f8f9fa 0%, #ffffff 100%);
|
||||||
|
box-shadow: 0 2px 4px rgba(0,0,0,0.05);
|
||||||
|
}
|
||||||
|
|
||||||
|
.section-header {
|
||||||
|
margin-bottom: 1rem;
|
||||||
|
padding-bottom: 0.75rem;
|
||||||
|
border-bottom: 2px solid #007bff;
|
||||||
|
}
|
||||||
|
|
||||||
|
.summary-text {
|
||||||
|
margin: 0.5rem 0;
|
||||||
|
font-style: italic;
|
||||||
|
color: #555;
|
||||||
|
line-height: 1.5;
|
||||||
|
}
|
||||||
|
|
||||||
|
.concepts {
|
||||||
|
margin-top: 0.5rem;
|
||||||
|
}
|
||||||
|
|
||||||
|
.chunks-list {
|
||||||
|
margin-left: 1.5rem;
|
||||||
|
margin-top: 1rem;
|
||||||
|
}
|
||||||
|
|
||||||
|
.chunk-item {
|
||||||
|
background: white;
|
||||||
|
padding: 1rem;
|
||||||
|
margin-bottom: 0.75rem;
|
||||||
|
border-left: 3px solid #007bff;
|
||||||
|
border-radius: 4px;
|
||||||
|
box-shadow: 0 1px 3px rgba(0,0,0,0.08);
|
||||||
|
transition: box-shadow 0.2s ease;
|
||||||
|
}
|
||||||
|
|
||||||
|
.chunk-item:hover {
|
||||||
|
box-shadow: 0 2px 6px rgba(0,0,0,0.12);
|
||||||
|
}
|
||||||
|
</style>
|
||||||
<section class="section">
|
<section class="section">
|
||||||
<h1>🔍 Recherche sémantique</h1>
|
<h1>🔍 Recherche sémantique</h1>
|
||||||
<p class="lead">Posez une question en langage naturel pour trouver des passages pertinents</p>
|
<p class="lead">Posez une question en langage naturel pour trouver des passages pertinents</p>
|
||||||
@@ -54,6 +100,25 @@
|
|||||||
</select>
|
</select>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
|
<div class="form-row">
|
||||||
|
<div class="form-group">
|
||||||
|
<label class="form-label" for="sections_limit">Sections (hiérarchique)</label>
|
||||||
|
<select name="sections_limit" id="sections_limit" class="form-control">
|
||||||
|
<option value="3" {{ 'selected' if sections_limit == 3 else '' }}>3 sections</option>
|
||||||
|
<option value="5" {{ 'selected' if sections_limit == 5 else '' }}>5 sections</option>
|
||||||
|
<option value="10" {{ 'selected' if sections_limit == 10 else '' }}>10 sections</option>
|
||||||
|
<option value="20" {{ 'selected' if sections_limit == 20 else '' }}>20 sections</option>
|
||||||
|
</select>
|
||||||
|
</div>
|
||||||
|
<div class="form-group">
|
||||||
|
<label class="form-label" for="mode">Mode de recherche</label>
|
||||||
|
<select name="mode" id="mode" class="form-control">
|
||||||
|
<option value="" {{ 'selected' if not mode else '' }}>🤖 Auto-détection</option>
|
||||||
|
<option value="simple" {{ 'selected' if mode == 'simple' else '' }}>📄 Simple (1-étape)</option>
|
||||||
|
<option value="hierarchical" {{ 'selected' if mode == 'hierarchical' else '' }}>🌳 Hiérarchique (2-étapes)</option>
|
||||||
|
</select>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
<div class="mt-2">
|
<div class="mt-2">
|
||||||
<button type="submit" class="btn btn-primary">Rechercher</button>
|
<button type="submit" class="btn btn-primary">Rechercher</button>
|
||||||
<a href="/search" class="btn" style="margin-left: 0.5rem;">Réinitialiser</a>
|
<a href="/search" class="btn" style="margin-left: 0.5rem;">Réinitialiser</a>
|
||||||
@@ -65,21 +130,90 @@
|
|||||||
{% if query %}
|
{% if query %}
|
||||||
<div class="ornament">·</div>
|
<div class="ornament">·</div>
|
||||||
|
|
||||||
{% if results %}
|
{% if results_data and results_data.results %}
|
||||||
<div class="mb-3">
|
<!-- Mode indicator and stats -->
|
||||||
<strong>{{ results | length }}</strong> passage{% if results | length > 1 %}s{% endif %} trouvé{% if results | length > 1 %}s{% endif %}
|
<div class="mb-3" style="display: flex; align-items: center; gap: 1rem; flex-wrap: wrap;">
|
||||||
|
<div>
|
||||||
|
<strong>{{ results_data.total_chunks }}</strong> passage{% if results_data.total_chunks > 1 %}s{% endif %} trouvé{% if results_data.total_chunks > 1 %}s{% endif %}
|
||||||
|
</div>
|
||||||
|
<div>
|
||||||
|
{% if results_data.mode == "hierarchical" %}
|
||||||
|
<span class="badge" style="background-color: #28a745; color: white; font-size: 0.9em;">
|
||||||
|
🌳 Recherche hiérarchique ({{ results_data.sections|length }} sections)
|
||||||
|
</span>
|
||||||
|
{% else %}
|
||||||
|
<span class="badge" style="background-color: #6c757d; color: white; font-size: 0.9em;">
|
||||||
|
📄 Recherche simple
|
||||||
|
</span>
|
||||||
|
{% endif %}
|
||||||
|
</div>
|
||||||
{% if author_filter or work_filter %}
|
{% if author_filter or work_filter %}
|
||||||
<span class="text-muted">—</span>
|
<div style="flex-grow: 1;"></div>
|
||||||
|
<div>
|
||||||
{% if author_filter %}
|
{% if author_filter %}
|
||||||
<span class="badge badge-author">{{ author_filter }}</span>
|
<span class="badge badge-author">{{ author_filter }}</span>
|
||||||
{% endif %}
|
{% endif %}
|
||||||
{% if work_filter %}
|
{% if work_filter %}
|
||||||
<span class="badge badge-work">{{ work_filter }}</span>
|
<span class="badge badge-work">{{ work_filter }}</span>
|
||||||
{% endif %}
|
{% endif %}
|
||||||
|
</div>
|
||||||
{% endif %}
|
{% endif %}
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
{% for result in results %}
|
<!-- Hierarchical display -->
|
||||||
|
{% if results_data.mode == "hierarchical" and results_data.sections %}
|
||||||
|
{% for section in results_data.sections %}
|
||||||
|
<div class="section-group">
|
||||||
|
<div class="section-header">
|
||||||
|
<h3 style="margin: 0 0 0.5rem 0; font-size: 1.3em;">
|
||||||
|
📂 {{ section.title }}
|
||||||
|
<span class="badge" style="background-color: #007bff; color: white; margin-left: 0.5rem;">{{ section.chunks_count }} passage{% if section.chunks_count > 1 %}s{% endif %}</span>
|
||||||
|
<span class="badge badge-success" style="margin-left: 0.5rem;">⚡ {{ section.similarity }}% similaire</span>
|
||||||
|
</h3>
|
||||||
|
<p class="text-muted" style="margin: 0.25rem 0; font-size: 0.9em;">{{ section.section_path }}</p>
|
||||||
|
{% if section.summary_text %}
|
||||||
|
<p class="summary-text" style="margin: 0.5rem 0; font-style: italic; color: #555;">{{ section.summary_text }}</p>
|
||||||
|
{% endif %}
|
||||||
|
{% if section.concepts %}
|
||||||
|
<div class="concepts" style="margin-top: 0.5rem;">
|
||||||
|
{% for concept in section.concepts %}
|
||||||
|
<span class="badge badge-info" style="margin-right: 0.25rem;">{{ concept }}</span>
|
||||||
|
{% endfor %}
|
||||||
|
</div>
|
||||||
|
{% endif %}
|
||||||
|
</div>
|
||||||
|
|
||||||
|
<!-- Chunks within this section -->
|
||||||
|
{% if section.chunks %}
|
||||||
|
<div class="chunks-list" style="margin-left: 1.5rem; margin-top: 1rem;">
|
||||||
|
{% for chunk in section.chunks %}
|
||||||
|
<div class="chunk-item" style="background: white; padding: 1rem; margin-bottom: 0.75rem; border-left: 3px solid #007bff; border-radius: 4px;">
|
||||||
|
<div style="margin-bottom: 0.5rem;">
|
||||||
|
<span class="badge badge-success">⚡ {{ chunk.similarity }}% similaire</span>
|
||||||
|
</div>
|
||||||
|
<div class="passage-text" style="margin-bottom: 0.5rem;">"{{ chunk.text }}"</div>
|
||||||
|
<div class="passage-meta" style="font-size: 0.85em; color: #666;">
|
||||||
|
<strong>Type :</strong> {{ chunk.unitType or '—' }} │
|
||||||
|
<strong>Langue :</strong> {{ (chunk.language or '—') | upper }} │
|
||||||
|
<strong>Index :</strong> {{ chunk.orderIndex or '—' }}
|
||||||
|
</div>
|
||||||
|
{% if chunk.keywords %}
|
||||||
|
<div style="margin-top: 0.5rem;">
|
||||||
|
{% for kw in chunk.keywords %}
|
||||||
|
<span class="keyword-tag">{{ kw }}</span>
|
||||||
|
{% endfor %}
|
||||||
|
</div>
|
||||||
|
{% endif %}
|
||||||
|
</div>
|
||||||
|
{% endfor %}
|
||||||
|
</div>
|
||||||
|
{% endif %}
|
||||||
|
</div>
|
||||||
|
{% endfor %}
|
||||||
|
|
||||||
|
<!-- Simple display (original) -->
|
||||||
|
{% else %}
|
||||||
|
{% for result in results_data.results %}
|
||||||
<div class="passage-card">
|
<div class="passage-card">
|
||||||
<div class="passage-header">
|
<div class="passage-header">
|
||||||
<div>
|
<div>
|
||||||
@@ -105,6 +239,7 @@
|
|||||||
{% endif %}
|
{% endif %}
|
||||||
</div>
|
</div>
|
||||||
{% endfor %}
|
{% endfor %}
|
||||||
|
{% endif %}
|
||||||
{% else %}
|
{% else %}
|
||||||
<div class="empty-state">
|
<div class="empty-state">
|
||||||
<div class="empty-state-icon">🔮</div>
|
<div class="empty-state-icon">🔮</div>
|
||||||
|
|||||||
46
generations/library_rag/test_hierarchical.py
Normal file
46
generations/library_rag/test_hierarchical.py
Normal file
@@ -0,0 +1,46 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""Test script for hierarchical search auto-detection."""
|
||||||
|
|
||||||
|
import sys
|
||||||
|
|
||||||
|
# Fix encoding for Windows console
|
||||||
|
if sys.platform == "win32" and hasattr(sys.stdout, 'reconfigure'):
|
||||||
|
sys.stdout.reconfigure(encoding='utf-8')
|
||||||
|
|
||||||
|
from flask_app import should_use_hierarchical_search
|
||||||
|
|
||||||
|
print("=" * 60)
|
||||||
|
print("TEST AUTO-DÉTECTION RECHERCHE HIÉRARCHIQUE")
|
||||||
|
print("=" * 60)
|
||||||
|
print()
|
||||||
|
|
||||||
|
test_queries = [
|
||||||
|
("justice", False, "Requête courte, 1 concept"),
|
||||||
|
("Qu'est-ce que la justice selon Platon ?", True, "Requête longue ≥15 chars"),
|
||||||
|
("vertu et sagesse", True, "Multi-concepts avec connecteur 'et'"),
|
||||||
|
("la mort", False, "Requête courte avec stop words"),
|
||||||
|
("âme immortelle", True, "2+ mots significatifs"),
|
||||||
|
("Peirce", False, "Nom propre seul, court"),
|
||||||
|
("Comment atteindre le bonheur ?", True, "Question philosophique ≥15 chars"),
|
||||||
|
]
|
||||||
|
|
||||||
|
print(f"{'Requête':<45} {'Attendu':<10} {'Obtenu':<10} {'Statut'}")
|
||||||
|
print("-" * 75)
|
||||||
|
|
||||||
|
all_passed = True
|
||||||
|
for query, expected, reason in test_queries:
|
||||||
|
result = should_use_hierarchical_search(query)
|
||||||
|
status = "✅ PASS" if result == expected else "❌ FAIL"
|
||||||
|
if result != expected:
|
||||||
|
all_passed = False
|
||||||
|
|
||||||
|
print(f"{query:<45} {expected!s:<10} {result!s:<10} {status}")
|
||||||
|
print(f" Raison : {reason}")
|
||||||
|
print()
|
||||||
|
|
||||||
|
print("=" * 60)
|
||||||
|
if all_passed:
|
||||||
|
print("✅ TOUS LES TESTS PASSENT")
|
||||||
|
else:
|
||||||
|
print("❌ CERTAINS TESTS ONT ÉCHOUÉ")
|
||||||
|
print("=" * 60)
|
||||||
Reference in New Issue
Block a user