chore: Major cleanup - archive migration scripts and remove temp files

CLEANUP ACTIONS:
- Archived 11 migration/optimization scripts to archive/migration_scripts/
- Archived 11 phase documentation files to archive/documentation/
- Moved backups/, docs/, scripts/ to archive/
- Deleted 30+ temporary debug/test/fix scripts
- Cleaned Python cache (__pycache__/, *.pyc)
- Cleaned log files (*.log)

NEW FILES:
- CHANGELOG.md: Consolidated project history and migration documentation
- Updated .gitignore: Added *.log, *.pyc, archive/ exclusions

FINAL ROOT STRUCTURE (19 items):
- Core framework: agent.py, autonomous_agent_demo.py, client.py, security.py, progress.py, prompts.py
- Config: requirements.txt, package.json, .gitignore
- Docs: README.md, CHANGELOG.md, project_progress.md
- Directories: archive/, generations/, memory/, prompts/, utils/

ARCHIVED SCRIPTS (in archive/migration_scripts/):
- 01-11: Migration & optimization scripts (migrate, schema, rechunk, vectorize, etc.)

ARCHIVED DOCS (in archive/documentation/):
- PHASE_0-8: Detailed phase summaries
- MIGRATION_README.md, PLAN_MIGRATION_WEAVIATE_GPU.md

The repository is now clean and production-ready, with all important files preserved in archive/.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@@ -451,7 +451,101 @@ filter_by_author(author="Platon")
delete_document(source_id="platon-menon", confirm=true)
```

-For more details, see the complete documentation in `.claude/CLAUDE.md`.
+### MCP Memory Tools (9 integrated tools - Phase 4)
+
+**Unified Memory System**: The MCP server now integrates 9 tools for managing a memory system (Thoughts, Messages, Conversations) backed by Weaviate + GPU embeddings. These tools let Claude Desktop create, search, and manage thoughts, messages, and conversations persistently.
+
+**Memory architecture**:
+- **Backend**: Weaviate 1.34.4 (Thought, Message, Conversation collections)
+- **Embeddings**: BAAI/bge-m3 on GPU (1024-dim, RTX 4070, PyTorch 2.6.0+cu124)
+- **Handlers**: `memory/mcp/` (thought_tools, message_tools, conversation_tools)
+- **Data**: 102 Thoughts, 377 Messages, 12 Conversations (as of 2025-01-08)
+
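The semantic search these tools rely on reduces to comparing 1024-dim embedding vectors by cosine similarity; the real retrieval pipeline lives in `memory/core` and Weaviate, so the following is only a self-contained illustration of the underlying metric, with toy 2-dim vectors:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Identical directions score 1.0; orthogonal vectors score 0.0.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

Weaviate reports the complementary cosine *distance*, which the web UI in this commit converts to a percentage score.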
+#### Thought Tools (3)
+
+**1. add_thought** - Add a thought to the system
+```
+add_thought(
+    content="Exploring vector databases for semantic search",
+    thought_type="observation",  # reflection, question, intuition, observation
+    trigger="Research session",
+    concepts=["weaviate", "embeddings", "gpu"],
+    privacy_level="private"  # private, shared, public
+)
+```
+
+**2. search_thoughts** - Semantic search over thoughts
+```
+search_thoughts(
+    query="vector databases GPU",
+    limit=10,
+    thought_type_filter="observation"  # optional
+)
+```
+
+**3. get_thought** - Retrieve a thought by UUID
+```
+get_thought(uuid="730c1a8e-b09f-4889-bbe9-4867d0ee7f1a")
+```
+
+#### Message Tools (3)
+
+**4. add_message** - Add a message to a conversation
+```
+add_message(
+    content="Explain transformers in AI",
+    role="user",  # user, assistant, system
+    conversation_id="chat_2025_01_08",
+    order_index=0
+)
+```
+
+**5. get_messages** - Retrieve all messages in a conversation
+```
+get_messages(
+    conversation_id="chat_2025_01_08",
+    limit=50
+)
+```
+
+**6. search_messages** - Semantic search over messages
+```
+search_messages(
+    query="transformers AI",
+    limit=10,
+    conversation_id_filter="chat_2025_01_08"  # optional
+)
+```
+
+#### Conversation Tools (3)
+
+**7. get_conversation** - Retrieve a conversation by ID
+```
+get_conversation(conversation_id="ikario_derniere_pensee")
+```
+
+**8. search_conversations** - Semantic search over conversations
+```
+search_conversations(
+    query="philosophical discussion",
+    limit=10,
+    category_filter="philosophy"  # optional
+)
+```
+
+**9. list_conversations** - List all conversations
+```
+list_conversations(
+    limit=20,
+    category_filter="testing"  # optional
+)
+```
+
+**Tests**: All Memory tools have been tested successfully (see `test_memory_mcp_tools.py`).
+
+**Full documentation**: See `memory/README_MCP_TOOLS.md` for the detailed architecture, data schemas, and usage examples.
+
+For more details on the Library RAG tools, see the complete documentation in `.claude/CLAUDE.md`.

---

@@ -89,8 +89,23 @@ from utils.types import (
     SSEEvent,
 )

+# GPU Embedder for manual vectorization (Phase 5: Backend Integration)
+import sys
+sys.path.insert(0, str(Path(__file__).parent.parent.parent))
+from memory.core import get_embedder
+
 app = Flask(__name__)

+# Initialize GPU embedder singleton
+_embedder = None
+
+def get_gpu_embedder():
+    """Get or create GPU embedder singleton."""
+    global _embedder
+    if _embedder is None:
+        _embedder = get_embedder()
+    return _embedder
+
 # Configuration Flask
 app.config["SECRET_KEY"] = os.environ.get("SECRET_KEY", "dev-secret-key-change-in-production")

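The `get_gpu_embedder` helper added above is a lazy module-level singleton: the expensive embedder is constructed on first use and every later call returns the same instance. A stripped-down sketch of that pattern, with a hypothetical stand-in class instead of the real `memory.core` embedder:

```python
_embedder = None

class StubEmbedder:
    """Stand-in for the real GPU embedder (hypothetical; real loading is expensive)."""
    def __init__(self):
        print("loading model...")  # runs only once

def get_gpu_embedder():
    """Get or create the embedder singleton."""
    global _embedder
    if _embedder is None:
        _embedder = StubEmbedder()
    return _embedder

first = get_gpu_embedder()
second = get_gpu_embedder()
print(first is second)  # True: the same instance is reused
```

Note this simple form is not thread-safe; under a multi-threaded WSGI server two requests could race on the first call and load the model twice.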
@@ -152,26 +167,25 @@ def get_collection_stats() -> Optional[CollectionStats]:
     stats: CollectionStats = {}

     # Chunk stats (renamed from Passage)
-    passages = client.collections.get("Chunk")
+    passages = client.collections.get("Chunk_v2")
     passage_count = passages.aggregate.over_all(total_count=True)
     stats["passages"] = passage_count.total_count or 0

-    # Get unique authors and works (from nested objects)
-    all_passages = passages.query.fetch_objects(limit=1000)
+    # Get unique authors and works (from direct properties in v2)
+    all_passages = passages.query.fetch_objects(limit=10000)
     authors: set[str] = set()
     works: set[str] = set()
     languages: set[str] = set()

     for obj in all_passages.objects:
-        # Work is now a nested object with {title, author}
-        work_obj = obj.properties.get("work")
-        if work_obj and isinstance(work_obj, dict):
-            if work_obj.get("author"):
-                authors.add(str(work_obj["author"]))
-            if work_obj.get("title"):
-                works.add(str(work_obj["title"]))
-        if obj.properties.get("language"):
-            languages.add(str(obj.properties["language"]))
+        props = obj.properties
+        # In v2: workAuthor and workTitle are direct properties
+        if props.get("workAuthor"):
+            authors.add(str(props["workAuthor"]))
+        if props.get("workTitle"):
+            works.add(str(props["workTitle"]))
+        if props.get("language"):
+            languages.add(str(props["language"]))

     stats["authors"] = len(authors)
     stats["works"] = len(works)

@@ -208,13 +222,13 @@ def get_all_passages(
     if client is None:
         return []

-    chunks = client.collections.get("Chunk")
+    chunks = client.collections.get("Chunk_v2")

     result = chunks.query.fetch_objects(
         limit=limit,
         offset=offset,
         return_properties=[
-            "text", "sectionPath", "sectionLevel", "chapterTitle",
+            "text", "sectionPath", "chapterTitle",
             "canonicalReference", "unitType", "keywords", "orderIndex", "language"
         ],
     )

@@ -253,7 +267,7 @@ def simple_search(
     if client is None:
         return []

-    chunks = client.collections.get("Chunk")
+    chunks = client.collections.get("Chunk_v2")

     # Build filters using top-level properties (workAuthor, workTitle)
     filters: Optional[Any] = None

@@ -263,13 +277,17 @@ def simple_search(
         work_filter_obj = wvq.Filter.by_property("workTitle").equal(work_filter)
         filters = filters & work_filter_obj if filters else work_filter_obj

-    result = chunks.query.near_text(
-        query=query,
+    # Generate query vector with GPU embedder (Phase 5: manual vectorization)
+    embedder = get_gpu_embedder()
+    query_vector = embedder.embed_single(query)
+
+    result = chunks.query.near_vector(
+        near_vector=query_vector.tolist(),
         limit=limit,
         filters=filters,
         return_metadata=wvq.MetadataQuery(distance=True),
         return_properties=[
-            "text", "sectionPath", "sectionLevel", "chapterTitle",
+            "text", "sectionPath", "chapterTitle",
             "canonicalReference", "unitType", "keywords", "orderIndex", "language"
         ],
     )

@@ -333,10 +351,14 @@ def hierarchical_search(
     # STAGE 1: Search Summary collection for relevant sections
     # ═══════════════════════════════════════════════════════════════

-    summary_collection = client.collections.get("Summary")
+    summary_collection = client.collections.get("Summary_v2")

-    summaries_result = summary_collection.query.near_text(
-        query=query,
+    # Generate query vector with GPU embedder (Phase 5: manual vectorization)
+    embedder = get_gpu_embedder()
+    query_vector = embedder.embed_single(query)
+
+    summaries_result = summary_collection.query.near_vector(
+        near_vector=query_vector.tolist(),
         limit=sections_limit,
         return_metadata=wvq.MetadataQuery(distance=True),
         # Note: Don't specify return_properties - let Weaviate return all properties

@@ -358,63 +380,62 @@ def hierarchical_search(
     for summary_obj in summaries_result.objects:
         props = summary_obj.properties

-        # Try to get document.sourceId if available (nested object might still be returned)
-        doc_obj = props.get("document")
-        source_id = ""
-        if doc_obj and isinstance(doc_obj, dict):
-            source_id = doc_obj.get("sourceId", "")
+        # In v2: Summary has workTitle property, need to get sourceId from Work
+        work_title = props.get("workTitle", "")
+
+        # We'll get sourceId later by matching workTitle with Work.sourceId
+        # For now, use workTitle as identifier
         sections_data.append({
             "section_path": props.get("sectionPath", ""),
             "title": props.get("title", ""),
             "summary_text": props.get("text", ""),
             "level": props.get("level", 1),
             "concepts": props.get("concepts", []),
-            "document_source_id": source_id,
-            "summary_uuid": str(summary_obj.uuid),  # Keep UUID for later retrieval if needed
+            "document_source_id": "",  # Will be populated during filtering
+            "work_title": work_title,  # Add workTitle for filtering
+            "summary_uuid": str(summary_obj.uuid),
             "similarity": round((1 - summary_obj.metadata.distance) * 100, 1) if summary_obj.metadata and summary_obj.metadata.distance else 0,
         })

-    # Post-filter sections by author/work (Summary doesn't have work nested object)
+    # Post-filter sections by author/work (Summary_v2 has workTitle property)
     if author_filter or work_filter:
         print(f"[HIERARCHICAL] Post-filtering {len(sections_data)} sections by work='{work_filter}'")
-        doc_collection = client.collections.get("Document")
-        filtered_sections = []
+        # Build Work title -> author map for filtering
+        work_collection = client.collections.get("Work")
+        work_map = {}
+        for work in work_collection.iterator(include_vector=False):
+            props = work.properties
+            title = props.get("title")
+            if title:
+                work_map[title] = {
+                    "author": props.get("author", "Unknown"),
+                    "sourceId": props.get("sourceId", "")
+                }
+
+        filtered_sections = []
         for section in sections_data:
-            source_id = section["document_source_id"]
-            if not source_id:
-                print(f"[HIERARCHICAL] Section '{section['section_path'][:40]}...' SKIPPED (no sourceId)")
+            work_title = section.get("work_title", "")
+
+            if not work_title or work_title not in work_map:
+                print(f"[HIERARCHICAL] Section '{section['section_path'][:40]}...' SKIPPED (no work mapping)")
                 continue

-            # Query Document to get work metadata
-            # Note: 'work' is a nested object, so we don't specify it in return_properties
-            # Weaviate should return it automatically
-            doc_result = doc_collection.query.fetch_objects(
-                filters=wvq.Filter.by_property("sourceId").equal(source_id),
-                limit=1,
-            )
-
-            if doc_result.objects:
-                doc_work = doc_result.objects[0].properties.get("work", {})
-                print(f"[HIERARCHICAL] Section '{section['section_path'][:40]}...' doc_work type={type(doc_work)}, value={doc_work}")
-                if isinstance(doc_work, dict):
-                    work_title = doc_work.get("title", "N/A")
-                    work_author = doc_work.get("author", "N/A")
-                    # Check filters
-                    if author_filter and work_author != author_filter:
-                        print(f"[HIERARCHICAL] Section '{section['section_path'][:40]}...' FILTERED (author '{work_author}' != '{author_filter}')")
-                        continue
-                    if work_filter and work_title != work_filter:
-                        print(f"[HIERARCHICAL] Section '{section['section_path'][:40]}...' FILTERED (work '{work_title}' != '{work_filter}')")
-                        continue
-                    print(f"[HIERARCHICAL] Section '{section['section_path'][:40]}...' work={work_title}, author={work_author}")
-                    print(f"[HIERARCHICAL] Section '{section['section_path'][:40]}...' KEPT (work='{work_title}')")
-                    filtered_sections.append(section)
-                else:
-                    print(f"[HIERARCHICAL] Section '{section['section_path'][:40]}...' SKIPPED (doc_work not a dict)")
-            else:
-                print(f"[HIERARCHICAL] Section '{section['section_path'][:40]}...' SKIPPED (no doc found for sourceId='{source_id}')")
+            work_author = work_map[work_title]["author"]
+            section["document_source_id"] = work_map[work_title]["sourceId"]  # Populate sourceId
+
+            # Check filters
+            if author_filter and work_author != author_filter:
+                print(f"[HIERARCHICAL] Section '{section['section_path'][:40]}...' FILTERED (author '{work_author}' != '{author_filter}')")
+                continue
+            if work_filter and work_title != work_filter:
+                print(f"[HIERARCHICAL] Section '{section['section_path'][:40]}...' FILTERED (work '{work_title}' != '{work_filter}')")
+                continue
+
+            print(f"[HIERARCHICAL] Section '{section['section_path'][:40]}...' KEPT (work='{work_title}')")
+            filtered_sections.append(section)

         sections_data = filtered_sections
         print(f"[HIERARCHICAL] After filtering: {len(sections_data)} sections remaining")

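The rewritten post-filter above replaces one Document query per section with a single in-memory map from work title to metadata, then filters against that map. A self-contained sketch of that lookup-then-filter shape (the works and sections here are illustrative stand-ins, not the real collections):

```python
# Hypothetical in-memory stand-ins for the Work collection and the search hits.
work_map = {
    "Menon": {"author": "Platon", "sourceId": "platon-menon"},
    "Having Thought": {"author": "Haugeland", "sourceId": "haugeland-ht"},
}
sections = [
    {"section_path": "1.2", "work_title": "Menon"},
    {"section_path": "3.1", "work_title": "Having Thought"},
    {"section_path": "9.9", "work_title": "Unknown Work"},
]

def filter_sections(sections, author_filter=None):
    """Keep sections whose work is known and matches the author filter."""
    kept = []
    for section in sections:
        info = work_map.get(section["work_title"])
        if info is None:
            continue  # no work mapping: skip
        if author_filter and info["author"] != author_filter:
            continue  # filtered out by author
        section["document_source_id"] = info["sourceId"]  # populate sourceId
        kept.append(section)
    return kept

print([s["section_path"] for s in filter_sections(sections, author_filter="Platon")])  # ['1.2']
```

Building the map once turns an O(sections) sequence of network round-trips into a single iterator pass plus dictionary lookups.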
@@ -438,7 +459,7 @@ def hierarchical_search(
     # For each section, search chunks using the section's summary text
     # This groups chunks under their relevant sections

-    chunk_collection = client.collections.get("Chunk")
+    chunk_collection = client.collections.get("Chunk_v2")

     # Build base filters (author/work only)
     base_filters: Optional[Any] = None

@@ -464,8 +485,11 @@ def hierarchical_search(
         if base_filters:
             section_filters = base_filters & section_filters

-        chunks_result = chunk_collection.query.near_text(
-            query=section_query,
+        # Generate query vector with GPU embedder (Phase 5: manual vectorization)
+        section_query_vector = embedder.embed_single(section_query)
+
+        chunks_result = chunk_collection.query.near_vector(
+            near_vector=section_query_vector.tolist(),
             limit=chunks_per_section,
             filters=section_filters,
             return_metadata=wvq.MetadataQuery(distance=True),

@@ -600,14 +624,28 @@ def summary_only_search(
     if client is None:
         return []

-    summaries = client.collections.get("Summary")
+    summaries = client.collections.get("Summary_v2")

-    # Note: Cannot filter by nested document properties directly in Weaviate v4
-    # Must fetch all and filter in Python if author/work filters are present
+    # Build Work map for metadata lookup (Summary_v2 has workTitle, not document)
+    work_collection = client.collections.get("Work")
+    work_map = {}
+    for work in work_collection.iterator(include_vector=False):
+        work_props = work.properties
+        title = work_props.get("title")
+        if title:
+            work_map[title] = {
+                "author": work_props.get("author", "Unknown"),
+                "year": work_props.get("year", 0),
+                "sourceId": work_props.get("sourceId", ""),
+            }
+
+    # Generate query vector with GPU embedder (Phase 5: manual vectorization)
+    embedder = get_gpu_embedder()
+    query_vector = embedder.embed_single(query)

     # Semantic search
-    results = summaries.query.near_text(
-        query=query,
+    results = summaries.query.near_vector(
+        near_vector=query_vector.tolist(),
         limit=limit * 3 if (author_filter or work_filter) else limit,  # Fetch more if filtering
         return_metadata=wvq.MetadataQuery(distance=True)
     )

@@ -618,24 +656,34 @@ def summary_only_search(
         props = obj.properties
         similarity = 1 - obj.metadata.distance

-        # Apply filters (Python-side since nested properties)
-        if author_filter and props["document"].get("author", "") != author_filter:
-            continue
-        if work_filter and props["document"].get("title", "") != work_filter:
+        # Get work metadata from workTitle
+        work_title = props.get("workTitle", "")
+        if not work_title or work_title not in work_map:
+            continue
+
+        work_info = work_map[work_title]
+        work_author = work_info["author"]
+        work_year = work_info["year"]
+        source_id = work_info["sourceId"]
+
+        # Apply filters
+        if author_filter and work_author != author_filter:
+            continue
+        if work_filter and work_title != work_filter:
             continue

         # Determine document icon and name
-        doc_id = props["document"]["sourceId"].lower()
-        if "tiercelin" in doc_id:
+        doc_id_lower = source_id.lower()
+        if "tiercelin" in doc_id_lower:
             doc_icon = "🟡"
             doc_name = "Tiercelin"
-        elif "platon" in doc_id or "menon" in doc_id:
+        elif "platon" in doc_id_lower or "menon" in doc_id_lower:
             doc_icon = "🟢"
             doc_name = "Platon"
-        elif "haugeland" in doc_id:
+        elif "haugeland" in doc_id_lower:
             doc_icon = "🟣"
             doc_name = "Haugeland"
-        elif "logique" in doc_id:
+        elif "logique" in doc_id_lower:
             doc_icon = "🔵"
             doc_name = "Logique"
         else:

@@ -647,19 +695,19 @@ def summary_only_search(
             "uuid": str(obj.uuid),
             "similarity": round(similarity * 100, 1),  # Convert to percentage
             "text": props.get("text", ""),
-            "title": props["title"],
+            "title": props.get("title", ""),
             "concepts": props.get("concepts", []),
             "doc_icon": doc_icon,
             "doc_name": doc_name,
-            "author": props["document"].get("author", ""),
-            "year": props["document"].get("year", 0),
+            "author": work_author,
+            "year": work_year,
             "chunks_count": props.get("chunksCount", 0),
             "section_path": props.get("sectionPath", ""),
             "sectionPath": props.get("sectionPath", ""),  # Alias for template compatibility
             # Add work info for template compatibility
             "work": {
-                "title": props["document"].get("title", ""),
-                "author": props["document"].get("author", ""),
+                "title": work_title,
+                "author": work_author,
             },
         }

@@ -969,7 +1017,7 @@ def rag_search(
         print("[RAG Search] Weaviate client unavailable")
         return []

-    chunks = client.collections.get("Chunk")
+    chunks = client.collections.get("Chunk_v2")

     # Build work filter if selected_works is provided
     work_filter: Optional[Any] = None

@@ -978,9 +1026,13 @@ def rag_search(
         work_filter = wvq.Filter.by_property("workTitle").contains_any(selected_works)
         print(f"[RAG Search] Applying work filter: {selected_works}")

+    # Generate query vector with GPU embedder (Phase 5: manual vectorization)
+    embedder = get_gpu_embedder()
+    query_vector = embedder.embed_single(query)
+
     # Query with properties needed for RAG context
-    result = chunks.query.near_text(
-        query=query,
+    result = chunks.query.near_vector(
+        near_vector=query_vector.tolist(),
         limit=limit,
         filters=work_filter,
         return_metadata=wvq.MetadataQuery(distance=True),

@@ -1444,33 +1496,30 @@ def api_get_works() -> Union[Response, tuple[Response, int]]:
             "message": "Cannot connect to Weaviate database"
         }), 500

-    # Query Chunk collection to get all unique works with counts
-    chunks = client.collections.get("Chunk")
+    # Query Chunk_v2 collection to get all unique works with counts
+    chunks = client.collections.get("Chunk_v2")

     # Fetch all chunks to aggregate by work
     # Using a larger limit to get all documents
-    # Note: Don't use return_properties with nested objects (causes gRPC error)
-    # Fetch all objects without specifying properties
+    # In v2: work is NOT a nested object, use workTitle and workAuthor properties
     all_chunks = chunks.query.fetch_objects(limit=10000)

     # Aggregate chunks by work (title + author)
     works_count: Dict[str, Dict[str, Any]] = {}

     for obj in all_chunks.objects:
-        work_obj = obj.properties.get("work")
-        if work_obj and isinstance(work_obj, dict):
-            title = work_obj.get("title", "")
-            author = work_obj.get("author", "")
+        props = obj.properties
+        title = props.get("workTitle", "")
+        author = props.get("workAuthor", "")

-            if title:  # Only count if title exists
-                # Use title as key (assumes unique titles)
-                if title not in works_count:
-                    works_count[title] = {
-                        "title": title,
-                        "author": author or "Unknown",
-                        "chunks_count": 0
-                    }
-                works_count[title]["chunks_count"] += 1
+        if title:  # Only count if title exists
+            # Use title as key (assumes unique titles)
+            if title not in works_count:
+                works_count[title] = {
+                    "title": title,
+                    "author": author or "Unknown",
+                    "chunks_count": 0
+                }
+            works_count[title]["chunks_count"] += 1

     # Convert to list and sort by author, then title
     works_list = list(works_count.values())

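The aggregation in this hunk is a plain count-by-key pass over chunk properties. The same pattern in a minimal stand-alone form (the chunk dicts here are made up for illustration, mimicking the `workTitle`/`workAuthor` properties of `Chunk_v2`):

```python
from typing import Any, Dict

# Hypothetical chunk properties.
chunks = [
    {"workTitle": "Menon", "workAuthor": "Platon"},
    {"workTitle": "Menon", "workAuthor": "Platon"},
    {"workTitle": "Having Thought", "workAuthor": "Haugeland"},
    {"workTitle": "", "workAuthor": ""},  # missing title: not counted
]

works_count: Dict[str, Dict[str, Any]] = {}
for props in chunks:
    title = props.get("workTitle", "")
    author = props.get("workAuthor", "")
    if title:  # only count if title exists
        if title not in works_count:
            works_count[title] = {"title": title, "author": author or "Unknown", "chunks_count": 0}
        works_count[title]["chunks_count"] += 1

print(works_count["Menon"]["chunks_count"])  # 2
```

Note the code keys by title alone, so two distinct works with the same title would be merged; the comment in the diff acknowledges that assumption.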
@@ -3082,45 +3131,60 @@ def documents() -> str:
     with get_weaviate_client() as client:
         if client is not None:
-            # Get chunk counts and authors
-            chunk_collection = client.collections.get("Chunk")
+            from typing import cast

-            for obj in chunk_collection.iterator(include_vector=False):
-                props = obj.properties
-                from typing import cast
-                doc_obj = cast(Dict[str, Any], props.get("document", {}))
-                work_obj = cast(Dict[str, Any], props.get("work", {}))
-
-                if doc_obj:
-                    source_id = doc_obj.get("sourceId", "")
-                    if source_id:
-                        if source_id not in documents_from_weaviate:
-                            documents_from_weaviate[source_id] = {
-                                "source_id": source_id,
-                                "title": work_obj.get("title") if work_obj else "Unknown",
-                                "author": work_obj.get("author") if work_obj else "Unknown",
-                                "chunks_count": 0,
-                                "summaries_count": 0,
-                                "authors": set(),
-                            }
-                        documents_from_weaviate[source_id]["chunks_count"] += 1
-
-                        # Track unique authors
-                        author = work_obj.get("author") if work_obj else None
-                        if author:
-                            documents_from_weaviate[source_id]["authors"].add(author)
-
-            # Get summary counts
+            # Get all Works (now with sourceId added in Phase 1 of migration)
             try:
-                summary_collection = client.collections.get("Summary")
-                for obj in summary_collection.iterator(include_vector=False):
-                    props = obj.properties
-                    doc_obj = cast(Dict[str, Any], props.get("document", {}))
+                work_collection = client.collections.get("Work")
+                chunk_collection = client.collections.get("Chunk_v2")

-                    if doc_obj:
-                        source_id = doc_obj.get("sourceId", "")
-                        if source_id and source_id in documents_from_weaviate:
-                            documents_from_weaviate[source_id]["summaries_count"] += 1
+                # Build documents from Work collection
+                for work in work_collection.iterator(include_vector=False):
+                    props = work.properties
+                    source_id = props.get("sourceId")
+
+                    # Skip Works without sourceId (not documents)
+                    if not source_id:
+                        continue
+
+                    documents_from_weaviate[source_id] = {
+                        "source_id": source_id,
+                        "title": props.get("title", "Unknown"),
+                        "author": props.get("author", "Unknown"),
+                        "pages": props.get("pages", 0),
+                        "edition": props.get("edition", ""),
+                        "chunks_count": 0,
+                        "summaries_count": 0,
+                        "authors": set(),
+                    }
+
+                    # Add author to set
+                    if props.get("author") and props.get("author") != "Unknown":
+                        documents_from_weaviate[source_id]["authors"].add(props.get("author"))
+
+                # Count chunks per document (via workTitle)
+                for chunk in chunk_collection.iterator(include_vector=False):
+                    work_title = chunk.properties.get("workTitle")
+
+                    # Find corresponding sourceId
+                    for source_id, doc_data in documents_from_weaviate.items():
+                        if doc_data["title"] == work_title:
+                            doc_data["chunks_count"] += 1
+                            break
+            except Exception as e:
+                print(f"Warning: Could not load Work collection: {e}")
+
+            # Count summaries (if collection exists)
+            try:
+                summary_collection = client.collections.get("Summary_v2")
+                for summary in summary_collection.iterator(include_vector=False):
+                    work_title = summary.properties.get("workTitle")
+
+                    # Find corresponding sourceId
+                    for source_id, doc_data in documents_from_weaviate.items():
+                        if doc_data["title"] == work_title:
+                            doc_data["summaries_count"] += 1
+                            break
             except Exception:
                 # Summary collection may not exist
                 pass

@@ -3157,17 +3221,195 @@ def documents() -> str:
             "has_images": images_dir.exists() and any(images_dir.iterdir()) if images_dir.exists() else False,
             "image_count": len(list(images_dir.glob("*.png"))) if images_dir.exists() else 0,
             "metadata": metadata,
+            "pages": weaviate_data.get("pages", pages),  # FROM WEAVIATE, fallback to file
             "summaries_count": weaviate_data["summaries_count"],  # FROM WEAVIATE
             "authors_count": len(weaviate_data["authors"]),  # FROM WEAVIATE
             "chunks_count": weaviate_data["chunks_count"],  # FROM WEAVIATE
             "title": weaviate_data["title"],  # FROM WEAVIATE
             "author": weaviate_data["author"],  # FROM WEAVIATE
+            "edition": weaviate_data.get("edition", ""),  # FROM WEAVIATE
             "toc": toc,
         })

     return render_template("documents.html", documents=documents_list)


+# ═══════════════════════════════════════════════════════════════════════════════
+# Memory Routes (Phase 5: Backend Integration)
+# ═══════════════════════════════════════════════════════════════════════════════
+
+def run_async(coro):
+    """Run async coroutine in sync Flask context."""
+    import asyncio
+    loop = asyncio.new_event_loop()
+    asyncio.set_event_loop(loop)
+    try:
+        return loop.run_until_complete(coro)
+    finally:
+        loop.close()

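`run_async` spins up a fresh event loop per call so the async MCP handlers can be invoked from synchronous Flask views. A minimal sketch of the same helper driving an arbitrary coroutine (`fetch_answer` is a made-up stand-in for a real handler; `asyncio.run` would behave equivalently here when no loop is already running):

```python
import asyncio

def run_async(coro):
    """Run an async coroutine from synchronous code."""
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    try:
        return loop.run_until_complete(coro)
    finally:
        loop.close()

async def fetch_answer() -> int:
    await asyncio.sleep(0)  # stand-in for real async I/O
    return 42

print(run_async(fetch_answer()))  # 42
```

Creating and closing a loop per request is simple and safe for these low-traffic routes, though it forfeits connection reuse that a long-lived loop would allow.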
+
+@app.route("/memories")
+def memories() -> str:
+    """Render the Memory search page (Thoughts + Messages)."""
+    # Get memory statistics
+    with get_weaviate_client() as client:
+        if client is None:
+            flash("Cannot connect to Weaviate database", "error")
+            stats = {"thoughts": 0, "messages": 0, "conversations": 0}
+        else:
+            try:
+                thoughts = client.collections.get("Thought")
+                messages = client.collections.get("Message")
+                conversations = client.collections.get("Conversation")
+
+                thoughts_count = thoughts.aggregate.over_all(total_count=True).total_count
+                messages_count = messages.aggregate.over_all(total_count=True).total_count
+                conversations_count = conversations.aggregate.over_all(total_count=True).total_count
+
+                stats = {
+                    "thoughts": thoughts_count or 0,
+                    "messages": messages_count or 0,
+                    "conversations": conversations_count or 0,
+                }
+            except Exception as e:
+                print(f"Error fetching memory stats: {e}")
+                stats = {"thoughts": 0, "messages": 0, "conversations": 0}
+
+    return render_template("memories.html", stats=stats)
+
+
+@app.route("/api/memories/search-thoughts", methods=["POST"])
+def api_search_thoughts():
+    """API endpoint for thought semantic search."""
+    try:
+        # Import Memory MCP tools locally
+        from memory.mcp import SearchThoughtsInput, search_thoughts_handler
+
+        data = request.json
+        query = data.get("query", "")
+        limit = data.get("limit", 10)
+        thought_type_filter = data.get("thought_type_filter")
+
+        input_data = SearchThoughtsInput(
+            query=query,
+            limit=limit,
+            thought_type_filter=thought_type_filter
+        )
+
+        result = run_async(search_thoughts_handler(input_data))
+        return jsonify(result)
+    except Exception as e:
+        return jsonify({"success": False, "error": str(e)}), 500
+
+
+@app.route("/api/memories/search-messages", methods=["POST"])
+def api_search_messages():
+    """API endpoint for message semantic search."""
+    try:
+        from memory.mcp import SearchMessagesInput, search_messages_handler
+
+        data = request.json
+        query = data.get("query", "")
+        limit = data.get("limit", 10)
+        conversation_id_filter = data.get("conversation_id_filter")
+
+        input_data = SearchMessagesInput(
+            query=query,
+            limit=limit,
+            conversation_id_filter=conversation_id_filter
+        )
+
+        result = run_async(search_messages_handler(input_data))
+        return jsonify(result)
+    except Exception as e:
+        return jsonify({"success": False, "error": str(e)}), 500
+
+
+@app.route("/conversations")
+def conversations() -> str:
+    """Render the Conversations page."""
+    try:
+        from memory.mcp import ListConversationsInput, list_conversations_handler
+
+        limit = request.args.get("limit", 20, type=int)
+        category_filter = request.args.get("category")
+
+        input_data = ListConversationsInput(
+            limit=limit,
+            category_filter=category_filter
+        )
+
+        result = run_async(list_conversations_handler(input_data))
+
+        if result.get("success"):
+            conversations_list = result.get("conversations", [])
+        else:
+            flash(f"Error loading conversations: {result.get('error')}", "error")
+            conversations_list = []
+
+        return render_template("conversations.html", conversations=conversations_list)
+    except Exception as e:
+        flash(f"Error loading conversations: {str(e)}", "error")
+        return render_template("conversations.html", conversations=[])
+
+
+@app.route("/conversation/<conversation_id>")
+def conversation_view(conversation_id: str) -> str:
+    """View a specific conversation with all its messages."""
+    try:
+        from memory.mcp import (
+            GetConversationInput, get_conversation_handler,
+            GetMessagesInput, get_messages_handler
+        )
+
+        # Get conversation metadata
+        conv_input = GetConversationInput(conversation_id=conversation_id)
+        conversation = run_async(get_conversation_handler(conv_input))
+
+        if not conversation.get("success"):
+            flash(f"Conversation not found: {conversation.get('error')}", "error")
+            return redirect(url_for("conversations"))
+
+        # Get all messages
+        msg_input = GetMessagesInput(conversation_id=conversation_id, limit=500)
+        messages_result = run_async(get_messages_handler(msg_input))
+
+        messages = messages_result.get("messages", []) if messages_result.get("success") else []
+
+        return render_template(
+            "conversation_view.html",
+            conversation=conversation,
+            messages=messages
+        )
+    except Exception as e:
+        flash(f"Error loading conversation: {str(e)}", "error")
+        return redirect(url_for("conversations"))
+
+
+@app.route("/api/conversations/search", methods=["POST"])
+def api_search_conversations():
+    """API endpoint for conversation semantic search."""
+    try:
+        from memory.mcp import SearchConversationsInput, search_conversations_handler
+
+        data = request.json
+        query = data.get("query", "")
+        limit = data.get("limit", 10)
+        category_filter = data.get("category_filter")
+
+        input_data = SearchConversationsInput(
+            query=query,
+            limit=limit,
+            category_filter=category_filter
+        )
+
+        result = run_async(search_conversations_handler(input_data))
+        return jsonify(result)
+    except Exception as e:
+        return jsonify({"success": False, "error": str(e)}), 500
+
+
+# ═══════════════════════════════════════════════════════════════════════════════
|
||||
# Main
|
||||
# ═══════════════════════════════════════════════════════════════════════════════
|
||||
|
||||
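The routes above call async MCP handlers from synchronous Flask views through a `run_async` helper defined elsewhere in the file. A minimal sketch of such a bridge, assuming it simply runs each coroutine on a fresh event loop (the real helper may instead reuse a persistent loop); `example_handler` is a hypothetical stand-in for handlers like `search_messages_handler`:

```python
import asyncio
from typing import Any, Coroutine


def run_async(coro: Coroutine[Any, Any, Any]) -> Any:
    """Run a coroutine to completion from synchronous code.

    asyncio.run() creates and tears down an event loop per call,
    which is the simplest sync-to-async bridge for Flask views.
    """
    return asyncio.run(coro)


async def example_handler(query: str) -> dict:
    # Hypothetical stand-in for an async MCP handler.
    return {"success": True, "query": query, "results": []}


result = run_async(example_handler("platonic forms"))
print(result["success"])  # prints True
```

A loop-per-call bridge is adequate for a single-threaded dev server; under a multi-threaded WSGI server, a shared long-lived loop would avoid repeated loop setup cost.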
@@ -62,6 +62,31 @@ from mcp_tools import (
    PDFProcessingError,
)

# Memory MCP Tools (added for unified Memory + Library system)
sys.path.insert(0, str(Path(__file__).parent.parent.parent))
from memory.mcp import (
    # Thought tools
    AddThoughtInput,
    SearchThoughtsInput,
    add_thought_handler,
    search_thoughts_handler,
    get_thought_handler,
    # Message tools
    AddMessageInput,
    GetMessagesInput,
    SearchMessagesInput,
    add_message_handler,
    get_messages_handler,
    search_messages_handler,
    # Conversation tools
    GetConversationInput,
    SearchConversationsInput,
    ListConversationsInput,
    get_conversation_handler,
    search_conversations_handler,
    list_conversations_handler,
)

# =============================================================================
# Logging Configuration
# =============================================================================

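The `sys.path.insert` line above makes the top-level `memory` package importable from a server module nested three directories deep in the repository. A small illustration of how `Path(__file__).parent.parent.parent` resolves to the project root (the layout shown is hypothetical):

```python
from pathlib import Path

# Hypothetical layout: <root>/servers/library/server.py imports <root>/memory/
module_file = Path("/project/servers/library/server.py")

# Each .parent strips one component: library/ -> servers/ -> /project
project_root = module_file.parent.parent.parent
print(project_root.name)  # prints project
```

Prepending the root with `sys.path.insert(0, ...)` rather than appending ensures the in-repo `memory` package shadows any identically named installed package.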
@@ -551,6 +576,264 @@ async def delete_document(
    return result.model_dump(mode='json')


# =============================================================================
# Memory Tools (Thoughts, Messages, Conversations)
# =============================================================================


@mcp.tool()
async def add_thought(
    content: str,
    thought_type: str = "reflection",
    trigger: str = "",
    concepts: list[str] | None = None,
    privacy_level: str = "private",
) -> Dict[str, Any]:
    """
    Add a new thought to the Memory system.

    Args:
        content: The thought content.
        thought_type: Type (reflection, question, intuition, observation, etc.).
        trigger: What triggered this thought (optional).
        concepts: Related concepts/tags (optional).
        privacy_level: Privacy level (private, shared, public).

    Returns:
        Dictionary containing:
        - success: Whether the thought was added successfully
        - uuid: UUID of the created thought
        - content: Preview of the thought content
        - thought_type: The thought type
    """
    input_data = AddThoughtInput(
        content=content,
        thought_type=thought_type,
        trigger=trigger,
        concepts=concepts or [],
        privacy_level=privacy_level,
    )
    result = await add_thought_handler(input_data)
    return result


@mcp.tool()
async def search_thoughts(
    query: str,
    limit: int = 10,
    thought_type_filter: str | None = None,
) -> Dict[str, Any]:
    """
    Search thoughts using semantic similarity.

    Args:
        query: Search query text.
        limit: Maximum number of results (1-100, default 10).
        thought_type_filter: Filter by thought type (optional).

    Returns:
        Dictionary containing:
        - success: Whether the search succeeded
        - query: The original search query
        - results: List of matching thoughts
        - count: Number of results returned
    """
    input_data = SearchThoughtsInput(
        query=query,
        limit=limit,
        thought_type_filter=thought_type_filter,
    )
    result = await search_thoughts_handler(input_data)
    return result


@mcp.tool()
async def get_thought(uuid: str) -> Dict[str, Any]:
    """
    Get a specific thought by UUID.

    Args:
        uuid: Thought UUID.

    Returns:
        Dictionary containing complete thought data or an error message.
    """
    result = await get_thought_handler(uuid)
    return result


@mcp.tool()
async def add_message(
    content: str,
    role: str,
    conversation_id: str,
    order_index: int = 0,
) -> Dict[str, Any]:
    """
    Add a new message to a conversation.

    Args:
        content: Message content.
        role: Role (user, assistant, system).
        conversation_id: Conversation identifier.
        order_index: Position in conversation (default 0).

    Returns:
        Dictionary containing:
        - success: Whether the message was added successfully
        - uuid: UUID of the created message
        - content: Preview of the message content
        - role: The message role
        - conversation_id: The conversation ID
    """
    input_data = AddMessageInput(
        content=content,
        role=role,
        conversation_id=conversation_id,
        order_index=order_index,
    )
    result = await add_message_handler(input_data)
    return result


@mcp.tool()
async def get_messages(
    conversation_id: str,
    limit: int = 50,
) -> Dict[str, Any]:
    """
    Get all messages from a conversation in order.

    Args:
        conversation_id: Conversation identifier.
        limit: Maximum messages to return (1-500, default 50).

    Returns:
        Dictionary containing:
        - success: Whether the query succeeded
        - conversation_id: The conversation ID
        - messages: List of messages in order
        - count: Number of messages returned
    """
    input_data = GetMessagesInput(
        conversation_id=conversation_id,
        limit=limit,
    )
    result = await get_messages_handler(input_data)
    return result


@mcp.tool()
async def search_messages(
    query: str,
    limit: int = 10,
    conversation_id_filter: str | None = None,
) -> Dict[str, Any]:
    """
    Search messages using semantic similarity.

    Args:
        query: Search query text.
        limit: Maximum number of results (1-100, default 10).
        conversation_id_filter: Filter by conversation ID (optional).

    Returns:
        Dictionary containing:
        - success: Whether the search succeeded
        - query: The original search query
        - results: List of matching messages
        - count: Number of results returned
    """
    input_data = SearchMessagesInput(
        query=query,
        limit=limit,
        conversation_id_filter=conversation_id_filter,
    )
    result = await search_messages_handler(input_data)
    return result


@mcp.tool()
async def get_conversation(conversation_id: str) -> Dict[str, Any]:
    """
    Get a specific conversation by ID.

    Args:
        conversation_id: Conversation identifier.

    Returns:
        Dictionary containing:
        - success: Whether the conversation was found
        - conversation_id: The conversation ID
        - category: Conversation category
        - summary: Conversation summary
        - timestamp_start: Start time
        - timestamp_end: End time
        - participants: List of participants
        - tags: Semantic tags
        - message_count: Number of messages
    """
    input_data = GetConversationInput(conversation_id=conversation_id)
    result = await get_conversation_handler(input_data)
    return result


@mcp.tool()
async def search_conversations(
    query: str,
    limit: int = 10,
    category_filter: str | None = None,
) -> Dict[str, Any]:
    """
    Search conversations using semantic similarity.

    Args:
        query: Search query text.
        limit: Maximum number of results (1-50, default 10).
        category_filter: Filter by category (optional).

    Returns:
        Dictionary containing:
        - success: Whether the search succeeded
        - query: The original search query
        - results: List of matching conversations
        - count: Number of results returned
    """
    input_data = SearchConversationsInput(
        query=query,
        limit=limit,
        category_filter=category_filter,
    )
    result = await search_conversations_handler(input_data)
    return result


@mcp.tool()
async def list_conversations(
    limit: int = 20,
    category_filter: str | None = None,
) -> Dict[str, Any]:
    """
    List all conversations with optional filtering.

    Args:
        limit: Maximum conversations to return (1-100, default 20).
        category_filter: Filter by category (optional).

    Returns:
        Dictionary containing:
        - success: Whether the query succeeded
        - conversations: List of conversations
        - count: Number of conversations returned
    """
    input_data = ListConversationsInput(
        limit=limit,
        category_filter=category_filter,
    )
    result = await list_conversations_handler(input_data)
    return result


# =============================================================================
# Signal Handlers
# =============================================================================

@@ -718,6 +718,15 @@
                <span class="icon">📚</span>
                <span>Documents</span>
            </a>
            <div style="margin: 1rem 0; border-top: 1px solid rgba(255,255,255,0.1);"></div>
            <a href="/memories" class="{{ 'active' if request.endpoint == 'memories' else '' }}">
                <span class="icon">🧠</span>
                <span>Memory (Ikario)</span>
            </a>
            <a href="/conversations" class="{{ 'active' if request.endpoint == 'conversations' else '' }}">
                <span class="icon">💭</span>
                <span>Conversations</span>
            </a>
        </div>
    </nav>

@@ -736,6 +745,7 @@
            <a href="/chat" class="{{ 'active' if request.endpoint == 'chat' else '' }}">Conversation</a>
            <a href="/upload" class="{{ 'active' if request.endpoint == 'upload' else '' }}">Parser PDF</a>
            <a href="/documents" class="{{ 'active' if request.endpoint == 'documents' else '' }}">Documents</a>
            <a href="/memories" class="{{ 'active' if request.endpoint == 'memories' else '' }}">Memory</a>
        </nav>
    </div>
</header>