David Blanc Brioir 2f34125ef6 feat: Add Memory system with Weaviate integration and MCP tools

MEMORY SYSTEM ARCHITECTURE:
- Weaviate-based memory storage (Thought, Message, Conversation collections)
- GPU embeddings with BAAI/bge-m3 (1024-dim, RTX 4070)
- 9 MCP tools for Claude Desktop integration

CORE MODULES (memory/):
- core/embedding_service.py: GPU embedder singleton with PyTorch
- schemas/memory_schemas.py: Weaviate schema definitions
- mcp/thought_tools.py: add_thought, search_thoughts, get_thought
- mcp/message_tools.py: add_message, get_messages, search_messages
- mcp/conversation_tools.py: get_conversation, search_conversations, list_conversations

FLASK TEMPLATES:
- conversation_view.html: Display single conversation with messages
- conversations.html: List all conversations with search
- memories.html: Browse and search thoughts

FEATURES:
- Semantic search across thoughts, messages, conversations
- Privacy levels (private, shared, public)
- Thought types (reflection, question, intuition, observation)
- Conversation categories with filtering
- Message ordering and role-based display

DATA (as of 2026-01-08):
- 102 Thoughts
- 377 Messages
- 12 Conversations

DOCUMENTATION:
- memory/README_MCP_TOOLS.md: Complete API reference and usage examples

All MCP tools tested and validated (see test_memory_mcp_tools.py in archive).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

2026-01-08 18:08:13 +01:00

Memory MCP Tools Documentation

Overview

The Memory MCP tools provide a complete interface for managing thoughts, messages, and conversations in the unified Weaviate-based memory system. These tools are integrated into the Library RAG MCP server (generations/library_rag/mcp_server.py) and use GPU-accelerated embeddings for semantic search.

Architecture

Backend: Weaviate 1.34.4 (local instance)
Embeddings: BAAI/bge-m3 model (1024 dimensions, FP16 precision)
GPU: CUDA-enabled (RTX 4070) via PyTorch 2.6.0+cu124
Collections: 3 Weaviate collections (Thought, Message, Conversation)
Integration: FastMCP framework with async handlers

Available Tools

Thought Tools (3)

1. add_thought

Add a new thought to the memory system.

Parameters:

content (str, required): The thought content
thought_type (str, default="reflection"): Type of thought (reflection, question, intuition, observation, etc.)
trigger (str, default=""): What triggered this thought
concepts (list[str], default=[]): Related concepts/tags
privacy_level (str, default="private"): Privacy level (private, shared, public)

Returns:

{
    "success": True,
    "uuid": "730c1a8e-b09f-4889-bbe9-4867d0ee7f1a",
    "content": "This is a test thought...",
    "thought_type": "observation"
}

Example:

result = await add_thought(
    content="Exploring vector databases for semantic search",
    thought_type="observation",
    trigger="Research session",
    concepts=["weaviate", "embeddings", "gpu"],
    privacy_level="private"
)
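The parameter list above implies some validation before a thought is stored. A minimal sketch of what that could look like, using a stdlib dataclass as a stand-in for the project's pydantic model. The class name `AddThoughtInput`, the empty-content check, and the strict privacy check are assumptions; `thought_type` is left unchecked because the list of types above ends in "etc.".

```python
from dataclasses import dataclass, field
from typing import List

# Privacy levels named in the parameter description above.
PRIVACY_LEVELS = {"private", "shared", "public"}


@dataclass
class AddThoughtInput:
    """Hypothetical stand-in for the add_thought input model."""

    content: str
    thought_type: str = "reflection"
    trigger: str = ""
    # default_factory avoids sharing one list between instances.
    concepts: List[str] = field(default_factory=list)
    privacy_level: str = "private"

    def __post_init__(self) -> None:
        # Assumption: blank content is rejected, since content is required.
        if not self.content.strip():
            raise ValueError("Invalid parameter: content must not be empty")
        if self.privacy_level not in PRIVACY_LEVELS:
            allowed = ", ".join(sorted(PRIVACY_LEVELS))
            raise ValueError(
                f"Invalid parameter: privacy_level must be one of {allowed}"
            )
```

Constructing `AddThoughtInput(content="...")` with no other arguments yields the documented defaults.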

2. search_thoughts

Search thoughts using semantic similarity.

Parameters:

query (str, required): Search query text
limit (int, default=10, range=1-100): Maximum results to return
thought_type_filter (str, optional): Filter by thought type

Returns:

{
    "success": True,
    "query": "vector databases GPU",
    "results": [
        {
            "uuid": "...",
            "content": "...",
            "thought_type": "observation",
            "timestamp": "2025-01-08T...",
            "trigger": "...",
            "concepts": ["weaviate", "gpu"]
        }
    ],
    "count": 5
}
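To make "semantic similarity" concrete, here is a small pure-Python sketch of ranking stored thoughts against a query vector, with the optional type filter and the 1-100 limit applied. It assumes cosine similarity as the metric and uses toy 2-dimensional vectors; the function names and the in-memory list are illustrative only, since the real search is delegated to Weaviate.

```python
import math
from typing import Any, Dict, List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def rank_thoughts(
    query_vector: Sequence[float],
    thoughts: List[Dict[str, Any]],
    limit: int = 10,
    thought_type_filter: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Return up to `limit` thoughts, most similar to the query first."""
    if not 1 <= limit <= 100:
        raise ValueError("Invalid parameter: limit must be in the range 1-100")
    candidates = [
        t for t in thoughts
        if thought_type_filter is None or t["thought_type"] == thought_type_filter
    ]
    candidates.sort(
        key=lambda t: cosine_similarity(query_vector, t["vector"]),
        reverse=True,
    )
    return candidates[:limit]
```

The filter is applied before ranking, so `limit` counts only thoughts of the requested type.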

3. get_thought

Retrieve a specific thought by UUID.

Parameters:

uuid (str, required): Thought UUID

Returns:

{
    "success": True,
    "uuid": "730c1a8e-b09f-4889-bbe9-4867d0ee7f1a",
    "content": "...",
    "thought_type": "observation",
    "timestamp": "2025-01-08T...",
    "trigger": "...",
    "concepts": [...],
    "privacy_level": "private",
    "emotional_state": "",
    "context": ""
}

Message Tools (3)

1. add_message

Add a new message to a conversation.

Parameters:

content (str, required): Message content
role (str, required): Role (user, assistant, system)
conversation_id (str, required): Conversation identifier
order_index (int, default=0): Position in conversation

Returns:

{
    "success": True,
    "uuid": "...",
    "content": "Hello, this is a test...",
    "role": "user",
    "conversation_id": "test_conversation_001"
}

Example:

result = await add_message(
    content="Explain transformers in AI",
    role="user",
    conversation_id="chat_2025_01_08",
    order_index=0
)

2. get_messages

Get all messages from a conversation in order.

Parameters:

conversation_id (str, required): Conversation identifier
limit (int, default=50, range=1-500): Maximum messages to return

Returns:

{
    "success": True,
    "conversation_id": "test_conversation_001",
    "messages": [
        {
            "uuid": "...",
            "content": "...",
            "role": "user",
            "timestamp": "2025-01-08T...",
            "order_index": 0
        },
        {
            "uuid": "...",
            "content": "...",
            "role": "assistant",
            "timestamp": "2025-01-08T...",
            "order_index": 1
        }
    ],
    "count": 2
}
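A caller will usually want to turn this response into a readable transcript. The sketch below assumes only the response shape shown above; the helper name `format_transcript` is hypothetical. It re-sorts by `order_index` defensively, even though the tool is documented to return messages in order.

```python
from typing import Any, Dict, List


def format_transcript(response: Dict[str, Any]) -> List[str]:
    """Turn a get_messages response into 'role: content' lines."""
    if not response.get("success"):
        # Surface the tool's error message instead of returning partial data.
        raise RuntimeError(response.get("error", "unknown error"))
    # sorted() is stable, so messages sharing an order_index keep their order.
    ordered = sorted(response["messages"], key=lambda m: m["order_index"])
    return [f'{m["role"]}: {m["content"]}' for m in ordered]
```

Because `order_index` defaults to 0 in add_message, callers that never set it will get messages in whatever order the response listed them.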

3. search_messages

Search messages using semantic similarity.

Parameters:

query (str, required): Search query text
limit (int, default=10, range=1-100): Maximum results
conversation_id_filter (str, optional): Filter by conversation

Returns:

{
    "success": True,
    "query": "transformers AI",
    "results": [...],
    "count": 5
}

Conversation Tools (3)

1. get_conversation

Get a specific conversation by ID.

Parameters:

conversation_id (str, required): Conversation identifier

Returns:

{
    "success": True,
    "conversation_id": "ikario_derniere_pensee",
    "category": "testing",
    "summary": "Conversation with 2 participants...",
    "timestamp_start": "2025-01-06T...",
    "timestamp_end": "2025-01-06T...",
    "participants": ["assistant", "user"],
    "tags": [],
    "message_count": 19
}

2. search_conversations

Search conversations using semantic similarity on summaries.

Parameters:

query (str, required): Search query text
limit (int, default=10, range=1-50): Maximum results
category_filter (str, optional): Filter by category

Returns:

{
    "success": True,
    "query": "philosophical discussion",
    "results": [
        {
            "conversation_id": "...",
            "category": "philosophy",
            "summary": "...",
            "timestamp_start": "...",
            "timestamp_end": "...",
            "participants": [...],
            "message_count": 25
        }
    ],
    "count": 5
}

3. list_conversations

List all conversations with optional filtering.

Parameters:

limit (int, default=20, range=1-100): Maximum conversations to return
category_filter (str, optional): Filter by category

Returns:

{
    "success": True,
    "conversations": [
        {
            "conversation_id": "...",
            "category": "testing",
            "summary": "Conversation with 2 participants... (truncated)",
            "timestamp_start": "...",
            "message_count": 19,
            "participants": [...]
        }
    ],
    "count": 10
}

Implementation Details

Handler Pattern

All tools follow a consistent async handler pattern:

from typing import Any, Dict

import weaviate

from memory.core import get_embedder


async def tool_handler(input_data: InputModel) -> Dict[str, Any]:
    """Handler skeleton; InputModel stands for the tool's pydantic input model."""
    try:
        # 1. Connect to Weaviate
        client = weaviate.connect_to_local()

        try:
            # 2. Get GPU embedder (for vectorization)
            embedder = get_embedder()

            # 3. Generate vector (if needed); `text` is the input field to embed
            vector = embedder.embed_batch([text])[0]

            # 4. Query/Insert data
            collection = client.collections.get("CollectionName")
            result = collection.data.insert(...)

            # 5. Return success response
            return {"success": True, ...}

        finally:
            # Close the connection even if a step above raised
            client.close()

    except Exception as e:
        # Report any failure in the shared error format
        return {"success": False, "error": str(e)}
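The control flow of this pattern can be exercised without Weaviate or a GPU. The sketch below is an assumption-laden illustration, not the project's code: `FakeClient`, `run_handler`, `insert_ok`, and `insert_fails` are hypothetical names, and the client is passed in rather than created inside the handler. It shows that the inner `finally` closes the client on both paths and that the outer `except` converts any failure into the error response.

```python
import asyncio
from typing import Any, Callable, Dict


class FakeClient:
    """Hypothetical stand-in for a Weaviate client; it only records close()."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


async def run_handler(
    client: FakeClient,
    operation: Callable[[FakeClient], Dict[str, Any]],
) -> Dict[str, Any]:
    """Run one operation, always close the client, and wrap failures."""
    try:
        try:
            payload = operation(client)
            return {"success": True, **payload}
        finally:
            client.close()  # runs on success and on failure
    except Exception as exc:
        return {"success": False, "error": str(exc)}


def insert_ok(client: FakeClient) -> Dict[str, Any]:
    """Simulates a successful insert."""
    return {"uuid": "example-uuid"}


def insert_fails(client: FakeClient) -> Dict[str, Any]:
    """Simulates a failure while talking to the database."""
    raise ConnectionError("Failed to connect to Weaviate")


ok_client, bad_client = FakeClient(), FakeClient()
ok = asyncio.run(run_handler(ok_client, insert_ok))
bad = asyncio.run(run_handler(bad_client, insert_fails))
```

After running, both clients report `closed` as true, and `bad` carries the exception text in its `error` field.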

GPU Vectorization

All text content is vectorized using the GPU-accelerated embedder:

from memory.core import get_embedder

embedder = get_embedder()  # Returns PyTorch GPU embedder
vector = embedder.embed_batch([content])[0]  # Returns 1024-dim FP16 vector
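Loading an embedding model is expensive, so an accessor like this is typically backed by a single shared instance. The following is a sketch of one way to get that behaviour with `functools.lru_cache`; it is not the project's implementation. `FakeEmbedder` is a hypothetical placeholder that returns zero vectors so the shapes can be checked without PyTorch or a GPU.

```python
from functools import lru_cache
from typing import List

EMBEDDING_DIM = 1024  # output size of BAAI/bge-m3 as stated in this document


class FakeEmbedder:
    """Hypothetical placeholder; the real embedder runs the model on the GPU."""

    def embed_batch(self, texts: List[str]) -> List[List[float]]:
        # One zero vector per input text, in the same order as the inputs.
        return [[0.0] * EMBEDDING_DIM for _ in texts]


@lru_cache(maxsize=1)
def get_embedder() -> FakeEmbedder:
    """Create the embedder on first call and reuse it afterwards."""
    return FakeEmbedder()


vectors = get_embedder().embed_batch(["first text", "second text"])
```

Every call to `get_embedder()` returns the same object, so the model would be loaded once per process.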

Weaviate Connection

Each tool handler creates a new connection and closes it after use:

client = weaviate.connect_to_local()  # Connects to localhost:8080
try:
    # Perform operations
    collection = client.collections.get("Thought")
    # ...
finally:
    client.close()  # Always close connection
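The same guarantee can be written more compactly with `contextlib.closing`, which calls `close()` on exit from the `with` block. This is a sketch of an alternative, not the pattern the tools use; `FakeClient` and `connect` are hypothetical stand-ins so the example runs without a Weaviate instance.

```python
from contextlib import closing


class FakeClient:
    """Hypothetical stand-in for the object returned by connect_to_local()."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


def connect() -> FakeClient:
    """Hypothetical connection factory used only in this sketch."""
    return FakeClient()


def use_connection(fail: bool) -> FakeClient:
    """Open a client, optionally fail mid-operation, and return the client."""
    client = connect()
    try:
        with closing(client):
            if fail:
                raise RuntimeError("query failed")
    except RuntimeError:
        pass  # swallowed only so the caller can inspect the client afterwards
    return client
```

In both the success and the failure case the returned client has `closed` set to true.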

Testing

A test script covering all nine tools is available at test_memory_mcp_tools.py:

python test_memory_mcp_tools.py

Test Results (2026-01-08):

============================================================
TESTING THOUGHT TOOLS
============================================================
[OK] add_thought: Created thought with UUID
[OK] search_thoughts: Found 5 thoughts
[OK] get_thought: Retrieved thought successfully

============================================================
TESTING MESSAGE TOOLS
============================================================
[OK] add_message: Added 3 messages (user, assistant, user)
[OK] get_messages: Retrieved 3 messages in order
[OK] search_messages: Found 5 messages

============================================================
TESTING CONVERSATION TOOLS
============================================================
[OK] list_conversations: Found 10 conversations
[OK] get_conversation: Retrieved conversation metadata
[OK] search_conversations: Found 5 conversations

[OK] ALL TESTS COMPLETED
============================================================

Integration with MCP Server

The Memory tools are integrated into generations/library_rag/mcp_server.py alongside the existing Library RAG tools:

Total tools available: 17

Library RAG: 8 tools (search_documents, add_document, etc.)
Memory: 9 tools (thought, message, conversation tools)

Configuration: The MCP server is configured in Claude Desktop settings:

{
  "mcpServers": {
    "library-rag": {
      "command": "python",
      "args": ["C:/GitHub/linear_coding_library_rag/generations/library_rag/mcp_server.py"]
    }
  }
}

Error Handling

All tools return consistent error responses:

{
    "success": False,
    "error": "Error message description"
}

Common errors:

Connection errors: "Failed to connect to Weaviate"
Not found: "Conversation {id} not found"
Validation errors: "Invalid parameter: {details}"
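Since failures are reported in the return value rather than raised, callers need to check `success` on every response. One possible way to centralize that check is sketched below; `MemoryToolError` and `unwrap` are hypothetical names, not part of the tools.

```python
from typing import Any, Dict


class MemoryToolError(Exception):
    """Hypothetical exception raised when a tool reports success=False."""


def unwrap(response: Dict[str, Any]) -> Dict[str, Any]:
    """Return the response unchanged on success, raise on failure."""
    if response.get("success"):
        return response
    # Fall back to a generic message if the error field is missing.
    raise MemoryToolError(response.get("error", "unknown error"))
```

A call site can then write `data = unwrap(await get_conversation(...))` and handle `MemoryToolError` in one place.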

Performance

Vectorization: ~50-100ms per text on RTX 4070 GPU
Search latency: <100ms for near-vector queries
Batch operations: Pass several texts to embedder.embed_batch() in one call instead of embedding them one at a time
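For large inputs, batching usually means splitting the texts into fixed-size chunks and making one `embed_batch()` call per chunk. A minimal sketch follows; the helper names, the batch size of 32, and the `CountingEmbedder` stub are assumptions made for illustration, and a suitable batch size depends on available GPU memory.

```python
from typing import Iterator, List, Sequence


def chunked(texts: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size texts."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield list(texts[start:start + batch_size])


def embed_all(embedder, texts: Sequence[str], batch_size: int = 32) -> List[List[float]]:
    """Embed all texts with one embed_batch() call per chunk, keeping order."""
    vectors: List[List[float]] = []
    for batch in chunked(texts, batch_size):
        vectors.extend(embedder.embed_batch(batch))
    return vectors


class CountingEmbedder:
    """Hypothetical stub that counts calls instead of running a model."""

    def __init__(self) -> None:
        self.calls = 0

    def embed_batch(self, texts: List[str]) -> List[List[float]]:
        self.calls += 1
        # Encode each text's length so the output order can be checked.
        return [[float(len(t))] for t in texts]
```

With five texts and `batch_size=2`, the stub is called three times and returns five vectors in input order.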

Next Steps

Phase 5: Backend Integration (Pending)

Update Flask routes to use Weaviate Memory tools
Replace ChromaDB calls with new MCP tool calls
Connect flask-app frontend to new backend

Module Structure

memory/
├── core/
│   ├── __init__.py        # GPU embedder initialization
│   └── config.py          # Weaviate connection config
├── mcp/
│   ├── __init__.py        # Tool exports
│   ├── thought_tools.py   # Thought handlers
│   ├── message_tools.py   # Message handlers
│   └── conversation_tools.py  # Conversation handlers
└── README_MCP_TOOLS.md    # This file

Dependencies

weaviate-client >= 4.0.0
PyTorch 2.6.0+cu124
transformers (for BAAI/bge-m3)
pydantic (for input validation)
FastMCP framework

Related Documentation

Weaviate Schema: memory/schemas/ (Thought, Message, Conversation schemas)
Migration Scripts: memory/migration/ (ChromaDB → Weaviate migration)
Library RAG README: generations/library_rag/README.md

Last Updated: 2026-01-08
Status: Phase 4 Complete ✓
Next Phase: Phase 5 - Backend Integration
