- identity_tools.py: rewritten to read StateTensor (8x1024 named vectors)
instead of StateVector (single 1024-dim). Uses CATEGORY_TO_DIMENSION mapping.
- mcp_server.py: get_state_vector renamed to get_state_tensor
- __init__.py: updated exports
Now returns S(30) with architecture v2_tensor instead of S(2) from V1.
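A minimal sketch of how a category resolves to one of the tensor's named rows; the dimension names below are illustrative stand-ins, not the actual CATEGORY_TO_DIMENSION keys:

```python
# Illustrative mapping; the real CATEGORY_TO_DIMENSION keys differ.
DIM = 1024

CATEGORY_TO_DIMENSION = {
    "values": 0,
    "relations": 1,
    "knowledge": 2,
    # ... remaining Peircean dimensions map to rows 3-7
}

def make_state_tensor():
    """8 named vectors of 1024 floats (v2_tensor), replacing the
    single 1024-dim StateVector (v1)."""
    return [[0.0] * DIM for _ in range(8)]

def get_dimension(tensor, category):
    """Resolve a category name to its row in the 8x1024 tensor."""
    return tensor[CATEGORY_TO_DIMENSION[category]]
```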
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add /daemon/start POST endpoint to launch autonomous cycle loop
- Add /daemon/stop POST endpoint to stop the daemon
- Add _autonomous_loop() async function with configurable interval (~86s/cycle)
- Add _generate_rumination_content() and _generate_corpus_content() helpers
- Track daemon_running state in DaemonStatusResponse
- Default config: 1000 cycles/day, 50% rumination, 30% corpus, 20% unresolved
Autonomous mode generates random philosophical themes for internal semiosis.
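The cycle budget and trigger mix can be sketched as follows (function and constant names are hypothetical; only the weights and the ~86 s interval come from the default config above):

```python
import random

SECONDS_PER_DAY = 86_400
CYCLES_PER_DAY = 1_000
CYCLE_INTERVAL = SECONDS_PER_DAY / CYCLES_PER_DAY  # 86.4 s, the "~86s/cycle"

# Default trigger mix from the daemon config.
TRIGGER_WEIGHTS = {
    "rumination_free": 0.5,
    "corpus": 0.3,
    "unresolved": 0.2,
}

def pick_trigger(rng=random):
    """Weighted random choice over trigger types, one per cycle."""
    kinds = list(TRIGGER_WEIGHTS)
    weights = [TRIGGER_WEIGHTS[k] for k in kinds]
    return rng.choices(kinds, weights=weights, k=1)[0]
```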
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Document daemon status endpoint with JSON response example
- Add mode interpretation guide (idle/conversation/autonomous)
- Update endpoints table with /daemon/status
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add daemon state tracking globals (mode, is_ruminating, cycles_by_type)
- Track trigger type and timestamp on each /cycle call
- Add DaemonStatusResponse model
- Add GET /daemon/status endpoint returning:
- mode: idle | conversation | autonomous
- is_ruminating: true when in rumination_free or corpus cycles
- last_trigger: type and timestamp
- cycles_breakdown: count by trigger type
- cycles_since_last_user: autonomous cycles since last user interaction
- time_since_last_user_seconds: elapsed time
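The response shape can be sketched with a plain dataclass (the server itself uses a Pydantic DaemonStatusResponse; field names follow the list above):

```python
import time
from dataclasses import dataclass

@dataclass
class DaemonStatus:
    """Shape of the /daemon/status payload (stand-in for the
    Pydantic DaemonStatusResponse)."""
    mode: str                      # "idle" | "conversation" | "autonomous"
    is_ruminating: bool
    last_trigger: dict             # {"type": ..., "timestamp": ...}
    cycles_breakdown: dict         # trigger type -> count
    cycles_since_last_user: int
    time_since_last_user_seconds: float

def build_status(mode, last_trigger, breakdown, last_user_ts,
                 cycles_since_user, now=None):
    now = time.time() if now is None else now
    return DaemonStatus(
        mode=mode,
        # rumination_free and corpus cycles count as "ruminating"
        is_ruminating=last_trigger.get("type") in ("rumination_free", "corpus"),
        last_trigger=last_trigger,
        cycles_breakdown=breakdown,
        cycles_since_last_user=cycles_since_user,
        time_since_last_user_seconds=now - last_user_ts,
    )
```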
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add /profile GET endpoint returning Ikario + David projections on 100 directions
- Compute Ikario state from Weaviate StateVector v1 + 113 thoughts
- Compute David tensor from user messages (SQLite) + declared profile
- Map direction categories to StateTensor dimensions via CATEGORY_TO_DIMENSION
- Calculate david_similarity as average cosine across 8 dimensions
- Result: 60.93% Ikario-David similarity (vs 100% when initialized from same source)
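The david_similarity figure is an average of per-dimension cosines; a self-contained sketch of that computation:

```python
import math

def cosine(u, v):
    """Plain cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def david_similarity(ikario_tensor, david_tensor):
    """Mean cosine over the 8 named dimensions of two StateTensors."""
    sims = [cosine(i, d) for i, d in zip(ikario_tensor, david_tensor)]
    return sum(sims) / len(sims)
```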
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- api.py: REST API exposing LatentEngine via FastAPI
- POST /cycle: Execute semiotic cycle
- POST /translate: Translate state to language
- GET /state, /vigilance, /metrics, /health
- Loads embedding model and David profile at startup
- ~1.3s per cycle (embedding + dissonance + fixation)
- README2.md: Complete documentation of v2 architecture
- StateTensor 8x1024 explanation
- Module descriptions with code examples
- Amendments compliance
- Usage instructions
Start with: uvicorn ikario_processual.api:app --port 8100
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements the processual architecture based on Whitehead's Process
Philosophy and Peirce's Semiotics. Core paradigm: "L'espace latent
pense. Le LLM traduit." (The latent space thinks. The LLM translates.)
Phase 1-4: Core semiotic cycle
- StateTensor 8x1024 (8 Peircean dimensions)
- Dissonance computation with hard negatives
- Fixation via 4 Peircean methods (Tenacity, Authority, A Priori, Science)
- LatentEngine orchestrating the full cycle
Phase 5: StateToLanguage
- LLM as pure translator (zero-reasoning, T=0)
- Projection on interpretable directions
- Reasoning markers detection (Amendment #4)
Phase 6: Vigilance
- x_ref (David) as guard-rail, NOT attractor
- Drift detection per dimension and globally
- Alerts: ok, warning, critical
Phase 7: Autonomous Daemon
- Two modes: CONVERSATION (always verbalize), AUTONOMOUS (~1000 cycles/day)
- Amendment #5: 50% probability on unresolved impacts
- TriggerGenerator with weighted random selection
Phase 8: Integration & Metrics
- ProcessMetrics for daily/weekly reports
- Health status monitoring
- Integration tests validating all modules
297 tests passing, version 0.7.0
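The Phase 6 drift check can be sketched as a per-dimension distance against x_ref with two thresholds; the 0.3/0.5 cutoffs below are illustrative, not the engine's actual values:

```python
import math

def _cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def drift_alert(state_vec, ref_vec, warning=0.3, critical=0.5):
    """Classify one dimension's drift from x_ref (David). x_ref is a
    guard-rail, NOT an attractor: nothing pulls the state back, the
    check only raises alerts. Distance is 1 - cosine; thresholds are
    illustrative."""
    dist = 1.0 - _cosine(state_vec, ref_vec)
    if dist >= critical:
        return "critical"
    if dist >= warning:
        return "warning"
    return "ok"
```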
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add library_rag directory to sys.path at startup so that imports
like 'utils.pdf_pipeline' and 'mcp_tools' work correctly when the
server is spawned by Ikario from a different working directory.
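A sketch of the startup fix, assuming library_rag sits next to the server module (the fallback to the current directory is only for interactive use):

```python
import sys
from pathlib import Path

# library_rag is assumed to sit next to this file; fall back to the
# current directory when __file__ is absent (interactive session).
_base = Path(__file__).resolve().parent if "__file__" in globals() else Path.cwd()
LIBRARY_RAG = _base / "library_rag"

# Prepend so 'utils.pdf_pipeline' and 'mcp_tools' resolve regardless
# of the working directory Ikario spawned the server from.
if str(LIBRARY_RAG) not in sys.path:
    sys.path.insert(0, str(LIBRARY_RAG))
```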
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Document collection was merged into Work during the "SANS_DOCUMENT"
migration. Updated all handlers to use Work instead:
- get_document_handler: queries Work by sourceId
- list_documents_handler: queries Work directly
- filter_by_author_handler: simplified to use Work only
- delete_document_handler: deletes from Work
Work now contains: title, author, year, language, genre, sourceId,
pages, edition (all formerly in Document)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The Weaviate collections have vectorizer: "none", so near_text
searches fail silently. Changed all search handlers to:
- Import get_embedder from embedding_service
- Generate query vectors manually
- Use near_vector for semantic search
Affected handlers:
- search_memories_handler
- trace_concept_evolution_handler
- check_consistency_handler
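The near_text → near_vector pattern, reduced to its core; the stand-in embedder and collection exist only so the sketch runs without a Weaviate server:

```python
def semantic_search(collection, embed, query, limit=10):
    """With vectorizer "none", near_text fails silently, so embed the
    query manually and call near_vector. `collection` is any
    weaviate-client collection; `embed` maps text -> list[float]."""
    return collection.query.near_vector(near_vector=embed(query), limit=limit)

# --- stand-ins so the sketch is self-contained (NOT real objects) ---
def fake_embed(text):
    return [float(len(text)), 0.0]  # not a real embedding

class _FakeQuery:
    def near_vector(self, near_vector, limit):
        return {"near_vector": near_vector, "limit": limit}

class _FakeCollection:
    query = _FakeQuery()

result = semantic_search(_FakeCollection(), fake_embed, "entropy", limit=3)
```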
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Documents the migration from @anthropic-ai/sdk to @anthropic-ai/claude-agent-sdk
with phases, code examples, and progress tracking.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Phase 3 - State Transformation:
- transform_state() function with alpha/beta parameters
- compute_adaptive_params() for dynamic transformation
- StateTransformer class for state management
Phase 4 - Occasion Logger:
- OccasionLog dataclass for structured logging
- OccasionLogger for JSON file storage
- Profile evolution tracking and statistics
Phase 5 - Occasion Manager:
- Full cycle: Prehension → Concrescence → Satisfaction
- Search integration (thoughts, library)
- State creation and logging orchestration
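One plausible reading of the alpha/beta transformation (the real transform_state() and compute_adaptive_params() may differ; this shows only the parameter roles):

```python
import math

def transform_state(state, impulse, alpha=0.9, beta=0.1):
    """Keep alpha of the current state, blend in beta of the incoming
    impulse, then renormalize to unit length. Sketch only; the actual
    formula in Phase 3 may differ."""
    mixed = [alpha * s + beta * i for s, i in zip(state, impulse)]
    norm = math.sqrt(sum(x * x for x in mixed))
    return [x / norm for x in mixed] if norm else mixed

def compute_adaptive_params(dissonance, base_alpha=0.9):
    """Sketch: higher dissonance -> smaller alpha -> larger update."""
    alpha = max(0.5, base_alpha - 0.4 * dissonance)
    return alpha, 1.0 - alpha
```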
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- david_profile_declared.json: David's declared profile values from questionnaire
- scripts/embed_david.py: Python script to generate embeddings using BGE-M3 model
- questionnaire_david.md: Questionnaire template for profile values
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Create memory/mcp/unified_tools.py with 4 new handlers:
- search_memories: unified search across Thoughts and Conversations
- trace_concept_evolution: track concept development over time
- check_consistency: verify statement alignment with past content
- update_thought_evolution_stage: update thought maturity stage
- Export new tools from memory/mcp/__init__.py
- Register new tools in mcp_server.py with full docstrings
These tools complete the Ikario memory toolset to match memoryTools.js expectations.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The text2vec-transformers Docker service was removed in Jan 2026,
but retrieval_tools.py still used near_text(), which requires it.
Now uses GPU embedder (BGE-M3) with near_vector() like flask_app.py.
Changes:
- Add GPU embedder singleton (get_gpu_embedder)
- search_chunks_handler: near_text → near_vector + BGE-M3
- search_summaries_handler: near_text → near_vector + BGE-M3
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The sidebar content was hidden by default (display: none) but
displayContext() never made it visible when chunks arrived.
Added sidebarContent.style.display = 'block' to show the
context panel with all RAG chunks.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove obsolete documentation, examples, and utility scripts
- Remove temporary screenshots and test files from root
- Add test_chat_backend.js for Puppeteer testing of chat RAG
- Update .gitignore
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add migrate_rename_collections.py script for data migration
- Update flask_app.py to use new collection names
- Update weaviate_ingest.py to use new collection names
- Update schema.py documentation
- Update README.md and ANALYSE_MCP_TOOLS.md
Migration completed: 5372 chunks + 114 summaries preserved with vectors.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Added two export scripts to backup memory collections:
1. export_conversations.py:
- Exports all Conversation + Message objects to markdown
- Includes conversation metadata (category, timestamps, participants)
- Formats messages chronologically with role indicators
- Generated: docs/conversations.md (12 conversations, 377 messages)
2. export_thoughts.py:
- Exports all Thought objects to markdown
- Groups by thought_type with summary statistics
- Includes metadata (trigger, emotional_state, concepts, privacy)
- Generated: docs/thoughts.md (104 thoughts across 8 types)
Both scripts write UTF-8 markdown with emoji formatting for
readability. Exports are stored in docs/ as a versioned backup of
the memory collections.
Stats:
- Conversations: 12 (5 testing, 7 general)
- Messages: 377 total
- Thoughts: 104 (28 reflection, 36 synthesis, 27 test)
- Privacy: 100% private
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Created comprehensive Puppeteer tests for search functionality:
Test Files:
- test_search_simple.js: Simple search test (PASSED ✅)
- test_search_workflow.js: Multi-mode search test
- test_upload_search_workflow.js: Full PDF upload + search test
Test Results (test_search_simple.js):
- ✅ 16 results found for "Turing machine computation"
- ✅ GPU embedder vectorization working (~17ms)
- ✅ Weaviate semantic search operational
- ✅ Search interface responsive
- ✅ Total search time: ~2 seconds
Test Report:
- TEST_SEARCH_PUPPETEER.md: Detailed test report with performance metrics
Screenshots Generated:
- search_page.png: Initial search form
- search_results.png: Full results page (16 passages)
- test_screenshot_*.png: Various test stages
Note on Upload Test:
Upload test times out after 5 minutes (expected behavior for OCR + LLM
processing). Manual upload via web interface recommended for testing.
GPU Embedder Validation:
✅ Confirmed GPU embedder is used for query vectorization
✅ Confirmed near_vector() search in Weaviate
✅ Confirmed 30-70x performance improvement vs Docker
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Fixes issue where LLM was copying placeholder instructions from the
prompt template into actual metadata fields.
Changes:
1. Created fix_work_titles.py script to correct existing bad titles
- Detects patterns like "(si c'est bien...)", "Titre corrigé...", "Auteur à identifier"
- Extracts correct metadata from chunks JSON files
- Updates Work entries and associated chunks (44 chunks updated)
- Fixed 3 Works with placeholder contamination
2. Improved llm_metadata.py prompt to prevent future issues
- Added explicit INTERDIT/OBLIGATOIRE (forbidden/mandatory) rules with ❌/✅ markers
- Replaced placeholder examples with real concrete examples
- Added two example responses (high confidence + low confidence)
- Final empty JSON template guides structure without placeholders
- Reinforced: use "confidence" field for uncertainty, not annotations
Results:
- "A Cartesian critique... (si c'est bien le titre)" → "A Cartesian critique of the artificial intelligence"
- "Titre corrigé si nécessaire (ex: ...)" → "Computationalism and The Case When the Brain Is Not a Computer"
- "Titre de l'article principal (à identifier)" → "Computationalism in the Philosophy of Mind"
All future document uploads will now extract clean metadata without
LLM commentary or placeholder instructions.
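An illustrative detector for the placeholder contamination; the patterns mirror the strings listed above, though the real fix_work_titles.py regexes may differ:

```python
import re

# Patterns drawn from the contaminated titles above; illustrative,
# not the script's exact regex list.
PLACEHOLDER_PATTERNS = [
    re.compile(r"\(si c'est bien", re.IGNORECASE),
    re.compile(r"Titre corrigé", re.IGNORECASE),
    re.compile(r"à identifier", re.IGNORECASE),
]

def is_contaminated(title):
    """True when a title still carries prompt-template instructions."""
    return any(p.search(title) for p in PLACEHOLDER_PATTERNS)
```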
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Adds automatic Work object creation to ensure all uploaded documents
appear on the /documents page. Previously, chunks were ingested but
Work entries were missing, causing documents to be invisible in the UI.
Changes:
- Add create_or_get_work() function to weaviate_ingest.py
- Checks for existing Work by sourceId (prevents duplicates)
- Creates new Work with metadata (title, author, year, pages)
- Returns UUID for potential future reference
- Integrate Work creation into ingest_document() flow
- Add helper scripts for retroactive fixes and verification:
- create_missing_works.py: Create Works for already-ingested documents
- reingest_batch_documents.py: Re-ingest documents after bug fixes
- check_batch_results.py: Verify batch upload results in Weaviate
This completes the batch upload feature: documents now appear on
the /documents page immediately after ingestion.
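The dedupe-by-sourceId logic can be sketched with a dict standing in for the Weaviate Work collection (the real function queries by sourceId and inserts on miss):

```python
import uuid

def create_or_get_work(works_index, source_id, title, author, year, pages):
    """Look up an existing Work by sourceId (prevents duplicates),
    otherwise create one and return its UUID. `works_index` (a dict
    keyed by sourceId) stands in for the Work collection."""
    if source_id in works_index:
        return works_index[source_id]["uuid"]
    work_uuid = str(uuid.uuid4())
    works_index[source_id] = {
        "uuid": work_uuid,
        "title": title, "author": author, "year": year, "pages": pages,
    }
    return work_uuid
```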
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixes batch upload ingestion that was failing silently due to schema mismatches:
Schema Fixes:
- Update collection names from "Chunk" to "Chunk_v2"
- Update collection names from "Summary" to "Summary_v2"
Object Structure Fixes:
- Replace nested objects (work: {title, author}) with flat fields
- Use workTitle and workAuthor instead of nested work object
- Add year field to chunks
- Remove document nested object (not used in current schema)
- Disable nested objects validation for flat schema
Impact:
- Batch upload now successfully ingests chunks to Weaviate
- Single-file upload also benefits from fixes
- All new documents will be properly indexed and searchable
Testing:
- Verified with 2-file batch upload (7 + 11 chunks = 18 total)
- Total chunks increased from 5,304 to 5,322
- All chunks properly searchable with workTitle/workAuthor filters
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Implements comprehensive batch upload system with real-time progress tracking:
Backend Infrastructure:
- Add batch_jobs global dict for batch orchestration
- Add BatchFileInfo and BatchJob TypedDicts to utils/types.py
- Create run_batch_sequential() worker function with thread.join() synchronization
- Modify /upload POST route to detect single vs multi-file uploads
- Add 3 batch API routes: /upload/batch/progress, /status, /result
- Add timestamp_to_date Jinja2 template filter
Frontend:
- Update upload.html with 'multiple' attribute and file counter
- Create upload_batch_progress.html: Real-time dashboard with SSE per file
- Create upload_batch_result.html: Final summary with statistics
Architecture:
- Backward compatible: single-file upload unchanged
- Sequential processing: one file after another (respects API limits)
- N parallel SSE connections: one per file for real-time progress
- Polling mechanism to discover job IDs as files start processing
- 1-hour timeout per file with error handling and continuation
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Works filter section: Increase max-height from 250px to 70vh (most of the viewport)
- Context RAG section: Closed by default (display: none)
- Mobile responsive: Adjust works filter to 50vh on mobile
- Enhances visibility of available works at page load
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Implemented chunking optimization to resolve oversized chunks and improve
semantic search quality:
CHUNKING IMPROVEMENTS:
- Added strict 1000-word max limit (vs previous 1500-2000)
- Implemented 100-word overlap between consecutive chunks
- Created llm_chunker_improved.py with overlap functionality
- Added 3 fallback points in llm_chunker.py for robustness
RE-CHUNKING RESULTS:
- Identified and re-chunked 31 oversized chunks (>2000 tokens)
- Split into 92 optimally-sized chunks (max 1995 tokens)
- Preserved all metadata (workTitle, workAuthor, sectionPath, etc.)
- 0 chunks now exceed 2000 tokens (vs 31 before)
VECTORIZATION:
- Created manual vectorization script for chunks without vectors
- Successfully vectorized all 92 new chunks (100% coverage)
- All 5,304 chunks now have BGE-M3 embeddings
DOCKER CONFIGURATION:
- Exposed text2vec-transformers port 8090 for manual vectorization
- Added cluster configuration to fix the "No private IP address found" error
- Increased worker timeout to 600s for large chunks
TESTING:
- Created comprehensive search quality test suite
- Tests distribution, overlap detection, and semantic search
- Modified to use near_vector() (Chunk_v2 has no vectorizer)
Scripts:
- 08_fix_summaries_properties.py - Add missing Work metadata to summaries
- 09_rechunk_oversized.py - Re-chunk giant chunks with overlap
- 10_test_search_quality.py - Validate search improvements
- 11_vectorize_missing_chunks.py - Manual vectorization via API
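The cap-and-overlap mechanics can be sketched as follows (the real llm_chunker_improved.py works on LLM-guided sections; this shows only the 1000-word cap and 100-word overlap):

```python
def chunk_with_overlap(words, max_words=1000, overlap=100):
    """Split a word list into chunks of at most max_words, each chunk
    starting `overlap` words before the previous chunk's end."""
    assert overlap < max_words
    chunks, start = [], 0
    while start < len(words):
        end = min(start + max_words, len(words))
        chunks.append(words[start:end])
        if end == len(words):
            break
        start = end - overlap  # step back to create the overlap
    return chunks
```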
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Add 'summary' field to Chunk collection (vectorized with text2vec)
- Migrate from Dynamic index to HNSW + RQ for both Chunk and Summary
- Add LLM summarizer module (utils/llm_summarizer.py)
- Add migration scripts (migrate_add_summary.py, restore_*.py)
- Add summary generation utilities and progress tracking
- Add testing and cleaning tools (outils_test_and_cleaning/)
- Add comprehensive documentation (ANALYSE_*.md, guides)
- Remove obsolete files (linear_config.py, old test files)
- Update .gitignore to exclude backups and temp files
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Removed max-height: 300px from .works-list
- Keeps only the Unicode encoding fix (→ to ->)
- Avoids having two scrollbars in the works filter section
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Problem 1: Only 3 works visible despite 8/10 badge
- Added max-height: 300px and overflow-y: auto to .works-list
- Now all 10 works are scrollable in the filter section
Problem 2: UnicodeEncodeError with → character in console
- Replaced Unicode arrow (→) with ASCII arrow (->) in print statements
- Fixes 'charmap' codec error on Windows console
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>