Update framework configuration and clean up obsolete specs

Configuration updates:
- Added .env.example template for environment variables
- Updated README.md with better setup instructions (.env usage)
- Enhanced .claude/settings.local.json with additional Bash permissions
- Added .claude/CLAUDE.md framework documentation

Spec cleanup:
- Removed obsolete spec files (language_selection, mistral_extensible, template, theme_customization)
- Consolidated app_spec.txt (Claude Clone example)
- Added app_spec_model.txt as reference template
- Added app_spec_library_rag_types_docs.txt
- Added coding_prompt_library.md

Framework improvements:
- Updated agent.py, autonomous_agent_demo.py, client.py with minor fixes
- Enhanced dockerize_my_project.py
- Updated prompts (initializer, initializer_bis) with better guidance
- Added docker-compose.my_project.yml example

This commit consolidates improvements made during development sessions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Date:   2025-12-25 12:53:14 +01:00
Parent: bf790b63a0
Commit: 2e33637dae
27 changed files with 3862 additions and 2378 deletions


@@ -1,179 +0,0 @@
<project_specification>
<project_name>Language selection & i18n completion (FR default, EN/FR only)</project_name>
<overview>
This specification complements the existing "app_spec_language_selection.txt" file.
It does NOT replace the original spec. Instead, it adds additional requirements and
corrective steps to fully complete the language selection and i18n implementation.
Main goals:
- Support exactly two UI languages: English ("en") and French ("fr").
- Make French ("fr") the default language when no preference exists.
- Ensure that all user-facing text is translated via the i18n system (no hardcoded strings).
- Align the language selector UI with the actual supported languages.
</overview>
<relationship_to_original_spec>
- The original file "app_spec_language_selection.txt" defines the initial language selection
feature and i18n architecture (context, translation files, etc.).
- This completion spec:
* keeps that architecture,
* tightens some requirements (FR as default),
* and adds missing work items (removal of hardcoded English strings, cleanup of extra languages).
- The original spec remains valid; this completion spec should be applied on top of it.
</relationship_to_original_spec>
<constraints>
- Officially supported UI languages:
* English ("en")
* French ("fr")
- Default language:
* French ("fr") MUST be the default language when there is no stored preference.
- No other languages (es, de, ja, etc.) are considered part of this completion scope.
They may be added later in a separate spec with full translation coverage.
- The existing i18n architecture (LanguageContext, useLanguage hook, en.json, fr.json)
must be reused, not replaced.
</constraints>
<current_state_summary>
- LanguageContext and useLanguage already exist and manage language + translations.
- en.json and fr.json exist with a significant subset of strings translated.
- Some components already call t('...') correctly (e.g. welcome screen, many settings labels).
- However:
* Many UI strings are still hardcoded in English in "src/App.jsx".
* The language selector UI mentions more languages than are actually implemented.
* The default language behavior is not explicitly enforced as French.
</current_state_summary>
<target_state>
- French is used as the default language for new/anonymous users.
- Only English and French appear in the language selector.
- All user-facing UI strings in "src/App.jsx" and its inline components use t('key').
- Every key used by the UI is defined in both en.json and fr.json.
- No leftover English UI text appears when French is selected.
</target_state>
<implementation_details>
<default_language>
- In the language context code:
* Ensure there is a constant DEFAULT_LANGUAGE set to "fr".
Example:
const DEFAULT_LANGUAGE = 'fr';
- Initial language resolution MUST follow this order:
1. If a valid language ("en" or "fr") is found in localStorage, use it.
2. Otherwise, fall back to DEFAULT_LANGUAGE = "fr".
- This guarantees that first-time users and users without a stored preference see the UI in French.
</default_language>
<supported_languages>
- SUPPORTED_LANGUAGES must contain exactly:
* { code: 'en', name: 'English', nativeName: 'English' }
* { code: 'fr', name: 'French', nativeName: 'Français' }
- The Settings > Language dropdown must iterate only over SUPPORTED_LANGUAGES.
- Any explicit references to "es", "de", "ja" as selectable languages must be removed
or commented out as "future languages" (but not shown to users).
</supported_languages>
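The constants and resolution order above can be sketched in plain JavaScript (SUPPORTED_LANGUAGES and DEFAULT_LANGUAGE come from this spec; the helper name resolveInitialLanguage is hypothetical):

```javascript
const DEFAULT_LANGUAGE = 'fr';

const SUPPORTED_LANGUAGES = [
  { code: 'en', name: 'English', nativeName: 'English' },
  { code: 'fr', name: 'French', nativeName: 'Français' },
];

// stored: the raw value read from localStorage (may be null or an
// unsupported code such as "de"); anything not in SUPPORTED_LANGUAGES
// falls back to the French default.
function resolveInitialLanguage(stored) {
  const isSupported = SUPPORTED_LANGUAGES.some((l) => l.code === stored);
  return isSupported ? stored : DEFAULT_LANGUAGE;
}
```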
<hardcoded_strings_audit>
- Perform a systematic audit of "src/App.jsx" to identify every user-visible English string
that is still hardcoded. Typical areas include:
* ThemePreview sample messages (e.g. “Hello! Can you help me with something?”).
* About section in Settings > General (product name, description, “Built with …” text).
* Default model description and option labels.
* Project modals: “Cancel”, “Save Changes”, etc.
* Any toasts, confirmation messages, help texts, or labels still in English.
- For each identified string:
* Define a stable translation key (e.g. "themePreview.sampleUser1",
"settings.defaultModelDescription", "projectModal.cancel", "projectModal.saveChanges").
* Add this key to both en.json and fr.json.
</hardcoded_strings_audit>
<refactor_to_use_t>
- Replace each hardcoded string with a call to the translation function, for example:
BEFORE:
<p>Hello! Can you help me with something?</p>
AFTER:
<p>{t('themePreview.sampleUser1')}</p>
- Ensure that:
* The component (or function) imports useLanguage.
* const { t } = useLanguage() is declared in the correct scope.
- Apply this systematically across:
* Settings / General and Appearance sections.
* Theme preview component.
* Project-related modals.
* Any remaining banners, tooltips, or messages defined inside App.jsx.
</refactor_to_use_t>
<translation_files_update>
- Update translations/en.json:
* Add all new keys with natural English text.
- Update translations/fr.json:
* Add the same keys with accurate French translations.
- Goal:
* For every key used in code, both en.json and fr.json must contain a value.
</translation_files_update>
<fallback_behavior>
- Keep existing fallback behavior in LanguageContext:
* If a key is missing in the current language, fall back to English.
* If the key is also missing in English, return the key and log a warning.
- However, after this completion spec is implemented:
* No fallback warnings should appear in normal operation, because all keys are defined.
</fallback_behavior>
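A minimal sketch of this fallback chain, assuming flat key-to-string maps per language (the real translation files are nested JSON, and the sample data below is illustrative only):

```javascript
// After this completion spec is implemented, fr would contain every key
// used in code; an empty fr map here just demonstrates the fallback.
const translations = {
  en: { 'projectModal.cancel': 'Cancel' },
  fr: {},
};

function t(key, language) {
  const current = translations[language] || {};
  if (key in current) return current[key];
  // First fallback: English.
  if (key in translations.en) return translations.en[key];
  // Last resort: return the key itself and log a warning.
  console.warn(`Missing translation key: ${key}`);
  return key;
}
```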
<settings_language_section>
- In the Settings > General tab:
* The language section heading must be translated via t('settings.language').
* Any helper text/description for the language selector must also use t('...').
* The select's value is bound to the language from useLanguage.
* The onChange handler calls setLanguage(newLanguageCode).
- Expected behavior:
* Switching to French instantly updates the UI and saves "fr" in localStorage.
* Switching to English instantly updates the UI and saves "en" in localStorage.
</settings_language_section>
</implementation_details>
<testing_plan>
<manual_tests>
1. Clear the language preference from localStorage.
2. Load the application:
- Confirm that the UI is initially in French (FR as default).
3. Open the Settings modal and navigate to the General tab.
- Verify that the language selector shows only "Français" and "English".
4. Switch to English:
- Verify that Sidebar, Settings, Welcome screen, Chat area, and modals are all in English.
5. Refresh the page:
- Confirm that the UI stays in English (preference persisted).
6. Switch back to French and repeat quick checks to confirm all UI text is in French.
</manual_tests>
<coverage_checks>
- Check in both languages:
* Main/empty state (welcome screen).
* Chat area (input placeholder, send/stop/regenerate buttons).
* Sidebar (navigation sections, search placeholder, pinned/archived labels).
* Settings (all tabs).
* Project creation and edit modals.
* Delete/confirmation dialogs and any share/export flows.
- Confirm:
* In French, there is no remaining English UI text.
* In English, there is no accidental French UI text.
</coverage_checks>
<regression>
- Verify:
* Chat behavior is unchanged except for translated labels/text.
* Project operations (create/update/delete) still work.
* No new console errors appear when switching languages or reloading.
</regression>
</testing_plan>
<success_criteria>
- "app_spec_language_selection.txt" remains the original base spec.
- This completion spec ("app_spec_language_selection.completion.txt") is fully implemented.
- French is used as default language when no preference exists.
- Only English and French are presented in the language selector.
- All user-facing strings in App.jsx go through t('key') and exist in both en.json and fr.json.
- No stray English text is visible when the French language is selected.
</success_criteria>
</project_specification>



@@ -1,525 +0,0 @@
<project_specification>
<project_name>Claude.ai Clone - Language Selection Bug Fix</project_name>
<overview>
This specification fixes a bug in the language selection functionality. The feature was
originally planned in the initial app_spec.txt (line 127: "Language preferences") and a UI
component already exists in the settings panel (App.jsx lines 1412-1419), but the functionality
is incomplete and non-functional.
Currently, there is a language selector dropdown in the settings with options for English,
Español, Français, Deutsch, and 日本語, but it lacks:
- State management for the selected language
- Event handlers (onChange) to handle language changes
- A translation system (i18n)
- Translation files (en.json, fr.json, etc.)
- Language context/provider
- Persistence of language preference
This bug fix will complete the implementation by adding the missing functionality so that when
a language is selected, the entire interface updates immediately to display all text in the
chosen language. The language preference should persist across sessions.
Focus will be on English (default) and French as the primary languages, with the existing
UI supporting additional languages for future expansion.
</overview>
<current_state>
<existing_ui>
Location: src/App.jsx, lines 1412-1419
Component: Language selector dropdown in settings panel (General/Preferences section)
Current options: English (en), Español (es), Français (fr), Deutsch (de), 日本語 (ja)
Status: UI exists but is non-functional (no onChange handler, no state, no translations)
</existing_ui>
<specification_reference>
Original spec: prompts/app_spec.txt, line 127
Mentioned as: "Language preferences" in settings_preferences section
Status: Feature was planned but not fully implemented
</specification_reference>
</current_state>
<safety_requirements>
<critical>
- DO NOT remove or modify the existing language selector UI (lines 1412-1419 in App.jsx)
- DO NOT break existing functionality when language is changed
- English must remain the default language
- Language changes should apply immediately without page refresh
- All translations must be complete (no missing translations)
- Maintain backward compatibility with existing code
- Language preference should be stored and persist across sessions
- Keep the existing dropdown structure and styling
- Connect the existing select element to the new translation system
</critical>
</safety_requirements>
<bug_fixes>
<fix_language_selection>
<title>Fix Language Selection Functionality</title>
<description>
Complete the implementation of the existing language selector in the settings menu.
The UI already exists (App.jsx lines 1412-1419) but needs to be made functional.
The fix should:
- Connect the existing select element to state management
- Add onChange handler to the existing select element
- Display current selected language (load from localStorage on mount)
- Apply language changes immediately to the entire interface
- Save language preference to localStorage
- Persist language choice across sessions
The existing selector is already in the correct location (General/Preferences section
of settings panel) and has the correct styling, so only the functionality needs to be added.
</description>
<priority>1</priority>
<category>bug_fix</category>
<type>completion_of_existing_feature</type>
<implementation_approach>
- Keep the existing select element in App.jsx (lines 1412-1419)
- Add useState hook to manage selected language state
- Add value prop to select element (bound to state)
- Add onChange handler to select element
- Load language preference from localStorage on component mount
- Save language preference to localStorage on change
- Create translation files/dictionaries for each language
- Implement language context/provider to manage current language
- Create translation utility function to retrieve translated strings
- Update all hardcoded text to use translation function
- Apply language changes reactively throughout the application
</implementation_approach>
<test_steps>
1. Open settings menu
2. Navigate to "General" or "Preferences" section
3. Locate the existing "Language" selector (should already be visible)
4. Verify the select element now has a value bound to state (not empty)
5. Verify default language is "English" (en) on first load
6. Select "Français" (fr) from the existing language dropdown
7. Verify onChange handler fires and updates state
8. Verify entire interface updates immediately to French
9. Check that all UI elements are translated (buttons, labels, menus)
10. Navigate to different pages and verify translations persist
11. Refresh the page and verify language preference is maintained (loaded from localStorage)
12. Switch back to "English" and verify interface returns to English
13. Test with new conversations and verify messages/UI are in selected language
14. Verify the existing select element styling and structure remain unchanged
</test_steps>
</fix_language_selection>
<fix_translation_system>
<title>Translation System Infrastructure</title>
<description>
Implement a translation system that:
- Stores translations for English and French
- Provides a translation function/utility to retrieve translated strings
- Supports dynamic language switching
- Handles missing translations gracefully (fallback to English)
- Organizes translations by feature/component
Translation keys should be organized logically:
- Common UI elements (buttons, labels, placeholders)
- Settings panel
- Chat interface
- Navigation menus
- Error messages
- Success messages
- Tooltips and help text
</description>
<priority>1</priority>
<category>infrastructure</category>
<type>new_implementation</type>
<implementation_approach>
- Create translation files (JSON or JS objects):
* translations/en.json (English)
* translations/fr.json (French)
- Create translation context/provider (React Context)
- Create useTranslation hook for components
- Create translation utility function (t() or translate())
- Organize translations by namespace/feature
- Implement fallback mechanism for missing translations
- Ensure type safety for translation keys (TypeScript if applicable)
</implementation_approach>
<test_steps>
1. Verify translation files exist for both languages
2. Test translation function with valid keys
3. Test translation function with invalid keys (should fallback)
4. Verify all translation keys have values in both languages
5. Test language switching updates all components
6. Verify no console errors when switching languages
</test_steps>
</fix_translation_system>
<fix_ui_translations>
<title>Complete UI Translation Coverage</title>
<description>
Translate all user-facing text in the application to support both English and French:
Navigation & Menus:
- Sidebar navigation items
- Menu labels
- Breadcrumbs
Chat Interface:
- Input placeholder text
- Send button
- Message status indicators
- Empty state messages
- Loading states
Settings:
- All setting section titles
- Setting option labels
- Setting descriptions
- Save/Cancel buttons
Buttons & Actions:
- Primary action buttons
- Secondary buttons
- Delete/Remove actions
- Edit actions
- Save actions
Messages & Notifications:
- Success messages
- Error messages
- Warning messages
- Info messages
Forms:
- Form labels
- Input placeholders
- Validation messages
- Help text
Modals & Dialogs:
- Modal titles
- Modal content
- Confirmation dialogs
- Cancel/Confirm buttons
</description>
<priority>1</priority>
<category>ui</category>
<type>translation_implementation</type>
<implementation_approach>
- Audit all hardcoded text in the application
- Replace all hardcoded strings with translation function calls
- Create translation keys for each text element
- Add French translations for all keys
- Test each screen/page to ensure complete translation coverage
- Verify no English text remains when French is selected
</implementation_approach>
<test_steps>
1. Set language to French
2. Navigate through all pages/screens
3. Verify every text element is translated
4. Check all buttons, labels, placeholders
5. Test all modals and dialogs
6. Verify form validation messages
7. Check error and success notifications
8. Verify no English text appears when French is selected
9. Repeat test with English to ensure nothing broke
</test_steps>
</fix_ui_translations>
<fix_language_persistence>
<title>Language Preference Persistence</title>
<description>
Ensure that the selected language preference is saved and persists across:
- Page refreshes
- Browser sessions
- Tab closures
- Application restarts
The language preference should be:
- Stored in localStorage (client-side) or backend user preferences
- Loaded on application startup
- Applied immediately when the app loads
- Synchronized if user is logged in (multi-device support)
</description>
<priority>1</priority>
<category>persistence</category>
<type>bug_fix</type>
<implementation_approach>
- Save language selection to localStorage on change
- Load language preference on app initialization
- Apply saved language before rendering UI
- Optionally sync with backend user preferences if available
- Handle case where no preference is saved (default to English)
</implementation_approach>
<test_steps>
1. Select French language
2. Refresh the page
3. Verify interface is still in French
4. Close browser tab and reopen
5. Verify language preference persists
6. Clear localStorage and verify defaults to English
7. Select language again and verify it saves
</test_steps>
</fix_language_persistence>
</bug_fixes>
<implementation_notes>
<existing_code>
Location: src/App.jsx, lines 1412-1419
Current code:
```jsx
<div>
<h3 className="text-sm font-medium text-gray-900 dark:text-gray-100 mb-3">Language</h3>
<select className="w-full px-3 py-2 bg-white dark:bg-gray-700 border border-gray-300 dark:border-gray-600 rounded-lg text-sm text-gray-900 dark:text-gray-100 focus:ring-2 focus:ring-claude-orange focus:border-transparent">
<option value="en">English</option>
<option value="es">Español</option>
<option value="fr">Français</option>
<option value="de">Deutsch</option>
<option value="ja">日本語</option>
</select>
</div>
```
Required changes:
- Add value={language} to select element
- Add onChange={(e) => setLanguage(e.target.value)} to select element
- Add useState for language state
- Load from localStorage on mount
- Save to localStorage on change
</existing_code>
<code_structure>
frontend/
src/
App.jsx # UPDATE: Add language state and connect existing select
components/
LanguageSelector.jsx # Optional: Extract to component if needed (NEW)
contexts/
LanguageContext.jsx # Language context provider (NEW)
hooks/
useLanguage.js # Hook to access language and translations (NEW)
utils/
translations.js # Translation utility functions (NEW)
translations/
en.json # English translations (NEW)
fr.json # French translations (NEW)
</code_structure>
<translation_structure>
Translation files should be organized by feature/namespace:
{
"common": {
"save": "Save",
"cancel": "Cancel",
"delete": "Delete",
"edit": "Edit",
...
},
"settings": {
"title": "Settings",
"language": "Language",
"theme": "Theme",
...
},
"chat": {
"placeholder": "Message Claude...",
"send": "Send",
...
},
...
}
</translation_structure>
<storage_approach>
Store language preference in:
- localStorage key: "app_language" (value: "en" or "fr")
- Or backend user preferences if available:
{
language: "en" | "fr"
}
Default value: "en" (English)
</storage_approach>
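The storage rules above, sketched in plain JavaScript. The storage object is injected so the logic can run outside a browser; in the app it would be window.localStorage. The "app_language" key and the English default come from this spec:

```javascript
const STORAGE_KEY = 'app_language';
const DEFAULT_LANGUAGE = 'en';

// Returns the saved language, or the English default when no valid
// preference is stored.
function loadLanguage(storage) {
  const saved = storage.getItem(STORAGE_KEY);
  return saved === 'en' || saved === 'fr' ? saved : DEFAULT_LANGUAGE;
}

function saveLanguage(storage, code) {
  storage.setItem(STORAGE_KEY, code);
}
```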
<translation_function>
Example implementation:
- useTranslation() hook returns { t, language, setLanguage }
- t(key) function retrieves translation for current language
- t("common.save") returns "Save" (en) or "Enregistrer" (fr)
- Supports nested keys: t("settings.general.title")
- Falls back to English if translation missing
</translation_function>
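For illustration, the nested-key lookup with English fallback could be implemented like this (a sketch under the assumptions above, not the required implementation; the sample translations are abbreviated):

```javascript
const translations = {
  en: { common: { save: 'Save' }, settings: { general: { title: 'General' } } },
  fr: { common: { save: 'Enregistrer' } },
};

// Walk a dot-separated path such as "settings.general.title" through a
// nested translations object; returns undefined if any segment is missing.
function lookup(obj, path) {
  return path
    .split('.')
    .reduce((node, part) => (node && typeof node === 'object' ? node[part] : undefined), obj);
}

function t(key, language) {
  const value = lookup(translations[language], key);
  if (value !== undefined) return value;
  // Fall back to English, then to the key itself.
  const fallback = lookup(translations.en, key);
  return fallback !== undefined ? fallback : key;
}
```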
<safety_guidelines>
- Keep all existing functionality intact
- Default to English if no language preference set
- Gracefully handle missing translations (fallback to English)
- Ensure language changes don't cause re-renders that break functionality
- Test thoroughly to ensure no English text remains when French is selected
- Maintain code readability with clear translation key naming
</safety_guidelines>
</implementation_notes>
<ui_components>
<language_selector>
<description>Language selector component in settings (ALREADY EXISTS - needs functionality)</description>
<location>Settings > General/Preferences section (App.jsx lines 1412-1419)</location>
<current_state>
- UI exists with dropdown/select element
- Has 5 language options: English (en), Español (es), Français (fr), Deutsch (de), 日本語 (ja)
- Styling is already correct
- Missing: value binding, onChange handler, state management
</current_state>
<required_changes>
- Add value prop bound to language state
- Add onChange handler to update language state
- Connect to translation system
- Add persistence (localStorage)
</required_changes>
<display>
- Keep existing dropdown/select element (no UI changes needed)
- Shows current selection (via value prop)
- Updates interface immediately on change (via onChange)
</display>
</language_selector>
</ui_components>
<translation_coverage>
<required_translations>
All text visible to users must be translated:
- Navigation menu items
- Page titles and headers
- Button labels
- Form labels and placeholders
- Input field labels
- Error messages
- Success messages
- Tooltips
- Help text
- Modal titles and content
- Dialog confirmations
- Empty states
- Loading states
- Settings labels and descriptions
- Chat interface elements
</required_translations>
<translation_examples>
English -> French:
- "Settings" -> "Paramètres"
- "Save" -> "Enregistrer"
- "Cancel" -> "Annuler"
- "Delete" -> "Supprimer"
- "Language" -> "Langue"
- "Theme" -> "Thème"
- "Send" -> "Envoyer"
- "New Conversation" -> "Nouvelle conversation"
- "Message Claude..." -> "Message à Claude..."
</translation_examples>
</translation_coverage>
<api_endpoints>
<if_backend_storage>
If storing language preference in backend:
- GET /api/user/preferences - Get user preferences (includes language)
- PUT /api/user/preferences - Update user preferences (includes language)
- GET /api/user/preferences/language - Get language preference only
- PUT /api/user/preferences/language - Update language preference only
</if_backend_storage>
<note>
If using localStorage only, no API endpoints are needed.
Backend storage is optional but recommended for multi-device sync.
</note>
</api_endpoints>
<accessibility_requirements>
- Language selector must be keyboard navigable
- Language changes must be announced to screen readers
- Translation quality must be accurate (no machine translation errors)
- Text direction should be handled correctly (LTR for both languages)
- Font rendering should support both languages properly
</accessibility_requirements>
<testing_requirements>
<regression_tests>
- Verify existing functionality works in both languages
- Verify language change doesn't break any features
- Test that default language (English) still works as before
- Verify all existing features are accessible in both languages
</regression_tests>
<feature_tests>
- Test language selector in settings
- Test immediate language change on selection
- Test language persistence across page refresh
- Test language persistence across browser sessions
- Test all UI elements are translated
- Test translation fallback for missing keys
- Test switching between languages multiple times
- Verify no English text appears when French is selected
- Verify all pages/screens are translated
</feature_tests>
<translation_tests>
- Verify all translation keys have values in both languages
- Test translation accuracy (no machine translation errors)
- Verify consistent terminology across the application
- Test special characters and accents in French
- Verify text doesn't overflow UI elements in French (French strings are often longer)
</translation_tests>
<compatibility_tests>
- Test with different browsers (Chrome, Firefox, Safari, Edge)
- Test with different screen sizes (responsive design)
- Test language switching during active conversations
- Test language switching with modals open
- Verify language preference syncs across tabs (if applicable)
</compatibility_tests>
</testing_requirements>
<summary>
<bug_description>
The language selection feature was planned in the original specification (app_spec.txt line 127)
and a UI component was created (App.jsx lines 1412-1419), but the implementation is incomplete.
The select dropdown exists but has no functionality - it lacks state management, event handlers,
and a translation system.
</bug_description>
<fix_scope>
This is a bug fix that completes the existing feature by:
1. Connecting the existing UI to state management
2. Adding the missing translation system
3. Implementing language persistence
4. Translating all UI text to support English and French
</fix_scope>
<key_principle>
DO NOT remove or significantly modify the existing language selector UI. Only add the
missing functionality to make it work.
</key_principle>
</summary>
<success_criteria>
<functionality>
- Users can select language from the existing settings dropdown (English or French)
- Language changes apply immediately to entire interface
- Language preference persists across sessions
- All UI elements are translated when language is changed
- English remains the default language
- No functionality is broken by language changes
- The existing select element in App.jsx (lines 1412-1419) is now functional
</functionality>
<user_experience>
- Language selector is easy to find in settings
- Language change is instant and smooth
- All text is properly translated (no English text in French mode)
- Translations are accurate and natural
- Interface layout works well with both languages
</user_experience>
<technical>
- Translation system is well-organized and maintainable
- Translation keys are logically structured
- Language preference is stored reliably
- No performance degradation with language switching
- Code is clean and follows existing patterns
- Easy to add more languages in the future
</technical>
</success_criteria>
</project_specification>


@@ -0,0 +1,679 @@
<project_specification>
<project_name>Library RAG - Type Safety & Documentation Enhancement</project_name>
<overview>
Enhance the Library RAG application (philosophical texts indexing and semantic search) by adding
strict type annotations and comprehensive Google-style docstrings to all Python modules. This will
improve code maintainability, enable static type checking with mypy, and provide clear documentation
for all functions, classes, and modules.
The application is a RAG pipeline that processes PDF documents through OCR, LLM-based extraction,
semantic chunking, and ingestion into Weaviate vector database. It includes a Flask web interface
for document upload, processing, and semantic search.
</overview>
<technology_stack>
<backend>
<runtime>Python 3.10+</runtime>
<web_framework>Flask 3.0</web_framework>
<vector_database>Weaviate 1.34.4 with text2vec-transformers</vector_database>
<ocr>Mistral OCR API</ocr>
<llm>Ollama (local) or Mistral API</llm>
<type_checking>mypy with strict configuration</type_checking>
</backend>
<infrastructure>
<containerization>Docker Compose (Weaviate + transformers)</containerization>
<dependencies>weaviate-client, flask, mistralai, python-dotenv</dependencies>
</infrastructure>
</technology_stack>
<current_state>
<project_structure>
- flask_app.py: Main Flask application (640 lines)
- schema.py: Weaviate schema definition (383 lines)
- utils/: 16+ modules for PDF processing pipeline
- pdf_pipeline.py: Main orchestration (879 lines)
- mistral_client.py: OCR API client
- ocr_processor.py: OCR processing
- markdown_builder.py: Markdown generation
- llm_metadata.py: Metadata extraction via LLM
- llm_toc.py: Table of contents extraction
- llm_classifier.py: Section classification
- llm_chunker.py: Semantic chunking
- llm_cleaner.py: Chunk cleaning
- llm_validator.py: Document validation
- weaviate_ingest.py: Database ingestion
- hierarchy_parser.py: Document hierarchy parsing
- image_extractor.py: Image extraction from PDFs
- toc_extractor*.py: Various TOC extraction methods
- templates/: Jinja2 templates for Flask UI
- tests/utils2/: Minimal test coverage (3 test files)
</project_structure>
<issues>
- Inconsistent type annotations across modules (some have partial types, many have none)
- Missing or incomplete docstrings (no Google-style format)
- No mypy configuration for strict type checking
- Type hints missing on function parameters and return values
- Dict[str, Any] used extensively without proper typing
- No type stubs for complex nested structures
</issues>
</current_state>
<core_features>
<type_annotations>
<strict_typing>
- Add complete type annotations to ALL functions and methods
- Use proper generic types (List, Dict, Optional, Union) from typing module
- Add TypedDict for complex dictionary structures
- Add Protocol types for duck-typed interfaces
- Use Literal types for string constants
- Add ParamSpec and TypeVar where appropriate
- Type all class attributes and instance variables
- Add type annotations to lambda functions where possible
</strict_typing>
<mypy_configuration>
- Create mypy.ini with strict configuration
- Enable: check_untyped_defs, disallow_untyped_defs, disallow_incomplete_defs
- Enable: disallow_untyped_calls, disallow_untyped_decorators
- Enable: warn_return_any, warn_redundant_casts
- Enable: strict_equality, strict_optional
- Set python_version to 3.10
- Configure per-module overrides if needed for gradual migration
</mypy_configuration>
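A mypy.ini matching the flags listed above might look like this (the per-module override block is a hypothetical example of gradual migration; the module name is illustrative):

```ini
[mypy]
python_version = 3.10
check_untyped_defs = True
disallow_untyped_defs = True
disallow_incomplete_defs = True
disallow_untyped_calls = True
disallow_untyped_decorators = True
warn_return_any = True
warn_redundant_casts = True
strict_equality = True
strict_optional = True

# Hypothetical per-module override while older modules are migrated.
[mypy-utils.toc_extractor_legacy]
disallow_untyped_defs = False
```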
<type_stubs>
- Create TypedDict definitions for common data structures:
- OCR response structures
- Metadata dictionaries
- TOC entries
- Chunk objects
- Weaviate objects
- Pipeline results
- Add NewType for semantic type safety (DocumentName, ChunkId, etc.)
- Create Protocol types for callback functions
</type_stubs>
<specific_improvements>
- pdf_pipeline.py: Type all 10 pipeline steps, callbacks, result dictionaries
- flask_app.py: Type all route handlers, request/response types
- schema.py: Type Weaviate configuration objects
- llm_*.py: Type LLM request/response structures
- mistral_client.py: Type API client methods and responses
- weaviate_ingest.py: Type ingestion functions and batch operations
</specific_improvements>
</type_annotations>
<documentation>
<google_style_docstrings>
- Add comprehensive Google-style docstrings to ALL:
- Module-level docstrings explaining purpose and usage
- Class docstrings with Attributes section
- Function/method docstrings with Args, Returns, Raises sections
- Complex algorithm explanations with Examples section
- Include code examples for public APIs
- Document all exceptions that can be raised
- Add Notes section for important implementation details
- Add See Also section for related functions
</google_style_docstrings>
<module_documentation>
<utils_modules>
- pdf_pipeline.py: Document the 10-step pipeline, each step's purpose
- mistral_client.py: Document OCR API usage, cost calculation
- llm_metadata.py: Document metadata extraction logic
- llm_toc.py: Document TOC extraction strategies
- llm_classifier.py: Document section classification types
- llm_chunker.py: Document semantic vs basic chunking
- llm_cleaner.py: Document cleaning rules and validation
- llm_validator.py: Document validation criteria
- weaviate_ingest.py: Document ingestion process, nested objects
- hierarchy_parser.py: Document hierarchy building algorithm
</utils_modules>
<flask_app>
- Document all routes with request/response examples
- Document SSE (Server-Sent Events) implementation
- Document Weaviate query patterns
- Document upload processing workflow
- Document background job management
</flask_app>
<schema>
- Document Weaviate schema design decisions
- Document each collection's purpose and relationships
- Document nested object structure
- Document vectorization strategy
</schema>
</module_documentation>
<inline_comments>
- Add inline comments for complex logic only (don't over-comment)
- Explain WHY not WHAT (code should be self-documenting)
- Document performance considerations
- Document cost implications (OCR, LLM API calls)
- Document error handling strategies
</inline_comments>
</documentation>
<validation>
<type_checking>
- All modules must pass mypy --strict
- No # type: ignore comments without justification
- CI/CD should run mypy checks
- Type coverage should be 100%
</type_checking>
<documentation_quality>
- All public functions must have docstrings
- All docstrings must follow Google style
- Examples should be executable and tested
- Documentation should be clear and concise
</documentation_quality>
</validation>
</core_features>
<implementation_priority>
<critical_modules>
Priority 1 (Most used, most complex):
1. utils/pdf_pipeline.py - Main orchestration
2. flask_app.py - Web application entry point
3. utils/weaviate_ingest.py - Database operations
4. schema.py - Schema definition
Priority 2 (Core LLM modules):
5. utils/llm_metadata.py
6. utils/llm_toc.py
7. utils/llm_classifier.py
8. utils/llm_chunker.py
9. utils/llm_cleaner.py
10. utils/llm_validator.py
Priority 3 (OCR and parsing):
11. utils/mistral_client.py
12. utils/ocr_processor.py
13. utils/markdown_builder.py
14. utils/hierarchy_parser.py
15. utils/image_extractor.py
Priority 4 (Supporting modules):
16. utils/toc_extractor.py
17. utils/toc_extractor_markdown.py
18. utils/toc_extractor_visual.py
19. utils/llm_structurer.py (legacy)
</critical_modules>
</implementation_priority>
<implementation_steps>
<feature_1>
<title>Setup Type Checking Infrastructure</title>
<description>
Configure mypy with strict settings and create foundational type definitions
</description>
<tasks>
- Create mypy.ini configuration file with strict settings
- Add mypy to requirements.txt or dev dependencies
- Create utils/types.py module for common TypedDict definitions
- Define core types: OCRResponse, Metadata, TOCEntry, ChunkData, PipelineResult
- Add NewType definitions for semantic types: DocumentName, ChunkId, SectionPath
- Create Protocol types for callbacks (ProgressCallback, etc.)
- Document type definitions in utils/types.py module docstring
- Test mypy configuration on a single module to verify settings
</tasks>
<acceptance_criteria>
- mypy.ini exists with strict configuration
- utils/types.py contains all foundational types with docstrings
- mypy runs without errors on utils/types.py
- Type definitions are comprehensive and reusable
</acceptance_criteria>
</feature_1>
<feature_2>
<title>Add Types to PDF Pipeline Orchestration</title>
<description>
Add complete type annotations to pdf_pipeline.py (879 lines, most complex module)
</description>
<tasks>
- Add type annotations to all function signatures in pdf_pipeline.py
- Type the 10-step pipeline: OCR, Markdown, Metadata, TOC, Classify, Chunk, Clean, Enrich, Validate, Weaviate
- Type progress_callback parameter with Protocol or Callable
- Add TypedDict for pipeline options dictionary
- Add TypedDict for pipeline result dictionary structure
- Type all helper functions (extract_document_metadata_legacy, etc.)
- Add proper return types for process_pdf_v2, process_pdf, process_pdf_bytes
- Fix any mypy errors that arise
- Verify mypy --strict passes on pdf_pipeline.py
</tasks>
<acceptance_criteria>
- All functions in pdf_pipeline.py have complete type annotations
- progress_callback is properly typed with Protocol
- All Dict[str, Any] replaced with TypedDict where appropriate
- mypy --strict pdf_pipeline.py passes with zero errors
- No # type: ignore comments (or justified if absolutely necessary)
</acceptance_criteria>
</feature_2>
<feature_3>
<title>Add Types to Flask Application</title>
<description>
Add complete type annotations to flask_app.py and type all routes
</description>
<tasks>
- Add type annotations to all Flask route handlers
- Type request.args, request.form, request.files usage
- Type jsonify() return values
- Type get_weaviate_client context manager
- Type get_collection_stats, get_all_chunks, search_chunks functions
- Add TypedDict for Weaviate query results
- Type background job processing functions (run_processing_job)
- Type SSE generator function (upload_progress)
- Add type hints for template rendering
- Verify mypy --strict passes on flask_app.py
</tasks>
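The SSE generator is the piece most often left untyped; a framework-agnostic sketch of the idea, assuming events arrive as dicts (the real Flask handler would wrap such a generator in a streaming Response):

```python
import json
from collections.abc import Iterator

def sse_stream(events: list[dict[str, str]]) -> Iterator[str]:
    """Yield Server-Sent Events frames; mypy checks every yield is a str."""
    for event in events:
        yield f"data: {json.dumps(event)}\n\n"
    # Sentinel frame so the client knows the stream is complete.
    yield "data: [DONE]\n\n"
```

Annotating the return as Iterator[str] (rather than leaving it implicit) lets mypy catch a stray non-string yield anywhere in the generator body.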
<acceptance_criteria>
- All Flask routes have complete type annotations
- Request/response types are clear and documented
- Weaviate query functions are properly typed
- SSE generator is correctly typed
- mypy --strict flask_app.py passes with zero errors
</acceptance_criteria>
</feature_3>
<feature_4>
<title>Add Types to Core LLM Modules</title>
<description>
Add complete type annotations to all LLM processing modules (metadata, TOC, classifier, chunker, cleaner, validator)
</description>
<tasks>
- llm_metadata.py: Type extract_metadata function, return structure
- llm_toc.py: Type extract_toc function, TOC hierarchy structure
- llm_classifier.py: Type classify_sections, section types (Literal), validation functions
- llm_chunker.py: Type chunk_section_with_llm, chunk objects
- llm_cleaner.py: Type clean_chunk, is_chunk_valid functions
- llm_validator.py: Type validate_document, validation result structure
- Add TypedDict for LLM request/response structures
- Type provider selection ("ollama" | "mistral" as Literal)
- Type model names with Literal or constants
- Verify mypy --strict passes on all llm_*.py modules
</tasks>
<acceptance_criteria>
- All LLM modules have complete type annotations
- Section types use Literal for type safety
- Provider and model parameters are strongly typed
- LLM request/response structures use TypedDict
- mypy --strict passes on all llm_*.py modules with zero errors
</acceptance_criteria>
</feature_4>
<feature_5>
<title>Add Types to Weaviate and Database Modules</title>
<description>
Add complete type annotations to schema.py and weaviate_ingest.py
</description>
<tasks>
- schema.py: Type Weaviate configuration objects
- schema.py: Type collection property definitions
- weaviate_ingest.py: Type ingest_document function signature
- weaviate_ingest.py: Type delete_document_chunks function
- weaviate_ingest.py: Add TypedDict for Weaviate object structure
- Type batch insertion operations
- Type nested object references (work, document)
- Add proper error types for Weaviate exceptions
- Verify mypy --strict passes on both modules
</tasks>
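Nested object references (work, document) map naturally onto nested TypedDicts; the field names below are illustrative assumptions, not the actual Weaviate schema:

```python
from typing import TypedDict

class WorkRef(TypedDict):
    """Nested 'work' reference (illustrative fields)."""
    title: str
    author: str

class DocumentRef(TypedDict):
    """Nested 'document' reference (illustrative fields)."""
    name: str
    pages: int

class WeaviateChunk(TypedDict):
    """Flat chunk properties plus the two nested references."""
    text: str
    section_path: str
    work: WorkRef
    document: DocumentRef

def chunk_to_properties(chunk: WeaviateChunk) -> dict[str, object]:
    """Shape-check at the boundary; the client call itself takes a plain dict."""
    return dict(chunk)
```

The value of the TypedDict here is at the boundary: the payload is fully checked before being handed to the (loosely typed) client batch-insert call.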
<acceptance_criteria>
- schema.py has complete type annotations for Weaviate config
- weaviate_ingest.py functions are fully typed
- Nested object structures use TypedDict
- Weaviate client operations are properly typed
- mypy --strict passes on both modules with zero errors
</acceptance_criteria>
</feature_5>
<feature_6>
<title>Add Types to OCR and Parsing Modules</title>
<description>
Add complete type annotations to mistral_client.py, ocr_processor.py, markdown_builder.py, hierarchy_parser.py
</description>
<tasks>
- mistral_client.py: Type create_client, run_ocr, estimate_ocr_cost
- mistral_client.py: Add TypedDict for Mistral API response structures
- ocr_processor.py: Type serialize_ocr_response, OCR object structures
- markdown_builder.py: Type build_markdown, image_writer parameter
- hierarchy_parser.py: Type build_hierarchy, flatten_hierarchy functions
- hierarchy_parser.py: Add TypedDict for hierarchy node structure
- image_extractor.py: Type create_image_writer, image handling
- Verify mypy --strict passes on all modules
</tasks>
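Hierarchy nodes are naturally recursive, which TypedDict supports via a string forward reference; the fields below are illustrative:

```python
from typing import TypedDict

class HierarchyNode(TypedDict):
    """A section node; 'children' recurses via a forward reference."""
    title: str
    level: int
    children: list["HierarchyNode"]

def flatten_hierarchy(node: HierarchyNode) -> list[str]:
    """Depth-first list of section titles (sketch of the flatten step)."""
    titles = [node["title"]]
    for child in node["children"]:
        titles.extend(flatten_hierarchy(child))
    return titles
```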
<acceptance_criteria>
- All OCR/parsing modules have complete type annotations
- Mistral API structures use TypedDict
- Hierarchy nodes are properly typed
- Image handling functions are typed
- mypy --strict passes on all modules with zero errors
</acceptance_criteria>
</feature_6>
<feature_7>
<title>Add Google-Style Docstrings to Core Modules</title>
<description>
Add comprehensive Google-style docstrings to pdf_pipeline.py, flask_app.py, and weaviate modules
</description>
<tasks>
- pdf_pipeline.py: Add module docstring explaining the V2 pipeline
- pdf_pipeline.py: Add docstrings to process_pdf_v2 with Args, Returns, Raises sections
- pdf_pipeline.py: Document each of the 10 pipeline steps in comments
- pdf_pipeline.py: Add Examples section showing typical usage
- flask_app.py: Add module docstring explaining Flask application
- flask_app.py: Document all routes with request/response examples
- flask_app.py: Document Weaviate connection management
- schema.py: Add module docstring explaining schema design
- schema.py: Document each collection's purpose and relationships
- weaviate_ingest.py: Document ingestion process with examples
- All docstrings must follow Google style format exactly
</tasks>
<acceptance_criteria>
- All core modules have comprehensive module-level docstrings
- All public functions have Google-style docstrings
- Args, Returns, Raises sections are complete and accurate
- Examples are provided for complex functions
- Docstrings explain WHY, not just WHAT
</acceptance_criteria>
</feature_7>
<feature_8>
<title>Add Google-Style Docstrings to LLM Modules</title>
<description>
Add comprehensive Google-style docstrings to all LLM processing modules
</description>
<tasks>
- llm_metadata.py: Document metadata extraction logic with examples
- llm_toc.py: Document TOC extraction strategies and fallbacks
- llm_classifier.py: Document section types and classification criteria
- llm_chunker.py: Document semantic vs basic chunking approaches
- llm_cleaner.py: Document cleaning rules and validation logic
- llm_validator.py: Document validation criteria and corrections
- Add Examples sections showing input/output for each function
- Document LLM provider differences (Ollama vs Mistral)
- Document cost implications in Notes sections
- All docstrings must follow Google style format exactly
</tasks>
<acceptance_criteria>
- All LLM modules have comprehensive docstrings
- Each function has Args, Returns, Raises sections
- Examples show realistic input/output
- Provider differences are documented
- Cost implications are noted where relevant
</acceptance_criteria>
</feature_8>
<feature_9>
<title>Add Google-Style Docstrings to OCR and Parsing Modules</title>
<description>
Add comprehensive Google-style docstrings to OCR, markdown, hierarchy, and extraction modules
</description>
<tasks>
- mistral_client.py: Document OCR API usage, cost calculation
- ocr_processor.py: Document OCR response processing
- markdown_builder.py: Document markdown generation strategy
- hierarchy_parser.py: Document hierarchy building algorithm
- image_extractor.py: Document image extraction process
- toc_extractor*.py: Document various TOC extraction methods
- Add Examples sections for complex algorithms
- Document edge cases and error handling
- All docstrings must follow Google style format exactly
</tasks>
<acceptance_criteria>
- All OCR/parsing modules have comprehensive docstrings
- Complex algorithms are well explained
- Edge cases are documented
- Error handling is documented
- Examples demonstrate typical usage
</acceptance_criteria>
</feature_9>
<feature_10>
<title>Final Validation and CI Integration</title>
<description>
Verify all type annotations and docstrings, integrate mypy into CI/CD
</description>
<tasks>
- Run mypy --strict on entire codebase, verify 100% pass rate
- Verify all public functions have docstrings
- Check docstring formatting with pydocstyle or similar tool
- Create GitHub Actions workflow to run mypy on every commit
- Update README.md with type checking instructions
- Update CLAUDE.md with documentation standards
- Create CONTRIBUTING.md with type annotation and docstring guidelines
- Generate API documentation with Sphinx or pdoc
- Fix any remaining mypy errors or missing docstrings
</tasks>
<acceptance_criteria>
- mypy --strict passes on entire codebase with zero errors
- All public functions have Google-style docstrings
- CI/CD runs mypy checks automatically
- Documentation is generated and accessible
- Contributing guidelines document type/docstring requirements
</acceptance_criteria>
</feature_10>
</implementation_steps>
<success_criteria>
<type_safety>
- 100% type coverage across all modules
- mypy --strict passes with zero errors
- No # type: ignore comments without justification
- All Dict[str, Any] replaced with TypedDict where appropriate
- Proper use of generics, protocols, and type variables
- NewType used for semantic type safety
</type_safety>
<documentation_quality>
- All modules have comprehensive module-level docstrings
- All public functions/classes have Google-style docstrings
- All docstrings include Args, Returns, Raises sections
- Complex functions include Examples sections
- Cost implications documented in Notes sections
- Error handling clearly documented
- Provider differences (Ollama vs Mistral) documented
</documentation_quality>
<code_quality>
- Code is self-documenting with clear variable names
- Inline comments explain WHY, not WHAT
- Complex algorithms are well explained
- Performance considerations documented
- Security considerations documented
</code_quality>
<developer_experience>
- IDE autocomplete works perfectly with type hints
- Type errors caught at development time, not runtime
- Documentation is easily accessible in IDE
- API examples are executable and tested
- Contributing guidelines are clear and comprehensive
</developer_experience>
<maintainability>
- Refactoring is safer with type checking
- Function signatures are self-documenting
- API contracts are explicit and enforced
- Breaking changes are caught by type checker
- New developers can understand code quickly
</maintainability>
</success_criteria>
<constraints>
<compatibility>
- Must maintain backward compatibility with existing code
- Cannot break existing Flask routes or API contracts
- Weaviate schema must remain unchanged
- Existing tests must continue to pass
</compatibility>
<gradual_migration>
- Can use per-module mypy configuration for gradual migration
- Can temporarily disable strict checks on legacy modules
- Priority modules must be completed first
- Low-priority modules can be deferred
</gradual_migration>
<standards>
- All type annotations must use Python 3.10+ syntax
- Docstrings must follow Google style exactly (not NumPy or reStructuredText)
- Prefer built-in generics (list, dict) and X | None unions per the 3.10+ syntax rule above; keep typing-module forms (List, Dict, Optional) only where Python 3.9 compatibility is still required
- Use from __future__ import annotations if needed for forward references
</standards>
</constraints>
<testing_strategy>
<type_checking>
- Run mypy --strict on each module after adding types
- Use mypy daemon (dmypy) for faster incremental checking
- Add mypy to pre-commit hooks
- CI/CD must run mypy and fail on type errors
</type_checking>
<documentation_validation>
- Use pydocstyle to validate Google-style format
- Use sphinx-build to generate docs and catch errors
- Manual review of docstring examples
- Verify examples are executable and correct
</documentation_validation>
<integration_testing>
- Verify existing tests still pass after type additions
- Add new tests for complex typed structures
- Test mypy configuration on sample code
- Verify IDE autocomplete works correctly
</integration_testing>
</testing_strategy>
<documentation_examples>
<module_docstring>
```python
"""
PDF Pipeline V2 - Intelligent document processing with LLM enhancement.
This module orchestrates a 10-step pipeline for processing PDF documents:
1. OCR via Mistral API
2. Markdown construction with images
3. Metadata extraction via LLM
4. Table of contents (TOC) extraction
5. Section classification
6. Semantic chunking
7. Chunk cleaning and validation
8. Enrichment with concepts
9. Validation and corrections
10. Ingestion into Weaviate vector database
The pipeline supports multiple LLM providers (Ollama local, Mistral API) and
various processing modes (skip OCR, semantic chunking, OCR annotations).
Typical usage:
>>> from pathlib import Path
>>> from utils.pdf_pipeline import process_pdf
>>>
>>> result = process_pdf(
... Path("document.pdf"),
... use_llm=True,
... llm_provider="ollama",
... ingest_to_weaviate=True,
... )
>>> print(f"Processed {result['pages']} pages, {result['chunks_count']} chunks")
See Also:
mistral_client: OCR API client
llm_metadata: Metadata extraction
weaviate_ingest: Database ingestion
"""
```
</module_docstring>
<function_docstring>
```python
def process_pdf_v2(
pdf_path: Path,
output_dir: Path = Path("output"),
*,
use_llm: bool = True,
llm_provider: Literal["ollama", "mistral"] = "ollama",
llm_model: Optional[str] = None,
skip_ocr: bool = False,
ingest_to_weaviate: bool = True,
progress_callback: Optional[ProgressCallback] = None,
) -> PipelineResult:
"""
Process a PDF through the complete V2 pipeline with LLM enhancement.
This function orchestrates all 10 steps of the intelligent document processing
pipeline, from OCR to Weaviate ingestion. It supports both local (Ollama) and
cloud (Mistral API) LLM providers, with optional caching via skip_ocr.
Args:
pdf_path: Absolute path to the PDF file to process.
output_dir: Base directory for output files. Defaults to "./output".
use_llm: Enable LLM-based processing (metadata, TOC, chunking).
If False, uses basic heuristic processing.
llm_provider: LLM provider to use. "ollama" for local (free but slow),
"mistral" for API (fast but paid).
llm_model: Specific model name. If None, auto-detects based on provider
(qwen2.5:7b for ollama, mistral-small-latest for mistral).
skip_ocr: If True, reuses existing markdown file to avoid OCR cost.
Requires output_dir/<doc_name>/<doc_name>.md to exist.
ingest_to_weaviate: If True, ingests chunks into Weaviate after processing.
progress_callback: Optional callback for real-time progress updates.
Called with (step_id, status, detail) for each pipeline step.
Returns:
Dictionary containing processing results with the following keys:
- success (bool): True if processing completed without errors
- document_name (str): Name of the processed document
- pages (int): Number of pages in the PDF
- chunks_count (int): Number of chunks generated
- cost_ocr (float): OCR cost in euros (0 if skip_ocr=True)
- cost_llm (float): LLM API cost in euros (0 if provider=ollama)
- cost_total (float): Total cost (ocr + llm)
- metadata (dict): Extracted metadata (title, author, etc.)
- toc (list): Hierarchical table of contents
- files (dict): Paths to generated files (markdown, chunks, etc.)
Raises:
FileNotFoundError: If pdf_path does not exist.
ValueError: If skip_ocr=True but markdown file not found.
RuntimeError: If Weaviate connection fails during ingestion.
Examples:
Basic usage with Ollama (free):
>>> result = process_pdf_v2(
... Path("platon_menon.pdf"),
... llm_provider="ollama"
... )
>>> print(f"Cost: {result['cost_total']:.4f}€")
Cost: 0.0270€ # OCR only
With Mistral API (faster):
>>> result = process_pdf_v2(
... Path("platon_menon.pdf"),
... llm_provider="mistral",
... llm_model="mistral-small-latest"
... )
Skip OCR to avoid cost:
>>> result = process_pdf_v2(
... Path("platon_menon.pdf"),
... skip_ocr=True, # Reuses existing markdown
... ingest_to_weaviate=False
... )
Notes:
- OCR cost: ~0.003€/page (standard), ~0.009€/page (with annotations)
- LLM cost: Free with Ollama, variable with Mistral API
- Processing time: ~30s/page with Ollama, ~5s/page with Mistral
- Weaviate must be running (docker-compose up -d) before ingestion
"""
```
</function_docstring>
</documentation_examples>
</project_specification>

<project_specification>
<project_name>Claude.ai Clone - Multi-Provider Support (Mistral + Extensible)</project_name>
<overview>
This specification adds Mistral AI model support AND creates an extensible provider architecture
that makes it easy to add additional AI providers (OpenAI, Gemini, etc.) in the future.
This uses the "Open/Closed Principle" - open for extension, closed for modification.
All changes are additive and backward-compatible. Existing Claude functionality remains unchanged.
</overview>
<safety_requirements>
<critical>
- DO NOT modify existing Claude API integration code directly
- DO NOT change existing model selection logic for Claude models
- DO NOT modify existing database schema without safe migrations
- DO NOT break existing conversations or messages
- All new code must be in separate files/modules when possible
- Test thoroughly before marking issues as complete
- Maintain backward compatibility at all times
- Refactor Claude code to use BaseProvider WITHOUT changing functionality
</critical>
</safety_requirements>
<architecture_design>
<provider_pattern>
Create an abstract provider interface that all AI providers implement:
- BaseProvider (abstract class/interface) - defines common interface
- ClaudeProvider (existing code refactored to extend BaseProvider)
- MistralProvider (new, extends BaseProvider)
- OpenAIProvider (future, extends BaseProvider - easy to add)
- GeminiProvider (future, extends BaseProvider - easy to add)
</provider_pattern>
<benefits>
- Easy to add new providers without modifying existing code
- Consistent interface across all providers
- Isolated error handling per provider
- Unified model selection UI
- Shared functionality (streaming, error handling, logging)
- Future-proof architecture
</benefits>
</architecture_design>
<new_features>
<feature_provider_architecture>
<title>Extensible Provider Architecture (Foundation)</title>
<description>
Create a provider abstraction layer that allows easy addition of multiple AI providers.
This is the foundation that makes adding OpenAI, Gemini, etc. trivial in the future.
BaseProvider abstract class should define:
- sendMessage(messages, options) -&gt; Promise&lt;response&gt;
- streamMessage(messages, options) -&gt; AsyncGenerator&lt;chunk&gt;
- getModels() -&gt; Promise&lt;array&gt; of available models
- validateApiKey(key) -&gt; Promise&lt;boolean&gt;
- getCapabilities() -&gt; object with provider capabilities
- getName() -&gt; string (provider name: 'claude', 'mistral', 'openai', etc.)
- getDefaultModel() -&gt; string (default model ID for this provider)
ProviderRegistry should:
- Register all available providers
- Provide list of all providers
- Check which providers are configured (have API keys)
- Enable/disable providers
ProviderFactory should:
- Create provider instances based on model ID or provider name
- Handle provider selection logic
- Route requests to correct provider
</description>
<priority>1</priority>
<category>functional</category>
<implementation_approach>
- Create server/providers/BaseProvider.js (abstract base class)
- Refactor existing Claude code to server/providers/ClaudeProvider.js (extends BaseProvider)
- Create server/providers/ProviderRegistry.js (manages all providers)
- Create server/providers/ProviderFactory.js (creates provider instances)
- Update existing routes to use ProviderFactory instead of direct Claude calls
- Keep all provider code in server/providers/ directory
</implementation_approach>
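The spec targets Node.js modules, but the contract itself is language-agnostic; a minimal Python sketch of the same shape (method names follow the description above, the mock provider and its return values are placeholder assumptions):

```python
from abc import ABC, abstractmethod

class BaseProvider(ABC):
    """Common interface every AI provider implements."""

    @abstractmethod
    def get_name(self) -> str: ...

    @abstractmethod
    def get_default_model(self) -> str: ...

    @abstractmethod
    def send_message(self, messages: list[dict[str, str]]) -> str: ...

class ProviderRegistry:
    """Maps provider names to instances; adding a provider is one register() call."""

    def __init__(self) -> None:
        self._providers: dict[str, BaseProvider] = {}

    def register(self, provider: BaseProvider) -> None:
        self._providers[provider.get_name()] = provider

    def get(self, name: str) -> BaseProvider:
        return self._providers[name]

class EchoProvider(BaseProvider):
    """Mock provider showing that extension requires no registry changes."""

    def get_name(self) -> str:
        return "echo"

    def get_default_model(self) -> str:
        return "echo-1"

    def send_message(self, messages: list[dict[str, str]]) -> str:
        # Echo the last user message back; a real provider calls its API here.
        return messages[-1]["content"]
```

Adding a new provider means writing one subclass and one register() call, which is exactly the Open/Closed property the overview asks for.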
<test_steps>
1. Verify Claude still works after refactoring to use BaseProvider
2. Test that ProviderFactory creates ClaudeProvider correctly
3. Test that ProviderRegistry lists Claude provider
4. Verify error handling works correctly
5. Test that adding a mock provider is straightforward
6. Verify no regression in existing Claude functionality
</test_steps>
</feature_provider_architecture>
<feature_mistral_provider>
<title>Mistral Provider Implementation</title>
<description>
Implement MistralProvider extending BaseProvider. This should:
- Implement all BaseProvider abstract methods
- Handle Mistral-specific API calls (https://api.mistral.ai/v1/chat/completions)
- Support Mistral streaming (Server-Sent Events)
- Handle Mistral-specific error codes and messages
- Provide Mistral model list:
* mistral-large-latest (default)
* mistral-medium-latest
* mistral-small-latest
* mistral-7b-instruct
- Manage Mistral API authentication
- Return responses in unified format (same as Claude)
</description>
<priority>2</priority>
<category>functional</category>
<implementation_approach>
- Create server/providers/MistralProvider.js
- Extend BaseProvider class
- Implement Mistral API integration using fetch or axios
- Register in ProviderRegistry
- Use same response format as ClaudeProvider for consistency
</implementation_approach>
<test_steps>
1. Test MistralProvider.sendMessage() works with valid API key
2. Test MistralProvider.streamMessage() works
3. Test MistralProvider.getModels() returns correct models
4. Test error handling for invalid API key
5. Test error handling for API rate limits
6. Verify it integrates with ProviderFactory
7. Verify responses match expected format
</test_steps>
</feature_mistral_provider>
<feature_unified_model_selector>
<title>Unified Model Selector (All Providers)</title>
<description>
Update model selector to dynamically load models from all registered providers.
The selector should:
- Query all providers for available models via GET /api/models
- Group models by provider (Claude, Mistral, etc.)
- Display provider badges/icons next to model names
- Show which provider each model belongs to
- Filter models by provider (optional toggle)
- Show provider-specific capabilities (streaming, images, etc.)
- Only show models from providers with configured API keys
- Handle providers gracefully (show "Configure API key" if not set)
</description>
<priority>2</priority>
<category>functional</category>
<implementation_approach>
- Create API endpoint: GET /api/models (returns all models from all providers)
- Update frontend ModelSelector component to handle multiple providers
- Add provider grouping/filtering in UI
- Show provider badges/icons next to model names
- Group models by provider with collapsible sections
- Show provider status (configured/not configured)
</implementation_approach>
<test_steps>
1. Verify model selector shows Claude models (existing functionality)
2. Verify model selector shows Mistral models (if key configured)
3. Test grouping by provider works
4. Test filtering by provider works
5. Verify provider badges display correctly
6. Test that providers without API keys show "Configure" message
7. Verify selecting a model works for both providers
</test_steps>
</feature_unified_model_selector>
<feature_provider_settings>
<title>Multi-Provider API Key Management</title>
<description>
Create unified API key management that supports multiple providers. Users should be able to:
- Manage API keys for each provider separately (Claude, Mistral, OpenAI, etc.)
- See which providers are available
- See which providers are configured (have API keys)
- Test each provider's API key independently
- Enable/disable providers (hide models if key not configured)
- See provider status indicators (configured/not configured/error)
- Update or remove API keys for any provider
- See usage statistics per provider
</description>
<priority>2</priority>
<category>functional</category>
<implementation_approach>
- Create server/routes/providers.js with unified provider management
- Update settings UI to show provider cards (one per provider)
- Each provider card has:
* Provider name and logo/icon
* API key input field (masked)
* "Test Connection" button
* Status indicator (green/yellow/red)
* Enable/disable toggle
- Store keys in api_keys table with key_name = 'claude_api_key', 'mistral_api_key', etc.
- Use same encryption method for all providers
</implementation_approach>
<test_steps>
1. Configure Claude API key (verify existing functionality still works)
2. Configure Mistral API key
3. Verify both keys are stored separately
4. Test each provider's "Test Connection" button
5. Remove one key and verify only that provider's models are hidden
6. Verify provider status indicators update correctly
7. Test that disabling a provider hides its models
</test_steps>
</feature_provider_settings>
<feature_database_provider_support>
<title>Database Support for Multiple Providers (Future-Proof)</title>
<description>
Update database schema to support multiple providers in a future-proof way.
This should:
- Add provider field to conversations table (TEXT, default: 'claude')
- Add provider field to messages/usage_tracking (TEXT, default: 'claude')
- Use TEXT field (not ENUM) to allow easy addition of new providers without schema changes
- Migration should be safe, idempotent, and backward compatible
- All existing records default to 'claude' provider
- Add indexes for performance on provider queries
</description>
<priority>1</priority>
<category>functional</category>
<implementation_approach>
- Create migration: server/migrations/add_provider_support.sql
- Use TEXT field (not ENUM) for provider name (allows 'claude', 'mistral', 'openai', etc.)
- Default all existing records to 'claude'
- Add indexes on provider columns for performance
- Make migration idempotent (can run multiple times safely)
- Create rollback script if needed
</implementation_approach>
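The spec names an SQL migration file; the idempotency requirement can be sketched with stdlib sqlite3 (table and column names follow the spec, and the guard query before ALTER TABLE is the key point):

```python
import sqlite3

def add_provider_column(conn: sqlite3.Connection, table: str) -> None:
    """Add a TEXT provider column defaulting to 'claude', safely re-runnable.

    `table` must come from trusted migration code, not user input,
    since it is interpolated directly into the SQL.
    """
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    if "provider" not in cols:  # guard makes the migration idempotent
        conn.execute(
            f"ALTER TABLE {table} ADD COLUMN provider TEXT NOT NULL DEFAULT 'claude'"
        )
    conn.execute(
        f"CREATE INDEX IF NOT EXISTS idx_{table}_provider ON {table}(provider)"
    )
    conn.commit()
```

Because the column carries a non-null default, every existing row is backfilled with 'claude' automatically, satisfying the backward-compatibility requirement.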
<test_steps>
1. Backup existing database
2. Run migration script
3. Verify all existing conversations have provider='claude'
4. Verify all existing messages have provider='claude' (via usage_tracking)
5. Create new conversation with Mistral provider
6. Verify provider='mistral' is saved correctly
7. Query conversations by provider (test index performance)
8. Verify existing Claude conversations still work
9. Test rollback script if needed
</test_steps>
</feature_database_provider_support>
<feature_unified_chat_endpoint>
<title>Unified Chat Endpoint (Works with Any Provider)</title>
<description>
Update chat endpoints to use ProviderFactory, making them work with any provider.
The endpoint should:
- Accept provider or model ID in request
- Use ProviderFactory to get correct provider
- Route request to appropriate provider
- Return unified response format
- Handle provider-specific errors gracefully
- Support streaming for all providers that support it
</description>
<priority>1</priority>
<category>functional</category>
<implementation_approach>
- Update POST /api/chat to use ProviderFactory
- Update POST /api/chat/stream to use ProviderFactory
- Extract provider from model ID or accept provider parameter
- Route to correct provider instance
- Return unified response format
</implementation_approach>
<test_steps>
1. Test POST /api/chat with Claude model (verify no regression)
2. Test POST /api/chat with Mistral model
3. Test POST /api/chat/stream with Claude (verify streaming still works)
4. Test POST /api/chat/stream with Mistral
5. Test error handling for invalid provider
6. Test error handling for missing API key
</test_steps>
</feature_unified_chat_endpoint>
</new_features>
<future_extensibility>
<openai_provider_example>
<title>How to Add OpenAI in the Future</title>
<description>
To add OpenAI support later, simply follow these steps (NO changes to existing code needed):
1. Create server/providers/OpenAIProvider.js extending BaseProvider
2. Implement OpenAI API calls (https://api.openai.com/v1/chat/completions)
3. Register in ProviderRegistry: ProviderRegistry.register('openai', OpenAIProvider)
4. That's it! OpenAI models will automatically appear in the model selector.
Example OpenAIProvider structure:
- Extends BaseProvider
- Implements sendMessage() using OpenAI API
- Implements streamMessage() for streaming support
- Returns models: gpt-4, gpt-3.5-turbo, etc.
- Handles OpenAI-specific authentication and errors
</description>
</openai_provider_example>
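The registration pattern above can be sketched in a few lines. The `BaseProvider` and `ProviderRegistry` shapes shown here are assumptions for illustration, and the OpenAI API call is stubbed out rather than implemented:

```javascript
// Minimal sketch of the provider pattern: an abstract base class, a
// concrete provider, and a one-line registration. Shapes are assumed.
class BaseProvider {
  getModels() {
    throw new Error('getModels() must be implemented by subclass');
  }
  async sendMessage(model, messages) {
    throw new Error('sendMessage() must be implemented by subclass');
  }
}

class OpenAIProvider extends BaseProvider {
  constructor(apiKey) {
    super();
    this.apiKey = apiKey;
  }
  getModels() {
    return ['gpt-4', 'gpt-3.5-turbo'];
  }
  async sendMessage(model, messages) {
    // A real implementation would POST to
    // https://api.openai.com/v1/chat/completions using this.apiKey.
    return { provider: 'openai', model, content: '(stubbed response)' };
  }
}

const ProviderRegistry = {
  providers: new Map(),
  register(name, ProviderClass) {
    this.providers.set(name, ProviderClass);
  },
};

// The single registration step the spec describes:
ProviderRegistry.register('openai', OpenAIProvider);
```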
<other_providers>
<note>
Same pattern works for any AI provider:
- Google Gemini (GeminiProvider)
- Cohere (CohereProvider)
- Any other AI API that follows similar patterns
Just create a new Provider class extending BaseProvider and register it.
</note>
</other_providers>
</future_extensibility>
<implementation_notes>
<code_structure>
server/
providers/
BaseProvider.js # Abstract base class (NEW)
ClaudeProvider.js # Refactored Claude (extends BaseProvider)
MistralProvider.js # New Mistral (extends BaseProvider)
ProviderRegistry.js # Manages all providers (NEW)
ProviderFactory.js # Creates provider instances (NEW)
routes/
providers.js # Unified provider management (NEW)
chat.js # Updated to use ProviderFactory
migrations/
add_provider_support.sql # Database migration (NEW)
</code_structure>
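The ProviderFactory named in the layout above might look like the following sketch: it resolves a provider class from the registry and caches one instance per provider. The constructor arguments and registry shape are assumptions:

```javascript
// Sketch of ProviderFactory.js: look a provider up in a registry
// (Map of name -> Provider class) and reuse one instance per provider.
class ProviderFactory {
  constructor(registry, apiKeys) {
    this.registry = registry; // e.g. new Map([['claude', ClaudeProvider]])
    this.apiKeys = apiKeys;   // e.g. { claude: '...', mistral: '...' }
    this.instances = new Map();
  }

  create(name) {
    if (this.instances.has(name)) return this.instances.get(name);
    const ProviderClass = this.registry.get(name);
    if (!ProviderClass) throw new Error(`Unknown provider: ${name}`);
    const instance = new ProviderClass(this.apiKeys[name]);
    this.instances.set(name, instance);
    return instance;
  }
}
```

Caching instances keeps provider state (clients, rate-limit bookkeeping) in one place while preserving the isolation the safety guidelines call for.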
<safety_guidelines>
- Refactor Claude code to use BaseProvider WITHOUT changing functionality
- All providers are isolated - errors in one don't affect others
- Database changes are backward compatible (TEXT field, not ENUM)
- Existing conversations default to 'claude' provider
- Test Claude thoroughly after refactoring
- Use feature flags if needed to enable/disable providers
- Log all provider operations separately for debugging
</safety_guidelines>
<error_handling>
- Each provider handles its own errors
- Provider errors should NOT affect other providers
- Show user-friendly error messages
- Log errors with provider context
- Don't throw unhandled exceptions
</error_handling>
</implementation_notes>
<database_changes>
<safe_migrations>
<migration_1>
<description>Add provider support (TEXT field for extensibility)</description>
<sql>
-- Add provider column to conversations (TEXT allows any provider name)
-- Default to 'claude' for backward compatibility
ALTER TABLE conversations
ADD COLUMN provider TEXT DEFAULT 'claude';
-- Add provider column to usage_tracking
ALTER TABLE usage_tracking
ADD COLUMN provider TEXT DEFAULT 'claude';
-- Add indexes for performance
CREATE INDEX IF NOT EXISTS idx_conversations_provider
ON conversations(provider);
CREATE INDEX IF NOT EXISTS idx_usage_tracking_provider
ON usage_tracking(provider);
</sql>
<rollback>
-- Rollback script (use with caution - may cause data issues)
DROP INDEX IF EXISTS idx_conversations_provider;
DROP INDEX IF EXISTS idx_usage_tracking_provider;
-- Note: SQLite only supports DROP COLUMN from version 3.35 (2021);
-- on older versions the table must be recreated without the provider column.
</rollback>
<note>
Using TEXT instead of ENUM allows adding new providers (OpenAI, Gemini, etc.)
without database schema changes in the future. This is future-proof.
</note>
</migration_1>
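Since SQLite's ALTER TABLE ... ADD COLUMN fails if the column already exists, the idempotency requirement can be met by checking PRAGMA table_info first. The helper below is a pure sketch; with better-sqlite3, `rows` would come from `db.prepare("PRAGMA table_info(conversations)").all()`:

```javascript
// Sketch: guard the ALTER TABLE so the migration can run repeatedly.
// `rows` is the output of PRAGMA table_info(<table>): objects with a
// `name` field per column.
function hasColumn(rows, columnName) {
  return rows.some((row) => row.name === columnName);
}

function providerMigrationSql(rows) {
  const statements = [];
  if (!hasColumn(rows, 'provider')) {
    statements.push(
      "ALTER TABLE conversations ADD COLUMN provider TEXT DEFAULT 'claude'"
    );
  }
  // CREATE INDEX IF NOT EXISTS is already idempotent, so it always runs.
  statements.push(
    'CREATE INDEX IF NOT EXISTS idx_conversations_provider ON conversations(provider)'
  );
  return statements;
}
```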
</safe_migrations>
<data_integrity>
- All existing conversations default to provider='claude'
- All existing messages default to provider='claude'
- Migration is idempotent (can run multiple times safely)
- No data loss during migration
- Existing queries continue to work
</data_integrity>
</database_changes>
<api_endpoints>
<new_endpoints>
- GET /api/models - Get all models from all configured providers
- GET /api/providers - Get list of available providers and their status
- POST /api/providers/:provider/key - Set API key for specific provider
- POST /api/providers/:provider/test - Test provider API key
- GET /api/providers/:provider/status - Get provider configuration status
- DELETE /api/providers/:provider/key - Remove provider API key
</new_endpoints>
<updated_endpoints>
- POST /api/chat - Updated to use ProviderFactory (works with any provider)
* Accepts: { model: 'model-id', messages: [...], ... }
* Provider is determined from model ID or can be specified
- POST /api/chat/stream - Updated to use ProviderFactory (streaming for any provider)
* Same interface, works with any provider that supports streaming
</updated_endpoints>
</api_endpoints>
<dependencies>
<backend>
- No new dependencies required (use native fetch for Mistral API)
- Optional: @mistralai/mistralai (only if provides significant value)
- Keep dependencies minimal to avoid conflicts
</backend>
</dependencies>
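The "native fetch, no SDK" approach can be sketched as a request builder. The endpoint URL and payload shape follow Mistral's chat-completions API as commonly documented; treat both as assumptions to verify against the current API reference:

```javascript
// Sketch: build a fetch request for Mistral's chat completions API
// without any SDK. Endpoint and payload shape are assumptions.
function buildMistralRequest(apiKey, model, messages) {
  return {
    url: 'https://api.mistral.ai/v1/chat/completions',
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}

// Usage inside MistralProvider.sendMessage():
//   const { url, options } = buildMistralRequest(key, 'mistral-small-latest', msgs);
//   const res = await fetch(url, options);
```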
<testing_requirements>
<regression_tests>
- Verify all existing Claude functionality still works
- Test that existing conversations load correctly
- Verify Claude model selection still works
- Test Claude API endpoints are unaffected
- Verify database queries for Claude still work
- Test Claude streaming still works
</regression_tests>
<integration_tests>
- Test switching between Claude and Mistral models
- Test conversations with different providers
- Test error handling doesn't affect other providers
- Test migration doesn't break existing data
- Test ProviderFactory routes correctly
- Test unified model selector with multiple providers
</integration_tests>
<extensibility_tests>
- Verify adding a mock provider is straightforward
- Test that ProviderFactory correctly routes to providers
- Verify provider isolation (errors don't propagate)
- Test that new providers automatically appear in UI
</extensibility_tests>
</testing_requirements>
<success_criteria>
<functionality>
- Claude functionality works exactly as before (no regression)
- Mistral models appear in selector and work correctly
- Users can switch between Claude and Mistral seamlessly
- API key management works for both providers
- Database migration is safe and backward compatible
</functionality>
<extensibility>
- Adding a new provider (like OpenAI) requires only creating one new file
- No changes needed to existing code when adding providers
- Provider architecture is documented and easy to follow
- Code is organized and maintainable
</extensibility>
</success_criteria>
</project_specification>

prompts/app_spec_model.txt Normal file

@@ -0,0 +1,681 @@
<project_specification>
<project_name>Claude.ai Clone - AI Chat Interface</project_name>
<overview>
Build a fully functional clone of claude.ai, Anthropic's conversational AI interface. The application should
provide a clean, modern chat interface for interacting with Claude via the API, including features like
conversation management, artifact rendering, project organization, multiple model selection, and advanced
settings. The UI should closely match claude.ai's design using Tailwind CSS with a focus on excellent
user experience and responsive design.
</overview>
<technology_stack>
<api_key>
You can use an API key located at /tmp/api-key for testing. You will not be allowed to read this file, but you can reference it in code.
</api_key>
<frontend>
<framework>React with Vite</framework>
<styling>Tailwind CSS (via CDN)</styling>
<state_management>React hooks and context</state_management>
<routing>React Router for navigation</routing>
<markdown>React Markdown for message rendering</markdown>
<code_highlighting>Syntax highlighting for code blocks</code_highlighting>
<port>Only launch on port {frontend_port}</port>
</frontend>
<backend>
<runtime>Node.js with Express</runtime>
<database>SQLite with better-sqlite3</database>
<api_integration>Claude API for chat completions</api_integration>
<streaming>Server-Sent Events for streaming responses</streaming>
</backend>
<communication>
<api>RESTful endpoints</api>
<streaming>SSE for real-time message streaming</streaming>
<claude_api>Integration with Claude API using Anthropic SDK</claude_api>
</communication>
</technology_stack>
<prerequisites>
<environment_setup>
- Repository includes .env with VITE_ANTHROPIC_API_KEY configured
- Frontend dependencies pre-installed via pnpm
- Backend code goes in /server directory
- Install backend dependencies as needed
</environment_setup>
</prerequisites>
<core_features>
<chat_interface>
- Clean, centered chat layout with message bubbles
- Streaming message responses with typing indicator
- Markdown rendering with proper formatting
- Code blocks with syntax highlighting and copy button
- LaTeX/math equation rendering
- Image upload and display in messages
- Multi-turn conversations with context
- Message editing and regeneration
- Stop generation button during streaming
- Input field with auto-resize textarea
- Character count and token estimation
- Keyboard shortcuts (Enter to send, Shift+Enter for newline)
</chat_interface>
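Two of the behaviors listed above (keyboard shortcuts and token estimation) can be kept as pure helpers so the React components stay thin. The ~4 characters per token ratio is a rough heuristic, not a real tokenizer:

```javascript
// Sketch: Enter sends, Shift+Enter inserts a newline. The argument
// mirrors the DOM KeyboardEvent fields the function reads.
function keyAction(event) {
  if (event.key !== 'Enter') return 'none';
  return event.shiftKey ? 'newline' : 'send';
}

// Rough token estimate for the character counter (~4 chars/token).
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}
```

In the component, the textarea's onKeyDown handler would call `keyAction(e)` and only `e.preventDefault()` plus submit when it returns `'send'`.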
<artifacts>
- Artifact detection and rendering in side panel
- Code artifact viewer with syntax highlighting
- HTML/SVG preview with live rendering
- React component preview
- Mermaid diagram rendering
- Text document artifacts
- Artifact editing and re-prompting
- Full-screen artifact view
- Download artifact content
- Artifact versioning and history
</artifacts>
<conversation_management>
- Create new conversations
- Conversation list in sidebar
- Rename conversations
- Delete conversations
- Search conversations by title/content
- Pin important conversations
- Archive conversations
- Conversation folders/organization
- Duplicate conversation
- Export conversation (JSON, Markdown, PDF)
- Conversation timestamps (created, last updated)
- Unread message indicators
</conversation_management>
<projects>
- Create projects to group related conversations
- Project knowledge base (upload documents)
- Project-specific custom instructions
- Share projects with team (mock feature)
- Project settings and configuration
- Move conversations between projects
- Project templates
- Project analytics (usage stats)
</projects>
<model_selection>
- Model selector dropdown with the following models:
- Claude Sonnet 4.5 (claude-sonnet-4-5-20250929) - default
- Claude Haiku 4.5 (claude-haiku-4-5-20251001)
- Claude Opus 4.1 (claude-opus-4-1-20250805)
- Model capabilities display
- Context window indicator
- Model-specific pricing info (display only)
- Switch models mid-conversation
- Model comparison view
</model_selection>
<custom_instructions>
- Global custom instructions
- Project-specific custom instructions
- Conversation-specific system prompts
- Custom instruction templates
- Preview how instructions affect responses
</custom_instructions>
<settings_preferences>
- Theme selection (Light, Dark, Auto)
- Font size adjustment
- Message density (compact, comfortable, spacious)
- Code theme selection
- Language preferences
- Accessibility options
- Keyboard shortcuts reference
- Data export options
- Privacy settings
- API key management
</settings_preferences>
<advanced_features>
- Temperature control slider
- Max tokens adjustment
- Top-p (nucleus sampling) control
- System prompt override
- Thinking/reasoning mode toggle
- Multi-modal input (text + images)
- Voice input (optional, mock UI)
- Response suggestions
- Related prompts
- Conversation branching
</advanced_features>
<collaboration>
- Share conversation via link (read-only)
- Export conversation formats
- Conversation templates
- Prompt library
- Share artifacts
- Team workspaces (mock UI)
</collaboration>
<search_discovery>
- Search across all conversations
- Filter by project, date, model
- Prompt library with categories
- Example conversations
- Quick actions menu
- Command palette (Cmd/Ctrl+K)
</search_discovery>
<usage_tracking>
- Token usage display per message
- Conversation cost estimation
- Daily/monthly usage dashboard
- Usage limits and warnings
- API quota tracking
</usage_tracking>
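The cost estimation above reduces to a per-model rate table and a small function. The per-million-token rates below are placeholders for illustration, not authoritative pricing:

```javascript
// Sketch: estimate message cost from token counts. Rates are
// placeholder USD-per-1M-token values, not real pricing data.
const RATES = {
  'claude-sonnet-4-5-20250929': { input: 3.0, output: 15.0 },
};

function estimateCost(model, inputTokens, outputTokens) {
  const rate = RATES[model];
  if (!rate) return null; // unknown model: show no estimate rather than a wrong one
  return (inputTokens * rate.input + outputTokens * rate.output) / 1_000_000;
}
```

Storing the computed estimate per message in `usage_tracking.cost_estimate` lets the daily/monthly dashboards aggregate with a single SUM query.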
<onboarding>
- Welcome screen for new users
- Feature tour highlights
- Example prompts to get started
- Quick tips and best practices
- Keyboard shortcuts tutorial
</onboarding>
<accessibility>
- Full keyboard navigation
- Screen reader support
- ARIA labels and roles
- High contrast mode
- Focus management
- Reduced motion support
</accessibility>
<responsive_design>
- Mobile-first responsive layout
- Touch-optimized interface
- Collapsible sidebar on mobile
- Swipe gestures for navigation
- Adaptive artifact display
- Progressive Web App (PWA) support
</responsive_design>
</core_features>
<database_schema>
<tables>
<users>
- id, email, name, avatar_url
- created_at, last_login
- preferences (JSON: theme, font_size, etc.)
- custom_instructions
</users>
<projects>
- id, user_id, name, description, color
- custom_instructions, knowledge_base_path
- created_at, updated_at
- is_archived, is_pinned
</projects>
<conversations>
- id, user_id, project_id, title
- model, created_at, updated_at, last_message_at
- is_archived, is_pinned, is_deleted
- settings (JSON: temperature, max_tokens, etc.)
- token_count, message_count
</conversations>
<messages>
- id, conversation_id, role (user/assistant/system)
- content, created_at, edited_at
- tokens, finish_reason
- images (JSON array of image data)
- parent_message_id (for branching)
</messages>
<artifacts>
- id, message_id, conversation_id
- type (code/html/svg/react/mermaid/text)
- title, identifier, language
- content, version
- created_at, updated_at
</artifacts>
<shared_conversations>
- id, conversation_id, share_token
- created_at, expires_at, view_count
- is_public
</shared_conversations>
<prompt_library>
- id, user_id, title, description
- prompt_template, category, tags (JSON)
- is_public, usage_count
- created_at, updated_at
</prompt_library>
<conversation_folders>
- id, user_id, project_id, name, parent_folder_id
- created_at, position
</conversation_folders>
<conversation_folder_items>
- id, folder_id, conversation_id
</conversation_folder_items>
<usage_tracking>
- id, user_id, conversation_id, message_id
- model, input_tokens, output_tokens
- cost_estimate, created_at
</usage_tracking>
<api_keys>
- id, user_id, key_name, api_key_hash
- created_at, last_used_at
- is_active
</api_keys>
</tables>
</database_schema>
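As a sketch, the two central tables above translate to SQLite DDL like the following; column types are inferred from the field lists and should be adjusted as needed:

```javascript
// Sketch: DDL for the conversations and messages tables, inferred from
// the schema listing. Run once at startup with better-sqlite3:
//   new Database('app.db').exec(SCHEMA)
const SCHEMA = `
CREATE TABLE IF NOT EXISTS conversations (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  user_id INTEGER NOT NULL,
  project_id INTEGER,
  title TEXT,
  model TEXT,
  created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
  updated_at DATETIME DEFAULT CURRENT_TIMESTAMP,
  last_message_at DATETIME,
  is_archived INTEGER DEFAULT 0,
  is_pinned INTEGER DEFAULT 0,
  is_deleted INTEGER DEFAULT 0,
  settings TEXT,             -- JSON: temperature, max_tokens, ...
  token_count INTEGER DEFAULT 0,
  message_count INTEGER DEFAULT 0
);
CREATE TABLE IF NOT EXISTS messages (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  conversation_id INTEGER NOT NULL REFERENCES conversations(id),
  role TEXT CHECK (role IN ('user', 'assistant', 'system')),
  content TEXT,
  created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
  edited_at DATETIME,
  tokens INTEGER,
  finish_reason TEXT,
  images TEXT,               -- JSON array of image data
  parent_message_id INTEGER  -- for conversation branching
);
`;
```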
<api_endpoints_summary>
<authentication>
- POST /api/auth/login
- POST /api/auth/logout
- GET /api/auth/me
- PUT /api/auth/profile
</authentication>
<conversations>
- GET /api/conversations
- POST /api/conversations
- GET /api/conversations/:id
- PUT /api/conversations/:id
- DELETE /api/conversations/:id
- POST /api/conversations/:id/duplicate
- POST /api/conversations/:id/export
- PUT /api/conversations/:id/archive
- PUT /api/conversations/:id/pin
- POST /api/conversations/:id/branch
</conversations>
<messages>
- GET /api/conversations/:id/messages
- POST /api/conversations/:id/messages
- PUT /api/messages/:id
- DELETE /api/messages/:id
- POST /api/messages/:id/regenerate
- GET /api/messages/stream (SSE endpoint)
</messages>
<artifacts>
- GET /api/conversations/:id/artifacts
- GET /api/artifacts/:id
- PUT /api/artifacts/:id
- DELETE /api/artifacts/:id
- POST /api/artifacts/:id/fork
- GET /api/artifacts/:id/versions
</artifacts>
<projects>
- GET /api/projects
- POST /api/projects
- GET /api/projects/:id
- PUT /api/projects/:id
- DELETE /api/projects/:id
- POST /api/projects/:id/knowledge
- GET /api/projects/:id/conversations
- PUT /api/projects/:id/settings
</projects>
<sharing>
- POST /api/conversations/:id/share
- GET /api/share/:token
- DELETE /api/share/:token
- PUT /api/share/:token/settings
</sharing>
<prompts>
- GET /api/prompts/library
- POST /api/prompts/library
- GET /api/prompts/:id
- PUT /api/prompts/:id
- DELETE /api/prompts/:id
- GET /api/prompts/categories
- GET /api/prompts/examples
</prompts>
<search>
- GET /api/search/conversations?q=query
- GET /api/search/messages?q=query
- GET /api/search/artifacts?q=query
- GET /api/search/prompts?q=query
</search>
<folders>
- GET /api/folders
- POST /api/folders
- PUT /api/folders/:id
- DELETE /api/folders/:id
- POST /api/folders/:id/items
- DELETE /api/folders/:id/items/:conversationId
</folders>
<usage>
- GET /api/usage/daily
- GET /api/usage/monthly
- GET /api/usage/by-model
- GET /api/usage/conversations/:id
</usage>
<settings>
- GET /api/settings
- PUT /api/settings
- GET /api/settings/custom-instructions
- PUT /api/settings/custom-instructions
</settings>
<claude_api>
- POST /api/claude/chat (proxy to Claude API)
- POST /api/claude/chat/stream (streaming proxy)
- GET /api/claude/models
- POST /api/claude/images/upload
</claude_api>
</api_endpoints_summary>
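The streaming endpoints above would emit the Server-Sent Events wire format: each event is a `data:` line followed by a blank line. A minimal formatter, with the `[DONE]` terminator shown as a convention rather than a requirement:

```javascript
// Sketch: format one SSE event for the streaming endpoints.
function sseChunk(payload) {
  return `data: ${JSON.stringify(payload)}\n\n`;
}

const SSE_DONE = 'data: [DONE]\n\n'; // terminator convention (an assumption)

// In an Express handler:
//   res.setHeader('Content-Type', 'text/event-stream');
//   res.write(sseChunk({ delta: 'Hello' }));
//   res.write(SSE_DONE);
//   res.end();
```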
<ui_layout>
<main_structure>
- Three-column layout: sidebar (conversations), main (chat), panel (artifacts)
- Collapsible sidebar with resize handle
- Responsive breakpoints: mobile (single column), tablet (two column), desktop (three column)
- Persistent header with project/model selector
- Bottom input area with send button and options
</main_structure>
<sidebar_left>
- New chat button (prominent)
- Project selector dropdown
- Search conversations input
- Conversations list (grouped by date: Today, Yesterday, Previous 7 days, etc.)
- Folder tree view (collapsible)
- Settings gear icon at bottom
- User profile at bottom
</sidebar_left>
<main_chat_area>
- Conversation title (editable inline)
- Model selector badge
- Message history (scrollable)
- Welcome screen for new conversations
- Suggested prompts (empty state)
- Input area with formatting toolbar
- Attachment button for images
- Send button with loading state
- Stop generation button
</main_chat_area>
<artifacts_panel>
- Artifact header with title and type badge
- Code editor or preview pane
- Tabs for multiple artifacts
- Full-screen toggle
- Download button
- Edit/Re-prompt button
- Version selector
- Close panel button
</artifacts_panel>
<modals_overlays>
- Settings modal (tabbed interface)
- Share conversation modal
- Export options modal
- Project settings modal
- Prompt library modal
- Command palette overlay
- Keyboard shortcuts reference
</modals_overlays>
</ui_layout>
<design_system>
<color_palette>
- Primary: Orange/amber accent (#CC785C claude-style)
- Background: White (light mode), Dark gray (#1A1A1A dark mode)
- Surface: Light gray (#F5F5F5 light), Darker gray (#2A2A2A dark)
- Text: Near black (#1A1A1A light), Off-white (#E5E5E5 dark)
- Borders: Light gray (#E5E5E5 light), Dark gray (#404040 dark)
- Code blocks: Monaco editor theme
</color_palette>
<typography>
- Sans-serif system font stack (Inter, SF Pro, Roboto, system-ui)
- Headings: font-semibold
- Body: font-normal, leading-relaxed
- Code: Monospace (JetBrains Mono, Consolas, Monaco)
- Message text: text-base (16px), comfortable line-height
</typography>
<components>
<message_bubble>
- User messages: Right-aligned, subtle background
- Assistant messages: Left-aligned, no background
- Markdown formatting with proper spacing
- Inline code with bg-gray-100 background
- Code blocks with syntax highlighting
- Copy button on code blocks
</message_bubble>
<buttons>
- Primary: Orange/amber background, white text, rounded
- Secondary: Border style with hover fill
- Icon buttons: Square with hover background
- Disabled state: Reduced opacity, no pointer events
</buttons>
<inputs>
- Rounded borders with focus ring
- Textarea auto-resize
- Placeholder text in gray
- Error states in red
- Character counter
</inputs>
<cards>
- Subtle border or shadow
- Rounded corners (8px)
- Padding: p-4 to p-6
- Hover state: slight shadow increase
</cards>
</components>
<animations>
- Smooth transitions (150-300ms)
- Fade in for new messages
- Slide in for sidebar
- Typing indicator animation
- Loading spinner for generation
- Skeleton loaders for content
</animations>
</design_system>
<key_interactions>
<message_flow>
1. User types message in input field
2. Optional: Attach images via button
3. Click send or press Enter
4. Message appears in chat immediately
5. Typing indicator shows while waiting
6. Response streams in word by word
7. Code blocks render with syntax highlighting
8. Artifacts detected and rendered in side panel
9. Message complete, enable regenerate option
</message_flow>
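On the client side, step 6 (the response streaming in word by word) means accumulating raw SSE text from the response body and extracting completed events while keeping any partial tail for the next chunk. A pure sketch of that parser:

```javascript
// Sketch: pull completed `data:` events out of an SSE text buffer.
// Returns the parsed event payloads plus the unfinished tail to keep.
function drainSseBuffer(buffer) {
  const events = [];
  const parts = buffer.split('\n\n');
  const rest = parts.pop(); // the last piece may be an incomplete event
  for (const part of parts) {
    const line = part.trim();
    if (line.startsWith('data: ')) events.push(line.slice(6));
  }
  return { events, rest };
}
```

The chat component would append each decoded network chunk to a buffer, call `drainSseBuffer`, render the events, and carry `rest` forward.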
<artifact_flow>
1. Assistant generates artifact in response
2. Artifact panel slides in from right
3. Content renders (code with highlighting or live preview)
4. User can edit artifact inline
5. "Re-prompt" button to iterate with Claude
6. Download or copy artifact content
7. Full-screen mode for detailed work
8. Close panel to return to chat focus
</artifact_flow>
<conversation_management>
1. Click "New Chat" to start fresh conversation
2. Conversations auto-save with first message
3. Auto-generate title from first exchange
4. Click title to rename inline
5. Drag conversations into folders
6. Right-click for context menu (pin, archive, delete, export)
7. Search filters conversations in real-time
8. Click conversation to switch context
</conversation_management>
</key_interactions>
<implementation_steps>
<step number="1">
<title>Setup Project Foundation and Database</title>
<tasks>
- Initialize Express server with SQLite database
- Set up Claude API client with streaming support
- Create database schema with migrations
- Implement authentication endpoints
- Set up basic CORS and middleware
- Create health check endpoint
</tasks>
</step>
<step number="2">
<title>Build Core Chat Interface</title>
<tasks>
- Create main layout with sidebar and chat area
- Implement message display with markdown rendering
- Add streaming message support with SSE
- Build input area with auto-resize textarea
- Add code block syntax highlighting
- Implement stop generation functionality
- Add typing indicators and loading states
</tasks>
</step>
<step number="3">
<title>Conversation Management</title>
<tasks>
- Create conversation list in sidebar
- Implement new conversation creation
- Add conversation switching
- Build conversation rename functionality
- Implement delete with confirmation
- Add conversation search
- Create conversation grouping by date
</tasks>
</step>
<step number="4">
<title>Artifacts System</title>
<tasks>
- Build artifact detection from Claude responses
- Create artifact rendering panel
- Implement code artifact viewer
- Add HTML/SVG live preview
- Build artifact editing interface
- Add artifact versioning
- Implement full-screen artifact view
</tasks>
</step>
<step number="5">
<title>Projects and Organization</title>
<tasks>
- Create projects CRUD endpoints
- Build project selector UI
- Implement project-specific custom instructions
- Add folder system for conversations
- Create drag-and-drop organization
- Build project settings panel
</tasks>
</step>
<step number="6">
<title>Advanced Features</title>
<tasks>
- Add model selection dropdown
- Implement temperature and parameter controls
- Build image upload functionality
- Create message editing and regeneration
- Add conversation branching
- Implement export functionality
</tasks>
</step>
<step number="7">
<title>Settings and Customization</title>
<tasks>
- Build settings modal with tabs
- Implement theme switching (light/dark)
- Add custom instructions management
- Create keyboard shortcuts
- Build prompt library
- Add usage tracking dashboard
</tasks>
</step>
<step number="8">
<title>Sharing and Collaboration</title>
<tasks>
- Implement conversation sharing with tokens
- Create public share view
- Add export to multiple formats
- Build prompt templates
- Create example conversations
</tasks>
</step>
<step number="9">
<title>Polish and Optimization</title>
<tasks>
- Optimize for mobile responsiveness
- Add command palette (Cmd+K)
- Implement comprehensive keyboard navigation
- Add onboarding flow
- Create accessibility improvements
- Performance optimization and caching
</tasks>
</step>
</implementation_steps>
<success_criteria>
<functionality>
- Streaming chat responses work smoothly
- Artifact detection and rendering accurate
- Conversation management intuitive and reliable
- Project organization clear and useful
- Image upload and display working
- All CRUD operations functional
</functionality>
<user_experience>
- Interface matches claude.ai design language
- Responsive on all device sizes
- Smooth animations and transitions
- Fast response times and minimal lag
- Intuitive navigation and workflows
- Clear feedback for all actions
</user_experience>
<technical_quality>
- Clean, maintainable code structure
- Proper error handling throughout
- Secure API key management
- Optimized database queries
- Efficient streaming implementation
- Comprehensive testing coverage
</technical_quality>
<design_polish>
- Consistent with claude.ai visual design
- Beautiful typography and spacing
- Smooth animations and micro-interactions
- Excellent contrast and accessibility
- Professional, polished appearance
- Dark mode fully implemented
</design_polish>
</success_criteria>
</project_specification>


@@ -1,134 +0,0 @@
<project_specification>
<project_name>Your Application Name</project_name>
<overview>
Complete description of your application. Explain:
- What the application does
- Who the target users are
- The main goals
- The key features, in a few sentences
</overview>
<technology_stack>
<api_key>
Note: You can use an API key located at /tmp/api-key for testing.
You will not be able to read this file, but you can reference it in code.
</api_key>
<frontend>
<framework>React with Vite</framework>
<styling>Tailwind CSS (via CDN)</styling>
<state_management>React hooks and context</state_management>
<routing>React Router for navigation</routing>
<port>Launch only on port {frontend_port}</port>
</frontend>
<backend>
<runtime>Node.js with Express</runtime>
<database>SQLite with better-sqlite3</database>
<api_integration>Integration with the required APIs</api_integration>
<streaming>Server-Sent Events for real-time responses (if needed)</streaming>
</backend>
<communication>
<api>RESTful endpoints</api>
<streaming>SSE for real-time streaming (if needed)</streaming>
</communication>
</technology_stack>
<prerequisites>
<environment_setup>
- Repository includes .env with the API keys configured
- Frontend dependencies pre-installed via npm/pnpm
- Backend code in the /server directory
- Install backend dependencies as needed
</environment_setup>
</prerequisites>
<core_features>
<!--
IMPORTANT: Create a <feature_X> tag for each feature.
The initializer agent will create a Linear issue for each feature.
Use sequential numbers: feature_1, feature_2, feature_3, etc.
-->
<feature_1>
<title>Feature 1 Name</title>
<description>
Detailed description of what this feature does.
Include the important technical details, the use cases, and the
interactions with other parts of the application.
</description>
<priority>1</priority>
<category>frontend</category>
<test_steps>
1. Test step 1 - describe the action
2. Test step 2 - describe the action
3. Test step 3 - verify the expected result
4. Test step 4 - test the error cases
</test_steps>
</feature_1>
<feature_2>
<title>Feature 2 Name</title>
<description>
Description of feature 2...
</description>
<priority>2</priority>
<category>backend</category>
<test_steps>
1. Test step 1
2. Test step 2
</test_steps>
</feature_2>
<!--
Keep adding features...
The agent will create about 50 issues in total, so detail your features
well, splitting them into sub-features if necessary.
-->
</core_features>
<ui_design>
<!--
Optional: Describe the UI design if needed
- Overall layout
- Colors and theme
- Reusable components
- Responsive design
-->
</ui_design>
<api_endpoints>
<!--
Optional: List of API endpoints if needed
<endpoint>
<method>POST</method>
<path>/api/users</path>
<description>Create a new user</description>
<request_body>JSON with email, password</request_body>
<response>JSON with user_id, email</response>
</endpoint>
-->
</api_endpoints>
<database_schema>
<!--
Optional: Database schema if needed
<table>
<name>users</name>
<columns>
<column>id INTEGER PRIMARY KEY</column>
<column>email TEXT UNIQUE</column>
<column>password_hash TEXT</column>
<column>created_at DATETIME</column>
</columns>
</table>
-->
</database_schema>
<deployment>
<!--
Optional: Deployment instructions if needed
-->
</deployment>
</project_specification>


@@ -1,403 +0,0 @@
<project_specification>
<project_name>Claude.ai Clone - Advanced Theme Customization</project_name>
<overview>
This specification adds advanced theme customization features to the Claude.ai clone application.
Users will be able to customize accent colors, font sizes, message spacing, and choose from
preset color themes. All changes are additive and backward-compatible with existing theme functionality.
The existing light/dark mode toggle remains unchanged and functional.
</overview>
<safety_requirements>
<critical>
- DO NOT modify existing light/dark mode functionality
- DO NOT break existing theme persistence
- DO NOT change existing CSS classes without ensuring backward compatibility
- All new theme options must be optional (defaults should match current behavior)
- Test thoroughly to ensure existing themes still work
- Maintain backward compatibility at all times
- New theme preferences should be stored separately from existing theme settings
</critical>
</safety_requirements>
<new_features>
<feature_6_theme_customization>
<title>Advanced Theme Customization</title>
<description>
Add advanced theme customization options. Users should be able to:
- Customize accent colors (beyond just light/dark mode)
- Choose from preset color themes (blue, green, purple, orange)
- Adjust font size globally (small, medium, large)
- Adjust message spacing (compact, comfortable, spacious)
- Preview theme changes before applying
- Save custom theme preferences
The customization interface should be intuitive and provide real-time preview
of changes before they are applied. All preferences should persist across sessions.
</description>
<priority>3</priority>
<category>style</category>
<implementation_approach>
- Create a new "Appearance" or "Theme" section in settings
- Add accent color picker with preset options (blue, green, purple, orange)
- Add font size slider/selector (small, medium, large)
- Add message spacing selector (compact, comfortable, spacious)
- Implement preview functionality that shows changes in real-time
- Store theme preferences in localStorage or backend (user preferences)
- Apply theme using CSS custom properties (CSS variables)
- Ensure theme works with both light and dark modes
</implementation_approach>
<test_steps>
1. Open settings menu
2. Navigate to "Appearance" or "Theme" section
3. Select a different accent color (e.g., green)
4. Verify accent color changes are visible in preview
5. Adjust font size slider to "large"
6. Verify font size changes in preview
7. Adjust message spacing option to "spacious"
8. Verify spacing changes in preview
9. Click "Preview" to see changes applied temporarily
10. Click "Apply" to save changes permanently
11. Verify changes persist after page refresh
12. Test with both light and dark mode
13. Test reset to default theme
14. Verify existing conversations display correctly with new theme
</test_steps>
</feature_6_theme_customization>
<feature_accent_colors>
<title>Accent Color Customization</title>
<description>
Allow users to customize the accent color used throughout the application.
This includes:
- Primary button colors
- Link colors
- Focus states
- Active states
- Selection highlights
- Progress indicators
Preset options:
- Blue (default, matches Claude.ai)
- Green
- Purple
- Orange
Users should be able to see a preview of each color before applying.
</description>
<priority>3</priority>
<category>style</category>
<implementation_approach>
- Define accent colors as CSS custom properties
- Create color palette for each preset (light and dark variants)
- Add color picker UI component in settings
- Update all accent color usages to use CSS variables
- Ensure colors have proper contrast ratios for accessibility
- Store selected accent color in user preferences
</implementation_approach>
<test_steps>
1. Open theme settings
2. Select "Green" accent color
3. Verify buttons, links, and highlights use green
4. Switch to dark mode and verify green accent still works
5. Test all preset colors (blue, green, purple, orange)
6. Verify color persists after refresh
7. Test accessibility (contrast ratios)
</test_steps>
</feature_accent_colors>
<feature_font_size>
<title>Global Font Size Adjustment</title>
<description>
Allow users to adjust the global font size for better readability.
Options:
- Small (12px base)
- Medium (14px base, default)
- Large (16px base)
Font size should scale proportionally across all text elements:
- Message text
- UI labels
- Input fields
- Buttons
- Sidebar text
</description>
<priority>3</priority>
<category>style</category>
<implementation_approach>
- Use CSS rem units for all font sizes
- Set base font size on root element
- Create font size presets (small, medium, large)
- Add font size selector in settings
- Store preference in user settings
- Ensure responsive design still works with different font sizes
</implementation_approach>
<test_steps>
1. Open theme settings
2. Select "Small" font size
3. Verify all text is smaller throughout the app
4. Select "Large" font size
5. Verify all text is larger throughout the app
6. Verify layout doesn't break with different font sizes
7. Test with long messages to ensure wrapping works
8. Verify preference persists after refresh
</test_steps>
</feature_font_size>
<feature_message_spacing>
<title>Message Spacing Customization</title>
<description>
Allow users to adjust the spacing between messages and within message bubbles.
Options:
- Compact: Minimal spacing (for users who prefer dense layouts)
- Comfortable: Default spacing (current behavior)
- Spacious: Increased spacing (for better readability)
This affects:
- Vertical spacing between messages
- Padding within message bubbles
- Spacing between message elements (avatar, text, timestamp)
</description>
<priority>3</priority>
<category>style</category>
<implementation_approach>
- Define spacing scale using CSS custom properties
- Create spacing presets (compact, comfortable, spacious)
- Apply spacing to message containers and bubbles
- Add spacing selector in settings
- Store preference in user settings
- Ensure spacing works well with different font sizes
</implementation_approach>
<test_steps>
1. Open theme settings
2. Select "Compact" spacing
3. Verify messages are closer together
4. Select "Spacious" spacing
5. Verify messages have more space between them
6. Test with long conversations to ensure scrolling works
7. Verify spacing preference persists after refresh
8. Test with different font sizes to ensure compatibility
</test_steps>
</feature_message_spacing>
<feature_theme_preview>
<title>Theme Preview Functionality</title>
<description>
Allow users to preview theme changes before applying them permanently.
The preview should:
- Show a sample conversation with the new theme applied
- Update in real-time as settings are changed
- Allow users to cancel and revert to previous theme
- Show both light and dark mode previews if applicable
Users should be able to:
- See preview immediately when changing settings
- Click "Apply" to save changes
- Click "Cancel" to discard changes
- Click "Reset" to return to default theme
</description>
<priority>3</priority>
<category>functional</category>
<implementation_approach>
- Create preview component showing sample conversation
- Apply theme changes temporarily to preview
- Store original theme state for cancel functionality
- Update preview in real-time as settings change
- Only persist changes when "Apply" is clicked
- Show clear visual feedback for preview vs. applied state
</implementation_approach>
<test_steps>
1. Open theme settings
2. Change accent color to green
3. Verify preview updates immediately
4. Change font size to large
5. Verify preview updates with new font size
6. Click "Cancel" and verify changes are reverted
7. Make changes again and click "Apply"
8. Verify changes are saved and applied to actual interface
9. Test preview with both light and dark mode
</test_steps>
</feature_theme_preview>
</new_features>
<implementation_notes>
<code_structure>
frontend/
components/
ThemeSettings.jsx # New theme customization UI (NEW)
ThemePreview.jsx # Preview component (NEW)
styles/
theme-variables.css # CSS custom properties for themes (NEW)
accent-colors.css # Accent color definitions (NEW)
hooks/
useTheme.js # Updated to handle new theme options
utils/
themeStorage.js # Theme preference persistence (NEW)
</code_structure>
<css_architecture>
Use CSS custom properties (CSS variables) for all theme values:
- --accent-color-primary
- --accent-color-hover
- --font-size-base
- --message-spacing-vertical
- --message-padding
This allows easy theme switching without JavaScript manipulation.
</css_architecture>
<storage_approach>
Store theme preferences in:
- localStorage for client-side persistence
- Or backend user preferences table if available
Structure:
{
accentColor: 'blue' | 'green' | 'purple' | 'orange',
fontSize: 'small' | 'medium' | 'large',
messageSpacing: 'compact' | 'comfortable' | 'spacious',
theme: 'light' | 'dark' (existing)
}
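If preferences are persisted server-side, the structure above can be modeled with a TypedDict and normalized defensively when read back. A minimal sketch, assuming a Python backend (the helper name and the validation approach are illustrative, not part of the existing codebase):

```python
from typing import Literal, TypedDict, cast

class ThemePreferences(TypedDict):
    accentColor: Literal["blue", "green", "purple", "orange"]
    fontSize: Literal["small", "medium", "large"]
    messageSpacing: Literal["compact", "comfortable", "spacious"]
    theme: Literal["light", "dark"]

# Defaults match current behavior per this spec ("light" assumed as initial theme).
DEFAULTS: ThemePreferences = {
    "accentColor": "blue",
    "fontSize": "medium",
    "messageSpacing": "comfortable",
    "theme": "light",
}

def normalize_preferences(stored: dict) -> ThemePreferences:
    """Merge stored values over defaults, dropping unknown or invalid values."""
    allowed = {
        "accentColor": {"blue", "green", "purple", "orange"},
        "fontSize": {"small", "medium", "large"},
        "messageSpacing": {"compact", "comfortable", "spacious"},
        "theme": {"light", "dark"},
    }
    result = dict(DEFAULTS)
    for key, values in allowed.items():
        if stored.get(key) in values:
            result[key] = stored[key]
    return cast(ThemePreferences, result)
```

Normalizing on read means a corrupted or outdated localStorage/database entry degrades gracefully to the defaults instead of breaking the UI.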
</storage_approach>
<safety_guidelines>
- Keep existing theme functionality intact
- Default values should match current behavior
- Use feature detection for new theme features
- Gracefully degrade if CSS custom properties not supported
- Test with existing conversations and UI elements
- Ensure accessibility standards are maintained
</safety_guidelines>
</implementation_notes>
<ui_components>
<theme_settings_panel>
<description>Settings panel for theme customization</description>
<sections>
- Accent Color: Radio buttons or color swatches for preset colors
- Font Size: Slider or dropdown (small, medium, large)
- Message Spacing: Radio buttons (compact, comfortable, spacious)
- Preview: Live preview of theme changes
- Actions: Apply, Cancel, Reset buttons
</sections>
</theme_settings_panel>
<theme_preview>
<description>Preview component showing sample conversation</description>
<elements>
- Sample user message
- Sample AI response
- Shows current accent color
- Shows current font size
- Shows current spacing
- Updates in real-time
</elements>
</theme_preview>
</ui_components>
<css_custom_properties>
<accent_colors>
Define CSS variables for each accent color preset:
--accent-blue: #2563eb;
--accent-green: #10b981;
--accent-purple: #8b5cf6;
--accent-orange: #f59e0b;
Each should have hover, active, and focus variants.
</accent_colors>
<font_sizes>
Define base font sizes:
--font-size-small: 0.75rem; (12px)
--font-size-medium: 0.875rem; (14px, default)
--font-size-large: 1rem; (16px)
</font_sizes>
<spacing>
Define spacing scales:
--spacing-compact: 0.5rem;
--spacing-comfortable: 1rem; (default)
--spacing-spacious: 1.5rem;
</spacing>
</css_custom_properties>
<api_endpoints>
<if_backend_storage>
If storing preferences in backend:
- GET /api/user/preferences - Get user theme preferences
- PUT /api/user/preferences - Update user theme preferences
- GET /api/user/preferences/theme - Get theme preferences only
</if_backend_storage>
<note>
If using localStorage only, no API endpoints needed.
Backend storage is optional but recommended for multi-device sync.
</note>
</api_endpoints>
<accessibility_requirements>
- All accent colors must meet WCAG AA contrast ratios (4.5:1 for text)
- Font size changes must not break screen reader compatibility
- Theme settings must be keyboard navigable
- Color choices should not be the only way to convey information
- Provide high contrast mode option if possible
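The 4.5:1 requirement can be verified mechanically at design time. A minimal sketch of the WCAG 2.x relative-luminance and contrast-ratio formulas (Python; a check script, not application code, and the pairing of each preset with white or dark text is an assumption to be validated per design):

```python
def _channel(c: int) -> float:
    """Linearize one sRGB channel per the WCAG 2.x relative-luminance formula."""
    s = c / 255
    return s / 12.92 if s <= 0.03928 else ((s + 0.055) / 1.055) ** 2.4

def relative_luminance(hex_color: str) -> float:
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _channel(r) + 0.7152 * _channel(g) + 0.0722 * _channel(b)

def contrast_ratio(fg: str, bg: str) -> float:
    """WCAG contrast ratio: (L_lighter + 0.05) / (L_darker + 0.05)."""
    l1, l2 = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (l1 + 0.05) / (l2 + 0.05)

# Preset hex values are taken from this spec; some presets may only meet
# AA against one of the two text colors, which this check would surface.
for name, color in [("blue", "#2563eb"), ("green", "#10b981"),
                    ("purple", "#8b5cf6"), ("orange", "#f59e0b")]:
    print(name, round(contrast_ratio(color, "#ffffff"), 2),
          round(contrast_ratio(color, "#111111"), 2))
```

Running such a script against every accent preset (in both light and dark variants) makes the accessibility requirement testable rather than aspirational.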
</accessibility_requirements>
<testing_requirements>
<regression_tests>
- Verify existing light/dark mode toggle still works
- Verify existing theme persistence still works
- Test that default theme matches current behavior
- Verify existing conversations display correctly
- Test that all UI elements are styled correctly
</regression_tests>
<feature_tests>
- Test each accent color preset
- Test each font size option
- Test each spacing option
- Test theme preview functionality
- Test theme persistence (localStorage/backend)
- Test theme reset to defaults
- Test theme with both light and dark modes
- Test theme changes in real-time
</feature_tests>
<compatibility_tests>
- Test with different browsers (Chrome, Firefox, Safari, Edge)
- Test with different screen sizes (responsive design)
- Test with long conversations
- Test with different message types (text, code, artifacts)
- Test accessibility with screen readers
</compatibility_tests>
</testing_requirements>
<success_criteria>
<functionality>
- Users can customize accent colors from preset options
- Users can adjust global font size (small, medium, large)
- Users can adjust message spacing (compact, comfortable, spacious)
- Theme preview shows changes in real-time
- Theme preferences persist across sessions
- Existing light/dark mode functionality works unchanged
- All theme options work together harmoniously
</functionality>
<user_experience>
- Theme customization is intuitive and easy to use
- Preview provides clear feedback before applying changes
- Changes apply smoothly without flickering
- Settings are easy to find and access
- Reset to defaults is easily accessible
</user_experience>
<technical>
- Code is well-organized and maintainable
- CSS custom properties are used consistently
- Theme preferences are stored reliably
- No performance degradation with theme changes
- Backward compatibility is maintained
</technical>
</success_criteria>
</project_specification>

<project_specification>
<project_name>Library RAG - Type Safety & Documentation Enhancement</project_name>
<overview>
Enhance the Library RAG application (philosophical texts indexing and semantic search) by adding
strict type annotations and comprehensive Google-style docstrings to all Python modules. This will
improve code maintainability, enable static type checking with mypy, and provide clear documentation
for all functions, classes, and modules.
The application is a RAG pipeline that processes PDF documents through OCR, LLM-based extraction,
semantic chunking, and ingestion into Weaviate vector database. It includes a Flask web interface
for document upload, processing, and semantic search.
</overview>
<technology_stack>
<backend>
<runtime>Python 3.10+</runtime>
<web_framework>Flask 3.0</web_framework>
<vector_database>Weaviate 1.34.4 with text2vec-transformers</vector_database>
<ocr>Mistral OCR API</ocr>
<llm>Ollama (local) or Mistral API</llm>
<type_checking>mypy with strict configuration</type_checking>
</backend>
<infrastructure>
<containerization>Docker Compose (Weaviate + transformers)</containerization>
<dependencies>weaviate-client, flask, mistralai, python-dotenv</dependencies>
</infrastructure>
</technology_stack>
<current_state>
<project_structure>
- flask_app.py: Main Flask application (640 lines)
- schema.py: Weaviate schema definition (383 lines)
- utils/: 16+ modules for PDF processing pipeline
- pdf_pipeline.py: Main orchestration (879 lines)
- mistral_client.py: OCR API client
- ocr_processor.py: OCR processing
- markdown_builder.py: Markdown generation
- llm_metadata.py: Metadata extraction via LLM
- llm_toc.py: Table of contents extraction
- llm_classifier.py: Section classification
- llm_chunker.py: Semantic chunking
- llm_cleaner.py: Chunk cleaning
- llm_validator.py: Document validation
- weaviate_ingest.py: Database ingestion
- hierarchy_parser.py: Document hierarchy parsing
- image_extractor.py: Image extraction from PDFs
- toc_extractor*.py: Various TOC extraction methods
- templates/: Jinja2 templates for Flask UI
- tests/utils2/: Minimal test coverage (3 test files)
</project_structure>
<issues>
- Inconsistent type annotations across modules (some have partial types, many have none)
- Missing or incomplete docstrings (no Google-style format)
- No mypy configuration for strict type checking
- Type hints missing on function parameters and return values
- Dict[str, Any] used extensively without proper typing
- No type stubs for complex nested structures
</issues>
</current_state>
<core_features>
<type_annotations>
<strict_typing>
- Add complete type annotations to ALL functions and methods
- Use proper generic types (List, Dict, Optional, Union) from typing module
- Add TypedDict for complex dictionary structures
- Add Protocol types for duck-typed interfaces
- Use Literal types for string constants
- Add ParamSpec and TypeVar where appropriate
- Type all class attributes and instance variables
- Add type annotations to lambda functions where possible
</strict_typing>
<mypy_configuration>
- Create mypy.ini with strict configuration
- Enable: check_untyped_defs, disallow_untyped_defs, disallow_incomplete_defs
- Enable: disallow_untyped_calls, disallow_untyped_decorators
- Enable: warn_return_any, warn_redundant_casts
- Enable: strict_equality, strict_optional
- Set python_version to 3.10
- Configure per-module overrides if needed for gradual migration
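A strict configuration matching the flags above might look like the following sketch (the per-module override section name is illustrative of the gradual-migration pattern):

```ini
[mypy]
python_version = 3.10
check_untyped_defs = True
disallow_untyped_defs = True
disallow_incomplete_defs = True
disallow_untyped_calls = True
disallow_untyped_decorators = True
warn_return_any = True
warn_redundant_casts = True
strict_equality = True
strict_optional = True

# Per-module override for gradual migration of a legacy module
[mypy-utils.llm_structurer]
disallow_untyped_defs = False
```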
</mypy_configuration>
<type_stubs>
- Create TypedDict definitions for common data structures:
- OCR response structures
- Metadata dictionaries
- TOC entries
- Chunk objects
- Weaviate objects
- Pipeline results
- Add NewType for semantic type safety (DocumentName, ChunkId, etc.)
- Create Protocol types for callback functions
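A sketch of what such a utils/types.py could contain (field names are assumptions inferred from this spec, not the project's actual structures):

```python
"""Foundational type definitions shared across the pipeline (illustrative)."""
from typing import NewType, Optional, Protocol, TypedDict

# Semantic aliases: mypy will flag a ChunkId passed where a DocumentName is expected.
DocumentName = NewType("DocumentName", str)
ChunkId = NewType("ChunkId", str)

class TOCEntry(TypedDict):
    title: str
    level: int
    page: Optional[int]

class ChunkData(TypedDict):
    chunk_id: str
    text: str
    section_path: str

class PipelineResult(TypedDict):
    success: bool
    document_name: str
    pages: int
    chunks_count: int
    cost_total: float

class ProgressCallback(Protocol):
    """Callback invoked once per pipeline step with status updates."""
    def __call__(self, step_id: str, status: str, detail: str) -> None: ...
```

A Protocol (rather than a bare Callable) documents the parameter names, which matters for callbacks invoked with keyword arguments.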
</type_stubs>
<specific_improvements>
- pdf_pipeline.py: Type all 10 pipeline steps, callbacks, result dictionaries
- flask_app.py: Type all route handlers, request/response types
- schema.py: Type Weaviate configuration objects
- llm_*.py: Type LLM request/response structures
- mistral_client.py: Type API client methods and responses
- weaviate_ingest.py: Type ingestion functions and batch operations
</specific_improvements>
</type_annotations>
<documentation>
<google_style_docstrings>
- Add comprehensive Google-style docstrings to ALL:
- Module-level docstrings explaining purpose and usage
- Class docstrings with Attributes section
- Function/method docstrings with Args, Returns, Raises sections
- Complex algorithm explanations with Examples section
- Include code examples for public APIs
- Document all exceptions that can be raised
- Add Notes section for important implementation details
- Add See Also section for related functions
</google_style_docstrings>
<module_documentation>
<utils_modules>
- pdf_pipeline.py: Document the 10-step pipeline, each step's purpose
- mistral_client.py: Document OCR API usage, cost calculation
- llm_metadata.py: Document metadata extraction logic
- llm_toc.py: Document TOC extraction strategies
- llm_classifier.py: Document section classification types
- llm_chunker.py: Document semantic vs basic chunking
- llm_cleaner.py: Document cleaning rules and validation
- llm_validator.py: Document validation criteria
- weaviate_ingest.py: Document ingestion process, nested objects
- hierarchy_parser.py: Document hierarchy building algorithm
</utils_modules>
<flask_app>
- Document all routes with request/response examples
- Document SSE (Server-Sent Events) implementation
- Document Weaviate query patterns
- Document upload processing workflow
- Document background job management
</flask_app>
<schema>
- Document Weaviate schema design decisions
- Document each collection's purpose and relationships
- Document nested object structure
- Document vectorization strategy
</schema>
</module_documentation>
<inline_comments>
- Add inline comments for complex logic only (don't over-comment)
- Explain WHY not WHAT (code should be self-documenting)
- Document performance considerations
- Document cost implications (OCR, LLM API calls)
- Document error handling strategies
</inline_comments>
</documentation>
<validation>
<type_checking>
- All modules must pass mypy --strict
- No # type: ignore comments without justification
- CI/CD should run mypy checks
- Type coverage should be 100%
</type_checking>
<documentation_quality>
- All public functions must have docstrings
- All docstrings must follow Google style
- Examples should be executable and tested
- Documentation should be clear and concise
</documentation_quality>
</validation>
</core_features>
<implementation_priority>
<critical_modules>
Priority 1 (Most used, most complex):
1. utils/pdf_pipeline.py - Main orchestration
2. flask_app.py - Web application entry point
3. utils/weaviate_ingest.py - Database operations
4. schema.py - Schema definition
Priority 2 (Core LLM modules):
5. utils/llm_metadata.py
6. utils/llm_toc.py
7. utils/llm_classifier.py
8. utils/llm_chunker.py
9. utils/llm_cleaner.py
10. utils/llm_validator.py
Priority 3 (OCR and parsing):
11. utils/mistral_client.py
12. utils/ocr_processor.py
13. utils/markdown_builder.py
14. utils/hierarchy_parser.py
15. utils/image_extractor.py
Priority 4 (Supporting modules):
16. utils/toc_extractor.py
17. utils/toc_extractor_markdown.py
18. utils/toc_extractor_visual.py
19. utils/llm_structurer.py (legacy)
</critical_modules>
</implementation_priority>
<implementation_steps>
<feature_1>
<title>Setup Type Checking Infrastructure</title>
<description>
Configure mypy with strict settings and create foundational type definitions
</description>
<tasks>
- Create mypy.ini configuration file with strict settings
- Add mypy to requirements.txt or dev dependencies
- Create utils/types.py module for common TypedDict definitions
- Define core types: OCRResponse, Metadata, TOCEntry, ChunkData, PipelineResult
- Add NewType definitions for semantic types: DocumentName, ChunkId, SectionPath
- Create Protocol types for callbacks (ProgressCallback, etc.)
- Document type definitions in utils/types.py module docstring
- Test mypy configuration on a single module to verify settings
</tasks>
<acceptance_criteria>
- mypy.ini exists with strict configuration
- utils/types.py contains all foundational types with docstrings
- mypy runs without errors on utils/types.py
- Type definitions are comprehensive and reusable
</acceptance_criteria>
</feature_1>
<feature_2>
<title>Add Types to PDF Pipeline Orchestration</title>
<description>
Add complete type annotations to pdf_pipeline.py (879 lines, most complex module)
</description>
<tasks>
- Add type annotations to all function signatures in pdf_pipeline.py
- Type the 10-step pipeline: OCR, Markdown, Metadata, TOC, Classify, Chunk, Clean, Enrich, Validate, Weaviate
- Type progress_callback parameter with Protocol or Callable
- Add TypedDict for pipeline options dictionary
- Add TypedDict for pipeline result dictionary structure
- Type all helper functions (extract_document_metadata_legacy, etc.)
- Add proper return types for process_pdf_v2, process_pdf, process_pdf_bytes
- Fix any mypy errors that arise
- Verify mypy --strict passes on pdf_pipeline.py
</tasks>
<acceptance_criteria>
- All functions in pdf_pipeline.py have complete type annotations
- progress_callback is properly typed with Protocol
- All Dict[str, Any] replaced with TypedDict where appropriate
- mypy --strict pdf_pipeline.py passes with zero errors
- No # type: ignore comments (or justified if absolutely necessary)
</acceptance_criteria>
</feature_2>
<feature_3>
<title>Add Types to Flask Application</title>
<description>
Add complete type annotations to flask_app.py and type all routes
</description>
<tasks>
- Add type annotations to all Flask route handlers
- Type request.args, request.form, request.files usage
- Type jsonify() return values
- Type get_weaviate_client context manager
- Type get_collection_stats, get_all_chunks, search_chunks functions
- Add TypedDict for Weaviate query results
- Type background job processing functions (run_processing_job)
- Type SSE generator function (upload_progress)
- Add type hints for template rendering
- Verify mypy --strict passes on flask_app.py
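As one concrete illustration, an SSE generator can be typed as Iterator[str]; a sketch (the function name and event payload shape are illustrative, not the app's actual route):

```python
import json
from typing import Iterator

def progress_events(steps: list[tuple[str, str]]) -> Iterator[str]:
    """Yield one Server-Sent-Events frame per (step_id, status) pair."""
    for step_id, status in steps:
        payload = json.dumps({"step": step_id, "status": status})
        yield f"data: {payload}\n\n"
```

In Flask, such a generator is typically wrapped in a Response with mimetype "text/event-stream"; typing it as Iterator[str] lets mypy verify every yield is a properly framed string.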
</tasks>
<acceptance_criteria>
- All Flask routes have complete type annotations
- Request/response types are clear and documented
- Weaviate query functions are properly typed
- SSE generator is correctly typed
- mypy --strict flask_app.py passes with zero errors
</acceptance_criteria>
</feature_3>
<feature_4>
<title>Add Types to Core LLM Modules</title>
<description>
Add complete type annotations to all LLM processing modules (metadata, TOC, classifier, chunker, cleaner, validator)
</description>
<tasks>
- llm_metadata.py: Type extract_metadata function, return structure
- llm_toc.py: Type extract_toc function, TOC hierarchy structure
- llm_classifier.py: Type classify_sections, section types (Literal), validation functions
- llm_chunker.py: Type chunk_section_with_llm, chunk objects
- llm_cleaner.py: Type clean_chunk, is_chunk_valid functions
- llm_validator.py: Type validate_document, validation result structure
- Add TypedDict for LLM request/response structures
- Type provider selection ("ollama" | "mistral" as Literal)
- Type model names with Literal or constants
- Verify mypy --strict passes on all llm_*.py modules
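Provider selection can be pinned down with Literal so that mypy rejects typos at call sites. A sketch (the default model names mirror those mentioned later in this spec):

```python
from typing import Literal

Provider = Literal["ollama", "mistral"]

def default_model(provider: Provider) -> str:
    """Pick a default model per provider; mypy flags any other string literal."""
    if provider == "ollama":
        return "qwen2.5:7b"
    return "mistral-small-latest"
```

Calling default_model("mistal") then fails type checking instead of silently selecting the wrong branch at runtime.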
</tasks>
<acceptance_criteria>
- All LLM modules have complete type annotations
- Section types use Literal for type safety
- Provider and model parameters are strongly typed
- LLM request/response structures use TypedDict
- mypy --strict passes on all llm_*.py modules with zero errors
</acceptance_criteria>
</feature_4>
<feature_5>
<title>Add Types to Weaviate and Database Modules</title>
<description>
Add complete type annotations to schema.py and weaviate_ingest.py
</description>
<tasks>
- schema.py: Type Weaviate configuration objects
- schema.py: Type collection property definitions
- weaviate_ingest.py: Type ingest_document function signature
- weaviate_ingest.py: Type delete_document_chunks function
- weaviate_ingest.py: Add TypedDict for Weaviate object structure
- Type batch insertion operations
- Type nested object references (work, document)
- Add proper error types for Weaviate exceptions
- Verify mypy --strict passes on both modules
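The nested object structure could be expressed with nested TypedDicts along these lines (field names are assumptions based on this spec, not the actual Weaviate schema):

```python
from typing import TypedDict

class WorkRef(TypedDict):
    title: str
    author: str

class ChunkObject(TypedDict):
    text: str
    section_path: str
    chunk_index: int
    work: WorkRef  # nested object reference

def validate_chunk(obj: ChunkObject) -> bool:
    """Reject empty chunks before they reach a batch insert."""
    return bool(obj["text"].strip()) and obj["chunk_index"] >= 0
```

Typing the objects before batch insertion means malformed chunks are caught by mypy or a cheap runtime check rather than by a failed Weaviate write.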
</tasks>
<acceptance_criteria>
- schema.py has complete type annotations for Weaviate config
- weaviate_ingest.py functions are fully typed
- Nested object structures use TypedDict
- Weaviate client operations are properly typed
- mypy --strict passes on both modules with zero errors
</acceptance_criteria>
</feature_5>
<feature_6>
<title>Add Types to OCR and Parsing Modules</title>
<description>
Add complete type annotations to mistral_client.py, ocr_processor.py, markdown_builder.py, hierarchy_parser.py
</description>
<tasks>
- mistral_client.py: Type create_client, run_ocr, estimate_ocr_cost
- mistral_client.py: Add TypedDict for Mistral API response structures
- ocr_processor.py: Type serialize_ocr_response, OCR object structures
- markdown_builder.py: Type build_markdown, image_writer parameter
- hierarchy_parser.py: Type build_hierarchy, flatten_hierarchy functions
- hierarchy_parser.py: Add TypedDict for hierarchy node structure
- image_extractor.py: Type create_image_writer, image handling
- Verify mypy --strict passes on all modules
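A hierarchy node is naturally recursive, which a TypedDict can express with a forward reference. A sketch (node fields are illustrative assumptions):

```python
from typing import TypedDict

class HierarchyNode(TypedDict):
    title: str
    level: int
    children: list["HierarchyNode"]  # forward reference to the recursive type

def flatten(node: HierarchyNode) -> list[str]:
    """Return section titles in depth-first order."""
    titles = [node["title"]]
    for child in node["children"]:
        titles.extend(flatten(child))
    return titles
```

Once the node type is explicit, build_hierarchy and flatten_hierarchy share one contract instead of passing loosely shaped dicts between them.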
</tasks>
<acceptance_criteria>
- All OCR/parsing modules have complete type annotations
- Mistral API structures use TypedDict
- Hierarchy nodes are properly typed
- Image handling functions are typed
- mypy --strict passes on all modules with zero errors
</acceptance_criteria>
</feature_6>
<feature_7>
<title>Add Google-Style Docstrings to Core Modules</title>
<description>
Add comprehensive Google-style docstrings to pdf_pipeline.py, flask_app.py, and weaviate modules
</description>
<tasks>
- pdf_pipeline.py: Add module docstring explaining the V2 pipeline
- pdf_pipeline.py: Add docstrings to process_pdf_v2 with Args, Returns, Raises sections
- pdf_pipeline.py: Document each of the 10 pipeline steps in comments
- pdf_pipeline.py: Add Examples section showing typical usage
- flask_app.py: Add module docstring explaining Flask application
- flask_app.py: Document all routes with request/response examples
- flask_app.py: Document Weaviate connection management
- schema.py: Add module docstring explaining schema design
- schema.py: Document each collection's purpose and relationships
- weaviate_ingest.py: Document ingestion process with examples
- All docstrings must follow Google style format exactly
</tasks>
<acceptance_criteria>
- All core modules have comprehensive module-level docstrings
- All public functions have Google-style docstrings
- Args, Returns, Raises sections are complete and accurate
- Examples are provided for complex functions
- Docstrings explain WHY, not just WHAT
</acceptance_criteria>
</feature_7>
<feature_8>
<title>Add Google-Style Docstrings to LLM Modules</title>
<description>
Add comprehensive Google-style docstrings to all LLM processing modules
</description>
<tasks>
- llm_metadata.py: Document metadata extraction logic with examples
- llm_toc.py: Document TOC extraction strategies and fallbacks
- llm_classifier.py: Document section types and classification criteria
- llm_chunker.py: Document semantic vs basic chunking approaches
- llm_cleaner.py: Document cleaning rules and validation logic
- llm_validator.py: Document validation criteria and corrections
- Add Examples sections showing input/output for each function
- Document LLM provider differences (Ollama vs Mistral)
- Document cost implications in Notes sections
- All docstrings must follow Google style format exactly
</tasks>
<acceptance_criteria>
- All LLM modules have comprehensive docstrings
- Each function has Args, Returns, Raises sections
- Examples show realistic input/output
- Provider differences are documented
- Cost implications are noted where relevant
</acceptance_criteria>
</feature_8>
<feature_9>
<title>Add Google-Style Docstrings to OCR and Parsing Modules</title>
<description>
Add comprehensive Google-style docstrings to OCR, markdown, hierarchy, and extraction modules
</description>
<tasks>
- mistral_client.py: Document OCR API usage, cost calculation
- ocr_processor.py: Document OCR response processing
- markdown_builder.py: Document markdown generation strategy
- hierarchy_parser.py: Document hierarchy building algorithm
- image_extractor.py: Document image extraction process
- toc_extractor*.py: Document various TOC extraction methods
- Add Examples sections for complex algorithms
- Document edge cases and error handling
- All docstrings must follow Google style format exactly
</tasks>
<acceptance_criteria>
- All OCR/parsing modules have comprehensive docstrings
- Complex algorithms are well explained
- Edge cases are documented
- Error handling is documented
- Examples demonstrate typical usage
</acceptance_criteria>
</feature_9>
<feature_10>
<title>Final Validation and CI Integration</title>
<description>
Verify all type annotations and docstrings, integrate mypy into CI/CD
</description>
<tasks>
- Run mypy --strict on entire codebase, verify 100% pass rate
- Verify all public functions have docstrings
- Check docstring formatting with pydocstyle or similar tool
- Create GitHub Actions workflow to run mypy on every commit
- Update README.md with type checking instructions
- Update CLAUDE.md with documentation standards
- Create CONTRIBUTING.md with type annotation and docstring guidelines
- Generate API documentation with Sphinx or pdoc
- Fix any remaining mypy errors or missing docstrings
</tasks>
<acceptance_criteria>
- mypy --strict passes on entire codebase with zero errors
- All public functions have Google-style docstrings
- CI/CD runs mypy checks automatically
- Documentation is generated and accessible
- Contributing guidelines document type/docstring requirements
</acceptance_criteria>
</feature_10>
</implementation_steps>
<success_criteria>
<type_safety>
- 100% type coverage across all modules
- mypy --strict passes with zero errors
- No # type: ignore comments without justification
- All Dict[str, Any] replaced with TypedDict where appropriate
- Proper use of generics, protocols, and type variables
- NewType used for semantic type safety
</type_safety>
<documentation_quality>
- All modules have comprehensive module-level docstrings
- All public functions/classes have Google-style docstrings
- All docstrings include Args, Returns, Raises sections
- Complex functions include Examples sections
- Cost implications documented in Notes sections
- Error handling clearly documented
- Provider differences (Ollama vs Mistral) documented
</documentation_quality>
<code_quality>
- Code is self-documenting with clear variable names
- Inline comments explain WHY, not WHAT
- Complex algorithms are well explained
- Performance considerations documented
- Security considerations documented
</code_quality>
<developer_experience>
- IDE autocomplete works perfectly with type hints
- Type errors caught at development time, not runtime
- Documentation is easily accessible in IDE
- API examples are executable and tested
- Contributing guidelines are clear and comprehensive
</developer_experience>
<maintainability>
- Refactoring is safer with type checking
- Function signatures are self-documenting
- API contracts are explicit and enforced
- Breaking changes are caught by type checker
- New developers can understand code quickly
</maintainability>
</success_criteria>
<constraints>
<compatibility>
- Must maintain backward compatibility with existing code
- Cannot break existing Flask routes or API contracts
- Weaviate schema must remain unchanged
- Existing tests must continue to pass
</compatibility>
<gradual_migration>
- Can use per-module mypy configuration for gradual migration
- Can temporarily disable strict checks on legacy modules
- Priority modules must be completed first
- Low-priority modules can be deferred
</gradual_migration>
<standards>
- All type annotations must use Python 3.10+ syntax
- Docstrings must follow Google style exactly (not NumPy or reStructuredText)
- Fall back to typing-module forms (List, Dict, Optional) only where Python 3.9 compatibility must be kept; otherwise prefer the built-in generics and union syntax of 3.10+
- Use from __future__ import annotations if needed for forward references
</standards>
</constraints>
<testing_strategy>
<type_checking>
- Run mypy --strict on each module after adding types
- Use mypy daemon (dmypy) for faster incremental checking
- Add mypy to pre-commit hooks
- CI/CD must run mypy and fail on type errors
</type_checking>
<documentation_validation>
- Use pydocstyle to validate Google-style format
- Use sphinx-build to generate docs and catch errors
- Manual review of docstring examples
- Verify examples are executable and correct
</documentation_validation>
<integration_testing>
- Verify existing tests still pass after type additions
- Add new tests for complex typed structures
- Test mypy configuration on sample code
- Verify IDE autocomplete works correctly
</integration_testing>
</testing_strategy>
<documentation_examples>
<module_docstring>
```python
"""
PDF Pipeline V2 - Intelligent document processing with LLM enhancement.
This module orchestrates a 10-step pipeline for processing PDF documents:
1. OCR via Mistral API
2. Markdown construction with images
3. Metadata extraction via LLM
4. Table of contents (TOC) extraction
5. Section classification
6. Semantic chunking
7. Chunk cleaning and validation
8. Enrichment with concepts
9. Validation and corrections
10. Ingestion into Weaviate vector database
The pipeline supports multiple LLM providers (Ollama local, Mistral API) and
various processing modes (skip OCR, semantic chunking, OCR annotations).
Typical usage:
>>> from pathlib import Path
>>> from utils.pdf_pipeline import process_pdf
>>>
>>> result = process_pdf(
... Path("document.pdf"),
... use_llm=True,
... llm_provider="ollama",
... ingest_to_weaviate=True,
... )
>>> print(f"Processed {result['pages']} pages, {result['chunks_count']} chunks")
See Also:
mistral_client: OCR API client
llm_metadata: Metadata extraction
weaviate_ingest: Database ingestion
"""
```
</module_docstring>
<function_docstring>
```python
def process_pdf_v2(
    pdf_path: Path,
    output_dir: Path = Path("output"),
    *,
    use_llm: bool = True,
    llm_provider: Literal["ollama", "mistral"] = "ollama",
    llm_model: Optional[str] = None,
    skip_ocr: bool = False,
    ingest_to_weaviate: bool = True,
    progress_callback: Optional[ProgressCallback] = None,
) -> PipelineResult:
    """
    Process a PDF through the complete V2 pipeline with LLM enhancement.

    This function orchestrates all 10 steps of the intelligent document processing
    pipeline, from OCR to Weaviate ingestion. It supports both local (Ollama) and
    cloud (Mistral API) LLM providers, with optional caching via skip_ocr.

    Args:
        pdf_path: Absolute path to the PDF file to process.
        output_dir: Base directory for output files. Defaults to "./output".
        use_llm: Enable LLM-based processing (metadata, TOC, chunking).
            If False, uses basic heuristic processing.
        llm_provider: LLM provider to use. "ollama" for local (free but slow),
            "mistral" for API (fast but paid).
        llm_model: Specific model name. If None, auto-detects based on provider
            (qwen2.5:7b for ollama, mistral-small-latest for mistral).
        skip_ocr: If True, reuses existing markdown file to avoid OCR cost.
            Requires output_dir/<doc_name>/<doc_name>.md to exist.
        ingest_to_weaviate: If True, ingests chunks into Weaviate after processing.
        progress_callback: Optional callback for real-time progress updates.
            Called with (step_id, status, detail) for each pipeline step.

    Returns:
        Dictionary containing processing results with the following keys:

        - success (bool): True if processing completed without errors
        - document_name (str): Name of the processed document
        - pages (int): Number of pages in the PDF
        - chunks_count (int): Number of chunks generated
        - cost_ocr (float): OCR cost in euros (0 if skip_ocr=True)
        - cost_llm (float): LLM API cost in euros (0 if provider=ollama)
        - cost_total (float): Total cost (ocr + llm)
        - metadata (dict): Extracted metadata (title, author, etc.)
        - toc (list): Hierarchical table of contents
        - files (dict): Paths to generated files (markdown, chunks, etc.)

    Raises:
        FileNotFoundError: If pdf_path does not exist.
        ValueError: If skip_ocr=True but markdown file not found.
        RuntimeError: If Weaviate connection fails during ingestion.

    Examples:
        Basic usage with Ollama (free):

        >>> result = process_pdf_v2(
        ...     Path("platon_menon.pdf"),
        ...     llm_provider="ollama"
        ... )
        >>> print(f"Cost: {result['cost_total']:.4f}€")
        Cost: 0.0270€  # OCR only

        With Mistral API (faster):

        >>> result = process_pdf_v2(
        ...     Path("platon_menon.pdf"),
        ...     llm_provider="mistral",
        ...     llm_model="mistral-small-latest"
        ... )

        Skip OCR to avoid cost:

        >>> result = process_pdf_v2(
        ...     Path("platon_menon.pdf"),
        ...     skip_ocr=True,  # Reuses existing markdown
        ...     ingest_to_weaviate=False
        ... )

    Notes:
        - OCR cost: ~0.003€/page (standard), ~0.009€/page (with annotations)
        - LLM cost: Free with Ollama, variable with Mistral API
        - Processing time: ~30s/page with Ollama, ~5s/page with Mistral
        - Weaviate must be running (docker-compose up -d) before ingestion
    """
```
</function_docstring>
</documentation_examples>
</project_specification>


@@ -0,0 +1,290 @@
## YOUR ROLE - CODING AGENT (Library RAG - Type Safety & Documentation)
You are working on adding strict type annotations and Google-style docstrings to a Python library project.
This is a FRESH context window - you have no memory of previous sessions.
You have access to Linear for project management via MCP tools. Linear is your single source of truth.
### STEP 1: GET YOUR BEARINGS (MANDATORY)
Start by orienting yourself:
```bash
# 1. See your working directory
pwd

# 2. List files to understand project structure
ls -la

# 3. Read the project specification
cat app_spec.txt

# 4. Read the Linear project state
cat .linear_project.json

# 5. Check recent git history
git log --oneline -20
```
### STEP 2: CHECK LINEAR STATUS
Query Linear to understand current project state using the project_id from `.linear_project.json`.
1. **Get all issues and count progress:**
```
mcp__linear__list_issues with project_id
```
Count:
- Issues "Done" = completed
- Issues "Todo" = remaining
- Issues "In Progress" = currently being worked on
2. **Find META issue** (if exists) for session context
3. **Check for in-progress work** - complete it first if found
### STEP 3: SELECT NEXT ISSUE
Get Todo issues sorted by priority:
```
mcp__linear__list_issues with project_id, status="Todo", limit=5
```
Select ONE highest-priority issue to work on.
### STEP 4: CLAIM THE ISSUE
Use `mcp__linear__update_issue` to set status to "In Progress"
### STEP 5: IMPLEMENT THE ISSUE
Based on issue category:
**For Type Annotation Issues (e.g., "Types - Add type annotations to X.py"):**
1. Read the target Python file
2. Identify all functions, methods, and variables
3. Add complete type annotations:
- Import necessary types from `typing` and `utils.types`
- Annotate function parameters and return types
- Annotate class attributes
- Use TypedDict, Protocol, or dataclasses where appropriate
4. Save the file
5. Run mypy to verify (MANDATORY):
```bash
cd generations/library_rag
mypy --config-file=mypy.ini <file_path>
```
6. Fix any mypy errors
7. Commit the changes
**For Documentation Issues (e.g., "Docs - Add docstrings to X.py"):**
1. Read the target Python file
2. Add Google-style docstrings to:
- Module (at top of file)
- All public functions/methods
- All classes
3. Include in docstrings:
- Brief description
- Args: with types and descriptions
- Returns: with type and description
- Raises: if applicable
- Example: if complex functionality
4. Save the file
5. Optionally run pydocstyle to verify (if installed)
6. Commit the changes
**For Setup/Infrastructure Issues:**
Follow the specific instructions in the issue description.
### STEP 6: VERIFICATION
**Type Annotation Issues:**
- Run mypy on the modified file(s)
- Ensure zero type errors
- If errors exist, fix them before proceeding
**Documentation Issues:**
- Review docstrings for completeness
- Ensure Args/Returns sections match function signatures
- Check that examples are accurate
**Functional Changes (rare):**
- If the issue changes behavior, test manually
- Start Flask server if needed: `python flask_app.py`
- Test the affected functionality
### STEP 7: GIT COMMIT
Make a descriptive commit:
```bash
git add <files>
git commit -m "<Issue ID>: <Short description>

- <List of changes>
- Verified with mypy (for type issues)
- Linear issue: <issue identifier>
"
```
### STEP 8: UPDATE LINEAR ISSUE
1. **Add implementation comment:**
```markdown
## Implementation Complete
### Changes Made
- [List of files modified]
- [Key changes]
### Verification
- mypy passes with zero errors (for type issues)
- All test steps from issue description verified
### Git Commit
[commit hash and message]
```
2. **Update status to "Done"** using `mcp__linear__update_issue`
### STEP 9: DECIDE NEXT ACTION
After completing an issue, ask yourself:
1. Have I been working for a while? (Use judgment based on complexity of work done)
2. Is the code in a stable state?
3. Would this be a good handoff point?
**If YES to all three:**
- Proceed to STEP 10 (Session Summary)
- End cleanly
**If NO:**
- Continue to another issue (go back to STEP 3)
- But commit first!
**Pacing Guidelines:**
- Early phase (< 20% done): Can complete multiple simple issues
- Mid/late phase (> 20% done): 1-2 issues per session for quality
### STEP 10: SESSION SUMMARY (When Ending)
If META issue exists, add a comment:
```markdown
## Session Complete
### Completed This Session
- [Issue ID]: [Title] - [Brief summary]
### Current Progress
- X issues Done
- Y issues In Progress
- Z issues Todo
### Notes for Next Session
- [Important context]
- [Recommendations]
- [Any concerns]
```
Ensure:
- All code committed
- No uncommitted changes
- App in working state
---
## LINEAR WORKFLOW RULES
**Status Transitions:**
- Todo → In Progress (when starting)
- In Progress → Done (when verified)
**NEVER:**
- Delete or modify issue descriptions
- Mark Done without verification
- Leave issues In Progress when switching
---
## TYPE ANNOTATION GUIDELINES
**Imports needed:**
```python
from typing import Optional, Dict, List, Any, Tuple, Callable
from pathlib import Path
from utils.types import <ProjectSpecificTypes>
```
**Common patterns:**
```python
# Functions (avoid shadowing the built-in `input`)
def process_data(data: str, options: Optional[Dict[str, Any]] = None) -> List[str]:
    """Process input data."""
    ...

# Methods with self
def save(self, path: Path) -> None:
    """Save to file."""
    ...

# Async functions
async def fetch_data(url: str) -> Dict[str, Any]:
    """Fetch from API."""
    ...
```
**Use project types from `utils/types.py`:**
- Metadata, OCRResponse, TOCEntry, ChunkData, PipelineResult, etc.
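When a needed structure is missing from `utils/types.py`, the TypedDict pattern can be sketched locally; the field names below are illustrative, not the project's actual definitions:

```python
from typing import List, TypedDict


class ChunkData(TypedDict):
    """Illustrative chunk record; the project's real types live in utils/types.py."""

    text: str
    keywords: List[str]
    section_path: str


def summarize_chunk(chunk: ChunkData) -> str:
    """Return a one-line summary built from the section path and text."""
    return f"{chunk['section_path']}: {chunk['text'][:40]}"


chunk: ChunkData = {
    "text": "Virtue is knowledge, argues Socrates.",
    "keywords": ["virtue", "knowledge"],
    "section_path": "Meno > 87c",
}
print(summarize_chunk(chunk))
```

`mypy --strict` checks dict literals against this shape and IDEs autocomplete the fields; Protocol serves the same role when typing behavior rather than structure.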
---
## DOCSTRING TEMPLATE (Google Style)
```python
def function_name(param1: str, param2: int = 0) -> List[str]:
    """
    Brief one-line description.

    More detailed description if needed. Explain what the function does,
    any important behavior, side effects, etc.

    Args:
        param1: Description of param1.
        param2: Description of param2. Defaults to 0.

    Returns:
        Description of return value.

    Raises:
        ValueError: When param1 is empty.
        IOError: When file cannot be read.

    Example:
        >>> result = function_name("test", 5)
        >>> print(result)
        ['test', 'test', 'test', 'test', 'test']
    """
```
---
## IMPORTANT REMINDERS
**Your Goal:** Add strict type annotations and comprehensive documentation to all Python modules
**This Session's Goal:** Complete 1-2 issues with quality work and clean handoff
**Quality Bar:**
- mypy --strict passes with zero errors
- All public functions have complete Google-style docstrings
- Code is clean and well-documented
**Context is finite.** End sessions early with good handoff notes. The next agent will continue.
---
Begin by running STEP 1 (Get Your Bearings).


@@ -60,6 +60,13 @@ using the `mcp__linear__create_issue` tool.
- Do NOT modify existing issues
- Only create NEW issues for the NEW features
**IMPORTANT - Issue Count:**
Create EXACTLY ONE issue per feature listed in the `<implementation_steps>` section of the new spec file.
- If the spec has 8 features → create 8 issues
- If the spec has 15 features → create 15 issues
- Do NOT create a fixed number like 50 issues
- Each `<feature_N>` in the spec = 1 Linear issue
**For each NEW feature, create an issue with:**
```


@@ -30,9 +30,16 @@ Before creating issues, you need to set up Linear:
### CRITICAL TASK: Create Linear Issues
**IMPORTANT - Issue Count:**
Create EXACTLY ONE issue per feature listed in the `<implementation_steps>` section of app_spec.txt.
- Count the `<feature_N>` tags in the spec file
- If the spec has 8 features → create 8 issues
- If the spec has 50 features → create 50 issues
- Do NOT create a fixed arbitrary number
- Each `<feature_N>` in `<implementation_steps>` = 1 Linear issue
Based on `app_spec.txt`, create Linear issues for each feature using the
-`mcp__linear__create_issue` tool. Create 50 detailed issues that
-comprehensively cover all features in the spec.
+`mcp__linear__create_issue` tool.
**For each feature, create an issue with:**
@@ -66,7 +73,7 @@ priority: 1-4 based on importance (1=urgent/foundational, 4=low/polish)
```
**Requirements for Linear Issues:**
-- Create 50 issues total covering all features in the spec
+- Create ONE issue per `<feature_N>` tag in `<implementation_steps>`
- Mix of functional and style features (note category in description)
- Order by priority: foundational features get priority 1-2, polish features get 3-4
- Include detailed test steps in each issue description

prompts/spec_embed_BAAI.txt Normal file

@@ -0,0 +1,576 @@
<project_specification>
<project_name>Library RAG - Migration to BGE-M3 Embeddings</project_name>
<overview>
Migrate the Library RAG embedding model from sentence-transformers MiniLM-L6 (384-dim)
to BAAI/bge-m3 (1024-dim) for superior performance on multilingual philosophical texts.
**Why BGE-M3?**
- 1024 dimensions vs 384 (2.7x richer semantic representation)
- 8192 token context vs 512 (16x longer sequences)
- Superior multilingual support (Greek, Latin, French, English)
- Better trained on academic/research texts
- Captures philosophical nuances more effectively
**Scope:**
This is a focused migration that only affects the vectorization layer.
LLM processing (Ollama/Mistral) remains completely unchanged.
**Migration Strategy:**
- Auto-detect GPU availability and configure accordingly
- Delete existing collections (384-dim vectors incompatible with 1024-dim)
- Recreate schema with BGE-M3 vectorizer
- Re-ingest existing 2 documents from cached chunks
- Validate search quality improvements
</overview>
<technology_stack>
<backend>
<weaviate>1.34.4 (no change)</weaviate>
<new_vectorizer>BAAI/bge-m3 via text2vec-transformers</new_vectorizer>
<old_vectorizer>sentence-transformers-multi-qa-MiniLM-L6-cos-v1</old_vectorizer>
<gpu_support>Auto-detect CUDA availability (ENABLE_CUDA="1" if GPU, "0" if CPU)</gpu_support>
</backend>
<unchanged>
<llm>Ollama/Mistral (no impact on LLM processing)</llm>
<ocr>Mistral OCR (no change)</ocr>
<pipeline>PDF pipeline steps 1-9 unchanged</pipeline>
</unchanged>
</technology_stack>
<prerequisites>
<environment_setup>
- Existing Library RAG application (generations/library_rag/)
- Docker and Docker Compose installed
- NVIDIA Docker runtime (if GPU available)
- Only 2 documents currently ingested (will be re-ingested)
- No production data to preserve
- RTX 4070 GPU available (will be auto-detected and used)
</environment_setup>
</prerequisites>
<architecture_impact>
<independent_components>
**LLM Processing (Steps 1-9):**
- OCR extraction (Mistral API)
- Metadata extraction (Ollama/Mistral)
- TOC extraction (Ollama/Mistral)
- Section classification (Ollama/Mistral)
- Semantic chunking (Ollama/Mistral)
- Cleaning and validation (Ollama/Mistral)
→ **None of these are affected by embedding model change**
**Vectorization (Step 10):**
- Text → Vector conversion (text2vec-transformers in Weaviate)
- This is the ONLY component that changes
- Happens automatically during Weaviate ingestion
- No Python code changes required
</independent_components>
<breaking_changes>
**IMPORTANT: Vector dimensions are incompatible**
- Existing collections use 384-dim vectors (MiniLM-L6)
- New model generates 1024-dim vectors (BGE-M3)
- Weaviate cannot mix dimensions in same collection
- All collections must be deleted and recreated
- All documents must be re-ingested
**Why this is safe:**
- Only 2 documents currently ingested
- Source chunks.json files preserved in output/ directory
- No OCR/LLM re-processing needed (reuse existing chunks)
- No additional costs incurred
- Estimated total migration time: 20-25 minutes
</breaking_changes>
</architecture_impact>
<implementation_steps>
<feature_1>
<title>Complete BGE-M3 Setup with GPU Auto-Detection</title>
<description>
Atomic migration: GPU detection → Docker configuration → Schema deletion → Recreation.
This feature must be completed entirely in one session (cannot be partially done).
**Step 1: GPU Auto-Detection**
- Check for NVIDIA GPU availability: nvidia-smi or docker run --gpus all nvidia/cuda
- If GPU detected: Set ENABLE_CUDA="1"
- If no GPU: Set ENABLE_CUDA="0"
- Verify NVIDIA Docker runtime if GPU available
**Step 2: Update Docker Compose**
- Backup current docker-compose.yml to docker-compose.yml.backup
- Update text2vec-transformers service:
* Change image to: cr.weaviate.io/semitechnologies/transformers-inference:sentence-transformers-BAAI-bge-m3
* Set ENABLE_CUDA based on GPU detection
* Add GPU device mapping if CUDA enabled
- Update comments to reflect BGE-M3 model
- Stop containers: docker-compose down
- Remove old transformers image: docker rmi [old-image-name]
- Start new containers: docker-compose up -d
- Verify BGE-M3 loaded: docker-compose logs text2vec-transformers | grep -i "model"
- If GPU enabled, verify GPU usage: nvidia-smi (should show transformers process)
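A rough shape for the updated service block described in Step 2 (the service name, environment key, and GPU-mapping syntax are assumptions to verify against the project's actual docker-compose.yml; the image tag is taken from this spec):

```yaml
# Hypothetical text2vec-transformers service after the BGE-M3 switch
t2v-transformers:
  image: cr.weaviate.io/semitechnologies/transformers-inference:sentence-transformers-BAAI-bge-m3
  environment:
    ENABLE_CUDA: "1"   # set to "0" when no GPU is detected
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: 1
            capabilities: [gpu]
```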
**Step 3: Delete Existing Collections**
- Create migrate_to_bge_m3.py script with safety checks
- List all existing collections and object counts
- Confirm deletion prompt: "Delete all collections? (yes/no)"
- Delete all collections: client.collections.delete_all()
- Verify deletion: client.collections.list_all() should return empty
- Log deleted collections and counts for reference
**Step 4: Recreate Schema with BGE-M3**
- Update schema.py docstring (line 40: MiniLM-L6 → BGE-M3)
- Add migration note at top of schema.py
- Run: python schema.py to recreate all collections
- Weaviate will auto-detect 1024-dim from text2vec-transformers service
- Verify collections created: Work, Document, Chunk, Summary
- Verify vectorizer configured: display_schema() should show text2vec-transformers
- Query text2vec-transformers service to confirm 1024 dimensions
**Validation:**
- All containers running (docker-compose ps)
- BGE-M3 model loaded successfully
- GPU utilized if available (check nvidia-smi)
- All collections exist with empty state
- Vector dimensions = 1024 (query Weaviate schema)
**Rollback if needed:**
- Restore docker-compose.yml.backup
- docker-compose down && docker-compose up -d
- python schema.py to recreate with old model
</description>
<priority>1</priority>
<category>migration</category>
<test_steps>
1. Run GPU detection: nvidia-smi or equivalent
2. Verify ENABLE_CUDA set correctly based on GPU availability
3. Backup docker-compose.yml created
4. Stop containers: docker-compose down
5. Start with BGE-M3: docker-compose up -d
6. Check logs: docker-compose logs text2vec-transformers
7. Verify "BAAI/bge-m3" appears in logs
8. If GPU: verify nvidia-smi shows transformers process
9. Run migrate_to_bge_m3.py and confirm deletion
10. Verify all collections deleted
11. Run schema.py to recreate
12. Verify 4 collections exist: Work, Document, Chunk, Summary
13. Query Weaviate API to confirm vector dimensions = 1024
14. Verify collections are empty (object count = 0)
</test_steps>
</feature_1>
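Step 3 above calls for a migrate_to_bge_m3.py script with safety checks; a minimal sketch of its deletion logic, assuming the v4 weaviate-python client (the confirmation helper and exact call shapes are illustrative, not the project's code):

```python
"""Sketch of migrate_to_bge_m3.py - safety-checked collection deletion."""
from typing import Any, Dict


def confirmed(answer: str) -> bool:
    """Accept only an explicit 'yes' (case-insensitive) as confirmation."""
    return answer.strip().lower() == "yes"


def collection_counts(client: Any) -> Dict[str, int]:
    """Return {collection_name: object_count} for every collection."""
    return {
        name: client.collections.get(name).aggregate.over_all(total_count=True).total_count
        for name in client.collections.list_all()
    }


def wipe_collections(client: Any) -> None:
    """Log each collection's count, delete everything, verify nothing remains."""
    for name, count in collection_counts(client).items():
        print(f"  {name}: {count} objects")
    client.collections.delete_all()
    assert not client.collections.list_all(), "deletion did not complete"

# Intended use (requires a running Weaviate instance):
#   import weaviate
#   with weaviate.connect_to_local() as client:
#       if confirmed(input("Delete all collections? (yes/no) ")):
#           wipe_collections(client)
```

Logging the counts before deletion preserves a reference for verifying the re-ingestion in the next feature.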
<feature_2>
<title>Document Re-ingestion from Cached Chunks</title>
<description>
Re-ingest the 2 existing documents using their cached chunks.json files.
No OCR or LLM re-processing needed (saves time and cost).
**Process:**
1. Identify existing documents in output/ directory
2. For each document directory:
- Read {document_name}_chunks.json
- Verify chunk structure contains all required fields
- Extract Work metadata (title, author, year, language, genre)
- Extract Document metadata (sourceId, edition, pages, toc, hierarchy)
- Extract Chunk data (text, keywords, sectionPath, etc.)
3. Ingest to Weaviate using utils/weaviate_ingest.py:
- Create Work object (if not exists)
- Create Document object with nested Work reference
- Create Chunk objects with nested Document and Work references
- text2vec-transformers will auto-generate 1024-dim vectors
4. Verify ingestion success:
- Query Weaviate for each document by sourceId
- Verify chunk counts match original
- Check that vectors are 1024 dimensions
- Verify nested Work/Document metadata accessible
**Example code:**
```python
import json
from pathlib import Path

from utils.weaviate_ingest import (
    create_work, create_document, ingest_chunks_to_weaviate
)

# `client` is an already-connected Weaviate client
output_dir = Path("output")
for doc_dir in output_dir.iterdir():
    if doc_dir.is_dir():
        chunks_file = doc_dir / f"{doc_dir.name}_chunks.json"
        if chunks_file.exists():
            with open(chunks_file) as f:
                data = json.load(f)
            # Create Work
            work_id = create_work(client, data["work_metadata"])
            # Create Document
            doc_id = create_document(client, data["document_metadata"], work_id)
            # Ingest chunks
            ingest_chunks_to_weaviate(client, data["chunks"], doc_id, work_id)
            print(f"✓ Ingested {doc_dir.name}")
```
**Success criteria:**
- All documents from output/ directory ingested
- Chunk counts match original (verify in Weaviate)
- No vectorization errors in logs
- All vectors are 1024 dimensions
</description>
<priority>1</priority>
<category>data</category>
<test_steps>
1. List all directories in output/
2. For each directory, verify {name}_chunks.json exists
3. Load first chunks.json and inspect structure
4. Run re-ingestion script for all documents
5. Query Weaviate for total Chunk count
6. Verify count matches sum of all original chunks
7. Query a sample chunk and verify:
- Vector dimensions = 1024
- Nested work.title and work.author present
- Nested document.sourceId present
8. Verify no errors in Weaviate logs
9. Check text2vec-transformers logs for vectorization activity
</test_steps>
</feature_2>
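For the dimension check in the test steps above (vectors = 1024), a small helper makes the pass/fail criterion explicit; the commented usage assumes the v4 client's iterator call and "default" named-vector key:

```python
from typing import Optional, Sequence


def vector_ok(vector: Optional[Sequence[float]], expected_dim: int = 1024) -> bool:
    """True when a returned vector matches the expected BGE-M3 dimensionality."""
    return vector is not None and len(vector) == expected_dim

# Intended spot-check (requires a running Weaviate; call shapes are assumptions):
#   import weaviate
#   with weaviate.connect_to_local() as client:
#       obj = next(client.collections.get("Chunk").iterator(include_vector=True))
#       print("1024-dim:", vector_ok(obj.vector["default"]))
```

Any surviving 384-dim vector from the old MiniLM-L6 collections would fail this check immediately.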
<feature_3>
<title>Search Quality Validation and Performance Testing</title>
<description>
Validate that BGE-M3 provides superior search quality for philosophical texts.
Test multilingual capabilities and measure performance improvements.
**Create test script: test_bge_m3_quality.py**
**Test 1: Multilingual Queries**
- Test French philosophical terms: "justice", "vertu", "liberté"
- Test English philosophical terms: "virtue", "knowledge", "ethics"
- Test Greek philosophical terms: "ἀρετή" (arete), "τέλος" (telos), "ψυχή" (psyche)
- Test Latin philosophical terms: "virtus", "sapientia", "forma"
- Verify results are semantically relevant
- Compare with expected passages (if baseline available)
**Test 2: Long Query Handling**
- Test query with 100+ words (BGE-M3 supports 8192 tokens)
- Test query with complex philosophical argument
- Verify no truncation warnings
- Verify semantically appropriate results
**Test 3: Semantic Understanding**
- Query: "What is the nature of reality?"
- Expected: Results about ontology, metaphysics, being
- Query: "How should we live?"
- Expected: Results about ethics, virtue, good life
- Query: "What can we know?"
- Expected: Results about epistemology, knowledge, certainty
**Test 4: Performance Metrics**
- Measure query latency (should be &lt;500ms)
- Measure indexing speed during ingestion
- Monitor GPU utilization (if enabled)
- Monitor memory usage (~2GB for BGE-M3)
- Compare with baseline (MiniLM-L6) if metrics available
**Test 5: Vector Dimension Verification**
- Query Weaviate schema API
- Verify all Chunk vectors are 1024 dimensions
- Verify no 384-dim vectors remain (from old model)
**Example test script:**
```python
import time

import weaviate
import weaviate.classes.query as wvq

client = weaviate.connect_to_local()
chunks = client.collections.get("Chunk")

# Test multilingual
test_queries = [
    ("justice", "French philosophical concept"),
    ("ἀρετή", "Greek virtue/excellence"),
    ("What is the good life?", "Long philosophical query"),
]

for query, description in test_queries:
    start = time.time()
    result = chunks.query.near_text(
        query=query,
        limit=5,
        return_metadata=wvq.MetadataQuery(distance=True),
    )
    latency = (time.time() - start) * 1000
    print(f"\nQuery: {query} ({description})")
    print(f"Latency: {latency:.1f}ms")
    for obj in result.objects:
        similarity = (1 - obj.metadata.distance) * 100
        print(f"  [{similarity:.1f}%] {obj.properties['work']['title']}")
        print(f"    {obj.properties['text'][:150]}...")

client.close()
```
**Document results:**
- Create SEARCH_QUALITY_RESULTS.md with:
* Sample queries and results
* Performance metrics
* Comparison with MiniLM-L6 (if available)
* Notes on quality improvements observed
</description>
<priority>1</priority>
<category>validation</category>
<test_steps>
1. Create test_bge_m3_quality.py script
2. Run multilingual query tests (French, English, Greek, Latin)
3. Verify results are semantically relevant
4. Test long queries (100+ words)
5. Measure average query latency over 10 queries
6. Verify latency &lt;500ms
7. Query Weaviate schema to verify vector dimensions = 1024
8. If GPU enabled, monitor nvidia-smi during queries
9. Document search quality improvements in markdown file
10. Compare results with expected philosophical passages
</test_steps>
</feature_3>
<feature_4>
<title>Documentation Update</title>
<description>
Update all documentation to reflect BGE-M3 migration.
**Files to update:**
1. **docker-compose.yml**
- Update comments to mention BGE-M3
- Note GPU auto-detection logic
- Document ENABLE_CUDA setting
2. **README.md**
- Update "Embedding Model" section
- Change: MiniLM-L6 (384-dim) → BGE-M3 (1024-dim)
- Add benefits: multilingual, longer context, better quality
- Update docker-compose instructions if needed
3. **CLAUDE.md**
- Update schema documentation (line ~35)
- Change vectorizer description
- Update example queries to showcase multilingual
- Add migration notes section
4. **schema.py**
- Update module docstring (line 40)
- Change "MiniLM-L6" references to "BGE-M3"
- Add migration date and rationale in comments
- Update display_schema() output text
5. **Create MIGRATION_BGE_M3.md**
- Document migration process
- Explain why BGE-M3 chosen
- List breaking changes (dimension incompatibility)
- Document rollback procedure
- Include before/after comparison
- Note LLM independence (Ollama/Mistral unaffected)
- Document search quality improvements
6. **MCP_README.md** (if exists)
- Update technical details about embeddings
- Update vector dimension references
**Migration notes template:**
```markdown
# BGE-M3 Migration - [Date]
## Why
- Superior multilingual support (Greek, Latin, French, English)
- 1024-dim vectors (2.7x richer than MiniLM-L6)
- 8192 token context (16x longer than MiniLM-L6)
- Better trained on academic/philosophical texts
## What Changed
- Embedding model: MiniLM-L6 → BAAI/bge-m3
- Vector dimensions: 384 → 1024
- All collections deleted and recreated
- 2 documents re-ingested
## Impact
- LLM processing (Ollama/Mistral): **No impact**
- Search quality: **Significantly improved**
- GPU acceleration: **Auto-enabled** (if available)
- Migration time: ~25 minutes
## Search Quality Improvements
[Insert results from Feature 3 testing]
```
**Verify:**
- Search all files for "MiniLM-L6" references
- Search all files for "384" dimension references
- Replace with "BGE-M3" and "1024" respectively
- Grep for "text2vec" and update comments where needed
</description>
<priority>2</priority>
<category>documentation</category>
<test_steps>
1. Update docker-compose.yml comments
2. Update README.md embedding section
3. Update CLAUDE.md schema documentation
4. Update schema.py docstring and comments
5. Create MIGRATION_BGE_M3.md with full migration notes
6. Search codebase for "MiniLM-L6" references: grep -r "MiniLM" .
7. Replace all with "BGE-M3"
8. Search for "384" dimension references
9. Replace with "1024" where appropriate
10. Review all updated files for consistency
11. Verify no outdated references remain
</test_steps>
</feature_4>
</implementation_steps>
<deliverables>
<code>
- Updated docker-compose.yml with BGE-M3 and GPU auto-detection
- migrate_to_bge_m3.py script for safe collection deletion
- Updated schema.py with BGE-M3 documentation
- Re-ingestion script (or integration with existing utils)
- test_bge_m3_quality.py for validation
</code>
<documentation>
- MIGRATION_BGE_M3.md with complete migration notes
- Updated README.md with BGE-M3 details
- Updated CLAUDE.md with schema changes
- SEARCH_QUALITY_RESULTS.md with validation results
- Updated inline comments in all affected files
</documentation>
</deliverables>
<success_criteria>
<functionality>
- BGE-M3 model loads successfully in Weaviate
- GPU auto-detected and utilized if available
- All collections recreated with 1024-dim vectors
- Documents re-ingested successfully from cached chunks
- Semantic search returns relevant results
- Multilingual queries work correctly (Greek, Latin, French, English)
</functionality>
<quality>
- Search quality demonstrably improved vs MiniLM-L6
- Greek/Latin philosophical terms properly embedded
- Long queries (&gt;512 tokens) handled correctly
- No vectorization errors in logs
- Vector dimensions verified as 1024 across all collections
</quality>
<performance>
- Query latency acceptable (&lt;500ms average)
- GPU utilized if available (verified via nvidia-smi)
- Memory usage stable (~2GB for text2vec-transformers)
- Indexing throughput acceptable during re-ingestion
- No performance degradation vs MiniLM-L6
</performance>
<documentation>
- All documentation updated to reflect BGE-M3
- No outdated MiniLM-L6 references remain
- Migration process fully documented
- Rollback procedure documented and tested
- Search quality improvements quantified
</documentation>
</success_criteria>
<migration_notes>
<breaking_changes>
**IMPORTANT: This is a destructive migration**
- All existing Weaviate collections must be deleted
- Vector dimensions change: 384 → 1024 (incompatible)
- Weaviate cannot mix dimensions in same collection
- All documents must be re-ingested
**Low impact:**
- Only 2 documents currently ingested
- Source chunks.json files preserved in output/ directory
- No OCR re-processing needed (saves ~0.006€ per doc)
- No LLM re-processing needed (saves time and cost)
- Estimated migration time: 20-25 minutes total
</breaking_changes>
<rollback_plan>
If BGE-M3 causes issues, rollback is straightforward:
1. Stop containers: docker-compose down
2. Restore backup: mv docker-compose.yml.backup docker-compose.yml
3. Start containers: docker-compose up -d
4. Recreate schema: python schema.py
5. Re-ingest documents from output/ directory (same process as Feature 2)
**Time to rollback: ~15 minutes**
**Note:** Backup of docker-compose.yml created automatically in Feature 1
</rollback_plan>
<gpu_auto_detection>
**GPU is NOT optional - it's auto-detected**
The system will automatically detect GPU availability and configure accordingly:
- **If GPU available (RTX 4070 detected):**
* ENABLE_CUDA="1" in docker-compose.yml
* GPU device mapping added to text2vec-transformers service
* Vectorization uses GPU (5-10x faster)
* ~2GB VRAM used (plenty of headroom on 4070)
* Ollama/Qwen can still use remaining VRAM
- **If NO GPU available:**
* ENABLE_CUDA="0" in docker-compose.yml
* Vectorization uses CPU (slower but functional)
* No GPU device mapping needed
**Detection method:**
```bash
# Try nvidia-smi
if command -v nvidia-smi &> /dev/null; then
    GPU_AVAILABLE=true
else
    # Try Docker GPU test
    if docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi &> /dev/null; then
        GPU_AVAILABLE=true
    else
        GPU_AVAILABLE=false
    fi
fi
```
**User has RTX 4070:** GPU will be detected and used automatically.
</gpu_auto_detection>
<llm_independence>
**Ollama/Mistral are NOT affected by this change**
The embedding model migration ONLY affects Weaviate vectorization (pipeline step 10).
All LLM processing (steps 1-9) remains unchanged:
- OCR extraction (Mistral API)
- Metadata extraction (Ollama/Mistral)
- TOC extraction (Ollama/Mistral)
- Section classification (Ollama/Mistral)
- Semantic chunking (Ollama/Mistral)
- Cleaning and validation (Ollama/Mistral)
**No Python code changes required.**
Weaviate handles vectorization automatically via text2vec-transformers service.
**Ollama can still use GPU:**
BGE-M3 uses ~2GB VRAM. RTX 4070 has 12GB.
Ollama/Qwen can use remaining 10GB without conflict.
</llm_independence>
</migration_notes>
</project_specification>