Update framework configuration and clean up obsolete specs
Configuration updates: - Added .env.example template for environment variables - Updated README.md with better setup instructions (.env usage) - Enhanced .claude/settings.local.json with additional Bash permissions - Added .claude/CLAUDE.md framework documentation Spec cleanup: - Removed obsolete spec files (language_selection, mistral_extensible, template, theme_customization) - Consolidated app_spec.txt (Claude Clone example) - Added app_spec_model.txt as reference template - Added app_spec_library_rag_types_docs.txt - Added coding_prompt_library.md Framework improvements: - Updated agent.py, autonomous_agent_demo.py, client.py with minor fixes - Enhanced dockerize_my_project.py - Updated prompts (initializer, initializer_bis) with better guidance - Added docker-compose.my_project.yml example This commit consolidates improvements made during development sessions. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This commit is contained in:
234
.claude/CLAUDE.md
Normal file
234
.claude/CLAUDE.md
Normal file
@@ -0,0 +1,234 @@
|
|||||||
|
# CLAUDE.md
|
||||||
|
|
||||||
|
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
This is an autonomous coding agent framework that uses Claude Agent SDK with Linear integration for project management. The framework enables long-running autonomous development sessions where agents create complete applications from XML specifications.
|
||||||
|
|
||||||
|
**Key Architecture**: Two-agent pattern (Initializer + Coding Agent) with Linear as the single source of truth for project state and progress tracking.
|
||||||
|
|
||||||
|
## Common Commands
|
||||||
|
|
||||||
|
### Running the Agent
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Fresh project initialization
|
||||||
|
python autonomous_agent_demo.py --project-dir ./my_project
|
||||||
|
|
||||||
|
# Continue existing project
|
||||||
|
python autonomous_agent_demo.py --project-dir ./my_project
|
||||||
|
|
||||||
|
# Add new features to existing project (Initializer Bis)
|
||||||
|
python autonomous_agent_demo.py --project-dir ./my_project --new-spec app_spec_theme_customization.txt
|
||||||
|
|
||||||
|
# Limit iterations for testing
|
||||||
|
python autonomous_agent_demo.py --project-dir ./my_project --max-iterations 3
|
||||||
|
```
|
||||||
|
|
||||||
|
### Testing
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Run security hook tests
|
||||||
|
python test_security.py
|
||||||
|
|
||||||
|
# Test mypy type checking (for library projects)
|
||||||
|
mypy path/to/module.py
|
||||||
|
```
|
||||||
|
|
||||||
|
### Environment Setup
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Generate Claude Code OAuth token
|
||||||
|
claude setup-token
|
||||||
|
|
||||||
|
# Install dependencies
|
||||||
|
pip install -r requirements.txt
|
||||||
|
```
|
||||||
|
|
||||||
|
## High-Level Architecture
|
||||||
|
|
||||||
|
### Core Agent Flow
|
||||||
|
|
||||||
|
1. **First Run (Initializer Agent)**:
|
||||||
|
- Reads `prompts/app_spec.txt` specification
|
||||||
|
- Creates Linear project and ~50 issues (one per `<feature_X>` tag)
|
||||||
|
- Creates META issue for session tracking
|
||||||
|
- Initializes project structure with `init.sh`
|
||||||
|
- Writes `.linear_project.json` marker file
|
||||||
|
|
||||||
|
2. **Subsequent Runs (Coding Agent)**:
|
||||||
|
- Queries Linear for highest-priority Todo issue
|
||||||
|
- Updates issue status to "In Progress"
|
||||||
|
- Implements feature using SDK tools
|
||||||
|
- Tests implementation (Puppeteer for web apps, pytest/mypy for libraries)
|
||||||
|
- Adds comment to Linear issue with implementation notes
|
||||||
|
- Marks issue as "Done"
|
||||||
|
- Updates META issue with session summary
|
||||||
|
|
||||||
|
3. **Initializer Bis (Add Features)**:
|
||||||
|
- Triggered by `--new-spec` flag on existing projects
|
||||||
|
- Reads new spec file from `prompts/`
|
||||||
|
- Creates additional Linear issues for new features
|
||||||
|
- Updates existing project without re-initializing
|
||||||
|
|
||||||
|
### Key Design Patterns
|
||||||
|
|
||||||
|
**Session Handoff via Linear**: Agents don't use local state files for coordination. All session context, implementation notes, and progress tracking happens through Linear issues and comments. This provides:
|
||||||
|
- Real-time visibility in Linear workspace
|
||||||
|
- Persistent history across sessions
|
||||||
|
- Easy debugging via issue comments
|
||||||
|
|
||||||
|
**Defense-in-Depth Security** (see `security.py` and `client.py`):
|
||||||
|
1. OS-level sandbox for bash command isolation
|
||||||
|
2. Filesystem restrictions (operations limited to project directory)
|
||||||
|
3. Bash command allowlist with pre-tool-use hooks
|
||||||
|
4. Explicit MCP tool permissions
|
||||||
|
|
||||||
|
**Project Type Detection** (`agent.py:is_library_project`):
|
||||||
|
- Detects library/type-safety projects vs full-stack web apps
|
||||||
|
- Uses different coding prompts (`coding_prompt_library.md` vs `coding_prompt.md`)
|
||||||
|
- Keywords: "type safety", "docstrings", "mypy", "library rag"
|
||||||
|
|
||||||
|
### Module Responsibilities
|
||||||
|
|
||||||
|
- **`autonomous_agent_demo.py`**: Entry point, argument parsing, environment validation
|
||||||
|
- **`agent.py`**: Core agent loop, session orchestration, project type detection
|
||||||
|
- **`client.py`**: Claude SDK client configuration, MCP server setup (Linear + Puppeteer)
|
||||||
|
- **`security.py`**: Bash command validation with allowlist, pre-tool-use hooks
|
||||||
|
- **`prompts.py`**: Prompt loading utilities, spec file copying
|
||||||
|
- **`progress.py`**: Progress tracking via `.linear_project.json` marker
|
||||||
|
- **`linear_config.py`**: Linear API configuration constants
|
||||||
|
|
||||||
|
### MCP Servers
|
||||||
|
|
||||||
|
**Linear** (HTTP transport at `mcp.linear.app/mcp`):
|
||||||
|
- Project/team management
|
||||||
|
- Issue CRUD operations
|
||||||
|
- Comments and status updates
|
||||||
|
- Requires `LINEAR_API_KEY` in `.env`
|
||||||
|
|
||||||
|
**Puppeteer** (stdio transport):
|
||||||
|
- Browser automation for UI testing
|
||||||
|
- Navigate, screenshot, click, fill, evaluate
|
||||||
|
- Used by web app projects, not library projects
|
||||||
|
|
||||||
|
## Application Specification Format
|
||||||
|
|
||||||
|
Specifications use XML format in `prompts/app_spec.txt`:
|
||||||
|
|
||||||
|
```xml
|
||||||
|
<project_specification>
|
||||||
|
<project_name>Your App Name</project_name>
|
||||||
|
<overview>Detailed description...</overview>
|
||||||
|
|
||||||
|
<technology_stack>
|
||||||
|
<frontend>...</frontend>
|
||||||
|
<backend>...</backend>
|
||||||
|
</technology_stack>
|
||||||
|
|
||||||
|
<core_features>
|
||||||
|
<feature_1>
|
||||||
|
<title>Feature title</title>
|
||||||
|
<description>Detailed description</description>
|
||||||
|
<priority>1-4 (1=urgent, 4=low)</priority>
|
||||||
|
<category>frontend|backend|auth|etc</category>
|
||||||
|
<test_steps>
|
||||||
|
1. Step one
|
||||||
|
2. Step two
|
||||||
|
</test_steps>
|
||||||
|
</feature_1>
|
||||||
|
<!-- More features... -->
|
||||||
|
</core_features>
|
||||||
|
</project_specification>
|
||||||
|
```
|
||||||
|
|
||||||
|
**Important**: Each `<feature_X>` tag becomes a separate Linear issue. The initializer creates exactly one issue per feature tag.
|
||||||
|
|
||||||
|
## Environment Configuration
|
||||||
|
|
||||||
|
All configuration via `.env` file (copy from `.env.example`):
|
||||||
|
|
||||||
|
```bash
|
||||||
|
CLAUDE_CODE_OAUTH_TOKEN='your-oauth-token' # From: claude setup-token
|
||||||
|
LINEAR_API_KEY='lin_api_xxxxx' # From: linear.app/settings/api
|
||||||
|
LINEAR_TEAM_ID='team-id' # Optional, agent prompts if missing
|
||||||
|
```
|
||||||
|
|
||||||
|
## Security Model
|
||||||
|
|
||||||
|
### Allowed Commands (`security.py:ALLOWED_COMMANDS`)
|
||||||
|
|
||||||
|
File operations: `ls`, `cat`, `head`, `tail`, `wc`, `grep`, `cp`, `mkdir`, `chmod`
|
||||||
|
Development: `npm`, `node`, `python`, `python3`, `mypy`, `pytest`
|
||||||
|
Version control: `git`
|
||||||
|
Process management: `ps`, `lsof`, `sleep`, `pkill`
|
||||||
|
Scripts: `init.sh`
|
||||||
|
|
||||||
|
### Additional Validation
|
||||||
|
|
||||||
|
- **`pkill`**: Only allowed for dev processes (node, npm, vite, next)
|
||||||
|
- **`chmod`**: Only `+x` mode permitted (making scripts executable)
|
||||||
|
- **`init.sh`**: Must be `./init.sh` or end with `/init.sh`
|
||||||
|
|
||||||
|
### Adding New Commands
|
||||||
|
|
||||||
|
Edit `security.py:ALLOWED_COMMANDS` and optionally add validation logic to `bash_security_hook`.
|
||||||
|
|
||||||
|
## Generated Project Structure
|
||||||
|
|
||||||
|
After initialization, projects contain:
|
||||||
|
|
||||||
|
```
|
||||||
|
my_project/
|
||||||
|
├── .linear_project.json # Linear state marker (project_id, total_issues, meta_issue_id)
|
||||||
|
├── .claude_settings.json # Security settings (auto-generated)
|
||||||
|
├── app_spec.txt # Original specification (copied from prompts/)
|
||||||
|
├── init.sh # Environment setup script (executable)
|
||||||
|
└── [generated code] # Application files created by agent
|
||||||
|
```
|
||||||
|
|
||||||
|
## Creating New Applications
|
||||||
|
|
||||||
|
1. Create `prompts/app_spec.txt` with your XML specification
|
||||||
|
2. Use existing spec files as templates (see `prompts/app_spec.txt` for Claude Clone example)
|
||||||
|
3. Run: `python autonomous_agent_demo.py --project-dir ./new_app`
|
||||||
|
4. Monitor progress in Linear workspace
|
||||||
|
|
||||||
|
See `GUIDE_NEW_APP.md` for detailed guide (French).
|
||||||
|
|
||||||
|
## Prompt Templates
|
||||||
|
|
||||||
|
Located in `prompts/`:
|
||||||
|
|
||||||
|
- **`initializer_prompt.md`**: First session prompt (creates Linear project/issues)
|
||||||
|
- **`initializer_bis_prompt.md`**: Add features prompt (extends existing project)
|
||||||
|
- **`coding_prompt.md`**: Standard coding session (web apps with Puppeteer testing)
|
||||||
|
- **`coding_prompt_library.md`**: Library coding session (focuses on types/docs, uses pytest/mypy)
|
||||||
|
|
||||||
|
The framework automatically selects the appropriate prompt based on session type and project detection.
|
||||||
|
|
||||||
|
## Important Implementation Notes
|
||||||
|
|
||||||
|
### Linear Integration
|
||||||
|
|
||||||
|
- All work tracked as Linear issues, not local files
|
||||||
|
- Session handoff via Linear comments on META issue
|
||||||
|
- Status workflow: Todo → In Progress → Done
|
||||||
|
- Early termination: Agent stops when detecting "feature-complete" in responses
|
||||||
|
|
||||||
|
### Auto-Continue Behavior
|
||||||
|
|
||||||
|
Agent auto-continues with 3-second delay between sessions (`agent.py:AUTO_CONTINUE_DELAY_SECONDS`). Stops when:
|
||||||
|
- `--max-iterations` limit reached
|
||||||
|
- Response contains "feature-complete" or "all issues completed"
|
||||||
|
- Fatal error occurs
|
||||||
|
|
||||||
|
### Project Directory Handling
|
||||||
|
|
||||||
|
Relative paths automatically placed under `generations/` directory unless absolute path provided.
|
||||||
|
|
||||||
|
### Model Selection
|
||||||
|
|
||||||
|
Default: `claude-opus-4-5-20251101` (Opus 4.5 for best coding performance)
|
||||||
|
Override with: `--model claude-sonnet-4-5-20250929`
|
||||||
@@ -3,7 +3,41 @@
|
|||||||
"allow": [
|
"allow": [
|
||||||
"Bash(test:*)",
|
"Bash(test:*)",
|
||||||
"Bash(cat:*)",
|
"Bash(cat:*)",
|
||||||
"Bash(netstat:*)"
|
"Bash(netstat:*)",
|
||||||
|
"Bash(docker-compose:*)",
|
||||||
|
"Bash(ls:*)",
|
||||||
|
"Bash(rm:*)",
|
||||||
|
"Bash(python autonomous_agent_demo.py:*)",
|
||||||
|
"Bash(dir C:GitHublinear_coding_philosophia_raggenerationslibrary_rag*.py)",
|
||||||
|
"Bash(git add:*)",
|
||||||
|
"Bash(git commit -m \"$\\(cat <<''EOF''\nFix import error: rename delete_document_passages to delete_document_chunks\n\nThe function was renamed in weaviate_ingest.py but the import in __init__.py\nwas not updated, causing ImportError when using the library.\n\nChanges:\n- Updated import statement in utils/__init__.py\n- Updated __all__ export list to use correct function name\nEOF\n\\)\")",
|
||||||
|
"Bash(dir \"C:\\\\GitHub\\\\linear_coding_philosophia_rag\\\\generations\\\\library_rag\\\\.env\")",
|
||||||
|
"Bash(git commit:*)",
|
||||||
|
"Bash(tasklist:*)",
|
||||||
|
"Bash(findstr:*)",
|
||||||
|
"Bash(wmic process:*)",
|
||||||
|
"Bash(powershell -Command \"Get-Process python | Select-Object Id,Path,StartTime | Format-Table -AutoSize\")",
|
||||||
|
"Bash(powershell -Command \"Get-WmiObject Win32_Process -Filter \"\"name = ''python.exe''\"\" | Select-Object ProcessId, CommandLine | Format-List\")",
|
||||||
|
"Bash(timeout:*)",
|
||||||
|
"Bash(powershell -Command:*)",
|
||||||
|
"Bash(python:*)",
|
||||||
|
"Bash(dir \"C:\\\\GitHub\\\\linear_coding_library_rag\\\\generations\\\\library_rag\")",
|
||||||
|
"Bash(docker ps:*)",
|
||||||
|
"Bash(curl:*)",
|
||||||
|
"Bash(dir:*)",
|
||||||
|
"Bash(grep:*)",
|
||||||
|
"Bash(git push:*)",
|
||||||
|
"Bash(mypy:*)",
|
||||||
|
"WebSearch",
|
||||||
|
"Bash(nvidia-smi:*)",
|
||||||
|
"WebFetch(domain:cr.weaviate.io)",
|
||||||
|
"Bash(git restore:*)",
|
||||||
|
"Bash(git log:*)",
|
||||||
|
"Bash(done)",
|
||||||
|
"Bash(git remote set-url:*)",
|
||||||
|
"Bash(docker compose:*)",
|
||||||
|
"Bash(pytest:*)",
|
||||||
|
"Bash(git pull:*)"
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|||||||
11
.env.example
Normal file
11
.env.example
Normal file
@@ -0,0 +1,11 @@
|
|||||||
|
# Claude Code OAuth Token
|
||||||
|
# Run 'claude setup-token' to generate this token
|
||||||
|
CLAUDE_CODE_OAUTH_TOKEN=your-oauth-token-here
|
||||||
|
|
||||||
|
# Linear API Key
|
||||||
|
# Get your API key from: https://linear.app/YOUR-TEAM/settings/api
|
||||||
|
LINEAR_API_KEY=lin_api_xxxxxxxxxxxxx
|
||||||
|
|
||||||
|
# Linear Team ID (optional)
|
||||||
|
# If not set, the agent will list teams and ask you to choose
|
||||||
|
LINEAR_TEAM_ID=
|
||||||
@@ -332,3 +332,6 @@ Pour créer une nouvelle application :
|
|||||||
Le framework s'occupe du reste ! 🚀
|
Le framework s'occupe du reste ! 🚀
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|||||||
42
README.md
42
README.md
@@ -26,23 +26,35 @@ pip install -r requirements.txt
|
|||||||
|
|
||||||
### 2. Set Up Authentication
|
### 2. Set Up Authentication
|
||||||
|
|
||||||
You need two authentication tokens:
|
Create a `.env` file in the root directory by copying the example:
|
||||||
|
|
||||||
**Claude Code OAuth Token:**
|
```bash
|
||||||
|
cp .env.example .env
|
||||||
|
```
|
||||||
|
|
||||||
|
Then configure your credentials in the `.env` file:
|
||||||
|
|
||||||
|
**1. Claude Code OAuth Token:**
|
||||||
```bash
|
```bash
|
||||||
# Generate the token using Claude Code CLI
|
# Generate the token using Claude Code CLI
|
||||||
claude setup-token
|
claude setup-token
|
||||||
|
|
||||||
# Set the environment variable
|
# Add to .env file:
|
||||||
export CLAUDE_CODE_OAUTH_TOKEN='your-oauth-token-here'
|
CLAUDE_CODE_OAUTH_TOKEN='your-oauth-token-here'
|
||||||
```
|
```
|
||||||
|
|
||||||
**Linear API Key:**
|
**2. Linear API Key:**
|
||||||
```bash
|
```bash
|
||||||
# Get your API key from: https://linear.app/YOUR-TEAM/settings/api
|
# Get your API key from: https://linear.app/YOUR-TEAM/settings/api
|
||||||
export LINEAR_API_KEY='lin_api_xxxxxxxxxxxxx'
|
# Add to .env file:
|
||||||
|
LINEAR_API_KEY='lin_api_xxxxxxxxxxxxx'
|
||||||
|
|
||||||
|
# Optional: Linear Team ID (if not set, agent will list teams)
|
||||||
|
LINEAR_TEAM_ID='your-team-id'
|
||||||
```
|
```
|
||||||
|
|
||||||
|
**Important:** The `.env` file is already in `.gitignore` - never commit it!
|
||||||
|
|
||||||
### 3. Verify Installation
|
### 3. Verify Installation
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
@@ -142,12 +154,15 @@ Instead of local text files, agents communicate through:
|
|||||||
- **META Issue**: Session summaries and handoff notes
|
- **META Issue**: Session summaries and handoff notes
|
||||||
- **Issue Status**: Todo / In Progress / Done workflow
|
- **Issue Status**: Todo / In Progress / Done workflow
|
||||||
|
|
||||||
## Environment Variables
|
## Configuration (.env file)
|
||||||
|
|
||||||
|
All configuration is done via a `.env` file in the root directory.
|
||||||
|
|
||||||
| Variable | Description | Required |
|
| Variable | Description | Required |
|
||||||
|----------|-------------|----------|
|
|----------|-------------|----------|
|
||||||
| `CLAUDE_CODE_OAUTH_TOKEN` | Claude Code OAuth token (from `claude setup-token`) | Yes |
|
| `CLAUDE_CODE_OAUTH_TOKEN` | Claude Code OAuth token (from `claude setup-token`) | Yes |
|
||||||
| `LINEAR_API_KEY` | Linear API key for MCP access | Yes |
|
| `LINEAR_API_KEY` | Linear API key for MCP access | Yes |
|
||||||
|
| `LINEAR_TEAM_ID` | Linear Team ID (if not set, agent will list teams and ask) | No |
|
||||||
|
|
||||||
## Command Line Options
|
## Command Line Options
|
||||||
|
|
||||||
@@ -268,11 +283,14 @@ Edit `security.py` to add or remove commands from `ALLOWED_COMMANDS`.
|
|||||||
|
|
||||||
## Troubleshooting
|
## Troubleshooting
|
||||||
|
|
||||||
**"CLAUDE_CODE_OAUTH_TOKEN not set"**
|
**"CLAUDE_CODE_OAUTH_TOKEN not found in .env file"**
|
||||||
Run `claude setup-token` to generate a token, then export it.
|
1. Run `claude setup-token` to generate a token
|
||||||
|
2. Copy `.env.example` to `.env`
|
||||||
|
3. Add your token to the `.env` file
|
||||||
|
|
||||||
**"LINEAR_API_KEY not set"**
|
**"LINEAR_API_KEY not found in .env file"**
|
||||||
Get your API key from `https://linear.app/YOUR-TEAM/settings/api`
|
1. Get your API key from `https://linear.app/YOUR-TEAM/settings/api`
|
||||||
|
2. Add it to your `.env` file
|
||||||
|
|
||||||
**"Appears to hang on first run"**
|
**"Appears to hang on first run"**
|
||||||
Normal behavior. The initializer is creating a Linear project and 50 issues with detailed descriptions. Watch for `[Tool: mcp__linear__create_issue]` output.
|
Normal behavior. The initializer is creating a Linear project and 50 issues with detailed descriptions. Watch for `[Tool: mcp__linear__create_issue]` output.
|
||||||
@@ -281,7 +299,7 @@ Normal behavior. The initializer is creating a Linear project and 50 issues with
|
|||||||
The agent tried to run a disallowed command. Add it to `ALLOWED_COMMANDS` in `security.py` if needed.
|
The agent tried to run a disallowed command. Add it to `ALLOWED_COMMANDS` in `security.py` if needed.
|
||||||
|
|
||||||
**"MCP server connection failed"**
|
**"MCP server connection failed"**
|
||||||
Verify your `LINEAR_API_KEY` is valid and has appropriate permissions. The Linear MCP server uses HTTP transport at `https://mcp.linear.app/mcp`.
|
Verify your `LINEAR_API_KEY` in the `.env` file is valid and has appropriate permissions. The Linear MCP server uses HTTP transport at `https://mcp.linear.app/mcp`.
|
||||||
|
|
||||||
## Viewing Progress
|
## Viewing Progress
|
||||||
|
|
||||||
|
|||||||
39
agent.py
39
agent.py
@@ -17,6 +17,7 @@ from prompts import (
|
|||||||
get_initializer_prompt,
|
get_initializer_prompt,
|
||||||
get_initializer_bis_prompt,
|
get_initializer_bis_prompt,
|
||||||
get_coding_prompt,
|
get_coding_prompt,
|
||||||
|
get_coding_prompt_library,
|
||||||
copy_spec_to_project,
|
copy_spec_to_project,
|
||||||
copy_new_spec_to_project,
|
copy_new_spec_to_project,
|
||||||
)
|
)
|
||||||
@@ -26,6 +27,34 @@ from prompts import (
|
|||||||
AUTO_CONTINUE_DELAY_SECONDS = 3
|
AUTO_CONTINUE_DELAY_SECONDS = 3
|
||||||
|
|
||||||
|
|
||||||
|
def is_library_project(project_dir: Path) -> bool:
|
||||||
|
"""
|
||||||
|
Detect if this is a library/type-safety project vs a full-stack web app.
|
||||||
|
|
||||||
|
Checks app_spec.txt for keywords related to type safety, documentation, or library projects.
|
||||||
|
"""
|
||||||
|
app_spec_path = project_dir / "app_spec.txt"
|
||||||
|
if not app_spec_path.exists():
|
||||||
|
return False
|
||||||
|
|
||||||
|
try:
|
||||||
|
spec_content = app_spec_path.read_text(encoding='utf-8').lower()
|
||||||
|
|
||||||
|
# Keywords that indicate a library/type-safety project
|
||||||
|
library_keywords = [
|
||||||
|
"type safety",
|
||||||
|
"type annotations",
|
||||||
|
"docstrings",
|
||||||
|
"documentation enhancement",
|
||||||
|
"mypy",
|
||||||
|
"library rag",
|
||||||
|
]
|
||||||
|
|
||||||
|
return any(keyword in spec_content for keyword in library_keywords)
|
||||||
|
except Exception:
|
||||||
|
return False
|
||||||
|
|
||||||
|
|
||||||
async def run_agent_session(
|
async def run_agent_session(
|
||||||
client: ClaudeSDKClient,
|
client: ClaudeSDKClient,
|
||||||
message: str,
|
message: str,
|
||||||
@@ -162,8 +191,8 @@ async def run_autonomous_agent(
|
|||||||
print("Fresh start - will use initializer agent")
|
print("Fresh start - will use initializer agent")
|
||||||
print()
|
print()
|
||||||
print("=" * 70)
|
print("=" * 70)
|
||||||
print(" NOTE: First session takes 10-20+ minutes!")
|
print(" NOTE: First session may take several minutes!")
|
||||||
print(" The agent is creating 50 Linear issues and setting up the project.")
|
print(" The agent is creating Linear issues (one per feature in spec).")
|
||||||
print(" This may appear to hang - it's working. Watch for [Tool: ...] output.")
|
print(" This may appear to hang - it's working. Watch for [Tool: ...] output.")
|
||||||
print("=" * 70)
|
print("=" * 70)
|
||||||
print()
|
print()
|
||||||
@@ -213,7 +242,11 @@ async def run_autonomous_agent(
|
|||||||
prompt = get_initializer_bis_prompt()
|
prompt = get_initializer_bis_prompt()
|
||||||
use_initializer_bis = False # Only use initializer bis once
|
use_initializer_bis = False # Only use initializer bis once
|
||||||
else:
|
else:
|
||||||
prompt = get_coding_prompt()
|
# Detect project type and use appropriate coding prompt
|
||||||
|
if is_library_project(project_dir):
|
||||||
|
prompt = get_coding_prompt_library()
|
||||||
|
else:
|
||||||
|
prompt = get_coding_prompt()
|
||||||
|
|
||||||
# Run session with async context manager
|
# Run session with async context manager
|
||||||
async with client:
|
async with client:
|
||||||
|
|||||||
@@ -17,8 +17,12 @@ import asyncio
|
|||||||
import os
|
import os
|
||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
|
|
||||||
|
from dotenv import load_dotenv
|
||||||
from agent import run_autonomous_agent
|
from agent import run_autonomous_agent
|
||||||
|
|
||||||
|
# Load environment variables from .env file
|
||||||
|
load_dotenv()
|
||||||
|
|
||||||
|
|
||||||
# Configuration
|
# Configuration
|
||||||
# Using Claude Opus 4.5 as default for best coding and agentic performance
|
# Using Claude Opus 4.5 as default for best coding and agentic performance
|
||||||
@@ -48,9 +52,10 @@ Examples:
|
|||||||
# Add new specifications to existing project
|
# Add new specifications to existing project
|
||||||
python autonomous_agent_demo.py --project-dir ./claude_clone --new-spec app_spec_new1.txt
|
python autonomous_agent_demo.py --project-dir ./claude_clone --new-spec app_spec_new1.txt
|
||||||
|
|
||||||
Environment Variables:
|
Configuration (.env file):
|
||||||
CLAUDE_CODE_OAUTH_TOKEN Claude Code OAuth token (required)
|
CLAUDE_CODE_OAUTH_TOKEN Claude Code OAuth token (required)
|
||||||
LINEAR_API_KEY Linear API key (required)
|
LINEAR_API_KEY Linear API key (required)
|
||||||
|
LINEAR_TEAM_ID Linear Team ID (optional)
|
||||||
""",
|
""",
|
||||||
)
|
)
|
||||||
|
|
||||||
@@ -91,18 +96,17 @@ def main() -> None:
|
|||||||
|
|
||||||
# Check for Claude Code OAuth token
|
# Check for Claude Code OAuth token
|
||||||
if not os.environ.get("CLAUDE_CODE_OAUTH_TOKEN"):
|
if not os.environ.get("CLAUDE_CODE_OAUTH_TOKEN"):
|
||||||
print("Error: CLAUDE_CODE_OAUTH_TOKEN environment variable not set")
|
print("Error: CLAUDE_CODE_OAUTH_TOKEN not found in .env file")
|
||||||
print("\nRun 'claude setup-token' after installing the Claude Code CLI.")
|
print("\n1. Run 'claude setup-token' after installing the Claude Code CLI")
|
||||||
print("\nThen set it:")
|
print("2. Copy .env.example to .env")
|
||||||
print(" export CLAUDE_CODE_OAUTH_TOKEN='your-token-here'")
|
print("3. Add your token to .env: CLAUDE_CODE_OAUTH_TOKEN='your-token-here'")
|
||||||
return
|
return
|
||||||
|
|
||||||
# Check for Linear API key
|
# Check for Linear API key
|
||||||
if not os.environ.get("LINEAR_API_KEY"):
|
if not os.environ.get("LINEAR_API_KEY"):
|
||||||
print("Error: LINEAR_API_KEY environment variable not set")
|
print("Error: LINEAR_API_KEY not found in .env file")
|
||||||
print("\nGet your API key from: https://linear.app/YOUR-TEAM/settings/api")
|
print("\n1. Get your API key from: https://linear.app/YOUR-TEAM/settings/api")
|
||||||
print("\nThen set it:")
|
print("2. Add it to .env: LINEAR_API_KEY='lin_api_xxxxxxxxxxxxx'")
|
||||||
print(" export LINEAR_API_KEY='lin_api_xxxxxxxxxxxxx'")
|
|
||||||
return
|
return
|
||||||
|
|
||||||
# Automatically place projects in generations/ directory unless already specified
|
# Automatically place projects in generations/ directory unless already specified
|
||||||
|
|||||||
14
client.py
14
client.py
@@ -9,11 +9,15 @@ import json
|
|||||||
import os
|
import os
|
||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
|
|
||||||
|
from dotenv import load_dotenv
|
||||||
from claude_code_sdk import ClaudeCodeOptions, ClaudeSDKClient
|
from claude_code_sdk import ClaudeCodeOptions, ClaudeSDKClient
|
||||||
from claude_code_sdk.types import HookMatcher
|
from claude_code_sdk.types import HookMatcher
|
||||||
|
|
||||||
from security import bash_security_hook
|
from security import bash_security_hook
|
||||||
|
|
||||||
|
# Load environment variables from .env file
|
||||||
|
load_dotenv()
|
||||||
|
|
||||||
|
|
||||||
# Puppeteer MCP tools for browser automation
|
# Puppeteer MCP tools for browser automation
|
||||||
PUPPETEER_TOOLS = [
|
PUPPETEER_TOOLS = [
|
||||||
@@ -85,15 +89,17 @@ def create_client(project_dir: Path, model: str) -> ClaudeSDKClient:
|
|||||||
api_key = os.environ.get("CLAUDE_CODE_OAUTH_TOKEN")
|
api_key = os.environ.get("CLAUDE_CODE_OAUTH_TOKEN")
|
||||||
if not api_key:
|
if not api_key:
|
||||||
raise ValueError(
|
raise ValueError(
|
||||||
"CLAUDE_CODE_OAUTH_TOKEN environment variable not set.\n"
|
"CLAUDE_CODE_OAUTH_TOKEN not set in .env file.\n"
|
||||||
"Run 'claude setup-token after installing the Claude Code CLI."
|
"Run 'claude setup-token' after installing the Claude Code CLI,\n"
|
||||||
|
"then add the token to your .env file."
|
||||||
)
|
)
|
||||||
|
|
||||||
linear_api_key = os.environ.get("LINEAR_API_KEY")
|
linear_api_key = os.environ.get("LINEAR_API_KEY")
|
||||||
if not linear_api_key:
|
if not linear_api_key:
|
||||||
raise ValueError(
|
raise ValueError(
|
||||||
"LINEAR_API_KEY environment variable not set.\n"
|
"LINEAR_API_KEY not set in .env file.\n"
|
||||||
"Get your API key from: https://linear.app/YOUR-TEAM/settings/api"
|
"Get your API key from: https://linear.app/YOUR-TEAM/settings/api\n"
|
||||||
|
"then add it to your .env file."
|
||||||
)
|
)
|
||||||
|
|
||||||
# Create comprehensive security settings
|
# Create comprehensive security settings
|
||||||
|
|||||||
29
docker-compose.my_project.yml
Normal file
29
docker-compose.my_project.yml
Normal file
@@ -0,0 +1,29 @@
|
|||||||
|
services:
|
||||||
|
my_project_frontend:
|
||||||
|
image: node:20
|
||||||
|
working_dir: /app
|
||||||
|
volumes:
|
||||||
|
- ./generations/my_project:/app
|
||||||
|
# Eviter de réutiliser les node_modules Windows dans le conteneur Linux
|
||||||
|
- /app/node_modules
|
||||||
|
command: ["sh", "-c", "npm install && npm run dev -- --host 0.0.0.0 --port 3000"]
|
||||||
|
ports:
|
||||||
|
- "4300:3000"
|
||||||
|
environment:
|
||||||
|
- NODE_ENV=development
|
||||||
|
|
||||||
|
my_project_server:
|
||||||
|
image: node:20
|
||||||
|
working_dir: /app/server
|
||||||
|
volumes:
|
||||||
|
- ./generations/my_project:/app
|
||||||
|
# Eviter de réutiliser les node_modules Windows dans le conteneur Linux
|
||||||
|
- /app/server/node_modules
|
||||||
|
command: ["sh", "-c", "npm install && npm start"]
|
||||||
|
ports:
|
||||||
|
- "4301:3001"
|
||||||
|
environment:
|
||||||
|
- NODE_ENV=development
|
||||||
|
depends_on:
|
||||||
|
- my_project_frontend
|
||||||
|
|
||||||
@@ -7,10 +7,14 @@ These values are used in prompts and for project state management.
|
|||||||
"""
|
"""
|
||||||
|
|
||||||
import os
|
import os
|
||||||
|
from dotenv import load_dotenv
|
||||||
|
|
||||||
# Environment variables (must be set before running)
|
# Load environment variables from .env file
|
||||||
|
load_dotenv()
|
||||||
|
|
||||||
|
# Environment variables (loaded from .env file)
|
||||||
LINEAR_API_KEY = os.environ.get("LINEAR_API_KEY")
|
LINEAR_API_KEY = os.environ.get("LINEAR_API_KEY")
|
||||||
LINEAR_TEAM_ID = os.environ.get("LINEAR_TEAM_ID")
|
LINEAR_TEAM_ID = os.environ.get("LINEAR_TEAM_ID")
|
||||||
|
|
||||||
# Default number of issues to create (can be overridden via command line)
|
# Default number of issues to create (can be overridden via command line)
|
||||||
DEFAULT_ISSUE_COUNT = 50
|
DEFAULT_ISSUE_COUNT = 50
|
||||||
|
|||||||
6
package-lock.json
generated
Normal file
6
package-lock.json
generated
Normal file
@@ -0,0 +1,6 @@
|
|||||||
|
{
|
||||||
|
"name": "linear_coding_philosophia_rag",
|
||||||
|
"lockfileVersion": 3,
|
||||||
|
"requires": true,
|
||||||
|
"packages": {}
|
||||||
|
}
|
||||||
@@ -28,6 +28,11 @@ def get_coding_prompt() -> str:
|
|||||||
return load_prompt("coding_prompt")
|
return load_prompt("coding_prompt")
|
||||||
|
|
||||||
|
|
||||||
|
def get_coding_prompt_library() -> str:
|
||||||
|
"""Load the library-specific coding agent prompt (for type safety & documentation projects)."""
|
||||||
|
return load_prompt("coding_prompt_library")
|
||||||
|
|
||||||
|
|
||||||
def copy_spec_to_project(project_dir: Path) -> None:
|
def copy_spec_to_project(project_dir: Path) -> None:
|
||||||
"""Copy the app spec file into the project directory for the agent to read."""
|
"""Copy the app spec file into the project directory for the agent to read."""
|
||||||
spec_source = PROMPTS_DIR / "app_spec.txt"
|
spec_source = PROMPTS_DIR / "app_spec.txt"
|
||||||
|
|||||||
1171
prompts/app_spec.txt
1171
prompts/app_spec.txt
File diff suppressed because it is too large
Load Diff
@@ -1,179 +0,0 @@
|
|||||||
<project_specification>
|
|
||||||
<project_name>Language selection & i18n completion (FR default, EN/FR only)</project_name>
|
|
||||||
|
|
||||||
<overview>
|
|
||||||
This specification complements the existing "app_spec_language_selection.txt" file.
|
|
||||||
It does NOT replace the original spec. Instead, it adds additional requirements and
|
|
||||||
corrective steps to fully complete the language selection and i18n implementation.
|
|
||||||
|
|
||||||
Main goals:
|
|
||||||
- Support exactly two UI languages: English ("en") and French ("fr").
|
|
||||||
- Make French ("fr") the default language when no preference exists.
|
|
||||||
- Ensure that all user-facing text is translated via the i18n system (no hardcoded strings).
|
|
||||||
- Align the language selector UI with the actual supported languages.
|
|
||||||
</overview>
|
|
||||||
|
|
||||||
<relationship_to_original_spec>
|
|
||||||
- The original file "app_spec_language_selection.txt" defines the initial language selection
|
|
||||||
feature and i18n architecture (context, translation files, etc.).
|
|
||||||
- This completion spec:
|
|
||||||
* keeps that architecture,
|
|
||||||
* tightens some requirements (FR as default),
|
|
||||||
* and adds missing work items (removal of hardcoded English strings, cleanup of extra languages).
|
|
||||||
- The original spec remains valid; this completion spec should be applied on top of it.
|
|
||||||
</relationship_to_original_spec>
|
|
||||||
|
|
||||||
<constraints>
|
|
||||||
- Officially supported UI languages:
|
|
||||||
* English ("en")
|
|
||||||
* French ("fr")
|
|
||||||
- Default language:
|
|
||||||
* French ("fr") MUST be the default language when there is no stored preference.
|
|
||||||
- No other languages (es, de, ja, etc.) are considered part of this completion scope.
|
|
||||||
They may be added later in a separate spec with full translation coverage.
|
|
||||||
- The existing i18n architecture (LanguageContext, useLanguage hook, en.json, fr.json)
|
|
||||||
must be reused, not replaced.
|
|
||||||
</constraints>
|
|
||||||
|
|
||||||
<current_state_summary>
|
|
||||||
- LanguageContext and useLanguage already exist and manage language + translations.
|
|
||||||
- en.json and fr.json exist with a significant subset of strings translated.
|
|
||||||
- Some components already call t('...') correctly (e.g. welcome screen, many settings labels).
|
|
||||||
- However:
|
|
||||||
* Many UI strings are still hardcoded in English in "src/App.jsx".
|
|
||||||
* The language selector UI mentions more languages than are actually implemented.
|
|
||||||
* The default language behavior is not explicitly enforced as French.
|
|
||||||
</current_state_summary>
|
|
||||||
|
|
||||||
<target_state>
|
|
||||||
- French is used as the default language for new/anonymous users.
|
|
||||||
- Only English and French appear in the language selector.
|
|
||||||
- All user-facing UI strings in "src/App.jsx" and its inline components use t('key').
|
|
||||||
- Every key used by the UI is defined in both en.json and fr.json.
|
|
||||||
- No leftover English UI text appears when French is selected.
|
|
||||||
</target_state>
|
|
||||||
|
|
||||||
<implementation_details>
|
|
||||||
<default_language>
|
|
||||||
- In the language context code:
|
|
||||||
* Ensure there is a constant DEFAULT_LANGUAGE set to "fr".
|
|
||||||
Example:
|
|
||||||
const DEFAULT_LANGUAGE = 'fr';
|
|
||||||
- Initial language resolution MUST follow this order:
|
|
||||||
1. If a valid language ("en" or "fr") is found in localStorage, use it.
|
|
||||||
2. Otherwise, fall back to DEFAULT_LANGUAGE = "fr".
|
|
||||||
- This guarantees that first-time users and users without a stored preference see the UI in French.
|
|
||||||
</default_language>
|
|
||||||
|
|
||||||
<supported_languages>
|
|
||||||
- SUPPORTED_LANGUAGES must contain exactly:
|
|
||||||
* { code: 'en', name: 'English', nativeName: 'English' }
|
|
||||||
* { code: 'fr', name: 'French', nativeName: 'Français' }
|
|
||||||
- The Settings > Language dropdown must iterate only over SUPPORTED_LANGUAGES.
|
|
||||||
- Any explicit references to "es", "de", "ja" as selectable languages must be removed
|
|
||||||
or commented out as "future languages" (but not shown to users).
|
|
||||||
</supported_languages>
|
|
||||||
|
|
||||||
<hardcoded_strings_audit>
|
|
||||||
- Perform a systematic audit of "src/App.jsx" to identify every user-visible English string
|
|
||||||
that is still hardcoded. Typical areas include:
|
|
||||||
* ThemePreview sample messages (e.g. “Hello! Can you help me with something?”).
|
|
||||||
* About section in Settings > General (product name, description, “Built with …” text).
|
|
||||||
* Default model description and option labels.
|
|
||||||
* Project modals: “Cancel”, “Save Changes”, etc.
|
|
||||||
* Any toasts, confirmation messages, help texts, or labels still in English.
|
|
||||||
- For each identified string:
|
|
||||||
* Define a stable translation key (e.g. "themePreview.sampleUser1",
|
|
||||||
"settings.defaultModelDescription", "projectModal.cancel", "projectModal.saveChanges").
|
|
||||||
* Add this key to both en.json and fr.json.
|
|
||||||
</hardcoded_strings_audit>
|
|
||||||
|
|
||||||
<refactor_to_use_t>
|
|
||||||
- Replace each hardcoded string with a call to the translation function, for example:
|
|
||||||
BEFORE:
|
|
||||||
<p>Hello! Can you help me with something?</p>
|
|
||||||
AFTER:
|
|
||||||
<p>{t('themePreview.sampleUser1')}</p>
|
|
||||||
- Ensure that:
|
|
||||||
* The component (or function) imports useLanguage.
|
|
||||||
* const { t } = useLanguage() is declared in the correct scope.
|
|
||||||
- Apply this systematically across:
|
|
||||||
* Settings / General and Appearance sections.
|
|
||||||
* Theme preview component.
|
|
||||||
* Project-related modals.
|
|
||||||
* Any remaining banners, tooltips, or messages defined inside App.jsx.
|
|
||||||
</refactor_to_use_t>
|
|
||||||
|
|
||||||
<translation_files_update>
|
|
||||||
- Update translations/en.json:
|
|
||||||
* Add all new keys with natural English text.
|
|
||||||
- Update translations/fr.json:
|
|
||||||
* Add the same keys with accurate French translations.
|
|
||||||
- Goal:
|
|
||||||
* For every key used in code, both en.json and fr.json must contain a value.
|
|
||||||
</translation_files_update>
|
|
||||||
|
|
||||||
<fallback_behavior>
|
|
||||||
- Keep existing fallback behavior in LanguageContext:
|
|
||||||
* If a key is missing in the current language, fall back to English.
|
|
||||||
* If the key is also missing in English, return the key and log a warning.
|
|
||||||
- However, after this completion spec is implemented:
|
|
||||||
* No fallback warnings should appear in normal operation, because all keys are defined.
|
|
||||||
</fallback_behavior>
|
|
||||||
|
|
||||||
<settings_language_section>
|
|
||||||
- In the Settings > General tab:
|
|
||||||
* The language section heading must be translated via t('settings.language').
|
|
||||||
* Any helper text/description for the language selector must also use t('...').
|
|
||||||
* The select's value is bound to the language from useLanguage.
|
|
||||||
* The onChange handler calls setLanguage(newLanguageCode).
|
|
||||||
- Expected behavior:
|
|
||||||
* Switching to French instantly updates the UI and saves "fr" in localStorage.
|
|
||||||
* Switching to English instantly updates the UI and saves "en" in localStorage.
|
|
||||||
</settings_language_section>
|
|
||||||
</implementation_details>
|
|
||||||
|
|
||||||
<testing_plan>
|
|
||||||
<manual_tests>
|
|
||||||
1. Clear the language preference from localStorage.
|
|
||||||
2. Load the application:
|
|
||||||
- Confirm that the UI is initially in French (FR as default).
|
|
||||||
3. Open the Settings modal and navigate to the General tab.
|
|
||||||
- Verify that the language selector shows only "Français" and "English".
|
|
||||||
4. Switch to English:
|
|
||||||
- Verify that Sidebar, Settings, Welcome screen, Chat area, and modals are all in English.
|
|
||||||
5. Refresh the page:
|
|
||||||
- Confirm that the UI stays in English (preference persisted).
|
|
||||||
6. Switch back to French and repeat quick checks to confirm all UI text is in French.
|
|
||||||
</manual_tests>
|
|
||||||
|
|
||||||
<coverage_checks>
|
|
||||||
- Check in both languages:
|
|
||||||
* Main/empty state (welcome screen).
|
|
||||||
* Chat area (input placeholder, send/stop/regenerate buttons).
|
|
||||||
* Sidebar (navigation sections, search placeholder, pinned/archived labels).
|
|
||||||
* Settings (all tabs).
|
|
||||||
* Project creation and edit modals.
|
|
||||||
* Delete/confirmation dialogs and any share/export flows.
|
|
||||||
- Confirm:
|
|
||||||
* In French, there is no remaining English UI text.
|
|
||||||
* In English, there is no accidental French UI text.
|
|
||||||
</coverage_checks>
|
|
||||||
|
|
||||||
<regression>
|
|
||||||
- Verify:
|
|
||||||
* Chat behavior is unchanged except for translated labels/text.
|
|
||||||
* Project operations (create/update/delete) still work.
|
|
||||||
* No new console errors appear when switching languages or reloading.
|
|
||||||
</regression>
|
|
||||||
</testing_plan>
|
|
||||||
|
|
||||||
<success_criteria>
|
|
||||||
- "app_spec_language_selection.txt" remains the original base spec.
|
|
||||||
- This completion spec ("app_spec_language_selection.completion.txt") is fully implemented.
|
|
||||||
- French is used as default language when no preference exists.
|
|
||||||
- Only English and French are presented in the language selector.
|
|
||||||
- All user-facing strings in App.jsx go through t('key') and exist in both en.json and fr.json.
|
|
||||||
- No stray English text is visible when the French language is selected.
|
|
||||||
</success_criteria>
|
|
||||||
</project_specification>
|
|
||||||
@@ -1,525 +0,0 @@
|
|||||||
<project_specification>
|
|
||||||
<project_name>Claude.ai Clone - Language Selection Bug Fix</project_name>
|
|
||||||
|
|
||||||
<overview>
|
|
||||||
This specification fixes a bug in the language selection functionality. The feature was
|
|
||||||
originally planned in the initial app_spec.txt (line 127: "Language preferences") and a UI
|
|
||||||
component already exists in the settings panel (App.jsx lines 1412-1419), but the functionality
|
|
||||||
is incomplete and non-functional.
|
|
||||||
|
|
||||||
Currently, there is a language selector dropdown in the settings with options for English,
|
|
||||||
Español, Français, Deutsch, and 日本語, but it lacks:
|
|
||||||
- State management for the selected language
|
|
||||||
- Event handlers (onChange) to handle language changes
|
|
||||||
- A translation system (i18n)
|
|
||||||
- Translation files (en.json, fr.json, etc.)
|
|
||||||
- Language context/provider
|
|
||||||
- Persistence of language preference
|
|
||||||
|
|
||||||
This bug fix will complete the implementation by adding the missing functionality so that when
|
|
||||||
a language is selected, the entire interface updates immediately to display all text in the
|
|
||||||
chosen language. The language preference should persist across sessions.
|
|
||||||
|
|
||||||
Focus will be on English (default) and French as the primary languages, with the existing
|
|
||||||
UI supporting additional languages for future expansion.
|
|
||||||
</overview>
|
|
||||||
|
|
||||||
<current_state>
|
|
||||||
<existing_ui>
|
|
||||||
Location: src/App.jsx, lines 1412-1419
|
|
||||||
Component: Language selector dropdown in settings panel (General/Preferences section)
|
|
||||||
Current options: English (en), Español (es), Français (fr), Deutsch (de), 日本語 (ja)
|
|
||||||
Status: UI exists but is non-functional (no onChange handler, no state, no translations)
|
|
||||||
</existing_ui>
|
|
||||||
|
|
||||||
<specification_reference>
|
|
||||||
Original spec: prompts/app_spec.txt, line 127
|
|
||||||
Mentioned as: "Language preferences" in settings_preferences section
|
|
||||||
Status: Feature was planned but not fully implemented
|
|
||||||
</specification_reference>
|
|
||||||
</current_state>
|
|
||||||
|
|
||||||
<safety_requirements>
|
|
||||||
<critical>
|
|
||||||
- DO NOT remove or modify the existing language selector UI (lines 1412-1419 in App.jsx)
|
|
||||||
- DO NOT break existing functionality when language is changed
|
|
||||||
- English must remain the default language
|
|
||||||
- Language changes should apply immediately without page refresh
|
|
||||||
- All translations must be complete (no missing translations)
|
|
||||||
- Maintain backward compatibility with existing code
|
|
||||||
- Language preference should be stored and persist across sessions
|
|
||||||
- Keep the existing dropdown structure and styling
|
|
||||||
- Connect the existing select element to the new translation system
|
|
||||||
</critical>
|
|
||||||
</safety_requirements>
|
|
||||||
|
|
||||||
<bug_fixes>
|
|
||||||
<fix_language_selection>
|
|
||||||
<title>Fix Language Selection Functionality</title>
|
|
||||||
<description>
|
|
||||||
Complete the implementation of the existing language selector in the settings menu.
|
|
||||||
The UI already exists (App.jsx lines 1412-1419) but needs to be made functional.
|
|
||||||
|
|
||||||
The fix should:
|
|
||||||
- Connect the existing select element to state management
|
|
||||||
- Add onChange handler to the existing select element
|
|
||||||
- Display current selected language (load from localStorage on mount)
|
|
||||||
- Apply language changes immediately to the entire interface
|
|
||||||
- Save language preference to localStorage
|
|
||||||
- Persist language choice across sessions
|
|
||||||
|
|
||||||
The existing selector is already in the correct location (General/Preferences section
|
|
||||||
of settings panel) and has the correct styling, so only the functionality needs to be added.
|
|
||||||
</description>
|
|
||||||
<priority>1</priority>
|
|
||||||
<category>bug_fix</category>
|
|
||||||
<type>completion_of_existing_feature</type>
|
|
||||||
<implementation_approach>
|
|
||||||
- Keep the existing select element in App.jsx (lines 1412-1419)
|
|
||||||
- Add useState hook to manage selected language state
|
|
||||||
- Add value prop to select element (bound to state)
|
|
||||||
- Add onChange handler to select element
|
|
||||||
- Load language preference from localStorage on component mount
|
|
||||||
- Save language preference to localStorage on change
|
|
||||||
- Create translation files/dictionaries for each language
|
|
||||||
- Implement language context/provider to manage current language
|
|
||||||
- Create translation utility function to retrieve translated strings
|
|
||||||
- Update all hardcoded text to use translation function
|
|
||||||
- Apply language changes reactively throughout the application
|
|
||||||
</implementation_approach>
|
|
||||||
<test_steps>
|
|
||||||
1. Open settings menu
|
|
||||||
2. Navigate to "General" or "Preferences" section
|
|
||||||
3. Locate the existing "Language" selector (should already be visible)
|
|
||||||
4. Verify the select element now has a value bound to state (not empty)
|
|
||||||
5. Verify default language is "English" (en) on first load
|
|
||||||
6. Select "Français" (fr) from the existing language dropdown
|
|
||||||
7. Verify onChange handler fires and updates state
|
|
||||||
8. Verify entire interface updates immediately to French
|
|
||||||
9. Check that all UI elements are translated (buttons, labels, menus)
|
|
||||||
10. Navigate to different pages and verify translations persist
|
|
||||||
11. Refresh the page and verify language preference is maintained (loaded from localStorage)
|
|
||||||
12. Switch back to "English" and verify interface returns to English
|
|
||||||
13. Test with new conversations and verify messages/UI are in selected language
|
|
||||||
14. Verify the existing select element styling and structure remain unchanged
|
|
||||||
</test_steps>
|
|
||||||
</fix_language_selection>
|
|
||||||
|
|
||||||
<fix_translation_system>
|
|
||||||
<title>Translation System Infrastructure</title>
|
|
||||||
<description>
|
|
||||||
Implement a translation system that:
|
|
||||||
- Stores translations for English and French
|
|
||||||
- Provides a translation function/utility to retrieve translated strings
|
|
||||||
- Supports dynamic language switching
|
|
||||||
- Handles missing translations gracefully (fallback to English)
|
|
||||||
- Organizes translations by feature/component
|
|
||||||
|
|
||||||
Translation keys should be organized logically:
|
|
||||||
- Common UI elements (buttons, labels, placeholders)
|
|
||||||
- Settings panel
|
|
||||||
- Chat interface
|
|
||||||
- Navigation menus
|
|
||||||
- Error messages
|
|
||||||
- Success messages
|
|
||||||
- Tooltips and help text
|
|
||||||
</description>
|
|
||||||
<priority>1</priority>
|
|
||||||
<category>infrastructure</category>
|
|
||||||
<type>new_implementation</type>
|
|
||||||
<implementation_approach>
|
|
||||||
- Create translation files (JSON or JS objects):
|
|
||||||
* translations/en.json (English)
|
|
||||||
* translations/fr.json (French)
|
|
||||||
- Create translation context/provider (React Context)
|
|
||||||
- Create useTranslation hook for components
|
|
||||||
- Create translation utility function (t() or translate())
|
|
||||||
- Organize translations by namespace/feature
|
|
||||||
- Implement fallback mechanism for missing translations
|
|
||||||
- Ensure type safety for translation keys (TypeScript if applicable)
|
|
||||||
</implementation_approach>
|
|
||||||
<test_steps>
|
|
||||||
1. Verify translation files exist for both languages
|
|
||||||
2. Test translation function with valid keys
|
|
||||||
3. Test translation function with invalid keys (should fallback)
|
|
||||||
4. Verify all translation keys have values in both languages
|
|
||||||
5. Test language switching updates all components
|
|
||||||
6. Verify no console errors when switching languages
|
|
||||||
</test_steps>
|
|
||||||
</fix_translation_system>
|
|
||||||
|
|
||||||
<fix_ui_translations>
|
|
||||||
<title>Complete UI Translation Coverage</title>
|
|
||||||
<description>
|
|
||||||
Translate all user-facing text in the application to support both English and French:
|
|
||||||
|
|
||||||
Navigation & Menus:
|
|
||||||
- Sidebar navigation items
|
|
||||||
- Menu labels
|
|
||||||
- Breadcrumbs
|
|
||||||
|
|
||||||
Chat Interface:
|
|
||||||
- Input placeholder text
|
|
||||||
- Send button
|
|
||||||
- Message status indicators
|
|
||||||
- Empty state messages
|
|
||||||
- Loading states
|
|
||||||
|
|
||||||
Settings:
|
|
||||||
- All setting section titles
|
|
||||||
- Setting option labels
|
|
||||||
- Setting descriptions
|
|
||||||
- Save/Cancel buttons
|
|
||||||
|
|
||||||
Buttons & Actions:
|
|
||||||
- Primary action buttons
|
|
||||||
- Secondary buttons
|
|
||||||
- Delete/Remove actions
|
|
||||||
- Edit actions
|
|
||||||
- Save actions
|
|
||||||
|
|
||||||
Messages & Notifications:
|
|
||||||
- Success messages
|
|
||||||
- Error messages
|
|
||||||
- Warning messages
|
|
||||||
- Info messages
|
|
||||||
|
|
||||||
Forms:
|
|
||||||
- Form labels
|
|
||||||
- Input placeholders
|
|
||||||
- Validation messages
|
|
||||||
- Help text
|
|
||||||
|
|
||||||
Modals & Dialogs:
|
|
||||||
- Modal titles
|
|
||||||
- Modal content
|
|
||||||
- Confirmation dialogs
|
|
||||||
- Cancel/Confirm buttons
|
|
||||||
</description>
|
|
||||||
<priority>1</priority>
|
|
||||||
<category>ui</category>
|
|
||||||
<type>translation_implementation</type>
|
|
||||||
<implementation_approach>
|
|
||||||
- Audit all hardcoded text in the application
|
|
||||||
- Replace all hardcoded strings with translation function calls
|
|
||||||
- Create translation keys for each text element
|
|
||||||
- Add French translations for all keys
|
|
||||||
- Test each screen/page to ensure complete translation coverage
|
|
||||||
- Verify no English text remains when French is selected
|
|
||||||
</implementation_approach>
|
|
||||||
<test_steps>
|
|
||||||
1. Set language to French
|
|
||||||
2. Navigate through all pages/screens
|
|
||||||
3. Verify every text element is translated
|
|
||||||
4. Check all buttons, labels, placeholders
|
|
||||||
5. Test all modals and dialogs
|
|
||||||
6. Verify form validation messages
|
|
||||||
7. Check error and success notifications
|
|
||||||
8. Verify no English text appears when French is selected
|
|
||||||
9. Repeat test with English to ensure nothing broke
|
|
||||||
</test_steps>
|
|
||||||
</fix_ui_translations>
|
|
||||||
|
|
||||||
<fix_language_persistence>
|
|
||||||
<title>Language Preference Persistence</title>
|
|
||||||
<description>
|
|
||||||
Ensure that the selected language preference is saved and persists across:
|
|
||||||
- Page refreshes
|
|
||||||
- Browser sessions
|
|
||||||
- Tab closures
|
|
||||||
- Application restarts
|
|
||||||
|
|
||||||
The language preference should be:
|
|
||||||
- Stored in localStorage (client-side) or backend user preferences
|
|
||||||
- Loaded on application startup
|
|
||||||
- Applied immediately when the app loads
|
|
||||||
- Synchronized if user is logged in (multi-device support)
|
|
||||||
</description>
|
|
||||||
<priority>1</priority>
|
|
||||||
<category>persistence</category>
|
|
||||||
<type>bug_fix</type>
|
|
||||||
<implementation_approach>
|
|
||||||
- Save language selection to localStorage on change
|
|
||||||
- Load language preference on app initialization
|
|
||||||
- Apply saved language before rendering UI
|
|
||||||
- Optionally sync with backend user preferences if available
|
|
||||||
- Handle case where no preference is saved (default to English)
|
|
||||||
</implementation_approach>
|
|
||||||
<test_steps>
|
|
||||||
1. Select French language
|
|
||||||
2. Refresh the page
|
|
||||||
3. Verify interface is still in French
|
|
||||||
4. Close browser tab and reopen
|
|
||||||
5. Verify language preference persists
|
|
||||||
6. Clear localStorage and verify defaults to English
|
|
||||||
7. Select language again and verify it saves
|
|
||||||
</test_steps>
|
|
||||||
</fix_language_persistence>
|
|
||||||
</bug_fixes>
|
|
||||||
|
|
||||||
<implementation_notes>
|
|
||||||
<existing_code>
|
|
||||||
Location: src/App.jsx, lines 1412-1419
|
|
||||||
Current code:
|
|
||||||
```jsx
|
|
||||||
<div>
|
|
||||||
<h3 className="text-sm font-medium text-gray-900 dark:text-gray-100 mb-3">Language</h3>
|
|
||||||
<select className="w-full px-3 py-2 bg-white dark:bg-gray-700 border border-gray-300 dark:border-gray-600 rounded-lg text-sm text-gray-900 dark:text-gray-100 focus:ring-2 focus:ring-claude-orange focus:border-transparent">
|
|
||||||
<option value="en">English</option>
|
|
||||||
<option value="es">Español</option>
|
|
||||||
<option value="fr">Français</option>
|
|
||||||
<option value="de">Deutsch</option>
|
|
||||||
<option value="ja">日本語</option>
|
|
||||||
</select>
|
|
||||||
</div>
|
|
||||||
```
|
|
||||||
|
|
||||||
Required changes:
|
|
||||||
- Add value={language} to select element
|
|
||||||
- Add onChange={(e) => setLanguage(e.target.value)} to select element
|
|
||||||
- Add useState for language state
|
|
||||||
- Load from localStorage on mount
|
|
||||||
- Save to localStorage on change
|
|
||||||
</existing_code>
|
|
||||||
|
|
||||||
<code_structure>
|
|
||||||
frontend/
|
|
||||||
src/
|
|
||||||
App.jsx # UPDATE: Add language state and connect existing select
|
|
||||||
components/
|
|
||||||
LanguageSelector.jsx # Optional: Extract to component if needed (NEW)
|
|
||||||
contexts/
|
|
||||||
LanguageContext.jsx # Language context provider (NEW)
|
|
||||||
hooks/
|
|
||||||
useLanguage.js # Hook to access language and translations (NEW)
|
|
||||||
utils/
|
|
||||||
translations.js # Translation utility functions (NEW)
|
|
||||||
translations/
|
|
||||||
en.json # English translations (NEW)
|
|
||||||
fr.json # French translations (NEW)
|
|
||||||
</code_structure>
|
|
||||||
|
|
||||||
<translation_structure>
|
|
||||||
Translation files should be organized by feature/namespace:
|
|
||||||
{
|
|
||||||
"common": {
|
|
||||||
"save": "Save",
|
|
||||||
"cancel": "Cancel",
|
|
||||||
"delete": "Delete",
|
|
||||||
"edit": "Edit",
|
|
||||||
...
|
|
||||||
},
|
|
||||||
"settings": {
|
|
||||||
"title": "Settings",
|
|
||||||
"language": "Language",
|
|
||||||
"theme": "Theme",
|
|
||||||
...
|
|
||||||
},
|
|
||||||
"chat": {
|
|
||||||
"placeholder": "Message Claude...",
|
|
||||||
"send": "Send",
|
|
||||||
...
|
|
||||||
},
|
|
||||||
...
|
|
||||||
}
|
|
||||||
</translation_structure>
|
|
||||||
|
|
||||||
<storage_approach>
|
|
||||||
Store language preference in:
|
|
||||||
- localStorage key: "app_language" (value: "en" or "fr")
|
|
||||||
- Or backend user preferences if available:
|
|
||||||
{
|
|
||||||
language: "en" | "fr"
|
|
||||||
}
|
|
||||||
|
|
||||||
Default value: "en" (English)
|
|
||||||
</storage_approach>
|
|
||||||
|
|
||||||
<translation_function>
|
|
||||||
Example implementation:
|
|
||||||
- useTranslation() hook returns { t, language, setLanguage }
|
|
||||||
- t(key) function retrieves translation for current language
|
|
||||||
- t("common.save") returns "Save" (en) or "Enregistrer" (fr)
|
|
||||||
- Supports nested keys: t("settings.general.title")
|
|
||||||
- Falls back to English if translation missing
|
|
||||||
</translation_function>
|
|
||||||
|
|
||||||
<safety_guidelines>
|
|
||||||
- Keep all existing functionality intact
|
|
||||||
- Default to English if no language preference set
|
|
||||||
- Gracefully handle missing translations (fallback to English)
|
|
||||||
- Ensure language changes don't cause re-renders that break functionality
|
|
||||||
- Test thoroughly to ensure no English text remains when French is selected
|
|
||||||
- Maintain code readability with clear translation key naming
|
|
||||||
</safety_guidelines>
|
|
||||||
</implementation_notes>
|
|
||||||
|
|
||||||
<ui_components>
|
|
||||||
<language_selector>
|
|
||||||
<description>Language selector component in settings (ALREADY EXISTS - needs functionality)</description>
|
|
||||||
<location>Settings > General/Preferences section (App.jsx lines 1412-1419)</location>
|
|
||||||
<current_state>
|
|
||||||
- UI exists with dropdown/select element
|
|
||||||
- Has 5 language options: English (en), Español (es), Français (fr), Deutsch (de), 日本語 (ja)
|
|
||||||
- Styling is already correct
|
|
||||||
- Missing: value binding, onChange handler, state management
|
|
||||||
</current_state>
|
|
||||||
<required_changes>
|
|
||||||
- Add value prop bound to language state
|
|
||||||
- Add onChange handler to update language state
|
|
||||||
- Connect to translation system
|
|
||||||
- Add persistence (localStorage)
|
|
||||||
</required_changes>
|
|
||||||
<display>
|
|
||||||
- Keep existing dropdown/select element (no UI changes needed)
|
|
||||||
- Shows current selection (via value prop)
|
|
||||||
- Updates interface immediately on change (via onChange)
|
|
||||||
</display>
|
|
||||||
</language_selector>
|
|
||||||
</ui_components>
|
|
||||||
|
|
||||||
<translation_coverage>
|
|
||||||
<required_translations>
|
|
||||||
All text visible to users must be translated:
|
|
||||||
- Navigation menu items
|
|
||||||
- Page titles and headers
|
|
||||||
- Button labels
|
|
||||||
- Form labels and placeholders
|
|
||||||
- Input field labels
|
|
||||||
- Error messages
|
|
||||||
- Success messages
|
|
||||||
- Tooltips
|
|
||||||
- Help text
|
|
||||||
- Modal titles and content
|
|
||||||
- Dialog confirmations
|
|
||||||
- Empty states
|
|
||||||
- Loading states
|
|
||||||
- Settings labels and descriptions
|
|
||||||
- Chat interface elements
|
|
||||||
</required_translations>
|
|
||||||
|
|
||||||
<translation_examples>
|
|
||||||
English -> French:
|
|
||||||
- "Settings" -> "Paramètres"
|
|
||||||
- "Save" -> "Enregistrer"
|
|
||||||
- "Cancel" -> "Annuler"
|
|
||||||
- "Delete" -> "Supprimer"
|
|
||||||
- "Language" -> "Langue"
|
|
||||||
- "Theme" -> "Thème"
|
|
||||||
- "Send" -> "Envoyer"
|
|
||||||
- "New Conversation" -> "Nouvelle conversation"
|
|
||||||
- "Message Claude..." -> "Message à Claude..."
|
|
||||||
</translation_examples>
|
|
||||||
</translation_coverage>
|
|
||||||
|
|
||||||
<api_endpoints>
|
|
||||||
<if_backend_storage>
|
|
||||||
If storing language preference in backend:
|
|
||||||
- GET /api/user/preferences - Get user preferences (includes language)
|
|
||||||
- PUT /api/user/preferences - Update user preferences (includes language)
|
|
||||||
- GET /api/user/preferences/language - Get language preference only
|
|
||||||
- PUT /api/user/preferences/language - Update language preference only
|
|
||||||
</if_backend_storage>
|
|
||||||
|
|
||||||
<note>
|
|
||||||
If using localStorage only, no API endpoints needed.
|
|
||||||
Backend storage is optional but recommended for multi-device sync.
|
|
||||||
</note>
|
|
||||||
</api_endpoints>
|
|
||||||
|
|
||||||
<accessibility_requirements>
|
|
||||||
- Language selector must be keyboard navigable
|
|
||||||
- Language changes must be announced to screen readers
|
|
||||||
- Translation quality must be accurate (no machine translation errors)
|
|
||||||
- Text direction should be handled correctly (LTR for both languages)
|
|
||||||
- Font rendering should support both languages properly
|
|
||||||
</accessibility_requirements>
|
|
||||||
|
|
||||||
<testing_requirements>
|
|
||||||
<regression_tests>
|
|
||||||
- Verify existing functionality works in both languages
|
|
||||||
- Verify language change doesn't break any features
|
|
||||||
- Test that default language (English) still works as before
|
|
||||||
- Verify all existing features are accessible in both languages
|
|
||||||
</regression_tests>
|
|
||||||
|
|
||||||
<feature_tests>
|
|
||||||
- Test language selector in settings
|
|
||||||
- Test immediate language change on selection
|
|
||||||
- Test language persistence across page refresh
|
|
||||||
- Test language persistence across browser sessions
|
|
||||||
- Test all UI elements are translated
|
|
||||||
- Test translation fallback for missing keys
|
|
||||||
- Test switching between languages multiple times
|
|
||||||
- Verify no English text appears when French is selected
|
|
||||||
- Verify all pages/screens are translated
|
|
||||||
</feature_tests>
|
|
||||||
|
|
||||||
<translation_tests>
|
|
||||||
- Verify all translation keys have values in both languages
|
|
||||||
- Test translation accuracy (no machine translation errors)
|
|
||||||
- Verify consistent terminology across the application
|
|
||||||
- Test special characters and accents in French
|
|
||||||
- Verify text doesn't overflow UI elements in French (may be longer)
|
|
||||||
</translation_tests>
|
|
||||||
|
|
||||||
<compatibility_tests>
|
|
||||||
- Test with different browsers (Chrome, Firefox, Safari, Edge)
|
|
||||||
- Test with different screen sizes (responsive design)
|
|
||||||
- Test language switching during active conversations
|
|
||||||
- Test language switching with modals open
|
|
||||||
- Verify language preference syncs across tabs (if applicable)
|
|
||||||
</compatibility_tests>
|
|
||||||
</testing_requirements>
|
|
||||||
|
|
||||||
<summary>
|
|
||||||
<bug_description>
|
|
||||||
The language selection feature was planned in the original specification (app_spec.txt line 127)
|
|
||||||
and a UI component was created (App.jsx lines 1412-1419), but the implementation is incomplete.
|
|
||||||
The select dropdown exists but has no functionality - it lacks state management, event handlers,
|
|
||||||
and a translation system.
|
|
||||||
</bug_description>
|
|
||||||
|
|
||||||
<fix_scope>
|
|
||||||
This is a bug fix that completes the existing feature by:
|
|
||||||
1. Connecting the existing UI to state management
|
|
||||||
2. Adding the missing translation system
|
|
||||||
3. Implementing language persistence
|
|
||||||
4. Translating all UI text to support English and French
|
|
||||||
</fix_scope>
|
|
||||||
|
|
||||||
<key_principle>
|
|
||||||
DO NOT remove or significantly modify the existing language selector UI. Only add the
|
|
||||||
missing functionality to make it work.
|
|
||||||
</key_principle>
|
|
||||||
</summary>
|
|
||||||
|
|
||||||
<success_criteria>
|
|
||||||
<functionality>
|
|
||||||
- Users can select language from the existing settings dropdown (English or French)
|
|
||||||
- Language changes apply immediately to entire interface
|
|
||||||
- Language preference persists across sessions
|
|
||||||
- All UI elements are translated when language is changed
|
|
||||||
- English remains the default language
|
|
||||||
- No functionality is broken by language changes
|
|
||||||
- The existing select element in App.jsx (lines 1412-1419) is now functional
|
|
||||||
</functionality>
|
|
||||||
|
|
||||||
<user_experience>
|
|
||||||
- Language selector is easy to find in settings
|
|
||||||
- Language change is instant and smooth
|
|
||||||
- All text is properly translated (no English text in French mode)
|
|
||||||
- Translations are accurate and natural
|
|
||||||
- Interface layout works well with both languages
|
|
||||||
</user_experience>
|
|
||||||
|
|
||||||
<technical>
|
|
||||||
- Translation system is well-organized and maintainable
|
|
||||||
- Translation keys are logically structured
|
|
||||||
- Language preference is stored reliably
|
|
||||||
- No performance degradation with language switching
|
|
||||||
- Code is clean and follows existing patterns
|
|
||||||
- Easy to add more languages in the future
|
|
||||||
</technical>
|
|
||||||
</success_criteria>
|
|
||||||
</project_specification>
|
|
||||||
679
prompts/app_spec_library_rag_types_docs.txt
Normal file
679
prompts/app_spec_library_rag_types_docs.txt
Normal file
@@ -0,0 +1,679 @@
|
|||||||
|
<project_specification>
|
||||||
|
<project_name>Library RAG - Type Safety & Documentation Enhancement</project_name>
|
||||||
|
|
||||||
|
<overview>
|
||||||
|
Enhance the Library RAG application (philosophical texts indexing and semantic search) by adding
|
||||||
|
strict type annotations and comprehensive Google-style docstrings to all Python modules. This will
|
||||||
|
improve code maintainability, enable static type checking with mypy, and provide clear documentation
|
||||||
|
for all functions, classes, and modules.
|
||||||
|
|
||||||
|
The application is a RAG pipeline that processes PDF documents through OCR, LLM-based extraction,
|
||||||
|
semantic chunking, and ingestion into Weaviate vector database. It includes a Flask web interface
|
||||||
|
for document upload, processing, and semantic search.
|
||||||
|
</overview>
|
||||||
|
|
||||||
|
<technology_stack>
|
||||||
|
<backend>
|
||||||
|
<runtime>Python 3.10+</runtime>
|
||||||
|
<web_framework>Flask 3.0</web_framework>
|
||||||
|
<vector_database>Weaviate 1.34.4 with text2vec-transformers</vector_database>
|
||||||
|
<ocr>Mistral OCR API</ocr>
|
||||||
|
<llm>Ollama (local) or Mistral API</llm>
|
||||||
|
<type_checking>mypy with strict configuration</type_checking>
|
||||||
|
</backend>
|
||||||
|
<infrastructure>
|
||||||
|
<containerization>Docker Compose (Weaviate + transformers)</containerization>
|
||||||
|
<dependencies>weaviate-client, flask, mistralai, python-dotenv</dependencies>
|
||||||
|
</infrastructure>
|
||||||
|
</technology_stack>
|
||||||
|
|
||||||
|
<current_state>
|
||||||
|
<project_structure>
|
||||||
|
- flask_app.py: Main Flask application (640 lines)
|
||||||
|
- schema.py: Weaviate schema definition (383 lines)
|
||||||
|
- utils/: 16+ modules for PDF processing pipeline
|
||||||
|
- pdf_pipeline.py: Main orchestration (879 lines)
|
||||||
|
- mistral_client.py: OCR API client
|
||||||
|
- ocr_processor.py: OCR processing
|
||||||
|
- markdown_builder.py: Markdown generation
|
||||||
|
- llm_metadata.py: Metadata extraction via LLM
|
||||||
|
- llm_toc.py: Table of contents extraction
|
||||||
|
- llm_classifier.py: Section classification
|
||||||
|
- llm_chunker.py: Semantic chunking
|
||||||
|
- llm_cleaner.py: Chunk cleaning
|
||||||
|
- llm_validator.py: Document validation
|
||||||
|
- weaviate_ingest.py: Database ingestion
|
||||||
|
- hierarchy_parser.py: Document hierarchy parsing
|
||||||
|
- image_extractor.py: Image extraction from PDFs
|
||||||
|
- toc_extractor*.py: Various TOC extraction methods
|
||||||
|
- templates/: Jinja2 templates for Flask UI
|
||||||
|
- tests/utils2/: Minimal test coverage (3 test files)
|
||||||
|
</project_structure>
|
||||||
|
|
||||||
|
<issues>
|
||||||
|
- Inconsistent type annotations across modules (some have partial types, many have none)
|
||||||
|
- Missing or incomplete docstrings (no Google-style format)
|
||||||
|
- No mypy configuration for strict type checking
|
||||||
|
- Type hints missing on function parameters and return values
|
||||||
|
- Dict[str, Any] used extensively without proper typing
|
||||||
|
- No type stubs for complex nested structures
|
||||||
|
</issues>
|
||||||
|
</current_state>
|
||||||
|
|
||||||
|
<core_features>
|
||||||
|
<type_annotations>
|
||||||
|
<strict_typing>
|
||||||
|
- Add complete type annotations to ALL functions and methods
|
||||||
|
- Use proper generic types (List, Dict, Optional, Union) from typing module
|
||||||
|
- Add TypedDict for complex dictionary structures
|
||||||
|
- Add Protocol types for duck-typed interfaces
|
||||||
|
- Use Literal types for string constants
|
||||||
|
- Add ParamSpec and TypeVar where appropriate
|
||||||
|
- Type all class attributes and instance variables
|
||||||
|
- Add type annotations to lambda functions where possible
|
||||||
|
</strict_typing>
|
||||||
|
|
||||||
|
<mypy_configuration>
|
||||||
|
- Create mypy.ini with strict configuration
|
||||||
|
- Enable: check_untyped_defs, disallow_untyped_defs, disallow_incomplete_defs
|
||||||
|
- Enable: disallow_untyped_calls, disallow_untyped_decorators
|
||||||
|
- Enable: warn_return_any, warn_redundant_casts
|
||||||
|
- Enable: strict_equality, strict_optional
|
||||||
|
- Set python_version to 3.10
|
||||||
|
- Configure per-module overrides if needed for gradual migration
|
||||||
|
</mypy_configuration>
|
||||||
|
|
||||||
|
<type_stubs>
|
||||||
|
- Create TypedDict definitions for common data structures:
|
||||||
|
- OCR response structures
|
||||||
|
- Metadata dictionaries
|
||||||
|
- TOC entries
|
||||||
|
- Chunk objects
|
||||||
|
- Weaviate objects
|
||||||
|
- Pipeline results
|
||||||
|
- Add NewType for semantic type safety (DocumentName, ChunkId, etc.)
|
||||||
|
- Create Protocol types for callback functions
|
||||||
|
</type_stubs>
|
||||||
|
|
||||||
|
<specific_improvements>
|
||||||
|
- pdf_pipeline.py: Type all 10 pipeline steps, callbacks, result dictionaries
|
||||||
|
- flask_app.py: Type all route handlers, request/response types
|
||||||
|
- schema.py: Type Weaviate configuration objects
|
||||||
|
- llm_*.py: Type LLM request/response structures
|
||||||
|
- mistral_client.py: Type API client methods and responses
|
||||||
|
- weaviate_ingest.py: Type ingestion functions and batch operations
|
||||||
|
</specific_improvements>
|
||||||
|
</type_annotations>
|
||||||
|
|
||||||
|
<documentation>
|
||||||
|
<google_style_docstrings>
|
||||||
|
- Add comprehensive Google-style docstrings to ALL:
|
||||||
|
- Module-level docstrings explaining purpose and usage
|
||||||
|
- Class docstrings with Attributes section
|
||||||
|
- Function/method docstrings with Args, Returns, Raises sections
|
||||||
|
- Complex algorithm explanations with Examples section
|
||||||
|
- Include code examples for public APIs
|
||||||
|
- Document all exceptions that can be raised
|
||||||
|
- Add Notes section for important implementation details
|
||||||
|
- Add See Also section for related functions
|
||||||
|
</google_style_docstrings>
|
||||||
|
|
||||||
|
<module_documentation>
|
||||||
|
<utils_modules>
|
||||||
|
- pdf_pipeline.py: Document the 10-step pipeline, each step's purpose
|
||||||
|
- mistral_client.py: Document OCR API usage, cost calculation
|
||||||
|
- llm_metadata.py: Document metadata extraction logic
|
||||||
|
- llm_toc.py: Document TOC extraction strategies
|
||||||
|
- llm_classifier.py: Document section classification types
|
||||||
|
- llm_chunker.py: Document semantic vs basic chunking
|
||||||
|
- llm_cleaner.py: Document cleaning rules and validation
|
||||||
|
- llm_validator.py: Document validation criteria
|
||||||
|
- weaviate_ingest.py: Document ingestion process, nested objects
|
||||||
|
- hierarchy_parser.py: Document hierarchy building algorithm
|
||||||
|
</utils_modules>
|
||||||
|
|
||||||
|
<flask_app>
|
||||||
|
- Document all routes with request/response examples
|
||||||
|
- Document SSE (Server-Sent Events) implementation
|
||||||
|
- Document Weaviate query patterns
|
||||||
|
- Document upload processing workflow
|
||||||
|
- Document background job management
|
||||||
|
</flask_app>
|
||||||
|
|
||||||
|
<schema>
|
||||||
|
- Document Weaviate schema design decisions
|
||||||
|
- Document each collection's purpose and relationships
|
||||||
|
- Document nested object structure
|
||||||
|
- Document vectorization strategy
|
||||||
|
</schema>
|
||||||
|
</module_documentation>
|
||||||
|
|
||||||
|
<inline_comments>
|
||||||
|
- Add inline comments for complex logic only (don't over-comment)
|
||||||
|
- Explain WHY not WHAT (code should be self-documenting)
|
||||||
|
- Document performance considerations
|
||||||
|
- Document cost implications (OCR, LLM API calls)
|
||||||
|
- Document error handling strategies
|
||||||
|
</inline_comments>
|
||||||
|
</documentation>
|
||||||
|
|
||||||
|
<validation>
|
||||||
|
<type_checking>
|
||||||
|
- All modules must pass mypy --strict
|
||||||
|
- No # type: ignore comments without justification
|
||||||
|
- CI/CD should run mypy checks
|
||||||
|
- Type coverage should be 100%
|
||||||
|
</type_checking>
|
||||||
|
|
||||||
|
<documentation_quality>
|
||||||
|
- All public functions must have docstrings
|
||||||
|
- All docstrings must follow Google style
|
||||||
|
- Examples should be executable and tested
|
||||||
|
- Documentation should be clear and concise
|
||||||
|
</documentation_quality>
|
||||||
|
</validation>
|
||||||
|
</core_features>
|
||||||
|
|
||||||
|
<implementation_priority>
|
||||||
|
<critical_modules>
|
||||||
|
Priority 1 (Most used, most complex):
|
||||||
|
1. utils/pdf_pipeline.py - Main orchestration
|
||||||
|
2. flask_app.py - Web application entry point
|
||||||
|
3. utils/weaviate_ingest.py - Database operations
|
||||||
|
4. schema.py - Schema definition
|
||||||
|
|
||||||
|
Priority 2 (Core LLM modules):
|
||||||
|
5. utils/llm_metadata.py
|
||||||
|
6. utils/llm_toc.py
|
||||||
|
7. utils/llm_classifier.py
|
||||||
|
8. utils/llm_chunker.py
|
||||||
|
9. utils/llm_cleaner.py
|
||||||
|
10. utils/llm_validator.py
|
||||||
|
|
||||||
|
Priority 3 (OCR and parsing):
|
||||||
|
11. utils/mistral_client.py
|
||||||
|
12. utils/ocr_processor.py
|
||||||
|
13. utils/markdown_builder.py
|
||||||
|
14. utils/hierarchy_parser.py
|
||||||
|
15. utils/image_extractor.py
|
||||||
|
|
||||||
|
Priority 4 (Supporting modules):
|
||||||
|
16. utils/toc_extractor.py
|
||||||
|
17. utils/toc_extractor_markdown.py
|
||||||
|
18. utils/toc_extractor_visual.py
|
||||||
|
19. utils/llm_structurer.py (legacy)
|
||||||
|
</critical_modules>
|
||||||
|
</implementation_priority>
|
||||||
|
|
||||||
|
<implementation_steps>
|
||||||
|
<feature_1>
|
||||||
|
<title>Setup Type Checking Infrastructure</title>
|
||||||
|
<description>
|
||||||
|
Configure mypy with strict settings and create foundational type definitions
|
||||||
|
</description>
|
||||||
|
<tasks>
|
||||||
|
- Create mypy.ini configuration file with strict settings
|
||||||
|
- Add mypy to requirements.txt or dev dependencies
|
||||||
|
- Create utils/types.py module for common TypedDict definitions
|
||||||
|
- Define core types: OCRResponse, Metadata, TOCEntry, ChunkData, PipelineResult
|
||||||
|
- Add NewType definitions for semantic types: DocumentName, ChunkId, SectionPath
|
||||||
|
- Create Protocol types for callbacks (ProgressCallback, etc.)
|
||||||
|
- Document type definitions in utils/types.py module docstring
|
||||||
|
- Test mypy configuration on a single module to verify settings
|
||||||
|
</tasks>
|
||||||
|
<acceptance_criteria>
|
||||||
|
- mypy.ini exists with strict configuration
|
||||||
|
- utils/types.py contains all foundational types with docstrings
|
||||||
|
- mypy runs without errors on utils/types.py
|
||||||
|
- Type definitions are comprehensive and reusable
|
||||||
|
</acceptance_criteria>
|
||||||
|
</feature_1>
|
||||||
|
|
||||||
|
<feature_2>
|
||||||
|
<title>Add Types to PDF Pipeline Orchestration</title>
|
||||||
|
<description>
|
||||||
|
Add complete type annotations to pdf_pipeline.py (879 lines, most complex module)
|
||||||
|
</description>
|
||||||
|
<tasks>
|
||||||
|
- Add type annotations to all function signatures in pdf_pipeline.py
|
||||||
|
- Type the 10-step pipeline: OCR, Markdown, Metadata, TOC, Classify, Chunk, Clean, Validate, Weaviate
|
||||||
|
- Type progress_callback parameter with Protocol or Callable
|
||||||
|
- Add TypedDict for pipeline options dictionary
|
||||||
|
- Add TypedDict for pipeline result dictionary structure
|
||||||
|
- Type all helper functions (extract_document_metadata_legacy, etc.)
|
||||||
|
- Add proper return types for process_pdf_v2, process_pdf, process_pdf_bytes
|
||||||
|
- Fix any mypy errors that arise
|
||||||
|
- Verify mypy --strict passes on pdf_pipeline.py
|
||||||
|
</tasks>
|
||||||
|
<acceptance_criteria>
|
||||||
|
- All functions in pdf_pipeline.py have complete type annotations
|
||||||
|
- progress_callback is properly typed with Protocol
|
||||||
|
- All Dict[str, Any] replaced with TypedDict where appropriate
|
||||||
|
- mypy --strict pdf_pipeline.py passes with zero errors
|
||||||
|
- No # type: ignore comments (or justified if absolutely necessary)
|
||||||
|
</acceptance_criteria>
|
||||||
|
</feature_2>
|
||||||
|
|
||||||
|
<feature_3>
|
||||||
|
<title>Add Types to Flask Application</title>
|
||||||
|
<description>
|
||||||
|
Add complete type annotations to flask_app.py and type all routes
|
||||||
|
</description>
|
||||||
|
<tasks>
|
||||||
|
- Add type annotations to all Flask route handlers
|
||||||
|
- Type request.args, request.form, request.files usage
|
||||||
|
- Type jsonify() return values
|
||||||
|
- Type get_weaviate_client context manager
|
||||||
|
- Type get_collection_stats, get_all_chunks, search_chunks functions
|
||||||
|
- Add TypedDict for Weaviate query results
|
||||||
|
- Type background job processing functions (run_processing_job)
|
||||||
|
- Type SSE generator function (upload_progress)
|
||||||
|
- Add type hints for template rendering
|
||||||
|
- Verify mypy --strict passes on flask_app.py
|
||||||
|
</tasks>
|
||||||
|
<acceptance_criteria>
|
||||||
|
- All Flask routes have complete type annotations
|
||||||
|
- Request/response types are clear and documented
|
||||||
|
- Weaviate query functions are properly typed
|
||||||
|
- SSE generator is correctly typed
|
||||||
|
- mypy --strict flask_app.py passes with zero errors
|
||||||
|
</acceptance_criteria>
|
||||||
|
</feature_3>
|
||||||
|
|
||||||
|
<feature_4>
|
||||||
|
<title>Add Types to Core LLM Modules</title>
|
||||||
|
<description>
|
||||||
|
Add complete type annotations to all LLM processing modules (metadata, TOC, classifier, chunker, cleaner, validator)
|
||||||
|
</description>
|
||||||
|
<tasks>
|
||||||
|
- llm_metadata.py: Type extract_metadata function, return structure
|
||||||
|
- llm_toc.py: Type extract_toc function, TOC hierarchy structure
|
||||||
|
- llm_classifier.py: Type classify_sections, section types (Literal), validation functions
|
||||||
|
- llm_chunker.py: Type chunk_section_with_llm, chunk objects
|
||||||
|
- llm_cleaner.py: Type clean_chunk, is_chunk_valid functions
|
||||||
|
- llm_validator.py: Type validate_document, validation result structure
|
||||||
|
- Add TypedDict for LLM request/response structures
|
||||||
|
- Type provider selection ("ollama" | "mistral" as Literal)
|
||||||
|
- Type model names with Literal or constants
|
||||||
|
- Verify mypy --strict passes on all llm_*.py modules
|
||||||
|
</tasks>
|
||||||
|
<acceptance_criteria>
|
||||||
|
- All LLM modules have complete type annotations
|
||||||
|
- Section types use Literal for type safety
|
||||||
|
- Provider and model parameters are strongly typed
|
||||||
|
- LLM request/response structures use TypedDict
|
||||||
|
- mypy --strict passes on all llm_*.py modules with zero errors
|
||||||
|
</acceptance_criteria>
|
||||||
|
</feature_4>
|
||||||
|
|
||||||
|
<feature_5>
|
||||||
|
<title>Add Types to Weaviate and Database Modules</title>
|
||||||
|
<description>
|
||||||
|
Add complete type annotations to schema.py and weaviate_ingest.py
|
||||||
|
</description>
|
||||||
|
<tasks>
|
||||||
|
- schema.py: Type Weaviate configuration objects
|
||||||
|
- schema.py: Type collection property definitions
|
||||||
|
- weaviate_ingest.py: Type ingest_document function signature
|
||||||
|
- weaviate_ingest.py: Type delete_document_chunks function
|
||||||
|
- weaviate_ingest.py: Add TypedDict for Weaviate object structure
|
||||||
|
- Type batch insertion operations
|
||||||
|
- Type nested object references (work, document)
|
||||||
|
- Add proper error types for Weaviate exceptions
|
||||||
|
- Verify mypy --strict passes on both modules
|
||||||
|
</tasks>
|
||||||
|
<acceptance_criteria>
|
||||||
|
- schema.py has complete type annotations for Weaviate config
|
||||||
|
- weaviate_ingest.py functions are fully typed
|
||||||
|
- Nested object structures use TypedDict
|
||||||
|
- Weaviate client operations are properly typed
|
||||||
|
- mypy --strict passes on both modules with zero errors
|
||||||
|
</acceptance_criteria>
|
||||||
|
</feature_5>
|
||||||
|
|
||||||
|
<feature_6>
|
||||||
|
<title>Add Types to OCR and Parsing Modules</title>
|
||||||
|
<description>
|
||||||
|
Add complete type annotations to mistral_client.py, ocr_processor.py, markdown_builder.py, hierarchy_parser.py
|
||||||
|
</description>
|
||||||
|
<tasks>
|
||||||
|
- mistral_client.py: Type create_client, run_ocr, estimate_ocr_cost
|
||||||
|
- mistral_client.py: Add TypedDict for Mistral API response structures
|
||||||
|
- ocr_processor.py: Type serialize_ocr_response, OCR object structures
|
||||||
|
- markdown_builder.py: Type build_markdown, image_writer parameter
|
||||||
|
- hierarchy_parser.py: Type build_hierarchy, flatten_hierarchy functions
|
||||||
|
- hierarchy_parser.py: Add TypedDict for hierarchy node structure
|
||||||
|
- image_extractor.py: Type create_image_writer, image handling
|
||||||
|
- Verify mypy --strict passes on all modules
|
||||||
|
</tasks>
|
||||||
|
<acceptance_criteria>
|
||||||
|
- All OCR/parsing modules have complete type annotations
|
||||||
|
- Mistral API structures use TypedDict
|
||||||
|
- Hierarchy nodes are properly typed
|
||||||
|
- Image handling functions are typed
|
||||||
|
- mypy --strict passes on all modules with zero errors
|
||||||
|
</acceptance_criteria>
|
||||||
|
</feature_6>
|
||||||
|
|
||||||
|
<feature_7>
|
||||||
|
<title>Add Google-Style Docstrings to Core Modules</title>
|
||||||
|
<description>
|
||||||
|
Add comprehensive Google-style docstrings to pdf_pipeline.py, flask_app.py, and weaviate modules
|
||||||
|
</description>
|
||||||
|
<tasks>
|
||||||
|
- pdf_pipeline.py: Add module docstring explaining the V2 pipeline
|
||||||
|
- pdf_pipeline.py: Add docstrings to process_pdf_v2 with Args, Returns, Raises sections
|
||||||
|
- pdf_pipeline.py: Document each of the 10 pipeline steps in comments
|
||||||
|
- pdf_pipeline.py: Add Examples section showing typical usage
|
||||||
|
- flask_app.py: Add module docstring explaining Flask application
|
||||||
|
- flask_app.py: Document all routes with request/response examples
|
||||||
|
- flask_app.py: Document Weaviate connection management
|
||||||
|
- schema.py: Add module docstring explaining schema design
|
||||||
|
- schema.py: Document each collection's purpose and relationships
|
||||||
|
- weaviate_ingest.py: Document ingestion process with examples
|
||||||
|
- All docstrings must follow Google style format exactly
|
||||||
|
</tasks>
|
||||||
|
<acceptance_criteria>
|
||||||
|
- All core modules have comprehensive module-level docstrings
|
||||||
|
- All public functions have Google-style docstrings
|
||||||
|
- Args, Returns, Raises sections are complete and accurate
|
||||||
|
- Examples are provided for complex functions
|
||||||
|
- Docstrings explain WHY, not just WHAT
|
||||||
|
</acceptance_criteria>
|
||||||
|
</feature_7>
|
||||||
|
|
||||||
|
<feature_8>
|
||||||
|
<title>Add Google-Style Docstrings to LLM Modules</title>
|
||||||
|
<description>
|
||||||
|
Add comprehensive Google-style docstrings to all LLM processing modules
|
||||||
|
</description>
|
||||||
|
<tasks>
|
||||||
|
- llm_metadata.py: Document metadata extraction logic with examples
|
||||||
|
- llm_toc.py: Document TOC extraction strategies and fallbacks
|
||||||
|
- llm_classifier.py: Document section types and classification criteria
|
||||||
|
- llm_chunker.py: Document semantic vs basic chunking approaches
|
||||||
|
- llm_cleaner.py: Document cleaning rules and validation logic
|
||||||
|
- llm_validator.py: Document validation criteria and corrections
|
||||||
|
- Add Examples sections showing input/output for each function
|
||||||
|
- Document LLM provider differences (Ollama vs Mistral)
|
||||||
|
- Document cost implications in Notes sections
|
||||||
|
- All docstrings must follow Google style format exactly
|
||||||
|
</tasks>
|
||||||
|
<acceptance_criteria>
|
||||||
|
- All LLM modules have comprehensive docstrings
|
||||||
|
- Each function has Args, Returns, Raises sections
|
||||||
|
- Examples show realistic input/output
|
||||||
|
- Provider differences are documented
|
||||||
|
- Cost implications are noted where relevant
|
||||||
|
</acceptance_criteria>
|
||||||
|
</feature_8>
|
||||||
|
|
||||||
|
<feature_9>
|
||||||
|
<title>Add Google-Style Docstrings to OCR and Parsing Modules</title>
|
||||||
|
<description>
|
||||||
|
Add comprehensive Google-style docstrings to OCR, markdown, hierarchy, and extraction modules
|
||||||
|
</description>
|
||||||
|
<tasks>
|
||||||
|
- mistral_client.py: Document OCR API usage, cost calculation
|
||||||
|
- ocr_processor.py: Document OCR response processing
|
||||||
|
- markdown_builder.py: Document markdown generation strategy
|
||||||
|
- hierarchy_parser.py: Document hierarchy building algorithm
|
||||||
|
- image_extractor.py: Document image extraction process
|
||||||
|
- toc_extractor*.py: Document various TOC extraction methods
|
||||||
|
- Add Examples sections for complex algorithms
|
||||||
|
- Document edge cases and error handling
|
||||||
|
- All docstrings must follow Google style format exactly
|
||||||
|
</tasks>
|
||||||
|
<acceptance_criteria>
|
||||||
|
- All OCR/parsing modules have comprehensive docstrings
|
||||||
|
- Complex algorithms are well explained
|
||||||
|
- Edge cases are documented
|
||||||
|
- Error handling is documented
|
||||||
|
- Examples demonstrate typical usage
|
||||||
|
</acceptance_criteria>
|
||||||
|
</feature_9>
|
||||||
|
|
||||||
|
<feature_10>
|
||||||
|
<title>Final Validation and CI Integration</title>
|
||||||
|
<description>
|
||||||
|
Verify all type annotations and docstrings, integrate mypy into CI/CD
|
||||||
|
</description>
|
||||||
|
<tasks>
|
||||||
|
- Run mypy --strict on entire codebase, verify 100% pass rate
|
||||||
|
- Verify all public functions have docstrings
|
||||||
|
- Check docstring formatting with pydocstyle or similar tool
|
||||||
|
- Create GitHub Actions workflow to run mypy on every commit
|
||||||
|
- Update README.md with type checking instructions
|
||||||
|
- Update CLAUDE.md with documentation standards
|
||||||
|
- Create CONTRIBUTING.md with type annotation and docstring guidelines
|
||||||
|
- Generate API documentation with Sphinx or pdoc
|
||||||
|
- Fix any remaining mypy errors or missing docstrings
|
||||||
|
</tasks>
|
||||||
|
<acceptance_criteria>
|
||||||
|
- mypy --strict passes on entire codebase with zero errors
|
||||||
|
- All public functions have Google-style docstrings
|
||||||
|
- CI/CD runs mypy checks automatically
|
||||||
|
- Documentation is generated and accessible
|
||||||
|
- Contributing guidelines document type/docstring requirements
|
||||||
|
</acceptance_criteria>
|
||||||
|
</feature_10>
|
||||||
|
</implementation_steps>
|
||||||
|
|
||||||
|
<success_criteria>
|
||||||
|
<type_safety>
|
||||||
|
- 100% type coverage across all modules
|
||||||
|
- mypy --strict passes with zero errors
|
||||||
|
- No # type: ignore comments without justification
|
||||||
|
- All Dict[str, Any] replaced with TypedDict where appropriate
|
||||||
|
- Proper use of generics, protocols, and type variables
|
||||||
|
- NewType used for semantic type safety
|
||||||
|
</type_safety>
|
||||||
|
|
||||||
|
<documentation_quality>
|
||||||
|
- All modules have comprehensive module-level docstrings
|
||||||
|
- All public functions/classes have Google-style docstrings
|
||||||
|
- All docstrings include Args, Returns, Raises sections
|
||||||
|
- Complex functions include Examples sections
|
||||||
|
- Cost implications documented in Notes sections
|
||||||
|
- Error handling clearly documented
|
||||||
|
- Provider differences (Ollama vs Mistral) documented
|
||||||
|
</documentation_quality>
|
||||||
|
|
||||||
|
<code_quality>
|
||||||
|
- Code is self-documenting with clear variable names
|
||||||
|
- Inline comments explain WHY, not WHAT
|
||||||
|
- Complex algorithms are well explained
|
||||||
|
- Performance considerations documented
|
||||||
|
- Security considerations documented
|
||||||
|
</code_quality>
|
||||||
|
|
||||||
|
<developer_experience>
|
||||||
|
- IDE autocomplete works perfectly with type hints
|
||||||
|
- Type errors caught at development time, not runtime
|
||||||
|
- Documentation is easily accessible in IDE
|
||||||
|
- API examples are executable and tested
|
||||||
|
- Contributing guidelines are clear and comprehensive
|
||||||
|
</developer_experience>
|
||||||
|
|
||||||
|
<maintainability>
|
||||||
|
- Refactoring is safer with type checking
|
||||||
|
- Function signatures are self-documenting
|
||||||
|
- API contracts are explicit and enforced
|
||||||
|
- Breaking changes are caught by type checker
|
||||||
|
- New developers can understand code quickly
|
||||||
|
</maintainability>
|
||||||
|
</success_criteria>
|
||||||
|
|
||||||
|
<constraints>
|
||||||
|
<compatibility>
|
||||||
|
- Must maintain backward compatibility with existing code
|
||||||
|
- Cannot break existing Flask routes or API contracts
|
||||||
|
- Weaviate schema must remain unchanged
|
||||||
|
- Existing tests must continue to pass
|
||||||
|
</compatibility>
|
||||||
|
|
||||||
|
<gradual_migration>
|
||||||
|
- Can use per-module mypy configuration for gradual migration
|
||||||
|
- Can temporarily disable strict checks on legacy modules
|
||||||
|
- Priority modules must be completed first
|
||||||
|
- Low-priority modules can be deferred
|
||||||
|
</gradual_migration>
|
||||||
|
|
||||||
|
<standards>
|
||||||
|
- All type annotations must use Python 3.10+ syntax
|
||||||
|
- Docstrings must follow Google style exactly (not NumPy or reStructuredText)
|
||||||
|
- Use typing module (List, Dict, Optional) until Python 3.9 support dropped
|
||||||
|
- Use from __future__ import annotations if needed for forward references
|
||||||
|
</standards>
|
||||||
|
</constraints>
|
||||||
|
|
||||||
|
<testing_strategy>
|
||||||
|
<type_checking>
|
||||||
|
- Run mypy --strict on each module after adding types
|
||||||
|
- Use mypy daemon (dmypy) for faster incremental checking
|
||||||
|
- Add mypy to pre-commit hooks
|
||||||
|
- CI/CD must run mypy and fail on type errors
|
||||||
|
</type_checking>
|
||||||
|
|
||||||
|
<documentation_validation>
|
||||||
|
- Use pydocstyle to validate Google-style format
|
||||||
|
- Use sphinx-build to generate docs and catch errors
|
||||||
|
- Manual review of docstring examples
|
||||||
|
- Verify examples are executable and correct
|
||||||
|
</documentation_validation>
|
||||||
|
|
||||||
|
<integration_testing>
|
||||||
|
- Verify existing tests still pass after type additions
|
||||||
|
- Add new tests for complex typed structures
|
||||||
|
- Test mypy configuration on sample code
|
||||||
|
- Verify IDE autocomplete works correctly
|
||||||
|
</integration_testing>
|
||||||
|
</testing_strategy>
|
||||||
|
|
||||||
|
<documentation_examples>
|
||||||
|
<module_docstring>
|
||||||
|
```python
|
||||||
|
"""
|
||||||
|
PDF Pipeline V2 - Intelligent document processing with LLM enhancement.
|
||||||
|
|
||||||
|
This module orchestrates a 10-step pipeline for processing PDF documents:
|
||||||
|
1. OCR via Mistral API
|
||||||
|
2. Markdown construction with images
|
||||||
|
3. Metadata extraction via LLM
|
||||||
|
4. Table of contents (TOC) extraction
|
||||||
|
5. Section classification
|
||||||
|
6. Semantic chunking
|
||||||
|
7. Chunk cleaning and validation
|
||||||
|
8. Enrichment with concepts
|
||||||
|
9. Validation and corrections
|
||||||
|
10. Ingestion into Weaviate vector database
|
||||||
|
|
||||||
|
The pipeline supports multiple LLM providers (Ollama local, Mistral API) and
|
||||||
|
various processing modes (skip OCR, semantic chunking, OCR annotations).
|
||||||
|
|
||||||
|
Typical usage:
|
||||||
|
>>> from pathlib import Path
|
||||||
|
>>> from utils.pdf_pipeline import process_pdf
|
||||||
|
>>>
|
||||||
|
>>> result = process_pdf(
|
||||||
|
... Path("document.pdf"),
|
||||||
|
... use_llm=True,
|
||||||
|
... llm_provider="ollama",
|
||||||
|
... ingest_to_weaviate=True,
|
||||||
|
... )
|
||||||
|
>>> print(f"Processed {result['pages']} pages, {result['chunks_count']} chunks")
|
||||||
|
|
||||||
|
See Also:
|
||||||
|
mistral_client: OCR API client
|
||||||
|
llm_metadata: Metadata extraction
|
||||||
|
weaviate_ingest: Database ingestion
|
||||||
|
"""
|
||||||
|
```
|
||||||
|
</module_docstring>
|
||||||
|
|
||||||
|
<function_docstring>
|
||||||
|
```python
|
||||||
|
def process_pdf_v2(
|
||||||
|
pdf_path: Path,
|
||||||
|
output_dir: Path = Path("output"),
|
||||||
|
*,
|
||||||
|
use_llm: bool = True,
|
||||||
|
llm_provider: Literal["ollama", "mistral"] = "ollama",
|
||||||
|
llm_model: Optional[str] = None,
|
||||||
|
skip_ocr: bool = False,
|
||||||
|
ingest_to_weaviate: bool = True,
|
||||||
|
progress_callback: Optional[ProgressCallback] = None,
|
||||||
|
) -> PipelineResult:
|
||||||
|
"""
|
||||||
|
Process a PDF through the complete V2 pipeline with LLM enhancement.
|
||||||
|
|
||||||
|
This function orchestrates all 10 steps of the intelligent document processing
|
||||||
|
pipeline, from OCR to Weaviate ingestion. It supports both local (Ollama) and
|
||||||
|
cloud (Mistral API) LLM providers, with optional caching via skip_ocr.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
pdf_path: Absolute path to the PDF file to process.
|
||||||
|
output_dir: Base directory for output files. Defaults to "./output".
|
||||||
|
use_llm: Enable LLM-based processing (metadata, TOC, chunking).
|
||||||
|
If False, uses basic heuristic processing.
|
||||||
|
llm_provider: LLM provider to use. "ollama" for local (free but slow),
|
||||||
|
"mistral" for API (fast but paid).
|
||||||
|
llm_model: Specific model name. If None, auto-detects based on provider
|
||||||
|
(qwen2.5:7b for ollama, mistral-small-latest for mistral).
|
||||||
|
skip_ocr: If True, reuses existing markdown file to avoid OCR cost.
|
||||||
|
Requires output_dir/<doc_name>/<doc_name>.md to exist.
|
||||||
|
ingest_to_weaviate: If True, ingests chunks into Weaviate after processing.
|
||||||
|
progress_callback: Optional callback for real-time progress updates.
|
||||||
|
Called with (step_id, status, detail) for each pipeline step.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Dictionary containing processing results with the following keys:
|
||||||
|
- success (bool): True if processing completed without errors
|
||||||
|
- document_name (str): Name of the processed document
|
||||||
|
- pages (int): Number of pages in the PDF
|
||||||
|
- chunks_count (int): Number of chunks generated
|
||||||
|
- cost_ocr (float): OCR cost in euros (0 if skip_ocr=True)
|
||||||
|
- cost_llm (float): LLM API cost in euros (0 if provider=ollama)
|
||||||
|
- cost_total (float): Total cost (ocr + llm)
|
||||||
|
- metadata (dict): Extracted metadata (title, author, etc.)
|
||||||
|
- toc (list): Hierarchical table of contents
|
||||||
|
- files (dict): Paths to generated files (markdown, chunks, etc.)
|
||||||
|
|
||||||
|
Raises:
|
||||||
|
FileNotFoundError: If pdf_path does not exist.
|
||||||
|
ValueError: If skip_ocr=True but markdown file not found.
|
||||||
|
RuntimeError: If Weaviate connection fails during ingestion.
|
||||||
|
|
||||||
|
Examples:
|
||||||
|
Basic usage with Ollama (free):
|
||||||
|
>>> result = process_pdf_v2(
|
||||||
|
... Path("platon_menon.pdf"),
|
||||||
|
... llm_provider="ollama"
|
||||||
|
... )
|
||||||
|
>>> print(f"Cost: {result['cost_total']:.4f}€")
|
||||||
|
Cost: 0.0270€ # OCR only
|
||||||
|
|
||||||
|
With Mistral API (faster):
|
||||||
|
>>> result = process_pdf_v2(
|
||||||
|
... Path("platon_menon.pdf"),
|
||||||
|
... llm_provider="mistral",
|
||||||
|
... llm_model="mistral-small-latest"
|
||||||
|
... )
|
||||||
|
|
||||||
|
Skip OCR to avoid cost:
|
||||||
|
>>> result = process_pdf_v2(
|
||||||
|
... Path("platon_menon.pdf"),
|
||||||
|
... skip_ocr=True, # Reuses existing markdown
|
||||||
|
... ingest_to_weaviate=False
|
||||||
|
... )
|
||||||
|
|
||||||
|
Notes:
|
||||||
|
- OCR cost: ~0.003€/page (standard), ~0.009€/page (with annotations)
|
||||||
|
- LLM cost: Free with Ollama, variable with Mistral API
|
||||||
|
- Processing time: ~30s/page with Ollama, ~5s/page with Mistral
|
||||||
|
- Weaviate must be running (docker-compose up -d) before ingestion
|
||||||
|
"""
|
||||||
|
```
|
||||||
|
</function_docstring>
|
||||||
|
</documentation_examples>
|
||||||
|
</project_specification>
|
||||||
@@ -1,448 +0,0 @@
|
|||||||
<project_specification>
|
|
||||||
<project_name>Claude.ai Clone - Multi-Provider Support (Mistral + Extensible)</project_name>
|
|
||||||
|
|
||||||
<overview>
|
|
||||||
This specification adds Mistral AI model support AND creates an extensible provider architecture
|
|
||||||
that makes it easy to add additional AI providers (OpenAI, Gemini, etc.) in the future.
|
|
||||||
This uses the "Open/Closed Principle" - open for extension, closed for modification.
|
|
||||||
|
|
||||||
All changes are additive and backward-compatible. Existing Claude functionality remains unchanged.
|
|
||||||
</overview>
|
|
||||||
|
|
||||||
<safety_requirements>
|
|
||||||
<critical>
|
|
||||||
- DO NOT modify existing Claude API integration code directly
|
|
||||||
- DO NOT change existing model selection logic for Claude models
|
|
||||||
- DO NOT modify existing database schema without safe migrations
|
|
||||||
- DO NOT break existing conversations or messages
|
|
||||||
- All new code must be in separate files/modules when possible
|
|
||||||
- Test thoroughly before marking issues as complete
|
|
||||||
- Maintain backward compatibility at all times
|
|
||||||
- Refactor Claude code to use BaseProvider WITHOUT changing functionality
|
|
||||||
</critical>
|
|
||||||
</safety_requirements>
|
|
||||||
|
|
||||||
<architecture_design>
|
|
||||||
<provider_pattern>
|
|
||||||
Create an abstract provider interface that all AI providers implement:
|
|
||||||
- BaseProvider (abstract class/interface) - defines common interface
|
|
||||||
- ClaudeProvider (existing code refactored to extend BaseProvider)
|
|
||||||
- MistralProvider (new, extends BaseProvider)
|
|
||||||
- OpenAIProvider (future, extends BaseProvider - easy to add)
|
|
||||||
- GeminiProvider (future, extends BaseProvider - easy to add)
|
|
||||||
</provider_pattern>
|
|
||||||
|
|
||||||
<benefits>
|
|
||||||
- Easy to add new providers without modifying existing code
|
|
||||||
- Consistent interface across all providers
|
|
||||||
- Isolated error handling per provider
|
|
||||||
- Unified model selection UI
|
|
||||||
- Shared functionality (streaming, error handling, logging)
|
|
||||||
- Future-proof architecture
|
|
||||||
</benefits>
|
|
||||||
</architecture_design>
|
|
||||||
|
|
||||||
<new_features>
|
|
||||||
<feature_provider_architecture>
|
|
||||||
<title>Extensible Provider Architecture (Foundation)</title>
|
|
||||||
<description>
|
|
||||||
Create a provider abstraction layer that allows easy addition of multiple AI providers.
|
|
||||||
This is the foundation that makes adding OpenAI, Gemini, etc. trivial in the future.
|
|
||||||
|
|
||||||
BaseProvider abstract class should define:
|
|
||||||
- sendMessage(messages, options) -> Promise<response>
|
|
||||||
- streamMessage(messages, options) -> AsyncGenerator<chunk>
|
|
||||||
- getModels() -> Promise<array> of available models
|
|
||||||
- validateApiKey(key) -> Promise<boolean>
|
|
||||||
- getCapabilities() -> object with provider capabilities
|
|
||||||
- getName() -> string (provider name: 'claude', 'mistral', 'openai', etc.)
|
|
||||||
- getDefaultModel() -> string (default model ID for this provider)
|
|
||||||
|
|
||||||
ProviderRegistry should:
|
|
||||||
- Register all available providers
|
|
||||||
- Provide list of all providers
|
|
||||||
- Check which providers are configured (have API keys)
|
|
||||||
- Enable/disable providers
|
|
||||||
|
|
||||||
ProviderFactory should:
|
|
||||||
- Create provider instances based on model ID or provider name
|
|
||||||
- Handle provider selection logic
|
|
||||||
- Route requests to correct provider
|
|
||||||
</description>
|
|
||||||
<priority>1</priority>
|
|
||||||
<category>functional</category>
|
|
||||||
<implementation_approach>
|
|
||||||
- Create server/providers/BaseProvider.js (abstract base class)
|
|
||||||
- Refactor existing Claude code to server/providers/ClaudeProvider.js (extends BaseProvider)
|
|
||||||
- Create server/providers/ProviderRegistry.js (manages all providers)
|
|
||||||
- Create server/providers/ProviderFactory.js (creates provider instances)
|
|
||||||
- Update existing routes to use ProviderFactory instead of direct Claude calls
|
|
||||||
- Keep all provider code in server/providers/ directory
|
|
||||||
</implementation_approach>
|
|
||||||
<test_steps>
|
|
||||||
1. Verify Claude still works after refactoring to use BaseProvider
|
|
||||||
2. Test that ProviderFactory creates ClaudeProvider correctly
|
|
||||||
3. Test that ProviderRegistry lists Claude provider
|
|
||||||
4. Verify error handling works correctly
|
|
||||||
5. Test that adding a mock provider is straightforward
|
|
||||||
6. Verify no regression in existing Claude functionality
|
|
||||||
</test_steps>
|
|
||||||
</feature_provider_architecture>
|
|
||||||
|
|
||||||
<feature_mistral_provider>
|
|
||||||
<title>Mistral Provider Implementation</title>
|
|
||||||
<description>
|
|
||||||
Implement MistralProvider extending BaseProvider. This should:
|
|
||||||
- Implement all BaseProvider abstract methods
|
|
||||||
- Handle Mistral-specific API calls (https://api.mistral.ai/v1/chat/completions)
|
|
||||||
- Support Mistral streaming (Server-Sent Events)
|
|
||||||
- Handle Mistral-specific error codes and messages
|
|
||||||
- Provide Mistral model list:
|
|
||||||
* mistral-large-latest (default)
|
|
||||||
* mistral-medium-latest
|
|
||||||
* mistral-small-latest
|
|
||||||
* mistral-7b-instruct
|
|
||||||
- Manage Mistral API authentication
|
|
||||||
- Return responses in unified format (same as Claude)
|
|
||||||
</description>
|
|
||||||
<priority>2</priority>
|
|
||||||
<category>functional</category>
|
|
||||||
<implementation_approach>
|
|
||||||
- Create server/providers/MistralProvider.js
|
|
||||||
- Extend BaseProvider class
|
|
||||||
- Implement Mistral API integration using fetch or axios
|
|
||||||
- Register in ProviderRegistry
|
|
||||||
- Use same response format as ClaudeProvider for consistency
|
|
||||||
</implementation_approach>
|
|
||||||
<test_steps>
|
|
||||||
1. Test MistralProvider.sendMessage() works with valid API key
|
|
||||||
2. Test MistralProvider.streamMessage() works
|
|
||||||
3. Test MistralProvider.getModels() returns correct models
|
|
||||||
4. Test error handling for invalid API key
|
|
||||||
5. Test error handling for API rate limits
|
|
||||||
6. Verify it integrates with ProviderFactory
|
|
||||||
7. Verify responses match expected format
|
|
||||||
</test_steps>
|
|
||||||
</feature_mistral_provider>
|
|
||||||
|
|
||||||
<feature_unified_model_selector>
|
|
||||||
<title>Unified Model Selector (All Providers)</title>
|
|
||||||
<description>
|
|
||||||
Update model selector to dynamically load models from all registered providers.
|
|
||||||
The selector should:
|
|
||||||
- Query all providers for available models via GET /api/models
|
|
||||||
- Group models by provider (Claude, Mistral, etc.)
|
|
||||||
- Display provider badges/icons next to model names
|
|
||||||
- Show which provider each model belongs to
|
|
||||||
- Filter models by provider (optional toggle)
|
|
||||||
- Show provider-specific capabilities (streaming, images, etc.)
|
|
||||||
- Only show models from providers with configured API keys
|
|
||||||
- Handle providers gracefully (show "Configure API key" if not set)
|
|
||||||
</description>
|
|
||||||
<priority>2</priority>
|
|
||||||
<category>functional</category>
|
|
||||||
<implementation_approach>
|
|
||||||
- Create API endpoint: GET /api/models (returns all models from all providers)
|
|
||||||
- Update frontend ModelSelector component to handle multiple providers
|
|
||||||
- Add provider grouping/filtering in UI
|
|
||||||
- Show provider badges/icons next to model names
|
|
||||||
- Group models by provider with collapsible sections
|
|
||||||
- Show provider status (configured/not configured)
|
|
||||||
</implementation_approach>
|
|
||||||
<test_steps>
|
|
||||||
1. Verify model selector shows Claude models (existing functionality)
|
|
||||||
2. Verify model selector shows Mistral models (if key configured)
|
|
||||||
3. Test grouping by provider works
|
|
||||||
4. Test filtering by provider works
|
|
||||||
5. Verify provider badges display correctly
|
|
||||||
6. Test that providers without API keys show "Configure" message
|
|
||||||
7. Verify selecting a model works for both providers
|
|
||||||
</test_steps>
|
|
||||||
</feature_unified_model_selector>
|
|
||||||
|
|
||||||
<feature_provider_settings>
|
|
||||||
<title>Multi-Provider API Key Management</title>
|
|
||||||
<description>
|
|
||||||
Create unified API key management that supports multiple providers. Users should be able to:
|
|
||||||
- Manage API keys for each provider separately (Claude, Mistral, OpenAI, etc.)
|
|
||||||
- See which providers are available
|
|
||||||
- See which providers are configured (have API keys)
|
|
||||||
- Test each provider's API key independently
|
|
||||||
- Enable/disable providers (hide models if key not configured)
|
|
||||||
- See provider status indicators (configured/not configured/error)
|
|
||||||
- Update or remove API keys for any provider
|
|
||||||
- See usage statistics per provider
|
|
||||||
</description>
|
|
||||||
<priority>2</priority>
|
|
||||||
<category>functional</category>
|
|
||||||
<implementation_approach>
|
|
||||||
- Create server/routes/providers.js with unified provider management
|
|
||||||
- Update settings UI to show provider cards (one per provider)
|
|
||||||
- Each provider card has:
|
|
||||||
* Provider name and logo/icon
|
|
||||||
* API key input field (masked)
|
|
||||||
* "Test Connection" button
|
|
||||||
* Status indicator (green/yellow/red)
|
|
||||||
* Enable/disable toggle
|
|
||||||
- Store keys in api_keys table with key_name = 'claude_api_key', 'mistral_api_key', etc.
|
|
||||||
- Use same encryption method for all providers
|
|
||||||
</implementation_approach>
|
|
||||||
<test_steps>
|
|
||||||
1. Configure Claude API key (verify existing functionality still works)
|
|
||||||
2. Configure Mistral API key
|
|
||||||
3. Verify both keys are stored separately
|
|
||||||
4. Test each provider's "Test Connection" button
|
|
||||||
5. Remove one key and verify only that provider's models are hidden
|
|
||||||
6. Verify provider status indicators update correctly
|
|
||||||
7. Test that disabling a provider hides its models
|
|
||||||
</test_steps>
|
|
||||||
</feature_provider_settings>
|
|
||||||
|
|
||||||
<feature_database_provider_support>
|
|
||||||
<title>Database Support for Multiple Providers (Future-Proof)</title>
|
|
||||||
<description>
|
|
||||||
Update database schema to support multiple providers in a future-proof way.
|
|
||||||
This should:
|
|
||||||
- Add provider field to conversations table (TEXT, default: 'claude')
|
|
||||||
- Add provider field to messages/usage_tracking (TEXT, default: 'claude')
|
|
||||||
- Use TEXT field (not ENUM) to allow easy addition of new providers without schema changes
|
|
||||||
- Migration should be safe, idempotent, and backward compatible
|
|
||||||
- All existing records default to 'claude' provider
|
|
||||||
- Add indexes for performance on provider queries
|
|
||||||
</description>
|
|
||||||
<priority>1</priority>
|
|
||||||
<category>functional</category>
|
|
||||||
<implementation_approach>
|
|
||||||
- Create migration: server/migrations/add_provider_support.sql
|
|
||||||
- Use TEXT field (not ENUM) for provider name (allows 'claude', 'mistral', 'openai', etc.)
|
|
||||||
- Default all existing records to 'claude'
|
|
||||||
- Add indexes on provider columns for performance
|
|
||||||
- Make migration idempotent (can run multiple times safely)
|
|
||||||
- Create rollback script if needed
|
|
||||||
</implementation_approach>
|
|
||||||
<test_steps>
|
|
||||||
1. Backup existing database
|
|
||||||
2. Run migration script
|
|
||||||
3. Verify all existing conversations have provider='claude'
|
|
||||||
4. Verify all existing messages have provider='claude' (via usage_tracking)
|
|
||||||
5. Create new conversation with Mistral provider
|
|
||||||
6. Verify provider='mistral' is saved correctly
|
|
||||||
7. Query conversations by provider (test index performance)
|
|
||||||
8. Verify existing Claude conversations still work
|
|
||||||
9. Test rollback script if needed
|
|
||||||
</test_steps>
|
|
||||||
</feature_database_provider_support>
|
|
||||||
|
|
||||||
<feature_unified_chat_endpoint>
|
|
||||||
<title>Unified Chat Endpoint (Works with Any Provider)</title>
|
|
||||||
<description>
|
|
||||||
Update chat endpoints to use ProviderFactory, making them work with any provider.
|
|
||||||
The endpoint should:
|
|
||||||
- Accept provider or model ID in request
|
|
||||||
- Use ProviderFactory to get correct provider
|
|
||||||
- Route request to appropriate provider
|
|
||||||
- Return unified response format
|
|
||||||
- Handle provider-specific errors gracefully
|
|
||||||
- Support streaming for all providers that support it
|
|
||||||
</description>
|
|
||||||
<priority>1</priority>
|
|
||||||
<category>functional</category>
|
|
||||||
<implementation_approach>
|
|
||||||
- Update POST /api/chat to use ProviderFactory
|
|
||||||
- Update POST /api/chat/stream to use ProviderFactory
|
|
||||||
- Extract provider from model ID or accept provider parameter
|
|
||||||
- Route to correct provider instance
|
|
||||||
- Return unified response format
|
|
||||||
</implementation_approach>
|
|
||||||
<test_steps>
|
|
||||||
1. Test POST /api/chat with Claude model (verify no regression)
|
|
||||||
2. Test POST /api/chat with Mistral model
|
|
||||||
3. Test POST /api/chat/stream with Claude (verify streaming still works)
|
|
||||||
4. Test POST /api/chat/stream with Mistral
|
|
||||||
5. Test error handling for invalid provider
|
|
||||||
6. Test error handling for missing API key
|
|
||||||
</test_steps>
|
|
||||||
</feature_unified_chat_endpoint>
|
|
||||||
</new_features>
|
|
||||||
|
|
||||||
<future_extensibility>
|
|
||||||
<openai_provider_example>
|
|
||||||
<title>How to Add OpenAI in the Future</title>
|
|
||||||
<description>
|
|
||||||
To add OpenAI support later, simply follow these steps (NO changes to existing code needed):
|
|
||||||
|
|
||||||
1. Create server/providers/OpenAIProvider.js extending BaseProvider
|
|
||||||
2. Implement OpenAI API calls (https://api.openai.com/v1/chat/completions)
|
|
||||||
3. Register in ProviderRegistry: ProviderRegistry.register('openai', OpenAIProvider)
|
|
||||||
4. That's it! OpenAI models will automatically appear in model selector.
|
|
||||||
|
|
||||||
Example OpenAIProvider structure:
|
|
||||||
- Extends BaseProvider
|
|
||||||
- Implements sendMessage() using OpenAI API
|
|
||||||
- Implements streamMessage() for streaming support
|
|
||||||
- Returns models: gpt-4, gpt-3.5-turbo, etc.
|
|
||||||
- Handles OpenAI-specific authentication and errors
|
|
||||||
</description>
|
|
||||||
</openai_provider_example>
|
|
||||||
|
|
||||||
<other_providers>
|
|
||||||
<note>
|
|
||||||
Same pattern works for any AI provider:
|
|
||||||
- Google Gemini (GeminiProvider)
|
|
||||||
- Cohere (CohereProvider)
|
|
||||||
- Any other AI API that follows similar patterns
|
|
||||||
Just create a new Provider class extending BaseProvider and register it.
|
|
||||||
</note>
|
|
||||||
</other_providers>
|
|
||||||
</future_extensibility>
|
|
||||||
|
|
||||||
<implementation_notes>
|
|
||||||
<code_structure>
|
|
||||||
server/
|
|
||||||
providers/
|
|
||||||
BaseProvider.js # Abstract base class (NEW)
|
|
||||||
ClaudeProvider.js # Refactored Claude (extends BaseProvider)
|
|
||||||
MistralProvider.js # New Mistral (extends BaseProvider)
|
|
||||||
ProviderRegistry.js # Manages all providers (NEW)
|
|
||||||
ProviderFactory.js # Creates provider instances (NEW)
|
|
||||||
routes/
|
|
||||||
providers.js # Unified provider management (NEW)
|
|
||||||
chat.js # Updated to use ProviderFactory
|
|
||||||
migrations/
|
|
||||||
add_provider_support.sql # Database migration (NEW)
|
|
||||||
</code_structure>
|
|
||||||
|
|
||||||
<safety_guidelines>
|
|
||||||
- Refactor Claude code to use BaseProvider WITHOUT changing functionality
|
|
||||||
- All providers are isolated - errors in one don't affect others
|
|
||||||
- Database changes are backward compatible (TEXT field, not ENUM)
|
|
||||||
- Existing conversations default to 'claude' provider
|
|
||||||
- Test Claude thoroughly after refactoring
|
|
||||||
- Use feature flags if needed to enable/disable providers
|
|
||||||
- Log all provider operations separately for debugging
|
|
||||||
</safety_guidelines>
|
|
||||||
|
|
||||||
<error_handling>
|
|
||||||
- Each provider handles its own errors
|
|
||||||
- Provider errors should NOT affect other providers
|
|
||||||
- Show user-friendly error messages
|
|
||||||
- Log errors with provider context
|
|
||||||
- Don't throw unhandled exceptions
|
|
||||||
</error_handling>
|
|
||||||
</implementation_notes>
|
|
||||||
|
|
||||||
<database_changes>
|
|
||||||
<safe_migrations>
|
|
||||||
<migration_1>
|
|
||||||
<description>Add provider support (TEXT field for extensibility)</description>
|
|
||||||
<sql>
|
|
||||||
-- Add provider column to conversations (TEXT allows any provider name)
|
|
||||||
-- Default to 'claude' for backward compatibility
|
|
||||||
ALTER TABLE conversations
|
|
||||||
ADD COLUMN provider TEXT DEFAULT 'claude';
|
|
||||||
|
|
||||||
-- Add provider column to usage_tracking
|
|
||||||
ALTER TABLE usage_tracking
|
|
||||||
ADD COLUMN provider TEXT DEFAULT 'claude';
|
|
||||||
|
|
||||||
-- Add indexes for performance
|
|
||||||
CREATE INDEX IF NOT EXISTS idx_conversations_provider
|
|
||||||
ON conversations(provider);
|
|
||||||
|
|
||||||
CREATE INDEX IF NOT EXISTS idx_usage_tracking_provider
|
|
||||||
ON usage_tracking(provider);
|
|
||||||
</sql>
|
|
||||||
<rollback>
|
|
||||||
-- Rollback script (use with caution - may cause data issues)
|
|
||||||
DROP INDEX IF EXISTS idx_conversations_provider;
|
|
||||||
DROP INDEX IF EXISTS idx_usage_tracking_provider;
|
|
||||||
-- Note: SQLite doesn't support DROP COLUMN easily
|
|
||||||
-- Would need to recreate table without provider column
|
|
||||||
</rollback>
|
|
||||||
<note>
|
|
||||||
Using TEXT instead of ENUM allows adding new providers (OpenAI, Gemini, etc.)
|
|
||||||
without database schema changes in the future. This is future-proof.
|
|
||||||
</note>
|
|
||||||
</migration_1>
|
|
||||||
</safe_migrations>
|
|
||||||
|
|
||||||
<data_integrity>
|
|
||||||
- All existing conversations default to provider='claude'
|
|
||||||
- All existing messages default to provider='claude'
|
|
||||||
- Migration is idempotent (can run multiple times safely)
|
|
||||||
- No data loss during migration
|
|
||||||
- Existing queries continue to work
|
|
||||||
</data_integrity>
|
|
||||||
</database_changes>
|
|
||||||
|
|
||||||
<api_endpoints>
|
|
||||||
<new_endpoints>
|
|
||||||
- GET /api/models - Get all models from all configured providers
|
|
||||||
- GET /api/providers - Get list of available providers and their status
|
|
||||||
- POST /api/providers/:provider/key - Set API key for specific provider
|
|
||||||
- POST /api/providers/:provider/test - Test provider API key
|
|
||||||
- GET /api/providers/:provider/status - Get provider configuration status
|
|
||||||
- DELETE /api/providers/:provider/key - Remove provider API key
|
|
||||||
</new_endpoints>
|
|
||||||
|
|
||||||
<updated_endpoints>
|
|
||||||
- POST /api/chat - Updated to use ProviderFactory (works with any provider)
|
|
||||||
* Accepts: { model: 'model-id', messages: [...], ... }
|
|
||||||
* Provider is determined from model ID or can be specified
|
|
||||||
- POST /api/chat/stream - Updated to use ProviderFactory (streaming for any provider)
|
|
||||||
* Same interface, works with any provider that supports streaming
|
|
||||||
</updated_endpoints>
|
|
||||||
</api_endpoints>
|
|
||||||
|
|
||||||
<dependencies>
|
|
||||||
<backend>
|
|
||||||
- No new dependencies required (use native fetch for Mistral API)
|
|
||||||
- Optional: @mistralai/mistralai (only if provides significant value)
|
|
||||||
- Keep dependencies minimal to avoid conflicts
|
|
||||||
</backend>
|
|
||||||
</dependencies>
|
|
||||||
|
|
||||||
<testing_requirements>
|
|
||||||
<regression_tests>
|
|
||||||
- Verify all existing Claude functionality still works
|
|
||||||
- Test that existing conversations load correctly
|
|
||||||
- Verify Claude model selection still works
|
|
||||||
- Test Claude API endpoints are unaffected
|
|
||||||
- Verify database queries for Claude still work
|
|
||||||
- Test Claude streaming still works
|
|
||||||
</regression_tests>
|
|
||||||
|
|
||||||
<integration_tests>
|
|
||||||
- Test switching between Claude and Mistral models
|
|
||||||
- Test conversations with different providers
|
|
||||||
- Test error handling doesn't affect other providers
|
|
||||||
- Test migration doesn't break existing data
|
|
||||||
- Test ProviderFactory routes correctly
|
|
||||||
- Test unified model selector with multiple providers
|
|
||||||
</integration_tests>
|
|
||||||
|
|
||||||
<extensibility_tests>
|
|
||||||
- Verify adding a mock provider is straightforward
|
|
||||||
- Test that ProviderFactory correctly routes to providers
|
|
||||||
- Verify provider isolation (errors don't propagate)
|
|
||||||
- Test that new providers automatically appear in UI
|
|
||||||
</extensibility_tests>
|
|
||||||
</testing_requirements>
|
|
||||||
|
|
||||||
<success_criteria>
|
|
||||||
<functionality>
|
|
||||||
- Claude functionality works exactly as before (no regression)
|
|
||||||
- Mistral models appear in selector and work correctly
|
|
||||||
- Users can switch between Claude and Mistral seamlessly
|
|
||||||
- API key management works for both providers
|
|
||||||
- Database migration is safe and backward compatible
|
|
||||||
</functionality>
|
|
||||||
|
|
||||||
<extensibility>
|
|
||||||
- Adding a new provider (like OpenAI) requires only creating one new file
|
|
||||||
- No changes needed to existing code when adding providers
|
|
||||||
- Provider architecture is documented and easy to follow
|
|
||||||
- Code is organized and maintainable
|
|
||||||
</extensibility>
|
|
||||||
</success_criteria>
|
|
||||||
</project_specification>
|
|
||||||
681
prompts/app_spec_model.txt
Normal file
681
prompts/app_spec_model.txt
Normal file
@@ -0,0 +1,681 @@
|
|||||||
|
<project_specification>
|
||||||
|
<project_name>Claude.ai Clone - AI Chat Interface</project_name>
|
||||||
|
|
||||||
|
<overview>
|
||||||
|
Build a fully functional clone of claude.ai, Anthropic's conversational AI interface. The application should
|
||||||
|
provide a clean, modern chat interface for interacting with Claude via the API, including features like
|
||||||
|
conversation management, artifact rendering, project organization, multiple model selection, and advanced
|
||||||
|
settings. The UI should closely match claude.ai's design using Tailwind CSS with a focus on excellent
|
||||||
|
user experience and responsive design.
|
||||||
|
</overview>
|
||||||
|
|
||||||
|
<technology_stack>
|
||||||
|
<api_key>
|
||||||
|
You can use an API key located at /tmp/api-key for testing. You will not be allowed to read this file, but you can reference it in code.
|
||||||
|
</api_key>
|
||||||
|
<frontend>
|
||||||
|
<framework>React with Vite</framework>
|
||||||
|
<styling>Tailwind CSS (via CDN)</styling>
|
||||||
|
<state_management>React hooks and context</state_management>
|
||||||
|
<routing>React Router for navigation</routing>
|
||||||
|
<markdown>React Markdown for message rendering</markdown>
|
||||||
|
<code_highlighting>Syntax highlighting for code blocks</code_highlighting>
|
||||||
|
<port>Only launch on port {frontend_port}</port>
|
||||||
|
</frontend>
|
||||||
|
<backend>
|
||||||
|
<runtime>Node.js with Express</runtime>
|
||||||
|
<database>SQLite with better-sqlite3</database>
|
||||||
|
<api_integration>Claude API for chat completions</api_integration>
|
||||||
|
<streaming>Server-Sent Events for streaming responses</streaming>
|
||||||
|
</backend>
|
||||||
|
<communication>
|
||||||
|
<api>RESTful endpoints</api>
|
||||||
|
<streaming>SSE for real-time message streaming</streaming>
|
||||||
|
<claude_api>Integration with Claude API using Anthropic SDK</claude_api>
|
||||||
|
</communication>
|
||||||
|
</technology_stack>
|
||||||
|
|
||||||
|
<prerequisites>
|
||||||
|
<environment_setup>
|
||||||
|
- Repository includes .env with VITE_ANTHROPIC_API_KEY configured
|
||||||
|
- Frontend dependencies pre-installed via pnpm
|
||||||
|
- Backend code goes in /server directory
|
||||||
|
- Install backend dependencies as needed
|
||||||
|
</environment_setup>
|
||||||
|
</prerequisites>
|
||||||
|
|
||||||
|
<core_features>
|
||||||
|
<chat_interface>
|
||||||
|
- Clean, centered chat layout with message bubbles
|
||||||
|
- Streaming message responses with typing indicator
|
||||||
|
- Markdown rendering with proper formatting
|
||||||
|
- Code blocks with syntax highlighting and copy button
|
||||||
|
- LaTeX/math equation rendering
|
||||||
|
- Image upload and display in messages
|
||||||
|
- Multi-turn conversations with context
|
||||||
|
- Message editing and regeneration
|
||||||
|
- Stop generation button during streaming
|
||||||
|
- Input field with auto-resize textarea
|
||||||
|
- Character count and token estimation
|
||||||
|
- Keyboard shortcuts (Enter to send, Shift+Enter for newline)
|
||||||
|
</chat_interface>
|
||||||
|
|
||||||
|
<artifacts>
|
||||||
|
- Artifact detection and rendering in side panel
|
||||||
|
- Code artifact viewer with syntax highlighting
|
||||||
|
- HTML/SVG preview with live rendering
|
||||||
|
- React component preview
|
||||||
|
- Mermaid diagram rendering
|
||||||
|
- Text document artifacts
|
||||||
|
- Artifact editing and re-prompting
|
||||||
|
- Full-screen artifact view
|
||||||
|
- Download artifact content
|
||||||
|
- Artifact versioning and history
|
||||||
|
</artifacts>
|
||||||
|
|
||||||
|
<conversation_management>
|
||||||
|
- Create new conversations
|
||||||
|
- Conversation list in sidebar
|
||||||
|
- Rename conversations
|
||||||
|
- Delete conversations
|
||||||
|
- Search conversations by title/content
|
||||||
|
- Pin important conversations
|
||||||
|
- Archive conversations
|
||||||
|
- Conversation folders/organization
|
||||||
|
- Duplicate conversation
|
||||||
|
- Export conversation (JSON, Markdown, PDF)
|
||||||
|
- Conversation timestamps (created, last updated)
|
||||||
|
- Unread message indicators
|
||||||
|
</conversation_management>
|
||||||
|
|
||||||
|
<projects>
|
||||||
|
- Create projects to group related conversations
|
||||||
|
- Project knowledge base (upload documents)
|
||||||
|
- Project-specific custom instructions
|
||||||
|
- Share projects with team (mock feature)
|
||||||
|
- Project settings and configuration
|
||||||
|
- Move conversations between projects
|
||||||
|
- Project templates
|
||||||
|
- Project analytics (usage stats)
|
||||||
|
</projects>
|
||||||
|
|
||||||
|
<model_selection>
|
||||||
|
- Model selector dropdown with the following models:
|
||||||
|
- Claude Sonnet 4.5 (claude-sonnet-4-5-20250929) - default
|
||||||
|
- Claude Haiku 4.5 (claude-haiku-4-5-20251001)
|
||||||
|
- Claude Opus 4.1 (claude-opus-4-1-20250805)
|
||||||
|
- Model capabilities display
|
||||||
|
- Context window indicator
|
||||||
|
- Model-specific pricing info (display only)
|
||||||
|
- Switch models mid-conversation
|
||||||
|
- Model comparison view
|
||||||
|
</model_selection>
|
||||||
|
|
||||||
|
<custom_instructions>
|
||||||
|
- Global custom instructions
|
||||||
|
- Project-specific custom instructions
|
||||||
|
- Conversation-specific system prompts
|
||||||
|
- Custom instruction templates
|
||||||
|
- Preview how instructions affect responses
|
||||||
|
</custom_instructions>
|
||||||
|
|
||||||
|
<settings_preferences>
|
||||||
|
- Theme selection (Light, Dark, Auto)
|
||||||
|
- Font size adjustment
|
||||||
|
- Message density (compact, comfortable, spacious)
|
||||||
|
- Code theme selection
|
||||||
|
- Language preferences
|
||||||
|
- Accessibility options
|
||||||
|
- Keyboard shortcuts reference
|
||||||
|
- Data export options
|
||||||
|
- Privacy settings
|
||||||
|
- API key management
|
||||||
|
</settings_preferences>
|
||||||
|
|
||||||
|
<advanced_features>
|
||||||
|
- Temperature control slider
|
||||||
|
- Max tokens adjustment
|
||||||
|
- Top-p (nucleus sampling) control
|
||||||
|
- System prompt override
|
||||||
|
- Thinking/reasoning mode toggle
|
||||||
|
- Multi-modal input (text + images)
|
||||||
|
- Voice input (optional, mock UI)
|
||||||
|
- Response suggestions
|
||||||
|
- Related prompts
|
||||||
|
- Conversation branching
|
||||||
|
</advanced_features>
|
||||||
|
|
||||||
|
<collaboration>
|
||||||
|
- Share conversation via link (read-only)
|
||||||
|
- Export conversation formats
|
||||||
|
- Conversation templates
|
||||||
|
- Prompt library
|
||||||
|
- Share artifacts
|
||||||
|
- Team workspaces (mock UI)
|
||||||
|
</collaboration>
|
||||||
|
|
||||||
|
<search_discovery>
|
||||||
|
- Search across all conversations
|
||||||
|
- Filter by project, date, model
|
||||||
|
- Prompt library with categories
|
||||||
|
- Example conversations
|
||||||
|
- Quick actions menu
|
||||||
|
- Command palette (Cmd/Ctrl+K)
|
||||||
|
</search_discovery>
|
||||||
|
|
||||||
|
<usage_tracking>
|
||||||
|
- Token usage display per message
|
||||||
|
- Conversation cost estimation
|
||||||
|
- Daily/monthly usage dashboard
|
||||||
|
- Usage limits and warnings
|
||||||
|
- API quota tracking
|
||||||
|
</usage_tracking>
|
||||||
|
|
||||||
|
<onboarding>
|
||||||
|
- Welcome screen for new users
|
||||||
|
- Feature tour highlights
|
||||||
|
- Example prompts to get started
|
||||||
|
- Quick tips and best practices
|
||||||
|
- Keyboard shortcuts tutorial
|
||||||
|
</onboarding>
|
||||||
|
|
||||||
|
<accessibility>
|
||||||
|
- Full keyboard navigation
|
||||||
|
- Screen reader support
|
||||||
|
- ARIA labels and roles
|
||||||
|
- High contrast mode
|
||||||
|
- Focus management
|
||||||
|
- Reduced motion support
|
||||||
|
</accessibility>
|
||||||
|
|
||||||
|
<responsive_design>
|
||||||
|
- Mobile-first responsive layout
|
||||||
|
- Touch-optimized interface
|
||||||
|
- Collapsible sidebar on mobile
|
||||||
|
- Swipe gestures for navigation
|
||||||
|
- Adaptive artifact display
|
||||||
|
- Progressive Web App (PWA) support
|
||||||
|
</responsive_design>
|
||||||
|
</core_features>
|
||||||
|
|
||||||
|
<database_schema>
|
||||||
|
<tables>
|
||||||
|
<users>
|
||||||
|
- id, email, name, avatar_url
|
||||||
|
- created_at, last_login
|
||||||
|
- preferences (JSON: theme, font_size, etc.)
|
||||||
|
- custom_instructions
|
||||||
|
</users>
|
||||||
|
|
||||||
|
<projects>
|
||||||
|
- id, user_id, name, description, color
|
||||||
|
- custom_instructions, knowledge_base_path
|
||||||
|
- created_at, updated_at
|
||||||
|
- is_archived, is_pinned
|
||||||
|
</projects>
|
||||||
|
|
||||||
|
<conversations>
|
||||||
|
- id, user_id, project_id, title
|
||||||
|
- model, created_at, updated_at, last_message_at
|
||||||
|
- is_archived, is_pinned, is_deleted
|
||||||
|
- settings (JSON: temperature, max_tokens, etc.)
|
||||||
|
- token_count, message_count
|
||||||
|
</conversations>
|
||||||
|
|
||||||
|
<messages>
|
||||||
|
- id, conversation_id, role (user/assistant/system)
|
||||||
|
- content, created_at, edited_at
|
||||||
|
- tokens, finish_reason
|
||||||
|
- images (JSON array of image data)
|
||||||
|
- parent_message_id (for branching)
|
||||||
|
</messages>
|
||||||
|
|
||||||
|
<artifacts>
|
||||||
|
- id, message_id, conversation_id
|
||||||
|
- type (code/html/svg/react/mermaid/text)
|
||||||
|
- title, identifier, language
|
||||||
|
- content, version
|
||||||
|
- created_at, updated_at
|
||||||
|
</artifacts>
|
||||||
|
|
||||||
|
<shared_conversations>
|
||||||
|
- id, conversation_id, share_token
|
||||||
|
- created_at, expires_at, view_count
|
||||||
|
- is_public
|
||||||
|
</shared_conversations>
|
||||||
|
|
||||||
|
<prompt_library>
|
||||||
|
- id, user_id, title, description
|
||||||
|
- prompt_template, category, tags (JSON)
|
||||||
|
- is_public, usage_count
|
||||||
|
- created_at, updated_at
|
||||||
|
</prompt_library>
|
||||||
|
|
||||||
|
<conversation_folders>
|
||||||
|
- id, user_id, project_id, name, parent_folder_id
|
||||||
|
- created_at, position
|
||||||
|
</conversation_folders>
|
||||||
|
|
||||||
|
<conversation_folder_items>
|
||||||
|
- id, folder_id, conversation_id
|
||||||
|
</conversation_folder_items>
|
||||||
|
|
||||||
|
<usage_tracking>
|
||||||
|
- id, user_id, conversation_id, message_id
|
||||||
|
- model, input_tokens, output_tokens
|
||||||
|
- cost_estimate, created_at
|
||||||
|
</usage_tracking>
|
||||||
|
|
||||||
|
<api_keys>
|
||||||
|
- id, user_id, key_name, api_key_hash
|
||||||
|
- created_at, last_used_at
|
||||||
|
- is_active
|
||||||
|
</api_keys>
|
||||||
|
</tables>
|
||||||
|
</database_schema>
|
||||||
|
|
||||||
|
<api_endpoints_summary>
|
||||||
|
<authentication>
|
||||||
|
- POST /api/auth/login
|
||||||
|
- POST /api/auth/logout
|
||||||
|
- GET /api/auth/me
|
||||||
|
- PUT /api/auth/profile
|
||||||
|
</authentication>
|
||||||
|
|
||||||
|
<conversations>
|
||||||
|
- GET /api/conversations
|
||||||
|
- POST /api/conversations
|
||||||
|
- GET /api/conversations/:id
|
||||||
|
- PUT /api/conversations/:id
|
||||||
|
- DELETE /api/conversations/:id
|
||||||
|
- POST /api/conversations/:id/duplicate
|
||||||
|
- POST /api/conversations/:id/export
|
||||||
|
- PUT /api/conversations/:id/archive
|
||||||
|
- PUT /api/conversations/:id/pin
|
||||||
|
- POST /api/conversations/:id/branch
|
||||||
|
</conversations>
|
||||||
|
|
||||||
|
<messages>
|
||||||
|
- GET /api/conversations/:id/messages
|
||||||
|
- POST /api/conversations/:id/messages
|
||||||
|
- PUT /api/messages/:id
|
||||||
|
- DELETE /api/messages/:id
|
||||||
|
- POST /api/messages/:id/regenerate
|
||||||
|
- GET /api/messages/stream (SSE endpoint)
|
||||||
|
</messages>
|
||||||
|
|
||||||
|
<artifacts>
|
||||||
|
- GET /api/conversations/:id/artifacts
|
||||||
|
- GET /api/artifacts/:id
|
||||||
|
- PUT /api/artifacts/:id
|
||||||
|
- DELETE /api/artifacts/:id
|
||||||
|
- POST /api/artifacts/:id/fork
|
||||||
|
- GET /api/artifacts/:id/versions
|
||||||
|
</artifacts>
|
||||||
|
|
||||||
|
<projects>
|
||||||
|
- GET /api/projects
|
||||||
|
- POST /api/projects
|
||||||
|
- GET /api/projects/:id
|
||||||
|
- PUT /api/projects/:id
|
||||||
|
- DELETE /api/projects/:id
|
||||||
|
- POST /api/projects/:id/knowledge
|
||||||
|
- GET /api/projects/:id/conversations
|
||||||
|
- PUT /api/projects/:id/settings
|
||||||
|
</projects>
|
||||||
|
|
||||||
|
<sharing>
|
||||||
|
- POST /api/conversations/:id/share
|
||||||
|
- GET /api/share/:token
|
||||||
|
- DELETE /api/share/:token
|
||||||
|
- PUT /api/share/:token/settings
|
||||||
|
</sharing>
|
||||||
|
|
||||||
|
<prompts>
|
||||||
|
- GET /api/prompts/library
|
||||||
|
- POST /api/prompts/library
|
||||||
|
- GET /api/prompts/:id
|
||||||
|
- PUT /api/prompts/:id
|
||||||
|
- DELETE /api/prompts/:id
|
||||||
|
- GET /api/prompts/categories
|
||||||
|
- GET /api/prompts/examples
|
||||||
|
</prompts>
|
||||||
|
|
||||||
|
<search>
|
||||||
|
- GET /api/search/conversations?q=query
|
||||||
|
- GET /api/search/messages?q=query
|
||||||
|
- GET /api/search/artifacts?q=query
|
||||||
|
- GET /api/search/prompts?q=query
|
||||||
|
</search>
|
||||||
|
|
||||||
|
<folders>
|
||||||
|
- GET /api/folders
|
||||||
|
- POST /api/folders
|
||||||
|
- PUT /api/folders/:id
|
||||||
|
- DELETE /api/folders/:id
|
||||||
|
- POST /api/folders/:id/items
|
||||||
|
- DELETE /api/folders/:id/items/:conversationId
|
||||||
|
</folders>
|
||||||
|
|
||||||
|
<usage>
|
||||||
|
- GET /api/usage/daily
|
||||||
|
- GET /api/usage/monthly
|
||||||
|
- GET /api/usage/by-model
|
||||||
|
- GET /api/usage/conversations/:id
|
||||||
|
</usage>
|
||||||
|
|
||||||
|
<settings>
|
||||||
|
- GET /api/settings
|
||||||
|
- PUT /api/settings
|
||||||
|
- GET /api/settings/custom-instructions
|
||||||
|
- PUT /api/settings/custom-instructions
|
||||||
|
</settings>
|
||||||
|
|
||||||
|
<claude_api>
|
||||||
|
- POST /api/claude/chat (proxy to Claude API)
|
||||||
|
- POST /api/claude/chat/stream (streaming proxy)
|
||||||
|
- GET /api/claude/models
|
||||||
|
- POST /api/claude/images/upload
|
||||||
|
</claude_api>
|
||||||
|
</api_endpoints_summary>
|
||||||
|
|
||||||
|
<ui_layout>
|
||||||
|
<main_structure>
|
||||||
|
- Three-column layout: sidebar (conversations), main (chat), panel (artifacts)
|
||||||
|
- Collapsible sidebar with resize handle
|
||||||
|
- Responsive breakpoints: mobile (single column), tablet (two column), desktop (three column)
|
||||||
|
- Persistent header with project/model selector
|
||||||
|
- Bottom input area with send button and options
|
||||||
|
</main_structure>
|
||||||
|
|
||||||
|
<sidebar_left>
|
||||||
|
- New chat button (prominent)
|
||||||
|
- Project selector dropdown
|
||||||
|
- Search conversations input
|
||||||
|
- Conversations list (grouped by date: Today, Yesterday, Previous 7 days, etc.)
|
||||||
|
- Folder tree view (collapsible)
|
||||||
|
- Settings gear icon at bottom
|
||||||
|
- User profile at bottom
|
||||||
|
</sidebar_left>
|
||||||
|
|
||||||
|
<main_chat_area>
|
||||||
|
- Conversation title (editable inline)
|
||||||
|
- Model selector badge
|
||||||
|
- Message history (scrollable)
|
||||||
|
- Welcome screen for new conversations
|
||||||
|
- Suggested prompts (empty state)
|
||||||
|
- Input area with formatting toolbar
|
||||||
|
- Attachment button for images
|
||||||
|
- Send button with loading state
|
||||||
|
- Stop generation button
|
||||||
|
</main_chat_area>
|
||||||
|
|
||||||
|
<artifacts_panel>
|
||||||
|
- Artifact header with title and type badge
|
||||||
|
- Code editor or preview pane
|
||||||
|
- Tabs for multiple artifacts
|
||||||
|
- Full-screen toggle
|
||||||
|
- Download button
|
||||||
|
- Edit/Re-prompt button
|
||||||
|
- Version selector
|
||||||
|
- Close panel button
|
||||||
|
</artifacts_panel>
|
||||||
|
|
||||||
|
<modals_overlays>
|
||||||
|
- Settings modal (tabbed interface)
|
||||||
|
- Share conversation modal
|
||||||
|
- Export options modal
|
||||||
|
- Project settings modal
|
||||||
|
- Prompt library modal
|
||||||
|
- Command palette overlay
|
||||||
|
- Keyboard shortcuts reference
|
||||||
|
</modals_overlays>
|
||||||
|
</ui_layout>
|
||||||
|
|
||||||
|
<design_system>
|
||||||
|
<color_palette>
|
||||||
|
- Primary: Orange/amber accent (#CC785C claude-style)
|
||||||
|
- Background: White (light mode), Dark gray (#1A1A1A dark mode)
|
||||||
|
- Surface: Light gray (#F5F5F5 light), Darker gray (#2A2A2A dark)
|
||||||
|
- Text: Near black (#1A1A1A light), Off-white (#E5E5E5 dark)
|
||||||
|
- Borders: Light gray (#E5E5E5 light), Dark gray (#404040 dark)
|
||||||
|
- Code blocks: Monaco editor theme
|
||||||
|
</color_palette>
|
||||||
|
|
||||||
|
<typography>
|
||||||
|
- Sans-serif system font stack (Inter, SF Pro, Roboto, system-ui)
|
||||||
|
- Headings: font-semibold
|
||||||
|
- Body: font-normal, leading-relaxed
|
||||||
|
- Code: Monospace (JetBrains Mono, Consolas, Monaco)
|
||||||
|
- Message text: text-base (16px), comfortable line-height
|
||||||
|
</typography>
|
||||||
|
|
||||||
|
<components>
|
||||||
|
<message_bubble>
|
||||||
|
- User messages: Right-aligned, subtle background
|
||||||
|
- Assistant messages: Left-aligned, no background
|
||||||
|
- Markdown formatting with proper spacing
|
||||||
|
- Inline code with bg-gray-100 background
|
||||||
|
- Code blocks with syntax highlighting
|
||||||
|
- Copy button on code blocks
|
||||||
|
</message_bubble>
|
||||||
|
|
||||||
|
<buttons>
|
||||||
|
- Primary: Orange/amber background, white text, rounded
|
||||||
|
- Secondary: Border style with hover fill
|
||||||
|
- Icon buttons: Square with hover background
|
||||||
|
- Disabled state: Reduced opacity, no pointer events
|
||||||
|
</buttons>
|
||||||
|
|
||||||
|
<inputs>
|
||||||
|
- Rounded borders with focus ring
|
||||||
|
- Textarea auto-resize
|
||||||
|
- Placeholder text in gray
|
||||||
|
- Error states in red
|
||||||
|
- Character counter
|
||||||
|
</inputs>
|
||||||
|
|
||||||
|
<cards>
|
||||||
|
- Subtle border or shadow
|
||||||
|
- Rounded corners (8px)
|
||||||
|
- Padding: p-4 to p-6
|
||||||
|
- Hover state: slight shadow increase
|
||||||
|
</cards>
|
||||||
|
</components>
|
||||||
|
|
||||||
|
<animations>
|
||||||
|
- Smooth transitions (150-300ms)
|
||||||
|
- Fade in for new messages
|
||||||
|
- Slide in for sidebar
|
||||||
|
- Typing indicator animation
|
||||||
|
- Loading spinner for generation
|
||||||
|
- Skeleton loaders for content
|
||||||
|
</animations>
|
||||||
|
</design_system>
|
||||||
|
|
||||||
|
<key_interactions>
|
||||||
|
<message_flow>
|
||||||
|
1. User types message in input field
|
||||||
|
2. Optional: Attach images via button
|
||||||
|
3. Click send or press Enter
|
||||||
|
4. Message appears in chat immediately
|
||||||
|
5. Typing indicator shows while waiting
|
||||||
|
6. Response streams in word by word
|
||||||
|
7. Code blocks render with syntax highlighting
|
||||||
|
8. Artifacts detected and rendered in side panel
|
||||||
|
9. Message complete, enable regenerate option
|
||||||
|
</message_flow>
|
||||||
|
|
||||||
|
<artifact_flow>
|
||||||
|
1. Assistant generates artifact in response
|
||||||
|
2. Artifact panel slides in from right
|
||||||
|
3. Content renders (code with highlighting or live preview)
|
||||||
|
4. User can edit artifact inline
|
||||||
|
5. "Re-prompt" button to iterate with Claude
|
||||||
|
6. Download or copy artifact content
|
||||||
|
7. Full-screen mode for detailed work
|
||||||
|
8. Close panel to return to chat focus
|
||||||
|
</artifact_flow>
|
||||||
|
|
||||||
|
<conversation_management>
|
||||||
|
1. Click "New Chat" to start fresh conversation
|
||||||
|
2. Conversations auto-save with first message
|
||||||
|
3. Auto-generate title from first exchange
|
||||||
|
4. Click title to rename inline
|
||||||
|
5. Drag conversations into folders
|
||||||
|
6. Right-click for context menu (pin, archive, delete, export)
|
||||||
|
7. Search filters conversations in real-time
|
||||||
|
8. Click conversation to switch context
|
||||||
|
</conversation_management>
|
||||||
|
</key_interactions>
|
||||||
|
|
||||||
|
<implementation_steps>
|
||||||
|
<step number="1">
|
||||||
|
<title>Setup Project Foundation and Database</title>
|
||||||
|
<tasks>
|
||||||
|
- Initialize Express server with SQLite database
|
||||||
|
- Set up Claude API client with streaming support
|
||||||
|
- Create database schema with migrations
|
||||||
|
- Implement authentication endpoints
|
||||||
|
- Set up basic CORS and middleware
|
||||||
|
- Create health check endpoint
|
||||||
|
</tasks>
|
||||||
|
</step>
|
||||||
|
|
||||||
|
<step number="2">
|
||||||
|
<title>Build Core Chat Interface</title>
|
||||||
|
<tasks>
|
||||||
|
- Create main layout with sidebar and chat area
|
||||||
|
- Implement message display with markdown rendering
|
||||||
|
- Add streaming message support with SSE
|
||||||
|
- Build input area with auto-resize textarea
|
||||||
|
- Add code block syntax highlighting
|
||||||
|
- Implement stop generation functionality
|
||||||
|
- Add typing indicators and loading states
|
||||||
|
</tasks>
|
||||||
|
</step>
|
||||||
|
|
||||||
|
<step number="3">
|
||||||
|
<title>Conversation Management</title>
|
||||||
|
<tasks>
|
||||||
|
- Create conversation list in sidebar
|
||||||
|
- Implement new conversation creation
|
||||||
|
- Add conversation switching
|
||||||
|
- Build conversation rename functionality
|
||||||
|
- Implement delete with confirmation
|
||||||
|
- Add conversation search
|
||||||
|
- Create conversation grouping by date
|
||||||
|
</tasks>
|
||||||
|
</step>
|
||||||
|
|
||||||
|
<step number="4">
|
||||||
|
<title>Artifacts System</title>
|
||||||
|
<tasks>
|
||||||
|
- Build artifact detection from Claude responses
|
||||||
|
- Create artifact rendering panel
|
||||||
|
- Implement code artifact viewer
|
||||||
|
- Add HTML/SVG live preview
|
||||||
|
- Build artifact editing interface
|
||||||
|
- Add artifact versioning
|
||||||
|
- Implement full-screen artifact view
|
||||||
|
</tasks>
|
||||||
|
</step>
|
||||||
|
|
||||||
|
<step number="5">
|
||||||
|
<title>Projects and Organization</title>
|
||||||
|
<tasks>
|
||||||
|
- Create projects CRUD endpoints
|
||||||
|
- Build project selector UI
|
||||||
|
- Implement project-specific custom instructions
|
||||||
|
- Add folder system for conversations
|
||||||
|
- Create drag-and-drop organization
|
||||||
|
- Build project settings panel
|
||||||
|
</tasks>
|
||||||
|
</step>
|
||||||
|
|
||||||
|
<step number="6">
|
||||||
|
<title>Advanced Features</title>
|
||||||
|
<tasks>
|
||||||
|
- Add model selection dropdown
|
||||||
|
- Implement temperature and parameter controls
|
||||||
|
- Build image upload functionality
|
||||||
|
- Create message editing and regeneration
|
||||||
|
- Add conversation branching
|
||||||
|
- Implement export functionality
|
||||||
|
</tasks>
|
||||||
|
</step>
|
||||||
|
|
||||||
|
<step number="7">
|
||||||
|
<title>Settings and Customization</title>
|
||||||
|
<tasks>
|
||||||
|
- Build settings modal with tabs
|
||||||
|
- Implement theme switching (light/dark)
|
||||||
|
- Add custom instructions management
|
||||||
|
- Create keyboard shortcuts
|
||||||
|
- Build prompt library
|
||||||
|
- Add usage tracking dashboard
|
||||||
|
</tasks>
|
||||||
|
</step>
|
||||||
|
|
||||||
|
<step number="8">
|
||||||
|
<title>Sharing and Collaboration</title>
|
||||||
|
<tasks>
|
||||||
|
- Implement conversation sharing with tokens
|
||||||
|
- Create public share view
|
||||||
|
- Add export to multiple formats
|
||||||
|
- Build prompt templates
|
||||||
|
- Create example conversations
|
||||||
|
</tasks>
|
||||||
|
</step>
|
||||||
|
|
||||||
|
<step number="9">
|
||||||
|
<title>Polish and Optimization</title>
|
||||||
|
<tasks>
|
||||||
|
- Optimize for mobile responsiveness
|
||||||
|
- Add command palette (Cmd+K)
|
||||||
|
- Implement comprehensive keyboard navigation
|
||||||
|
- Add onboarding flow
|
||||||
|
- Create accessibility improvements
|
||||||
|
- Performance optimization and caching
|
||||||
|
</tasks>
|
||||||
|
</step>
|
||||||
|
</implementation_steps>
|
||||||
|
|
||||||
|
<success_criteria>
|
||||||
|
<functionality>
|
||||||
|
- Streaming chat responses work smoothly
|
||||||
|
- Artifact detection and rendering accurate
|
||||||
|
- Conversation management intuitive and reliable
|
||||||
|
- Project organization clear and useful
|
||||||
|
- Image upload and display working
|
||||||
|
- All CRUD operations functional
|
||||||
|
</functionality>
|
||||||
|
|
||||||
|
<user_experience>
|
||||||
|
- Interface matches claude.ai design language
|
||||||
|
- Responsive on all device sizes
|
||||||
|
- Smooth animations and transitions
|
||||||
|
- Fast response times and minimal lag
|
||||||
|
- Intuitive navigation and workflows
|
||||||
|
- Clear feedback for all actions
|
||||||
|
</user_experience>
|
||||||
|
|
||||||
|
<technical_quality>
|
||||||
|
- Clean, maintainable code structure
|
||||||
|
- Proper error handling throughout
|
||||||
|
- Secure API key management
|
||||||
|
- Optimized database queries
|
||||||
|
- Efficient streaming implementation
|
||||||
|
- Comprehensive testing coverage
|
||||||
|
</technical_quality>
|
||||||
|
|
||||||
|
<design_polish>
|
||||||
|
- Consistent with claude.ai visual design
|
||||||
|
- Beautiful typography and spacing
|
||||||
|
- Smooth animations and micro-interactions
|
||||||
|
- Excellent contrast and accessibility
|
||||||
|
- Professional, polished appearance
|
||||||
|
- Dark mode fully implemented
|
||||||
|
</design_polish>
|
||||||
|
</success_criteria>
|
||||||
|
</project_specification>
|
||||||
@@ -1,134 +0,0 @@
|
|||||||
<project_specification>
|
|
||||||
<project_name>Votre Nom d'Application</project_name>
|
|
||||||
|
|
||||||
<overview>
|
|
||||||
Description complète de votre application. Expliquez :
|
|
||||||
- Ce que fait l'application
|
|
||||||
- Qui sont les utilisateurs cibles
|
|
||||||
- Les objectifs principaux
|
|
||||||
- Les fonctionnalités clés en quelques phrases
|
|
||||||
</overview>
|
|
||||||
|
|
||||||
<technology_stack>
|
|
||||||
<api_key>
|
|
||||||
Note: Vous pouvez utiliser une clé API située à /tmp/api-key pour les tests.
|
|
||||||
Vous ne pourrez pas lire ce fichier, mais vous pouvez le référencer dans le code.
|
|
||||||
</api_key>
|
|
||||||
<frontend>
|
|
||||||
<framework>React avec Vite</framework>
|
|
||||||
<styling>Tailwind CSS (via CDN)</styling>
|
|
||||||
<state_management>React hooks et context</state_management>
|
|
||||||
<routing>React Router pour la navigation</routing>
|
|
||||||
<port>Lancer uniquement sur le port {frontend_port}</port>
|
|
||||||
</frontend>
|
|
||||||
<backend>
|
|
||||||
<runtime>Node.js avec Express</runtime>
|
|
||||||
<database>SQLite avec better-sqlite3</database>
|
|
||||||
<api_integration>Intégration avec les APIs nécessaires</api_integration>
|
|
||||||
<streaming>Server-Sent Events pour les réponses en temps réel (si nécessaire)</streaming>
|
|
||||||
</backend>
|
|
||||||
<communication>
|
|
||||||
<api>Endpoints RESTful</api>
|
|
||||||
<streaming>SSE pour le streaming en temps réel (si nécessaire)</streaming>
|
|
||||||
</communication>
|
|
||||||
</technology_stack>
|
|
||||||
|
|
||||||
<prerequisites>
|
|
||||||
<environment_setup>
|
|
||||||
- Repository inclut .env avec les clés API configurées
|
|
||||||
- Dépendances frontend pré-installées via npm/pnpm
|
|
||||||
- Code backend dans le répertoire /server
|
|
||||||
- Installer les dépendances backend au besoin
|
|
||||||
</environment_setup>
|
|
||||||
</prerequisites>
|
|
||||||
|
|
||||||
<core_features>
|
|
||||||
<!--
|
|
||||||
IMPORTANT: Créez une balise <feature_X> pour chaque fonctionnalité.
|
|
||||||
L'agent initializer créera une issue Linear pour chaque feature.
|
|
||||||
Utilisez des numéros séquentiels: feature_1, feature_2, feature_3, etc.
|
|
||||||
-->
|
|
||||||
|
|
||||||
<feature_1>
|
|
||||||
<title>Nom de la fonctionnalité 1</title>
|
|
||||||
<description>
|
|
||||||
Description détaillée de ce que fait cette fonctionnalité.
|
|
||||||
Incluez les détails techniques importants, les cas d'usage, et les
|
|
||||||
interactions avec d'autres parties de l'application.
|
|
||||||
</description>
|
|
||||||
<priority>1</priority>
|
|
||||||
<category>frontend</category>
|
|
||||||
<test_steps>
|
|
||||||
1. Étape de test 1 - décrire l'action
|
|
||||||
2. Étape de test 2 - décrire l'action
|
|
||||||
3. Étape de test 3 - vérifier le résultat attendu
|
|
||||||
4. Étape de test 4 - tester les cas d'erreur
|
|
||||||
</test_steps>
|
|
||||||
</feature_1>
|
|
||||||
|
|
||||||
<feature_2>
|
|
||||||
<title>Nom de la fonctionnalité 2</title>
|
|
||||||
<description>
|
|
||||||
Description de la fonctionnalité 2...
|
|
||||||
</description>
|
|
||||||
<priority>2</priority>
|
|
||||||
<category>backend</category>
|
|
||||||
<test_steps>
|
|
||||||
1. Étape de test 1
|
|
||||||
2. Étape de test 2
|
|
||||||
</test_steps>
|
|
||||||
</feature_2>
|
|
||||||
|
|
||||||
<!--
|
|
||||||
Continuez à ajouter des features...
|
|
||||||
L'agent créera environ 50 issues au total, donc détaillez bien vos fonctionnalités
|
|
||||||
en les divisant en sous-fonctionnalités si nécessaire.
|
|
||||||
-->
|
|
||||||
</core_features>
|
|
||||||
|
|
||||||
<ui_design>
|
|
||||||
<!--
|
|
||||||
Optionnel: Décrivez le design UI si nécessaire
|
|
||||||
- Layout général
|
|
||||||
- Couleurs et thème
|
|
||||||
- Composants réutilisables
|
|
||||||
- Responsive design
|
|
||||||
-->
|
|
||||||
</ui_design>
|
|
||||||
|
|
||||||
<api_endpoints>
|
|
||||||
<!--
|
|
||||||
Optionnel: Liste des endpoints API si nécessaire
|
|
||||||
<endpoint>
|
|
||||||
<method>POST</method>
|
|
||||||
<path>/api/users</path>
|
|
||||||
<description>Créer un nouvel utilisateur</description>
|
|
||||||
<request_body>JSON avec email, password</request_body>
|
|
||||||
<response>JSON avec user_id, email</response>
|
|
||||||
</endpoint>
|
|
||||||
-->
|
|
||||||
</api_endpoints>
|
|
||||||
|
|
||||||
<database_schema>
|
|
||||||
<!--
|
|
||||||
Optionnel: Schéma de base de données si nécessaire
|
|
||||||
<table>
|
|
||||||
<name>users</name>
|
|
||||||
<columns>
|
|
||||||
<column>id INTEGER PRIMARY KEY</column>
|
|
||||||
<column>email TEXT UNIQUE</column>
|
|
||||||
<column>password_hash TEXT</column>
|
|
||||||
<column>created_at DATETIME</column>
|
|
||||||
</columns>
|
|
||||||
</table>
|
|
||||||
-->
|
|
||||||
</database_schema>
|
|
||||||
|
|
||||||
<deployment>
|
|
||||||
<!--
|
|
||||||
Optionnel: Instructions de déploiement si nécessaire
|
|
||||||
-->
|
|
||||||
</deployment>
|
|
||||||
</project_specification>
|
|
||||||
|
|
||||||
|
|
||||||
@@ -1,403 +0,0 @@
|
|||||||
<project_specification>
|
|
||||||
<project_name>Claude.ai Clone - Advanced Theme Customization</project_name>
|
|
||||||
|
|
||||||
<overview>
|
|
||||||
This specification adds advanced theme customization features to the Claude.ai clone application.
|
|
||||||
Users will be able to customize accent colors, font sizes, message spacing, and choose from
|
|
||||||
preset color themes. All changes are additive and backward-compatible with existing theme functionality.
|
|
||||||
|
|
||||||
The existing light/dark mode toggle remains unchanged and functional.
|
|
||||||
</overview>
|
|
||||||
|
|
||||||
<safety_requirements>
|
|
||||||
<critical>
|
|
||||||
- DO NOT modify existing light/dark mode functionality
|
|
||||||
- DO NOT break existing theme persistence
|
|
||||||
- DO NOT change existing CSS classes without ensuring backward compatibility
|
|
||||||
- All new theme options must be optional (defaults should match current behavior)
|
|
||||||
- Test thoroughly to ensure existing themes still work
|
|
||||||
- Maintain backward compatibility at all times
|
|
||||||
- New theme preferences should be stored separately from existing theme settings
|
|
||||||
</critical>
|
|
||||||
</safety_requirements>
|
|
||||||
|
|
||||||
<new_features>
|
|
||||||
<feature_6_theme_customization>
|
|
||||||
<title>Advanced Theme Customization</title>
|
|
||||||
<description>
|
|
||||||
Add advanced theme customization options. Users should be able to:
|
|
||||||
- Customize accent colors (beyond just light/dark mode)
|
|
||||||
- Choose from preset color themes (blue, green, purple, orange)
|
|
||||||
- Adjust font size globally (small, medium, large)
|
|
||||||
- Adjust message spacing (compact, comfortable, spacious)
|
|
||||||
- Preview theme changes before applying
|
|
||||||
- Save custom theme preferences
|
|
||||||
|
|
||||||
The customization interface should be intuitive and provide real-time preview
|
|
||||||
of changes before they are applied. All preferences should persist across sessions.
|
|
||||||
</description>
|
|
||||||
<priority>3</priority>
|
|
||||||
<category>style</category>
|
|
||||||
<implementation_approach>
|
|
||||||
- Create a new "Appearance" or "Theme" section in settings
|
|
||||||
- Add accent color picker with preset options (blue, green, purple, orange)
|
|
||||||
- Add font size slider/selector (small, medium, large)
|
|
||||||
- Add message spacing selector (compact, comfortable, spacious)
|
|
||||||
- Implement preview functionality that shows changes in real-time
|
|
||||||
- Store theme preferences in localStorage or backend (user preferences)
|
|
||||||
- Apply theme using CSS custom properties (CSS variables)
|
|
||||||
- Ensure theme works with both light and dark modes
|
|
||||||
</implementation_approach>
|
|
||||||
<test_steps>
|
|
||||||
1. Open settings menu
|
|
||||||
2. Navigate to "Appearance" or "Theme" section
|
|
||||||
3. Select a different accent color (e.g., green)
|
|
||||||
4. Verify accent color changes are visible in preview
|
|
||||||
5. Adjust font size slider to "large"
|
|
||||||
6. Verify font size changes in preview
|
|
||||||
7. Adjust message spacing option to "spacious"
|
|
||||||
8. Verify spacing changes in preview
|
|
||||||
9. Click "Preview" to see changes applied temporarily
|
|
||||||
10. Click "Apply" to save changes permanently
|
|
||||||
11. Verify changes persist after page refresh
|
|
||||||
12. Test with both light and dark mode
|
|
||||||
13. Test reset to default theme
|
|
||||||
14. Verify existing conversations display correctly with new theme
|
|
||||||
</test_steps>
|
|
||||||
</feature_6_theme_customization>
|
|
||||||
|
|
||||||
<feature_accent_colors>
|
|
||||||
<title>Accent Color Customization</title>
|
|
||||||
<description>
|
|
||||||
Allow users to customize the accent color used throughout the application.
|
|
||||||
This includes:
|
|
||||||
- Primary button colors
|
|
||||||
- Link colors
|
|
||||||
- Focus states
|
|
||||||
- Active states
|
|
||||||
- Selection highlights
|
|
||||||
- Progress indicators
|
|
||||||
|
|
||||||
Preset options:
|
|
||||||
- Blue (default, matches Claude.ai)
|
|
||||||
- Green
|
|
||||||
- Purple
|
|
||||||
- Orange
|
|
||||||
|
|
||||||
Users should be able to see a preview of each color before applying.
|
|
||||||
</description>
|
|
||||||
<priority>3</priority>
|
|
||||||
<category>style</category>
|
|
||||||
<implementation_approach>
|
|
||||||
- Define accent colors as CSS custom properties
|
|
||||||
- Create color palette for each preset (light and dark variants)
|
|
||||||
- Add color picker UI component in settings
|
|
||||||
- Update all accent color usages to use CSS variables
|
|
||||||
- Ensure colors have proper contrast ratios for accessibility
|
|
||||||
- Store selected accent color in user preferences
|
|
||||||
</implementation_approach>
|
|
||||||
<test_steps>
|
|
||||||
1. Open theme settings
|
|
||||||
2. Select "Green" accent color
|
|
||||||
3. Verify buttons, links, and highlights use green
|
|
||||||
4. Switch to dark mode and verify green accent still works
|
|
||||||
5. Test all preset colors (blue, green, purple, orange)
|
|
||||||
6. Verify color persists after refresh
|
|
||||||
7. Test accessibility (contrast ratios)
|
|
||||||
</test_steps>
|
|
||||||
</feature_accent_colors>
|
|
||||||
|
|
||||||
<feature_font_size>
|
|
||||||
<title>Global Font Size Adjustment</title>
|
|
||||||
<description>
|
|
||||||
Allow users to adjust the global font size for better readability.
|
|
||||||
Options:
|
|
||||||
- Small (12px base)
|
|
||||||
- Medium (14px base, default)
|
|
||||||
- Large (16px base)
|
|
||||||
|
|
||||||
Font size should scale proportionally across all text elements:
|
|
||||||
- Message text
|
|
||||||
- UI labels
|
|
||||||
- Input fields
|
|
||||||
- Buttons
|
|
||||||
- Sidebar text
|
|
||||||
</description>
|
|
||||||
<priority>3</priority>
|
|
||||||
<category>style</category>
|
|
||||||
<implementation_approach>
|
|
||||||
- Use CSS rem units for all font sizes
|
|
||||||
- Set base font size on root element
|
|
||||||
- Create font size presets (small, medium, large)
|
|
||||||
- Add font size selector in settings
|
|
||||||
- Store preference in user settings
|
|
||||||
- Ensure responsive design still works with different font sizes
|
|
||||||
</implementation_approach>
|
|
||||||
<test_steps>
|
|
||||||
1. Open theme settings
|
|
||||||
2. Select "Small" font size
|
|
||||||
3. Verify all text is smaller throughout the app
|
|
||||||
4. Select "Large" font size
|
|
||||||
5. Verify all text is larger throughout the app
|
|
||||||
6. Verify layout doesn't break with different font sizes
|
|
||||||
7. Test with long messages to ensure wrapping works
|
|
||||||
8. Verify preference persists after refresh
|
|
||||||
</test_steps>
|
|
||||||
</feature_font_size>
|
|
||||||
|
|
||||||
<feature_message_spacing>
|
|
||||||
<title>Message Spacing Customization</title>
|
|
||||||
<description>
|
|
||||||
Allow users to adjust the spacing between messages and within message bubbles.
|
|
||||||
Options:
|
|
||||||
- Compact: Minimal spacing (for users who prefer dense layouts)
|
|
||||||
- Comfortable: Default spacing (current behavior)
|
|
||||||
- Spacious: Increased spacing (for better readability)
|
|
||||||
|
|
||||||
This affects:
|
|
||||||
- Vertical spacing between messages
|
|
||||||
- Padding within message bubbles
|
|
||||||
- Spacing between message elements (avatar, text, timestamp)
|
|
||||||
</description>
|
|
||||||
<priority>3</priority>
|
|
||||||
<category>style</category>
|
|
||||||
<implementation_approach>
|
|
||||||
- Define spacing scale using CSS custom properties
|
|
||||||
- Create spacing presets (compact, comfortable, spacious)
|
|
||||||
- Apply spacing to message containers and bubbles
|
|
||||||
- Add spacing selector in settings
|
|
||||||
- Store preference in user settings
|
|
||||||
- Ensure spacing works well with different font sizes
|
|
||||||
</implementation_approach>
|
|
||||||
<test_steps>
|
|
||||||
1. Open theme settings
|
|
||||||
2. Select "Compact" spacing
|
|
||||||
3. Verify messages are closer together
|
|
||||||
4. Select "Spacious" spacing
|
|
||||||
5. Verify messages have more space between them
|
|
||||||
6. Test with long conversations to ensure scrolling works
|
|
||||||
7. Verify spacing preference persists after refresh
|
|
||||||
8. Test with different font sizes to ensure compatibility
|
|
||||||
</test_steps>
|
|
||||||
</feature_message_spacing>
|
|
||||||
|
|
||||||
<feature_theme_preview>
|
|
||||||
<title>Theme Preview Functionality</title>
|
|
||||||
<description>
|
|
||||||
Allow users to preview theme changes before applying them permanently.
|
|
||||||
The preview should:
|
|
||||||
- Show a sample conversation with the new theme applied
|
|
||||||
- Update in real-time as settings are changed
|
|
||||||
- Allow users to cancel and revert to previous theme
|
|
||||||
- Show both light and dark mode previews if applicable
|
|
||||||
|
|
||||||
Users should be able to:
|
|
||||||
- See preview immediately when changing settings
|
|
||||||
- Click "Apply" to save changes
|
|
||||||
- Click "Cancel" to discard changes
|
|
||||||
- Click "Reset" to return to default theme
|
|
||||||
</description>
|
|
||||||
<priority>3</priority>
|
|
||||||
<category>functional</category>
|
|
||||||
<implementation_approach>
|
|
||||||
- Create preview component showing sample conversation
|
|
||||||
- Apply theme changes temporarily to preview
|
|
||||||
- Store original theme state for cancel functionality
|
|
||||||
- Update preview in real-time as settings change
|
|
||||||
- Only persist changes when "Apply" is clicked
|
|
||||||
- Show clear visual feedback for preview vs. applied state
|
|
||||||
</implementation_approach>
|
|
||||||
<test_steps>
|
|
||||||
1. Open theme settings
|
|
||||||
2. Change accent color to green
|
|
||||||
3. Verify preview updates immediately
|
|
||||||
4. Change font size to large
|
|
||||||
5. Verify preview updates with new font size
|
|
||||||
6. Click "Cancel" and verify changes are reverted
|
|
||||||
7. Make changes again and click "Apply"
|
|
||||||
8. Verify changes are saved and applied to actual interface
|
|
||||||
9. Test preview with both light and dark mode
|
|
||||||
</test_steps>
|
|
||||||
</feature_theme_preview>
|
|
||||||
</new_features>
|
|
||||||
|
|
||||||
<implementation_notes>
|
|
||||||
<code_structure>
|
|
||||||
frontend/
|
|
||||||
components/
|
|
||||||
ThemeSettings.jsx # New theme customization UI (NEW)
|
|
||||||
ThemePreview.jsx # Preview component (NEW)
|
|
||||||
styles/
|
|
||||||
theme-variables.css # CSS custom properties for themes (NEW)
|
|
||||||
accent-colors.css # Accent color definitions (NEW)
|
|
||||||
hooks/
|
|
||||||
useTheme.js # Updated to handle new theme options
|
|
||||||
utils/
|
|
||||||
themeStorage.js # Theme preference persistence (NEW)
|
|
||||||
</code_structure>
|
|
||||||
|
|
||||||
<css_architecture>
|
|
||||||
Use CSS custom properties (CSS variables) for all theme values:
|
|
||||||
- --accent-color-primary
|
|
||||||
- --accent-color-hover
|
|
||||||
- --font-size-base
|
|
||||||
- --message-spacing-vertical
|
|
||||||
- --message-padding
|
|
||||||
|
|
||||||
This allows easy theme switching without JavaScript manipulation.
|
|
||||||
</css_architecture>
|
|
||||||
|
|
||||||
<storage_approach>
|
|
||||||
Store theme preferences in:
|
|
||||||
- localStorage for client-side persistence
|
|
||||||
- Or backend user preferences table if available
|
|
||||||
|
|
||||||
Structure:
|
|
||||||
{
|
|
||||||
accentColor: 'blue' | 'green' | 'purple' | 'orange',
|
|
||||||
fontSize: 'small' | 'medium' | 'large',
|
|
||||||
messageSpacing: 'compact' | 'comfortable' | 'spacious',
|
|
||||||
theme: 'light' | 'dark' (existing)
|
|
||||||
}
|
|
||||||
</storage_approach>
|
|
||||||
|
|
||||||
<safety_guidelines>
|
|
||||||
- Keep existing theme functionality intact
|
|
||||||
- Default values should match current behavior
|
|
||||||
- Use feature detection for new theme features
|
|
||||||
- Gracefully degrade if CSS custom properties not supported
|
|
||||||
- Test with existing conversations and UI elements
|
|
||||||
- Ensure accessibility standards are maintained
|
|
||||||
</safety_guidelines>
|
|
||||||
</implementation_notes>
|
|
||||||
|
|
||||||
<ui_components>
|
|
||||||
<theme_settings_panel>
|
|
||||||
<description>Settings panel for theme customization</description>
|
|
||||||
<sections>
|
|
||||||
- Accent Color: Radio buttons or color swatches for preset colors
|
|
||||||
- Font Size: Slider or dropdown (small, medium, large)
|
|
||||||
- Message Spacing: Radio buttons (compact, comfortable, spacious)
|
|
||||||
- Preview: Live preview of theme changes
|
|
||||||
- Actions: Apply, Cancel, Reset buttons
|
|
||||||
</sections>
|
|
||||||
</theme_settings_panel>
|
|
||||||
|
|
||||||
<theme_preview>
|
|
||||||
<description>Preview component showing sample conversation</description>
|
|
||||||
<elements>
|
|
||||||
- Sample user message
|
|
||||||
- Sample AI response
|
|
||||||
- Shows current accent color
|
|
||||||
- Shows current font size
|
|
||||||
- Shows current spacing
|
|
||||||
- Updates in real-time
|
|
||||||
</elements>
|
|
||||||
</theme_preview>
|
|
||||||
</ui_components>
|
|
||||||
|
|
||||||
<css_custom_properties>
|
|
||||||
<accent_colors>
|
|
||||||
Define CSS variables for each accent color preset:
|
|
||||||
--accent-blue: #2563eb;
|
|
||||||
--accent-green: #10b981;
|
|
||||||
--accent-purple: #8b5cf6;
|
|
||||||
--accent-orange: #f59e0b;
|
|
||||||
|
|
||||||
Each should have hover, active, and focus variants.
|
|
||||||
</accent_colors>
|
|
||||||
|
|
||||||
<font_sizes>
|
|
||||||
Define base font sizes:
|
|
||||||
--font-size-small: 0.75rem; (12px)
|
|
||||||
--font-size-medium: 0.875rem; (14px, default)
|
|
||||||
--font-size-large: 1rem; (16px)
|
|
||||||
</font_sizes>
|
|
||||||
|
|
||||||
<spacing>
|
|
||||||
Define spacing scales:
|
|
||||||
--spacing-compact: 0.5rem;
|
|
||||||
--spacing-comfortable: 1rem; (default)
|
|
||||||
--spacing-spacious: 1.5rem;
|
|
||||||
</spacing>
|
|
||||||
</css_custom_properties>
|
|
||||||
|
|
||||||
<api_endpoints>
|
|
||||||
<if_backend_storage>
|
|
||||||
If storing preferences in backend:
|
|
||||||
- GET /api/user/preferences - Get user theme preferences
|
|
||||||
- PUT /api/user/preferences - Update user theme preferences
|
|
||||||
- GET /api/user/preferences/theme - Get theme preferences only
|
|
||||||
</if_backend_storage>
|
|
||||||
|
|
||||||
<note>
|
|
||||||
If using localStorage only, no API endpoints needed.
|
|
||||||
Backend storage is optional but recommended for multi-device sync.
|
|
||||||
</note>
|
|
||||||
</api_endpoints>
|
|
||||||
|
|
||||||
<accessibility_requirements>
|
|
||||||
- All accent colors must meet WCAG AA contrast ratios (4.5:1 for text)
|
|
||||||
- Font size changes must not break screen reader compatibility
|
|
||||||
- Theme settings must be keyboard navigable
|
|
||||||
- Color choices should not be the only way to convey information
|
|
||||||
- Provide high contrast mode option if possible
|
|
||||||
</accessibility_requirements>
|
|
||||||
|
|
||||||
<testing_requirements>
|
|
||||||
<regression_tests>
|
|
||||||
- Verify existing light/dark mode toggle still works
|
|
||||||
- Verify existing theme persistence still works
|
|
||||||
- Test that default theme matches current behavior
|
|
||||||
- Verify existing conversations display correctly
|
|
||||||
- Test that all UI elements are styled correctly
|
|
||||||
</regression_tests>
|
|
||||||
|
|
||||||
<feature_tests>
|
|
||||||
- Test each accent color preset
|
|
||||||
- Test each font size option
|
|
||||||
- Test each spacing option
|
|
||||||
- Test theme preview functionality
|
|
||||||
- Test theme persistence (localStorage/backend)
|
|
||||||
- Test theme reset to defaults
|
|
||||||
- Test theme with both light and dark modes
|
|
||||||
- Test theme changes in real-time
|
|
||||||
</feature_tests>
|
|
||||||
|
|
||||||
<compatibility_tests>
|
|
||||||
- Test with different browsers (Chrome, Firefox, Safari, Edge)
|
|
||||||
- Test with different screen sizes (responsive design)
|
|
||||||
- Test with long conversations
|
|
||||||
- Test with different message types (text, code, artifacts)
|
|
||||||
- Test accessibility with screen readers
|
|
||||||
</compatibility_tests>
|
|
||||||
</testing_requirements>
|
|
||||||
|
|
||||||
<success_criteria>
|
|
||||||
<functionality>
|
|
||||||
- Users can customize accent colors from preset options
|
|
||||||
- Users can adjust global font size (small, medium, large)
|
|
||||||
- Users can adjust message spacing (compact, comfortable, spacious)
|
|
||||||
- Theme preview shows changes in real-time
|
|
||||||
- Theme preferences persist across sessions
|
|
||||||
- Existing light/dark mode functionality works unchanged
|
|
||||||
- All theme options work together harmoniously
|
|
||||||
</functionality>
|
|
||||||
|
|
||||||
<user_experience>
|
|
||||||
- Theme customization is intuitive and easy to use
|
|
||||||
- Preview provides clear feedback before applying changes
|
|
||||||
- Changes apply smoothly without flickering
|
|
||||||
- Settings are easy to find and access
|
|
||||||
- Reset to defaults is easily accessible
|
|
||||||
</user_experience>
|
|
||||||
|
|
||||||
<technical>
|
|
||||||
- Code is well-organized and maintainable
|
|
||||||
- CSS custom properties are used consistently
|
|
||||||
- Theme preferences are stored reliably
|
|
||||||
- No performance degradation with theme changes
|
|
||||||
- Backward compatibility is maintained
|
|
||||||
</technical>
|
|
||||||
</success_criteria>
|
|
||||||
</project_specification>
|
|
||||||
679
prompts/app_spec_types_docs.backup.txt
Normal file
679
prompts/app_spec_types_docs.backup.txt
Normal file
@@ -0,0 +1,679 @@
|
|||||||
|
<project_specification>
|
||||||
|
<project_name>Library RAG - Type Safety & Documentation Enhancement</project_name>
|
||||||
|
|
||||||
|
<overview>
|
||||||
|
Enhance the Library RAG application (philosophical texts indexing and semantic search) by adding
|
||||||
|
strict type annotations and comprehensive Google-style docstrings to all Python modules. This will
|
||||||
|
improve code maintainability, enable static type checking with mypy, and provide clear documentation
|
||||||
|
for all functions, classes, and modules.
|
||||||
|
|
||||||
|
The application is a RAG pipeline that processes PDF documents through OCR, LLM-based extraction,
|
||||||
|
semantic chunking, and ingestion into Weaviate vector database. It includes a Flask web interface
|
||||||
|
for document upload, processing, and semantic search.
|
||||||
|
</overview>
|
||||||
|
|
||||||
|
<technology_stack>
|
||||||
|
<backend>
|
||||||
|
<runtime>Python 3.10+</runtime>
|
||||||
|
<web_framework>Flask 3.0</web_framework>
|
||||||
|
<vector_database>Weaviate 1.34.4 with text2vec-transformers</vector_database>
|
||||||
|
<ocr>Mistral OCR API</ocr>
|
||||||
|
<llm>Ollama (local) or Mistral API</llm>
|
||||||
|
<type_checking>mypy with strict configuration</type_checking>
|
||||||
|
</backend>
|
||||||
|
<infrastructure>
|
||||||
|
<containerization>Docker Compose (Weaviate + transformers)</containerization>
|
||||||
|
<dependencies>weaviate-client, flask, mistralai, python-dotenv</dependencies>
|
||||||
|
</infrastructure>
|
||||||
|
</technology_stack>
|
||||||
|
|
||||||
|
<current_state>
|
||||||
|
<project_structure>
|
||||||
|
- flask_app.py: Main Flask application (640 lines)
|
||||||
|
- schema.py: Weaviate schema definition (383 lines)
|
||||||
|
- utils/: 16+ modules for PDF processing pipeline
|
||||||
|
- pdf_pipeline.py: Main orchestration (879 lines)
|
||||||
|
- mistral_client.py: OCR API client
|
||||||
|
- ocr_processor.py: OCR processing
|
||||||
|
- markdown_builder.py: Markdown generation
|
||||||
|
- llm_metadata.py: Metadata extraction via LLM
|
||||||
|
- llm_toc.py: Table of contents extraction
|
||||||
|
- llm_classifier.py: Section classification
|
||||||
|
- llm_chunker.py: Semantic chunking
|
||||||
|
- llm_cleaner.py: Chunk cleaning
|
||||||
|
- llm_validator.py: Document validation
|
||||||
|
- weaviate_ingest.py: Database ingestion
|
||||||
|
- hierarchy_parser.py: Document hierarchy parsing
|
||||||
|
- image_extractor.py: Image extraction from PDFs
|
||||||
|
- toc_extractor*.py: Various TOC extraction methods
|
||||||
|
- templates/: Jinja2 templates for Flask UI
|
||||||
|
- tests/utils2/: Minimal test coverage (3 test files)
|
||||||
|
</project_structure>
|
||||||
|
|
||||||
|
<issues>
|
||||||
|
- Inconsistent type annotations across modules (some have partial types, many have none)
|
||||||
|
- Missing or incomplete docstrings (no Google-style format)
|
||||||
|
- No mypy configuration for strict type checking
|
||||||
|
- Type hints missing on function parameters and return values
|
||||||
|
- Dict[str, Any] used extensively without proper typing
|
||||||
|
- No type stubs for complex nested structures
|
||||||
|
</issues>
|
||||||
|
</current_state>
|
||||||
|
|
||||||
|
<core_features>
|
||||||
|
<type_annotations>
|
||||||
|
<strict_typing>
|
||||||
|
- Add complete type annotations to ALL functions and methods
|
||||||
|
- Use proper generic types (List, Dict, Optional, Union) from typing module
|
||||||
|
- Add TypedDict for complex dictionary structures
|
||||||
|
- Add Protocol types for duck-typed interfaces
|
||||||
|
- Use Literal types for string constants
|
||||||
|
- Add ParamSpec and TypeVar where appropriate
|
||||||
|
- Type all class attributes and instance variables
|
||||||
|
- Add type annotations to lambda functions where possible
|
||||||
|
</strict_typing>
|
||||||
|
|
||||||
|
<mypy_configuration>
|
||||||
|
- Create mypy.ini with strict configuration
|
||||||
|
- Enable: check_untyped_defs, disallow_untyped_defs, disallow_incomplete_defs
|
||||||
|
- Enable: disallow_untyped_calls, disallow_untyped_decorators
|
||||||
|
- Enable: warn_return_any, warn_redundant_casts
|
||||||
|
- Enable: strict_equality, strict_optional
|
||||||
|
- Set python_version to 3.10
|
||||||
|
- Configure per-module overrides if needed for gradual migration
|
||||||
|
</mypy_configuration>
|
||||||
|
|
||||||
|
<type_stubs>
|
||||||
|
- Create TypedDict definitions for common data structures:
|
||||||
|
- OCR response structures
|
||||||
|
- Metadata dictionaries
|
||||||
|
- TOC entries
|
||||||
|
- Chunk objects
|
||||||
|
- Weaviate objects
|
||||||
|
- Pipeline results
|
||||||
|
- Add NewType for semantic type safety (DocumentName, ChunkId, etc.)
|
||||||
|
- Create Protocol types for callback functions
|
||||||
|
</type_stubs>
|
||||||
|
|
||||||
|
<specific_improvements>
|
||||||
|
- pdf_pipeline.py: Type all 10 pipeline steps, callbacks, result dictionaries
|
||||||
|
- flask_app.py: Type all route handlers, request/response types
|
||||||
|
- schema.py: Type Weaviate configuration objects
|
||||||
|
- llm_*.py: Type LLM request/response structures
|
||||||
|
- mistral_client.py: Type API client methods and responses
|
||||||
|
- weaviate_ingest.py: Type ingestion functions and batch operations
|
||||||
|
</specific_improvements>
|
||||||
|
</type_annotations>
|
||||||
|
|
||||||
|
<documentation>
|
||||||
|
<google_style_docstrings>
|
||||||
|
- Add comprehensive Google-style docstrings to ALL:
|
||||||
|
- Module-level docstrings explaining purpose and usage
|
||||||
|
- Class docstrings with Attributes section
|
||||||
|
- Function/method docstrings with Args, Returns, Raises sections
|
||||||
|
- Complex algorithm explanations with Examples section
|
||||||
|
- Include code examples for public APIs
|
||||||
|
- Document all exceptions that can be raised
|
||||||
|
- Add Notes section for important implementation details
|
||||||
|
- Add See Also section for related functions
|
||||||
|
</google_style_docstrings>
|
||||||
|
|
||||||
|
<module_documentation>
|
||||||
|
<utils_modules>
|
||||||
|
- pdf_pipeline.py: Document the 10-step pipeline, each step's purpose
|
||||||
|
- mistral_client.py: Document OCR API usage, cost calculation
|
||||||
|
- llm_metadata.py: Document metadata extraction logic
|
||||||
|
- llm_toc.py: Document TOC extraction strategies
|
||||||
|
- llm_classifier.py: Document section classification types
|
||||||
|
- llm_chunker.py: Document semantic vs basic chunking
|
||||||
|
- llm_cleaner.py: Document cleaning rules and validation
|
||||||
|
- llm_validator.py: Document validation criteria
|
||||||
|
- weaviate_ingest.py: Document ingestion process, nested objects
|
||||||
|
- hierarchy_parser.py: Document hierarchy building algorithm
|
||||||
|
</utils_modules>
|
||||||
|
|
||||||
|
<flask_app>
|
||||||
|
- Document all routes with request/response examples
|
||||||
|
- Document SSE (Server-Sent Events) implementation
|
||||||
|
- Document Weaviate query patterns
|
||||||
|
- Document upload processing workflow
|
||||||
|
- Document background job management
|
||||||
|
</flask_app>
|
||||||
|
|
||||||
|
<schema>
|
||||||
|
- Document Weaviate schema design decisions
|
||||||
|
- Document each collection's purpose and relationships
|
||||||
|
- Document nested object structure
|
||||||
|
- Document vectorization strategy
|
||||||
|
</schema>
|
||||||
|
</module_documentation>
|
||||||
|
|
||||||
|
<inline_comments>
|
||||||
|
- Add inline comments for complex logic only (don't over-comment)
|
||||||
|
- Explain WHY not WHAT (code should be self-documenting)
|
||||||
|
- Document performance considerations
|
||||||
|
- Document cost implications (OCR, LLM API calls)
|
||||||
|
- Document error handling strategies
|
||||||
|
</inline_comments>
|
||||||
|
</documentation>
|
||||||
|
|
||||||
|
<validation>
|
||||||
|
<type_checking>
|
||||||
|
- All modules must pass mypy --strict
|
||||||
|
- No # type: ignore comments without justification
|
||||||
|
- CI/CD should run mypy checks
|
||||||
|
- Type coverage should be 100%
|
||||||
|
</type_checking>
|
||||||
|
|
||||||
|
<documentation_quality>
|
||||||
|
- All public functions must have docstrings
|
||||||
|
- All docstrings must follow Google style
|
||||||
|
- Examples should be executable and tested
|
||||||
|
- Documentation should be clear and concise
|
||||||
|
</documentation_quality>
|
||||||
|
</validation>
|
||||||
|
</core_features>
|
||||||
|
|
||||||
|
<implementation_priority>
|
||||||
|
<critical_modules>
|
||||||
|
Priority 1 (Most used, most complex):
|
||||||
|
1. utils/pdf_pipeline.py - Main orchestration
|
||||||
|
2. flask_app.py - Web application entry point
|
||||||
|
3. utils/weaviate_ingest.py - Database operations
|
||||||
|
4. schema.py - Schema definition
|
||||||
|
|
||||||
|
Priority 2 (Core LLM modules):
|
||||||
|
5. utils/llm_metadata.py
|
||||||
|
6. utils/llm_toc.py
|
||||||
|
7. utils/llm_classifier.py
|
||||||
|
8. utils/llm_chunker.py
|
||||||
|
9. utils/llm_cleaner.py
|
||||||
|
10. utils/llm_validator.py
|
||||||
|
|
||||||
|
Priority 3 (OCR and parsing):
|
||||||
|
11. utils/mistral_client.py
|
||||||
|
12. utils/ocr_processor.py
|
||||||
|
13. utils/markdown_builder.py
|
||||||
|
14. utils/hierarchy_parser.py
|
||||||
|
15. utils/image_extractor.py
|
||||||
|
|
||||||
|
Priority 4 (Supporting modules):
|
||||||
|
16. utils/toc_extractor.py
|
||||||
|
17. utils/toc_extractor_markdown.py
|
||||||
|
18. utils/toc_extractor_visual.py
|
||||||
|
19. utils/llm_structurer.py (legacy)
|
||||||
|
</critical_modules>
|
||||||
|
</implementation_priority>
|
||||||
|
|
||||||
|
<implementation_steps>
|
||||||
|
<feature_1>
|
||||||
|
<title>Setup Type Checking Infrastructure</title>
|
||||||
|
<description>
|
||||||
|
Configure mypy with strict settings and create foundational type definitions
|
||||||
|
</description>
|
||||||
|
<tasks>
|
||||||
|
- Create mypy.ini configuration file with strict settings
|
||||||
|
- Add mypy to requirements.txt or dev dependencies
|
||||||
|
- Create utils/types.py module for common TypedDict definitions
|
||||||
|
- Define core types: OCRResponse, Metadata, TOCEntry, ChunkData, PipelineResult
|
||||||
|
- Add NewType definitions for semantic types: DocumentName, ChunkId, SectionPath
|
||||||
|
- Create Protocol types for callbacks (ProgressCallback, etc.)
|
||||||
|
- Document type definitions in utils/types.py module docstring
|
||||||
|
- Test mypy configuration on a single module to verify settings
|
||||||
|
</tasks>
|
||||||
|
<acceptance_criteria>
|
||||||
|
- mypy.ini exists with strict configuration
|
||||||
|
- utils/types.py contains all foundational types with docstrings
|
||||||
|
- mypy runs without errors on utils/types.py
|
||||||
|
- Type definitions are comprehensive and reusable
|
||||||
|
</acceptance_criteria>
|
||||||
|
</feature_1>
|
||||||
|
|
||||||
|
<feature_2>
|
||||||
|
<title>Add Types to PDF Pipeline Orchestration</title>
|
||||||
|
<description>
|
||||||
|
Add complete type annotations to pdf_pipeline.py (879 lines, most complex module)
|
||||||
|
</description>
|
||||||
|
<tasks>
|
||||||
|
- Add type annotations to all function signatures in pdf_pipeline.py
|
||||||
|
- Type the 10-step pipeline: OCR, Markdown, Metadata, TOC, Classify, Chunk, Clean, Validate, Weaviate
|
||||||
|
- Type progress_callback parameter with Protocol or Callable
|
||||||
|
- Add TypedDict for pipeline options dictionary
|
||||||
|
- Add TypedDict for pipeline result dictionary structure
|
||||||
|
- Type all helper functions (extract_document_metadata_legacy, etc.)
|
||||||
|
- Add proper return types for process_pdf_v2, process_pdf, process_pdf_bytes
|
||||||
|
- Fix any mypy errors that arise
|
||||||
|
- Verify mypy --strict passes on pdf_pipeline.py
|
||||||
|
</tasks>
|
||||||
|
<acceptance_criteria>
|
||||||
|
- All functions in pdf_pipeline.py have complete type annotations
|
||||||
|
- progress_callback is properly typed with Protocol
|
||||||
|
- All Dict[str, Any] replaced with TypedDict where appropriate
|
||||||
|
- mypy --strict pdf_pipeline.py passes with zero errors
|
||||||
|
- No # type: ignore comments (or justified if absolutely necessary)
|
||||||
|
</acceptance_criteria>
|
||||||
|
</feature_2>
|
||||||
|
|
||||||
|
<feature_3>
|
||||||
|
<title>Add Types to Flask Application</title>
|
||||||
|
<description>
|
||||||
|
Add complete type annotations to flask_app.py and type all routes
|
||||||
|
</description>
|
||||||
|
<tasks>
|
||||||
|
- Add type annotations to all Flask route handlers
|
||||||
|
- Type request.args, request.form, request.files usage
|
||||||
|
- Type jsonify() return values
|
||||||
|
- Type get_weaviate_client context manager
|
||||||
|
- Type get_collection_stats, get_all_chunks, search_chunks functions
|
||||||
|
- Add TypedDict for Weaviate query results
|
||||||
|
- Type background job processing functions (run_processing_job)
|
||||||
|
- Type SSE generator function (upload_progress)
|
||||||
|
- Add type hints for template rendering
|
||||||
|
- Verify mypy --strict passes on flask_app.py
|
||||||
|
</tasks>
|
||||||
|
<acceptance_criteria>
|
||||||
|
- All Flask routes have complete type annotations
|
||||||
|
- Request/response types are clear and documented
|
||||||
|
- Weaviate query functions are properly typed
|
||||||
|
- SSE generator is correctly typed
|
||||||
|
- mypy --strict flask_app.py passes with zero errors
|
||||||
|
</acceptance_criteria>
|
||||||
|
</feature_3>
|
||||||
|
|
||||||
|
<feature_4>
|
||||||
|
<title>Add Types to Core LLM Modules</title>
|
||||||
|
<description>
|
||||||
|
Add complete type annotations to all LLM processing modules (metadata, TOC, classifier, chunker, cleaner, validator)
|
||||||
|
</description>
|
||||||
|
<tasks>
|
||||||
|
- llm_metadata.py: Type extract_metadata function, return structure
|
||||||
|
- llm_toc.py: Type extract_toc function, TOC hierarchy structure
|
||||||
|
- llm_classifier.py: Type classify_sections, section types (Literal), validation functions
|
||||||
|
- llm_chunker.py: Type chunk_section_with_llm, chunk objects
|
||||||
|
- llm_cleaner.py: Type clean_chunk, is_chunk_valid functions
|
||||||
|
- llm_validator.py: Type validate_document, validation result structure
|
||||||
|
- Add TypedDict for LLM request/response structures
|
||||||
|
- Type provider selection ("ollama" | "mistral" as Literal)
|
||||||
|
- Type model names with Literal or constants
|
||||||
|
- Verify mypy --strict passes on all llm_*.py modules
|
||||||
|
</tasks>
|
||||||
|
<acceptance_criteria>
|
||||||
|
- All LLM modules have complete type annotations
|
||||||
|
- Section types use Literal for type safety
|
||||||
|
- Provider and model parameters are strongly typed
|
||||||
|
- LLM request/response structures use TypedDict
|
||||||
|
- mypy --strict passes on all llm_*.py modules with zero errors
|
||||||
|
</acceptance_criteria>
|
||||||
|
</feature_4>
|
||||||
|
|
||||||
|
<feature_5>
|
||||||
|
<title>Add Types to Weaviate and Database Modules</title>
|
||||||
|
<description>
|
||||||
|
Add complete type annotations to schema.py and weaviate_ingest.py
|
||||||
|
</description>
|
||||||
|
<tasks>
|
||||||
|
- schema.py: Type Weaviate configuration objects
|
||||||
|
- schema.py: Type collection property definitions
|
||||||
|
- weaviate_ingest.py: Type ingest_document function signature
|
||||||
|
- weaviate_ingest.py: Type delete_document_chunks function
|
||||||
|
- weaviate_ingest.py: Add TypedDict for Weaviate object structure
|
||||||
|
- Type batch insertion operations
|
||||||
|
- Type nested object references (work, document)
|
||||||
|
- Add proper error types for Weaviate exceptions
|
||||||
|
- Verify mypy --strict passes on both modules
|
||||||
|
</tasks>
|
||||||
|
<acceptance_criteria>
|
||||||
|
- schema.py has complete type annotations for Weaviate config
|
||||||
|
- weaviate_ingest.py functions are fully typed
|
||||||
|
- Nested object structures use TypedDict
|
||||||
|
- Weaviate client operations are properly typed
|
||||||
|
- mypy --strict passes on both modules with zero errors
|
||||||
|
</acceptance_criteria>
|
||||||
|
</feature_5>
|
||||||
|
|
||||||
|
<feature_6>
|
||||||
|
<title>Add Types to OCR and Parsing Modules</title>
|
||||||
|
<description>
|
||||||
|
Add complete type annotations to mistral_client.py, ocr_processor.py, markdown_builder.py, hierarchy_parser.py
|
||||||
|
</description>
|
||||||
|
<tasks>
|
||||||
|
- mistral_client.py: Type create_client, run_ocr, estimate_ocr_cost
|
||||||
|
- mistral_client.py: Add TypedDict for Mistral API response structures
|
||||||
|
- ocr_processor.py: Type serialize_ocr_response, OCR object structures
|
||||||
|
- markdown_builder.py: Type build_markdown, image_writer parameter
|
||||||
|
- hierarchy_parser.py: Type build_hierarchy, flatten_hierarchy functions
|
||||||
|
- hierarchy_parser.py: Add TypedDict for hierarchy node structure
|
||||||
|
- image_extractor.py: Type create_image_writer, image handling
|
||||||
|
- Verify mypy --strict passes on all modules
|
||||||
|
</tasks>
|
||||||
|
<acceptance_criteria>
|
||||||
|
- All OCR/parsing modules have complete type annotations
|
||||||
|
- Mistral API structures use TypedDict
|
||||||
|
- Hierarchy nodes are properly typed
|
||||||
|
- Image handling functions are typed
|
||||||
|
- mypy --strict passes on all modules with zero errors
|
||||||
|
</acceptance_criteria>
|
||||||
|
</feature_6>
|
||||||
|
|
||||||
|
<feature_7>
|
||||||
|
<title>Add Google-Style Docstrings to Core Modules</title>
|
||||||
|
<description>
|
||||||
|
Add comprehensive Google-style docstrings to pdf_pipeline.py, flask_app.py, and weaviate modules
|
||||||
|
</description>
|
||||||
|
<tasks>
|
||||||
|
- pdf_pipeline.py: Add module docstring explaining the V2 pipeline
|
||||||
|
- pdf_pipeline.py: Add docstrings to process_pdf_v2 with Args, Returns, Raises sections
|
||||||
|
- pdf_pipeline.py: Document each of the 10 pipeline steps in comments
|
||||||
|
- pdf_pipeline.py: Add Examples section showing typical usage
|
||||||
|
- flask_app.py: Add module docstring explaining Flask application
|
||||||
|
- flask_app.py: Document all routes with request/response examples
|
||||||
|
- flask_app.py: Document Weaviate connection management
|
||||||
|
- schema.py: Add module docstring explaining schema design
|
||||||
|
- schema.py: Document each collection's purpose and relationships
|
||||||
|
- weaviate_ingest.py: Document ingestion process with examples
|
||||||
|
- All docstrings must follow Google style format exactly
|
||||||
|
</tasks>
|
||||||
|
<acceptance_criteria>
|
||||||
|
- All core modules have comprehensive module-level docstrings
|
||||||
|
- All public functions have Google-style docstrings
|
||||||
|
- Args, Returns, Raises sections are complete and accurate
|
||||||
|
- Examples are provided for complex functions
|
||||||
|
- Docstrings explain WHY, not just WHAT
|
||||||
|
</acceptance_criteria>
|
||||||
|
</feature_7>
|
||||||
|
|
||||||
|
<feature_8>
|
||||||
|
<title>Add Google-Style Docstrings to LLM Modules</title>
|
||||||
|
<description>
|
||||||
|
Add comprehensive Google-style docstrings to all LLM processing modules
|
||||||
|
</description>
|
||||||
|
<tasks>
|
||||||
|
- llm_metadata.py: Document metadata extraction logic with examples
|
||||||
|
- llm_toc.py: Document TOC extraction strategies and fallbacks
|
||||||
|
- llm_classifier.py: Document section types and classification criteria
|
||||||
|
- llm_chunker.py: Document semantic vs basic chunking approaches
|
||||||
|
- llm_cleaner.py: Document cleaning rules and validation logic
|
||||||
|
- llm_validator.py: Document validation criteria and corrections
|
||||||
|
- Add Examples sections showing input/output for each function
|
||||||
|
- Document LLM provider differences (Ollama vs Mistral)
|
||||||
|
- Document cost implications in Notes sections
|
||||||
|
- All docstrings must follow Google style format exactly
|
||||||
|
</tasks>
|
||||||
|
<acceptance_criteria>
|
||||||
|
- All LLM modules have comprehensive docstrings
|
||||||
|
- Each function has Args, Returns, Raises sections
|
||||||
|
- Examples show realistic input/output
|
||||||
|
- Provider differences are documented
|
||||||
|
- Cost implications are noted where relevant
|
||||||
|
</acceptance_criteria>
|
||||||
|
</feature_8>
|
||||||
|
|
||||||
|
<feature_9>
|
||||||
|
<title>Add Google-Style Docstrings to OCR and Parsing Modules</title>
|
||||||
|
<description>
|
||||||
|
Add comprehensive Google-style docstrings to OCR, markdown, hierarchy, and extraction modules
|
||||||
|
</description>
|
||||||
|
<tasks>
|
||||||
|
- mistral_client.py: Document OCR API usage, cost calculation
|
||||||
|
- ocr_processor.py: Document OCR response processing
|
||||||
|
- markdown_builder.py: Document markdown generation strategy
|
||||||
|
- hierarchy_parser.py: Document hierarchy building algorithm
|
||||||
|
- image_extractor.py: Document image extraction process
|
||||||
|
- toc_extractor*.py: Document various TOC extraction methods
|
||||||
|
- Add Examples sections for complex algorithms
|
||||||
|
- Document edge cases and error handling
|
||||||
|
- All docstrings must follow Google style format exactly
|
||||||
|
</tasks>
|
||||||
|
<acceptance_criteria>
|
||||||
|
- All OCR/parsing modules have comprehensive docstrings
|
||||||
|
- Complex algorithms are well explained
|
||||||
|
- Edge cases are documented
|
||||||
|
- Error handling is documented
|
||||||
|
- Examples demonstrate typical usage
|
||||||
|
</acceptance_criteria>
|
||||||
|
</feature_9>
|
||||||
|
|
||||||
|
<feature_10>
|
||||||
|
<title>Final Validation and CI Integration</title>
|
||||||
|
<description>
|
||||||
|
Verify all type annotations and docstrings, integrate mypy into CI/CD
|
||||||
|
</description>
|
||||||
|
<tasks>
|
||||||
|
- Run mypy --strict on entire codebase, verify 100% pass rate
|
||||||
|
- Verify all public functions have docstrings
|
||||||
|
- Check docstring formatting with pydocstyle or similar tool
|
||||||
|
- Create GitHub Actions workflow to run mypy on every commit
|
||||||
|
- Update README.md with type checking instructions
|
||||||
|
- Update CLAUDE.md with documentation standards
|
||||||
|
- Create CONTRIBUTING.md with type annotation and docstring guidelines
|
||||||
|
- Generate API documentation with Sphinx or pdoc
|
||||||
|
- Fix any remaining mypy errors or missing docstrings
|
||||||
|
</tasks>
|
||||||
|
<acceptance_criteria>
|
||||||
|
- mypy --strict passes on entire codebase with zero errors
|
||||||
|
- All public functions have Google-style docstrings
|
||||||
|
- CI/CD runs mypy checks automatically
|
||||||
|
- Documentation is generated and accessible
|
||||||
|
- Contributing guidelines document type/docstring requirements
|
||||||
|
</acceptance_criteria>
|
||||||
|
</feature_10>
|
||||||
|
</implementation_steps>
|
||||||
|
|
||||||
|
<success_criteria>
|
||||||
|
<type_safety>
|
||||||
|
- 100% type coverage across all modules
|
||||||
|
- mypy --strict passes with zero errors
|
||||||
|
- No # type: ignore comments without justification
|
||||||
|
- All Dict[str, Any] replaced with TypedDict where appropriate
|
||||||
|
- Proper use of generics, protocols, and type variables
|
||||||
|
- NewType used for semantic type safety
|
||||||
|
</type_safety>
|
||||||
|
|
||||||
|
<documentation_quality>
|
||||||
|
- All modules have comprehensive module-level docstrings
|
||||||
|
- All public functions/classes have Google-style docstrings
|
||||||
|
- All docstrings include Args, Returns, Raises sections
|
||||||
|
- Complex functions include Examples sections
|
||||||
|
- Cost implications documented in Notes sections
|
||||||
|
- Error handling clearly documented
|
||||||
|
- Provider differences (Ollama vs Mistral) documented
|
||||||
|
</documentation_quality>
|
||||||
|
|
||||||
|
<code_quality>
|
||||||
|
- Code is self-documenting with clear variable names
|
||||||
|
- Inline comments explain WHY, not WHAT
|
||||||
|
- Complex algorithms are well explained
|
||||||
|
- Performance considerations documented
|
||||||
|
- Security considerations documented
|
||||||
|
</code_quality>
|
||||||
|
|
||||||
|
<developer_experience>
|
||||||
|
- IDE autocomplete works perfectly with type hints
|
||||||
|
- Type errors caught at development time, not runtime
|
||||||
|
- Documentation is easily accessible in IDE
|
||||||
|
- API examples are executable and tested
|
||||||
|
- Contributing guidelines are clear and comprehensive
|
||||||
|
</developer_experience>
|
||||||
|
|
||||||
|
<maintainability>
|
||||||
|
- Refactoring is safer with type checking
|
||||||
|
- Function signatures are self-documenting
|
||||||
|
- API contracts are explicit and enforced
|
||||||
|
- Breaking changes are caught by type checker
|
||||||
|
- New developers can understand code quickly
|
||||||
|
</maintainability>
|
||||||
|
</success_criteria>
|
||||||
|
|
||||||
|
<constraints>
|
||||||
|
<compatibility>
|
||||||
|
- Must maintain backward compatibility with existing code
|
||||||
|
- Cannot break existing Flask routes or API contracts
|
||||||
|
- Weaviate schema must remain unchanged
|
||||||
|
- Existing tests must continue to pass
|
||||||
|
</compatibility>
|
||||||
|
|
||||||
|
<gradual_migration>
|
||||||
|
- Can use per-module mypy configuration for gradual migration
|
||||||
|
- Can temporarily disable strict checks on legacy modules
|
||||||
|
- Priority modules must be completed first
|
||||||
|
- Low-priority modules can be deferred
|
||||||
|
</gradual_migration>
|
||||||
|
|
||||||
|
<standards>
|
||||||
|
- All type annotations must use Python 3.10+ syntax
|
||||||
|
- Docstrings must follow Google style exactly (not NumPy or reStructuredText)
|
||||||
|
- Use typing module (List, Dict, Optional) until Python 3.9 support dropped
|
||||||
|
- Use from __future__ import annotations if needed for forward references
|
||||||
|
</standards>
|
||||||
|
</constraints>
|
||||||
|
|
||||||
|
<testing_strategy>
|
||||||
|
<type_checking>
|
||||||
|
- Run mypy --strict on each module after adding types
|
||||||
|
- Use mypy daemon (dmypy) for faster incremental checking
|
||||||
|
- Add mypy to pre-commit hooks
|
||||||
|
- CI/CD must run mypy and fail on type errors
|
||||||
|
</type_checking>
|
||||||
|
|
||||||
|
<documentation_validation>
|
||||||
|
- Use pydocstyle to validate Google-style format
|
||||||
|
- Use sphinx-build to generate docs and catch errors
|
||||||
|
- Manual review of docstring examples
|
||||||
|
- Verify examples are executable and correct
|
||||||
|
</documentation_validation>
|
||||||
|
|
||||||
|
<integration_testing>
|
||||||
|
- Verify existing tests still pass after type additions
|
||||||
|
- Add new tests for complex typed structures
|
||||||
|
- Test mypy configuration on sample code
|
||||||
|
- Verify IDE autocomplete works correctly
|
||||||
|
</integration_testing>
|
||||||
|
</testing_strategy>
|
||||||
|
|
||||||
|
<documentation_examples>
|
||||||
|
<module_docstring>
|
||||||
|
```python
|
||||||
|
"""
|
||||||
|
PDF Pipeline V2 - Intelligent document processing with LLM enhancement.
|
||||||
|
|
||||||
|
This module orchestrates a 10-step pipeline for processing PDF documents:
|
||||||
|
1. OCR via Mistral API
|
||||||
|
2. Markdown construction with images
|
||||||
|
3. Metadata extraction via LLM
|
||||||
|
4. Table of contents (TOC) extraction
|
||||||
|
5. Section classification
|
||||||
|
6. Semantic chunking
|
||||||
|
7. Chunk cleaning and validation
|
||||||
|
8. Enrichment with concepts
|
||||||
|
9. Validation and corrections
|
||||||
|
10. Ingestion into Weaviate vector database
|
||||||
|
|
||||||
|
The pipeline supports multiple LLM providers (Ollama local, Mistral API) and
|
||||||
|
various processing modes (skip OCR, semantic chunking, OCR annotations).
|
||||||
|
|
||||||
|
Typical usage:
|
||||||
|
>>> from pathlib import Path
|
||||||
|
>>> from utils.pdf_pipeline import process_pdf
|
||||||
|
>>>
|
||||||
|
>>> result = process_pdf(
|
||||||
|
... Path("document.pdf"),
|
||||||
|
... use_llm=True,
|
||||||
|
... llm_provider="ollama",
|
||||||
|
... ingest_to_weaviate=True,
|
||||||
|
... )
|
||||||
|
>>> print(f"Processed {result['pages']} pages, {result['chunks_count']} chunks")
|
||||||
|
|
||||||
|
See Also:
|
||||||
|
mistral_client: OCR API client
|
||||||
|
llm_metadata: Metadata extraction
|
||||||
|
weaviate_ingest: Database ingestion
|
||||||
|
"""
|
||||||
|
```
|
||||||
|
</module_docstring>
|
||||||
|
|
||||||
|
<function_docstring>
|
||||||
|
```python
|
||||||
|
def process_pdf_v2(
|
||||||
|
pdf_path: Path,
|
||||||
|
output_dir: Path = Path("output"),
|
||||||
|
*,
|
||||||
|
use_llm: bool = True,
|
||||||
|
llm_provider: Literal["ollama", "mistral"] = "ollama",
|
||||||
|
llm_model: Optional[str] = None,
|
||||||
|
skip_ocr: bool = False,
|
||||||
|
ingest_to_weaviate: bool = True,
|
||||||
|
progress_callback: Optional[ProgressCallback] = None,
|
||||||
|
) -> PipelineResult:
|
||||||
|
"""
|
||||||
|
Process a PDF through the complete V2 pipeline with LLM enhancement.
|
||||||
|
|
||||||
|
This function orchestrates all 10 steps of the intelligent document processing
|
||||||
|
pipeline, from OCR to Weaviate ingestion. It supports both local (Ollama) and
|
||||||
|
cloud (Mistral API) LLM providers, with optional caching via skip_ocr.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
pdf_path: Absolute path to the PDF file to process.
|
||||||
|
output_dir: Base directory for output files. Defaults to "./output".
|
||||||
|
use_llm: Enable LLM-based processing (metadata, TOC, chunking).
|
||||||
|
If False, uses basic heuristic processing.
|
||||||
|
llm_provider: LLM provider to use. "ollama" for local (free but slow),
|
||||||
|
"mistral" for API (fast but paid).
|
||||||
|
llm_model: Specific model name. If None, auto-detects based on provider
|
||||||
|
(qwen2.5:7b for ollama, mistral-small-latest for mistral).
|
||||||
|
skip_ocr: If True, reuses existing markdown file to avoid OCR cost.
|
||||||
|
Requires output_dir/<doc_name>/<doc_name>.md to exist.
|
||||||
|
ingest_to_weaviate: If True, ingests chunks into Weaviate after processing.
|
||||||
|
progress_callback: Optional callback for real-time progress updates.
|
||||||
|
Called with (step_id, status, detail) for each pipeline step.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Dictionary containing processing results with the following keys:
|
||||||
|
- success (bool): True if processing completed without errors
|
||||||
|
- document_name (str): Name of the processed document
|
||||||
|
- pages (int): Number of pages in the PDF
|
||||||
|
- chunks_count (int): Number of chunks generated
|
||||||
|
- cost_ocr (float): OCR cost in euros (0 if skip_ocr=True)
|
||||||
|
- cost_llm (float): LLM API cost in euros (0 if provider=ollama)
|
||||||
|
- cost_total (float): Total cost (ocr + llm)
|
||||||
|
- metadata (dict): Extracted metadata (title, author, etc.)
|
||||||
|
- toc (list): Hierarchical table of contents
|
||||||
|
- files (dict): Paths to generated files (markdown, chunks, etc.)
|
||||||
|
|
||||||
|
Raises:
|
||||||
|
FileNotFoundError: If pdf_path does not exist.
|
||||||
|
ValueError: If skip_ocr=True but markdown file not found.
|
||||||
|
RuntimeError: If Weaviate connection fails during ingestion.
|
||||||
|
|
||||||
|
Examples:
|
||||||
|
Basic usage with Ollama (free):
|
||||||
|
>>> result = process_pdf_v2(
|
||||||
|
... Path("platon_menon.pdf"),
|
||||||
|
... llm_provider="ollama"
|
||||||
|
... )
|
||||||
|
>>> print(f"Cost: {result['cost_total']:.4f}€")
|
||||||
|
Cost: 0.0270€ # OCR only
|
||||||
|
|
||||||
|
With Mistral API (faster):
|
||||||
|
>>> result = process_pdf_v2(
|
||||||
|
... Path("platon_menon.pdf"),
|
||||||
|
... llm_provider="mistral",
|
||||||
|
... llm_model="mistral-small-latest"
|
||||||
|
... )
|
||||||
|
|
||||||
|
Skip OCR to avoid cost:
|
||||||
|
>>> result = process_pdf_v2(
|
||||||
|
... Path("platon_menon.pdf"),
|
||||||
|
... skip_ocr=True, # Reuses existing markdown
|
||||||
|
... ingest_to_weaviate=False
|
||||||
|
... )
|
||||||
|
|
||||||
|
Notes:
|
||||||
|
- OCR cost: ~0.003€/page (standard), ~0.009€/page (with annotations)
|
||||||
|
- LLM cost: Free with Ollama, variable with Mistral API
|
||||||
|
- Processing time: ~30s/page with Ollama, ~5s/page with Mistral
|
||||||
|
- Weaviate must be running (docker-compose up -d) before ingestion
|
||||||
|
"""
|
||||||
|
```
|
||||||
|
</function_docstring>
|
||||||
|
</documentation_examples>
|
||||||
|
</project_specification>
|
||||||
290
prompts/coding_prompt_library.md
Normal file
290
prompts/coding_prompt_library.md
Normal file
@@ -0,0 +1,290 @@
|
|||||||
|
## YOUR ROLE - CODING AGENT (Library RAG - Type Safety & Documentation)
|
||||||
|
|
||||||
|
You are working on adding strict type annotations and Google-style docstrings to a Python library project.
|
||||||
|
This is a FRESH context window - you have no memory of previous sessions.
|
||||||
|
|
||||||
|
You have access to Linear for project management via MCP tools. Linear is your single source of truth.
|
||||||
|
|
||||||
|
### STEP 1: GET YOUR BEARINGS (MANDATORY)
|
||||||
|
|
||||||
|
Start by orienting yourself:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# 1. See your working directory
|
||||||
|
pwd
|
||||||
|
|
||||||
|
# 2. List files to understand project structure
|
||||||
|
ls -la
|
||||||
|
|
||||||
|
# 3. Read the project specification
|
||||||
|
cat app_spec.txt
|
||||||
|
|
||||||
|
# 4. Read the Linear project state
|
||||||
|
cat .linear_project.json
|
||||||
|
|
||||||
|
# 5. Check recent git history
|
||||||
|
git log --oneline -20
|
||||||
|
```
|
||||||
|
|
||||||
|
### STEP 2: CHECK LINEAR STATUS
|
||||||
|
|
||||||
|
Query Linear to understand current project state using the project_id from `.linear_project.json`.
|
||||||
|
|
||||||
|
1. **Get all issues and count progress:**
|
||||||
|
```
|
||||||
|
mcp__linear__list_issues with project_id
|
||||||
|
```
|
||||||
|
Count:
|
||||||
|
- Issues "Done" = completed
|
||||||
|
- Issues "Todo" = remaining
|
||||||
|
- Issues "In Progress" = currently being worked on
|
||||||
|
|
||||||
|
2. **Find META issue** (if exists) for session context
|
||||||
|
|
||||||
|
3. **Check for in-progress work** - complete it first if found
|
||||||
|
|
||||||
|
### STEP 3: SELECT NEXT ISSUE
|
||||||
|
|
||||||
|
Get Todo issues sorted by priority:
|
||||||
|
```
|
||||||
|
mcp__linear__list_issues with project_id, status="Todo", limit=5
|
||||||
|
```
|
||||||
|
|
||||||
|
Select ONE highest-priority issue to work on.
|
||||||
|
|
||||||
|
### STEP 4: CLAIM THE ISSUE
|
||||||
|
|
||||||
|
Use `mcp__linear__update_issue` to set status to "In Progress"
|
||||||
|
|
||||||
|
### STEP 5: IMPLEMENT THE ISSUE
|
||||||
|
|
||||||
|
Based on issue category:
|
||||||
|
|
||||||
|
**For Type Annotation Issues (e.g., "Types - Add type annotations to X.py"):**
|
||||||
|
|
||||||
|
1. Read the target Python file
|
||||||
|
2. Identify all functions, methods, and variables
|
||||||
|
3. Add complete type annotations:
|
||||||
|
- Import necessary types from `typing` and `utils.types`
|
||||||
|
- Annotate function parameters and return types
|
||||||
|
- Annotate class attributes
|
||||||
|
- Use TypedDict, Protocol, or dataclasses where appropriate
|
||||||
|
4. Save the file
|
||||||
|
5. Run mypy to verify (MANDATORY):
|
||||||
|
```bash
|
||||||
|
cd generations/library_rag
|
||||||
|
mypy --config-file=mypy.ini <file_path>
|
||||||
|
```
|
||||||
|
6. Fix any mypy errors
|
||||||
|
7. Commit the changes
|
||||||
|
|
||||||
|
**For Documentation Issues (e.g., "Docs - Add docstrings to X.py"):**
|
||||||
|
|
||||||
|
1. Read the target Python file
|
||||||
|
2. Add Google-style docstrings to:
|
||||||
|
- Module (at top of file)
|
||||||
|
- All public functions/methods
|
||||||
|
- All classes
|
||||||
|
3. Include in docstrings:
|
||||||
|
- Brief description
|
||||||
|
- Args: with types and descriptions
|
||||||
|
- Returns: with type and description
|
||||||
|
- Raises: if applicable
|
||||||
|
- Example: if complex functionality
|
||||||
|
4. Save the file
|
||||||
|
5. Optionally run pydocstyle to verify (if installed)
|
||||||
|
6. Commit the changes
|
||||||
|
|
||||||
|
**For Setup/Infrastructure Issues:**
|
||||||
|
|
||||||
|
Follow the specific instructions in the issue description.
|
||||||
|
|
||||||
|
### STEP 6: VERIFICATION
|
||||||
|
|
||||||
|
**Type Annotation Issues:**
|
||||||
|
- Run mypy on the modified file(s)
|
||||||
|
- Ensure zero type errors
|
||||||
|
- If errors exist, fix them before proceeding
|
||||||
|
|
||||||
|
**Documentation Issues:**
|
||||||
|
- Review docstrings for completeness
|
||||||
|
- Ensure Args/Returns sections match function signatures
|
||||||
|
- Check that examples are accurate
|
||||||
|
|
||||||
|
**Functional Changes (rare):**
|
||||||
|
- If the issue changes behavior, test manually
|
||||||
|
- Start Flask server if needed: `python flask_app.py`
|
||||||
|
- Test the affected functionality
|
||||||
|
|
||||||
|
### STEP 7: GIT COMMIT
|
||||||
|
|
||||||
|
Make a descriptive commit:
|
||||||
|
```bash
|
||||||
|
git add <files>
|
||||||
|
git commit -m "<Issue ID>: <Short description>
|
||||||
|
|
||||||
|
- <List of changes>
|
||||||
|
- Verified with mypy (for type issues)
|
||||||
|
- Linear issue: <issue identifier>
|
||||||
|
"
|
||||||
|
```
|
||||||
|
|
||||||
|
### STEP 8: UPDATE LINEAR ISSUE
|
||||||
|
|
||||||
|
1. **Add implementation comment:**
|
||||||
|
```markdown
|
||||||
|
## Implementation Complete
|
||||||
|
|
||||||
|
### Changes Made
|
||||||
|
- [List of files modified]
|
||||||
|
- [Key changes]
|
||||||
|
|
||||||
|
### Verification
|
||||||
|
- mypy passes with zero errors (for type issues)
|
||||||
|
- All test steps from issue description verified
|
||||||
|
|
||||||
|
### Git Commit
|
||||||
|
[commit hash and message]
|
||||||
|
```
|
||||||
|
|
||||||
|
2. **Update status to "Done"** using `mcp__linear__update_issue`
|
||||||
|
|
||||||
|
### STEP 9: DECIDE NEXT ACTION
|
||||||
|
|
||||||
|
After completing an issue, ask yourself:
|
||||||
|
|
||||||
|
1. Have I been working for a while? (Use judgment based on complexity of work done)
|
||||||
|
2. Is the code in a stable state?
|
||||||
|
3. Would this be a good handoff point?
|
||||||
|
|
||||||
|
**If YES to all three:**
|
||||||
|
- Proceed to STEP 10 (Session Summary)
|
||||||
|
- End cleanly
|
||||||
|
|
||||||
|
**If NO:**
|
||||||
|
- Continue to another issue (go back to STEP 3)
|
||||||
|
- But commit first!
|
||||||
|
|
||||||
|
**Pacing Guidelines:**
|
||||||
|
- Early phase (< 20% done): Can complete multiple simple issues
|
||||||
|
- Mid/late phase (> 20% done): 1-2 issues per session for quality
|
||||||
|
|
||||||
|
### STEP 10: SESSION SUMMARY (When Ending)
|
||||||
|
|
||||||
|
If META issue exists, add a comment:
|
||||||
|
|
||||||
|
```markdown
|
||||||
|
## Session Complete
|
||||||
|
|
||||||
|
### Completed This Session
|
||||||
|
- [Issue ID]: [Title] - [Brief summary]
|
||||||
|
|
||||||
|
### Current Progress
|
||||||
|
- X issues Done
|
||||||
|
- Y issues In Progress
|
||||||
|
- Z issues Todo
|
||||||
|
|
||||||
|
### Notes for Next Session
|
||||||
|
- [Important context]
|
||||||
|
- [Recommendations]
|
||||||
|
- [Any concerns]
|
||||||
|
```
|
||||||
|
|
||||||
|
Ensure:
|
||||||
|
- All code committed
|
||||||
|
- No uncommitted changes
|
||||||
|
- App in working state
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## LINEAR WORKFLOW RULES
|
||||||
|
|
||||||
|
**Status Transitions:**
|
||||||
|
- Todo → In Progress (when starting)
|
||||||
|
- In Progress → Done (when verified)
|
||||||
|
|
||||||
|
**NEVER:**
|
||||||
|
- Delete or modify issue descriptions
|
||||||
|
- Mark Done without verification
|
||||||
|
- Leave issues In Progress when switching
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## TYPE ANNOTATION GUIDELINES
|
||||||
|
|
||||||
|
**Imports needed:**
|
||||||
|
```python
|
||||||
|
from typing import Optional, Dict, List, Any, Tuple, Callable
|
||||||
|
from pathlib import Path
|
||||||
|
from utils.types import <ProjectSpecificTypes>
|
||||||
|
```
|
||||||
|
|
||||||
|
**Common patterns:**
|
||||||
|
```python
|
||||||
|
# Functions
|
||||||
|
def process_data(input: str, options: Optional[Dict[str, Any]] = None) -> List[str]:
|
||||||
|
"""Process input data."""
|
||||||
|
...
|
||||||
|
|
||||||
|
# Methods with self
|
||||||
|
def save(self, path: Path) -> None:
|
||||||
|
"""Save to file."""
|
||||||
|
...
|
||||||
|
|
||||||
|
# Async functions
|
||||||
|
async def fetch_data(url: str) -> Dict[str, Any]:
|
||||||
|
"""Fetch from API."""
|
||||||
|
...
|
||||||
|
```
|
||||||
|
|
||||||
|
**Use project types from `utils/types.py`:**
|
||||||
|
- Metadata, OCRResponse, TOCEntry, ChunkData, PipelineResult, etc.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## DOCSTRING TEMPLATE (Google Style)
|
||||||
|
|
||||||
|
```python
|
||||||
|
def function_name(param1: str, param2: int = 0) -> List[str]:
|
||||||
|
"""
|
||||||
|
Brief one-line description.
|
||||||
|
|
||||||
|
More detailed description if needed. Explain what the function does,
|
||||||
|
any important behavior, side effects, etc.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
param1: Description of param1.
|
||||||
|
param2: Description of param2. Defaults to 0.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Description of return value.
|
||||||
|
|
||||||
|
Raises:
|
||||||
|
ValueError: When param1 is empty.
|
||||||
|
IOError: When file cannot be read.
|
||||||
|
|
||||||
|
Example:
|
||||||
|
>>> result = function_name("test", 5)
|
||||||
|
>>> print(result)
|
||||||
|
['test', 'test', 'test', 'test', 'test']
|
||||||
|
"""
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## IMPORTANT REMINDERS
|
||||||
|
|
||||||
|
**Your Goal:** Add strict type annotations and comprehensive documentation to all Python modules
|
||||||
|
|
||||||
|
**This Session's Goal:** Complete 1-2 issues with quality work and clean handoff
|
||||||
|
|
||||||
|
**Quality Bar:**
|
||||||
|
- mypy --strict passes with zero errors
|
||||||
|
- All public functions have complete Google-style docstrings
|
||||||
|
- Code is clean and well-documented
|
||||||
|
|
||||||
|
**Context is finite.** End sessions early with good handoff notes. The next agent will continue.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
Begin by running STEP 1 (Get Your Bearings).
|
||||||
@@ -60,6 +60,13 @@ using the `mcp__linear__create_issue` tool.
|
|||||||
- Do NOT modify existing issues
|
- Do NOT modify existing issues
|
||||||
- Only create NEW issues for the NEW features
|
- Only create NEW issues for the NEW features
|
||||||
|
|
||||||
|
**IMPORTANT - Issue Count:**
|
||||||
|
Create EXACTLY ONE issue per feature listed in the `<implementation_steps>` section of the new spec file.
|
||||||
|
- If the spec has 8 features → create 8 issues
|
||||||
|
- If the spec has 15 features → create 15 issues
|
||||||
|
- Do NOT create a fixed number like 50 issues
|
||||||
|
- Each `<feature_N>` in the spec = 1 Linear issue
|
||||||
|
|
||||||
**For each NEW feature, create an issue with:**
|
**For each NEW feature, create an issue with:**
|
||||||
|
|
||||||
```
|
```
|
||||||
|
|||||||
@@ -30,9 +30,16 @@ Before creating issues, you need to set up Linear:
|
|||||||
|
|
||||||
### CRITICAL TASK: Create Linear Issues
|
### CRITICAL TASK: Create Linear Issues
|
||||||
|
|
||||||
|
**IMPORTANT - Issue Count:**
|
||||||
|
Create EXACTLY ONE issue per feature listed in the `<implementation_steps>` section of app_spec.txt.
|
||||||
|
- Count the `<feature_N>` tags in the spec file
|
||||||
|
- If the spec has 8 features → create 8 issues
|
||||||
|
- If the spec has 50 features → create 50 issues
|
||||||
|
- Do NOT create a fixed arbitrary number
|
||||||
|
- Each `<feature_N>` in `<implementation_steps>` = 1 Linear issue
|
||||||
|
|
||||||
Based on `app_spec.txt`, create Linear issues for each feature using the
|
Based on `app_spec.txt`, create Linear issues for each feature using the
|
||||||
`mcp__linear__create_issue` tool. Create 50 detailed issues that
|
`mcp__linear__create_issue` tool.
|
||||||
comprehensively cover all features in the spec.
|
|
||||||
|
|
||||||
**For each feature, create an issue with:**
|
**For each feature, create an issue with:**
|
||||||
|
|
||||||
@@ -66,7 +73,7 @@ priority: 1-4 based on importance (1=urgent/foundational, 4=low/polish)
|
|||||||
```
|
```
|
||||||
|
|
||||||
**Requirements for Linear Issues:**
|
**Requirements for Linear Issues:**
|
||||||
- Create 50 issues total covering all features in the spec
|
- Create ONE issue per `<feature_N>` tag in `<implementation_steps>`
|
||||||
- Mix of functional and style features (note category in description)
|
- Mix of functional and style features (note category in description)
|
||||||
- Order by priority: foundational features get priority 1-2, polish features get 3-4
|
- Order by priority: foundational features get priority 1-2, polish features get 3-4
|
||||||
- Include detailed test steps in each issue description
|
- Include detailed test steps in each issue description
|
||||||
|
|||||||
576
prompts/spec_embed_BAAI.txt
Normal file
576
prompts/spec_embed_BAAI.txt
Normal file
@@ -0,0 +1,576 @@
|
|||||||
|
<project_specification>
|
||||||
|
<project_name>Library RAG - Migration to BGE-M3 Embeddings</project_name>
|
||||||
|
|
||||||
|
<overview>
|
||||||
|
Migrate the Library RAG embedding model from sentence-transformers MiniLM-L6 (384-dim)
|
||||||
|
to BAAI/bge-m3 (1024-dim) for superior performance on multilingual philosophical texts.
|
||||||
|
|
||||||
|
**Why BGE-M3?**
|
||||||
|
- 1024 dimensions vs 384 (2.7x richer semantic representation)
|
||||||
|
- 8192 token context vs 512 (16x longer sequences)
|
||||||
|
- Superior multilingual support (Greek, Latin, French, English)
|
||||||
|
- Better trained on academic/research texts
|
||||||
|
- Captures philosophical nuances more effectively
|
||||||
|
|
||||||
|
**Scope:**
|
||||||
|
This is a focused migration that only affects the vectorization layer.
|
||||||
|
LLM processing (Ollama/Mistral) remains completely unchanged.
|
||||||
|
|
||||||
|
**Migration Strategy:**
|
||||||
|
- Auto-detect GPU availability and configure accordingly
|
||||||
|
- Delete existing collections (384-dim vectors incompatible with 1024-dim)
|
||||||
|
- Recreate schema with BGE-M3 vectorizer
|
||||||
|
- Re-ingest existing 2 documents from cached chunks
|
||||||
|
- Validate search quality improvements
|
||||||
|
</overview>
|
||||||
|
|
||||||
|
<technology_stack>
|
||||||
|
<backend>
|
||||||
|
<weaviate>1.34.4 (no change)</weaviate>
|
||||||
|
<new_vectorizer>BAAI/bge-m3 via text2vec-transformers</new_vectorizer>
|
||||||
|
<old_vectorizer>sentence-transformers-multi-qa-MiniLM-L6-cos-v1</old_vectorizer>
|
||||||
|
<gpu_support>Auto-detect CUDA availability (ENABLE_CUDA="1" if GPU, "0" if CPU)</gpu_support>
|
||||||
|
</backend>
|
||||||
|
<unchanged>
|
||||||
|
<llm>Ollama/Mistral (no impact on LLM processing)</llm>
|
||||||
|
<ocr>Mistral OCR (no change)</ocr>
|
||||||
|
<pipeline>PDF pipeline steps 1-9 unchanged</pipeline>
|
||||||
|
</unchanged>
|
||||||
|
</technology_stack>
|
||||||
|
|
||||||
|
<prerequisites>
|
||||||
|
<environment_setup>
|
||||||
|
- Existing Library RAG application (generations/library_rag/)
|
||||||
|
- Docker and Docker Compose installed
|
||||||
|
- NVIDIA Docker runtime (if GPU available)
|
||||||
|
- Only 2 documents currently ingested (will be re-ingested)
|
||||||
|
- No production data to preserve
|
||||||
|
- RTX 4070 GPU available (will be auto-detected and used)
|
||||||
|
</environment_setup>
|
||||||
|
</prerequisites>
|
||||||
|
|
||||||
|
<architecture_impact>
|
||||||
|
<independent_components>
|
||||||
|
**LLM Processing (Steps 1-9):**
|
||||||
|
- OCR extraction (Mistral API)
|
||||||
|
- Metadata extraction (Ollama/Mistral)
|
||||||
|
- TOC extraction (Ollama/Mistral)
|
||||||
|
- Section classification (Ollama/Mistral)
|
||||||
|
- Semantic chunking (Ollama/Mistral)
|
||||||
|
- Cleaning and validation (Ollama/Mistral)
|
||||||
|
|
||||||
|
→ **None of these are affected by embedding model change**
|
||||||
|
|
||||||
|
**Vectorization (Step 10):**
|
||||||
|
- Text → Vector conversion (text2vec-transformers in Weaviate)
|
||||||
|
- This is the ONLY component that changes
|
||||||
|
- Happens automatically during Weaviate ingestion
|
||||||
|
- No Python code changes required
|
||||||
|
</independent_components>
|
||||||
|
|
||||||
|
<breaking_changes>
|
||||||
|
**IMPORTANT: Vector dimensions are incompatible**
|
||||||
|
|
||||||
|
- Existing collections use 384-dim vectors (MiniLM-L6)
|
||||||
|
- New model generates 1024-dim vectors (BGE-M3)
|
||||||
|
- Weaviate cannot mix dimensions in same collection
|
||||||
|
- All collections must be deleted and recreated
|
||||||
|
- All documents must be re-ingested
|
||||||
|
|
||||||
|
**Why this is safe:**
|
||||||
|
- Only 2 documents currently ingested
|
||||||
|
- Source chunks.json files preserved in output/ directory
|
||||||
|
- No OCR/LLM re-processing needed (reuse existing chunks)
|
||||||
|
- No additional costs incurred
|
||||||
|
- Estimated total migration time: 20-25 minutes
|
||||||
|
</breaking_changes>
|
||||||
|
</architecture_impact>
|
||||||
|
|
||||||
|
<implementation_steps>
|
||||||
|
<feature_1>
|
||||||
|
<title>Complete BGE-M3 Setup with GPU Auto-Detection</title>
|
||||||
|
<description>
|
||||||
|
Atomic migration: GPU detection → Docker configuration → Schema deletion → Recreation.
|
||||||
|
This feature must be completed entirely in one session (cannot be partially done).
|
||||||
|
|
||||||
|
**Step 1: GPU Auto-Detection**
|
||||||
|
- Check for NVIDIA GPU availability: nvidia-smi or docker run --gpus all nvidia/cuda
|
||||||
|
- If GPU detected: Set ENABLE_CUDA="1"
|
||||||
|
- If no GPU: Set ENABLE_CUDA="0"
|
||||||
|
- Verify NVIDIA Docker runtime if GPU available
|
||||||
|
|
||||||
|
**Step 2: Update Docker Compose**
|
||||||
|
- Backup current docker-compose.yml to docker-compose.yml.backup
|
||||||
|
- Update text2vec-transformers service:
|
||||||
|
* Change image to: cr.weaviate.io/semitechnologies/transformers-inference:sentence-transformers-BAAI-bge-m3
|
||||||
|
* Set ENABLE_CUDA based on GPU detection
|
||||||
|
* Add GPU device mapping if CUDA enabled
|
||||||
|
- Update comments to reflect BGE-M3 model
|
||||||
|
- Stop containers: docker-compose down
|
||||||
|
- Remove old transformers image: docker rmi [old-image-name]
|
||||||
|
- Start new containers: docker-compose up -d
|
||||||
|
- Verify BGE-M3 loaded: docker-compose logs text2vec-transformers | grep -i "model"
|
||||||
|
- If GPU enabled, verify GPU usage: nvidia-smi (should show transformers process)
|
||||||
|
|
||||||
|
**Step 3: Delete Existing Collections**
|
||||||
|
- Create migrate_to_bge_m3.py script with safety checks
|
||||||
|
- List all existing collections and object counts
|
||||||
|
- Confirm deletion prompt: "Delete all collections? (yes/no)"
|
||||||
|
- Delete all collections: client.collections.delete_all()
|
||||||
|
- Verify deletion: client.collections.list_all() should return empty
|
||||||
|
- Log deleted collections and counts for reference
|
||||||
|
|
||||||
|
**Step 4: Recreate Schema with BGE-M3**
|
||||||
|
- Update schema.py docstring (line 40: MiniLM-L6 → BGE-M3)
|
||||||
|
- Add migration note at top of schema.py
|
||||||
|
- Run: python schema.py to recreate all collections
|
||||||
|
- Weaviate will auto-detect 1024-dim from text2vec-transformers service
|
||||||
|
- Verify collections created: Work, Document, Chunk, Summary
|
||||||
|
- Verify vectorizer configured: display_schema() should show text2vec-transformers
|
||||||
|
- Query text2vec-transformers service to confirm 1024 dimensions
|
||||||
|
|
||||||
|
**Validation:**
|
||||||
|
- All containers running (docker-compose ps)
|
||||||
|
- BGE-M3 model loaded successfully
|
||||||
|
- GPU utilized if available (check nvidia-smi)
|
||||||
|
- All collections exist with empty state
|
||||||
|
- Vector dimensions = 1024 (query Weaviate schema)
|
||||||
|
|
||||||
|
**Rollback if needed:**
|
||||||
|
- Restore docker-compose.yml.backup
|
||||||
|
- docker-compose down && docker-compose up -d
|
||||||
|
- python schema.py to recreate with old model
|
||||||
|
</description>
|
||||||
|
<priority>1</priority>
|
||||||
|
<category>migration</category>
|
||||||
|
<test_steps>
|
||||||
|
1. Run GPU detection: nvidia-smi or equivalent
|
||||||
|
2. Verify ENABLE_CUDA set correctly based on GPU availability
|
||||||
|
3. Backup docker-compose.yml created
|
||||||
|
4. Stop containers: docker-compose down
|
||||||
|
5. Start with BGE-M3: docker-compose up -d
|
||||||
|
6. Check logs: docker-compose logs text2vec-transformers
|
||||||
|
7. Verify "BAAI/bge-m3" appears in logs
|
||||||
|
8. If GPU: verify nvidia-smi shows transformers process
|
||||||
|
9. Run migrate_to_bge_m3.py and confirm deletion
|
||||||
|
10. Verify all collections deleted
|
||||||
|
11. Run schema.py to recreate
|
||||||
|
12. Verify 4 collections exist: Work, Document, Chunk, Summary
|
||||||
|
13. Query Weaviate API to confirm vector dimensions = 1024
|
||||||
|
14. Verify collections are empty (object count = 0)
|
||||||
|
</test_steps>
|
||||||
|
</feature_1>
|
||||||
|
|
||||||
|
<feature_2>
|
||||||
|
<title>Document Re-ingestion from Cached Chunks</title>
|
||||||
|
<description>
|
||||||
|
Re-ingest the 2 existing documents using their cached chunks.json files.
|
||||||
|
No OCR or LLM re-processing needed (saves time and cost).
|
||||||
|
|
||||||
|
**Process:**
|
||||||
|
1. Identify existing documents in output/ directory
|
||||||
|
2. For each document directory:
|
||||||
|
- Read {document_name}_chunks.json
|
||||||
|
- Verify chunk structure contains all required fields
|
||||||
|
- Extract Work metadata (title, author, year, language, genre)
|
||||||
|
- Extract Document metadata (sourceId, edition, pages, toc, hierarchy)
|
||||||
|
- Extract Chunk data (text, keywords, sectionPath, etc.)
|
||||||
|
|
||||||
|
3. Ingest to Weaviate using utils/weaviate_ingest.py:
|
||||||
|
- Create Work object (if not exists)
|
||||||
|
- Create Document object with nested Work reference
|
||||||
|
- Create Chunk objects with nested Document and Work references
|
||||||
|
- text2vec-transformers will auto-generate 1024-dim vectors
|
||||||
|
|
||||||
|
4. Verify ingestion success:
|
||||||
|
- Query Weaviate for each document by sourceId
|
||||||
|
- Verify chunk counts match original
|
||||||
|
- Check that vectors are 1024 dimensions
|
||||||
|
- Verify nested Work/Document metadata accessible
|
||||||
|
|
||||||
|
**Example code:**
|
||||||
|
```python
|
||||||
|
import json
|
||||||
|
from pathlib import Path
|
||||||
|
from utils.weaviate_ingest import (
|
||||||
|
create_work, create_document, ingest_chunks_to_weaviate
|
||||||
|
)
|
||||||
|
|
||||||
|
output_dir = Path("output")
|
||||||
|
for doc_dir in output_dir.iterdir():
|
||||||
|
if doc_dir.is_dir():
|
||||||
|
chunks_file = doc_dir / f"{doc_dir.name}_chunks.json"
|
||||||
|
if chunks_file.exists():
|
||||||
|
with open(chunks_file) as f:
|
||||||
|
data = json.load(f)
|
||||||
|
|
||||||
|
# Create Work
|
||||||
|
work_id = create_work(client, data["work_metadata"])
|
||||||
|
|
||||||
|
# Create Document
|
||||||
|
doc_id = create_document(client, data["document_metadata"], work_id)
|
||||||
|
|
||||||
|
# Ingest chunks
|
||||||
|
ingest_chunks_to_weaviate(client, data["chunks"], doc_id, work_id)
|
||||||
|
|
||||||
|
print(f"✓ Ingested {doc_dir.name}")
|
||||||
|
```
|
||||||
|
|
||||||
|
**Success criteria:**
|
||||||
|
- All documents from output/ directory ingested
|
||||||
|
- Chunk counts match original (verify in Weaviate)
|
||||||
|
- No vectorization errors in logs
|
||||||
|
- All vectors are 1024 dimensions
|
||||||
|
</description>
|
||||||
|
<priority>1</priority>
|
||||||
|
<category>data</category>
|
||||||
|
<test_steps>
|
||||||
|
1. List all directories in output/
|
||||||
|
2. For each directory, verify {name}_chunks.json exists
|
||||||
|
3. Load first chunks.json and inspect structure
|
||||||
|
4. Run re-ingestion script for all documents
|
||||||
|
5. Query Weaviate for total Chunk count
|
||||||
|
6. Verify count matches sum of all original chunks
|
||||||
|
7. Query a sample chunk and verify:
|
||||||
|
- Vector dimensions = 1024
|
||||||
|
- Nested work.title and work.author present
|
||||||
|
- Nested document.sourceId present
|
||||||
|
8. Verify no errors in Weaviate logs
|
||||||
|
9. Check text2vec-transformers logs for vectorization activity
|
||||||
|
</test_steps>
|
||||||
|
</feature_2>
|
||||||
|
|
||||||
|
<feature_3>
|
||||||
|
<title>Search Quality Validation and Performance Testing</title>
|
||||||
|
<description>
|
||||||
|
Validate that BGE-M3 provides superior search quality for philosophical texts.
|
||||||
|
Test multilingual capabilities and measure performance improvements.
|
||||||
|
|
||||||
|
**Create test script: test_bge_m3_quality.py**
|
||||||
|
|
||||||
|
**Test 1: Multilingual Queries**
|
||||||
|
- Test French philosophical terms: "justice", "vertu", "liberté"
|
||||||
|
- Test English philosophical terms: "virtue", "knowledge", "ethics"
|
||||||
|
- Test Greek philosophical terms: "ἀρετή" (arete), "τέλος" (telos), "ψυχή" (psyche)
|
||||||
|
- Test Latin philosophical terms: "virtus", "sapientia", "forma"
|
||||||
|
- Verify results are semantically relevant
|
||||||
|
- Compare with expected passages (if baseline available)
|
||||||
|
|
||||||
|
**Test 2: Long Query Handling**
|
||||||
|
- Test query with 100+ words (BGE-M3 supports 8192 tokens)
|
||||||
|
- Test query with complex philosophical argument
|
||||||
|
- Verify no truncation warnings
|
||||||
|
- Verify semantically appropriate results
|
||||||
|
|
||||||
|
**Test 3: Semantic Understanding**
|
||||||
|
- Query: "What is the nature of reality?"
|
||||||
|
- Expected: Results about ontology, metaphysics, being
|
||||||
|
- Query: "How should we live?"
|
||||||
|
- Expected: Results about ethics, virtue, good life
|
||||||
|
- Query: "What can we know?"
|
||||||
|
- Expected: Results about epistemology, knowledge, certainty
|
||||||
|
|
||||||
|
**Test 4: Performance Metrics**
|
||||||
|
- Measure query latency (should be <500ms)
|
||||||
|
- Measure indexing speed during ingestion
|
||||||
|
- Monitor GPU utilization (if enabled)
|
||||||
|
- Monitor memory usage (~2GB for BGE-M3)
|
||||||
|
- Compare with baseline (MiniLM-L6) if metrics available
|
||||||
|
|
||||||
|
**Test 5: Vector Dimension Verification**
|
||||||
|
- Query Weaviate schema API
|
||||||
|
- Verify all Chunk vectors are 1024 dimensions
|
||||||
|
- Verify no 384-dim vectors remain (from old model)
|
||||||
|
|
||||||
|
**Example test script:**
|
||||||
|
```python
|
||||||
|
import weaviate
|
||||||
|
import weaviate.classes.query as wvq
|
||||||
|
import time
|
||||||
|
|
||||||
|
client = weaviate.connect_to_local()
|
||||||
|
chunks = client.collections.get("Chunk")
|
||||||
|
|
||||||
|
# Test multilingual
|
||||||
|
test_queries = [
|
||||||
|
("justice", "French philosophical concept"),
|
||||||
|
("ἀρετή", "Greek virtue/excellence"),
|
||||||
|
("What is the good life?", "Long philosophical query"),
|
||||||
|
]
|
||||||
|
|
||||||
|
for query, description in test_queries:
|
||||||
|
start = time.time()
|
||||||
|
result = chunks.query.near_text(
|
||||||
|
query=query,
|
||||||
|
limit=5,
|
||||||
|
return_metadata=wvq.MetadataQuery(distance=True),
|
||||||
|
)
|
||||||
|
latency = (time.time() - start) * 1000
|
||||||
|
|
||||||
|
print(f"\nQuery: {query} ({description})")
|
||||||
|
print(f"Latency: {latency:.1f}ms")
|
||||||
|
|
||||||
|
for obj in result.objects:
|
||||||
|
similarity = (1 - obj.metadata.distance) * 100
|
||||||
|
print(f" [{similarity:.1f}%] {obj.properties['work']['title']}")
|
||||||
|
print(f" {obj.properties['text'][:150]}...")
|
||||||
|
|
||||||
|
client.close()
|
||||||
|
```
|
||||||
|
|
||||||
|
**Document results:**
|
||||||
|
- Create SEARCH_QUALITY_RESULTS.md with:
|
||||||
|
* Sample queries and results
|
||||||
|
* Performance metrics
|
||||||
|
* Comparison with MiniLM-L6 (if available)
|
||||||
|
* Notes on quality improvements observed
|
||||||
|
</description>
|
||||||
|
<priority>1</priority>
|
||||||
|
<category>validation</category>
|
||||||
|
<test_steps>
|
||||||
|
1. Create test_bge_m3_quality.py script
|
||||||
|
2. Run multilingual query tests (French, English, Greek, Latin)
|
||||||
|
3. Verify results are semantically relevant
|
||||||
|
4. Test long queries (100+ words)
|
||||||
|
5. Measure average query latency over 10 queries
|
||||||
|
6. Verify latency <500ms
|
||||||
|
7. Query Weaviate schema to verify vector dimensions = 1024
|
||||||
|
8. If GPU enabled, monitor nvidia-smi during queries
|
||||||
|
9. Document search quality improvements in markdown file
|
||||||
|
10. Compare results with expected philosophical passages
|
||||||
|
</test_steps>
|
||||||
|
</feature_3>
|
||||||
|
|
||||||
|
<feature_4>
|
||||||
|
<title>Documentation Update</title>
|
||||||
|
<description>
|
||||||
|
Update all documentation to reflect BGE-M3 migration.
|
||||||
|
|
||||||
|
**Files to update:**
|
||||||
|
|
||||||
|
1. **docker-compose.yml**
|
||||||
|
- Update comments to mention BGE-M3
|
||||||
|
- Note GPU auto-detection logic
|
||||||
|
- Document ENABLE_CUDA setting
|
||||||
|
|
||||||
|
2. **README.md**
|
||||||
|
- Update "Embedding Model" section
|
||||||
|
- Change: MiniLM-L6 (384-dim) → BGE-M3 (1024-dim)
|
||||||
|
- Add benefits: multilingual, longer context, better quality
|
||||||
|
- Update docker-compose instructions if needed
|
||||||
|
|
||||||
|
3. **CLAUDE.md**
|
||||||
|
- Update schema documentation (line ~35)
|
||||||
|
- Change vectorizer description
|
||||||
|
- Update example queries to showcase multilingual
|
||||||
|
- Add migration notes section
|
||||||
|
|
||||||
|
4. **schema.py**
|
||||||
|
- Update module docstring (line 40)
|
||||||
|
- Change "MiniLM-L6" references to "BGE-M3"
|
||||||
|
- Add migration date and rationale in comments
|
||||||
|
- Update display_schema() output text
|
||||||
|
|
||||||
|
5. **Create MIGRATION_BGE_M3.md**
|
||||||
|
- Document migration process
|
||||||
|
- Explain why BGE-M3 chosen
|
||||||
|
- List breaking changes (dimension incompatibility)
|
||||||
|
- Document rollback procedure
|
||||||
|
- Include before/after comparison
|
||||||
|
- Note LLM independence (Ollama/Mistral unaffected)
|
||||||
|
- Document search quality improvements
|
||||||
|
|
||||||
|
6. **MCP_README.md** (if exists)
|
||||||
|
- Update technical details about embeddings
|
||||||
|
- Update vector dimension references
|
||||||
|
|
||||||
|
**Migration notes template:**
|
||||||
|
```markdown
|
||||||
|
# BGE-M3 Migration - [Date]
|
||||||
|
|
||||||
|
## Why
|
||||||
|
- Superior multilingual support (Greek, Latin, French, English)
|
||||||
|
- 1024-dim vectors (2.7x richer than MiniLM-L6)
|
||||||
|
- 8192 token context (16x longer than MiniLM-L6)
|
||||||
|
- Better trained on academic/philosophical texts
|
||||||
|
|
||||||
|
## What Changed
|
||||||
|
- Embedding model: MiniLM-L6 → BAAI/bge-m3
|
||||||
|
- Vector dimensions: 384 → 1024
|
||||||
|
- All collections deleted and recreated
|
||||||
|
- 2 documents re-ingested
|
||||||
|
|
||||||
|
## Impact
|
||||||
|
- LLM processing (Ollama/Mistral): **No impact**
|
||||||
|
- Search quality: **Significantly improved**
|
||||||
|
- GPU acceleration: **Auto-enabled** (if available)
|
||||||
|
- Migration time: ~25 minutes
|
||||||
|
|
||||||
|
## Search Quality Improvements
|
||||||
|
[Insert results from Feature 3 testing]
|
||||||
|
```
|
||||||
|
|
||||||
|
**Verify:**
|
||||||
|
- Search all files for "MiniLM-L6" references
|
||||||
|
- Search all files for "384" dimension references
|
||||||
|
- Replace with "BGE-M3" and "1024" respectively
|
||||||
|
- Grep for "text2vec" and update comments where needed
|
||||||
|
</description>
|
||||||
|
<priority>2</priority>
|
||||||
|
<category>documentation</category>
|
||||||
|
<test_steps>
|
||||||
|
1. Update docker-compose.yml comments
|
||||||
|
2. Update README.md embedding section
|
||||||
|
3. Update CLAUDE.md schema documentation
|
||||||
|
4. Update schema.py docstring and comments
|
||||||
|
5. Create MIGRATION_BGE_M3.md with full migration notes
|
||||||
|
6. Search codebase for "MiniLM-L6" references: grep -r "MiniLM" .
|
||||||
|
7. Replace all with "BGE-M3"
|
||||||
|
8. Search for "384" dimension references
|
||||||
|
9. Replace with "1024" where appropriate
|
||||||
|
10. Review all updated files for consistency
|
||||||
|
11. Verify no outdated references remain
|
||||||
|
</test_steps>
|
||||||
|
</feature_4>
|
||||||
|
</implementation_steps>
|
||||||
|
|
||||||
|
<deliverables>
|
||||||
|
<code>
|
||||||
|
- Updated docker-compose.yml with BGE-M3 and GPU auto-detection
|
||||||
|
- migrate_to_bge_m3.py script for safe collection deletion
|
||||||
|
- Updated schema.py with BGE-M3 documentation
|
||||||
|
- Re-ingestion script (or integration with existing utils)
|
||||||
|
- test_bge_m3_quality.py for validation
|
||||||
|
</code>
|
||||||
|
|
||||||
|
<documentation>
|
||||||
|
- MIGRATION_BGE_M3.md with complete migration notes
|
||||||
|
- Updated README.md with BGE-M3 details
|
||||||
|
- Updated CLAUDE.md with schema changes
|
||||||
|
- SEARCH_QUALITY_RESULTS.md with validation results
|
||||||
|
- Updated inline comments in all affected files
|
||||||
|
</documentation>
|
||||||
|
</deliverables>
|
||||||
|
|
||||||
|
<success_criteria>
|
||||||
|
<functionality>
|
||||||
|
- BGE-M3 model loads successfully in Weaviate
|
||||||
|
- GPU auto-detected and utilized if available
|
||||||
|
- All collections recreated with 1024-dim vectors
|
||||||
|
- Documents re-ingested successfully from cached chunks
|
||||||
|
- Semantic search returns relevant results
|
||||||
|
- Multilingual queries work correctly (Greek, Latin, French, English)
|
||||||
|
</functionality>
|
||||||
|
|
||||||
|
<quality>
|
||||||
|
- Search quality demonstrably improved vs MiniLM-L6
|
||||||
|
- Greek/Latin philosophical terms properly embedded
|
||||||
|
- Long queries (>512 tokens) handled correctly
|
||||||
|
- No vectorization errors in logs
|
||||||
|
- Vector dimensions verified as 1024 across all collections
|
||||||
|
</quality>
|
||||||
|
|
||||||
|
<performance>
|
||||||
|
- Query latency acceptable (<500ms average)
|
||||||
|
- GPU utilized if available (verified via nvidia-smi)
|
||||||
|
- Memory usage stable (~2GB for text2vec-transformers)
|
||||||
|
- Indexing throughput acceptable during re-ingestion
|
||||||
|
- No performance degradation vs MiniLM-L6
|
||||||
|
</performance>
|
||||||
|
|
||||||
|
<documentation>
|
||||||
|
- All documentation updated to reflect BGE-M3
|
||||||
|
- No outdated MiniLM-L6 references remain
|
||||||
|
- Migration process fully documented
|
||||||
|
- Rollback procedure documented and tested
|
||||||
|
- Search quality improvements quantified
|
||||||
|
</documentation>
|
||||||
|
</success_criteria>
|
||||||
|
|
||||||
|
<migration_notes>
|
||||||
|
<breaking_changes>
|
||||||
|
**IMPORTANT: This is a destructive migration**
|
||||||
|
|
||||||
|
- All existing Weaviate collections must be deleted
|
||||||
|
- Vector dimensions change: 384 → 1024 (incompatible)
|
||||||
|
- Weaviate cannot mix dimensions in same collection
|
||||||
|
- All documents must be re-ingested
|
||||||
|
|
||||||
|
**Low impact:**
|
||||||
|
- Only 2 documents currently ingested
|
||||||
|
- Source chunks.json files preserved in output/ directory
|
||||||
|
- No OCR re-processing needed (saves ~0.006€ per doc)
|
||||||
|
- No LLM re-processing needed (saves time and cost)
|
||||||
|
- Estimated migration time: 20-25 minutes total
|
||||||
|
</breaking_changes>
|
||||||
|
|
||||||
|
<rollback_plan>
|
||||||
|
If BGE-M3 causes issues, rollback is straightforward:
|
||||||
|
|
||||||
|
1. Stop containers: docker-compose down
|
||||||
|
2. Restore backup: mv docker-compose.yml.backup docker-compose.yml
|
||||||
|
3. Start containers: docker-compose up -d
|
||||||
|
4. Recreate schema: python schema.py
|
||||||
|
5. Re-ingest documents from output/ directory (same process as Feature 2)
|
||||||
|
|
||||||
|
**Time to rollback: ~15 minutes**
|
||||||
|
|
||||||
|
**Note:** Backup of docker-compose.yml created automatically in Feature 1
|
||||||
|
</rollback_plan>
|
||||||
|
|
||||||
|
<gpu_auto_detection>
|
||||||
|
**GPU is NOT optional - it's auto-detected**
|
||||||
|
|
||||||
|
The system will automatically detect GPU availability and configure accordingly:
|
||||||
|
|
||||||
|
- **If GPU available (RTX 4070 detected):**
|
||||||
|
* ENABLE_CUDA="1" in docker-compose.yml
|
||||||
|
* GPU device mapping added to text2vec-transformers service
|
||||||
|
* Vectorization uses GPU (5-10x faster)
|
||||||
|
* ~2GB VRAM used (plenty of headroom on 4070)
|
||||||
|
* Ollama/Qwen can still use remaining VRAM
|
||||||
|
|
||||||
|
- **If NO GPU available:**
|
||||||
|
* ENABLE_CUDA="0" in docker-compose.yml
|
||||||
|
* Vectorization uses CPU (slower but functional)
|
||||||
|
* No GPU device mapping needed
|
||||||
|
|
||||||
|
**Detection method:**
|
||||||
|
```bash
|
||||||
|
# Try nvidia-smi
|
||||||
|
if command -v nvidia-smi &> /dev/null; then
|
||||||
|
GPU_AVAILABLE=true
|
||||||
|
else
|
||||||
|
# Try Docker GPU test
|
||||||
|
if docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi &> /dev/null; then
|
||||||
|
GPU_AVAILABLE=true
|
||||||
|
else
|
||||||
|
GPU_AVAILABLE=false
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
```
|
||||||
|
|
||||||
|
**User has RTX 4070:** GPU will be detected and used automatically.
|
||||||
|
</gpu_auto_detection>
|
||||||
|
|
||||||
|
<llm_independence>
|
||||||
|
**Ollama/Mistral are NOT affected by this change**
|
||||||
|
|
||||||
|
The embedding model migration ONLY affects Weaviate vectorization (pipeline step 10).
|
||||||
|
All LLM processing (steps 1-9) remains unchanged:
|
||||||
|
- OCR extraction (Mistral API)
|
||||||
|
- Metadata extraction (Ollama/Mistral)
|
||||||
|
- TOC extraction (Ollama/Mistral)
|
||||||
|
- Section classification (Ollama/Mistral)
|
||||||
|
- Semantic chunking (Ollama/Mistral)
|
||||||
|
- Cleaning and validation (Ollama/Mistral)
|
||||||
|
|
||||||
|
**No Python code changes required.**
|
||||||
|
Weaviate handles vectorization automatically via text2vec-transformers service.
|
||||||
|
|
||||||
|
**Ollama can still use GPU:**
|
||||||
|
BGE-M3 uses ~2GB VRAM. RTX 4070 has 12GB.
|
||||||
|
Ollama/Qwen can use remaining 10GB without conflict.
|
||||||
|
</llm_independence>
|
||||||
|
</migration_notes>
|
||||||
|
</project_specification>
|
||||||
@@ -1 +1,2 @@
|
|||||||
claude-code-sdk>=0.0.25
|
claude-code-sdk>=0.0.25
|
||||||
|
python-dotenv>=1.0.0
|
||||||
|
|||||||
@@ -29,6 +29,11 @@ ALLOWED_COMMANDS = {
|
|||||||
# Node.js development
|
# Node.js development
|
||||||
"npm",
|
"npm",
|
||||||
"node",
|
"node",
|
||||||
|
# Python development
|
||||||
|
"python",
|
||||||
|
"python3",
|
||||||
|
"mypy",
|
||||||
|
"pytest",
|
||||||
# Version control
|
# Version control
|
||||||
"git",
|
"git",
|
||||||
# Process management
|
# Process management
|
||||||
|
|||||||
Reference in New Issue
Block a user