Initial commit: Linear-integrated autonomous coding agent with Initializer Bis support

This commit is contained in:
2025-12-14 00:45:40 +01:00
commit a310d4b3cf
19 changed files with 3919 additions and 0 deletions

View File

@@ -0,0 +1,9 @@
{
"permissions": {
"allow": [
"Bash(test:*)",
"Bash(cat:*)",
"Bash(netstat:*)"
]
}
}

9
.gitignore vendored Normal file
View File

@@ -0,0 +1,9 @@
# Agent-generated output directories
generations/
# Log files
logs/
.env
venv
__pycache__

21
LICENSE Normal file
View File

@@ -0,0 +1,21 @@
MIT License
Copyright (c) 2025 Cole Medin
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

280
README.md Normal file
View File

@@ -0,0 +1,280 @@
# Autonomous Coding Agent Demo (Linear-Integrated)
A minimal harness demonstrating long-running autonomous coding with the Claude Agent SDK. This demo implements a two-agent pattern (initializer + coding agent) with **Linear as the core project management system** for tracking all work.
## Key Features
- **Linear Integration**: All work is tracked as Linear issues, not local files
- **Real-time Visibility**: Watch agent progress directly in your Linear workspace
- **Session Handoff**: Agents communicate via Linear comments, not text files
- **Two-Agent Pattern**: Initializer creates Linear project & issues, coding agents implement them
- **Initializer Bis**: Add new features to existing projects without re-initializing
- **Browser Testing**: Puppeteer MCP for UI verification
- **Claude Opus 4.5**: Uses Claude's most capable model by default
## Prerequisites
### 1. Install Claude Code CLI and Python SDK
```bash
# Install Claude Code CLI (latest version required)
npm install -g @anthropic-ai/claude-code
# Install Python dependencies
pip install -r requirements.txt
```
### 2. Set Up Authentication
You need two authentication tokens:
**Claude Code OAuth Token:**
```bash
# Generate the token using Claude Code CLI
claude setup-token
# Set the environment variable
export CLAUDE_CODE_OAUTH_TOKEN='your-oauth-token-here'
```
**Linear API Key:**
```bash
# Get your API key from: https://linear.app/YOUR-TEAM/settings/api
export LINEAR_API_KEY='lin_api_xxxxxxxxxxxxx'
```
### 3. Verify Installation
```bash
claude --version # Should be latest version
pip show claude-code-sdk # Check SDK is installed
```
## Quick Start
```bash
# Initialize a new project
python autonomous_agent_demo.py --project-dir ./my_project
# Add new features to an existing project
python autonomous_agent_demo.py --project-dir ./my_project --new-spec app_spec_theme_customization.txt
```
For testing with limited iterations:
```bash
python autonomous_agent_demo.py --project-dir ./my_project --max-iterations 3
```
## How It Works
### Linear-Centric Workflow
```
┌─────────────────────────────────────────────────────────────┐
│ LINEAR-INTEGRATED WORKFLOW │
├─────────────────────────────────────────────────────────────┤
│ app_spec.txt ──► Initializer Agent ──► Linear Issues (50) │
│ │ │
│ ┌─────────────────────────▼──────────┐ │
│ │ LINEAR WORKSPACE │ │
│ │ ┌────────────────────────────┐ │ │
│ │ │ Issue: Auth - Login flow │ │ │
│ │ │ Status: Todo → In Progress │ │ │
│ │ │ Comments: [session notes] │ │ │
│ │ └────────────────────────────┘ │ │
│ └────────────────────────────────────┘ │
│ │ │
│ Coding Agent queries Linear │
│ ├── Search for Todo issues │
│ ├── Update status to In Progress │
│ ├── Implement & test with Puppeteer │
│ ├── Add comment with implementation notes│
│ └── Update status to Done │
└─────────────────────────────────────────────────────────────┘
```
### Two-Agent Pattern
1. **Initializer Agent (Session 1):**
- Reads `app_spec.txt`
- Lists teams and creates a new Linear project
- Creates 50 Linear issues with detailed test steps
- Creates a META issue for session tracking
- Sets up project structure, `init.sh`, and git
2. **Coding Agent (Sessions 2+):**
- Queries Linear for highest-priority Todo issue
- Runs verification tests on previously completed features
- Claims issue (status → In Progress)
- Implements the feature
- Tests via Puppeteer browser automation
- Adds implementation comment to issue
- Marks complete (status → Done)
- Updates META issue with session summary
### Initializer Bis: Adding New Features
The **Initializer Bis** agent allows you to add new features to an existing project without re-initializing it. This is useful when you want to extend your application with additional functionality.
**How it works:**
1. Create a new specification file (e.g., `app_spec_theme_customization.txt`) in the `prompts/` directory
2. Run the agent with `--new-spec` flag pointing to your new spec file
3. The Initializer Bis agent will:
- Read the existing project state from `.linear_project.json`
- Read the new specification file
- Create new Linear issues for each `<feature>` tag in the spec
- Add these issues to the existing Linear project
- Update the META issue with information about the new features
- Copy the new spec file to the project directory
**Example:**
```bash
# Add theme customization features to an existing project
python autonomous_agent_demo.py --project-dir ./my_project --new-spec app_spec_theme_customization.txt
```
This will create multiple Linear issues (one per `<feature>` tag) that will be worked on by subsequent coding agent sessions.
### Session Handoff via Linear
Instead of local text files, agents communicate through:
- **Issue Comments**: Implementation details, blockers, context
- **META Issue**: Session summaries and handoff notes
- **Issue Status**: Todo / In Progress / Done workflow
## Environment Variables
| Variable | Description | Required |
|----------|-------------|----------|
| `CLAUDE_CODE_OAUTH_TOKEN` | Claude Code OAuth token (from `claude setup-token`) | Yes |
| `LINEAR_API_KEY` | Linear API key for MCP access | Yes |
## Command Line Options
| Option | Description | Default |
|--------|-------------|---------|
| `--project-dir` | Directory for the project | `./autonomous_demo_project` |
| `--max-iterations` | Max agent iterations | Unlimited |
| `--model` | Claude model to use | `claude-opus-4-5-20251101` |
| `--new-spec` | Name of new specification file to add (e.g., 'app_spec_new1.txt'). Use this to add new features to an existing project. | None |
## Project Structure
```
linear-agent-harness/
├── autonomous_agent_demo.py # Main entry point
├── agent.py # Agent session logic
├── client.py # Claude SDK + MCP client configuration
├── security.py # Bash command allowlist and validation
├── progress.py # Progress tracking utilities
├── prompts.py # Prompt loading utilities
├── linear_config.py # Linear configuration constants
├── prompts/
│ ├── app_spec.txt # Application specification
│ ├── app_spec_theme_customization.txt # Example: Theme customization spec
│ ├── app_spec_mistral_extensible.txt # Example: Mistral provider spec
│ ├── initializer_prompt.md # First session prompt (creates Linear issues)
│ ├── initializer_bis_prompt.md # Prompt for adding new features
│ └── coding_prompt.md # Continuation session prompt (works issues)
└── requirements.txt # Python dependencies
```
## Generated Project Structure
After running, your project directory will contain:
```
my_project/
├── .linear_project.json # Linear project state (marker file)
├── app_spec.txt # Copied specification
├── app_spec_theme_customization.txt # New spec file (if using --new-spec)
├── init.sh # Environment setup script
├── .claude_settings.json # Security settings
└── [application files] # Generated application code
```
## MCP Servers Used
| Server | Transport | Purpose |
|--------|-----------|---------|
| **Linear** | HTTP (Streamable HTTP) | Project management - issues, status, comments |
| **Puppeteer** | stdio | Browser automation for UI testing |
## Security Model
This demo uses defense-in-depth security (see `security.py` and `client.py`):
1. **OS-level Sandbox:** Bash commands run in an isolated environment
2. **Filesystem Restrictions:** File operations restricted to project directory
3. **Bash Allowlist:** Only specific commands permitted (npm, node, git, etc.)
4. **MCP Permissions:** Tools explicitly allowed in security settings
## Linear Setup
Before running, ensure you have:
1. A Linear workspace with at least one team
2. An API key with read/write permissions (from Settings > API)
3. The agent will automatically detect your team and create a project
The initializer agent will create:
- A new Linear project named after your app
- 50 feature issues based on `app_spec.txt`
- 1 META issue for session tracking and handoff
All subsequent coding agents will work from this Linear project.
## Customization
### Changing the Application
Edit `prompts/app_spec.txt` to specify a different application to build.
### Adding New Features to Existing Projects
1. Create a new specification file in `prompts/` directory (e.g., `app_spec_new_feature.txt`)
2. Format it with `<feature>` tags following the same structure as `app_spec.txt`
3. Run with `--new-spec` flag:
```bash
python autonomous_agent_demo.py --project-dir ./my_project --new-spec app_spec_new_feature.txt
```
4. The Initializer Bis agent will create new Linear issues for each feature in the spec file
### Adjusting Issue Count
Edit `prompts/initializer_prompt.md` and change "50 issues" to your desired count.
### Modifying Allowed Commands
Edit `security.py` to add or remove commands from `ALLOWED_COMMANDS`.
## Troubleshooting
**"CLAUDE_CODE_OAUTH_TOKEN not set"**
Run `claude setup-token` to generate a token, then export it.
**"LINEAR_API_KEY not set"**
Get your API key from `https://linear.app/YOUR-TEAM/settings/api`
**"Appears to hang on first run"**
Normal behavior. The initializer is creating a Linear project and 50 issues with detailed descriptions. Watch for `[Tool: mcp__linear__create_issue]` output.
**"Command blocked by security hook"**
The agent tried to run a disallowed command. Add it to `ALLOWED_COMMANDS` in `security.py` if needed.
**"MCP server connection failed"**
Verify your `LINEAR_API_KEY` is valid and has appropriate permissions. The Linear MCP server uses HTTP transport at `https://mcp.linear.app/mcp`.
## Viewing Progress
Open your Linear workspace to see:
- The project created by the initializer agent
- All 50 issues organized under the project
- Real-time status changes (Todo → In Progress → Done)
- Implementation comments on each issue
- Session summaries on the META issue
- New issues added by Initializer Bis when using `--new-spec`
## License
MIT License - see [LICENSE](LICENSE) for details.

231
agent.py Normal file
View File

@@ -0,0 +1,231 @@
"""
Agent Session Logic
===================
Core agent interaction functions for running autonomous coding sessions.
"""
import asyncio
from pathlib import Path
from typing import Optional
from claude_code_sdk import ClaudeSDKClient
from client import create_client
from progress import print_session_header, print_progress_summary, is_linear_initialized
from prompts import (
get_initializer_prompt,
get_initializer_bis_prompt,
get_coding_prompt,
copy_spec_to_project,
copy_new_spec_to_project,
)
# Configuration
AUTO_CONTINUE_DELAY_SECONDS = 3
async def run_agent_session(
client: ClaudeSDKClient,
message: str,
project_dir: Path,
) -> tuple[str, str]:
"""
Run a single agent session using Claude Agent SDK.
Args:
client: Claude SDK client
message: The prompt to send
project_dir: Project directory path
Returns:
(status, response_text) where status is:
- "continue" if agent should continue working
- "error" if an error occurred
"""
print("Sending prompt to Claude Agent SDK...\n")
try:
# Send the query
await client.query(message)
# Collect response text and show tool use
response_text = ""
async for msg in client.receive_response():
msg_type = type(msg).__name__
# Handle AssistantMessage (text and tool use)
if msg_type == "AssistantMessage" and hasattr(msg, "content"):
for block in msg.content:
block_type = type(block).__name__
if block_type == "TextBlock" and hasattr(block, "text"):
response_text += block.text
print(block.text, end="", flush=True)
elif block_type == "ToolUseBlock" and hasattr(block, "name"):
print(f"\n[Tool: {block.name}]", flush=True)
if hasattr(block, "input"):
input_str = str(block.input)
if len(input_str) > 200:
print(f" Input: {input_str[:200]}...", flush=True)
else:
print(f" Input: {input_str}", flush=True)
# Handle UserMessage (tool results)
elif msg_type == "UserMessage" and hasattr(msg, "content"):
for block in msg.content:
block_type = type(block).__name__
if block_type == "ToolResultBlock":
result_content = getattr(block, "content", "")
is_error = getattr(block, "is_error", False)
# Check if command was blocked by security hook
if "blocked" in str(result_content).lower():
print(f" [BLOCKED] {result_content}", flush=True)
elif is_error:
# Show errors (truncated)
error_str = str(result_content)[:500]
print(f" [Error] {error_str}", flush=True)
else:
# Tool succeeded - just show brief confirmation
print(" [Done]", flush=True)
print("\n" + "-" * 70 + "\n")
return "continue", response_text
except Exception as e:
print(f"Error during agent session: {e}")
return "error", str(e)
async def run_autonomous_agent(
project_dir: Path,
model: str,
max_iterations: Optional[int] = None,
new_spec_filename: Optional[str] = None,
) -> None:
"""
Run the autonomous agent loop.
Args:
project_dir: Directory for the project
model: Claude model to use
max_iterations: Maximum number of iterations (None for unlimited)
"""
print("\n" + "=" * 70)
print(" AUTONOMOUS CODING AGENT DEMO")
print("=" * 70)
print(f"\nProject directory: {project_dir}")
print(f"Model: {model}")
if max_iterations:
print(f"Max iterations: {max_iterations}")
else:
print("Max iterations: Unlimited (will run until completion)")
print()
# Create project directory
project_dir.mkdir(parents=True, exist_ok=True)
# Check if this is a fresh start, continuation, or adding new specs
# We use .linear_project.json as the marker for initialization
is_first_run = not is_linear_initialized(project_dir)
use_initializer_bis = new_spec_filename is not None and not is_first_run
if is_first_run:
print("Fresh start - will use initializer agent")
print()
print("=" * 70)
print(" NOTE: First session takes 10-20+ minutes!")
print(" The agent is creating 50 Linear issues and setting up the project.")
print(" This may appear to hang - it's working. Watch for [Tool: ...] output.")
print("=" * 70)
print()
# Copy the app spec into the project directory for the agent to read
copy_spec_to_project(project_dir)
elif use_initializer_bis:
print("Adding new specifications - will use initializer bis agent")
print()
print("=" * 70)
print(f" NOTE: Adding new features from {new_spec_filename}")
print(" The agent will create new Linear issues for the additional features.")
print(" This may take several minutes. Watch for [Tool: ...] output.")
print("=" * 70)
print()
# Copy the new spec file into the project directory
copy_new_spec_to_project(project_dir, new_spec_filename)
print_progress_summary(project_dir)
else:
print("Continuing existing project (Linear initialized)")
print_progress_summary(project_dir)
# Main loop
iteration = 0
while True:
iteration += 1
# Check max iterations
if max_iterations and iteration > max_iterations:
print(f"\nReached max iterations ({max_iterations})")
print("To continue, run the script again without --max-iterations")
break
# Print session header
is_initializer_session = is_first_run or (use_initializer_bis and iteration == 1)
is_bis_session = use_initializer_bis and iteration == 1
print_session_header(iteration, is_initializer_session, is_bis_session)
# Create client (fresh context)
client = create_client(project_dir, model)
# Choose prompt based on session type
if is_first_run:
prompt = get_initializer_prompt()
is_first_run = False # Only use initializer once
elif use_initializer_bis and iteration == 1:
prompt = get_initializer_bis_prompt()
use_initializer_bis = False # Only use initializer bis once
else:
prompt = get_coding_prompt()
# Run session with async context manager
async with client:
status, response = await run_agent_session(client, prompt, project_dir)
# Handle status
if status == "continue":
print(f"\nAgent will auto-continue in {AUTO_CONTINUE_DELAY_SECONDS}s...")
print_progress_summary(project_dir)
await asyncio.sleep(AUTO_CONTINUE_DELAY_SECONDS)
elif status == "error":
print("\nSession encountered an error")
print("Will retry with a fresh session...")
await asyncio.sleep(AUTO_CONTINUE_DELAY_SECONDS)
# Small delay between sessions
if max_iterations is None or iteration < max_iterations:
print("\nPreparing next session...\n")
await asyncio.sleep(1)
# Final summary
print("\n" + "=" * 70)
print(" SESSION COMPLETE")
print("=" * 70)
print(f"\nProject directory: {project_dir}")
print_progress_summary(project_dir)
# Print instructions for running the generated application
print("\n" + "-" * 70)
print(" TO RUN THE GENERATED APPLICATION:")
print("-" * 70)
print(f"\n cd {project_dir.resolve()}")
print(" ./init.sh # Run the setup script")
print(" # Or manually:")
print(" npm install && npm run dev")
print("\n Then open http://localhost:3000 (or check init.sh for the URL)")
print("-" * 70)
print("\nDone!")

138
autonomous_agent_demo.py Normal file
View File

@@ -0,0 +1,138 @@
#!/usr/bin/env python3
"""
Autonomous Coding Agent Demo
============================
A minimal harness demonstrating long-running autonomous coding with Claude.
This script implements the two-agent pattern (initializer + coding agent) and
incorporates all the strategies from the long-running agents guide.
Example Usage:
python autonomous_agent_demo.py --project-dir ./claude_clone_demo
python autonomous_agent_demo.py --project-dir ./claude_clone_demo --max-iterations 5
"""
import argparse
import asyncio
import os
from pathlib import Path
from agent import run_autonomous_agent
# Configuration
# Using Claude Opus 4.5 as default for best coding and agentic performance
# See: https://www.anthropic.com/news/claude-opus-4-5
DEFAULT_MODEL = "claude-opus-4-5-20251101"
def parse_args() -> argparse.Namespace:
"""Parse command line arguments."""
parser = argparse.ArgumentParser(
description="Autonomous Coding Agent Demo - Long-running agent harness",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
# Start fresh project
python autonomous_agent_demo.py --project-dir ./claude_clone
# Use a specific model
python autonomous_agent_demo.py --project-dir ./claude_clone --model claude-sonnet-4-5-20250929
# Limit iterations for testing
python autonomous_agent_demo.py --project-dir ./claude_clone --max-iterations 5
# Continue existing project
python autonomous_agent_demo.py --project-dir ./claude_clone
# Add new specifications to existing project
python autonomous_agent_demo.py --project-dir ./claude_clone --new-spec app_spec_new1.txt
Environment Variables:
CLAUDE_CODE_OAUTH_TOKEN Claude Code OAuth token (required)
LINEAR_API_KEY Linear API key (required)
""",
)
parser.add_argument(
"--project-dir",
type=Path,
default=Path("./autonomous_demo_project"),
help="Directory for the project (default: generations/autonomous_demo_project). Relative paths automatically placed in generations/ directory.",
)
parser.add_argument(
"--max-iterations",
type=int,
default=None,
help="Maximum number of agent iterations (default: unlimited)",
)
parser.add_argument(
"--model",
type=str,
default=DEFAULT_MODEL,
help=f"Claude model to use (default: {DEFAULT_MODEL})",
)
parser.add_argument(
"--new-spec",
type=str,
default=None,
help="Name of new specification file to add (e.g., 'app_spec_new1.txt'). Use this to add new features to an existing project.",
)
return parser.parse_args()
def main() -> None:
"""Main entry point."""
args = parse_args()
# Check for Claude Code OAuth token
if not os.environ.get("CLAUDE_CODE_OAUTH_TOKEN"):
print("Error: CLAUDE_CODE_OAUTH_TOKEN environment variable not set")
print("\nRun 'claude setup-token' after installing the Claude Code CLI.")
print("\nThen set it:")
print(" export CLAUDE_CODE_OAUTH_TOKEN='your-token-here'")
return
# Check for Linear API key
if not os.environ.get("LINEAR_API_KEY"):
print("Error: LINEAR_API_KEY environment variable not set")
print("\nGet your API key from: https://linear.app/YOUR-TEAM/settings/api")
print("\nThen set it:")
print(" export LINEAR_API_KEY='lin_api_xxxxxxxxxxxxx'")
return
# Automatically place projects in generations/ directory unless already specified
project_dir = args.project_dir
if not str(project_dir).startswith("generations/"):
# Convert relative paths to be under generations/
if project_dir.is_absolute():
# If absolute path, use as-is
pass
else:
# Prepend generations/ to relative paths
project_dir = Path("generations") / project_dir
# Run the agent
try:
asyncio.run(
run_autonomous_agent(
project_dir=project_dir,
model=args.model,
max_iterations=args.max_iterations,
new_spec_filename=args.new_spec,
)
)
except KeyboardInterrupt:
print("\n\nInterrupted by user")
print("To resume, run the same command again")
except Exception as e:
print(f"\nFatal error: {e}")
raise
if __name__ == "__main__":
main()

169
client.py Normal file
View File

@@ -0,0 +1,169 @@
"""
Claude SDK Client Configuration
===============================
Functions for creating and configuring the Claude Agent SDK client.
"""
import json
import os
from pathlib import Path
from claude_code_sdk import ClaudeCodeOptions, ClaudeSDKClient
from claude_code_sdk.types import HookMatcher
from security import bash_security_hook
# Puppeteer MCP tools for browser automation
PUPPETEER_TOOLS = [
"mcp__puppeteer__puppeteer_navigate",
"mcp__puppeteer__puppeteer_screenshot",
"mcp__puppeteer__puppeteer_click",
"mcp__puppeteer__puppeteer_fill",
"mcp__puppeteer__puppeteer_select",
"mcp__puppeteer__puppeteer_hover",
"mcp__puppeteer__puppeteer_evaluate",
]
# Linear MCP tools for project management
# Official Linear MCP server at mcp.linear.app
LINEAR_TOOLS = [
# Team & Project discovery
"mcp__linear__list_teams",
"mcp__linear__get_team",
"mcp__linear__list_projects",
"mcp__linear__get_project",
"mcp__linear__create_project",
"mcp__linear__update_project",
# Issue management
"mcp__linear__list_issues",
"mcp__linear__get_issue",
"mcp__linear__create_issue",
"mcp__linear__update_issue",
"mcp__linear__list_my_issues",
# Comments
"mcp__linear__list_comments",
"mcp__linear__create_comment",
# Workflow
"mcp__linear__list_issue_statuses",
"mcp__linear__get_issue_status",
"mcp__linear__list_issue_labels",
# Users
"mcp__linear__list_users",
"mcp__linear__get_user",
]
# Built-in tools
BUILTIN_TOOLS = [
"Read",
"Write",
"Edit",
"Glob",
"Grep",
"Bash",
]
def create_client(project_dir: Path, model: str) -> ClaudeSDKClient:
"""
Create a Claude Agent SDK client with multi-layered security.
Args:
project_dir: Directory for the project
model: Claude model to use
Returns:
Configured ClaudeSDKClient
Security layers (defense in depth):
1. Sandbox - OS-level bash command isolation prevents filesystem escape
2. Permissions - File operations restricted to project_dir only
3. Security hooks - Bash commands validated against an allowlist
(see security.py for ALLOWED_COMMANDS)
"""
api_key = os.environ.get("CLAUDE_CODE_OAUTH_TOKEN")
if not api_key:
raise ValueError(
"CLAUDE_CODE_OAUTH_TOKEN environment variable not set.\n"
"Run 'claude setup-token after installing the Claude Code CLI."
)
linear_api_key = os.environ.get("LINEAR_API_KEY")
if not linear_api_key:
raise ValueError(
"LINEAR_API_KEY environment variable not set.\n"
"Get your API key from: https://linear.app/YOUR-TEAM/settings/api"
)
# Create comprehensive security settings
# Note: Using relative paths ("./**") restricts access to project directory
# since cwd is set to project_dir
security_settings = {
"sandbox": {"enabled": True, "autoAllowBashIfSandboxed": True},
"permissions": {
"defaultMode": "acceptEdits", # Auto-approve edits within allowed directories
"allow": [
# Allow all file operations within the project directory
"Read(./**)",
"Write(./**)",
"Edit(./**)",
"Glob(./**)",
"Grep(./**)",
# Bash permission granted here, but actual commands are validated
# by the bash_security_hook (see security.py for allowed commands)
"Bash(*)",
# Allow Puppeteer MCP tools for browser automation
*PUPPETEER_TOOLS,
# Allow Linear MCP tools for project management
*LINEAR_TOOLS,
],
},
}
# Ensure project directory exists before creating settings file
project_dir.mkdir(parents=True, exist_ok=True)
# Write settings to a file in the project directory
settings_file = project_dir / ".claude_settings.json"
with open(settings_file, "w") as f:
json.dump(security_settings, f, indent=2)
print(f"Created security settings at {settings_file}")
print(" - Sandbox enabled (OS-level bash isolation)")
print(f" - Filesystem restricted to: {project_dir.resolve()}")
print(" - Bash commands restricted to allowlist (see security.py)")
print(" - MCP servers: puppeteer (browser automation), linear (project management)")
print()
return ClaudeSDKClient(
options=ClaudeCodeOptions(
model=model,
system_prompt="You are an expert full-stack developer building a production-quality web application. You use Linear for project management and tracking all your work.",
allowed_tools=[
*BUILTIN_TOOLS,
*PUPPETEER_TOOLS,
*LINEAR_TOOLS,
],
mcp_servers={
"puppeteer": {"command": "npx", "args": ["puppeteer-mcp-server"]},
# Linear MCP with Streamable HTTP transport (recommended over SSE)
# See: https://linear.app/docs/mcp
"linear": {
"type": "http",
"url": "https://mcp.linear.app/mcp",
"headers": {
"Authorization": f"Bearer {linear_api_key}"
}
}
},
hooks={
"PreToolUse": [
HookMatcher(matcher="Bash", hooks=[bash_security_hook]),
],
},
max_turns=1000,
cwd=str(project_dir.resolve()),
settings=str(settings_file.resolve()), # Use absolute path
)
)

38
linear_config.py Normal file
View File

@@ -0,0 +1,38 @@
"""
Linear Configuration
====================
Configuration constants for Linear integration.
These values are used in prompts and for project state management.
"""
import os
# Environment variables (must be set before running)
LINEAR_API_KEY = os.environ.get("LINEAR_API_KEY")
LINEAR_TEAM_ID = os.environ.get("LINEAR_TEAM_ID")
# Default number of issues to create (can be overridden via command line)
DEFAULT_ISSUE_COUNT = 50
# Issue status workflow (Linear default states)
STATUS_TODO = "Todo"
STATUS_IN_PROGRESS = "In Progress"
STATUS_DONE = "Done"
# Label categories (map to feature types)
LABEL_FUNCTIONAL = "functional"
LABEL_STYLE = "style"
LABEL_INFRASTRUCTURE = "infrastructure"
# Priority mapping (Linear uses 0-4 where 1=Urgent, 4=Low, 0=No priority)
PRIORITY_URGENT = 1
PRIORITY_HIGH = 2
PRIORITY_MEDIUM = 3
PRIORITY_LOW = 4
# Local marker file to track Linear project initialization
LINEAR_PROJECT_MARKER = ".linear_project.json"
# Meta issue title for project tracking and session handoff
META_ISSUE_TITLE = "[META] Project Progress Tracker"

86
progress.py Normal file
View File

@@ -0,0 +1,86 @@
"""
Progress Tracking Utilities
===========================
Functions for tracking and displaying progress of the autonomous coding agent.
Progress is tracked via Linear issues, with local state cached in .linear_project.json.
"""
import json
from pathlib import Path
from linear_config import LINEAR_PROJECT_MARKER
def load_linear_project_state(project_dir: Path) -> dict | None:
"""
Load the Linear project state from the marker file.
Args:
project_dir: Directory containing .linear_project.json
Returns:
Project state dict or None if not initialized
"""
marker_file = project_dir / LINEAR_PROJECT_MARKER
if not marker_file.exists():
return None
try:
with open(marker_file, "r") as f:
return json.load(f)
except (json.JSONDecodeError, IOError):
return None
def is_linear_initialized(project_dir: Path) -> bool:
"""
Check if Linear project has been initialized.
Args:
project_dir: Directory to check
Returns:
True if .linear_project.json exists and is valid
"""
state = load_linear_project_state(project_dir)
return state is not None and state.get("initialized", False)
def print_session_header(session_num: int, is_initializer: bool, is_initializer_bis: bool = False) -> None:
"""Print a formatted header for the session."""
if is_initializer_bis:
session_type = "INITIALIZER BIS"
elif is_initializer:
session_type = "INITIALIZER"
else:
session_type = "CODING AGENT"
print("\n" + "=" * 70)
print(f" SESSION {session_num}: {session_type}")
print("=" * 70)
print()
def print_progress_summary(project_dir: Path) -> None:
"""
Print a summary of current progress.
Since actual progress is tracked in Linear, this reads the local
state file for cached information. The agent updates Linear directly
and reports progress in session comments.
"""
state = load_linear_project_state(project_dir)
if state is None:
print("\nProgress: Linear project not yet initialized")
return
total = state.get("total_issues", 0)
meta_issue = state.get("meta_issue_id", "unknown")
print(f"\nLinear Project Status:")
print(f" Total issues created: {total}")
print(f" META issue ID: {meta_issue}")
print(f" (Check Linear for current Done/In Progress/Todo counts)")

63
prompts.py Normal file
View File

@@ -0,0 +1,63 @@
"""
Prompt Loading Utilities
========================
Functions for loading prompt templates from the prompts directory.
"""
import shutil
from pathlib import Path
PROMPTS_DIR = Path(__file__).parent / "prompts"
def load_prompt(name: str) -> str:
"""Load a prompt template from the prompts directory."""
prompt_path = PROMPTS_DIR / f"{name}.md"
return prompt_path.read_text()
def get_initializer_prompt() -> str:
"""Load the initializer prompt."""
return load_prompt("initializer_prompt")
def get_coding_prompt() -> str:
"""Load the coding agent prompt."""
return load_prompt("coding_prompt")
def copy_spec_to_project(project_dir: Path) -> None:
"""Copy the app spec file into the project directory for the agent to read."""
spec_source = PROMPTS_DIR / "app_spec.txt"
spec_dest = project_dir / "app_spec.txt"
if not spec_dest.exists():
shutil.copy(spec_source, spec_dest)
print("Copied app_spec.txt to project directory")
############################################################################################
# New specifications added by davebb
############################################################################################
def get_initializer_bis_prompt() -> str:
"""Load the initializer bis prompt for adding new specifications."""
return load_prompt("initializer_bis_prompt")
def copy_new_spec_to_project(project_dir: Path, new_spec_filename: str) -> None:
"""
Copy a new specification file into the project directory for the agent to read.
Args:
project_dir: Project directory path
new_spec_filename: Name of the new spec file (e.g., "app_spec_new1.txt")
"""
spec_source = PROMPTS_DIR / new_spec_filename
if not spec_source.exists():
raise FileNotFoundError(f"New specification file not found: {spec_source}")
spec_dest = project_dir / new_spec_filename
shutil.copy(spec_source, spec_dest)
print(f"Copied {new_spec_filename} to project directory")

681
prompts/app_spec.txt Normal file
View File

@@ -0,0 +1,681 @@
<project_specification>
<project_name>Claude.ai Clone - AI Chat Interface</project_name>
<overview>
Build a fully functional clone of claude.ai, Anthropic's conversational AI interface. The application should
provide a clean, modern chat interface for interacting with Claude via the API, including features like
conversation management, artifact rendering, project organization, multiple model selection, and advanced
settings. The UI should closely match claude.ai's design using Tailwind CSS with a focus on excellent
user experience and responsive design.
</overview>
<technology_stack>
<api_key>
You can use an API key located at /tmp/api-key for testing. You will not be allowed to read this file, but you can reference it in code.
</api_key>
<frontend>
<framework>React with Vite</framework>
<styling>Tailwind CSS (via CDN)</styling>
<state_management>React hooks and context</state_management>
<routing>React Router for navigation</routing>
<markdown>React Markdown for message rendering</markdown>
<code_highlighting>Syntax highlighting for code blocks</code_highlighting>
<port>Only launch on port {frontend_port}</port>
</frontend>
<backend>
<runtime>Node.js with Express</runtime>
<database>SQLite with better-sqlite3</database>
<api_integration>Claude API for chat completions</api_integration>
<streaming>Server-Sent Events for streaming responses</streaming>
</backend>
<communication>
<api>RESTful endpoints</api>
<streaming>SSE for real-time message streaming</streaming>
<claude_api>Integration with Claude API using Anthropic SDK</claude_api>
</communication>
</technology_stack>
<prerequisites>
<environment_setup>
- Repository includes .env with VITE_ANTHROPIC_API_KEY configured
- Frontend dependencies pre-installed via pnpm
- Backend code goes in /server directory
- Install backend dependencies as needed
</environment_setup>
</prerequisites>
<core_features>
<chat_interface>
- Clean, centered chat layout with message bubbles
- Streaming message responses with typing indicator
- Markdown rendering with proper formatting
- Code blocks with syntax highlighting and copy button
- LaTeX/math equation rendering
- Image upload and display in messages
- Multi-turn conversations with context
- Message editing and regeneration
- Stop generation button during streaming
- Input field with auto-resize textarea
- Character count and token estimation
- Keyboard shortcuts (Enter to send, Shift+Enter for newline)
</chat_interface>
<artifacts>
- Artifact detection and rendering in side panel
- Code artifact viewer with syntax highlighting
- HTML/SVG preview with live rendering
- React component preview
- Mermaid diagram rendering
- Text document artifacts
- Artifact editing and re-prompting
- Full-screen artifact view
- Download artifact content
- Artifact versioning and history
</artifacts>
<conversation_management>
- Create new conversations
- Conversation list in sidebar
- Rename conversations
- Delete conversations
- Search conversations by title/content
- Pin important conversations
- Archive conversations
- Conversation folders/organization
- Duplicate conversation
- Export conversation (JSON, Markdown, PDF)
- Conversation timestamps (created, last updated)
- Unread message indicators
</conversation_management>
<projects>
- Create projects to group related conversations
- Project knowledge base (upload documents)
- Project-specific custom instructions
- Share projects with team (mock feature)
- Project settings and configuration
- Move conversations between projects
- Project templates
- Project analytics (usage stats)
</projects>
<model_selection>
- Model selector dropdown with the following models:
- Claude Sonnet 4.5 (claude-sonnet-4-5-20250929) - default
- Claude Haiku 4.5 (claude-haiku-4-5-20251001)
- Claude Opus 4.1 (claude-opus-4-1-20250805)
- Model capabilities display
- Context window indicator
- Model-specific pricing info (display only)
- Switch models mid-conversation
- Model comparison view
</model_selection>
<custom_instructions>
- Global custom instructions
- Project-specific custom instructions
- Conversation-specific system prompts
- Custom instruction templates
- Preview how instructions affect responses
</custom_instructions>
<settings_preferences>
- Theme selection (Light, Dark, Auto)
- Font size adjustment
- Message density (compact, comfortable, spacious)
- Code theme selection
- Language preferences
- Accessibility options
- Keyboard shortcuts reference
- Data export options
- Privacy settings
- API key management
</settings_preferences>
<advanced_features>
- Temperature control slider
- Max tokens adjustment
- Top-p (nucleus sampling) control
- System prompt override
- Thinking/reasoning mode toggle
- Multi-modal input (text + images)
- Voice input (optional, mock UI)
- Response suggestions
- Related prompts
- Conversation branching
</advanced_features>
<collaboration>
- Share conversation via link (read-only)
- Export conversation formats
- Conversation templates
- Prompt library
- Share artifacts
- Team workspaces (mock UI)
</collaboration>
<search_discovery>
- Search across all conversations
- Filter by project, date, model
- Prompt library with categories
- Example conversations
- Quick actions menu
- Command palette (Cmd/Ctrl+K)
</search_discovery>
<usage_tracking>
- Token usage display per message
- Conversation cost estimation
- Daily/monthly usage dashboard
- Usage limits and warnings
- API quota tracking
</usage_tracking>
<onboarding>
- Welcome screen for new users
- Feature tour highlights
- Example prompts to get started
- Quick tips and best practices
- Keyboard shortcuts tutorial
</onboarding>
<accessibility>
- Full keyboard navigation
- Screen reader support
- ARIA labels and roles
- High contrast mode
- Focus management
- Reduced motion support
</accessibility>
<responsive_design>
- Mobile-first responsive layout
- Touch-optimized interface
- Collapsible sidebar on mobile
- Swipe gestures for navigation
- Adaptive artifact display
- Progressive Web App (PWA) support
</responsive_design>
</core_features>
<database_schema>
<tables>
<users>
- id, email, name, avatar_url
- created_at, last_login
- preferences (JSON: theme, font_size, etc.)
- custom_instructions
</users>
<projects>
- id, user_id, name, description, color
- custom_instructions, knowledge_base_path
- created_at, updated_at
- is_archived, is_pinned
</projects>
<conversations>
- id, user_id, project_id, title
- model, created_at, updated_at, last_message_at
- is_archived, is_pinned, is_deleted
- settings (JSON: temperature, max_tokens, etc.)
- token_count, message_count
</conversations>
<messages>
- id, conversation_id, role (user/assistant/system)
- content, created_at, edited_at
- tokens, finish_reason
- images (JSON array of image data)
- parent_message_id (for branching)
</messages>
<artifacts>
- id, message_id, conversation_id
- type (code/html/svg/react/mermaid/text)
- title, identifier, language
- content, version
- created_at, updated_at
</artifacts>
<shared_conversations>
- id, conversation_id, share_token
- created_at, expires_at, view_count
- is_public
</shared_conversations>
<prompt_library>
- id, user_id, title, description
- prompt_template, category, tags (JSON)
- is_public, usage_count
- created_at, updated_at
</prompt_library>
<conversation_folders>
- id, user_id, project_id, name, parent_folder_id
- created_at, position
</conversation_folders>
<conversation_folder_items>
- id, folder_id, conversation_id
</conversation_folder_items>
<usage_tracking>
- id, user_id, conversation_id, message_id
- model, input_tokens, output_tokens
- cost_estimate, created_at
</usage_tracking>
<api_keys>
- id, user_id, key_name, api_key_hash
- created_at, last_used_at
- is_active
</api_keys>
</tables>
</database_schema>
<api_endpoints_summary>
<authentication>
- POST /api/auth/login
- POST /api/auth/logout
- GET /api/auth/me
- PUT /api/auth/profile
</authentication>
<conversations>
- GET /api/conversations
- POST /api/conversations
- GET /api/conversations/:id
- PUT /api/conversations/:id
- DELETE /api/conversations/:id
- POST /api/conversations/:id/duplicate
- POST /api/conversations/:id/export
- PUT /api/conversations/:id/archive
- PUT /api/conversations/:id/pin
- POST /api/conversations/:id/branch
</conversations>
<messages>
- GET /api/conversations/:id/messages
- POST /api/conversations/:id/messages
- PUT /api/messages/:id
- DELETE /api/messages/:id
- POST /api/messages/:id/regenerate
- GET /api/messages/stream (SSE endpoint)
</messages>
<artifacts>
- GET /api/conversations/:id/artifacts
- GET /api/artifacts/:id
- PUT /api/artifacts/:id
- DELETE /api/artifacts/:id
- POST /api/artifacts/:id/fork
- GET /api/artifacts/:id/versions
</artifacts>
<projects>
- GET /api/projects
- POST /api/projects
- GET /api/projects/:id
- PUT /api/projects/:id
- DELETE /api/projects/:id
- POST /api/projects/:id/knowledge
- GET /api/projects/:id/conversations
- PUT /api/projects/:id/settings
</projects>
<sharing>
- POST /api/conversations/:id/share
- GET /api/share/:token
- DELETE /api/share/:token
- PUT /api/share/:token/settings
</sharing>
<prompts>
- GET /api/prompts/library
- POST /api/prompts/library
- GET /api/prompts/:id
- PUT /api/prompts/:id
- DELETE /api/prompts/:id
- GET /api/prompts/categories
- GET /api/prompts/examples
</prompts>
<search>
- GET /api/search/conversations?q=query
- GET /api/search/messages?q=query
- GET /api/search/artifacts?q=query
- GET /api/search/prompts?q=query
</search>
<folders>
- GET /api/folders
- POST /api/folders
- PUT /api/folders/:id
- DELETE /api/folders/:id
- POST /api/folders/:id/items
- DELETE /api/folders/:id/items/:conversationId
</folders>
<usage>
- GET /api/usage/daily
- GET /api/usage/monthly
- GET /api/usage/by-model
- GET /api/usage/conversations/:id
</usage>
<settings>
- GET /api/settings
- PUT /api/settings
- GET /api/settings/custom-instructions
- PUT /api/settings/custom-instructions
</settings>
<claude_api>
- POST /api/claude/chat (proxy to Claude API)
- POST /api/claude/chat/stream (streaming proxy)
- GET /api/claude/models
- POST /api/claude/images/upload
</claude_api>
</api_endpoints_summary>
<ui_layout>
<main_structure>
- Three-column layout: sidebar (conversations), main (chat), panel (artifacts)
- Collapsible sidebar with resize handle
- Responsive breakpoints: mobile (single column), tablet (two column), desktop (three column)
- Persistent header with project/model selector
- Bottom input area with send button and options
</main_structure>
<sidebar_left>
- New chat button (prominent)
- Project selector dropdown
- Search conversations input
- Conversations list (grouped by date: Today, Yesterday, Previous 7 days, etc.)
- Folder tree view (collapsible)
- Settings gear icon at bottom
- User profile at bottom
</sidebar_left>
<main_chat_area>
- Conversation title (editable inline)
- Model selector badge
- Message history (scrollable)
- Welcome screen for new conversations
- Suggested prompts (empty state)
- Input area with formatting toolbar
- Attachment button for images
- Send button with loading state
- Stop generation button
</main_chat_area>
<artifacts_panel>
- Artifact header with title and type badge
- Code editor or preview pane
- Tabs for multiple artifacts
- Full-screen toggle
- Download button
- Edit/Re-prompt button
- Version selector
- Close panel button
</artifacts_panel>
<modals_overlays>
- Settings modal (tabbed interface)
- Share conversation modal
- Export options modal
- Project settings modal
- Prompt library modal
- Command palette overlay
- Keyboard shortcuts reference
</modals_overlays>
</ui_layout>
<design_system>
<color_palette>
- Primary: Orange/amber accent (#CC785C claude-style)
- Background: White (light mode), Dark gray (#1A1A1A dark mode)
- Surface: Light gray (#F5F5F5 light), Darker gray (#2A2A2A dark)
- Text: Near black (#1A1A1A light), Off-white (#E5E5E5 dark)
- Borders: Light gray (#E5E5E5 light), Dark gray (#404040 dark)
- Code blocks: Monaco editor theme
</color_palette>
<typography>
- Sans-serif system font stack (Inter, SF Pro, Roboto, system-ui)
- Headings: font-semibold
- Body: font-normal, leading-relaxed
- Code: Monospace (JetBrains Mono, Consolas, Monaco)
- Message text: text-base (16px), comfortable line-height
</typography>
<components>
<message_bubble>
- User messages: Right-aligned, subtle background
- Assistant messages: Left-aligned, no background
- Markdown formatting with proper spacing
- Inline code with bg-gray-100 background
- Code blocks with syntax highlighting
- Copy button on code blocks
</message_bubble>
<buttons>
- Primary: Orange/amber background, white text, rounded
- Secondary: Border style with hover fill
- Icon buttons: Square with hover background
- Disabled state: Reduced opacity, no pointer events
</buttons>
<inputs>
- Rounded borders with focus ring
- Textarea auto-resize
- Placeholder text in gray
- Error states in red
- Character counter
</inputs>
<cards>
- Subtle border or shadow
- Rounded corners (8px)
- Padding: p-4 to p-6
- Hover state: slight shadow increase
</cards>
</components>
<animations>
- Smooth transitions (150-300ms)
- Fade in for new messages
- Slide in for sidebar
- Typing indicator animation
- Loading spinner for generation
- Skeleton loaders for content
</animations>
</design_system>
<key_interactions>
<message_flow>
1. User types message in input field
2. Optional: Attach images via button
3. Click send or press Enter
4. Message appears in chat immediately
5. Typing indicator shows while waiting
6. Response streams in word by word
7. Code blocks render with syntax highlighting
8. Artifacts detected and rendered in side panel
9. Message complete, enable regenerate option
</message_flow>
<artifact_flow>
1. Assistant generates artifact in response
2. Artifact panel slides in from right
3. Content renders (code with highlighting or live preview)
4. User can edit artifact inline
5. "Re-prompt" button to iterate with Claude
6. Download or copy artifact content
7. Full-screen mode for detailed work
8. Close panel to return to chat focus
</artifact_flow>
<conversation_management>
1. Click "New Chat" to start fresh conversation
2. Conversations auto-save with first message
3. Auto-generate title from first exchange
4. Click title to rename inline
5. Drag conversations into folders
6. Right-click for context menu (pin, archive, delete, export)
7. Search filters conversations in real-time
8. Click conversation to switch context
</conversation_management>
</key_interactions>
<implementation_steps>
<step number="1">
<title>Setup Project Foundation and Database</title>
<tasks>
- Initialize Express server with SQLite database
- Set up Claude API client with streaming support
- Create database schema with migrations
- Implement authentication endpoints
- Set up basic CORS and middleware
- Create health check endpoint
</tasks>
</step>
<step number="2">
<title>Build Core Chat Interface</title>
<tasks>
- Create main layout with sidebar and chat area
- Implement message display with markdown rendering
- Add streaming message support with SSE
- Build input area with auto-resize textarea
- Add code block syntax highlighting
- Implement stop generation functionality
- Add typing indicators and loading states
</tasks>
</step>
<step number="3">
<title>Conversation Management</title>
<tasks>
- Create conversation list in sidebar
- Implement new conversation creation
- Add conversation switching
- Build conversation rename functionality
- Implement delete with confirmation
- Add conversation search
- Create conversation grouping by date
</tasks>
</step>
<step number="4">
<title>Artifacts System</title>
<tasks>
- Build artifact detection from Claude responses
- Create artifact rendering panel
- Implement code artifact viewer
- Add HTML/SVG live preview
- Build artifact editing interface
- Add artifact versioning
- Implement full-screen artifact view
</tasks>
</step>
<step number="5">
<title>Projects and Organization</title>
<tasks>
- Create projects CRUD endpoints
- Build project selector UI
- Implement project-specific custom instructions
- Add folder system for conversations
- Create drag-and-drop organization
- Build project settings panel
</tasks>
</step>
<step number="6">
<title>Advanced Features</title>
<tasks>
- Add model selection dropdown
- Implement temperature and parameter controls
- Build image upload functionality
- Create message editing and regeneration
- Add conversation branching
- Implement export functionality
</tasks>
</step>
<step number="7">
<title>Settings and Customization</title>
<tasks>
- Build settings modal with tabs
- Implement theme switching (light/dark)
- Add custom instructions management
- Create keyboard shortcuts
- Build prompt library
- Add usage tracking dashboard
</tasks>
</step>
<step number="8">
<title>Sharing and Collaboration</title>
<tasks>
- Implement conversation sharing with tokens
- Create public share view
- Add export to multiple formats
- Build prompt templates
- Create example conversations
</tasks>
</step>
<step number="9">
<title>Polish and Optimization</title>
<tasks>
- Optimize for mobile responsiveness
- Add command palette (Cmd+K)
- Implement comprehensive keyboard navigation
- Add onboarding flow
- Create accessibility improvements
- Performance optimization and caching
</tasks>
</step>
</implementation_steps>
<success_criteria>
<functionality>
- Streaming chat responses work smoothly
- Artifact detection and rendering accurate
- Conversation management intuitive and reliable
- Project organization clear and useful
- Image upload and display working
- All CRUD operations functional
</functionality>
<user_experience>
- Interface matches claude.ai design language
- Responsive on all device sizes
- Smooth animations and transitions
- Fast response times and minimal lag
- Intuitive navigation and workflows
- Clear feedback for all actions
</user_experience>
<technical_quality>
- Clean, maintainable code structure
- Proper error handling throughout
- Secure API key management
- Optimized database queries
- Efficient streaming implementation
- Comprehensive testing coverage
</technical_quality>
<design_polish>
- Consistent with claude.ai visual design
- Beautiful typography and spacing
- Smooth animations and micro-interactions
- Excellent contrast and accessibility
- Professional, polished appearance
- Dark mode fully implemented
</design_polish>
</success_criteria>
</project_specification>

View File

@@ -0,0 +1,448 @@
<project_specification>
<project_name>Claude.ai Clone - Multi-Provider Support (Mistral + Extensible)</project_name>
<overview>
This specification adds Mistral AI model support AND creates an extensible provider architecture
that makes it easy to add additional AI providers (OpenAI, Gemini, etc.) in the future.
This uses the "Open/Closed Principle" - open for extension, closed for modification.
All changes are additive and backward-compatible. Existing Claude functionality remains unchanged.
</overview>
<safety_requirements>
<critical>
- DO NOT modify existing Claude API integration code directly
- DO NOT change existing model selection logic for Claude models
- DO NOT modify existing database schema without safe migrations
- DO NOT break existing conversations or messages
- All new code must be in separate files/modules when possible
- Test thoroughly before marking issues as complete
- Maintain backward compatibility at all times
- Refactor Claude code to use BaseProvider WITHOUT changing functionality
</critical>
</safety_requirements>
<architecture_design>
<provider_pattern>
Create an abstract provider interface that all AI providers implement:
- BaseProvider (abstract class/interface) - defines common interface
- ClaudeProvider (existing code refactored to extend BaseProvider)
- MistralProvider (new, extends BaseProvider)
- OpenAIProvider (future, extends BaseProvider - easy to add)
- GeminiProvider (future, extends BaseProvider - easy to add)
</provider_pattern>
<benefits>
- Easy to add new providers without modifying existing code
- Consistent interface across all providers
- Isolated error handling per provider
- Unified model selection UI
- Shared functionality (streaming, error handling, logging)
- Future-proof architecture
</benefits>
</architecture_design>
<new_features>
<feature_provider_architecture>
<title>Extensible Provider Architecture (Foundation)</title>
<description>
Create a provider abstraction layer that allows easy addition of multiple AI providers.
This is the foundation that makes adding OpenAI, Gemini, etc. trivial in the future.
BaseProvider abstract class should define:
- sendMessage(messages, options) -&gt; Promise&lt;response&gt;
- streamMessage(messages, options) -&gt; AsyncGenerator&lt;chunk&gt;
- getModels() -&gt; Promise&lt;array&gt; of available models
- validateApiKey(key) -&gt; Promise&lt;boolean&gt;
- getCapabilities() -&gt; object with provider capabilities
- getName() -&gt; string (provider name: 'claude', 'mistral', 'openai', etc.)
- getDefaultModel() -&gt; string (default model ID for this provider)
ProviderRegistry should:
- Register all available providers
- Provide list of all providers
- Check which providers are configured (have API keys)
- Enable/disable providers
ProviderFactory should:
- Create provider instances based on model ID or provider name
- Handle provider selection logic
- Route requests to correct provider
</description>
<priority>1</priority>
<category>functional</category>
<implementation_approach>
- Create server/providers/BaseProvider.js (abstract base class)
- Refactor existing Claude code to server/providers/ClaudeProvider.js (extends BaseProvider)
- Create server/providers/ProviderRegistry.js (manages all providers)
- Create server/providers/ProviderFactory.js (creates provider instances)
- Update existing routes to use ProviderFactory instead of direct Claude calls
- Keep all provider code in server/providers/ directory
</implementation_approach>
<test_steps>
1. Verify Claude still works after refactoring to use BaseProvider
2. Test that ProviderFactory creates ClaudeProvider correctly
3. Test that ProviderRegistry lists Claude provider
4. Verify error handling works correctly
5. Test that adding a mock provider is straightforward
6. Verify no regression in existing Claude functionality
</test_steps>
</feature_provider_architecture>
<feature_mistral_provider>
<title>Mistral Provider Implementation</title>
<description>
Implement MistralProvider extending BaseProvider. This should:
- Implement all BaseProvider abstract methods
- Handle Mistral-specific API calls (https://api.mistral.ai/v1/chat/completions)
- Support Mistral streaming (Server-Sent Events)
- Handle Mistral-specific error codes and messages
- Provide Mistral model list:
* mistral-large-latest (default)
* mistral-medium-latest
* mistral-small-latest
* mistral-7b-instruct
- Manage Mistral API authentication
- Return responses in unified format (same as Claude)
</description>
<priority>2</priority>
<category>functional</category>
<implementation_approach>
- Create server/providers/MistralProvider.js
- Extend BaseProvider class
- Implement Mistral API integration using fetch or axios
- Register in ProviderRegistry
- Use same response format as ClaudeProvider for consistency
</implementation_approach>
<test_steps>
1. Test MistralProvider.sendMessage() works with valid API key
2. Test MistralProvider.streamMessage() works
3. Test MistralProvider.getModels() returns correct models
4. Test error handling for invalid API key
5. Test error handling for API rate limits
6. Verify it integrates with ProviderFactory
7. Verify responses match expected format
</test_steps>
</feature_mistral_provider>
<feature_unified_model_selector>
<title>Unified Model Selector (All Providers)</title>
<description>
Update model selector to dynamically load models from all registered providers.
The selector should:
- Query all providers for available models via GET /api/models
- Group models by provider (Claude, Mistral, etc.)
- Display provider badges/icons next to model names
- Show which provider each model belongs to
- Filter models by provider (optional toggle)
- Show provider-specific capabilities (streaming, images, etc.)
- Only show models from providers with configured API keys
- Handle providers gracefully (show "Configure API key" if not set)
</description>
<priority>2</priority>
<category>functional</category>
<implementation_approach>
- Create API endpoint: GET /api/models (returns all models from all providers)
- Update frontend ModelSelector component to handle multiple providers
- Add provider grouping/filtering in UI
- Show provider badges/icons next to model names
- Group models by provider with collapsible sections
- Show provider status (configured/not configured)
</implementation_approach>
<test_steps>
1. Verify model selector shows Claude models (existing functionality)
2. Verify model selector shows Mistral models (if key configured)
3. Test grouping by provider works
4. Test filtering by provider works
5. Verify provider badges display correctly
6. Test that providers without API keys show "Configure" message
7. Verify selecting a model works for both providers
</test_steps>
</feature_unified_model_selector>
<feature_provider_settings>
<title>Multi-Provider API Key Management</title>
<description>
Create unified API key management that supports multiple providers. Users should be able to:
- Manage API keys for each provider separately (Claude, Mistral, OpenAI, etc.)
- See which providers are available
- See which providers are configured (have API keys)
- Test each provider's API key independently
- Enable/disable providers (hide models if key not configured)
- See provider status indicators (configured/not configured/error)
- Update or remove API keys for any provider
- See usage statistics per provider
</description>
<priority>2</priority>
<category>functional</category>
<implementation_approach>
- Create server/routes/providers.js with unified provider management
- Update settings UI to show provider cards (one per provider)
- Each provider card has:
* Provider name and logo/icon
* API key input field (masked)
* "Test Connection" button
* Status indicator (green/yellow/red)
* Enable/disable toggle
- Store keys in api_keys table with key_name = 'claude_api_key', 'mistral_api_key', etc.
- Use same encryption method for all providers
</implementation_approach>
<test_steps>
1. Configure Claude API key (verify existing functionality still works)
2. Configure Mistral API key
3. Verify both keys are stored separately
4. Test each provider's "Test Connection" button
5. Remove one key and verify only that provider's models are hidden
6. Verify provider status indicators update correctly
7. Test that disabling a provider hides its models
</test_steps>
</feature_provider_settings>
<feature_database_provider_support>
<title>Database Support for Multiple Providers (Future-Proof)</title>
<description>
Update database schema to support multiple providers in a future-proof way.
This should:
- Add provider field to conversations table (TEXT, default: 'claude')
- Add provider field to messages/usage_tracking (TEXT, default: 'claude')
- Use TEXT field (not ENUM) to allow easy addition of new providers without schema changes
- Migration should be safe, idempotent, and backward compatible
- All existing records default to 'claude' provider
- Add indexes for performance on provider queries
</description>
<priority>1</priority>
<category>functional</category>
<implementation_approach>
- Create migration: server/migrations/add_provider_support.sql
- Use TEXT field (not ENUM) for provider name (allows 'claude', 'mistral', 'openai', etc.)
- Default all existing records to 'claude'
- Add indexes on provider columns for performance
- Make migration idempotent (can run multiple times safely)
- Create rollback script if needed
</implementation_approach>
<test_steps>
1. Backup existing database
2. Run migration script
3. Verify all existing conversations have provider='claude'
4. Verify all existing messages have provider='claude' (via usage_tracking)
5. Create new conversation with Mistral provider
6. Verify provider='mistral' is saved correctly
7. Query conversations by provider (test index performance)
8. Verify existing Claude conversations still work
9. Test rollback script if needed
</test_steps>
</feature_database_provider_support>
<feature_unified_chat_endpoint>
<title>Unified Chat Endpoint (Works with Any Provider)</title>
<description>
Update chat endpoints to use ProviderFactory, making them work with any provider.
The endpoint should:
- Accept provider or model ID in request
- Use ProviderFactory to get correct provider
- Route request to appropriate provider
- Return unified response format
- Handle provider-specific errors gracefully
- Support streaming for all providers that support it
</description>
<priority>1</priority>
<category>functional</category>
<implementation_approach>
- Update POST /api/chat to use ProviderFactory
- Update POST /api/chat/stream to use ProviderFactory
- Extract provider from model ID or accept provider parameter
- Route to correct provider instance
- Return unified response format
</implementation_approach>
<test_steps>
1. Test POST /api/chat with Claude model (verify no regression)
2. Test POST /api/chat with Mistral model
3. Test POST /api/chat/stream with Claude (verify streaming still works)
4. Test POST /api/chat/stream with Mistral
5. Test error handling for invalid provider
6. Test error handling for missing API key
</test_steps>
</feature_unified_chat_endpoint>
</new_features>
<future_extensibility>
<openai_provider_example>
<title>How to Add OpenAI in the Future</title>
<description>
To add OpenAI support later, simply follow these steps (NO changes to existing code needed):
1. Create server/providers/OpenAIProvider.js extending BaseProvider
2. Implement OpenAI API calls (https://api.openai.com/v1/chat/completions)
3. Register in ProviderRegistry: ProviderRegistry.register('openai', OpenAIProvider)
4. That's it! OpenAI models will automatically appear in model selector.
Example OpenAIProvider structure:
- Extends BaseProvider
- Implements sendMessage() using OpenAI API
- Implements streamMessage() for streaming support
- Returns models: gpt-4, gpt-3.5-turbo, etc.
- Handles OpenAI-specific authentication and errors
</description>
</openai_provider_example>
<other_providers>
<note>
Same pattern works for any AI provider:
- Google Gemini (GeminiProvider)
- Cohere (CohereProvider)
- Any other AI API that follows similar patterns
Just create a new Provider class extending BaseProvider and register it.
</note>
</other_providers>
</future_extensibility>
<implementation_notes>
<code_structure>
server/
providers/
BaseProvider.js # Abstract base class (NEW)
ClaudeProvider.js # Refactored Claude (extends BaseProvider)
MistralProvider.js # New Mistral (extends BaseProvider)
ProviderRegistry.js # Manages all providers (NEW)
ProviderFactory.js # Creates provider instances (NEW)
routes/
providers.js # Unified provider management (NEW)
chat.js # Updated to use ProviderFactory
migrations/
add_provider_support.sql # Database migration (NEW)
</code_structure>
<safety_guidelines>
- Refactor Claude code to use BaseProvider WITHOUT changing functionality
- All providers are isolated - errors in one don't affect others
- Database changes are backward compatible (TEXT field, not ENUM)
- Existing conversations default to 'claude' provider
- Test Claude thoroughly after refactoring
- Use feature flags if needed to enable/disable providers
- Log all provider operations separately for debugging
</safety_guidelines>
<error_handling>
- Each provider handles its own errors
- Provider errors should NOT affect other providers
- Show user-friendly error messages
- Log errors with provider context
- Don't throw unhandled exceptions
</error_handling>
</implementation_notes>
<database_changes>
<safe_migrations>
<migration_1>
<description>Add provider support (TEXT field for extensibility)</description>
<sql>
-- Add provider column to conversations (TEXT allows any provider name)
-- Default to 'claude' for backward compatibility
ALTER TABLE conversations
ADD COLUMN provider TEXT DEFAULT 'claude';
-- Add provider column to usage_tracking
ALTER TABLE usage_tracking
ADD COLUMN provider TEXT DEFAULT 'claude';
-- Add indexes for performance
CREATE INDEX IF NOT EXISTS idx_conversations_provider
ON conversations(provider);
CREATE INDEX IF NOT EXISTS idx_usage_tracking_provider
ON usage_tracking(provider);
</sql>
<rollback>
-- Rollback script (use with caution - may cause data issues)
DROP INDEX IF EXISTS idx_conversations_provider;
DROP INDEX IF EXISTS idx_usage_tracking_provider;
-- Note: SQLite doesn't support DROP COLUMN easily
-- Would need to recreate table without provider column
</rollback>
<note>
Using TEXT instead of ENUM allows adding new providers (OpenAI, Gemini, etc.)
without database schema changes in the future. This is future-proof.
</note>
</migration_1>
</safe_migrations>
<data_integrity>
- All existing conversations default to provider='claude'
- All existing messages default to provider='claude'
- Migration is idempotent (can run multiple times safely)
- No data loss during migration
- Existing queries continue to work
</data_integrity>
</database_changes>
<api_endpoints>
<new_endpoints>
- GET /api/models - Get all models from all configured providers
- GET /api/providers - Get list of available providers and their status
- POST /api/providers/:provider/key - Set API key for specific provider
- POST /api/providers/:provider/test - Test provider API key
- GET /api/providers/:provider/status - Get provider configuration status
- DELETE /api/providers/:provider/key - Remove provider API key
</new_endpoints>
<updated_endpoints>
- POST /api/chat - Updated to use ProviderFactory (works with any provider)
* Accepts: { model: 'model-id', messages: [...], ... }
* Provider is determined from model ID or can be specified
- POST /api/chat/stream - Updated to use ProviderFactory (streaming for any provider)
* Same interface, works with any provider that supports streaming
</updated_endpoints>
</api_endpoints>
<dependencies>
<backend>
- No new dependencies required (use native fetch for Mistral API)
- Optional: @mistralai/mistralai (only if provides significant value)
- Keep dependencies minimal to avoid conflicts
</backend>
</dependencies>
<testing_requirements>
<regression_tests>
- Verify all existing Claude functionality still works
- Test that existing conversations load correctly
- Verify Claude model selection still works
- Test Claude API endpoints are unaffected
- Verify database queries for Claude still work
- Test Claude streaming still works
</regression_tests>
<integration_tests>
- Test switching between Claude and Mistral models
- Test conversations with different providers
- Test error handling doesn't affect other providers
- Test migration doesn't break existing data
- Test ProviderFactory routes correctly
- Test unified model selector with multiple providers
</integration_tests>
<extensibility_tests>
- Verify adding a mock provider is straightforward
- Test that ProviderFactory correctly routes to providers
- Verify provider isolation (errors don't propagate)
- Test that new providers automatically appear in UI
</extensibility_tests>
</testing_requirements>
<success_criteria>
<functionality>
- Claude functionality works exactly as before (no regression)
- Mistral models appear in selector and work correctly
- Users can switch between Claude and Mistral seamlessly
- API key management works for both providers
- Database migration is safe and backward compatible
</functionality>
<extensibility>
- Adding a new provider (like OpenAI) requires only creating one new file
- No changes needed to existing code when adding providers
- Provider architecture is documented and easy to follow
- Code is organized and maintainable
</extensibility>
</success_criteria>
</project_specification>

View File

@@ -0,0 +1,403 @@
<project_specification>
<project_name>Claude.ai Clone - Advanced Theme Customization</project_name>
<overview>
This specification adds advanced theme customization features to the Claude.ai clone application.
Users will be able to customize accent colors, font sizes, message spacing, and choose from
preset color themes. All changes are additive and backward-compatible with existing theme functionality.
The existing light/dark mode toggle remains unchanged and functional.
</overview>
<safety_requirements>
<critical>
- DO NOT modify existing light/dark mode functionality
- DO NOT break existing theme persistence
- DO NOT change existing CSS classes without ensuring backward compatibility
- All new theme options must be optional (defaults should match current behavior)
- Test thoroughly to ensure existing themes still work
- Maintain backward compatibility at all times
- New theme preferences should be stored separately from existing theme settings
</critical>
</safety_requirements>
<new_features>
<feature_6_theme_customization>
<title>Advanced Theme Customization</title>
<description>
Add advanced theme customization options. Users should be able to:
- Customize accent colors (beyond just light/dark mode)
- Choose from preset color themes (blue, green, purple, orange)
- Adjust font size globally (small, medium, large)
- Adjust message spacing (compact, comfortable, spacious)
- Preview theme changes before applying
- Save custom theme preferences
The customization interface should be intuitive and provide real-time preview
of changes before they are applied. All preferences should persist across sessions.
</description>
<priority>3</priority>
<category>style</category>
<implementation_approach>
- Create a new "Appearance" or "Theme" section in settings
- Add accent color picker with preset options (blue, green, purple, orange)
- Add font size slider/selector (small, medium, large)
- Add message spacing selector (compact, comfortable, spacious)
- Implement preview functionality that shows changes in real-time
- Store theme preferences in localStorage or backend (user preferences)
- Apply theme using CSS custom properties (CSS variables)
- Ensure theme works with both light and dark modes
</implementation_approach>
<test_steps>
1. Open settings menu
2. Navigate to "Appearance" or "Theme" section
3. Select a different accent color (e.g., green)
4. Verify accent color changes are visible in preview
5. Adjust font size slider to "large"
6. Verify font size changes in preview
7. Adjust message spacing option to "spacious"
8. Verify spacing changes in preview
9. Click "Preview" to see changes applied temporarily
10. Click "Apply" to save changes permanently
11. Verify changes persist after page refresh
12. Test with both light and dark mode
13. Test reset to default theme
14. Verify existing conversations display correctly with new theme
</test_steps>
</feature_6_theme_customization>
<feature_accent_colors>
<title>Accent Color Customization</title>
<description>
Allow users to customize the accent color used throughout the application.
This includes:
- Primary button colors
- Link colors
- Focus states
- Active states
- Selection highlights
- Progress indicators
Preset options:
- Blue (default, matches Claude.ai)
- Green
- Purple
- Orange
Users should be able to see a preview of each color before applying.
</description>
<priority>3</priority>
<category>style</category>
<implementation_approach>
- Define accent colors as CSS custom properties
- Create color palette for each preset (light and dark variants)
- Add color picker UI component in settings
- Update all accent color usages to use CSS variables
- Ensure colors have proper contrast ratios for accessibility
- Store selected accent color in user preferences
</implementation_approach>
<test_steps>
1. Open theme settings
2. Select "Green" accent color
3. Verify buttons, links, and highlights use green
4. Switch to dark mode and verify green accent still works
5. Test all preset colors (blue, green, purple, orange)
6. Verify color persists after refresh
7. Test accessibility (contrast ratios)
</test_steps>
</feature_accent_colors>
<feature_font_size>
<title>Global Font Size Adjustment</title>
<description>
Allow users to adjust the global font size for better readability.
Options:
- Small (12px base)
- Medium (14px base, default)
- Large (16px base)
Font size should scale proportionally across all text elements:
- Message text
- UI labels
- Input fields
- Buttons
- Sidebar text
</description>
<priority>3</priority>
<category>style</category>
<implementation_approach>
- Use CSS rem units for all font sizes
- Set base font size on root element
- Create font size presets (small, medium, large)
- Add font size selector in settings
- Store preference in user settings
- Ensure responsive design still works with different font sizes
</implementation_approach>
<test_steps>
1. Open theme settings
2. Select "Small" font size
3. Verify all text is smaller throughout the app
4. Select "Large" font size
5. Verify all text is larger throughout the app
6. Verify layout doesn't break with different font sizes
7. Test with long messages to ensure wrapping works
8. Verify preference persists after refresh
</test_steps>
</feature_font_size>
<feature_message_spacing>
<title>Message Spacing Customization</title>
<description>
Allow users to adjust the spacing between messages and within message bubbles.
Options:
- Compact: Minimal spacing (for users who prefer dense layouts)
- Comfortable: Default spacing (current behavior)
- Spacious: Increased spacing (for better readability)
This affects:
- Vertical spacing between messages
- Padding within message bubbles
- Spacing between message elements (avatar, text, timestamp)
</description>
<priority>3</priority>
<category>style</category>
<implementation_approach>
- Define spacing scale using CSS custom properties
- Create spacing presets (compact, comfortable, spacious)
- Apply spacing to message containers and bubbles
- Add spacing selector in settings
- Store preference in user settings
- Ensure spacing works well with different font sizes
</implementation_approach>
<test_steps>
1. Open theme settings
2. Select "Compact" spacing
3. Verify messages are closer together
4. Select "Spacious" spacing
5. Verify messages have more space between them
6. Test with long conversations to ensure scrolling works
7. Verify spacing preference persists after refresh
8. Test with different font sizes to ensure compatibility
</test_steps>
</feature_message_spacing>
<feature_theme_preview>
<title>Theme Preview Functionality</title>
<description>
Allow users to preview theme changes before applying them permanently.
The preview should:
- Show a sample conversation with the new theme applied
- Update in real-time as settings are changed
- Allow users to cancel and revert to previous theme
- Show both light and dark mode previews if applicable
Users should be able to:
- See preview immediately when changing settings
- Click "Apply" to save changes
- Click "Cancel" to discard changes
- Click "Reset" to return to default theme
</description>
<priority>3</priority>
<category>functional</category>
<implementation_approach>
- Create preview component showing sample conversation
- Apply theme changes temporarily to preview
- Store original theme state for cancel functionality
- Update preview in real-time as settings change
- Only persist changes when "Apply" is clicked
- Show clear visual feedback for preview vs. applied state
</implementation_approach>
<test_steps>
1. Open theme settings
2. Change accent color to green
3. Verify preview updates immediately
4. Change font size to large
5. Verify preview updates with new font size
6. Click "Cancel" and verify changes are reverted
7. Make changes again and click "Apply"
8. Verify changes are saved and applied to actual interface
9. Test preview with both light and dark mode
</test_steps>
</feature_theme_preview>
</new_features>
<implementation_notes>
<code_structure>
frontend/
components/
ThemeSettings.jsx # New theme customization UI (NEW)
ThemePreview.jsx # Preview component (NEW)
styles/
theme-variables.css # CSS custom properties for themes (NEW)
accent-colors.css # Accent color definitions (NEW)
hooks/
useTheme.js # Updated to handle new theme options
utils/
themeStorage.js # Theme preference persistence (NEW)
</code_structure>
<css_architecture>
Use CSS custom properties (CSS variables) for all theme values:
- --accent-color-primary
- --accent-color-hover
- --font-size-base
- --message-spacing-vertical
- --message-padding
This allows easy theme switching without JavaScript manipulation.
</css_architecture>
<storage_approach>
Store theme preferences in:
- localStorage for client-side persistence
- Or backend user preferences table if available
Structure:
{
accentColor: 'blue' | 'green' | 'purple' | 'orange',
fontSize: 'small' | 'medium' | 'large',
messageSpacing: 'compact' | 'comfortable' | 'spacious',
theme: 'light' | 'dark' (existing)
}
</storage_approach>
<safety_guidelines>
- Keep existing theme functionality intact
- Default values should match current behavior
- Use feature detection for new theme features
- Gracefully degrade if CSS custom properties not supported
- Test with existing conversations and UI elements
- Ensure accessibility standards are maintained
</safety_guidelines>
</implementation_notes>
<ui_components>
<theme_settings_panel>
<description>Settings panel for theme customization</description>
<sections>
- Accent Color: Radio buttons or color swatches for preset colors
- Font Size: Slider or dropdown (small, medium, large)
- Message Spacing: Radio buttons (compact, comfortable, spacious)
- Preview: Live preview of theme changes
- Actions: Apply, Cancel, Reset buttons
</sections>
</theme_settings_panel>
<theme_preview>
<description>Preview component showing sample conversation</description>
<elements>
- Sample user message
- Sample AI response
- Shows current accent color
- Shows current font size
- Shows current spacing
- Updates in real-time
</elements>
</theme_preview>
</ui_components>
<css_custom_properties>
<accent_colors>
Define CSS variables for each accent color preset:
--accent-blue: #2563eb;
--accent-green: #10b981;
--accent-purple: #8b5cf6;
--accent-orange: #f59e0b;
Each should have hover, active, and focus variants.
</accent_colors>
<font_sizes>
Define base font sizes:
--font-size-small: 0.75rem; (12px)
--font-size-medium: 0.875rem; (14px, default)
--font-size-large: 1rem; (16px)
</font_sizes>
<spacing>
Define spacing scales:
--spacing-compact: 0.5rem;
--spacing-comfortable: 1rem; (default)
--spacing-spacious: 1.5rem;
</spacing>
</css_custom_properties>
<api_endpoints>
<if_backend_storage>
If storing preferences in backend:
- GET /api/user/preferences - Get user theme preferences
- PUT /api/user/preferences - Update user theme preferences
- GET /api/user/preferences/theme - Get theme preferences only
</if_backend_storage>
<note>
If using localStorage only, no API endpoints needed.
Backend storage is optional but recommended for multi-device sync.
</note>
</api_endpoints>
<accessibility_requirements>
- All accent colors must meet WCAG AA contrast ratios (4.5:1 for text)
- Font size changes must not break screen reader compatibility
- Theme settings must be keyboard navigable
- Color choices should not be the only way to convey information
- Provide high contrast mode option if possible
</accessibility_requirements>
<testing_requirements>
<regression_tests>
- Verify existing light/dark mode toggle still works
- Verify existing theme persistence still works
- Test that default theme matches current behavior
- Verify existing conversations display correctly
- Test that all UI elements are styled correctly
</regression_tests>
<feature_tests>
- Test each accent color preset
- Test each font size option
- Test each spacing option
- Test theme preview functionality
- Test theme persistence (localStorage/backend)
- Test theme reset to defaults
- Test theme with both light and dark modes
- Test theme changes in real-time
</feature_tests>
<compatibility_tests>
- Test with different browsers (Chrome, Firefox, Safari, Edge)
- Test with different screen sizes (responsive design)
- Test with long conversations
- Test with different message types (text, code, artifacts)
- Test accessibility with screen readers
</compatibility_tests>
</testing_requirements>
<success_criteria>
<functionality>
- Users can customize accent colors from preset options
- Users can adjust global font size (small, medium, large)
- Users can adjust message spacing (compact, comfortable, spacious)
- Theme preview shows changes in real-time
- Theme preferences persist across sessions
- Existing light/dark mode functionality works unchanged
- All theme options work together harmoniously
</functionality>
<user_experience>
- Theme customization is intuitive and easy to use
- Preview provides clear feedback before applying changes
- Changes apply smoothly without flickering
- Settings are easy to find and access
- Reset to defaults is easily accessible
</user_experience>
<technical>
- Code is well-organized and maintainable
- CSS custom properties are used consistently
- Theme preferences are stored reliably
- No performance degradation with theme changes
- Backward compatibility is maintained
</technical>
</success_criteria>
</project_specification>

304
prompts/coding_prompt.md Normal file
View File

@@ -0,0 +1,304 @@
## YOUR ROLE - CODING AGENT
You are continuing work on a long-running autonomous development task.
This is a FRESH context window - you have no memory of previous sessions.
You have access to Linear for project management via MCP tools. Linear is your
single source of truth for what needs to be built and what's been completed.
### STEP 1: GET YOUR BEARINGS (MANDATORY)
Start by orienting yourself:
```bash
# 1. See your working directory
pwd
# 2. List files to understand project structure
ls -la
# 3. Read the project specification to understand what you're building
cat app_spec.txt
# 4. Read the Linear project state
cat .linear_project.json
# 5. Check recent git history
git log --oneline -20
```
Understanding the `app_spec.txt` is critical - it contains the full requirements
for the application you're building.
### STEP 2: CHECK LINEAR STATUS
Query Linear to understand current project state. The `.linear_project.json` file
contains the `project_id` and `team_id` you should use for all Linear queries.
1. **Find the META issue** for session context:
Use `mcp__linear__list_issues` with the project ID from `.linear_project.json`
and search for "[META] Project Progress Tracker".
Read the issue description and recent comments for context from previous sessions.
2. **Count progress:**
Use `mcp__linear__list_issues` with the project ID to get all issues, then count:
- Issues with status "Done" = completed
- Issues with status "Todo" = remaining
- Issues with status "In Progress" = currently being worked on
3. **Check for in-progress work:**
If any issue is "In Progress", that should be your first priority.
A previous session may have been interrupted.
### STEP 3: START SERVERS (IF NOT RUNNING)
If `init.sh` exists, run it:
```bash
chmod +x init.sh
./init.sh
```
Otherwise, start servers manually and document the process.
### STEP 4: VERIFICATION TEST (CRITICAL!)
**MANDATORY BEFORE NEW WORK:**
The previous session may have introduced bugs. Before implementing anything
new, you MUST run verification tests.
Use `mcp__linear__list_issues` with the project ID and status "Done" to find 1-2
completed features that are core to the app's functionality.
Test these through the browser using Puppeteer:
- Navigate to the feature
- Verify it still works as expected
- Take screenshots to confirm
**If you find ANY issues (functional or visual):**
- Use `mcp__linear__update_issue` to set status back to "In Progress"
- Add a comment explaining what broke
- Fix the issue BEFORE moving to new features
- This includes UI bugs like:
* White-on-white text or poor contrast
* Random characters displayed
* Incorrect timestamps
* Layout issues or overflow
* Buttons too close together
* Missing hover states
* Console errors
### STEP 5: SELECT NEXT ISSUE TO WORK ON
Use `mcp__linear__list_issues` with the project ID from `.linear_project.json`:
- Filter by `status`: "Todo"
- Sort by priority (1=urgent is highest)
- `limit`: 5
Review the highest-priority unstarted issues and select ONE to work on.
### STEP 6: CLAIM THE ISSUE
Before starting work, use `mcp__linear__update_issue` to:
- Set the issue's `status` to "In Progress"
This signals to any other agents (or humans watching) that this issue is being worked on.
### STEP 7: IMPLEMENT THE FEATURE
Read the issue description for test steps and implement accordingly:
1. Write the code (frontend and/or backend as needed)
2. Test manually using browser automation (see Step 8)
3. Fix any issues discovered
4. Verify the feature works end-to-end
### STEP 8: VERIFY WITH BROWSER AUTOMATION
**CRITICAL:** You MUST verify features through the actual UI.
Use browser automation tools:
- `mcp__puppeteer__puppeteer_navigate` - Start browser and go to URL
- `mcp__puppeteer__puppeteer_screenshot` - Capture screenshot
- `mcp__puppeteer__puppeteer_click` - Click elements
- `mcp__puppeteer__puppeteer_fill` - Fill form inputs
**DO:**
- Test through the UI with clicks and keyboard input
- Take screenshots to verify visual appearance
- Check for console errors in browser
- Verify complete user workflows end-to-end
**DON'T:**
- Only test with curl commands (backend testing alone is insufficient)
- Use JavaScript evaluation to bypass UI (no shortcuts)
- Skip visual verification
- Mark issues Done without thorough verification
### STEP 9: UPDATE LINEAR ISSUE (CAREFULLY!)
After thorough verification:
1. **Add implementation comment** using `mcp__linear__create_comment`:
```markdown
## Implementation Complete
### Changes Made
- [List of files changed]
- [Key implementation details]
### Verification
- Tested via Puppeteer browser automation
- Screenshots captured
- All test steps from issue description verified
### Git Commit
[commit hash and message]
```
2. **Update status** using `mcp__linear__update_issue`:
- Set `status` to "Done"
**ONLY update status to Done AFTER:**
- All test steps in the issue description pass
- Visual verification via screenshots
- No console errors
- Code committed to git
### STEP 10: COMMIT YOUR PROGRESS
Make a descriptive git commit:
```bash
git add .
git commit -m "Implement [feature name]
- Added [specific changes]
- Tested with browser automation
- Linear issue: [issue identifier]
"
```
### STEP 11: UPDATE META ISSUE
Add a comment to the "[META] Project Progress Tracker" issue with session summary:
```markdown
## Session Complete - [Brief description]
### Completed This Session
- [Issue title]: [Brief summary of implementation]
### Current Progress
- X issues Done
- Y issues In Progress
- Z issues remaining in Todo
### Verification Status
- Ran verification tests on [feature names]
- All previously completed features still working: [Yes/No]
### Notes for Next Session
- [Any important context]
- [Recommendations for what to work on next]
- [Any blockers or concerns]
```
### STEP 12: END SESSION CLEANLY
Before context fills up:
1. Commit all working code
2. If working on an issue you can't complete:
- Add a comment explaining progress and what's left
- Keep status as "In Progress" (don't revert to Todo)
3. Update META issue with session summary
4. Ensure no uncommitted changes
5. Leave app in working state (no broken features)
---
## LINEAR WORKFLOW RULES
**Status Transitions:**
- Todo → In Progress (when you start working)
- In Progress → Done (when verified complete)
- Done → In Progress (only if regression found)
**Comments Are Your Memory:**
- Every implementation gets a detailed comment
- Session handoffs happen via META issue comments
- Comments are permanent - future agents will read them
**NEVER:**
- Delete or archive issues
- Modify issue descriptions or test steps
- Work on issues already "In Progress" by someone else
- Mark "Done" without verification
- Leave issues "In Progress" when switching to another issue
---
## TESTING REQUIREMENTS
**ALL testing must use browser automation tools.**
Available Puppeteer tools:
- `mcp__puppeteer__puppeteer_navigate` - Go to URL
- `mcp__puppeteer__puppeteer_screenshot` - Capture screenshot
- `mcp__puppeteer__puppeteer_click` - Click elements
- `mcp__puppeteer__puppeteer_fill` - Fill form inputs
- `mcp__puppeteer__puppeteer_select` - Select dropdown options
- `mcp__puppeteer__puppeteer_hover` - Hover over elements
Test like a human user with mouse and keyboard. Don't take shortcuts.
---
## SESSION PACING
**How many issues should you complete per session?**
This depends on the project phase:
**Early phase (< 20% Done):** You may complete multiple issues per session when:
- Setting up infrastructure/scaffolding that unlocks many issues at once
- Fixing build issues that were blocking progress
- Auditing existing code and marking already-implemented features as Done
**Mid/Late phase (> 20% Done):** Slow down to **1-2 issues per session**:
- Each feature now requires focused implementation and testing
- Quality matters more than quantity
- Clean handoffs are critical
**After completing an issue, ask yourself:**
1. Is the app in a stable, working state right now?
2. Have I been working for a while? (You can't measure this precisely, but use judgment)
3. Would this be a good stopping point for handoff?
If yes to all three → proceed to Step 11 (session summary) and end cleanly.
If no → you may continue to the next issue, but **commit first** and stay aware.
**Golden rule:** It's always better to end a session cleanly with good handoff notes
than to start another issue and risk running out of context mid-implementation.
---
## IMPORTANT REMINDERS
**Your Goal:** Production-quality application with all Linear issues Done
**This Session's Goal:** Make meaningful progress with clean handoff
**Priority:** Fix regressions before implementing new features
**Quality Bar:**
- Zero console errors
- Polished UI matching the design in app_spec.txt
- All features work end-to-end through the UI
- Fast, responsive, professional
**Context is finite.** You cannot monitor your context usage, so err on the side
of ending sessions early with good handoff notes. The next agent will continue.
---
Begin by running Step 1 (Get Your Bearings).

View File

@@ -0,0 +1,187 @@
## YOUR ROLE - INITIALIZER BIS AGENT (Adding New Specifications)
You are an EXTENSION agent in a long-running autonomous development process.
Your job is to ADD NEW SPECIFICATIONS to an EXISTING project that has already been initialized.
**IMPORTANT:** This project already exists and has been initialized. You are NOT creating a new project.
You are ADDING new features based on a new specification file.
You have access to Linear for project management via MCP tools. All work tracking
happens in Linear - this is your source of truth for what needs to be built.
### FIRST: Understand the Existing Project
Start by reading the existing project state:
1. **Read `.linear_project.json`:**
```bash
cat .linear_project.json
```
This file contains:
- `project_id`: The Linear project ID (you'll use this for new issues)
- `team_id`: The team ID (you'll use this for new issues)
- `meta_issue_id`: The META issue ID (you'll add a comment here)
- `total_issues`: Current total number of issues
2. **Read the original `app_spec.txt`** (if it exists) to understand what was already built:
```bash
cat app_spec.txt
```
3. **Check existing Linear issues** to understand what's already been done:
Use `mcp__linear__list_issues` with the project ID from `.linear_project.json`
to see existing issues and their statuses.
### SECOND: Read the New Specification File
Read the NEW specification file that was provided. This file contains the ADDITIONAL
features to be added to the existing project. The filename will be something like
`app_spec_new1.txt` or similar.
```bash
# List files to find the new spec file
ls -la *.txt
# Read the new specification file
cat app_spec_new*.txt
# (or whatever the filename is)
```
Read it carefully to understand what NEW features need to be added.
### CRITICAL TASK: Create NEW Linear Issues
Based on the NEW specification file, create NEW Linear issues for each NEW feature
using the `mcp__linear__create_issue` tool.
**IMPORTANT:**
- Use the EXISTING `project_id` and `team_id` from `.linear_project.json`
- Do NOT create a new Linear project
- Do NOT modify existing issues
- Only create NEW issues for the NEW features
**For each NEW feature, create an issue with:**
```
title: Brief feature name (e.g., "New Feature - Advanced search")
teamId: [Use the team ID from .linear_project.json]
projectId: [Use the project ID from .linear_project.json]
description: Markdown with feature details and test steps (see template below)
priority: 1-4 based on importance (1=urgent/foundational, 4=low/polish)
```
**Issue Description Template:**
```markdown
## Feature Description
[Brief description of what this NEW feature does and why it matters]
## Category
[functional OR style]
## Test Steps
1. Navigate to [page/location]
2. [Specific action to perform]
3. [Another action]
4. Verify [expected result]
5. [Additional verification steps as needed]
## Acceptance Criteria
- [ ] [Specific criterion 1]
- [ ] [Specific criterion 2]
- [ ] [Specific criterion 3]
## Note
This is a NEW feature added via initializer bis. It extends the existing application.
```
**Requirements for NEW Linear Issues:**
- Create issues ONLY for NEW features from the new spec file
- Do NOT duplicate features that already exist
- Mix of functional and style features (note category in description)
- Order by priority: foundational features get priority 1-2, polish features get 3-4
- Include detailed test steps in each issue description
- All issues start in "Todo" status (default)
- Prefix issue titles with something like "[NEW]" if helpful to distinguish from existing issues
**Priority Guidelines:**
- Priority 1 (Urgent): Core infrastructure additions, critical new features
- Priority 2 (High): Important user-facing new features
- Priority 3 (Medium): Secondary new features, enhancements
- Priority 4 (Low): Polish, nice-to-haves, edge cases
**CRITICAL INSTRUCTION:**
Once created, issues can ONLY have their status changed (Todo → In Progress → Done).
Never delete issues, never modify descriptions after creation.
This ensures no functionality is missed across sessions.
### NEXT TASK: Update Linear Project State
Update the `.linear_project.json` file to reflect the new total number of issues:
1. Read the current `.linear_project.json`
2. Count how many NEW issues you created
3. Add that number to the existing `total_issues` count
4. Update the file with the new total
Example update:
```json
{
"initialized": true,
"created_at": "[original timestamp]",
"team_id": "[existing team ID]",
"project_id": "[existing project ID]",
"project_name": "[existing project name]",
"meta_issue_id": "[existing meta issue ID]",
"total_issues": [original_count + new_issues_count],
"notes": "Project initialized by initializer agent. Extended by initializer bis with [X] new issues."
}
```
### NEXT TASK: Update META Issue
Add a comment to the existing "[META] Project Progress Tracker" issue (use the `meta_issue_id`
from `.linear_project.json`) summarizing what you accomplished:
```markdown
## Initializer Bis Session Complete - New Specifications Added
### Accomplished
- Read new specification file: [filename]
- Created [X] NEW Linear issues for additional features
- Updated .linear_project.json with new total issue count
- [Any other relevant information]
### New Issues Created
- Total new issues: [X]
- Priority 1: [X]
- Priority 2: [X]
- Priority 3: [X]
- Priority 4: [X]
### Updated Linear Status
- Previous total issues: [Y]
- New total issues: [Y + X]
- All new issues start in "Todo" status
### Notes for Next Session
- [Any important context about the new features]
- [Recommendations for what to work on next]
- [Any dependencies or integration points with existing features]
```
### ENDING THIS SESSION
Before your context fills up:
1. Commit all work with descriptive messages
2. Add a comment to the META issue (as described above)
3. Ensure `.linear_project.json` is updated with the new total
4. Leave the environment in a clean, working state
The next agent (coding agent) will continue from here with a fresh context window and will
see both the original issues and the new issues you created.
---
**Remember:** You are EXTENDING an existing project, not creating a new one.
Focus on adding the new features cleanly without breaking existing functionality.
Production-ready integration is the goal.

View File

@@ -0,0 +1,202 @@
## YOUR ROLE - INITIALIZER AGENT (Session 1 of Many)
You are the FIRST agent in a long-running autonomous development process.
Your job is to set up the foundation for all future coding agents.
You have access to Linear for project management via MCP tools. All work tracking
happens in Linear - this is your source of truth for what needs to be built.
### FIRST: Read the Project Specification
Start by reading `app_spec.txt` in your working directory. This file contains
the complete specification for what you need to build. Read it carefully
before proceeding.
### SECOND: Set Up Linear Project
Before creating issues, you need to set up Linear:
1. **Get the team ID:**
Use `mcp__linear__list_teams` to see available teams.
Note the team ID (e.g., "TEAM-123") for the team where you'll create issues.
2. **Create a Linear project:**
Use `mcp__linear__create_project` to create a new project:
- `name`: Use the project name from app_spec.txt (e.g., "Claude.ai Clone")
- `teamIds`: Array with your team ID
- `description`: Brief project overview from app_spec.txt
Save the returned project ID - you'll use it when creating issues.
### CRITICAL TASK: Create Linear Issues
Based on `app_spec.txt`, create Linear issues for each feature using the
`mcp__linear__create_issue` tool. Create 50 detailed issues that
comprehensively cover all features in the spec.
**For each feature, create an issue with:**
```
title: Brief feature name (e.g., "Auth - User login flow")
teamId: [Use the team ID you found earlier]
projectId: [Use the project ID from the project you created]
description: Markdown with feature details and test steps (see template below)
priority: 1-4 based on importance (1=urgent/foundational, 4=low/polish)
```
**Issue Description Template:**
```markdown
## Feature Description
[Brief description of what this feature does and why it matters]
## Category
[functional OR style]
## Test Steps
1. Navigate to [page/location]
2. [Specific action to perform]
3. [Another action]
4. Verify [expected result]
5. [Additional verification steps as needed]
## Acceptance Criteria
- [ ] [Specific criterion 1]
- [ ] [Specific criterion 2]
- [ ] [Specific criterion 3]
```
**Requirements for Linear Issues:**
- Create 50 issues total covering all features in the spec
- Mix of functional and style features (note category in description)
- Order by priority: foundational features get priority 1-2, polish features get 3-4
- Include detailed test steps in each issue description
- All issues start in "Todo" status (default)
**Priority Guidelines:**
- Priority 1 (Urgent): Core infrastructure, database, basic UI layout
- Priority 2 (High): Primary user-facing features, authentication
- Priority 3 (Medium): Secondary features, enhancements
- Priority 4 (Low): Polish, nice-to-haves, edge cases
**CRITICAL INSTRUCTION:**
Once created, issues can ONLY have their status changed (Todo → In Progress → Done).
Never delete issues, never modify descriptions after creation.
This ensures no functionality is missed across sessions.
### NEXT TASK: Create Meta Issue for Session Tracking
Create a special issue titled "[META] Project Progress Tracker" with:
```markdown
## Project Overview
[Copy the project name and brief overview from app_spec.txt]
## Session Tracking
This issue is used for session handoff between coding agents.
Each agent should add a comment summarizing their session.
## Key Milestones
- [ ] Project setup complete
- [ ] Core infrastructure working
- [ ] Primary features implemented
- [ ] All features complete
- [ ] Polish and refinement done
## Notes
[Any important context about the project]
```
This META issue will be used by all future agents to:
- Read context from previous sessions (via comments)
- Write session summaries before ending
- Track overall project milestones
### NEXT TASK: Create init.sh
Create a script called `init.sh` that future agents can use to quickly
set up and run the development environment. The script should:
1. Install any required dependencies
2. Start any necessary servers or services
3. Print helpful information about how to access the running application
Base the script on the technology stack specified in `app_spec.txt`.
### NEXT TASK: Initialize Git
Create a git repository and make your first commit with:
- init.sh (environment setup script)
- README.md (project overview and setup instructions)
- Any initial project structure files
Commit message: "Initial setup: project structure and init script"
### NEXT TASK: Create Project Structure
Set up the basic project structure based on what's specified in `app_spec.txt`.
This typically includes directories for frontend, backend, and any other
components mentioned in the spec.
### NEXT TASK: Save Linear Project State
Create a file called `.linear_project.json` with the following information:
```json
{
"initialized": true,
"created_at": "[current timestamp]",
"team_id": "[ID of the team you used]",
"project_id": "[ID of the Linear project you created]",
"project_name": "[Name of the project from app_spec.txt]",
"meta_issue_id": "[ID of the META issue you created]",
"total_issues": 50,
"notes": "Project initialized by initializer agent"
}
```
This file tells future sessions that Linear has been set up.
### OPTIONAL: Start Implementation
If you have time remaining in this session, you may begin implementing
the highest-priority features. Remember:
- Use `mcp__linear__linear_search_issues` to find Todo issues with priority 1
- Use `mcp__linear__linear_update_issue` to set status to "In Progress"
- Work on ONE feature at a time
- Test thoroughly before marking status as "Done"
- Add a comment to the issue with implementation notes
- Commit your progress before session ends
### ENDING THIS SESSION
Before your context fills up:
1. Commit all work with descriptive messages
2. Add a comment to the META issue summarizing what you accomplished:
```markdown
## Session 1 Complete - Initialization
### Accomplished
- Created 50 Linear issues from app_spec.txt
- Set up project structure
- Created init.sh
- Initialized git repository
- [Any features started/completed]
### Linear Status
- Total issues: 50
- Done: X
- In Progress: Y
- Todo: Z
### Notes for Next Session
- [Any important context]
- [Recommendations for what to work on next]
```
3. Ensure `.linear_project.json` exists
4. Leave the environment in a clean, working state
The next agent will continue from here with a fresh context window.
---
**Remember:** You have unlimited time across many sessions. Focus on
quality over speed. Production-ready is the goal.

1
requirements.txt Normal file
View File

@@ -0,0 +1 @@
claude-code-sdk>=0.0.25

359
security.py Normal file
View File

@@ -0,0 +1,359 @@
"""
Security Hooks for Autonomous Coding Agent
==========================================
Pre-tool-use hooks that validate bash commands for security.
Uses an allowlist approach - only explicitly permitted commands can run.
"""
import os
import shlex
# Allowed commands for development tasks
# Minimal set needed for the autonomous coding demo
ALLOWED_COMMANDS = {
# File inspection
"ls",
"cat",
"head",
"tail",
"wc",
"grep",
# File operations (agent uses SDK tools for most file ops, but cp/mkdir needed occasionally)
"cp",
"mkdir",
"chmod", # For making scripts executable; validated separately
# Directory
"pwd",
# Node.js development
"npm",
"node",
# Version control
"git",
# Process management
"ps",
"lsof",
"sleep",
"pkill", # For killing dev servers; validated separately
# Script execution
"init.sh", # Init scripts; validated separately
}
# Commands that need additional validation even when in the allowlist
COMMANDS_NEEDING_EXTRA_VALIDATION = {"pkill", "chmod", "init.sh"}
def split_command_segments(command_string: str) -> list[str]:
"""
Split a compound command into individual command segments.
Handles command chaining (&&, ||, ;) but not pipes (those are single commands).
Args:
command_string: The full shell command
Returns:
List of individual command segments
"""
import re
# Split on && and || while preserving the ability to handle each segment
# This regex splits on && or || that aren't inside quotes
segments = re.split(r"\s*(?:&&|\|\|)\s*", command_string)
# Further split on semicolons
result = []
for segment in segments:
sub_segments = re.split(r'(?<!["\'])\s*;\s*(?!["\'])', segment)
for sub in sub_segments:
sub = sub.strip()
if sub:
result.append(sub)
return result
def extract_commands(command_string: str) -> list[str]:
"""
Extract command names from a shell command string.
Handles pipes, command chaining (&&, ||, ;), and subshells.
Returns the base command names (without paths).
Args:
command_string: The full shell command
Returns:
List of command names found in the string
"""
commands = []
# shlex doesn't treat ; as a separator, so we need to pre-process
import re
# Split on semicolons that aren't inside quotes (simple heuristic)
# This handles common cases like "echo hello; ls"
segments = re.split(r'(?<!["\'])\s*;\s*(?!["\'])', command_string)
for segment in segments:
segment = segment.strip()
if not segment:
continue
try:
tokens = shlex.split(segment)
except ValueError:
# Malformed command (unclosed quotes, etc.)
# Return empty to trigger block (fail-safe)
return []
if not tokens:
continue
# Track when we expect a command vs arguments
expect_command = True
for token in tokens:
# Shell operators indicate a new command follows
if token in ("|", "||", "&&", "&"):
expect_command = True
continue
# Skip shell keywords that precede commands
if token in (
"if",
"then",
"else",
"elif",
"fi",
"for",
"while",
"until",
"do",
"done",
"case",
"esac",
"in",
"!",
"{",
"}",
):
continue
# Skip flags/options
if token.startswith("-"):
continue
# Skip variable assignments (VAR=value)
if "=" in token and not token.startswith("="):
continue
if expect_command:
# Extract the base command name (handle paths like /usr/bin/python)
cmd = os.path.basename(token)
commands.append(cmd)
expect_command = False
return commands
def validate_pkill_command(command_string: str) -> tuple[bool, str]:
"""
Validate pkill commands - only allow killing dev-related processes.
Uses shlex to parse the command, avoiding regex bypass vulnerabilities.
Returns:
Tuple of (is_allowed, reason_if_blocked)
"""
# Allowed process names for pkill
allowed_process_names = {
"node",
"npm",
"npx",
"vite",
"next",
}
try:
tokens = shlex.split(command_string)
except ValueError:
return False, "Could not parse pkill command"
if not tokens:
return False, "Empty pkill command"
# Separate flags from arguments
args = []
for token in tokens[1:]:
if not token.startswith("-"):
args.append(token)
if not args:
return False, "pkill requires a process name"
# The target is typically the last non-flag argument
target = args[-1]
# For -f flag (full command line match), extract the first word as process name
# e.g., "pkill -f 'node server.js'" -> target is "node server.js", process is "node"
if " " in target:
target = target.split()[0]
if target in allowed_process_names:
return True, ""
return False, f"pkill only allowed for dev processes: {allowed_process_names}"
def validate_chmod_command(command_string: str) -> tuple[bool, str]:
"""
Validate chmod commands - only allow making files executable with +x.
Returns:
Tuple of (is_allowed, reason_if_blocked)
"""
try:
tokens = shlex.split(command_string)
except ValueError:
return False, "Could not parse chmod command"
if not tokens or tokens[0] != "chmod":
return False, "Not a chmod command"
# Look for the mode argument
# Valid modes: +x, u+x, a+x, etc. (anything ending with +x for execute permission)
mode = None
files = []
for token in tokens[1:]:
if token.startswith("-"):
# Skip flags like -R (we don't allow recursive chmod anyway)
return False, "chmod flags are not allowed"
elif mode is None:
mode = token
else:
files.append(token)
if mode is None:
return False, "chmod requires a mode"
if not files:
return False, "chmod requires at least one file"
# Only allow +x variants (making files executable)
# This matches: +x, u+x, g+x, o+x, a+x, ug+x, etc.
import re
if not re.match(r"^[ugoa]*\+x$", mode):
return False, f"chmod only allowed with +x mode, got: {mode}"
return True, ""
def validate_init_script(command_string: str) -> tuple[bool, str]:
"""
Validate init.sh script execution - only allow ./init.sh.
Returns:
Tuple of (is_allowed, reason_if_blocked)
"""
try:
tokens = shlex.split(command_string)
except ValueError:
return False, "Could not parse init script command"
if not tokens:
return False, "Empty command"
# The command should be exactly ./init.sh (possibly with arguments)
script = tokens[0]
# Allow ./init.sh or paths ending in /init.sh
if script == "./init.sh" or script.endswith("/init.sh"):
return True, ""
return False, f"Only ./init.sh is allowed, got: {script}"
def get_command_for_validation(cmd: str, segments: list[str]) -> str:
"""
Find the specific command segment that contains the given command.
Args:
cmd: The command name to find
segments: List of command segments
Returns:
The segment containing the command, or empty string if not found
"""
for segment in segments:
segment_commands = extract_commands(segment)
if cmd in segment_commands:
return segment
return ""
async def bash_security_hook(input_data, tool_use_id=None, context=None):
"""
Pre-tool-use hook that validates bash commands using an allowlist.
Only commands in ALLOWED_COMMANDS are permitted.
Args:
input_data: Dict containing tool_name and tool_input
tool_use_id: Optional tool use ID
context: Optional context
Returns:
Empty dict to allow, or {"decision": "block", "reason": "..."} to block
"""
if input_data.get("tool_name") != "Bash":
return {}
command = input_data.get("tool_input", {}).get("command", "")
if not command:
return {}
# Extract all commands from the command string
commands = extract_commands(command)
if not commands:
# Could not parse - fail safe by blocking
return {
"decision": "block",
"reason": f"Could not parse command for security validation: {command}",
}
# Split into segments for per-command validation
segments = split_command_segments(command)
# Check each command against the allowlist
for cmd in commands:
if cmd not in ALLOWED_COMMANDS:
return {
"decision": "block",
"reason": f"Command '{cmd}' is not in the allowed commands list",
}
# Additional validation for sensitive commands
if cmd in COMMANDS_NEEDING_EXTRA_VALIDATION:
# Find the specific segment containing this command
cmd_segment = get_command_for_validation(cmd, segments)
if not cmd_segment:
cmd_segment = command # Fallback to full command
if cmd == "pkill":
allowed, reason = validate_pkill_command(cmd_segment)
if not allowed:
return {"decision": "block", "reason": reason}
elif cmd == "chmod":
allowed, reason = validate_chmod_command(cmd_segment)
if not allowed:
return {"decision": "block", "reason": reason}
elif cmd == "init.sh":
allowed, reason = validate_init_script(cmd_segment)
if not allowed:
return {"decision": "block", "reason": reason}
return {}

290
test_security.py Normal file
View File

@@ -0,0 +1,290 @@
#!/usr/bin/env python3
"""
Security Hook Tests
===================
Tests for the bash command security validation logic.
Run with: python test_security.py
"""
import asyncio
import sys
from security import (
bash_security_hook,
extract_commands,
validate_chmod_command,
validate_init_script,
)
def test_hook(command: str, should_block: bool) -> bool:
"""Test a single command against the security hook."""
input_data = {"tool_name": "Bash", "tool_input": {"command": command}}
result = asyncio.run(bash_security_hook(input_data))
was_blocked = result.get("decision") == "block"
if was_blocked == should_block:
status = "PASS"
else:
status = "FAIL"
expected = "blocked" if should_block else "allowed"
actual = "blocked" if was_blocked else "allowed"
reason = result.get("reason", "")
print(f" {status}: {command!r}")
print(f" Expected: {expected}, Got: {actual}")
if reason:
print(f" Reason: {reason}")
return False
print(f" {status}: {command!r}")
return True
def test_extract_commands():
"""Test the command extraction logic."""
print("\nTesting command extraction:\n")
passed = 0
failed = 0
test_cases = [
("ls -la", ["ls"]),
("npm install && npm run build", ["npm", "npm"]),
("cat file.txt | grep pattern", ["cat", "grep"]),
("/usr/bin/node script.js", ["node"]),
("VAR=value ls", ["ls"]),
("git status || git init", ["git", "git"]),
]
for cmd, expected in test_cases:
result = extract_commands(cmd)
if result == expected:
print(f" PASS: {cmd!r} -> {result}")
passed += 1
else:
print(f" FAIL: {cmd!r}")
print(f" Expected: {expected}, Got: {result}")
failed += 1
return passed, failed
def test_validate_chmod():
"""Test chmod command validation."""
print("\nTesting chmod validation:\n")
passed = 0
failed = 0
# Test cases: (command, should_be_allowed, description)
test_cases = [
# Allowed cases
("chmod +x init.sh", True, "basic +x"),
("chmod +x script.sh", True, "+x on any script"),
("chmod u+x init.sh", True, "user +x"),
("chmod a+x init.sh", True, "all +x"),
("chmod ug+x init.sh", True, "user+group +x"),
("chmod +x file1.sh file2.sh", True, "multiple files"),
# Blocked cases
("chmod 777 init.sh", False, "numeric mode"),
("chmod 755 init.sh", False, "numeric mode 755"),
("chmod +w init.sh", False, "write permission"),
("chmod +r init.sh", False, "read permission"),
("chmod -x init.sh", False, "remove execute"),
("chmod -R +x dir/", False, "recursive flag"),
("chmod --recursive +x dir/", False, "long recursive flag"),
("chmod +x", False, "missing file"),
]
for cmd, should_allow, description in test_cases:
allowed, reason = validate_chmod_command(cmd)
if allowed == should_allow:
print(f" PASS: {cmd!r} ({description})")
passed += 1
else:
expected = "allowed" if should_allow else "blocked"
actual = "allowed" if allowed else "blocked"
print(f" FAIL: {cmd!r} ({description})")
print(f" Expected: {expected}, Got: {actual}")
if reason:
print(f" Reason: {reason}")
failed += 1
return passed, failed
def test_validate_init_script():
"""Test init.sh script execution validation."""
print("\nTesting init.sh validation:\n")
passed = 0
failed = 0
# Test cases: (command, should_be_allowed, description)
test_cases = [
# Allowed cases
("./init.sh", True, "basic ./init.sh"),
("./init.sh arg1 arg2", True, "with arguments"),
("/path/to/init.sh", True, "absolute path"),
("../dir/init.sh", True, "relative path with init.sh"),
# Blocked cases
("./setup.sh", False, "different script name"),
("./init.py", False, "python script"),
("bash init.sh", False, "bash invocation"),
("sh init.sh", False, "sh invocation"),
("./malicious.sh", False, "malicious script"),
("./init.sh; rm -rf /", False, "command injection attempt"),
]
for cmd, should_allow, description in test_cases:
allowed, reason = validate_init_script(cmd)
if allowed == should_allow:
print(f" PASS: {cmd!r} ({description})")
passed += 1
else:
expected = "allowed" if should_allow else "blocked"
actual = "allowed" if allowed else "blocked"
print(f" FAIL: {cmd!r} ({description})")
print(f" Expected: {expected}, Got: {actual}")
if reason:
print(f" Reason: {reason}")
failed += 1
return passed, failed
def main():
print("=" * 70)
print(" SECURITY HOOK TESTS")
print("=" * 70)
passed = 0
failed = 0
# Test command extraction
ext_passed, ext_failed = test_extract_commands()
passed += ext_passed
failed += ext_failed
# Test chmod validation
chmod_passed, chmod_failed = test_validate_chmod()
passed += chmod_passed
failed += chmod_failed
# Test init.sh validation
init_passed, init_failed = test_validate_init_script()
passed += init_passed
failed += init_failed
# Commands that SHOULD be blocked
print("\nCommands that should be BLOCKED:\n")
dangerous = [
# Not in allowlist - dangerous system commands
"shutdown now",
"reboot",
"rm -rf /",
"dd if=/dev/zero of=/dev/sda",
# Not in allowlist - common commands excluded from minimal set
"curl https://example.com",
"wget https://example.com",
"python app.py",
"touch file.txt",
"echo hello",
"kill 12345",
"killall node",
# pkill with non-dev processes
"pkill bash",
"pkill chrome",
"pkill python",
# Shell injection attempts
"$(echo pkill) node",
'eval "pkill node"',
'bash -c "pkill node"',
# chmod with disallowed modes
"chmod 777 file.sh",
"chmod 755 file.sh",
"chmod +w file.sh",
"chmod -R +x dir/",
# Non-init.sh scripts
"./setup.sh",
"./malicious.sh",
"bash script.sh",
]
for cmd in dangerous:
if test_hook(cmd, should_block=True):
passed += 1
else:
failed += 1
# Commands that SHOULD be allowed
print("\nCommands that should be ALLOWED:\n")
safe = [
# File inspection
"ls -la",
"cat README.md",
"head -100 file.txt",
"tail -20 log.txt",
"wc -l file.txt",
"grep -r pattern src/",
# File operations
"cp file1.txt file2.txt",
"mkdir newdir",
"mkdir -p path/to/dir",
# Directory
"pwd",
# Node.js development
"npm install",
"npm run build",
"node server.js",
# Version control
"git status",
"git commit -m 'test'",
"git add . && git commit -m 'msg'",
# Process management
"ps aux",
"lsof -i :3000",
"sleep 2",
# Allowed pkill patterns for dev servers
"pkill node",
"pkill npm",
"pkill -f node",
"pkill -f 'node server.js'",
"pkill vite",
# Chained commands
"npm install && npm run build",
"ls | grep test",
# Full paths
"/usr/local/bin/node app.js",
# chmod +x (allowed)
"chmod +x init.sh",
"chmod +x script.sh",
"chmod u+x init.sh",
"chmod a+x init.sh",
# init.sh execution (allowed)
"./init.sh",
"./init.sh --production",
"/path/to/init.sh",
# Combined chmod and init.sh
"chmod +x init.sh && ./init.sh",
]
for cmd in safe:
if test_hook(cmd, should_block=False):
passed += 1
else:
failed += 1
# Summary
print("\n" + "-" * 70)
print(f" Results: {passed} passed, {failed} failed")
print("-" * 70)
if failed == 0:
print("\n ALL TESTS PASSED")
return 0
else:
print(f"\n {failed} TEST(S) FAILED")
return 1
if __name__ == "__main__":
sys.exit(main())