David Blanc Brioir d2f7165120 Add Library RAG project and cleanup root directory
- Add complete Library RAG application (Flask + MCP server)
  - PDF processing pipeline with OCR and LLM extraction
  - Weaviate vector database integration (BGE-M3 embeddings)
  - Flask web interface with search and document management
  - MCP server for Claude Desktop integration
  - Comprehensive test suite (134 tests)

- Clean up root directory
  - Remove obsolete documentation files
  - Remove backup and temporary files
  - Update autonomous agent configuration

- Update prompts
  - Enhance initializer bis prompt with better instructions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-30 11:57:12 +01:00

170 lines
5.1 KiB
Python

"""Mistral API Client Management.

This module provides utilities for managing the Mistral API client,
including API key retrieval and OCR cost estimation. It serves as the
foundation for all Mistral API interactions in the Library RAG pipeline.

Key Features:
    - Automatic API key discovery from multiple sources
    - Client instantiation with proper authentication
    - OCR cost estimation for budget planning

API Key Priority:
    The module searches for the Mistral API key in this order:

    1. Explicit argument passed to functions
    2. MISTRAL_API_KEY environment variable
    3. .env file in the project root

Cost Estimation:
    Mistral OCR pricing (as of 2024):

    - Standard OCR: ~1 EUR per 1000 pages (0.001 EUR/page)
    - OCR with annotations: ~3 EUR per 1000 pages (0.003 EUR/page)

Example:
    Basic client creation and usage::

        from utils.mistral_client import create_client, estimate_ocr_cost

        # Create authenticated client
        client = create_client()

        # Estimate cost for a 100-page document
        cost = estimate_ocr_cost(100, use_annotations=False)
        print(f"Estimated cost: {cost:.2f} EUR")  # Output: Estimated cost: 0.10 EUR

    Using explicit API key::

        client = create_client(api_key="your-api-key-here")

See Also:
    - :mod:`utils.ocr_processor`: OCR execution functions using this client
    - :mod:`utils.pdf_uploader`: PDF upload utilities for OCR processing

Note:
    Ensure MISTRAL_API_KEY is set before using this module in production.
    The API key can be obtained from the Mistral AI platform dashboard.
"""

import os
from typing import Optional

from dotenv import load_dotenv
from mistralai import Mistral


def get_api_key(api_key: Optional[str] = None) -> str:
    """Retrieve the Mistral API key from available sources.

    Searches for the API key in the following priority order:

    1. Explicit argument passed to this function
    2. MISTRAL_API_KEY environment variable
    3. .env file in the project root

    Args:
        api_key: Optional API key to use directly. If provided and non-empty,
            this value is used without checking other sources.

    Returns:
        The Mistral API key as a string.

    Raises:
        RuntimeError: If no API key is found in any of the checked sources.

    Example:
        >>> # Using environment variable
        >>> key = get_api_key()
        >>> len(key) > 0
        True

        >>> # Using explicit key
        >>> key = get_api_key("my-api-key")
        >>> key
        'my-api-key'
    """
    # 1. Explicit argument
    if api_key and api_key.strip():
        return api_key.strip()

    # 2. Environment variable
    env_key = os.getenv("MISTRAL_API_KEY", "").strip()
    if env_key:
        return env_key

    # 3. .env file (load_dotenv does not override variables already set)
    load_dotenv()
    env_key = os.getenv("MISTRAL_API_KEY", "").strip()
    if env_key:
        return env_key

    raise RuntimeError(
        "MISTRAL_API_KEY is missing. "
        "Set the environment variable or create a .env file."
    )


def create_client(api_key: Optional[str] = None) -> Mistral:
    """Create and return an authenticated Mistral client.

    This is the primary entry point for obtaining a Mistral client instance.
    The client can be used for OCR operations, chat completions, and other
    Mistral API features.

    Args:
        api_key: Optional API key. If not provided, the key is automatically
            retrieved from environment variables or .env file.

    Returns:
        An authenticated Mistral client instance ready for API calls.

    Raises:
        RuntimeError: If no API key is found (propagated from get_api_key).

    Example:
        >>> client = create_client()
        >>> # Client is now ready for OCR or other operations
        >>> response = client.ocr.process(...)  # doctest: +SKIP
    """
    key = get_api_key(api_key)
    return Mistral(api_key=key)


def estimate_ocr_cost(nb_pages: int, use_annotations: bool = False) -> float:
    """Estimate the cost of OCR processing for a document.

    Calculates the expected cost based on Mistral OCR pricing model.
    This is useful for budget planning before processing large document
    collections.

    Pricing Model:
        - Standard OCR: ~1 EUR per 1000 pages (0.001 EUR/page)
        - OCR with annotations: ~3 EUR per 1000 pages (0.003 EUR/page)

    The annotation mode is approximately 3x more expensive but provides
    additional structural information useful for TOC extraction.

    Args:
        nb_pages: Number of pages in the document to process.
        use_annotations: If True, uses the higher annotation pricing.
            Annotations provide bounding box and structural data.

    Returns:
        Estimated cost in euros as a float.

    Example:
        >>> # Standard OCR for 100 pages
        >>> estimate_ocr_cost(100)
        0.1

        >>> # OCR with annotations for 100 pages
        >>> estimate_ocr_cost(100, use_annotations=True)
        0.3

        >>> # Large document collection
        >>> estimate_ocr_cost(10000)
        10.0
    """
    if use_annotations:
        return nb_pages * 0.003  # 3 EUR / 1000 pages
    else:
        return nb_pages * 0.001  # 1 EUR / 1000 pages
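
The per-page pricing used by `estimate_ocr_cost` extends naturally to a whole collection of documents. The snippet below is a minimal standalone sketch of that arithmetic and is not part of the module: `estimate_collection_cost` and the sample page counts are hypothetical, and the two rates are simply the approximate figures quoted in the docstrings above, so real billing may differ.

```python
from typing import Iterable

# Approximate per-page rates in EUR, copied from the docstrings above.
# These are assumptions for illustration, not authoritative pricing.
STANDARD_RATE_EUR = 0.001
ANNOTATION_RATE_EUR = 0.003


def estimate_collection_cost(
    page_counts: Iterable[int], use_annotations: bool = False
) -> float:
    """Hypothetical helper: estimate the OCR cost of several documents."""
    rate = ANNOTATION_RATE_EUR if use_annotations else STANDARD_RATE_EUR
    total_pages = 0
    for pages in page_counts:
        if pages < 0:
            raise ValueError(f"page count must be non-negative, got {pages}")
        total_pages += pages
    # Multiply once on the integer total to limit floating-point drift.
    return total_pages * rate


if __name__ == "__main__":
    collection = [120, 340, 85]  # hypothetical page counts
    standard = estimate_collection_cost(collection)
    annotated = estimate_collection_cost(collection, use_annotations=True)
    print(f"Standard OCR: {standard:.2f} EUR")
    print(f"With annotations: {annotated:.2f} EUR")
```

Summing the integer page counts first and multiplying once keeps rounding error smaller than adding many small float estimates. Rejecting negative counts is a design choice of this sketch; the original function does not validate its input.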