API Reference
=============

This section provides detailed documentation for all Prompture modules, classes, and functions.

Overview
--------

Prompture is organized into several key modules:

- **Core Module** (:doc:`core`) - Main extraction functions and utilities
- **Field Definitions** (:doc:`field_definitions`) - Field registry and validation system  
- **Drivers** (:doc:`drivers`) - LLM provider interfaces
- **Runner** (:doc:`runner`) - Test suite and batch processing
- **Validator** (:doc:`validator`) - Data validation utilities

Quick Reference
---------------

**Main Extraction Functions**

.. code-block:: python

    from prompture import (
        extract_and_jsonify,        # Basic extraction with field definitions
        extract_with_model,         # Extract using Pydantic models  
        stepwise_extract_with_model # Multi-step extraction process
    )

**Field Registry System**

.. code-block:: python

    from prompture import (
        field_from_registry,        # Get field for Pydantic models
        register_field,             # Register custom field definition
        get_registry_snapshot,      # View all registered fields
        clear_registry             # Clear custom fields
    )

**Driver and Utilities**

.. code-block:: python

    from prompture import (
        Driver,                     # Base driver class
        validate_against_schema,    # JSON schema validation
        run_suite_from_spec        # Run test suites
    )

Core Functions
--------------

extract_and_jsonify()
~~~~~~~~~~~~~~~~~~~~~

Main function for extracting structured data from text using field definitions.

.. code-block:: python

    def extract_and_jsonify(
        prompt: str,
        fields: dict,
        model_name: str = "auto",
        **kwargs
    ) -> dict:
        """
        Extract structured JSON data from text using field definitions.
        
        Args:
            prompt: The input text to extract data from
            fields: Dictionary mapping field names to field types or definitions
            model_name: LLM model to use (e.g., "openai/gpt-4")
            **kwargs: Additional parameters passed to the driver
            
        Returns:
            Dictionary containing extracted structured data
            
        Raises:
            ValueError: If extraction fails or data is invalid
            RuntimeError: If model or driver is not available
        """

**Example:**

.. code-block:: python

    result = extract_and_jsonify(
        prompt="John Smith is 25 years old",
        fields={"name": "name", "age": "age"},
        model_name="openai/gpt-4"
    )
    # Returns: {"name": "John Smith", "age": 25}

extract_with_model()
~~~~~~~~~~~~~~~~~~~~

Extract data using Pydantic models with the field registry system.

.. code-block:: python

    def extract_with_model(
        model_class: Type[BaseModel],
        prompt: str,
        model_name: str = "auto",
        **kwargs
    ) -> BaseModel:
        """
        Extract structured data using a Pydantic model.
        
        Args:
            model_class: Pydantic model class defining the output structure
            prompt: The input text to extract data from
            model_name: LLM model to use
            **kwargs: Additional parameters
            
        Returns:
            Instance of the Pydantic model with extracted data
        """

**Example:**

.. code-block:: python

    class Person(BaseModel):
        name: str = field_from_registry("name")
        age: int = field_from_registry("age")
    
    person = extract_with_model(
        model_class=Person,
        prompt="Alice Johnson, 32 years old",
        model_name="openai/gpt-4"
    )

stepwise_extract_with_model()
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Multi-step extraction process with enhanced validation and error handling.

.. code-block:: python

    def stepwise_extract_with_model(
        model_class: Type[BaseModel],
        prompt: str,
        model_name: str = "auto",
        **kwargs
    ) -> BaseModel:
        """
        Extract data using a multi-step validation process.
        
        This function performs extraction in multiple phases with
        validation at each step for improved accuracy.
        
        Args:
            model_class: Pydantic model class
            prompt: Input text
            model_name: LLM model to use
            **kwargs: Additional parameters
            
        Returns:
            Validated Pydantic model instance
        """

Field Registry System
---------------------

field_from_registry()
~~~~~~~~~~~~~~~~~~~~~

Get a field definition from the registry for use in Pydantic models.

.. code-block:: python

    def field_from_registry(field_name: str) -> Any:
        """
        Retrieve a field definition from the registry.
        
        Args:
            field_name: Name of the registered field
            
        Returns:
            Pydantic Field object with the registered definition
            
        Raises:
            KeyError: If field_name is not registered
        """

register_field()
~~~~~~~~~~~~~~~~

Register a custom field definition in the global registry.

.. code-block:: python

    def register_field(name: str, definition: dict) -> None:
        """
        Register a custom field definition.
        
        Args:
            name: Field name identifier
            definition: Dictionary containing field specification
            
        Definition format:
            {
                "type": "str|int|float|list|dict|bool",
                "description": "Human readable description",
                "instructions": "Instructions for LLM extraction", 
                "default": "Default value or template variable",
                "nullable": True/False,
                "validation": {...}  # Optional validation rules
            }
        """

**Example:**

.. code-block:: python

    register_field("skills", {
        "type": "list",
        "description": "List of professional skills",
        "instructions": "Extract skills as list of strings",
        "default": [],
        "nullable": True
    })

Built-in Field Types
--------------------

Prompture includes many built-in field definitions:

**Personal Information**
  - ``name`` - Person's full name
  - ``age`` - Age in years (0-150)
  - ``email`` - Email address with validation
  - ``phone`` - Phone number
  - ``address`` - Physical address

**Professional Fields**
  - ``occupation`` - Job title or profession
  - ``company`` - Company or organization name
  - ``experience_years`` - Years of experience

**Temporal Fields**
  - ``date`` - Date in various formats
  - ``year`` - Year (1900-current)
  - ``last_updated`` - Timestamp field

**Content Fields**
  - ``title`` - Title or heading text
  - ``description`` - Longer descriptive text
  - ``category`` - Classification or category
  - ``content`` - General content field

Driver System
-------------

The driver system provides a unified interface for different LLM providers.

Supported Models
~~~~~~~~~~~~~~~~

**OpenAI**
  - ``openai/gpt-4`` - GPT-4 (recommended for complex tasks)
  - ``openai/gpt-3.5-turbo`` - GPT-3.5 Turbo (fast and cost-effective)

**Anthropic**
  - ``anthropic/claude-3-opus-20240229`` - Claude 3 Opus (most capable)
  - ``anthropic/claude-3-sonnet-20240229`` - Claude 3 Sonnet (balanced)
  - ``anthropic/claude-3-haiku-20240307`` - Claude 3 Haiku (fast)

**Google**
  - ``google/gemini-pro`` - Gemini Pro
  - ``google/gemini-pro-vision`` - Gemini Pro with vision

**Groq**
  - ``groq/llama2-70b-4096`` - Llama 2 70B (fast inference)
  - ``groq/mixtral-8x7b-32768`` - Mixtral 8x7B

**Local Models**
  - ``ollama/llama2`` - Local Llama 2 via Ollama
  - ``ollama/mistral`` - Local Mistral via Ollama

Driver Base Class
~~~~~~~~~~~~~~~~~

.. code-block:: python

    class Driver:
        """Base class for LLM drivers."""
        
        def __init__(self, model_name: str, **kwargs):
            """Initialize the driver with model configuration."""
            
        def ask_for_json(self, prompt: str, **kwargs) -> dict:
            """Send prompt to LLM and return JSON response."""
            
        def validate_response(self, response: dict) -> bool:
            """Validate the LLM response format."""

Validation and Utilities
------------------------

validate_against_schema()
~~~~~~~~~~~~~~~~~~~~~~~~~

Validate extracted data against a JSON schema.

.. code-block:: python

    def validate_against_schema(data: dict, schema: dict) -> bool:
        """
        Validate data against a JSON schema.
        
        Args:
            data: Dictionary to validate
            schema: JSON schema specification
            
        Returns:
            True if data matches schema, False otherwise
        """

**Example:**

.. code-block:: python

    schema = {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "age": {"type": "integer", "minimum": 0, "maximum": 150}
        },
        "required": ["name", "age"]
    }
    
    is_valid = validate_against_schema(result, schema)

Error Handling
--------------

Prompture defines several custom exceptions:

.. code-block:: python

    class PromptureError(Exception):
        """Base exception for Prompture errors."""
        
    class ExtractionError(PromptureError):
        """Raised when data extraction fails."""
        
    class ValidationError(PromptureError):
        """Raised when data validation fails."""
        
    class DriverError(PromptureError):
        """Raised when driver operations fail."""

Configuration
-------------

Environment Variables
~~~~~~~~~~~~~~~~~~~~~

Prompture uses environment variables for configuration:

.. code-block:: bash

    # API Keys
    OPENAI_API_KEY=your_openai_key
    ANTHROPIC_API_KEY=your_anthropic_key
    GOOGLE_API_KEY=your_google_key
    GROQ_API_KEY=your_groq_key
    
    # Custom Endpoints
    OPENAI_BASE_URL=https://api.openai.com/v1
    LOCAL_API_BASE_URL=http://localhost:8000
    OLLAMA_BASE_URL=http://localhost:11434

Template Variables
~~~~~~~~~~~~~~~~~~

Field definitions support template variables that are automatically resolved:

- ``{{current_year}}`` - Current year (e.g., 2024)
- ``{{current_date}}`` - Current date (YYYY-MM-DD format)
- ``{{current_datetime}}`` - Current datetime (ISO format)

**Example:**

.. code-block:: python

    register_field("processed_at", {
        "type": "str",
        "description": "Processing timestamp",
        "default": "{{current_datetime}}",
        "nullable": False
    })

Module Reference
----------------

.. toctree::
   :maxdepth: 2
   
   core
   field_definitions
   drivers
   tools
   runner
   validator

For detailed module documentation, select a module from the list above.

The following API documentation files have been generated using Sphinx autodoc:

Core Modules
~~~~~~~~~~~~

- **Core Module** (:doc:`core`) - Main extraction functions: [`extract_and_jsonify()`](core.rst#extract_and_jsonify), [`extract_with_model()`](core.rst#extract_with_model), [`stepwise_extract_with_model()`](core.rst#stepwise_extract_with_model)
- **Field Definitions** (:doc:`field_definitions`) - Field registry system: [`field_from_registry()`](field_definitions.rst#field_from_registry), [`register_field()`](field_definitions.rst#register_field), [`get_registry_snapshot()`](field_definitions.rst#get_registry_snapshot)
- **Drivers** (:doc:`drivers`) - LLM provider interfaces: [`get_driver_for_model()`](drivers.rst#get_driver_for_model), [`OpenAIDriver`](drivers.rst#openaidriver), [`ClaudeDriver`](drivers.rst#claudedriver), and more

Utility Modules
~~~~~~~~~~~~~~~

- **Tools** (:doc:`tools`) - Utility functions: [`convert_value()`](tools.rst#convert_value), [`log_debug()`](tools.rst#log_debug), [`clean_json_text()`](tools.rst#clean_json_text)
- **Runner** (:doc:`runner`) - Test suite and batch processing utilities
- **Validator** (:doc:`validator`) - Data validation and schema checking utilities