API Reference

This section provides detailed documentation for all Prompture modules, classes, and functions.

Overview

Prompture is organized into several key modules, summarized in the Module Reference section at the end of this page.

Quick Reference

Main Extraction Functions

from prompture import (
    extract_and_jsonify,        # Basic extraction with field definitions
    extract_with_model,         # Extract using Pydantic models
    stepwise_extract_with_model # Multi-step extraction process
)

Field Registry System

from prompture import (
    field_from_registry,        # Get field for Pydantic models
    register_field,             # Register custom field definition
    get_registry_snapshot,      # View all registered fields
    clear_registry             # Clear custom fields
)

Driver and Utilities

from prompture import (
    Driver,                     # Base driver class
    validate_against_schema,    # JSON schema validation
    run_suite_from_spec        # Run test suites
)

Core Functions

extract_and_jsonify()

Main function for extracting structured data from text using field definitions.

def extract_and_jsonify(
    prompt: str,
    fields: dict,
    model_name: str = "auto",
    **kwargs
) -> dict:
    """
    Extract structured JSON data from text using field definitions.

    Args:
        prompt: The input text to extract data from
        fields: Dictionary mapping field names to field types or definitions
        model_name: LLM model to use (e.g., "openai/gpt-4")
        **kwargs: Additional parameters passed to the driver

    Returns:
        Dictionary containing extracted structured data

    Raises:
        ValueError: If extraction fails or data is invalid
        RuntimeError: If model or driver is not available
    """

Example:

result = extract_and_jsonify(
    prompt="John Smith is 25 years old",
    fields={"name": "name", "age": "age"},
    model_name="openai/gpt-4"
)
# Returns: {"name": "John Smith", "age": 25}

extract_with_model()

Extract data using Pydantic models with the field registry system.

def extract_with_model(
    model_class: Type[BaseModel],
    prompt: str,
    model_name: str = "auto",
    **kwargs
) -> BaseModel:
    """
    Extract structured data using a Pydantic model.

    Args:
        model_class: Pydantic model class defining the output structure
        prompt: The input text to extract data from
        model_name: LLM model to use
        **kwargs: Additional parameters

    Returns:
        Instance of the Pydantic model with extracted data
    """

Example:

from pydantic import BaseModel
from prompture import extract_with_model, field_from_registry

class Person(BaseModel):
    name: str = field_from_registry("name")
    age: int = field_from_registry("age")

person = extract_with_model(
    model_class=Person,
    prompt="Alice Johnson, 32 years old",
    model_name="openai/gpt-4"
)

stepwise_extract_with_model()

Multi-step extraction process with enhanced validation and error handling.

def stepwise_extract_with_model(
    model_class: Type[BaseModel],
    prompt: str,
    model_name: str = "auto",
    **kwargs
) -> BaseModel:
    """
    Extract data using a multi-step validation process.

    This function performs extraction in multiple phases with
    validation at each step for improved accuracy.

    Args:
        model_class: Pydantic model class
        prompt: Input text
        model_name: LLM model to use
        **kwargs: Additional parameters

    Returns:
        Validated Pydantic model instance
    """

Field Registry System

field_from_registry()

Get a field definition from the registry for use in Pydantic models.

def field_from_registry(field_name: str) -> Any:
    """
    Retrieve a field definition from the registry.

    Args:
        field_name: Name of the registered field

    Returns:
        Pydantic Field object with the registered definition

    Raises:
        KeyError: If field_name is not registered
    """

register_field()

Register a custom field definition in the global registry.

def register_field(name: str, definition: dict) -> None:
    """
    Register a custom field definition.

    Args:
        name: Field name identifier
        definition: Dictionary containing field specification

    Definition format:
        {
            "type": "str|int|float|list|dict|bool",
            "description": "Human readable description",
            "instructions": "Instructions for LLM extraction",
            "default": "Default value or template variable",
            "nullable": True/False,
            "validation": {...}  # Optional validation rules
        }
    """

Example:

register_field("skills", {
    "type": "list",
    "description": "List of professional skills",
    "instructions": "Extract skills as list of strings",
    "default": [],
    "nullable": True
})

Built-in Field Types

Prompture includes many built-in field definitions:

Personal Information
  • name - Person’s full name

  • age - Age in years (0-150)

  • email - Email address with validation

  • phone - Phone number

  • address - Physical address

Professional Fields
  • occupation - Job title or profession

  • company - Company or organization name

  • experience_years - Years of experience

Temporal Fields
  • date - Date in various formats

  • year - Year (1900-current)

  • last_updated - Timestamp field

Content Fields
  • title - Title or heading text

  • description - Longer descriptive text

  • category - Classification or category

  • content - General content field
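
The built-in names can be passed directly as values in the fields mapping of extract_and_jsonify(), or pulled into Pydantic models via field_from_registry(). A short sketch combining several of them (the prompt text is illustrative):

result = extract_and_jsonify(
    prompt="Dr. Sarah Chen has 12 years of experience as a data scientist at Acme Corp",
    fields={
        "name": "name",
        "occupation": "occupation",
        "company": "company",
        "experience_years": "experience_years"
    },
    model_name="openai/gpt-4"
)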

Driver System

The driver system provides a unified interface for different LLM providers.

Supported Models

OpenAI
  • openai/gpt-4 - GPT-4 (recommended for complex tasks)

  • openai/gpt-3.5-turbo - GPT-3.5 Turbo (fast and cost-effective)

Anthropic
  • anthropic/claude-3-opus-20240229 - Claude 3 Opus (most capable)

  • anthropic/claude-3-sonnet-20240229 - Claude 3 Sonnet (balanced)

  • anthropic/claude-3-haiku-20240307 - Claude 3 Haiku (fast)

Google
  • google/gemini-pro - Gemini Pro

  • google/gemini-pro-vision - Gemini Pro with vision

Groq
  • groq/llama2-70b-4096 - Llama 2 70B (fast inference)

  • groq/mixtral-8x7b-32768 - Mixtral 8x7B

Local Models
  • ollama/llama2 - Local Llama 2 via Ollama

  • ollama/mistral - Local Mistral via Ollama
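
Because the driver layer exposes one interface, switching providers only requires changing the model identifier; the rest of the call stays the same. A sketch (the identifiers must match models available to your account or local setup):

# Hosted model
result = extract_and_jsonify(
    prompt="Bob Lee, 29, software developer",
    fields={"name": "name", "age": "age"},
    model_name="openai/gpt-3.5-turbo"
)

# Local model served by Ollama
result = extract_and_jsonify(
    prompt="Bob Lee, 29, software developer",
    fields={"name": "name", "age": "age"},
    model_name="ollama/llama2"
)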

Driver Base Class

class Driver:
    """Base class for LLM drivers."""

    def __init__(self, model_name: str, **kwargs):
        """Initialize the driver with model configuration."""

    def ask_for_json(self, prompt: str, **kwargs) -> dict:
        """Send prompt to LLM and return JSON response."""

    def validate_response(self, response: dict) -> bool:
        """Validate the LLM response format."""

Validation and Utilities

validate_against_schema()

Validate extracted data against a JSON schema.

def validate_against_schema(data: dict, schema: dict) -> bool:
    """
    Validate data against a JSON schema.

    Args:
        data: Dictionary to validate
        schema: JSON schema specification

    Returns:
        True if data matches schema, False otherwise
    """

Example:

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer", "minimum": 0, "maximum": 150}
    },
    "required": ["name", "age"]
}

is_valid = validate_against_schema(result, schema)

Error Handling

Prompture defines several custom exceptions:

class PromptureError(Exception):
    """Base exception for Prompture errors."""

class ExtractionError(PromptureError):
    """Raised when data extraction fails."""

class ValidationError(PromptureError):
    """Raised when data validation fails."""

class DriverError(PromptureError):
    """Raised when driver operations fail."""

Configuration

Environment Variables

Prompture uses environment variables for configuration:

# API Keys
OPENAI_API_KEY=your_openai_key
ANTHROPIC_API_KEY=your_anthropic_key
GOOGLE_API_KEY=your_google_key
GROQ_API_KEY=your_groq_key

# Custom Endpoints
OPENAI_BASE_URL=https://api.openai.com/v1
LOCAL_API_BASE_URL=http://localhost:8000
OLLAMA_BASE_URL=http://localhost:11434
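
Variables can be exported in the shell or set in the process environment before the first extraction call. A sketch using os.environ (values are placeholders):

import os

os.environ["OPENAI_API_KEY"] = "your_openai_key"          # placeholder
os.environ["OLLAMA_BASE_URL"] = "http://localhost:11434"

from prompture import extract_and_jsonify

result = extract_and_jsonify(
    prompt="John Smith is 25 years old",
    fields={"name": "name", "age": "age"},
    model_name="openai/gpt-4"
)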

Template Variables

Field definitions support template variables that are automatically resolved:

  • {{current_year}} - Current year (e.g., 2024)

  • {{current_date}} - Current date (YYYY-MM-DD format)

  • {{current_datetime}} - Current datetime (ISO format)

Example:

register_field("processed_at", {
    "type": "str",
    "description": "Processing timestamp",
    "default": "{{current_datetime}}",
    "nullable": False
})

Module Reference

For detailed module documentation, follow the links in the list below.

The following API documentation files have been generated using Sphinx autodoc:

Core Modules

  • Core Module - Main extraction functions: [extract_and_jsonify()](core.rst#extract_and_jsonify), [extract_with_model()](core.rst#extract_with_model), [stepwise_extract_with_model()](core.rst#stepwise_extract_with_model)

  • Field Definitions - Field registry system: [field_from_registry()](field_definitions.rst#field_from_registry), [register_field()](field_definitions.rst#register_field), [get_registry_snapshot()](field_definitions.rst#get_registry_snapshot)

  • Drivers - LLM provider interfaces: [get_driver_for_model()](drivers.rst#get_driver_for_model), [OpenAIDriver](drivers.rst#openaidriver), [ClaudeDriver](drivers.rst#claudedriver), and more

Utility Modules

  • Tools - Utility functions: [convert_value()](tools.rst#convert_value), [log_debug()](tools.rst#log_debug), [clean_json_text()](tools.rst#clean_json_text)

  • Runner - Test suite and batch processing utilities

  • Validator - Data validation and schema checking utilities