API Reference

This section provides detailed documentation for all Prompture modules, classes, and functions.

Overview

Prompture is organized into several key modules, summarized in the Module Reference section at the end of this page.

Quick Reference

Main Extraction Functions

from prompture import (
    extract_and_jsonify,        # Basic extraction with field definitions
    extract_with_model,         # Extract using Pydantic models
    stepwise_extract_with_model # Multi-step extraction process
)

Field Registry System

from prompture import (
    field_from_registry,        # Get field for Pydantic models
    register_field,             # Register custom field definition
    get_registry_snapshot,      # View all registered fields
    clear_registry             # Clear custom fields
)

Driver and Utilities

from prompture import (
    Driver,                     # Base driver class
    validate_against_schema,    # JSON schema validation
    run_suite_from_spec        # Run test suites
)

Core Functions

extract_and_jsonify()

Main function for extracting structured data from text using field definitions.

def extract_and_jsonify(
    prompt: str,
    fields: dict,
    model_name: str = "auto",
    **kwargs
) -> dict:
    """
    Extract structured JSON data from text using field definitions.

    Args:
        prompt: The input text to extract data from
        fields: Dictionary mapping field names to field types or definitions
        model_name: LLM model to use (e.g., "openai/gpt-4")
        **kwargs: Additional parameters passed to the driver

    Returns:
        Dictionary containing extracted structured data

    Raises:
        ValueError: If extraction fails or data is invalid
        RuntimeError: If model or driver is not available
    """

Example:

result = extract_and_jsonify(
    prompt="John Smith is 25 years old",
    fields={"name": "name", "age": "age"},
    model_name="openai/gpt-4"
)
# Returns: {"name": "John Smith", "age": 25}

extract_with_model()

Extract data using Pydantic models with the field registry system.

def extract_with_model(
    model_class: Type[BaseModel],
    prompt: str,
    model_name: str = "auto",
    **kwargs
) -> BaseModel:
    """
    Extract structured data using a Pydantic model.

    Args:
        model_class: Pydantic model class defining the output structure
        prompt: The input text to extract data from
        model_name: LLM model to use
        **kwargs: Additional parameters

    Returns:
        Instance of the Pydantic model with extracted data
    """

Example:

from pydantic import BaseModel
from prompture import extract_with_model, field_from_registry

class Person(BaseModel):
    name: str = field_from_registry("name")
    age: int = field_from_registry("age")

person = extract_with_model(
    model_class=Person,
    prompt="Alice Johnson, 32 years old",
    model_name="openai/gpt-4"
)

stepwise_extract_with_model()

Multi-step extraction process with enhanced validation and error handling.

def stepwise_extract_with_model(
    model_class: Type[BaseModel],
    prompt: str,
    model_name: str = "auto",
    **kwargs
) -> BaseModel:
    """
    Extract data using a multi-step validation process.

    This function performs extraction in multiple phases with
    validation at each step for improved accuracy.

    Args:
        model_class: Pydantic model class
        prompt: Input text
        model_name: LLM model to use
        **kwargs: Additional parameters

    Returns:
        Validated Pydantic model instance
    """

Field Registry System

field_from_registry()

Get a field definition from the registry for use in Pydantic models.

def field_from_registry(field_name: str) -> Any:
    """
    Retrieve a field definition from the registry.

    Args:
        field_name: Name of the registered field

    Returns:
        Pydantic Field object with the registered definition

    Raises:
        KeyError: If field_name is not registered
    """

register_field()

Register a custom field definition in the global registry.

def register_field(name: str, definition: dict) -> None:
    """
    Register a custom field definition.

    Args:
        name: Field name identifier
        definition: Dictionary containing field specification

    Definition format:
        {
            "type": "str|int|float|list|dict|bool",
            "description": "Human readable description",
            "instructions": "Instructions for LLM extraction",
            "default": "Default value or template variable",
            "nullable": True/False,
            "validation": {...}  # Optional validation rules
        }
    """

Example:

register_field("skills", {
    "type": "list",
    "description": "List of professional skills",
    "instructions": "Extract skills as list of strings",
    "default": [],
    "nullable": True
})

Built-in Field Types

Prompture includes many built-in field definitions:

Personal Information
  • name - Person’s full name

  • age - Age in years (0-150)

  • email - Email address with validation

  • phone - Phone number

  • address - Physical address

Professional Fields
  • occupation - Job title or profession

  • company - Company or organization name

  • experience_years - Years of experience

Temporal Fields
  • date - Date in various formats

  • year - Year (1900-current)

  • last_updated - Timestamp field

Content Fields
  • title - Title or heading text

  • description - Longer descriptive text

  • category - Classification or category

  • content - General content field
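
The built-in names can be passed directly as values in the fields mapping of extract_and_jsonify(), or pulled into Pydantic models via field_from_registry(). A short sketch combining several of them (the prompt text is illustrative):

result = extract_and_jsonify(
    prompt="Dr. Sarah Chen has 12 years of experience as a data scientist at Acme Corp",
    fields={
        "name": "name",
        "occupation": "occupation",
        "company": "company",
        "experience_years": "experience_years"
    },
    model_name="openai/gpt-4"
)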

Driver System

The driver system provides a unified interface for different LLM providers.

Supported Models

OpenAI
  • openai/gpt-4 - GPT-4 (recommended for complex tasks)

  • openai/gpt-3.5-turbo - GPT-3.5 Turbo (fast and cost-effective)

Anthropic
  • anthropic/claude-3-opus-20240229 - Claude 3 Opus (most capable)

  • anthropic/claude-3-sonnet-20240229 - Claude 3 Sonnet (balanced)

  • anthropic/claude-3-haiku-20240307 - Claude 3 Haiku (fast)

Google
  • google/gemini-pro - Gemini Pro

  • google/gemini-pro-vision - Gemini Pro with vision

Groq
  • groq/llama2-70b-4096 - Llama 2 70B (fast inference)

  • groq/mixtral-8x7b-32768 - Mixtral 8x7B

Local Models
  • ollama/llama2 - Local Llama 2 via Ollama

  • ollama/mistral - Local Mistral via Ollama
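
Because the driver layer exposes one interface, switching providers only requires changing the model identifier; the rest of the call stays the same. A sketch (the identifiers must match models available to your account or local setup):

# Hosted model
result = extract_and_jsonify(
    prompt="Bob Lee, 29, software developer",
    fields={"name": "name", "age": "age"},
    model_name="openai/gpt-3.5-turbo"
)

# Local model served by Ollama
result = extract_and_jsonify(
    prompt="Bob Lee, 29, software developer",
    fields={"name": "name", "age": "age"},
    model_name="ollama/llama2"
)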

Driver Base Class

class Driver:
    """Base class for LLM drivers."""

    def __init__(self, model_name: str, **kwargs):
        """Initialize the driver with model configuration."""

    def ask_for_json(self, prompt: str, **kwargs) -> dict:
        """Send prompt to LLM and return JSON response."""

    def validate_response(self, response: dict) -> bool:
        """Validate the LLM response format."""

Validation and Utilities

validate_against_schema()

Validate extracted data against a JSON schema.

def validate_against_schema(data: dict, schema: dict) -> bool:
    """
    Validate data against a JSON schema.

    Args:
        data: Dictionary to validate
        schema: JSON schema specification

    Returns:
        True if data matches schema, False otherwise
    """

Example:

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer", "minimum": 0, "maximum": 150}
    },
    "required": ["name", "age"]
}

is_valid = validate_against_schema(result, schema)

Error Handling

Prompture defines several custom exceptions:

class PromptureError(Exception):
    """Base exception for Prompture errors."""

class ExtractionError(PromptureError):
    """Raised when data extraction fails."""

class ValidationError(PromptureError):
    """Raised when data validation fails."""

class DriverError(PromptureError):
    """Raised when driver operations fail."""

Configuration

Environment Variables

Prompture uses environment variables for configuration:

# API Keys
OPENAI_API_KEY=your_openai_key
ANTHROPIC_API_KEY=your_anthropic_key
GOOGLE_API_KEY=your_google_key
GROQ_API_KEY=your_groq_key

# Custom Endpoints
OPENAI_BASE_URL=https://api.openai.com/v1
LOCAL_API_BASE_URL=http://localhost:8000
OLLAMA_BASE_URL=http://localhost:11434
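
Variables can be exported in the shell or set in the process environment before the first extraction call. A sketch using os.environ (values are placeholders):

import os

os.environ["OPENAI_API_KEY"] = "your_openai_key"          # placeholder
os.environ["OLLAMA_BASE_URL"] = "http://localhost:11434"

from prompture import extract_and_jsonify

result = extract_and_jsonify(
    prompt="John Smith is 25 years old",
    fields={"name": "name", "age": "age"},
    model_name="openai/gpt-4"
)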

Template Variables

Field definitions support template variables that are automatically resolved:

  • {{current_year}} - Current year (e.g., 2024)

  • {{current_date}} - Current date (YYYY-MM-DD format)

  • {{current_datetime}} - Current datetime (ISO format)

Example:

register_field("processed_at", {
    "type": "str",
    "description": "Processing timestamp",
    "default": "{{current_datetime}}",
    "nullable": False
})

Module Reference

For detailed module documentation, follow the links in the list below.

The following API documentation files have been generated using Sphinx autodoc:

Core Modules

  • Core Module - Main extraction functions: [extract_and_jsonify()](core.rst#extract_and_jsonify), [extract_with_model()](core.rst#extract_with_model), [stepwise_extract_with_model()](core.rst#stepwise_extract_with_model)

  • Field Definitions - Field registry system: [field_from_registry()](field_definitions.rst#field_from_registry), [register_field()](field_definitions.rst#register_field), [get_registry_snapshot()](field_definitions.rst#get_registry_snapshot)

  • Drivers - LLM provider interfaces: [get_driver_for_model()](drivers.rst#get_driver_for_model), [OpenAIDriver](drivers.rst#openaidriver), [ClaudeDriver](drivers.rst#claudedriver), and more

Utility Modules

  • Tools - Utility functions: [convert_value()](tools.rst#convert_value), [log_debug()](tools.rst#log_debug), [clean_json_text()](tools.rst#clean_json_text)

  • Runner - Test suite and batch processing utilities

  • Validator - Data validation and schema checking utilities