Field Definitions Module

The field definitions module provides a centralized registry system for defining reusable field specifications that enhance structured data extraction with type hints, descriptions, validation rules, and LLM-specific extraction instructions.

Overview

The field definitions system allows you to:

  • Define reusable field specifications with type information, descriptions, and extraction instructions

  • Register custom fields that can be used across multiple extraction operations

  • Leverage built-in field definitions for common data types (names, ages, emails, etc.)

  • Use template variables for dynamic values like {{current_year}} and {{current_date}}

  • Integrate with Pydantic models through the [field_from_registry()](#field_from_registry) function

Registry Management Functions

get_field_definition()

Retrieve a specific field definition from the registry with optional template variable substitution.

Example:

# Get the built-in "age" field definition
age_def = get_field_definition("age")
print(age_def)
# {
#     "type": int,
#     "description": "The age of the person in number of years.",
#     "instructions": "Calculate as 2024 - birth_year if needed.",
#     "default": 0,
#     "nullable": False
# }

register_field()

Register a new field definition or update an existing one in the global registry.

Example:

register_field("skills", {
    "type": list,
    "description": "List of professional skills and competencies",
    "instructions": "Extract as a list of strings, one skill per item",
    "default": [],
    "nullable": True
})

add_field_definition()

Alias for [register_field()](#register_field) - adds or updates a field definition in the registry.

add_field_definitions()

Register multiple field definitions at once from a dictionary.

Example:

new_fields = {
    "salary": {
        "type": float,
        "description": "Annual salary in USD",
        "instructions": "Extract numeric value, convert K/M suffixes",
        "default": 0.0,
        "nullable": True
    },
    "department": {
        "type": str,
        "description": "Department or division name",
        "instructions": "Extract official department name",
        "default": "",
        "nullable": True
    }
}
add_field_definitions(new_fields)

field_from_registry()

Create a Pydantic Field object from a registered field definition for use in Pydantic models.

Key Features:

  • Automatic conversion to Pydantic Field objects

  • Template variable substitution in defaults and descriptions

  • Type annotation integration

  • Custom field configuration support

Example:

from pydantic import BaseModel
from prompture import field_from_registry

class Employee(BaseModel):
    name: str = field_from_registry("name")
    age: int = field_from_registry("age")
    email: str = field_from_registry("email")
    department: str = field_from_registry("department")

Registry Inspection Functions

get_registry_snapshot()

Get a complete copy of the current field registry for inspection or backup.

Example:

registry = get_registry_snapshot()
print(f"Available fields: {list(registry.keys())}")

# Use in stepwise extraction for enhanced defaults
from prompture.core import stepwise_extract_with_model
result = stepwise_extract_with_model(
    model_cls=Person,
    text="...",
    model_name="openai/gpt-4",
    field_definitions=registry  # Explicit registry usage
)

get_field_names()

Get a list of all currently registered field names.

Example:

available_fields = get_field_names()
print("Available field types:", available_fields)

get_required_fields()

Get a list of field names that are marked as non-nullable (required fields).

Example:

required = get_required_fields()
print("Required fields:", required)

Registry Maintenance Functions

clear_registry()

Remove all custom field definitions from the registry, keeping only built-in fields.

Example:

# Clear custom fields while preserving built-ins
clear_registry()

# Only built-in fields like "name", "age", "email" remain
print(get_field_names())

reset_registry()

Completely reset the registry to its initial state with only built-in field definitions.

Example:

# Add some custom fields
register_field("custom_field", {...})

# Reset to built-in fields only
reset_registry()

# Custom field is now gone
assert "custom_field" not in get_field_names()

Built-in Field Definitions

Prompture includes extensive built-in field definitions organized by category:

Personal Information Fields

Field Name

Type

Description

name

str

Full legal name of the person

age

int

Age in years (0-150)

birth_year

int

Year born (YYYY format)

Contact Information Fields

Field Name

Type

Description

email

str

Primary email address with validation

phone

str

Phone number in standardized format

address

str

Full mailing address

Professional Fields

Field Name

Type

Description

occupation

str

Job title or profession

company

str

Company or organization name

experience_years

int

Years of professional experience

Temporal Fields

Field Name

Type

Description

date

str

Date in various formats

year

int

Year (1900-current)

last_updated

str

Timestamp with template support

Content and Classification Fields

Field Name

Type

Description

title

str

Title or heading text

description

str

Longer descriptive text

category

str

Classification or category

content

str

General content field

Template Variable System

Field definitions support template variables that are automatically resolved:

Available Template Variables

Variable

Description

Example Value

{{current_year}}

Current year

2024

{{current_date}}

Current date (ISO format)

2024-03-15

{{current_datetime}}

Current datetime (ISO format)

2024-03-15T14:30:00

{{current_timestamp}}

Unix timestamp

1710512400

Example with Template Variables:

register_field("processed_date", {
    "type": str,
    "description": "Date when this record was processed",
    "instructions": "Use the current date",
    "default": "{{current_date}}",  # Resolves to actual date
    "nullable": False
})

# When used, "{{current_date}}" becomes "2024-03-15"
field_def = get_field_definition("processed_date")
print(field_def["default"])  # "2024-03-15"

Field Definition Structure

Each field definition is a dictionary with the following structure:

{
    "type": str | int | float | list | dict | bool,  # Python type
    "description": "Human-readable field description",
    "instructions": "LLM extraction instructions",
    "default": "Default value or template variable",
    "nullable": True | False,  # Whether field can be None
    "validation": {...}  # Optional validation rules
}

Field Definition Properties:

  • type: The Python type for this field (str, int, float, list, dict, bool)

  • description: Human-readable description of what this field represents

  • instructions: Specific instructions for the LLM on how to extract this field

  • default: Default value to use when extraction fails or field is missing

  • nullable: Whether the field can accept None/null values

  • validation: Optional dictionary containing validation rules and constraints

Advanced Usage Patterns

Loading Field Definitions from Files

You can load field definitions from JSON or YAML files:

from prompture.tools import load_field_definitions

# Load from JSON file
custom_fields = load_field_definitions("my_fields.json")
add_field_definitions(custom_fields)

# Load from YAML file
yaml_fields = load_field_definitions("fields.yaml")
add_field_definitions(yaml_fields)

Thread Safety

The field registry is thread-safe and can be used safely in multi-threaded applications:

import threading

def worker_thread():
    # Safe to call from multiple threads
    register_field("thread_field", {...})
    field_def = get_field_definition("name")

threads = [threading.Thread(target=worker_thread) for _ in range(10)]
for t in threads:
    t.start()

Integration with Core Functions

The field definitions system integrates seamlessly with core extraction functions:

With extract_with_model():

from prompture import extract_with_model, field_from_registry

class Person(BaseModel):
    name: str = field_from_registry("name")
    email: str = field_from_registry("email")

result = extract_with_model(Person, text, "openai/gpt-4")

With stepwise_extract_with_model():

from prompture.core import stepwise_extract_with_model

# Automatically uses global registry for enhanced defaults
result = stepwise_extract_with_model(
    model_cls=Person,
    text="...",
    model_name="openai/gpt-4"
)

Best Practices

  1. Use built-in fields when possible before creating custom ones

  2. Provide clear instructions that help LLMs extract the field correctly

  3. Set appropriate defaults that make sense when extraction fails

  4. Use template variables for dynamic values like dates and timestamps

  5. Group related fields logically when registering multiple definitions

  6. Document custom fields with descriptive names and comprehensive instructions

  7. Test field definitions with representative text samples to ensure accuracy