Field Definitions Reference

Overview

The Prompture field definitions system provides a centralized registry of structured data extraction fields. Each field definition specifies the data type, description, extraction instructions, default values, and validation rules. This system enables consistent, reusable field configurations across your data extraction workflows.

Key Features:

  • Centralized Registry: All field definitions stored in a global registry with thread-safe access

  • Template Variables: Dynamic defaults using {{current_year}}, {{current_date}}, etc.

  • Pydantic Integration: Seamless integration with Pydantic models via field_from_registry()

  • Custom Fields: Easy registration of domain-specific fields with register_field()

  • Type Safety: Full type hints and validation support

Quick Start

Basic Usage with Built-in Fields

from pydantic import BaseModel
from prompture import field_from_registry, stepwise_extract_with_model

class Person(BaseModel):
    name: str = field_from_registry("name")
    age: int = field_from_registry("age")
    email: str = field_from_registry("email")

# Use with extraction
result = stepwise_extract_with_model(
    Person,
    "John Smith is 25 years old, email: john@example.com",
    model_name="openai/gpt-4"
)

Registering Custom Fields

from pydantic import BaseModel
from prompture import register_field, field_from_registry

# Register a custom field with template variables
register_field("document_date", {
    "type": "str",
    "description": "Document creation or processing date",
    "instructions": "Use {{current_date}} if not specified in document",
    "default": "{{current_date}}",
    "nullable": False
})

# Use in Pydantic model
class Document(BaseModel):
    title: str = field_from_registry("name")  # Reuse built-in field
    created_date: str = field_from_registry("document_date")  # Custom field

Built-in Field Definitions

The following field definitions are available by default in the BASE_FIELD_DEFINITIONS registry:

Person/Identity Fields

| Field Name | Type | Description | Instructions | Default | Nullable | Notes |
|---|---|---|---|---|---|---|
| name | str | Full legal name of the person | Extract as-is, no modifications | "" | False | Required field |
| age | int | The age of the person in number of years | Calculate as {{current_year}} - birth_year if needed | 0 | False | Uses {{current_year}} |
| birth_year | int | The year the person was born (YYYY) | Extract as a 4-digit year number | None | True | Optional field |

Usage Example:

class Person(BaseModel):
    name: str = field_from_registry("name")
    age: int = field_from_registry("age")
    birth_year: int | None = field_from_registry("birth_year")  # nullable, defaults to None

Contact Information Fields

| Field Name | Type | Description | Instructions | Default | Nullable | Notes |
|---|---|---|---|---|---|---|
| email | str | Primary email address | Extract in lowercase, verify basic email format | "" | True | Optional field |
| phone | str | Primary phone number | Extract digits only, standardize to E.164 format if possible | "" | True | Optional field |
| address | str | Full mailing address | Combine all address components into a single string | "" | True | Optional field |

Usage Example:

class ContactInfo(BaseModel):
    email: str = field_from_registry("email")
    phone: str = field_from_registry("phone")
    address: str = field_from_registry("address")
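
The instructions for email and phone describe normalization that the LLM is asked to perform. If you also want to enforce it after extraction, a post-processing step might look like the sketch below. The helper names and the default country code "1" are assumptions for illustration, not part of Prompture, and the email check is deliberately basic.

```python
import re

# Hypothetical helpers, not part of Prompture.
_EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def normalize_email(raw: str) -> str:
    """Lowercase the address; return "" if it fails a basic shape check."""
    candidate = raw.strip().lower()
    return candidate if _EMAIL_RE.match(candidate) else ""


def normalize_phone(raw: str, default_country_code: str = "1") -> str:
    """Keep digits only and format as E.164 where the length allows it."""
    has_plus = raw.strip().startswith("+")
    digits = re.sub(r"\D", "", raw)
    if not digits:
        return ""
    if has_plus:
        return "+" + digits  # caller already supplied a country code
    if len(digits) == 10:
        # Assumption: a 10-digit number is national and needs the default code.
        return "+" + default_country_code + digits
    return digits  # unknown shape: return the digits unchanged
```

Returning an empty string on failure matches the documented "" default for both fields.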

Professional Information Fields

| Field Name | Type | Description | Instructions | Default | Nullable | Notes |
|---|---|---|---|---|---|---|
| occupation | str | Current job title or profession | Extract primary occupation, standardize common titles | "" | True | Optional field |
| company | str | Current employer or company name | Extract organization name, remove legal suffixes | "" | True | Optional field |
| experience_years | int | Years of professional experience | Calculate total years of relevant experience | 0 | True | Optional field |

Usage Example:

class Professional(BaseModel):
    occupation: str = field_from_registry("occupation")
    company: str = field_from_registry("company")
    experience_years: int = field_from_registry("experience_years")

Metadata Fields

| Field Name | Type | Description | Instructions | Default | Nullable | Notes |
|---|---|---|---|---|---|---|
| source | str | Source of the extracted information | Record origin of data (e.g., ‘resume’, ‘linkedin’) | "unknown" | False | Required field |
| last_updated | str | Last update timestamp (ISO format) | Use ISO 8601 format (YYYY-MM-DDTHH:MM:SSZ), default to {{current_datetime}} | "{{current_datetime}}" | False | Uses {{current_datetime}} |
| confidence_score | float | Confidence score of extraction (0.0-1.0) | Calculate based on extraction certainty | 0.0 | False | Required field |

Usage Example:

class DataRecord(BaseModel):
    source: str = field_from_registry("source")
    last_updated: str = field_from_registry("last_updated")
    confidence_score: float = field_from_registry("confidence_score")

Location Fields

| Field Name | Type | Description | Instructions | Default | Nullable | Notes |
|---|---|---|---|---|---|---|
| city | str | City name | Extract city name, standardize capitalization | "" | True | Optional field |
| state | str | State or province name | Extract state/province, use full name or abbreviation | "" | True | Optional field |
| postal_code | str | Postal or ZIP code | Extract postal code, maintain original format | "" | True | Optional field |
| country | str | Country name | Extract country name, use full English name | "" | True | Optional field |
| coordinates | str | Geographic coordinates (lat, long) | Extract as ‘latitude,longitude’ format if available | "" | True | Optional field |

Usage Example:

class LocationData(BaseModel):
    city: str = field_from_registry("city")
    country: str = field_from_registry("country")
    postal_code: str = field_from_registry("postal_code")
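
Because coordinates is stored as a single 'latitude,longitude' string, downstream code usually has to split it again. The sketch below shows one way to do that; parse_coordinates is a hypothetical helper, not a Prompture function, and the range checks are an added assumption.

```python
from typing import Optional, Tuple


def parse_coordinates(value: str) -> Optional[Tuple[float, float]]:
    """Parse 'latitude,longitude' into floats, or return None if invalid."""
    parts = [p.strip() for p in value.split(",")]
    if len(parts) != 2:
        return None
    try:
        lat, lon = float(parts[0]), float(parts[1])
    except ValueError:
        return None
    # Reject values outside the valid geographic ranges.
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return None
    return lat, lon
```

The documented default "" parses to None, so an unextracted field is easy to tell apart from a real location.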

Demographic Fields

| Field Name | Type | Description | Instructions | Default | Nullable | Notes |
|---|---|---|---|---|---|---|
| gender | str | Gender identification | Extract gender if explicitly stated, otherwise leave empty | "" | True | Optional field |
| nationality | str | Nationality or citizenship | Extract nationality, use country demonym | "" | True | Optional field |
| marital_status | str | Marital status | Extract marital status (single, married, divorced, etc.) | "" | True | Optional field |
| language | str | Primary language spoken | Extract primary or native language | "" | True | Optional field |

Usage Example:

class DemographicData(BaseModel):
    nationality: str = field_from_registry("nationality")
    language: str = field_from_registry("language")

Education Fields

| Field Name | Type | Description | Instructions | Default | Nullable | Notes |
|---|---|---|---|---|---|---|
| education_level | str | Highest education level | Extract highest degree (High School, Bachelor’s, Master’s, PhD, etc.) | "" | True | Optional field |
| graduation_year | int | Year of graduation | Extract graduation year as 4-digit number | None | True | Optional field |
| gpa | float | Grade point average | Extract GPA, convert to 4.0 scale if needed | None | True | Optional field |

Usage Example:

class EducationData(BaseModel):
    education_level: str = field_from_registry("education_level")
    graduation_year: int | None = field_from_registry("graduation_year")  # defaults to None
    gpa: float | None = field_from_registry("gpa")  # defaults to None
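
The gpa instructions ask for conversion to a 4.0 scale. The sketch below shows the simplest possible conversion, a linear rescale. That choice is an assumption: institutions often use their own conversion tables, and to_four_point_scale is a hypothetical helper rather than anything Prompture provides.

```python
def to_four_point_scale(gpa: float, scale_max: float) -> float:
    """Linearly rescale a GPA from [0, scale_max] to [0, 4.0]."""
    if scale_max <= 0:
        raise ValueError("scale_max must be positive")
    if not 0 <= gpa <= scale_max:
        raise ValueError("gpa must lie within [0, scale_max]")
    return round(gpa * 4.0 / scale_max, 2)
```

For example, 8.5 on a 10-point scale becomes 8.5 × 4.0 / 10 = 3.4.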

Financial Fields

| Field Name | Type | Description | Instructions | Default | Nullable | Notes |
|---|---|---|---|---|---|---|
| salary | float | Annual salary amount | Extract salary as numeric value, remove currency symbols | None | True | Optional field |
| currency | str | Currency code | Extract or infer currency code (USD, EUR, GBP, etc.) | "USD" | True | Optional field |
| bonus | float | Bonus amount | Extract bonus as numeric value | None | True | Optional field |

Usage Example:

class FinancialData(BaseModel):
    salary: float | None = field_from_registry("salary")  # nullable, defaults to None
    currency: str = field_from_registry("currency")
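
The salary and bonus instructions ask for a bare numeric value. If an extracted value still carries a currency symbol, a cleanup step along these lines could be applied. This is a sketch with a hypothetical helper name, and it assumes "." is the decimal separator and "," the thousands separator.

```python
import re
from typing import Optional


def parse_amount(raw: str) -> Optional[float]:
    """Strip currency symbols and thousands separators; None if no number remains."""
    # Keep only digits, the decimal point and a minus sign.
    cleaned = re.sub(r"[^\d.\-]", "", raw)
    try:
        return float(cleaned)
    except ValueError:
        return None
```

Returning None for unparseable input lines up with the documented None default of both fields.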

Social Media Fields

| Field Name | Type | Description | Instructions | Default | Nullable | Notes |
|---|---|---|---|---|---|---|
| sentiment | str | Sentiment classification | Classify as positive, negative, or neutral | "neutral" | True | Optional field |
| hashtags | str | Hashtags from content | Extract all hashtags as comma-separated list | "" | True | Optional field |
| mentions | str | User mentions from content | Extract all @mentions as comma-separated list | "" | True | Optional field |
| topic | str | Main topic or subject | Identify primary topic or theme of content | "" | True | Optional field |

Usage Example:

class SocialMediaData(BaseModel):
    sentiment: str = field_from_registry("sentiment")
    hashtags: str = field_from_registry("hashtags")
    topic: str = field_from_registry("topic")
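
Since hashtags and mentions are typed as str and hold comma-separated values, it can help to see the round trip between raw text, the stored string, and a Python list. The sketch below uses only the standard library; both helpers are hypothetical and can serve as a cross-check on the LLM output.

```python
import re
from typing import List


def extract_tags(text: str, prefix: str) -> str:
    """Return every prefix-marked token as a comma-separated string."""
    pattern = re.escape(prefix) + r"(\w+)"
    return ",".join(prefix + m for m in re.findall(pattern, text))


def split_tags(value: str) -> List[str]:
    """Turn the comma-separated field value back into a list."""
    return [item.strip() for item in value.split(",") if item.strip()]
```

Note that the simple pattern would also pick up the domain part of an email address when prefix is "@", so real input may need stricter matching.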

Template Variable System

Template variables provide dynamic default values that are resolved at runtime. They’re especially useful for timestamps, dates, and calculated values.

Available Template Variables

The following template variables are available for use in field definitions:

{{current_year}}

Current year as 4-digit integer (e.g., 2024)

{{current_date}}

Current date in ISO format (YYYY-MM-DD)

{{current_datetime}}

Current datetime in ISO 8601 format (YYYY-MM-DDTHH:MM:SSZ)

{{current_timestamp}}

Current Unix timestamp as integer

{{current_month}}

Current month as integer (1-12)

{{current_day}}

Current day of month as integer (1-31)

{{current_weekday}}

Current day name as string (e.g., “Monday”, “Tuesday”)

{{current_iso_week}}

Current ISO week number as integer (1-53)
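
To make the substitution concrete, here is a minimal standalone sketch of how variables like these could be computed and applied. It is not Prompture's implementation: the function names are hypothetical, and the choice of UTC is an assumption based on the trailing Z in the documented datetime format.

```python
import re
from datetime import datetime, timezone
from typing import Any, Dict, Optional

_TEMPLATE_RE = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def builtin_vars(now: Optional[datetime] = None) -> Dict[str, Any]:
    """Compute the documented variables for one instant (UTC by default)."""
    now = now or datetime.now(timezone.utc)
    return {
        "current_year": now.year,
        "current_date": now.strftime("%Y-%m-%d"),
        "current_datetime": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "current_timestamp": int(now.timestamp()),
        "current_month": now.month,
        "current_day": now.day,
        "current_weekday": now.strftime("%A"),
        "current_iso_week": now.isocalendar()[1],
    }


def resolve(text: str, variables: Dict[str, Any]) -> str:
    """Replace each {{name}} with its value; unknown names are left untouched."""
    return _TEMPLATE_RE.sub(
        lambda m: str(variables.get(m.group(1), m.group(0))), text
    )
```

Leaving unknown placeholders in place, rather than raising, makes an unresolved template visible in the output.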

Using Template Variables

Template variables can be used in any string field within a field definition:

register_field("processing_date", {
    "type": "str",
    "description": "Date when document was processed",
    "instructions": "Use {{current_date}} if processing date not available",
    "default": "{{current_date}}",
    "nullable": False
})

register_field("academic_year", {
    "type": "str",
    "description": "Academic year for enrollment",
    "instructions": "Use {{current_year}} for current enrollment",
    "default": "{{current_year}}-{{current_year}}",  # e.g. "2024-2024"
    "nullable": True
})

Custom Template Variables

You can provide custom template variables when retrieving field definitions:

from prompture import get_field_definition, register_field

# Custom variables for specific use cases
custom_vars = {
    "report_year": 2023,
    "department": "Engineering"
}

# Register field with custom template
register_field("report_title", {
    "type": "str",
    "description": "Report title",
    "instructions": "Use format: {{department}} Report {{report_year}}",
    "default": "{{department}} Report {{report_year}}",
    "nullable": False
})

# Retrieve with custom variables
field_def = get_field_definition("report_title",
                               apply_templates=True,
                               custom_template_vars=custom_vars)

Custom Field Registration

Creating Custom Fields

Register custom fields using register_field() to extend the built-in definitions:

from prompture import register_field, field_from_registry

# Define field structure
register_field("product_price", {
    "type": "str",
    "description": "Product price with currency symbol",
    "instructions": "Extract price including currency, handle ranges like $10-$15",
    "default": "Price not available",
    "nullable": True
})

register_field("skills", {
    "type": "list",
    "description": "List of professional skills",
    "instructions": "Extract skills as comma-separated list, normalize tech names",
    "default": [],
    "nullable": True
})

Field Definition Structure

Each field definition must include these required properties:

type (required)

Python type or string representation (str, int, float, bool, list, dict)

description (required)

Human-readable description of the field purpose

instructions (required)

Specific extraction instructions for LLM processing

default (required)

Default value when field is not extracted (supports template variables)

nullable (required)

Boolean indicating if field accepts None/null values

Example:

field_definition = {
    "type": "str",
    "description": "Product category classification",
    "instructions": "Classify into: Electronics, Clothing, Books, Home, Other",
    "default": "Other",
    "nullable": True
}
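
The required properties above translate into a simple structural check. The sketch below is a standalone illustration of such a check, not the library's validate_field_definition(); the function name and the default/nullable consistency rule are assumptions.

```python
from typing import Any, Dict, List

REQUIRED_KEYS = ("type", "description", "instructions", "default", "nullable")
KNOWN_TYPES = {"str", "int", "float", "bool", "list", "dict"}


def find_problems(definition: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; empty means the shape is fine."""
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in definition]

    if "type" in definition:
        type_name = definition["type"]
        if isinstance(type_name, type):
            type_name = type_name.__name__  # accept str as well as "str"
        if not (isinstance(type_name, str) and type_name in KNOWN_TYPES):
            problems.append(f"unknown type: {type_name!r}")

    if "nullable" in definition and not isinstance(definition["nullable"], bool):
        problems.append("nullable must be a bool")

    # Assumed consistency rule: a None default only makes sense if nullable.
    if (
        "default" in definition
        and definition["default"] is None
        and definition.get("nullable") is False
    ):
        problems.append("default is None but nullable is False")
    return problems
```

Returning a list of messages instead of a single boolean makes it easier to report every problem in one pass.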

Validation

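Field definitions are validated automatically when registered. You can also validate a definition explicitly before registering it:

from prompture import register_field
from prompture.tools import validate_field_definition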

# Validate before registering
field_def = {
    "type": "str",
    "description": "Valid field",
    "instructions": "Extract text value",
    "default": "",
    "nullable": True
}

if validate_field_definition(field_def):
    register_field("my_field", field_def)
else:
    print("Invalid field definition")

Integration Examples

Complete Extraction Workflow

Here’s a complete example showing field definitions in a real extraction scenario:

from pydantic import BaseModel
from prompture import (
    field_from_registry,
    register_field,
    stepwise_extract_with_model
)

# Register custom business fields
register_field("industry", {
    "type": "str",
    "description": "Business industry classification",
    "instructions": "Classify into standard industry categories",
    "default": "Unknown",
    "nullable": True
})

register_field("founded_year", {
    "type": "int",
    "description": "Year company was founded",
    "instructions": "Extract founding year, use {{current_year}} if recent",
    "default": None,
    "nullable": True
})

# Create comprehensive model
class BusinessProfile(BaseModel):
    # Built-in fields
    name: str = field_from_registry("name")
    email: str = field_from_registry("email")
    phone: str = field_from_registry("phone")
    address: str = field_from_registry("address")

    # Professional fields
    company: str = field_from_registry("company")

    # Custom fields
    industry: str = field_from_registry("industry")
    founded_year: int | None = field_from_registry("founded_year")  # defaults to None

    # Metadata
    source: str = field_from_registry("source")
    last_updated: str = field_from_registry("last_updated")
    confidence_score: float = field_from_registry("confidence_score")

# Sample business text
business_text = """
TechStart Solutions is a cloud computing company founded in 2019.
Contact: Sarah Johnson, CEO
Email: sarah@techstart.com
Phone: (555) 123-4567
Address: 123 Innovation Drive, San Francisco, CA 94105
Industry: Software as a Service (SaaS)
"""

# Extract structured data
result = stepwise_extract_with_model(
    BusinessProfile,
    business_text,
    model_name="openai/gpt-4"
)

print(result.model_dump())

Multi-Domain Field Sets

Organize fields by domain for better maintainability:

# E-commerce fields
ecommerce_fields = {
    "product_name": {
        "type": "str",
        "description": "Product name or title",
        "instructions": "Extract main product name, exclude brand",
        "default": "Unknown Product",
        "nullable": False
    },
    "sku": {
        "type": "str",
        "description": "Product SKU or model number",
        "instructions": "Extract alphanumeric SKU code",
        "default": "",
        "nullable": True
    },
    "category": {
        "type": "str",
        "description": "Product category",
        "instructions": "Classify into Electronics, Clothing, Books, etc.",
        "default": "Other",
        "nullable": True
    }
}

# Medical fields
medical_fields = {
    "patient_id": {
        "type": "str",
        "description": "Patient identification number",
        "instructions": "Extract patient ID, mask if sensitive",
        "default": "",
        "nullable": True
    },
    "diagnosis": {
        "type": "str",
        "description": "Primary diagnosis or condition",
        "instructions": "Extract main diagnosis, use medical terminology",
        "default": "",
        "nullable": True
    },
    "treatment_date": {
        "type": "str",
        "description": "Date of treatment or visit",
        "instructions": "Extract date, use {{current_date}} if not specified",
        "default": "{{current_date}}",
        "nullable": False
    }
}

# Register field sets
from prompture import add_field_definitions

add_field_definitions(ecommerce_fields)
add_field_definitions(medical_fields)
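
When several domain sets are registered into one global registry, two sets may define the same field name. How add_field_definitions() treats such a collision is not stated here, so one option is to check before registering. The sketch below does that with a hypothetical merge_field_sets helper.

```python
from typing import Any, Dict

FieldSet = Dict[str, Dict[str, Any]]


def merge_field_sets(**domains: FieldSet) -> FieldSet:
    """Merge named field sets, raising if two domains define the same field."""
    merged: FieldSet = {}
    owner: Dict[str, str] = {}  # field name -> domain that defined it first
    for domain, fields in domains.items():
        for name, definition in fields.items():
            if name in merged:
                raise ValueError(
                    f"'{name}' is defined by both '{owner[name]}' and '{domain}'"
                )
            merged[name] = definition
            owner[name] = domain
    return merged
```

The merged result could then be passed to add_field_definitions() in a single call, for example merge_field_sets(ecommerce=ecommerce_fields, medical=medical_fields).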

External Configuration Files

Load field definitions from external YAML or JSON files:

field_definitions.yaml:

document_fields:
  title:
    type: str
    description: "Document title or heading"
    instructions: "Extract main document title"
    default: "Untitled Document"
    nullable: false

  author:
    type: str
    description: "Document author or creator"
    instructions: "Extract author name, handle multiple authors"
    default: "Unknown Author"
    nullable: true

  created_date:
    type: str
    description: "Document creation date"
    instructions: "Use {{current_date}} if date not found"
    default: "{{current_date}}"
    nullable: false

Python integration:

from pydantic import BaseModel
from prompture.tools import load_field_definitions
from prompture import add_field_definitions, field_from_registry

# Load from external file
external_fields = load_field_definitions("field_definitions.yaml")

# Register all fields from the file
add_field_definitions(external_fields)

# Use in models
class Document(BaseModel):
    title: str = field_from_registry("title")
    author: str = field_from_registry("author")
    created_date: str = field_from_registry("created_date")
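
The YAML above nests its fields under a document_fields group, while add_field_definitions() is documented as taking a mapping from field names to definitions. How load_field_definitions() handles the grouping level is not described here. If you need to flatten such a file yourself, the sketch below shows one way, using a JSON equivalent and only the standard library; flatten_groups is a hypothetical helper.

```python
import json
from typing import Any, Dict

DEFINITION_KEYS = {"type", "description", "instructions", "default", "nullable"}


def flatten_groups(data: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Lift definitions out of one level of grouping; pass flat entries through."""
    flat: Dict[str, Dict[str, Any]] = {}
    for key, value in data.items():
        if DEFINITION_KEYS.issubset(value):
            flat[key] = value   # already a field definition
        else:
            flat.update(value)  # a group such as "document_fields"
    return flat


raw = json.loads("""
{"document_fields": {
    "title": {"type": "str", "description": "Document title or heading",
              "instructions": "Extract main document title",
              "default": "Untitled Document", "nullable": false}
}}
""")
fields = flatten_groups(raw)  # keyed by "title", not "document_fields"
```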

Registry Management

The field definitions registry provides several utility functions for managing field definitions:

Inspecting the Registry

from prompture import (
    get_field_names,
    get_required_fields,
    get_field_definition,
    get_registry_snapshot
)

# List all available fields
all_fields = get_field_names()
print(f"Available fields: {all_fields}")

# Get required fields only
required_fields = get_required_fields()
print(f"Required fields: {required_fields}")

# Inspect specific field
name_field = get_field_definition("name")
print(f"Name field: {name_field}")

# Get full registry snapshot
registry = get_registry_snapshot()
print(f"Registry contains {len(registry)} fields")

Registry Maintenance

from prompture import reset_registry, clear_registry

# Reset to base definitions only
reset_registry()  # Keeps built-in fields, removes custom ones

# Clear everything (use with caution)
clear_registry()  # Removes ALL fields including built-ins
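
The difference between resetting, clearing, and taking a snapshot is easier to see in a toy model. The class below is an illustrative in-memory stand-in, not Prompture's registry: the class name, the single stand-in built-in field, and the use of a lock for thread safety are all assumptions.

```python
import copy
import threading
from typing import Any, Dict, List

BASE = {"name": {"type": "str", "nullable": False}}  # stand-in for built-ins


class ToyRegistry:
    """In-memory model of the documented registry behaviour (illustrative only)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._fields: Dict[str, Dict[str, Any]] = copy.deepcopy(BASE)

    def register(self, name: str, definition: Dict[str, Any]) -> None:
        with self._lock:
            self._fields[name] = dict(definition)

    def required(self) -> List[str]:
        """Names of non-nullable fields."""
        with self._lock:
            return [n for n, d in self._fields.items() if not d["nullable"]]

    def snapshot(self) -> Dict[str, Dict[str, Any]]:
        with self._lock:
            return copy.deepcopy(self._fields)  # callers cannot mutate the registry

    def reset(self) -> None:
        with self._lock:
            self._fields = copy.deepcopy(BASE)  # built-ins only

    def clear(self) -> None:
        with self._lock:
            self._fields = {}  # nothing left, built-ins included
```

After clear(), a lookup of a built-in such as "name" would fail until the definitions are registered again, which is why the comment above advises caution.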

Best Practices

Field Naming Conventions

  • Use descriptive, lowercase names with underscores: first_name, created_date

  • Group related fields with prefixes: contact_email, contact_phone

  • Avoid abbreviations: use experience_years not exp_yrs

  • Be consistent across your domain
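
The naming convention above can be checked mechanically before registration. The following is a small sketch with a hypothetical helper; it covers only the lowercase-with-underscores rule, since abbreviations and consistency need human judgment.

```python
import re

# Lowercase letters and digits, words separated by single underscores.
_SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def is_conventional_name(name: str) -> bool:
    """True for lowercase snake_case names such as 'contact_email'."""
    return bool(_SNAKE_CASE.match(name))
```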

Type Selection Guidelines

  • Use str for text, IDs, formatted data (dates, phone numbers)

  • Use int for counts, years, numeric IDs

  • Use float for scores, percentages, monetary values

  • Use list for multiple values of same type

  • Use dict for nested structured data
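
To see how the type strings relate to Python values, the sketch below maps each documented type name to a Python type and checks an extracted value against it. The mapping and the helper are illustrative assumptions, not Prompture internals.

```python
from typing import Any, Dict, Type

TYPE_MAP: Dict[str, Type[Any]] = {
    "str": str, "int": int, "float": float,
    "bool": bool, "list": list, "dict": dict,
}


def matches_declared_type(value: Any, type_name: str, nullable: bool) -> bool:
    """Check a value against a definition's declared type string."""
    if value is None:
        return nullable
    expected = TYPE_MAP[type_name]
    if expected is float:
        # Accept ints where a float is declared, but never bools.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if expected is int:
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    return isinstance(value, expected)
```

A year extracted as the string "2019" fails the int check, a common mismatch when a numeric field is declared as int and the model returns text.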

Template Variable Usage

  • Use {{current_date}} for document dates and timestamps

  • Use {{current_year}} for age calculations and academic years

  • Use {{current_datetime}} for precise processing timestamps

  • Provide fallback values when templates might not resolve

Validation and Testing

from pydantic import BaseModel
from prompture import field_from_registry, register_field
from prompture.tools import validate_field_definition

# Always validate custom fields
def create_safe_field(name, definition):
    if validate_field_definition(definition):
        register_field(name, definition)
        return True
    else:
        print(f"Invalid field definition for '{name}'")
        return False

# Test field definitions with sample data
def test_field_extraction(field_name, sample_text):
    class TestModel(BaseModel):
        test_field: str = field_from_registry(field_name)

    # Test extraction (requires API key)
    # result = stepwise_extract_with_model(TestModel, sample_text)
    # return result.test_field

Performance Considerations

  • Register fields once at application startup

  • Use get_registry_snapshot() for bulk operations

  • Cache field definitions for frequently used fields

  • Validate definitions before registration to avoid runtime errors
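
Caching frequently used definitions can be done with the standard library. In the sketch below, slow_lookup is a stand-in for a registry call such as get_field_definition(); it is defined locally so the example runs without Prompture installed.

```python
from functools import lru_cache
from typing import Any, Dict


def slow_lookup(field_name: str) -> Dict[str, Any]:
    """Stand-in for a registry call such as get_field_definition()."""
    return {"name": field_name, "type": "str"}


@lru_cache(maxsize=256)
def cached_definition(field_name: str) -> Dict[str, Any]:
    """Return the definition, hitting the slow path once per field name."""
    return slow_lookup(field_name)
```

Two caveats apply. A cached definition goes stale if the field is re-registered or if its template variables should resolve to a new date, so call cached_definition.cache_clear() after such changes. The cache also returns the same dict object each time, so callers should treat it as read-only.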

API Reference

Core Functions

field_from_registry(field_name: str, apply_templates: bool = True, custom_template_vars: Dict[str, Any] | None = None) -> Field

Create a Pydantic Field from a registered field definition.

Parameters:
  • field_name – Name of field in the registry

  • apply_templates – Whether to apply template variable substitution

  • custom_template_vars – Custom template variables for substitution

Returns:

Configured Pydantic Field object

Raises:

KeyError – If field_name not found in registry

register_field(field_name: str, field_definition: FieldDefinition) -> None

Register a single field definition in the global registry.

Parameters:
  • field_name – Name of the field

  • field_definition – Dictionary containing field configuration

Raises:

ValueError – If field definition is invalid

get_field_definition(field_name: str, apply_templates: bool = True, custom_template_vars: Dict[str, Any] | None = None) -> FieldDefinition | None

Retrieve a field definition from the registry.

Parameters:
  • field_name – Name of field to retrieve

  • apply_templates – Whether to apply template substitution

  • custom_template_vars – Custom variables for templates

Returns:

Field definition dictionary or None if not found

Registry Management Functions

get_field_names() -> List[str]

Get list of all registered field names.

Returns:

List of field names in the registry

get_required_fields() -> List[str]

Get list of required (non-nullable) field names.

Returns:

List of required field names

add_field_definitions(field_definitions: Dict[str, FieldDefinition]) -> None

Register multiple field definitions at once.

Parameters:

field_definitions – Dictionary mapping field names to definitions

get_registry_snapshot() -> Dict[str, FieldDefinition]

Get a copy of the current registry state.

Returns:

Dictionary of all registered field definitions

reset_registry() -> None

Reset registry to contain only base field definitions.

clear_registry() -> None

Remove all field definitions from registry.

Warning

This removes all fields including built-in definitions.

See Also