Field Definitions Module
The field definitions module provides a centralized registry system for defining reusable field specifications that enhance structured data extraction with type hints, descriptions, validation rules, and LLM-specific extraction instructions.
Overview
The field definitions system allows you to:
Define reusable field specifications with type information, descriptions, and extraction instructions
Register custom fields that can be used across multiple extraction operations
Leverage built-in field definitions for common data types (names, ages, emails, etc.)
Use template variables for dynamic values like {{current_year}} and {{current_date}}
Integrate with Pydantic models through the [field_from_registry()](#field_from_registry) function
Registry Management Functions
get_field_definition()
Retrieve a specific field definition from the registry with optional template variable substitution.
Example:
# Get the built-in "age" field definition
age_def = get_field_definition("age")
print(age_def)
# {
#     "type": int,
#     "description": "The age of the person in number of years.",
#     "instructions": "Calculate as 2024 - birth_year if needed.",
#     "default": 0,
#     "nullable": False
# }
register_field()
Register a new field definition or update an existing one in the global registry.
Example:
register_field("skills", {
    "type": list,
    "description": "List of professional skills and competencies",
    "instructions": "Extract as a list of strings, one skill per item",
    "default": [],
    "nullable": True
})
add_field_definition()
Alias for [register_field()](#register_field) - adds or updates a field definition in the registry.
add_field_definitions()
Register multiple field definitions at once from a dictionary.
Example:
new_fields = {
    "salary": {
        "type": float,
        "description": "Annual salary in USD",
        "instructions": "Extract numeric value, convert K/M suffixes",
        "default": 0.0,
        "nullable": True
    },
    "department": {
        "type": str,
        "description": "Department or division name",
        "instructions": "Extract official department name",
        "default": "",
        "nullable": True
    }
}
add_field_definitions(new_fields)
field_from_registry()
Create a Pydantic Field object from a registered field definition for use in Pydantic models.
Key Features:
Automatic conversion to Pydantic Field objects
Template variable substitution in defaults and descriptions
Type annotation integration
Custom field configuration support
Example:
from pydantic import BaseModel
from prompture import field_from_registry
class Employee(BaseModel):
    name: str = field_from_registry("name")
    age: int = field_from_registry("age")
    email: str = field_from_registry("email")
    department: str = field_from_registry("department")
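To illustrate the idea of turning a definition dictionary into a field object, the sketch below uses the standard library's `dataclasses.field` as a stand-in for Pydantic's `Field`, so that it runs without third-party packages. The helper `dataclass_field_from` and the sample definitions are hypothetical; how `field_from_registry()` actually maps definitions onto Pydantic is not shown in this document.

```python
import copy
from dataclasses import dataclass, field, fields

# Illustrative definitions only; the instruction texts are made up for this sketch.
DEFINITIONS = {
    "name": {
        "type": str,
        "description": "Full legal name of the person",
        "instructions": "Extract the full name as written",
        "default": "",
        "nullable": False,
    },
    "skills": {
        "type": list,
        "description": "List of professional skills and competencies",
        "instructions": "Extract as a list of strings, one skill per item",
        "default": [],
        "nullable": True,
    },
}

def dataclass_field_from(definition):
    """Build a dataclass field carrying the definition's default and descriptive text."""
    default = definition["default"]
    metadata = {
        "description": definition["description"],
        "instructions": definition["instructions"],
    }
    if isinstance(default, (list, dict, set)):
        # Mutable defaults need a factory so instances do not share one object.
        return field(default_factory=lambda: copy.deepcopy(default), metadata=metadata)
    return field(default=default, metadata=metadata)

@dataclass
class Employee:
    name: str = dataclass_field_from(DEFINITIONS["name"])
    skills: list = dataclass_field_from(DEFINITIONS["skills"])

first, second = Employee(), Employee()
first.skills.append("python")
print(second.skills)  # [] (each instance gets its own list)
print({f.name: f.metadata["description"] for f in fields(Employee)})
```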
Registry Inspection Functions
get_registry_snapshot()
Get a complete copy of the current field registry for inspection or backup.
Example:
registry = get_registry_snapshot()
print(f"Available fields: {list(registry.keys())}")
# Use in stepwise extraction for enhanced defaults
from prompture.core import stepwise_extract_with_model
result = stepwise_extract_with_model(
    model_cls=Person,
    text="...",
    model_name="openai/gpt-4",
    field_definitions=registry  # Explicit registry usage
)
get_field_names()
Get a list of all currently registered field names.
Example:
available_fields = get_field_names()
print("Available field types:", available_fields)
get_required_fields()
Get a list of field names that are marked as non-nullable (required fields).
Example:
required = get_required_fields()
print("Required fields:", required)
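Conceptually, this is a filter over the `nullable` flag. The sketch below shows that filter on a small hand-written registry; the helper name `required_field_names` is hypothetical, and treating a missing `nullable` key as nullable is an assumption of this sketch.

```python
def required_field_names(registry):
    """Return names whose definition sets nullable to False."""
    # Assumption: a definition without a "nullable" key counts as nullable.
    return [name for name, spec in registry.items() if not spec.get("nullable", True)]

sample_registry = {
    "age": {"type": int, "default": 0, "nullable": False},
    "skills": {"type": list, "default": [], "nullable": True},
    "processed_date": {"type": str, "default": "{{current_date}}", "nullable": False},
}

print(required_field_names(sample_registry))  # ['age', 'processed_date']
```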
Registry Maintenance Functions
clear_registry()
Remove all custom field definitions from the registry, keeping only built-in fields.
Example:
# Clear custom fields while preserving built-ins
clear_registry()
# Only built-in fields like "name", "age", "email" remain
print(get_field_names())
reset_registry()
Completely reset the registry to its initial state with only built-in field definitions.
Example:
# Add some custom fields
register_field("custom_field", {...})
# Reset to built-in fields only
reset_registry()
# Custom field is now gone
assert "custom_field" not in get_field_names()
Built-in Field Definitions
Prompture includes extensive built-in field definitions organized by category:
Personal Information Fields
| Field Name | Type | Description |
|---|---|---|
| name | str | Full legal name of the person |
| age | int | Age in years (0-150) |
| | int | Year born (YYYY format) |
Contact Information Fields
| Field Name | Type | Description |
|---|---|---|
| email | str | Primary email address with validation |
| | str | Phone number in standardized format |
| | str | Full mailing address |
Professional Fields
| Field Name | Type | Description |
|---|---|---|
| | str | Job title or profession |
| | str | Company or organization name |
| | int | Years of professional experience |
Temporal Fields
| Field Name | Type | Description |
|---|---|---|
| | str | Date in various formats |
| | int | Year (1900-current) |
| | str | Timestamp with template support |
Content and Classification Fields
| Field Name | Type | Description |
|---|---|---|
| | str | Title or heading text |
| | str | Longer descriptive text |
| | str | Classification or category |
| | str | General content field |
Template Variable System
Field definitions support template variables that are automatically resolved:
Available Template Variables
| Variable | Description | Example Value |
|---|---|---|
| {{current_year}} | Current year | 2024 |
| {{current_date}} | Current date (ISO format) | 2024-03-15 |
| | Current datetime (ISO format) | |
| | Unix timestamp | |
Example with Template Variables:
register_field("processed_date", {
    "type": str,
    "description": "Date when this record was processed",
    "instructions": "Use the current date",
    "default": "{{current_date}}",  # Resolves to actual date
    "nullable": False
})
# When used, "{{current_date}}" becomes "2024-03-15"
field_def = get_field_definition("processed_date")
print(field_def["default"])  # "2024-03-15"
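The substitution step can be pictured as a recursive search-and-replace over the definition's string values. The following is a minimal sketch under that assumption, not Prompture's resolver: `template_values` and `resolve_templates` are hypothetical helpers, and only the two variable names that appear in this document are supported.

```python
import re
from datetime import datetime

def template_values(now):
    """Build substitution values for the two variables named in this document."""
    return {
        "current_year": str(now.year),
        "current_date": now.date().isoformat(),
    }

_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")

def resolve_templates(value, values):
    """Recursively replace {{name}} placeholders; unknown names are left untouched."""
    if isinstance(value, str):
        return _PLACEHOLDER.sub(lambda m: values.get(m.group(1), m.group(0)), value)
    if isinstance(value, dict):
        return {key: resolve_templates(item, values) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve_templates(item, values) for item in value]
    return value  # types, numbers, booleans, None pass through unchanged

definition = {
    "type": str,
    "description": "Date when this record was processed",
    "default": "{{current_date}}",
    "nullable": False,
}

# A fixed date is used here so the output is reproducible.
resolved = resolve_templates(definition, template_values(datetime(2024, 3, 15)))
print(resolved["default"])  # 2024-03-15
```

Passing the clock in as an argument, rather than reading it inside the resolver, keeps the sketch easy to test.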
Field Definition Structure
Each field definition is a dictionary with the following structure:
{
    "type": str | int | float | list | dict | bool,  # Python type
    "description": "Human-readable field description",
    "instructions": "LLM extraction instructions",
    "default": "Default value or template variable",
    "nullable": True | False,  # Whether field can be None
    "validation": {...}  # Optional validation rules
}
Field Definition Properties:
type: The Python type for this field (str, int, float, list, dict, bool)
description: Human-readable description of what this field represents
instructions: Specific instructions for the LLM on how to extract this field
default: Default value to use when extraction fails or field is missing
nullable: Whether the field can accept None/null values
validation: Optional dictionary containing validation rules and constraints
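A quick shape check before registering a definition can catch typos early. The sketch below is one possible check; `check_definition` is a hypothetical helper, and treating every key except `validation` as expected is an assumption of this sketch rather than a documented rule.

```python
ALLOWED_TYPES = (str, int, float, list, dict, bool)
# Assumption: all keys except "validation" are expected to be present.
EXPECTED_KEYS = {"type", "description", "instructions", "default", "nullable"}

def check_definition(definition):
    """Return a list of problems; an empty list means the shape looks fine."""
    problems = []
    missing = EXPECTED_KEYS - set(definition)
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if definition.get("type") not in ALLOWED_TYPES:
        problems.append(f"unsupported type: {definition.get('type')!r}")
    if not isinstance(definition.get("nullable"), bool):
        problems.append("nullable must be True or False")
    if definition.get("default") is None and definition.get("nullable") is False:
        problems.append("default is None but the field is not nullable")
    if "validation" in definition and not isinstance(definition["validation"], dict):
        problems.append("validation must be a dictionary")
    return problems

good = {
    "type": float,
    "description": "Annual salary in USD",
    "instructions": "Extract numeric value, convert K/M suffixes",
    "default": 0.0,
    "nullable": True,
}
bad = {"type": "string", "description": "Broken example", "nullable": False}

print(check_definition(good))  # []
for problem in check_definition(bad):
    print(problem)
```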
Advanced Usage Patterns
Loading Field Definitions from Files
You can load field definitions from JSON or YAML files:
from prompture.tools import load_field_definitions
# Load from JSON file
custom_fields = load_field_definitions("my_fields.json")
add_field_definitions(custom_fields)
# Load from YAML file
yaml_fields = load_field_definitions("fields.yaml")
add_field_definitions(yaml_fields)
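JSON has no way to store Python type objects, so a definitions file has to name types some other way. The sketch below assumes type names are stored as strings such as "float" and mapped back on load; this layout and the helper `parse_definitions` are hypothetical, and the schema that `load_field_definitions` actually expects is not described here.

```python
import json

# Assumed mapping from type names in the file to Python types.
TYPE_NAMES = {"str": str, "int": int, "float": float, "list": list, "dict": dict, "bool": bool}

def parse_definitions(json_text):
    """Parse a JSON object of name -> definition and convert type names to types."""
    parsed = {}
    for name, spec in json.loads(json_text).items():
        spec = dict(spec)
        # Raises KeyError for a type name outside the supported set.
        spec["type"] = TYPE_NAMES[spec["type"]]
        parsed[name] = spec
    return parsed

example_file_contents = """
{
  "salary": {
    "type": "float",
    "description": "Annual salary in USD",
    "instructions": "Extract numeric value, convert K/M suffixes",
    "default": 0.0,
    "nullable": true
  }
}
"""

fields = parse_definitions(example_file_contents)
print(fields["salary"]["type"])      # <class 'float'>
print(fields["salary"]["nullable"])  # True
```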
Thread Safety
The field registry is thread-safe, so it can be read and updated from multiple threads:
import threading
def worker_thread():
    # Safe to call from multiple threads
    register_field("thread_field", {...})
    field_def = get_field_definition("name")
threads = [threading.Thread(target=worker_thread) for _ in range(10)]
for t in threads:
    t.start()
# Wait for every worker to finish
for t in threads:
    t.join()
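A common way to get this kind of guarantee is to guard every read and write with a lock. The sketch below shows that pattern with a hypothetical `LockedRegistry` class; it illustrates the general technique and makes no claim about how Prompture implements its registry internally.

```python
import threading

class LockedRegistry:
    """Dictionary-backed registry whose operations are serialized by a lock."""

    def __init__(self):
        self._lock = threading.RLock()
        self._fields = {}

    def register(self, name, definition):
        with self._lock:
            self._fields[name] = dict(definition)

    def get(self, name):
        with self._lock:
            spec = self._fields.get(name)
            # Return a copy so callers cannot mutate shared state outside the lock.
            return dict(spec) if spec is not None else None

    def names(self):
        with self._lock:
            return sorted(self._fields)

registry = LockedRegistry()

def worker(index):
    registry.register(f"field_{index}", {"type": str, "default": "", "nullable": True})

threads = [threading.Thread(target=worker, args=(i,)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(registry.names()))  # 10
```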
Integration with Core Functions
The field definitions system works with the core extraction functions:
With extract_with_model():
from pydantic import BaseModel
from prompture import extract_with_model, field_from_registry
class Person(BaseModel):
    name: str = field_from_registry("name")
    email: str = field_from_registry("email")
# text holds the input document to extract from
result = extract_with_model(Person, text, "openai/gpt-4")
With stepwise_extract_with_model():
from prompture.core import stepwise_extract_with_model
# Automatically uses global registry for enhanced defaults
result = stepwise_extract_with_model(
    model_cls=Person,
    text="...",
    model_name="openai/gpt-4"
)
Best Practices
Use built-in fields when possible before creating custom ones
Provide clear instructions that help LLMs extract the field correctly
Set appropriate defaults that make sense when extraction fails
Use template variables for dynamic values like dates and timestamps
Group related fields logically when registering multiple definitions
Document custom fields with descriptive names and comprehensive instructions
Test field definitions with representative text samples to ensure accuracy