Field Definitions Reference
Overview
The Prompture field definitions system provides a centralized registry of structured data extraction fields. Each field definition specifies the data type, description, extraction instructions, default values, and validation rules. This system enables consistent, reusable field configurations across your data extraction workflows.
Key Features:
Centralized Registry: All field definitions stored in a global registry with thread-safe access
Template Variables: Dynamic defaults using {{current_year}}, {{current_date}}, etc.
Pydantic Integration: Seamless integration with Pydantic models via field_from_registry()
Custom Fields: Easy registration of domain-specific fields with register_field()
Type Safety: Full type hints and validation support
Quick Start
Basic Usage with Built-in Fields
from pydantic import BaseModel
from prompture import field_from_registry, stepwise_extract_with_model

class Person(BaseModel):
    name: str = field_from_registry("name")
    age: int = field_from_registry("age")
    email: str = field_from_registry("email")

# Use with extraction
result = stepwise_extract_with_model(
    Person,
    "John Smith is 25 years old, email: john@example.com",
    model_name="openai/gpt-4"
)
Registering Custom Fields
from pydantic import BaseModel
from prompture import register_field, field_from_registry

# Register a custom field with template variables
register_field("document_date", {
    "type": "str",
    "description": "Document creation or processing date",
    "instructions": "Use {{current_date}} if not specified in document",
    "default": "{{current_date}}",
    "nullable": False
})

# Use in Pydantic model
class Document(BaseModel):
    title: str = field_from_registry("name")  # Reuse built-in field
    created_date: str = field_from_registry("document_date")  # Custom field
Built-in Field Definitions
The following field definitions are available by default in the BASE_FIELD_DEFINITIONS registry:
Person/Identity Fields
Field Name | Type | Description | Instructions | Default | Nullable | Notes |
---|---|---|---|---|---|---|
name | | Full legal name of the person | Extract as-is, no modifications | | | Required field |
age | | The age of the person in number of years | Calculate as | | | Uses |
birth_year | | The year the person was born (YYYY) | Extract as a 4-digit year number | | | Optional field |
Usage Example:
class Person(BaseModel):
    name: str = field_from_registry("name")
    age: int = field_from_registry("age")
    birth_year: int = field_from_registry("birth_year")
Contact Information Fields
Field Name | Type | Description | Instructions | Default | Nullable | Notes |
---|---|---|---|---|---|---|
email | | Primary email address | Extract in lowercase, verify basic email format | | | Optional field |
phone | | Primary phone number | Extract digits only, standardize to E.164 format if possible | | | Optional field |
address | | Full mailing address | Combine all address components into a single string | | | Optional field |
Usage Example:
class ContactInfo(BaseModel):
    email: str = field_from_registry("email")
    phone: str = field_from_registry("phone")
    address: str = field_from_registry("address")
Professional Information Fields
Field Name | Type | Description | Instructions | Default | Nullable | Notes |
---|---|---|---|---|---|---|
occupation | | Current job title or profession | Extract primary occupation, standardize common titles | | | Optional field |
company | | Current employer or company name | Extract organization name, remove legal suffixes | | | Optional field |
experience_years | | Years of professional experience | Calculate total years of relevant experience | | | Optional field |
Usage Example:
class Professional(BaseModel):
    occupation: str = field_from_registry("occupation")
    company: str = field_from_registry("company")
    experience_years: int = field_from_registry("experience_years")
Metadata Fields
Field Name | Type | Description | Instructions | Default | Nullable | Notes |
---|---|---|---|---|---|---|
source | | Source of the extracted information | Record origin of data (e.g., ‘resume’, ‘linkedin’) | | | Required field |
last_updated | | Last update timestamp (ISO format) | Use ISO 8601 format (YYYY-MM-DDTHH:MM:SSZ), default to | | | Uses |
confidence_score | | Confidence score of extraction (0.0-1.0) | Calculate based on extraction certainty | | | Required field |
Usage Example:
class DataRecord(BaseModel):
    source: str = field_from_registry("source")
    last_updated: str = field_from_registry("last_updated")
    confidence_score: float = field_from_registry("confidence_score")
Location Fields
Field Name | Type | Description | Instructions | Default | Nullable | Notes |
---|---|---|---|---|---|---|
city | | City name | Extract city name, standardize capitalization | | | Optional field |
 | | State or province name | Extract state/province, use full name or abbreviation | | | Optional field |
postal_code | | Postal or ZIP code | Extract postal code, maintain original format | | | Optional field |
country | | Country name | Extract country name, use full English name | | | Optional field |
 | | Geographic coordinates (lat, long) | Extract as ‘latitude,longitude’ format if available | | | Optional field |
Usage Example:
class LocationData(BaseModel):
    city: str = field_from_registry("city")
    country: str = field_from_registry("country")
    postal_code: str = field_from_registry("postal_code")
Demographic Fields
Field Name | Type | Description | Instructions | Default | Nullable | Notes |
---|---|---|---|---|---|---|
 | | Gender identification | Extract gender if explicitly stated, otherwise leave empty | | | Optional field |
nationality | | Nationality or citizenship | Extract nationality, use country demonym | | | Optional field |
 | | Marital status | Extract marital status (single, married, divorced, etc.) | | | Optional field |
language | | Primary language spoken | Extract primary or native language | | | Optional field |
Usage Example:
class DemographicData(BaseModel):
    nationality: str = field_from_registry("nationality")
    language: str = field_from_registry("language")
Education Fields
Field Name | Type | Description | Instructions | Default | Nullable | Notes |
---|---|---|---|---|---|---|
education_level | | Highest education level | Extract highest degree (High School, Bachelor’s, Master’s, PhD, etc.) | | | Optional field |
graduation_year | | Year of graduation | Extract graduation year as 4-digit number | | | Optional field |
gpa | | Grade point average | Extract GPA, convert to 4.0 scale if needed | | | Optional field |
Usage Example:
class EducationData(BaseModel):
    education_level: str = field_from_registry("education_level")
    graduation_year: int = field_from_registry("graduation_year")
    gpa: float = field_from_registry("gpa")
Financial Fields
Field Name | Type | Description | Instructions | Default | Nullable | Notes |
---|---|---|---|---|---|---|
salary | | Annual salary amount | Extract salary as numeric value, remove currency symbols | | | Optional field |
currency | | Currency code | Extract or infer currency code (USD, EUR, GBP, etc.) | | | Optional field |
 | | Bonus amount | Extract bonus as numeric value | | | Optional field |
Usage Example:
class FinancialData(BaseModel):
    salary: float = field_from_registry("salary")
    currency: str = field_from_registry("currency")
Template Variable System
Template variables provide dynamic default values that are resolved at runtime. They’re especially useful for timestamps, dates, and calculated values.
Available Template Variables
The following template variables are available for use in field definitions:
{{current_year}}
Current year as 4-digit integer (e.g., 2024)
{{current_date}}
Current date in ISO format (YYYY-MM-DD)
{{current_datetime}}
Current datetime in ISO 8601 format (YYYY-MM-DDTHH:MM:SSZ)
{{current_timestamp}}
Current Unix timestamp as integer
{{current_month}}
Current month as integer (1-12)
{{current_day}}
Current day of month as integer (1-31)
{{current_weekday}}
Current day name as string (e.g., “Monday”, “Tuesday”)
{{current_iso_week}}
Current ISO week number as integer (1-53)
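As an illustration of what these variables evaluate to, the sketch below derives each one from a single `datetime` using only the standard library. The function name `builtin_template_vars` is hypothetical, and Prompture's own resolution logic (time zone handling in particular) may differ.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def builtin_template_vars(now: datetime) -> Dict[str, Any]:
    """Compute the documented variables from one reference moment (assumed UTC)."""
    return {
        "current_year": now.year,
        "current_date": now.strftime("%Y-%m-%d"),
        "current_datetime": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "current_timestamp": int(now.timestamp()),
        "current_month": now.month,
        "current_day": now.day,
        # %A is locale-dependent; the default C locale yields English names.
        "current_weekday": now.strftime("%A"),
        "current_iso_week": now.isocalendar()[1],
    }


# A fixed moment keeps the example reproducible.
fixed = datetime(2024, 3, 15, 12, 30, 0, tzinfo=timezone.utc)
values = builtin_template_vars(fixed)
print(values["current_date"])      # 2024-03-15
print(values["current_datetime"])  # 2024-03-15T12:30:00Z
print(values["current_iso_week"])  # 11
```

Computing every value from the same `now` avoids mismatches such as a date and a timestamp taken on either side of midnight.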
Using Template Variables
Template variables can be used in any string field within a field definition:
register_field("processing_date", {
    "type": "str",
    "description": "Date when document was processed",
    "instructions": "Use {{current_date}} if processing date not available",
    "default": "{{current_date}}",
    "nullable": False
})

register_field("academic_year", {
    "type": "str",
    "description": "Academic year for enrollment",
    "instructions": "Use {{current_year}} for current enrollment",
    "default": "{{current_year}}-{{current_year}}",  # e.g. "2024-2024"
    "nullable": True
})
Custom Template Variables
You can provide custom template variables when retrieving field definitions:
from prompture import get_field_definition, register_field

# Custom variables for specific use cases
custom_vars = {
    "report_year": 2023,
    "department": "Engineering"
}

# Register field with custom template
register_field("report_title", {
    "type": "str",
    "description": "Report title",
    "instructions": "Use format: {{department}} Report {{report_year}}",
    "default": "{{department}} Report {{report_year}}",
    "nullable": False
})

# Retrieve with custom variables
field_def = get_field_definition(
    "report_title",
    apply_templates=True,
    custom_template_vars=custom_vars
)
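The substitution step itself can be pictured with a small stand-alone sketch. The helpers `render` and `apply_templates` below are hypothetical, and two behaviours are assumptions rather than documented facts: custom variables override built-in ones, and unknown placeholders are left untouched.

```python
import re
from typing import Any, Dict, Optional

_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(text: str, variables: Dict[str, Any]) -> str:
    """Replace known {{name}} placeholders; leave unknown ones as written."""
    def _substitute(match: "re.Match[str]") -> str:
        key = match.group(1)
        return str(variables[key]) if key in variables else match.group(0)

    return _PLACEHOLDER.sub(_substitute, text)


def apply_templates(
    definition: Dict[str, Any],
    builtin_vars: Dict[str, Any],
    custom_vars: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Render every string value; custom variables win on name clashes (assumption)."""
    merged = {**builtin_vars, **(custom_vars or {})}
    return {
        key: render(value, merged) if isinstance(value, str) else value
        for key, value in definition.items()
    }


report_title = {
    "type": "str",
    "description": "Report title",
    "instructions": "Use format: {{department}} Report {{report_year}}",
    "default": "{{department}} Report {{report_year}}",
    "nullable": False,
}
resolved = apply_templates(
    report_title,
    builtin_vars={"current_year": 2024},
    custom_vars={"report_year": 2023, "department": "Engineering"},
)
print(resolved["default"])  # Engineering Report 2023
```

Leaving unknown placeholders in place makes a missing variable visible in the output instead of silently producing an empty string.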
Custom Field Registration
Creating Custom Fields
Register custom fields using register_field() to extend the built-in definitions:
from prompture import register_field, field_from_registry

# Define field structure
register_field("product_price", {
    "type": "str",
    "description": "Product price with currency symbol",
    "instructions": "Extract price including currency, handle ranges like $10-$15",
    "default": "Price not available",
    "nullable": True
})

register_field("skills", {
    "type": "list",
    "description": "List of professional skills",
    "instructions": "Extract skills as comma-separated list, normalize tech names",
    "default": [],
    "nullable": True
})
Field Definition Structure
Each field definition must include these required properties:
type (required): Python type or string representation (str, int, float, bool, list, dict)
description (required): Human-readable description of the field's purpose
instructions (required): Specific extraction instructions for LLM processing
default (required): Default value when the field is not extracted (supports template variables)
nullable (required): Boolean indicating whether the field accepts None/null values
Example:
field_definition = {
    "type": "str",
    "description": "Product category classification",
    "instructions": "Classify into: Electronics, Clothing, Books, Home, Other",
    "default": "Other",
    "nullable": True
}
Validation
Field definitions are automatically validated when registered:
from prompture import register_field
from prompture.tools import validate_field_definition

# Validate before registering
field_def = {
    "type": "str",
    "description": "Valid field",
    "instructions": "Extract text value",
    "default": "",
    "nullable": True
}

if validate_field_definition(field_def):
    register_field("my_field", field_def)
else:
    print("Invalid field definition")
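The exact rules that validate_field_definition() enforces are not spelled out here. As a rough sketch of what such a check might cover, based only on the five required properties listed above, the hypothetical `check_definition` below returns a list of problems instead of a boolean, which tends to be easier to debug.

```python
from typing import Any, Dict, List

REQUIRED_KEYS = ("type", "description", "instructions", "default", "nullable")
ALLOWED_TYPES = {"str", "int", "float", "bool", "list", "dict"}


def check_definition(definition: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the definition looks fine."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in definition]

    if "type" in definition:
        type_value = definition["type"]
        # Accept either a Python type (str) or its string name ("str").
        type_name = type_value.__name__ if isinstance(type_value, type) else type_value
        if not (isinstance(type_name, str) and type_name in ALLOWED_TYPES):
            problems.append(f"unsupported type: {type_value!r}")

    if "nullable" in definition and not isinstance(definition["nullable"], bool):
        problems.append("nullable must be a boolean")

    if (
        "default" in definition
        and definition["default"] is None
        and definition.get("nullable") is False
    ):
        problems.append("default is None but the field is not nullable")

    return problems


good = {
    "type": "str",
    "description": "Valid field",
    "instructions": "Extract text value",
    "default": "",
    "nullable": True,
}
bad = {"type": "decimal", "description": "Price", "default": None, "nullable": False}

print(check_definition(good))  # []
for problem in check_definition(bad):
    print(problem)
```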
Integration Examples
Complete Extraction Workflow
Here’s a complete example showing field definitions in a real extraction scenario:
from pydantic import BaseModel
from prompture import (
    field_from_registry,
    register_field,
    stepwise_extract_with_model
)

# Register custom business fields
register_field("industry", {
    "type": "str",
    "description": "Business industry classification",
    "instructions": "Classify into standard industry categories",
    "default": "Unknown",
    "nullable": True
})

register_field("founded_year", {
    "type": "int",
    "description": "Year company was founded",
    "instructions": "Extract founding year, use {{current_year}} if recent",
    "default": None,
    "nullable": True
})

# Create comprehensive model
class BusinessProfile(BaseModel):
    # Built-in fields
    name: str = field_from_registry("name")
    email: str = field_from_registry("email")
    phone: str = field_from_registry("phone")
    address: str = field_from_registry("address")
    # Professional fields
    company: str = field_from_registry("company")
    # Custom fields
    industry: str = field_from_registry("industry")
    founded_year: int = field_from_registry("founded_year")
    # Metadata
    source: str = field_from_registry("source")
    last_updated: str = field_from_registry("last_updated")
    confidence_score: float = field_from_registry("confidence_score")

# Sample business text
business_text = """
TechStart Solutions is a cloud computing company founded in 2019.
Contact: Sarah Johnson, CEO
Email: sarah@techstart.com
Phone: (555) 123-4567
Address: 123 Innovation Drive, San Francisco, CA 94105
Industry: Software as a Service (SaaS)
"""

# Extract structured data
result = stepwise_extract_with_model(
    BusinessProfile,
    business_text,
    model_name="openai/gpt-4"
)
print(result.model_dump())
Multi-Domain Field Sets
Organize fields by domain for better maintainability:
# E-commerce fields
ecommerce_fields = {
    "product_name": {
        "type": "str",
        "description": "Product name or title",
        "instructions": "Extract main product name, exclude brand",
        "default": "Unknown Product",
        "nullable": False
    },
    "sku": {
        "type": "str",
        "description": "Product SKU or model number",
        "instructions": "Extract alphanumeric SKU code",
        "default": "",
        "nullable": True
    },
    "category": {
        "type": "str",
        "description": "Product category",
        "instructions": "Classify into Electronics, Clothing, Books, etc.",
        "default": "Other",
        "nullable": True
    }
}

# Medical fields
medical_fields = {
    "patient_id": {
        "type": "str",
        "description": "Patient identification number",
        "instructions": "Extract patient ID, mask if sensitive",
        "default": "",
        "nullable": True
    },
    "diagnosis": {
        "type": "str",
        "description": "Primary diagnosis or condition",
        "instructions": "Extract main diagnosis, use medical terminology",
        "default": "",
        "nullable": True
    },
    "treatment_date": {
        "type": "str",
        "description": "Date of treatment or visit",
        "instructions": "Extract date, use {{current_date}} if not specified",
        "default": "{{current_date}}",
        "nullable": False
    }
}

# Register field sets
from prompture import add_field_definitions

add_field_definitions(ecommerce_fields)
add_field_definitions(medical_fields)
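Because all domains share one global registry, two field sets that reuse a name (for example `category`) could overwrite each other. How add_field_definitions() treats such clashes is not stated here, so the sketch below shows one defensive option: a hypothetical `merge_field_sets` helper that refuses to merge sets with overlapping names before anything is registered.

```python
from typing import Any, Dict

FieldSet = Dict[str, Dict[str, Any]]


def merge_field_sets(named_sets: Dict[str, FieldSet]) -> FieldSet:
    """Combine several domain field sets, raising if a field name appears twice."""
    merged: FieldSet = {}
    owner: Dict[str, str] = {}
    for domain, fields in named_sets.items():
        for field_name, definition in fields.items():
            if field_name in owner:
                raise ValueError(
                    f"'{field_name}' is defined in both "
                    f"'{owner[field_name]}' and '{domain}'"
                )
            owner[field_name] = domain
            merged[field_name] = definition
    return merged


ecommerce = {
    "sku": {"type": "str", "nullable": True},
    "category": {"type": "str", "nullable": True},
}
library = {
    "isbn": {"type": "str", "nullable": True},
    "category": {"type": "str", "nullable": True},
}

collision_message = ""
try:
    merge_field_sets({"ecommerce": ecommerce, "library": library})
except ValueError as exc:
    collision_message = str(exc)
print(collision_message)

# Without the clashing name, the merge succeeds.
combined = merge_field_sets(
    {"ecommerce": ecommerce, "library": {"isbn": library["isbn"]}}
)
print(sorted(combined))  # ['category', 'isbn', 'sku']
```

Prefixing names per domain (for example `product_category`) is another way to avoid the clash and matches the naming guidance in Best Practices.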
External Configuration Files
Load field definitions from external YAML or JSON files:
field_definitions.yaml:
document_fields:
  title:
    type: str
    description: "Document title or heading"
    instructions: "Extract main document title"
    default: "Untitled Document"
    nullable: false
  author:
    type: str
    description: "Document author or creator"
    instructions: "Extract author name, handle multiple authors"
    default: "Unknown Author"
    nullable: true
  created_date:
    type: str
    description: "Document creation date"
    instructions: "Use {{current_date}} if date not found"
    default: "{{current_date}}"
    nullable: false
Python integration:
from pydantic import BaseModel
from prompture import add_field_definitions, field_from_registry
from prompture.tools import load_field_definitions

# Load from external file
external_fields = load_field_definitions("field_definitions.yaml")

# Register all fields from the file
add_field_definitions(external_fields)

# Use in models
class Document(BaseModel):
    title: str = field_from_registry("title")
    author: str = field_from_registry("author")
    created_date: str = field_from_registry("created_date")
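The YAML example nests its definitions under a `document_fields` group, while the models look fields up by bare name. Whether load_field_definitions() flattens such groups is not stated here. If you parse a file yourself, a flattening step along these lines may be needed; this sketch uses JSON so that it runs with the standard library alone, and `flatten_groups` is a hypothetical helper.

```python
import json
from typing import Any, Dict

RAW_JSON = """
{
  "document_fields": {
    "title": {
      "type": "str",
      "description": "Document title or heading",
      "instructions": "Extract main document title",
      "default": "Untitled Document",
      "nullable": false
    },
    "created_date": {
      "type": "str",
      "description": "Document creation date",
      "instructions": "Use {{current_date}} if date not found",
      "default": "{{current_date}}",
      "nullable": false
    }
  }
}
"""


def flatten_groups(grouped: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Turn {group: {field: definition}} into {field: definition}."""
    flat: Dict[str, Any] = {}
    for group_name, fields in grouped.items():
        for field_name, definition in fields.items():
            if field_name in flat:
                raise ValueError(
                    f"duplicate field '{field_name}' in group '{group_name}'"
                )
            flat[field_name] = definition
    return flat


flat_fields = flatten_groups(json.loads(RAW_JSON))
print(sorted(flat_fields))  # ['created_date', 'title']
```

Note that JSON `false` becomes Python `False` on load, so the `nullable` values arrive as real booleans.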
Registry Management
The field definitions registry provides several utility functions for managing field definitions:
Inspecting the Registry
from prompture import (
    get_field_names,
    get_required_fields,
    get_field_definition,
    get_registry_snapshot
)

# List all available fields
all_fields = get_field_names()
print(f"Available fields: {all_fields}")

# Get required fields only
required_fields = get_required_fields()
print(f"Required fields: {required_fields}")

# Inspect specific field
name_field = get_field_definition("name")
print(f"Name field: {name_field}")

# Get full registry snapshot
registry = get_registry_snapshot()
print(f"Registry contains {len(registry)} fields")
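The API reference describes required fields as the non-nullable ones. Assuming that is the whole rule, the same filter can be applied to any dictionary of definitions, such as a snapshot. The helper name `required_field_names` and the sample data are illustrative only.

```python
from typing import Any, Dict, List


def required_field_names(registry: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return the names whose definitions set nullable to False, sorted for stable output."""
    return sorted(
        name
        for name, definition in registry.items()
        if definition.get("nullable") is False
    )


# Hypothetical snapshot with two required fields and one optional field.
sample_registry = {
    "name": {"type": "str", "nullable": False},
    "email": {"type": "str", "nullable": True},
    "source": {"type": "str", "nullable": False},
}

print(required_field_names(sample_registry))  # ['name', 'source']
```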
Registry Maintenance
from prompture import reset_registry, clear_registry
# Reset to base definitions only
reset_registry() # Keeps built-in fields, removes custom ones
# Clear everything (use with caution)
clear_registry() # Removes ALL fields including built-ins
Best Practices
Field Naming Conventions
Use descriptive, lowercase names with underscores: first_name, created_date
Group related fields with prefixes: contact_email, contact_phone
Avoid abbreviations: use experience_years, not exp_yrs
Be consistent across your domain
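The lowercase-with-underscores part of these conventions is easy to check mechanically. The sketch below is a hypothetical helper, not part of Prompture. It cannot judge whether a name is descriptive or abbreviated, so `exp_yrs` still passes.

```python
import re

# Lowercase words made of letters and digits, joined by single underscores.
_SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")


def is_conventional_name(name: str) -> bool:
    """Check only the lowercase-with-underscores shape of a field name."""
    return _SNAKE_CASE.fullmatch(name) is not None


for candidate in ("first_name", "contact_email", "CreatedDate", "exp-yrs", "_hidden"):
    print(candidate, is_conventional_name(candidate))
```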
Type Selection Guidelines
Use str for text, IDs, formatted data (dates, phone numbers)
Use int for counts, years, numeric IDs
Use float for scores, percentages, monetary values
Use list for multiple values of same type
Use dict for nested structured data
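Since the type property may be given as a string, it can help to map it to a Python type and confirm that the default agrees with it. The helpers below (`TYPE_MAP`, `resolve_type`, `default_matches_type`) are hypothetical and only sketch one way to do this. A template default such as "{{current_year}}" is a string until it is resolved, so a check like this only makes sense on resolved definitions.

```python
from typing import Any, Dict, Union

TYPE_MAP = {
    "str": str,
    "int": int,
    "float": float,
    "bool": bool,
    "list": list,
    "dict": dict,
}


def resolve_type(type_spec: Union[str, type]) -> type:
    """Accept a Python type or one of the six supported type names."""
    if isinstance(type_spec, type):
        return type_spec
    try:
        return TYPE_MAP[type_spec]
    except (KeyError, TypeError):
        raise ValueError(f"unsupported type: {type_spec!r}") from None


def default_matches_type(definition: Dict[str, Any]) -> bool:
    """Check that the default is an instance of the declared type."""
    default = definition["default"]
    if default is None:
        # None is only acceptable for nullable fields.
        return bool(definition.get("nullable"))
    expected = resolve_type(definition["type"])
    if expected in (int, float):
        # bool is a subclass of int, so exclude it from numeric fields.
        allowed = (int, float) if expected is float else (int,)
        return isinstance(default, allowed) and not isinstance(default, bool)
    return isinstance(default, expected)


print(resolve_type("float"))  # <class 'float'>
print(default_matches_type({"type": "list", "default": [], "nullable": True}))     # True
print(default_matches_type({"type": "int", "default": "n/a", "nullable": True}))   # False
```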
Template Variable Usage
Use {{current_date}} for document dates and timestamps
Use {{current_year}} for age calculations and academic years
Use {{current_datetime}} for precise processing timestamps
Provide fallback values when templates might not resolve
Validation and Testing
from pydantic import BaseModel
from prompture import field_from_registry, register_field
from prompture.tools import validate_field_definition

# Always validate custom fields
def create_safe_field(name, definition):
    if validate_field_definition(definition):
        register_field(name, definition)
        return True
    else:
        print(f"Invalid field definition for '{name}'")
        return False

# Test field definitions with sample data
def test_field_extraction(field_name, sample_text):
    class TestModel(BaseModel):
        test_field: str = field_from_registry(field_name)
    # Test extraction (requires API key)
    # result = stepwise_extract_with_model(TestModel, sample_text)
    # return result.test_field
Performance Considerations
Register fields once at application startup
Use get_registry_snapshot() for bulk operations
Cache field definitions for frequently used fields
Validate definitions before registration to avoid runtime errors
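One way to cache frequently used definitions is `functools.lru_cache` around a lookup function. The sketch below uses a plain dictionary (`_DEFINITIONS`, hypothetical) in place of the real registry. The main caveat is that the cache must be cleared whenever the underlying definitions change.

```python
from functools import lru_cache
from types import MappingProxyType
from typing import Any, Dict, Mapping

# Hypothetical stand-in for the registry.
_DEFINITIONS: Dict[str, Dict[str, Any]] = {
    "name": {"type": "str", "nullable": False},
}


@lru_cache(maxsize=128)
def cached_definition(field_name: str) -> Mapping[str, Any]:
    """Return a read-only view so cached entries cannot be mutated by callers."""
    return MappingProxyType(dict(_DEFINITIONS[field_name]))


first = cached_definition("name")
second = cached_definition("name")  # served from the cache
info_before_clear = cached_definition.cache_info()
print(info_before_clear.hits, info_before_clear.misses)  # 1 1

# After any registry change, drop stale entries before reading again.
_DEFINITIONS["name"] = {"type": "str", "nullable": True}
cached_definition.cache_clear()
third = cached_definition("name")
print(third["nullable"])  # True
```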
API Reference
Core Functions
- field_from_registry(field_name: str, apply_templates: bool = True, custom_template_vars: Dict[str, Any] | None = None) -> Field
  Create a Pydantic Field from a registered field definition.
  - Parameters:
    field_name – Name of field in the registry
    apply_templates – Whether to apply template variable substitution
    custom_template_vars – Custom template variables for substitution
  - Returns:
    Configured Pydantic Field object
  - Raises:
    KeyError – If field_name not found in registry
- register_field(field_name: str, field_definition: FieldDefinition) -> None
  Register a single field definition in the global registry.
  - Parameters:
    field_name – Name of the field
    field_definition – Dictionary containing field configuration
  - Raises:
    ValueError – If field definition is invalid
- get_field_definition(field_name: str, apply_templates: bool = True, custom_template_vars: Dict[str, Any] | None = None) -> FieldDefinition | None
  Retrieve a field definition from the registry.
  - Parameters:
    field_name – Name of field to retrieve
    apply_templates – Whether to apply template substitution
    custom_template_vars – Custom variables for templates
  - Returns:
    Field definition dictionary or None if not found
Registry Management Functions
- get_field_names() -> List[str]
  Get list of all registered field names.
  - Returns:
    List of field names in the registry
- get_required_fields() -> List[str]
  Get list of required (non-nullable) field names.
  - Returns:
    List of required field names
- add_field_definitions(field_definitions: Dict[str, FieldDefinition]) -> None
  Register multiple field definitions at once.
  - Parameters:
    field_definitions – Dictionary mapping field names to definitions
See Also
Field Definitions Module - API documentation for field definitions module
Examples - Usage examples and tutorials
Quick Start Guide - Getting started guide
Core Module - Core extraction functions
Social Media Fields
Field Name | Type | Description | Instructions | Default | Nullable | Notes |
---|---|---|---|---|---|---|
sentiment | str | Sentiment classification | Classify as positive, negative, or neutral | "neutral" | True | Optional field |
hashtags | str | Hashtags from content | Extract all hashtags as comma-separated list | "" | True | Optional field |
mentions | str | User mentions from content | Extract all @mentions as comma-separated list | "" | True | Optional field |
topic | str | Main topic or subject | Identify primary topic or theme of content | "" | True | Optional field |
Usage Example: