Tools Module
The tools module provides utility functions for data conversion, validation, field schema generation, and text processing used throughout Prompture’s extraction pipeline.
Overview
The tools module contains:
Data Conversion: Robust type conversion with fallback handling
Schema Generation: JSON schema creation from Python types and field definitions
Parsing Utilities: Specialized parsers for dates, numbers, and boolean values
Validation: Field definition validation and type checking
Text Processing: JSON cleanup and text manipulation utilities
Note
The custom LogLevel enum and log_debug() function have been removed.
Prompture now uses Python’s standard logging module. Use
prompture.configure_logging() to enable library-level logging output.
Data Parsing and Conversion
parse_boolean()
Robustly parse various boolean representations into Python boolean values.
Supported Formats:
Strings: “true”, “false”, “yes”, “no”, “1”, “0”, “on”, “off”
Numbers: 1 (True), 0 (False), any non-zero number (True)
Booleans: Direct pass-through
Case-insensitive: “TRUE”, “True”, “tRuE” all work
Example:
from prompture.tools import parse_boolean
assert parse_boolean("yes") == True
assert parse_boolean("FALSE") == False
assert parse_boolean(1) == True
assert parse_boolean("0") == False
as_list()
Convert various input types to lists with intelligent parsing.
Features:
String splitting: Automatic delimiter detection or custom separators
Single values: Wrap non-list values in lists
List pass-through: Return lists unchanged
Empty handling: Proper handling of None and empty strings
Example:
from prompture.tools import as_list
# Automatic delimiter detection
assert as_list("apple,banana,cherry") == ["apple", "banana", "cherry"]
assert as_list("red; blue; green") == ["red", "blue", "green"]
# Custom separator
assert as_list("a|b|c", sep="|") == ["a", "b", "c"]
# Single value wrapping
assert as_list("single") == ["single"]
assert as_list(42) == [42]
parse_datetime()
Parse datetime strings in various formats into Python datetime objects.
Supported Formats:
ISO 8601: “2024-03-15T14:30:00Z”
Date only: “2024-03-15”, “03/15/2024”
Relative: “today”, “yesterday”, “tomorrow”
Timestamps: Unix timestamps (integers)
Example:
from prompture.tools import parse_datetime
dt1 = parse_datetime("2024-03-15T14:30:00")
dt2 = parse_datetime("03/15/2024")
dt3 = parse_datetime("today")
dt4 = parse_datetime(1710512400) # Unix timestamp
parse_shorthand_number()
Parse numbers with shorthand suffixes like “1K”, “2.5M”, “1.2B”.
Supported Suffixes:
K/k: Thousands (x1,000)
M/m: Millions (x1,000,000)
B/b: Billions (x1,000,000,000)
T/t: Trillions (x1,000,000,000,000)
Example:
from prompture.tools import parse_shorthand_number
assert parse_shorthand_number("1.5K") == 1500
assert parse_shorthand_number("2M") == 2000000
assert parse_shorthand_number("1.2B") == 1200000000
assert parse_shorthand_number("500") == 500 # No suffix
Schema Generation and Validation
create_field_schema()
Generate JSON schema definitions from field names, types, and field definitions.
validate_field_definition()
Validate that a field definition dictionary contains all required properties and valid values.
Data Conversion and Type Handling
convert_value()
Robust value conversion with intelligent type coercion and fallback handling.
extract_fields()
Extract and validate specific fields from data dictionaries with type conversion and default value handling.
File and Data Loading
load_field_definitions()
Load field definitions from JSON or YAML files with automatic format detection.
Default Value Management
get_type_default()
Get appropriate default values for Python types.
get_field_default()
Get default values for fields using field definitions, Pydantic field info, or type defaults.
Text Processing Utilities
clean_json_text()
Clean and normalize JSON text by removing markdown formatting, extra whitespace, and common text artifacts.
Best Practices
Use configure_logging(): Call
configure_logging(logging.DEBUG)to see internal debug outputHandle Conversion Failures: Always provide sensible fallback values
Validate Field Definitions: Use
validate_field_definition()before registering custom fieldsLeverage Smart Conversion: Use
convert_value()for robust type handlingClean External Data: Use parsing utilities for user input and external data sources
Load Definitions from Files: Use
load_field_definitions()for maintainable configuration