Tools Module

The tools module provides utility functions for data conversion, validation, field schema generation, and debugging support used throughout Prompture’s extraction pipeline.

Overview

The tools module contains:

  • Data Conversion: Robust type conversion with fallback handling

  • Schema Generation: JSON schema creation from Python types and field definitions

  • Parsing Utilities: Specialized parsers for dates, numbers, and boolean values

  • Validation: Field definition validation and type checking

  • Debugging: Comprehensive logging system with configurable levels

  • Text Processing: JSON cleanup and text manipulation utilities

Logging and Debugging

LogLevel

Enumeration defining logging levels for debug output throughout Prompture.

Available Levels:

  • LogLevel.OFF (0) - No logging output

  • LogLevel.ERROR (1) - Error messages only

  • LogLevel.WARN (2) - Warnings and errors

  • LogLevel.INFO (3) - Informational messages

  • LogLevel.DEBUG (4) - Detailed debugging information

  • LogLevel.TRACE (5) - Maximum verbosity with full data dumps

log_debug()

Conditional logging function that outputs debug information based on current and target log levels.

Example:

from prompture.tools import log_debug, LogLevel

# Log at different levels
log_debug(LogLevel.INFO, LogLevel.DEBUG, "Processing started", prefix="[main]")
log_debug(LogLevel.DEBUG, LogLevel.DEBUG, {"field": "name", "value": "John"})
log_debug(LogLevel.ERROR, LogLevel.INFO, "Validation failed")

Data Parsing and Conversion

parse_boolean()

Robustly parse various boolean representations into Python boolean values.

Supported Formats:

  • Strings: “true”, “false”, “yes”, “no”, “1”, “0”, “on”, “off”

  • Numbers: 1 (True), 0 (False), any non-zero number (True)

  • Booleans: Direct pass-through

  • Case-insensitive: “TRUE”, “True”, “tRuE” all work

Example:

from prompture.tools import parse_boolean

assert parse_boolean("yes") == True
assert parse_boolean("FALSE") == False
assert parse_boolean(1) == True
assert parse_boolean("0") == False

as_list()

Convert various input types to lists with intelligent parsing.

Features:

  • String splitting: Automatic delimiter detection or custom separators

  • Single values: Wrap non-list values in lists

  • List pass-through: Return lists unchanged

  • Empty handling: Proper handling of None and empty strings

Example:

from prompture.tools import as_list

# Automatic delimiter detection
assert as_list("apple,banana,cherry") == ["apple", "banana", "cherry"]
assert as_list("red; blue; green") == ["red", "blue", "green"]

# Custom separator
assert as_list("a|b|c", sep="|") == ["a", "b", "c"]

# Single value wrapping
assert as_list("single") == ["single"]
assert as_list(42) == [42]

parse_datetime()

Parse datetime strings in various formats into Python datetime objects.

Supported Formats:

  • ISO 8601: “2024-03-15T14:30:00Z”

  • Date only: “2024-03-15”, “03/15/2024”

  • Relative: “today”, “yesterday”, “tomorrow”

  • Timestamps: Unix timestamps (integers)

Example:

from prompture.tools import parse_datetime

dt1 = parse_datetime("2024-03-15T14:30:00")
dt2 = parse_datetime("03/15/2024")
dt3 = parse_datetime("today")
dt4 = parse_datetime(1710512400)  # Unix timestamp

parse_shorthand_number()

Parse numbers with shorthand suffixes like “1K”, “2.5M”, “1.2B”.

Supported Suffixes:

  • K/k: Thousands (×1,000)

  • M/m: Millions (×1,000,000)

  • B/b: Billions (×1,000,000,000)

  • T/t: Trillions (×1,000,000,000,000)

Example:

from prompture.tools import parse_shorthand_number

assert parse_shorthand_number("1.5K") == 1500
assert parse_shorthand_number("2M") == 2000000
assert parse_shorthand_number("1.2B") == 1200000000
assert parse_shorthand_number("500") == 500  # No suffix

Schema Generation and Validation

create_field_schema()

Generate JSON schema definitions from field names, types, and field definitions.

Features:

  • Type mapping: Python types to JSON schema types

  • Field definitions: Integration with Prompture’s field definition system

  • Validation rules: Automatic constraint generation

  • Default values: Schema default value handling

Example:

from prompture.tools import create_field_schema

# Basic type schema
schema = create_field_schema("age", int)
# Returns: {"type": "integer", "description": "Value for age"}

# With field definitions
field_defs = {"name": {"type": str, "description": "Person's name"}}
schema = create_field_schema("name", str, field_definitions=field_defs)

validate_field_definition()

Validate that a field definition dictionary contains all required properties and valid values.

Required Properties:

  • type: Valid Python type (str, int, float, list, dict, bool)

  • description: Non-empty string description

  • instructions: Extraction instructions for LLMs

  • default: Default value matching the specified type

  • nullable: Boolean indicating if None values are allowed

Example:

from prompture.tools import validate_field_definition

valid_def = {
    "type": str,
    "description": "Person's full name",
    "instructions": "Extract complete name as written",
    "default": "",
    "nullable": False
}

assert validate_field_definition(valid_def) == True

Data Conversion and Type Handling

convert_value()

Robust value conversion with intelligent type coercion and fallback handling.

Key Features:

  • Smart Type Conversion: Handles strings, numbers, lists, dictionaries

  • Pydantic Integration: Works with Pydantic models and field types

  • Graceful Fallbacks: Returns appropriate defaults when conversion fails

  • Recursive Processing: Deep conversion for nested data structures

  • Special Type Handling: Decimal, datetime, custom classes

Conversion Examples:

from prompture.tools import convert_value
from typing import List

# String to integer
result = convert_value("42", int)  # Returns: 42

# String to list
result = convert_value("apple,banana,cherry", List[str])
# Returns: ["apple", "banana", "cherry"]

# Failed conversion with fallback
result = convert_value("invalid", int, fallback_value=0)  # Returns: 0

# Dictionary with type conversion
data = {"age": "25", "scores": "80,90,95"}
types = {"age": int, "scores": List[int]}
result = convert_value(data, dict, value_types=types)
# Returns: {"age": 25, "scores": [80, 90, 95]}

extract_fields()

Extract and validate specific fields from data dictionaries with type conversion and default value handling.

Features:

  • Field Selection: Extract only specified fields from larger datasets

  • Type Conversion: Automatic conversion using [convert_value()](#convert_value)

  • Default Handling: Intelligent defaults from field definitions or type defaults

  • Validation: Field presence and type validation

  • Error Recovery: Graceful handling of missing or invalid fields

Example:

from prompture.tools import extract_fields

raw_data = {
    "name": "John Doe",
    "age": "25",
    "scores": "85,92,78",
    "extra": "ignored"
}

field_types = {
    "name": str,
    "age": int,
    "scores": List[int]
}

field_definitions = {
    "name": {"default": "Unknown", "nullable": False},
    "age": {"default": 0, "nullable": False}
}

result = extract_fields(
    data=raw_data,
    field_names=["name", "age", "scores"],
    field_types=field_types,
    field_definitions=field_definitions
)
# Returns: {"name": "John Doe", "age": 25, "scores": [85, 92, 78]}

File and Data Loading

load_field_definitions()

Load field definitions from JSON or YAML files with automatic format detection.

Supported Formats:

  • JSON: Standard JSON field definition files

  • YAML: YAML format for more readable configuration files

  • Auto-detection: Based on file extension (.json, .yaml, .yml)

Example:

from prompture.tools import load_field_definitions

# Load from JSON
fields = load_field_definitions("custom_fields.json")

# Load from YAML
fields = load_field_definitions("fields.yaml")

# Use loaded definitions
from prompture.field_definitions import add_field_definitions
add_field_definitions(fields)

Default Value Management

get_type_default()

Get appropriate default values for Python types.

Type Defaults:

  • str"" (empty string)

  • int0

  • float0.0

  • boolFalse

  • list[] (empty list)

  • dict{} (empty dictionary)

  • NoneNone

Example:

from prompture.tools import get_type_default

assert get_type_default(str) == ""
assert get_type_default(int) == 0
assert get_type_default(list) == []

get_field_default()

Get default values for fields using field definitions, Pydantic field info, or type defaults.

Priority Order:

  1. Field definition default value

  2. Pydantic Field default value

  3. Type-based default from [get_type_default()](#get_type_default)

Example:

from prompture.tools import get_field_default
from pydantic import Field

# With field definition
field_defs = {"name": {"default": "Anonymous", "type": str}}
default = get_field_default("name", None, field_defs)  # "Anonymous"

# With Pydantic Field
field_info = Field(default="Unknown")
default = get_field_default("name", field_info)  # "Unknown"

Text Processing Utilities

clean_json_text()

Clean and normalize JSON text by removing markdown formatting, extra whitespace, and common text artifacts.

Cleaning Operations:

  • Remove markdown code block markers (`json, `)

  • Strip extra whitespace and newlines

  • Remove common prefixes (“Here’s the JSON:”, “Result:”)

  • Normalize quote characters

  • Fix common JSON syntax issues

Example:

from prompture.tools import clean_json_text

messy_json = '''
Here's your JSON:
```json
{
  "name": "John",
  "age": 25
}
```
'''

clean = clean_json_text(messy_json)
# Returns: '{"name": "John", "age": 25}'

Internal Utility Functions

The tools module also includes several internal utility functions used by other Prompture modules:

  • _base_schema_for_type(): Generate base JSON schemas for Python types

  • _strip_desc(): Remove description fields from schemas

  • _to_decimal(): Convert values to Decimal objects safely

  • _safe_convert_recursive(): Recursive type conversion with error handling

Integration with Other Modules

The tools module provides essential utilities used throughout Prompture:

Core Module Integration:

  • [convert_value()](#convert_value) used in [stepwise_extract_with_model()](../api/core.rst#stepwise_extract_with_model)

  • [clean_json_text()](#clean_json_text) used in [ask_for_json()](../api/core.rst#ask_for_json)

  • [log_debug()](#log_debug) used for debugging throughout extraction functions

Field Definitions Integration:

  • [validate_field_definition()](#validate_field_definition) used in field registration

  • [get_field_default()](#get_field_default) used in [field_from_registry()](../api/field_definitions.rst#field_from_registry)

Driver Integration:

  • Logging utilities used across all driver implementations

  • Type conversion used in response processing

Best Practices

  1. Use Appropriate Log Levels: Set log levels based on your debugging needs

  2. Handle Conversion Failures: Always provide sensible fallback values

  3. Validate Field Definitions: Use [validate_field_definition()](#validate_field_definition) before registering custom fields

  4. Leverage Smart Conversion: Use [convert_value()](#convert_value) for robust type handling

  5. Clean External Data: Use parsing utilities for user input and external data sources

  6. Load Definitions from Files: Use [load_field_definitions()](#load_field_definitions) for maintainable configuration