Validator Module
The validator module provides JSON schema validation functionality for ensuring that extracted data conforms to expected structures and constraints.
Overview
The validator module enables:
Schema Validation: Validate JSON data against JSON Schema specifications
Type Checking: Ensure extracted values match expected data types
Constraint Validation: Verify that data meets defined constraints (ranges, formats, etc.)
Error Reporting: Detailed validation error messages for debugging
Main Functions
validate_against_schema()
Validate a JSON string against a JSON Schema specification and return detailed validation results.
Features:
JSON Schema Compliance: Full support for JSON Schema Draft 7 specification
Comprehensive Validation: Type checking, format validation, constraint verification
Detailed Error Reporting: Specific error messages with location information
Graceful Error Handling: Returns structured error information instead of raising exceptions
Parameters:
instance_json
(str): JSON string to validateschema
(Dict[str, Any]): JSON Schema specification as a dictionary
Returns:
Dictionary containing validation results:
{
"valid": True, # Boolean indicating if validation passed
"errors": [], # List of validation errors (empty if valid)
"data": {...}, # Parsed JSON data (if valid)
"error_count": 0, # Number of validation errors
"error_summary": "..." # Human-readable error summary
}
Example Usage:
from prompture.validator import validate_against_schema
# Define JSON Schema
person_schema = {
"type": "object",
"properties": {
"name": {
"type": "string",
"minLength": 1,
"maxLength": 100
},
"age": {
"type": "integer",
"minimum": 0,
"maximum": 150
},
"email": {
"type": "string",
"format": "email"
}
},
"required": ["name", "age"],
"additionalProperties": False
}
# Valid JSON
valid_json = '{"name": "John Doe", "age": 25, "email": "john@example.com"}'
result = validate_against_schema(valid_json, person_schema)
if result["valid"]:
print("Validation passed!")
print(f"Data: {result['data']}")
else:
print(f"Validation failed: {result['error_summary']}")
for error in result["errors"]:
print(f"- {error}")
Validation Error Examples:
# Invalid JSON - missing required field
invalid_json = '{"name": "John Doe"}' # Missing required 'age' field
result = validate_against_schema(invalid_json, person_schema)
# result["valid"] == False
# result["errors"] contains details about missing 'age' field
# Invalid JSON - type mismatch
invalid_json = '{"name": "John Doe", "age": "twenty-five"}' # Age as string
result = validate_against_schema(invalid_json, person_schema)
# result["valid"] == False
# result["errors"] contains type mismatch error for 'age' field
# Invalid JSON - constraint violation
invalid_json = '{"name": "", "age": -5}' # Empty name, negative age
result = validate_against_schema(invalid_json, person_schema)
# result["valid"] == False
# result["errors"] contains constraint violation errors
Supported JSON Schema Features
The validator supports comprehensive JSON Schema Draft 7 features:
Type Validation
Schema Type |
Python Type |
Description |
---|---|---|
|
str |
Text values with optional format constraints |
|
int |
Whole numbers with optional range constraints |
|
int, float |
Numeric values including decimals |
|
bool |
True/False values |
|
list |
Ordered collections with item type constraints |
|
dict |
Key-value structures with property constraints |
|
None |
Null/None values |
String Constraints and Formats
Length Constraints:
minLength
: Minimum string lengthmaxLength
: Maximum string length
Pattern Matching:
pattern
: Regular expression pattern matching
Built-in Formats:
email
: Email address validationdate
: Date format (YYYY-MM-DD)date-time
: ISO 8601 datetime formaturi
: URI/URL format validationuuid
: UUID format validation
Example:
string_schema = {
"type": "string",
"minLength": 3,
"maxLength": 50,
"pattern": "^[A-Za-z ]+$", # Letters and spaces only
"format": "email" # Email format validation
}
Numeric Constraints
Range Constraints:
minimum
: Minimum value (inclusive)maximum
: Maximum value (inclusive)exclusiveMinimum
: Minimum value (exclusive)exclusiveMaximum
: Maximum value (exclusive)
Multiple Constraints:
multipleOf
: Value must be multiple of specified number
Example:
numeric_schema = {
"type": "integer",
"minimum": 18,
"maximum": 65,
"multipleOf": 5 # Age in 5-year increments
}
Array Constraints
Length Constraints:
minItems
: Minimum array lengthmaxItems
: Maximum array length
Item Constraints:
items
: Schema for array itemsuniqueItems
: Require unique items
Example:
array_schema = {
"type": "array",
"minItems": 1,
"maxItems": 10,
"uniqueItems": True,
"items": {
"type": "string",
"minLength": 1
}
}
Object Constraints
Property Constraints:
properties
: Schema for specific propertiesrequired
: List of required property namesadditionalProperties
: Allow/disallow extra properties
Property Count:
minProperties
: Minimum number of propertiesmaxProperties
: Maximum number of properties
Example:
object_schema = {
"type": "object",
"properties": {
"name": {"type": "string"},
"age": {"type": "integer"}
},
"required": ["name"],
"additionalProperties": False,
"minProperties": 1,
"maxProperties": 5
}
Advanced Schema Features
Conditional Validation
Use if
/then
/else
for conditional constraints:
conditional_schema = {
"type": "object",
"properties": {
"age": {"type": "integer"},
"driver_license": {"type": "boolean"}
},
"if": {
"properties": {"age": {"minimum": 18}}
},
"then": {
"properties": {"driver_license": {"type": "boolean"}}
},
"else": {
"properties": {"driver_license": {"const": False}}
}
}
Composition Keywords
Combine schemas with logical operators:
allOf
: Must match all sub-schemasanyOf
: Must match at least one sub-schemaoneOf
: Must match exactly one sub-schemanot
: Must not match the sub-schema
Example:
composition_schema = {
"anyOf": [
{"type": "string", "format": "email"},
{"type": "string", "pattern": "^\\+[1-9]\\d{1,14}$"} # Phone number
]
}
Integration with Prompture
The validator integrates with core Prompture functionality:
Automatic Validation in Extraction:
from prompture.core import extract_and_jsonify
from prompture.validator import validate_against_schema
# Extract data
result = extract_and_jsonify(
text="John Doe is 25 years old",
json_schema=person_schema,
model_name="openai/gpt-4"
)
# Validate extracted data
validation = validate_against_schema(result["json_string"], person_schema)
if validation["valid"]:
print("Extraction and validation successful!")
else:
print(f"Validation errors: {validation['error_summary']}")
Integration with Test Suites:
from prompture.runner import run_suite_from_spec
# Test specifications automatically use validation
test_spec = {
"tests": [{
"schema": person_schema, # Automatically validated
"expected": {...}
}]
}
Error Handling and Debugging
The validator provides comprehensive error information:
Error Structure:
{
"message": "Detailed error description",
"path": ["property", "name"], # Location in data structure
"schema_path": ["properties", "name"], # Location in schema
"instance": "invalid_value", # The invalid value
"validator": "minLength", # Which validation rule failed
"constraint": 1 # The constraint value
}
Common Error Types:
Type Errors: Data type doesn’t match schema expectation
Constraint Errors: Value violates defined constraints (min/max, length, etc.)
Format Errors: String doesn’t match required format (email, date, etc.)
Required Errors: Missing required properties in objects
Additional Property Errors: Extra properties when
additionalProperties
is False
Debugging Tips:
Check Error Paths: Use the
path
field to locate problematic dataExamine Constraints: Review the
constraint
field to understand requirementsValidate Incrementally: Test individual properties before complex schemas
Use Simple Schemas First: Start with basic validation and add complexity gradually
Best Practices
Define Comprehensive Schemas: Include all relevant constraints and formats
Use Appropriate Data Types: Choose the most specific type for each field
Validate Early: Validate data as soon as possible after extraction
Handle Validation Errors: Always check validation results before using data
Provide Clear Error Messages: Use schema descriptions for better error reporting
Test Edge Cases: Validate schemas with boundary values and invalid inputs
Version Your Schemas: Keep track of schema changes for backward compatibility
Dependencies
The validator module requires:
jsonschema: JSON Schema validation library (
pip install jsonschema
)
If the jsonschema library is not available, validation functions will return appropriate error messages indicating the missing dependency.
Installation:
pip install jsonschema
The library is automatically included when installing Prompture with validation extras:
pip install prompture[validation]