Validator Module

The validator module provides JSON Schema validation to ensure that extracted data conforms to expected structures and constraints.

Overview

The validator module enables:

  • Schema Validation: Validate JSON data against JSON Schema specifications

  • Type Checking: Ensure extracted values match expected data types

  • Constraint Validation: Verify that data meets defined constraints (ranges, formats, etc.)

  • Error Reporting: Detailed validation error messages for debugging

Main Functions

validate_against_schema()

Validate a JSON string against a JSON Schema specification and return detailed validation results.

Features:

  • JSON Schema Compliance: Full support for JSON Schema Draft 7 specification

  • Comprehensive Validation: Type checking, format validation, constraint verification

  • Detailed Error Reporting: Specific error messages with location information

  • Graceful Error Handling: Returns structured error information instead of raising exceptions

Parameters:

  • instance_json (str): JSON string to validate

  • schema (Dict[str, Any]): JSON Schema specification as a dictionary

Returns:

Dictionary containing validation results:

{
    "valid": True,           # Boolean indicating if validation passed
    "errors": [],            # List of validation errors (empty if valid)
    "data": {...},           # Parsed JSON data (if valid)
    "error_count": 0,        # Number of validation errors
    "error_summary": "..."   # Human-readable error summary
}
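The function reports failures in the returned dictionary instead of raising, so callers that prefer exceptions need a small wrapper. The sketch below is a hypothetical helper (`require_valid` and `SchemaValidationError` are not part of Prompture) that relies only on the keys documented above.

```python
from typing import Any, Dict, List


class SchemaValidationError(ValueError):
    """Hypothetical exception that keeps the individual validation errors."""

    def __init__(self, summary: str, errors: List[Any]) -> None:
        super().__init__(summary)
        self.errors = errors


def require_valid(result: Dict[str, Any]) -> Any:
    """Return the parsed data from a validation result, or raise.

    Assumes `result` carries the documented keys "valid", "data",
    "errors" and "error_summary".
    """
    if result["valid"]:
        return result["data"]
    # Only read "data" on success; its value for invalid input is not specified.
    raise SchemaValidationError(result["error_summary"], result["errors"])
```

This keeps the "check before use" step in one place: code after `require_valid(...)` can assume the data passed validation.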

Example Usage:

from prompture.validator import validate_against_schema

# Define JSON Schema
person_schema = {
    "type": "object",
    "properties": {
        "name": {
            "type": "string",
            "minLength": 1,
            "maxLength": 100
        },
        "age": {
            "type": "integer",
            "minimum": 0,
            "maximum": 150
        },
        "email": {
            "type": "string",
            "format": "email"
        }
    },
    "required": ["name", "age"],
    "additionalProperties": False
}

# Valid JSON
valid_json = '{"name": "John Doe", "age": 25, "email": "john@example.com"}'
result = validate_against_schema(valid_json, person_schema)

if result["valid"]:
    print("Validation passed!")
    print(f"Data: {result['data']}")
else:
    print(f"Validation failed: {result['error_summary']}")
    for error in result["errors"]:
        print(f"- {error}")

Validation Error Examples:

# Invalid JSON - missing required field
invalid_json = '{"name": "John Doe"}'  # Missing required 'age' field
result = validate_against_schema(invalid_json, person_schema)
# result["valid"] == False
# result["errors"] contains details about missing 'age' field

# Invalid JSON - type mismatch
invalid_json = '{"name": "John Doe", "age": "twenty-five"}'  # Age as string
result = validate_against_schema(invalid_json, person_schema)
# result["valid"] == False
# result["errors"] contains type mismatch error for 'age' field

# Invalid JSON - constraint violation
invalid_json = '{"name": "", "age": -5}'  # Empty name, negative age
result = validate_against_schema(invalid_json, person_schema)
# result["valid"] == False
# result["errors"] contains constraint violation errors

Supported JSON Schema Features

The validator supports comprehensive JSON Schema Draft 7 features:

Type Validation

Schema Type   Python Type   Description
string        str           Text values with optional format constraints
integer       int           Whole numbers with optional range constraints
number        int, float    Numeric values including decimals
boolean       bool          True/False values
array         list          Ordered collections with item type constraints
object        dict          Key-value structures with property constraints
null          None          Null/None values
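Two details of this mapping are easy to get wrong when checking types by hand: Python's `bool` is a subclass of `int`, so `isinstance(True, int)` is true even though JSON Schema does not treat `true` as an integer, and from Draft 6 onward a number with a zero fractional part (such as `1.0`) counts as an integer. The hypothetical `matches_type` helper below is an illustrative sketch of the table, not Prompture's implementation.

```python
from typing import Any


def matches_type(value: Any, schema_type: str) -> bool:
    """Return True if a parsed JSON value matches a JSON Schema type name (sketch)."""
    if schema_type == "null":
        return value is None
    if schema_type == "boolean":
        return isinstance(value, bool)
    if schema_type == "integer":
        # bool subclasses int in Python, so it must be excluded explicitly.
        if isinstance(value, bool):
            return False
        # Draft 6+ also accepts numbers with a zero fractional part, e.g. 25.0.
        return isinstance(value, int) or (isinstance(value, float) and value.is_integer())
    if schema_type == "number":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if schema_type == "string":
        return isinstance(value, str)
    if schema_type == "array":
        return isinstance(value, list)
    if schema_type == "object":
        return isinstance(value, dict)
    raise ValueError(f"unknown JSON Schema type: {schema_type!r}")
```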

String Constraints and Formats

Length Constraints:

  • minLength: Minimum string length

  • maxLength: Maximum string length

Pattern Matching:

  • pattern: Regular expression pattern matching

Built-in Formats:

  • email: Email address validation

  • date: Date format (YYYY-MM-DD)

  • date-time: ISO 8601 datetime format

  • uri: URI/URL format validation

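  • uuid: UUID format validation (note that uuid was added to the format vocabulary in Draft 2019-09, so it is not part of Draft 7 itself)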
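When the jsonschema library is used directly, `format` is only enforced if a format checker is supplied, for example `jsonschema.validate(instance, schema, format_checker=jsonschema.FormatChecker())`; this page does not state whether `validate_against_schema` does so. As an illustration of what strict checks for two of the formats above involve, here is a stdlib-only sketch; `is_date` and `is_uuid` are hypothetical helpers, not Prompture functions.

```python
import re
from datetime import date

# [0-9] rather than \d: in Python, \d also matches non-ASCII digits.
_DATE_RE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")
_UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def is_date(value: str) -> bool:
    """YYYY-MM-DD that is also a real calendar date (rejects 2023-02-29)."""
    # Check the shape first: newer Python versions' fromisoformat also
    # accepts compact forms such as "20240229".
    if not _DATE_RE.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)  # validates month/day ranges and leap years
    except ValueError:
        return False
    return True


def is_uuid(value: str) -> bool:
    """Hyphenated 8-4-4-4-12 hexadecimal form only."""
    # uuid.UUID() alone would be too lenient: it also accepts braces,
    # a "urn:uuid:" prefix, or no hyphens at all.
    return _UUID_RE.fullmatch(value) is not None
```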

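Examples:

name_schema = {
    "type": "string",
    "minLength": 3,
    "maxLength": 50,
    "pattern": "^[A-Za-z ]+$"  # Letters and spaces only
}

# Kept separate: an email address contains "@" and ".", so it could
# never also match the letters-and-spaces pattern above
email_schema = {
    "type": "string",
    "format": "email"  # Email format validation
}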
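The length and pattern keywords can be sketched in a few lines. Two behaviors worth knowing: JSON Schema counts string length in Unicode code points (which is what Python's `len` returns for `str`), and `pattern` is not implicitly anchored, so it behaves like `re.search` and needs `^`/`$` to constrain the whole string. JSON Schema patterns use the ECMA-262 regex dialect, which Python's `re` approximates closely but not exactly. `string_errors` is a hypothetical helper for illustration.

```python
import re
from typing import Any, Dict, List


def string_errors(value: str, schema: Dict[str, Any]) -> List[str]:
    """Hypothetical checker for minLength, maxLength and pattern only."""
    errors = []
    length = len(value)  # code points, matching JSON Schema's definition
    if "minLength" in schema and length < schema["minLength"]:
        errors.append(f"shorter than minLength {schema['minLength']}")
    if "maxLength" in schema and length > schema["maxLength"]:
        errors.append(f"longer than maxLength {schema['maxLength']}")
    # Unanchored match semantics: search anywhere in the string.
    if "pattern" in schema and re.search(schema["pattern"], value) is None:
        errors.append(f"does not match pattern {schema['pattern']!r}")
    return errors
```

For example, `string_errors("R2-D2", {"pattern": "^[A-Za-z ]+$"})` reports a pattern mismatch, while the unanchored pattern `"[0-9]"` accepts `"x1"`.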

Numeric Constraints

Range Constraints:

  • minimum: Minimum value (inclusive)

  • maximum: Maximum value (inclusive)

  • exclusiveMinimum: Minimum value (exclusive)

  • exclusiveMaximum: Maximum value (exclusive)

Multiple Constraints:

  • multipleOf: Value must be a multiple of the specified number

Example:

numeric_schema = {
    "type": "integer",
    "minimum": 18,
    "maximum": 65,
    "multipleOf": 5  # 5-year steps: accepts 20, 25, ..., 65 (18 is not a multiple of 5)
}
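The numeric keywords combine as independent checks. In Draft 7, `exclusiveMinimum` and `exclusiveMaximum` take numbers (in Draft 4 they were booleans that modified `minimum`/`maximum`). The hypothetical `numeric_errors` sketch below is integer-oriented; it is not Prompture's implementation.

```python
from typing import Any, Dict, List, Union

Number = Union[int, float]


def numeric_errors(value: Number, schema: Dict[str, Any]) -> List[str]:
    """Hypothetical checker for the Draft 7 numeric keywords."""
    errors = []
    if "minimum" in schema and value < schema["minimum"]:
        errors.append("below minimum")
    if "maximum" in schema and value > schema["maximum"]:
        errors.append("above maximum")
    if "exclusiveMinimum" in schema and value <= schema["exclusiveMinimum"]:
        errors.append("not above exclusiveMinimum")
    if "exclusiveMaximum" in schema and value >= schema["exclusiveMaximum"]:
        errors.append("not below exclusiveMaximum")
    # Integer-oriented: with float divisors, % suffers from rounding
    # (0.3 % 0.1 is not 0), so a real checker needs more care there.
    if "multipleOf" in schema and value % schema["multipleOf"] != 0:
        errors.append("not a multipleOf")
    return errors
```

Running every age from 0 to 100 through it against `numeric_schema` shows that the effective range is 20 to 65 in steps of 5: `minimum` and `multipleOf` together exclude 18 and 19.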

Array Constraints

Length Constraints:

  • minItems: Minimum array length

  • maxItems: Maximum array length

Item Constraints:

  • items: Schema for array items

  • uniqueItems: Require unique items

Example:

array_schema = {
    "type": "array",
    "minItems": 1,
    "maxItems": 10,
    "uniqueItems": True,
    "items": {
        "type": "string",
        "minLength": 1
    }
}
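For the `array_schema` above, the checks reduce to a length test, a uniqueness test and a per-item test. The hypothetical `array_errors` sketch assumes the `items` schema is the simple string schema shown here and that items are hashable; full JSON Schema equality for `uniqueItems` also covers nested arrays and objects and treats `true` and `1` as distinct, which a Python `set` does not.

```python
from typing import Any, Dict, List


def array_errors(items: List[Any], schema: Dict[str, Any]) -> List[str]:
    """Hypothetical checker for minItems, maxItems, uniqueItems and a string `items` schema."""
    errors = []
    if "minItems" in schema and len(items) < schema["minItems"]:
        errors.append("too few items")
    if "maxItems" in schema and len(items) > schema["maxItems"]:
        errors.append("too many items")
    # Assumes hashable items (true for strings); see the caveat above.
    if schema.get("uniqueItems") and len(set(items)) != len(items):
        errors.append("duplicate items")
    item_schema = schema.get("items", {})
    for index, item in enumerate(items):
        if item_schema.get("type") == "string" and not isinstance(item, str):
            errors.append(f"item {index} is not a string")
        elif isinstance(item, str) and len(item) < item_schema.get("minLength", 0):
            errors.append(f"item {index} is too short")
    return errors
```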

Object Constraints

Property Constraints:

  • properties: Schema for specific properties

  • required: List of required property names

  • additionalProperties: Allow/disallow extra properties

Property Count:

  • minProperties: Minimum number of properties

  • maxProperties: Maximum number of properties

Example:

object_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"}
    },
    "required": ["name"],
    "additionalProperties": False,
    "minProperties": 1,
    "maxProperties": 5
}
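The object keywords interact: with `"additionalProperties": False` and only `name` and `age` declared, a valid object can never have more than two properties, so `maxProperties: 5` in `object_schema` is never the deciding constraint. The hypothetical `object_errors` sketch below handles only the keywords listed above and ignores `patternProperties`.

```python
from typing import Any, Dict, List


def object_errors(obj: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Hypothetical checker for required, additionalProperties (False only) and property counts."""
    errors = []
    for name in schema.get("required", []):
        if name not in obj:
            errors.append(f"missing required property {name!r}")
    if schema.get("additionalProperties") is False:
        declared = schema.get("properties", {})
        for name in obj:
            if name not in declared:
                errors.append(f"unexpected property {name!r}")
    if len(obj) < schema.get("minProperties", 0):
        errors.append("too few properties")
    if "maxProperties" in schema and len(obj) > schema["maxProperties"]:
        errors.append("too many properties")
    return errors
```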

Advanced Schema Features

Conditional Validation

Use if/then/else for conditional constraints:

conditional_schema = {
    "type": "object",
    "properties": {
        "age": {"type": "integer"},
        "driver_license": {"type": "boolean"}
    },
    "if": {
        # Without "required", an object with no age would satisfy the condition
        "properties": {"age": {"minimum": 18}},
        "required": ["age"]
    },
    "then": {
        "properties": {"driver_license": {"type": "boolean"}}
    },
    "else": {
        "properties": {"driver_license": {"const": False}}  # Minors cannot hold a license
    }
}
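A condition written only with `properties` is vacuously true when `age` is absent, so such an object would be routed to `then`; adding `"required": ["age"]` inside `if` sends it to `else` instead. The hypothetical `license_rule_errors` helper below is a plain-Python reading of the rule under that stricter interpretation, assuming `age` is an integer whenever it is present (the top-level `properties` already enforce that).

```python
from typing import Any, Dict, List


def license_rule_errors(person: Dict[str, Any]) -> List[str]:
    """Plain-Python reading of the if/then/else rule (hypothetical helper)."""
    # "if": age is present and at least 18.
    is_adult = "age" in person and person["age"] >= 18
    if is_adult:
        # "then": any boolean driver_license is acceptable.
        return []
    # "else": if driver_license is present, it must be exactly False.
    # "is not False" also rejects 0, since JSON Schema's const keeps
    # booleans and numbers distinct.
    if "driver_license" in person and person["driver_license"] is not False:
        return ["driver_license must be false for minors"]
    return []
```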

Composition Keywords

Combine schemas with logical operators:

  • allOf: Must match all sub-schemas

  • anyOf: Must match at least one sub-schema

  • oneOf: Must match exactly one sub-schema

  • not: Must not match the sub-schema

Example:

composition_schema = {
    "anyOf": [
        {"type": "string", "format": "email"},
        {"type": "string", "pattern": "^\\+[1-9]\\d{1,14}$"}  # Phone number
    ]
}
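The four composition keywords behave like boolean combinators over sub-schema checks. The sketch below models each sub-schema as a predicate; the combinator names and the two stand-in checks are hypothetical, and the email test is deliberately crude rather than a real format checker.

```python
import re
from typing import Any, Callable, List

Check = Callable[[Any], bool]


def all_of(checks: List[Check]) -> Check:
    return lambda value: all(check(value) for check in checks)


def any_of(checks: List[Check]) -> Check:
    return lambda value: any(check(value) for check in checks)


def one_of(checks: List[Check]) -> Check:
    # Exactly one branch must match; matching two branches is a failure.
    return lambda value: sum(1 for check in checks if check(value)) == 1


def not_(check: Check) -> Check:
    return lambda value: not check(value)


def looks_like_email(value: Any) -> bool:
    # Crude stand-in for "format": "email".
    return isinstance(value, str) and re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", value) is not None


def looks_like_e164(value: Any) -> bool:
    # Same shape as the phone pattern above; [0-9] avoids Python's Unicode-aware \d.
    return isinstance(value, str) and re.fullmatch(r"\+[1-9][0-9]{1,14}", value) is not None


contact = any_of([looks_like_email, looks_like_e164])
```

For these two branches `anyOf` and `oneOf` agree, because no string is both an email address and an E.164 number; `oneOf` only behaves differently when branches overlap.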

Integration with Prompture

The validator integrates with core Prompture functionality:

Validating Extraction Results:

from prompture.core import extract_and_jsonify
from prompture.validator import validate_against_schema

# Extract data
result = extract_and_jsonify(
    text="John Doe is 25 years old",
    json_schema=person_schema,
    model_name="openai/gpt-4"
)

# Validate extracted data
validation = validate_against_schema(result["json_string"], person_schema)

if validation["valid"]:
    print("Extraction and validation successful!")
else:
    print(f"Validation errors: {validation['error_summary']}")
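Because model output can vary between calls, one option is to re-run extraction until the result validates. The sketch below is a hypothetical helper (`extract_until_valid` is not a Prompture function) that takes the extraction and validation steps as callables, so it makes no assumptions about Prompture beyond the result keys documented above.

```python
from typing import Any, Callable, Dict


def extract_until_valid(
    extract: Callable[[], str],
    validate: Callable[[str], Dict[str, Any]],
    max_attempts: int = 3,
) -> Any:
    """Hypothetical helper: repeat extraction until validation passes.

    `extract` returns a JSON string; `validate` returns a dict with the
    documented "valid", "data" and "error_summary" keys.
    """
    last_summary = "no attempts made"
    for _ in range(max_attempts):
        validation = validate(extract())
        if validation["valid"]:
            return validation["data"]
        last_summary = validation["error_summary"]
    raise ValueError(f"no valid result after {max_attempts} attempts: {last_summary}")
```

With the example above, `extract` could wrap the `extract_and_jsonify(...)` call and return its `"json_string"`, and `validate` could be `lambda s: validate_against_schema(s, person_schema)`.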

Integration with Test Suites:

from prompture.runner import run_suite_from_spec

# Test specifications automatically use validation
test_spec = {
    "tests": [{
        "schema": person_schema,  # Automatically validated
        "expected": {...}
    }]
}

Error Handling and Debugging

The validator provides comprehensive error information:

Error Structure:

{
    "message": "Detailed error description",
    "path": ["name"],                       # Location in the data structure
    "schema_path": ["properties", "name"],  # Location in the schema
    "instance": "",                         # The invalid value (empty string)
    "validator": "minLength",               # Which validation rule failed
    "constraint": 1                         # The constraint value
}
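A path list is easier to read when rendered as a JSON Pointer (RFC 6901), for example `["items", 0, "name"]` as `/items/0/name`. The sketch below formats an error dict with the fields shown above into one line; `to_json_pointer` and `format_error` are hypothetical helpers, not part of Prompture.

```python
from typing import Any, Dict, List


def to_json_pointer(path: List[Any]) -> str:
    """Render a path list as an RFC 6901 JSON Pointer."""
    # Escape "~" before "/" so that "/" -> "~1" is not re-escaped to "~01".
    return "".join("/" + str(part).replace("~", "~0").replace("/", "~1") for part in path)


def format_error(error: Dict[str, Any]) -> str:
    """One-line summary of an error dict with the fields listed above (hypothetical)."""
    location = to_json_pointer(error.get("path", [])) or "(root)"
    return f"{location}: {error['message']} [{error.get('validator')}={error.get('constraint')!r}]"
```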

Common Error Types:

  • Type Errors: Data type doesn’t match schema expectation

  • Constraint Errors: Value violates defined constraints (min/max, length, etc.)

  • Format Errors: String doesn’t match required format (email, date, etc.)

  • Required Errors: Missing required properties in objects

  • Additional Property Errors: Extra properties when additionalProperties is False

Debugging Tips:

  1. Check Error Paths: Use the path field to locate problematic data

  2. Examine Constraints: Review the constraint field to understand requirements

  3. Validate Incrementally: Test individual properties before complex schemas

  4. Use Simple Schemas First: Start with basic validation and add complexity gradually

Best Practices

  1. Define Comprehensive Schemas: Include all relevant constraints and formats

  2. Use Appropriate Data Types: Choose the most specific type for each field

  3. Validate Early: Validate data as soon as possible after extraction

  4. Handle Validation Errors: Always check validation results before using data

  5. Provide Clear Error Messages: Use schema descriptions for better error reporting

  6. Test Edge Cases: Validate schemas with boundary values and invalid inputs

  7. Version Your Schemas: Keep track of schema changes for backward compatibility
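Boundary values for integer ranges follow mechanically from the schema: each bound and its nearest neighbor on the other side. The hypothetical `integer_boundary_cases` helper below generates them; each value can then be substituted into a test document, such as the `age` field of `person_schema` (0 to 150), and passed to the validator.

```python
from typing import Any, Dict, List


def integer_boundary_cases(schema: Dict[str, Any]) -> List[int]:
    """Hypothetical helper: integers just inside and just outside each bound."""
    cases = set()
    if "minimum" in schema:
        cases.update({schema["minimum"] - 1, schema["minimum"]})
    if "maximum" in schema:
        cases.update({schema["maximum"], schema["maximum"] + 1})
    if "exclusiveMinimum" in schema:
        cases.update({schema["exclusiveMinimum"], schema["exclusiveMinimum"] + 1})
    if "exclusiveMaximum" in schema:
        cases.update({schema["exclusiveMaximum"] - 1, schema["exclusiveMaximum"]})
    return sorted(cases)
```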

Dependencies

The validator module requires:

  • jsonschema: JSON Schema validation library (pip install jsonschema)

If the jsonschema library is not available, validation functions will return appropriate error messages indicating the missing dependency.
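A common way to implement this kind of optional dependency is to check for the package before use and return a result shaped like a normal failure. The sketch below is hypothetical: the helper names and message text are illustrative, and it assumes errors are reported as strings; Prompture's actual message is not specified here.

```python
import importlib.util
from typing import Any, Dict


def jsonschema_available() -> bool:
    """True if the jsonschema package can be imported in this environment."""
    return importlib.util.find_spec("jsonschema") is not None


def missing_dependency_result() -> Dict[str, Any]:
    """Hypothetical result dict, shaped like the documented return value."""
    message = "jsonschema is not installed; run: pip install jsonschema"
    return {
        "valid": False,
        "errors": [message],
        "data": None,
        "error_count": 1,
        "error_summary": message,
    }
```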

Installation:

pip install jsonschema

jsonschema is also installed automatically when Prompture is installed with the validation extra (the quotes stop shells such as zsh from expanding the brackets):

pip install "prompture[validation]"