TOON Input Conversion Guide

Prompture now supports TOON input conversion for structured data, allowing you to achieve significant token savings (typically 45-60%) when analyzing JSON arrays or Pandas DataFrames with LLMs.

Overview

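TOON (Token-Oriented Object Notation) is a compact format for representing uniform data structures. Instead of repeating every field name inside every object, as JSON does, a TOON table declares the field names once in a header and then lists one row of values per record. When you have structured data such as product catalogs, user lists, or transaction records, converting it to TOON before sending it to the LLM can substantially reduce token usage while giving the model the same records and fields.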
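To make the "header once, then rows" idea concrete, here is a simplified sketch of a TOON-style tabular encoder for a uniform list of flat dictionaries. It is not python-toon's encoder and is not what Prompture calls internally: the `to_toon_table` helper and its `name` parameter are hypothetical, and quoting of values containing commas, quotes, or newlines is deliberately left out.

```python
import json


def _fmt(value) -> str:
    """Render a scalar value as plain text (booleans and None in lowercase)."""
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def to_toon_table(rows: list[dict], name: str = "items") -> str:
    """Encode a uniform list of flat dicts as one TOON-style tabular block.

    Simplified sketch: every row must have the same keys in the same order,
    and values are assumed to contain no commas, quotes, or newlines
    (a real encoder would quote those).
    """
    if not rows:
        raise ValueError("rows must not be empty")
    fields = list(rows[0])
    if any(list(row) != fields for row in rows):
        raise ValueError("all rows must share the same keys in the same order")
    # The header states the row count and the field names exactly once.
    lines = [f"{name}[{len(rows)}]{{{','.join(fields)}}}:"]
    # Each row then carries only its values, in header order.
    for row in rows:
        lines.append("  " + ",".join(_fmt(row[f]) for f in fields))
    return "\n".join(lines)


products = [
    {"id": 1, "name": "Laptop", "price": 999.99, "rating": 4.5},
    {"id": 2, "name": "Book", "price": 19.99, "rating": 4.2},
    {"id": 3, "name": "Headphones", "price": 149.99, "rating": 4.7},
]

toon_text = to_toon_table(products)
print(toon_text)
print(len(json.dumps(products)), "JSON characters vs", len(toon_text), "TOON characters")
```

For the three products used throughout this guide, the sketch prints a header line `items[3]{id,name,price,rating}:` followed by rows such as `  1,Laptop,999.99,4.5`. The keys `id`, `name`, `price`, and `rating` appear once instead of three times, which is where the savings come from.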

Key Benefits:

45-60% token reduction for uniform data arrays
Automatic conversion from JSON/DataFrames to TOON
JSON responses for easy consumption
Token usage tracking with savings analysis
No changes to existing code: the new functions complement the existing API

Quick Start

Installation

TOON functionality is included by default when you install Prompture:

pip install prompture

Both python-toon and pandas are now installed automatically - no extra steps needed!

Basic Usage

Analyze JSON Array Data

from prompture import extract_from_data

# Your structured data
products = [
    {"id": 1, "name": "Laptop", "price": 999.99, "rating": 4.5},
    {"id": 2, "name": "Book", "price": 19.99, "rating": 4.2},
    {"id": 3, "name": "Headphones", "price": 149.99, "rating": 4.7}
]

# Define what you want to extract
schema = {
    "type": "object",
    "properties": {
        "average_price": {"type": "number"},
        "highest_rated": {"type": "string"},
        "total_items": {"type": "integer"}
    }
}

# Ask questions about your data
result = extract_from_data(
    data=products,
    question="What is the average price, highest rated product, and total count?",
    json_schema=schema,
    model_name="openai/gpt-4"
)

print(result["json_object"])
# e.g. {'average_price': 389.99, 'highest_rated': 'Headphones', 'total_items': 3}

# Check token savings
savings = result["token_savings"]
print(f"Token savings: {savings['percentage_saved']}%")
# Token savings: 62.3%
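Numeric answers such as averages are generated by the model, so it can be worth cross-checking them when the data is small enough to compute locally. The sketch below computes the three answers for the sample `products` list in plain Python and reports which fields of a model answer disagree. The `mismatches` helper and its `tol` tolerance are hypothetical conveniences, not part of Prompture.

```python
from statistics import mean

products = [
    {"id": 1, "name": "Laptop", "price": 999.99, "rating": 4.5},
    {"id": 2, "name": "Book", "price": 19.99, "rating": 4.2},
    {"id": 3, "name": "Headphones", "price": 149.99, "rating": 4.7},
]

# Deterministic answers to the same three questions, computed locally.
expected = {
    "average_price": round(mean(p["price"] for p in products), 2),
    "highest_rated": max(products, key=lambda p: p["rating"])["name"],
    "total_items": len(products),
}


def mismatches(llm_answer: dict, truth: dict, tol: float = 0.01) -> list[str]:
    """Return the fields where the model's answer disagrees with the local result."""
    bad = []
    for key, want in truth.items():
        got = llm_answer.get(key)
        if isinstance(want, float):
            # Compare floats within a small tolerance.
            ok = isinstance(got, (int, float)) and abs(got - want) <= tol
        else:
            ok = got == want
        if not ok:
            bad.append(key)
    return bad


print(expected)
```

With a real result you would call `mismatches(result["json_object"], expected)`; an empty list means the model's answer matches the local computation.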

Analyze Pandas DataFrames

from prompture import extract_from_pandas
import pandas as pd

# Load your DataFrame
df = pd.read_csv("sales_data.csv")
# Alternatively, build the DataFrame from in-memory records:
# df = pd.DataFrame(products)

schema = {
    "type": "object",
    "properties": {
        "top_category": {"type": "string"},
        "price_trend": {"type": "string"},
        "outliers": {"type": "array", "items": {"type": "string"}}
    }
}

result = extract_from_pandas(
    df=df,
    question="What category has the highest average price? Any pricing outliers?",
    json_schema=schema,
    model_name="openai/gpt-4"
)

print(result["json_object"])
print(f"DataFrame shape: {result['dataframe_info']['shape']}")

Advanced Usage

Working with Nested Data

When your data is nested in an API response or larger structure:

api_response = {
    "status": "success",
    "page": 1,
    "results": [
        {"user_id": 101, "score": 85, "level": "advanced"},
        {"user_id": 102, "score": 72, "level": "intermediate"}
    ]
}

result = extract_from_data(
    data=api_response,
    data_key="results",  # Specify which key contains the array
    question="What is the average score?",
    json_schema={"type": "object", "properties": {"avg_score": {"type": "number"}}},
    model_name="openai/gpt-4"
)
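Conceptually, `data_key` names the value that holds the array of records. The plain-Python sketch below mirrors that selection step so you can see what ends up being converted; it uses a hypothetical `select_array` helper, and the exception types it raises are assumptions rather than Prompture's documented behavior.

```python
def select_array(data, data_key=None):
    """Return the list of records to convert (hypothetical helper)."""
    if data_key is not None:
        # Pull the array out of the surrounding object, e.g. an API envelope.
        if not isinstance(data, dict) or data_key not in data:
            raise KeyError(f"data_key {data_key!r} not found in data")
        data = data[data_key]
    if not isinstance(data, list):
        raise TypeError("expected a list of records")
    return data


api_response = {
    "status": "success",
    "page": 1,
    "results": [
        {"user_id": 101, "score": 85, "level": "advanced"},
        {"user_id": 102, "score": 72, "level": "intermediate"},
    ],
}

records = select_array(api_response, "results")
print(len(records), "records selected")
```

Under this reading, passing `data=api_response["results"]` without `data_key` should send the same records; the envelope fields (`status`, `page`) are not part of the table.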

Custom Instructions and Options

result = extract_from_data(
    data=products,
    question="Find products under $100",
    json_schema=schema,
    model_name="openai/gpt-4",
    instruction_template="Analyze this product data carefully and answer: {question}",
    ai_cleanup=True,
    options={"temperature": 0.1, "max_tokens": 500}
)

Token Savings Analysis

Understanding the Efficiency Gains

Each result includes a token savings breakdown and the TOON text that was actually sent:

question = "What is the average price, highest rated product, and total count?"
result = extract_from_data(data=products, question=question, json_schema=schema, model_name="openai/gpt-4")

# Token savings breakdown
savings = result["token_savings"]
print(f"JSON characters: {savings['json_characters']}")
print(f"TOON characters: {savings['toon_characters']}")
print(f"Character reduction: {savings['saved_characters']} ({savings['percentage_saved']}%)")
print(f"Estimated token savings: ~{savings['estimated_saved_tokens']} tokens")

# The TOON data that was sent to the LLM
print("TOON format used:")
print(result["toon_data"])
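The fields above are character-based, and the token figure is an estimate. The sketch below shows one way such a breakdown can be derived; it is an assumption, not Prompture's exact formula. It uses the common rule of thumb of roughly 4 characters per token for English text (the `chars_per_token` parameter); for exact counts you would run the model's own tokenizer over both strings.

```python
def estimate_savings(json_text: str, toon_text: str, chars_per_token: float = 4.0) -> dict:
    """Character-based savings estimate (assumes ~4 characters per token)."""
    json_chars = len(json_text)
    toon_chars = len(toon_text)
    saved = json_chars - toon_chars
    return {
        "json_characters": json_chars,
        "toon_characters": toon_chars,
        "saved_characters": saved,
        # Guard against division by zero for empty input.
        "percentage_saved": round(100 * saved / json_chars, 1) if json_chars else 0.0,
        "estimated_saved_tokens": round(saved / chars_per_token),
    }


stats = estimate_savings("x" * 200, "y" * 80)
print(stats)
```

Because it counts characters, the estimate is only as good as the characters-per-token assumption; numbers, punctuation, and non-English text often tokenize differently.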

Preview Savings Without LLM Calls

To estimate savings for your own data without calling an LLM, run the token comparison utility:

python examples/token_comparison_utility.py

Or programmatically, from the repository root so that the examples directory is importable:

from examples.token_comparison_utility import compare_formats, print_comparison_report

stats = compare_formats(your_data)
print_comparison_report(stats)

Best Practices

When TOON is Most Effective

✅ Ideal for TOON:

Uniform data structures (all objects have same keys)
Tabular data from databases, CSVs, APIs
Product catalogs, user lists, transaction records
Arrays with 3+ objects

⚠️ Less effective:

Non-uniform objects (different key sets)
Deeply nested structures
Very small arrays (1-2 items)
Already compact data

Data Structure Requirements

# ✅ Perfect for TOON - uniform structure
good_data = [
    {"id": 1, "name": "A", "price": 10.0},
    {"id": 2, "name": "B", "price": 20.0},
    {"id": 3, "name": "C", "price": 30.0}
]

# ⚠️ Less efficient - non-uniform structure
mixed_data = [
    {"id": 1, "name": "A", "price": 10.0},
    {"id": 2, "title": "B", "cost": 20.0, "extra": "data"},
    {"user": 3, "label": "C"}
]
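The guidelines above (three or more objects, identical keys, no deep nesting) can be turned into a quick pre-flight check. The sketch below is a hypothetical `toon_suitability` helper, not a Prompture function; the `min_rows=3` default comes from the "3+ objects" guideline, and treating any dict or list value as "nested" is a simplifying assumption.

```python
from collections import Counter


def toon_suitability(rows: list[dict], min_rows: int = 3) -> dict:
    """Rough check of how well a list of dicts fits TOON's tabular form."""
    if not rows:
        return {"rows": 0, "uniform_share": 0.0, "nested_values": False, "recommended": False}
    # Count how many rows share each distinct set of keys.
    key_sets = Counter(frozenset(row) for row in rows)
    most_common_count = key_sets.most_common(1)[0][1]
    share = most_common_count / len(rows)
    # Any dict or list value means the row is not flat.
    nested = any(isinstance(v, (dict, list)) for row in rows for v in row.values())
    return {
        "rows": len(rows),
        "uniform_share": round(share, 2),
        "nested_values": nested,
        "recommended": len(rows) >= min_rows and share == 1.0 and not nested,
    }


good_data = [
    {"id": 1, "name": "A", "price": 10.0},
    {"id": 2, "name": "B", "price": 20.0},
    {"id": 3, "name": "C", "price": 30.0},
]
mixed_data = [
    {"id": 1, "name": "A", "price": 10.0},
    {"id": 2, "title": "B", "cost": 20.0, "extra": "data"},
    {"user": 3, "label": "C"},
]

print(toon_suitability(good_data))
print(toon_suitability(mixed_data))
```

For `good_data` every row shares one key set, so the check recommends TOON; in `mixed_data` no two rows share the same keys, so only a third of the rows agree and the check does not.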

Performance Considerations

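Large datasets: conversion itself is fast, but very large DataFrames may still exceed the model's context window, so consider sampling or chunking them (see Troubleshooting)
Model compatibility: TOON is plain text, so it can be sent to any model; the conversion happens automatically before the request
Cost optimization: fewer input tokens mean lower API costs for token-priced models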
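For JSON arrays already in memory, chunking can be done with a small generator; each batch could then be passed as `data` to `extract_from_data`. This `chunked` helper is a hypothetical sketch (Python 3.12 also offers `itertools.batched`, which yields tuples rather than lists).

```python
from itertools import islice
from typing import Iterable, Iterator


def chunked(records: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Yield successive lists of at most `size` records."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(records)
    # islice takes up to `size` items; an empty batch means the input is exhausted.
    while batch := list(islice(it, size)):
        yield batch


rows = [{"id": i, "value": i * 2} for i in range(250)]
print([len(batch) for batch in chunked(rows, 100)])
```

Keeping each batch to a few hundred uniform rows also keeps every request well inside typical context limits; the right size depends on your model and row width.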

Error Handling

Common Issues and Solutions

from prompture import extract_from_data

# data, question, schema and model are defined as in the earlier examples
try:
    result = extract_from_data(data=data, question=question, json_schema=schema, model_name=model)
except ValueError as e:
    msg = str(e)
    if "empty" in msg:
        print("Data array is empty")
    elif "dictionaries" in msg:
        print("All array items must be dictionaries")
    elif "python-toon" in msg:
        print("Install python-toon: pip install python-toon")
    else:
        raise  # unexpected error: don't swallow it
except RuntimeError as e:
    if "pandas" in str(e):
        print("Install pandas: pip install pandas")
    else:
        raise
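To catch the first two problems before spending an API call, you can validate the records yourself. The hypothetical `check_records` helper below raises `ValueError` with messages containing "empty" and "dictionaries", so the handler above would classify them the same way; Prompture's own messages may be worded differently.

```python
def check_records(records) -> None:
    """Raise ValueError before any LLM call if records can't be tabulated."""
    if not isinstance(records, list):
        raise ValueError("data must be a list of dictionaries")
    if not records:
        raise ValueError("data array is empty")
    # Report every offending position, not just the first one.
    bad = [i for i, r in enumerate(records) if not isinstance(r, dict)]
    if bad:
        raise ValueError(f"all array items must be dictionaries; bad indexes: {bad}")


check_records([{"id": 1}, {"id": 2}])  # passes silently
```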

Migration from Existing Code

Easy Upgrade Path

Replace existing extraction calls to get automatic token savings:

import json
from prompture import extract_and_jsonify, extract_from_data

# Before: serialize the data to JSON text and embed it in the prompt
json_text = json.dumps(products)
result = extract_and_jsonify(
    text=f"Analyze this data: {json_text}",
    json_schema=schema,
    model_name=model_name
)

# After: pass the data directly and let extract_from_data convert it to TOON
result = extract_from_data(
    data=products,  # Pass data directly
    question="Analyze this data comprehensively",
    json_schema=schema,
    model_name=model_name
)
# Typically 45-60% fewer input tokens for uniform arrays

API Reference

extract_from_data()

extract_from_pandas()

Examples and Use Cases

E-commerce Analysis

products = load_product_catalog()  # placeholder: your own loader returning a list of dicts

result = extract_from_data(
    data=products,
    question="Which products are underperforming? Consider price, rating, and sales.",
    json_schema={
        "type": "object",
        "properties": {
            "underperforming_products": {"type": "array", "items": {"type": "string"}},
            "recommended_actions": {"type": "array", "items": {"type": "string"}},
            "price_optimization": {"type": "object"}
        }
    },
    model_name="openai/gpt-4"
)

Financial Data Analysis

import pandas as pd

transactions_df = pd.read_csv("transactions.csv")

result = extract_from_pandas(
    df=transactions_df,
    question="Identify spending patterns and categorize expenses by priority",
    json_schema=expense_analysis_schema,
    model_name="anthropic/claude-3-sonnet"
)

User Behavior Analysis

user_activity = fetch_user_data_from_api()["users"]  # placeholder: your own API client

result = extract_from_data(
    data=user_activity,
    question="Segment users by engagement level and recommend retention strategies",
    json_schema=user_segmentation_schema,
    model_name="openai/gpt-4"
)
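The financial and user behavior examples refer to `expense_analysis_schema` and `user_segmentation_schema` without defining them. One hypothetical shape for each is sketched below; the field names are illustrative choices, not part of Prompture, and any JSON Schema object that describes the answer you want would work in their place.

```python
# Hypothetical schema for the financial data example.
expense_analysis_schema = {
    "type": "object",
    "properties": {
        "spending_patterns": {"type": "array", "items": {"type": "string"}},
        "expenses_by_priority": {
            "type": "object",
            "properties": {
                "high": {"type": "array", "items": {"type": "string"}},
                "medium": {"type": "array", "items": {"type": "string"}},
                "low": {"type": "array", "items": {"type": "string"}},
            },
        },
    },
}

# Hypothetical schema for the user behavior example.
user_segmentation_schema = {
    "type": "object",
    "properties": {
        "segments": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "engagement_level": {"type": "string", "enum": ["high", "medium", "low"]},
                    "user_ids": {"type": "array", "items": {"type": "integer"}},
                    "retention_strategy": {"type": "string"},
                },
            },
        },
    },
}
```

Constraining fields with `enum` or typed arrays, as in `engagement_level` and `user_ids`, gives the model less room to return free-form text that is hard to consume downstream.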

Troubleshooting

Installation Issues

# If TOON conversion fails
pip install --upgrade python-toon

# If pandas functions fail
pip install --upgrade pandas

# Or install with optional dependencies (quotes stop shells such as zsh from expanding the brackets)
pip install "prompture[pandas]"

Performance Optimization

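import pandas as pd

# For large DataFrames, consider sampling
# ("..." stands for the question, json_schema and model_name arguments)
if len(df) > 1000:
    sample_df = df.sample(n=500, random_state=42)
    result = extract_from_pandas(df=sample_df, ...)

# Or process the file in chunks; each chunk is analyzed independently,
# so per-chunk results still need to be combined afterwards
chunk_size = 100
results = []
for chunk in pd.read_csv("large_file.csv", chunksize=chunk_size):
    result = extract_from_pandas(df=chunk, ...)
    results.append(result)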
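When chunks are analyzed independently, aggregate statistics need care: averaging the per-chunk averages is only correct if every chunk has the same number of rows, and the last chunk usually does not. A minimal sketch, assuming you record each chunk's reported mean together with `len(chunk)`; the `combine_means` helper is hypothetical.

```python
def combine_means(chunk_stats: list[tuple[float, int]]) -> float:
    """Weighted mean of per-chunk means, given (mean, row_count) pairs."""
    total_rows = sum(n for _, n in chunk_stats)
    if total_rows == 0:
        raise ValueError("no rows to combine")
    # Each chunk contributes in proportion to how many rows it covered.
    return sum(m * n for m, n in chunk_stats) / total_rows


# A 100-row chunk averaging 10.0 and a 50-row chunk averaging 40.0:
print(combine_means([(10.0, 100), (40.0, 50)]))
```

Here the weighted result is 20.0, whereas a naive mean of the two chunk averages would give 25.0. The same concern applies to counts (sum them) and maxima (take the maximum across chunks).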

Conclusion

TOON input conversion lowers the cost of analyzing structured data with LLMs. By automatically converting your JSON arrays and DataFrames to TOON, it cuts the number of input tokens while still sending the model the same records and fields.

Key takeaways:

45-60% token reduction for uniform data structures
A straightforward migration path from existing extraction workflows
Automatic conversion, with token savings reported on every result
Cost reduction through more efficient token usage

Try the token comparison utility to see how much you can save with your own data!