TOON Input Conversion Guide

Prompture now supports TOON input conversion for structured data, allowing you to achieve significant token savings (typically 45-60%) when analyzing JSON arrays or Pandas DataFrames with LLMs.

Overview

TOON (Token-Oriented Object Notation) is a compact format for representing uniform data structures. For structured data such as product catalogs, user lists, or transaction records, converting to TOON before sending it to the LLM can substantially reduce token usage without changing what the model can analyze.
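
To make the idea concrete, here is a minimal sketch of a tabular encoding in the spirit of TOON: the field names are written once in a header, and each record becomes one comma-separated row. The function to_toon_like is hypothetical and deliberately simplified (no quoting or escaping of values). It is not the python-toon implementation and does not claim to follow the full TOON syntax.

```python
import json


def to_toon_like(records, name="items"):
    """Encode a uniform list of dicts as one header line plus one row per record.

    Illustrative only: values are written with str() and never quoted or escaped.
    """
    if not records:
        raise ValueError("records must not be empty")
    fields = list(records[0].keys())
    if any(list(record.keys()) != fields for record in records):
        raise ValueError("all records must share the same keys in the same order")
    # Header: name, row count, and the field names, stated once for all rows
    header = name + "[" + str(len(records)) + "]{" + ",".join(fields) + "}:"
    rows = ["  " + ",".join(str(record[f]) for f in fields) for record in records]
    return "\n".join([header] + rows)


products = [
    {"id": 1, "name": "Laptop", "price": 999.99},
    {"id": 2, "name": "Book", "price": 19.99},
]

encoded = to_toon_like(products)
print(encoded)
# items[2]{id,name,price}:
#   1,Laptop,999.99
#   2,Book,19.99

# The keys are not repeated per record, so the text is shorter than the JSON
print(len(json.dumps(products)), "JSON characters vs", len(encoded), "encoded characters")
```

The savings come from dropping the repeated keys, quotes, and braces, which is why the benefit grows with the number of uniform records.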

Key Benefits:

  • 45-60% token reduction for uniform data arrays

  • Automatic conversion from JSON/DataFrames to TOON

  • JSON responses for easy consumption

  • Token usage tracking with savings analysis

  • No changes to existing code: the new functions complement the existing API

Quick Start

Installation

TOON functionality is included by default when you install Prompture:

pip install prompture

Both python-toon and pandas are installed automatically as dependencies, so no extra steps are needed.

Basic Usage

Analyze JSON Array Data

from prompture import extract_from_data

# Your structured data
products = [
    {"id": 1, "name": "Laptop", "price": 999.99, "rating": 4.5},
    {"id": 2, "name": "Book", "price": 19.99, "rating": 4.2},
    {"id": 3, "name": "Headphones", "price": 149.99, "rating": 4.7}
]

# Define what you want to extract
schema = {
    "type": "object",
    "properties": {
        "average_price": {"type": "number"},
        "highest_rated": {"type": "string"},
        "total_items": {"type": "integer"}
    }
}

# Ask questions about your data
result = extract_from_data(
    data=products,
    question="What is the average price, highest rated product, and total count?",
    json_schema=schema,
    model_name="openai/gpt-4"
)

print(result["json_object"])
# {"average_price": 389.99, "highest_rated": "Headphones", "total_items": 3}

# Check token savings
savings = result["token_savings"]
print(f"Token savings: {savings['percentage_saved']}%")
# e.g. Token savings: 62.3% (the exact figure depends on your data)

Analyze Pandas DataFrames

from prompture import extract_from_pandas
import pandas as pd

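# Load your DataFrame from a file...
df = pd.read_csv("sales_data.csv")
# ...or build it from in-memory records such as the products list above
# (this assignment replaces the DataFrame loaded from the CSV)
df = pd.DataFrame(products)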

schema = {
    "type": "object",
    "properties": {
        "top_category": {"type": "string"},
        "price_trend": {"type": "string"},
        "outliers": {"type": "array", "items": {"type": "string"}}
    }
}

result = extract_from_pandas(
    df=df,
    question="What category has the highest average price? Any pricing outliers?",
    json_schema=schema,
    model_name="openai/gpt-4"
)

print(result["json_object"])
print(f"DataFrame shape: {result['dataframe_info']['shape']}")

Advanced Usage

Working with Nested Data

When your data is nested in an API response or larger structure:

api_response = {
    "status": "success",
    "page": 1,
    "results": [
        {"user_id": 101, "score": 85, "level": "advanced"},
        {"user_id": 102, "score": 72, "level": "intermediate"}
    ]
}

result = extract_from_data(
    data=api_response,
    data_key="results",  # Specify which key contains the array
    question="What is the average score?",
    json_schema={"type": "object", "properties": {"avg_score": {"type": "number"}}},
    model_name="openai/gpt-4"
)
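
As an illustration of what the data_key argument expresses, the sketch below picks the record array out of a larger payload before any conversion happens. The helper select_records is hypothetical and only mirrors the behavior described above. It is not Prompture's internal code.

```python
def select_records(payload, data_key=None):
    """Return the list of records, optionally looked up under data_key."""
    if data_key is None:
        records = payload
    else:
        if not isinstance(payload, dict) or data_key not in payload:
            raise KeyError(f"key {data_key!r} not found in payload")
        records = payload[data_key]
    if not isinstance(records, list):
        raise ValueError("selected data must be a list of records")
    return records


api_response = {
    "status": "success",
    "page": 1,
    "results": [
        {"user_id": 101, "score": 85, "level": "advanced"},
        {"user_id": 102, "score": 72, "level": "intermediate"},
    ],
}

records = select_records(api_response, data_key="results")
print(len(records))  # 2
```

Only the selected array is tabular. The surrounding fields (status, page) are not part of the data being analyzed.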

Custom Instructions and Options

result = extract_from_data(
    data=products,
    question="Find products under $100",
    json_schema=schema,
    model_name="openai/gpt-4",
    instruction_template="Analyze this product data carefully and answer: {question}",
    ai_cleanup=True,
    options={"temperature": 0.1, "max_tokens": 500}
)

Token Savings Analysis

Understanding the Efficiency Gains

The functions provide detailed token usage analysis:

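# question and model are placeholders; substitute your own values
question = "What is the average price?"
model = "openai/gpt-4"
result = extract_from_data(data=products, question=question, json_schema=schema, model_name=model)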

# Token savings breakdown
savings = result["token_savings"]
print(f"JSON characters: {savings['json_characters']}")
print(f"TOON characters: {savings['toon_characters']}")
print(f"Character reduction: {savings['saved_characters']} ({savings['percentage_saved']}%)")
print(f"Estimated token savings: ~{savings['estimated_saved_tokens']} tokens")

# The TOON data that was sent to the LLM
print("TOON format used:")
print(result["toon_data"])
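
The guide does not state how these fields are computed. The sketch below shows one plausible way to derive them from the two serialized strings. The function estimate_savings is hypothetical, and the figure of roughly 4 characters per token is a rough heuristic assumed here, not a value taken from Prompture.

```python
def estimate_savings(json_text, toon_text, chars_per_token=4):
    """Compare two serializations and report character and estimated token savings."""
    json_chars = len(json_text)
    toon_chars = len(toon_text)
    saved = json_chars - toon_chars
    # Guard against division by zero for empty input
    percentage = round(100 * saved / json_chars, 1) if json_chars else 0.0
    return {
        "json_characters": json_chars,
        "toon_characters": toon_chars,
        "saved_characters": saved,
        "percentage_saved": percentage,
        # Assumption: about 4 characters per token; real tokenizers vary
        "estimated_saved_tokens": saved // chars_per_token,
    }


# Stand-in strings of 200 and 90 characters
report = estimate_savings("x" * 200, "y" * 90)
print(report["saved_characters"], report["percentage_saved"], report["estimated_saved_tokens"])
# 110 55.0 27
```

Character counts are only a proxy. For exact numbers, count tokens with the tokenizer of the model you actually call.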

Preview Savings Without LLM Calls

Use the token comparison utility to analyze your data structure efficiency:

python examples/token_comparison_utility.py

Or programmatically:

from examples.token_comparison_utility import compare_formats, print_comparison_report

stats = compare_formats(your_data)
print_comparison_report(stats)

Best Practices

When TOON is Most Effective

✅ Ideal for TOON:

  • Uniform data structures (all objects have same keys)

  • Tabular data from databases, CSVs, APIs

  • Product catalogs, user lists, transaction records

  • Arrays with 3+ objects

⚠️ Less effective:

  • Non-uniform objects (different key sets)

  • Deeply nested structures

  • Very small arrays (1-2 items)

  • Already compact data

Data Structure Requirements

# ✅ Perfect for TOON - uniform structure
good_data = [
    {"id": 1, "name": "A", "price": 10.0},
    {"id": 2, "name": "B", "price": 20.0},
    {"id": 3, "name": "C", "price": 30.0}
]

# ⚠️ Less efficient - non-uniform structure
mixed_data = [
    {"id": 1, "name": "A", "price": 10.0},
    {"id": 2, "title": "B", "cost": 20.0, "extra": "data"},
    {"user": 3, "label": "C"}
]
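
A quick way to tell which case your data falls into is to compare key sets before converting. The helper has_uniform_keys below is a hypothetical pre-check, not part of Prompture.

```python
def has_uniform_keys(records):
    """Return True when every record is a dict with exactly the same set of keys."""
    if not records or not all(isinstance(record, dict) for record in records):
        return False
    first_keys = set(records[0])
    return all(set(record) == first_keys for record in records[1:])


good_data = [
    {"id": 1, "name": "A", "price": 10.0},
    {"id": 2, "name": "B", "price": 20.0},
    {"id": 3, "name": "C", "price": 30.0},
]

mixed_data = [
    {"id": 1, "name": "A", "price": 10.0},
    {"id": 2, "title": "B", "cost": 20.0, "extra": "data"},
    {"user": 3, "label": "C"},
]

print(has_uniform_keys(good_data))   # True
print(has_uniform_keys(mixed_data))  # False
```

If the check fails, normalizing the records to a common set of keys first is likely to improve the savings.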

Performance Considerations

  • Large datasets: TOON conversion is fast, but consider chunking very large DataFrames

  • Model compatibility: All models work with TOON input (it’s converted automatically)

  • Cost optimization: Higher token savings = lower API costs
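
The cost point is simple arithmetic: tokens saved per call, times the number of calls, times the input-token price. The sketch below uses a made-up price and call volume purely for illustration. Check your provider's current pricing for real figures.

```python
def estimated_cost_saved(saved_tokens_per_call, calls, price_per_1k_input_tokens):
    """Estimate money saved from sending fewer input tokens."""
    return saved_tokens_per_call * calls * price_per_1k_input_tokens / 1000


# Hypothetical numbers: 1,200 tokens saved per call, 500 calls,
# and an assumed price of 0.01 currency units per 1,000 input tokens
saving = estimated_cost_saved(1200, 500, 0.01)
print(f"Estimated saving: {saving:.2f}")
```

Only input tokens shrink with TOON conversion. The JSON response costs the same as before.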

Error Handling

Common Issues and Solutions

try:
    result = extract_from_data(data=data, question=question, json_schema=schema, model_name=model)
except ValueError as e:
    message = str(e)
    if "empty" in message:
        print("Data array is empty")
    elif "dictionaries" in message:
        print("All array items must be dictionaries")
    elif "python-toon" in message:
        print("Install python-toon: pip install python-toon")
    else:
        raise  # Unrecognized ValueError: don't swallow it
except RuntimeError as e:
    if "pandas" in str(e):
        print("Install pandas: pip install pandas")
    else:
        raise  # Unrecognized RuntimeError: don't swallow it
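
Matching on error message text can be brittle, so it may help to validate the input before making the call. The sketch below checks the two data conditions named above (an empty array, and items that are not dictionaries). The function validate_records and its messages are hypothetical and do not reproduce Prompture's own errors.

```python
def validate_records(data):
    """Raise ValueError early if data is not a non-empty list of dictionaries."""
    if not isinstance(data, list):
        raise ValueError("data must be a list of dictionaries")
    if not data:
        raise ValueError("data array is empty")
    bad_positions = [i for i, item in enumerate(data) if not isinstance(item, dict)]
    if bad_positions:
        raise ValueError(f"items at positions {bad_positions} are not dictionaries")
    return data


for candidate in ([{"id": 1}], [], [{"id": 1}, "oops"]):
    try:
        validate_records(candidate)
        print("ok")
    except ValueError as error:
        print(f"invalid: {error}")
```

Validating up front means a malformed payload fails before any tokens are spent.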

Migration from Existing Code

Easy Upgrade Path

Replace existing extraction calls to get automatic token savings:

import json
from prompture import extract_and_jsonify, extract_from_data

# Before: Using extract_and_jsonify with JSON text
json_text = json.dumps(products)
result = extract_and_jsonify(
    text=f"Analyze this data: {json_text}",
    json_schema=schema,
    model_name=model_name
)

# After: Using extract_from_data with automatic TOON conversion
result = extract_from_data(
    data=products,  # Pass data directly
    question="Analyze this data comprehensively",
    json_schema=schema,
    model_name=model_name
)
# Typically 45-60% fewer input tokens for uniform data

API Reference

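extract_from_data()

Arguments used in this guide: data, question, json_schema, model_name, and the optional data_key, instruction_template, ai_cleanup, and options. The returned dictionary includes json_object, token_savings, and toon_data.

extract_from_pandas()

Arguments used in this guide: df, question, json_schema, and model_name. The returned dictionary includes json_object and dataframe_info.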

Examples and Use Cases

E-commerce Analysis

products = load_product_catalog()

result = extract_from_data(
    data=products,
    question="Which products are underperforming? Consider price, rating, and sales.",
    json_schema={
        "type": "object",
        "properties": {
            "underperforming_products": {"type": "array", "items": {"type": "string"}},
            "recommended_actions": {"type": "array", "items": {"type": "string"}},
            "price_optimization": {"type": "object"}
        }
    },
    model_name="openai/gpt-4"
)

Financial Data Analysis

import pandas as pd

transactions_df = pd.read_csv("transactions.csv")

result = extract_from_pandas(
    df=transactions_df,
    question="Identify spending patterns and categorize expenses by priority",
    json_schema=expense_analysis_schema,
    model_name="anthropic/claude-3-sonnet"
)

User Behavior Analysis

user_activity = fetch_user_data_from_api()["users"]

result = extract_from_data(
    data=user_activity,
    question="Segment users by engagement level and recommend retention strategies",
    json_schema=user_segmentation_schema,
    model_name="openai/gpt-4"
)

Troubleshooting

Installation Issues

# If TOON conversion fails
pip install --upgrade python-toon

# If pandas functions fail
pip install --upgrade pandas

# Or reinstall Prompture with its pandas extra (quoted so the shell doesn't expand the brackets)
pip install "prompture[pandas]"

Performance Optimization

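# question, schema, and model_name are placeholders for your own values

# For large DataFrames, consider sampling
if len(df) > 1000:
    sample_df = df.sample(n=500, random_state=42)
else:
    sample_df = df
result = extract_from_pandas(df=sample_df, question=question, json_schema=schema, model_name=model_name)

# Or chunk processing
chunk_size = 100
results = []
for chunk in pd.read_csv("large_file.csv", chunksize=chunk_size):
    result = extract_from_pandas(df=chunk, question=question, json_schema=schema, model_name=model_name)
    results.append(result)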
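
The same chunking idea applies to plain lists of records passed to extract_from_data. Here is a minimal sketch with a hypothetical chunked helper. It only splits the data and makes no LLM calls.

```python
def chunked(records, size):
    """Yield consecutive slices of records, each holding at most size items."""
    if size <= 0:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(records), size):
        yield records[start:start + size]


rows = [{"id": i} for i in range(7)]
chunk_sizes = [len(chunk) for chunk in chunked(rows, 3)]
print(chunk_sizes)  # [3, 3, 1]
```

Keep in mind that per-chunk answers are not automatically comparable: an average computed on each chunk, for example, has to be weighted by chunk size when you combine the results.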

Conclusion

TOON input conversion makes structured data analysis with LLMs more token-efficient. By automatically converting JSON arrays and DataFrames to TOON format, Prompture reduces the number of tokens sent to the model while the questions you can ask of the data stay the same.

Key takeaways:

  • 45-60% token reduction for uniform data structures

  • Drop-in replacement for existing extraction workflows

  • Comprehensive analysis capabilities with automatic optimization

  • Cost reduction through more efficient token usage

Try the token comparison utility to see how much you can save with your own data!