User Guide
==========

Core Concepts
-------------

Tukuy is built around three core concepts: transformers, chaining, and plugins.

Transformers
~~~~~~~~~~~~

Transformers are the basic building blocks in Tukuy. Each transformer takes an input value, performs a specific operation, and returns a result. Transformers can:

- **Validate** input data (like validating email formats)
- **Transform** data from one form to another (like stripping HTML tags)
- **Extract** specific pieces of information (like selecting elements from HTML)
- **Calculate** new values (like calculating age from a date)
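
The four roles listed above can be pictured with plain Python functions. This is a conceptual sketch only, not Tukuy's implementation; every function name here is hypothetical, and the email pattern is deliberately simplistic.

.. code-block:: python

    import re
    from datetime import date
    from typing import Optional


    def validate_email(value: str) -> Optional[str]:
        """Validate: return the value if it looks like an email, else None."""
        return value if re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", value) else None


    def strip_tags(value: str) -> str:
        """Transform: drop anything that looks like an HTML tag."""
        return re.sub(r"<[^>]+>", "", value)


    def first_number(value: str) -> Optional[str]:
        """Extract: return the first run of digits, or None if there is none."""
        match = re.search(r"\d+", value)
        return match.group(0) if match else None


    def age_on(birth: date, today: date) -> int:
        """Calculate: whole years elapsed between birth and today."""
        had_birthday = (today.month, today.day) >= (birth.month, birth.day)
        return today.year - birth.year - (0 if had_birthday else 1)


    print(validate_email("test@example.com"))  # test@example.com
    print(strip_tags("<b>important</b>"))  # important
    print(first_number("Hello 123 World"))  # 123
    print(age_on(date(1990, 5, 15), date(2024, 5, 14)))  # 33

Each function takes one value and returns one result, which is what makes them easy to combine.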

Chaining
~~~~~~~~

Transformers can be chained together to create complex transformation pipelines. This allows you to:

- Apply multiple transformations in sequence
- Build reusable transformation sequences
- Combine different types of transformations
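
To make the idea of a chain concrete, here is a minimal sketch of a pipeline runner in plain Python. It accepts the same step shapes used throughout this guide (a bare name, or a dict with a ``"function"`` key plus parameters), but ``REGISTRY`` and ``run_chain`` are hypothetical names and this is not how Tukuy is implemented. The ``truncate`` stand-in simply slices the string.

.. code-block:: python

    from typing import Any, Callable, Dict, List, Union

    Step = Union[str, Dict[str, Any]]

    # Hypothetical registry mapping step names to plain functions.
    REGISTRY: Dict[str, Callable[..., Any]] = {
        "strip": lambda value: value.strip(),
        "lowercase": lambda value: value.lower(),
        "truncate": lambda value, length: value[:length],
    }


    def run_chain(value: Any, steps: List[Step]) -> Any:
        """Feed the output of each step into the next one."""
        for step in steps:
            if isinstance(step, str):
                name, params = step, {}
            else:
                params = dict(step)  # copy so the caller's dict is untouched
                name = params.pop("function")
            value = REGISTRY[name](value, **params)
        return value


    steps = ["strip", "lowercase", {"function": "truncate", "length": 5}]
    print(run_chain("  Hello World!  ", steps))  # hello

Because the step list is ordinary data, it can be stored in a variable and reused across inputs.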

Plugins
~~~~~~~

Tukuy's plugin system allows for extensibility. Plugins:

- Provide sets of related transformers
- Can be registered with the main ``TukuyTransformer``
- Allow for modular organization of functionality
- Make it easy to add custom transformations
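
The registration idea can be sketched without the library. In the example below, ``SketchPlugin`` and ``SketchRegistry`` are hypothetical classes written for illustration; they only show how a plugin can contribute a named group of functions to a shared registry, and they make no claim about Tukuy's internals.

.. code-block:: python

    from typing import Callable, Dict


    class SketchPlugin:
        """A named bundle of related transformer functions."""

        def __init__(self, name: str, transformers: Dict[str, Callable[[str], str]]):
            self.name = name
            self.transformers = transformers


    class SketchRegistry:
        """Collects transformers from every registered plugin."""

        def __init__(self) -> None:
            self._transformers: Dict[str, Callable[[str], str]] = {}

        def register_plugin(self, plugin: SketchPlugin) -> None:
            for name, func in plugin.transformers.items():
                if name in self._transformers:
                    raise ValueError(f"transformer already registered: {name}")
                self._transformers[name] = func

        def apply(self, name: str, value: str) -> str:
            return self._transformers[name](value)


    registry = SketchRegistry()
    registry.register_plugin(SketchPlugin("text", {"shout": str.upper, "trim": str.strip}))
    print(registry.apply("shout", "hello"))  # HELLO

Rejecting duplicate names is one possible design choice; a real registry might instead let later plugins override earlier ones.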

Using Transformers
------------------

Text Transformations
~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from tukuy import TukuyTransformer

    TUKUY = TukuyTransformer()

    # Basic transformations
    text = " Hello World! "
    result = TUKUY.transform(text, [
        "strip",  # Remove leading/trailing whitespace
        "lowercase",  # Convert to lowercase
        {"function": "truncate", "length": 5}  # Truncate to 5 chars
    ])
    print(result)  # "hello..."

    # Using regex
    text = "Hello 123 World"
    result = TUKUY.transform(text, [
        {"function": "regex_replace", "pattern": r"\d+", "replacement": "#"}
    ])
    print(result)  # "Hello # World"

HTML Transformations
~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    html = "<h1>Title</h1><p>This is <b>important</b> content.</p>"

    # Strip tags
    clean_text = TUKUY.transform(html, ["strip_html_tags"])
    print(clean_text)  # "Title This is important content."

    # Extract specific content
    title = TUKUY.transform(html, [
        {"function": "select", "selector": "h1"}
    ])
    print(title)  # "Title"

    # Extract and transform
    important = TUKUY.transform(html, [
        {"function": "select", "selector": "b"},
        "uppercase"
    ])
    print(important)  # "IMPORTANT"

Date Transformations
~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from datetime import date

    # Calculate age
    birth_date = "1990-05-15"
    age = TUKUY.transform(birth_date, [
        {"function": "age_calc"}
    ])
    print(f"Age: {age} years")

    # Calculate duration
    start_date = "2023-01-01"
    days = TUKUY.transform(start_date, [
        {"function": "duration_calc", "unit": "days", "end": "2023-12-31"}
    ])
    print(f"Days: {days}")
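
The example above does not show the resulting day count, so a standard-library cross-check can be useful. The sketch below uses only ``datetime``; ``days_between`` is a hypothetical helper, and Tukuy's own ``duration_calc`` may count or round differently.

.. code-block:: python

    from datetime import date


    def days_between(start: str, end: str) -> int:
        """Number of days from an ISO start date to an ISO end date."""
        return (date.fromisoformat(end) - date.fromisoformat(start)).days


    print(days_between("2023-01-01", "2023-12-31"))  # 364

The result is 364 rather than 365 because the difference excludes the end date itself.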

JSON Transformations
~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    json_str = '{"user": {"name": "John", "email": "john@example.com", "age": 30}}'

    # Extract values
    name = TUKUY.transform(json_str, [
        {"function": "extract", "selector": "user.name"}
    ])
    print(name)  # "John"

    # Transform extracted values
    email = TUKUY.transform(json_str, [
        {"function": "extract", "selector": "user.email"},
        {"function": "email_validator"}
    ])
    print(email)  # "john@example.com" or None if invalid
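
A dotted selector such as ``user.name`` can be read as "walk one key at a time". The following sketch shows that reading with the standard ``json`` module; ``extract_path`` is a hypothetical helper, and returning ``None`` for a missing key is an assumption rather than documented Tukuy behavior.

.. code-block:: python

    import json
    from typing import Any


    def extract_path(document: str, selector: str) -> Any:
        """Follow a dotted selector through nested JSON objects."""
        node = json.loads(document)
        for key in selector.split("."):
            if not isinstance(node, dict) or key not in node:
                return None  # assumed behavior for a missing key
            node = node[key]
        return node


    json_str = '{"user": {"name": "John", "email": "john@example.com", "age": 30}}'
    print(extract_path(json_str, "user.name"))  # John
    print(extract_path(json_str, "user.phone"))  # None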

Numerical Transformations
~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    # Format number
    number = 1234.56
    formatted = TUKUY.transform(number, [
        {"function": "format_number", "decimals": 1}
    ])
    print(formatted)  # "1,234.6"

    # Convert to percentage
    decimal = 0.75
    percentage = TUKUY.transform(decimal, [
        {"function": "percentage", "decimals": 0}
    ])
    print(percentage)  # "75%"
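
For comparison, Python's built-in format specifiers produce the same two strings. This is only a reference point for the expected output; the helper names are hypothetical and nothing here implies that Tukuy formats numbers this way internally.

.. code-block:: python

    def with_thousands(value: float, decimals: int) -> str:
        """Group thousands with commas and round to the given decimals."""
        return f"{value:,.{decimals}f}"


    def as_percentage(value: float, decimals: int) -> str:
        """Scale a fraction by 100 and append a percent sign."""
        return f"{value:.{decimals}%}"


    print(with_thousands(1234.56, 1))  # 1,234.6
    print(as_percentage(0.75, 0))  # 75%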

Validation Transformations
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    # Validate email
    email = "test@example.com"
    valid_email = TUKUY.transform(email, ["email_validator"])
    print(valid_email)  # "test@example.com" or None if invalid

    # Validate URL
    url = "https://example.com"
    valid_url = TUKUY.transform(url, ["url_validator"])
    print(valid_url)  # "https://example.com" or None if invalid

    # Validate number range
    number = 15
    in_range = TUKUY.transform(number, [
        {"function": "range_validator", "min": 10, "max": 20}
    ])
    print(in_range)  # 15 or None if outside range
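
The comments above describe a "value or ``None``" convention: a validator passes the input through unchanged when it is acceptable. A minimal standard-library sketch of that convention follows; ``check_range`` and ``check_url`` are hypothetical helpers, and the URL rule (http or https scheme plus a host) is an assumption chosen for the example.

.. code-block:: python

    from typing import Optional
    from urllib.parse import urlparse


    def check_range(value: float, minimum: float, maximum: float) -> Optional[float]:
        """Return value if minimum <= value <= maximum, else None."""
        return value if minimum <= value <= maximum else None


    def check_url(value: str) -> Optional[str]:
        """Return value if it has an http(s) scheme and a host, else None."""
        parts = urlparse(value)
        return value if parts.scheme in ("http", "https") and parts.netloc else None


    print(check_range(15, 10, 20))  # 15
    print(check_range(25, 10, 20))  # None
    print(check_url("https://example.com"))  # https://example.com

Returning the original value means a validator can sit in the middle of a chain without changing the data that later steps receive.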

Pattern-based Data Extraction
-----------------------------

HTML Pattern Extraction
~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    # Sample markup; element and class names match the selectors below
    html = """
    <h1>Main Title</h1>
    <span class="author">John Doe</span>
    <span class="date">2023-05-15</span>
    <p>First paragraph</p>
    <p>Second paragraph</p>
    <ul class="tags">
        <li>tech</li>
        <li>python</li>
        <li>data</li>
    </ul>
    """

    pattern = {
        "properties": [
            {
                "name": "title",
                "selector": "h1",
                "transform": ["strip", "uppercase"]
            },
            {
                "name": "author",
                "selector": ".author",
                "transform": ["strip"]
            },
            {
                "name": "paragraphs",
                "selector": "p",
                "type": "array"
            },
            {
                "name": "tags",
                "selector": ".tags li",
                "type": "array"
            }
        ]
    }

    result = TUKUY.extract_html_with_pattern(html, pattern)
    print(result)
    # {
    #     "title": "MAIN TITLE",
    #     "author": "John Doe",
    #     "paragraphs": ["First paragraph", "Second paragraph"],
    #     "tags": ["tech", "python", "data"]
    # }

JSON Pattern Extraction
~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    json_str = """
    {
        "data": {
            "user": {
                "profile": {
                    "name": "John Doe",
                    "email": "john@example.com"
                },
                "settings": {
                    "theme": "dark",
                    "notifications": true
                },
                "posts": [
                    {"id": 1, "title": "First Post", "likes": 10},
                    {"id": 2, "title": "Second Post", "likes": 15},
                    {"id": 3, "title": "Third Post", "likes": 5}
                ]
            }
        }
    }
    """

    pattern = {
        "properties": [
            {
                "name": "userName",
                "selector": "data.user.profile.name"
            },
            {
                "name": "email",
                "selector": "data.user.profile.email",
                "transform": ["email_validator"]
            },
            {
                "name": "darkMode",
                "selector": "data.user.settings.theme",
                "transform": [{"function": "equals", "value": "dark"}]
            },
            {
                "name": "postTitles",
                "selector": "data.user.posts[*].title",
                "type": "array"
            },
            {
                "name": "totalLikes",
                "selector": "data.user.posts[*].likes",
                "transform": [{"function": "sum"}]
            }
        ]
    }

    result = TUKUY.extract_json_with_pattern(json_str, pattern)
    print(result)
    # {
    #     "userName": "John Doe",
    #     "email": "john@example.com",
    #     "darkMode": True,
    #     "postTitles": ["First Post", "Second Post", "Third Post"],
    #     "totalLikes": 30
    # }
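
The ``[*]`` wildcard in selectors such as ``data.user.posts[*].title`` can be understood as "apply the rest of the selector to every list item". The sketch below implements that reading in plain Python; ``select_path`` is a hypothetical helper that handles only dotted keys and a trailing ``[*]``, not whatever full selector syntax Tukuy supports.

.. code-block:: python

    import json
    from typing import Any


    def select_path(node: Any, selector: str) -> Any:
        """Resolve a dotted selector; ``key[*]`` maps the remainder over a list."""
        head, _, rest = selector.partition(".")
        if head.endswith("[*]"):
            items = node[head[:-3]]
            return [select_path(item, rest) if rest else item for item in items]
        value = node[head]
        return select_path(value, rest) if rest else value


    document = json.loads("""
    {"data": {"user": {"posts": [
        {"id": 1, "title": "First Post", "likes": 10},
        {"id": 2, "title": "Second Post", "likes": 15},
        {"id": 3, "title": "Third Post", "likes": 5}
    ]}}}
    """)

    print(select_path(document, "data.user.posts[*].title"))
    # ['First Post', 'Second Post', 'Third Post']
    print(sum(select_path(document, "data.user.posts[*].likes")))  # 30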

Error Handling
--------------

Tukuy raises specific exception types, so each kind of failure can be caught separately:

.. code-block:: python

    from tukuy import TukuyTransformer
    from tukuy.exceptions import ValidationError, TransformationError, ParseError

    TUKUY = TukuyTransformer()

    try:
        # Try to validate an invalid email
        result = TUKUY.transform("not-an-email", ["email_validator"])
    except ValidationError as e:
        print(f"Validation error: {e}")

    try:
        # Try to parse invalid JSON
        result = TUKUY.transform("{invalid-json}", [{"function": "parse_json"}])
    except ParseError as e:
        print(f"Parse error: {e}")

    try:
        # Try to use a non-existent transformer
        result = TUKUY.transform("hello", ["non_existent_transformer"])
    except TransformationError as e:
        print(f"Transformation error: {e}")
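
When a failed step should fall back to a default instead of stopping the program, the try/except pattern can be wrapped in a small helper. The sketch below is generic Python: ``transform_or_default`` is a hypothetical name, not part of Tukuy, and ``fake_transform`` is a stand-in that exists only so the example runs without the library.

.. code-block:: python

    from typing import Any, Callable, List, Tuple, Type


    def transform_or_default(
        transform: Callable[[Any, List[Any]], Any],
        value: Any,
        steps: List[Any],
        default: Any = None,
        errors: Tuple[Type[BaseException], ...] = (Exception,),
    ) -> Any:
        """Run transform(value, steps); return default if one of errors is raised."""
        try:
            return transform(value, steps)
        except errors:
            return default


    def fake_transform(value: Any, steps: List[Any]) -> Any:
        """Stand-in for a real transform call: rejects values without an '@'."""
        if "@" not in value:
            raise ValueError(f"not an email: {value}")
        return value


    print(transform_or_default(fake_transform, "test@example.com", []))  # test@example.com
    print(transform_or_default(fake_transform, "not-an-email", [], default="n/a"))  # n/a

With Tukuy, the same helper could presumably be called with ``TUKUY.transform`` and the exception types imported above passed as ``errors``, so that unrelated exceptions still propagate.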

Best Practices
--------------

Chain Transformations
~~~~~~~~~~~~~~~~~~~~~

Chain transformations to avoid creating intermediate objects and to make your code more readable:

.. code-block:: python

    # Less efficient with intermediate objects:
    text = " Hello World! "
    text = TUKUY.transform(text, ["strip"])
    text = TUKUY.transform(text, ["lowercase"])

    # More efficient with chaining:
    text = " Hello World! "
    text = TUKUY.transform(text, ["strip", "lowercase"])

Use Specific Selectors
~~~~~~~~~~~~~~~~~~~~~~

When extracting data from HTML or JSON, use specific selectors to improve performance:

.. code-block:: python

    # Less efficient:
    title = TUKUY.transform(html, [
        {"function": "select", "selector": "div"}  # Too general
    ])

    # More efficient:
    title = TUKUY.transform(html, [
        {"function": "select", "selector": "div.article h1"}  # More specific
    ])

Reuse Transformer Instances
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Create a single ``TukuyTransformer`` instance and reuse it throughout your application:

.. code-block:: python

    # Create once:
    TUKUY = TukuyTransformer()

    # Reuse across your application:
    def process_user(user_data):
        name = TUKUY.transform(user_data, [{"function": "extract", "selector": "name"}])
        email = TUKUY.transform(user_data, [{"function": "extract", "selector": "email"}])
        # ...

Create Custom Transformers
~~~~~~~~~~~~~~~~~~~~~~~~~~

For performance-critical operations or specialized transformations, create custom transformers:

.. code-block:: python

    from tukuy import TukuyTransformer
    from tukuy.base import ChainableTransformer
    from tukuy.plugins import TransformerPlugin

    class CustomTransformer(ChainableTransformer[str, str]):
        def validate(self, value: str) -> bool:
            # Accept only string input
            return isinstance(value, str)

        def _transform(self, value: str, context=None) -> str:
            # Custom implementation here
            return value.replace('specific_pattern', 'replacement')

    class CustomPlugin(TransformerPlugin):
        def __init__(self):
            super().__init__("custom_plugin")

        @property
        def transformers(self):
            # Map each transformer name to a factory that builds it
            return {
                'custom_transform': lambda _: CustomTransformer('custom_transform')
            }

    TUKUY = TukuyTransformer()
    TUKUY.register_plugin(CustomPlugin())