Core Concepts

Clearstone is built around three foundational pillars that work together to make AI agents safe, observable, and debuggable. Understanding these core concepts will help you use Clearstone effectively.

The Three Pillars

1. Governance (Policy-as-Code)

Policies are declarative rules that control agent behavior at runtime. They intercept agent actions before they execute and decide whether to allow, block, modify, or pause them.

Key Components:

- @Policy Decorator: Turns a Python function into a policy
- PolicyContext: Provides metadata about the current execution
- Decision Actions: ALLOW, BLOCK, ALERT, PAUSE, REDACT
- PolicyEngine: Evaluates all policies and enforces decisions

Example:

from clearstone import Policy, ALLOW, BLOCK

@Policy(name="cost_control", priority=100)
def cost_control_policy(context):
    session_cost = context.metadata.get("session_cost", 0.0)

    if session_cost > 50.0:
        return BLOCK(f"Cost limit exceeded: ${session_cost:.2f}")

    return ALLOW
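
To make the evaluation step more concrete, here is a minimal, self-contained sketch of how an engine could combine several policies. It does not use Clearstone itself: `ToyContext`, `evaluate`, and the tuple-based decisions are hypothetical stand-ins, and the ordering rule (highest priority first, first non-ALLOW result wins) is an assumption for illustration rather than a description of PolicyEngine's actual behavior.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical stand-ins for illustration; these are NOT Clearstone classes.
@dataclass
class ToyContext:
    metadata: Dict[str, Any] = field(default_factory=dict)

Decision = Tuple[str, str]  # (action, reason)
ALLOW_DECISION: Decision = ("ALLOW", "")

def cost_control(ctx: ToyContext) -> Decision:
    if ctx.metadata.get("session_cost", 0.0) > 50.0:
        return ("BLOCK", "Cost limit exceeded")
    return ALLOW_DECISION

def admin_only_tools(ctx: ToyContext) -> Decision:
    is_admin = ctx.metadata.get("user_role") == "admin"
    if ctx.metadata.get("tool_name") == "delete_db" and not is_admin:
        return ("BLOCK", "Admin role required")
    return ALLOW_DECISION

PrioritizedPolicy = Tuple[int, Callable[[ToyContext], Decision]]

def evaluate(policies: List[PrioritizedPolicy], ctx: ToyContext) -> Decision:
    """Assumed rule: run highest priority first; the first non-ALLOW result wins."""
    for _, policy in sorted(policies, key=lambda item: item[0], reverse=True):
        decision = policy(ctx)
        if decision[0] != "ALLOW":
            return decision
    return ALLOW_DECISION

policies = [(50, admin_only_tools), (100, cost_control)]
over_budget = evaluate(policies, ToyContext({"session_cost": 75.0}))
normal = evaluate(policies, ToyContext({"session_cost": 1.0, "tool_name": "web_search"}))
```

In this sketch `over_budget` is a BLOCK decision from the cost policy, while `normal` falls through every policy and is allowed.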

2. Observability (Distributed Tracing)

Tracing captures a complete record of what your agent does at runtime. Every operation is recorded as a span with precise timing, inputs, outputs, and relationships to other spans.

Key Components:

- TracerProvider: Initializes the tracing system
- Tracer: Creates and manages spans
- Span: Represents a single operation with timing and metadata
- Trace: A complete execution flow (collection of spans)
- TraceStore: Persists traces to SQLite for later analysis

Example:

from clearstone.observability import TracerProvider, SpanKind

provider = TracerProvider(db_path="traces.db")
tracer = provider.get_tracer("my_agent", version="1.0")

with tracer.span("agent_workflow", kind=SpanKind.INTERNAL) as root:
    with tracer.span("llm_call", attributes={"model": "gpt-4"}):
        result = call_llm()

    with tracer.span("tool_execution", attributes={"tool": "calculator"}):
        output = run_tool()

provider.shutdown()
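
For intuition about what a span records, the following sketch implements a toy span as a plain context manager that captures a name, a parent, a duration, and attributes. `toy_span`, `recorded`, and `_stack` are hypothetical names invented for this illustration; the real Tracer is likely to track more (IDs, status, inputs and outputs) and to persist spans rather than keep them in a list.

```python
import time
from contextlib import contextmanager
from typing import Any, Dict, Iterator, List, Optional

recorded: List[Dict[str, Any]] = []  # finished spans, in completion order
_stack: List[str] = []               # names of the spans that are currently open

@contextmanager
def toy_span(name: str, attributes: Optional[Dict[str, Any]] = None) -> Iterator[None]:
    """Hypothetical span: times the enclosed block and remembers its parent."""
    parent = _stack[-1] if _stack else None
    _stack.append(name)
    start = time.perf_counter()
    try:
        yield
    finally:
        _stack.pop()
        recorded.append({
            "name": name,
            "parent": parent,
            "duration_s": time.perf_counter() - start,
            "attributes": attributes or {},
        })

with toy_span("agent_workflow"):
    with toy_span("llm_call", {"model": "gpt-4"}):
        pass  # the LLM call would go here
    with toy_span("tool_execution", {"tool": "calculator"}):
        pass  # the tool call would go here
```

Child spans finish before their parent, so `recorded` lists `llm_call` and `tool_execution` ahead of `agent_workflow`, and each child carries `agent_workflow` as its parent.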

3. Debugging (Time-Travel & Testing)

Checkpointing and testing allow you to debug agents retrospectively and validate behavior against historical data.

Key Components:

- CheckpointManager: Creates snapshots of agent state
- ReplayEngine: Restores agent state and re-executes from any point
- PolicyTestHarness: Tests policies against historical traces
- Behavioral Assertions: Declarative tests for agent behavior

Example:

from clearstone.debugging import CheckpointManager, ReplayEngine

manager = CheckpointManager()
checkpoint = manager.create_checkpoint(agent, trace, span_id="span_abc")

engine = ReplayEngine(checkpoint)
engine.start_debugging_session("process_next_step", input_data)

Core Abstractions

PolicyContext

The PolicyContext is the data structure passed to every policy function. It provides information about the current execution:

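from dataclasses import dataclass
from typing import Any, Dict

@dataclass
class PolicyContext:
    user_id: str              # who triggered the action
    agent_id: str             # which agent is acting
    timestamp: float          # when the context was created
    metadata: Dict[str, Any]  # operation-specific data read by policies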

Metadata is where you pass operation-specific data:

from clearstone import create_context

context = create_context(
    user_id="user_123",
    agent_id="research_agent",
    metadata={
        "tool_name": "web_search",
        "session_cost": 12.50,
        "user_role": "admin"
    }
)

Decision Actions

Policies return a Decision that tells the engine what to do:

Action	Behavior	Use Case
ALLOW	Continue execution normally	Default - no issues detected
BLOCK	Stop execution immediately, raise error	Prevent dangerous or unauthorized actions
ALERT	Continue but log a warning	Monitor suspicious behavior
PAUSE	Stop and wait for human approval	Require manual review for high-stakes operations
REDACT	Continue but remove sensitive fields	Protect PII in outputs

Example:

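from clearstone import ALLOW, BLOCK, ALERT, PAUSE, REDACT

# Each statement below is an alternative final line of a policy function.

return ALLOW  # continue normally

return BLOCK("User not authorized")  # stop and raise an error

return ALERT  # continue, but log a warning

return PAUSE("Manual approval required for $10k transaction")  # wait for human approval

return REDACT(reason="PII protection", fields=["ssn", "credit_card"])  # strip the listed fields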
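
The table above describes what each action does once the engine receives it. As a rough illustration of those behaviors, here is a self-contained sketch that applies an action to an output payload. `apply_decision`, `ActionBlocked`, and the `approver` callback are hypothetical names made up for this example; Clearstone's own enforcement code may be structured quite differently.

```python
import logging
from typing import Any, Callable, Dict, Iterable, Optional

logger = logging.getLogger("decision_sketch")

class ActionBlocked(Exception):
    """Hypothetical error raised when an action must not run."""

def apply_decision(
    action: str,
    payload: Dict[str, Any],
    reason: str = "",
    fields: Iterable[str] = (),
    approver: Optional[Callable[[str], bool]] = None,
) -> Dict[str, Any]:
    """Return the payload that may continue, or raise if execution must stop."""
    if action == "BLOCK":
        raise ActionBlocked(reason)
    if action == "PAUSE":
        # Assumption: a human approval step is modeled as a callback.
        approved = approver(reason) if approver else False
        if not approved:
            raise ActionBlocked(f"Not approved: {reason}")
    elif action == "ALERT":
        logger.warning("Policy alert: %s", reason)
    elif action == "REDACT":
        hidden = set(fields)
        return {key: value for key, value in payload.items() if key not in hidden}
    return payload

record = {"name": "Ada", "ssn": "123-45-6789"}
redacted = apply_decision("REDACT", record, reason="PII protection", fields=["ssn"])
```

Here `redacted` keeps `name` and drops `ssn`, while a BLOCK action, or a PAUSE without approval, raises instead of returning.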

Traces and Spans

A trace represents a complete agent execution. A span represents a single operation within that trace.

Span Hierarchy:

Trace: research_workflow
├── Span: agent_execution
│   ├── Span: plan_generation
│   ├── Span: web_search (tool)
│   └── Span: synthesis
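
A hierarchy like the one above can be rebuilt from a flat list of spans as long as each span knows its parent. The sketch below assumes a simple `(span_id, parent_id, name)` record; the IDs and the `render_tree` helper are hypothetical and only illustrate the parent-child relationship, not Clearstone's storage format.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

SpanRow = Tuple[str, Optional[str], str]  # (span_id, parent_id, name)

# Made-up IDs mirroring the hierarchy shown above.
spans: List[SpanRow] = [
    ("s1", None, "agent_execution"),
    ("s2", "s1", "plan_generation"),
    ("s3", "s1", "web_search (tool)"),
    ("s4", "s1", "synthesis"),
]

def render_tree(rows: List[SpanRow]) -> List[str]:
    """Group spans by parent, then walk depth-first from the root spans."""
    children: Dict[Optional[str], List[Tuple[str, str]]] = defaultdict(list)
    for span_id, parent_id, name in rows:
        children[parent_id].append((span_id, name))

    lines: List[str] = []

    def walk(parent_id: Optional[str], depth: int) -> None:
        for span_id, name in children[parent_id]:
            lines.append("  " * depth + name)
            walk(span_id, depth + 1)

    walk(None, 0)
    return lines

tree_lines = render_tree(spans)
print("\n".join(tree_lines))
```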

Span Attributes:

with tracer.span("llm_call", attributes={
    "model": "gpt-4",
    "temperature": 0.7,
    "tokens": 1500,
    "cost": 0.045
}) as span:
    result = call_llm()

Checkpoints

A checkpoint is a snapshot of agent state at a specific moment in time. It includes:

- Agent's complete internal state
- The trace context (all parent spans)
- Metadata about the execution point
- Timestamp and version information

Creating a Checkpoint:

from clearstone.debugging import CheckpointManager

manager = CheckpointManager(checkpoint_dir=".checkpoints")

checkpoint = manager.create_checkpoint(
    agent=my_agent,
    trace=execution_trace,
    span_id="span_xyz"
)

checkpoint_path = manager.save_checkpoint(checkpoint)

Loading a Checkpoint:

checkpoint = manager.load_checkpoint("t1_ckpt_abc123.ckpt")

restored_agent = checkpoint.agent
execution_context = checkpoint.trace
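
The key property of a checkpoint is that it is isolated from the live agent: later mutations must not leak into the snapshot. The following sketch shows that idea with `copy.deepcopy`. `ToyAgent`, `ToyCheckpoint`, and `snapshot` are hypothetical stand-ins, and deep copying is an assumption chosen for brevity; it says nothing about how CheckpointManager actually serializes state.

```python
import copy
import time
from dataclasses import dataclass, field
from typing import List

@dataclass
class ToyAgent:
    """Hypothetical agent whose only state is a list of completed steps."""
    memory: List[str] = field(default_factory=list)

@dataclass
class ToyCheckpoint:
    agent: ToyAgent
    span_id: str
    created_at: float

def snapshot(agent: ToyAgent, span_id: str) -> ToyCheckpoint:
    # Deep copy so later changes to the live agent do not alter the snapshot.
    return ToyCheckpoint(agent=copy.deepcopy(agent), span_id=span_id, created_at=time.time())

agent = ToyAgent(memory=["step 1"])
ckpt = snapshot(agent, span_id="span_xyz")

agent.memory.append("step 2")          # the live agent keeps running
restored = copy.deepcopy(ckpt.agent)   # restore a fresh copy for debugging
```

After this runs, `restored.memory` still holds only the first step, which is what makes it possible to re-execute from that point.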

How They Work Together

The three pillars integrate seamlessly:

Tracing captures everything your agent does
Policies enforce rules at runtime
Testing validates behavior against historical traces
Checkpoints enable time-travel debugging

Complete Example:

from clearstone import Policy, BLOCK, ALLOW, PolicyEngine, create_context, context_scope
from clearstone.observability import TracerProvider
from clearstone.testing import PolicyTestHarness, assert_tool_was_called

@Policy(name="block_expensive_tools", priority=100)
def block_expensive_tools(context):
    tool_name = context.metadata.get("tool_name")
    if tool_name == "gpt4_turbo":
        return BLOCK("Expensive tool blocked")
    return ALLOW

provider = TracerProvider(db_path="traces.db")
tracer = provider.get_tracer("cost_conscious_agent")
engine = PolicyEngine()

with tracer.span("agent_run"):
    context = create_context(
        user_id="user_1",
        agent_id="agent_1",
        metadata={"tool_name": "gpt4_turbo"}
    )

    with context_scope(context):
        try:
            engine.evaluate(context)
        except Exception as e:
            print(f"Blocked: {e}")

provider.shutdown()

harness = PolicyTestHarness("traces.db")
traces = harness.load_traces()
result = harness.simulate_policy(
    assert_tool_was_called("gpt4_turbo", times=0),
    traces
)

Key Design Principles

1. Declarative Over Imperative

Policies are written as simple functions, not complex state machines. You declare what should happen, not how to enforce it.

2. Zero Performance Impact

Tracing uses asynchronous batching and thread-safe operations to keep span recording off the agent's critical path, so its effect on execution speed is negligible.
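
As an illustration of the asynchronous batching idea, the sketch below hands spans to a background thread through a queue and groups them into batches. `BatchingExporter` and its in-memory `batches` list are hypothetical; a real exporter would write each batch to storage, and Clearstone's internals may differ.

```python
import queue
import threading
from typing import Any, Dict, List

class BatchingExporter:
    """Hypothetical exporter: callers enqueue spans, a worker thread batches them."""

    def __init__(self, batch_size: int = 3) -> None:
        self._queue: "queue.Queue[Any]" = queue.Queue()
        self._batch_size = batch_size
        self.batches: List[List[Dict[str, Any]]] = []  # stands in for database writes
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def record(self, span: Dict[str, Any]) -> None:
        # Cheap for the caller: no I/O happens on the agent's thread.
        self._queue.put(span)

    def _run(self) -> None:
        batch: List[Dict[str, Any]] = []
        while True:
            item = self._queue.get()
            if item is None:  # sentinel sent by shutdown()
                break
            batch.append(item)
            if len(batch) >= self._batch_size:
                self.batches.append(batch)
                batch = []
        if batch:  # flush whatever is left
            self.batches.append(batch)

    def shutdown(self) -> None:
        self._queue.put(None)
        self._worker.join()

exporter = BatchingExporter(batch_size=3)
for i in range(7):
    exporter.record({"name": f"span_{i}"})
exporter.shutdown()
```

Seven spans with a batch size of three end up as batches of 3, 3, and 1; the final partial batch is flushed at shutdown, which is one reason a shutdown call at the end of a run matters.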

3. Composability

Policies can be combined using compose_and and compose_or to build complex rules from simple parts.
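
To show what AND/OR composition can mean for policies, here is a small sketch using plain functions that return "ALLOW" or "BLOCK". The helpers `toy_compose_and` and `toy_compose_or` are hypothetical and deliberately named differently from the library's functions; the semantics (AND allows only if every policy allows, OR allows if any policy allows) are an assumption and should be checked against the Governance Guide.

```python
from typing import Any, Callable, Dict

ToyPolicy = Callable[[Dict[str, Any]], str]

def business_hours(meta: Dict[str, Any]) -> str:
    return "ALLOW" if 9 <= meta.get("hour", 0) < 17 else "BLOCK"

def is_admin(meta: Dict[str, Any]) -> str:
    return "ALLOW" if meta.get("user_role") == "admin" else "BLOCK"

def toy_compose_and(*policies: ToyPolicy) -> ToyPolicy:
    """Assumed AND semantics: the first non-ALLOW result wins."""
    def combined(meta: Dict[str, Any]) -> str:
        for policy in policies:
            result = policy(meta)
            if result != "ALLOW":
                return result
        return "ALLOW"
    return combined

def toy_compose_or(*policies: ToyPolicy) -> ToyPolicy:
    """Assumed OR semantics: any ALLOW wins; otherwise return the last result."""
    def combined(meta: Dict[str, Any]) -> str:
        result = "BLOCK"
        for policy in policies:
            result = policy(meta)
            if result == "ALLOW":
                return "ALLOW"
        return result
    return combined

strict = toy_compose_and(business_hours, is_admin)
lenient = toy_compose_or(business_hours, is_admin)

after_hours_admin = {"hour": 22, "user_role": "admin"}
strict_result = strict(after_hours_admin)
lenient_result = lenient(after_hours_admin)
```

For an admin working after hours, the strict rule blocks (business hours fail) while the lenient rule allows (the admin check passes).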

4. Fail-Safe Defaults

If a policy raises an exception, the engine defaults to ALLOW and logs the error, so a bug in a policy never crashes the agent.
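
The behavior described above can be pictured as a try/except wrapper around each policy call. This is a sketch under that assumption, not the engine's real code; `safe_evaluate` and the two sample policies are hypothetical.

```python
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger("policy_sketch")

def safe_evaluate(policy: Callable[[Dict[str, Any]], str], context: Dict[str, Any]) -> str:
    """Run a policy; on any exception, log it and fall back to ALLOW."""
    try:
        return policy(context)
    except Exception:
        name = getattr(policy, "__name__", repr(policy))
        logger.exception("Policy %s failed; defaulting to ALLOW", name)
        return "ALLOW"

def buggy_policy(context: Dict[str, Any]) -> str:
    return context["metadata"]["missing_key"]  # raises KeyError

def strict_policy(context: Dict[str, Any]) -> str:
    return "BLOCK"

buggy_result = safe_evaluate(buggy_policy, {"metadata": {}})
strict_result = safe_evaluate(strict_policy, {"metadata": {}})
```

Because the fallback is ALLOW, a broken policy lets the action through rather than stopping it, so the logged errors are worth monitoring.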

5. Testability First

Every feature is designed to be testable. Policies can be validated before deployment, and agent behavior can be tested against historical data.
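
Testing a policy against historical data amounts to replaying recorded inputs through it and counting the outcomes. The sketch below does this with plain dictionaries; `historical_calls` is fabricated sample data and `backtest` is a hypothetical helper, whereas in Clearstone the records would come from stored traces via PolicyTestHarness.

```python
from typing import Any, Callable, Dict, List

# Made-up sample records standing in for metadata recovered from stored traces.
historical_calls: List[Dict[str, Any]] = [
    {"tool_name": "web_search", "session_cost": 2.0},
    {"tool_name": "gpt4_turbo", "session_cost": 61.5},
    {"tool_name": "calculator", "session_cost": 0.1},
]

def cost_control(meta: Dict[str, Any]) -> str:
    return "BLOCK" if meta.get("session_cost", 0.0) > 50.0 else "ALLOW"

def backtest(policy: Callable[[Dict[str, Any]], str],
             records: List[Dict[str, Any]]) -> Dict[str, int]:
    """Replay each record through the policy and tally the decisions."""
    summary = {"ALLOW": 0, "BLOCK": 0}
    for meta in records:
        summary[policy(meta)] += 1
    return summary

summary = backtest(cost_control, historical_calls)
```

A summary like this shows how many past operations a new policy would have blocked before it is deployed.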

Next Steps

Governance Guide: Deep dive into writing and composing policies
Observability Guide: Master distributed tracing
Testing Guide: Learn behavioral testing and backtesting
Time-Travel Debugging: Debug agents by traveling back in time