Core Concepts
Clearstone is built around three foundational pillars that work together to make AI agents safe, observable, and debuggable. Understanding these core concepts will help you use Clearstone effectively.
The Three Pillars
1. Governance (Policy-as-Code)
Policies are declarative rules that control agent behavior at runtime. They intercept agent actions before they execute and decide whether to allow, block, flag, pause, or redact them.
Key Components:
- @Policy Decorator: Turns a Python function into a policy
- PolicyContext: Provides metadata about the current execution
- Decision Actions: ALLOW, BLOCK, ALERT, PAUSE, REDACT
- PolicyEngine: Evaluates all policies and enforces decisions
Example:
from clearstone import Policy, ALLOW, BLOCK

@Policy(name="cost_control", priority=100)
def cost_control_policy(context):
    # Read the running session cost from the context metadata.
    session_cost = context.metadata.get("session_cost", 0.0)
    if session_cost > 50.0:
        return BLOCK(f"Cost limit exceeded: ${session_cost:.2f}")
    return ALLOW
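To make the interception step concrete, here is a minimal, self-contained sketch of how an engine could run registered policies in priority order and stop at the first non-ALLOW decision. It does not use Clearstone itself: `MiniEngine`, `Decision`, `Context`, and the "highest priority first, first non-ALLOW wins" rule are all assumptions made for illustration, not the library's documented behavior.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Decision:
    """Hypothetical stand-in for a policy decision."""
    action: str
    reason: str = ""


ALLOW = Decision("allow")


def BLOCK(reason: str) -> Decision:
    return Decision("block", reason)


@dataclass
class Context:
    """Hypothetical stand-in for PolicyContext; only metadata is modeled."""
    metadata: Dict[str, Any] = field(default_factory=dict)


PolicyFn = Callable[[Context], Decision]


class MiniEngine:
    """Illustrative engine: runs policies from highest to lowest priority."""

    def __init__(self) -> None:
        self._policies: List[Tuple[int, PolicyFn]] = []

    def register(self, priority: int, fn: PolicyFn) -> None:
        self._policies.append((priority, fn))

    def evaluate(self, context: Context) -> Decision:
        ordered = sorted(self._policies, key=lambda entry: entry[0], reverse=True)
        for _, policy in ordered:
            decision = policy(context)
            if decision.action != "allow":
                return decision  # Assumed rule: first non-ALLOW decision wins.
        return ALLOW


def cost_control_policy(context: Context) -> Decision:
    session_cost = context.metadata.get("session_cost", 0.0)
    if session_cost > 50.0:
        return BLOCK(f"Cost limit exceeded: ${session_cost:.2f}")
    return ALLOW


engine = MiniEngine()
engine.register(100, cost_control_policy)
over_budget = engine.evaluate(Context(metadata={"session_cost": 75.0}))
within_budget = engine.evaluate(Context(metadata={"session_cost": 12.5}))
```

The policy function stays a plain function of the context, so the ordering and short-circuit logic lives entirely in the engine.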
2. Observability (Distributed Tracing)
Tracing captures a complete record of what your agent does at runtime. Every operation is recorded as a span with precise timing, inputs, outputs, and relationships to other spans.
Key Components:
- TracerProvider: Initializes the tracing system
- Tracer: Creates and manages spans
- Span: Represents a single operation with timing and metadata
- Trace: A complete execution flow (collection of spans)
- TraceStore: Persists traces to SQLite for later analysis
Example:
from clearstone.observability import TracerProvider, SpanKind

provider = TracerProvider(db_path="traces.db")
tracer = provider.get_tracer("my_agent", version="1.0")

# call_llm and run_tool are placeholders for your own functions.
with tracer.span("agent_workflow", kind=SpanKind.INTERNAL) as root:
    with tracer.span("llm_call", attributes={"model": "gpt-4"}):
        result = call_llm()
    with tracer.span("tool_execution", attributes={"tool": "calculator"}):
        output = run_tool()

# Shut down the provider once the agent is done.
provider.shutdown()
3. Debugging (Time-Travel & Testing)
Checkpointing and testing allow you to debug agents retrospectively and validate behavior against historical data.
Key Components:
- CheckpointManager: Creates snapshots of agent state
- ReplayEngine: Restores agent state and re-executes from any point
- PolicyTestHarness: Tests policies against historical traces
- Behavioral Assertions: Declarative tests for agent behavior
Example:
from clearstone.debugging import CheckpointManager, ReplayEngine

# agent, trace, and input_data come from your own application.
manager = CheckpointManager()
checkpoint = manager.create_checkpoint(agent, trace, span_id="span_abc")

engine = ReplayEngine(checkpoint)
engine.start_debugging_session("process_next_step", input_data)
Core Abstractions
PolicyContext
The PolicyContext is the data structure passed to every policy function. It provides information about the current execution:
from dataclasses import dataclass
from typing import Any, Dict

@dataclass
class PolicyContext:
    user_id: str
    agent_id: str
    timestamp: float
    metadata: Dict[str, Any]
Metadata is where you pass operation-specific data:
from clearstone import create_context

context = create_context(
    user_id="user_123",
    agent_id="research_agent",
    metadata={
        "tool_name": "web_search",
        "session_cost": 12.50,
        "user_role": "admin"
    }
)
Decision Actions
Policies return a Decision that tells the engine what to do:
| Action | Behavior | Use Case |
|---|---|---|
| ALLOW | Continue execution normally | Default - no issues detected |
| BLOCK | Stop execution immediately, raise error | Prevent dangerous or unauthorized actions |
| ALERT | Continue but log a warning | Monitor suspicious behavior |
| PAUSE | Stop and wait for human approval | Require manual review for high-stakes operations |
| REDACT | Continue but remove sensitive fields | Protect PII in outputs |
Example:
from clearstone import ALLOW, BLOCK, ALERT, PAUSE, REDACT
return ALLOW
return BLOCK("User not authorized")
return ALERT
return PAUSE("Manual approval required for $10k transaction")
return REDACT(reason="PII protection", fields=["ssn", "credit_card"])
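As an illustration of what a REDACT decision might do to an output, the sketch below masks the named fields in a dictionary before it is passed on. `apply_redaction` and the `"[REDACTED]"` placeholder are hypothetical and not part of Clearstone; whether the engine masks fields or removes them outright, and whether it handles nested data, are not specified here, so this version masks top-level keys only.

```python
from typing import Any, Dict, Iterable


def apply_redaction(
    payload: Dict[str, Any],
    fields: Iterable[str],
    placeholder: str = "[REDACTED]",
) -> Dict[str, Any]:
    """Return a copy of payload with the named top-level fields masked."""
    to_mask = set(fields)
    return {
        key: (placeholder if key in to_mask else value)
        for key, value in payload.items()
    }


# Illustrative sample data only.
tool_output = {
    "name": "Ada",
    "ssn": "123-45-6789",
    "credit_card": "4111 1111 1111 1111",
}
safe_output = apply_redaction(tool_output, fields=["ssn", "credit_card"])
```

Returning a copy leaves the original payload untouched, which keeps the unredacted value available to code that is entitled to see it.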
Traces and Spans
A trace represents a complete agent execution. A span represents a single operation within that trace.
Span Hierarchy:
Trace: research_workflow
├── Span: agent_execution
│ ├── Span: plan_generation
│ ├── Span: web_search (tool)
│ └── Span: synthesis
Span Attributes:
with tracer.span("llm_call", attributes={
    "model": "gpt-4",
    "temperature": 0.7,
    "tokens": 1500,
    "cost": 0.045
}) as span:
    result = call_llm()
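The parent/child relationship between spans can be sketched with a small context manager that keeps a stack of open spans. This is a toy model using only the standard library, not Clearstone's tracer: `MiniTracer` and `MiniSpan` are hypothetical names, and the stack-based approach assumes single-threaded use.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List, Optional


@dataclass
class MiniSpan:
    """Hypothetical span record: identity, parent link, attributes, timing."""
    name: str
    span_id: str
    parent_id: Optional[str]
    attributes: Dict[str, Any] = field(default_factory=dict)
    start: float = 0.0
    end: float = 0.0

    @property
    def duration(self) -> float:
        return self.end - self.start


class MiniTracer:
    """Toy tracer: the innermost open span becomes the parent of a new span."""

    def __init__(self) -> None:
        self.finished: List[MiniSpan] = []
        self._stack: List[MiniSpan] = []

    @contextmanager
    def span(
        self, name: str, attributes: Optional[Dict[str, Any]] = None
    ) -> Iterator[MiniSpan]:
        parent_id = self._stack[-1].span_id if self._stack else None
        current = MiniSpan(name, uuid.uuid4().hex, parent_id, dict(attributes or {}))
        current.start = time.perf_counter()
        self._stack.append(current)
        try:
            yield current
        finally:
            # Record the end time even if the body raised.
            current.end = time.perf_counter()
            self._stack.pop()
            self.finished.append(current)


tracer = MiniTracer()
with tracer.span("agent_execution") as root:
    with tracer.span("llm_call", {"model": "gpt-4", "tokens": 1500}) as llm:
        pass
    with tracer.span("web_search") as search:
        pass
```

Spans finish innermost first, so `tracer.finished` lists the children before the root; the hierarchy is recovered from each span's `parent_id`.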
Checkpoints
A checkpoint is a snapshot of agent state at a specific moment in time. It includes:
- Agent's complete internal state
- The trace context (all parent spans)
- Metadata about the execution point
- Timestamp and version information
Creating a Checkpoint:
from clearstone.debugging import CheckpointManager
manager = CheckpointManager(checkpoint_dir=".checkpoints")
checkpoint = manager.create_checkpoint(
    agent=my_agent,
    trace=execution_trace,
    span_id="span_xyz"
)
checkpoint_path = manager.save_checkpoint(checkpoint)
Loading a Checkpoint:
checkpoint = manager.load_checkpoint("t1_ckpt_abc123.ckpt")
restored_agent = checkpoint.agent
execution_context = checkpoint.trace
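The core idea of a checkpoint, capturing state so that later changes do not affect the saved copy, can be sketched in a few lines. The example below uses `copy.deepcopy` on a toy agent; `ToyAgent`, `take_snapshot`, and `restore_agent` are hypothetical names, and how Clearstone actually serializes agents to `.ckpt` files is not shown here.

```python
import copy
import time
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ToyAgent:
    """Hypothetical agent whose whole state is a message log and a counter."""
    memory: List[str] = field(default_factory=list)
    step: int = 0

    def process(self, message: str) -> None:
        self.memory.append(message)
        self.step += 1


def take_snapshot(agent: ToyAgent, span_id: str) -> Dict[str, Any]:
    """Capture an independent copy of the agent plus execution metadata."""
    return {
        "agent": copy.deepcopy(agent),
        "span_id": span_id,
        "created_at": time.time(),
    }


def restore_agent(snapshot: Dict[str, Any]) -> ToyAgent:
    """Return a fresh copy so the snapshot can be restored more than once."""
    return copy.deepcopy(snapshot["agent"])


agent = ToyAgent()
agent.process("plan")
snapshot = take_snapshot(agent, span_id="span_xyz")

agent.process("search")  # The live agent moves on after the snapshot.
restored = restore_agent(snapshot)
```

A deep copy matters here: a shallow copy would share the `memory` list with the live agent, and the "snapshot" would silently change as the agent kept running.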
How They Work Together
The three pillars integrate seamlessly:
- Tracing captures everything your agent does
- Policies enforce rules at runtime
- Testing validates behavior against historical traces
- Checkpoints enable time-travel debugging
Complete Example:
from clearstone import Policy, BLOCK, ALLOW, PolicyEngine, create_context, context_scope
from clearstone.observability import TracerProvider
from clearstone.testing import PolicyTestHarness, assert_tool_was_called

# Governance: define a policy.
@Policy(name="block_expensive_tools", priority=100)
def block_expensive_tools(context):
    tool_name = context.metadata.get("tool_name")
    if tool_name == "gpt4_turbo":
        return BLOCK("Expensive tool blocked")
    return ALLOW

# Observability: trace the run while the engine enforces the policy.
provider = TracerProvider(db_path="traces.db")
tracer = provider.get_tracer("cost_conscious_agent")
engine = PolicyEngine()

with tracer.span("agent_run"):
    context = create_context(
        user_id="user_1",
        agent_id="agent_1",
        metadata={"tool_name": "gpt4_turbo"}
    )
    with context_scope(context):
        try:
            engine.evaluate(context)
        except Exception as e:
            print(f"Blocked: {e}")

provider.shutdown()

# Testing: check behavior against the stored traces.
harness = PolicyTestHarness("traces.db")
traces = harness.load_traces()
result = harness.simulate_policy(
    assert_tool_was_called("gpt4_turbo", times=0),
    traces
)
Key Design Principles
1. Declarative Over Imperative
Policies are written as simple functions, not complex state machines. You declare what should happen, not how to enforce it.
2. Zero Performance Impact
Tracing uses asynchronous batching and thread-safe operations to keep span recording off the agent's critical path, so its effect on execution speed is intended to be negligible.
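Asynchronous batching generally means the caller only enqueues a record, while a background thread groups records and writes them out. The sketch below shows that pattern with the standard library; `BatchingWriter`, its parameters, and the list-based sink are hypothetical and are not Clearstone's implementation.

```python
import queue
import threading
from typing import Any, Callable, List


class BatchingWriter:
    """Toy async writer: submit() only enqueues; a worker flushes batches."""

    _STOP = object()  # Sentinel that tells the worker to exit.
    _IDLE = object()  # Marker for "no record arrived before the timeout".

    def __init__(
        self,
        sink: Callable[[List[Any]], None],
        batch_size: int = 10,
        flush_interval: float = 0.05,
    ) -> None:
        self._sink = sink
        self._batch_size = batch_size
        self._flush_interval = flush_interval
        self._queue: "queue.Queue[Any]" = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, record: Any) -> None:
        # Cheap for the caller: no I/O happens on this thread.
        self._queue.put(record)

    def _run(self) -> None:
        batch: List[Any] = []
        while True:
            try:
                item = self._queue.get(timeout=self._flush_interval)
            except queue.Empty:
                item = self._IDLE
            if item is self._STOP:
                break
            if item is not self._IDLE:
                batch.append(item)
            # Flush when the batch is full or the queue has gone quiet.
            if batch and (item is self._IDLE or len(batch) >= self._batch_size):
                self._sink(batch)
                batch = []
        if batch:
            self._sink(batch)  # Final flush on shutdown.

    def shutdown(self) -> None:
        self._queue.put(self._STOP)
        self._thread.join()


written_batches: List[List[int]] = []
writer = BatchingWriter(sink=written_batches.append, batch_size=10)
for record in range(25):
    writer.submit(record)
writer.shutdown()
```

The explicit `shutdown()` mirrors why the examples call `provider.shutdown()`: records still sitting in the queue are only guaranteed to be written once the worker has been asked to drain and stop.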
3. Composability
Policies can be combined using compose_and and compose_or to build complex rules from simple parts.
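The sketch below shows one plausible meaning for AND-style and OR-style composition, using plain functions. The helper names `all_of` and `any_of` are hypothetical and deliberately differ from `compose_and` and `compose_or`, whose exact signatures and tie-breaking rules are not described here; the semantics chosen (AND returns the first non-ALLOW decision, OR allows if any policy allows) are an assumption.

```python
from typing import Any, Callable, Dict, NamedTuple, Optional


class Decision(NamedTuple):
    """Hypothetical stand-in for a policy decision."""
    action: str
    reason: str = ""


ALLOW = Decision("allow")
PolicyFn = Callable[[Dict[str, Any]], Decision]


def all_of(*policies: PolicyFn) -> PolicyFn:
    """AND-style: allow only if every policy allows, else first rejection."""
    def combined(metadata: Dict[str, Any]) -> Decision:
        for policy in policies:
            decision = policy(metadata)
            if decision.action != "allow":
                return decision
        return ALLOW
    return combined


def any_of(*policies: PolicyFn) -> PolicyFn:
    """OR-style: allow if any policy allows, else return the first rejection."""
    def combined(metadata: Dict[str, Any]) -> Decision:
        first_rejection: Optional[Decision] = None
        for policy in policies:
            decision = policy(metadata)
            if decision.action == "allow":
                return ALLOW
            if first_rejection is None:
                first_rejection = decision
        # With no policies at all, this sketch allows by default.
        return first_rejection if first_rejection is not None else ALLOW
    return combined


def is_admin(metadata: Dict[str, Any]) -> Decision:
    if metadata.get("user_role") == "admin":
        return ALLOW
    return Decision("block", "Admin role required")


def within_budget(metadata: Dict[str, Any]) -> Decision:
    if metadata.get("session_cost", 0.0) <= 50.0:
        return ALLOW
    return Decision("block", "Cost limit exceeded")


strict = all_of(is_admin, within_budget)
lenient = any_of(is_admin, within_budget)

guest_cheap = {"user_role": "guest", "session_cost": 10.0}
guest_costly = {"user_role": "guest", "session_cost": 99.0}
```

Because each combinator returns another policy function, composed policies can themselves be composed further.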
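4. Fail-Safe Defaults
If a policy raises an exception, the engine logs the error and defaults to ALLOW, so a bug in a policy never crashes the agent. Note that this means a failing policy fails open: the action it would have checked is permitted.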
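The default-to-ALLOW behavior described above amounts to wrapping each policy call in an exception handler. A minimal sketch, with a hypothetical `evaluate_fail_open` helper and string constants standing in for real decisions:

```python
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger("policy_sketch")

# String constants stand in for real decision objects in this sketch.
ALLOW = "allow"
BLOCK = "block"


def evaluate_fail_open(
    policy: Callable[[Dict[str, Any]], str],
    metadata: Dict[str, Any],
) -> str:
    """Run a policy; on any exception, log it and fall back to ALLOW."""
    try:
        return policy(metadata)
    except Exception:
        name = getattr(policy, "__name__", repr(policy))
        logger.exception("Policy %s raised; defaulting to ALLOW", name)
        return ALLOW


def buggy_policy(metadata: Dict[str, Any]) -> str:
    # Bug: indexing raises KeyError when "session_cost" is missing.
    return BLOCK if metadata["session_cost"] > 50.0 else ALLOW


decision_on_error = evaluate_fail_open(buggy_policy, {})
decision_normal = evaluate_fail_open(buggy_policy, {"session_cost": 80.0})
```

The logged traceback is the only signal that the policy did not actually run, so in a setup like this the error log deserves monitoring.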
5. Testability First
Every feature is designed to be testable. Policies can be validated before deployment, and agent behavior can be tested against historical data.
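Because a policy is a function of its context, validating one before deployment can be as simple as replaying recorded inputs through it and counting the outcomes. The sketch below does this without Clearstone: `backtest` is a hypothetical helper, decisions are plain strings, and `recorded_runs` is made-up sample input rather than real trace data.

```python
from typing import Any, Callable, Dict, List

# String constants stand in for real decision objects in this sketch.
ALLOW = "allow"
BLOCK = "block"


def cost_limit_policy(metadata: Dict[str, Any]) -> str:
    return BLOCK if metadata.get("session_cost", 0.0) > 50.0 else ALLOW


def backtest(
    policy: Callable[[Dict[str, Any]], str],
    recorded: List[Dict[str, Any]],
) -> Dict[str, int]:
    """Replay recorded metadata through a policy and tally the decisions."""
    summary = {ALLOW: 0, BLOCK: 0}
    for metadata in recorded:
        summary[policy(metadata)] += 1
    return summary


# Illustrative sample inputs, not real traces.
recorded_runs = [
    {"tool_name": "web_search", "session_cost": 12.5},
    {"tool_name": "web_search", "session_cost": 63.0},
    {"tool_name": "calculator"},
]
summary = backtest(cost_limit_policy, recorded_runs)
```

A tally like this shows how many past runs a new policy would have blocked before it is enabled for live traffic.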
Next Steps
- Governance Guide: Deep dive into writing and composing policies
- Observability Guide: Master distributed tracing
- Testing Guide: Learn behavioral testing and backtesting
- Time-Travel Debugging: Debug agents by traveling back in time