Architecture Overview¶

Clearstone SDK is built on a modular, extensible architecture designed for production-grade AI agent governance and observability.

System Architecture¶

The SDK is organized into several independent but complementary pillars:

clearstone/
├── core/               # Framework-agnostic governance engine
├── observability/      # Distributed tracing system
├── debugging/          # Time-travel debugging and checkpointing
├── testing/            # AI-native testing framework
├── integrations/       # Framework-specific adapters
│   ├── langchain/
│   └── future/
├── policies/           # Pre-built policy library
├── serialization/      # Hybrid JSON/pickle serialization
├── storage/            # SQLite persistence layer
└── utils/              # Cross-cutting utilities

Core Pillars¶

1. Governance (Policy-as-Code)¶

Location: clearstone/core/

The governance pillar provides declarative policy enforcement for AI agents.

Key Components: - PolicyEngine: Discovers, evaluates, and enforces policies - Policy Decorator: Registers functions as policies with metadata - PolicyContext: Immutable execution context passed to policies - Decision Actions: ALLOW, BLOCK, ALERT, PAUSE, REDACT

Design Characteristics: - Zero dependencies on AI frameworks - Composable policy logic - Fail-safe defaults (errors don't crash) - Performance metrics and audit trails built-in

2. Observability (Distributed Tracing)¶

Location: clearstone/observability/

OpenTelemetry-aligned distributed tracing for complete agent visibility.

Key Components: - TracerProvider: Entry point for tracing system - Tracer: Creates and manages spans - Span: Represents a single operation with timing/metadata - TraceStore: SQLite-based persistence with WAL mode

Design Characteristics: - Non-blocking capture (< 1μs overhead) - Automatic parent-child span linking - Batched writes for performance - Thread-safe operations

3. Testing & Backtesting¶

Location: clearstone/testing/

AI-native testing framework for validating agent behavior.

Key Components: - PolicyTestHarness: Simulates policy enforcement on historical traces - Behavioral Assertions: Declarative tests for agent behavior - TestResult: Impact analysis and metrics

Design Characteristics: - Tests "how" agents behave, not just "what" they return - Historical backtesting against production data - pytest integration - Regression prevention

4. Time-Travel Debugging¶

Location: clearstone/debugging/

Checkpoint and replay agent execution from any point in history.

Key Components: - CheckpointManager: Creates, saves, and loads agent snapshots - ReplayEngine: Restores agent state and enables debugging - DeterministicExecutionContext: Mocks non-deterministic functions

Design Characteristics: - Complete state preservation - Deterministic replay - Interactive debugging sessions - Upstream span tracking

Cross-Cutting Concerns¶

Serialization¶

Location: clearstone/serialization/

Hybrid JSON-first approach with pickle fallback for complex objects.

Features: - Automatic format selection based on serializability - Size limits for safety - Human-readable JSON when possible

Storage¶

Location: clearstone/storage/

SQLite-based persistence with production-grade features.

Features: - Write-Ahead Logging (WAL) for concurrency - Asynchronous batching with SpanBuffer - Thread-safe operations - Efficient indexing and querying

Integrations¶

Location: clearstone/integrations/

Framework-specific adapters that keep the core framework-agnostic.

Current: - LangChain: PolicyCallbackHandler for automatic policy enforcement

Future: - CrewAI integration - AutoGen integration - Custom framework support

Data Flow¶

Policy Enforcement Flow¶

User Action
    ↓
LangChain Callback
    ↓
PolicyCallbackHandler (integration)
    ↓
PolicyEngine.evaluate() (core)
    ↓
Policy Functions (sorted by priority)
    ↓
Decision (ALLOW/BLOCK/PAUSE/ALERT/REDACT)
    ↓
Action Taken or Exception Raised

Tracing Flow¶

Agent Operation
    ↓
tracer.span() context manager
    ↓
Span creation (in-memory)
    ↓
SpanBuffer (batching)
    ↓
TraceStore (SQLite with WAL)
    ↓
Queryable trace data

Testing Flow¶

Production Agent Run
    ↓
Traces captured to SQLite
    ↓
PolicyTestHarness.load_traces()
    ↓
PolicyTestHarness.simulate_policy()
    ↓
TestResult with impact analysis
    ↓
pytest assertions

Design Principles¶

Clearstone's architecture follows these key principles:

Modularity: Each pillar can be used independently
Framework Agnostic: Core SDK has zero AI framework dependencies
Production Ready: Thread-safe, performant, fail-safe
Developer Friendly: Standard Python patterns, strong typing
Extensible: Easy to add new integrations and policies

See the Design Philosophy document for deeper insights into our architectural decisions.

Performance Characteristics¶

PolicyEngine¶

Evaluation Time: < 1ms per policy (typical)
Memory Overhead: Minimal (policies are pure functions)
Concurrency: Thread-safe evaluation

Tracing¶

Capture Overhead: < 1μs per span
Write Throughput: 10,000+ spans/second with batching
Storage Growth: ~1KB per span (depends on attributes)
Query Performance: Indexed by trace_id, parent_span_id

Testing¶

Simulation Speed: 1,000+ traces/second
Memory Usage: Loads traces in batches (configurable)

Deployment Considerations¶

Database¶

SQLite with WAL mode (concurrent reads, single writer)
Recommended: Regular backups of trace database
Optional: External storage for long-term archive

Scaling¶

Vertical: Single process handles 10,000+ policy evaluations/second
Horizontal: Each process has its own SQLite database
Aggregation: Use external tools to aggregate traces from multiple processes

Monitoring¶

Built-in PolicyMetrics for policy performance
Built-in AuditTrail for compliance logging
Export traces to external systems via TraceStore API

Next Steps¶

Design Philosophy: Understand our architectural decisions
Core Concepts: Learn the three pillars
API Reference: Complete API documentation
Contributing: Help improve the architecture