Architecture Overview¶
Clearstone SDK is built on a modular, extensible architecture designed for production-grade AI agent governance and observability.
System Architecture¶
The SDK is organized into several independent but complementary pillars:
clearstone/
├── core/ # Framework-agnostic governance engine
├── observability/ # Distributed tracing system
├── debugging/ # Time-travel debugging and checkpointing
├── testing/ # AI-native testing framework
├── integrations/ # Framework-specific adapters
│ ├── langchain/
│ └── future/
├── policies/ # Pre-built policy library
├── serialization/ # Hybrid JSON/pickle serialization
├── storage/ # SQLite persistence layer
└── utils/ # Cross-cutting utilities
Core Pillars¶
1. Governance (Policy-as-Code)¶
Location: clearstone/core/
The governance pillar provides declarative policy enforcement for AI agents.
Key Components:
- PolicyEngine: Discovers, evaluates, and enforces policies
- Policy Decorator: Registers functions as policies with metadata
- PolicyContext: Immutable execution context passed to policies
- Decision Actions: ALLOW, BLOCK, ALERT, PAUSE, REDACT
Design Characteristics:
- Zero dependencies on AI frameworks
- Composable policy logic
- Fail-safe defaults (errors don't crash)
- Performance metrics and audit trails built-in
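To make the policy-as-code idea concrete, here is a minimal, framework-free sketch of decorator registration, priority ordering, and fail-safe evaluation. It is not the Clearstone API: `MiniEngine`, `Context`, `Decision`, and the `policy(name, priority)` signature are hypothetical names chosen for illustration, and treating a raised exception as "skip this policy" is an assumption about what fail-safe means here.

```python
from dataclasses import dataclass
from enum import Enum
from types import MappingProxyType
from typing import Any, Callable, List, Mapping, Tuple


class Action(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    ALERT = "alert"
    PAUSE = "pause"
    REDACT = "redact"


@dataclass(frozen=True)
class Decision:
    action: Action
    reason: str = ""


@dataclass(frozen=True)
class Context:
    # Hypothetical stand-in for PolicyContext: read-only metadata about the call.
    metadata: Mapping[str, Any]


class MiniEngine:
    """Toy engine (illustration only): registers policies, evaluates by priority."""

    def __init__(self) -> None:
        self._policies: List[Tuple[int, str, Callable[[Context], Decision]]] = []

    def policy(self, name: str, priority: int = 0):
        def register(fn):
            self._policies.append((priority, name, fn))
            return fn
        return register

    def evaluate(self, ctx: Context) -> Decision:
        # Highest priority first; the first non-ALLOW decision wins.
        for _, _name, fn in sorted(self._policies, key=lambda p: -p[0]):
            try:
                decision = fn(ctx)
            except Exception:
                # Assumed fail-safe behavior: a buggy policy is skipped
                # instead of crashing the agent.
                continue
            if decision.action is not Action.ALLOW:
                return decision
        return Decision(Action.ALLOW)


engine = MiniEngine()


@engine.policy("block_admin_tools", priority=100)
def block_admin_tools(ctx: Context) -> Decision:
    meta = ctx.metadata
    if meta.get("tool") == "delete_user" and meta.get("role") != "admin":
        return Decision(Action.BLOCK, "admin role required")
    return Decision(Action.ALLOW)


@engine.policy("buggy_policy", priority=50)
def buggy_policy(ctx: Context) -> Decision:
    raise RuntimeError("simulated bug inside a policy")


guest_call = Context(MappingProxyType({"tool": "delete_user", "role": "guest"}))
result = engine.evaluate(guest_call)
```

Because policies are plain functions over a read-only context, they compose by simple iteration, and the engine needs no knowledge of any agent framework.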
2. Observability (Distributed Tracing)¶
Location: clearstone/observability/
OpenTelemetry-aligned distributed tracing for complete agent visibility.
Key Components:
- TracerProvider: Entry point for tracing system
- Tracer: Creates and manages spans
- Span: Represents a single operation with timing/metadata
- TraceStore: SQLite-based persistence with WAL mode
Design Characteristics:
- Non-blocking capture (< 1μs overhead)
- Automatic parent-child span linking
- Batched writes for performance
- Thread-safe operations
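Automatic parent-child linking is commonly implemented by keeping the active span in a context variable. The sketch below shows that technique with a hypothetical `MiniTracer` and `MiniSpan`; only the `span()` context-manager shape is taken from this document, and the field names are assumptions.

```python
import contextvars
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass
from typing import List, Optional

# Holds the span that is currently open in this thread or async task.
_current_span: contextvars.ContextVar = contextvars.ContextVar(
    "current_span", default=None
)


@dataclass
class MiniSpan:
    name: str
    trace_id: str
    span_id: str
    parent_span_id: Optional[str]
    start_ns: int
    end_ns: Optional[int] = None


class MiniTracer:
    """Illustrative tracer: links each new span to whichever span is open."""

    def __init__(self) -> None:
        self.finished: List[MiniSpan] = []

    @contextmanager
    def span(self, name: str):
        parent = _current_span.get()
        span = MiniSpan(
            name=name,
            # A child inherits the trace id; a root span starts a new trace.
            trace_id=parent.trace_id if parent else uuid.uuid4().hex,
            span_id=uuid.uuid4().hex[:16],
            parent_span_id=parent.span_id if parent else None,
            start_ns=time.perf_counter_ns(),
        )
        token = _current_span.set(span)
        try:
            yield span
        finally:
            span.end_ns = time.perf_counter_ns()
            _current_span.reset(token)  # restore the parent as the active span
            self.finished.append(span)


tracer = MiniTracer()
with tracer.span("agent_run") as root:
    with tracer.span("tool_call") as child:
        pass
```

Spans finish innermost first, so `tracer.finished` holds `tool_call` before `agent_run`; a real system would hand finished spans to a buffer instead of a list.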
3. Testing & Backtesting¶
Location: clearstone/testing/
AI-native testing framework for validating agent behavior.
Key Components:
- PolicyTestHarness: Simulates policy enforcement on historical traces
- Behavioral Assertions: Declarative tests for agent behavior
- TestResult: Impact analysis and metrics
Design Characteristics:
- Tests "how" agents behave, not just "what" they return
- Historical backtesting against production data
- pytest integration
- Regression prevention
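Backtesting means asking "what would this policy have done to past runs?" before enabling it. The following sketch illustrates the idea with plain dictionaries standing in for recorded traces. The `backtest` function, `BacktestResult`, and the trace layout are hypothetical and do not reflect the real PolicyTestHarness interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class BacktestResult:
    total: int
    blocked_trace_ids: List[str]

    @property
    def block_rate(self) -> float:
        return len(self.blocked_trace_ids) / self.total if self.total else 0.0


def backtest(policy: Callable[[Dict], bool], traces: List[Dict]) -> BacktestResult:
    """Replay `policy` over recorded traces; True means it would have blocked."""
    blocked = [
        trace["trace_id"]
        for trace in traces
        if any(policy(span) for span in trace["spans"])
    ]
    return BacktestResult(total=len(traces), blocked_trace_ids=blocked)


def block_expensive_calls(span: Dict) -> bool:
    # Candidate policy under test: block any single call costing over $1.
    return span.get("cost_usd", 0.0) > 1.0


# Hand-written stand-ins for traces loaded from the trace database.
historical = [
    {"trace_id": "t1", "spans": [{"name": "llm", "cost_usd": 0.10}]},
    {"trace_id": "t2", "spans": [{"name": "llm", "cost_usd": 2.50}]},
    {"trace_id": "t3", "spans": [{"name": "tool"}, {"name": "llm", "cost_usd": 0.40}]},
    {"trace_id": "t4", "spans": [{"name": "llm", "cost_usd": 1.75}]},
]

result = backtest(block_expensive_calls, historical)


def test_policy_does_not_block_most_of_history():
    # A pytest-style behavioral assertion on the policy's impact.
    assert result.block_rate <= 0.5
```

The assertion targets impact (how many past runs change), which is what turns a policy rollout into a regression test.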
4. Time-Travel Debugging¶
Location: clearstone/debugging/
Checkpoint and replay agent execution from any point in history.
Key Components:
- CheckpointManager: Creates, saves, and loads agent snapshots
- ReplayEngine: Restores agent state and enables debugging
- DeterministicExecutionContext: Mocks non-deterministic functions
Design Characteristics:
- Complete state preservation
- Deterministic replay
- Interactive debugging sessions
- Upstream span tracking
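Deterministic replay depends on feeding recorded values back into functions that would otherwise vary between runs. Below is a minimal sketch of that technique using the standard library's `unittest.mock`. The `replay` helper and the checkpoint layout are assumptions for illustration, not the DeterministicExecutionContext implementation.

```python
import random
import time
from contextlib import contextmanager
from unittest import mock


def agent_step() -> dict:
    """A step whose output depends on the clock and on randomness."""
    return {"ts": time.time(), "choice": random.random()}


@contextmanager
def replay(recorded_time, recorded_random):
    """Serve recorded values in place of time.time and random.random."""
    with mock.patch("time.time", side_effect=list(recorded_time)), \
         mock.patch("random.random", side_effect=list(recorded_random)):
        yield


# Hypothetical checkpoint: values captured during the original run.
checkpoint = {"time": [1700000000.0], "random": [0.25]}

with replay(checkpoint["time"], checkpoint["random"]):
    first = agent_step()

with replay(checkpoint["time"], checkpoint["random"]):
    second = agent_step()
```

Both replays produce identical output, so a bug seen in production can be stepped through repeatedly under a debugger. Each recorded list must contain one value per call made during the replayed step.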
Cross-Cutting Concerns¶
Serialization¶
Location: clearstone/serialization/
Hybrid JSON-first approach with pickle fallback for complex objects.
Features:
- Automatic format selection based on serializability
- Size limits for safety
- Human-readable JSON when possible
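A JSON-first strategy with a pickle fallback can be sketched in a few lines. The envelope layout, the `serialize` and `deserialize` names, and the 1 MB limit below are assumptions for illustration; the SDK's actual format and limits may differ.

```python
import base64
import json
import pickle
from typing import Any

MAX_BYTES = 1_000_000  # assumed safety limit, not a documented value


def serialize(obj: Any) -> str:
    """Try human-readable JSON first; fall back to pickle for complex objects."""
    try:
        payload = json.dumps({"format": "json", "data": obj})
    except (TypeError, ValueError):
        # Not JSON-serializable (e.g. a set or a custom class): pickle it and
        # wrap the bytes in base64 so the envelope itself stays valid JSON.
        blob = base64.b64encode(pickle.dumps(obj)).decode("ascii")
        payload = json.dumps({"format": "pickle", "data": blob})
    if len(payload.encode("utf-8")) > MAX_BYTES:
        raise ValueError("serialized object exceeds size limit")
    return payload


def deserialize(payload: str) -> Any:
    envelope = json.loads(payload)
    if envelope["format"] == "json":
        return envelope["data"]
    # Only unpickle data from a trusted source: pickle can execute code.
    return pickle.loads(base64.b64decode(envelope["data"]))


plain = serialize({"tool": "search", "args": ["clearstone"]})
fallback = serialize({"visited", "pending"})  # a set is not valid JSON
```

Recording the chosen format in the envelope lets the reader pick the right decoder without guessing, and keeps most stored state inspectable with ordinary JSON tools.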
Storage¶
Location: clearstone/storage/
SQLite-based persistence with production-grade features.
Features:
- Write-Ahead Logging (WAL) for concurrency
- Asynchronous batching with SpanBuffer
- Thread-safe operations
- Efficient indexing and querying
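The combination of WAL mode and batched writes can be illustrated with the standard `sqlite3` module. This is a simplified sketch: `MiniSpanBuffer` and its table schema are hypothetical, and it flushes synchronously when a batch fills, whereas the SpanBuffer described above batches asynchronously.

```python
import os
import sqlite3
import tempfile
import threading
from typing import List, Optional, Tuple

# (span_id, trace_id, parent_span_id, name): an assumed, simplified schema.
Row = Tuple[str, str, Optional[str], str]


class MiniSpanBuffer:
    """Illustrative buffer: collects spans and writes them in batches."""

    def __init__(self, db_path: str, batch_size: int = 100) -> None:
        self._lock = threading.Lock()
        self._pending: List[Row] = []
        self._batch_size = batch_size
        self._conn = sqlite3.connect(db_path, check_same_thread=False)
        # WAL lets readers proceed while the single writer appends.
        self._conn.execute("PRAGMA journal_mode=WAL")
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS spans ("
            "span_id TEXT PRIMARY KEY, trace_id TEXT, "
            "parent_span_id TEXT, name TEXT)"
        )
        self._conn.execute(
            "CREATE INDEX IF NOT EXISTS idx_spans_trace ON spans(trace_id)"
        )
        self._conn.commit()

    def add(self, row: Row) -> None:
        with self._lock:
            self._pending.append(row)
            if len(self._pending) >= self._batch_size:
                self._flush_locked()

    def flush(self) -> None:
        with self._lock:
            self._flush_locked()

    def _flush_locked(self) -> None:
        if not self._pending:
            return
        with self._conn:  # one transaction per batch, committed on exit
            self._conn.executemany(
                "INSERT INTO spans VALUES (?, ?, ?, ?)", self._pending
            )
        self._pending.clear()

    def count(self) -> int:
        with self._lock:
            return self._conn.execute("SELECT COUNT(*) FROM spans").fetchone()[0]


db_path = os.path.join(tempfile.mkdtemp(), "traces.db")
buffer = MiniSpanBuffer(db_path, batch_size=2)
buffer.add(("s1", "t1", None, "agent_run"))
buffer.add(("s2", "t1", "s1", "tool_call"))  # batch is full, triggers a write
buffer.add(("s3", "t1", "s1", "llm_call"))
buffer.flush()  # write the remaining partial batch
```

Grouping inserts into one transaction per batch is what makes SQLite writes cheap; committing each span individually would force a disk sync per span.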
Integrations¶
Location: clearstone/integrations/
Framework-specific adapters that keep the core framework-agnostic.
Current:
- LangChain: PolicyCallbackHandler for automatic policy enforcement
Future:
- CrewAI integration
- AutoGen integration
- Custom framework support
Data Flow¶
Policy Enforcement Flow¶
User Action
↓
LangChain Callback
↓
PolicyCallbackHandler (integration)
↓
PolicyEngine.evaluate() (core)
↓
Policy Functions (sorted by priority)
↓
Decision (ALLOW/BLOCK/PAUSE/ALERT/REDACT)
↓
Action Taken or Exception Raised
Tracing Flow¶
Agent Operation
↓
tracer.span() context manager
↓
Span creation (in-memory)
↓
SpanBuffer (batching)
↓
TraceStore (SQLite with WAL)
↓
Queryable trace data
Testing Flow¶
Production Agent Run
↓
Traces captured to SQLite
↓
PolicyTestHarness.load_traces()
↓
PolicyTestHarness.simulate_policy()
↓
TestResult with impact analysis
↓
pytest assertions
Design Principles¶
Clearstone's architecture follows these key principles:
- Modularity: Each pillar can be used independently
- Framework Agnostic: Core SDK has zero AI framework dependencies
- Production Ready: Thread-safe, performant, fail-safe
- Developer Friendly: Standard Python patterns, strong typing
- Extensible: Easy to add new integrations and policies
See the Design Philosophy document for deeper insights into our architectural decisions.
Performance Characteristics¶
PolicyEngine¶
- Evaluation Time: < 1ms per policy (typical)
- Memory Overhead: Minimal (policies are pure functions)
- Concurrency: Thread-safe evaluation
Tracing¶
- Capture Overhead: < 1μs per span
- Write Throughput: 10,000+ spans/second with batching
- Storage Growth: ~1KB per span (depends on attributes)
- Query Performance: Indexed by trace_id, parent_span_id
Testing¶
- Simulation Speed: 1,000+ traces/second
- Memory Usage: Loads traces in batches (configurable)
Deployment Considerations¶
Database¶
- SQLite with WAL mode (concurrent reads, single writer)
- Recommended: Regular backups of trace database
- Optional: External storage for long-term archive
Scaling¶
- Vertical: Single process handles 10,000+ policy evaluations/second
- Horizontal: Each process has its own SQLite database
- Aggregation: Use external tools to aggregate traces from multiple processes
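As one possible approach to aggregation, per-process SQLite files can be merged into a single database with a short script. The `spans` table layout and the `merge_trace_dbs` helper below are assumptions made for this sketch. Check the actual TraceStore schema before adapting it.

```python
import os
import sqlite3
import tempfile
from typing import Iterable, List, Tuple

# Assumed, simplified schema; the real trace tables may differ.
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS spans ("
    "span_id TEXT PRIMARY KEY, trace_id TEXT, name TEXT)"
)


def write_process_db(path: str, rows: List[Tuple[str, str, str]]) -> None:
    """Simulate one worker process writing its own trace database."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    with conn:
        conn.executemany("INSERT INTO spans VALUES (?, ?, ?)", rows)
    conn.close()


def merge_trace_dbs(sources: Iterable[str], target: str) -> int:
    """Copy spans from every source database into `target`; return row count."""
    out = sqlite3.connect(target)
    out.execute(SCHEMA)
    for src in sources:
        conn = sqlite3.connect(src)
        rows = conn.execute("SELECT span_id, trace_id, name FROM spans").fetchall()
        conn.close()
        with out:
            # OR IGNORE makes the merge safe to re-run on the same sources.
            out.executemany("INSERT OR IGNORE INTO spans VALUES (?, ?, ?)", rows)
    total = out.execute("SELECT COUNT(*) FROM spans").fetchone()[0]
    out.close()
    return total


workdir = tempfile.mkdtemp()
worker_a = os.path.join(workdir, "worker_a.db")
worker_b = os.path.join(workdir, "worker_b.db")
write_process_db(worker_a, [("a1", "t1", "agent_run"), ("a2", "t1", "tool_call")])
write_process_db(worker_b, [("b1", "t2", "agent_run")])

merged_count = merge_trace_dbs([worker_a, worker_b], os.path.join(workdir, "all.db"))
```

Globally unique span IDs are what make this merge safe; if IDs could collide across processes, the primary key would need to include a process or host identifier.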
Monitoring¶
- Built-in PolicyMetrics for policy performance
- Built-in AuditTrail for compliance logging
- Export traces to external systems via TraceStore API
Next Steps¶
- Design Philosophy: Understand our architectural decisions
- Core Concepts: Learn the core pillars
- API Reference: Complete API documentation
- Contributing: Help improve the architecture