Testing API Reference
This page documents the complete API for Clearstone's AI-native testing framework.
PolicyTestHarness
The core testing tool for backtesting and behavioral assertions.
from clearstone.testing import PolicyTestHarness
harness = PolicyTestHarness("agent_traces.db")
traces = harness.load_traces(limit=100)
clearstone.testing.harness.PolicyTestHarness
A tool for backtesting new governance policies against a database of historical execution traces.
__del__()
Ensure the database connection is closed when the object is destroyed.
__init__(trace_db_path)
Initializes the harness with a path to a Clearstone trace database.
load_traces(limit=100)
Loads a set of historical traces from the database.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| limit | int | The maximum number of recent traces to load. | 100 |
simulate_policy(policy, traces)
Simulates the impact of a trace-level policy against a set of historical traces.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| policy | Callable[[Trace], Decision] | A policy function that takes a Trace and returns a Decision. | required |
| traces | List[Trace] | A list of Trace objects to test against. | required |
Returns:
| Type | Description |
|---|---|
| PolicyTestResult | A PolicyTestResult object with a full report of the simulation. |
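To make the shape of a trace-level policy concrete, here is a minimal sketch. The `Trace` and `Decision` classes and the `run_policy` loop are simplified, hypothetical stand-ins written for this example; they are not Clearstone's actual types, and the real `PolicyTestResult` report may expose different fields.

```python
from dataclasses import dataclass, field
from typing import Callable, List


# Hypothetical stand-ins for illustration only; not Clearstone's real classes.
@dataclass
class Trace:
    trace_id: str
    span_names: List[str] = field(default_factory=list)


@dataclass
class Decision:
    allowed: bool
    reason: str = ""


def max_spans_policy(trace: Trace) -> Decision:
    """A trace-level policy: takes a whole Trace and returns one Decision."""
    if len(trace.span_names) > 3:
        return Decision(False, "too many spans")
    return Decision(True)


def run_policy(policy: Callable[[Trace], Decision], traces: List[Trace]) -> List[str]:
    """Apply the policy to every trace and collect the IDs it would block."""
    return [t.trace_id for t in traces if not policy(t).allowed]


sample_traces = [
    Trace("t1", ["plan", "search"]),
    Trace("t2", ["plan", "search", "search", "synthesize"]),
]
blocked = run_policy(max_spans_policy, sample_traces)
print(blocked)  # ['t2']
```

The point of the sketch is the contract: one call per trace, one decision per call, aggregated into a report over the whole historical set.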
simulate_span_policy(policy, traces)
Simulates the impact of a span-level policy against a set of historical traces.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| policy | Callable[[Span], Decision] | A policy function that takes a Span and returns a Decision. | required |
| traces | List[Trace] | A list of Trace objects to test against. | required |
Returns:
| Type | Description |
|---|---|
| PolicyTestResult | A PolicyTestResult object with a full report of the simulation. |
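The difference from a trace-level policy is the unit of evaluation: a span-level policy sees one span at a time. The sketch below illustrates this with hypothetical stand-in `Span`, `Trace`, and `Decision` classes and an assumed `run_span_policy` helper; how Clearstone actually iterates spans and aggregates decisions is not specified here.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


# Hypothetical stand-ins for illustration only; not Clearstone's real classes.
@dataclass
class Span:
    name: str


@dataclass
class Trace:
    trace_id: str
    spans: List[Span] = field(default_factory=list)


@dataclass
class Decision:
    allowed: bool
    reason: str = ""


def block_shell_spans(span: Span) -> Decision:
    """A span-level policy: it inspects a single Span, not the whole Trace."""
    if span.name == "shell_exec":
        return Decision(False, "shell access is not permitted")
    return Decision(True)


def run_span_policy(
    policy: Callable[[Span], Decision], traces: List[Trace]
) -> List[Tuple[str, str]]:
    """Evaluate the policy on every span; return (trace_id, span name) per block."""
    return [
        (t.trace_id, s.name)
        for t in traces
        for s in t.spans
        if not policy(s).allowed
    ]


history = [
    Trace("t1", [Span("plan"), Span("web_search")]),
    Trace("t2", [Span("plan"), Span("shell_exec"), Span("synthesize")]),
]
violations = run_span_policy(block_shell_spans, history)
print(violations)  # [('t2', 'shell_exec')]
```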
Behavioral Assertions
Pre-built assertion policies for validating agent behavior.
assert_tool_was_called
Verify that a tool was called, optionally an exact number of times.
from clearstone.testing import assert_tool_was_called
policy = assert_tool_was_called("web_search", times=3)
result = harness.simulate_policy(policy, traces)
clearstone.testing.assertions.assert_tool_was_called(tool_name, times=None, reason=None)
Creates a policy that asserts a specific tool was called.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tool_name | str | The name of the tool to check for. | required |
| times | int | If provided, asserts the tool was called exactly this many times. | None |
| reason | str | Custom failure message. | None |
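The assertion helpers are policy factories: calling one returns a policy function that is then run against traces. A rough sketch of that pattern follows. The `Trace` and `Decision` stand-ins and the name `make_tool_call_assertion` are hypothetical, and treating `times=None` as "called at least once" is an assumption drawn from the parameter description, not from the library's source.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


# Hypothetical stand-ins for illustration only; not Clearstone's real classes.
@dataclass
class Trace:
    tool_calls: List[str] = field(default_factory=list)


@dataclass
class Decision:
    passed: bool
    reason: str = ""


def make_tool_call_assertion(
    tool_name: str, times: Optional[int] = None, reason: Optional[str] = None
) -> Callable[[Trace], Decision]:
    """Return a policy that checks how often `tool_name` appears in a trace."""

    def policy(trace: Trace) -> Decision:
        count = trace.tool_calls.count(tool_name)
        # Assumption: with times=None, a single call is enough to pass.
        ok = count >= 1 if times is None else count == times
        if ok:
            return Decision(True)
        return Decision(False, reason or f"'{tool_name}' was called {count} time(s)")

    return policy


exactly_three = make_tool_call_assertion("web_search", times=3)
two_calls = Trace(["web_search", "calculator", "web_search"])
outcome = exactly_three(two_calls)
print(outcome.passed, outcome.reason)  # False 'web_search' was called 2 time(s)
```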
assert_no_errors_in_trace
Validate that traces executed without errors.
from clearstone.testing import assert_no_errors_in_trace
policy = assert_no_errors_in_trace()
result = harness.simulate_policy(policy, traces)
clearstone.testing.assertions.assert_no_errors_in_trace(reason=None)
Creates a policy that asserts no spans in the trace have an ERROR status.
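As a sketch of the check this describes, the example below models a trace as a plain list of `(span_name, status)` pairs. That representation, the `make_no_errors_assertion` name, and the `(passed, message)` return value are assumptions made for this example rather than Clearstone's actual data model.

```python
from typing import Callable, List, Optional, Tuple

# Assumption: a trace is modelled as a list of (span_name, status) pairs.
SpanRecord = Tuple[str, str]


def make_no_errors_assertion(
    reason: Optional[str] = None,
) -> Callable[[List[SpanRecord]], Tuple[bool, str]]:
    """Return a policy that fails if any span has status "ERROR"."""

    def policy(trace: List[SpanRecord]) -> Tuple[bool, str]:
        failed = [name for name, status in trace if status == "ERROR"]
        if failed:
            return False, reason or f"spans with ERROR status: {failed}"
        return True, ""

    return policy


check = make_no_errors_assertion()
clean = check([("plan", "OK"), ("search", "OK")])
broken = check([("plan", "OK"), ("search", "ERROR")])
print(clean)   # (True, '')
print(broken)  # (False, "spans with ERROR status: ['search']")
```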
assert_llm_cost_is_less_than
Ensure the agent stays within its LLM cost budget.
from clearstone.testing import assert_llm_cost_is_less_than
policy = assert_llm_cost_is_less_than(0.50)
result = harness.simulate_policy(policy, traces)
clearstone.testing.assertions.assert_llm_cost_is_less_than(max_cost, reason=None)
Creates a policy that asserts the total LLM cost of a trace is below a threshold.
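A budget check of this kind amounts to summing per-span costs and comparing the total against the threshold. The sketch below assumes each trace can be reduced to a list of per-span costs and uses a strict `<` comparison, following the function name; both choices, along with the `make_cost_assertion` name, are illustrative assumptions.

```python
from typing import Callable, List, Optional, Tuple


def make_cost_assertion(
    max_cost: float, reason: Optional[str] = None
) -> Callable[[List[float]], Tuple[bool, str]]:
    """Return a policy that passes only if the summed cost is strictly below max_cost."""

    def policy(span_costs: List[float]) -> Tuple[bool, str]:
        total = sum(span_costs)
        if total < max_cost:
            return True, ""
        return False, reason or f"total LLM cost {total:.2f} is not below {max_cost:.2f}"

    return policy


within_budget = make_cost_assertion(0.50)
cheap = within_budget([0.20, 0.25])
expensive = within_budget([0.30, 0.25])
print(cheap[0], expensive[0])  # True False
```

Under this strict comparison, a trace whose total equals the threshold exactly would fail the assertion.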
assert_span_order
Validate that workflow steps occur in the expected order.
from clearstone.testing import assert_span_order
policy = assert_span_order(["plan", "search", "synthesize"])
result = harness.simulate_policy(policy, traces)
clearstone.testing.assertions.assert_span_order(span_names, reason=None)
Creates a policy that asserts a specific sequence of spans occurred in order. Note: This is a subsequence check, not a strict adjacency check, so other spans may appear between the named ones.
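The distinction between a subsequence check and an adjacency check can be shown on plain lists of span names. The two helper functions below are written for this illustration and are not part of Clearstone; they only demonstrate the two semantics.

```python
from typing import List


def is_subsequence(expected: List[str], actual: List[str]) -> bool:
    """True if `expected` appears in `actual` in order, gaps allowed."""
    remaining = iter(actual)
    # `name in remaining` advances the iterator past the match, so each
    # expected name must be found after the previous one.
    return all(name in remaining for name in expected)


def is_adjacent_run(expected: List[str], actual: List[str]) -> bool:
    """True only if `expected` appears as one contiguous run in `actual`."""
    n = len(expected)
    return any(actual[i:i + n] == expected for i in range(len(actual) - n + 1))


observed = ["plan", "search", "rerank", "synthesize"]
wanted = ["plan", "search", "synthesize"]

print(is_subsequence(wanted, observed))   # True: "rerank" in between is ignored
print(is_adjacent_run(wanted, observed))  # False: the run is interrupted
print(is_subsequence(["search", "plan"], observed))  # False: wrong order
```

A trace with extra intermediate spans therefore still satisfies a subsequence-style assertion, while a reordered trace does not.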
Test Result Models
TestResult
Contains the results of a policy simulation.