Testing API Reference¶

This page documents the complete API for Clearstone's AI-native testing framework.

PolicyTestHarness¶

The core testing tool for backtesting and behavioral assertions.

from clearstone.testing import PolicyTestHarness

harness = PolicyTestHarness("agent_traces.db")
traces = harness.load_traces(limit=100)

`clearstone.testing.harness.PolicyTestHarness` ¶

A tool for backtesting new governance policies against a database of historical execution traces.

`del()` ¶

Ensure the database connection is closed when the object is destroyed.

`init(trace_db_path)` ¶

Initializes the harness with a path to a Clearstone trace database.

`load_traces(limit=100)` ¶

Loads a set of historical traces from the database.

Parameters:

Name	Type	Description	Default
`limit`	`int`	The maximum number of recent traces to load.	`100`

`simulate_policy(policy, traces)` ¶

Simulates the impact of a trace-level policy against a set of historical traces.

Parameters:

Name	Type	Description	Default
`policy`	`Callable[[Trace], Decision]`	A policy function that takes a Trace and returns a Decision.	required
`traces`	`List[Trace]`	A list of Trace objects to test against.	required

Returns:

Type	Description
`PolicyTestResult`	A PolicyTestResult object with a full report of the simulation.

`simulate_span_policy(policy, traces)` ¶

Simulates the impact of a span-level policy against a set of historical traces.

Parameters:

Name	Type	Description	Default
`policy`	`Callable[[Span], Decision]`	A policy function that takes a Span and returns a Decision.	required
`traces`	`List[Trace]`	A list of Trace objects to test against.	required

Returns:

Type	Description
`PolicyTestResult`	A PolicyTestResult object with a full report of the simulation.

Behavioral Assertions¶

Pre-built assertion policies for validating agent behavior.

assert_tool_was_called¶

Verify a tool was called the expected number of times.

from clearstone.testing import assert_tool_was_called

policy = assert_tool_was_called("web_search", times=3)
result = harness.simulate_policy(policy, traces)

`clearstone.testing.assertions.assert_tool_was_called(tool_name, times=None, reason=None)` ¶

Creates a policy that asserts a specific tool was called.

Parameters:

Name	Type	Description	Default
`tool_name`	`str`	The name of the tool to check for.	required
`times`	`int`	If provided, asserts the tool was called exactly this many times.	`None`
`reason`	`str`	Custom failure message.	`None`

assert_no_errors_in_trace¶

Validate that traces executed without errors.

from clearstone.testing import assert_no_errors_in_trace

policy = assert_no_errors_in_trace()
result = harness.simulate_policy(policy, traces)

`clearstone.testing.assertions.assert_no_errors_in_trace(reason=None)` ¶

Creates a policy that asserts no spans in the trace have an ERROR status.

assert_llm_cost_is_less_than¶

Ensure agent stays within budget.

from clearstone.testing import assert_llm_cost_is_less_than

policy = assert_llm_cost_is_less_than(0.50)
result = harness.simulate_policy(policy, traces)

`clearstone.testing.assertions.assert_llm_cost_is_less_than(max_cost, reason=None)` ¶

Creates a policy that asserts the total LLM cost of a trace is below a threshold.

assert_span_order¶

Validate workflow sequence is correct.

from clearstone.testing import assert_span_order

policy = assert_span_order(["plan", "search", "synthesize"])
result = harness.simulate_policy(policy, traces)

`clearstone.testing.assertions.assert_span_order(span_names, reason=None)` ¶

Creates a policy that asserts a specific sequence of spans occurred in order. Note: This is a simple subsequence check, not a strict adjacency check.

Test Result Models¶

TestResult¶

Contains the results of policy simulation.

result = harness.simulate_policy(policy, traces)

summary = result.summary()
print(f"Blocked: {summary['runs_blocked']}")
print(f"Block Rate: {summary['block_rate_percent']}")

`clearstone.testing.harness.PolicyTestResult` `dataclass` ¶

Holds the results of a single policy backtest simulation.

`summary()` ¶

Returns a dictionary summarizing the test results.

Testing API Reference¶

PolicyTestHarness¶

clearstone.testing.harness.PolicyTestHarness ¶

__del__() ¶

__init__(trace_db_path) ¶

load_traces(limit=100) ¶

simulate_policy(policy, traces) ¶

simulate_span_policy(policy, traces) ¶

Behavioral Assertions¶

assert_tool_was_called¶

clearstone.testing.assertions.assert_tool_was_called(tool_name, times=None, reason=None) ¶

assert_no_errors_in_trace¶

clearstone.testing.assertions.assert_no_errors_in_trace(reason=None) ¶

assert_llm_cost_is_less_than¶

clearstone.testing.assertions.assert_llm_cost_is_less_than(max_cost, reason=None) ¶

assert_span_order¶

clearstone.testing.assertions.assert_span_order(span_names, reason=None) ¶

Test Result Models¶

TestResult¶

clearstone.testing.harness.PolicyTestResult dataclass ¶

summary() ¶

`clearstone.testing.harness.PolicyTestHarness` ¶

`del()` ¶

`init(trace_db_path)` ¶

`load_traces(limit=100)` ¶

`simulate_policy(policy, traces)` ¶

`simulate_span_policy(policy, traces)` ¶

`clearstone.testing.assertions.assert_tool_was_called(tool_name, times=None, reason=None)` ¶

`clearstone.testing.assertions.assert_no_errors_in_trace(reason=None)` ¶

`clearstone.testing.assertions.assert_llm_cost_is_less_than(max_cost, reason=None)` ¶

`clearstone.testing.assertions.assert_span_order(span_names, reason=None)` ¶

`clearstone.testing.harness.PolicyTestResult` `dataclass` ¶

`summary()` ¶