Testing API Reference

This page documents the complete API for Clearstone's AI-native testing framework.

PolicyTestHarness

The core testing tool for backtesting and behavioral assertions.

from clearstone.testing import PolicyTestHarness

harness = PolicyTestHarness("agent_traces.db")
traces = harness.load_traces(limit=100)

clearstone.testing.harness.PolicyTestHarness

A tool for backtesting new governance policies against a database of historical execution traces.

__del__()

Ensures the database connection is closed when the object is destroyed.

__init__(trace_db_path)

Initializes the harness with a path to a Clearstone trace database.

load_traces(limit=100)

Loads a set of historical traces from the database.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| limit | int | The maximum number of recent traces to load. | 100 |

simulate_policy(policy, traces)

Simulates the impact of a trace-level policy against a set of historical traces.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| policy | Callable[[Trace], Decision] | A policy function that takes a Trace and returns a Decision. | required |
| traces | List[Trace] | A list of Trace objects to test against. | required |

Returns:

| Type | Description |
| --- | --- |
| PolicyTestResult | A PolicyTestResult object with a full report of the simulation. |
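To make the shape of a trace-level policy concrete, the sketch below replays a policy function over a list of historical records. It is only an illustration of the idea: `FakeTrace`, `FakeDecision`, `replay`, and `cost_cap` are hypothetical stand-ins invented here, not Clearstone's `Trace`, `Decision`, or `simulate_policy`, and the field names are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FakeTrace:
    """Hypothetical stand-in for a historical trace (not Clearstone's Trace)."""
    trace_id: str
    total_cost: float


@dataclass
class FakeDecision:
    """Hypothetical stand-in for a policy decision (not Clearstone's Decision)."""
    blocked: bool
    reason: str = ""


def replay(policy: Callable[[FakeTrace], FakeDecision],
           traces: List[FakeTrace]) -> List[FakeDecision]:
    """Apply the policy to every historical trace and collect its decisions."""
    return [policy(trace) for trace in traces]


def cost_cap(trace: FakeTrace) -> FakeDecision:
    """Example policy: block any trace whose cost is strictly above 0.50."""
    if trace.total_cost > 0.50:
        return FakeDecision(blocked=True, reason="cost above 0.50")
    return FakeDecision(blocked=False)


history = [FakeTrace("a", 0.10), FakeTrace("b", 0.75), FakeTrace("c", 0.50)]
decisions = replay(cost_cap, history)

# Collect the ids of the traces the policy would have blocked.
blocked_ids = [t.trace_id for t, d in zip(history, decisions) if d.blocked]
print(blocked_ids)
```

The point of backtesting in this style is that the policy is an ordinary function, so it can be run against recorded history before it is enforced on live agents.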

simulate_span_policy(policy, traces)

Simulates the impact of a span-level policy against a set of historical traces.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| policy | Callable[[Span], Decision] | A policy function that takes a Span and returns a Decision. | required |
| traces | List[Trace] | A list of Trace objects to test against. | required |

Returns:

| Type | Description |
| --- | --- |
| PolicyTestResult | A PolicyTestResult object with a full report of the simulation. |

Behavioral Assertions

Pre-built assertion policies for validating agent behavior.

assert_tool_was_called

Verify a tool was called the expected number of times.

from clearstone.testing import assert_tool_was_called

policy = assert_tool_was_called("web_search", times=3)
result = harness.simulate_policy(policy, traces)

clearstone.testing.assertions.assert_tool_was_called(tool_name, times=None, reason=None)

Creates a policy that asserts a specific tool was called.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| tool_name | str | The name of the tool to check for. | required |
| times | int | If provided, asserts the tool was called exactly this many times. | None |
| reason | str | Custom failure message. | None |
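The effect of the optional `times` argument can be sketched with a plain list of tool names. This is a hedged illustration, not the library's implementation: `tool_call_check` is a hypothetical helper, and treating `times=None` as "called at least once" is an assumption drawn from the parameter descriptions.

```python
from typing import List, Optional


def tool_call_check(called_tools: List[str],
                    tool_name: str,
                    times: Optional[int] = None) -> bool:
    """Hypothetical helper mirroring the documented `times` semantics."""
    count = called_tools.count(tool_name)
    if times is None:
        # Assumption: without `times`, any number of calls >= 1 passes.
        return count >= 1
    # With `times`, the call count must match exactly.
    return count == times


calls = ["web_search", "calculator", "web_search", "web_search"]
print(tool_call_check(calls, "web_search"))           # called at least once
print(tool_call_check(calls, "web_search", times=3))  # exactly three calls
print(tool_call_check(calls, "web_search", times=2))  # wrong count
```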

assert_no_errors_in_trace

Validate that traces executed without errors.

from clearstone.testing import assert_no_errors_in_trace

policy = assert_no_errors_in_trace()
result = harness.simulate_policy(policy, traces)

clearstone.testing.assertions.assert_no_errors_in_trace(reason=None)

Creates a policy that asserts no spans in the trace have an ERROR status.

assert_llm_cost_is_less_than

Ensure the agent's total LLM cost for a trace stays below a budget threshold.

from clearstone.testing import assert_llm_cost_is_less_than

policy = assert_llm_cost_is_less_than(0.50)
result = harness.simulate_policy(policy, traces)

clearstone.testing.assertions.assert_llm_cost_is_less_than(max_cost, reason=None)

Creates a policy that asserts the total LLM cost of a trace is below a threshold.

assert_span_order

Validate that the workflow's spans occurred in the expected order.

from clearstone.testing import assert_span_order

policy = assert_span_order(["plan", "search", "synthesize"])
result = harness.simulate_policy(policy, traces)

clearstone.testing.assertions.assert_span_order(span_names, reason=None)

Creates a policy that asserts a specific sequence of spans occurred in order. Note: This is a simple subsequence check, not a strict adjacency check.
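The difference between a subsequence check and a strict adjacency check can be shown on plain lists of span names. The functions below are a minimal sketch written for this page; `is_subsequence` and `is_contiguous` are hypothetical names, not part of `clearstone.testing`.

```python
from typing import List


def is_subsequence(expected: List[str], observed: List[str]) -> bool:
    """True if `expected` appears in `observed` in order, with gaps allowed."""
    remaining = iter(observed)
    # `name in remaining` advances the iterator until it finds `name`,
    # so each later name must appear after the previous match.
    return all(name in remaining for name in expected)


def is_contiguous(expected: List[str], observed: List[str]) -> bool:
    """True only if `expected` appears in `observed` with no spans in between."""
    n = len(expected)
    return any(observed[i:i + n] == expected
               for i in range(len(observed) - n + 1))


observed = ["plan", "search", "rerank", "synthesize"]
expected = ["plan", "search", "synthesize"]

print(is_subsequence(expected, observed))  # True: "rerank" may sit in between
print(is_contiguous(expected, observed))   # False: the spans are not adjacent
```

Under the subsequence reading, extra spans between the named ones do not cause a failure; only a missing span or a reversed order does.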

Test Result Models

PolicyTestResult

Contains the results of a policy simulation.

result = harness.simulate_policy(policy, traces)

summary = result.summary()
print(f"Blocked: {summary['runs_blocked']}")
print(f"Block Rate: {summary['block_rate_percent']}")

clearstone.testing.harness.PolicyTestResult dataclass

Holds the results of a single policy backtest simulation.

summary()

Returns a dictionary summarizing the test results.
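As a rough picture of how the two summary keys used in the example above relate, the sketch below derives them from a list of per-trace outcomes. Only the key names `runs_blocked` and `block_rate_percent` come from this page; `summarize` is a hypothetical function, and the numeric format of the rate and the handling of an empty run are assumptions.

```python
from typing import Dict, List, Union


def summarize(blocked_flags: List[bool]) -> Dict[str, Union[int, float]]:
    """Hypothetical summary: count blocked runs and express them as a percentage."""
    total = len(blocked_flags)
    blocked = sum(blocked_flags)  # True counts as 1, False as 0
    # Assumption: report 0.0 when there are no traces, to avoid dividing by zero.
    rate = (blocked / total * 100.0) if total else 0.0
    return {"runs_blocked": blocked, "block_rate_percent": rate}


# Four simulated traces, two of which the policy would have blocked.
summary = summarize([True, False, False, True])
print(f"Blocked: {summary['runs_blocked']}")
print(f"Block Rate: {summary['block_rate_percent']}")
```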