| Field | Value |
|---|---|
| Status | Active |
| Maintainer | Boxuan Li |
| Date | April 2026 |
| Changelog | v1.7 |
The Agent Trajectory Interchange Format (ATIF) is a standardized, JSON-based specification for logging the complete interaction history of autonomous LLM agents. ATIF is designed to unify the distinct data requirements of conversational logs, explicit action sequences (MiniSweAgent[^1]), and replayable data structures (OpenHands), ensuring collected data is immediately usable across debugging, visualization, Supervised Fine-Tuning (SFT), and Reinforcement Learning (RL) pipelines.
This format will serve as the standardized data logging methodology for the Harbor project.
For the purpose of ATIF, a trajectory is defined as a sequence of interactions between a user and an agent, including the agent's internal reasoning, actions, and observations. The trajectory captures the complete interaction history, including all user messages (initial and subsequent), agent responses, tool executions, and environment feedback. This design supports both single-turn tasks and multi-turn conversational interactions.
v1.7 (Current)
- Added `extra` field to `ToolCallSchema` for custom tool-call metadata
- Added `extra` field to `ObservationResultSchema` for custom observation metadata
- Added `subagent_trajectories` field to root `Trajectory` for single-file subagent embedding
- Added `trajectory_id` field to `Trajectory` as the per-document identifier for embedded-subagent resolution (avoids overloading `session_id` with document-level uniqueness). `trajectory_id` is REQUIRED on embedded subagents and MUST be unique within a parent's `subagent_trajectories` array.
- Relaxed `Trajectory.session_id` from Required to Optional and clarified its semantics: `session_id` is run-scoped (not document-scoped), MAY be shared across sibling subagents and continuation trajectories, and MAY be omitted on embedded subagents that inherit the parent's run identity. `session_id` is no longer overloaded as the canonical matching key for subagent references.
- Added `trajectory_id` field to `SubagentTrajectoryRef` as the canonical resolution key for embedded references. Made `session_id` optional on the ref and reclassified it as informational only: `session_id` is run-scoped and is NOT a valid resolution key (two sibling subagents MAY legitimately share a `session_id`). A ref MUST set at least one of `trajectory_id` (embedded form) or `trajectory_path` (file-ref form) so it is resolvable; `session_id` alone is insufficient. Breaking vs. v1.6: in v1.6 `session_id` was required on `SubagentTrajectoryRef` and served as the resolution key, so a ref of the shape `{"session_id": "..."}` (no `trajectory_path`) was valid; under v1.7 such a ref no longer validates. Producers MUST migrate by setting `trajectory_id` (and a corresponding `trajectory_id` on the embedded subagent in `subagent_trajectories`) for embedded refs, or `trajectory_path` for external-file refs.
- Added `llm_call_count` field to `StepObject` for multi-LLM-call step representation
- Added `context_management` convention for system steps that transform the agent's context window (see Section VII)
- Resolved no-LLM orchestration by defining `llm_call_count = 0` semantics on `source: "agent"` steps to signal deterministic dispatch
- Note: contributed by Bryan Bednarski and Anuradha Karuppiah from NVIDIA
v1.6
- Added multimodal content support for images in trajectories
- Added `ContentPartSchema` for representing mixed text/image content
- Added `ImageSourceSchema` for referencing image files stored alongside trajectories
- Extended `message` field in `StepObject` to accept either a string or array of `ContentPart` objects
- Extended `content` field in `ObservationResultSchema` to accept either a string or array of `ContentPart` objects
- Images are stored as separate files (e.g., in an `images/` subdirectory) and referenced by relative path
- Added `has_multimodal_content()` method to Trajectory model for checking if trajectory contains images
v1.5
- Added optional `tool_definitions` field to `AgentSchema` for storing tool/function definitions
- Enables proper tool call definitions for SFT training pipelines
v1.4
- Added optional `prompt_token_ids` field to `MetricsSchema` for storing prompt token IDs
- `prompt_token_ids` contains the actual tokens sent to the LLM in that turn, including previous chat history if applicable
v1.3
- Added optional `completion_token_ids` field to `MetricsSchema` for storing completion token IDs
- Token IDs enable accurate reinforcement learning training by avoiding retokenization drift
v1.2
- Extended `observation` field to support system steps for tracking system-initiated operations
- System steps can now include observations for events such as subagent delegation, context management, environment resets, or checkpoint creation
- Clarified that `prompt_tokens` includes ALL input tokens (both cached and non-cached), with `cached_tokens` tracking the subset that were cache hits
v1.1
- Added optional `extra` field at root level for custom metadata not covered by the core schema
v1.0
- Initial specification release
The root object stores global context and a flexible field for custom information:
| Field | Type | Status | Description |
|---|---|---|---|
| schema_version | String | Required | String defining ATIF compatibility (e.g., "ATIF-v1.7"). |
| session_id | String | Optional | Identifier for the agent run this trajectory belongs to. Run-scoped, not document-scoped: multiple Trajectory objects MAY share the same session_id when they represent the same logical run (e.g., a parent trajectory and its embedded subagents, or a trajectory and its continuation segments linked via continued_trajectory_ref). session_ids within a parent's subagent_trajectories array are therefore NOT required to be unique. Use trajectory_id when a per-trajectory-document unique identifier is required. Required in v1.6 and earlier; relaxed to Optional in v1.7. Producers SHOULD set this on root trajectories for run-level traceability, and MAY omit it on embedded subagents that inherit the parent's run identity. |
| trajectory_id | String | Optional | Canonical per-trajectory-document identifier, distinct from session_id. Unlike session_id (which is run-scoped and MAY be shared), trajectory_id uniquely identifies THIS trajectory object. Used to resolve SubagentTrajectoryRef entries against the root's subagent_trajectories array without overloading session_id's run-scoped semantics. Optional on standalone trajectories, but REQUIRED on any trajectory embedded in a parent's subagent_trajectories array. trajectory_ids within a single parent's subagent_trajectories array MUST be unique. Added in ATIF-v1.7. |
| agent | Object | Required | Object specifying the agent configuration (name, version, and optional custom fields). See AgentSchema below. |
| steps | Array | Required | Array of step objects representing the complete interaction history, including user messages, agent responses, tool calls, and observations. |
| notes | String | Optional | A string field for developers to include custom information, design notes, or explanations for format discrepancies. |
| final_metrics | Object | Optional | Summary metrics for the entire trajectory. See FinalMetricsSchema below. |
| continued_trajectory_ref | String | Optional | Reference to the continuation trajectory file if this trajectory is continued in another file. Enables agents to link trajectory segments when context management strategies (e.g., summarization) produce multiple trajectory files. |
| extra | Object | Optional | Object for custom root-level metadata not covered by the core schema. |
| subagent_trajectories | Array | Optional | Array of embedded subagent trajectories. Each element is a complete, independently-valid ATIF Trajectory object with its own schema_version, trajectory_id, agent, and step_id sequence starting at 1 — the same schema and validation rules as the parent trajectory. Enables single-file storage of multi-agent workflows: when a SubagentTrajectoryRef.trajectory_path is null, consumers resolve the reference by matching SubagentTrajectoryRef.trajectory_id against Trajectory.trajectory_id of entries in this array. Uniqueness rules: every embedded subagent MUST set trajectory_id, and trajectory_ids within this array MUST be unique. session_id, by contrast, is run-scoped and MAY collide across siblings (or match the parent's session_id) when all trajectories belong to the same logical run; embedded subagents MAY also omit session_id entirely to inherit the parent's run identity. When trajectory_path is set, the reference points at an external file instead; embedded and file-ref forms MAY be mixed within the same parent trajectory. |
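The root-level rules above (required fields, `trajectory_id` required on embedded subagents, unique `trajectory_id` values within `subagent_trajectories`) can be sketched as a small checker. This is a minimal illustration, not a reference validator: the function name `check_root` and the wording of the problem messages are assumptions made here, and the sketch checks only field presence, not field types.

```python
from typing import Any

REQUIRED_ROOT_FIELDS = ("schema_version", "agent", "steps")


def check_root(trajectory: dict[str, Any], embedded: bool = False) -> list[str]:
    """Return a list of problems found in a root (or embedded) trajectory."""
    problems = [
        f"missing required field: {name}"
        for name in REQUIRED_ROOT_FIELDS
        if name not in trajectory
    ]
    # trajectory_id is optional on standalone documents but REQUIRED when embedded.
    if embedded and not trajectory.get("trajectory_id"):
        problems.append("embedded subagent is missing trajectory_id")

    seen: set[str] = set()
    for index, child in enumerate(trajectory.get("subagent_trajectories") or []):
        # Each embedded subagent is itself a complete trajectory, so recurse.
        problems += [f"subagent[{index}]: {p}" for p in check_root(child, embedded=True)]
        child_id = child.get("trajectory_id")
        if child_id is not None:
            if child_id in seen:
                problems.append(
                    f"duplicate trajectory_id in subagent_trajectories: {child_id}"
                )
            seen.add(child_id)
    # session_id is deliberately not checked for uniqueness: it is run-scoped.
    return problems
```

An empty list means none of these particular rules were violated; it does not imply the document is fully valid ATIF.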
The required agent object identifies the agent system used for the trajectory. The name and version fields are required, while model_name, tool_definitions, and extra are optional.
| Field | Type | Status | Description |
|---|---|---|---|
| name | String | Required | The name of the agent system (e.g., "openhands", "claude-code", "mini-swe-agent"). |
| version | String | Required | The version identifier of the agent system (e.g., "1.0.0", "v2.3.1"). |
| model_name | String | Optional | Default LLM model used for this trajectory (e.g., "gemini-2.5-flash", "claude-3-5-sonnet"). Step-level model_name overrides this if specified. |
| tool_definitions | Array | Optional | Array of tool/function definitions available to the agent. Each element follows OpenAI's function calling schema with type and function fields containing the tool's signature and docs. |
| extra | Object | Optional | Object for custom agent configuration details not covered by the core schema (e.g., prompting strategy, custom parameters). |
The optional final_metrics object provides aggregate statistics for the entire trajectory. All of its fields are optional.
| Field | Type | Status | Description |
|---|---|---|---|
| total_prompt_tokens | Integer | Optional | Sum of all prompt tokens across all steps in the trajectory. |
| total_completion_tokens | Integer | Optional | Sum of all completion tokens across all steps in the trajectory. |
| total_cached_tokens | Integer | Optional | Sum of all cached tokens across all steps in the trajectory. |
| total_cost_usd | Float | Optional | Total monetary cost of the entire trajectory, in USD. |
| total_steps | Integer | Optional | Total number of steps. May differ from the length of the steps array, provided the discrepancy is explained in notes. |
| extra | Object | Optional | Object for custom aggregate metrics not covered by the core schema. |
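Since each total is defined as a sum over per-step values, a producer could derive `final_metrics` from the steps themselves. The sketch below assumes that missing per-step metrics count as zero and that `total_steps` equals the length of the `steps` array; the helper name `aggregate_final_metrics` is hypothetical.

```python
from typing import Any


def aggregate_final_metrics(steps: list[dict[str, Any]]) -> dict[str, Any]:
    """Sum per-step metrics into a final_metrics-shaped dictionary."""
    totals: dict[str, Any] = {
        "total_prompt_tokens": 0,
        "total_completion_tokens": 0,
        "total_cached_tokens": 0,
        "total_cost_usd": 0.0,
    }
    for step in steps:
        # User and system steps usually carry no metrics object at all.
        metrics = step.get("metrics") or {}
        totals["total_prompt_tokens"] += metrics.get("prompt_tokens") or 0
        totals["total_completion_tokens"] += metrics.get("completion_tokens") or 0
        totals["total_cached_tokens"] += metrics.get("cached_tokens") or 0
        totals["total_cost_usd"] += metrics.get("cost_usd") or 0.0
    totals["total_steps"] = len(steps)
    return totals
```

Treating absent values as zero is a design choice of this sketch; a stricter producer might instead omit a total when any contributing step did not report the underlying metric.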
The steps array contains all interaction turns. Each StepObject represents either a system prompt, a user message, or a complete agent turn (LLM inference, action execution, and observation receipt).
| Field | Type | Status | Description |
|---|---|---|---|
| step_id | Integer | Required | Ordinal index of the turn (starting from 1). |
| timestamp | String | Optional | ISO 8601 timestamp indicating when this step occurred (e.g., "2025-10-16T14:30:00Z"). |
| source | String | Required | The originator of this step. Must be one of: "system" (for system prompts), "user" (for user messages), or "agent" (for agent responses). |
| model_name | String | Optional | The specific LLM model used for this turn (e.g., gemini-2.5-flash). Only applicable when source is "agent". If omitted, the model can be inferred from the top-level agent configuration. |
| reasoning_effort | String \| Float | Optional | Qualitative or quantitative measure of effort (e.g., low, medium, or a float score) assigned to this step. Only applicable when source is "agent". |
| message | String \| Array | Required | The dialogue message. For text-only content, this is a string. For multimodal content (v1.6+), this can be an array of ContentPart objects. For system steps, this is the system prompt. For user steps, this is the user's prompt or instruction. For agent steps, this is the assistant's response. This field is required but can be an empty string. |
| reasoning_content | String | Optional | String field detailing the agent's explicit internal reasoning. Only applicable when source is "agent". |
| tool_calls | Array | Optional | An array of structured objects for the agent's action(s). A single LLM output may contain multiple tool calls. Only applicable when source is "agent". See ToolCallSchema below. |
| observation | Object | Optional | Environment feedback/result after actions or system events. For agent steps, this contains results from tool calls, non-tool actions, or subagent delegation. For system steps, this may contain results from system-initiated operations (e.g., subagent delegation, context management, environment reset, checkpoint creation). See ObservationSchema below. |
| metrics | Object | Optional | Object containing all LLM operational and confidence data for this step, including RL-specific fields (reward, log_probs) if applicable. Only applicable when source is "agent". See MetricsSchema below. |
| extra | Object | Optional | Object for custom step-level metadata not covered by the core schema. Applicable to all step types (system, user, and agent). |
| llm_call_count | Integer | Optional | Number of LLM inferences this step represents. When llm_call_count > 1, the metrics are aggregated across multiple LLM calls and per-call attribution is unavailable. When 1, the step represents exactly one inference. When 0 on a source: "agent" step, the step represents a deterministic (non-LLM) dispatch — a graph engine, rule-based pipeline, or eval harness that issued tool_calls without an LLM inference; metrics and reasoning_content MUST be absent on such steps, and SFT pipelines MUST filter them out. When null, the producer did not track this (backward-compatible default). Applicable to all step types. |
| is_copied_context | Boolean | Optional | Indicates this step was copied from a prior trajectory into the current trajectory for context purposes (e.g., steps retained across a summarization/compression boundary). When True, producers assert the step is not a new interaction and consumers MUST filter it out of supervised fine-tuning (SFT) training data. Absent or None means the step is a new interaction produced in the current trajectory scope. See normative usage below. Applicable to all step types. |
Normative usage of is_copied_context: Producers MUST set is_copied_context = True on all steps that are copied from a prior trajectory into the current trajectory for context purposes. This includes steps inserted during context compression events (e.g., after a summarization pass that replaces older steps with a compressed summary, the retained "prior context" steps that are copied into the new trajectory scope). Steps with is_copied_context = True MUST NOT be included in supervised fine-tuning (SFT) training data, as they represent previously-trained interactions whose contribution to the agent's behavior has already been captured. Consumers reading ATIF trajectories for SFT purposes MUST filter out steps where is_copied_context = True before constructing training examples. If is_copied_context is absent or None, the step is assumed to be a new interaction produced in the current trajectory scope.
One-LLM-per-step convention: Exporters SHOULD emit one ATIF step per LLM inference when the underlying framework provides per-call event boundaries (e.g., LLM_START/LLM_END). When an exporter cannot split calls (e.g., an opaque tool with multiple internal LLM calls), it MUST set llm_call_count to the actual count so consumers can detect aggregated metrics.
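The two consumer-side filtering rules stated above (drop steps with `is_copied_context = True`, and drop `source: "agent"` steps with `llm_call_count = 0`) can be sketched as follows. The function names are hypothetical, and the sketch assumes steps have already been parsed into plain dictionaries.

```python
from typing import Any


def sft_eligible_steps(steps: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return the steps that may be used to build SFT training examples."""
    kept = []
    for step in steps:
        # Copied context is not a new interaction and MUST be filtered out.
        if step.get("is_copied_context") is True:
            continue
        # Deterministic (non-LLM) dispatch carries no model output to learn from.
        if step.get("source") == "agent" and step.get("llm_call_count") == 0:
            continue
        kept.append(step)
    return kept


def has_aggregated_metrics(step: dict[str, Any]) -> bool:
    """True when the step's metrics span more than one LLM call."""
    count = step.get("llm_call_count")
    # None means the producer did not track the count (backward-compatible default).
    return count is not None and count > 1
```

Steps where `llm_call_count` is absent or null are kept, since the specification treats that as "not tracked" rather than as a deterministic dispatch.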
The optional tool_calls array contains structured objects representing function or tool invocations made by the agent. Each element follows this schema:
| Field | Type | Status | Description |
|---|---|---|---|
| tool_call_id | String | Required | Unique identifier for this specific tool call. Used to correlate with observation results via source_call_id. |
| function_name | String | Required | The name of the function or tool being invoked (e.g., "financial_search", "file_write", "web_search"). |
| arguments | Object | Required | Object containing the arguments passed to the function. Must be a valid JSON object, but can be empty ({}) if no arguments are needed. |
| extra | Object | Optional | Object for custom tool-call-level metadata not covered by the core schema (e.g., timeout, retry count, tool version). |
All fields within the optional metrics object are optional.
| Field | Type | Status | Description |
|---|---|---|---|
| prompt_tokens | Integer | Optional | Total input tokens sent to the model for this turn, including both cached and non-cached tokens. This represents the full size of the prompt (system prompt, history, tool definitions, etc.) that was processed by the model, regardless of whether some tokens were served from cache. |
| completion_tokens | Integer | Optional | Total tokens generated by the LLM response (including reasoning and tool calls). |
| cached_tokens | Integer | Optional | Subset of prompt_tokens that were cache hits from prompt caching (e.g., a prefix or history cache). This counts tokens that were reused from cache rather than reprocessed. This is included in the prompt_tokens count, not separate from it. |
| cost_usd | Float | Optional | Monetary cost of the API call based on current provider pricing for this step. |
| prompt_token_ids | Array | Optional | Array of integer token IDs for the prompt (input) tokens sent to the LLM in this step, including previous chat history if applicable. Enables accurate analysis and debugging of prompt tokenization. Length should match prompt_tokens count. |
| completion_token_ids | Array | Optional | Array of integer token IDs for the completion (response) tokens generated in this step. Enables accurate RL training by avoiding retokenization drift. Should align with logprobs array if both are present. Length should match completion_tokens count. |
| logprobs | Array | Optional | Array of log probabilities for each completion token.[^2] Should align with completion_token_ids array if both are present. Length should match completion_tokens count. |
| extra | Object | Optional | Object for provider-specific or experimental metrics not covered by the core schema (e.g., reasoning_tokens, cache_creation_input_tokens). |
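The length expectations in the table (token ID arrays matching their token counts, and `logprobs` aligning with `completion_token_ids`) lend themselves to a simple consistency check. The sketch below reports mismatches as warnings rather than raising, since the table phrases these as "should"; the function name `check_token_alignment` is an assumption made for illustration.

```python
from typing import Any

# (array field, count field it is expected to match)
LENGTH_PAIRS = (
    ("prompt_token_ids", "prompt_tokens"),
    ("completion_token_ids", "completion_tokens"),
    ("logprobs", "completion_tokens"),
)


def check_token_alignment(metrics: dict[str, Any]) -> list[str]:
    """Return warnings for token arrays whose lengths disagree with the counts."""
    warnings = []
    for array_name, count_name in LENGTH_PAIRS:
        values = metrics.get(array_name)
        count = metrics.get(count_name)
        # Only compare when both sides were actually recorded.
        if values is not None and count is not None and len(values) != count:
            warnings.append(
                f"{array_name} has {len(values)} entries but {count_name} is {count}"
            )
    ids = metrics.get("completion_token_ids")
    logprobs = metrics.get("logprobs")
    if ids is not None and logprobs is not None and len(ids) != len(logprobs):
        warnings.append("completion_token_ids and logprobs differ in length")
    return warnings
```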
Token Accounting and Cost Calculation:
ATIF defines prompt_tokens as the total count of all input tokens (both cached and non-cached), with cached_tokens tracking the subset that were cache hits. This provides a complete view of prompt size while also tracking cache efficiency.
To calculate the total cost for a step:
non_cached_prompt_tokens = prompt_tokens - cached_tokens
cost_usd = (non_cached_prompt_tokens × cost_per_input_token) +
(cached_tokens × cost_per_cached_token) +
(completion_tokens × cost_per_completion_token)
Similarly, for trajectory-level costs in final_metrics:
non_cached_total = total_prompt_tokens - total_cached_tokens
total_cost_usd = (non_cached_total × cost_per_input_token) +
(total_cached_tokens × cost_per_cached_token) +
(total_completion_tokens × cost_per_completion_token)
Where:
- `total_prompt_tokens` = sum of all `prompt_tokens` across steps (includes all input tokens, both cached and non-cached)
- `total_cached_tokens` = sum of all `cached_tokens` across steps (subset of `total_prompt_tokens`)
- `total_completion_tokens` = sum of all `completion_tokens` across steps
Note that ATIF does not record per-token pricing information because:
- Pricing can change over time, making historical trajectories inaccurate
- Most agent frameworks don't record pricing, requiring a lookup table for conversion
- Pricing varies by provider, tier, and region
The cost_usd and total_cost_usd fields store the calculated cost at the time of execution, providing a snapshot without coupling the format to specific pricing models.
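The step-level formula above can be sketched in Python as follows. Because ATIF stores no pricing, the `Pricing` class and its per-token rates are entirely hypothetical inputs that a consumer would have to supply from its own lookup table; the sketch also ignores provider-specific cost factors.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Pricing:
    """Hypothetical per-token prices in USD; not part of the ATIF schema."""

    input_token: float
    cached_token: float
    completion_token: float


def step_cost_usd(metrics: dict[str, Any], pricing: Pricing) -> float:
    """Apply the simplified cost formula to one step's metrics."""
    prompt = metrics.get("prompt_tokens") or 0
    cached = metrics.get("cached_tokens") or 0
    completion = metrics.get("completion_tokens") or 0
    # cached_tokens is a subset of prompt_tokens, so subtract it before pricing.
    non_cached = prompt - cached
    return (
        non_cached * pricing.input_token
        + cached * pricing.cached_token
        + completion * pricing.completion_token
    )
```

The same function applies to trajectory-level totals if the `final_metrics` values are passed under the per-step key names.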
Important Note on Additional Cost Factors:
Some LLM providers charge for additional factors beyond the standard token types. For example, Anthropic charges separately for cache_creation_input_tokens when new prompt cache entries are created. These additional cost factors should be recorded in the extra field within the metrics object.
When such additional cost factors exist, the simplified cost formula above may not accurately represent the actual cost. In these cases:
- Record the actual cost in the `cost_usd` field if available from the provider
- Store additional token metrics (e.g., `cache_creation_input_tokens`) in `metrics.extra`
- The presence of values in `metrics.extra` signals that the standard formula may be incomplete
This approach ensures that actual costs are preserved accurately while maintaining the flexibility to accommodate provider-specific pricing models.
The observation object records results from the environment or system events. For agent steps, results may stem from structured tool_calls, agent actions that don't use standard tool calling mechanisms, or subagent delegation. For system steps, observations may contain results from system-initiated operations such as subagent delegation, context management, environment resets, checkpoint creation, or other infrastructure-level events.
| Field | Type | Status | Description |
|---|---|---|---|
| results | Array | Required | Array of result objects, each containing feedback from a single tool call or action. |
Each element in the results array follows this schema:
| Field | Type | Status | Description |
|---|---|---|---|
| source_call_id | String | Optional | The tool_call_id from the tool_calls array in StepObject that this result corresponds to. If null or omitted, the result comes from an action that doesn't use the standard tool calling format (e.g., agent actions without tool calls or system-initiated operations). |
| content | String \| Array | Optional | The output or result from the tool execution or action. For text-only content, this is a string. For multimodal content (v1.6+), this can be an array of ContentPart objects. May be omitted when subagent_trajectory_ref is present. |
| subagent_trajectory_ref | Array | Optional | Array of references to delegated subagent trajectories. Each element follows SubagentTrajectoryRefSchema. Use a singleton array for a single subagent. |
| extra | Object | Optional | Object for custom observation-result-level metadata not covered by the core schema (e.g., confidence score, retrieval score, source document ID). |
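The correlation between `tool_calls` and observation results via `source_call_id` can be sketched as a simple join. The helper name `pair_calls_with_results` is hypothetical, and treating a non-null `source_call_id` that matches no call as "unpaired" is a choice made in this sketch rather than something the schema mandates.

```python
from typing import Any


def pair_calls_with_results(
    step: dict[str, Any],
) -> tuple[list[tuple[dict[str, Any], dict[str, Any]]], list[dict[str, Any]]]:
    """Join a step's tool calls to its observation results by call ID."""
    calls = {call["tool_call_id"]: call for call in step.get("tool_calls") or []}
    results = (step.get("observation") or {}).get("results") or []

    paired, unpaired = [], []
    for result in results:
        # A null or missing source_call_id means the result did not come
        # from a standard tool call (e.g., a system-initiated operation).
        call = calls.get(result.get("source_call_id"))
        if call is not None:
            paired.append((call, result))
        else:
            unpaired.append(result)
    return paired, unpaired
```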
Example:
{
"results": [
{
"source_call_id": "call_price_1",
"content": "GOOGL is currently trading at $185.35"
}
]
}

Example (observation result with custom metadata):
{
"source_call_id": "call_search_001",
"content": "NVIDIA announces new GPU architecture...",
"extra": {
"retrieval_score": 0.92,
"source_doc_id": "doc-4821"
}
}

For multimodal content, the message field in StepObject and the content field in ObservationResultSchema can contain an array of ContentPart objects instead of a plain string. Each ContentPart represents either text or an image.
| Field | Type | Status | Description |
|---|---|---|---|
| type | String | Required | The type of content. Must be one of: "text" or "image". |
| text | String | Conditional | The text content. Required when type is "text", must be omitted when type is "image". |
| source | Object | Conditional | The image source specification. Required when type is "image", must be omitted when type is "text". See ImageSourceSchema below. |
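The conditional rules in this table (`text` required only for text parts, `source` required only for image parts) can be sketched as a small checker, together with a helper that detects images in a `message` or `content` value. Both function names are hypothetical; in particular, `contains_image` is not the `has_multimodal_content()` method mentioned in the changelog, only an illustration of the same idea.

```python
from typing import Any, Union


def check_content_part(part: dict[str, Any]) -> list[str]:
    """Return problems with a single ContentPart object."""
    problems = []
    kind = part.get("type")
    if kind == "text":
        if not isinstance(part.get("text"), str):
            problems.append("text part requires a string 'text' field")
        if "source" in part:
            problems.append("text part must omit 'source'")
    elif kind == "image":
        if not isinstance(part.get("source"), dict):
            problems.append("image part requires an object 'source' field")
        if "text" in part:
            problems.append("image part must omit 'text'")
    else:
        problems.append(f"unknown content part type: {kind!r}")
    return problems


def contains_image(value: Union[str, list[dict[str, Any]], None]) -> bool:
    """True when a message/content value includes at least one image part."""
    if value is None or isinstance(value, str):
        return False  # plain strings are text-only by definition
    return any(part.get("type") == "image" for part in value)
```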
Example (multimodal message with text and image):
{
"message": [
{ "type": "text", "text": "What is in this image?" },
{ "type": "image", "source": { "media_type": "image/png", "path": "images/step_1_input.png" } }
]
}

Images are stored as separate files alongside the trajectory JSON file and referenced by path or URL. This avoids bloating the trajectory file with base64-encoded image data.
| Field | Type | Status | Description |
|---|---|---|---|
| media_type | String | Required | MIME type of the image. Must be one of: "image/jpeg", "image/png", "image/gif", or "image/webp". |
| path | String | Required | Location of the image. Can be a relative or absolute file path, or a URL. |
Image Storage Convention:
For local storage, images are typically stored in an images/ subdirectory relative to the trajectory file. For example, if the trajectory is at agent/trajectory.json, images would be stored at agent/images/ with relative paths like "images/screenshot.png".
Absolute file paths and URLs are also supported.
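The storage convention can be sketched as a resolver that turns an image source into a concrete location. The function name `resolve_image_location` is hypothetical, and recognizing only `http` and `https` as URL schemes is a simplifying assumption of this sketch; relative paths are resolved against the directory containing the trajectory file, as described above.

```python
from pathlib import Path
from typing import Any
from urllib.parse import urlparse

ALLOWED_MEDIA_TYPES = {"image/jpeg", "image/png", "image/gif", "image/webp"}


def resolve_image_location(source: dict[str, Any], trajectory_file: str) -> str:
    """Resolve an image source's path to a URL or filesystem path."""
    media_type = source.get("media_type")
    if media_type not in ALLOWED_MEDIA_TYPES:
        raise ValueError(f"unsupported media_type: {media_type!r}")

    location = source["path"]
    # Assumption: only http(s) locations are treated as URLs here.
    if urlparse(location).scheme in {"http", "https"}:
        return location

    path = Path(location)
    if not path.is_absolute():
        # Relative paths are interpreted relative to the trajectory file.
        path = Path(trajectory_file).parent / path
    return str(path)
```

With a trajectory at `agent/trajectory.json`, a source path of `images/screenshot.png` resolves to a file under `agent/images/`.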
For multi-agent systems or hierarchical agent architectures, an observation result may reference a complete subagent trajectory. This enables tracking of recursive or delegated agent workflows where a parent agent spawns subagents to handle specific subtasks.
| Field | Type | Status | Description |
|---|---|---|---|
| trajectory_id | String | Conditional | Canonical identifier of the delegated subagent trajectory, used to resolve embedded references. Matches Trajectory.trajectory_id of an entry in the parent's subagent_trajectories array. Added in ATIF-v1.7. At least one of trajectory_id or trajectory_path MUST be set so the ref is resolvable. |
| trajectory_path | String | Conditional | Location of the complete subagent trajectory as an external file (file path, S3 URL, database reference, etc.), used to resolve file-ref references. At least one of trajectory_id or trajectory_path MUST be set so the ref is resolvable. |
| session_id | String | Optional | Run identity of the delegated subagent trajectory. Informational only — recorded so consumers can correlate this ref back to the subagent's run for debug / search / display purposes. Run-scoped (see Trajectory.session_id) and therefore NOT a valid resolution key; consumers MUST NOT use session_id alone to resolve a ref. |
| extra | Object | Optional | Object for custom metadata about the subagent execution (e.g., summary, exit status, performance metrics). |
Resolution mechanisms. A SubagentTrajectoryRef has exactly two resolution mechanisms:
- Embedded form: `trajectory_id` matches the `Trajectory.trajectory_id` of an entry in the parent's `subagent_trajectories` array.
- File-ref form: `trajectory_path` points at an external trajectory file.
A ref MUST set at least one of these two fields. When both are set, consumers MAY choose either; typically trajectory_id is preferred when the embedded trajectory is available in-memory, and trajectory_path is used to stream or lazily load.
Why session_id is not a resolution mechanism. session_id is intentionally run-scoped (multiple Trajectory documents MAY share a session_id if they belong to the same logical run — parent + subagents, parent + continuations). Two sibling subagents that belong to the same run MAY legitimately share a session_id, so session_id cannot unambiguously identify which embedded trajectory a ref points at. trajectory_id was introduced in v1.7 to carry the document-level uniqueness guarantee that ref resolution requires, without overloading session_id's run-scoped semantics. session_id on a ref is therefore purely informational — a human- and tooling-friendly breadcrumb back to the subagent's run — not a matching key.
Pre-v1.7 back-compat. Pre-v1.7 had only one resolution mechanism: trajectory_path (external file reference). The subagent_trajectories array on the root Trajectory and the embedded form via trajectory_id are both new in v1.7. Every pre-v1.7 ref that is actually resolvable therefore sets trajectory_path, which satisfies v1.7's at-least-one-of-(trajectory_id, trajectory_path) requirement — such refs remain valid, and their session_id (which in pre-v1.7 was required on the ref) is simply retained as informational metadata.
When subagent_trajectory_ref is present, the content field may be omitted, as the complete trajectory provides full detail. Alternatively, content may contain a simplified summary for quick reference without loading the full subagent trajectory.
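The two resolution mechanisms can be sketched as a single lookup function. The name `resolve_subagent_ref` and the `load_external` callback are hypothetical: ATIF does not define how external files are loaded, so the caller supplies that behavior. When both fields are set, this sketch tries the embedded form first and falls back to the path, which is one of the choices the specification permits.

```python
from typing import Any, Callable


def resolve_subagent_ref(
    parent: dict[str, Any],
    ref: dict[str, Any],
    load_external: Callable[[str], dict[str, Any]],
) -> dict[str, Any]:
    """Resolve a SubagentTrajectoryRef to a trajectory object."""
    trajectory_id = ref.get("trajectory_id")
    trajectory_path = ref.get("trajectory_path")
    if trajectory_id is None and trajectory_path is None:
        # session_id is informational only and is never used for matching.
        raise ValueError("ref sets neither trajectory_id nor trajectory_path")

    if trajectory_id is not None:
        # Embedded form: match against the parent's subagent_trajectories.
        for child in parent.get("subagent_trajectories") or []:
            if child.get("trajectory_id") == trajectory_id:
                return child

    if trajectory_path is not None:
        # File-ref form: delegate loading to the caller-supplied function.
        return load_external(trajectory_path)

    raise LookupError(f"no embedded subagent with trajectory_id {trajectory_id!r}")
```

A consumer that prefers lazy loading could invert the order and consult `trajectory_path` first.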
This table summarizes how ATIF unifies the core requirements of existing agent platforms.
| Feature | MiniSweAgent Trajectory[^3] | OpenHands Trajectory | Gemini-CLI Trajectory | ATIF (Proposed Standard) |
|---|---|---|---|---|
| Primary Structure | JSON list of turn objects | JSON object for replay | Session-based message array | Root object containing sequential steps array |
| Agent Reasoning | Explicit discussion field[^1] | Implicit (within message content) | Implicit (in thoughts array) | Explicit reasoning_content string |
| Tool/Action Logging | Explicit command field (shell string)[^1] | Structured action object | Implicit (inferred from message) | Optional tool_calls array (allows multiple calls per step) |
| Environment Feedback | Message from role: user (command output)[^1] | Observation/result data | Implicit (inferred from response) | Dedicated ObservationSchema object (with flexible results array) |
| LLM Metrics | Token counts in response | Token counts in response | Token counts in response | Optional unified MetricsSchema object (Token counts, Cost, Logprobs, Perplexity) |
| RL Fields | None | None | None | Optional rl_experience (reward, log_probs)[^4] |
The following example illustrates a three-step task flow, where the user asks a question (step 1), the agent executes a search involving multiple tool calls (step 2), and then delivers a final conversational response (step 3).
Task: The user asks the agent to find the current price of a specific stock.
{
"schema_version": "ATIF-v1.5",
"session_id": "025B810F-B3A2-4C67-93C0-FE7A142A947A",
"agent": {
"name": "harbor-agent",
"version": "1.0.0",
"model_name": "gemini-2.5-flash",
"tool_definitions": [
{
"type": "function",
"function": {
"name": "financial_search",
"description": "Search for financial data for a given stock ticker",
"parameters": {
"type": "object",
"properties": {
"ticker": {
"type": "string",
"description": "Stock ticker symbol"
},
"metric": {
"type": "string",
"description": "The financial metric to retrieve (e.g., price, volume)"
}
},
"required": ["ticker", "metric"]
}
}
}
],
"extra": {}
},
"notes": "Initial test trajectory for financial data retrieval using a single-hop ReAct pattern, focusing on multi-tool execution in Step 2.",
"extra": {},
"final_metrics": {
"total_prompt_tokens": 1120,
"total_completion_tokens": 124,
"total_cached_tokens": 200,
"total_cost_usd": 0.00078,
"total_steps": 3,
"extra": {}
},
"steps": [
{
"step_id": 1,
"timestamp": "2025-10-11T10:30:00Z",
"source": "user",
"message": "What is the current trading price of Alphabet (GOOGL)?",
"extra": {}
},
{
"step_id": 2,
"timestamp": "2025-10-11T10:30:02Z",
"source": "agent",
"model_name": "gemini-2.5-flash",
"reasoning_effort": "medium",
"message": "I will search for the current trading price and volume for GOOGL.",
"reasoning_content": "The request requires two data points: the current stock price and the latest volume data. I will execute two simultaneous tool calls—one for price and one for volume—to retrieve this information in a single step.",
"tool_calls": [
{
"tool_call_id": "call_price_1",
"function_name": "financial_search",
"arguments": { "ticker": "GOOGL", "metric": "price" }
},
{
"tool_call_id": "call_volume_2",
"function_name": "financial_search",
"arguments": { "ticker": "GOOGL", "metric": "volume" }
}
],
"observation": {
"results": [
{
"source_call_id": "call_price_1",
"content": "GOOGL is currently trading at $185.35 (Close: 10/11/2025)"
},
{
"source_call_id": "call_volume_2",
"content": "GOOGL volume: 1.5M shares traded."
}
]
},
"metrics": {
"prompt_tokens": 520,
"completion_tokens": 80,
"cached_tokens": 200,
"cost_usd": 0.00045
}
},
{
"step_id": 3,
"timestamp": "2025-10-11T10:30:05Z",
"source": "agent",
"model_name": "gemini-2.5-flash",
"reasoning_effort": "low",
"message": "As of October 11, 2025, Alphabet (GOOGL) is trading at $185.35 with a volume of 1.5M shares traded.",
"reasoning_content": "The previous step retrieved all necessary data. I will now format this into a final conversational response for the user and terminate the task.",
"metrics": {
"prompt_tokens": 600,
"completion_tokens": 44,
"completion_token_ids": [
1722, 310, 5533, 1722, 13, 1640, 13, 1423, 13, 8425, 338, 313, 18672,
29, 338, 11302, 472, 395, 29896, 29945, 29945, 29889, 29941, 29945,
411, 263, 7977, 310, 29871, 29896, 29889, 29945, 29924, 29358, 3534,
287, 29889
],
"logprobs": [
-0.1, -0.05, -0.02, -0.01, -0.2, -0.15, -0.08, -0.03, -0.12, -0.06,
-0.04, -0.11, -0.07, -0.09, -0.13, -0.05, -0.02, -0.08, -0.14, -0.06,
-0.03, -0.1, -0.04, -0.07, -0.05, -0.09, -0.03, -0.11, -0.08, -0.06,
-0.12, -0.04, -0.07, -0.05, -0.1, -0.03, -0.08, -0.06, -0.11, -0.04,
-0.07, -0.05, -0.09, -0.02
],
"cost_usd": 0.00033,
"extra": {
"reasoning_tokens": 12
}
}
}
]
}

To illustrate the differences between existing agent trajectory formats and to provide concrete reference implementations, three example trajectories from different agent frameworks are available in the 0001-trajectory-format/ directory. All three trajectories execute the same simple task: "Create a file called hello.txt with 'Hello, world!' as the content."
- mini-swe-agent-trajectory.json: MiniSweAgent format
  - Demonstrates explicit THOUGHT sections and bash command execution
  - Shows the conversational pattern where environment feedback is delivered as user messages
  - Includes detailed token usage and caching information from Claude 3.5 Sonnet
  - 3-step trajectory: create file → verify content → complete task
- openhands_trajectory.json: OpenHands format
  - Structured as an event log with distinct action types (system, message, run, finish)
  - Includes extensive tool definitions and security risk assessments
  - Shows GPT-5 reasoning tokens (960 reasoning tokens in the response)
  - Demonstrates the task_tracker concept and rich observability metadata
- gemini-cli-trajectory.json: Gemini CLI format
  - Minimalist session-based format with message array
  - Includes embedded token metrics (input, output, cached, thoughts)
  - Single-step completion with Gemini 2.0 Flash
  - No explicit tool calls or structured actions
These real-world trajectories highlight the diversity of current approaches:
- Reasoning representation: MiniSweAgent uses explicit THOUGHT fields, OpenHands embeds reasoning in tool calls, and Gemini CLI tracks it only in token metrics
- Action logging: Varies from bash strings (MiniSweAgent) to structured function calls (OpenHands) to implicit actions (Gemini CLI)
- Observability: OpenHands provides the richest metadata, with tool call IDs and security assessments, while Gemini CLI is the most minimal
- Step granularity: MiniSweAgent creates multiple steps for verification, while Gemini CLI completes the task in one step
ATIF aims to provide a unified format that can accommodate all these patterns while maintaining interoperability.
The Harbor project provides a reference implementation of ATIF in Python using Pydantic models for validation and type safety.
The complete ATIF schema is implemented as Pydantic models in the src/harbor/models/trajectories/ directory:
- agent.py: Agent configuration model
- content.py: Multimodal content models (ContentPart, ImageSource) for v1.6+
- final_metrics.py: Aggregate trajectory metrics model
- metrics.py: Per-step LLM metrics model
- observation.py: Observation container model
- observation_result.py: Individual observation result model
- step.py: Step model with validators for timestamps and agent-only fields
- subagent_trajectory_ref.py: Subagent trajectory reference model
- tool_call.py: Tool call model
- trajectory.py: Root trajectory model with validators for step IDs, tool call references, and multimodal content flags
These models provide:
- Type safety: All fields are strongly typed using Python type hints
- Validation: Automatic validation of required fields, types, and constraints
- Custom validators: ISO 8601 timestamp validation, sequential step ID validation, and tool call reference validation
- JSON serialization: .to_json_dict() method for clean JSON export with optional None exclusion
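
The three custom validators listed above can be approximated without Pydantic. The sketch below is a stdlib-only illustration, not Harbor's implementation: `check_trajectory` is a hypothetical helper, and it assumes that step IDs start at 1 (as in the example trajectory) and that a `source_call_id` must match a `tool_call_id` in the same step.

```python
from __future__ import annotations

from datetime import datetime
from typing import Any


def check_trajectory(trajectory: dict[str, Any]) -> list[str]:
    """Return every problem found, each prefixed with a field path."""
    errors: list[str] = []
    for index, step in enumerate(trajectory.get("steps", [])):
        path = f"steps[{index}]"

        # Assumption: step IDs run 1, 2, 3, ... in document order.
        expected_id = index + 1
        if step.get("step_id") != expected_id:
            errors.append(
                f"{path}.step_id: expected {expected_id}, got {step.get('step_id')!r}"
            )

        # Timestamps, when present, should parse as ISO 8601.
        timestamp = step.get("timestamp")
        if timestamp is not None:
            try:
                # Older Python versions reject a trailing "Z", so normalise it.
                datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
            except (AttributeError, ValueError):
                errors.append(f"{path}.timestamp: not ISO 8601: {timestamp!r}")

        # A result that names a call must point at a tool call in this step.
        call_ids = {call.get("tool_call_id") for call in step.get("tool_calls") or []}
        results = (step.get("observation") or {}).get("results") or []
        for r_index, result in enumerate(results):
            source = result.get("source_call_id")
            if source is not None and source not in call_ids:
                errors.append(
                    f"{path}.observation.results[{r_index}].source_call_id: "
                    f"unknown call {source!r}"
                )
    return errors
```

Returning a list rather than raising on the first failure lets a caller report every problem in one pass.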
The src/harbor/utils/trajectory_validator.py module provides a command-line tool and programmatic API for validating ATIF trajectory files:
# Validate a trajectory file
python -m harbor.utils.trajectory_validator trajectory.json

The validator:
- Accepts trajectory data as a dict, JSON string, or file path
- Validates against the complete ATIF schema using Pydantic models
- Collects all validation errors before returning (not just the first error)
- Provides user-friendly error messages with field paths
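
Accepting a dict, a JSON string, or a file path comes down to normalizing the input before validation. A minimal sketch follows; `load_trajectory` is a hypothetical name, and treating any string whose first non-space character is `{` as inline JSON is an assumption made for illustration, not necessarily how the Harbor validator tells the two apart.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def load_trajectory(data: dict[str, Any] | str | Path) -> dict[str, Any]:
    """Normalize the three accepted input forms into a plain dict."""
    if isinstance(data, dict):
        # Already parsed: hand it back unchanged.
        return data
    if isinstance(data, Path):
        return json.loads(data.read_text(encoding="utf-8"))
    if data.lstrip().startswith("{"):
        # Assumption: a string that opens with "{" is inline JSON.
        return json.loads(data)
    # Anything else is treated as a path to a JSON file.
    return json.loads(Path(data).read_text(encoding="utf-8"))
```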
Programmatic usage:
from harbor.utils.trajectory_validator import TrajectoryValidator

validator = TrajectoryValidator()
is_valid = validator.validate("trajectory.json")
if not is_valid:
    for error in validator.get_errors():
        print(f"Error: {error}")

The Terminus 2 agent (src/harbor/agents/terminus_2/terminus_2.py) demonstrates full integration of ATIF:
- Constructs trajectory steps using Pydantic models throughout the execution
- Tracks token metrics, costs, and logprobs in the Metrics model
- Records subagent trajectories with SubagentTrajectoryRef for context summarization
- Exports complete trajectories to JSON using the Trajectory.to_json_dict() method
- Automatically validates trajectory structure through Pydantic's runtime validation
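
The metric bookkeeping in such an integration can be illustrated with plain dicts. The sketch below is not taken from Terminus 2: `summarize_metrics` is a hypothetical helper that sums the per-step `metrics` fields into the `final_metrics` field names used by the example trajectory, and it assumes that steps without metrics (such as user steps) contribute zero.

```python
from __future__ import annotations

from typing import Any


def summarize_metrics(steps: list[dict[str, Any]]) -> dict[str, Any]:
    """Sum per-step metrics into a final_metrics-style dict."""
    totals: dict[str, Any] = {
        "total_prompt_tokens": 0,
        "total_completion_tokens": 0,
        "total_cached_tokens": 0,
        "total_cost_usd": 0.0,
        "total_steps": len(steps),
    }
    for step in steps:
        # Assumption: a missing metrics object or field counts as zero.
        metrics = step.get("metrics") or {}
        totals["total_prompt_tokens"] += metrics.get("prompt_tokens") or 0
        totals["total_completion_tokens"] += metrics.get("completion_tokens") or 0
        totals["total_cached_tokens"] += metrics.get("cached_tokens") or 0
        totals["total_cost_usd"] += metrics.get("cost_usd") or 0.0
    return totals


# Per-step numbers copied from the three-step example trajectory.
example_steps = [
    {"step_id": 1, "source": "user"},
    {"step_id": 2, "source": "agent", "metrics": {
        "prompt_tokens": 520, "completion_tokens": 80,
        "cached_tokens": 200, "cost_usd": 0.00045}},
    {"step_id": 3, "source": "agent", "metrics": {
        "prompt_tokens": 600, "completion_tokens": 44, "cost_usd": 0.00033}},
]
example_totals = summarize_metrics(example_steps)
print(example_totals["total_prompt_tokens"])      # 1120
print(example_totals["total_completion_tokens"])  # 124
```

The cost total is a float sum, so comparisons against a stored `total_cost_usd` are better made with a tolerance than with exact equality.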
System steps that transform the agent's context window (e.g., mid-trajectory compaction, context pruning, knowledge injection) may declare their transformation semantics using a context_management object in step.extra. This convention enables consumers to determine context boundaries without relying on producer-specific heuristics.
Convention fields:
| Field | Type | Description |
|---|---|---|
| type | String | The kind of context transformation. Values: "compaction" (prior context compressed into summary), "pruning" (older turns removed), "injection" (external knowledge added to context). Extensible by producers. |
| boundary | String | How the transformation affects the agent's context for subsequent steps. Values: "replace" (observation content replaces all prior context), "append" (observation content added to existing context), "truncate" (prior context trimmed). Extensible. |
Normative context boundary rule:
When a system step has extra.context_management.boundary = "replace", the agent's effective context window for all subsequent steps consists of: (1) the observation content from the boundary step (observation.results[].content), and (2) any new turns (user, agent, or system) after the boundary step. Steps preceding the boundary are preserved in the trajectory for auditability but are NOT part of the agent's context window for post-boundary steps. Evaluation tools reconstructing the agent's input context for any post-boundary step MUST use the boundary's observation content, not the pre-boundary steps.
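
The boundary rule above can be sketched as a small reconstruction routine. This is an illustration under stated assumptions rather than a reference implementation: `is_replace_boundary` and `effective_context` are hypothetical names, each turn is reduced to its string `message` (tool calls, observations on ordinary steps, and multimodal content are ignored), only the "replace" value is handled, and when several replace boundaries precede the target step the most recent one is assumed to win.

```python
from __future__ import annotations

from typing import Any


def is_replace_boundary(step: dict[str, Any]) -> bool:
    """True for a system step whose extra.context_management.boundary is "replace"."""
    management = (step.get("extra") or {}).get("context_management") or {}
    return step.get("source") == "system" and management.get("boundary") == "replace"


def effective_context(steps: list[dict[str, Any]], target_step_id: int) -> list[str]:
    """Return the text visible to the agent when it produces `target_step_id`."""
    prior = [step for step in steps if step["step_id"] < target_step_id]

    # Locate the most recent replace boundary before the target step, if any.
    boundary_index = None
    for index, step in enumerate(prior):
        if is_replace_boundary(step):
            boundary_index = index

    if boundary_index is None:
        # No boundary yet: every earlier turn is still in context.
        return [step["message"] for step in prior if step.get("message")]

    # Pre-boundary steps are dropped; the boundary's observation content
    # stands in for them.
    results = (prior[boundary_index].get("observation") or {}).get("results") or []
    context = [result["content"] for result in results if result.get("content")]
    # Turns after the boundary are appended in order.
    context += [
        step["message"] for step in prior[boundary_index + 1:] if step.get("message")
    ]
    return context
```

Note that the boundary step's own `message` is left out: under the rule, only its observation content and the turns that follow it make up the post-boundary context.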
Example (context compaction with boundary):
{
"step_id": 5,
"source": "system",
"message": "Context compaction performed",
"observation": {
"results": [
{
"content": "Summary: prior conversation covered topic X...",
"subagent_trajectory_ref": [
{ "trajectory_id": "compact-001", "trajectory_path": null }
]
}
]
},
"extra": {
"context_management": {
"type": "compaction",
"boundary": "replace"
}
}
}

Footnotes
1. nebius/SWE-agent-trajectories · Datasets at Hugging Face, accessed October 10, 2025, https://huggingface.co/datasets/nebius/SWE-agent-trajectories
2. 7 Key LLM Metrics to Enhance AI Reliability | Galileo, accessed October 10, 2025, https://galileo.ai/blog/llm-performance-metrics
3. Output files - SWE-agent documentation, accessed October 10, 2025, https://swe-agent.com/latest/usage/trajectories/
4. Towards a Unified View of Large Language Model Post-Training - arXiv, accessed October 10, 2025, https://arxiv.org/html/2509.04419v1