0001-trajectory-format.md

RFC: Agent Trajectory Interchange Format (ATIF) Specification

Field	Value
Status	Active
Maintainer	Boxuan Li
Date	April 2026
Changelog	v1.7

I. Introduction

The Agent Trajectory Interchange Format (ATIF) is a standardized, JSON-based specification for logging the complete interaction history of autonomous LLM agents. ATIF is designed to unify the distinct data requirements of conversational logs, explicit action sequences (MiniSweAgent¹), and replayable data structures (OpenHands), ensuring collected data is immediately usable across debugging, visualization, Supervised Fine-Tuning (SFT), and Reinforcement Learning (RL) pipelines.

This format will serve as the standardized data logging methodology for the Harbor project.

Trajectory Definition

For the purpose of ATIF, a trajectory is defined as a sequence of interactions between a user and an agent, including the agent's internal reasoning, actions, and observations. The trajectory captures the complete interaction history, including all user messages (initial and subsequent), agent responses, tool executions, and environment feedback. This design supports both single-turn tasks and multi-turn conversational interactions.

Version History

v1.7 (Current)

Added extra field to ToolCallSchema for custom tool-call metadata
Added extra field to ObservationResultSchema for custom observation metadata
Added subagent_trajectories field to root Trajectory for single-file subagent embedding
Added trajectory_id field to Trajectory as the per-document identifier for embedded-subagent resolution (avoids overloading session_id with document-level uniqueness). trajectory_id is REQUIRED on embedded subagents and MUST be unique within a parent's subagent_trajectories array.
Relaxed Trajectory.session_id from Required to Optional and clarified its semantics: session_id is run-scoped (not document-scoped), MAY be shared across sibling subagents, continuation trajectories, or omitted on embedded subagents that inherit the parent's run identity. session_id is no longer overloaded as the canonical matching key for subagent references.
Added trajectory_id field to SubagentTrajectoryRef as the canonical resolution key for embedded references. Made session_id optional on the ref and reclassified it as informational only — session_id is run-scoped and is NOT a valid resolution key (two sibling subagents MAY legitimately share a session_id). A ref MUST set at least one of trajectory_id (embedded form) or trajectory_path (file-ref form) so it is resolvable; session_id alone is insufficient. Breaking vs. v1.6: in v1.6 session_id was required on SubagentTrajectoryRef and served as the resolution key, so a ref of the shape {"session_id": "..."} (no trajectory_path) was valid; under v1.7 such a ref no longer validates. Producers MUST migrate by setting trajectory_id (and a corresponding trajectory_id on the embedded subagent in subagent_trajectories) for embedded refs, or trajectory_path for external-file refs.
Added llm_call_count field to StepObject for multi-LLM-call step representation
Added context_management convention for system steps that transform the agent's context window (see Section VII)
Resolved no-LLM orchestration by defining llm_call_count = 0 semantics on source: "agent" steps to signal deterministic dispatch
Note: contributed by Bryan Bednarski and Anuradha Karuppiah from NVIDIA

v1.6

Added multimodal content support for images in trajectories
Added ContentPartSchema for representing mixed text/image content
Added ImageSourceSchema for referencing image files stored alongside trajectories
Extended message field in StepObject to accept either a string or array of ContentPart objects
Extended content field in ObservationResultSchema to accept either a string or array of ContentPart objects
Images are stored as separate files (e.g., in an images/ subdirectory) and referenced by relative path
Added has_multimodal_content() method to Trajectory model for checking if trajectory contains images

v1.5

Added optional tool_definitions field to AgentSchema for storing tool/function definitions
Enables proper tool call definitions for SFT training pipelines

v1.4

Added optional prompt_token_ids field to MetricsSchema for storing prompt token IDs
prompt_token_ids contains the actual tokens sent to the LLM in that turn, including previous chat history if applicable

v1.3

Added optional completion_token_ids field to MetricsSchema for storing completion token IDs
Token IDs enable accurate reinforcement learning training by avoiding retokenization drift

v1.2

Extended observation field to support system steps for tracking system-initiated operations
System steps can now include observations for events such as subagent delegation, context management, environment resets, or checkpoint creation
Clarified that prompt_tokens includes ALL input tokens (both cached and non-cached), with cached_tokens tracking the subset that were cache hits

v1.1

Added optional extra field at root level for custom metadata not covered by the core schema

v1.0

Initial specification release

II. Schema

Root-Level Metadata

The root object stores global context and a flexible field for custom information:

Field	Type	Status	Description
schema_version	String	Required	String defining ATIF compatibility (e.g., "ATIF-v1.7").
session_id	String	Optional	Identifier for the agent run this trajectory belongs to. Run-scoped, not document-scoped: multiple Trajectory objects MAY share the same `session_id` when they represent the same logical run (e.g., a parent trajectory and its embedded subagents, or a trajectory and its continuation segments linked via `continued_trajectory_ref`). `session_id`s within a parent's `subagent_trajectories` array are therefore NOT required to be unique. Use `trajectory_id` when a per-trajectory-document unique identifier is required. Required in v1.6 and earlier; relaxed to Optional in v1.7. Producers SHOULD set this on root trajectories for run-level traceability, and MAY omit it on embedded subagents that inherit the parent's run identity.
trajectory_id	String	Optional	Canonical per-trajectory-document identifier, distinct from `session_id`. Unlike `session_id` (which is run-scoped and MAY be shared), `trajectory_id` uniquely identifies THIS trajectory object. Used to resolve `SubagentTrajectoryRef` entries against the root's `subagent_trajectories` array without overloading `session_id`'s run-scoped semantics. Optional on standalone trajectories, but REQUIRED on any trajectory embedded in a parent's `subagent_trajectories` array. `trajectory_id`s within a single parent's `subagent_trajectories` array MUST be unique. Added in ATIF-v1.7.
agent	Object	Required	Object specifying the agent configuration (name, version, and optional custom fields). See AgentSchema below.
steps	Array	Required	Array of step objects representing the complete interaction history, including user messages, agent responses, tool calls, and observations.
notes	String	Optional	A string field for developers to include custom information, design notes, or explanations for format discrepancies.
final_metrics	Object	Optional	Summary metrics for the entire trajectory. See FinalMetricsSchema below.
continued_trajectory_ref	String	Optional	Reference to the continuation trajectory file if this trajectory is continued in another file. Enables agents to link trajectory segments when context management strategies (e.g., summarization) produce multiple trajectory files.
extra	Object	Optional	Object for custom root-level metadata not covered by the core schema.
subagent_trajectories	Array	Optional	Array of embedded subagent trajectories. Each element is a complete, independently-valid ATIF `Trajectory` object with its own `schema_version`, `trajectory_id`, `agent`, and `step_id` sequence starting at 1 — the same schema and validation rules as the parent trajectory. Enables single-file storage of multi-agent workflows: when a `SubagentTrajectoryRef.trajectory_path` is null, consumers resolve the reference by matching `SubagentTrajectoryRef.trajectory_id` against `Trajectory.trajectory_id` of entries in this array. Uniqueness rules: every embedded subagent MUST set `trajectory_id`, and `trajectory_id`s within this array MUST be unique. `session_id`, by contrast, is run-scoped and MAY collide across siblings (or match the parent's `session_id`) when all trajectories belong to the same logical run; embedded subagents MAY also omit `session_id` entirely to inherit the parent's run identity. When `trajectory_path` is set, the reference points at an external file instead; embedded and file-ref forms MAY be mixed within the same parent trajectory.

AgentSchema

The required agent object identifies the agent system used for the trajectory. The name and version fields are required, while model_name, tool_definitions, and extra are optional.

Field	Type	Status	Description
name	String	Required	The name of the agent system (e.g., "openhands", "claude-code", "mini-swe-agent").
version	String	Required	The version identifier of the agent system (e.g., "1.0.0", "v2.3.1").
model_name	String	Optional	Default LLM model used for this trajectory (e.g., "gemini-2.5-flash", "claude-3-5-sonnet"). Step-level model_name overrides this if specified.
tool_definitions	Array	Optional	Array of tool/function definitions available to the agent. Each element follows OpenAI's function calling schema with `type` and `function` fields containing the tool's signature and docs.
extra	Object	Optional	Object for custom agent configuration details not covered by the core schema (e.g., prompting strategy, custom parameters).

FinalMetricsSchema

The optional final_metrics object provides aggregate statistics for the entire trajectory. All fields within the optional final_metrics object are optional.

Field	Type	Status	Description
total_prompt_tokens	Integer	Optional	Sum of all prompt tokens across all steps in the trajectory.
total_completion_tokens	Integer	Optional	Sum of all completion tokens across all steps in the trajectory.
total_cached_tokens	Integer	Optional	Sum of all cached tokens across all steps in the trajectory.
total_cost_usd	Float	Optional	Total real monetary cost for the entire trajectory in USD unit.
total_steps	Integer	Optional	Total number of steps (can be unequal to length of steps array if explained in notes).
extra	Object	Optional	Object for custom aggregate metrics not covered by the core schema.

StepObject

The steps array contains all interaction turns. Each StepObject represents either a system prompt, a user message, or a complete agent turn (LLM inference, action execution, and observation receipt).

Field	Type	Status	Description
step_id	Integer	Required	Ordinal index of the turn (starting from 1).
timestamp	String	Optional	ISO 8601 timestamp indicating when this step occurred (e.g., "2025-10-16T14:30:00Z").
source	String	Required	The originator of this step. Must be one of: "system" (for system prompts), "user" (for user messages), or "agent" (for agent responses).
model_name	String	Optional	The specific LLM model used for this turn (e.g., gemini-2.5-flash). Only applicable when source is "agent". If omitted, the model can be inferred from the top-level agent configuration.
reasoning_effort	String \| Float	Optional	Qualitative or quantitative measure of effort (e.g., low, medium, or a float score) assigned to this step. Only applicable when source is "agent".
message	String \| Array	Required	The dialogue message. For text-only content, this is a string. For multimodal content (v1.6+), this can be an array of `ContentPart` objects. For system steps, this is the system prompt. For user steps, this is the user's prompt or instruction. For agent steps, this is the assistant's response. This field is required but can be an empty string.
reasoning_content	String	Optional	String field detailing the agent's explicit internal reasoning. Only applicable when source is "agent".
tool_calls	Array	Optional	An array of structured objects for the agent's action(s). A single LLM output may contain multiple tool calls. Only applicable when source is "agent". See ToolCallSchema below.
observation	Object	Optional	Environment feedback/result after actions or system events. For agent steps, this contains results from tool calls, non-tool actions, or subagent delegation. For system steps, this may contain results from system-initiated operations (e.g., subagent delegation, context management, environment reset, checkpoint creation). See ObservationSchema below.
metrics	Object	Optional	Object containing all LLM operational and confidence data for this step, including RL-specific fields (reward, logprobs) if applicable. Only applicable when source is "agent". See MetricsSchema* below.
extra	Object	Optional	Object for custom step-level metadata not covered by the core schema. Applicable to all step types (system, user, and agent).
llm_call_count	Integer	Optional	Number of LLM inferences this step represents. When `llm_call_count > 1`, the `metrics` are aggregated across multiple LLM calls and per-call attribution is unavailable. When `1`, the step represents exactly one inference. When `0` on a `source: "agent"` step, the step represents a deterministic (non-LLM) dispatch — a graph engine, rule-based pipeline, or eval harness that issued `tool_calls` without an LLM inference; `metrics` and `reasoning_content` MUST be absent on such steps, and SFT pipelines MUST filter them out. When null, the producer did not track this (backward-compatible default). Applicable to all step types.
is_copied_context	Boolean	Optional	Indicates this step was copied from a prior trajectory into the current trajectory for context purposes (e.g., steps retained across a summarization/compression boundary). When `True`, producers assert the step is not a new interaction and consumers MUST filter it out of supervised fine-tuning (SFT) training data. Absent or `None` means the step is a new interaction produced in the current trajectory scope. See normative usage below. Applicable to all step types.

Normative usage of is_copied_context: Producers MUST set is_copied_context = True on all steps that are copied from a prior trajectory into the current trajectory for context purposes. This includes steps inserted during context compression events (e.g., after a summarization pass that replaces older steps with a compressed summary, the retained "prior context" steps that are copied into the new trajectory scope). Steps with is_copied_context = True MUST NOT be included in supervised fine-tuning (SFT) training data, as they represent previously-trained interactions whose contribution to the agent's behavior has already been captured. Consumers reading ATIF trajectories for SFT purposes MUST filter out steps where is_copied_context = True before constructing training examples. If is_copied_context is absent or None, the step is assumed to be a new interaction produced in the current trajectory scope.

One-LLM-per-step convention: Exporters SHOULD emit one ATIF step per LLM inference when the underlying framework provides per-call event boundaries (e.g., LLM_START/LLM_END). When an exporter cannot split calls (e.g., an opaque tool with multiple internal LLM calls), it MUST set llm_call_count to the actual count so consumers can detect aggregated metrics.

ToolCallSchema

The optional tool_calls array contains structured objects representing function or tool invocations made by the agent. Each element follows this schema:

Field	Type	Status	Description
tool_call_id	String	Required	Unique identifier for this specific tool call. Used to correlate with observation results via `source_call_id`.
function_name	String	Required	The name of the function or tool being invoked (e.g., "financial_search", "file_write", "web_search").
arguments	Object	Required	Object containing the arguments passed to the function. Must be a valid JSON object, but can be empty (`{}`) if no arguments needed.
extra	Object	Optional	Object for custom tool-call-level metadata not covered by the core schema (e.g., timeout, retry count, tool version).

MetricsSchema

All fields within the optional metrics object are optional.

Field	Type	Status	Description
prompt_tokens	Integer	Optional	Total input tokens sent to the model for this turn, including both cached and non-cached tokens. This represents the full size of the prompt (system prompt, history, tool definitions, etc.) that was processed by the model, regardless of whether some tokens were served from cache.
completion_tokens	Integer	Optional	Total tokens generated by the LLM response (including reasoning and tool calls).
cached_tokens	Integer	Optional	Subset of `prompt_tokens` that were cache hits from prompt caching (e.g., a prefix or history cache). This counts tokens that were reused from cache rather than reprocessed. This is included in the `prompt_tokens` count, not separate from it.
cost_usd	Float	Optional	Monetary cost of the API call based on current provider pricing for this step.
prompt_token_ids	Array	Optional	Array of integer token IDs for the prompt (input) tokens sent to the LLM in this step, including previous chat history if applicable. Enables accurate analysis and debugging of prompt tokenization. Length should match `prompt_tokens` count.
completion_token_ids	Array	Optional	Array of integer token IDs for the completion (response) tokens generated in this step. Enables accurate RL training by avoiding retokenization drift. Should align with `logprobs` array if both are present. Length should match `completion_tokens` count.
logprobs	Array	Optional	Array of log probabilities for each completion token.² Should align with `completion_token_ids` array if both are present. Length should match `completion_tokens` count.
extra	Object	Optional	Object for provider-specific or experimental metrics not covered by the core schema (e.g., reasoning_tokens, cache_creation_input_tokens).

Token Accounting and Cost Calculation:

ATIF defines prompt_tokens as the total count of all input tokens (both cached and non-cached), with cached_tokens tracking the subset that were cache hits. This provides a complete view of prompt size while also tracking cache efficiency.

To calculate the total cost for a step:

non_cached_prompt_tokens = prompt_tokens - cached_tokens
cost_usd = (non_cached_prompt_tokens × cost_per_input_token) +
           (cached_tokens × cost_per_cached_token) +
           (completion_tokens × cost_per_completion_token)

Similarly, for trajectory-level costs in final_metrics:

non_cached_total = total_prompt_tokens - total_cached_tokens
total_cost_usd = (non_cached_total × cost_per_input_token) +
                 (total_cached_tokens × cost_per_cached_token) +
                 (total_completion_tokens × cost_per_completion_token)

Where:

total_prompt_tokens = sum of all prompt_tokens across steps (includes all input tokens, both cached and non-cached)
total_cached_tokens = sum of all cached_tokens across steps (subset of total_prompt_tokens)
total_completion_tokens = sum of all completion_tokens across steps

Note that ATIF does not record per-token pricing information because:

Pricing can change over time, making historical trajectories inaccurate
Most agent frameworks don't record pricing, requiring a lookup table for conversion
Pricing varies by provider, tier, and region

The cost_usd and total_cost_usd fields store the calculated cost at the time of execution, providing a snapshot without coupling the format to specific pricing models.

Important Note on Additional Cost Factors:

Some LLM providers charge for additional factors beyond the standard token types. For example, Anthropic charges separately for cache_creation_input_tokens when new prompt cache entries are created. These additional cost factors should be recorded in the extra field within the metrics object.

When such additional cost factors exist, the simplified cost formula above may not accurately represent the actual cost. In these cases:

Record the actual cost in the cost_usd field if available from the provider
Store additional token metrics (e.g., cache_creation_input_tokens) in metrics.extra
The presence of values in metrics.extra signals that the standard formula may be incomplete

This approach ensures that actual costs are preserved accurately while maintaining the flexibility to accommodate provider-specific pricing models.

ObservationSchema

The observation object records results from the environment or system events. For agent steps, results may stem from structured tool_calls, agent actions that don't use standard tool calling mechanisms, or subagent delegation. For system steps, observations may contain results from system-initiated operations such as subagent delegation, context management, environment resets, checkpoint creation, or other infrastructure-level events.

Field	Type	Status	Description
results	Array	Required	Array of result objects, each containing feedback from a single tool call or action.

ObservationResultSchema

Each element in the results array follows this schema:

Field	Type	Status	Description
source_call_id	String	Optional	The `tool_call_id` from the tool_calls array in StepObject that this result corresponds to. If null or omitted, the result comes from an action that doesn't use the standard tool calling format (e.g., agent actions without tool calls or system-initiated operations).
content	String \| Array	Optional	The output or result from the tool execution or action. For text-only content, this is a string. For multimodal content (v1.6+), this can be an array of `ContentPart` objects. May be omitted when `subagent_trajectory_ref` is present.
subagent_trajectory_ref	Array	Optional	Array of references to delegated subagent trajectories. Each element follows SubagentTrajectoryRefSchema. Use a singleton array for a single subagent.
extra	Object	Optional	Object for custom observation-result-level metadata not covered by the core schema (e.g., confidence score, retrieval score, source document ID).

Example:

{
  "results": [
    {
      "source_call_id": "call_price_1",
      "content": "GOOGL is currently trading at $185.35"
    }
  ]
}

Example (observation result with custom metadata):

{
  "source_call_id": "call_search_001",
  "content": "NVIDIA announces new GPU architecture...",
  "extra": {
    "retrieval_score": 0.92,
    "source_doc_id": "doc-4821"
  }
}

ContentPartSchema (v1.6+)

For multimodal content, the message field in StepObject and the content field in ObservationResultSchema can contain an array of ContentPart objects instead of a plain string. Each ContentPart represents either text or an image.

Field	Type	Status	Description
type	String	Required	The type of content. Must be one of: "text" or "image".
text	String	Conditional	The text content. Required when `type` is "text", must be omitted when `type` is "image".
source	Object	Conditional	The image source specification. Required when `type` is "image", must be omitted when `type` is "text". See ImageSourceSchema below.

Example (multimodal message with text and image):

{
  "message": [
    { "type": "text", "text": "What is in this image?" },
    { "type": "image", "source": { "media_type": "image/png", "path": "images/step_1_input.png" } }
  ]
}

ImageSourceSchema (v1.6+)

Images are stored as separate files alongside the trajectory JSON file and referenced by path or URL. This avoids bloating the trajectory file with base64-encoded image data.

Field	Type	Status	Description
media_type	String	Required	MIME type of the image. Must be one of: "image/jpeg", "image/png", "image/gif", or "image/webp".
path	String	Required	Location of the image. Can be a relative or absolute file path, or a URL.

Image Storage Convention:

For local storage, images are typically stored in an images/ subdirectory relative to the trajectory file. For example, if the trajectory is at agent/trajectory.json, images would be stored at agent/images/ with relative paths like "images/screenshot.png".

Absolute file paths and URLs are also supported.

SubagentTrajectoryRefSchema

For multi-agent systems or hierarchical agent architectures, an observation result may reference a complete subagent trajectory. This enables tracking of recursive or delegated agent workflows where a parent agent spawns subagents to handle specific subtasks.

Field	Type	Status	Description
trajectory_id	String	Conditional	Canonical identifier of the delegated subagent trajectory, used to resolve embedded references. Matches `Trajectory.trajectory_id` of an entry in the parent's `subagent_trajectories` array. Added in ATIF-v1.7. At least one of `trajectory_id` or `trajectory_path` MUST be set so the ref is resolvable.
trajectory_path	String	Conditional	Location of the complete subagent trajectory as an external file (file path, S3 URL, database reference, etc.), used to resolve file-ref references. At least one of `trajectory_id` or `trajectory_path` MUST be set so the ref is resolvable.
session_id	String	Optional	Run identity of the delegated subagent trajectory. Informational only — recorded so consumers can correlate this ref back to the subagent's run for debug / search / display purposes. Run-scoped (see `Trajectory.session_id`) and therefore NOT a valid resolution key; consumers MUST NOT use `session_id` alone to resolve a ref.
extra	Object	Optional	Object for custom metadata about the subagent execution (e.g., summary, exit status, performance metrics).

Resolution mechanisms. A SubagentTrajectoryRef has exactly two resolution mechanisms:

Embedded form — trajectory_id matches the Trajectory.trajectory_id of an entry in the parent's subagent_trajectories array.
File-ref form — trajectory_path points at an external trajectory file.

A ref MUST set at least one of these two fields. When both are set, consumers MAY choose either; typically trajectory_id is preferred when the embedded trajectory is available in-memory, and trajectory_path is used to stream or lazily load.

Why session_id is not a resolution mechanism. session_id is intentionally run-scoped (multiple Trajectory documents MAY share a session_id if they belong to the same logical run — parent + subagents, parent + continuations). Two sibling subagents that belong to the same run MAY legitimately share a session_id, so session_id cannot unambiguously identify which embedded trajectory a ref points at. trajectory_id was introduced in v1.7 to carry the document-level uniqueness guarantee that ref resolution requires, without overloading session_id's run-scoped semantics. session_id on a ref is therefore purely informational — a human- and tooling-friendly breadcrumb back to the subagent's run — not a matching key.

Pre-v1.7 back-compat. Pre-v1.7 had only one resolution mechanism: trajectory_path (external file reference). The subagent_trajectories array on the root Trajectory and the embedded form via trajectory_id are both new in v1.7. Every pre-v1.7 ref that is actually resolvable therefore sets trajectory_path, which satisfies v1.7's at-least-one-of-(trajectory_id, trajectory_path) requirement — such refs remain valid, and their session_id (which in pre-v1.7 was required on the ref) is simply retained as informational metadata.

When subagent_trajectory_ref is present, the content field may be omitted, as the complete trajectory provides full detail. Alternatively, content may contain a simplified summary for quick reference without loading the full subagent trajectory.

III. Agent Trajectory Format Comparative Schema

This table summarizes how ATIF unifies the core requirements of existing agent platforms.

Feature	MiniSweAgent Trajectory³	OpenHands Trajectory	Gemini-CLI Trajectory	ATIF (Proposed Standard)
Primary Structure	JSON list of turn objects	JSON object for replay	Session-based message array	Root object containing sequential steps array
Agent Reasoning	Explicit discussion field¹	Implicit (within message content)	Implicit (in thoughts array)	Explicit reasoning_content string
Tool/Action Logging	Explicit command field (shell string)¹	Structured action object	Implicit (inferred from message)	Optional tool_calls array (allows multiple calls per step)
Environment Feedback	Message from role: user (command output)¹	Observation/result data	Implicit (inferred from response)	Dedicated ObservationSchema object (with flexible results array)
LLM Metrics	Token counts in response	Token counts in response	Token counts in response	Optional unified MetricsSchema object (Token counts, Cost, Logprobs, Perplexity)
RL Fields	None	None	None	Optional rl_experience (reward, log_probs)⁴

IV. Example ATIF Trajectory Log (Multi-Step Task)

The following example illustrates a three-step task flow, where the user asks a question (step 1), the agent executes a search involving multiple tool calls (step 2), and then delivers a final conversational response (step 3).

Task: The user asks the agent to find the current price of a specific stock.

{
  "schema_version": "ATIF-v1.5",
  "session_id": "025B810F-B3A2-4C67-93C0-FE7A142A947A",
  "agent": {
    "name": "harbor-agent",
    "version": "1.0.0",
    "model_name": "gemini-2.5-flash",
    "tool_definitions": [
      {
        "type": "function",
        "function": {
          "name": "financial_search",
          "description": "Search for financial data for a given stock ticker",
          "parameters": {
            "type": "object",
            "properties": {
              "ticker": {
                "type": "string",
                "description": "Stock ticker symbol"
              },
              "metric": {
                "type": "string",
                "description": "The financial metric to retrieve (e.g., price, volume)"
              }
            },
            "required": ["ticker", "metric"]
          }
        }
      }
    ],
    "extra": {}
  },
  "notes": "Initial test trajectory for financial data retrieval using a single-hop ReAct pattern, focusing on multi-tool execution in Step 2.",
  "extra": {},
  "final_metrics": {
    "total_prompt_tokens": 1120,
    "total_completion_tokens": 124,
    "total_cached_tokens": 200,
    "total_cost_usd": 0.00078,
    "total_steps": 3,
    "extra": {}
  },
  "steps": [
    {
      "step_id": 1,
      "timestamp": "2025-10-11T10:30:00Z",
      "source": "user",
      "message": "What is the current trading price of Alphabet (GOOGL)?",
      "extra": {}
    },
    {
      "step_id": 2,
      "timestamp": "2025-10-11T10:30:02Z",
      "source": "agent",
      "model_name": "gemini-2.5-flash",
      "reasoning_effort": "medium",
      "message": "I will search for the current trading price and volume for GOOGL.",
      "reasoning_content": "The request requires two data points: the current stock price and the latest volume data. I will execute two simultaneous tool calls—one for price and one for volume—to retrieve this information in a single step.",
      "tool_calls": [
        {
          "tool_call_id": "call_price_1",
          "function_name": "financial_search",
          "arguments": { "ticker": "GOOGL", "metric": "price" }
        },
        {
          "tool_call_id": "call_volume_2",
          "function_name": "financial_search",
          "arguments": { "ticker": "GOOGL", "metric": "volume" }
        }
      ],
      "observation": {
        "results": [
          {
            "source_call_id": "call_price_1",
            "content": "GOOGL is currently trading at $185.35 (Close: 10/11/2025)"
          },
          {
            "source_call_id": "call_volume_2",
            "content": "GOOGL volume: 1.5M shares traded."
          }
        ]
      },
      "metrics": {
        "prompt_tokens": 520,
        "completion_tokens": 80,
        "cached_tokens": 200,
        "cost_usd": 0.00045
      }
    },
    {
      "step_id": 3,
      "timestamp": "2025-10-11T10:30:05Z",
      "source": "agent",
      "model_name": "gemini-2.5-flash",
      "reasoning_effort": "low",
      "message": "As of October 11, 2025, Alphabet (GOOGL) is trading at $185.35 with a volume of 1.5M shares traded.",
      "reasoning_content": "The previous step retrieved all necessary data. I will now format this into a final conversational response for the user and terminate the task.",
      "metrics": {
        "prompt_tokens": 600,
        "completion_tokens": 44,
        "completion_token_ids": [
          1722, 310, 5533, 1722, 13, 1640, 13, 1423, 13, 8425, 338, 313, 18672,
          29, 338, 11302, 472, 395, 29896, 29945, 29945, 29889, 29941, 29945,
          411, 263, 7977, 310, 29871, 29896, 29889, 29945, 29924, 29358, 3534,
          287, 29889
        ],
        "logprobs": [
          -0.1, -0.05, -0.02, -0.01, -0.2, -0.15, -0.08, -0.03, -0.12, -0.06,
          -0.04, -0.11, -0.07, -0.09, -0.13, -0.05, -0.02, -0.08, -0.14, -0.06,
          -0.03, -0.1, -0.04, -0.07, -0.05, -0.09, -0.03, -0.11, -0.08, -0.06,
          -0.12, -0.04, -0.07, -0.05, -0.1, -0.03, -0.08, -0.06, -0.11, -0.04,
          -0.07, -0.05, -0.09, -0.02
        ],
        "cost_usd": 0.00033,
        "extra": {
          "reasoning_tokens": 12
        }
      }
    }
  ]
}

V. Real-World Trajectory Examples

To illustrate the differences between existing agent trajectory formats and to provide concrete reference implementations, three example trajectories from different agent frameworks are available in the 0001-trajectory-format/ directory. All three trajectories execute the same simple task: "Create a file called hello.txt with 'Hello, world!' as the content."

Available Examples

mini-swe-agent-trajectory.json — MiniSweAgent format
- Demonstrates explicit THOUGHT sections and bash command execution
- Shows the conversational pattern where environment feedback is delivered as user messages
- Includes detailed token usage and caching information from Claude 3.5 Sonnet
- 3-step trajectory: create file → verify content → complete task
openhands_trajectory.json — OpenHands format
- Structured as an event log with distinct action types (system, message, run, finish)
- Includes extensive tool definitions and security risk assessments
- Shows GPT-5 reasoning tokens (960 reasoning tokens in the response)
- Demonstrates the task_tracker concept and rich observability metadata
gemini-cli-trajectory.json — Gemini CLI format
- Minimalist session-based format with message array
- Includes embedded token metrics (input, output, cached, thoughts)
- Single-step completion with Gemini 2.0 Flash
- No explicit tool calls or structured actions

Key Insights from Examples

These real-world trajectories highlight the diversity of current approaches:

Reasoning representation: MiniSweAgent uses explicit THOUGHT fields, OpenHands embeds reasoning in tool calls, Gemini CLI tracks it in token metrics
Action logging: Varies from bash strings (MiniSweAgent) to structured function calls (OpenHands) to implicit actions (Gemini CLI)
Observability: OpenHands provides the richest metadata with tool call IDs and security assessments, while Gemini CLI is most minimal
Step granularity: MiniSweAgent creates multiple steps for verification, while Gemini CLI completes the task in one step

ATIF aims to provide a unified format that can accommodate all these patterns while maintaining interoperability.

VI. Implementation

Reference Implementation

The Harbor project provides a reference implementation of ATIF in Python using Pydantic models for validation and type safety.

Pydantic Models

The complete ATIF schema is implemented as Pydantic models in the src/harbor/models/trajectories/ directory:

agent.py — Agent configuration model
content.py — Multimodal content models (ContentPart, ImageSource) for v1.6+
final_metrics.py — Aggregate trajectory metrics model
metrics.py — Per-step LLM metrics model
observation.py — Observation container model
observation_result.py — Individual observation result model
step.py — Step model with validators for timestamps and agent-only fields
subagent_trajectory_ref.py — Subagent trajectory reference model
tool_call.py — Tool call model
trajectory.py — Root trajectory model with validators for step IDs, tool call references, and multimodal content flags

These models provide:

Type safety — All fields are strongly typed using Python type hints
Validation — Automatic validation of required fields, types, and constraints
Custom validators — ISO 8601 timestamp validation, sequential step ID validation, and tool call reference validation
JSON serialization — .to_json_dict() method for clean JSON export with optional None exclusion

Trajectory Validator

The src/harbor/utils/trajectory_validator.py module provides a command-line tool and programmatic API for validating ATIF trajectory files:

# Validate a trajectory file
python -m harbor.utils.trajectory_validator trajectory.json

The validator:

Accepts trajectory data as a dict, JSON string, or file path
Validates against the complete ATIF schema using Pydantic models
Collects all validation errors before returning (not just the first error)
Provides user-friendly error messages with field paths

Programmatic usage:

from harbor.utils.trajectory_validator import TrajectoryValidator

validator = TrajectoryValidator()
is_valid = validator.validate("trajectory.json")

if not is_valid:
    for error in validator.get_errors():
        print(f"Error: {error}")

Integration in Agents

The Terminus 2 agent (src/harbor/agents/terminus_2/terminus_2.py) demonstrates full integration of ATIF:

Constructs trajectory steps using Pydantic models throughout the execution
Tracks token metrics, costs, and logprobs in the Metrics model
Records subagent trajectories with SubagentTrajectoryRef for context summarization
Exports complete trajectories to JSON using the Trajectory.to_json_dict() method
Automatically validates trajectory structure through Pydantic's runtime validation

VII. Context Management Convention

System steps that transform the agent's context window (e.g., mid-trajectory compaction, context pruning, knowledge injection) may declare their transformation semantics using a context_management object in step.extra. This convention enables consumers to determine context boundaries without relying on producer-specific heuristics.

Convention fields:

Field	Type	Description
`type`	String	The kind of context transformation. Values: `"compaction"` (prior context compressed into summary), `"pruning"` (older turns removed), `"injection"` (external knowledge added to context). Extensible by producers.
`boundary`	String	How the transformation affects the agent's context for subsequent steps. Values: `"replace"` (observation content replaces all prior context), `"append"` (observation content added to existing context), `"truncate"` (prior context trimmed). Extensible.

Normative context boundary rule:

When a system step has extra.context_management.boundary = "replace", the agent's effective context window for all subsequent steps consists of: (1) the observation content from the boundary step (observation.results[].content), and (2) any new turns (user, agent, or system) after the boundary step. Steps preceding the boundary are preserved in the trajectory for auditability but are NOT part of the agent's context window for post-boundary steps. Evaluation tools reconstructing the agent's input context for any post-boundary step MUST use the boundary's observation content, not the pre-boundary steps.

Example (context compaction with boundary):

{
  "step_id": 5,
  "source": "system",
  "message": "Context compaction performed",
  "observation": {
    "results": [
      {
        "content": "Summary: prior conversation covered topic X...",
        "subagent_trajectory_ref": [
          { "trajectory_id": "compact-001", "trajectory_path": null }
        ]
      }
    ]
  },
  "extra": {
    "context_management": {
      "type": "compaction",
      "boundary": "replace"
    }
  }
}

References

nebius/SWE-agent-trajectories · Datasets at Hugging Face, accessed October 10, 2025, https://huggingface.co/datasets/nebius/SWE-agent-trajectories ↩ ↩² ↩³ ↩⁴
7 Key LLM Metrics to Enhance AI Reliability | Galileo, accessed October 10, 2025, https://galileo.ai/blog/llm-performance-metrics ↩
Output files - SWE-agent documentation, accessed October 10, 2025, https://swe-agent.com/latest/usage/trajectories/ ↩
Towards a Unified View of Large Language Model Post-Training - arXiv, accessed October 10, 2025, https://arxiv.org/html/2509.04419v1 ↩

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: Agent Trajectory Interchange Format (ATIF) Specification

I. Introduction

Trajectory Definition

Version History

II. Schema

Root-Level Metadata

AgentSchema

FinalMetricsSchema

StepObject

ToolCallSchema

MetricsSchema

ObservationSchema

ObservationResultSchema

ContentPartSchema (v1.6+)

ImageSourceSchema (v1.6+)

SubagentTrajectoryRefSchema

III. Agent Trajectory Format Comparative Schema

IV. Example ATIF Trajectory Log (Multi-Step Task)

V. Real-World Trajectory Examples

Available Examples

Key Insights from Examples

VI. Implementation

Reference Implementation

Pydantic Models

Trajectory Validator

Integration in Agents

VII. Context Management Convention

References

FilesExpand file tree

0001-trajectory-format.md

Latest commit

History

0001-trajectory-format.md

File metadata and controls

RFC: Agent Trajectory Interchange Format (ATIF) Specification

I. Introduction

Trajectory Definition

Version History

II. Schema

Root-Level Metadata

AgentSchema

FinalMetricsSchema

StepObject

ToolCallSchema

MetricsSchema

ObservationSchema

ObservationResultSchema

ContentPartSchema (v1.6+)

ImageSourceSchema (v1.6+)

SubagentTrajectoryRefSchema

III. Agent Trajectory Format Comparative Schema

IV. Example ATIF Trajectory Log (Multi-Step Task)

V. Real-World Trajectory Examples

Available Examples

Key Insights from Examples

VI. Implementation

Reference Implementation

Pydantic Models

Trajectory Validator

Integration in Agents

VII. Context Management Convention

References

Footnotes