Skip to content

Latest commit

 

History

History
628 lines (484 loc) · 54.4 KB

File metadata and controls

628 lines (484 loc) · 54.4 KB

RFC: Agent Trajectory Interchange Format (ATIF) Specification

Field Value
Status Active
Maintainer Boxuan Li
Date April 2026
Changelog v1.7

I. Introduction

The Agent Trajectory Interchange Format (ATIF) is a standardized, JSON-based specification for logging the complete interaction history of autonomous LLM agents. ATIF is designed to unify the distinct data requirements of conversational logs, explicit action sequences (MiniSweAgent1), and replayable data structures (OpenHands), ensuring collected data is immediately usable across debugging, visualization, Supervised Fine-Tuning (SFT), and Reinforcement Learning (RL) pipelines.

This format will serve as the standardized data logging methodology for the Harbor project.

Trajectory Definition

For the purpose of ATIF, a trajectory is defined as a sequence of interactions between a user and an agent, including the agent's internal reasoning, actions, and observations. The trajectory captures the complete interaction history, including all user messages (initial and subsequent), agent responses, tool executions, and environment feedback. This design supports both single-turn tasks and multi-turn conversational interactions.

Version History

v1.7 (Current)

  • Added extra field to ToolCallSchema for custom tool-call metadata
  • Added extra field to ObservationResultSchema for custom observation metadata
  • Added subagent_trajectories field to root Trajectory for single-file subagent embedding
  • Added trajectory_id field to Trajectory as the per-document identifier for embedded-subagent resolution (avoids overloading session_id with document-level uniqueness). trajectory_id is REQUIRED on embedded subagents and MUST be unique within a parent's subagent_trajectories array.
  • Relaxed Trajectory.session_id from Required to Optional and clarified its semantics: session_id is run-scoped (not document-scoped), MAY be shared across sibling subagents, continuation trajectories, or omitted on embedded subagents that inherit the parent's run identity. session_id is no longer overloaded as the canonical matching key for subagent references.
  • Added trajectory_id field to SubagentTrajectoryRef as the canonical resolution key for embedded references. Made session_id optional on the ref and reclassified it as informational onlysession_id is run-scoped and is NOT a valid resolution key (two sibling subagents MAY legitimately share a session_id). A ref MUST set at least one of trajectory_id (embedded form) or trajectory_path (file-ref form) so it is resolvable; session_id alone is insufficient. Breaking vs. v1.6: in v1.6 session_id was required on SubagentTrajectoryRef and served as the resolution key, so a ref of the shape {"session_id": "..."} (no trajectory_path) was valid; under v1.7 such a ref no longer validates. Producers MUST migrate by setting trajectory_id (and a corresponding trajectory_id on the embedded subagent in subagent_trajectories) for embedded refs, or trajectory_path for external-file refs.
  • Added llm_call_count field to StepObject for multi-LLM-call step representation
  • Added context_management convention for system steps that transform the agent's context window (see Section VII)
  • Resolved no-LLM orchestration by defining llm_call_count = 0 semantics on source: "agent" steps to signal deterministic dispatch
  • Note: contributed by Bryan Bednarski and Anuradha Karuppiah from NVIDIA

v1.6

  • Added multimodal content support for images in trajectories
  • Added ContentPartSchema for representing mixed text/image content
  • Added ImageSourceSchema for referencing image files stored alongside trajectories
  • Extended message field in StepObject to accept either a string or array of ContentPart objects
  • Extended content field in ObservationResultSchema to accept either a string or array of ContentPart objects
  • Images are stored as separate files (e.g., in an images/ subdirectory) and referenced by relative path
  • Added has_multimodal_content() method to Trajectory model for checking if trajectory contains images

v1.5

  • Added optional tool_definitions field to AgentSchema for storing tool/function definitions
  • Enables proper tool call definitions for SFT training pipelines

v1.4

  • Added optional prompt_token_ids field to MetricsSchema for storing prompt token IDs
  • prompt_token_ids contains the actual tokens sent to the LLM in that turn, including previous chat history if applicable

v1.3

  • Added optional completion_token_ids field to MetricsSchema for storing completion token IDs
  • Token IDs enable accurate reinforcement learning training by avoiding retokenization drift

v1.2

  • Extended observation field to support system steps for tracking system-initiated operations
  • System steps can now include observations for events such as subagent delegation, context management, environment resets, or checkpoint creation
  • Clarified that prompt_tokens includes ALL input tokens (both cached and non-cached), with cached_tokens tracking the subset that were cache hits

v1.1

  • Added optional extra field at root level for custom metadata not covered by the core schema

v1.0

  • Initial specification release

II. Schema

Root-Level Metadata

The root object stores global context and a flexible field for custom information:

Field Type Status Description
schema_version String Required String defining ATIF compatibility (e.g., "ATIF-v1.7").
session_id String Optional Identifier for the agent run this trajectory belongs to. Run-scoped, not document-scoped: multiple Trajectory objects MAY share the same session_id when they represent the same logical run (e.g., a parent trajectory and its embedded subagents, or a trajectory and its continuation segments linked via continued_trajectory_ref). session_ids within a parent's subagent_trajectories array are therefore NOT required to be unique. Use trajectory_id when a per-trajectory-document unique identifier is required. Required in v1.6 and earlier; relaxed to Optional in v1.7. Producers SHOULD set this on root trajectories for run-level traceability, and MAY omit it on embedded subagents that inherit the parent's run identity.
trajectory_id String Optional Canonical per-trajectory-document identifier, distinct from session_id. Unlike session_id (which is run-scoped and MAY be shared), trajectory_id uniquely identifies THIS trajectory object. Used to resolve SubagentTrajectoryRef entries against the root's subagent_trajectories array without overloading session_id's run-scoped semantics. Optional on standalone trajectories, but REQUIRED on any trajectory embedded in a parent's subagent_trajectories array. trajectory_ids within a single parent's subagent_trajectories array MUST be unique. Added in ATIF-v1.7.
agent Object Required Object specifying the agent configuration (name, version, and optional custom fields). See AgentSchema below.
steps Array Required Array of step objects representing the complete interaction history, including user messages, agent responses, tool calls, and observations.
notes String Optional A string field for developers to include custom information, design notes, or explanations for format discrepancies.
final_metrics Object Optional Summary metrics for the entire trajectory. See FinalMetricsSchema below.
continued_trajectory_ref String Optional Reference to the continuation trajectory file if this trajectory is continued in another file. Enables agents to link trajectory segments when context management strategies (e.g., summarization) produce multiple trajectory files.
extra Object Optional Object for custom root-level metadata not covered by the core schema.
subagent_trajectories Array Optional Array of embedded subagent trajectories. Each element is a complete, independently-valid ATIF Trajectory object with its own schema_version, trajectory_id, agent, and step_id sequence starting at 1 — the same schema and validation rules as the parent trajectory. Enables single-file storage of multi-agent workflows: when a SubagentTrajectoryRef.trajectory_path is null, consumers resolve the reference by matching SubagentTrajectoryRef.trajectory_id against Trajectory.trajectory_id of entries in this array. Uniqueness rules: every embedded subagent MUST set trajectory_id, and trajectory_ids within this array MUST be unique. session_id, by contrast, is run-scoped and MAY collide across siblings (or match the parent's session_id) when all trajectories belong to the same logical run; embedded subagents MAY also omit session_id entirely to inherit the parent's run identity. When trajectory_path is set, the reference points at an external file instead; embedded and file-ref forms MAY be mixed within the same parent trajectory.

AgentSchema

The required agent object identifies the agent system used for the trajectory. The name and version fields are required, while model_name, tool_definitions, and extra are optional.

Field Type Status Description
name String Required The name of the agent system (e.g., "openhands", "claude-code", "mini-swe-agent").
version String Required The version identifier of the agent system (e.g., "1.0.0", "v2.3.1").
model_name String Optional Default LLM model used for this trajectory (e.g., "gemini-2.5-flash", "claude-3-5-sonnet"). Step-level model_name overrides this if specified.
tool_definitions Array Optional Array of tool/function definitions available to the agent. Each element follows OpenAI's function calling schema with type and function fields containing the tool's signature and docs.
extra Object Optional Object for custom agent configuration details not covered by the core schema (e.g., prompting strategy, custom parameters).

FinalMetricsSchema

The optional final_metrics object provides aggregate statistics for the entire trajectory. All fields within the optional final_metrics object are optional.

Field Type Status Description
total_prompt_tokens Integer Optional Sum of all prompt tokens across all steps in the trajectory.
total_completion_tokens Integer Optional Sum of all completion tokens across all steps in the trajectory.
total_cached_tokens Integer Optional Sum of all cached tokens across all steps in the trajectory.
total_cost_usd Float Optional Total real monetary cost for the entire trajectory in USD unit.
total_steps Integer Optional Total number of steps (can be unequal to length of steps array if explained in notes).
extra Object Optional Object for custom aggregate metrics not covered by the core schema.

StepObject

The steps array contains all interaction turns. Each StepObject represents either a system prompt, a user message, or a complete agent turn (LLM inference, action execution, and observation receipt).

Field Type Status Description
step_id Integer Required Ordinal index of the turn (starting from 1).
timestamp String Optional ISO 8601 timestamp indicating when this step occurred (e.g., "2025-10-16T14:30:00Z").
source String Required The originator of this step. Must be one of: "system" (for system prompts), "user" (for user messages), or "agent" (for agent responses).
model_name String Optional The specific LLM model used for this turn (e.g., gemini-2.5-flash). Only applicable when source is "agent". If omitted, the model can be inferred from the top-level agent configuration.
reasoning_effort String | Float Optional Qualitative or quantitative measure of effort (e.g., low, medium, or a float score) assigned to this step. Only applicable when source is "agent".
message String | Array Required The dialogue message. For text-only content, this is a string. For multimodal content (v1.6+), this can be an array of ContentPart objects. For system steps, this is the system prompt. For user steps, this is the user's prompt or instruction. For agent steps, this is the assistant's response. This field is required but can be an empty string.
reasoning_content String Optional String field detailing the agent's explicit internal reasoning. Only applicable when source is "agent".
tool_calls Array Optional An array of structured objects for the agent's action(s). A single LLM output may contain multiple tool calls. Only applicable when source is "agent". See ToolCallSchema below.
observation Object Optional Environment feedback/result after actions or system events. For agent steps, this contains results from tool calls, non-tool actions, or subagent delegation. For system steps, this may contain results from system-initiated operations (e.g., subagent delegation, context management, environment reset, checkpoint creation). See ObservationSchema below.
metrics Object Optional Object containing all LLM operational and confidence data for this step, including RL-specific fields (reward, log*probs) if applicable. Only applicable when source is "agent". See MetricsSchema below.
extra Object Optional Object for custom step-level metadata not covered by the core schema. Applicable to all step types (system, user, and agent).
llm_call_count Integer Optional Number of LLM inferences this step represents. When llm_call_count > 1, the metrics are aggregated across multiple LLM calls and per-call attribution is unavailable. When 1, the step represents exactly one inference. When 0 on a source: "agent" step, the step represents a deterministic (non-LLM) dispatch — a graph engine, rule-based pipeline, or eval harness that issued tool_calls without an LLM inference; metrics and reasoning_content MUST be absent on such steps, and SFT pipelines MUST filter them out. When null, the producer did not track this (backward-compatible default). Applicable to all step types.
is_copied_context Boolean Optional Indicates this step was copied from a prior trajectory into the current trajectory for context purposes (e.g., steps retained across a summarization/compression boundary). When True, producers assert the step is not a new interaction and consumers MUST filter it out of supervised fine-tuning (SFT) training data. Absent or None means the step is a new interaction produced in the current trajectory scope. See normative usage below. Applicable to all step types.

Normative usage of is_copied_context: Producers MUST set is_copied_context = True on all steps that are copied from a prior trajectory into the current trajectory for context purposes. This includes steps inserted during context compression events (e.g., after a summarization pass that replaces older steps with a compressed summary, the retained "prior context" steps that are copied into the new trajectory scope). Steps with is_copied_context = True MUST NOT be included in supervised fine-tuning (SFT) training data, as they represent previously-trained interactions whose contribution to the agent's behavior has already been captured. Consumers reading ATIF trajectories for SFT purposes MUST filter out steps where is_copied_context = True before constructing training examples. If is_copied_context is absent or None, the step is assumed to be a new interaction produced in the current trajectory scope.

One-LLM-per-step convention: Exporters SHOULD emit one ATIF step per LLM inference when the underlying framework provides per-call event boundaries (e.g., LLM_START/LLM_END). When an exporter cannot split calls (e.g., an opaque tool with multiple internal LLM calls), it MUST set llm_call_count to the actual count so consumers can detect aggregated metrics.

ToolCallSchema

The optional tool_calls array contains structured objects representing function or tool invocations made by the agent. Each element follows this schema:

Field Type Status Description
tool_call_id String Required Unique identifier for this specific tool call. Used to correlate with observation results via source_call_id.
function_name String Required The name of the function or tool being invoked (e.g., "financial_search", "file_write", "web_search").
arguments Object Required Object containing the arguments passed to the function. Must be a valid JSON object, but can be empty ({}) if no arguments needed.
extra Object Optional Object for custom tool-call-level metadata not covered by the core schema (e.g., timeout, retry count, tool version).

MetricsSchema

All fields within the optional metrics object are optional.

Field Type Status Description
prompt_tokens Integer Optional Total input tokens sent to the model for this turn, including both cached and non-cached tokens. This represents the full size of the prompt (system prompt, history, tool definitions, etc.) that was processed by the model, regardless of whether some tokens were served from cache.
completion_tokens Integer Optional Total tokens generated by the LLM response (including reasoning and tool calls).
cached_tokens Integer Optional Subset of prompt_tokens that were cache hits from prompt caching (e.g., a prefix or history cache). This counts tokens that were reused from cache rather than reprocessed. This is included in the prompt_tokens count, not separate from it.
cost_usd Float Optional Monetary cost of the API call based on current provider pricing for this step.
prompt_token_ids Array Optional Array of integer token IDs for the prompt (input) tokens sent to the LLM in this step, including previous chat history if applicable. Enables accurate analysis and debugging of prompt tokenization. Length should match prompt_tokens count.
completion_token_ids Array Optional Array of integer token IDs for the completion (response) tokens generated in this step. Enables accurate RL training by avoiding retokenization drift. Should align with logprobs array if both are present. Length should match completion_tokens count.
logprobs Array Optional Array of log probabilities for each completion token.2 Should align with completion_token_ids array if both are present. Length should match completion_tokens count.
extra Object Optional Object for provider-specific or experimental metrics not covered by the core schema (e.g., reasoning_tokens, cache_creation_input_tokens).

Token Accounting and Cost Calculation:

ATIF defines prompt_tokens as the total count of all input tokens (both cached and non-cached), with cached_tokens tracking the subset that were cache hits. This provides a complete view of prompt size while also tracking cache efficiency.

To calculate the total cost for a step:

non_cached_prompt_tokens = prompt_tokens - cached_tokens
cost_usd = (non_cached_prompt_tokens × cost_per_input_token) +
           (cached_tokens × cost_per_cached_token) +
           (completion_tokens × cost_per_completion_token)

Similarly, for trajectory-level costs in final_metrics:

non_cached_total = total_prompt_tokens - total_cached_tokens
total_cost_usd = (non_cached_total × cost_per_input_token) +
                 (total_cached_tokens × cost_per_cached_token) +
                 (total_completion_tokens × cost_per_completion_token)

Where:

  • total_prompt_tokens = sum of all prompt_tokens across steps (includes all input tokens, both cached and non-cached)
  • total_cached_tokens = sum of all cached_tokens across steps (subset of total_prompt_tokens)
  • total_completion_tokens = sum of all completion_tokens across steps

Note that ATIF does not record per-token pricing information because:

  1. Pricing can change over time, making historical trajectories inaccurate
  2. Most agent frameworks don't record pricing, requiring a lookup table for conversion
  3. Pricing varies by provider, tier, and region

The cost_usd and total_cost_usd fields store the calculated cost at the time of execution, providing a snapshot without coupling the format to specific pricing models.

Important Note on Additional Cost Factors:

Some LLM providers charge for additional factors beyond the standard token types. For example, Anthropic charges separately for cache_creation_input_tokens when new prompt cache entries are created. These additional cost factors should be recorded in the extra field within the metrics object.

When such additional cost factors exist, the simplified cost formula above may not accurately represent the actual cost. In these cases:

  1. Record the actual cost in the cost_usd field if available from the provider
  2. Store additional token metrics (e.g., cache_creation_input_tokens) in metrics.extra
  3. The presence of values in metrics.extra signals that the standard formula may be incomplete

This approach ensures that actual costs are preserved accurately while maintaining the flexibility to accommodate provider-specific pricing models.

ObservationSchema

The observation object records results from the environment or system events. For agent steps, results may stem from structured tool_calls, agent actions that don't use standard tool calling mechanisms, or subagent delegation. For system steps, observations may contain results from system-initiated operations such as subagent delegation, context management, environment resets, checkpoint creation, or other infrastructure-level events.

Field Type Status Description
results Array Required Array of result objects, each containing feedback from a single tool call or action.

ObservationResultSchema

Each element in the results array follows this schema:

Field Type Status Description
source_call_id String Optional The tool_call_id from the tool_calls array in StepObject that this result corresponds to. If null or omitted, the result comes from an action that doesn't use the standard tool calling format (e.g., agent actions without tool calls or system-initiated operations).
content String | Array Optional The output or result from the tool execution or action. For text-only content, this is a string. For multimodal content (v1.6+), this can be an array of ContentPart objects. May be omitted when subagent_trajectory_ref is present.
subagent_trajectory_ref Array Optional Array of references to delegated subagent trajectories. Each element follows SubagentTrajectoryRefSchema. Use a singleton array for a single subagent.
extra Object Optional Object for custom observation-result-level metadata not covered by the core schema (e.g., confidence score, retrieval score, source document ID).

Example:

{
  "results": [
    {
      "source_call_id": "call_price_1",
      "content": "GOOGL is currently trading at $185.35"
    }
  ]
}

Example (observation result with custom metadata):

{
  "source_call_id": "call_search_001",
  "content": "NVIDIA announces new GPU architecture...",
  "extra": {
    "retrieval_score": 0.92,
    "source_doc_id": "doc-4821"
  }
}

ContentPartSchema (v1.6+)

For multimodal content, the message field in StepObject and the content field in ObservationResultSchema can contain an array of ContentPart objects instead of a plain string. Each ContentPart represents either text or an image.

Field Type Status Description
type String Required The type of content. Must be one of: "text" or "image".
text String Conditional The text content. Required when type is "text", must be omitted when type is "image".
source Object Conditional The image source specification. Required when type is "image", must be omitted when type is "text". See ImageSourceSchema below.

Example (multimodal message with text and image):

{
  "message": [
    { "type": "text", "text": "What is in this image?" },
    { "type": "image", "source": { "media_type": "image/png", "path": "images/step_1_input.png" } }
  ]
}

ImageSourceSchema (v1.6+)

Images are stored as separate files alongside the trajectory JSON file and referenced by path or URL. This avoids bloating the trajectory file with base64-encoded image data.

Field Type Status Description
media_type String Required MIME type of the image. Must be one of: "image/jpeg", "image/png", "image/gif", or "image/webp".
path String Required Location of the image. Can be a relative or absolute file path, or a URL.

Image Storage Convention:

For local storage, images are typically stored in an images/ subdirectory relative to the trajectory file. For example, if the trajectory is at agent/trajectory.json, images would be stored at agent/images/ with relative paths like "images/screenshot.png".

Absolute file paths and URLs are also supported.

SubagentTrajectoryRefSchema

For multi-agent systems or hierarchical agent architectures, an observation result may reference a complete subagent trajectory. This enables tracking of recursive or delegated agent workflows where a parent agent spawns subagents to handle specific subtasks.

Field Type Status Description
trajectory_id String Conditional Canonical identifier of the delegated subagent trajectory, used to resolve embedded references. Matches Trajectory.trajectory_id of an entry in the parent's subagent_trajectories array. Added in ATIF-v1.7. At least one of trajectory_id or trajectory_path MUST be set so the ref is resolvable.
trajectory_path String Conditional Location of the complete subagent trajectory as an external file (file path, S3 URL, database reference, etc.), used to resolve file-ref references. At least one of trajectory_id or trajectory_path MUST be set so the ref is resolvable.
session_id String Optional Run identity of the delegated subagent trajectory. Informational only — recorded so consumers can correlate this ref back to the subagent's run for debug / search / display purposes. Run-scoped (see Trajectory.session_id) and therefore NOT a valid resolution key; consumers MUST NOT use session_id alone to resolve a ref.
extra Object Optional Object for custom metadata about the subagent execution (e.g., summary, exit status, performance metrics).

Resolution mechanisms. A SubagentTrajectoryRef has exactly two resolution mechanisms:

  1. Embedded formtrajectory_id matches the Trajectory.trajectory_id of an entry in the parent's subagent_trajectories array.
  2. File-ref formtrajectory_path points at an external trajectory file.

A ref MUST set at least one of these two fields. When both are set, consumers MAY choose either; typically trajectory_id is preferred when the embedded trajectory is available in-memory, and trajectory_path is used to stream or lazily load.

Why session_id is not a resolution mechanism. session_id is intentionally run-scoped (multiple Trajectory documents MAY share a session_id if they belong to the same logical run — parent + subagents, parent + continuations). Two sibling subagents that belong to the same run MAY legitimately share a session_id, so session_id cannot unambiguously identify which embedded trajectory a ref points at. trajectory_id was introduced in v1.7 to carry the document-level uniqueness guarantee that ref resolution requires, without overloading session_id's run-scoped semantics. session_id on a ref is therefore purely informational — a human- and tooling-friendly breadcrumb back to the subagent's run — not a matching key.

Pre-v1.7 back-compat. Pre-v1.7 had only one resolution mechanism: trajectory_path (external file reference). The subagent_trajectories array on the root Trajectory and the embedded form via trajectory_id are both new in v1.7. Every pre-v1.7 ref that is actually resolvable therefore sets trajectory_path, which satisfies v1.7's at-least-one-of-(trajectory_id, trajectory_path) requirement — such refs remain valid, and their session_id (which in pre-v1.7 was required on the ref) is simply retained as informational metadata.

When subagent_trajectory_ref is present, the content field may be omitted, as the complete trajectory provides full detail. Alternatively, content may contain a simplified summary for quick reference without loading the full subagent trajectory.

III. Agent Trajectory Format Comparative Schema

This table summarizes how ATIF unifies the core requirements of existing agent platforms.

Feature MiniSweAgent Trajectory3 OpenHands Trajectory Gemini-CLI Trajectory ATIF (Proposed Standard)
Primary Structure JSON list of turn objects JSON object for replay Session-based message array Root object containing sequential steps array
Agent Reasoning Explicit discussion field1 Implicit (within message content) Implicit (in thoughts array) Explicit reasoning_content string
Tool/Action Logging Explicit command field (shell string)1 Structured action object Implicit (inferred from message) Optional tool_calls array (allows multiple calls per step)
Environment Feedback Message from role: user (command output)1 Observation/result data Implicit (inferred from response) Dedicated ObservationSchema object (with flexible results array)
LLM Metrics Token counts in response Token counts in response Token counts in response Optional unified MetricsSchema object (Token counts, Cost, Logprobs, Perplexity)
RL Fields None None None Optional rl_experience (reward, log_probs)4

IV. Example ATIF Trajectory Log (Multi-Step Task)

The following example illustrates a three-step task flow, where the user asks a question (step 1), the agent executes a search involving multiple tool calls (step 2), and then delivers a final conversational response (step 3).

Task: The user asks the agent to find the current price of a specific stock.

{
  "schema_version": "ATIF-v1.5",
  "session_id": "025B810F-B3A2-4C67-93C0-FE7A142A947A",
  "agent": {
    "name": "harbor-agent",
    "version": "1.0.0",
    "model_name": "gemini-2.5-flash",
    "tool_definitions": [
      {
        "type": "function",
        "function": {
          "name": "financial_search",
          "description": "Search for financial data for a given stock ticker",
          "parameters": {
            "type": "object",
            "properties": {
              "ticker": {
                "type": "string",
                "description": "Stock ticker symbol"
              },
              "metric": {
                "type": "string",
                "description": "The financial metric to retrieve (e.g., price, volume)"
              }
            },
            "required": ["ticker", "metric"]
          }
        }
      }
    ],
    "extra": {}
  },
  "notes": "Initial test trajectory for financial data retrieval using a single-hop ReAct pattern, focusing on multi-tool execution in Step 2.",
  "extra": {},
  "final_metrics": {
    "total_prompt_tokens": 1120,
    "total_completion_tokens": 124,
    "total_cached_tokens": 200,
    "total_cost_usd": 0.00078,
    "total_steps": 3,
    "extra": {}
  },
  "steps": [
    {
      "step_id": 1,
      "timestamp": "2025-10-11T10:30:00Z",
      "source": "user",
      "message": "What is the current trading price of Alphabet (GOOGL)?",
      "extra": {}
    },
    {
      "step_id": 2,
      "timestamp": "2025-10-11T10:30:02Z",
      "source": "agent",
      "model_name": "gemini-2.5-flash",
      "reasoning_effort": "medium",
      "message": "I will search for the current trading price and volume for GOOGL.",
      "reasoning_content": "The request requires two data points: the current stock price and the latest volume data. I will execute two simultaneous tool calls—one for price and one for volume—to retrieve this information in a single step.",
      "tool_calls": [
        {
          "tool_call_id": "call_price_1",
          "function_name": "financial_search",
          "arguments": { "ticker": "GOOGL", "metric": "price" }
        },
        {
          "tool_call_id": "call_volume_2",
          "function_name": "financial_search",
          "arguments": { "ticker": "GOOGL", "metric": "volume" }
        }
      ],
      "observation": {
        "results": [
          {
            "source_call_id": "call_price_1",
            "content": "GOOGL is currently trading at $185.35 (Close: 10/11/2025)"
          },
          {
            "source_call_id": "call_volume_2",
            "content": "GOOGL volume: 1.5M shares traded."
          }
        ]
      },
      "metrics": {
        "prompt_tokens": 520,
        "completion_tokens": 80,
        "cached_tokens": 200,
        "cost_usd": 0.00045
      }
    },
    {
      "step_id": 3,
      "timestamp": "2025-10-11T10:30:05Z",
      "source": "agent",
      "model_name": "gemini-2.5-flash",
      "reasoning_effort": "low",
      "message": "As of October 11, 2025, Alphabet (GOOGL) is trading at $185.35 with a volume of 1.5M shares traded.",
      "reasoning_content": "The previous step retrieved all necessary data. I will now format this into a final conversational response for the user and terminate the task.",
      "metrics": {
        "prompt_tokens": 600,
        "completion_tokens": 44,
        "completion_token_ids": [
          1722, 310, 5533, 1722, 13, 1640, 13, 1423, 13, 8425, 338, 313, 18672,
          29, 338, 11302, 472, 395, 29896, 29945, 29945, 29889, 29941, 29945,
          411, 263, 7977, 310, 29871, 29896, 29889, 29945, 29924, 29358, 3534,
          287, 29889
        ],
        "logprobs": [
          -0.1, -0.05, -0.02, -0.01, -0.2, -0.15, -0.08, -0.03, -0.12, -0.06,
          -0.04, -0.11, -0.07, -0.09, -0.13, -0.05, -0.02, -0.08, -0.14, -0.06,
          -0.03, -0.1, -0.04, -0.07, -0.05, -0.09, -0.03, -0.11, -0.08, -0.06,
          -0.12, -0.04, -0.07, -0.05, -0.1, -0.03, -0.08, -0.06, -0.11, -0.04,
          -0.07, -0.05, -0.09, -0.02
        ],
        "cost_usd": 0.00033,
        "extra": {
          "reasoning_tokens": 12
        }
      }
    }
  ]
}

V. Real-World Trajectory Examples

To illustrate the differences between existing agent trajectory formats and to provide concrete reference implementations, three example trajectories from different agent frameworks are available in the 0001-trajectory-format/ directory. All three trajectories execute the same simple task: "Create a file called hello.txt with 'Hello, world!' as the content."

Available Examples

  1. mini-swe-agent-trajectory.json — MiniSweAgent format

    • Demonstrates explicit THOUGHT sections and bash command execution
    • Shows the conversational pattern where environment feedback is delivered as user messages
    • Includes detailed token usage and caching information from Claude 3.5 Sonnet
    • 3-step trajectory: create file → verify content → complete task
  2. openhands_trajectory.json — OpenHands format

    • Structured as an event log with distinct action types (system, message, run, finish)
    • Includes extensive tool definitions and security risk assessments
    • Shows GPT-5 reasoning tokens (960 reasoning tokens in the response)
    • Demonstrates the task_tracker concept and rich observability metadata
  3. gemini-cli-trajectory.json — Gemini CLI format

    • Minimalist session-based format with message array
    • Includes embedded token metrics (input, output, cached, thoughts)
    • Single-step completion with Gemini 2.0 Flash
    • No explicit tool calls or structured actions

Key Insights from Examples

These real-world trajectories highlight the diversity of current approaches:

  • Reasoning representation: MiniSweAgent uses explicit THOUGHT fields, OpenHands embeds reasoning in tool calls, Gemini CLI tracks it in token metrics
  • Action logging: Varies from bash strings (MiniSweAgent) to structured function calls (OpenHands) to implicit actions (Gemini CLI)
  • Observability: OpenHands provides the richest metadata with tool call IDs and security assessments, while Gemini CLI is most minimal
  • Step granularity: MiniSweAgent creates multiple steps for verification, while Gemini CLI completes the task in one step

ATIF aims to provide a unified format that can accommodate all these patterns while maintaining interoperability.


VI. Implementation

Reference Implementation

The Harbor project provides a reference implementation of ATIF in Python using Pydantic models for validation and type safety.

Pydantic Models

The complete ATIF schema is implemented as Pydantic models in the src/harbor/models/trajectories/ directory:

  • agent.py — Agent configuration model
  • content.py — Multimodal content models (ContentPart, ImageSource) for v1.6+
  • final_metrics.py — Aggregate trajectory metrics model
  • metrics.py — Per-step LLM metrics model
  • observation.py — Observation container model
  • observation_result.py — Individual observation result model
  • step.py — Step model with validators for timestamps and agent-only fields
  • subagent_trajectory_ref.py — Subagent trajectory reference model
  • tool_call.py — Tool call model
  • trajectory.py — Root trajectory model with validators for step IDs, tool call references, and multimodal content flags

These models provide:

  • Type safety — All fields are strongly typed using Python type hints
  • Validation — Automatic validation of required fields, types, and constraints
  • Custom validators — ISO 8601 timestamp validation, sequential step ID validation, and tool call reference validation
  • JSON serialization.to_json_dict() method for clean JSON export with optional None exclusion

Trajectory Validator

The src/harbor/utils/trajectory_validator.py module provides a command-line tool and programmatic API for validating ATIF trajectory files:

# Validate a trajectory file
python -m harbor.utils.trajectory_validator trajectory.json

The validator:

  • Accepts trajectory data as a dict, JSON string, or file path
  • Validates against the complete ATIF schema using Pydantic models
  • Collects all validation errors before returning (not just the first error)
  • Provides user-friendly error messages with field paths

Programmatic usage:

from harbor.utils.trajectory_validator import TrajectoryValidator

validator = TrajectoryValidator()
is_valid = validator.validate("trajectory.json")

if not is_valid:
    for error in validator.get_errors():
        print(f"Error: {error}")

Integration in Agents

The Terminus 2 agent (src/harbor/agents/terminus_2/terminus_2.py) demonstrates full integration of ATIF:

  • Constructs trajectory steps using Pydantic models throughout the execution
  • Tracks token metrics, costs, and logprobs in the Metrics model
  • Records subagent trajectories with SubagentTrajectoryRef for context summarization
  • Exports complete trajectories to JSON using the Trajectory.to_json_dict() method
  • Automatically validates trajectory structure through Pydantic's runtime validation

VII. Context Management Convention

System steps that transform the agent's context window (e.g., mid-trajectory compaction, context pruning, knowledge injection) may declare their transformation semantics using a context_management object in step.extra. This convention enables consumers to determine context boundaries without relying on producer-specific heuristics.

Convention fields:

Field Type Description
type String The kind of context transformation. Values: "compaction" (prior context compressed into summary), "pruning" (older turns removed), "injection" (external knowledge added to context). Extensible by producers.
boundary String How the transformation affects the agent's context for subsequent steps. Values: "replace" (observation content replaces all prior context), "append" (observation content added to existing context), "truncate" (prior context trimmed). Extensible.

Normative context boundary rule:

When a system step has extra.context_management.boundary = "replace", the agent's effective context window for all subsequent steps consists of: (1) the observation content from the boundary step (observation.results[].content), and (2) any new turns (user, agent, or system) after the boundary step. Steps preceding the boundary are preserved in the trajectory for auditability but are NOT part of the agent's context window for post-boundary steps. Evaluation tools reconstructing the agent's input context for any post-boundary step MUST use the boundary's observation content, not the pre-boundary steps.

Example (context compaction with boundary):

{
  "step_id": 5,
  "source": "system",
  "message": "Context compaction performed",
  "observation": {
    "results": [
      {
        "content": "Summary: prior conversation covered topic X...",
        "subagent_trajectory_ref": [
          { "trajectory_id": "compact-001", "trajectory_path": null }
        ]
      }
    ]
  },
  "extra": {
    "context_management": {
      "type": "compaction",
      "boundary": "replace"
    }
  }
}

References

Footnotes

  1. nebius/SWE-agent-trajectories · Datasets at Hugging Face, accessed October 10, 2025, https://huggingface.co/datasets/nebius/SWE-agent-trajectories 2 3 4

  2. 7 Key LLM Metrics to Enhance AI Reliability | Galileo, accessed October 10, 2025, https://galileo.ai/blog/llm-performance-metrics

  3. Output files - SWE-agent documentation, accessed October 10, 2025, https://swe-agent.com/latest/usage/trajectories/

  4. Towards a Unified View of Large Language Model Post-Training - arXiv, accessed October 10, 2025, https://arxiv.org/html/2509.04419v1