Skip to content

Add OTLP HTTP MetricExporter max_export_batch_size#4576

Merged
lzchen merged 38 commits into
open-telemetry:mainfrom
tammy-baylis-swi:otlp-http-metrics-export-max-batch
Mar 19, 2026
Merged

Add OTLP HTTP MetricExporter max_export_batch_size#4576
lzchen merged 38 commits into
open-telemetry:mainfrom
tammy-baylis-swi:otlp-http-metrics-export-max-batch

Conversation

@tammy-baylis-swi
Copy link
Copy Markdown
Contributor

@tammy-baylis-swi tammy-baylis-swi commented May 9, 2025

Description

Adds support for HTTP OTLPMetricExporter configurable max_export_batch_size, like the gRPC OTLPMetricExporter already does (completed through issue #2710 with PR #2809).

This is currently much longer than the gRPC version because:

  1. HTTP protobuf representations of ResourceMetrics, ScopeMetrics, etc are not replace-able like the gRPC data classes
    • So references are stored and new protobuf objects are created immediately before yield/export
  2. protobuf does not define a DataPointT to encompass all metric types
    • So I've added some if-elif throughout for accessing data points, creating new metrics objects

Fixes #4577

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

  • Added unit tests
  • Install OTLPMetricExporter locally and do the following:
  1. Set up 3 counters, with add(1) calls to each
  2. Init global MeterProvider using OTLPMetricExporter with max_export_batch_size=2 and endpoint as Collector (debug).
  3. Run to get export in 2 batches (2 counters + 1 counter) of 1 ResourceMetrics each.

Does This PR Require a Contrib Repo Change?

  • Yes. - Link to PR:
  • No.

Checklist:

  • Followed the style guidelines of this project
  • Changelogs have been updated
  • Unit tests have been added
  • Documentation has been updated

@tammy-baylis-swi tammy-baylis-swi marked this pull request as ready for review May 9, 2025 19:07
@tammy-baylis-swi tammy-baylis-swi requested a review from a team as a code owner May 9, 2025 19:07
@tammy-baylis-swi
Copy link
Copy Markdown
Contributor Author

I think the aiohttp-client test failure is a hiccup from the recent release, not from changes in this PR.

@tammy-baylis-swi tammy-baylis-swi requested a review from lzchen May 12, 2025 22:06
@tammy-baylis-swi
Copy link
Copy Markdown
Contributor Author

Working on some conflict resolution with changes introduced in #4564

@tammy-baylis-swi
Copy link
Copy Markdown
Contributor Author

Hi @lzchen and @open-telemetry/python-approvers , please may I have a new review of this PR.

In 25e8be9 I resolved merge conflicts which required some redesign of HTTP metrics batching to match the timeout updates in #4564. That was the main change since last approval; the split helpers are the same.

The last Cassandra and Celery instrumentor test failures seem unrelated and are citing commit hashes like similar failures in other PRs.

@tammy-baylis-swi tammy-baylis-swi requested review from a team and lzchen July 3, 2025 21:46
@tammy-baylis-swi tammy-baylis-swi moved this from Easy to review / merge / close to Ready for review in Python PR digest Jan 15, 2026
@tammy-baylis-swi tammy-baylis-swi moved this from Ready for review to Easy to review / merge / close in Python PR digest Jan 15, 2026
@tammy-baylis-swi
Copy link
Copy Markdown
Contributor Author

Hi again @open-telemetry/python-approvers , @open-telemetry/python-maintainers please could I have a review of this?

Hi again, please may I have a review

Copy link
Copy Markdown
Contributor

@42Questions 42Questions left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two comments re implementation.

Copy link
Copy Markdown
Member

@MikeGoldsmith MikeGoldsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Left one comment regarding reporting failures across batches.

Copy link
Copy Markdown
Member

@MikeGoldsmith MikeGoldsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@tammy-baylis-swi tammy-baylis-swi moved this from Easy to review / merge / close to Approved PRs in Python PR digest Feb 26, 2026
@github-actions
Copy link
Copy Markdown

This PR has been automatically marked as stale because it has not had any activity for 14 days. It will be closed if no further activity occurs within 14 days of this comment.
If you're still working on this, please add a comment or push new commits.

@github-actions github-actions Bot added the Stale label Mar 19, 2026
@MikeGoldsmith
Copy link
Copy Markdown
Member

Not stale - Can someone from @open-telemetry/python-approvers take a look please?

@lzchen lzchen merged commit 7b38450 into open-telemetry:main Mar 19, 2026
508 checks passed
@github-project-automation github-project-automation Bot moved this from Approved PRs to Done in Python PR digest Mar 19, 2026
@tammy-baylis-swi tammy-baylis-swi deleted the otlp-http-metrics-export-max-batch branch March 19, 2026 18:30
Copy link
Copy Markdown
Contributor

@anuraaga anuraaga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @tammy-baylis-swi @lzchen. I was trying to wrap my head around this PR while merging main into my branch, but I am wondering, does it actually work? The types seem to be very off, and I think only work since everything is mocked heavily in the unit tests - the bytes sent will probably be the wrong type based on my skim.

If that sounds right, could it be reverted and tried again?



def _split_metrics_data(
metrics_data: pb2.MetricsData,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be ExportMetricsServiceRequest

def _split_metrics_data(
metrics_data: pb2.MetricsData,
max_export_batch_size: int | None = None,
) -> Iterable[pb2.MetricsData]:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is also supposed to be Iterable[ExportMetricsServiceRequest]

# Non-retryable
MagicMock(ok=False, status_code=400, reason="bad request"),
]
mock_encode_metrics.return_value = pb2.MetricsData(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tammy-baylis-swi
Copy link
Copy Markdown
Contributor Author

Hi @tammy-baylis-swi @lzchen. I was trying to wrap my head around this PR while merging main into my branch, but I am wondering, does it actually work? The types seem to be very off, and I think only work since everything is mocked heavily in the unit tests - the bytes sent will probably be the wrong type based on my skim.

If that sounds right, could it be reverted and tried again?

Hi @anuraaga , I'll take a look. Meanwhile, could you share more about your issue? What specific error do you get when you opt into this feature while instrumenting your test services?

@tammy-baylis-swi
Copy link
Copy Markdown
Contributor Author

Hi @tammy-baylis-swi @lzchen. I was trying to wrap my head around this PR while merging main into my branch, but I am wondering, does it actually work? The types seem to be very off, and I think only work since everything is mocked heavily in the unit tests - the bytes sent will probably be the wrong type based on my skim.
If that sounds right, could it be reverted and tried again?

Hi @anuraaga , I'll take a look. Meanwhile, could you share more about your issue? What specific error do you get when you opt into this feature while instrumenting your test services?

This gist contains a quick demo app and output using changes from this PR and max_export_batch_size: 2. The debug log shows the batching is working as expected where ResourceMetrics #0 contains trace.service.counter_a and trace.service.counter_b (first batch) and ResourceMetrics #1 contains trace.service.counter_c (second batch). If there are problems with your use case(s), please could you create a new issue.

@anuraaga
Copy link
Copy Markdown
Contributor

Thanks @tammy-baylis-swi for confirming. It seems the collector works with either proto so the issue may be less pressing, but I filed #5014 since it still seems needs fixing

Vitexus pushed a commit to Vitexus/opentelemetry-python that referenced this pull request Apr 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Add HTTP OTLPMetricExporter configurable max export batch size, like gRPC

6 participants