Add OTLP HTTP MetricExporter max_export_batch_size by tammy-baylis-swi · Pull Request #4576 · open-telemetry/opentelemetry-python

tammy-baylis-swi · 2025-05-09T18:04:39Z

Description

Adds a configurable max_export_batch_size to the HTTP OTLPMetricExporter, matching what the gRPC OTLPMetricExporter already supports (completed through issue #2710 with PR #2809).

This is currently much longer than the gRPC version because:

- The HTTP protobuf representations of ResourceMetrics, ScopeMetrics, etc. are not replaceable like the gRPC data classes.
  - So references are stored, and new protobuf objects are created immediately before yield/export.
- protobuf does not define a DataPointT that encompasses all metric types.
  - So I've added some if-elif branches throughout for accessing data points and creating new metrics objects.
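
To illustrate the if-elif point, here is a rough sketch that uses plain stand-in dataclasses instead of the real protobuf messages. The class names and the `get_data_points` helper are hypothetical and are not taken from this PR; the sketch only shows why a per-type branch is needed when there is no common data point type.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Stand-ins for protobuf metric payloads (hypothetical, not the real pb2 classes).
@dataclass
class Sum:
    data_points: List[int] = field(default_factory=list)


@dataclass
class Gauge:
    data_points: List[int] = field(default_factory=list)


@dataclass
class Histogram:
    data_points: List[int] = field(default_factory=list)


@dataclass
class Metric:
    name: str
    # Only one of these is expected to be set, mimicking a protobuf oneof.
    sum: Optional[Sum] = None
    gauge: Optional[Gauge] = None
    histogram: Optional[Histogram] = None


def get_data_points(metric: Metric) -> List[int]:
    """Return the data points of whichever payload is set on the metric."""
    if metric.sum is not None:
        return metric.sum.data_points
    elif metric.gauge is not None:
        return metric.gauge.data_points
    elif metric.histogram is not None:
        return metric.histogram.data_points
    # No payload set: nothing to export for this metric.
    return []
```

Each new metric type needs its own branch here, which is part of why the HTTP version ends up longer.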

Fixes #4577

Type of change

Please delete options that are not relevant.

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

- Added unit tests
- Install OTLPMetricExporter locally and do the following:
  - Set up 3 counters, with add(1) calls to each.
  - Init the global MeterProvider using OTLPMetricExporter with max_export_batch_size=2 and the endpoint set to a Collector (debug).
  - Run to get the export in 2 batches (2 counters + 1 counter) of 1 ResourceMetrics each.
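
The expected split can be sketched in plain Python. This is a simplified illustration, not the code in this PR: it assumes the batch size counts data points, and it flattens the data into a list of (name, value) pairs instead of walking ResourceMetrics and ScopeMetrics. The `split_into_batches` name is hypothetical.

```python
from typing import Iterator, List, Tuple

# (metric name, data point value); a flattened stand-in for real metrics data.
DataPoint = Tuple[str, int]


def split_into_batches(
    points: List[DataPoint], max_export_batch_size: int
) -> Iterator[List[DataPoint]]:
    """Yield consecutive chunks of at most max_export_batch_size data points."""
    if max_export_batch_size < 1:
        raise ValueError("max_export_batch_size must be at least 1")
    for start in range(0, len(points), max_export_batch_size):
        yield points[start : start + max_export_batch_size]


# Three counters with one add(1) each, as in the manual test steps.
points = [("counter_a", 1), ("counter_b", 1), ("counter_c", 1)]
batches = list(split_into_batches(points, max_export_batch_size=2))

for index, batch in enumerate(batches):
    print(index, [name for name, _ in batch])
# 0 ['counter_a', 'counter_b']
# 1 ['counter_c']
```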

Does This PR Require a Contrib Repo Change?

Yes. - Link to PR:
No.

Checklist:

Followed the style guidelines of this project
Changelogs have been updated
Unit tests have been added
Documentation has been updated

tammy-baylis-swi · 2025-05-09T19:08:48Z

I think the aiohttp-client test failure is a hiccup from the recent release, not from changes in this PR.

tammy-baylis-swi · 2025-07-03T17:25:02Z

Working on some conflict resolution with changes introduced in #4564

tammy-baylis-swi · 2025-07-03T21:46:16Z

Hi @lzchen and @open-telemetry/python-approvers, could I please have a new review of this PR?

In 25e8be9 I resolved merge conflicts, which required some redesign of HTTP metrics batching to match the timeout updates in #4564. That was the main change since the last approval; the split helpers are the same.

The latest Cassandra and Celery instrumentor test failures seem unrelated; they cite commit hashes, like similar failures in other PRs.

tammy-baylis-swi · 2026-01-15T23:47:04Z

Hi again @open-telemetry/python-approvers, @open-telemetry/python-maintainers, could I please have a review of this?

Hi again, may I please have a review?

42Questions

Two comments re implementation.

MikeGoldsmith

Looks good. Left one comment regarding reporting failures across batches.

MikeGoldsmith

🚀

github-actions · 2026-03-19T04:05:54Z

This PR has been automatically marked as stale because it has not had any activity for 14 days. It will be closed if no further activity occurs within 14 days of this comment.
If you're still working on this, please add a comment or push new commits.

MikeGoldsmith · 2026-03-19T13:26:57Z

Not stale - Can someone from @open-telemetry/python-approvers take a look please?

anuraaga

Hi @tammy-baylis-swi @lzchen. I was trying to wrap my head around this PR while merging main into my branch, but I am wondering, does it actually work? The types seem to be very off, and I think it only works because everything is heavily mocked in the unit tests. Based on my skim, the bytes sent will probably be the wrong type.

If that sounds right, could it be reverted and tried again?

anuraaga · 2026-03-26T04:10:16Z



+def _split_metrics_data(
+    metrics_data: pb2.MetricsData,


This should be ExportMetricsServiceRequest

anuraaga · 2026-03-26T04:10:45Z

+def _split_metrics_data(
+    metrics_data: pb2.MetricsData,
+    max_export_batch_size: int | None = None,
+) -> Iterable[pb2.MetricsData]:


I think this is also supposed to be Iterable[ExportMetricsServiceRequest]

anuraaga · 2026-03-26T04:12:22Z

+            # Non-retryable
+            MagicMock(ok=False, status_code=400, reason="bad request"),
+        ]
+        mock_encode_metrics.return_value = pb2.MetricsData(


encode_metrics returns ExportMetricsServiceRequest

https://github.com/open-telemetry/opentelemetry-python/blob/main/exporter/opentelemetry-exporter-otlp-proto-common/src/opentelemetry/exporter/otlp/proto/common/_internal/metrics_encoder/__init__.py#L210

tammy-baylis-swi · 2026-03-26T15:23:17Z

Hi @anuraaga, I'll take a look. Meanwhile, could you share more about your issue? What specific error do you get when you opt into this feature while instrumenting your test services?

tammy-baylis-swi · 2026-03-26T22:14:11Z

This gist contains a quick demo app and its output, using the changes from this PR and max_export_batch_size: 2. The debug log shows the batching working as expected: ResourceMetrics #0 contains trace.service.counter_a and trace.service.counter_b (first batch), and ResourceMetrics #1 contains trace.service.counter_c (second batch). If there are problems with your use case(s), could you please create a new issue?

anuraaga · 2026-03-27T00:27:01Z

Thanks @tammy-baylis-swi for confirming. It seems the collector works with either proto, so the issue may be less pressing, but I filed #5014 since it still seems to need fixing.
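
One possible explanation for the collector accepting either proto, sketched under an assumption: if MetricsData and ExportMetricsServiceRequest both declare `resource_metrics` as repeated field 1, their serialized bytes are identical, because protobuf does not put the message name on the wire. The hand-rolled encoder and the `SCHEMAS` table below are illustrative only and do not use the real protobuf library.

```python
from typing import Dict, List

# Assumed field numbers (an assumption about the .proto definitions,
# not something stated in this thread).
SCHEMAS: Dict[str, Dict[str, int]] = {
    "MetricsData": {"resource_metrics": 1},
    "ExportMetricsServiceRequest": {"resource_metrics": 1},
}


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def serialize(message_name: str, resource_metrics: List[bytes]) -> bytes:
    """Serialize already-encoded ResourceMetrics payloads into the named message."""
    field_number = SCHEMAS[message_name]["resource_metrics"]
    tag = encode_varint((field_number << 3) | 2)  # wire type 2: length-delimited
    return b"".join(
        tag + encode_varint(len(payload)) + payload for payload in resource_metrics
    )


payloads = [b"rm-0", b"rm-1"]  # opaque placeholders for encoded ResourceMetrics
same_bytes = serialize("MetricsData", payloads) == serialize(
    "ExportMetricsServiceRequest", payloads
)
```

Under that assumption `same_bytes` is true, which would explain the observed behavior even though the declared types differ.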

tammy-baylis-swi added 3 commits May 9, 2025 11:03

- Add OTLP HTTP MetricExporter max export batch size (a62f2ee)
- Changelog (d09feba)
- Lint (dc86036)

tammy-baylis-swi marked this pull request as ready for review May 9, 2025 19:07

tammy-baylis-swi requested a review from a team as a code owner May 9, 2025 19:07

tammy-baylis-swi added this to Python PR digest May 9, 2025

tammy-baylis-swi moved this to Ready for review in Python PR digest May 9, 2025
