[fix](regression) Make Iceberg rewrite where init script idempotent #63673

Merged
Gabriel39 merged 1 commit into apache:master from suxiaogang223:codex/fix-iceberg-rewrite-where-bootstrap on May 27, 2026

@suxiaogang223
Member

@suxiaogang223 suxiaogang223 commented May 26, 2026

What problem does this PR solve?

Issue Number: N/A

Problem Summary:

test_iceberg_rewrite_data_files_where_conditions depends on three Iceberg tables created by the Spark bootstrap script run21.sql. The script used CREATE TABLE IF NOT EXISTS and then unconditionally inserted the test rows. If a table already exists, or the bootstrap SQL is re-entered after partial execution, the INSERT statements append to the existing table. The regression case can then fail before it ever runs rewrite_data_files, because COUNT(*) no longer returns the expected 30 rows.

This PR makes the init SQL for this case idempotent by dropping and recreating the three test tables before inserting the fixed test data.
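
The failure mode and the fix can be reproduced in miniature. The sketch below is not the contents of run21.sql: it assumes SQLite (via Python's sqlite3 module) as a stand-in for Spark and Iceberg, and the table name rewrite_where_demo, its columns, and the helper functions are hypothetical. It only illustrates why CREATE TABLE IF NOT EXISTS followed by unconditional inserts is not idempotent, while DROP TABLE IF EXISTS followed by CREATE TABLE is.

```python
import sqlite3

# Assumption: SQLite stands in for Spark/Iceberg; the table name is made up.
CREATE_IF_NOT_EXISTS = """
CREATE TABLE IF NOT EXISTS rewrite_where_demo (id INTEGER, name TEXT);
"""

DROP_AND_CREATE = """
DROP TABLE IF EXISTS rewrite_where_demo;
CREATE TABLE rewrite_where_demo (id INTEGER, name TEXT);
"""


def run_bootstrap(conn, ddl, rows=30):
    """Run the DDL, then always insert the fixed test rows."""
    conn.executescript(ddl)
    conn.executemany(
        "INSERT INTO rewrite_where_demo VALUES (?, ?)",
        [(i, f"row_{i}") for i in range(rows)],
    )
    conn.commit()


def count_after_two_runs(ddl):
    """Simulate the bootstrap script being executed twice."""
    conn = sqlite3.connect(":memory:")
    run_bootstrap(conn, ddl)
    run_bootstrap(conn, ddl)
    count = conn.execute("SELECT COUNT(*) FROM rewrite_where_demo").fetchone()[0]
    conn.close()
    return count


# Old pattern: the second run appends to the surviving table.
old_count = count_after_two_runs(CREATE_IF_NOT_EXISTS)
# New pattern: the second run starts from an empty table.
new_count = count_after_two_runs(DROP_AND_CREATE)
print(old_count, new_count)
```

With the old pattern the second run doubles the row count, which is the state that breaks a fixed COUNT(*) expectation; with the drop-and-recreate pattern the count is the same no matter how many times the script runs.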

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
      • git diff --check -- docker/thirdparties/docker-compose/iceberg/scripts/create_preinstalled_scripts/iceberg/run21.sql
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes. The Iceberg bootstrap SQL now recreates the dedicated rewrite-where-condition test tables before loading fixed test data.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@suxiaogang223 suxiaogang223 changed the title from "[codex] Fix Iceberg rewrite where bootstrap data" to "[fix](regression) Make Iceberg rewrite where init script idempotent" May 26, 2026
@suxiaogang223 suxiaogang223 marked this pull request as ready for review May 26, 2026 08:37
@suxiaogang223
Member Author

run buildall

@Gabriel39
Contributor

/review

@github-actions github-actions Bot left a comment

Review Summary
No blocking issues found. The change is small and focused: the dedicated Iceberg bootstrap tables are dropped before recreation so rerunning the Spark init SQL cannot append duplicate rows and break the regression test count assumptions.

Critical checkpoint conclusions:

  • Goal/test proof: The goal is to make run21.sql idempotent for test_iceberg_rewrite_data_files_where_conditions; the code accomplishes this by resetting the three dedicated tables before fixed inserts. Existing regression test assertions still cover the expected 30-row state, though the PR only reports git diff --check as manual validation.
  • Scope/focus: Minimal and consistent with neighboring Iceberg bootstrap scripts that use DROP TABLE IF EXISTS followed by CREATE TABLE.
  • Concurrency/lifecycle: No runtime concurrency or special lifecycle concerns; this is bootstrap SQL executed during external test environment initialization.
  • Configuration/compatibility/protocol: No configuration, serialization, FE-BE protocol, or storage format compatibility changes.
  • Parallel paths: All three tables used by the related regression case were updated consistently.
  • Conditional checks: No new conditional logic beyond standard DROP TABLE IF EXISTS semantics.
  • Test coverage/results: No test result files are modified. The existing regression test remains the functional coverage; no additional issue found in the changed script.
  • Observability/transactions/data writes: No Doris transaction-path or observability changes. The script writes external Iceberg test fixture data only.
  • Performance: No meaningful performance concern; the affected data set is small and bootstrap-only.
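
The "parallel paths" and "30-row state" checkpoints above can be illustrated with a small precondition check. This is a sketch under stated assumptions, not code from the Doris regression suite: SQLite stands in for Iceberg, and the three table names, the single-column schema, and both helper functions are hypothetical. It resets each table with DROP TABLE IF EXISTS before loading, then reports any table whose row count differs from the expected 30.

```python
import sqlite3

EXPECTED_ROWS = 30
# Hypothetical names; the real tables are defined in run21.sql.
TABLES = ["demo_table_a", "demo_table_b", "demo_table_c"]


def reset_and_load(conn, table, rows=EXPECTED_ROWS):
    """Drop, recreate, and load one fixture table with a fixed row set."""
    conn.execute(f"DROP TABLE IF EXISTS {table}")
    conn.execute(f"CREATE TABLE {table} (id INTEGER)")
    conn.executemany(
        f"INSERT INTO {table} VALUES (?)", [(i,) for i in range(rows)]
    )


def tables_with_unexpected_counts(conn, tables=TABLES, expected=EXPECTED_ROWS):
    """Return {table: actual_count} for every table that is off the expected count."""
    bad = {}
    for table in tables:
        actual = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        if actual != expected:
            bad[table] = actual
    return bad


def demo():
    conn = sqlite3.connect(":memory:")
    # Simulate leftovers from an earlier, partially executed bootstrap run.
    conn.execute("CREATE TABLE demo_table_a (id INTEGER)")
    conn.executemany(
        "INSERT INTO demo_table_a VALUES (?)", [(i,) for i in range(10)]
    )
    # Every table goes through the same reset path, so leftovers are discarded.
    for table in TABLES:
        reset_and_load(conn, table)
    conn.commit()
    result = tables_with_unexpected_counts(conn)
    conn.close()
    return result


print(demo())
```

An empty result means all three tables are in the fixed state; a non-empty one names the table that drifted, which is easier to diagnose than a failed count assertion inside the test itself.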

Focus points: No additional user-provided review focus was supplied.

@github-actions
Contributor

OpenCode automated review failed and did not complete.

Error: Review step was failure (possibly timeout or cancelled)
Workflow run: https://github.com/apache/doris/actions/runs/26486763238

Please inspect the workflow logs and rerun the review after the underlying issue is resolved.

@zclllyybb
Contributor

skip buildall

@github-actions github-actions Bot added the approved label (Indicates a PR has been approved by one committer.) May 27, 2026
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@Gabriel39 Gabriel39 merged commit 5fae02f into apache:master May 27, 2026
34 of 35 checks passed
github-actions Bot pushed a commit that referenced this pull request May 27, 2026
…63673)

`test_iceberg_rewrite_data_files_where_conditions` depends on three
Iceberg tables created by the Spark bootstrap script `run21.sql`. The
script used `CREATE TABLE IF NOT EXISTS` and then always inserted the
test rows. If the table already exists or the bootstrap SQL is
re-entered after partial execution, the insert statements append data to
the existing table, so the regression case may fail before running
`rewrite_data_files` because `COUNT(*)` is no longer the expected 30
rows.

This PR makes the init SQL for this case idempotent by dropping and
recreating the three test tables before inserting the fixed test data.
github-actions Bot pushed a commit that referenced this pull request May 27, 2026
morningman pushed a commit that referenced this pull request May 28, 2026
yiguolei pushed a commit that referenced this pull request May 28, 2026

Labels

  • approved: Indicates a PR has been approved by one committer.
  • dev/4.0.6-merged
  • dev/4.1.2-merged
  • reviewed

6 participants