[fix](regression) Make Iceberg rewrite where init script idempotent #63673
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
|
/review |
Review Summary
No blocking issues found. The change is small and focused: the dedicated Iceberg bootstrap tables are dropped before recreation so rerunning the Spark init SQL cannot append duplicate rows and break the regression test count assumptions.
Critical checkpoint conclusions:
- Goal/test proof: The goal is to make run21.sql idempotent for test_iceberg_rewrite_data_files_where_conditions; the code accomplishes this by resetting the three dedicated tables before fixed inserts. Existing regression test assertions still cover the expected 30-row state, though the PR only reports git diff --check as manual validation.
- Scope/focus: Minimal and consistent with neighboring Iceberg bootstrap scripts that use DROP TABLE IF EXISTS followed by CREATE TABLE.
- Concurrency/lifecycle: No runtime concurrency or special lifecycle concerns; this is bootstrap SQL executed during external test environment initialization.
- Configuration/compatibility/protocol: No configuration, serialization, FE-BE protocol, or storage format compatibility changes.
- Parallel paths: All three tables used by the related regression case were updated consistently.
- Conditional checks: No new conditional logic beyond standard DROP TABLE IF EXISTS semantics.
- Test coverage/results: No test result files are modified. The existing regression test remains the functional coverage; no additional issue found in the changed script.
- Observability/transactions/data writes: No Doris transaction-path or observability changes. The script writes external Iceberg test fixture data only.
- Performance: No meaningful performance concern; the affected data set is small and bootstrap-only.
Focus points: No additional user-provided review focus was supplied.
|
OpenCode automated review failed and did not complete. Error: the review step failed (possibly timed out or cancelled). Please inspect the workflow logs and rerun the review after the underlying issue is resolved. |
|
skip buildall |
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
…63673) `test_iceberg_rewrite_data_files_where_conditions` depends on three Iceberg tables created by the Spark bootstrap script `run21.sql`. The script used `CREATE TABLE IF NOT EXISTS` and then always inserted the test rows. If the table already exists or the bootstrap SQL is re-entered after partial execution, the insert statements append data to the existing table, so the regression case may fail before running `rewrite_data_files` because `COUNT(*)` is no longer the expected 30 rows. This PR makes the init SQL for this case idempotent by dropping and recreating the three test tables before inserting the fixed test data.
…idempotent #63673 (#63752) Cherry-picked from #63673 Co-authored-by: Socrates <[email protected]>
…idempotent #63673 (#63753) Cherry-picked from #63673 Co-authored-by: Socrates <[email protected]>
What problem does this PR solve?
Issue Number: N/A
Problem Summary:
`test_iceberg_rewrite_data_files_where_conditions` depends on three Iceberg tables created by the Spark bootstrap script `run21.sql`. The script used `CREATE TABLE IF NOT EXISTS` and then always inserted the test rows. If the table already exists or the bootstrap SQL is re-entered after partial execution, the insert statements append data to the existing table, so the regression case may fail before running `rewrite_data_files` because `COUNT(*)` is no longer the expected 30 rows.
This PR makes the init SQL for this case idempotent by dropping and recreating the three test tables before inserting the fixed test data.
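The failure mode and the fix can be reproduced in miniature. The sketch below is not the content of `run21.sql`: it uses Python's built-in `sqlite3` as a stand-in for Spark SQL on Iceberg, and the table name `rewrite_where_demo`, its two columns, and the helper functions are all hypothetical. It only illustrates why `CREATE TABLE IF NOT EXISTS` followed by unconditional inserts doubles the row count on a rerun, while `DROP TABLE IF EXISTS` followed by `CREATE TABLE` keeps it at 30.

```python
import sqlite3

# Hypothetical stand-in for one of the three test tables; the real names
# and schemas live in run21.sql and are not reproduced here.
TABLE = "rewrite_where_demo"
ROWS = [(i, f"name_{i}") for i in range(30)]  # fixed 30-row fixture


def init_non_idempotent(conn):
    """Old pattern: CREATE TABLE IF NOT EXISTS, then always insert."""
    conn.execute(f"CREATE TABLE IF NOT EXISTS {TABLE} (id INTEGER, name TEXT)")
    conn.executemany(f"INSERT INTO {TABLE} VALUES (?, ?)", ROWS)


def init_idempotent(conn):
    """New pattern: drop and recreate the table before the fixed inserts."""
    conn.execute(f"DROP TABLE IF EXISTS {TABLE}")
    conn.execute(f"CREATE TABLE {TABLE} (id INTEGER, name TEXT)")
    conn.executemany(f"INSERT INTO {TABLE} VALUES (?, ?)", ROWS)


def count_after_runs(init, runs):
    """Run `init` `runs` times on a fresh in-memory database; return COUNT(*)."""
    conn = sqlite3.connect(":memory:")
    try:
        for _ in range(runs):
            init(conn)
        return conn.execute(f"SELECT COUNT(*) FROM {TABLE}").fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    print(count_after_runs(init_non_idempotent, 2))  # 60: the rerun appended rows
    print(count_after_runs(init_idempotent, 2))      # 30: the rerun reset the table
```

With the old pattern, any second execution of the bootstrap breaks a test that asserts an exact row count; with the drop-and-recreate pattern the count is the same no matter how many times the script runs.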
Release note
None
Check List (For Author)
Test
git diff --check -- docker/thirdparties/docker-compose/iceberg/scripts/create_preinstalled_scripts/iceberg/run21.sql
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)