Backport CI & test output fixes to release-11.1 #7884

onurctirtir · 2025-02-04T14:41:58Z

No description provided.

(cherry picked from commit b886cfa)

(cherry picked from commit cbe0de3)

.. as documented in actions/upload-artifact#480. (cherry picked from commit 26f16a7)

Removes el/7 and ol/7 as runners and updates checkout action to v4

We use EL/7 and OL/7 runners to test packaging for these distributions. However, for the past two weeks, we've encountered errors during the checkout step in the pipelines. The error message is as follows:

```
/__e/node20/bin/node: /lib64/libm.so.6: version `GLIBC_2.27' not found (required by /__e/node20/bin/node)
/__e/node20/bin/node: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by /__e/node20/bin/node)
/__e/node20/bin/node: /lib64/libstdc++.so.6: version `CXXABI_1.3.9' not found (required by /__e/node20/bin/node)
/__e/node20/bin/node: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /__e/node20/bin/node)
/__e/node20/bin/node: /lib64/libc.so.6: version `GLIBC_2.28' not found (required by /__e/node20/bin/node)
/__e/node20/bin/node: /lib64/libc.so.6: version `GLIBC_2.25' not found (required by /__e/node20/bin/node)
```

The glibc version within the EL/7 and OL/7 Docker images is 2.17, and we cannot upgrade it. Therefore, we need to remove these images from the packaging test pipelines. Consequently, we will no longer verify that the code builds for EL/7 and OL/7. However, we are not using these packaging images as runners within the packaging infrastructure, so we can continue to use these images for packaging.

Additional info: I learned that the Marlin team has fully dropped el/7 support, so we will drop it in future releases as well. (cherry picked from commit c603c3e)
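The version gap can be read straight off the loader errors quoted above. Below is a minimal Python sketch of that reading, not part of the actual pipeline: the helper name `max_required_glibc` is hypothetical, and the `2.17` value assumes the stock glibc shipped with EL/7 and OL/7.

```python
import re

# GLIBC lines from the loader errors quoted above
# (the GLIBCXX/CXXABI lines concern libstdc++ and are left out).
ERRORS = """\
/__e/node20/bin/node: /lib64/libm.so.6: version `GLIBC_2.27' not found (required by /__e/node20/bin/node)
/__e/node20/bin/node: /lib64/libc.so.6: version `GLIBC_2.28' not found (required by /__e/node20/bin/node)
/__e/node20/bin/node: /lib64/libc.so.6: version `GLIBC_2.25' not found (required by /__e/node20/bin/node)
"""


def max_required_glibc(log: str) -> tuple[int, int]:
    """Return the highest GLIBC_x.y symbol version named in the loader errors."""
    versions = [
        (int(major), int(minor))
        for major, minor in re.findall(r"GLIBC_(\d+)\.(\d+)", log)
    ]
    return max(versions)


# Assumption: the EL/7 and OL/7 images provide glibc 2.17.
AVAILABLE = (2, 17)

required = max_required_glibc(ERRORS)
print("required:", required, "available:", AVAILABLE, "ok:", AVAILABLE >= required)
```

Under these assumptions the node20 binary used by the checkout action needs a newer glibc than the image can offer, which is why the runners are removed rather than patched.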

DESCRIPTION: Removes ubuntu/bionic from packaging pipelines

Since pg16 beta is not available for ubuntu/bionic and ubuntu/bionic support is EOL, I need to remove this OS from the pipeline: https://ubuntu.com/blog/ubuntu-18-04-eol-for-devices

Additionally, added concurrency support for the GH Actions packaging pipeline. (cherry picked from commit 553780e)

(cherry picked from commit 3fe2240)

(cherry picked from commit 4bf9a7b)

foreign_key_to_reference_shard_rebalance failed because the partition for the year 2024 does not exist; fixed by adding a default partition. Co-authored-by: chuhx <148182736+cstarc1@users.noreply.github.com> (cherry picked from commit 968ac74)

https://github.com/citusdata/citus/actions/runs/6745019678/attempts/1#summary-18336188930

```diff
 insert into target_table SELECT a*2 FROM source_table RETURNING a;
-NOTICE: executing the command locally: SELECT bytes FROM fetch_intermediate_results(ARRAY['repartitioned_results_xxxxx_from_4213582_to_0','repartitioned_results_xxxxx_from_4213584_to_0']::text[],'localhost',57638) bytes
+NOTICE: executing the command locally: SELECT bytes FROM fetch_intermediate_results(ARRAY['repartitioned_results_3940758121873413_from_4213584_to_0','repartitioned_results_3940758121873413_from_4213582_to_0']::text[],'localhost',57638) bytes
```

The elements in the array passed to `fetch_intermediate_results` are the same as expected, but in the opposite order. To fix this flakiness, we can omit the `"SELECT bytes FROM fetch_intermediate_results..."` line from the test output. The log lines that follow it still make it clear that the intermediate results have been fetched. (cherry picked from commit 0dc41ee)
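To make the nature of the flakiness concrete, here is a small Python sketch that compares the two NOTICE lines from the diff above. It is only an illustration: `result_ids` and `drop_fetch_lines` are hypothetical helpers, and the commit message does not show the mechanism the real fix uses to omit the line.

```python
import re

EXPECTED = (
    "NOTICE: executing the command locally: SELECT bytes FROM fetch_intermediate_results("
    "ARRAY['repartitioned_results_xxxxx_from_4213582_to_0',"
    "'repartitioned_results_xxxxx_from_4213584_to_0']::text[],'localhost',57638) bytes"
)
ACTUAL = (
    "NOTICE: executing the command locally: SELECT bytes FROM fetch_intermediate_results("
    "ARRAY['repartitioned_results_3940758121873413_from_4213584_to_0',"
    "'repartitioned_results_3940758121873413_from_4213582_to_0']::text[],'localhost',57638) bytes"
)


def result_ids(line: str) -> list[str]:
    """Extract the result names, masking the run-specific numeric ID as xxxxx."""
    names = re.findall(r"'(repartitioned_results_[^']+)'", line)
    return [re.sub(r"results_\d+_from", "results_xxxxx_from", name) for name in names]


def drop_fetch_lines(lines: list[str]) -> list[str]:
    """Hypothetical stand-in for the fix: leave the unstable line out of the output."""
    return [l for l in lines if "SELECT bytes FROM fetch_intermediate_results" not in l]


print(result_ids(EXPECTED) == result_ids(ACTUAL))                  # order-sensitive
print(sorted(result_ids(EXPECTED)) == sorted(result_ids(ACTUAL)))  # order-insensitive
```

The order-sensitive comparison fails while the order-insensitive one succeeds, which matches the observation that only the element order differs between runs.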

codecov · 2025-02-04T14:45:51Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.35%. Comparing base (7b51f3e) to head (dcf4f33).
Report is 101 commits behind head on release-11.1.

Additional details and impacted files

@@               Coverage Diff                @@
##           release-11.1    #7884      +/-   ##
================================================
- Coverage         92.85%   89.35%   -3.51%     
================================================
  Files               255      254       -1     
  Lines             54987    55292     +305     
  Branches              0     6889    +6889     
================================================
- Hits              51059    49405    -1654     
+ Misses             3928     3893      -35     
- Partials              0     1994    +1994

Sometimes isolation_metadata_sync_deadlock fails in CI like this:

```diff
diff -dU10 -w /__w/citus/citus/src/test/regress/expected/isolation_metadata_sync_deadlock.out /__w/citus/citus/src/test/regress/results/isolation_metadata_sync_deadlock.out
--- /__w/citus/citus/src/test/regress/expected/isolation_metadata_sync_deadlock.out.modified 2023-11-01 16:03:15.090199229 +0000
+++ /__w/citus/citus/src/test/regress/results/isolation_metadata_sync_deadlock.out.modified 2023-11-01 16:03:15.098199312 +0000
@@ -110,10 +110,14 @@
 t
 (1 row)
 step s2-stop-connection:
 SELECT stop_session_level_connection_to_node();
 stop_session_level_connection_to_node
 -------------------------------------
 (1 row)
+
+teardown failed: ERROR: localhost:57638 is a metadata node, but is out of sync
+HINT: If the node is up, wait until metadata gets synced to it and try again.
+CONTEXT: SQL statement "SELECT master_remove_distributed_table_metadata_from_workers(v_obj.objid, v_obj.schema_name, v_obj.object_name)"
```

Source: https://github.com/citusdata/citus/actions/runs/6721938040/attempts/1#summary-18268946448

To fix this we now wait for the metadata to be fully synced to all nodes at the start of the teardown steps. (cherry picked from commit a6e8688)

For some reason, using localhost in our hba file no longer has the intended effect in our GitHub Actions runners. This is probably due to some networking change (IPv6 maybe) or some change in the `/etc/hosts` file. Replacing localhost with the equivalent loopback IPv4 and IPv6 addresses resolved this issue. (cherry picked from commit 8c9de08)
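One way to see why explicit addresses are more robust: `localhost` is a name whose meaning depends on the resolver, while a literal loopback address is not. The Python sketch below checks what `localhost` resolves to on the current machine against explicit loopback entries. The `127.0.0.1/32` and `::1/128` values are assumptions for illustration (the commit message does not list the exact entries), and `covered_by_loopback_entries` is a hypothetical helper.

```python
import ipaddress
import socket

# Assumed explicit loopback entries; the exact hba entries are not shown above.
LOOPBACK_NETWORKS = [
    ipaddress.ip_network("127.0.0.1/32"),
    ipaddress.ip_network("::1/128"),
]


def covered_by_loopback_entries(address: str) -> bool:
    """True if the address falls inside one of the explicit loopback entries."""
    ip = ipaddress.ip_address(address.split("%")[0])  # drop any IPv6 scope id
    return any(ip.version == net.version and ip in net for net in LOOPBACK_NETWORKS)


def resolve(host: str) -> set[str]:
    """Return the distinct addresses the system resolver gives for host."""
    return {info[4][0] for info in socket.getaddrinfo(host, None)}


if __name__ == "__main__":
    try:
        for addr in sorted(resolve("localhost")):
            print(addr, covered_by_loopback_entries(addr))
    except socket.gaierror as exc:
        print("could not resolve localhost:", exc)
```

Running this on a runner would show whether `localhost` maps to IPv4, IPv6, or both there, which is the kind of environment difference the commit message speculates about.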

Sometimes in CI we run into this failure:

```diff
 SELECT resultId, nodeport, rowcount, targetShardId, targetShardIndex FROM partition_task_list_results('test', $$ SELECT * FROM source_table $$, 'target_table') NATURAL JOIN pg_dist_node;
-WARNING: connection to the remote node localhost:xxxxx failed with the following error: connection not open
+ERROR: connection to the remote node localhost:9060 failed with the following error: connection not open
 SELECT * FROM distributed_result_info ORDER BY resultId;
- resultid | nodeport | rowcount | targetshardid | targetshardindex
----------------------------------------------------------------------
- test_from_100800_to_0 | 9060 | 22 | 100805 | 0
- test_from_100801_to_0 | 57637 | 2 | 100805 | 0
- test_from_100801_to_1 | 57637 | 15 | 100806 | 1
- test_from_100802_to_1 | 57637 | 10 | 100806 | 1
- test_from_100802_to_2 | 57637 | 5 | 100807 | 2
- test_from_100803_to_2 | 57637 | 18 | 100807 | 2
- test_from_100803_to_3 | 57637 | 4 | 100808 | 3
- test_from_100804_to_3 | 9060 | 24 | 100808 | 3
-(8 rows)
-
+ERROR: current transaction is aborted, commands ignored until end of transaction block
 -- fetch from worker 2 should fail
 SAVEPOINT s1;
+ERROR: current transaction is aborted, commands ignored until end of transaction block
 SELECT fetch_intermediate_results('{test_from_100802_to_1,test_from_100802_to_2}'::text[], 'localhost', :worker_2_port) > 0 AS fetched;
-ERROR: could not open file "base/pgsql_job_cache/xx_x_xxx/test_from_100802_to_1.data": No such file or directory
-CONTEXT: while executing command on localhost:xxxxx
+ERROR: current transaction is aborted, commands ignored until end of transaction block
 ROLLBACK TO SAVEPOINT s1;
+ERROR: savepoint "s1" does not exist
 -- fetch from worker 1 should succeed
 SELECT fetch_intermediate_results('{test_from_100802_to_1,test_from_100802_to_2}'::text[], 'localhost', :worker_1_port) > 0 AS fetched;
- fetched
----------------------------------------------------------------------
- t
-(1 row)
-
+ERROR: current transaction is aborted, commands ignored until end of transaction block
 -- make sure the results read are same as the previous transaction block
 SELECT count(*), sum(x) FROM read_intermediate_results('{test_from_100802_to_1,test_from_100802_to_2}'::text[],'binary') AS res (x int);
- count | sum
----------------------------------------------------------------------
- 15 | 863
-(1 row)
-
+ERROR: current transaction is aborted, commands ignored until end of transaction block
 ROLLBACk;
```

As outlined in #7306, which I created, this failure is related to having only a single connection open to the node. Finding and fixing the full cause is not trivial, so instead this PR works around the bug by forcing maximum parallelism. Ideally this workaround would not be necessary, but removing it requires spending time on a proper fix. For now, having a less flaky CI is good enough. (cherry picked from commit f171ec9)

onurctirtir and others added 9 commits February 4, 2025 17:35

Upgrade upload-artifacts action to 4.6.0

76fac7d

(cherry picked from commit b886cfa)

Upgrade download-artifacts action to 4.1.8

754e260

(cherry picked from commit cbe0de3)

Avoid publishing artifacts with conflicting names

ab6064d

.. as documented in actions/upload-artifact#480. (cherry picked from commit 26f16a7)

Updates github checkout actions to v4 (#7611)

073b4e7

(cherry picked from commit 3fe2240)

Workaround the runner issue in check-sql-snapshots

8c9936b

(cherry picked from commit 4bf9a7b)

JelteF and others added 4 commits February 4, 2025 17:48

Add an alternative output for multi_move_mx

dcb290a

onurctirtir force-pushed the release-11.1-fix-CI-02-04-25 branch from 2417749 to dcf4f33 Compare February 4, 2025 14:50

onurctirtir enabled auto-merge (rebase) February 4, 2025 16:08

onurctirtir merged commit 97e2f78 into release-11.1 Feb 4, 2025
117 checks passed

onurctirtir deleted the release-11.1-fix-CI-02-04-25 branch February 4, 2025 16:13
