-
Notifications
You must be signed in to change notification settings - Fork 688
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Backport CI & test output fixes to release-11.1 #7884
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
(cherry picked from commit b886cfa)
(cherry picked from commit cbe0de3)
.. as documented in actions/upload-artifact#480. (cherry picked from commit 26f16a7)
Removes el/7 and ol/7 as runners and update checkout action to v4 We use EL/7 and OL/7 runners to test packaging for these distributions. However, for the past two weeks, we've encountered errors during the checkout step in the pipelines. The error message is as follows: ``` /__e/node20/bin/node: /lib64/libm.so.6: version `GLIBC_2.27' not found (required by /__e/node20/bin/node) /__e/node20/bin/node: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by /__e/node20/bin/node) /__e/node20/bin/node: /lib64/libstdc++.so.6: version `CXXABI_1.3.9' not found (required by /__e/node20/bin/node) /__e/node20/bin/node: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /__e/node20/bin/node) /__e/node20/bin/node: /lib64/libc.so.6: version `GLIBC_2.28' not found (required by /__e/node20/bin/node) /__e/node20/bin/node: /lib64/libc.so.6: version `GLIBC_2.25' not found (required by /__e/node20/bin/node) ``` The GCC version within the EL/7 and OL/7 Docker images is 2.17, and we cannot upgrade it. Therefore, we need to remove these images from the packaging test pipelines. Consequently, we will no longer verify if the code builds for EL/7 and OL/7. However, we are not using these packaging images as runners within the packaging infrastructure, so we can continue to use these images for packaging. Additional Info: I learned that Marlin team fully dropped the el/7 support so we will drop in further releases as well (cherry picked from commit c603c3e)
DESCRIPTION: Removes ubuntu/bionic from packaging pipelines Since pg16 beta is not available for ubuntu/bionic and ubuntu/bionic support is EOL, I need to remove this os from pipeline https://ubuntu.com/blog/ubuntu-18-04-eol-for-devices Additionally, added concurrency support for GH Actions Packaging pipeline (cherry picked from commit 553780e)
(cherry picked from commit 3fe2240)
(cherry picked from commit 4bf9a7b)
foreign_key_to_reference_shard_rebalance failed because partition of 2024 year does not exist, fixed by add default partition. Co-authored-by: chuhx <148182736+cstarc1@users.noreply.github.com> (cherry picked from commit 968ac74)
https://github.com/citusdata/citus/actions/runs/6745019678/attempts/1#summary-18336188930 ```diff insert into target_table SELECT a*2 FROM source_table RETURNING a; -NOTICE: executing the command locally: SELECT bytes FROM fetch_intermediate_results(ARRAY['repartitioned_results_xxxxx_from_4213582_to_0','repartitioned_results_xxxxx_from_4213584_to_0']::text[],'localhost',57638) bytes +NOTICE: executing the command locally: SELECT bytes FROM fetch_intermediate_results(ARRAY['repartitioned_results_3940758121873413_from_4213584_to_0','repartitioned_results_3940758121873413_from_4213582_to_0']::text[],'localhost',57638) bytes ``` The elements in the array passed to `fetch_intermediate_results` are the same, but in the opposite order than expected. To fix this flakiness, we can omit the `"SELECT bytes FROM fetch_intermediate_results..."` line. From the following logs, it is understandable that the intermediate results have been fetched. (cherry picked from commit 0dc41ee)
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## release-11.1 #7884 +/- ##
================================================
- Coverage 92.85% 89.35% -3.51%
================================================
Files 255 254 -1
Lines 54987 55292 +305
Branches 0 6889 +6889
================================================
- Hits 51059 49405 -1654
+ Misses 3928 3893 -35
- Partials 0 1994 +1994 |
Sometimes isolation_metadata_sync_deadlock fails in CI like this: ```diff diff -dU10 -w /__w/citus/citus/src/test/regress/expected/isolation_metadata_sync_deadlock.out /__w/citus/citus/src/test/regress/results/isolation_metadata_sync_deadlock.out --- /__w/citus/citus/src/test/regress/expected/isolation_metadata_sync_deadlock.out.modified 2023-11-01 16:03:15.090199229 +0000 +++ /__w/citus/citus/src/test/regress/results/isolation_metadata_sync_deadlock.out.modified 2023-11-01 16:03:15.098199312 +0000 @@ -110,10 +110,14 @@ t (1 row) step s2-stop-connection: SELECT stop_session_level_connection_to_node(); stop_session_level_connection_to_node ------------------------------------- (1 row) + +teardown failed: ERROR: localhost:57638 is a metadata node, but is out of sync +HINT: If the node is up, wait until metadata gets synced to it and try again. +CONTEXT: SQL statement "SELECT master_remove_distributed_table_metadata_from_workers(v_obj.objid, v_obj.schema_name, v_obj.object_name)" ``` Source: https://github.com/citusdata/citus/actions/runs/6721938040/attempts/1#summary-18268946448 To fix this we now wait for the metadata to be fully synced to all nodes at the start of the teardown steps. (cherry picked from commit a6e8688)
For some reason using localhost in our hba file doesn't have the intended effect anymore in our Github Actions runners. Probably because of some networking change (IPv6 maybe) or some change in the `/etc/hosts` file. Replacing localhost with the equivalent loopback IPv4 and IPv6 addresses resolved this issue. (cherry picked from commit 8c9de08)
Sometimes in CI we run into this failure: ```diff SELECT resultId, nodeport, rowcount, targetShardId, targetShardIndex FROM partition_task_list_results('test', $$ SELECT * FROM source_table $$, 'target_table') NATURAL JOIN pg_dist_node; -WARNING: connection to the remote node localhost:xxxxx failed with the following error: connection not open +ERROR: connection to the remote node localhost:9060 failed with the following error: connection not open SELECT * FROM distributed_result_info ORDER BY resultId; - resultid | nodeport | rowcount | targetshardid | targetshardindex ---------------------------------------------------------------------- - test_from_100800_to_0 | 9060 | 22 | 100805 | 0 - test_from_100801_to_0 | 57637 | 2 | 100805 | 0 - test_from_100801_to_1 | 57637 | 15 | 100806 | 1 - test_from_100802_to_1 | 57637 | 10 | 100806 | 1 - test_from_100802_to_2 | 57637 | 5 | 100807 | 2 - test_from_100803_to_2 | 57637 | 18 | 100807 | 2 - test_from_100803_to_3 | 57637 | 4 | 100808 | 3 - test_from_100804_to_3 | 9060 | 24 | 100808 | 3 -(8 rows) - +ERROR: current transaction is aborted, commands ignored until end of transaction block -- fetch from worker 2 should fail SAVEPOINT s1; +ERROR: current transaction is aborted, commands ignored until end of transaction block SELECT fetch_intermediate_results('{test_from_100802_to_1,test_from_100802_to_2}'::text[], 'localhost', :worker_2_port) > 0 AS fetched; -ERROR: could not open file "base/pgsql_job_cache/xx_x_xxx/test_from_100802_to_1.data": No such file or directory -CONTEXT: while executing command on localhost:xxxxx +ERROR: current transaction is aborted, commands ignored until end of transaction block ROLLBACK TO SAVEPOINT s1; +ERROR: savepoint "s1" does not exist -- fetch from worker 1 should succeed SELECT fetch_intermediate_results('{test_from_100802_to_1,test_from_100802_to_2}'::text[], 'localhost', :worker_1_port) > 0 AS fetched; - fetched ---------------------------------------------------------------------- - t -(1 row) - +ERROR: current transaction is aborted, commands ignored until end of transaction block -- make sure the results read are same as the previous transaction block SELECT count(*), sum(x) FROM read_intermediate_results('{test_from_100802_to_1,test_from_100802_to_2}'::text[],'binary') AS res (x int); - count | sum ---------------------------------------------------------------------- - 15 | 863 -(1 row) - +ERROR: current transaction is aborted, commands ignored until end of transaction block ROLLBACk; ``` As outlined in the #7306 I created, the reason for this is related to only having a single connection open to the node. Finding and fixing the full cause is not trivial, so instead this PR starts working around this bug by forcing maximum parallelism. Preferably we'd want this workaround not to be necessary, but that requires spending time to fix this. For now having a less flaky CI is good enough. (cherry picked from commit f171ec9)
onurctirtir
force-pushed
the
release-11.1-fix-CI-02-04-25
branch
from
February 4, 2025 14:50
2417749
to
dcf4f33
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.