Running ChaNGa under valgrind reports that CkArray::springCleaning() is accessing freed memory.
This is with ChaNGa version 3.5 (commit v3.5-11-gc7ba57c0) and charm version v7.1.0-devel-321-g606459e74.
It was built on an AMD/InfiniBand machine as mpi-linux-x86_64-smp, with gcc v11.2.0 and mvapich2 2.3.6.
Soon after writing an output (using CkIO), valgrind reports errors like:
==3563697== Invalid read of size 8
==3563697== at 0x7DC2DB: CkArray::staticSpringCleaning(void*) (in /anvil/scratch/x-trq/testdata/ChaNGa.smp)
==3563697== by 0x88A663: CcdRaiseCondition (in /anvil/scratch/x-trq/testdata/ChaNGa.smp)
==3563697== by 0x88AD15: CcdCallBacks (in /anvil/scratch/x-trq/testdata/ChaNGa.smp)
==3563697== by 0x88EE0B: CsdScheduleForever (in /anvil/scratch/x-trq/testdata/ChaNGa.smp)
==3563697== by 0x88F234: CsdScheduler (in /anvil/scratch/x-trq/testdata/ChaNGa.smp)
==3563697== by 0x8DDD46: ConverseRunPE(int) (in /anvil/scratch/x-trq/testdata/ChaNGa.smp)
==3563697== by 0x8DDD8A: call_startfn(void*) (in /anvil/scratch/x-trq/testdata/ChaNGa.smp)
==3563697== by 0x50A91CE: start_thread (in /usr/lib64/libpthread-2.28.so)
==3563697== by 0x6CDBE72: clone (in /usr/lib64/libc-2.28.so)
==3563697== Address 0x848f768 is 1,016 bytes inside a block of size 1,024 free'd
==3563697== at 0x4C4AB30: free (in /apps/spack/anvil/apps/valgrind/3.15.0-gcc-11.2.0-u7tvx2t/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==3563697== by 0x7DF84B: CkIndex_CkArray::_call_ckDestroy_void(void*, void*) (in /anvil/scratch/x-trq/testdata/ChaNGa.smp)
==3563697== by 0x7D4ABF: CkDeliverMessageFree (in /anvil/scratch/x-trq/testdata/ChaNGa.smp)
==3563697== by 0x7D8194: _processHandler(void*, CkCoreState*) (in /anvil/scratch/x-trq/testdata/ChaNGa.smp)
==3563697== by 0x88EDCB: CsdScheduleForever (in /anvil/scratch/x-trq/testdata/ChaNGa.smp)
==3563697== by 0x88F234: CsdScheduler (in /anvil/scratch/x-trq/testdata/ChaNGa.smp)
==3563697== by 0x8DDD46: ConverseRunPE(int) (in /anvil/scratch/x-trq/testdata/ChaNGa.smp)
==3563697== by 0x8DDD8A: call_startfn(void*) (in /anvil/scratch/x-trq/testdata/ChaNGa.smp)
==3563697== by 0x50A91CE: start_thread (in /usr/lib64/libpthread-2.28.so)
==3563697== by 0x6CDBE72: clone (in /usr/lib64/libc-2.28.so)
==3563697== Block was alloc'd at
==3563697== at 0x4C495ED: malloc (in /apps/spack/anvil/apps/valgrind/3.15.0-gcc-11.2.0-u7tvx2t/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==3563697== by 0x7D5BCD: CkCreateLocalGroup (in /anvil/scratch/x-trq/testdata/ChaNGa.smp)
==3563697== by 0x7D858E: _processHandler(void*, CkCoreState*) (in /anvil/scratch/x-trq/testdata/ChaNGa.smp)
==3563697== by 0x88EDCB: CsdScheduleForever (in /anvil/scratch/x-trq/testdata/ChaNGa.smp)
==3563697== by 0x88F234: CsdScheduler (in /anvil/scratch/x-trq/testdata/ChaNGa.smp)
==3563697== by 0x8DDD46: ConverseRunPE(int) (in /anvil/scratch/x-trq/testdata/ChaNGa.smp)
==3563697== by 0x8DDD8A: call_startfn(void*) (in /anvil/scratch/x-trq/testdata/ChaNGa.smp)
==3563697== by 0x50A91CE: start_thread (in /usr/lib64/libpthread-2.28.so)
==3563697== by 0x6CDBE72: clone (in /usr/lib64/libc-2.28.so)
==3563697==
Mathew, I am adding you to this issue simply because you are familiar with ckio. But the issue (probably) has to do with the "spring cleaning" garbage collection scheme for broadcasts being applied to deleted chare arrays when it should not be.
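To make the suspected failure mode concrete: the trace shows `CkArray::staticSpringCleaning(void*)` reading a block that `ckDestroy` had already freed, which is what one would expect if a periodic callback keeps a raw pointer to an array manager that has since been destroyed. The sketch below is not Charm++ code. `PeriodicRegistry`, `ArrayManager`, and `firedAfterDestroy` are hypothetical names invented for illustration, and the real fix in Charm++ may look quite different. It only shows one common way to avoid this pattern, assuming the callback can be cancelled when its owner is destroyed.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <utility>

// Hypothetical stand-in for a periodic-callback registry.
// This is NOT the Converse/Charm++ API; it only models the idea.
class PeriodicRegistry {
public:
    using Handle = int;

    // Register a callback and return a handle that can cancel it later.
    Handle add(std::function<void()> fn) {
        Handle h = next_++;
        callbacks_[h] = std::move(fn);
        return h;
    }

    // Remove a callback so it can never fire again.
    void cancel(Handle h) { callbacks_.erase(h); }

    // Fire every callback that is still registered; returns how many ran.
    std::size_t raise() {
        std::size_t fired = 0;
        // Iterate over a copy so a callback may cancel itself while running.
        auto snapshot = callbacks_;
        for (auto& kv : snapshot) {
            if (callbacks_.count(kv.first) != 0) {
                kv.second();
                ++fired;
            }
        }
        return fired;
    }

private:
    Handle next_ = 0;
    std::map<Handle, std::function<void()>> callbacks_;
};

// Hypothetical array manager that schedules a periodic cleaning pass.
class ArrayManager {
public:
    explicit ArrayManager(PeriodicRegistry& reg) : reg_(reg) {
        // The callback captures `this`, so it must not outlive the object.
        handle_ = reg_.add([this] { springCleaning(); });
    }

    // Cancelling here is the point of the sketch: without it, the registry
    // would later call into freed memory, as in the valgrind report.
    ~ArrayManager() { reg_.cancel(handle_); }

    ArrayManager(const ArrayManager&) = delete;
    ArrayManager& operator=(const ArrayManager&) = delete;

    int cleanings() const { return cleanings_; }

private:
    void springCleaning() { ++cleanings_; }

    PeriodicRegistry& reg_;
    PeriodicRegistry::Handle handle_ = 0;
    int cleanings_ = 0;
};

// Returns the number of callbacks that fire after the manager is destroyed.
std::size_t firedAfterDestroy() {
    PeriodicRegistry reg;
    {
        ArrayManager mgr(reg);
        reg.raise();  // fires once while mgr is alive
    }                 // mgr destroyed here; destructor cancels the callback
    return reg.raise();
}
```

With the destructor cancelling the registration, `firedAfterDestroy()` reports that nothing runs against the dead object. An alternative under the same assumptions would be for the callback to check a liveness flag before touching the object.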
This crash can be reproduced on Stampede3 by compiling ChaNGa, changing to the "teststep" directory, and running:
../ChaNGa.smp ++ppn 12 -n 1000 -oi 10 +setcpuaffinity +commap 0,1 +pemap 2-46:2,3-47:2 -binout 6 test_pg.param
The program runs for about 800 seconds before crashing.