Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#36 add DM support to field_checksum() and turn off field tiling and allow for that in copy_field() #37

Merged
merged 11 commits into from
Apr 1, 2020

Conversation

arporter
Copy link
Member

Small fix that corrects use of tiling in the copy_field() routine.

@arporter arporter requested review from hiker and sergisiso March 20, 2020 15:02
@arporter
Copy link
Member Author

Ready for review. This is required in order to fix the manual, distributed-memory version of nemolite2d.

Copy link
Collaborator

@sergisiso sergisiso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arporter I made a superficial read through the changes and add a couple of in-line comments.

Does this solve the manual MPI implementation in PSycloneBench without any change in that repository? If that's the case I will give it a go bringing the submodule to this branch to double check it works.

finite_difference/src/field_mod.f90 Show resolved Hide resolved
do jj= field_out%tile(it)%whole%ystart, field_out%tile(it)%whole%ystop
do ji = field_out%tile(it)%whole%xstart, field_out%tile(it)%whole%xstop
field_out%data(ji,jj) = field_in%data(ji,jj)
do it = 1, field_out%ntiles, 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

!$OMP private(it,ji,jj) ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought loop variables were private by default but I've just checked and that only applies to the loop being parallelised. I've therefore fixed this and checked that it compiles OK with OpenMP enabled (which it didn't but I've sorted that). In doing that I've realised that we don't in fact provide any support for building dl_esm_inf with OpenMP enabled so I've created a new issue (#38) in case we care in the future.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am OK deferring this to #38, but now that you have added the DEFAULT(none) clause I assume the 'loop variables are private' is not necessarily true anymore.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the standard specifies that the loop variable is private and therefore it's not affected by the DEFAULT(none) clause.

@arporter
Copy link
Member Author

Thanks @sergisiso. Yes, it does solve the DM problem without any changes to PSycloneBench (stfc/PSycloneBench#20).

@arporter
Copy link
Member Author

Thanks for spotting the errors Sergi. Ready for another look if Travis is happy...

@arporter
Copy link
Member Author

Given that this is a fix for the DM version of NEMOLite2D, @sergisiso has also suggested we add DM support into the checksum routine. I'll do that as part of this PR.

@arporter arporter changed the title #36 turn off field tiling and allow for that in copy_field() #36 add DM support to field_checksum() and turn off field tiling and allow for that in copy_field() Mar 27, 2020
@arporter
Copy link
Member Author

@sergisiso would you mind trying this again with the Intel compiler and nemolite2d? I'm getting an ICE with gfortran :-(

@sergisiso
Copy link
Collaborator

sergisiso commented Mar 27, 2020

I compiled it with gfortran without any issues and it runs fine: ua and uv show the global value (although the checksum still doesn't return the same value for executions with different number of processors). Which file do you have the ICE on?

@arporter
Copy link
Member Author

It's in time_step_mod.f90:

mpif90 -O0 -g  -I../../common -I/home/kbc59144/Projects/dl_esm_inf/finite_difference/src -I../../../../../shared/dl_timer/src -c time_step_mod.f90
f951: internal compiler error: in read_module, at fortran/module.c:5096

@arporter
Copy link
Member Author

I've just managed to build it OK on my desktop so I must have messed something up on my laptop (or there's a bug related to the specific version of gfortran that I have).

@arporter
Copy link
Member Author

There was also a bug in the decomposition code that affected decompositions consisting of more than one row of PEs. Things are working now though:

nemolite2d_8pes

@arporter
Copy link
Member Author

This is ready for another look @sergisiso.

Copy link
Collaborator

@sergisiso sergisiso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See some in-line comments. The global_sum and its new test looks good.

I still have some problems with the compiler. When compiling without MPI, using just gfortran but with the -fopenmp flag I am getting the following error (note this is needed for the manual_verison/psykal_omp of nemolite2d):

gfortran -Wall -Wsurprising -Wuninitialized -faggressive-function-elimination -Ofast -mtune=native -finline-limit=50000 -fopt-info-all=gnu_opt_report.txt -march=core2 -mtune=core2 -ffree-line-length-none  -fopenmp -c /home/sergi/workspace/euroexa/PSycloneBench/shared/dl_esm_inf/finite_difference/src//grid_mod.f90
/home/sergi/workspace/euroexa/PSycloneBench/shared/dl_esm_inf/finite_difference/src//grid_mod.f90:353:0:

           do ji = xstart-1, xstop+1

Error: ‘xstart’ not specified in enclosing ‘parallel’
/home/sergi/workspace/euroexa/PSycloneBench/shared/dl_esm_inf/finite_difference/src//grid_mod.f90:353:0: Error: enclosing ‘parallel’
/home/sergi/workspace/euroexa/PSycloneBench/shared/dl_esm_inf/finite_difference/src//grid_mod.f90:353:0: Error: ‘xstop’ not specified in enclosing ‘parallel’
/home/sergi/workspace/euroexa/PSycloneBench/shared/dl_esm_inf/finite_difference/src//grid_mod.f90:353:0: Error: enclosing ‘parallel’

This again can be solved by removing the default(none) clause or setting the shared() variables. but I am a bit puzzled as this was already on master and I didn't had this issue before.

Finally the manual_verison/psykal_dm still gives a different checksum for rank > 4 compared to the serial version. If the other mentioned issues are fixed I can approve this PR but we still have to find the cause of PsycloneBench/20 issue.

finite_difference/tests/dist_mem/Makefile Outdated Show resolved Hide resolved
do jj= field_out%tile(it)%whole%ystart, field_out%tile(it)%whole%ystop
do ji = field_out%tile(it)%whole%xstart, field_out%tile(it)%whole%xstop
field_out%data(ji,jj) = field_in%data(ji,jj)
do it = 1, field_out%ntiles, 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am OK deferring this to #38, but now that you have added the DEFAULT(none) clause I assume the 'loop variables are private' is not necessarily true anymore.

@arporter
Copy link
Member Author

arporter commented Apr 1, 2020

See some in-line comments. The global_sum and its new test looks good.

I still have some problems with the compiler. When compiling without MPI, using just gfortran but with the -fopenmp flag I am getting the following error (note this is needed for the manual_verison/psykal_omp of nemolite2d):

gfortran -Wall -Wsurprising -Wuninitialized -faggressive-function-elimination -Ofast -mtune=native -finline-limit=50000 -fopt-info-all=gnu_opt_report.txt -march=core2 -mtune=core2 -ffree-line-length-none  -fopenmp -c /home/sergi/workspace/euroexa/PSycloneBench/shared/dl_esm_inf/finite_difference/src//grid_mod.f90
/home/sergi/workspace/euroexa/PSycloneBench/shared/dl_esm_inf/finite_difference/src//grid_mod.f90:353:0:

           do ji = xstart-1, xstop+1

Error: ‘xstart’ not specified in enclosing ‘parallel’
/home/sergi/workspace/euroexa/PSycloneBench/shared/dl_esm_inf/finite_difference/src//grid_mod.f90:353:0: Error: enclosing ‘parallel’
/home/sergi/workspace/euroexa/PSycloneBench/shared/dl_esm_inf/finite_difference/src//grid_mod.f90:353:0: Error: ‘xstop’ not specified in enclosing ‘parallel’
/home/sergi/workspace/euroexa/PSycloneBench/shared/dl_esm_inf/finite_difference/src//grid_mod.f90:353:0: Error: enclosing ‘parallel’

This again can be solved by removing the default(none) clause or setting the shared() variables. but I am a bit puzzled as this was already on master and I didn't had this issue before.

I too don't get this problem but then I see from the Makefile that we don't build dl_esm_inf with OpenMP enabled - just the mini-app itself. Therefore we will only build dl_esm_inf with OpenMP if we set F90FLAGS to include "-fopenmp" before we make. Given that we don't have explicit support in dl_esm_inf for building with OpenMP I think we should leave this for the moment.

@arporter
Copy link
Member Author

arporter commented Apr 1, 2020

I've built the psykal_dm version as a serial executable and compared the checksums with the DM version (on 1, 4 and 6 PEs) and with the psykal_serial version. They all seem to agree. Did you make sure your namelists matched up or am I missing something?

@sergisiso
Copy link
Collaborator

@arporter I tried again with the current HEAD of the branch and the same namelist I was using and a few others and now I get the same checksums independently of the rank size. It was probably me messing something up trying to force the OMP flag into the compilation.

So if you can bring FortCL submodule to master I am happy to merge the PR.

@arporter
Copy link
Member Author

arporter commented Apr 1, 2020

@sergisiso Ooh, I've made it go faster? That feels good :-) Ready for another look now.

Copy link
Collaborator

@sergisiso sergisiso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All looks good now. I did remove the performance comment because I realized I was reporting wrong times, but the performance degradation I observed some time ago is now gone.

@sergisiso sergisiso merged commit 08124ca into master Apr 1, 2020
@sergisiso sergisiso deleted the 36_fix_field_copy branch April 1, 2020 14:28
@sergisiso sergisiso mentioned this pull request Apr 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants