Model crashing in winter months #384

Open
mpanagi opened this issue Feb 22, 2019 · 23 comments

Comments

mpanagi commented Feb 22, 2019

I am running the model with the following parameters:

2000 number of steps
900 step size (seconds)
2 species interpolation method (pw constant = 1, pw linear = 2)
2 conditions interpolation method (pw constant = 1, pw linear = 2)
3600 rates output step size (seconds)
0 model start time (seconds)
0 jacobian output step size (seconds)
39.975 latitude (degrees)
-116.377 longitude (degrees)
01 day
05 month
2016 year
1800 reaction rates output step size (seconds)

and it works fine, but when I change the month to 10, 11 or 12 the model crashes.

Any ideas why this is happening?
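For orientation, here is a minimal sketch of how long a run these parameters describe. It assumes that the model start time counts seconds from midnight of the configured start date; the names `run_window` and `TOTAL_S` are made up for this illustration and are not part of the model.

```python
from datetime import datetime, timedelta

# Values copied from the model.parameters listing above.
N_STEPS = 2000
STEP_SIZE_S = 900
MODEL_START_S = 0

TOTAL_S = N_STEPS * STEP_SIZE_S  # total simulated time in seconds


def run_window(year, month, day):
    """Return (start, end) of the simulated period as datetimes.

    Assumption: model start time is measured from midnight of the start date.
    """
    start = datetime(year, month, day) + timedelta(seconds=MODEL_START_S)
    return start, start + timedelta(seconds=TOTAL_S)


print(f"total simulated time: {TOTAL_S} s = {TOTAL_S / 86400:.2f} days")
for month in (5, 10, 11, 12):
    start, end = run_window(2016, month, 1)
    print(f"month {month:02d}: {start:%Y-%m-%d %H:%M} -> {end:%Y-%m-%d %H:%M}")
```

Under that assumption each run spans a little under 21 days from the first of the month, so a start in month 10, 11 or 12 stays entirely within autumn and winter.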

Collaborator

spco commented Feb 25, 2019

Hi @mpanagi - could you tell me whether you are running the latest master version? Which .fac file are you running, and are all the other inputs (e.g. in model/configuration) the original ones? What output do you see, including any error message?

rs028 added the bug label Feb 25, 2019

Author

mpanagi commented Mar 5, 2019

Hi,

I'm using mcm_example.fac because I am only running for photolysis rates, and all the other inputs are the originals.

The output looks the same, but the model doesn't go through all the dates that I set it up to run. It crashes very early on.
For example, below is the run with the error:

......
time = 85500
time = 86400

[CVODE ERROR] CVode
At t = 86450.5 and h = 0.00209612, the error test failed repeatedly or with |h| = hmin.

ier POST FCVODE()= -3
time = 86451

SUNDIALS_ERROR: FCVODE() returned ier = -3
Linear Solver returned ier = 0

Thank you
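As a reading aid for the log above, the sketch below pulls the failure time, step size and return flag out of the message. The flag names are the CVODE constants from the SUNDIALS headers (only a subset is listed); the function name `parse_failure` and the embedded `LOG` string are illustrative assumptions, not part of AtChem2.

```python
import re

# Subset of CVODE return flags as named in the SUNDIALS headers.
CVODE_FLAGS = {
    0: "CV_SUCCESS",
    -1: "CV_TOO_MUCH_WORK",
    -2: "CV_TOO_MUCH_ACC",
    -3: "CV_ERR_FAILURE",
    -4: "CV_CONV_FAILURE",
}

# Log excerpt copied from the comment above.
LOG = """[CVODE ERROR] CVode
At t = 86450.5 and h = 0.00209612, the error test failed repeatedly or with |h| = hmin.
SUNDIALS_ERROR: FCVODE() returned ier = -3
"""


def parse_failure(log):
    """Extract failure time t, step size h and the flag name from a log excerpt."""
    where = re.search(r"At t = ([-+0-9.eE]+) and h = ([-+0-9.eE]+),", log)
    flag = re.search(r"FCVODE\(\) returned ier =\s*(-?\d+)", log)
    ier = int(flag.group(1))
    return {
        "t": float(where.group(1)),
        "h": float(where.group(2)),
        "ier": ier,
        "flag": CVODE_FLAGS.get(ier, "unknown"),
    }


failure = parse_failure(LOG)
print(failure)
print(f"seconds past the first 86400 s: {failure['t'] - 86400:.1f}")
```

The flag -3 corresponds to `CV_ERR_FAILURE`, which matches the text of the message: the local error test kept failing even as the integrator shrank its step.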

Collaborator

spco commented Mar 5, 2019

Hmm, I've tried this just now on my Mac, with a clean install of the latest master 1c074ab, and I can't reproduce this with 10, 11, or 12 month. Could you confirm which commit of the master branch you're using? (Just give me the first line of git log)? Could you also supply a copy of model/output/reactionRates/1800 and model/output/reactionRates/84600 please? I'd like to dig into whether we're really running the same thing.

Author

mpanagi commented Mar 7, 2019

I will try to install it again and see whether that was the problem.

I will let you know soon.

Collaborator

rs028 commented Mar 8, 2019

I have also tested it (using tools/mcm_example.fac and default settings on everything except model.parameters) and had no problem.

Collaborator

rs028 commented Apr 3, 2019

Update: if DEC = CALC in model/configuration/environmentVariables.config then the model crashes for me as well.

This is probably a numerical problem with the calculation of DEC.
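To illustrate what changes between the working and failing start dates, here is a rough sketch of solar declination over the year. It uses Cooper's textbook approximation, which is an assumption made for illustration and is not taken from the AtChem2 source; the function names are hypothetical too. It shows only that the declination is positive on 1 May and negative on the first of months 10 to 12; it does not diagnose the crash.

```python
import math
from datetime import date

LATITUDE_DEG = 39.975  # from model.parameters in the original report


def declination_deg(d):
    """Solar declination in degrees from Cooper's approximation (not AtChem2's formula)."""
    day_of_year = d.timetuple().tm_yday
    return 23.45 * math.sin(2.0 * math.pi * (284 + day_of_year) / 365.0)


def noon_cos_zenith(lat_deg, dec_deg):
    """Cosine of the solar zenith angle at local solar noon."""
    return math.cos(math.radians(lat_deg - dec_deg))


for month in (5, 10, 11, 12):
    d = date(2016, month, 1)
    dec = declination_deg(d)
    print(f"{d}: declination = {dec:6.2f} deg, "
          f"noon cos(zenith) = {noon_cos_zenith(LATITUDE_DEG, dec):.3f}")
```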

Collaborator

rs028 commented May 19, 2019

@spco We may actually need to sort this out to complete the model runs for the paper. Are you in a position to take a look and see if it is a quick fix? If not, we'll figure something out.

Collaborator

spco commented May 21, 2019

I can't immediately reproduce this. I take the latest master (bc4cd2c), change model/configuration/model.parameters to match @mpanagi's, and set DEC to CALC in model/configuration/environmentVariables.config. I get:

 ...
 time = 1796400
 time = 1797300
 time = 1798200
 time = 1799100
 time = 1800000

------------------
 Final statistics
------------------
 No. steps = 22354   No. f-s = 25773   No. J-s = 30759   No. LU-s = 3247
 No. nonlinear iterations = 25771
 No. nonlinear convergence failures = 0
 No. error test failures = 1187

 Runtime = 16
 Deallocating memory.
Note: The following floating-point exceptions are signalling: IEEE_UNDERFLOW_FLAG IEEE_DENORMAL

The FPE warnings are often seen. They are something we should investigate in time, but they are not a big issue.

This is on my Mac - I will retry on Alice.
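A small sketch for reading the final statistics block above: it collects the counters into a dictionary and reports error test failures per internal step. The helper name `parse_stats` is hypothetical, and the ratio is only a rough indicator of how hard the integrator is working, not a metric defined by AtChem2 or CVODE.

```python
import re

# Final statistics copied from the run above.
STATS = """ No. steps = 22354   No. f-s = 25773   No. J-s = 30759   No. LU-s = 3247
 No. nonlinear iterations = 25771
 No. nonlinear convergence failures = 0
 No. error test failures = 1187
"""


def parse_stats(text):
    """Map each 'No. <name> = <count>' entry to an integer counter."""
    pairs = re.findall(r"No\. ([A-Za-z\- ]+?) = (\d+)", text)
    return {name.strip(): int(value) for name, value in pairs}


stats = parse_stats(STATS)
ratio = stats["error test failures"] / stats["steps"]
print(stats)
print(f"error test failures per step: {ratio:.4f}")
```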

Collaborator

spco commented May 21, 2019

For reference, the output of the above run starts with

AtChem2 v1.2-dev

-------------
 Directories
-------------
 Model dir is: model
 Output dir is: model/output
 Configuration dir is: model/configuration
 MCM dir is: mcm
 Species Constraints dir is: model/constraints/species
 Environment Constraints dir is: model/constraints/environment
 Photolysis Constraints dir is: model/constraints/photolysis
 Shared library is: model/configuration/mechanism.so

-----------------------
 Species and reactions
-----------------------
 Number of Species   = 29
 Number of Reactions = 71

 Size of lhs = 114
 Size of rhs = 100

 Reading reactants (lhs) from mechanism.reac...
 Reading products (rhs) from mechanism.prod...
 Finished reading lhs and rhs data.

 Reading species names from mechanism.species...
 Finished reading species names.

 Reading initial concentrations...
      1 CH4              4.900E+13
 ...
      4 NO2              2.400E+11
 Finished reading initial concentrations.

----------------------------------------
 Species requiring detailed rate output
----------------------------------------
 Reading which species require detailed rate output...
      1 OH        
      2 HO2       
 Finished reading which species require detailed rate output.
 Species requiring detailed rate output (number of species found): 2

 Reading ro2 numbers from mechanism.ro2...
 Finished reading ro2 numbers.
 Reading solver parameters from file...
 ------------------
 Solver parameters:
 ------------------
            atol:   1.000E-03
            rtol:   1.000E-04
       deltaMain:   1.000E-04
        lookBack:         100
         maxStep:   1.000E+02
 preconBandUpper:         750
 preconBandLower:         750
      solverType: SPGMR + Banded Preconditioner 
 ------------------

 Finished reading solver parameters from file.

 Reading model parameters from file...
 -----------------
 Model parameters:
 -----------------
                                   number of steps:        2000
                               step size (seconds):   0.900E+03
                      species interpolation method: piecewise linear 
                   conditions interpolation method: piecewise linear 
                               ratesOutputStepSize:        3600
                   reaction rates output step size:        1800
                                    modelStartTime:           0
                            jacobianOutputStepSize:           0
                                          latitude:   0.400E+02
                                         longitude:  -0.116E+03
                                    day/month/year:   1/ 5/2016
 -----------------

 Finished reading model parameters from file.

---------------------
 Species of Interest
---------------------
 Reading concentration output from file...
 Finished reading concentration output from file.
 Output required for concentration of 8 species:
      1 CH4       
 ...
      8 CH3O2     

------------
 Photolysis
------------
 Reading photolysis numbers from file...
      1
 ...
     61
 Finished reading photolysis numbers.
 Number of photolysis numbers: 35
 Looking for photolysis constants file...
 Checking that photolysis constants exist in file...
 Photolysis constants file is empty.
 No photolysis constants applied, so trying constrained photolysis rates file...
 Looking for photolysis constraints file...
 Checking that photolysis constraints exist in file...
 Photolysis constraint file is empty, so all photolysis rates will be calculated.
 Reading all photolysis rates from file...
      1      6.073E-05      1.743E+00      4.740E-01    J1      1.000E+00
 ...
     61      7.537E-04      4.990E-01      2.660E-01   J61      1.000E+00
 Finished reading all photolysis rates.
 Number of all photolysis rates: 35


-----------------------
 Environment variables
-----------------------
 Reading environment variables...
 Number of environment variables: 10
 1      TEMP                   298.15
 2      PRESS                 1013.25
 3      RH                    NOTUSED
 5      H2O                  3.91e+17
 5      DEC                      CALC
 6      BLHEIGHT              NOTUSED
 7      DILUTE                NOTUSED
 8      JFAC                  NOTUSED
 9      ROOF                     OPEN
 Finished reading environment variables.

 Checking for constrained environment variables...
 Finished checking for constrained environment variables.


-------------
 Constraints
-------------
 Counting the variable-concentration species to be constrained (in file speciesConstrained.config)...
 Finished counting the names of variable-concentration constrained species.
 Number of names of variable-concentration constrained species: 0
 Counting the fixed-concentration species to be constrained (in file speciesConstant.config)...
 Finished counting the names of fixed-concentration constrained species.
 Number of names of fixed-concentration constrained species: 0
 Setting size of constraint arrays, n = 0
 Skipped reading the names of variable-concentration constrained species
 Reading concentration data for variable-concentration constrained species...
 Reading in the names and concentration of the fixed constrained species (in file speciesConstant.config)...
 Finished reading in the names and concentration of fixed-concentration species.
 Finished reading constrained species.
 Initialising concentrations of constrained species...
 Finished initialising concentrations of constrained species.

---------------
 Problem stats
---------------
                        neq = 29
 numberOfConstrainedSpecies = 0
                         t0 =       0.000E+00

 setting maxnumsteps ier = 0
 setting maxstep ier = 0

If either of you is able to post the full starting output of your runs in a similar way, that might highlight the difference if any.
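One way to compare two startup outputs is a line-by-line diff. The sketch below uses Python's standard `difflib`; the function name `diff_startup` and the two short excerpts are illustrative assumptions (the second excerpt simply changes the month to 10).

```python
import difflib


def diff_startup(mine, theirs):
    """Return only the changed lines between two startup logs."""
    a = [line.rstrip() for line in mine.splitlines()]
    b = [line.rstrip() for line in theirs.splitlines()]
    lines = list(difflib.unified_diff(a, b, "mine", "theirs", lineterm="", n=0))
    # Skip the two file header lines, then drop the '@@' hunk markers.
    return [line for line in lines[2:] if not line.startswith("@@")]


MINE = """\
 -----------------
                           latitude:   0.400E+02
                     day/month/year:   1/ 5/2016
 -----------------
"""

THEIRS = """\
 -----------------
                           latitude:   0.400E+02
                     day/month/year:   1/10/2016
 -----------------
"""

changes = diff_startup(MINE, THEIRS)
for line in changes:
    print(line)
```

Lines prefixed with `-` come from the first log and lines prefixed with `+` from the second, so a mismatch in the date or in a tolerance shows up immediately.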

Collaborator

spco commented May 21, 2019

Ok, I've tried with a clean build on Alice too. Still no luck in reproducing this. @rs028, do you get the same error as @mpanagi when you run?

What am I missing, beyond the changes to model/configuration/{environmentVariables.config,model.parameters}, to reproduce this?

Using the system gfortran:

gfortran --version
GNU Fortran (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36)

Collaborator

spco commented May 21, 2019

@rs028, @mpanagi please ignore the above - I wasn't changing the month to 10/11/12. I can reproduce this on both machines now. I'll see what I can deduce, although it looks like it may just be ill-conditioning 😟

spco mentioned this issue May 21, 2019

Collaborator

spco commented May 21, 2019

So, the way to get around this seems to be to raise the rtol value in solver.parameters. After changing this value from 1.0e-04 to 1.0e-02, I can happily run the affected example to completion.

Whether this is the right thing to do is a good question, but I'm not sure what the answer is. As I don't remember there being a particular rationale behind the currently chosen default of 1.0e-04, it may be that we're just being far too fussy here with our requirements.

A very brief test didn't highlight any particular differences to the output that looked noteworthy, but I have not investigated in detail.

So, in short, I think this is just because rtol = 1.0e-04 is too stringent a requirement, and it's fine to loosen it.

As to why this only happens in the winter months, I'm not quite sure, but somewhere in our calculations our system gets a bit ill-conditioned because of DEC or something in that realm. Relaxing our tolerances allows us to ride smoothly over the difficult parts, and (hopefully) this has no real-world effect. 😄
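A more systematic version of the brief output check mentioned above could look like the sketch below. It assumes each run's output is available as a whitespace-delimited table with a header row; the column names and the numbers in `TIGHT` and `LOOSE` are made-up placeholders, not results from the model.

```python
def read_table(text):
    """Parse a whitespace-delimited table with one header row into (names, rows)."""
    rows = [line.split() for line in text.strip().splitlines()]
    header, body = rows[0], rows[1:]
    return header, [[float(x) for x in row] for row in body]


def max_relative_difference(text_a, text_b):
    """Largest relative difference per column between two runs."""
    header_a, a = read_table(text_a)
    header_b, b = read_table(text_b)
    if header_a != header_b or len(a) != len(b):
        raise ValueError("runs do not share the same columns and output times")
    worst = {}
    for j, name in enumerate(header_a):
        diffs = []
        for row_a, row_b in zip(a, b):
            scale = max(abs(row_a[j]), abs(row_b[j]))
            diffs.append(abs(row_a[j] - row_b[j]) / scale if scale > 0 else 0.0)
        worst[name] = max(diffs) if diffs else 0.0
    return worst


# Placeholder data only: two rows of a hypothetical concentration output.
TIGHT = "t OH\n0 1.00e6\n900 2.00e6"
LOOSE = "t OH\n0 1.00e6\n900 2.02e6"

worst = max_relative_difference(TIGHT, LOOSE)
print(worst)
```

Comparing the largest per-species difference against the chosen rtol gives a first impression of whether the looser tolerance changes the output in a way that matters.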

Collaborator

rs028 commented May 21, 2019

Interesting. Thanks a bunch for looking into this. I am not too familiar with the inner workings of CVODE, tbh, so I am not really sure what (if anything) it means for the model output.

Should we consider this resolved or leave it open for a deeper look in the future (in this case it should probably be renamed)?

Collaborator

spco commented May 22, 2019

I would personally consider it resolved, but perhaps make a note in #265 that this could also feed in - we don't have a handle on what is 'good enough' accuracy of the solver, which plays a part in the numerical stability.

Collaborator

rs028 commented May 22, 2019

Sounds good to me. Thanks a lot for looking into it so quickly.

Author

mpanagi commented Jun 6, 2019

Hi Sam,

Did you mean changing rtol to 1.0e-02?

I ask because I changed atol and that didn't work, but when I changed rtol it worked.

Thanks.
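A plausible reason why atol had no effect while rtol did: CVODE scales the local error of each component by rtol*|y| + atol, so for concentrations as large as those in the example the atol term is negligible. The sketch below assumes that scalar-tolerance form; the concentrations are the initial values printed in the startup output above, and the looser atol of 1.0e-01 is a hypothetical value chosen for illustration.

```python
def allowed_error(y, rtol, atol):
    """Per-component error scale in CVODE's weighted norm: rtol*|y| + atol."""
    return rtol * abs(y) + atol


# Initial concentrations taken from the startup output earlier in the thread.
CONCENTRATIONS = {"CH4": 4.9e13, "NO2": 2.4e11}

DEFAULT = dict(rtol=1.0e-4, atol=1.0e-3)
LOOSER_ATOL = dict(rtol=1.0e-4, atol=1.0e-1)  # hypothetical atol change
LOOSER_RTOL = dict(rtol=1.0e-2, atol=1.0e-3)  # the change suggested above

for name, y in CONCENTRATIONS.items():
    base = allowed_error(y, **DEFAULT)
    print(f"{name}: default = {base:.3e}, "
          f"looser atol = {allowed_error(y, **LOOSER_ATOL):.3e}, "
          f"looser rtol = {allowed_error(y, **LOOSER_RTOL):.3e}")
```

For these species, raising atol leaves the error scale essentially unchanged, while raising rtol by a factor of 100 loosens it by about the same factor. atol would matter only for components whose values are close to zero.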

Collaborator

spco commented Jun 6, 2019

Apologies, @mpanagi you are entirely correct!

Collaborator

spco commented Jun 6, 2019

I will edit the previous comment so as not to confuse later readers - for them, please note that in #384 (comment) I errantly talked about atol rather than rtol.

xcy12 commented Sep 2, 2019

> (quoting @mpanagi's comment of Mar 5, 2019 above, including the CVODE error at t = 86450.5 with ier = -3)

I also ran into this problem.

Collaborator

rs028 commented Sep 4, 2019

@xcy12 did you manage to solve it by changing rtol as suggested?

xcy12 commented Sep 5, 2019

> @xcy12 did you manage to solve it by changing rtol as suggested?

Yes, I tried this solution and it works! Thanks very much!

xcy12 commented Nov 2, 2019

> @xcy12 did you manage to solve it by changing rtol as suggested?

Hi, changing rtol solved the problem in my test with the example mechanism. However, when I use a more complicated mechanism (a mechanism for 67 VOCs), the problem comes back. Is there a new solution for this? Thank you.

[CVODE ERROR] CVode
At t = 690062 and h = 9.65393e-05, the error test failed repeatedly or with |h| = hmin.

ier POST FCVODE()= -3
time = 690062

Collaborator

rs028 commented Nov 12, 2019

Reopening this - with a more general title - as it seems people are still having problems.

rs028 reopened this Nov 12, 2019
rs028 changed the title from "Model crashing during October, November and December" to "Model crashing in winter months" Nov 12, 2019
rs028 added the question label Aug 23, 2021
rs028 moved this to Compilation and Execution Issues in Roadmap Sep 22, 2022
rs028 added this to Roadmap Sep 22, 2022
rs028 added this to the version 1.4 milestone Oct 6, 2022