Add parse_retrieved_files option for the PP plugin. #1029

yakutovicha · 2024-05-22T13:50:34Z

fixes #945

The option enables switching on/off the parsing of the output files produced by the PP plugin.

AndresOrtegaGuerrero · 2024-07-31T13:40:16Z

@yakutovicha I tried testing it but it doesnt work for me i used

from aiida.plugins import CalculationFactory
from aiida import load_profile
import ipywidgets as ipw

load_profile(); # noqa: E402

from aiida import orm
pp_retrieved = orm.load_node(8849) # Replace for your node
PpCalculation = CalculationFactory("quantumespresso.pp")

parameters = {
"INPUTPP": {
"plot_num": 0,
},
"PLOT": {
"iflag": 3,
}
}

pp_parameters = orm.Dict(parameters)

code = orm.load_code('pp-7.2@daint-gpu') #Replace for your comp

builder = code.get_builder()

builder.parameters = pp_parameters
builder.parent_folder = pp_retrieved

builder.metadata.options = {
'resources': {
'num_machines': 1,
"num_mpiprocs_per_machine": 1,
"num_cores_per_mpiproc": 1,
},
#'parse_retrieved_files': False,
'max_wallclock_seconds': 10800,
'withmpi': True,

}

from aiida.engine import run
results, node = run.get_node(builder)

How are you testing this PR ?

AndresOrtegaGuerrero · 2024-07-31T13:41:01Z

@yakutovicha could you update the PR? , somehow some test are failing maybe with the update is fixed

AndresOrtegaGuerrero · 2024-07-31T16:09:07Z

@yakutovicha I tried testing it but it doesnt work for me i used

from aiida.plugins import CalculationFactory from aiida import load_profile import ipywidgets as ipw

load_profile(); # noqa: E402

from aiida import orm pp_retrieved = orm.load_node(8849) # Replace for your node PpCalculation = CalculationFactory("quantumespresso.pp")

parameters = { "INPUTPP": { "plot_num": 0, }, "PLOT": { "iflag": 3, } }

pp_parameters = orm.Dict(parameters)

code = orm.load_code('pp-7.2@daint-gpu') #Replace for your comp

builder = code.get_builder()

builder.parameters = pp_parameters builder.parent_folder = pp_retrieved

builder.metadata.options = { 'resources': { 'num_machines': 1, "num_mpiprocs_per_machine": 1, "num_cores_per_mpiproc": 1, }, #'parse_retrieved_files': False, 'max_wallclock_seconds': 10800, 'withmpi': True,

}

from aiida.engine import run results, node = run.get_node(builder)

How are you testing this PR ?

The solution for this submission code was to add a label to the metadata , I check the code with the main branch and ppcalculation runs, but testing the PR it doesnt

src/aiida_quantumespresso/calculations/pp.py

src/aiida_quantumespresso/parsers/pp.py

sphuber · 2024-08-26T15:03:14Z

@yakutovicha what is the use case for not wanting to parse output files?

yakutovicha · 2024-08-27T07:41:40Z

@yakutovicha what is the use case for not wanting to parse output files?

Sometimes we do not want to save cube files as they are too big.

sphuber

Thanks @yakutovicha

Sometimes we do not want to save cube files as they are too big.

But then what is the point of running the calculation? Are you still using the output files but directly on the scratch? Or is there still some other partial information that is parsed that is of use?

Regardless of the use case, I wonder if we could come up with a better name for the option. The current name seems to suggest that there is no parsing going on, but that is not the case. What the option is disabling is the parsing/storing of any output files besides the stdout, which is still parsed regardless of the value for this bew input option.

In the code they are referred to as "output data files", so perhaps we can use something like:

spec.input('metadata.options.store_parsed_data_files', valid_type=bool, default=True, help='When set to `False`, only the stdout is retrieved and parsed. All other produced data files are not retrieved, parsed or stored.')

sphuber · 2024-08-27T09:10:49Z

src/aiida_quantumespresso/parsers/pp.py

-            self.out('output_data', data_parsed[0][1])
-        else:
-            self.out('output_data_multiple', dict(data_parsed))
+        if self.node.base.attributes.get('parse_retrieved_files', True):


Instead of moving this entire block in a conditional, we can simply exit early. All the code starting from line 67 is only necessary to parse the additional output data. So I propose we simply add

if not self.node.base.attributes.get('parse_retrieved_files', True): return self.exit(logs=logs)

after line 65.

sphuber · 2024-08-27T09:11:46Z

src/aiida_quantumespresso/calculations/pp.py

@@ -83,6 +83,7 @@ def define(cls, spec):
        spec.input('metadata.options.parser_name', valid_type=str, default='quantumespresso.pp')
        spec.input('metadata.options.withmpi', valid_type=bool, default=True)
        spec.input('metadata.options.keep_plot_file', valid_type=bool, default=False)
+        spec.input('metadata.options.parse_retrieved_files', valid_type=bool, default=True)


Please add a docstring as to what this does

yakutovicha · 2024-08-27T13:30:54Z

But then what is the point of running the calculation? Are you still using the output files but directly on the scratch? Or is there still some other partial information that is parsed that is of use?

That depends on the specific needs of a user. We develop tools allowing them to download and look at low-res copies of the cube files to then decide how to render the original. Another alternative is to stash the file for further use.

In the original comment I should have written "... to save cube files locally as they are too big".

yakutovicha · 2024-08-27T13:58:03Z

Regardless of the use case, I wonder if we could come up with a better name for the option. The current name seems to suggest that there is no parsing going on, but that is not the case. What the option is disabling is the parsing/storing of any output files besides the stdout, which is still parsed regardless of the value for this bew input option.

I understand the name is confusing, but the option I add is really about parsing. Essentially, 2 options affect the parsing/storage of the mesh files: the existing keep_plot_file and the new parse_retrieved_files. Ultimately, you have 4 combinations:

parse and keep
do not parse, keep
parse, do not keep
do not parse and do not keep

In the 4th case, the mesh files are not even downloaded.

I guess we can call the option parse_plot_file to be compatible with the previous option. What do you think?

sphuber · 2024-08-29T08:47:29Z

I understand the name is confusing, but the option I add is really about parsing. Essentially, 2 options affect the parsing/storage of the mesh files: the existing keep_plot_file and the new parse_retrieved_files. Ultimately, you have 4 combinations:

Thanks @yakutovicha , that clarifies things a lot.

I guess we can call the option parse_plot_file to be compatible with the previous option. What do you think?

I think that would already help things a lot, since it makes it immediately clear that these options affect the same files.

Unfortunately, the naming of the existing keep_plot_file is still a bit misleading since it seems to suggest it concerns a single output file, but depending on the calculation's inputs, there can be many. Some raw data output files and a file with a script to visualize them. If I were to name these options from scratch, I would go for something like: store_raw_output_data and parse_raw_output_data with them both being True by default. The raw could perhaps be omitted, but I have added it to suggest that the normal stdout and stderr are kept regardless.

I think it makes sense to properly name the options if we are adding one and deprecating keep_plot_file if it makes sense to rename it. Thoughts? Also pinging @ConradJohnston since he chose the original name for the keep_plot_file.

ConradJohnston · 2024-08-29T16:49:24Z

Hi @sphuber,

Thanks for looping me.

4 or 5 years is a long time in git blame but my motivation was probably to mirror the nomenclature of QE PP.x itself, rather than to be an activist about fixing what things are called in QE. But yes, the plot_file (filplot) is the intermediate file that is created that is then converted by PP (internally) into some other format (like cube).

If you guys want to rename, happy for you to go for it, but one needs to make clear the link with the PP.x docs and how PP works. Otherwise, the experienced QE user will just be frustrated by these new mystery AiiDA-QE options that are different to what they've been using for many years.

sphuber · 2024-08-29T17:41:11Z

Very fair point @ConradJohnston . Mirroring the API/naming of the tool itself as close as possible definitely has its benefits. But what I am mostly worried about is that we are not really talking about one file. There is a plot file and a data file in most cases, is there not? At least looking at the calcjob/parser implementation it expects (multiple) pairs of files of filplot and filout. That is why I was proposing to make it plural in any case.

yakutovicha · 2024-09-04T09:59:40Z

To be 100% clear, do we agree on what behaviour we look for? From my perspective, it should be this:

The stdout file is always kept and parsed.
Parsing/keeping all the other pp output files (including plot files, mesh files, etc...) is optional and controlled by 2 parameters (keep_...tbd, parse_...tbd).

If that is the wanted behaviour, I am fine with any naming convention - I have no strong opinion here.

sphuber · 2024-09-04T18:46:55Z

Yes, that is how I currently understand it as well @yakutovicha and that makes sense to me. I would just suggest to choose a name that makes it somewhat clear that it concerns the "data" files and not the stdout, because the latter is always parsed and kept. That is my only gripe with the current naming.

yakutovicha · 2024-10-10T08:01:43Z

@sphuber I renamed the parameters to keep_data_files and parse_data_files since they deal with the data files produced by pp (as opposed to the standard output). Please let me know if it makes sense, otherwise, choose any option you find more reasonable.

The other two things we need to do before merging:

Consider if we need any backwards compatibility here.
Add tests for the

I will do them after we agree on the naming.

sphuber · 2024-10-13T21:00:47Z

Thanks @yakutovicha I am onboard with the latest naming. In terms of backwards compatibility, I think it would be good to keep the old option and simply print a deprecation warning if it is set (set the default to None to check if a value is passed).

Add parse_retrieved_files option for the PP plugin.

0a40d5a

The option enables switching on/off the parsing of the output files produced by the PP plugin.

yakutovicha requested a review from AndresOrtegaGuerrero May 22, 2024 13:50

PNOGillespie mentioned this pull request Jun 6, 2024

XspectraCrystalWorkChain: Enable Symmetry Data Inputs #1028

Merged

AndresOrtegaGuerrero reviewed Jul 31, 2024

View reviewed changes

src/aiida_quantumespresso/calculations/pp.py Outdated Show resolved Hide resolved

AndresOrtegaGuerrero reviewed Aug 1, 2024

View reviewed changes

src/aiida_quantumespresso/parsers/pp.py Outdated Show resolved Hide resolved

yakutovicha added 3 commits August 26, 2024 14:38

Fix wrong identation.

e5c8200

Merge branch 'main' into fix/945-if-requested-do-not-produce-output-data

2725431

Fix checking the value of the parse_retrieved_files attribure.

ef05535

yakutovicha requested review from mbercx and sphuber August 26, 2024 12:52

sphuber requested changes Aug 27, 2024

View reviewed changes

yakutovicha added 2 commits October 10, 2024 09:56

renamings

14c098d

Merge branch 'main' into fix/945-if-requested-do-not-produce-output-data

5a80c57

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add parse_retrieved_files option for the PP plugin. #1029

Add parse_retrieved_files option for the PP plugin. #1029

yakutovicha commented May 22, 2024

AndresOrtegaGuerrero commented Jul 31, 2024

AndresOrtegaGuerrero commented Jul 31, 2024

AndresOrtegaGuerrero commented Jul 31, 2024

sphuber commented Aug 26, 2024

yakutovicha commented Aug 27, 2024

sphuber left a comment

sphuber Aug 27, 2024

sphuber Aug 27, 2024

yakutovicha commented Aug 27, 2024 •

edited

Loading

yakutovicha commented Aug 27, 2024 •

edited

Loading

sphuber commented Aug 29, 2024

ConradJohnston commented Aug 29, 2024

sphuber commented Aug 29, 2024

yakutovicha commented Sep 4, 2024 •

edited

Loading

sphuber commented Sep 4, 2024

yakutovicha commented Oct 10, 2024

sphuber commented Oct 13, 2024

Add parse_retrieved_files option for the PP plugin. #1029

Are you sure you want to change the base?

Add parse_retrieved_files option for the PP plugin. #1029

Conversation

yakutovicha commented May 22, 2024

AndresOrtegaGuerrero commented Jul 31, 2024

AndresOrtegaGuerrero commented Jul 31, 2024

AndresOrtegaGuerrero commented Jul 31, 2024

sphuber commented Aug 26, 2024

yakutovicha commented Aug 27, 2024

sphuber left a comment

Choose a reason for hiding this comment

sphuber Aug 27, 2024

Choose a reason for hiding this comment

sphuber Aug 27, 2024

Choose a reason for hiding this comment

yakutovicha commented Aug 27, 2024 • edited Loading

yakutovicha commented Aug 27, 2024 • edited Loading

sphuber commented Aug 29, 2024

ConradJohnston commented Aug 29, 2024

sphuber commented Aug 29, 2024

yakutovicha commented Sep 4, 2024 • edited Loading

sphuber commented Sep 4, 2024

yakutovicha commented Oct 10, 2024

sphuber commented Oct 13, 2024

yakutovicha commented Aug 27, 2024 •

edited

Loading

yakutovicha commented Aug 27, 2024 •

edited

Loading

yakutovicha commented Sep 4, 2024 •

edited

Loading