Facets warning appears even when facets are specified #93

Open
onnyyonn opened this issue Aug 31, 2023 · 5 comments

Comments

@onnyyonn

I am searching for all the files matching a given list of parameters. Even though I specify facets when creating a context with new_context, I still get the facets warning. In case it matters, I am creating a number of new contexts inside loops. Here is an example code snippet of what I am trying to do:

from pyesgf.search import SearchConnection
import json, os, operator, itertools, tqdm

query_params = {"latest": True,
                "mip_era": "CMIP6",
                "activity_id": "HighResMIP",
                "realm": "atmos",
                "frequency": "6hr",
                }
variables = ['ua', 'uas', 'va', 'vas', 'psl', 'zg']
experiment_ids = ['highres-future', 'hist-1950']
source_ids = ["MPI-ESM1-2-XR", "MPI-ESM1-2-HR"]

def search_esgf(source_id, experiment_id, variable, query_params, savefile):
    try:
        ctx = conn.new_context(source_id=source_id, experiment_id=experiment_id, variable=variable, facets='source_id', **query_params)
        results = ctx.search()
        
        # Convert search results into a list of filename and download url
        files = []
        for i in range(0, len(results)):
            files.extend(list(map(lambda f : {'filename': f.filename, 'url': f.download_url, 'size': f.size, 'checksum': f.checksum, 'checksum_type': f.checksum_type},
                                    results[i].file_context().search())))
        
        # Consolidate all duplicate download links into a single entry
        files = sorted(files, key=operator.itemgetter("filename"))
        files_gr=[]
        for i,g in itertools.groupby(files, key=operator.itemgetter("filename")):
            grp = list(g)
            entry = grp[0].copy()
            entry['url'] = tuple(e['url'] for e in grp)
            files_gr.append(entry)
        
        # Save file list
        with open(savefile, 'w') as f:
            json.dump(files_gr, f, indent = 4)
    
    except KeyboardInterrupt:
        raise
            
    except Exception as e:
        print(e)

conn = SearchConnection('https://esgf-data.dkrz.de/esg-search', distrib=True)
for sid in tqdm.tqdm(source_ids, desc="Models", position=0):
    for eid in tqdm.tqdm(experiment_ids, desc="Experiments", position=1):
        for v in tqdm.tqdm(variables, desc="Variables", position=2):
            savefile = query_params["activity_id"]+"_"+eid+"_"+sid+"_"+v+"_"+query_params["frequency"]+".json"
            if not os.path.exists(savefile):
                search_esgf(sid, eid, v, query_params, savefile)

Here I am calling the search_esgf function inside nested loops, and inside that function a new context is created with facets='source_id' specified. Why do I still get the facets warning?

@siankg

siankg commented Sep 18, 2023

I'm having the same issue!

@JimCircadian

@onnyyonn and @siankg, the issue is the ordering. Because the code catches the additional unspecified keyword arguments internally, you need to use connection.new_context(facets="source", **query_params) rather than connection.new_context(**query_params, facets="source"). Make sure the parameters destined for new_context come before the search parameters.
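Worth keeping in mind when testing this suggestion: Python matches keyword arguments to parameters by name, so unpacking **query_params before or after an explicit facets= keyword produces the same call. The sketch below uses a hypothetical stand-in, new_context_stub, in place of the real new_context, and simply returns what it receives so the two orderings can be compared. This is consistent with the reordering not helping later in this thread.

```python
def new_context_stub(facets=None, **search_constraints):
    """Hypothetical stand-in for a method like new_context: it just returns
    the arguments it received so that two call orders can be compared."""
    return facets, search_constraints


query_params = {"latest": True, "mip_era": "CMIP6", "frequency": "6hr"}

# Explicit keyword first, then the unpacked dict.
a = new_context_stub(facets="source_id", **query_params)

# Unpacked dict first, then the explicit keyword.
b = new_context_stub(**query_params, facets="source_id")

# Keyword arguments bind by name, so both calls see exactly the same values.
print(a == b)  # True
print(a[0])    # source_id
```

If query_params itself contained a "facets" key, both orderings would fail in the same way, with a TypeError about multiple values for the keyword argument 'facets'.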

@siankg

siankg commented Mar 26, 2024

Thanks for the info! I tried switching it but got the same error message.

ctx = conn.new_context(facets='project,institution_id,source_id,activity_id,experiment_id,variable,frequency', latest = True, institution_id = mod_inf[0], source_id = mod_inf[1], project = 'CMIP6', activity_id='CMIP', experiment_id = 'historical', variable='fBNF,cVeg,nVeg,npp,pr,tas', frequency='mon')

@JimCircadian

JimCircadian commented Mar 26, 2024

I gave myself a red herring (the warning disappeared temporarily for me, but it was late at night). I'm investigating and will post here again!

@JimCircadian

Further to this, I hadn't realised that the warning comes from the invocation of file_context or aggregation_context, neither of which accepts facets itself.
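The mechanism described above can be illustrated with a toy model. Everything in it is hypothetical: ToyContext, ToyResult and the warning text are made up for illustration and are not pyesgf's actual code. A context built with facets searches quietly, but the child context created by a file_context()-style method is built without a facets argument, so it falls back to a default and warns.

```python
import warnings


class ToyContext:
    """Hypothetical search context: warns when searching without explicit facets."""

    def __init__(self, facets=None, **constraints):
        self.facets = facets
        self.constraints = constraints

    def search(self):
        if self.facets is None:
            # Stand-in for the real client's facets warning (text is made up).
            warnings.warn("no facets specified; defaulting to facets='*'")
        return [ToyResult(self.constraints)]


class ToyResult:
    """Hypothetical dataset-level search result."""

    def __init__(self, constraints):
        self.constraints = constraints

    def file_context(self):
        # The child context is created without a facets argument, mirroring
        # the observation that file_context() does not accept facets.
        return ToyContext(**self.constraints)


parent = ToyContext(facets="source_id", source_id="MPI-ESM1-2-HR")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    results = parent.search()           # facets given: no warning
    after_parent = len(caught)
    results[0].file_context().search()  # child has no facets: warns
    after_child = len(caught)

print(after_parent, after_child)  # 0 1
```

In this model the warning is triggered by the child context alone, so adding facets to the parent call, or reordering its arguments, makes no difference. This matches what was reported earlier in the thread.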
