InternalDataStore.cache_size hangs #6

Open
elainethale opened this issue Apr 18, 2019 · 3 comments

@elainethale
Contributor

To reproduce on Peregrine, use an .ini file containing:

[local_cache]
root_path = /scratch/mrossol/Smart_DS
size = 1000
threads = 12

and the command

python R2PD/cli.py -ds $CONFIG_FILE -n 28.442 -80.812 -t solar -te 1/1/2000 1/1/2020 -o $OUT_DIR power -c 10.0 actual

If you use the elaine_debug branch, it prints debug messages about progress (or the lack thereof) to the console.

@elainethale
Contributor Author

If I comment out the part of main that checks the cache size (cli.py lines 91 to 98), I hit essentially the same problem at datastore.py line 639 (line numbers refer to the elaine_debug branch). That is, I see:

(r2pd) [ehale@n2184 FAASSTeR]$ python ../R2PD/R2PD/cli.py -ds /home/ehale/r2pd_config.ini -n 28.442 -80.812 -t solar -te 1/1/2000 1/1/2020 -o reserves/data_out/ power -c 10.0 actual
DEBUG:__main__:Connecting to DRPower
DEBUG:R2PD.datastore:Connecting to InternalDataStore
DEBUG:__main__:Determining local cache status
DEBUG:__main__:Getting nodes
DEBUG:__main__:Getting temporal parameters
/home/ehale/R2PD/R2PD/nearestnodes.py:52: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
  lat_lon = r_left.as_matrix(['latitude', 'longitude'])
/home/ehale/R2PD/R2PD/nearestnodes.py:60: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
  node_lat_lon = nodes_left.as_matrix(['latitude', 'longitude'])
DEBUG:R2PD.datastore:Making sure there is room in the local cache for 0.08 GB

which means the code is stuck on the call

self._local_cache.test_cache_size(download_size)

@MRossol
Contributor

MRossol commented Apr 19, 2019

I never envisioned R2PD referencing the full data repository. cache_size calls os.path.getsize(file_name) on every file in the LocalCache to compute the current local cache size. Since you are pointing at the full solar dataset on Peregrine, this will take FOREVER, because it has to touch 150k files.
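A minimal sketch of the full-scan approach described above, assuming the local cache is a plain directory tree under root_path. The function name scan_cache_size is hypothetical and is not R2PD's actual code. It calls os.path.getsize once per file, so the cost grows linearly with the number of files: 150k files means 150k filesystem calls on every check.

```python
import os


def scan_cache_size(root_path):
    """Return the total size in bytes of every file under root_path.

    Hypothetical helper: walks the whole tree and stats each file, so the
    cost is one filesystem call per file (O(number of files)).
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root_path):
        for name in filenames:
            # One stat() per file; this is the expensive part when the
            # cache root is a dataset holding ~150k files.
            total += os.path.getsize(os.path.join(dirpath, name))
    return total
```

Pointing this at a root_path such as /scratch/mrossol/Smart_DS would stat every file in the shared dataset before a single download could proceed.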

@MRossol
Contributor

MRossol commented Apr 19, 2019

See commit 2e493c4. I now estimate the cache size from cache_meta instead of statting every file, and I also updated the logic to re-scan the cache less frequently.
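The code in commit 2e493c4 is not reproduced here. The following is only a sketch of the general idea, assuming the cache metadata records each cached file's size in bytes. The CacheSizeTracker class, its cache_meta dict, the rescan_interval parameter, and the method names are all hypothetical. Summing the recorded sizes avoids touching the filesystem, and a full directory scan runs only when the previous scan is older than rescan_interval seconds (or no metadata was supplied).

```python
import os
import time


class CacheSizeTracker:
    """Hypothetical tracker: estimate cache size from metadata, rescan rarely."""

    def __init__(self, root_path, max_bytes, cache_meta=None, rescan_interval=3600.0):
        self.root_path = root_path
        self.max_bytes = max_bytes
        self.rescan_interval = rescan_interval  # seconds between full scans
        # Assumed layout: relative file path -> size in bytes.
        self.cache_meta = dict(cache_meta) if cache_meta is not None else {}
        # If metadata was supplied, trust it until the interval elapses.
        self._last_scan = time.monotonic() if cache_meta is not None else None

    def record_download(self, rel_path, size_bytes):
        # Update the metadata when a file is added instead of re-statting.
        self.cache_meta[rel_path] = size_bytes

    def estimated_size(self):
        # Cheap estimate: sum the sizes already stored in the metadata.
        return sum(self.cache_meta.values())

    def rescan(self):
        # Expensive path: stat every file and rebuild the metadata.
        meta = {}
        for dirpath, _dirnames, filenames in os.walk(self.root_path):
            for name in filenames:
                full = os.path.join(dirpath, name)
                meta[os.path.relpath(full, self.root_path)] = os.path.getsize(full)
        self.cache_meta = meta
        self._last_scan = time.monotonic()

    def has_room_for(self, download_bytes):
        # Rescan only if there has been no scan yet or the last one is stale.
        now = time.monotonic()
        if self._last_scan is None or now - self._last_scan > self.rescan_interval:
            self.rescan()
        return self.estimated_size() + download_bytes <= self.max_bytes
```

The trade-off is that the estimate can drift if files are added or removed outside the tracker; the periodic rescan bounds how long that drift can persist.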
