Understanding memory consumption #1003
To get a better view of system metrics over time, we need to set up our LXD containers to export data to Prometheus, which will be graphed by Grafana. The Grafana dashboard should be accessible on the navigator.oceansdata domain, but perhaps behind a username/password combo for safety. In addition to tracking the LXD metrics, our Flask app must be configured to expose data for Prometheus to scrape at defined intervals.
It would also be incredibly helpful if
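A minimal sketch of what exposing Flask metrics for Prometheus could look like, assuming the `prometheus_client` package; the metric names and the `/metrics` route are illustrative, not something already in the app:

```python
# Sketch: expose request counts and latencies for Prometheus to scrape.
# Assumes the prometheus_client package; metric names and the /metrics
# route are illustrative.
import time

from flask import Flask, g, request
from prometheus_client import CONTENT_TYPE_LATEST, Counter, Histogram, generate_latest

app = Flask(__name__)

REQUEST_COUNT = Counter(
    "flask_http_requests_total", "Total HTTP requests", ["method", "endpoint", "status"]
)
REQUEST_LATENCY = Histogram(
    "flask_http_request_duration_seconds", "Request latency", ["endpoint"]
)


@app.before_request
def start_timer():
    g.start_time = time.monotonic()


@app.after_request
def record_metrics(response):
    elapsed = time.monotonic() - g.start_time
    REQUEST_LATENCY.labels(request.path).observe(elapsed)
    REQUEST_COUNT.labels(request.method, request.path, str(response.status_code)).inc()
    return response


@app.route("/metrics")
def metrics():
    # Prometheus scrapes this endpoint at whatever interval its config defines.
    return generate_latest(), 200, {"Content-Type": CONTENT_TYPE_LATEST}
```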
That's quite the improvement! Another good reason to drop the lru decorator. Any idea what that spike at ~400s might be?

Matplotlib, I believe.
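If Matplotlib is the culprit, one common cause is pyplot figures staying registered until they're explicitly closed. A sketch of the render-and-release pattern, assuming the Agg backend; the function and its argument are illustrative, not our actual plotting code:

```python
# Sketch: close each figure after rendering so pyplot doesn't accumulate
# figures between requests. Assumes the Agg (headless) backend.
import io

import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt


def render_png(data):
    fig, ax = plt.subplots()
    ax.imshow(data)                    # whatever plotting the request needs
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)                     # release the figure so its memory can be reclaimed
    return buf.getvalue()
```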
Forced an exception to be raised so I could access the Python debugger, and here's a dump of the memory summary at 2.3 GB RSS. All memory numbers below are in bytes.
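For reference, a per-type summary like this can be reproduced from the debugger prompt with something like Pympler (an assumption about the tool; any object-summary utility works):

```python
# Sketch: dump a per-type summary of live objects from the debugger prompt.
# Assumes the Pympler package is installed; sizes are reported in bytes.
from pympler import muppy, summary

all_objects = muppy.get_objects()        # every object the GC currently tracks
mem_summary = summary.summarize(all_objects)
summary.print_(mem_summary)              # type / object count / total size
```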
A random tile (produced by running gunicorn via `./launch_web_service.sh`): `/api/v1.0/tiles/gaussian/25/10/EPSG:3857/giops_day/votemper/2281521600/0/-5,30/3/3/3.png` produces the following memory profile:
A random transect (produced by running gunicorn via `./launch_web_service.sh`): `/api/v1.0/plot/?query=%7B%22colormap%22%3A%22default%22%2C%22dataset%22%3A%22giops_day%22%2C%22depth_limit%22%3Afalse%2C%22linearthresh%22%3A200%2C%22path%22%3A%5B%5B53.94162670955251%2C-48.65234553813935%5D%2C%5B44.5103249408252%2C-60.86914241313934%5D%5D%2C%22plotTitle%22%3A%22%22%2C%22scale%22%3A%22-5%2C30%2Cauto%22%2C%22selectedPlots%22%3A%220%2C1%2C1%22%2C%22showmap%22%3Atrue%2C%22surfacevariable%22%3A%22none%22%2C%22time%22%3A2281564800%2C%22type%22%3A%22transect%22%2C%22variable%22%3A%22votemper%22%7D&format=json` produces the following memory profile:
Observations:
- Baseline memory works out to roughly `186MB * nproc * WORKER_THREADS`. On my laptop that's `186 * 8 * 1 = 1488MB`.
- The `find_nearest_grid_point` function allocates a whopping 31.8 MiB (~33.3 MB) for every tiling request. For comparison, the actual netCDF data loaded from disk to render one tile only occupies 9 MiB (~9.4 MB). All of this memory should be reclaimed by the garbage collector once the request returns to the browser, so this isn't leaking memory (see the profiling sketch below).
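One way to pin down where that 31.8 MiB goes is line-by-line profiling with `memory_profiler` (the same package behind `mprof`). A sketch with a stand-in function, since the real implementation of `find_nearest_grid_point` isn't shown here; in practice the decorator would go on the real function:

```python
# Sketch: line-by-line memory profiling with memory_profiler.
# The function body below is a stand-in for illustration only; the real
# find_nearest_grid_point would be decorated the same way.
import numpy as np
from memory_profiler import profile


@profile
def nearest_grid_point_demo(lat, lon, grid_lats, grid_lons):
    # A brute-force nearest search allocates a full temporary distance array,
    # which is the kind of per-call allocation the profile report will show.
    dist = (grid_lats - lat) ** 2 + (grid_lons - lon) ** 2
    return np.unravel_index(np.argmin(dist), dist.shape)


if __name__ == "__main__":
    lats, lons = np.random.rand(2000, 2000), np.random.rand(2000, 2000)
    nearest_grid_point_demo(0.5, 0.5, lats, lons)
```

Running the script with the decorator in place prints the per-line memory increments after the call returns.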
Running a fresh single-threaded server via `mprof run -C -M runserver.py` to capture all child processes and forks, then navigating to localhost to let a bunch of tiles plot. Note that the base memory usage of a Flask worker under `runserver.py` is higher than via gunicorn due to debug overhead, and that the y-axis is denoted in MiB (1 MiB ≈ 1.05 MB).

Using gunicorn and `mprof run -C -M launch-web-service.sh` to capture all child processes and forks. Again, the y-axis is in MiB.

Both of these graphs show memory usage trending higher. I think we need to track this data over a longer period of time on production. I wonder if there's data being left around in memory with some weak refs that the garbage collector isn't able to get rid of.
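To check whether objects really are lingering between requests, one standard-library approach is comparing `tracemalloc` snapshots taken before and after a batch of requests; a sketch under that assumption, with the batch itself being whatever traffic we replay:

```python
# Sketch: find allocation sites whose memory keeps growing across requests.
# Standard library only (gc + tracemalloc); where the snapshots are taken is
# up to us, e.g. around a replayed batch of tile/plot requests.
import gc
import tracemalloc

tracemalloc.start(25)                     # keep 25 stack frames per allocation

snap_before = tracemalloc.take_snapshot()
# ... serve / replay a batch of tile and plot requests here ...
gc.collect()                              # collect garbage so only live objects remain
snap_after = tracemalloc.take_snapshot()

for stat in snap_after.compare_to(snap_before, "lineno")[:15]:
    print(stat)                           # top allocation sites by net growth
```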