Currently I am using the latest development version of uproot as of the time of posting, at commit 7e3353c. Here's the code I'm using:

```python
import logging

import numpy as np
import uproot

log = logging.getLogger(__name__)

arrays = uproot.iterate(
    data.get_list(tree),   # a list of XRootD paths formatted like "{path}:{ROOT_tree_location}"
    filter_name=branches,  # a set of 35 column/branch names
    library="pd",          # my data is not jagged, so I'm using pandas
)

total = 0
total_saved = 0
with IncrementalPqWriter(output) as selection:  # lets me write row groups to a Parquet file from DataFrames
    for df in arrays:
        mask = xicp_presel_mask(df, use_rectangular_cuts=use_rect_cuts)  # returns a boolean mask
        df["foldNumber"] = df["eventNumber"] % 2
        selection.write_from_df(df[mask])
        total += df.shape[0]
        total_saved += np.sum(mask)
        log.info(
            f"{np.sum(mask)} / {df.shape[0]} events saved. "
            f"Total: {total_saved} saved / {total} processed"
        )
```
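To tell a leak (usage that grows step after step) from a legitimate spike (one step alone needing too much memory), it can help to log Python-side memory per iteration. This is a stdlib-only sketch using `tracemalloc`; the `report_memory` helper and its call site are illustrative additions, not part of the original code:

```python
import tracemalloc

tracemalloc.start()

def report_memory():
    """Return (current, peak) traced Python memory in MiB since start()."""
    current, peak = tracemalloc.get_traced_memory()
    return current / 2**20, peak / 2**20

# Inside the iteration loop, you would log these alongside the event counts:
cur_mib, peak_mib = report_memory()
```

If "current" climbs monotonically across iterations, something is holding references between steps; if only "peak" is large, a single step is the problem. Note that `tracemalloc` only sees Python allocations, not memory allocated inside compiled XRootD/ROOT code.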
I think this error is maybe related to #281.
Replies: 1 comment 1 reply
`MemoryError` and `std::bad_alloc` are pretty similar: the first is Python's exception, the second is C++'s exception. The allocation that actually fails to allocate memory, whether it's in Python or C++, is not where the real problem lies. It's a matter of other things using up the memory.

If it's running out of memory "legitimately," which is to say, you just don't have enough memory to do what you're trying to do, then it is difficult to diagnose on a different computer with a different amount of memory available. If it's a memory leak (which #281 was not), then it happens on any computer if you wait long enough.

What you're doing in your code doesn't involve Pandas, just some masking, so you can remove a big performance and memory hog by replacing `library="pd"` with `library="np"`.

The Python garbage collector is supposed to get invoked when you start running low on memory, though it could only do that if the limit is approached while the Python code is active (i.e. a `MemoryError`, not a `std::bad_alloc` from compiled code).

Actually, thinking about this more deeply, the arrays you make in one step of iteration […]

Beyond that, the resource utilization of XRootD is a bit of a mystery to me. @nsmith- recommends

```python
uproot.open.defaults["xrootd_handler"] = uproot.source.xrootd.MultithreadedXRootDSource
```

and this should probably become the default. The current default does a vector-read, which has caused a number of problems.

Beyond that, maybe break your work up into smaller processes. (That's what I'd do if I ran out of all other options.)

I'm going to convert this into a Discussion because it's not really a bug, that I know of.
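For reference, with `library="np"` each iteration step yields a plain dict of branch-name → NumPy array rather than a DataFrame, so the masking pattern from the question changes slightly. A minimal sketch with fabricated data (the branch names and values here are made up for illustration):

```python
import numpy as np

# With library="np", uproot.iterate yields one dict like this per chunk.
chunk = {
    "eventNumber": np.array([10, 11, 12, 13]),
    "pt": np.array([0.5, 2.0, 3.5, 1.0]),
}

# A selection function would return one boolean per event, e.g.:
mask = chunk["pt"] > 1.0

# Derived columns are just new dict entries:
chunk["foldNumber"] = chunk["eventNumber"] % 2

# Apply the mask column by column instead of df[mask]:
selected = {name: arr[mask] for name, arr in chunk.items()}
```

This avoids building a DataFrame per chunk; each masked column is a plain NumPy copy of only the selected rows, which is typically cheaper in both time and memory.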