
How to distinguish an object is "large" #12

Open
HenryLeongStat opened this issue Aug 17, 2018 · 6 comments

@HenryLeongStat
Member

HenryLeongStat commented Aug 17, 2018

As we discussed before, some large objects should be transferred via I/O on disk instead of through memory. However, how do we determine the threshold?

  1. "Largeness" differs from machine to machine.
  2. Does "largeness" mean the number of elements in an object, or the size of the object when saved as a file?

data.frame, list, matrix, etc. are transferred via I/O on disk (i.e. using feather). I suppose the only large objects not transferred via feather are vectors and strings?
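A small sketch of why question 2 above matters: an object's in-memory footprint and its serialized size can differ quite a bit, so the threshold depends on which notion of "largeness" we pick. This uses `pickle` purely as a stand-in serializer (the actual frame transfer uses feather), and the vector here is hypothetical:

```python
import pickle
import sys

# A hypothetical large numeric vector.
vec = list(range(1_000_000))

# Notion 1: in-memory footprint (list header plus the int objects it holds).
in_memory = sys.getsizeof(vec) + sum(sys.getsizeof(x) for x in vec)

# Notion 2: size when serialized to disk (pickle as a stand-in; the real
# transfer would use feather for frame-like objects).
on_disk = len(pickle.dumps(vec, protocol=pickle.HIGHEST_PROTOCOL))

print(f"in-memory ~ {in_memory / 1e6:.1f} MB, serialized ~ {on_disk / 1e6:.1f} MB")
```

On CPython the in-memory figure is several times the serialized one, so a single byte threshold would classify the same vector differently depending on where it is measured.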

@BoPeng
Contributor

BoPeng commented Aug 17, 2018

Strings cannot be large because no one holds a book-length string in a variable. Vectors are the only concern because they can arise from numeric computations.

The more difficult case is actually dictionaries, because they can hold large items.
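Dictionaries are hard to classify because `sys.getsizeof` only reports the hash table itself, not the items it holds. A minimal sketch of a recursive estimate (the helper name and the traversed types are illustrative, not part of any existing API):

```python
import sys

# Recursively estimate an object's in-memory size, descending into common
# containers; sys.getsizeof alone counts only the container header.
def total_size(obj, seen=None):
    seen = set() if seen is None else seen
    if id(obj) in seen:          # don't double-count shared objects
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(total_size(k, seen) + total_size(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(total_size(item, seen) for item in obj)
    return size

big = {i: str(i) * 20 for i in range(200_000)}
print(sys.getsizeof(big))   # only the hash table
print(total_size(big))      # table plus keys and values
```

The gap between the two numbers is exactly why a dict can look small to a naive check while still being expensive to transfer.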

@HenryLeongStat
Member Author

Let me see what the limit is on my 8 GB RAM laptop. 😄

@BoPeng
Contributor

BoPeng commented Aug 17, 2018

Yes, that is something we should do, namely stress-test the magic and see when it breaks. Basically, you can generate larger and larger arrays and see how well they can be passed around.
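The stress test described above could be sketched as a standalone loop. The real test would run `%get` in a SoS notebook; here a pickle round-trip is an assumed stand-in for the kernel-to-kernel transfer, and the size range is arbitrary:

```python
import pickle
import time

# Grow a vector tenfold each round and time a serialize/deserialize
# round-trip as a stand-in for the %get transfer.
results = []
for exponent in range(3, 7):               # 10^3 .. 10^6 elements
    vec = list(range(10 ** exponent))
    start = time.perf_counter()
    restored = pickle.loads(pickle.dumps(vec))
    elapsed = time.perf_counter() - start
    assert restored == vec                 # the transfer must be lossless
    results.append((len(vec), elapsed))

for n, seconds in results:
    print(f"{n:>10,} elements: {seconds:.4f} s")
```

Plotting elements against seconds (and watching resident memory) would show roughly where the transfer stops scaling and "breaks".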

@HenryLeongStat
Member Author

```
large_dict = {}
for i in range(100000000):
    large_dict[i] = "I am the string in every box"
%use R
%get large_dict
```

After about 10 minutes, memory is exhausted.

```
large_dict = {}
for i in range(10000000):
    large_dict[i] = "I am the string in every box"
%use R
%get large_dict
```

Again, after about 10 minutes, memory is exhausted.

The memory used by Python is not that much, but R used more than 8 GB after 10 minutes. (There is only 8 GB of memory on my laptop.)
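To put a number on "not that much" for the Python side, one could measure peak resident memory around building the dict. A sketch using the Unix-only stdlib `resource` module (the dict is a scaled-down version of the one in the experiment above):

```python
import resource
import sys

# Peak RSS of this process; ru_maxrss is in kilobytes on Linux but in
# bytes on macOS. Unix-only (the `resource` module).
def peak_rss_mb():
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return rss / 1024 if sys.platform != "darwin" else rss / (1024 * 1024)

before = peak_rss_mb()
large_dict = {i: "I am the string in every box" for i in range(1_000_000)}
after = peak_rss_mb()
print(f"Python-side cost of the dict: ~{after - before:.0f} MB")
```

Comparing this figure against what `top` reports for the R process would quantify the asymmetry described above.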

@HenryLeongStat
Member Author

I also noticed that when I shut down the kernel, the R process is still there and the memory it uses isn't released.

@HenryLeongStat
Member Author

And even after I shut down Jupyter, those R processes are still there, taking all the memory.
