MONGOID-5411 allow results to be returned as demongoized hashes #5877

Open. Wants to merge 4 commits into base: master.


Contributor

@jamis jamis commented Oct 9, 2024

Mongoid now supports a new raw directive when building queries. When present, the query results will be returned as hashes, rather than instantiated document models. By default the hashes will not be "demongoized" -- the values will be returned exactly as provided by the database.

If you specify typed: true, the hashes will be returned with the values translated from the raw value stored in the database, into the corresponding type defined by the field definitions on the model. (That is to say, they will be "demongoized".)

For example:

result = Person.raw.first
p result  #=> {"_id"=>BSON::ObjectId(...), "name" => "...", "birth_date" => 2002-01-01 00:00:00 -0600, ... }

result = Person.raw(typed: true).first
p result  #=> {"_id"=>BSON::ObjectId(...), "name" => "...", "birth_date" => 2002-01-01, ... }

results = Person.where(...).limit(...).raw
p results.to_a #=> [ { "_id"=>BSON::ObjectId(...), ... }, { "_id"=>BSON::ObjectId(...), ... }, ... ]

The typed: true option will also honor embedded documents, correctly demongoizing the embedded hashes according to their declared types.
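To make the distinction concrete, here is a minimal plain-Ruby sketch of what "demongoizing" a raw hash amounts to. It does not use Mongoid and is not the implementation in this PR: the person_fields hash, the cast helper, and the typed_hash helper are all hypothetical names invented for illustration, and only the Time-to-Date conversion is modeled.

```ruby
require "date"

# Hypothetical stand-in for a model's field declarations (field name => type).
# A nested Hash describes an embedded document; a one-element Array describes
# a list of embedded documents.
person_fields = {
  "name"       => String,
  "birth_date" => Date,
  "address"    => { "city" => String, "moved_in" => Date },
  "pets"       => [{ "name" => String, "adopted_on" => Date }]
}

# Convert one stored value into its declared type. Only Time -> Date is
# modeled here; every other value is passed through unchanged.
def cast(value, type)
  case type
  when Hash  then value.is_a?(Hash) ? typed_hash(value, type) : value
  when Array then value.is_a?(Array) ? value.map { |item| cast(item, type.first) } : value
  else type == Date && value.is_a?(Time) ? value.to_date : value
  end
end

# Build a new hash with declared fields cast; undeclared keys are kept as-is.
def typed_hash(raw, fields)
  raw.each_with_object({}) do |(key, value), out|
    out[key] = fields.key?(key) ? cast(value, fields[key]) : value
  end
end

# What the database driver might hand back: dates arrive as Time values.
raw = {
  "name"       => "Ada",
  "birth_date" => Time.utc(2002, 1, 1),
  "address"    => { "city" => "London", "moved_in" => Time.utc(2020, 5, 17) },
  "pets"       => [{ "name" => "Rex", "adopted_on" => Time.utc(2021, 3, 4) }]
}

typed = typed_hash(raw, person_fields)
p raw["birth_date"].class   # the untouched value is still a Time
p typed["birth_date"].class # the typed value is a Date
```

The raw hash is left unmodified, which mirrors the idea that the untyped result is simply whatever the database returned.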

@johnnyshields
Contributor

Jamis, thank you, this is super useful.

Please kindly check the following:

  1. Combining with .only/.exclude projections, e.g. Customer.only(:name, :age).raw
  2. Consider making this .format(:raw) in case there are other potential formats to return. I think it may be useful to have options for demongoized raw vs non-demongoized raw (what is actually in the DB), as the latter gives the best performance, strictly speaking.
  3. Please check that all enumerable methods like .each, etc. work in a progressive loading fashion (using GETMORE).
  4. Please check that embedded models are also demongoized.

@jamis
Contributor Author

jamis commented Oct 11, 2024

  1. Combining with .only/.exclude projections, eg Customer.only(:name, :age).raw

Good suggestion. I've added a few tests for that here.

  2. Consider making this .format(:raw) in case there are other potential formats to return. I think it may be useful to have options for demongoized raw vs non-demongoized raw (what is actually in the DB), as the latter gives the best performance, strictly speaking.

I think #raw reads better from a DSL perspective. If, eventually, we want the mongoized values directly, we can add that as a parameter to #raw (e.g. raw(:db) or some such). For now, I think taking that next step is out of scope for this feature, though.

  3. Please check that all enumerable methods like .each, etc. work in a progressive loading fashion (using GETMORE)

There are already tests for this in mongoid/contextual/mongo_spec.rb, but I tweaked them a bit to make it clearer that #each plays nicely with the underlying cursor.
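As an illustration of what "progressive loading" means here, the following plain-Ruby sketch simulates a cursor that fetches documents in batches and counts how many batches were requested. It is an analogy only: the Enumerator stands in for a driver cursor, the batch size of 100 is an arbitrary assumption, and nothing here touches Mongoid or the database.

```ruby
# Simulated collection of one thousand hash documents.
docs = (1..1_000).map { |i| { "_id" => i, "name" => "gizmo-#{i}" } }

# Counts simulated round trips (the initial find plus each later getMore).
batches_fetched = 0

# Hypothetical stand-in for a server-side cursor: it only pulls the next
# batch of 100 when the consumer has exhausted the previous one.
cursor = Enumerator.new do |yielder|
  docs.each_slice(100) do |batch|
    batches_fetched += 1
    batch.each { |doc| yielder << doc }
  end
end

# Consume only the first 150 documents; iteration stops as soon as
# enough documents have been collected.
names = cursor.lazy.map { |doc| doc["name"] }.first(150)

puts "documents consumed: #{names.size}"
puts "batches fetched:    #{batches_fetched}"
```

Because iteration stops early, only the batches needed to cover the consumed documents are requested, rather than all ten.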

  4. Please check that embedded models are also demongoized.

This is already done, as well.

Thanks for the feedback, @johnnyshields!

@jamis
Contributor Author

jamis commented Oct 14, 2024

I did a couple of simple benchmarks on this, to get some idea of just how much difference this makes. The benchmarks were:

  1. "Gizmo." Loading a thousand models ("gizmos") with no embedded records.
  2. "Room." Loading a thousand models ("rooms") with 30 embedded records each ("furnishings"). The "lazy-loaded" benchmark did not immediately instantiate the furnishings. The "eager-loaded" benchmark did.

Each benchmark was performed one hundred times, and the median elapsed time captured. Each time was then reported relative to the baseline benchmark.

"Raw" results have no typecasting applied to them. "Raw (Typed)" benchmarks are typecast according to the fields declared on the corresponding model (they are "demongoized").
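The methodology described above (run each case one hundred times, take the median, report it relative to a baseline) could be sketched roughly as follows. This is not the benchmark script used for this PR: the median and median_time helpers are hypothetical, and the two workloads are placeholders that build Struct instances versus copying hashes, with no database involved.

```ruby
require "benchmark"

# Median of a list of numbers (mean of the two middle values for even sizes).
def median(values)
  sorted = values.sort
  mid = sorted.size / 2
  sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# Run the given block `runs` times and return the median elapsed seconds.
def median_time(runs: 100)
  median(Array.new(runs) { Benchmark.realtime { yield } })
end

# Placeholder data and a hypothetical stand-in for a model class.
rows  = (1..1_000).map { |i| { "id" => i, "name" => "gizmo-#{i}" } }
gizmo = Struct.new(:id, :name)

# Baseline: wrap every row in an object. Candidate: keep rows as hashes.
baseline  = median_time { rows.map { |r| gizmo.new(r["id"], r["name"]) } }
candidate = median_time { rows.map(&:dup) }

# Report each time relative to the baseline, in the same style as the results.
relative = candidate / baseline
puts format("Instances :: %.4f (baseline)", 1.0)
puts format("Hashes    :: %.4f", relative)
```

Using the median rather than the mean keeps a single slow run (for example, one interrupted by garbage collection) from skewing the reported ratio.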

The results:

Gizmo: Instances   :: 1.0000 (baseline)
Gizmo: Raw         :: 0.3938
Gizmo: Raw (Typed) :: 1.0027

Room: Instances (eager-loaded) :: 1.0000 (baseline)
Room: Instances (lazy-loaded)  :: 0.0884
Room: Raw                      :: 0.0815
Room: Raw (Typed)              :: 0.2210

Memory Comparison

For a memory benchmark, I used the same operations as above, but for each the median memory usage was reported. The numbers here represent memory usage relative to the baseline benchmark.

Gizmo: Instances   :: 1.0000 (baseline)
Gizmo: Raw         :: 0.4200
Gizmo: Raw (Typed) :: 0.6717

Room: Instances (eager-loaded) :: 1.0000 (baseline)
Room: Instances (lazy-loaded)  :: 0.2206
Room: Raw                      :: 0.2067
Room: Raw (Typed)              :: 0.2529

Conclusions

Returning just the demongoized hashes makes very little difference in performance for records with no embedded children. The memory profile is significantly better, though.

Returning demongoized hashes for records with embedded children hurts both performance and memory usage, if you do not intend to access the embedded children.

Returning demongoized hashes is significantly better (both in time and in memory) when querying records with embedded children, when you intend to access those embedded children.
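For readers who find speedup factors easier to compare than relative times, the timing figures reported above can be inverted with a few lines of Ruby. The numbers are copied from the results in this comment; the hash name and the output formatting are just illustrative choices.

```ruby
# Relative elapsed times from the results above (baseline = 1.0).
relative_times = {
  "Gizmo: Raw"                    => 0.3938,
  "Gizmo: Raw (Typed)"            => 1.0027,
  "Room: Instances (lazy-loaded)" => 0.0884,
  "Room: Raw"                     => 0.0815,
  "Room: Raw (Typed)"             => 0.2210
}

# A relative time r corresponds to a speedup of 1/r over the baseline.
speedups = relative_times.transform_values { |r| (1.0 / r).round(2) }

speedups.each do |label, factor|
  puts format("%-30s %6.2fx", label, factor)
end
```

Read this way, typed raw results for rooms come out roughly 4.5 times faster than eager-loaded instances, while typed raw results for gizmos are essentially on par with instances.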

@johnnyshields
Contributor

For benchmarks, try:

  • Adding 100 fields to objects (Gizmo / Room)
  • Do benchmarks with .only projection on 3 fields.
  • Try looking at memory and object allocations as well.
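One possible way to look at object allocations, sketched here in plain Ruby, is to diff GC.stat(:total_allocated_objects) around a block. This is an assumption about how such a measurement could be taken, not how the numbers in this thread were produced; the allocations helper and the Struct-based workload are hypothetical.

```ruby
# Number of Ruby objects allocated while the block runs.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# Placeholder data and a hypothetical stand-in for a model class.
rows  = (1..1_000).map { |i| { "id" => i, "name" => "gizmo-#{i}" } }
gizmo = Struct.new(:id, :name)

# Wrapping each row in an object allocates at least one object per row.
wrapped = allocations { rows.map { |r| gizmo.new(r["id"], r["name"]) } }

# Reading a field straight from the existing hashes needs no wrapper objects.
direct = allocations { rows.each { |r| r["name"] } }

puts "allocations when wrapping rows:   #{wrapped}"
puts "allocations when reading hashes:  #{direct}"
```

Allocation counts are only a proxy for memory usage, but they are cheap to collect and tend to be more stable across runs than process-level memory readings.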

One of my main use cases for this is reporting, where I'm getting 1,000,000+ objects and putting them in a CSV. Currently I rely heavily on the #pluck_each logic from PR #5497 for this. It is definitely faster than loading objects: the full report takes roughly 1/10 of the time to run (assuming I'm only plucking the fields I need).

@jamis
Contributor Author

jamis commented Oct 17, 2024

@johnnyshields -- I've updated my previous comment to include some basic memory profiling.

Returns the hashes exactly as fetched from the database.
@jamis
Contributor Author

jamis commented Oct 17, 2024

I also decided it was probably worth going the distance, here, and providing completely raw results (not typecast at all) as the default. If you want demongoized hashes, you can specify typed: true to the raw method:

records = Person.raw(typed: true).to_a
