MONGOID-5411 allow results to be returned as demongoized hashes #5877

jamis · 2024-10-09T21:11:50Z

Mongoid now supports a new raw directive when building queries. When present, the query's results will be returned as hashes, rather than as instantiated document models. The hashes will not be "demongoized" -- the values will be returned exactly as provided by the database.

If you specify typed: true, the hashes will be returned with the values translated from the raw value stored in the database, into the corresponding type defined by the field definitions on the model. (That is to say, they will be "demongoized".)

For example:

result = Person.raw.first
p result  #=> {"_id"=>BSON::ObjectId(...), "name" => "...", "birth_date" => 2002-01-01 00:00:00 -0600, ... }

result = Person.raw(typed: true).first
p result  #=> {"_id"=>BSON::ObjectId(...), "name" => "...", "birth_date" => 2002-01-01, ... }

results = Person.where(...).limit(...).raw
p results.to_a #=> [ { "_id"=>BSON::ObjectId(...), ... }, { "_id"=>BSON::ObjectId(...), ... }, ... ]

The typed: true option will also honor embedded documents, correctly demongoizing the embedded hashes according to their declared types.
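To make the term concrete, here is a minimal plain-Ruby sketch of what "demongoizing" a hash with embedded documents amounts to. It is not Mongoid's implementation: the `PERSON_FIELDS` schema, the `demongoize` helper, and the `addresses`/`moved_in` fields are hypothetical, and only the Time-to-Date conversion seen in the `birth_date` example above is handled.

```ruby
require 'date'

# Hypothetical field declarations standing in for a model's field definitions.
# An Array value describes the fields of embedded documents.
PERSON_FIELDS = {
  'birth_date' => Date,
  'addresses'  => [{ 'moved_in' => Date }]
}.freeze

# Returns a new hash with values converted to their declared types.
# Only Time -> Date is handled; everything else passes through untouched.
def demongoize(raw, fields)
  raw.each_with_object({}) do |(key, value), typed|
    declared = fields[key]
    typed[key] =
      case declared
      when Array then value.map { |child| demongoize(child, declared.first) }
      when Class then declared == Date && value.is_a?(Time) ? value.to_date : value
      else value
      end
  end
end

raw = {
  'name'       => 'Ada',
  'birth_date' => Time.utc(2002, 1, 1),
  'addresses'  => [{ 'city' => 'Boise', 'moved_in' => Time.utc(2020, 5, 17) }]
}
typed = demongoize(raw, PERSON_FIELDS)
```

The original `raw` hash is left unchanged in this sketch; the embedded `addresses` hashes are converted by recursing with the embedded field declarations.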

johnnyshields · 2024-10-10T02:34:13Z

Jamis, thank you, this is super useful.

Please kindly check the following:

Combining with .only/.exclude projections, e.g. Customer.only(:name, :age).raw
Consider making this .format(:raw) in case there are other potential formats to return. I think it may be useful to have options for demongoized raw vs. non-demongoized raw (what is actually in the DB), as the latter gives the best performance, strictly speaking.
Please check that all enumerable methods like .each, etc. work in a progressive-loading fashion (using GETMORE).
Please check that embedded models are also demongoized.

jamis · 2024-10-11T15:19:24Z

> Combining with .only/.exclude projections, e.g. Customer.only(:name, :age).raw

Good suggestion. I've added a few tests for that here.

> Consider making this .format(:raw) in case there are other potential formats to return. I think it may be useful to have options for demongoized raw vs. non-demongoized raw (what is actually in the DB), as the latter gives the best performance, strictly speaking.

I think #raw reads better from a DSL perspective. If, eventually, we want the mongoized values directly, we can add that as a parameter to #raw (e.g. raw(:db) or some such). For now, I think taking that next step is out of scope for this feature, though.

> Please check that all enumerable methods like .each, etc. work in a progressive-loading fashion (using GETMORE).

There are already tests for this in mongoid/contextual/mongo_spec.rb, but I tweaked them a bit to make it clearer that #each plays nicely with the underlying cursor.
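As a rough illustration of what "progressive loading" means here, the following plain-Ruby sketch uses a hypothetical `FakeCursor` that hands out documents in batches and counts how many batches were requested. It is only a stand-in for the idea; it is not the driver's cursor and does not issue any GETMORE commands.

```ruby
# Hypothetical stand-in for a database cursor: yields documents batch by
# batch and records how many batches have been requested so far.
class FakeCursor
  include Enumerable

  attr_reader :batches_fetched

  def initialize(docs, batch_size)
    @docs = docs
    @batch_size = batch_size
    @batches_fetched = 0
  end

  def each
    return enum_for(:each) unless block_given?

    @docs.each_slice(@batch_size) do |batch|
      @batches_fetched += 1 # one "round trip" per batch
      batch.each { |doc| yield doc }
    end
  end
end

docs = (1..10).map { |i| { '_id' => i } }
cursor = FakeCursor.new(docs, 3)

# Stopping early means later batches are never requested.
first_two = cursor.first(2)
```

Because `Enumerable#first` stops iterating once it has enough elements, only the first batch is touched in this sketch, which is the behavior one would want #each to preserve when the results are hashes.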

> Please check that embedded models are also demongoized.

This is already done, as well.

Thanks for the feedback, @johnnyshields!

jamis · 2024-10-14T17:12:07Z

I did a couple of simple benchmarks on this, to get some idea of just how much difference this makes. The benchmarks were:

"Gizmo." Loading a thousand models ("gizmos") with no embedded records.
"Room." Loading a thousand models ("rooms") with 30 embedded records each ("furnishings"). The "lazy-loaded" benchmark did not immediately instantiate the furnishings. The "eager-loaded" benchmark did.

Each benchmark was performed one hundred times, and the median elapsed time captured. Each time was then reported relative to the baseline benchmark.

"Raw" results have no typecasting applied to them. "Raw (Typed)" benchmarks are typecast according to the fields declared on the corresponding model (they are "demongoized").
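The methodology described above (median of repeated runs, reported relative to a baseline) can be sketched in plain Ruby as follows. This is not the actual benchmark script; the helper names are hypothetical, and the two workloads in the usage lines are placeholders rather than real Mongoid queries.

```ruby
# Median of a list of numbers.
def median(values)
  sorted = values.sort
  mid = sorted.length / 2
  sorted.length.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# Runs the block `runs` times and returns the median elapsed time in seconds.
def median_time(runs: 100)
  samples = Array.new(runs) do
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
    Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  end
  median(samples)
end

# Expresses every timing as a ratio of the chosen baseline entry.
def relative_to_baseline(timings, baseline_key)
  baseline = timings.fetch(baseline_key)
  timings.transform_values { |t| (t / baseline).round(4) }
end

# Placeholder workloads; a real run would execute the queries being compared.
timings = {
  'Instances' => median_time { Array.new(1_000) { |i| { 'n' => i }.dup } },
  'Raw'       => median_time { Array.new(1_000) { |i| { 'n' => i } } }
}
report = relative_to_baseline(timings, 'Instances')
```

Using the median rather than the mean keeps a few slow outlier runs from skewing the reported ratio.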

The results:

Gizmo: Instances   :: 1.0000 (baseline)
Gizmo: Raw         :: 0.3938
Gizmo: Raw (Typed) :: 1.0027

Room: Instances (eager-loaded) :: 1.0000 (baseline)
Room: Instances (lazy-loaded)  :: 0.0884
Room: Raw                      :: 0.0815
Room: Raw (Typed)              :: 0.2210

Memory Comparison

For a memory benchmark, I used the same operations as above, but for each the median memory usage was reported. The numbers here represent memory usage relative to the baseline benchmark.

Gizmo: Instances   :: 1.0000 (baseline)
Gizmo: Raw         :: 0.4200
Gizmo: Raw (Typed) :: 0.6717

Room: Instances (eager-loaded) :: 1.0000 (baseline)
Room: Instances (lazy-loaded)  :: 0.2206
Room: Raw                      :: 0.2067
Room: Raw (Typed)              :: 0.2529
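One simple way to approximate a comparison like this is to count object allocations around each operation. The sketch below is an assumption about how such a measurement could be made, not the method used for the numbers above: it counts allocated objects via `GC.stat` instead of measuring bytes, and the `Gizmo` struct and `rows` data are hypothetical stand-ins for model instances and raw results.

```ruby
# Number of Ruby objects allocated while the block runs.
def allocated_objects
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# Hypothetical stand-in for a model class.
Gizmo = Struct.new(:id, :name)

rows = Array.new(1_000) { |i| { 'id' => i, 'name' => "gizmo-#{i}" } }

# "Raw": one hash per record.
hash_allocs = allocated_objects { rows.map(&:dup) }

# "Instances": one hash per record plus an object wrapping it.
instance_allocs = allocated_objects do
  rows.map { |row| Gizmo.new(*row.dup.values_at('id', 'name')) }
end

relative = (hash_allocs.to_f / instance_allocs).round(4)
```

Allocation counts are only a proxy for memory usage; a fuller comparison would also account for the size of each allocated object.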

Conclusions

Returning just the demongoized hashes makes very little difference in performance for records with no embedded children. The memory profile is significantly better, though.

Returning demongoized hashes for records with embedded children hurts both performance and memory usage, if you do not intend to access the embedded children.

Returning demongoized hashes is significantly better (both in time and in memory) when querying records with embedded children, when you intend to access those embedded children.

johnnyshields · 2024-10-15T00:31:04Z

For benchmarks, try:

Adding 100 fields to objects (Gizmo / Room)
Do benchmarks with .only projection on 3 fields.
Try looking at memory and object allocations as well.

One of my main use cases for this is reporting, where I fetch 1,000,000+ objects and write them to a CSV. Currently I rely heavily on the #pluck_each logic from PR #5497 for this; it is definitely faster than loading objects, taking roughly 1/10 the time to run the full report (assuming I only pluck the fields I need).
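For the reporting use case, hash results map naturally onto CSV rows. The following is a minimal sketch using Ruby's csv library; the `hashes_to_csv` helper and the sample rows are hypothetical, and `rows` can be any enumerable of hashes (an array here, standing in for query results).

```ruby
require 'csv'

# Writes one CSV line per hash, emitting only the requested columns in order.
# Keys missing from a hash produce an empty field.
def hashes_to_csv(rows, columns)
  CSV.generate do |csv|
    csv << columns
    rows.each { |row| csv << row.values_at(*columns) }
  end
end

rows = [
  { '_id' => 1, 'name' => 'Ada', 'age' => 36 },
  { '_id' => 2, 'name' => 'Hopper, Grace', 'age' => 85 }
]
output = hashes_to_csv(rows, %w[name age])
```

Because the helper only calls `each` on `rows`, it never needs the whole result set in an array, although `CSV.generate` does build the full output string in memory; writing to a file with `CSV.open` would avoid that for very large reports.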

jamis · 2024-10-17T19:41:55Z

@johnnyshields -- I've updated my previous comment to include some basic memory profiling.


jamis · 2024-10-17T21:41:21Z

I also decided it was probably worth going the distance, here, and providing completely raw results (not typecast at all) as the default. If you want demongoized hashes, you can specify typed: true to the raw method:

records = Person.raw(typed: true).to_a

MONGOID-5411 allow results to be returned as demongoized hashes

1b9957e

jamis requested a review from comandeo-mongo October 9, 2024 21:33

tests

333fde4

modify the hash in-place as an optimization

df098f4

Add a new default mode for raw

1155311

Returns the hashes exactly as fetched from the database.
