Add upsert behavior for doc_ids (not just condition) #394

benetherington · 2021-05-05T20:28:07Z

PR forthcoming, but I wanted to talk alternatives out first.

Upsert is one of my favorite tricks implemented by TinyDB. I'm currently working on an API where I'd really like to duplicate server db IDs in my local db. Fewer mistakes that way, I figure. Currently, Table.insert and Table.update handle doc_ids in very different ways. Insert doesn't take a doc_id argument, but instead requires a pre-built Document. Update, on the other hand, takes doc_ids as an argument. I suspect this has to do with the way Query() works, and that unifying this behavior would require a major version (maybe two! 😄)

The simplest solution would be to add a new method, maybe Table.upsert_by_id, but instead, I'd like to make upsert smarter to begin with. There are a few details to iron out, though.

Upsert bridges an odd gap, in that updating changes one or may records, but inserting only changes one. Upsert already wraps an insert-generated doc_id in a list to mimic a single updated doc_id's return value. This is good. An ID-aware upsert should accept doc_ids (plural). If it accepted doc_id (singular), it would be expected to return a single ID, not a list. That would be confusing.

Simple is better. When I started working on this, my first instinct was to break upsert into two branches of an if statement. One branch would handle cond but not doc_ids, the other would handle doc_ids but not cond. I added complexity by asserting only one argument was specified. I added further complexity when I had to loop over multiple inserts. I added EVEN FURTHER complexity when I tried to handle mixed existing and not-existing IDs. I stopped writing code when I was considering breaking off the doc_ids branch into a private method that upsert could call multiple times. It wasn't pretty, folks.

That's where I'm at right now, and I need to break off into a different task. When I come back, my plan is to keep the if doc_ids is not None branching structure, but A) group missing and existing IDs into two separate lists and handle them separately, B) change update_multiple to take a doc_ids argument and handle some of this logic, and maybe make it more useful to someone else down the line.

Does this all sound like a good plan of attack? Is the juice not worth the squeeze, and would Table.upsert_by_id be a better solution all round?

Thanks, folks, I really love this library, and I hope I can make it better. It fills a fantastic niche!

EDIT: AHHH should have checked open PRs first! There's already work to change update_multiple! Okay, I think that's really going to be the direction to take this.

The text was updated successfully, but these errors were encountered:

benetherington · 2021-05-06T13:51:19Z

Submitted PR #395.

Gardening and sleep lend clarity. Instead of adding a new argument to upsert, I followed the pattern established by insert and respected Document.doc_id.

msiemens · 2021-05-17T18:25:18Z

First, thanks for your work @benetherington! As you described your motivation and general design in this issue and also opened a Pull Request, I'll answer regarding the more general questions here and then take a look a the PR and give feedback on the code over there 🙂

Currently, Table.insert and Table.update handle doc_ids in very different ways. Insert doesn't take a doc_id argument, but instead requires a pre-built Document. Update, on the other hand, takes doc_ids as an argument.

Interesting! This difference didn't occur to me when adding doc_id support to insert in 5040649 last June. For the update operation however, doc_id support was implemented in d091d52 back in 2014 (gosh, has it been a long time!). In hindsight it would have been a more clean design always using a separate doc_id/s argument instead of detecting Document objects and reading their IDs. While this would result in more verbose code when inserting/updating a single document (db.update(doc, doc_ids=[doc.doc_id])) it would be really clear what's happening and it wouldn't result in surprises when passing a Document instance by accident. But that opportunity has passed 🙂

Long story short, I agree that this is a weird spot in TinyDB's API.

An ID-aware upsert should accept doc_ids (plural). If it accepted doc_id (singular), it would be expected to return a single ID, not a list.

I agree 100%

That's where I'm at right now, and I need to break off into a different task.

I think you found a nice solution in your PR, I'll comment on that over there 😉

Again, thanks for your work and the thought you put into this, it's very much appreciated!

See #394

msiemens · 2021-05-17T18:40:42Z

Closing this as #395 has been merged

msiemens pushed a commit that referenced this issue May 17, 2021

feat: add doc_id handling to upsert() (#395)

6cc942e

See #394

msiemens closed this as completed May 17, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add upsert behavior for doc_ids (not just condition) #394

Add upsert behavior for doc_ids (not just condition) #394

benetherington commented May 5, 2021 •

edited

Loading

benetherington commented May 6, 2021

msiemens commented May 17, 2021

msiemens commented May 17, 2021

Add upsert behavior for doc_ids (not just condition) #394

Add upsert behavior for doc_ids (not just condition) #394

Comments

benetherington commented May 5, 2021 • edited Loading

benetherington commented May 6, 2021

msiemens commented May 17, 2021

msiemens commented May 17, 2021

benetherington commented May 5, 2021 •

edited

Loading