Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add upsert behavior for doc_ids (not just condition) #394

Closed
benetherington opened this issue May 5, 2021 · 3 comments
Closed

Add upsert behavior for doc_ids (not just condition) #394

benetherington opened this issue May 5, 2021 · 3 comments

Comments

@benetherington
Copy link
Contributor

benetherington commented May 5, 2021

PR forthcoming, but I wanted to talk alternatives out first.

Upsert is one of my favorite tricks implemented by TinyDB. I'm currently working on an API where I'd really like to duplicate server db IDs in my local db. Fewer mistakes that way, I figure. Currently, Table.insert and Table.update handle doc_ids in very different ways. Insert doesn't take a doc_id argument, but instead requires a pre-built Document. Update, on the other hand, takes doc_ids as an argument. I suspect this has to do with the way Query() works, and that unifying this behavior would require a major version (maybe two! 😄)

The simplest solution would be to add a new method, maybe Table.upsert_by_id, but instead, I'd like to make upsert smarter to begin with. There are a few details to iron out, though.

Upsert bridges an odd gap, in that updating changes one or may records, but inserting only changes one. Upsert already wraps an insert-generated doc_id in a list to mimic a single updated doc_id's return value. This is good. An ID-aware upsert should accept doc_ids (plural). If it accepted doc_id (singular), it would be expected to return a single ID, not a list. That would be confusing.

Simple is better. When I started working on this, my first instinct was to break upsert into two branches of an if statement. One branch would handle cond but not doc_ids, the other would handle doc_ids but not cond. I added complexity by asserting only one argument was specified. I added further complexity when I had to loop over multiple inserts. I added EVEN FURTHER complexity when I tried to handle mixed existing and not-existing IDs. I stopped writing code when I was considering breaking off the doc_ids branch into a private method that upsert could call multiple times. It wasn't pretty, folks.

That's where I'm at right now, and I need to break off into a different task. When I come back, my plan is to keep the if doc_ids is not None branching structure, but A) group missing and existing IDs into two separate lists and handle them separately, B) change update_multiple to take a doc_ids argument and handle some of this logic, and maybe make it more useful to someone else down the line.

Does this all sound like a good plan of attack? Is the juice not worth the squeeze, and would Table.upsert_by_id be a better solution all round?

Thanks, folks, I really love this library, and I hope I can make it better. It fills a fantastic niche!

EDIT: AHHH should have checked open PRs first! There's already work to change update_multiple! Okay, I think that's really going to be the direction to take this.

@benetherington
Copy link
Contributor Author

Submitted PR #395.

Gardening and sleep lend clarity. Instead of adding a new argument to upsert, I followed the pattern established by insert and respected Document.doc_id.

@msiemens
Copy link
Owner

First, thanks for your work @benetherington! As you described your motivation and general design in this issue and also opened a Pull Request, I'll answer regarding the more general questions here and then take a look a the PR and give feedback on the code over there 🙂

Currently, Table.insert and Table.update handle doc_ids in very different ways. Insert doesn't take a doc_id argument, but instead requires a pre-built Document. Update, on the other hand, takes doc_ids as an argument.

Interesting! This difference didn't occur to me when adding doc_id support to insert in 5040649 last June. For the update operation however, doc_id support was implemented in d091d52 back in 2014 (gosh, has it been a long time!). In hindsight it would have been a more clean design always using a separate doc_id/s argument instead of detecting Document objects and reading their IDs. While this would result in more verbose code when inserting/updating a single document (db.update(doc, doc_ids=[doc.doc_id])) it would be really clear what's happening and it wouldn't result in surprises when passing a Document instance by accident. But that opportunity has passed 🙂

Long story short, I agree that this is a weird spot in TinyDB's API.

An ID-aware upsert should accept doc_ids (plural). If it accepted doc_id (singular), it would be expected to return a single ID, not a list.

I agree 100%

That's where I'm at right now, and I need to break off into a different task.

I think you found a nice solution in your PR, I'll comment on that over there 😉

Again, thanks for your work and the thought you put into this, it's very much appreciated!

msiemens pushed a commit that referenced this issue May 17, 2021
@msiemens
Copy link
Owner

Closing this as #395 has been merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants