-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Find better examples #9
Comments
Thanks for the comment! I think you are absolutely right, the example is just bad. I opened an issue to find a better one. Especially in the academic field, theres often spotty json data you have to work with. I had a project once, where I got terabytes of json files from a crawler, where some attributes where sometimes present, and sometimes not. In this case, it helped a lot to be able to just work with data that is there, and skip if it is None. That's the thing with json data, that is does not have to adhere to a table schema with rows and columns. I'm not saying that you should structure your data like that, but if you get data that looks like this, then it makes easier to work with it. PathDict is not perfect by no means, and I think it still has a long way to go, but I know of at least one instance where it is used in production level code and provides a benefit there. I just have to dig up some better examples: Heres one: crawler_output = { Get all deleted Posts:deleted_posts = [] Ordeleted_posts = [post for post in crawler_output["posts"] if post.get("meta", {}).get("deleted", False)] Remove all deleted postsdb["posts"] = [post for post in crawler_output["posts"] if not post.get("meta", {}).get("deleted", False)] print("Deleted Posts:") PD version get deleted postspd = PathDict(crawler_output) PD version remove deleted postspd.filter("posts", f=lambda x: not x["meta", "deleted"]) Do you think this is a better example? |
First, I appreciate the work you've put into this. But, frankly, I don't know if this is something I could see myself using. The main reason being that it introduces a lot of non-obvious syntax for something as primitive as dict manipulation. I would consider it a burden on whoever is trying to understand my code because this is such a low-level thing to have a customized implementation for. The only trade-off being that I save some time reading KeyError guards or if-statements.
And on the other hand, I'm always unconvinced to utilize a library when examples seem to be constructed strawmen to pit their new implementation against. It's kind of like seeing someone pick a fight with a blind guy; it's obvious you're going to win. The code in your examples is questionably written in my opinion:
for user_name in db:
user_music = None
if user_name in db:
if "interests" in db[user_name]:
if "music" in db[user_name]["interests"]:
user_music = db[user_name]["interests"]["music"]
print(user_music)
out: None
out: ["pop", "alternative"]
Why are we iterating over the keys of db and then checking to see if the keys are in the iterable we're iterating over? That's a given.
Why are we initializing user_music as None? It seems like an attempt to make this code seem even longer and more cumbersome than it really needs to be.
Why are we not utilizing db.items()?
In my opinion, this a more realistic representation of code you'd actually come across in this situation:
for user, user_dict in db.items():
for interest, interest_list in user_dict['interests'].items():
print(interest_list if interest == 'music' else None)
out: None
out: ["pop", "alternative"]
Half the work I do is data analysis and it's also uncommon to encounter data in the form of records with random missing keys (interests I mean). If you find yourself the point where you're handling record data and you don't know if your first nested key is going to exist, let alone the second nested key... It seems like the solution isn't to have more capable dict-handling syntax but to reassess why your data looks like that to begin with and fix that.
I understand that in this context-free example we don't know if interests will be a key for each user. But in reality that would mean an entire column of your records data doesn't exist for an entry. And that's just poor data at that point.
Again, thanks for sharing and I hope this doesn't come off as harsh or discouraging. I just thought I'd share what stopping me and maybe other developers from pip installing your package.
The text was updated successfully, but these errors were encountered: