-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Preserve Gmail labels in the mbox as sub-labels of the main mbox label. #32
base: master
Are you sure you want to change the base?
Conversation
dotysan
commented
Oct 31, 2017
- Also ignore messages in Trash/Spam.
- Don't bother creating Unread label.
- Also ignore messages in Trash/Spam. - Don't bother creating Unread label.
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed, please reply here (e.g.
|
I signed the CLA. |
CLAs look good, thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi,
Thanks for this! I've been wanting to add Google Takeout support for a while but never had the time.
I have quite a few changes I'd like to see before approving it. I understand that you may not have time to do it, so if you don't - that's fine, please let me know.
import-mailbox-to-gmail.py
Outdated
@@ -25,6 +25,7 @@ | |||
import logging | |||
import logging.handlers | |||
import mailbox | |||
from csv import reader |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please order imports alphabetically
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, willdo.
import-mailbox-to-gmail.py
Outdated
required=False, | ||
action='store_false', | ||
help= | ||
"Preserve Gmail labels from the imported mbox as sublabels " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is meant for MBOXes created by Google Takeout, right? In that case, I think it would be more helpful to say "Preserve Gmail labels from Takeout mbox files" and naming it "--takeout-labels".
Also, I think there's a lot of value in importing read/unread status and spam/trash for some people, so I'd suggest creating separate boolean flags for "--takeout-read-state" and "--takeout-spam-trash".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I was just lazy. Will implement these suggestions now.
import-mailbox-to-gmail.py
Outdated
@@ -223,6 +233,30 @@ def process_mbox_files(username, service, labels): | |||
if index < args.from_message: | |||
continue | |||
logging.info("Processing message %d in label '%s'", index, labelname) | |||
|
|||
if args.with_labels and 'X-Gmail-Labels' in message: | |||
gmail_labels= next(reader([message['X-Gmail-Labels']])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't this get only the first label?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also: Have you tested what happens if there's a label with a comma in its name in Gmail before the Takeout export?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. That's why I used from csv import reader
. However in testing, I found that I forgot to handle a long list of labels that wraps the header. Will fix that now too.
import-mailbox-to-gmail.py
Outdated
|
||
if args.with_labels and 'X-Gmail-Labels' in message: | ||
gmail_labels= next(reader([message['X-Gmail-Labels']])) | ||
#logging.info('Found Gmail Labels: %s', gmail_labels) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not log it? I think it's valuable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Me too! Just a bit chatty. I'll remove the comment.
import-mailbox-to-gmail.py
Outdated
logging.info("Skipped Trash message %d in label '%s'", index, labelname) | ||
continue | ||
if 'Unread' in gmail_labels: | ||
gmail_labels.remove('Unread') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my note above in the flags - I think this should be controlled by the user (and specifically the unread label should be imported by default).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. However, if I import the 'Unread' label as a sub-label, it doesn't work as expected. So I'll add the code/arg you suggested and make sure it is imported without the sub-label adjustment.
import-mailbox-to-gmail.py
Outdated
if 'Unread' in gmail_labels: | ||
gmail_labels.remove('Unread') | ||
|
||
label_ids= [label_id] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make spacing consistent with the rest of the code (here and below).
import-mailbox-to-gmail.py
Outdated
gmail_labels.remove('Unread') | ||
|
||
label_ids= [label_id] | ||
for sublabel in gmail_labels: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this will work with how you use next() above.
>>> from csv import reader
>>> gmail_labels= next(reader("a,b,c"))
>>> gmail_labels
['a']
>>> next(gmail_labels)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: list object is not an iterator
>>> for sublabel in gmail_labels:
... print sublabel
...
a
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It works for me. FYI, I actually did this:
>>> from csv import reader
>>> gmail_labels= next(reader(["a,b,c"]))
>>> gmail_labels
['a', 'b', 'c']
import-mailbox-to-gmail.py
Outdated
label_ids.append(get_label_id_from_name(service, username, labels, sublabel)) | ||
metadata_object= {'labelIds': label_ids} | ||
|
||
# TODO: test/handle when there is no labels header? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this already handled by "if args.with_labels and 'X-Gmail-Labels' in message" above? What else needs to be tested?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops. It was a note-to-self written before I did the deed. Should have removed it before the PR.
import-mailbox-to-gmail.py
Outdated
@@ -235,6 +269,7 @@ def process_mbox_files(username, service, labels): | |||
logging.exception( | |||
'Failed to replace text/quoted-printable with text/plain ' | |||
'in Content-Type header') | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are there new empty lines here and elsewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was just for my own personal readability/style. I'll clean it up and make it consistent with yours.
import-mailbox-to-gmail.py
Outdated
@@ -223,6 +233,30 @@ def process_mbox_files(username, service, labels): | |||
if index < args.from_message: | |||
continue | |||
logging.info("Processing message %d in label '%s'", index, labelname) | |||
|
|||
if args.with_labels and 'X-Gmail-Labels' in message: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is quite long, maybe extract it into a separate function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can do. I was just following existing convention in the code. I.e.
if args.fix_msgid and 'Message-ID' in message:
Here's what I've implemented:
...still hafta test that last one! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor changes requested. Have you tested all of the new options yet? (as well as with a "normal" mbox in a directory structure to ensure no regressions)
Thanks again for your help!
choices=[True, False, 'sublabels'], | ||
required=False, | ||
default=True, | ||
#action='store_false', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove :)
|
||
if args.takeout_labels and 'X-Gmail-Labels' in msg: | ||
gmail_labels = next(reader([msg['X-Gmail-Labels'].replace('\r\n', '')])) | ||
logging.info('Found Gmail Labels: %s', gmail_labels) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe "Found Gmail labels from Takeout"? Otherwise the user might not understand the context (if they don't remember this mbox came from Takeout, etc.), this would help differentiate it from finding a label in the sense of finding a new mbox file.
logging.info('Found Gmail Labels: %s', gmail_labels) | ||
|
||
if args.takeout_spam_trash: | ||
# TODO: test if we are importing spam/trash without renaming to sublabels |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we not testing that?
|
||
for gmail_label in gmail_labels: | ||
if args.takeout_labels == 'sublabels' and gmail_label != 'Unread': | ||
gmail_label = "%s/%s" %(mbox_label_name, gmail_label) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Space after % please :)
@@ -247,7 +310,8 @@ def process_mbox_files(username, service, labels): | |||
message.replace_header('Message-ID', msgid) | |||
except Exception: | |||
logging.exception('Failed to fix brackets in Message-ID header') | |||
metadata_object = {'labelIds': [label_id]} | |||
metadata_object = get_metadata(service, username, labels, message, label_id, labelname) | |||
if not metadata_object: continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Please put
continue
in a new indented line. - When does this happen? Only if this is a spam/trash label we're skipping?
@@ -159,6 +189,39 @@ def get_label_id_from_name(service, username, labels, labelname): | |||
raise | |||
|
|||
|
|||
# TODO: instead of passing mbox_label_name as param, can't we look it up from id? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can, but it would be less efficient (potentially much slower if you have many labels and many messages). I think we can drop this comment unless you feel strongly about it.
Hi @dotysan, please let me know if you don't have time to make the changes I requested so I can take it up myself if needed. :) Thanks! |
Unfortunately, I haven't had a chance to test much. Plus, I got hung up trying to decide what SYSTEM labels to honor when importing into a sublabel... For example, in the current PR, when importing In my un-pushed tree, I've also honored the STARRED and IMPORTANT system labels and subsequent status formatting. Still fiddling with it a bit more. Feedback? |
Hi, sorry for the delay in response, I was away. I think treating STARRED and IMPORTANT as system labels (i.e. not prefixing them) is good, and IMHO there's no need in adding a flag on whether to preserve them like read/unread. Thanks! |
So I'm finally back on this and testing/refining. And just stumbled across an annoying (redundant?) label that is inconsistently applied. https://productforums.google.com/forum/#!topic/gmail/eKveqdKxZDM Think I'll add the code to totally ignore "Archived" label since it's already implied with the absence of the "Inbox" label. Sound good? |
Yes, if there's no other label it will go to All Mail which is the same as archived. Thanks! |