Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preserve Gmail labels in the mbox as sub-labels of the main mbox label. #32

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

dotysan
Copy link

@dotysan dotysan commented Oct 31, 2017

  • Also ignore messages in Trash/Spam.
  • Don't bother creating Unread label.

 - Also ignore messages in Trash/Spam.
 - Don't bother creating Unread label.
@googlebot
Copy link
Collaborator

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed, please reply here (e.g. I signed it!) and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
  • If your company signed a CLA, they designated a Point of Contact who decides which employees are authorized to participate. You may need to contact the Point of Contact for your company and ask to be added to the group of authorized contributors. If you don't know who your Point of Contact is, direct the project maintainer to go/cla#troubleshoot.
  • In order to pass this check, please resolve this problem and have the pull request author add another comment and the bot will run again.

@dotysan
Copy link
Author

dotysan commented Oct 31, 2017

I signed the CLA.

@googlebot
Copy link
Collaborator

CLAs look good, thanks!

Copy link
Collaborator

@eesheesh eesheesh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi,

Thanks for this! I've been wanting to add Google Takeout support for a while but never had the time.

I have quite a few changes I'd like to see before approving it. I understand that you may not have time to do it, so if you don't - that's fine, please let me know.

@@ -25,6 +25,7 @@
import logging
import logging.handlers
import mailbox
from csv import reader
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please order imports alphabetically

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, willdo.

required=False,
action='store_false',
help=
"Preserve Gmail labels from the imported mbox as sublabels "
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is meant for MBOXes created by Google Takeout, right? In that case, I think it would be more helpful to say "Preserve Gmail labels from Takeout mbox files" and naming it "--takeout-labels".

Also, I think there's a lot of value in importing read/unread status and spam/trash for some people, so I'd suggest creating separate boolean flags for "--takeout-read-state" and "--takeout-spam-trash".

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I was just lazy. Will implement these suggestions now.

@@ -223,6 +233,30 @@ def process_mbox_files(username, service, labels):
if index < args.from_message:
continue
logging.info("Processing message %d in label '%s'", index, labelname)

if args.with_labels and 'X-Gmail-Labels' in message:
gmail_labels= next(reader([message['X-Gmail-Labels']]))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this get only the first label?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also: Have you tested what happens if there's a label with a comma in its name in Gmail before the Takeout export?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. That's why I used from csv import reader. However in testing, I found that I forgot to handle a long list of labels that wraps the header. Will fix that now too.


if args.with_labels and 'X-Gmail-Labels' in message:
gmail_labels= next(reader([message['X-Gmail-Labels']]))
#logging.info('Found Gmail Labels: %s', gmail_labels)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not log it? I think it's valuable.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Me too! Just a bit chatty. I'll remove the comment.

logging.info("Skipped Trash message %d in label '%s'", index, labelname)
continue
if 'Unread' in gmail_labels:
gmail_labels.remove('Unread')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my note above in the flags - I think this should be controlled by the user (and specifically the unread label should be imported by default).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. However, if I import the 'Unread' label as a sub-label, it doesn't work as expected. So I'll add the code/arg you suggested and make sure it is imported without the sub-label adjustment.

if 'Unread' in gmail_labels:
gmail_labels.remove('Unread')

label_ids= [label_id]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make spacing consistent with the rest of the code (here and below).

gmail_labels.remove('Unread')

label_ids= [label_id]
for sublabel in gmail_labels:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this will work with how you use next() above.

>>> from csv import reader
>>> gmail_labels= next(reader("a,b,c"))
>>> gmail_labels
['a']
>>> next(gmail_labels)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: list object is not an iterator
>>> for sublabel in gmail_labels:
...   print sublabel
... 
a

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It works for me. FYI, I actually did this:

>>> from csv import reader
>>> gmail_labels= next(reader(["a,b,c"]))
>>> gmail_labels
['a', 'b', 'c']

label_ids.append(get_label_id_from_name(service, username, labels, sublabel))
metadata_object= {'labelIds': label_ids}

# TODO: test/handle when there is no labels header?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this already handled by "if args.with_labels and 'X-Gmail-Labels' in message" above? What else needs to be tested?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops. It was a note-to-self written before I did the deed. Should have removed it before the PR.

@@ -235,6 +269,7 @@ def process_mbox_files(username, service, labels):
logging.exception(
'Failed to replace text/quoted-printable with text/plain '
'in Content-Type header')

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are there new empty lines here and elsewhere?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was just for my own personal readability/style. I'll clean it up and make it consistent with yours.

@@ -223,6 +233,30 @@ def process_mbox_files(username, service, labels):
if index < args.from_message:
continue
logging.info("Processing message %d in label '%s'", index, labelname)

if args.with_labels and 'X-Gmail-Labels' in message:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite long, maybe extract it into a separate function?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can do. I was just following existing convention in the code. I.e.
if args.fix_msgid and 'Message-ID' in message:

@dotysan
Copy link
Author

dotysan commented Nov 6, 2017

Here's what I've implemented:

  --takeout-labels {True,False,sublabels}
                        Preserve Gmail labels from Takeout mbox files. Or
                        optionally, transform them into sublabels. (default:
                        keep them)
  --takeout-no-unread   Don't preserve read/unread state from Takeout mbox
                        files (default: keep it)
  --takeout-spam-trash  Import messages from Spam/Trash labels in Takeout mbox
                        files (default: skip them)

...still hafta test that last one!

Copy link
Collaborator

@eesheesh eesheesh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor changes requested. Have you tested all of the new options yet? (as well as with a "normal" mbox in a directory structure to ensure no regressions)

Thanks again for your help!

choices=[True, False, 'sublabels'],
required=False,
default=True,
#action='store_false',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove :)


if args.takeout_labels and 'X-Gmail-Labels' in msg:
gmail_labels = next(reader([msg['X-Gmail-Labels'].replace('\r\n', '')]))
logging.info('Found Gmail Labels: %s', gmail_labels)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe "Found Gmail labels from Takeout"? Otherwise the user might not understand the context (if they don't remember this mbox came from Takeout, etc.), this would help differentiate it from finding a label in the sense of finding a new mbox file.

logging.info('Found Gmail Labels: %s', gmail_labels)

if args.takeout_spam_trash:
# TODO: test if we are importing spam/trash without renaming to sublabels
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we not testing that?


for gmail_label in gmail_labels:
if args.takeout_labels == 'sublabels' and gmail_label != 'Unread':
gmail_label = "%s/%s" %(mbox_label_name, gmail_label)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Space after % please :)

@@ -247,7 +310,8 @@ def process_mbox_files(username, service, labels):
message.replace_header('Message-ID', msgid)
except Exception:
logging.exception('Failed to fix brackets in Message-ID header')
metadata_object = {'labelIds': [label_id]}
metadata_object = get_metadata(service, username, labels, message, label_id, labelname)
if not metadata_object: continue
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Please put continue in a new indented line.
  2. When does this happen? Only if this is a spam/trash label we're skipping?

@@ -159,6 +189,39 @@ def get_label_id_from_name(service, username, labels, labelname):
raise


# TODO: instead of passing mbox_label_name as param, can't we look it up from id?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can, but it would be less efficient (potentially much slower if you have many labels and many messages). I think we can drop this comment unless you feel strongly about it.

@eesheesh
Copy link
Collaborator

Hi @dotysan, please let me know if you don't have time to make the changes I requested so I can take it up myself if needed. :) Thanks!

@dotysan
Copy link
Author

dotysan commented Nov 30, 2017

Unfortunately, I haven't had a chance to test much. Plus, I got hung up trying to decide what SYSTEM labels to honor when importing into a sublabel...

For example, in the current PR, when importing --takeout-labels sublabels without --takeout-no-unread it will honor the UNREAD system label instead of creating a sublabel. This preserves the bold formatting of that status. But it will still create 'Starred' and 'Important' sublabels.

In my un-pushed tree, I've also honored the STARRED and IMPORTANT system labels and subsequent status formatting. Still fiddling with it a bit more. Feedback?

@eesheesh
Copy link
Collaborator

eesheesh commented Dec 7, 2017

Hi, sorry for the delay in response, I was away.

I think treating STARRED and IMPORTANT as system labels (i.e. not prefixing them) is good, and IMHO there's no need in adding a flag on whether to preserve them like read/unread.

Thanks!

@dotysan
Copy link
Author

dotysan commented Jun 22, 2018

So I'm finally back on this and testing/refining. And just stumbled across an annoying (redundant?) label that is inconsistently applied. https://productforums.google.com/forum/#!topic/gmail/eKveqdKxZDM

Think I'll add the code to totally ignore "Archived" label since it's already implied with the absence of the "Inbox" label. Sound good?

@eesheesh
Copy link
Collaborator

Think I'll add the code to totally ignore "Archived" label since it's already implied with the absence of the "Inbox" label. Sound good?

Yes, if there's no other label it will go to All Mail which is the same as archived. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants