Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code to preserve original linefeeds (issue #121) #131

Open
wants to merge 9 commits into
base: master
Choose a base branch
from
49 changes: 48 additions & 1 deletion libmodernize/main.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@

from __future__ import absolute_import, print_function

import os
import sys
import logging
import optparse
Expand All @@ -17,6 +18,52 @@
from libmodernize import __version__
from libmodernize.fixes import lib2to3_fix_names, six_fix_names, opt_in_fix_names


PY3K = sys.version_info >= (3, 0)
if not PY3K:
LF = '\n'
CRLF = '\r\n'
CR = '\r'
else:
LF = bytes('\n', encoding='ascii')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just use the b prefix, like b'\n'. It's valid syntax on Python 2.6 and above, which is what we support. And then you don't need an if/else, because on Python 2, b'\n' == '\n'.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

CRLF = bytes('\r\n', encoding='ascii')
CR = bytes('\r', encoding='ascii')


class LFPreservingRefactoringTool(StdoutRefactoringTool):
""" https://github.com/python-modernize/python-modernize/issues/121 """
def write_file(self, new_text, filename, old_text, encoding):
# detect linefeeds
oldfile = open(filename, 'rb')
lineends = {LF:0, CRLF:0, CR:0}
for line in oldfile:
if line.endswith(CRLF):
lineends[CRLF] += 1
elif line.endswith(LF):
lineends[LF] += 1
elif line.endswith(CR):
lineends[CR] += 1
oldfile.close()
super(LFPreservingRefactoringTool, self).write_file(
new_text, filename, old_text, encoding)
# detect if line ends are consistent in source file
if sum([bool(lineends[x]) for x in lineends]) == 1:
# detect if line ends are different from system-specific
newline = [x for x in lineends if lineends[x] != 0][0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you used a collections.Counter object to count the line ending styles, I think this could be simplified to newline = counter.most_common(1)[0][0]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New in version 2.7, and modernize README says that it attempts to spit out a codebase compatible with Python 2.6+

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, OK, I forget when these things got added. I rarely do anything where 2.6 compatibility still matters now.

if os.linesep != newline:
# rereading new file is easier that writing new_text
# correct encoding in Python 2 and 3 compatible way
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think it's easier to write new_text again:

with io.open(filename, 'w', encoding=encoding, newline=newline) as f:
    f.write(newtext)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure encoding is set on Python 2?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean if the test was there, it is not a problem to change this. But I had a bad experience with encodings in the past and prefer all file interactions to be in binary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

encoding shouldn't be a problem, but I guess new_text might not be unicode on Python 2, which would make things more awkward.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made this code without even running it, and because I can't wrap my head around how those encodings should behave in 2 and 3, I think this will be a complication. I don't like rereading all the files three times myself, but 1. python-modernize is not made for speed and 2. disk cache should fix that

lines = []
with open(filename, 'rb') as newfile:
for line in newfile:
lines.append(line.rstrip(CRLF))
with open(filename, 'wb') as f:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, we can use io.open, which is the Python 3 open function, but also available on Python 2.

for line in lines:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're using the lines that you read before the file was rewritten, so this will undo the changes modernize made. You need to base it on new_text, or re-read the file after it is changed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right! Fixed in 7787615

f.write(line + newline)
self.log_debug('fixed %s linefeeds back to %s',
filename, newline)


usage = __doc__ + """\
%s

Expand Down Expand Up @@ -125,7 +172,7 @@ def main(args=None):
else:
requested = default_fixes
fixer_names = requested.difference(unwanted_fixes)
rt = StdoutRefactoringTool(sorted(fixer_names), flags, sorted(explicit),
rt = LFPreservingRefactoringTool(sorted(fixer_names), flags, sorted(explicit),
options.nobackups, not options.no_diffs)

# Refactor all files and directories passed as arguments
Expand Down