-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Code to preserve original linefeeds (issue #121) #131
base: master
Are you sure you want to change the base?
Conversation
Nice, I like this idea |
# detect linefeeds | ||
lineends = {'\n':0, '\r\n':0, '\r':0} | ||
lines = [] | ||
for line in open(filename, 'rb'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make this work the same way on Python 2 and 3, use:
io.open(filename, newline='')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But file needs to be opened in binary more or else the information about lineends will be lost.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the newline=''
means it will do no conversion of line endings. Binary mode has a much bigger effect on Python 3, because it means you're dealing with bytes rather than strs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But binary mode guarantees that Python 3 won't bail out with UnicodeDecodeError. How to address that? I can't know the file encoding before opening it as Python 3 requires.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point. There are ways around that, like using errors='ignore'
to skip characters that can't be decoded, but it may be easier to use binary mode here. In that case, though, you'll need to update the checks below to use bytes, e.g. if line.endswith(b'\r\n')
.
Well, Python 3 is a pain. But it should work for now on Python 2. I may be able to finish this later. |
newline = [x for x in lineends if lineends[x] != 0][0] | ||
if os.linesep != newline: | ||
with open(filename, 'wb') as f: | ||
for line in lines: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're using the lines that you read before the file was rewritten, so this will undo the changes modernize made. You need to base it on new_text
, or re-read the file after it is changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right! Fixed in 7787615
I am not sure I can work on a test. Not today. Basically, the test need to create two files - one with LF linefeeds and another with CRLF. And after fix is applied, check that linefeeds didn't change. So either produce 4 files (input and output) or copy/paste the function being tested to the test case (which is no good). The files are also can not be committed to Git, because it messes with linefeeds. |
CRLF = '\r\n' | ||
CR = '\r' | ||
else: | ||
LF = bytes('\n', encoding='ascii') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just use the b prefix, like b'\n'
. It's valid syntax on Python 2.6 and above, which is what we support. And then you don't need an if/else, because on Python 2, b'\n' == '\n'
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
This isn't correct as-is, but I like the basic approach. I will work on it tomorrow. |
@techtonik I think this feature would be best in https://github.com/jreese/fissix |
@graingert I see this project became derived from |
@techtonik not much has changed, but you can see what enhancements are applied here: Everything goes via black + a git merge from CPython @jreese maybe a changelog will help? |
This is pseudo code - it may work or may not. I haven't got chance to test it yet