Commit
parser: account for number of header columns in dialect detection
When we parse CSVs, we consider it an error for any row to contain more values than there are header columns. But the dialect detection wasn't consistent with that behavior: if it encountered such a row, it would score it higher than a row containing fewer values than there are headers. As a consequence, we could end up scoring an incorrect quote character higher than a correct one if it produced more columns (which is often the case when quoted values contain delimiters).

This commit addresses that oversight by zeroing the score of any row that contains too many values, so such a row is treated the same as if it couldn't be parsed at all. The result is that dialect detection produces a much more accurate guess of the correct quote character.
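To illustrate the idea, here is a minimal sketch in Python. It is not the code from this commit: the function names, the per-row scoring formula (values divided by header count), and the sample data are all assumptions made for illustration. It only shows how zeroing rows with too many values can change which quote character wins.

```python
import csv
import io

# Hypothetical sample: quoted values that contain the delimiter.
SAMPLE = (
    'name,motto\n'
    '"Smith, John","hello"\n'
    '"Doe, Jane","hi, there"\n'
)


def score_quote_char(sample: str, quotechar: str, zero_overflow: bool = True) -> float:
    """Return the mean per-row score for one candidate quote character.

    Assumed scoring rule (illustrative only): a row scores
    len(row) / len(header). With zero_overflow=True, a row holding more
    values than there are header columns scores 0 instead.
    """
    rows = list(csv.reader(io.StringIO(sample), delimiter=",", quotechar=quotechar))
    if len(rows) < 2:
        return 0.0
    header_len = len(rows[0])
    total = 0.0
    for row in rows[1:]:
        if zero_overflow and len(row) > header_len:
            continue  # too many values: contributes a score of zero
        total += len(row) / header_len
    return total / (len(rows) - 1)


def best_quote_char(sample: str, candidates=('"', "'"), zero_overflow: bool = True) -> str:
    """Pick the candidate quote character with the highest score."""
    return max(candidates, key=lambda q: score_quote_char(sample, q, zero_overflow))


if __name__ == "__main__":
    for zero in (False, True):
        print("zero_overflow =", zero, "->", repr(best_quote_char(SAMPLE, zero_overflow=zero)))
```

In this sketch, treating `'` as the quote character leaves the double quotes as ordinary text, so the embedded commas split each row into more values than there are headers. Without the zeroing rule those rows score higher than the correctly parsed ones, and the wrong quote character wins. With the rule, they score zero and `"` is selected.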
Showing 2 changed files with 82 additions and 6 deletions.