csv sniffer fails, when given only a single line of BED-like input #196

sergpolly · 2020-05-05T17:32:35Z

https://github.com/mirnylab/cooler/blob/c9c718fdebccbda41ad10c47f700853f79ee3cd3/cooler/cli/balance.py#L181

I was trying to reuse this blacklist-ingesting code for the cooltools expected CLI tool, here: https://github.com/mirnylab/cooltools/blob/9294dae6dd19794e61bbb50773c1db04fb627398/cooltools/cli/compute_expected.py#L127

Here are some examples of its behavior (desired and undesired):

import csv

# Write a small BED-like file whose first line is a non-data "track" line.
bed_content = "track=full-of-nonsense\nchr1\t9000000\t10000000\n"
ftmp = "black.tmp"
with open(ftmp, "w") as fp:
    fp.write(bed_content)

# Try to read/sniff it the way the cooler-balance CLI does:
blacklist = ftmp
with open(blacklist, "rt") as f:
    print(csv.Sniffer().has_header(f.read(1024)))

yields True like it should (I guess)

bed_content = "chr1\t9000000\t10000000\nchr2\t9000000\t10000000\n" yields False - like it should!

However bed_content = "chr1\t9000000\t10000000" or with the newline bed_content = "chr1\t9000000\t10000000\n" - yields True - which is very much undesired ...

After reading the has_header source code https://github.com/python/cpython/blob/607b1027fec7b4a1602aab7df57795fbcec1c51b/Lib/csv.py#L383 it becomes apparent that the sniffer decides whether a CSV has a header by looking at several rows: it compares the first row against the rows that follow it and then decides whether the first row was a header or not. Thus, when there is only 1 row, there is nothing to compare against and everything falls back to the default assumption, which is that the first row is a header...
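For reference, the single-row behavior can be reproduced in memory, without a temporary file. This is only a minimal sketch: the `sniff_header` helper name is made up for illustration and is not part of cooler or cooltools.

```python
import csv


def sniff_header(text):
    """Return csv.Sniffer().has_header() for an in-memory sample (hypothetical helper)."""
    return csv.Sniffer().has_header(text)


one_row = "chr1\t9000000\t10000000\n"
two_rows = "chr1\t9000000\t10000000\nchr2\t9000000\t10000000\n"

# With a single row there are no following rows to compare against.
single_row_result = sniff_header(one_row)   # reported above as True
# With two rows the first row can be compared against the second.
two_row_result = sniff_header(two_rows)     # reported above as False

print(single_row_result, two_row_result)
```

The only difference between the two samples is the second record, which is enough for the sniffer to see that the first row has the same shape as the data.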

@nvictus what should we do? Make a special case for a BED file with a single row?
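One possible shape for such a special case, purely as a sketch: the helper names `looks_like_bed_record` and `bed_has_header` are hypothetical, and the rule "a single line with at least three tab-separated fields whose 2nd and 3rd fields are integers is data, not a header" is an assumption about what a BED-like record looks like, not something cooler currently does.

```python
import csv


def looks_like_bed_record(line):
    """Assumed rule: >= 3 tab-separated fields, with integer start and end."""
    fields = line.rstrip("\r\n").split("\t")
    return len(fields) >= 3 and fields[1].isdigit() and fields[2].isdigit()


def bed_has_header(sample):
    """Decide whether a BED-like text sample starts with a header line.

    Hypothetical helper: only single-line input is special-cased;
    everything else is still delegated to csv.Sniffer.
    """
    lines = [ln for ln in sample.splitlines() if ln.strip()]
    if not lines:
        # Nothing to read, so there is no header either.
        return False
    if len(lines) == 1:
        # The sniffer cannot compare rows here, so inspect the line directly.
        return not looks_like_bed_record(lines[0])
    return csv.Sniffer().has_header(sample)


print(bed_has_header("chr1\t9000000\t10000000"))
print(bed_has_header("chr1\t9000000\t10000000\n"))
print(bed_has_header("chrom\tstart\tend\n"))
```

With this rule, both single-record samples from above are treated as data, while a lone `chrom/start/end` line is still treated as a header. Inputs with two or more lines behave exactly as before.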


nvictus · 2020-06-30T05:30:08Z

Migrating to #209

sergpolly mentioned this issue May 5, 2020

what's the best way to read a BED-file with masking regions - aka "bad bins" open2c/cooltools#157

Closed

nvictus mentioned this issue Jun 30, 2020

Better input TSV validation #209

Open

5 tasks

nvictus closed this as completed Jun 30, 2020
