Here are some examples of its behavior (desired and undesired):
```python
import csv

bed_content = "track=full-of-nonsense\nchr1\t9000000\t10000000\n"
ftmp = "black.tmp"
with open(ftmp, 'w') as fp:
    fp.write(bed_content)

# trying to read/sniff - like in cooler-balance cli:
blacklist = ftmp
with open(blacklist, 'rt') as f:
    print(csv.Sniffer().has_header(f.read(1024)))
```
yields `True`, like it should (I guess).

`bed_content = "chr1\t9000000\t10000000\nchr2\t9000000\t10000000\n"` yields `False`, like it should!

However, `bed_content = "chr1\t9000000\t10000000"`, or with the trailing newline `bed_content = "chr1\t9000000\t10000000\n"`, yields `True`, which is very much undesired...
After reading the `has_header` source code (https://github.com/python/cpython/blob/607b1027fec7b4a1602aab7df57795fbcec1c51b/Lib/csv.py#L383), it becomes apparent that it "sniffs" whether a CSV has a header from several rows: it looks for a consistent pattern in each column across the rows after the first (a numeric type or a fixed string length) and then decides whether the first row fits that pattern or is a header. Thus, when there is only 1 row, there is nothing to compare against and everything falls back to the default assumption, which is that the first row is a header...
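To make the single-row fallback concrete, here is a simplified paraphrase of the voting step as I read it in the linked source. It is a sketch, not the CPython code itself: the function name `simplified_has_header` is made up for illustration, the rows are assumed to be already split into fields, and the row limit is approximate.

```python
def simplified_has_header(rows):
    """Paraphrase of the header vote; `rows` is a list of already-split rows."""
    header, body = rows[0], rows[1:]
    # One slot per column; None means "no pattern learned yet".
    column_types = {i: None for i in range(len(header))}

    for row in body[:20]:  # only a limited number of rows is inspected
        if len(row) != len(header):
            continue  # rows with a different number of columns are ignored
        for col in list(column_types):
            try:
                complex(row[col])
                this_type = complex  # the value parses as a number
            except (ValueError, OverflowError):
                this_type = len(row[col])  # otherwise fall back to string length
            if this_type != column_types[col]:
                if column_types[col] is None:
                    column_types[col] = this_type
                else:
                    del column_types[col]  # inconsistent column, drop it

    votes = 0
    for col, col_type in column_types.items():
        if isinstance(col_type, int):
            # Learned a fixed string length: a header would differ in length.
            votes += 1 if len(header[col]) != col_type else -1
        else:
            try:
                # With no body rows col_type is still None, and calling
                # None(...) raises TypeError, which counts as a "header" vote.
                col_type(header[col])
            except (ValueError, TypeError):
                votes += 1
            else:
                votes -= 1
    return votes > 0
```

With a single row no column pattern is ever learned, so every column votes "header" and the result is `True`, which matches the undesired behavior above.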
@nvictus what should we do? Make a special case for a BED file with a single row?
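One possible shape for such a special case, as a rough sketch only: skip `csv.Sniffer` and test directly whether the first non-blank line looks like a BED record, i.e. has at least three fields with integer start and end. The helper names `looks_like_bed_record` and `bed_has_header` are hypothetical and not part of cooler or cooltools, and the rule assumes a header line never has integers in its 2nd and 3rd fields.

```python
def looks_like_bed_record(line):
    """Return True if `line` has >= 3 fields and integer start/end columns."""
    fields = line.split()  # BED fields are tab- or whitespace-separated
    if len(fields) < 3:
        return False
    try:
        int(fields[1])
        int(fields[2])
    except ValueError:
        return False
    return True


def bed_has_header(sample):
    """Return True if the first non-blank line of `sample` is not a BED record."""
    for line in sample.splitlines():
        if line.strip():
            return not looks_like_bed_record(line)
    return False  # empty input: there is no header to skip


print(bed_has_header("track=full-of-nonsense\nchr1\t9000000\t10000000\n"))
print(bed_has_header("chr1\t9000000\t10000000"))
```

Because the decision depends only on the first line, a single-row file is handled the same way as a multi-row one.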
https://github.com/mirnylab/cooler/blob/c9c718fdebccbda41ad10c47f700853f79ee3cd3/cooler/cli/balance.py#L181
I was trying to reuse this blacklist-ingesting code for the cooltools expected CLI tool, here: https://github.com/mirnylab/cooltools/blob/9294dae6dd19794e61bbb50773c1db04fb627398/cooltools/cli/compute_expected.py#L127