Byte Order Mark characters included in output? #33

BillyTom · 2014-03-20T15:23:11Z

The CSV file I am importing is encoded in UTF-8 and starts with the byte order mark "EF BB BF", which shows up as "ï»¿" when the bytes are decoded as a single-byte charset such as ISO-8859-1. (see http://de.wikipedia.org/wiki/Byte_Order_Mark)

These are non-printing characters and generally don't show up in the output. However, they can make a difference if you are making a string comparison.

For example, my first column in the first row looks like this:

array(12) {
  [0]=>
  string(16) "location-ID"
  [1]=>
  string(5) "value"
  [2]=>
    ...

As you can see, the character count is a bit off because of the non-printing characters. Other columns are not affected. Only the very first column in the very first row shows this behaviour.

I've tried several different config options (->setToCharset('UTF-8') etc.) in order to quash those unwanted characters, but none of them worked.

My csv-file contains several special characters like äöü or ß which are all displayed correctly, so I am positive that the input is decoded correctly.

It is not a big deal to manually remove those unwanted characters in the interpreter, but I was wondering if this was a bug in goodby/csv.
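For anyone hitting the same thing, the manual removal mentioned above might look roughly like the sketch below. It is not part of goodby/csv: `stripUtf8Bom()` is a hypothetical helper name, and it assumes the first cell arrives as a raw UTF-8 byte string with the three BOM bytes still attached.

```php
<?php
// Hypothetical helper, not part of goodby/csv:
// removes a leading UTF-8 BOM (EF BB BF) from a string if one is present.
function stripUtf8Bom($value)
{
    $bom = "\xEF\xBB\xBF";
    if (strncmp($value, $bom, strlen($bom)) === 0) {
        // Cast because substr() returns false on older PHP versions
        // when nothing is left after the BOM.
        return (string) substr($value, strlen($bom));
    }
    return $value;
}

// First cell of the first row as it comes out of a file that starts with a BOM.
$firstCell = "\xEF\xBB\xBF" . 'location-ID';

var_dump($firstCell === 'location-ID');               // bool(false): the BOM bytes are part of the string
var_dump(stripUtf8Bom($firstCell) === 'location-ID'); // bool(true)
```

Only the first cell of the first row needs this treatment, since the BOM can only appear at the very start of the file.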

judgej · 2014-04-17T10:29:34Z

In a similar fashion, I am looking for support to generate the BOM when exporting. Those three bytes seem to be the only way to tell MS Excel what encoding the file uses. I'll raise it as a separate issue when I have more details, but I am noting it here so it does not get lost. For export, setFromCharset() could be used to set what the (optional) BOM looks like, and it should not need to be paired with a setToCharset() if no conversion is needed.
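Until something like that exists in the library, one possible workaround is to write the BOM yourself with plain PHP file functions. This is only a sketch and does not use goodby/csv: `writeCsvWithBom()` is a hypothetical name, and the rows are assumed to be UTF-8 already.

```php
<?php
// Hypothetical helper, not part of goodby/csv:
// writes rows as CSV and puts a UTF-8 BOM at the very start of the file.
function writeCsvWithBom($path, array $rows)
{
    $handle = fopen($path, 'w');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path for writing");
    }

    // The BOM is written exactly once, before any CSV data.
    fwrite($handle, "\xEF\xBB\xBF");

    foreach ($rows as $row) {
        // Delimiter, enclosure and escape are passed explicitly (PHP defaults).
        fputcsv($handle, $row, ',', '"', '\\');
    }

    fclose($handle);
}

// Usage: a header row and one data row with non-ASCII characters.
$path = tempnam(sys_get_temp_dir(), 'csv');
writeCsvWithBom($path, array(
    array('location-ID', 'value'),
    array('1', 'äöü ß'),
));
```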
