Why the CSV standard library is broken (and how to fix it), Part IV or Numerics a.k.a. Auto-Magic Type Inference for Strings and Numbers?


I've written a new (and fourth) episode on why the CSV standard library is
broken, broken, broken (and how to fix it).

Let's have a look at numerics a.k.a. auto-magic type inference for
strings and numbers [1].

Here's the challenge for the standard csv library.
Let's read data.csv:

1,2,3
"4","5","6"

Use these two popular rules (with a bonus for NaN, that is, not a number):

Rule 1: Use unquoted values for float numbers, e.g. 1,2,3 or
1.0,2.0,3.0.

Rule 2: Use quoted values for non-numeric strings, e.g. "4","5","6"
or "Hello, World!".

In the new csv reader it works like this :-):

records = Csv.numeric.read( 'data.csv' )
pp records
# => [[1.0, 2.0, 3.0],
# ["4", "5", "6"]]

And with your own not-a-number (NaN) constants / configuration:

records = Csv.numeric.parse( '1,2,NAN,#NA', nan: ['NAN', '#NA'] )
pp records
# => [[1.0, 2.0, NaN, NaN]]
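
To make the two rules (and the NaN bonus) concrete, here is a minimal
hand-rolled sketch in plain Ruby. It is not how the new csv reader is
implemented and it does not use the standard csv library; the method names
parse_numeric_line and convert_field are made up for this illustration.
It handles one line at a time, raises on empty or non-numeric unquoted
values, and does not trim spaces in front of a quoted value.

```ruby
# Hypothetical helpers for illustration only; not part of any library.

# Rule 2: quoted values stay strings.
# Rule 1: unquoted values become floats (or NaN if listed in nan).
def convert_field(text, quoted, nan)
  return text if quoted
  value = text.strip
  return Float::NAN if nan.include?(value)
  Float(value)   # raises ArgumentError for non-numeric unquoted values
end

# Split one CSV line into fields, remembering whether each field was quoted.
def parse_numeric_line(line, nan: ['NaN'])
  fields = []
  buf    = +''
  quoted = false   # did the current field use quotes at all?
  inside = false   # are we between an opening and a closing quote?
  chars  = line.chomp.chars
  i = 0
  while i < chars.size
    ch = chars[i]
    if inside
      if ch == '"' && chars[i + 1] == '"'   # doubled quote => literal quote
        buf << '"'
        i += 1
      elsif ch == '"'
        inside = false
      else
        buf << ch
      end
    elsif ch == '"'
      inside = true
      quoted = true
    elsif ch == ','                          # field separator outside quotes
      fields << convert_field(buf, quoted, nan)
      buf    = +''
      quoted = false
    else
      buf << ch
    end
    i += 1
  end
  fields << convert_field(buf, quoted, nan)
  fields
end

pp parse_numeric_line('1,2,3')
# => [1.0, 2.0, 3.0]
pp parse_numeric_line('"4","5","6"')
# => ["4", "5", "6"]
pp parse_numeric_line('1,2,NAN,#NA', nan: ['NAN', '#NA'])
# => [1.0, 2.0, NaN, NaN]
```

The key point is that the parser has to keep the "was this value quoted?"
flag around until conversion time; once the quotes are stripped, "4" and 4
look the same.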

Let's quote an old quote from this mailing list:

Anyone? Show us how you handle the reading of the numerics
variant and Not a Number (NaN) with the standard csv library?

Questions and comments welcome. Cheers. Prost.

PS: If you want to see other (more) CSV formats / dialects pre-configured
and supported "out-of-the-box" in the new csv reader, please tell.

[1] <a href="" title=""></a>
