CsvEnumerator .size can be wrong with quoted embedded newlines in a cell #7

@jjowdy

There's some great stuff here! I noticed one thing that might be a problem somewhere, but probably isn't in practice:

If a quoted CSV cell contains embedded newlines (which unfortunately can happen, especially with web form input or something like option-return in Google Sheets), then counting lines with wc, as in this line, will return a count larger than the number of rows, because wc -l counts every newline character, including the ones inside quoted cells:

       count = `wc -l < #{filepath}`.strip.to_i

It may not matter much, but it does mean that the enumerator's .size will be wrong.
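
To make the discrepancy concrete, here is a small sketch. The sample data and the two helper names are made up for illustration, and counting "\n" characters in Ruby stands in for what wc -l reports:

```ruby
require "csv"

# Hypothetical sample: a header row plus two data rows.
# The first data row has a quoted cell with an embedded newline.
SAMPLE = "id,comment\n1,\"line one\nline two\"\n2,plain\n"

# Counts newline characters, which is what `wc -l` reports.
def physical_line_count(text)
  text.count("\n")
end

# Counts parsed CSV records (the header row is included here).
def csv_record_count(text)
  CSV.parse(text).size
end

puts physical_line_count(SAMPLE) # 4 physical lines
puts csv_record_count(SAMPLE)    # 3 CSV records
```

The line count is one higher than the record count for each embedded newline in the file.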

Unfortunately, I don't think there's a good general way to deal with this other than doing a first pass with the same CSV library (or equivalent parsing logic) to get the row count. But as long as size is at least as large as the number of rows, it might be reasonable to simply document this as a known limitation.
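
For reference, a first pass with Ruby's standard CSV library could look roughly like this. It's only a sketch: csv_row_count is a hypothetical helper name, not something in this repo, and it assumes the file has a header row that shouldn't be counted.

```ruby
require "csv"

# Hypothetical helper: counts parsed CSV rows rather than physical lines.
# With headers: true, the header row is not yielded, so it is not counted.
def csv_row_count(filepath, headers: true)
  count = 0
  CSV.foreach(filepath, headers: headers) { count += 1 }
  count
end
```

The trade-off is that the file gets parsed twice, once to count and once to enumerate, which is much slower than wc on large files.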
