Don't parse CSVs by hand!

Randall - May 11 '22 - - Dev Community

I've seen custom JavaScript code to parse CSV data a number of times. It looks something like this:

const csvData = fs.readFileSync(csvFilePath, 'utf8');
const lines = csvData.split('\n');
const rows = lines.map(line => line.split(','));
Enter fullscreen mode Exit fullscreen mode

So simple, and it may be fine if you're parsing a single static CSV file whose content you know. But if you're parsing unknown CSV files, don't do this, there are many ways it can break.

What Can Go Wrong

Escaped cells

Columns in a CSV file may contain commas. Such columns are typically escaped with quotes. The raw data for a row with such a column might look something like this:

Name,Best Line
Hamlet,"To be, or not to be"
Enter fullscreen mode Exit fullscreen mode

Though the second line has two commas, this produces a table with two columns, not three:

Name Best Line
Hamlet To be, or not to be

This breaks the simple CSV parser implementation which just splits on ,.

Line Endings

The simple CSV parser implementation assumes LF line endings (\n), but text files, including CSV files, created on Windows will often have CRLF line endings (\r\n). Further, there may be line ending characters within escaped cells. Needless to say, these cases break the simple CSV parser implementation.

What to do instead

Bring in a library that's built for this. I like to use csv-parse. It's easy to use:

import { parse as csvParse } from 'csv-parse/sync';

const csvData = fs.readFileSync(csvFilePath, 'utf8');
const rows = csvParse(csvData);
Enter fullscreen mode Exit fullscreen mode

That's it.

Conclusion

I know no one likes bringing in libraries to solve what seems to be a simple task, but parsing arbitrary CSVs correctly is actually a lot harder than it looks at first glance. Save yourself the pain, use someone else's battle-hardened parser.

. . . . . . . . . . . . . . . . . . . . . . .