This post is the result of days of trial and error. I didn't have a clue about streams and whatnot, hence why it took so long :D
You need the following libraries: `multer`, `streamifier`, and `@fast-csv/parse`.
Create form to upload CSV files - multipart/form-data
File upload needs to be done via `multipart/form-data`. This is something I only got acquainted with recently too, and it will probably be the subject of another post. For now, I will skip it.
multer
`multer` will grab the file and place it in `req.file`. Don't expect to find the file in `req.body` - that one will only contain the form fields that are plain text. 90% of the multer tutorials out there explain how to save the incoming file to a directory. I could not care less about that, because this will live on a server where I don't have write permissions, so I want the file to live in memory.
```js
const multer = require("multer");

const parseCsv = multer().single("whatever-name-you-gave-to-the-input-field-in-your-form");

module.exports = { parseCsv };
```
This is the middleware that will place the file in `req.file`.
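As an aside, calling `multer()` with no options already keeps uploads in memory and never writes them to disk. If you want to make that explicit, multer ships a `memoryStorage` engine. A sketch of the equivalent setup (the `"csv-file"` field name is just a placeholder):

```js
const multer = require("multer");

// Explicit equivalent of the default: keep the uploaded file in memory
// as a Buffer on req.file.buffer instead of writing it to disk.
const upload = multer({ storage: multer.memoryStorage() });
const parseCsv = upload.single("csv-file"); // placeholder input field name

module.exports = { parseCsv };
```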
fast-csv and streamifier
`req.file` will have a `buffer` property, but it is not readable by Node's `createReadStream`. If you try `fs.createReadStream(buffer)` you will most likely get an error saying that this is not a file, or something like that. While Node's `createReadStream` does accept an instance of `Buffer`, it treats it as a file *path*, not as the file's contents, so our in-memory buffer is no use to it. I learnt about this in this SO answer. The solution I found? `streamifier`, which I first learnt about here. If you look at its source code, it wraps the buffer from `req.file` in a readable stream that you can pipe wherever you need. I was glad to have found this library.
So, you create the stream like so:

```js
const { buffer } = req.file;

streamifier.createReadStream(buffer);
```
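If you want to convince yourself that it works, here is a tiny self-contained sketch (the sample CSV string is made up) that round-trips a buffer through `streamifier` and prints it back out:

```js
const streamifier = require("streamifier");

// Round-trip a buffer through streamifier: the resulting stream emits
// the buffer's contents as ordinary 'data' chunks.
const stream = streamifier.createReadStream(Buffer.from("name,age\nAda,36\n"));

let contents = "";
stream.on("data", (chunk) => (contents += chunk));
stream.on("end", () => console.log(contents)); // name,age\nAda,36\n
```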
@fast-csv/parse
`@fast-csv/parse` takes a stream with the data from the CSV and emits a couple of events while it parses the contents of the file. It calls `.on('data', data => callback)` for every row, so you can do whatever you want with each one. Once all the rows have been parsed, it calls `.on('end', rowCount => callback)`. There is also an `.on('error', callback)` event, which I suppose is related to their validation capabilities, but I haven't tried it yet.
You can import fast-csv as `csv` and then call `.pipe(csv.parse())` (see the example below). You can also pass options to `csv.parse()`; the ones I have used so far are `headers: true` (skips the header line of the CSV file, see docs here) and `ignoreEmpty: true` (ignores empty lines, see docs here).
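Here is a minimal sketch of those events and options in isolation, using `parseString` from the same package on a made-up CSV string (no upload involved):

```js
const csv = require("@fast-csv/parse");

// headers: true turns each row into an object keyed by the header line;
// ignoreEmpty: true skips the blank line in the middle.
csv
  .parseString("name,age\nAda,36\n\nGrace,45", { headers: true, ignoreEmpty: true })
  .on("error", (error) => console.error(error))
  .on("data", (row) => console.log(row)) // { name: 'Ada', age: '36' }, then { name: 'Grace', age: '45' }
  .on("end", (rowCount) => console.log(`Parsed ${rowCount} rows`));
```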
My first iteration was to trigger the document creation on every row parsed. That was a mistake, because of the async nature of saving data to a DB and the sync nature of parsing a CSV. I found myself with the `'end'` event being triggered before the first document was saved, and that screwed up my strategy and my server responses.
After a bit of research, I found a strategy that works well: push each parsed row (which comes back as an object) into an array in memory, and call Mongoose's `Model.create([ARRAY_OF_OBJECTS])` on the `'end'` event. You need to make that callback async and decide your server response to the client in there. Like so; it seems to work well for me:
```js
const csv = require("@fast-csv/parse");
const streamifier = require("streamifier");

// somewhere below

router.post("/endpoint", [multerMiddlewareExplainedAbove], (req, res) => {
  const { buffer } = req.file;

  const dataFromRows = [];

  streamifier
    .createReadStream(buffer)
    .pipe(csv.parse({ headers: true, ignoreEmpty: true })) // <== this is @fast-csv/parse!!
    .on("data", (row) => {
      dataFromRows.push(row);
    })
    .on("end", async (rowCount) => {
      try {
        const data = await MyModelName.create(dataFromRows);
        res.status(200).json({ rowCount, data });
      } catch (error) {
        res.status(400).json({ error });
      }
    });
});
```
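For completeness, here is a minimal sketch of how this route could be wired into an app. The connection string, port, and file paths are placeholders of my own, not anything prescribed by the libraries:

```js
const express = require("express");
const mongoose = require("mongoose");
const uploadRouter = require("./routes/upload"); // wherever the router above lives

const app = express();
app.use("/api", uploadRouter);

// Placeholder connection string and port - adjust to your setup.
mongoose
  .connect("mongodb://localhost:27017/csv-upload-demo")
  .then(() => app.listen(3000, () => console.log("Listening on 3000")))
  .catch((error) => console.error(error));
```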
Hope it makes sense. I will be adding things as I discover them. Thanks for reading (: