Yarn Workspace Scripts Refactor - A Case Study

Matti Bar-Zeev - Nov 11 '22 - Dev Community

It happens to all of us - you implement a solution only to realize later on that it’s not robust enough and could probably use a good refactor.
The same happened to me, and I thought this was a good opportunity to share the reasons that made me refactor my code and how I went about it.
A case study, if you will.

The solution I’m talking about is a script I wrote a while ago to help me generate a unified unit test coverage report for all the packages under a single monorepo. You can read more about it in detail here, but let me give you the TL;DR of how it works:

The script went over each package in the monorepo and executed yarn test --coverage --silent. After all the reports were generated, each in its own package location, the script copied the reports into a pre-created directory called .nyc_output at the project’s root. Once done, the script executed an [nyc](https://www.npmjs.com/package/nyc) CLI command to generate a report from them all.

And it worked well :)

“So what’s wrong with it?” you might ask. Well… it did not scale well. Actually, it did not scale at all.
It starts with the fact that the script looked for the packages in a specific, hard-coded location - “packages”. What if I have packages in another location as well?
Another issue was that the script did not search for all packages (including nested ones), so it ignored the reports of nested packages altogether.
On top of these, I also hated the fact that the script “knew” which command to run in order to generate the reports. What if we’re not using “yarn”? What if we’re not using Jest?
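To make the first two limitations concrete, here is a minimal sketch of a shallow lookup of that kind. It is not the original script: the listTopLevelPackages helper and the directory names in the usage note are hypothetical, and it only assumes that a package is a directory holding a package.json.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper, not the original script: lists only the direct
// children of one hard-coded directory that contain a package.json.
function listTopLevelPackages(root, packagesDir = 'packages') {
    const base = path.join(root, packagesDir);
    return fs
        .readdirSync(base, {withFileTypes: true})
        .filter((entry) => entry.isDirectory())
        .filter((entry) => fs.existsSync(path.join(base, entry.name, 'package.json')))
        .map((entry) => entry.name)
        .sort();
}
```

With a layout such as packages/a, packages/group/nested and libs/c, a lookup like this only sees “a”: the nested package and the one outside “packages” never get a chance to contribute a report.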

All these made the script very limited, so it was time to boost it up a little, and luckily enough we have just the tools to make it happen.

Let’s start!


All the code can be found under my Pedalboard monorepo on GitHub.

Generating the coverage using native Yarn workspace API

I’ll start by inspecting the npm script which launches the coverage report aggregation:

"coverage:combined": "pedalboard-scripts aggregatePackagesCoverage && nyc report --reporter lcov"

My first mission is to generate a coverage report in each package without needing the pedalboard-scripts. For that I will use a Yarn workspaces feature which runs a command on each managed package/workspace within a monorepo - foreach.
I will remove the coverage directory from each package to make sure I’m not seeing old results, and change the script to the following:

"coverage:combined": "yarn workspaces foreach -pvA run test --coverage --silent"

A quick reminder of what’s going on, params-wise: “p” is for running in parallel, “v” is for verbose output and “A” is for running on all the workspaces.
When I run this script now, from the project’s root, a “coverage” directory is created in each package. Awesome!

Wait… this can be simple

As you can see, I dropped the other part of the script above, which is the part where the unified report gets generated. At this point I thought that it was time to modify the pedalboard-scripts aggregatePackagesCoverage script, but hold on… do I really need that script now?
Let’s go over it step by step:

A part of what my old script did was to create the .nyc_output directory, but I don’t need the script for that, do I? I can create this directory with a simple command:

mkdir -p .nyc_output

And so I add this command to follow the initial coverage generation:

"coverage:combined": "yarn workspaces foreach -pvA run test --coverage --silent && mkdir -p .nyc_output"

Ok, now that we have this directory created, we need to collect all the coverage-final.json files from each package into it, and rename them so they won’t overwrite each other.

My first go at this was naive - I thought I could do that, again, with yarn workspaces foreach, but I gave up when I realized that there is no easy way to extract the package name in each run (yo, Yarn people, that’s a good feature right there ;)) in order to rename each file when it is copied. I know there is probably a way, but looking at the length of the script at hand I got a little sick…
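For what it is worth, one possible route to name-based file names is sketched below. It assumes Yarn 2+ and that yarn workspaces list --json prints one JSON object per line with “name” and “location” fields. The parseWorkspaces and toFilePrefix helpers and the sample package names are hypothetical, and this is not the approach the article ends up taking.

```javascript
// Parses newline-delimited JSON, the shape assumed here for the output of
// `yarn workspaces list --json` (one {"location", "name"} object per line).
function parseWorkspaces(ndjson) {
    return ndjson
        .split('\n')
        .map((line) => line.trim())
        .filter(Boolean)
        .map((line) => JSON.parse(line));
}

// Turns a (possibly scoped) package name into a file-system friendly prefix.
function toFilePrefix(name) {
    return name.replace(/^@/, '').replace(/\//g, '-');
}

// Hypothetical sample output; real input could come from
// require('child_process').execSync('yarn workspaces list --json', {encoding: 'utf8'}).
const sample = [
    '{"location":".","name":"pedalboard"}',
    '{"location":"packages/components","name":"@pedalboard/components"}',
].join('\n');

// Pair each workspace location with a unique, name-based report file name.
const names = parseWorkspaces(sample).map(({name, location}) => ({
    location,
    fileName: `${toFilePrefix(name)}-coverage-final.json`,
}));
console.log(names);
```

The trade-off is an extra parsing step in exchange for readable file names. A plain index prefix is shorter to write.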

The collectFiles script

The solution I chose was to introduce another script to my scripts package, called “collectFiles”. This script collects files according to a glob pattern and copies them to a target directory.
Here is what the script looks like:

const yargs = require('yargs/yargs');
const glob = require('glob');
const fs = require('fs');
const path = require('path');

const GREEN = '\x1b[32m%s\x1b[0m';

function collectFiles({pattern, target}) {
   if (!pattern || !target) throw new Error('Missing either pattern or target params');

   console.log(GREEN, `Collecting files... into ${target}`);

   // Resolve the pattern synchronously, so "Done." is printed only after all the files were copied
   const files = glob.sync(pattern);
   files.forEach((file, index) => {
       // Prefix each copy with its index, so files sharing a name do not overwrite each other
       fs.copyFileSync(file, path.resolve(target, `${index}-${path.basename(file)}`));
   });

   console.log(GREEN, `Done.`);
}

// Parse the --pattern and --target arguments from the command line
const args = yargs(process.argv.slice(2)).argv;

collectFiles(args);

I’m using the “glob” package here to make things easier for me - it searches for the pattern and returns a list of files, which I can then iterate over and copy to the desired destination. As you can see, this script gets 2 arguments - pattern and target.

Since all these files have the same name I append the index as a prefix to the name just to make sure they do not overwrite each other in the target directory. The report generator does not mind.
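The collect step can also be sketched without any dependencies, which makes the index-prefix idea easy to try in isolation. This is a stand-in for the glob lookup, not the glob package itself: findFiles and collectInto are hypothetical helpers, and skipping node_modules is an assumption added for the sketch.

```javascript
const fs = require('fs');
const path = require('path');

// Recursively finds files with the given name, skipping node_modules.
// A stand-in for the glob lookup, not the glob package itself.
function findFiles(dir, fileName, found = []) {
    for (const entry of fs.readdirSync(dir, {withFileTypes: true})) {
        const fullPath = path.join(dir, entry.name);
        if (entry.isDirectory()) {
            if (entry.name !== 'node_modules') findFiles(fullPath, fileName, found);
        } else if (entry.name === fileName) {
            found.push(fullPath);
        }
    }
    return found;
}

// Copies every file into target, prefixing the index so that files
// sharing the same base name do not overwrite each other.
function collectInto(files, target) {
    fs.mkdirSync(target, {recursive: true});
    return files.map((file, index) => {
        const destination = path.join(target, `${index}-${path.basename(file)}`);
        fs.copyFileSync(file, destination);
        return destination;
    });
}
```

Calling collectInto(findFiles('packages', 'coverage-final.json'), '.nyc_output') would leave files such as 0-coverage-final.json and 1-coverage-final.json in the target directory, one per report found.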

Split for flexibility & readability

Nobody likes long script commands in their package.json, and I’m no different. I decided to split the big script into 3 new scripts:

coverage:all - this generates the reports for each workspace (package)
coverage:collect - this collects the coverage-final.json files into a single dir
coverage:combined - this calls the scripts above and generates the report in the end

"coverage:all": "yarn workspaces foreach -pvA run test --coverage --silent",
"coverage:collect": "mkdir -p .nyc_output && pedalboard-scripts collectFiles --pattern='packages/**/coverage-final.json' --target='.nyc_output'",
"coverage:combined": "yarn coverage:all && yarn coverage:collect && nyc report --reporter lcov"

And… That’s it.

When I run the yarn coverage:combined script, the reports get generated like they used to, but now I don’t have to worry whether I forgot to include some nested workspace, and I have the power to easily change how the reports for each package are generated.

I hope you find this useful. As always, if you have questions or other ideas on how to make this better, please share them with the rest of us in the comments below :)

As mentioned, all the code can be found under my Pedalboard monorepo on GitHub.

Hey! If you liked what you've just read check out @mattibarzeev on Twitter 🍻

Photo by Raimond Klavins on Unsplash
