Build an Interactive Bar Chart of Taylor Swift lyrics with D3.js and Observable

Lizzie Siegle - Aug 26 '20 - Dev Community

This blog post was written for Twilio and originally published on the Twilio blog.

Data visualizations are handy ways to examine and think about data. Observable is a Jupyter Notebook-like tool that makes it easy to quickly run JavaScript code in cells so you can see what you're doing in real-time.

This post will go over how to make an interactive bar chart showing Taylor Swift's most-used words from her lyrics with Observable using D3.js. You can view the completed notebook and visualization here, and you can fork and edit it yourself.
gif of chart

Brief Intro to Observable

You can think of each different cell as a function. Cells come in two primary forms:

  1. Expressions. Expression cells are the most concise and are meant for simple definitions. Outside of a closure, you don't need a var/const/let keyword in Observable.
    expression example

  2. Blocks. Block cells are encompassed by curly braces and include more complex code that might contain local variables and loops.
    simple block example

Because local variables like arr above cannot be referenced by other cells, many Observable notebooks put different definitions and functions in their own cells. This post does the same: every code snippet should go in its own cell, and after adding the code to a cell you should run it by typing shift-return.

For a more detailed introduction to Observable, check out this Notebook.

Setup

Download this dataset of Taylor Swift lyrics and then make an Observable account if you do not have one already. Once you have an account, make a new notebook by clicking the New button in the top-right corner.
new button
To get started, hover your mouse near the left of a cell. You should see a plus sign like this:
hover to see plus sign

Import the dataset from your machine by clicking the plus sign beneath the existing stock markdown cell, clicking into an Observable cell, and then pressing shift-command-u on Mac. Then select the file you wish to import (don't forget to unzip it!). In the cell you selected, you should then see something like:

FileAttachment("tswiftlyrics.csv")

Your file name can be different. You can run the cell by clicking the right-facing triangle on the right-end of the Run cell button
run cell button
or by typing shift-return, both of which would return the following:
file attachment
To see the actual data from the CSV, append .text() to the code and run it to see the data above like so:

FileAttachment("tswiftlyrics.csv").text()

complete data from Kaggle in the Notebook with text()
You can also see that a file was imported in that cell because there is that file symbol on the right. We see the data includes the artist for each song (Taylor Swift), the album name, the track title, track number on the album, the lyric, the line the lyric is on, and the year the song came out.

Now click the plus sign on the left of the cell to insert a new cell which will hold a comment. We can do that with markdown:

md`#### Require d3`

Insert a new cell and add the following to require D3.js.

d3 = {
  const d3 = require("d3-dsv@1", "d3@5","d3-scale@3","d3-scale-chromatic@1", "d3-shape@1", "d3-array@2")
  return d3
}

In Observable notebooks you cannot require just any npm package: you can only use tools that expose their modules via UMD or AMD. Usually if you can include the module from unpkg.com via CDN in a webpage, you can use it in Observable.

Now we read the contents of our CSV file and call csvParse to parse that input string. The row function passed to csvParse keeps only the lyric column, so this returns an array of objects, one per parsed row.

data = {
  // Use the name of the file you imported; "tswiftlyrics.csv" is the name used earlier in this post.
  const text = await FileAttachment("tswiftlyrics.csv").text();
  return d3.csvParse(text, ({lyric}) => ({
    lyric: lyric
  }));
}

If you run and expand that cell you can see this input that just contains the lyrics from the CSV file:
just the lyrics
In a new cell make an empty array to add the words from the lyrics to:

lyrics = []

In a new cell add the following to loop through our data object to add each lyric to the lyrics array.

data.forEach(lyric => lyrics.push(lyric.lyric));

You can see the modified lyrics object in a new cell:
lyrics object

Clean up the Lyrics

Observable does not let us reassign variables because "Named cells are declarations, not assignments." If you were to try to reset or reassign the lyrics variable you would get this error because cell names must be unique:
unique error in Observable
To analyze the most-used words from Taylor's lyrics, in a new cell let's convert the array to a string, use regex to remove punctuation and digits, and lowercase the result.

newLyrics = lyrics.join(' ').replace(/[.,\/#!""'$%\?^&\*;:{}=\-_`~()0-9]/g,"").toLowerCase()
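
To make the effect of that regex concrete, here is a minimal sketch in plain JavaScript that you could run outside Observable. The sample lines and the cleanText name are made up for illustration and are not part of the notebook; the character class is a lightly simplified version of the one above.

```javascript
// Hypothetical sample lines standing in for the `lyrics` array (not from the dataset).
const sampleLines = ["Hello, World!", "It's 2020 (again)."];

// Same idea as the newLyrics cell: join, strip punctuation and digits, lowercase.
function cleanText(lines) {
  return lines
    .join(" ")
    .replace(/[.,\/#!"'$%?^&*;:{}=\-_`~()0-9]/g, "")
    .toLowerCase();
}

const cleaned = cleanText(sampleLines);
console.log(JSON.stringify(cleaned)); // "hello world its  again" (note the double space)

// Splitting on a single space keeps an empty string where "2020" used to be.
console.log(cleaned.split(" ")); // [ 'hello', 'world', 'its', '', 'again' ]

// One possible tweak (an assumption, not part of the original notebook):
// split on runs of whitespace and drop empty tokens.
const tokens = cleaned.split(/\s+/).filter(Boolean);
console.log(tokens); // [ 'hello', 'world', 'its', 'again' ]
```

The empty token matters later: if you split on a single space, an empty string can end up being counted as a "word", so you may want to filter it out before counting.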

After we clean up the lyrics, let's remove stopwords from the lyrics string. Most of these words were taken from a list of NLTK stop words and do not really say much: they're sort-of "scaffolding-y." In a new cell add:

stopwords = ['i','me','my','myself','we','our','ours','ourselves','you','your','yours','yourself','yourselves','he','him','his','himself','she','her','hers','herself','it','its','itself','they','them','their','theirs','themselves','what','which','who','whom','this','that','these','those','am','is','are','was','were','be','been','being','have','has','had','having','do','does','did','doing','a','an','the','and','but','if','or','because','as','until','while','of','at','by','for','with','about','against','between','into','through','during','before','after','above','below','to','from','up','down','in','out','on','off','over','under','again','further','then','once','here','there','when','where','why','how','all','any','both','each','few','more','most','other','some','such','no','nor','not','only','own','same','so','than','too','very','s','t','can','will','just','don','should','now', 'im', 'ill', 'let', 'said', 'thats', 'oh', 'say', 'see', 'yeah', 'youre', 'ey', 'cant', 'dont', 'cause']

To remove these stopwords from the lyrics add this function to a new cell.

remove_stopwords = function(str) {
    var res = []
    var words = str.split(' ')
    for(let i=0;i<words.length;i++) {
       var word_clean = words[i].split(".").join("")
       if(!stopwords.includes(word_clean)) {
           res.push(word_clean)
       }
    }
    return(res.join(' '))
}  

Now we make a new variable in a new cell calling the remove_stopwords function.

lyrics_no_stopwords = remove_stopwords(newLyrics)
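
As a rough illustration of the same filtering step, the sketch below uses a Set and Array.prototype.filter instead of a loop. It is a variation, not the notebook's code: removeStopwords, the tiny stopword list, and the sample sentence are all hypothetical, and unlike the function above it also drops empty tokens.

```javascript
// A tiny, hypothetical stopword list; the post's `stopwords` array is much longer.
const sampleStopwords = new Set(["i", "you", "the", "and", "a"]);

// Keep only the words that are non-empty and not in the stopword set.
function removeStopwords(str, stopwordSet) {
  return str
    .split(" ")
    .filter(word => word !== "" && !stopwordSet.has(word))
    .join(" ");
}

const result = removeStopwords("you and i saw the red door", sampleStopwords);
console.log(result); // saw red door
```

A Set makes each membership check a constant-time lookup, which may be noticeable on a long lyrics string, though for a dataset of this size either approach should be fine.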

Get String Frequency for each Lyric

To get the number of occurrences for each word in the lyrics, add this code to a new cell using [reduce](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/reduce).

strFrequency = function (stringArr) { //es6 way of getting frequencies of words
  return stringArr.reduce((count, word) => {
        count[word] = (count[word] || 0) + 1;
        return count;
  }, {})
}

Then we call that strFrequency function and assign the output to a new variable obj.

obj = strFrequency(lyrics_no_stopwords.split(' '))
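
If the reduce call is hard to follow, this small standalone sketch runs the same counting pattern on a made-up word list (countWords and the sample words are hypothetical names used only for illustration).

```javascript
// Same reduce pattern as strFrequency: the accumulator is an object of word -> count.
function countWords(words) {
  return words.reduce((count, word) => {
    // Start at 0 the first time a word is seen, then add 1.
    count[word] = (count[word] || 0) + 1;
    return count;
  }, {});
}

// Hypothetical sample input, not taken from the dataset.
const counts = countWords(["red", "love", "red", "time", "love", "red"]);
console.log(counts); // { red: 3, love: 2, time: 1 }
```

Each step receives the running count object, updates one key, and hands the object to the next step, so the final return value holds every word with its total.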

If you run the cell you would see something like this:
object

Sort our Word Frequencies

Because this is a JavaScript object we can't just call sort(). Instead, add this code to a new cell: it converts the object to an array of [word, count] entries, sorts the entries from greatest to least, and converts them back into an object.

sortedObj = Object.fromEntries(
  // b[1] - a[1] sorts by count in descending order (greatest first)
  Object.entries(obj).sort( (a,b) => b[1] - a[1] )
)

Running the cell would show the following output:
sortedObj output
In a new cell, keep only the first x-number (in this case, 30) of items of the sorted object, mapping each entry to an object with lyric and freq keys so the values are easy to access.

final = Object.entries(sortedObj).map(([lyric, freq]) => ({lyric, freq})).slice(0,30);
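
The sort-then-slice steps can be sketched end to end on a small, made-up frequency object. The topWords helper and the sample counts below are hypothetical and only meant to show the shape of the data at each stage.

```javascript
// Hypothetical word counts, standing in for `obj`.
const sampleCounts = { time: 1, red: 3, love: 2, door: 1 };

// Sort entries by count (descending), keep the first n, and label each value.
function topWords(counts, n) {
  return Object.entries(counts)          // [["time", 1], ["red", 3], ...]
    .sort((a, b) => b[1] - a[1])         // greatest count first
    .slice(0, n)                         // keep only the top n entries
    .map(([lyric, freq]) => ({ lyric, freq }));
}

const top2 = topWords(sampleCounts, 2);
console.log(top2); // [ { lyric: 'red', freq: 3 }, { lyric: 'love', freq: 2 } ]
```

This version slices before mapping, which gives the same result as mapping first and simply avoids building objects that are thrown away.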

Running the cell you can see that final is an array, slightly different from sortedObj above.
final array of objects

Make our Chart

We need to set some attributes of our chart. In a new cell add

margin = ({top: 20, right: 0, bottom: 30, left: 40})

followed by another new cell with

height = 500

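Now we create our x-values in a new cell with d3.scaleBand(). Its domain is the list of words from the final array, and its range is the horizontal extent of the chart in pixels, from the left margin to the chart width minus the right margin. D3 divides that range into one band per word. Note that width is not defined in this post: it is a built-in value that Observable provides for the current page width.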

x = d3.scaleBand()
    .domain(final.map(d => d.lyric))
    .rangeRound([margin.left, width - margin.right])
    .padding(0.1)

Our y-values are made in a similar manner in a new cell:

y = d3.scaleLinear()
    .domain([0, d3.max(final, d => d.freq)])
    .range([height - margin.bottom, margin.top])
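
To see what the y scale computes, here is a hand-written stand-in for a linear scale in plain JavaScript. It is a sketch of the arithmetic only, not D3's implementation, and linearScale and maxFreq are hypothetical names and values chosen for illustration.

```javascript
// Map a value from the domain [d0, d1] to the range [r0, r1] by linear interpolation.
// This mimics the arithmetic of a linear scale; it is NOT D3's implementation.
function linearScale([d0, d1], [r0, r1]) {
  return value => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

const margin = { top: 20, right: 0, bottom: 30, left: 40 };
const height = 500;
const maxFreq = 200; // hypothetical maximum word frequency

// The range runs from the bottom of the chart to the top, because SVG y grows downward.
const y = linearScale([0, maxFreq], [height - margin.bottom, margin.top]);

console.log(y(0));          // 470 (baseline of the chart)
console.log(y(200));        // 20  (tallest bar reaches the top margin)
console.log(y(0) - y(100)); // 225 (pixel height of a bar with frequency 100)
```

The last line is the same expression the chart uses for each bar's height: the distance between the baseline and the scaled frequency.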

To style and display our axes, we must define them as functions translating them into the appropriate location according to the set orientation. In two separate cells include the following:

xAxis = g => g
    .attr("transform", `translate(0,${height - margin.bottom})`)
    .call(d3.axisBottom(x).tickSizeOuter(0))
and in the second cell:
yAxis = g => g
    .call(d3.axisLeft(y).ticks(15))
    .call(g => g.select(".domain").remove())

Now to add a title to the y-axis add the following code to a new cell.

yTitle = g => g.append("text")
    .attr("font-family", "sans-serif")
    .attr("font-size", 10)
    .attr("y", 10)
    .text("Frequency")

Now we call these by making our chart in a new cell. We create an SVG object, using the viewBox attribute to set the position and dimension. Then we append a g element (which is not unique to D3.js, as it is used to group SVG shapes together) creating rectangles from our lyric data and setting the lyric as the x-value for each rectangle and the frequency of the lyric as the y-value for each rectangle. We also set some style attributes and then call our xAxis, yAxis, and yTitle.

{
  const svg = d3.create("svg")
      .attr("viewBox", [0, 0, width, height]);

  // One rect per word: x position from the band scale, height from the linear scale
  svg.append("g")
    .selectAll("rect")
    .data(final)
    .enter().append("rect")
      .attr('x', d => x(d.lyric))
      .attr('y', d => y(d.freq))
      .attr('width', x.bandwidth())
      .attr('height', d => y(0) - y(d.freq))
      .style("padding", "3px")
      .style("margin", "1px")
      .attr("fill", "#CEBEDE")
      .attr("stroke", "#FFB9EC")
      .attr("stroke-width", 1);

  svg.append("g")
      .call(xAxis);
  svg.append("g")
      .call(yAxis);
  svg.call(yTitle);

  return svg.node();
}

Running that cell should output this chart. Tada!
static chart

Add Interactivity to the Bar Chart

Beneath the yAxis cell, add a new cell to contain a tooltip, which is displayed when a user hovers their cursor over a rectangle. We set different style elements to be hex colors related to Taylor Swift albums and other CSS-like properties.

tooltip = d3.select("body")
      .append("div")
      .style("position", "absolute")
      .style("font-family", "'Open Sans', sans-serif")
      .style("font-size", "15px")
      .style("z-index", "10")
      .style("background-color", "#A7CDFA")
      .style("color", "#B380BA")
      .style("border", "solid")
      .style("border-color", "#A89ED6")
      .style("padding", "5px")
      .style("border-radius", "2px")
      .style("visibility", "hidden"); 

Now edit the chart cell from before by adding the following tooltip code. On a mouseover event the tooltip is displayed and shows the word with how frequently the word appears in Taylor Swift songs. When the mouse moves while hovering over a rectangle in the bar chart, so does the tooltip and its text.

{
  const svg = d3.create("svg")
      .attr("viewBox", [0, 0, width, height]);

  // Call tooltip
  tooltip;

  svg.append("g")
    .selectAll("rect")
    .data(final)
    .enter().append("rect")
      .attr('x', d => x(d.lyric))
      .attr('y', d => y(d.freq))
      .attr('width', x.bandwidth())
      .attr('height', d => y(0) - y(d.freq))
      .style("padding", "3px")
      .style("margin", "1px")
      .attr("fill", "#CEBEDE")
      .attr("stroke", "#FFB9EC")
      .attr("stroke-width", 1)
      // Show the tooltip and highlight the bar on hover
      .on("mouseover", function(d) {
        tooltip.style("visibility", "visible").text(d.lyric + ": " + d.freq);
        d3.select(this).attr("fill", "#FDE5BD");
      })
      // Keep the tooltip next to the cursor (d3.event is the d3@5 API)
      .on("mousemove", d => tooltip
        .style("top", (d3.event.pageY - 10) + "px")
        .style("left", (d3.event.pageX + 10) + "px")
        .text(d.lyric + ": " + d.freq))
      // Hide the tooltip and restore the bar color
      .on("mouseout", function(d) {
        tooltip.style("visibility", "hidden");
        d3.select(this).attr("fill", "#CEBEDE");
      });

  svg.append("g")
      .call(xAxis);
  svg.append("g")
      .call(yAxis);

  svg.call(yTitle);

  return svg.node();
}
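
The positioning arithmetic in the mousemove handler can be pulled out into small pure functions, which makes it easier to check without a browser. This is only a sketch: tooltipPosition, tooltipLabel, and the sample values are hypothetical helpers, not part of the notebook. The comments also note how the handler signature differs in D3 v6 and later, in case you require a newer version than the d3@5 used in this post.

```javascript
// Offsets used by the mousemove handler: 10px above and 10px to the right of the cursor.
function tooltipPosition(pageX, pageY) {
  return { top: (pageY - 10) + "px", left: (pageX + 10) + "px" };
}

// Text shown in the tooltip for one bar's datum, e.g. { lyric: "love", freq: 3 }.
function tooltipLabel(d) {
  return d.lyric + ": " + d.freq;
}

const position = tooltipPosition(200, 150);
const label = tooltipLabel({ lyric: "love", freq: 3 }); // hypothetical datum

console.log(position); // { top: '140px', left: '210px' }
console.log(label);    // love: 3

// With d3@5 (as required in this notebook) a handler reads the global d3.event:
//   .on("mousemove", d => { const p = tooltipPosition(d3.event.pageX, d3.event.pageY); /* ... */ })
// From D3 v6 on, d3.event was removed and listeners receive the event as the first argument:
//   .on("mousemove", (event, d) => { const p = tooltipPosition(event.pageX, event.pageY); /* ... */ })
```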

You should see:
interactive bar chart gif with hovering mouse
Tada! Now if you hover over a bar, you can see the exact value. If you want to see the complete code you can play around with the published Observable notebook here.

What's next for data visualizations?

You don't need to use Observable notebooks to make data visualizations in JavaScript: you can use D3.js and other data visualization libraries in your preferred text editor too, and then display them in a webpage. However, Observable is a handy tool that lets you view code output quickly and can help make building and sharing demos easier. You can use other datasets as well, such as the different datasets here on Kaggle. Be sure to ask yourself these 5 questions before working with a dataset. Let me know online what you're building!
