Dev Diary Week 3 - Parsing OPML Files

Angelo Stavrow - Apr 30 '20 - Dev Community

This post is an entry in a weekly development diary on building a feed-aggregator-based blog on Glitch. Only a small part of building an app is code; the bulk of your time is spent planning, experimenting, making mistakes, getting frustrated, and making progress in baby steps.

The plan

Last time, I worked on an API endpoint for parsing RSS feeds. It lets you pass in an RSS feed URL, and returns some JSON feed metadata and an array of feed entries. That's a good start! However, I want to combine entries from multiple feeds into one "content firehose" feed, so this week we'll look at how to do this.

Parsing an OPML file

Now, I could hard-code an array of feed URLs (maybe as a JSON file), but there's an open standard called OPML (Outline Processor Markup Language) that I can use instead. An OPML file is an "outline" file, and essentially each entry for a feed URL looks something like this:

<outline
  text="What's displayed (usually title)"
  title="Title of feed"
  description=""
  type="rss"
  version="RSS"
  htmlUrl="https://link/to/site/"
  xmlUrl="https://link/to/feed/"
/>
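
To make that entry a little more concrete, here's a rough sketch of reducing outline elements like the one above to plain objects and then to a list of feed URLs. It deliberately uses a regular expression instead of the opmlparser package, so it is only an illustration of the data's shape and not a real XML parser. The function name `extractOutlines` and the example.com URLs are made up for this sketch.

```javascript
// Simplified sketch: collect the attributes of each self-closing <outline ... /> element.
// A regex is NOT a real XML parser; a library such as opmlparser handles the edge cases.
function extractOutlines(opmlText) {
  const outlines = [];
  // Matches one self-closing <outline ... /> element and captures its attribute text.
  const elementPattern = /<outline\b([^>]*?)\/>/g;
  // Matches name="value" pairs inside that attribute text.
  const attributePattern = /([\w:-]+)\s*=\s*"([^"]*)"/g;

  for (const [, attributeText] of opmlText.matchAll(elementPattern)) {
    const outline = {};
    for (const [, name, value] of attributeText.matchAll(attributePattern)) {
      outline[name] = value;
    }
    outlines.push(outline);
  }
  return outlines;
}

// Hypothetical sample entry, in the same shape as the snippet above.
const sample = `
<outline
  text="Example Feed"
  title="Example Feed"
  type="rss"
  htmlUrl="https://example.com/"
  xmlUrl="https://example.com/feed/"
/>`;

// Keep only RSS entries and pull out the feed URL of each one.
const feedUrls = extractOutlines(sample)
  .filter((outline) => outline.type === "rss")
  .map((outline) => outline.xmlUrl);

console.log(feedUrls); // [ 'https://example.com/feed/' ]
```

The part that matters for this app is the `xmlUrl` attribute: that's the value that eventually gets handed to the feed parser.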

The advantage of using an OPML file is that it's portable. You can export your subscriptions to an OPML file to use with this app, and anyone can get your OPML file and import it into their feed reader of choice. Here's mine!

So, this week, I'm planning on doing the following:

  1. Create a default OPML file for this app.
  2. Parse the OPML file to get each individual feed URL.
  3. Pass each feed to the feed parsing endpoint.
  4. Collect all feed entries as JSON.

And, just as we used the feedparser npm package last week, we'll use the opmlparser package (by the same author) to handle the parsing work this week.

How did it go?

I managed to complete all of this work in about two hours between meetings, thanks (again) to great example code in the opmlparser GitHub repository. This serves as a great reminder that writing great docs is just as important as writing solid code for your library/framework/app!

I worked off a remix of my existing work and added the opmlparser package to package.json, and then added the opmlparser.js file to build out the functionality I needed. I also added a subscriptions.opml file to the public directory that included two feeds: the Glitch team's posts on Dev.to, and my own posts on Dev.to. This will give us an interesting de-duplication problem to solve for next week, since both feeds contain my posts.
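
As a preview of that de-duplication problem, here's a minimal sketch of one possible approach: keep the first entry seen for each link. It assumes every entry object carries a `link` property, which may not match the real JSON shape returned by the parsing endpoint. The function name `dedupeEntries` and the sample entries are invented for illustration.

```javascript
// Assumption: each entry has a `link` property that identifies the post.
// The actual entry shape returned by the parsing endpoint may differ.
function dedupeEntries(entries) {
  const seen = new Map();
  for (const entry of entries) {
    // Keep the first entry seen for each link; later duplicates are skipped.
    if (!seen.has(entry.link)) {
      seen.set(entry.link, entry);
    }
  }
  // A Map preserves insertion order, so the original ordering is kept.
  return [...seen.values()];
}

// Hypothetical sample data: the same post shows up in both feeds.
const teamFeed = [
  { title: "Post by a teammate", link: "https://example.com/team/post-1" },
  { title: "Dev Diary", link: "https://example.com/me/dev-diary" },
];
const personalFeed = [
  { title: "Dev Diary", link: "https://example.com/me/dev-diary" },
];

const combined = dedupeEntries([...teamFeed, ...personalFeed]);
console.log(combined.length); // 2
```

Whether the link is the right key is an open question; a feed's own unique identifier, where one exists, might turn out to be a better choice.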

I then added a /opml route to server.js which does the following:

  1. Gets the feeds in the subscriptions.opml file;
  2. Sends them to the /api/parse/:feed endpoint;
  3. Returns the feed content from each feed in the OPML file to the caller.

This will probably change over time — for now, I wanted a way to see that the OPML file was being parsed correctly.
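
The three steps above can be sketched roughly like this. This is not the code in the project: `collectFeeds` is a hypothetical helper, and `parseFeed` is a stand-in for whatever actually calls the /api/parse/:feed endpoint, passed in as a parameter so the sketch stays self-contained. It uses `Promise.allSettled` so that one failing feed doesn't sink the whole response.

```javascript
// Hypothetical helper: `parseFeed` stands in for a call to the
// /api/parse/:feed endpoint and should resolve to that feed's JSON.
async function collectFeeds(feedUrls, parseFeed) {
  // Start every request at once, then wait for all of them to settle.
  const results = await Promise.allSettled(
    feedUrls.map(async (url) => parseFeed(url))
  );

  const feeds = [];
  const errors = [];
  results.forEach((result, index) => {
    if (result.status === "fulfilled") {
      feeds.push(result.value);
    } else {
      // Record which feed failed instead of rejecting the whole batch.
      errors.push({ url: feedUrls[index], message: String(result.reason) });
    }
  });
  return { feeds, errors };
}

// Stand-in parser for demonstration only; it fails for one URL on purpose.
async function fakeParseFeed(url) {
  if (url.includes("broken")) throw new Error("could not fetch feed");
  return { meta: { url }, entries: [] };
}

collectFeeds(
  ["https://example.com/feed/", "https://example.com/broken/"],
  fakeParseFeed
).then(({ feeds, errors }) => {
  console.log(feeds.length, errors.length); // 1 1
});
```

In a route handler, the `feeds` array would be what gets sent back to the caller as JSON, and `errors` could be logged or returned alongside it.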

What went well?

Integrating the opmlparser library was really straightforward. Since both libraries were built by the same author, the code in opmlparser.js is pretty much the same as in feedparser.js.

What did I have trouble with?

I got a bit lost in async spaghetti while trying to send each feed in the OPML file to the /api/parse/:feed endpoint. 😂 There's lots of room for improvement here, but because I expect to rewrite this elsewhere, I didn't want to spend too much time on it.

What did I learn?

I learned a lot about OPML files! I'm used to working with JSON data, and XML sometimes takes me a while to figure out. At the end of the day, though, OPML is a variation on an outline document. Thinking about it this way helped me reason through things.
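
One consequence of the outline model is that outline elements can be nested, for example when feeds are grouped into folders. Here's a small sketch of walking such a tree to collect every feed URL. The object shape (a `children` array on each outline) is an assumption made up for this example, not the format opmlparser produces, and `flattenFeeds` is a hypothetical name.

```javascript
// Assumption: outlines are plain objects with an optional `children` array.
// This shape is invented for illustration; it is not opmlparser's output format.
function flattenFeeds(outlines) {
  const feeds = [];
  for (const outline of outlines) {
    // An outline with an xmlUrl is a feed; one with children is a folder.
    if (outline.xmlUrl) feeds.push(outline.xmlUrl);
    if (outline.children) feeds.push(...flattenFeeds(outline.children));
  }
  return feeds;
}

// Hypothetical outline: one folder containing a feed, plus one top-level feed.
const outlineTree = [
  {
    text: "Blogs",
    children: [
      { text: "Example Blog", xmlUrl: "https://example.com/blog/feed/" },
    ],
  },
  { text: "Example News", xmlUrl: "https://example.com/news/feed/" },
];

console.log(flattenFeeds(outlineTree));
// [ 'https://example.com/blog/feed/', 'https://example.com/news/feed/' ]
```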

What will I work on next week?

Next week, I'm going to start working on getting feeds into a database!

What's in your OPML file?

OPML files were created to be shared! They're an open standard that most any feed reader will import or export. Here's what's in mine — share yours in the comments!
