For Hacktoberfest I'm gonna make a CLI for DEV.to... Let's make it together!
This is meant to be a follow-along tutorial... so follow along. But if you think you are too good to learn something cool, you can just skip to the end.
If I skip over something too quickly and you want more explanation, ask me in the comments!
Setup
Since I'm the one doing the driving, I get to pick the language. I'll be using MojiScript (of course).
"The only way to learn a new programming language is by writing programs in it." - Dennis Ritchie
git clone https://github.com/joelnet/mojiscript-starter-app.git devto-cli
cd devto-cli
npm ci
There isn't an API for DEV.to. And what happens to all sites that don't have an API? They get scraped!
# install axios
npm install --save-prod axios
Add the axios dependency to index.mjs
import log from 'mojiscript/console/log'
import run from 'mojiscript/core/run'
import axios from 'mojiscript/net/axios'
import main from './main'
const dependencies = {
axios,
log
}
run ({ dependencies, main })
Create src/api.mjs
Create a new file src/api.mjs to contain our scraping API. We are using mojiscript/net/axios, which is a curried version of axios.
import pipe from 'mojiscript/core/pipe'
const getData = response => response.data
export const getUrl = axios => pipe ([
url => axios.get (url) ({}),
getData
])
export const getDevToHtml = axios => pipe ([
() => getUrl (axios) ('https://dev.to')
])
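If pipe and the curried axios are new to you, here is a rough plain-JavaScript mental model of how api.mjs flows. The pipe below is a hypothetical stand-in, not MojiScript's implementation: it assumes pipe runs each function in order and waits for any Promise before passing the result along. fakeAxios is also made up; it only mimics the curried get (url) (config) shape that api.mjs relies on, so the sketch runs without a network.

```javascript
// Hypothetical stand-in for mojiscript/core/pipe: run each function in order,
// waiting on Promises so sync and async steps can be mixed freely.
const pipe = fns => input =>
  fns.reduce((acc, fn) => acc.then(fn), Promise.resolve(input))

// Made-up client with the same curried shape: get (url) (config) -> Promise of { data }
const fakeAxios = {
  get: url => config => Promise.resolve({ data: `<html>${url}</html>` })
}

const getData = response => response.data

// Same structure as getUrl in src/api.mjs
const getUrl = axios => pipe ([
  url => axios.get (url) ({}),
  getData
])

// Same structure as getDevToHtml: ignores its input and fetches the front page
const getDevToHtml = axios => pipe ([
  () => getUrl (axios) ('https://dev.to')
])

getDevToHtml (fakeAxios) ().then(html => console.log(html))
// prints: <html>https://dev.to</html>
```

Because getDevToHtml ignores its input, main can put it first in its own pipe without passing anything in.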
Import getDevToHtml into main.mjs.
import pipe from 'mojiscript/core/pipe'
import { getDevToHtml } from './api'
const main = ({ axios, log }) => pipe ([
getDevToHtml (axios),
log
])
export default main
Now run the code:
npm start
If everything is successful, you should see a bunch of HTML flood the console.
JavaScript interop
Now I don't want to slam DEV.to with HTTP calls every time I debug my code, so let's cache that output to a file.
# this will get you the same version in this tutorial
curl -Lo devto.html https://raw.githubusercontent.com/joelnet/devto-cli/master/devto.html
Next I'm gonna create a file interop/fs.mjs, which is where fs.readFile will live. I place this in an interop folder because this is where MojiScript requires JavaScript interop files to be placed. JavaScript is written differently than MojiScript and is sometimes incompatible with it (unless it is inside the interop directory).
To make fs.readFile compatible with MojiScript, I first need to promisify it.
promisify (fs.readFile)
Now that it's promisified, I also need to curry it.
export const readFile = curry (2) (promisify (fs.readFile))
I'm also dealing with UTF-8, so let's add a helper to make life easier.
export const readUtf8File = file => readFile (file) ('utf8')
And the full interop/fs.mjs:
import fs from 'fs'
import curry from 'mojiscript/function/curry'
import { promisify } from 'util'
export const readFile = curry (2) (promisify (fs.readFile))
export const readUtf8File = file => readFile (file) ('utf8')
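To see what curry (2) does to the promisified function, here is a small hand-rolled curry in plain JavaScript. It is a hypothetical stand-in for mojiscript/function/curry (which may accept arguments differently), and fakeReadFile is a made-up Promise-returning function standing in for promisify (fs.readFile), so nothing touches the disk.

```javascript
// Hypothetical stand-in for mojiscript/function/curry: collect one argument
// per call until `arity` arguments have been supplied, then call fn.
const curry = arity => fn => {
  const collect = args => arg => {
    const next = [ ...args, arg ]
    return next.length >= arity ? fn (...next) : collect (next)
  }
  return collect ([])
}

// Made-up stand-in for promisify (fs.readFile): (file, encoding) => Promise
const fakeReadFile = (file, encoding) =>
  Promise.resolve(`contents of ${file} read as ${encoding}`)

const readFile = curry (2) (fakeReadFile)

// Same helper as interop/fs.mjs: the encoding is fixed, only the file varies
const readUtf8File = file => readFile (file) ('utf8')

readUtf8File ('devto.html').then(text => console.log(text))
// prints: contents of devto.html read as utf8
```

Taking one argument per call is what lets readUtf8File pin the encoding while leaving the file name open.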
Read the cache
Inside of src/mocks/axios.mock.mjs, I'm going to create mockAxios. It will return the contents of our file when get is called.
import pipe from 'mojiscript/core/pipe'
import { readUtf8File } from '../interop/fs'
const mockAxios = {
get: () => pipe ([
() => readUtf8File ('devto.html'),
data => ({ data })
])
}
export default mockAxios
Using the mock is easy. All I have to do is change the dependencies. Nothing in main.mjs needs to change!
// don't forget to add the import!
import mockAxios from './mocks/axios.mock'
const dependencies = {
axios: mockAxios,
log
}
Now when we run npm start, no HTTP requests are made. This is good because I am probably gonna run npm start a whole bunch before I complete this thing!
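The swap works because main only talks to axios through get (url) (config) and whatever that resolves to. Here is a plain-JavaScript sketch of that dependency injection idea. The pipe, the simplified main, and the in-memory HTML string are all hypothetical; the real mockAxios reads devto.html from disk instead.

```javascript
// Hypothetical minimal pipe: run functions in order, waiting on Promises
const pipe = fns => input =>
  fns.reduce((acc, fn) => acc.then(fn), Promise.resolve(input))

// Simplified main: it never imports axios, it uses whatever it is given
const main = ({ axios, log }) => pipe ([
  () => axios.get ('https://dev.to') ({}),
  response => response.data,
  log
])

// Made-up mock with the same shape as mockAxios, returning a string
// from memory instead of reading devto.html
const mockAxios = {
  get: url => config => Promise.resolve({ data: '<html>cached feed</html>' })
}

// A log that records what it receives and passes it through
const logged = []
const log = value => {
  logged.push(value)
  return value
}

main ({ axios: mockAxios, log }) ()
  .then(result => console.log(result)) // <html>cached feed</html>
```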
Parsing the HTML
I like cheerio for parsing. I'm pretty sure this is what the cool kids are using.
npm install --save-prod cheerio
Create another interop file, interop/cheerio.mjs.
import cheerio from 'cheerio';
import pipe from 'mojiscript/core/pipe';
import map from 'mojiscript/list/map';
export const getElements = selector => pipe ([
cheerio.load,
$ => $ (selector),
$articles => $articles.toArray (),
map (cheerio)
])
Note: When cheerio's toArray is called, the elements lose all those nice cheerio methods. So we have to map cheerio back onto all the elements.
Next, add getElements to main.
import { getElements } from './interop/cheerio'
const main = ({ axios, log }) => pipe ([
getDevToHtml (axios),
getElements ('.single-article:not(.feed-cta)'),
log
])
Run npm start again to see the Array of elements.
Next, install reselect and nothis.
npm install --save-prod reselect nothis
Create interop/parser.mjs. I'm gonna use reselect to select the attributes I need from the HTML. I'm not really gonna go into detail about this; it's basically just doing a whole bunch of gets from an element. The code is easy to read, but you can also skip it. It's not important.
import reselect from 'reselect'
import nothis from 'nothis'
const { createSelector } = reselect
const isTextNode = nothis(({ nodeType }) => nodeType === 3)
const parseUrl = element => `http://dev.to${element.find('a.index-article-link').attr('href')}`
const parseTitle = element => element.find('h3').contents().filter(isTextNode).text().trim()
const parseUserName = element => element.find('.featured-user-name,h4').text().trim().split('・')[0]
const parseTags = element => element.find('.featured-tags a,.tags a').text().substr(1).split('#')
const parseComments = element => element.find('.comments-count .engagement-count-number').text().trim() || '0'
const parseReactions = element => element.find('.reactions-count .engagement-count-number').text().trim() || '0'
export const parseElement = createSelector(
parseUrl,
parseTitle,
parseUserName,
parseTags,
parseComments,
parseReactions,
(url, title, username, tags, comments, reactions) => ({
url,
title,
username,
tags,
comments,
reactions
})
)
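If reselect is unfamiliar: createSelector takes some input selectors plus a final result function, calls every input selector with the same argument, and hands their results, in order, to the result function. reselect also memoizes the result, which is left out here. The combineSelectors function below is a hypothetical, non-memoizing plain-JavaScript version that shows only that data flow, and the element is a plain object standing in for a cheerio element.

```javascript
// Hypothetical stand-in for reselect's createSelector, without memoization:
// every input selector gets the same argument, and their results are passed
// to the last function in the same order.
const combineSelectors = (...fns) => {
  const inputSelectors = fns.slice(0, -1)
  const resultFn = fns[fns.length - 1]
  return arg => resultFn(...inputSelectors.map(selector => selector(arg)))
}

// Plain object standing in for a cheerio element
const element = {
  href: '/geekgalgroks/the-introverts-guide-to-professional-development-3408',
  title: "The Introvert's Guide to Professional Development"
}

const parseUrl = el => `http://dev.to${el.href}`
const parseTitle = el => el.title

const parseElement = combineSelectors(
  parseUrl,
  parseTitle,
  (url, title) => ({ url, title })
)

console.log(parseElement(element))
```

Adding another field is the same move every time: one more input selector and one more parameter on the result function.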
Add parseElement to main.
import map from 'mojiscript/list/map'
import { parseElement } from './interop/parser'
const main = ({ axios, log }) => pipe ([
getDevToHtml (axios),
getElements ('.single-article:not(.feed-cta)'),
map (parseElement),
log,
])
Now when you run npm start, you should see something like this:
[
{ url:
'http://dev.to/ccleary00/how-to-find-the-best-open-source-nodejs-projects-to-study-for-leveling-up-your-skills-1c28',
title:
'How to find the best open source Node.js projects to study for leveling up your skills',
username: 'Corey Cleary',
tags: [ 'node', 'javascript', 'hacktoberfest' ],
comments: '0',
reactions: '33' } ]
Format the data
Add the template import and formatPost, then add formatPost to main and change log to map (log).
import $ from 'mojiscript/string/template'
const formatPost = $`${'title'}
${'url'}\n#${'tags'}
${'username'} ・ π ${'comments'} 💬 ${'reactions'}
`
const main = ({ axios, log }) => pipe ([
getDevToHtml (axios),
getElements ('.single-article:not(.feed-cta)'),
map (parseElement),
map (formatPost),
map (log)
])
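MojiScript's $ tag isn't explained here. Judging from how formatPost is used, it seems to turn a template into a function that takes an object and fills each ${'key'} slot with that property. The tag below is a hypothetical plain-JavaScript approximation of that behavior, not MojiScript's source, and it uses plain text labels instead of emoji. In this sketch an array is converted with its default string conversion, which lines up with the comma-separated tags in the sample output.

```javascript
// Hypothetical approximation of mojiscript/string/template: the interpolated
// values are property keys, and the result is a function of an object.
const $ = (strings, ...keys) => obj =>
  strings.reduce(
    (out, str, i) => out + str + (i < keys.length ? String(obj[keys[i]]) : ''),
    ''
  )

const formatPost = $`${'title'}
${'url'}\n#${'tags'}
${'username'} comments: ${'comments'} reactions: ${'reactions'}`

console.log(formatPost({
  title: "The Introvert's Guide to Professional Development",
  url: 'http://dev.to/geekgalgroks/the-introverts-guide-to-professional-development-3408',
  tags: [ 'introvert', 'tips', 'development', 'professional' ],
  username: 'Jenn',
  comments: '1',
  reactions: '50'
}))
```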
Run npm start again and you should see a handful of records that look like this:
The Introvert's Guide to Professional Development
http://dev.to/geekgalgroks/the-introverts-guide-to-professional-development-3408
#introvert,tips,development,professional
Jenn ・ π 1 💬 50
Finally, this is starting to look like something!
I am also going to add a conditional in index.mjs to use axios only when NODE_ENV is set to production.
import ifElse from 'mojiscript/logic/ifElse'
const isProd = env => env === 'production'
const getAxios = () => axios
const getMockAxios = () => mockAxios
const dependencies = {
axios: ifElse (isProd) (getAxios) (getMockAxios) (process.env.NODE_ENV),
log
}
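ifElse takes a predicate, a function for the true case, a function for the false case, and finally the value to test. A hypothetical plain-JavaScript version (not MojiScript's source, and synchronous only) makes the selection easy to see; the two axios objects here are placeholders for the real axios and mockAxios.

```javascript
// Hypothetical stand-in for mojiscript/logic/ifElse:
// ifElse (predicate) (onTrue) (onFalse) (value)
const ifElse = predicate => onTrue => onFalse => value =>
  predicate (value) ? onTrue (value) : onFalse (value)

// Placeholders for the real axios and mockAxios
const axios = { name: 'axios' }
const mockAxios = { name: 'mockAxios' }

const isProd = env => env === 'production'
const getAxios = () => axios
const getMockAxios = () => mockAxios

const pickAxios = ifElse (isProd) (getAxios) (getMockAxios)

console.log(pickAxios ('production').name) // axios
console.log(pickAxios (undefined).name)    // mockAxios
```

Because the predicate only matches the exact string production, an unset NODE_ENV falls through to the mock, which is the safe default while developing.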
Run it with and without NODE_ENV=production to make sure both modes work.
# dev mode
npm start
# production mode
NODE_ENV=production npm start
Viewing the Article
The list is nice, and I was planning on stopping the walkthrough here, but it would be super cool if I could also read the article.
I would like to be able to type something like:
devto read 3408
I noticed the URLs have an ID at the end that I can use: http://dev.to/geekgalgroks/the-introverts-guide-to-professional-development-3408 <-- right there.
So I'll modify parser.mjs to include a new parser to get that id.
const parseId = createSelector(
parseUrl,
url => url.match(/-(\w+)$/)[1]
)
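Here is the ID extraction on its own in plain JavaScript, run against two URLs from this article. The regex captures the word characters after the final hyphen at the end of the URL. parseIdFromUrl is a hypothetical helper; unlike parseId, it returns null instead of throwing when a URL does not end in -<id>, because match returns null in that case.

```javascript
// Same regex as parseId, without the selector plumbing:
// capture the run of word characters after the final hyphen.
const parseIdFromUrl = url => {
  const match = url.match(/-(\w+)$/)
  // match is null when the URL does not end in "-<id>"
  return match ? match[1] : null
}

console.log(parseIdFromUrl('http://dev.to/geekgalgroks/the-introverts-guide-to-professional-development-3408'))
// 3408
console.log(parseIdFromUrl('http://dev.to/ppshobi/-email-sending-in-django-2-part--1--1i0a'))
// 1i0a
```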
Then just follow the pattern and add parseId to parseElement.
Now the CLI is going to have two branches: one that will display the feed, and another that will show the article. So let's break our feed logic out of main.mjs and into src/showFeed.mjs.
import pipe from 'mojiscript/core/pipe'
import map from 'mojiscript/list/map'
import $ from 'mojiscript/string/template'
import { getDevToHtml } from './api'
import { getElements } from './interop/cheerio'
import { parseElement } from './interop/parser'
const formatPost = $`${'title'}
${'url'}\n#${'tags'}
${'username'} ・ π ${'comments'} 💬 ${'reactions'}
`
export const shouldShowFeed = args => args.length < 1
export const showFeed = ({ axios, log }) => pipe ([
getDevToHtml (axios),
getElements ('.single-article:not(.feed-cta)'),
map (parseElement),
map (formatPost),
map (log)
])
Next, I'm gonna wrap cond around showFeed. It's possible we will have many more branches (maybe help?) in the CLI, but for right now we just have the one path.
This is what main.mjs should look like now.
import pipe from 'mojiscript/core/pipe'
import cond from 'mojiscript/logic/cond'
import { showFeed } from './showFeed'
const main = dependencies => pipe ([
cond ([
[ () => true, showFeed (dependencies) ]
])
])
export default main
We will need access to Node's args, so make these changes to src/index.mjs. I am doing a slice on them because the first 2 args are the path to node and the path to the script, and I don't need them.
// add this line
const state = process.argv.slice (2)
// add state to run
run ({ dependencies, state, main })
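To make the slice concrete: process.argv starts with the path to the node executable and the path to the script, so slice (2) leaves only the user's arguments. The sketch below uses a made-up argv array in place of the real process.argv so it runs the same everywhere, then feeds the result to shouldShowFeed from showFeed.mjs.

```javascript
// Made-up argv: the first two entries are the node binary and the script,
// which is why index.mjs skips them.
const fakeArgv = [ '/usr/local/bin/node', '/home/me/devto-cli/src/index.mjs', 'read', '1i0a' ]
const state = fakeArgv.slice (2)

// From showFeed.mjs: show the feed only when there are no arguments
const shouldShowFeed = args => args.length < 1

console.log(state)                  // [ 'read', '1i0a' ]
console.log(shouldShowFeed (state)) // false
console.log(shouldShowFeed ([]))    // true
```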
Okay, we have a lot of work to do before we can actually view the article, so let's add the help first. That's something easy.
View the Help
Create src/showHelp.mjs.
import pipe from 'mojiscript/core/pipe'
const helpText = `usage: devto [<command>] [<args>]
<default> Show article feed
read <id> Read an article
`
export const showHelp = ({ log }) => pipe ([
() => log (helpText)
])
Now we can simplify main.mjs and add the new case to cond.
import pipe from 'mojiscript/core/pipe'
import cond from 'mojiscript/logic/cond'
import { shouldShowFeed, showFeed } from './showFeed'
import { showHelp } from './showHelp'
const main = dependencies => pipe ([
cond ([
[ shouldShowFeed, showFeed (dependencies) ],
[ () => true, showHelp (dependencies) ]
])
])
export default main
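cond walks a list of [ predicate, handler ] pairs and runs the handler for the first predicate that matches, which is why the catch-all () => true goes last. Here is a hypothetical plain-JavaScript cond (an approximation, not MojiScript's source) with the real handlers replaced by placeholders that just name the branch, so the routing is easy to check.

```javascript
// Hypothetical stand-in for mojiscript/logic/cond: return the result of the
// first handler whose predicate matches the input (undefined in this sketch
// if nothing matches).
const cond = pairs => input => {
  for (const [ predicate, handler ] of pairs) {
    if (predicate (input)) return handler (input)
  }
  return undefined
}

const shouldShowFeed = args => args.length < 1

// Placeholder handlers instead of showFeed and showHelp
const route = cond ([
  [ shouldShowFeed, () => 'feed' ],
  [ () => true, () => 'help' ]
])

console.log(route ([]))         // feed
console.log(route ([ 'help' ])) // help
```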
Now if we run npm start -- help, we should see our help:
usage: devto [<command>] [<args>]
<default> Show article feed
read <id> Read an article
And if we run npm start, we should still see our feed!
Article from Cache
Just like the main feed, I also want to read the article from a cache.
curl -Lo article.html https://raw.githubusercontent.com/joelnet/devto-cli/master/article.html
Modify axios.mock.mjs to read the article too.
import pipe from 'mojiscript/core/pipe'
import ifElse from 'mojiscript/logic/ifElse'
import { readUtf8File } from '../interop/fs'
const feedOrArticle = ifElse (url => url === 'https://dev.to') (() => 'devto.html') (() => 'article.html')
const mockAxios = {
get: url => pipe ([
() => feedOrArticle (url),
readUtf8File,
data => ({ data })
])
}
export default mockAxios
Parsing the Article
Parsing the article HTML is much easier because I'm planning on just formatting the whole article-body block as text. So I just need the title and body.
Create interop/articleParser.mjs.
import reselect from 'reselect'
const { createSelector } = reselect
const parseTitle = $ => $('h1').first().text().trim()
const parseBody = $ => $('#article-body').html()
export const parseArticle = createSelector(
parseTitle,
parseBody,
(title, body) => ({
title,
body
})
)
Read the Article
Because there is no state, the CLI will not know what URL to pull when I issue the read command. Because I am lazy, I'll just query the feed again and pull the URL from the feed.
So I'm gonna hop back into showFeed.mjs and expose that functionality.
I'm just extracting the functions from showFeed and putting them into getArticles. I haven't added any new code here.
export const getArticles = axios => pipe ([
getDevToHtml (axios),
getElements ('.single-article:not(.feed-cta)'),
map (parseElement)
])
export const showFeed = ({ axios, log }) => pipe ([
getArticles (axios),
map (formatPost),
map (log)
])
Show the Article
Now I want to write a function like the one below, but we'll get an error: id is not defined. The id is the argument to the pipe, but it's not accessible here. The input to filter is the Array of articles, not the id.
const getArticle = ({ axios }) => pipe ([
getArticles (axios),
filter (article => article.id === id), // 'id' is not defined
articles => articles[0]
])
But there's a trick. Using the W Combinator, I can create a closure so that id is exposed.
const getArticle = ({ axios }) => W (id => pipe ([
getArticles (axios),
filter (article => article.id === id),
articles => articles[0]
]))
Compare that block with the one above it. Not much is different: I just added W (id => and a closing ). The W Combinator is an awesome tool. More on Function Combinators in a future article :) For now, let's move on.
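The W Combinator is small enough to write out: W (f) (x) is f (x) (x), so the argument gets passed twice, once to build the closure and once as the pipe's input. The sketch below writes W from that definition (not copied from MojiScript), uses a synchronous pipe for brevity, and swaps the scraped feed for a made-up list of two articles.

```javascript
// W combinator: W (f) (x) === f (x) (x)
const W = f => x => f (x) (x)

// Minimal synchronous pipe, for this sketch only
const pipe = fns => input => fns.reduce((acc, fn) => fn (acc), input)

// Made-up data standing in for the scraped feed; like the real getArticles,
// the returned function ignores its input
const getArticles = () => () => [
  { id: '3408', title: "The Introvert's Guide to Professional Development" },
  { id: '1i0a', title: 'Email Sending in Django 2, Part -1' }
]

// id is in scope inside the pipe because W hands it to the outer arrow first
const getArticle = W (id => pipe ([
  getArticles (),
  articles => articles.filter (article => article.id === id),
  articles => articles[0]
]))

console.log(getArticle ('1i0a').title) // Email Sending in Django 2, Part -1
console.log(getArticle ('nope'))       // undefined
```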
All together, src/showArticle.mjs should look like this:
import W from 'mojiscript/combinators/W'
import pipe from 'mojiscript/core/pipe'
import filter from 'mojiscript/list/filter'
import { getArticles } from './showFeed'
export const shouldShowArticle = args => args.length === 2 && args[0] === 'read'
const getArticle = ({ axios }) => W (id => pipe ([
getArticles (axios),
filter (article => article.id === id),
articles => articles[0]
]))
export const showArticle = ({ axios, log }) => pipe ([
getArticle ({ axios }),
log
])
Modify the cond in main.mjs to include the new functions:
import { shouldShowArticle, showArticle } from './showArticle'
const main = dependencies => pipe ([
cond ([
[ shouldShowArticle, args => showArticle (dependencies) (args[1]) ],
[ shouldShowFeed, showFeed (dependencies) ],
[ () => true, showHelp (dependencies) ]
])
])
Run npm start -- read 1i0a (replace the id with one from your feed) and you should see something like this:
{ id: '1i0a',
url:
'http://dev.to/ppshobi/-email-sending-in-django-2-part--1--1i0a',
title: 'Email Sending in Django 2, Part -1',
username: 'Shobi',
tags: [ 'django', 'emails', 'consoleemailbackend' ],
comments: '0',
reactions: '13' }
HTML to Text
I found a great npm package that looks like it'll handle this for me.
npm install --save-prod html-to-text
Open up showArticle.mjs. We have already laid out most of our foundation, so making an HTTP request, parsing the HTML, and formatting it into text is as simple as this:
const getArticleTextFromUrl = axios => pipe ([
({ url }) => getUrl (axios) (url),
cheerio.load,
parseArticle,
article => `${article.title}\n\n${htmlToText.fromString (article.body)}`
])
I also want to create a view for when the id is not found.
const showArticleNotFound = $`Article ${0} not found.\n`
I'll also create an isArticleFound condition to make the code more readable.
const isArticleFound = article => article != null
I'll use the same W Combinator technique to create a closure that exposes id, and then modify showArticle.
export const showArticle = ({ axios, log }) => W (id => pipe ([
getArticle ({ axios }),
ifElse (isArticleFound) (getArticleTextFromUrl (axios)) (() => showArticleNotFound (id)),
log
]))
All together, showArticle.mjs looks like this:
import cheerio from 'cheerio'
import htmlToText from 'html-to-text'
import W from 'mojiscript/combinators/W'
import pipe from 'mojiscript/core/pipe'
import filter from 'mojiscript/list/filter'
import ifElse from 'mojiscript/logic/ifElse'
import $ from 'mojiscript/string/template'
import { getUrl } from './api'
import { parseArticle } from './interop/articleParser'
import { getArticles } from './showFeed'
const isArticleFound = article => article != null
const showArticleNotFound = $`Article ${0} not found.\n`
const getArticleTextFromUrl = axios => pipe ([
({ url }) => getUrl (axios) (url),
cheerio.load,
parseArticle,
article => `${article.title}\n\n${htmlToText.fromString (article.body)}`
])
export const shouldShowArticle = args => args.length === 2 && args[0] === 'read'
const getArticle = ({ axios }) => W (id => pipe ([
getArticles (axios),
filter (article => article.id === id),
articles => articles[0]
]))
export const showArticle = ({ axios, log }) => W (id => pipe ([
getArticle ({ axios }),
ifElse (isArticleFound) (getArticleTextFromUrl (axios)) (() => showArticleNotFound (id)),
log
]))
Run npm start -- read 1i0a again and you should see the article!
Finishing Touches
I'd like to make the id more visible in the feed.
const formatPost = $`${'id'}・${'title'}
${'url'}\n#${'tags'}
${'username'} ・ π ${'comments'} 💬 ${'reactions'}
`
Add this to the package.json. I'm gonna name the command devto.
"bin": {
"devto": "./src/index.mjs"
}
In src/index.mjs, add this mystical sorcery at the top:
#!/bin/sh
':' //# comment; exec /usr/bin/env NODE_ENV=production node --experimental-modules --no-warnings "$0" "$@"
Run this command to create a global link to that command.
npm link
If everything went well, you should now be able to run the following commands:
# get the feed
devto
# read the article
devto read <id>
So you decided to skip to the end?
You can lead the horse to water... or something.
To catch up with the rest of us follow these steps:
# clone the repo
git clone https://github.com/joelnet/devto-cli
cd devto-cli
# install
npm ci
npm run build
npm link
# run
devto
Warnings about the CLI
Scraping websites is a bad idea. When the website changes, which is guaranteed to happen, your code breaks.
This is meant to just be a fun demo for #hacktoberfest and not a maintainable project. If you find a bug, please submit a pull request to fix it along with the bug report. I'm not maintaining this project.
If this were a real project, some things that would be cool:
- login, so you can read your feed.
- more interactions, comments, likes, tags. Maybe post an article?
Happy Hacktoberfest!
For those of you that read through the whole thing, thank you for your time. I know this was long. I hope that it was interesting, I hope you learned something and above all, I hope you had fun.
For those of you that actually followed along step by step and created the CLI yourself: You complete me π.
Please tell me in the comments or on Twitter what you learned, what you found interesting, or any other comments or criticisms you may have.
My articles are very Functional JavaScript heavy, if you need more, follow me here, or on Twitter @joelnet!