Intro to HTTP (for data folks & managers, especially!)

Katie - Jun 3 '19 - Dev Community

HTTP is easy to learn in five minutes, and familiarity is essential if you'd like to use "APIs" to integrate data between systems over the internet.

This post will give you a beginner-level foundation in HTTP.

You'll need to dive much deeper for certain jobs (e.g. learning that all the cool kids say "URI," not "URL," and why), but this should be just the right level of detail if you're a business analyst, data integrator, or manager.


Definition

HTTP stands for "HyperText Transfer Protocol."
(I add the "(s)" because HTTPS is its "Secure" variant, and to remind you to use good Security practices.)

Protocol is the key word: immediately we know that we're talking about a standard for how to behave when interacting.

In this case, it's a standard about how two computers should speak to each other over "the web."


HTTP(s) Is Lopsided

The most important thing to know about this protocol is that although communications can flow both ways between two computers, every interaction is initiated from just one of the computers.

This computer is called the "web client," and messages it sends using the HTTP(s) protocol are called "requests."

The other computer is called a "web server." When it replies to an HTTP(s) request, the message it sends using the HTTP(s) protocol is called a "response."

Like communicating through "APIs", this is a "knocking at the door of a fortified castle" style of communication. The web server is the computer that's inside the castle walls.


You're The Client

Although it's fun to learn to build a web server and program it to issue HTTP(s) responses when it receives HTTP(s) requests, this tutorial isn't about that. This tutorial is about using your computer as a web client.

That means we'll focus on sending requests and receiving responses.

Like all computer-to-computer communications, the actual message is a bunch of 0's and 1's. Luckily, you don't need to write those!

For us humans, writing a good HTTP(s) "request" means specifying simple details like an "address," a "header," etc.

Actually sending the request over the internet can be delegated to specialized software like:

  • Your web browser (Firefox, Safari, Chrome, etc.)
  • Specialized point-and-click desktop software like Postman
  • Specialized command-line desktop software like curl
  • A programming language and its libraries, like Requests for Python or the HttpRequest class in Salesforce Apex.

Requests

So what makes an HTTP(s) request "within proper protocol?"

Glad you asked!

1 - URL

A request must specify, in plain text, an address indicating the castle whose door it'd like to knock on. Examples:

  • https://www.google.com
  • https://google.com/webhp

Note: When discussing "APIs" hosted on a web server, you'll often hear the URL referred to as a given "endpoint" of the API and, for any part of the URL that comes after a question mark (?), as "parameters" being "passed" to the API.

That's not really something to know about HTTP(s) per se, but you're going to hear people talk about it almost as if it were.

A good rule of thumb: when you hear "endpoint" or "parameter," think "part of the URL."
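To make that rule of thumb concrete, here is a minimal sketch that splits a URL into its pieces using Python's standard library. The URL is just an illustration modeled on the Google examples in this post, and which piece a given API calls its "endpoint" is up to that API's documentation.

```python
from urllib.parse import urlsplit, parse_qs

# Illustrative URL only; nothing is sent over the network here.
url = "https://google.com/search?q=abc"

parts = urlsplit(url)

print(parts.scheme)   # the protocol: https
print(parts.netloc)   # the castle's address: google.com
print(parts.path)     # what people often call the "endpoint": /search
print(parts.query)    # the raw "parameters" text: q=abc

# parse_qs turns the parameter text into a dictionary of name -> list of values.
params = parse_qs(parts.query)
print(params)         # {'q': ['abc']}
```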

2 - Method

A request must, with a single plain-text keyword, briefly state what its purpose is in knocking at the web server's castle door.

I recommend reading more about allowed keywords, but the most common "methods" by far are the keywords "GET" and "POST."
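As a small sketch of what "stating a method" looks like in code, the snippet below builds two request objects with Python's standard library and reads the method back. It assumes the placeholder address https://example.com/ and does not actually send anything.

```python
from urllib.request import Request

# Building a Request object does not contact the server.
read_request = Request("https://example.com/", method="GET")
write_request = Request("https://example.com/", data=b"hello", method="POST")

print(read_request.get_method())   # GET
print(write_request.get_method())  # POST
```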

3 - Header

A request is allowed to, in well-structured plain text, provide some summary details about itself like:

  • What kind of web client it's representing (user-agent)
  • What "formatting standard" it will be using if it includes anything in the request "body" (content-type)
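A header is just a set of name/value pairs. Here is a minimal sketch, using Python's standard library, that attaches the two headers listed above to a request without sending it. The client name and the example.com address are made up for illustration.

```python
from urllib.request import Request

request = Request(
    "https://example.com/",
    headers={
        "User-Agent": "my-first-web-client/0.1",  # hypothetical client name
        "Content-Type": "application/json",
    },
)

# header_items() lists the headers that would accompany the request.
# urllib changes the capitalization of the names; that is harmless because
# HTTP header names are case-insensitive.
for name, value in request.header_items():
    print(f"{name}: {value}")
```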

Example 1: content-type: application/json means:

"The body's contents are plaintext -- more specifically, formatted using the JSON standard."

Example 2: content-type: image/gif means:

"The body's contents are a GIF image."

4 - Body

After the URL, method, and header, a request is allowed to say just about anything else it would like to say. This is considered the "body" of an HTTP(s) request.

Obviously, it would behoove a web client to put useful, well-structured text or binary data here if it expects the web server to understand the contents of the body.

It also helps facilitate communication if useful information like "content-type" is provided in the request header to ensure that the server understands how to interpret the request body's contents.
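To see how the four parts fit together, the sketch below writes out a request the way it looks as text in HTTP/1.1, the classic plain-text version of the protocol. Tools normally do this for you (and HTTPS encrypts it in transit), so treat `render_request`, the endpoint, and the payload as hypothetical names used only for illustration.

```python
import json
from urllib.parse import urlsplit


def render_request(method, url, headers, body=b""):
    """Return the plain-text form of an HTTP/1.1 request as bytes."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query

    # First line: method and the part of the URL after the host.
    lines = [f"{method} {target} HTTP/1.1", f"Host: {parts.netloc}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    if body:
        lines.append(f"Content-Length: {len(body)}")

    # A blank line separates the header from the body.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


# Hypothetical endpoint and payload, for illustration only.
body = json.dumps({"answer": "yes"}).encode("utf-8")
raw = render_request(
    "POST",
    "https://example.com/api/answers",
    {"Content-Type": "application/json"},
    body,
)
print(raw.decode("utf-8"))
```

The printed text starts with the method and address, continues with the header lines, and ends with the JSON body after a blank line.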


Responses

An HTTP(s) response also follows a certain protocol.

Here's what we can expect it to contain:

1 - Status

A response must specify a 3-digit number and corresponding English phrase briefly stating what it thought of the request it received.

There are many possible status codes, but common ones are:

  • 200 OK
  • 401 Unauthorized
  • 404 Not Found
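Python's standard library happens to ship a table of these codes, which makes for a quick sketch of the number-plus-phrase pairing. Labeling anything outside the 200s as a "problem" is a simplification made for this example.

```python
from http import HTTPStatus

for code in (200, 401, 404):
    status = HTTPStatus(code)
    # Codes in the 200s mean the server was happy with the request.
    outcome = "success" if 200 <= code < 300 else "problem"
    print(code, status.phrase, "->", outcome)
```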

2 - Header

A response is allowed to, in well-structured plain text, provide some summarizing details about itself that it thinks the web client might appreciate, like:

  • When the response was sent (date)
  • What "formatting standard" it will be using if it includes anything in the response "body" (content-type)

3 - Body

After the status and header, a response is allowed to say just about anything else it would like to say. This is considered the "body" of an HTTP(s) response.

Convincing a web server to send a response whose body contains something useful is often considered the purpose of making an HTTP(s) request:

  • If you've made a request to https://google.com, you're probably hoping that the response body will be filled with HTML, CSS, and JavaScript code that your web browser can use to draw a search box and doodle on your screen.
  • If you've made a request to a URL beginning with https://pi.pardot.com/api, you're probably hoping that the response body will be filled with information about records (e.g. "Prospects") from a Salesforce Pardot database you control.

Sometimes "useful information" means "additional details elaborating upon the response's status." This can be very helpful when the status code indicates that the server didn't like your request -- for example, a "401" or "404" status.
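Here is a minimal sketch of pulling the status, header, and body out of a response. The response text is hand-written for this example (it was not captured from a real server), and in practice your tool or library does this parsing for you.

```python
import json

# A hand-written sample response in HTTP/1.1 text form.
raw_response = (
    "HTTP/1.1 404 Not Found\r\n"
    "Date: Mon, 03 Jun 2019 12:00:00 GMT\r\n"
    "Content-Type: application/json\r\n"
    "\r\n"
    '{"error": "No prospect with that id"}'
)

# A blank line separates the status and header from the body.
head, body = raw_response.split("\r\n\r\n", 1)
status_line, *header_lines = head.split("\r\n")

version, code, phrase = status_line.split(" ", 2)
headers = dict(line.split(": ", 1) for line in header_lines)

print(code, phrase)                # 404 Not Found
print(headers["Content-Type"])     # application/json
print(json.loads(body)["error"])   # No prospect with that id
```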


The Devil's In The Details

There's not much more to say about HTTP(s) itself. In less than 5 minutes, you've learned all you need to know to get started using it.

It truly is a simple protocol.

However, what you do with HTTP(s) requests matters.

Think of the varying levels of "protocol" in the real world.

Protocol when you meet the CEO of a company is to shake her hand and greet her.

If you call her an insulting name during your greeting, you've certainly violated an additional protocol of etiquette, and the CEO's response may be structured accordingly!

But technically, you didn't violate the "handshake and say something" protocol.

Making effective use of HTTP(s) requests requires researching additional protocols defined by each server with which you might like to communicate.

When these protocols tell you what to put in your requests to provoke specific behaviors from the server, they are known as an "Application Programming Interface" or "API."

Stay tuned for a hands-on, no-code tutorial series that will let you put your newfound HTTP(s) knowledge into practice against a variety of APIs.


Request URLs: a bedeviling detail

It might be worth mentioning now that sometimes, subtle variations in the URL get treated by the web server as if they were all headed to the same address, with the end of your specified URL being treated as if it were "extra data you'd like to tell the castle guard" rather than as part of "the location of the castle."

Examples:

  • For https://google.com/search?q=abc, you could argue that the guard at https://google.com considers the "details" to be /search?q=abc asking, "please search the internet for abc."
  • For https://google.com/search?q=XYZ, you could argue that the guard at https://google.com considers the "details" to be /search?q=XYZ asking, "please search the internet for XYZ."

Some web servers deliberately offer this "split treatment" of request URLs as part of their API.

(This is exactly the kind of situation that might make API documentation refer to "endpoints" and "parameters.")

When an API doesn't require a lot of information from an end-user, it's considered more "convenient" to let the user communicate all of their data as a variation on the URL than to force the user to add a body to the request.
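As a sketch of "communicating data as a variation on the URL," the snippet below builds the two Google URLs from the examples above, plus one with a space in it. `search_url` is a hypothetical helper name; `urlencode` is the standard-library function that handles characters that aren't allowed in a URL as-is.

```python
from urllib.parse import urlencode

ENDPOINT = "https://google.com/search"


def search_url(term):
    """Attach the search term to the endpoint as a "q" parameter."""
    return ENDPOINT + "?" + urlencode({"q": term})


print(search_url("abc"))          # https://google.com/search?q=abc
print(search_url("XYZ"))          # https://google.com/search?q=XYZ
print(search_url("http basics"))  # https://google.com/search?q=http+basics
```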


Web Pages vs. API Endpoints

Perhaps you've noticed that I'm flippantly treating the following two categories of URL as if they were interchangeable:

  1. URLs typically thought of as "web pages," such as https://google.com
  2. URLs typically thought of as "data integration API endpoints," such as https://yesno.wtf/api

That's not an accident.

This might be a little much to wrap your mind around right now, but keep it tucked away somewhere for future reference:

I'd like to argue that a "web page" is just a special type of "API endpoint."
(However, we usually use "API endpoint" to mean "everything except web pages.")

Hear me out. I said earlier that:

  • An individual web server's rules about what to put into an HTTP(s) request and what to expect in the HTTP(s) response for a given URL can be thought of as an "API."
  • The web server often thinks of its valid URLs as "API endpoints."

If Google's web server's rule for https://google.com were the following...

"Whenever you request this URL with a GET method, I promise that the response body will be HTML that your web browser can use for drawing our 'home page' on your screen"

...then isn't that just a rule of "how to request and what to expect in response" like any other "API" rule?

It seems to me that the main difference between a URL considered a "web page" and a URL considered an "API endpoint" is that:

  • "Web pages" have simple and consistent "API" rules about what belongs in the HTTP(s) request (just provide the URL; your web browser will fill in everything else) and response (it'll be "something that a web browser can make pretty").
  • Everything else hosted by a web server -- everything with rules complicated enough to confuse humans who haven't yet learned about making an HTTP(s) request "from scratch" like you now have -- is what humans like to call "API endpoints."


Security

I mentioned earlier that the "S" in HTTP(s) stands for Secure.

Rule 1

Don't ever send important data to a web server in an HTTP request if the URL begins with http:// instead of https://.

It'd be like mailing your passport number on a postcard instead of inside an envelope.

Everyone helping deliver the postcard can read and/or rewrite its contents.

Similarly, don't ever send an HTTP request to a web server whose URL begins with http:// instead of https:// if you expect the data it sends back to be important, because the response will also be arriving by "postcard."
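If you end up sending requests from code, one way to enforce Rule 1 is a small guard that refuses "postcard" URLs before anything is sent. This is only a sketch: `require_https` is a hypothetical helper, and the example.com addresses are placeholders.

```python
from urllib.parse import urlsplit


def require_https(url):
    """Return the URL unchanged, or raise if it does not start with https://."""
    scheme = urlsplit(url).scheme.lower()
    if scheme != "https":
        raise ValueError(f"Refusing to send sensitive data to a non-HTTPS URL: {url}")
    return url


print(require_https("https://example.com/login"))

try:
    require_https("http://example.com/login")
except ValueError as error:
    print(error)
```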

Rule 2

Don't trust a URL just because you add an "s" to the URL yourself.

You should always do a test request against any URL starting with https:// without sending any important data so that you can inspect the web server's response and make sure that "secure" communications are actually working.

  • In your everyday life, with a web browser, this means, for example, visiting a web site's home page and making sure that your browser shows a padlock next to the URL.
  • If you're using a tool like Postman or curl, or a programming language, to make your computer behave as a "web client," you'll have to play with the tool and/or read its documentation to figure out the equivalent verification steps.
    • Oftentimes your tool will show you an error saying that it didn't even receive a response from the web server, or that it thinks there's a problem with the "security certificate," so verifying that "https://" works as expected can be pleasantly obvious, even in code.
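For Python specifically, here is a rough sketch of such a test request using only the standard library. Python's default security settings check the server's certificate, so a broken certificate shows up as an error rather than a silent success. `https_works` is a hypothetical helper name, and the request it sends carries no important data.

```python
import ssl
import urllib.error
import urllib.request


def https_works(url, timeout=10):
    """Send a data-free test request and report whether HTTPS succeeded."""
    if not url.lower().startswith("https://"):
        return False
    # The default context requires a valid certificate that matches the host.
    context = ssl.create_default_context()
    try:
        with urllib.request.urlopen(url, context=context, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status, so the secure
        # connection itself was established.
        return True
    except OSError as error:
        # Covers certificate problems, unreachable servers, and timeouts.
        print("Could not verify HTTPS:", error)
        return False


context = ssl.create_default_context()
print(context.verify_mode == ssl.CERT_REQUIRED)  # True
print(context.check_hostname)                    # True
```

Calling `https_works("https://example.com/")` would make a real request, so try it against a site you actually plan to use.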

Rule 3

Don't send important data, even if "inside an envelope" (over HTTP*S*), to web servers whose owners you wouldn't trust with that data.

You wouldn't mail me your tax return, and you shouldn't include your Salesforce password in an HTTPS request to any URL but one that you know belongs to Salesforce.
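In code, one way to approach Rule 3 is to check the host part of a URL against a short list of domains you trust before attaching a password. This is a sketch under loud assumptions: `TRUSTED_DOMAINS` and `is_trusted` are hypothetical names, and the list of domains that truly belong to a company should come from that company's own documentation, not from this example.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; confirm real domains with the service's documentation.
TRUSTED_DOMAINS = {"salesforce.com"}


def is_trusted(url):
    """Return True only for HTTPS URLs whose host is on the allowlist."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    on_list = any(
        host == domain or host.endswith("." + domain)
        for domain in TRUSTED_DOMAINS
    )
    return parts.scheme == "https" and on_list


print(is_trusted("https://login.salesforce.com/"))        # True
print(is_trusted("https://salesforce.com.example.net/"))  # False (lookalike host)
print(is_trusted("http://login.salesforce.com/"))         # False (not HTTPS)
```

Comparing the whole host name, rather than searching for "salesforce.com" anywhere in the URL, is what keeps the lookalike address in the second line from passing.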


Takeaways

  1. HTTP(s) is a very simple protocol that lets a computer acting as a web client make a well-formatted request for information to a web server, which in turn will send a well-formatted response.
  2. A well-formatted HTTP(s) request has a URL, a method, maybe a header, and maybe a body.
  3. A well-formatted HTTP(s) response has a status, maybe a header, and maybe a body.
  4. "S" is for "Secure." Follow good HTTP(s) hygiene.
  5. It's easiest to practice writing your own HTTP(s) requests "from scratch" by talking to web servers that offer data over "APIs." Stay tuned to get your hands dirty!