Originally published in my newsletter.

Hypertext Transfer Protocol (HTTP) is an application-layer protocol for transmitting hypermedia documents, such as HTML. It was originally designed for communication between web browsers and web servers, but its use is no longer limited to that: a mature software and hardware ecosystem has made it part of the basic infrastructure of the Internet. You have almost certainly worked with it. In software development it is like water or air, so common that we barely notice it.

But precisely because it is so common, we programmers may overlook a lot of key details, run into problems in our daily work, and not know how to solve them. This article introduces one of those key pieces: HTTPS.

Why HTTPS?

Why does HTTPS exist? What problem was it created to solve?

To answer the above questions, we need to first analyze the characteristics of HTTP.

HTTP is simple, flexible, and easy to extend, but it is also stateless (you can use cookies to make it “stateful”), and it is transmitted in clear text, which means its data is completely visible and easy to eavesdrop on or forge.

At the same time, HTTP is widely used, so scenarios with high security requirements, such as online payment, need corresponding security measures. That is why HTTPS appeared.
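To make “clear text” concrete, here is a minimal sketch that builds the raw bytes of a plain HTTP form submission. The host, the `/login` path, and the credentials are made up for illustration; the point is only that whatever goes on the wire is readable as-is.

```python
def build_login_request(host: str, username: str, password: str) -> bytes:
    """Build the raw bytes of a plain HTTP form POST (hypothetical /login path)."""
    body = f"username={username}&password={password}"
    request = (
        "POST /login HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
        f"{body}"
    )
    return request.encode("ascii")


# Illustrative values only; nothing is sent over the network here.
raw = build_login_request("example.com", "alice", "hunter2")

# Anyone who can observe these bytes can read the password directly.
print(raw.decode("ascii"))
```

Anyone on the network path who captures these bytes sees the password without any extra work.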

What is HTTPS?

The RFC for HTTPS (RFC 2818) is very short. It specifies a new protocol name, “HTTPS”, with 443 as the default port, and inserts a “security layer” between TCP and HTTP.
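The layering can be sketched with Python’s standard `socket` and `ssl` modules. This is only an illustration, not how a browser is implemented: the helper names `make_client_context` and `fetch_head` are made up for this example, and the HTTP request is written by hand to show that it is still ordinary HTTP text once the security layer is in place.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Return a TLS client context that verifies certificates and hostnames."""
    return ssl.create_default_context()


def fetch_head(host: str, port: int = 443, timeout: float = 5.0) -> bytes:
    """Send one HEAD request over TCP + TLS and return the first response bytes."""
    context = make_client_context()
    # Bottom layer: an ordinary TCP connection to port 443.
    with socket.create_connection((host, port), timeout=timeout) as tcp_sock:
        # Middle layer: the security layer. The TLS handshake happens here.
        with context.wrap_socket(tcp_sock, server_hostname=host) as tls_sock:
            # Top layer: plain HTTP text, written into the encrypted channel.
            request = f"HEAD / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
            tls_sock.sendall(request.encode("ascii"))
            return tls_sock.recv(4096)
```

Calling `fetch_head("example.com")` on a machine with network access should return the start of the server’s response; the HTTP part of the code never deals with encryption itself.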

This security layer, as the name implies, encrypts the data sent and decrypts the data received so that a man in the middle cannot steal the information.

So once you understand this security layer, you understand HTTPS.

The security layer is SSL/TLS. SSL is the Secure Sockets Layer, which is at layer 6 (Presentation) in the OSI model. It was invented by Netscape in 1994. Two versions were released, v2 and v3; v1 was never made public because of its flaws.

By v3, SSL had proved itself to be a very good secure communication protocol, so in 1999 the Internet Engineering Task Force (IETF) officially standardized it and renamed it TLS (Transport Layer Security). The version numbering restarted at 1.0, so TLS 1.0 is effectively SSL 3.1.

Since then, TLS has had three more versions: 1.1 in 2006, 1.2 in 2008, and 1.3 in 2018, each one improving security and performance. TLS 1.2 is still very widely used, so let’s unravel the secrets of TLS 1.2!
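In code, the protocol version is usually something you configure rather than implement. As a small sketch using Python’s standard `ssl` module (the function name `make_tls12_plus_context` is made up for this example), a client can refuse anything older than TLS 1.2 like this:

```python
import ssl


def make_tls12_plus_context() -> ssl.SSLContext:
    """Client context that rejects SSL 3.0, TLS 1.0, and TLS 1.1."""
    context = ssl.create_default_context()
    # Require at least TLS 1.2; newer versions are still allowed if both sides support them.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_tls12_plus_context()
print(context.minimum_version.name)
```

Setting only the minimum leaves room for the client and server to negotiate TLS 1.3 when both support it.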

The secrets of HTTPS (TLS 1.2)

Let me explain the above image first:

Handshake phase:

The client generates a client-random and sends it to the server, along with the symmetric and asymmetric cipher suites it supports.
The server selects the encryption algorithms to use from those cipher suites, generates a server-random, and sends both to the client together with its certificate.
The client first verifies the certificate. If the certificate is valid, the client generates a pre-master random number, encrypts it with the public key in the certificate using the selected asymmetric encryption algorithm, and sends it to the server along with a Client Finished confirmation message.
The server decrypts the encrypted pre-master with its private key, then sends a Server Finished confirmation message to the client.

Transfer phase:

After the handshake phase, both ends hold the client-random, server-random, and pre-master. Each side mixes them to generate the final master secret, which is then used with the previously selected symmetric encryption algorithm to encrypt and decrypt the transmitted data.
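The “mixing” step can be sketched in a few lines. TLS 1.2 (RFC 5246) derives the 48-byte master secret with a pseudorandom function built on HMAC. The sketch below assumes the SHA-256 variant (some cipher suites use SHA-384 instead) and uses random bytes in place of real handshake values, so treat it as an illustration rather than a complete implementation.

```python
import hashlib
import hmac
import os


def p_sha256(secret: bytes, seed: bytes, length: int) -> bytes:
    """P_hash expansion from RFC 5246, using HMAC-SHA256."""
    output = b""
    a = seed  # A(0) = seed
    while len(output) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()  # A(i) = HMAC(secret, A(i-1))
        output += hmac.new(secret, a + seed, hashlib.sha256).digest()
    return output[:length]


def derive_master_secret(pre_master: bytes, client_random: bytes, server_random: bytes) -> bytes:
    """master_secret = PRF(pre_master, "master secret", client_random + server_random)."""
    seed = b"master secret" + client_random + server_random
    return p_sha256(pre_master, seed, 48)


# Stand-in values: in a real handshake these come from the messages described above.
client_random = os.urandom(32)
server_random = os.urandom(32)
pre_master = os.urandom(48)

# Both ends hold the same three inputs, so both compute the same secret.
client_side = derive_master_secret(pre_master, client_random, server_random)
server_side = derive_master_secret(pre_master, client_random, server_random)
print(client_side == server_side, len(client_side))
```

Because the derivation is deterministic, the master secret itself never has to cross the network.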

God, is it really this much trouble when all I want to do is send data? I thought the same thing when I first saw it.

Let’s be patient and take a closer look. It’s actually not that difficult.

What are symmetric encryption and asymmetric encryption?

Let’s first explain the symmetric encryption and asymmetric encryption mentioned above.

Symmetric encryption means that the same key is used for encryption and decryption. Asymmetric encryption has two keys, A and B. If you encrypt with key A, you can only decrypt with key B; conversely, if you encrypt with key B, you can only decrypt with key A.
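Here is a toy sketch of both ideas. Neither part is secure or close to what TLS actually uses: the symmetric half is a simple XOR with a hash-based keystream, and the asymmetric half is textbook RSA with the tiny, well-known primes 61 and 53. All names are made up for this example.

```python
import hashlib


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher: XOR the data with a keystream derived from the key."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


# Symmetric: the same key encrypts and decrypts.
shared_key = b"same key on both sides"
ciphertext = xor_cipher(b"hello", shared_key)
plaintext = xor_cipher(ciphertext, shared_key)

# Asymmetric: textbook RSA with tiny primes (61 and 53), for illustration only.
N = 61 * 53    # modulus shared by both keys
KEY_A = 17     # one half of the key pair
KEY_B = 2753   # the other half: 17 * 2753 = 1 (mod 60 * 52)


def rsa_apply(value: int, exponent: int) -> int:
    """Encrypt or decrypt a number smaller than N with one half of the key pair."""
    return pow(value, exponent, N)


message = 65
a_then_b = rsa_apply(rsa_apply(message, KEY_A), KEY_B)  # encrypt with A, decrypt with B
b_then_a = rsa_apply(rsa_apply(message, KEY_B), KEY_A)  # encrypt with B, decrypt with A
a_then_a = rsa_apply(rsa_apply(message, KEY_A), KEY_A)  # the same key cannot undo itself
print(plaintext, a_then_b, b_then_a, a_then_a != message)
```

In practice one of the two keys is published as the public key and the other is kept as the private key.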

In the handshake phase of HTTPS, we use asymmetric encryption. Why? Suppose we used only symmetric encryption:

As the figure above shows, the client-random, the server-random, and the chosen symmetric cipher suite are all sent in clear text, so a hacker who captures them can generate the same key and decrypt the data.

If we use asymmetric encryption, the pre-master encrypted by the client can only be decrypted with the private key stored on the server.
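The difference can be shown with a small simulation. The `derive_key` function below is a made-up stand-in (a single SHA-256 hash), not the real TLS key derivation, and the “attacker” is simply code that only has access to the values sent in clear text.

```python
import hashlib
import os


def derive_key(*parts: bytes) -> bytes:
    """Hypothetical key derivation: hash all inputs together (not the real TLS PRF)."""
    return hashlib.sha256(b"".join(parts)).digest()


# Values that cross the network in clear text; an eavesdropper sees both.
client_random = os.urandom(32)
server_random = os.urandom(32)

# Value that only ever crosses the network encrypted with the server's public key.
pre_master = os.urandom(48)

# Case 1: key built only from clear-text values. The attacker can rebuild it.
client_key_weak = derive_key(client_random, server_random)
attacker_key_weak = derive_key(client_random, server_random)

# Case 2: key also depends on the pre-master. The attacker has to guess 48 random bytes.
client_key = derive_key(pre_master, client_random, server_random)
attacker_guess = derive_key(os.urandom(48), client_random, server_random)

print(attacker_key_weak == client_key_weak, attacker_guess == client_key)
```

The secrecy of the final key rests entirely on the one input the eavesdropper cannot read.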

What is a CA certificate?

Asymmetric encryption alone is still not enough. If a hacker uses DNS hijacking to replace the IP address the user wants to access with the hacker’s own IP address, the request is sent straight to the hacker’s service. The hacker sets up a public and private key pair on that service, and the client has no way of knowing!

So we need a CA (Certificate Authority) to help us prove that this service is the service we want to access!

How to check the validity of the CA certificate?

The client will verify according to the following process:

Check whether the certificate has expired.
Check whether it has been revoked by the CA.
Check whether the certificate was issued by the CA. The client computes a message digest from the original certificate information, uses the CA’s public key to decrypt the digital signature in the certificate, and then compares the computed digest with the decrypted one.
Prove the legitimacy of the CA itself. The CA certification chain is a tree structure, and the client follows it step by step up to the root CA certificate. Root CA certificates (self-signed certificates) are built into the system, and the requirements for them are very strict. If the chain ends at a valid root CA certificate, the certificate is considered valid. But if malware injects an illegitimate root CA certificate into a user’s system, there’s no way around it.
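The expiry check and the signature check from the list above can be sketched as follows. This is a toy model under loud assumptions: the “certificate” is just a byte string, the CA key pair is textbook RSA built from two small Mersenne primes, and all names and dates are invented for the example. Revocation and chain walking are not shown. Real clients rely on their TLS library for all of this. The code needs Python 3.8 or later for the modular inverse via `pow`.

```python
import hashlib
from datetime import datetime, timezone

# Toy CA key pair built from two Mersenne primes; far too small for real use.
P = 2**61 - 1
Q = 2**89 - 1
CA_MODULUS = P * Q
CA_PUBLIC_EXPONENT = 65537
CA_PRIVATE_EXPONENT = pow(CA_PUBLIC_EXPONENT, -1, (P - 1) * (Q - 1))


def digest_as_int(cert_info: bytes) -> int:
    """Message digest of the certificate information, reduced to fit the toy modulus."""
    return int.from_bytes(hashlib.sha256(cert_info).digest(), "big") % CA_MODULUS


def ca_sign(cert_info: bytes) -> int:
    """What the CA does when issuing: encrypt the digest with its private key."""
    return pow(digest_as_int(cert_info), CA_PRIVATE_EXPONENT, CA_MODULUS)


def is_expired(not_after: datetime, now: datetime) -> bool:
    """Step 1 of the list: has the validity period ended?"""
    return now > not_after


def signature_is_valid(cert_info: bytes, signature: int) -> bool:
    """Step 3: decrypt the signature with the CA public key and compare digests."""
    recovered_digest = pow(signature, CA_PUBLIC_EXPONENT, CA_MODULUS)
    return recovered_digest == digest_as_int(cert_info)


# Made-up certificate contents and dates, fixed so the example is repeatable.
cert_info = b"subject=example.com;not_after=2030-01-01"
signature = ca_sign(cert_info)
not_after = datetime(2030, 1, 1, tzinfo=timezone.utc)
now = datetime(2025, 1, 1, tzinfo=timezone.utc)

print(is_expired(not_after, now))
print(signature_is_valid(cert_info, signature))
print(signature_is_valid(b"subject=evil.example;not_after=2030-01-01", signature))
```

A certificate whose contents were changed after signing no longer matches the digest recovered from the signature, which is what makes a swapped-in certificate detectable.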

Why does the transfer phase use symmetric encryption?

Symmetric encryption algorithms mostly use bit operations, while asymmetric encryption, such as RSA, relies on arithmetic with very large numbers. Asymmetric encryption is therefore much slower, and using it for all traffic would seriously hurt transmission speed and make the user experience very poor. Since the master secret generated in the handshake phase is already secure enough, we can use symmetric encryption in the transfer phase.


How does HTTPS work