We commonly use NGINX or Apache as load balancers and reverse proxy servers: external traffic reaches this software first and is then forwarded to the actual services inside the internal network. Besides the reverse proxy, there is also the forward proxy. Let's look at what each of them does.
From OSI Model to Proxy Servers
When accessing internet services, we typically use the HTTP protocol, which operates at layer 7 (the application layer) of the OSI model. Familiar parts of a request, such as hostnames, paths, and query parameters, belong to this layer. HTTP runs on top of TCP (or, in the case of HTTP/3, on QUIC over UDP), which operates at layer 4 (the transport layer); the ports we use to reach a service are a concept of these protocols. One level further down, both TCP and UDP run on the Internet Protocol (IP) at layer 3 (the network layer), where IP addresses serve as the internet's "house numbers".
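To make the layering concrete, the short Python sketch below splits a URL into the parts the layers above care about: the hostname, path, and query used by HTTP, and the TCP port. The DEFAULT_PORTS table and the describe_url helper are illustrative assumptions, not part of any proxy software.

```python
from urllib.parse import urlsplit

# Assumed default TCP ports for the two schemes discussed in this article.
DEFAULT_PORTS = {"http": 80, "https": 443}

def describe_url(url: str) -> dict:
    """Split a URL into the pieces used at different OSI layers."""
    parts = urlsplit(url)
    return {
        # Layer 7: HTTP carries the hostname (Host header), path and query.
        # The hostname is also resolved via DNS to a layer-3 IP address.
        "host": parts.hostname,
        "path": parts.path or "/",
        "query": parts.query,
        # Layer 4: TCP needs a port; fall back to the scheme's default.
        "port": parts.port or DEFAULT_PORTS[parts.scheme],
    }

print(describe_url("https://example.com/search?q=proxy"))
# {'host': 'example.com', 'path': '/search', 'query': 'q=proxy', 'port': 443}
```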
When we use an IP network to access the internet, IP is responsible for addressing the destination, encapsulating data into packets according to its rules, and routing those packets from the source address to the destination address. For instance, traffic travels from a visitor's local network through the gateway device provided by the ISP, into the ISP's network, and on to the resource provider's services.
Here, a gateway is what connects us to the internet, and proxy servers can be understood by analogy: a proxy acts as a gateway that sits between the client and the service and helps the client reach that service. Proxies work above layer 4 and up to layer 7 of the OSI model; SOCKS proxies, for example, are often described as operating at layer 5 (the session layer), while HTTP proxies handle layer 7 messages directly.
Purposes of Proxy Servers
Proxy servers are typically employed for the following purposes:
1. Centralized Exit for Internal Network Access
Enterprises often have information security requirements such as access control and traffic logging. Now that most traffic is encrypted (HTTPS, for example), its contents are hidden, which makes it hard to record and audit at the network boundary and increases the risk of leaks. A proxy server provides a single, unified exit for traffic and acts as an intermediary where this auditing can take place.
Besides serving human users, a proxy can also act as the exit for programs that access the external network. For example, a service provider that offers webhooks needs to send that traffic through a fixed exit, using a single fixed IP address or a fixed range of addresses, so that webhook recipients can allowlist those addresses in their firewalls. Without a fixed exit, recipients cannot filter calls by source address, which exposes both sides of the webhook call to security risks.
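On the receiving side the allowlist is usually enforced in a firewall, but the same check can be sketched at the application level with Python's standard ipaddress module. The PROVIDER_EGRESS ranges below are hypothetical (they come from the reserved documentation address blocks), and is_allowed is an illustrative helper.

```python
import ipaddress

# Hypothetical fixed egress addresses published by a webhook provider.
PROVIDER_EGRESS = [
    ipaddress.ip_network("203.0.113.0/28"),   # a small fixed range
    ipaddress.ip_network("198.51.100.7/32"),  # a single fixed address
]

def is_allowed(source_ip: str) -> bool:
    """Return True if a webhook call comes from one of the provider's exits."""
    addr = ipaddress.ip_address(source_ip)
    # Membership is False (not an error) when IPv4/IPv6 versions differ.
    return any(addr in net for net in PROVIDER_EGRESS)

print(is_allowed("203.0.113.5"))   # True: inside 203.0.113.0/28
print(is_allowed("203.0.113.20"))  # False: outside the /28
```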
2. Concealing Visitor Identity
Sometimes internet users want to hide their identity, such as their IP address. This is where anonymizing forward proxy servers come into play. The client first connects to the proxy server and tells it the address of the real service, then reaches the target service through the proxy using HTTPS. The target sees the proxy's address instead of the client's, and because the traffic is encrypted end to end, the proxy server cannot read or tamper with the data it relays.
HTTP-Based Proxy
Proxy servers usually speak one of two kinds of protocol: HTTP proxies, which use HTTP itself, and SOCKS4/SOCKS5 proxies, which use a binary protocol. They serve similar purposes but are implemented differently. Here we focus on HTTP proxies.
In HTTP's early days, traffic was mostly plaintext. This let a proxy server sitting between the client and the service easily parse URLs and request payloads. The proxy then resolved the target host through DNS and connected to the service from its own network address, thereby concealing the client.
An example of such a request is as follows:
GET http://example.com/resource HTTP/1.1
Host: example.com
Proxy-Authorization: Basic encoded-credentials
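The encoded-credentials placeholder stands for a Basic authentication token, which is the base64 encoding of "user:password" (RFC 7617). Below is a minimal Python sketch of how a client might assemble such a request in memory; the helper names basic_credentials and build_proxy_get are hypothetical.

```python
import base64
from urllib.parse import urlsplit

def basic_credentials(user: str, password: str) -> str:
    """Basic scheme token: base64("user:password"), per RFC 7617."""
    raw = f"{user}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")

def build_proxy_get(url: str, user: str, password: str) -> bytes:
    """Build a GET in absolute form, as a client sends it to a forward proxy."""
    host = urlsplit(url).netloc  # includes ":port" if the URL has one
    lines = [
        f"GET {url} HTTP/1.1",  # the full URL, not just the path
        f"Host: {host}",
        f"Proxy-Authorization: {basic_credentials(user, password)}",
        "",  # empty line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

print(basic_credentials("Aladdin", "open sesame"))
# Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```

Note that base64 is an encoding, not encryption: anyone who sees this plaintext request, including the proxy, can recover the password.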
Because the request line carries the full URL, the proxy server knows which address the client is trying to reach. It sends its own request to that service and returns the service's response to the client:
HTTP/1.1 200 OK
Content-Type: text/html
...
body blahblah
...
This is the simplest form of an HTTP proxy server, and it has an obvious drawback: the proxy handles client traffic in plaintext, so it could record users' traffic while forwarding it. Encryption is therefore needed to keep the traffic secure.
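The sketch below shows, under simplifying assumptions (no body framing, no chunked encoding, no error handling), what such a proxy does with each plaintext request: it reads the full URL and every header, drops the Proxy-Authorization header meant for itself, and rewrites the request line into the origin form the service expects. The function name rewrite_for_origin is hypothetical.

```python
from urllib.parse import urlsplit

def rewrite_for_origin(raw: bytes) -> tuple[str, int, bytes]:
    """Turn an absolute-form proxy request into the request sent to the origin.

    Returns (host, port, request_bytes). Note that the proxy reads the URL,
    every header and the body in plaintext while doing this.
    """
    head, _, body = raw.partition(b"\r\n\r\n")
    request_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    method, target, version = request_line.split(" ", 2)

    url = urlsplit(target)
    path = url.path or "/"
    if url.query:
        path += "?" + url.query

    # Proxy-Authorization is consumed by this proxy, not passed to the origin.
    kept = [h for h in header_lines
            if not h.lower().startswith("proxy-authorization:")]

    new_head = "\r\n".join([f"{method} {path} {version}", *kept])
    request = new_head.encode("iso-8859-1") + b"\r\n\r\n" + body
    return url.hostname, url.port or 80, request
```

A real proxy would then resolve host through DNS, open a TCP connection to (host, port) from its own address, send the rewritten request, and stream the response back to the client.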
How Proxy Servers Handle HTTPS Traffic
As HTTPS has grown to make up most HTTP traffic, proxy servers have had to adapt to this scenario.
However, this raises a problem: the traffic the client sends to the service provider is now encrypted, and the proxy server cannot decrypt it to learn which resource the client is accessing. The client and the service provider negotiate keys through a handshake based on asymmetric cryptography, and all subsequent traffic is encrypted with symmetric keys that a man-in-the-middle cannot obtain. Preventing man-in-the-middle attacks is one of the fundamental goals of TLS.
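The key-agreement idea can be illustrated with a toy Diffie-Hellman exchange in Python. This is a deliberately simplified stand-in: TLS 1.3 uses ephemeral (EC)DHE over standardized groups such as X25519 and derives keys with HKDF, whereas this sketch uses a 32-bit prime and a single SHA-256 hash, which is completely insecure. It only shows why an eavesdropper who sees the public values could not feasibly compute the shared key with real-size parameters; stopping an active man-in-the-middle additionally relies on the server authenticating the handshake with its certificate, which the sketch omits.

```python
import hashlib
import secrets

# TOY parameters, insecure on purpose: a 32-bit prime modulus and a small base.
# Real TLS 1.3 uses groups such as X25519 or ffdhe2048 instead.
P = 4294967291  # largest prime below 2**32
G = 5

def keypair() -> tuple[int, int]:
    """Return (private, public). Only the public value is sent on the wire."""
    private = secrets.randbelow(P - 2) + 1
    public = pow(G, private, P)
    return private, public

def shared_secret(my_private: int, peer_public: int) -> int:
    """Combine our private value with the peer's public value."""
    return pow(peer_public, my_private, P)

def session_key(secret: int) -> bytes:
    """Derive a 32-byte symmetric key (SHA-256 here stands in for TLS's HKDF)."""
    return hashlib.sha256(secret.to_bytes(4, "big")).digest()

client_priv, client_pub = keypair()
server_priv, server_pub = keypair()

# An eavesdropper sees P, G, client_pub and server_pub, but neither private value.
client_key = session_key(shared_secret(client_priv, server_pub))
server_key = session_key(shared_secret(server_priv, client_pub))
print(client_key == server_key)  # True
```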
So, how does the proxy server work in this case?
This is more involved than parsing plaintext requests. HTTP defines a dedicated request method for this purpose, CONNECT, which the client uses for its initial request to the proxy server:
CONNECT server.example.com:443 HTTP/1.1
Host: server.example.com:443
Proxy-Authorization: Basic encoded-credentials
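Here is a client-side sketch in Python of building this CONNECT request and deciding whether the tunnel is open. Per RFC 9110, any 2xx response to CONNECT means the proxy has switched to tunnel mode. The helper names build_connect and tunnel_established, and the proxy address in the commented usage, are placeholders.

```python
import base64

def build_connect(host: str, port: int, user: str, password: str) -> bytes:
    """Build the CONNECT request that asks a proxy to open a tunnel."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    authority = f"{host}:{port}"  # CONNECT targets host:port, not a URL
    return (
        f"CONNECT {authority} HTTP/1.1\r\n"
        f"Host: {authority}\r\n"
        f"Proxy-Authorization: Basic {token}\r\n"
        "\r\n"
    ).encode("ascii")

def tunnel_established(response_head: bytes) -> bool:
    """Any 2xx status means the proxy has switched to tunnel mode."""
    status_line = response_head.split(b"\r\n", 1)[0].decode("latin-1")
    _version, code, *_reason = status_line.split(" ", 2)
    return 200 <= int(code) < 300

# Usage sketch (requires a reachable proxy; "proxy.internal":3128 is a placeholder):
#   import socket, ssl
#   sock = socket.create_connection(("proxy.internal", 3128))
#   sock.sendall(build_connect("server.example.com", 443, "user", "password"))
#   head = b""
#   while b"\r\n\r\n" not in head:
#       chunk = sock.recv(4096)
#       if not chunk:
#           raise ConnectionError("proxy closed the connection")
#       head += chunk
#   if tunnel_established(head):
#       # TLS now runs end to end between client and server, inside the tunnel.
#       tls = ssl.create_default_context().wrap_socket(
#           sock, server_hostname="server.example.com")
```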
The CONNECT request names the domain or IP address and the port the client wishes to reach. Upon receiving it, the proxy server opens a TCP connection to the target service, pairs it with the client's connection, and replies with a 2xx response such as "HTTP/1.1 200 Connection established". From then on, the client performs the TLS handshake with the service through this tunnel and sends its encrypted requests, while the proxy relays the bytes in both directions as they are, without attempting to parse them. The end-to-end HTTPS encryption between client and service therefore stays intact.
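The proxy side of this exchange can be sketched with Python's asyncio: read the CONNECT head, open the upstream TCP connection, answer 200, then copy bytes in both directions without inspecting them. This is a minimal illustration, not a production design: it only supports CONNECT, ignores Proxy-Authorization, and performs no access control or port restrictions. The names pipe, reply_and_close, and handle_client are hypothetical.

```python
import asyncio

async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes in one direction until EOF, without inspecting them."""
    while data := await reader.read(65536):
        writer.write(data)
        await writer.drain()
    if writer.can_write_eof():
        writer.write_eof()  # pass the half-close on to the other side

async def reply_and_close(writer: asyncio.StreamWriter, status: bytes) -> None:
    """Send a bare status response and drop the client connection."""
    writer.write(b"HTTP/1.1 " + status + b"\r\n\r\n")
    await writer.drain()
    writer.close()

async def handle_client(reader: asyncio.StreamReader,
                        writer: asyncio.StreamWriter) -> None:
    """Serve one client: CONNECT handshake, then a blind two-way relay."""
    try:
        head = await reader.readuntil(b"\r\n\r\n")
    except (asyncio.IncompleteReadError, asyncio.LimitOverrunError):
        writer.close()
        return
    parts = head.split(b"\r\n", 1)[0].decode("latin-1").split(" ")
    if len(parts) != 3 or parts[0] != "CONNECT":
        await reply_and_close(writer, b"405 Method Not Allowed")  # tunnels only
        return
    host, _, port = parts[1].rpartition(":")
    try:
        # strip("[]") handles IPv6 literals such as [::1]:443
        up_reader, up_writer = await asyncio.open_connection(host.strip("[]"), int(port))
    except (OSError, ValueError):
        await reply_and_close(writer, b"502 Bad Gateway")
        return

    # The tunnel is up; from now on the proxy never parses the traffic.
    writer.write(b"HTTP/1.1 200 Connection established\r\n\r\n")
    await writer.drain()
    try:
        await asyncio.gather(pipe(reader, up_writer), pipe(up_reader, writer))
    except ConnectionError:
        pass  # one side reset the connection
    finally:
        up_writer.close()
        writer.close()
```

To try it, one could serve it with asyncio.start_server(handle_client, "127.0.0.1", 8080). Because the relay never looks at the bytes, the same code carries TLS, SSH, or any other TCP-based protocol.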
This mechanism is more versatile than a plaintext HTTP proxy. Once the first HTTP request has told the proxy server where to connect, the connection essentially becomes a transparent tunnel, so it can carry HTTPS as well as other TCP-based binary traffic (such as SSH) through the proxy server.