1. Background
Related Terms: Frequency Limiting, Access Control, Crawlers, Anti-crawling, WAF, SafeLine
For some automated bots or malicious crawlers, their access to websites tends to be frequent and prolonged. When accessing the cloud server's management backend, one often finds that most of the network traffic is concentrated on one or a few IP addresses. These situations can typically be addressed with a straightforward approach: implementing IP frequency limiting on the server.
However, IP frequency limiting is generally unrelated to business logic, and developers often prefer not to maintain an IP access frequency table themselves. Moreover, manually tracking every visitor in a distributed, concurrent environment carries a significant development cost.
Chaitin's WAF SafeLine effectively solves this series of problems. SafeLine provides functions such as frequency limiting, port forwarding, manual IP blacklisting and whitelisting, as well as its core function of defending against Web attacks.
2. Installation
The official website provides several installation methods, which are not covered in this document. For details, please refer to: SafeLine
3. Configuring Sites and Frequency Limiting Functions
3.1 Site Configuration on SafeLine
The site configuration function of SafeLine is relatively comprehensive, including automatic uploading of TLS certificates and private keys, specifying multiple forwarding ports, etc., eliminating the need for developers to configure nginx forwarding on their own.
3.2 Frequency Limit Configuration on SafeLine
The specific blocking strategy can be customized. A reasonable starting point is to allow at most 100 requests within 10 seconds and to ban a client that exceeds this limit for 10 minutes.
If the ban was triggered by your own testing or turns out to be a false positive, it can be lifted manually.
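To make the recommended policy concrete, here is a minimal in-memory sketch of the same rule (100 requests per 10 seconds, 10-minute ban, manual unban). It is not SafeLine's implementation; the class name `IpRateLimiter` and its methods are hypothetical, and a single-process dictionary like this is exactly the kind of table that becomes hard to maintain in a distributed setup.

```python
import time
from collections import defaultdict, deque


class IpRateLimiter:
    """Sliding-window limiter: too many hits in the window triggers a timed ban."""

    def __init__(self, max_requests=100, window_seconds=10, ban_seconds=600):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.ban_seconds = ban_seconds
        self._hits = defaultdict(deque)   # ip -> timestamps of recent requests
        self._banned_until = {}           # ip -> time at which the ban expires

    def allow(self, ip, now=None):
        """Return True if the request from `ip` should be let through."""
        now = time.monotonic() if now is None else now

        # Reject immediately while a ban is active; drop the entry once it expires.
        expires = self._banned_until.get(ip)
        if expires is not None:
            if now < expires:
                return False
            del self._banned_until[ip]

        # Discard timestamps that have fallen out of the window.
        hits = self._hits[ip]
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()

        hits.append(now)
        if len(hits) > self.max_requests:
            # Limit exceeded: start the ban and reset the counter.
            self._banned_until[ip] = now + self.ban_seconds
            hits.clear()
            return False
        return True

    def unban(self, ip):
        """Manually lift a ban, e.g. after self-testing or a false positive."""
        self._banned_until.pop(ip, None)
        self._hits.pop(ip, None)
```

The optional `now` argument exists only so the behaviour can be checked without waiting for real time to pass.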
4. Testing and Other Considerations
4.1 Testing
A simple backend server is set up behind SafeLine, exposing a "hello" interface that takes a parameter named "a".
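The backend itself is not shown in this article. As an illustration only, a server with the same behaviour could be sketched with Python's standard library as follows; the handler name, the `hello_payload` helper, the bind address, and the port are all assumptions.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def hello_payload(path):
    """Build the JSON body for a request path such as '/hello?a=u'."""
    query = parse_qs(urlparse(path).query)
    value = query.get("a", [""])[0]   # first value of parameter "a", or empty
    return json.dumps({"a": value}, separators=(",", ":")).encode("utf-8")


class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only the /hello path is served; everything else is a 404.
        if urlparse(self.path).path != "/hello":
            self.send_error(404)
            return
        body = hello_payload(self.path)
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(host="127.0.0.1", port=8000):
    """Serve until interrupted; host and port are arbitrary choices."""
    HTTPServer((host, port), HelloHandler).serve_forever()
```

Calling `run()` starts the server; the site configured in SafeLine would then forward to that address and port.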
Write a simple crawler for testing purposes:
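import random
import requests

# The original code read headers from the author's own Config class
# (Config.get_global_config().header); this placeholder dict stands in for it.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0"}


def send_request(url, request_method="GET", header=None, data=None):
    """Send one HTTP request; return the response, or None if it fails."""
    try:
        if header is None:
            header = DEFAULT_HEADERS
        return requests.request(request_method, url, headers=header, data=data)
    except requests.RequestException as err:
        print(err)
        return None


if __name__ == '__main__':
    # Fire 100 requests in quick succession, each with a random letter as "a".
    for i in range(100):
        letter = random.choice('abcdefghijklmnopqrstuvwxyz')
        resp = send_request("http://a.com/hello?a=" + letter)
        if resp is not None:
            print(resp.content)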
Printed output:
b'{"a":"u"}'
b'{"a":"m"}'
b'{"a":"y"}'
b'{"a":"o"}'
b'<!DOCTYPE html>\n\n<html lang="zh">\n <head>\n .... #
Once the responses switch from JSON to an HTML page, the limit has been triggered. If you revisit the page in a browser at this point, you will find that access has been blocked.
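To find the point at which the ban kicked in without reading the output by eye, the response bodies can be checked programmatically. The sketch below assumes that a normal response is valid JSON and that anything else (such as the HTML page above) means the request was blocked; the function names are hypothetical and the sample bodies are shortened from the output above.

```python
import json


def is_json_body(body):
    """Return True if `body` (bytes) parses as JSON, as the hello interface returns."""
    try:
        json.loads(body.decode("utf-8"))
        return True
    except (UnicodeDecodeError, ValueError):
        return False


def first_blocked_index(bodies):
    """Index of the first non-JSON response, or None if every response is JSON."""
    for index, body in enumerate(bodies):
        if not is_json_body(body):
            return index
    return None


sample = [
    b'{"a":"u"}',
    b'{"a":"m"}',
    b'{"a":"y"}',
    b'{"a":"o"}',
    b'<!DOCTYPE html>\n\n<html lang="zh">\n <head>\n',
]
print(first_blocked_index(sample))  # prints 4
```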
4.2 What if some cunning crawlers falsify the X-Forwarded-For request header?
SafeLine lets you select "Socket Connection" under "Global Settings" -> "Get Attack IP From". With this setting, the source IP is taken from the TCP connection itself rather than from a request header that the client controls.
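The difference between the two sources can be illustrated with a small sketch. This is not SafeLine's code; the function `client_ip` and the option names `"socket"` and `"x-forwarded-for"` are hypothetical and only mirror the idea behind the setting.

```python
def client_ip(peer_ip, headers, source="socket"):
    """Pick the address to rate-limit on.

    peer_ip: address of the TCP peer, as reported by the socket.
    headers: request headers as a dict with lowercase keys.
    source:  "socket" or "x-forwarded-for" (hypothetical option names).
    """
    if source == "x-forwarded-for":
        forwarded = headers.get("x-forwarded-for", "")
        first = forwarded.split(",")[0].strip()
        if first:
            return first   # client-controlled value: trivially forged
    return peer_ip


forged = {"x-forwarded-for": "10.0.0.1"}
print(client_ip("203.0.113.7", forged, source="x-forwarded-for"))  # prints 10.0.0.1
print(client_ip("203.0.113.7", forged, source="socket"))           # prints 203.0.113.7
```

With the header as the source, a crawler can present a new "IP" on every request and never hit the limit; with the socket as the source, the forged header is simply ignored.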
You might ask: "What if the crawler is extremely cunning and forges the source IP of its TCP packets?" In that case the TCP three-way handshake fails, because the server's SYN-ACK reply is sent to the forged address and the crawler never receives it. Without an established TCP connection, no HTTP request can be sent at all, so the crawler loses its ability to crawl and its traffic never reaches nginx as a request.