How to keep an HTTP connection alive for 9 hours

SnykSec - Oct 24 '23 - Dev Community

It’s become so ubiquitous that it’s easy to forget what a marvel the HTTP specification truly is. When you browse to a website, like https://snyk.io, that triggers a flurry of additional HTTP requests to retrieve JavaScript, images, videos, and other assets. Within seconds, you see a fully rendered page. In fact, the goal of any consumer-facing website is to deliver a fully rendered page within a few seconds at most, or else it could lose traffic to a slightly faster site (seconds add up!).

Sometimes, however, there’s a use case for a longer-running process that feeds regular updates down the HTTP connection. Let's take a look at one of those cases.

How Snyk runs a Capture the Flag event

Each year, Snyk runs a Capture the Flag event called Fetch the Flag (named so because of our mascot, Patch). This year, we’re super excited to have the CTF legend John Hammond host the event.

Under the hood, we use the open source CTF platform, CTFd. CTFd has its own system for registration and login. However, we wanted to use our own registration landing page for style and tracking purposes. Here are the requirements from our marketing team:

  1. Register exclusively through our registration landing page.
  2. Automatically allocate an account on the CTFd server.

    1. Generate a unique alias.
    2. Set a unique, complex password.
    3. DON’T notify users yet.
  3. When we’re close to the event, trigger a process to bulk-email all pre-registered users with information for getting their credentials.

    1. Flip the CTFd registration mode so that new users get a credential email at the time they register.

In this post, I’ll cover how the open source project I created — ctfd-account-hook — evolved to support a long-running, secured HTTP request to notify nearly 4,000 registered participants over email.

Working with the CTFd API

The CTFd system has a built-in API for common tasks like CRUD operations for users as well as an endpoint for email notification. The system can easily be configured to use an email provider by setting host, port, and authentication credentials.

As is the case with most modern APIs, there are rate limits for these endpoints. In particular, the email notification has strict API rate limits because the configured email service will usually have its own API rate limits. The email endpoint allows 10 emails to be sent before returning a standard 429 HTTP status code to indicate “too many requests”. After a minute has elapsed, you can make a new set of 10 email API calls. This information was useful in deciding how to build the ctfd-account-hook application.

Building out the CTFd account hook app

The system our marketing team uses for registration pages has the ability to make an API call when the registration page is submitted. Based on the requirements above, I knew I wanted:

  1. A secure endpoint
  2. Minimal input to the account hook — just an email address
  3. The ability to switch modes from NOT sending email notifications at registration time to sending email notification at registration time

    1. Ideally, NO changes would be needed on the registration landing page configuration when it was time to switch modes.

I settled on Spring Boot with Spring Security and WebFlux. This made it super easy to support secure endpoints, make API calls to CTFd, handle API rate limits, and have easy configuration changes to support the two modes of operation.

Creating accounts

The only input to the ctfd-account-hook app is an email address. The app needs to create a unique alias and then create a CTFd user account using its API.

We settled on an alias system that would select components from internal dictionaries. The alias consists of an adjective, a color, and a dog breed. With 900 adjectives, 52 colors, and 80 dog breeds, there’s a pool of 3,744,000 possible aliases.
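
As a rough sketch of the idea (not the project's actual code), an alias can be assembled by picking one entry from each dictionary. The class name, method names, and tiny word lists below are made up for illustration; the real dictionaries hold 900, 52, and 80 entries.

```java
import java.util.List;
import java.util.Random;

// Hypothetical sketch: the real project's dictionaries and class names may differ.
class AliasSketch {
    // Tiny stand-in dictionaries; the post describes 900 adjectives, 52 colors, 80 breeds.
    static final List<String> ADJECTIVES = List.of("raw", "conscious", "brave");
    static final List<String> COLORS = List.of("blue", "harlequin", "amber");
    static final List<String> BREEDS = List.of("armant", "cursinu", "beagle");

    // Picks one word from each dictionary and joins them with hyphens.
    static String randomAlias(Random random) {
        return pick(ADJECTIVES, random) + "-" + pick(COLORS, random) + "-" + pick(BREEDS, random);
    }

    private static String pick(List<String> words, Random random) {
        return words.get(random.nextInt(words.size()));
    }

    // Number of distinct aliases for the given dictionary sizes.
    static long poolSize(long adjectives, long colors, long breeds) {
        return adjectives * colors * breeds;
    }
}
```

Calling `AliasSketch.poolSize(900, 52, 80)` gives the 3,744,000 figure above. A random pick alone doesn't guarantee uniqueness, so a real implementation would still need to handle collisions.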

The CTFd API endpoint — /api/v1/users — is used to create a user. It has an optional query string parameter: notify. To create a user and notify them by email of their credentials at the same time, you would issue a POST like this:

POST /api/v1/users?notify=true

Without the notify query string parameter, the user will NOT get an email notification.

The first real benefit of using Spring Boot came from its environment variable handling. In the CtfdApiServiceImpl class, there’s a boolean field called notifyOverride. The value is set automatically via an environment variable using the following syntax:

@Value("#{ @environment['ctfd.api.notify-override'] ?: false }")
private Boolean notifyOverride;

By default, notifyOverride will be set to false. But, if the environment variable ctfd.api.notify-override is set to true, then every new CTFd account that’s created will also receive an email notification. This is handled further down in the code:

…
String notify = (notifyOverride || req.getNotify()) ? "?notify=true" : "";
String uri = API_URI + "/users" + notify;

The app is deployed to Heroku, so when it came time to switch to email-on-account-creation mode, it took a single config change:

heroku config:set ctfd.api.notify-override=true

Sending bulk emails

With account creation (both with and without email notification) in place, the next big hurdle was to build out sending bulk email notifications. The plan was to allow people to register for a number of weeks prior to our Fetch the Flag event. While a CTFd account would be allocated for them (complete with a generated alias), they would NOT be notified of their credentials.

About a week out from the event, the switch would be flipped so that new registrations would receive an email notification immediately. Then, a long-running process would be kicked off to send notifications to all previously registered users.

This long-running process had to support the paginated CTFd API endpoint for getting a list of existing users and had to support a sane backoff/retry approach for handling API rate limits. This is where Spring Boot’s async support and the WebFlux HTTP client really shine. Let’s take a look at a WebFlux API request to the CTFd email endpoint:

this.webClient.post().uri(uri)
    .bodyValue(emailText)
    .retrieve()
    …
    .bodyToMono(CtfdUserResponse.class)
    .retryWhen(retryBackoffSpec)
    .block();

This one line — .retryWhen(retryBackoffSpec) — ensures that when API rate limits are hit, the request will be retried in a sane way. Here’s the definition of retryBackoffSpec:

this.retryBackoffSpec = Retry.backoff(maxAttempts, Duration.ofSeconds(backoffSeconds))
    .doBeforeRetry(retrySignal -> log.debug(
        "Waiting {} seconds. Retry #{} of {} after exception: {}",
        backoffSeconds, (retrySignal.totalRetriesInARow()+1), maxAttempts,
        retrySignal.failure().getLocalizedMessage()
    ))
    .onRetryExhaustedThrow((retryBackoffSpec, retrySignal) -> retrySignal.failure());

On the first line, it uses the environment variables maxAttempts and backoffSeconds to control what happens when an error on the HTTP request occurs. The cool thing is that this definition covers ANY type of error. The most common error would be a 429 “too many requests” error. But, if there’s a service disruption and a 5xx type error is returned, the request will be retried as well. This makes the web requests very resilient with very little code. That’s the power of WebFlux.
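
To make the behavior concrete without Reactor, here is a minimal plain-Java sketch of retry with exponential backoff. It is not how Reactor implements `Retry.backoff` (which also applies jitter and doesn't block a thread while waiting); the `RetrySketch` class, its method names, and the pluggable `sleeper` are invented for illustration.

```java
import java.time.Duration;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical sketch of blocking retry with exponential backoff (no jitter).
class RetrySketch {
    // Calls the task; on any runtime exception, waits and retries up to maxRetries times.
    // The delay doubles after each failed attempt, starting at minBackoff.
    static <T> T callWithBackoff(Supplier<T> task, int maxRetries,
                                 Duration minBackoff, Consumer<Duration> sleeper) {
        Duration delay = minBackoff;
        for (int retry = 0; ; retry++) {
            try {
                return task.get();
            } catch (RuntimeException e) {
                if (retry >= maxRetries) {
                    throw e; // retries exhausted: surface the last failure
                }
                sleeper.accept(delay);
                delay = delay.multipliedBy(2);
            }
        }
    }

    // A sleeper that really blocks the current thread.
    static void blockingSleep(Duration duration) {
        try {
            Thread.sleep(duration.toMillis());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while backing off", e);
        }
    }
}
```

A caller would pass `RetrySketch::blockingSleep` as the sleeper in real use; keeping it pluggable makes the retry logic easy to exercise without actually waiting.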

With the backoff/retry approach handled, it was time to get the long-running email notification process set up. Knowing that there would be a 1-minute wait after every 10 email notifications, and that we had around 4,000 registrations, I knew it would take over 6.5 hours to get all the notifications processed. In practice, accounting for the additional overhead of API calls for pagination, password update, and email notification, the entire process took over 9 hours to complete.
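
The estimate is simple arithmetic, sketched here with made-up names (`SendTimeEstimate`, `estimate`). It treats every batch of 10 emails as costing one full rate-limit window and ignores all other overhead.

```java
import java.time.Duration;

// Back-of-the-envelope estimate; the numbers come from the rate limit described above.
class SendTimeEstimate {
    // Rough total time when `batchSize` emails are allowed per `window`.
    static Duration estimate(int totalEmails, int batchSize, Duration window) {
        long batches = (totalEmails + batchSize - 1) / batchSize; // round up
        return window.multipliedBy(batches);
    }
}
```

For 4,000 emails, 10 per batch, and a 1-minute window, that is 400 batches, so 400 minutes, or 6 hours 40 minutes.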

The next step was to implement async handling. I wanted my controller to return immediately while kicking off the long-running process. I also wanted to keep the HTTP request channel open and send periodic updates on the status of the process. This is where Server-Sent Events (SSE) come in. You can think of SSE as an open pipeline that the server can keep sending information down. A subscriber receives each piece of information as it arrives.
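
On the wire, an SSE stream is plain text: each event is a handful of `field:value` lines (such as `data`, `id`, and `event`) followed by a blank line. The helper below is a hand-rolled illustration of that framing, not what Spring uses internally; `SseFrame` and `format` are names made up for this sketch.

```java
// Hypothetical helper that renders one Server-Sent Event as text.
class SseFrame {
    // Builds a single event; the trailing blank line marks the end of the event.
    static String format(String id, String eventName, String data) {
        StringBuilder frame = new StringBuilder();
        // Multi-line payloads need one "data:" line per line of text.
        for (String line : data.split("\n", -1)) {
            frame.append("data:").append(line).append('\n');
        }
        frame.append("id:").append(id).append('\n');
        frame.append("event:").append(eventName).append('\n');
        frame.append('\n');
        return frame.toString();
    }
}
```

A client reads the stream line by line and treats each blank line as the end of one event, which is why a long-lived response can deliver many separate updates.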

Spring Boot has built-in support for SSE and the HTTP request is automatically subscribed to it. Here’s the Controller code to kick off the long-running email notification process:

@PostMapping("/api/v1/update-and-email/{affiliation}")
public SseEmitter updateAndEmailUsers(@PathVariable String affiliation) {
    SseEmitter emitter = new SseEmitter(1000*60*60*24L);
    ctfdApiService.updateAndEmail(emitter, affiliation);
    return emitter;
}

On the first line of the method, an SseEmitter object with a timeout of 24 hours is created. The asynchronous method ctfdApiService.updateAndEmail is called, passing in the newly created emitter. Finally, the emitter is returned from the controller method. This 3-line controller method enables the asynchronous SSE handler. The updateAndEmail method will periodically send events to the emitter which will automatically be sent down on the open HTTP request.

Before we look at the service code, let’s get our Spring Boot application set up to support asynchronous calls. In the main Spring Boot application, you turn on async handling through the EnableAsync annotation:

@SpringBootApplication
@EnableAsync
public class CtfdAccountHookApplication {

    public static void main(String[] args) {
        SpringApplication.run(CtfdAccountHookApplication.class, args);
    }
}

Then, a service method can automatically be made asynchronous by annotating it with @Async. Here’s the definition of the updateAndEmail method in the CtfdApiServiceImpl class:

    @Async
    @Override
    public void updateAndEmail(SseEmitter emitter, String affiliation) {
        Integer page = 1;
        int processed = 0;

        do {
            try {
                CtfdUserPaginatedResponse ctfdUserResponse =
                    getUsersByAffiliation(affiliation, page);
                for (CtfdUser ctfdUser : ctfdUserResponse.getData()) {
                    SseEmitter.SseEventBuilder  event = SseEmitter.event()
                        .data("Processing - " + ctfdUser.getId() + " - " + LocalTime.now().toString())
                        .id(String.valueOf(ctfdUser.getId()))
                        .name(ctfdUser.getId() + " - " + ctfdUser.getName());
                    emitter.send(event);
…
                    ctfdUser = updatePassword(ctfdUser);
                    emailUser(ctfdUser);
                }
                page = ctfdUserResponse.getMeta().getPagination().getNext();
                processed += ctfdUserResponse.getData().length;
…
            } catch (Exception e) {
                log.error("Failure while update/email operation: {}", e.getMessage());
                emitter.completeWithError(e);
                return;
            }
        } while (page != null);
…
        emitter.complete();
    }

Spring Boot handles running this method in its own thread. For each page (a CTFd API call), it iterates over that page’s list of registered users. Then, for each user, it updates the password (a CTFd API call) and sends out an email notification (a CTFd API call). Along the way, it uses the SSE emitter to send messages down the pipeline. The request and its output look something like this (using the HTTPie client):

http POST \
https:///api/v1/update-and-email/fetch2023 \
 x-api-key:""

HTTP/1.1 200
data:Processing - 1 - 17:19:00.595414016
id:1
event:1 - raw-blue-armant

data:Finished Processing - 1 - 17:19:03.958247179
id:1
event:1 - raw-blue-armant

Looking at the server logs, I saw something like this:

Processing user id: 1, name: raw-blue-armant
Password updated for user id: 1
Email sent for user id: 1
…
Processing user id: 10, name: conscious-harlequin-cursinu
Password updated for user id: 10
Waiting 10 seconds. Retry #1 of 10 after exception: 429 Too Many Requests from POST https://snyk.ctf.games/api/v1/users/10/email
Waiting 10 seconds. Retry #2 of 10 after exception: 429 Too Many Requests from POST https://snyk.ctf.games/api/v1/users/10/email
Waiting 10 seconds. Retry #3 of 10 after exception: 429 Too Many Requests from POST https://snyk.ctf.games/api/v1/users/10/email
Email sent for user id: 10

Here, we see the backoff/retry mechanism of our WebFlux HTTP client in action.

The wrench in the works

After testing this all out locally, I did a full test run. It ran for 9+ hours and finished without an issue and I had the full log of the SSE output. It was time to deploy to Heroku and run it for real.

Knowing that things in a production environment can behave differently than on my local machine, I did a test run of about 100 dummy accounts from Heroku. To my surprise, I started seeing errors, and the request died after about 1 minute of operation. Heroku, it seemed, was shutting down my HTTP request because it was idle for too long.

Heroku has an edge proxy that automatically makes a deployed application accessible over the public Internet at an HTTPS address. This is all configured automatically, so every deployed application is SSL-protected by default. To provide a good quality of service to all the applications running on Heroku, the proxy aggressively closes idle connections. The issue my app was having was that when the backoff/retry logic was triggered, there were no SSE emitter notifications for up to a minute. Heroku's router closes a streaming connection once roughly 55 seconds pass with no data sent in either direction. To mitigate this, I needed another asynchronous method that would drop a “heartbeat” SSE message at regular intervals, no matter what.

Here’s the updated controller method:

@PostMapping("/api/v1/update-and-email/{affiliation}")
public SseEmitter updateAndEmailUsers(@PathVariable String affiliation) {
    // TODO - should probs be another env var setting
    SseEmitter emitter = new SseEmitter(1000*60*60*24L);
    ctfdApiService.emitterHeartBeat(emitter);
    ctfdApiService.updateAndEmail(emitter, affiliation);
    return emitter;
}

Since both emitterHeartBeat and updateAndEmail are asynchronous methods, everything still works as expected. Here’s the emitterHeartBeat method:

@Async
@Override
public void emitterHeartBeat(SseEmitter emitter) {
    try {
        do {
            emitter.send("beat");
            Thread.sleep(5000);
        } while (true);
    } catch (Exception e) {
        log.debug("exception during emitter: {}", e.getMessage());
    }
}

This guarantees that every 5 seconds, a beat message will be sent down the open HTTP request via the SSE emitter. It will never present as idle and Heroku will not close it as a result. After I deployed this change, I triggered my 9+ hour notification process and it worked like a champ.
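
One way to give the heartbeat an explicit stop, rather than relying on `send` eventually throwing, is to schedule it and cancel it when the main work finishes. This is an alternative sketch on the plain JDK, not the project's code; `HeartbeatSketch`, `runWithHeartbeat`, and the `Runnable` parameters are invented here. In the app, `beat` would wrap the `emitter.send("beat")` call.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a heartbeat that stops when the main work finishes.
class HeartbeatSketch {
    // Runs `work` on the calling thread while `beat` fires every `periodMillis`.
    static void runWithHeartbeat(Runnable work, Runnable beat, long periodMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        ScheduledFuture<?> heartbeat =
            scheduler.scheduleAtFixedRate(beat, 0, periodMillis, TimeUnit.MILLISECONDS);
        try {
            work.run();
        } finally {
            heartbeat.cancel(false); // stop sending beats once the work is done
            scheduler.shutdown();
        }
    }
}
```

If `beat` throws (for example, because the connection is gone), `scheduleAtFixedRate` suppresses further runs, so the heartbeat also stops on its own in that case.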

Go forth and fetch

I’m proud of the ctfd-account-hook project and I welcome contributions to it! I’m participating in Hacktoberfest, so you can get badges for getting pull requests accepted. 

What I learned is that this long-running process would be much better suited to run entirely in the background. Then, I could write an endpoint to poll its progress. As cool as the Server Sent Event protocol is, the current approach still has some fragility. If there’s a disruption to the open HTTP request, the whole process can fail. As outlined in this issue on GitHub, I could keep my asynchronous service processes pretty much as-is. After the long-running process is kicked off, the controller could immediately return a job id. A polling endpoint would return the status of the job, including an indication that it was complete. While this approach adds the complexity of using a database and table records to track progress, it eliminates the fragility of an open HTTP connection.
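
A minimal sketch of that start-then-poll shape might look like the following. It keeps job status in an in-memory map as a stand-in for the database table mentioned above, so it would not survive a restart; `JobRegistrySketch` and its methods are hypothetical names, not part of ctfd-account-hook.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical in-memory sketch of "start a job, then poll for status".
class JobRegistrySketch {
    enum Status { RUNNING, COMPLETE, FAILED }

    private final Map<String, Status> jobs = new ConcurrentHashMap<>();
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    // Kicks off the work in the background and returns a job id right away.
    String start(Runnable work) {
        String jobId = UUID.randomUUID().toString();
        jobs.put(jobId, Status.RUNNING);
        executor.submit(() -> {
            try {
                work.run();
                jobs.put(jobId, Status.COMPLETE);
            } catch (RuntimeException e) {
                jobs.put(jobId, Status.FAILED);
            }
        });
        return jobId;
    }

    // What a polling endpoint would return; null means the id is unknown.
    Status status(String jobId) {
        return jobs.get(jobId);
    }

    // Lets already-submitted jobs finish, then releases the worker thread.
    void shutdown() {
        executor.shutdown();
    }
}
```

The controller would return the id from `start` immediately, and a second endpoint would expose `status`, so a dropped connection only costs the client one poll.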

We’d love to have you participate in our Fetch the Flag event on October 27, 2023. After you register, you’ll receive an email notification with your credentials to the CTFd platform. There are 30 challenges and you’ll have 24 hours to compete. You can form a new team or join an existing one, and you can join the chat on our Discord server.
