The aim of this post is to succinctly describe an effective and robust architecture for self-hosting your NodeJS web applications. I’m going to stay relatively high level and describe the technologies and components involved, so that by the end you will have a good idea of what such a system looks like. The focus is on standard, well-tested pieces rather than the latest shiny cloud / containerisation offerings. It is well suited to running small to medium-sized applications.
Features of the architecture
- Runs on standard VPS hosts
- Possibility to scale
- Secure
- Easy to maintain
- Fault tolerant
- Low cost
- Backed up and easy to restore
- Easy machine provisioning
- Easy to deploy code
- Support for multiple databases
3 main components
- Load balancer
- Web and API application servers
- Datastore
During its life cycle, a client web request travels over the internet and eventually arrives at the load balancer, where any SSL/TLS connection is terminated. The request is then re-encrypted using self-signed certs and sent to an available application server. That application server performs the tasks it needs to do, persisting information in a shared datastore. Because the client’s TLS connection is with the load balancer, responses travel back the same way: from the application server, through the load balancer, to the client.
The SSL/TLS termination happens on the load balancer because it makes managing the certificates much easier: there is only a single place to create, renew, update and back up certificates.
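To make the termination and re-encryption concrete, here is a minimal sketch of a function a provisioning script could use to write the load balancer’s nginx config. The domain, IPs, CA path and the internal certificate name are all hypothetical. It assumes every app server’s self-signed cert is issued for one shared internal name, which nginx checks via `proxy_ssl_name` when `proxy_ssl_verify` is on.

```typescript
// Sketch only: all option values (domain, IPs, paths, cert name) are hypothetical.
interface LoadBalancerOptions {
  domain: string;           // public hostname served by the site
  upstreams: string[];      // private "ip:port" of each application server
  upstreamCaPath: string;   // CA (or bundle of self-signed certs) for the app servers
  upstreamCertName: string; // name the app servers' certificates are issued for
}

function renderLoadBalancerConfig(o: LoadBalancerOptions): string {
  // certbot stores certificates under /etc/letsencrypt/live/<domain>/
  const live = `/etc/letsencrypt/live/${o.domain}`;
  const servers = o.upstreams.map((u) => `    server ${u};`).join("\n");
  return `upstream app_servers {
${servers}
}

server {
    listen 443 ssl;
    server_name ${o.domain};

    # Public TLS ends here, with the Let's Encrypt certificate.
    ssl_certificate     ${live}/fullchain.pem;
    ssl_certificate_key ${live}/privkey.pem;

    location / {
        # Re-encrypt to the app servers and verify their certificates
        # against our own CA instead of the public trust store.
        proxy_pass https://app_servers;
        proxy_ssl_verify on;
        proxy_ssl_trusted_certificate ${o.upstreamCaPath};
        proxy_ssl_name ${o.upstreamCertName};
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
`;
}
```

Since the Let’s Encrypt paths are the only place the public certificate appears, renewing it with certbot only ever touches this one machine.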
Having a load balancer means you can run several application servers in parallel, so you can scale by simply adding more application servers. It also means you can reboot servers without impacting site uptime.
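nginx takes care of spreading requests for you: its upstream module uses round-robin by default, stops sending traffic to servers that fail (`max_fails` / `fail_timeout`), and lets you take a server out with the `down` parameter before a reboot. The class below is only an illustrative model of that behaviour, not nginx code, to show why a reboot does not cause downtime.

```typescript
// Illustrative model of round-robin with servers taken out of rotation.
class RoundRobinPool {
  private readonly servers: string[];
  private readonly down = new Set<string>();
  private next = 0;

  constructor(servers: string[]) {
    this.servers = servers;
  }

  // Take a server out of rotation, e.g. before rebooting it.
  markDown(server: string): void {
    this.down.add(server);
  }

  // Put it back once it is healthy again.
  markUp(server: string): void {
    this.down.delete(server);
  }

  // Next healthy server in turn, or undefined if every server is down.
  pick(): string | undefined {
    for (let i = 0; i < this.servers.length; i++) {
      const idx = (this.next + i) % this.servers.length;
      const candidate = this.servers[idx];
      if (!this.down.has(candidate)) {
        this.next = (idx + 1) % this.servers.length;
        return candidate;
      }
    }
    return undefined;
  }
}
```

While one server is marked down the others keep absorbing its share of requests; only when every server is down does `pick()` come back empty.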
As for the application servers, you can separate web servers from API servers, but for ease of maintenance you can also just run both on the same machine on different ports, with a reverse proxy on the machine directing each request to the right application. That way you have one discrete unit, which makes it much easier to add capacity. In the vast majority of cases this setup is good enough, though it can be optimised later.
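A sketch of the routing decision the per-machine reverse proxy makes, assuming (hypothetically) the web app listens on port 3000, the API on port 3001, and API requests are under `/api/`. In nginx this would be two prefix `location` blocks; nginx picks the longest matching prefix, which the function mimics.

```typescript
// Hypothetical ports and prefix. Equivalent nginx prefix locations:
//   location /api/ { proxy_pass http://127.0.0.1:3001; }
//   location /     { proxy_pass http://127.0.0.1:3000; }
interface Route {
  prefix: string;
  port: number;
}

const routes: Route[] = [
  { prefix: "/api/", port: 3001 }, // API application
  { prefix: "/", port: 3000 },     // web application (catch-all)
];

// Pick the longest matching prefix, as nginx does for prefix locations.
function upstreamPort(path: string, table: Route[] = routes): number | undefined {
  let best: Route | undefined;
  for (const r of table) {
    if (path.startsWith(r.prefix) && (best === undefined || r.prefix.length > best.prefix.length)) {
      best = r;
    }
  }
  return best?.port;
}
```

Note that `/apiary` goes to the web app: the trailing slash in `/api/` keeps the API prefix from matching unrelated paths.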
Having a shared datastore is key to being able to run the application servers in parallel. This is a single machine with a large storage volume mounted. It runs all the databases, which write their data to the storage volume. The datastore can also run on a clustered set of machines for high availability, though this adds quite a lot of complexity. Initially it’s probably best to run one machine with good backups, so that if anything goes wrong you can restore it and be running again with a minimum of downtime.
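Why the shared datastore matters is easiest to see with sessions. In the sketch below, every app server talks to a session store through the same small interface; the `Map`-backed store is a stand-in for Redis on the datastore machine, not a real Redis client, and all names are illustrative.

```typescript
// Stand-in for a shared session store (Redis in the real setup).
interface SessionStore {
  get(id: string): Record<string, unknown> | undefined;
  set(id: string, data: Record<string, unknown>): void;
}

class MapStore implements SessionStore {
  private readonly data = new Map<string, Record<string, unknown>>();
  get(id: string) {
    return this.data.get(id);
  }
  set(id: string, data: Record<string, unknown>) {
    this.data.set(id, data);
  }
}

// A toy application server that keeps no session state of its own.
class AppServer {
  readonly label: string;
  private readonly store: SessionStore;

  constructor(label: string, store: SessionStore) {
    this.label = label;
    this.store = store;
  }

  login(sessionId: string, user: string): void {
    this.store.set(sessionId, { user });
  }

  whoAmI(sessionId: string): string | undefined {
    const session = this.store.get(sessionId);
    return typeof session?.user === "string" ? session.user : undefined;
  }
}
```

If two servers share one store, a user who logs in via one server is recognised by the other, so the load balancer is free to send the next request anywhere. Give each server its own store and the user appears logged out whenever they land on a different machine.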
Technologies
- Nginx - Load balancer and reverse proxy
- Redis - Very fast key/value database, often used for storing sessions and caching
- MongoDB - NoSQL database
- Postgres - SQL database
- Let’s Encrypt certbot - for generating and maintaining certificates
- Ubuntu Linux - Operating system for all 3 components
- PM2 - NodeJS process manager; runs the applications, handles logging and a variety of other runtime activities
- RabbitMQ - Message queue software, very important for fault-tolerant backend systems
- mongodb-queue - Message queue implemented as a NodeJS library backed by MongoDB
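What makes a message queue useful for fault tolerance is that work is not lost when a worker crashes halfway through a job. The in-memory class below sketches the visibility-timeout idea that queues in this space build on: a taken message is hidden for a while, and if it is never acknowledged it becomes visible again for another worker to retry. It is a single-process illustration, not the API of RabbitMQ or mongodb-queue; time is passed in explicitly to keep it deterministic.

```typescript
// In-memory illustration of at-least-once delivery with a visibility timeout.
interface QueuedMessage<T> {
  id: number;
  payload: T;
  visibleAt: number; // timestamp (ms) from which the message can be taken
  tries: number;     // how many times it has been handed out
}

class VisibilityQueue<T> {
  private readonly messages = new Map<number, QueuedMessage<T>>();
  private readonly visibilityMs: number;
  private nextId = 1;

  constructor(visibilityMs: number) {
    this.visibilityMs = visibilityMs;
  }

  add(payload: T, now: number): number {
    const id = this.nextId++;
    this.messages.set(id, { id, payload, visibleAt: now, tries: 0 });
    return id;
  }

  // Hand out the first visible message and hide it for visibilityMs.
  get(now: number): QueuedMessage<T> | undefined {
    for (const msg of this.messages.values()) {
      if (msg.visibleAt <= now) {
        msg.visibleAt = now + this.visibilityMs;
        msg.tries++;
        return { ...msg };
      }
    }
    return undefined;
  }

  // Acknowledge successful processing; the message is removed for good.
  ack(id: number): boolean {
    return this.messages.delete(id);
  }

  size(): number {
    return this.messages.size;
  }
}
```

The trade-off is that a job may run more than once (a slow worker can still be processing when the message reappears), so handlers should be safe to repeat.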
Provisioning infrastructure
You can keep things quite simple in this regard, using a Bash script for each of the 3 main components. The script would need to do the following:
- Install latest OS updates
- Install necessary software
- Configure users and groups
- Write/update software configuration files
- Start and stop various services
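Those steps map fairly directly onto a Bash script per component. Here is a minimal sketch of generating one from TypeScript, assuming Ubuntu package names, a hypothetical `deploy` user and a hypothetical `./config/<role>/etc/` directory of config files copied alongside the script. MongoDB is left out because it isn’t in Ubuntu’s default repositories and would need MongoDB’s own apt repository.

```typescript
type Role = "loadbalancer" | "app" | "datastore";

// Ubuntu package names per component (Node.js and PM2 are typically installed separately).
const packages: Record<Role, string[]> = {
  loadbalancer: ["nginx", "certbot", "python3-certbot-nginx", "ufw"],
  app: ["nginx", "ufw"],
  datastore: ["redis-server", "postgresql", "stunnel4", "ufw"],
};

// systemd services to restart once the new config is in place.
const services: Record<Role, string[]> = {
  loadbalancer: ["nginx"],
  app: ["nginx"],
  datastore: ["redis-server", "postgresql"],
};

function renderInstallScript(role: Role): string {
  return [
    "#!/usr/bin/env bash",
    "set -euo pipefail", // stop on the first failing command
    "export DEBIAN_FRONTEND=noninteractive",
    "",
    "# Latest OS updates",
    "apt-get update",
    "apt-get -y upgrade",
    "",
    "# Software for this component",
    `apt-get install -y ${packages[role].join(" ")}`,
    "",
    "# Users and groups (hypothetical 'deploy' user; skipped if it exists)",
    "id -u deploy >/dev/null 2>&1 || adduser --disabled-password --gecos '' deploy",
    "",
    "# Config files (hypothetical layout, copied next to this script)",
    `cp -r ./config/${role}/etc/. /etc/`,
    "",
    "# Restart services so the new configuration takes effect",
    ...services[role].map((s) => `systemctl restart ${s}`),
  ].join("\n") + "\n";
}
```

Each step is written to be safe to re-run (the user check, `apt-get install` on already installed packages), so the same script can be used to bring an existing machine back in line.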
These are some of the important Linux items you would need to know about:
- sshd - server for ssh connections
- stunnel - creates secure connections; used on the datastore for applications without built-in SSL/TLS support, e.g. Redis before version 6
- ufw / iptables - firewalls
- PKI and creating self-signed certificates
- logrotate - manage rotating and backing up application log files
- cron - schedule the running of maintenance scripts like backups
- certbot - generate and renew certs
- rsync - securely synchronize files between machines
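As an example of the firewall item, this sketch generates ufw commands for the datastore so that only the application servers can reach the database ports and only your admin IP can SSH in. The IPs are hypothetical, as is using port 6380 for the stunnel-wrapped Redis; 5432 and 27017 are the Postgres and MongoDB defaults.

```typescript
// Hypothetical addresses; ports are passed in by the caller.
interface FirewallOptions {
  adminIp: string;        // where you SSH in from
  appServerIps: string[]; // private IPs of the application servers
  ports: number[];        // datastore ports the app servers may reach
}

function renderUfwRules(opts: FirewallOptions): string[] {
  const rules = [
    "ufw default deny incoming",
    "ufw default allow outgoing",
    // Allow SSH before enabling the firewall, or you lock yourself out.
    `ufw allow from ${opts.adminIp} to any port 22 proto tcp`,
  ];
  for (const ip of opts.appServerIps) {
    for (const port of opts.ports) {
      rules.push(`ufw allow from ${ip} to any port ${port} proto tcp`);
    }
  }
  // --force skips the interactive confirmation so it works from a script.
  rules.push("ufw --force enable");
  return rules;
}
```

For example, `renderUfwRules({ adminIp: "203.0.113.5", appServerIps: ["10.0.0.11", "10.0.0.12"], ports: [5432, 27017, 6380] })` yields ten commands. Adding an application server means re-running it with the new IP.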
It’s likely that your VPS hosting provider has an API and/or command line tools, making it possible to write a provisioning script that creates a VPS, rsyncs the bash install script to it and runs it. With a minimum of fuss you can then provision fresh servers by running a single script, so the process is completely repeatable.
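Here is a sketch of that outer script. `Provider.createServer` is a hypothetical wrapper around whatever API or CLI your host offers (real SDKs are usually asynchronous; this is kept synchronous for brevity), and the `./provision/<role>/` directory layout is an assumption. The function returns the rsync and ssh commands to run against the new machine.

```typescript
// Hypothetical wrapper around your VPS provider's API or CLI.
interface Provider {
  createServer(name: string, region: string): string; // returns the new server's IP
}

function provisionCommands(provider: Provider, name: string, role: string, region: string): string[] {
  const ip = provider.createServer(name, region);
  const target = `root@${ip}`;
  return [
    // Copy the role's install script and config files to the new machine over SSH.
    `rsync -az ./provision/${role}/ ${target}:/root/provision/`,
    // Run the install script remotely, from the directory it was copied to.
    `ssh ${target} 'cd /root/provision && bash install.sh'`,
  ];
}
```

Keeping the provider call behind a small interface like this means switching hosts only touches one function, not the install scripts.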
It’s worth noting that there are modern container-based tools such as Kubernetes, which are very powerful but can get quite complex.
Deploying code
This is another place where a simple bash script can be very effective.
It would need to do the following:
- Build your application to a deploy directory
- Back up the currently running app
- Rsync the files to the application servers
- Restart the application on each server
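A minimal sketch of those steps as a command list. The assumptions: `npm run build` writes a self-contained app (including `package.json`, `package-lock.json` and a PM2 `ecosystem.config.js`) to `./deploy/`; the remote layout `/srv/myapp/releases/<version>` with a `current` symlink and the `deploy` user are hypothetical. Instead of copying the running app aside, this version keeps every release in its own directory, so the previous one stays on disk for rollback. `pm2 reload` restarts apps one instance at a time when they run in cluster mode.

```typescript
// Hypothetical layout: /srv/myapp/releases/<version>, plus a "current" symlink.
function deployCommands(version: string, servers: string[]): string[] {
  const base = "/srv/myapp";
  const release = `${base}/releases/${version}`;
  const cmds = [
    // 1. Build locally into ./deploy (assumes the project's "build" script does this).
    "npm ci",
    "npm run build",
  ];
  for (const host of servers) {
    cmds.push(
      // 2. A fresh directory per release; older releases stay as the backup.
      `ssh deploy@${host} 'mkdir -p ${release}'`,
      // 3. Copy the built files.
      `rsync -az --delete ./deploy/ deploy@${host}:${release}/`,
      `ssh deploy@${host} 'cd ${release} && npm ci --omit=dev'`,
      // 4. Switch "current" to the new release and reload the app.
      `ssh deploy@${host} 'ln -sfn ${release} ${base}/current && pm2 reload ${base}/current/ecosystem.config.js'`,
    );
  }
  return cmds;
}
```

Because the servers are updated one after another, the load balancer keeps sending traffic to the machines not currently being deployed to. Rolling back is a matter of pointing `current` at the previous release and reloading.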
There is a lot of variety in this area. Many modern CI/CD workflows use git to clone your entire application repository to the server, rather than rsyncing just the built files. Requirements vary a lot from project to project.
The bash script route is great for simplicity, but there are often more manual steps involved, especially if your application has complex configuration. In the early days of a project it’s often good enough.
Backups
Backups are super important. You need to have all the important files backed up and ideally scripts to restore the backups in the event that a component fails and needs to be restored.
Consider backing up:
- Each deployed application version, along with configuration
- Log files for databases and firewalls
- Certificates
- Contents of all databases
- Configurations for every 3rd party application you are using
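For the database contents, here is a sketch of paired backup and restore commands that a nightly cron job could run. The database name `app`, the backup directory and `/opt/scripts/backup.sh` are hypothetical; the flags are the standard ones for `pg_dump`/`pg_restore`, `mongodump`/`mongorestore` and `redis-cli --rdb`. The Redis restore assumes Ubuntu’s default data path and that AOF persistence is off.

```typescript
interface BackupPair {
  backup: string;
  restore: string;
}

// Hypothetical database name and directory; stamp is e.g. a date string.
function backupCommands(stamp: string, dir = "/var/backups/db"): Record<string, BackupPair> {
  return {
    postgres: {
      backup: `pg_dump -Fc -f ${dir}/app-${stamp}.dump app`,
      restore: `pg_restore --clean -d app ${dir}/app-${stamp}.dump`,
    },
    mongodb: {
      backup: `mongodump --archive=${dir}/mongo-${stamp}.gz --gzip`,
      restore: `mongorestore --archive=${dir}/mongo-${stamp}.gz --gzip --drop`,
    },
    redis: {
      // Saves an RDB snapshot; restoring means swapping it in while Redis is stopped.
      backup: `redis-cli --rdb ${dir}/redis-${stamp}.rdb`,
      restore: `systemctl stop redis-server && cp ${dir}/redis-${stamp}.rdb /var/lib/redis/dump.rdb && chown redis:redis /var/lib/redis/dump.rdb && systemctl start redis-server`,
    },
  };
}

// crontab entry: run the (hypothetical) backup script every night at 03:00.
const cronLine = "0 3 * * * /opt/scripts/backup.sh >> /var/log/backup.log 2>&1";
```

Writing the restore command next to its backup makes it much more likely that someone actually tests restoring before it is needed.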
It’s a good idea to use storage from the big cloud providers, since it is low cost and they have good scripting tools.
Security
It’s important to configure your machines securely and to set firewalls (local and cloud) appropriately. Always use TLS/SSL for inter-machine communication. Follow the security advice for each piece of software you install, for example creating different users for specific purposes, e.g. application access vs access for backups. Only give the minimum access rights necessary to perform a given task.
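As an example of separate users with minimal rights, this sketch generates Postgres SQL for two hypothetical roles: `app_rw`, which the application uses to read and write rows but cannot change the schema, and `backup_ro`, which can only read, which is all `pg_dump` needs. Passwords are deliberately left out so they can come from your own secrets handling.

```typescript
// Hypothetical role names; db is the database name.
function leastPrivilegeSql(db: string): string[] {
  return [
    // Application: read/write data, no schema changes.
    "CREATE ROLE app_rw LOGIN;",
    `GRANT CONNECT ON DATABASE ${db} TO app_rw;`,
    "GRANT USAGE ON SCHEMA public TO app_rw;",
    "GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO app_rw;",
    "GRANT USAGE, SELECT ON ALL SEQUENCES IN SCHEMA public TO app_rw;",
    // Backups: read-only.
    "CREATE ROLE backup_ro LOGIN;",
    `GRANT CONNECT ON DATABASE ${db} TO backup_ro;`,
    "GRANT USAGE ON SCHEMA public TO backup_ro;",
    "GRANT SELECT ON ALL TABLES IN SCHEMA public TO backup_ro;",
    "GRANT SELECT ON ALL SEQUENCES IN SCHEMA public TO backup_ro;",
  ];
}
```

`GRANT ... ON ALL TABLES` only covers tables that already exist, so tables created later also need `ALTER DEFAULT PRIVILEGES` (or the grants re-run after migrations).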
Staging and production environments
Once the application is running in production, you will benefit a lot from having a staging environment. It’s a replica of the production environment where you can try out new code without worrying about breaking the live system. Never deploy directly to production; always test in staging first.
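One way to enforce “staging first” is to have the deploy script refuse a production deploy for any version that hasn’t already gone to staging. A minimal sketch, assuming versions are identified by strings; where the deploy history is stored is an assumption (here it lives in memory).

```typescript
type Env = "staging" | "production";

// Hypothetical guard; real history would be persisted, e.g. in a file or database.
class DeployGuard {
  private readonly deployed: Record<Env, string[]> = { staging: [], production: [] };

  deploy(env: Env, version: string): void {
    if (env === "production" && !this.deployed.staging.includes(version)) {
      throw new Error(`refusing to deploy ${version} to production: not on staging yet`);
    }
    this.deployed[env].push(version);
  }

  history(env: Env): readonly string[] {
    return this.deployed[env];
  }
}
```

This only checks that the version reached staging, not that it was actually tested there, so it complements rather than replaces a manual check.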
Wrapping up
The infrastructure side of running applications can get quite complex, but there are a lot of advantages to knowing how to construct these setups yourself:
- Keep costs at a minimum
- Be in full control of the infrastructure
- Be able to deploy anywhere
It’s also worth experimenting with serverless technologies for parts of the system under very high load. The low cost and high performance might be worth the portability trade-off, but be aware that you may need to rewrite parts of your application should you need to change providers.
--
Thanks for reading!
Originally posted on blog.markjgsmith.com