(Photo by sergio souza)
Depending on your background, the process of instrumenting your applications, systems, and even your coffee pot makes perfect sense. Requests to "pipe this curl command through Bash" are the kind of thing you do every day.
Or, you know, maybe not.
I hail from a sysadmin and network engineer background. And while I've spent over two decades focusing on installing, configuring, and maintaining monitoring and observability solutions, code-heavy, developer-centric processes still leave me a little in the dark.
It's nice when someone can show me exactly how something works and why it's useful, and then make it dead simple to set up myself. So that's exactly what I'm going to do here. You'll see how to use New Relic to monitor infrastructure resources like:
- CPU
- RAM
- storage
- network traffic.
And then I’ll walk through the steps to install New Relic on a system you control that’s NOT in production (because we’d NEVER test things in prod, right? RIGHT?). It can be a virtual machine (VM) running locally, a system in the private or public cloud, or even your actual machine sitting under (or on top of) your desk. It can be running on Windows, Linux, macOS, Docker, or Kubernetes. It doesn’t matter, because you can monitor any of it with New Relic.
Monitoring your infrastructure with New Relic
Why would you want to monitor your infrastructure systems with New Relic in the first place? Let’s take a look at some of the key features that are helpful for sysadmins, including dashboards and alerts.
After you’ve installed New Relic and instrumented your system, New Relic’s Hosts dashboard gives you high-level information about your system’s CPU, RAM, storage, network traffic, and so on.
You can easily access Metrics and Logs from the left-hand pane. More on those in a moment.
Every data point and statistic you see on screen can potentially be used to generate an alert or notification if something goes wrong. That way, you can use automation rather than waiting for a customer to call and ask, “Is the internet down?”
The Metrics dashboard has everything from the Summary dashboard but more of it, including network inbound and outbound (in bytes and packets) and dropped packets inbound and outbound. That’s alongside more information on CPU, RAM, and disk.
The Logs tab, as the name implies, contains log messages.
New Relic’s overall logging capability supports a variety of inputs and sources. While it’s good to know you can instrument “anything,” I think it’s always important to know what the default behavior is going to be. So, here are the logs that will start generating messages in the dashboard right away.
For Linux systems (and macOS), New Relic will forward all messages appearing in:
- /var/log/alternatives.log
- /var/log/cloud-init.log
- /var/log/auth.log
- /var/log/dpkg.log
- /var/log/syslog
- /root/.newrelic/newrelic-cli.log
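Not every distribution ships all of these files, so it can be handy to see which ones exist on your test machine before you go looking for them in the dashboard. The following is a minimal sketch: the function name check_default_logs is my own invention, and the path list simply mirrors the one above.

```shell
# Print "present" or "missing" for each default log path listed above.
# Note: the /root/... path is only visible when this runs as root (e.g. via sudo).
check_default_logs() {
  for f in \
    /var/log/alternatives.log \
    /var/log/cloud-init.log \
    /var/log/auth.log \
    /var/log/dpkg.log \
    /var/log/syslog \
    /root/.newrelic/newrelic-cli.log
  do
    if [ -e "$f" ]; then
      echo "present  $f"
    else
      echo "missing  $f"
    fi
  done
}

check_default_logs
```

A "missing" line is not an error. It just means that file will not contribute messages on this particular system.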
For Windows systems, the New Relic agent will pass along messages from the following locations:
- Security event log entries with the following event IDs:
  - 4740
  - 4728
  - 4732
  - 4756
  - 4735
  - 4624
  - 4625
  - 4648
- All events from the Application event log.
- All messages appearing in newrelic-cli.log in the .newrelic folder of whichever user ran the infrastructure agent installation.
You can also access the Events explorer and Metrics explorer from the left-hand pane. If you’re interested in learning more about them, check out Introduction to the data explorer.
Setting up New Relic
Now that you have an appreciation for both the ease of navigation and the range of data you’re able to collect and display, you might be wondering how to install New Relic yourself. If the level of difficulty ranges from “connecting my wireless mouse” to “setting up an internet-connected coffee pot,” it’s closer to the mouse side of the scale. You don’t need to compile code, download multiple (possibly conflicting) libraries, or choose between features or modules before you’ve even had a chance to test the system out.
Here’s what you need to get started:
- An active New Relic account (https://newrelic.com/signup).
- A system you want to instrument.
- A connection to the system where you can cut and paste commands.
- A connection from the system to the internet.
- Optionally, a tool or command to stress test the machine's CPU, RAM, or disk I/O. Stress-ng (https://wiki.ubuntu.com/Kernel/Reference/stress-ng) and Prime95 (https://www.mersenne.org/download/) are two examples of such utilities.
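If you'd like to confirm a Linux test box is ready before you start, a quick preflight check along these lines may help. This is a sketch under a couple of assumptions: the function names are mine, and I'm guessing (based on the "pipe this curl command through Bash" style of install) that curl and bash are what the install command will need. stress-ng is only there for the optional stress test.

```shell
# Return success if the named command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Report on the tools this walkthrough is assumed to rely on.
preflight() {
  for tool in curl bash stress-ng; do
    if have_cmd "$tool"; then
      echo "ok       $tool"
    else
      echo "missing  $tool"
    fi
  done
}

preflight

# Optional connectivity check (needs curl); prints the HTTP status code.
# curl -s -o /dev/null -w '%{http_code}\n' https://newrelic.com/signup
```

The connectivity line is commented out so the check stays harmless on an offline machine. Uncomment it to confirm the system can reach the internet.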
Installation Steps
Log in to your New Relic account. In the left-hand column, select Add more data.
Choose Guided install, which will help you install the main infrastructure agent.
Select Begin installation. From the following screen, copy the command and then paste it into the terminal or remote session connected to the system you want to monitor.
After the installation is finished, you'll get a link to the New Relic One dashboard in the terminal output on your remote system. Alternatively, you can select See your data in New Relic One.
If any issues come up during the process, the guided install will offer commands, documentation, and suggestions on how to move forward.
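On a Linux host that uses systemd, one way to double-check the result from the command line is to ask systemd whether the agent service is active. Treat this as a sketch: I'm assuming the service is registered as newrelic-infra, which is the usual name for the Linux infrastructure agent, and the agent_status function name is hypothetical. Other platforms (Windows, macOS, containers) need a different check.

```shell
# Report whether the infrastructure agent service is active.
# Assumes Linux with systemd and a service named "newrelic-infra".
agent_status() {
  if ! command -v systemctl >/dev/null 2>&1; then
    echo "systemctl not found; check the agent with your init system instead"
    return 2
  fi
  if systemctl is-active --quiet newrelic-infra 2>/dev/null; then
    echo "newrelic-infra is running"
  else
    echo "newrelic-infra is not running"
    return 1
  fi
}

agent_status || true
```

If the service shows as not running, the suggestions from the guided install are the better place to start troubleshooting.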
Perturbing the system (but only in test)
There are many utilities that will spike the CPU, fill up RAM, or push the disk I/O to the ceiling, so I'm leaving it to your creativity to choose which one to use.
On my test system, after spiking the CPU for 10 minutes and then letting it cool back down, this is what my dashboard looked like:
I used the following stress-ng command for this example:
stress-ng --matrix 0 -t 10m
Of course, there are many more ways in which a system can break, from single-point-of-failure situations like a disk or RAM to the complex multi-element cascades that you’re more likely to see in real life. There are other stress-ng commands you can use to stress CPU, RAM, disk I/O, and other subsystems.
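As a starting point, here are a few stress-ng variations that target one subsystem at a time. This is a sketch rather than a recommended test plan: the stress wrapper and the DRY_RUN variable are my own additions so you can preview each command before running it, and the worker counts, sizes, and durations are arbitrary values to adjust for your machine.

```shell
# Set DRY_RUN=0 to actually generate load; by default each command is only printed.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the stress-ng command, or run it when DRY_RUN=0.
stress() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "stress-ng $*"
  else
    stress-ng "$@"
  fi
}

stress --cpu 0 --timeout 5m                  # CPU: one worker per online CPU
stress --vm 2 --vm-bytes 1G --timeout 5m     # RAM: two workers, 1 GB each
stress --hdd 1 --hdd-bytes 1G --timeout 5m   # Disk: one worker writing temp files
```

Run as shown, this only prints the three commands. Running it again with DRY_RUN=0 on a test system executes them one after another, which makes each spike easy to tell apart in the dashboard.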
Honestly, I had so much fun using stress-ng to play “will it crash” with my system that I’m planning another blog post where I beat up a system and show you what it looks like in New Relic.
Do you want to play with infrastructure monitoring in a risk-free environment (i.e., “not your stuff”)? This lab uses ephemeral machines and walks you through a more real-world scenario: “Identify root cause issues in your infrastructure.” Or maybe you’re ready to dive in! In that case, sign up here for a free account that comes with 100 GB of data, free for life.