How to overcome data silos in open source server monitoring

Leon Adato - Jun 21 '23 - Dev Community

When it comes to open source tools for server monitoring, there are an astonishing number of options. But the problem that server administrators face isn’t lack of options—it’s being able to tie disparate information from multiple sources together. One solution is to create a "swivel chair integration" where you set up multiple dashboards on multiple screens and swivel your view from one screen to the next. While you’ll be able to see all of your data, it's hard to compare and correlate it into meaningful insights to effectively troubleshoot issues. 

A better solution is to bring all of your data into one platform. In this blog, you’ll learn how to use New Relic to maintain your open source toolset while avoiding observability silos and tool fragmentation. You’ll also learn about top open source server monitoring tools and the problem of siloed data. Finally, you'll learn three specific techniques to send data to New Relic:

  • Using an alternative agent with Nagios
  • Using an API with Zabbix
  • Using Prometheus as a data forwarder

There are many open source monitoring tools that can be integrated into New Relic using alternate agents, APIs, or custom integrations. So please focus on the aspect that has the most potential for you in your work.

Top open source server monitoring tools

Here are three of the top open source server monitoring tools:

  1. Prometheus: Ideal for containerized environments, with an especially strong focus on Kubernetes.
  2. Zabbix: A beloved infrastructure monitoring tool that has adapted through the years.
  3. Nagios: One of the pioneers of open source server monitoring, focused on agent-based metrics collection for applications, operating systems, websites, and middleware.

There are many other open source server monitoring solutions out there doing an admirable job, growing and adapting over time, and still attracting the attention of users. 

Data silos undermine observability

If you're looking at that list and thinking there's no "one size fits all" solution, you are correct—there's not even a "one size fits most" option! But you can still use a combination of these free tools to get a solid understanding of your environment. The challenge then becomes finding a way to make sense of all the disparate data streams and dashboards.

A common example occurs with monitoring on-premises network equipment versus cloud-based application environments. With the former, tools like Zabbix and Nagios are a good choice because they have robust support for protocols like SNMP, one of the most effective ways to extract telemetry from these device types. With the latter, cloud-native options are usually a better choice. However, even with open source tooling in place, your cloud-based observability may be missing key insights and context. Without application telemetry, network monitoring data can only hint at the user experience. Clearly, you need both sets of information, and ideally the data should be in one place for correlation.

But if that's simply how heterogeneous environments are, why would you even need data from multiple different sources? Here are some reasons:

  • Some tools are simply better at collecting certain types of data than others.   
  • Some tools make gathering certain types of data challenging. For example, Jaeger—a very fine open source tool for tracing—can’t collect network metrics via SNMP. Nor would you want to spend the time trying to force it to do so when other tools are more readily available and easily installed.
  • Some data is so specialized that it’s not available in every solution on the market.

That means you often need to use multiple tools, but you can still pull that data together in one place to efficiently make use of it.

One option is to build an uber-dashboard—the quintessential Network Operations Center (NOC) display consisting of multiple monitors, each showing a different tool's display.    

[Image: a NOC-style wall of monitors, each showing a different tool's dashboard]

However, this approach has many problems, particularly tool fragmentation and observability silos. The crucial issue is that data from one tool can't interact with the data in others. While there are certainly additional open source solutions that aim to overcome that challenge, there are still some obstacles:

  1. Open source is only "free" if you do not consider the time it takes to make up for any shortcomings. Adding a second open source tool to fill a specific gap in the first increases toil.
  2. Most businesses leverage a blend of open source and paid tools, and getting everything to play together is often challenging.

You can solve the communication failure within your open source tools by using New Relic as your open source data switchboard. With New Relic, you can continue to use your existing open source server monitoring tools while unifying your data and removing data silos.

Integrating Nagios using an alternate agent

New Relic has a specific integration that sends your data into New Relic instead of Nagios. With this integration, you install the New Relic Nagios agent and copy over all of the tests you already have running via the native Nagios Remote Plugin Executor (NRPE) agent. The benefit is that you can continue to compare results between the Nagios and New Relic dashboards until everything is completely tuned, and then remove the NRPE agent if desired. This same process can be followed for other integrations that New Relic offers.

You’ll start with a working Nagios installation. The next image shows the basic dashboard you see when you first get Nagios up and running.

[Image: the default Nagios dashboard after initial setup]

On each system that Nagios is monitoring, you'll install both the New Relic infrastructure agent and the Nagios integration. To do this, open the New Relic platform and select Add Data. Search for nagios and select the Nagios integration to begin the installation process.

[Image: the Nagios integration tile in the New Relic Add Data catalog]

During the process, the installer will realize the infrastructure agent is missing and offer to install it, which you should accept.

[Image: the guided installer offering to install the infrastructure agent and other detected integrations]

As the previous image shows, the installer also detects and offers to install plugins for other applications and elements that exist on your system. While optional, doing so is recommended because those applications are often critical to system functioning.
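For reference, the command that the guided install flow asks you to run on the target system generally looks like the sketch below. The license key and account ID are placeholders, and any integration-specific flags vary, so copy the exact command from your own Add Data screen rather than this one:

# Rough shape of the New Relic guided install command (values below are placeholders;
# use the exact command shown in your own "Add Data" flow)
curl -Ls https://download.newrelic.com/install/newrelic-cli/scripts/install.sh | bash && \
  sudo NEW_RELIC_API_KEY=YOUR_LICENSE_KEY NEW_RELIC_ACCOUNT_ID=YOUR_ACCOUNT_ID \
  /usr/local/bin/newrelic install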

The final and most critical step is to pull in all of your existing custom Nagios checks. To do so, navigate to /etc/newrelic-infra/integrations.d on Linux/Mac or C:\Program Files\New Relic\newrelic-infra\integrations.d on Windows, where you'll find a series of nagios-*.yml files. This is where you'll add the monitoring scripts for the Nagios agent. Because Nagios agents perform monitoring through a set of commands found in a series of shell scripts, you'll reference those shell scripts in a New Relic configuration file that will run them and send the data to New Relic.

For example, if you want to add the status and details of the Nagios CPU check defined in the check_cpu_stats.sh file, you would add the following entry under the service_checks: key in nagios-service-checks.yml:

  - name: check_cpu_stats
    command: ["/usr/local/nagios/libexec/check_cpu_stats.sh"]
    parse_output: true

This automatically adds CPU information into any dashboard or display you build in New Relic.

[Image: Nagios CPU check data charted in New Relic]
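It's also worth confirming that the underlying check script runs cleanly on its own. The path below matches the plugin location used in the snippet above, and the exit codes follow the standard Nagios convention:

# Run the check script directly; Nagios-style exit codes apply:
# 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN
/usr/local/nagios/libexec/check_cpu_stats.sh
echo "exit code: $?"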

To learn more about configuring the Nagios agent, including example configurations and troubleshooting, see Nagios monitoring integration in the New Relic documentation.

Integrating Zabbix using the Zabbix API and New Relic Flex

Like Nagios, Zabbix is an open source tool with a devoted user base. In this example, you’ll integrate Zabbix data into New Relic using the Zabbix API. If you go to http://<your-zabbix-server>/zabbix/api_jsonrpc.php (putting your server name or IP in place of <your-zabbix-server>), you can connect to various endpoints that expose data about your server. This API is a great way to programmatically extract live data from Zabbix, which you can then send to New Relic via the Flex integration. In fact, you can use the Flex integration to send data to New Relic from other open source tools that expose APIs as well.

As with Nagios, the first thing you need is a working Zabbix installation. The next image shows a Zabbix dashboard for monitoring servers.

[Image: a Zabbix dashboard monitoring several servers]

Next, you'll need an API key to run the API requests. You can get one in two different ways. Within the Zabbix web portal, go to User Settings > API Tokens, then select the Create API token button.

[Image: the Zabbix Create API token form with name, description, and expiration fields]

As shown in the previous image, give your API token a name and description. Set an expiration that fits your organization’s security standards, and select Add.

Alternatively, you can run a curl command which will output a token:

curl --request POST \
  --url 'https://example.com/zabbix/api_jsonrpc.php' \
  --header 'Content-Type: application/json-rpc' \
  --data '{"jsonrpc":"2.0","method":"user.login","params":{"username":"Admin","password":"zabbix"},"id":1}'

Note that you'll have to change example.com to your Zabbix server URL, and both the username and password values (appearing as "Admin" and "zabbix") to the proper login credentials for your system. Running this command in a terminal will give you results that look like this:

[Image: terminal output from the user.login request, including the result token]

The result value is the API token you should save and use in future commands. Do not use the API token shown in the previous image or upcoming code snippets—this is an example only. For the rest of this guide wherever you see a command that specifies the Authorization: Bearer, include the API token that you just generated.
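If you'd rather capture the token straight from the command line, you can pipe the same login request through jq. This assumes jq is installed on the machine making the request; the URL and credentials are the same placeholders as above:

# Same user.login request as above, with jq extracting just the token string
curl -s --request POST \
  --url 'https://example.com/zabbix/api_jsonrpc.php' \
  --header 'Content-Type: application/json-rpc' \
  --data '{"jsonrpc":"2.0","method":"user.login","params":{"username":"Admin","password":"zabbix"},"id":1}' \
  | jq -r '.result'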

Next, you’ll need to make a request to the API for server data. Exactly how to do this is variable and will take a dash of digging, a tad of trial and error, and an iota of iteration. Ultimately the Zabbix API guide is going to be your best resource. With a little luck, you can pull together a curl request that looks something like this:

curl --request POST \
  --url 'http://example.com/zabbix/api_jsonrpc.php' \
  --header 'Authorization: Bearer 20a4a7e9636a3f57feed2bb4a391ae60' \
  --header 'Content-Type: application/json-rpc' \
  --data '{"jsonrpc": "2.0", "id": "1","method": "host.get", "params": {"selectItems": ["name", "lastvalue", "units", "itemid", "lastclock", "value_type", "itemid"], "output": "extend", "expandDescription": 1, "expandData": 1}}'

If you're tempted to copy and paste, remember to change example.com to your Zabbix server name or IP, and to replace the Authorization: Bearer value with your own API token.

The previous curl command outputs server data that looks in part like this:

"jsonrpc": "2.0",
  "result": [
    {
      "host": "Zabbix server",
      "status": "0",
      "items": [
        {
          "itemid": "42237",
          "name": "Linux: Zabbix agent ping",
          "units": "",
          "value_type": "3",
          "lastclock": "1686186537",
          "lastvalue": "1"
        },
        {
          "itemid": "45502",
          "name": "Interface enp0s3: Bits received",
          "units": "bps",
          "value_type": "3",
          "lastclock": "1686186502",
          "lastvalue": "9688"
        },
        {
          "itemid": "45505",
          "name": "Interface enp0s3: Bits sent",
          "units": "bps",
          "value_type": "3",
          "lastclock": "1686186505",
          "lastvalue": "6384"
        },
        {
          "itemid": "42255",
          "name": "Linux: System boot time",
          "units": "unixtime",
          "value_type": "3",
          "lastclock": "1686185055",
          "lastvalue": "1686156037"
        },
        {
          "itemid": "42257",
          "name": "Linux: Load average (5m avg)",
          "units": "",
          "value_type": "0",
          "lastclock": "1686186557",
          "lastvalue": "0.287598"
        },
        {
          "itemid": "42269",
          "name": "Linux: CPU utilization",
          "units": "%",
          "value_type": "0",
          "lastclock": "1686186564",
          "lastvalue": "1.2775550000000067"
        },
        {
          "itemid": "42236",
          "name": "Linux: Free swap space in %",
          "units": "%",
          "value_type": "0",
          "lastclock": "1686186536",
          "lastvalue": "100"
        },

    ...

The next step is to send that data to New Relic. You’ll do so using New Relic Flex, a veritable Swiss army knife of capabilities that can quickly and easily transform JSON into New Relic data streams.

To get started, you need to install the New Relic infrastructure agent—either on the Zabbix server itself or on a server that can reliably connect to the Zabbix server. To install the infrastructure agent, see the previous section, or read the documentation Install the infrastructure agent.

If you install the infrastructure agent on the Zabbix server itself, the New Relic installer will detect other elements and offer to install modules for them. 

Once the installation is complete, it’s time to use New Relic Flex.

First, navigate to /etc/newrelic-infra/integrations.d on Linux/Mac or C:\Program Files\New Relic\newrelic-infra\integrations.d on Windows, which is the primary location for all things Flex.

Now comes the fun part. You're going to create a Flex integration that runs the curl command you saw earlier, parses that data, and pulls it into New Relic. Here's what you'll add to a new YAML file in the integrations.d directory:

integrations:
  - name: nri-flex
    config:
      name: getzabbixstats
      global:
        headers:
          Accept: application/json-rpc
          Authorization: Bearer 20a4a7e9636a3f57feed2bb4a391ae60
      apis:
        - name: zabbixstats
          url: http://192.168.101.227/zabbix/api_jsonrpc.php
          jq: .result[0].items
          method: POST
          payload: > 
            {"jsonrpc": "2.0", "id": "1","method": "host.get", "params": 
            {"selectItems": ["name", "lastvalue", "units", "itemid", "lastclock", "value_type", "itemid"], 
            "output": "extend", "expandDescription": 1, "expandData": 1}}
          remove_keys:
            - timestamp

Remember to change the server name and authorization bearer token for your system.
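Before trusting the Flex configuration, it can help to dry-run the same request and the .result[0].items filter with the standalone jq tool, so you can confirm the filter returns the item list you expect. This assumes jq is installed, and the server and token are placeholders as before:

# Dry-run the jq filter used in the Flex config against a live API response;
# the [:3] slice just keeps the output short
curl -s --request POST \
  --url 'http://example.com/zabbix/api_jsonrpc.php' \
  --header 'Authorization: Bearer YOUR_API_TOKEN' \
  --header 'Content-Type: application/json-rpc' \
  --data '{"jsonrpc": "2.0", "id": "1", "method": "host.get", "params": {"selectItems": ["name", "lastvalue", "units", "itemid", "lastclock", "value_type"], "output": "extend", "expandDescription": 1, "expandData": 1}}' \
  | jq '.result[0].items[:3]'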

With New Relic Flex, sending data to New Relic doesn't mean it automatically shows up on a dashboard. For more details on why that’s the case, read the blog Absolutely simple Flex. To see your Flex data in New Relic, you’ll need a bit of New Relic Query Language (NRQL) magic. With NRQL, you can query any data sent to New Relic. Here's a sample query:

SELECT AVERAGE(lastvalue) from zabbixstatsSample WHERE name = 'Linux: Load average (15m avg)' FACET fullHostname TIMESERIES

The previous query returns a time-series graph of the 15-minute load average, faceted by hostname. The next image shows the result of that query.

[Image: the NRQL query result graphed as a time series, faceted by hostname]

You can then add this query to a custom dashboard (as described in Absolutely simple Flex). 

Even though this example uses Zabbix, you can use Flex and the infrastructure agent to send JSON data from any server or tool API to New Relic. 

Integrating monitoring data using Prometheus as a data forwarder 

In this final example, you’ll learn how to integrate Prometheus into New Relic with a custom integration to send any Prometheus metrics to New Relic for aggregation, visualization, alerting, and more.

For this example, you’ll use the Prometheus Node Exporter, which collects local system metrics and exposes them for a Prometheus server to scrape. To simplify this demo, you’ll set up a simple Prometheus server on the same box that you installed the Node Exporter on. Once you have both of those things in place, you’ll configure Prometheus to forward the data to New Relic.

Starting with a bare Linux box (Ubuntu 22.04), you’ll install the Node Exporter first:

sudo apt install prometheus-node-exporter

Once installed, you can verify data is being collected at the command line using curl localhost:9100/metrics. You should see something like this in your terminal:

[Image: raw Node Exporter metrics output in the terminal]

Or you can see this data on a simple web page running locally by going to http://localhost:9100/metrics.

[Image: the Node Exporter metrics page rendered in a browser at localhost:9100/metrics]
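If you'd rather spot-check a single metric family than scroll through the full output, a quick filter does the job. node_cpu_seconds_total is one of the standard Node Exporter metrics, and 9100 is the exporter's default port:

# Show a few samples of one well-known Node Exporter metric
curl -s localhost:9100/metrics | grep '^node_cpu_seconds_total' | head -n 5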

Next, you’ll send this data to New Relic. Start by going to the New Relic platform and choosing Add Data. Then search for Prometheus, which will offer three choices:

[Image: the three Prometheus integration options in the New Relic Add Data catalog]

For this example, select Prometheus Remote Write Integration.

From the next screen, create a name for this integration, generate a URL, and save that URL for later. Keep this browser tab open as well, because you'll come back to it once you've completed the setup on the target machine.

[Image: the Prometheus remote write setup screen with the generated remote_write URL]

If you don’t already have Prometheus installed and running, follow this fantastic guide from Cherry Servers to set it up. You're going to edit the configuration with the information you got from the New Relic installation. A default prometheus.yml looks something like this:

# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

...AND SO ON...

Take the URL output from the New Relic installer and add it to the file like this:           

# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
remote_write:
- url: https://metric-api.newrelic.com/prometheus/v1/write?prometheus_server=PromethiusTest
  authorization:
    credentials: secretcredentialkeygoeshere

# Alertmanager configuration

...AGAIN, MORE STUFF AFTER THIS...

Depending on your version of Prometheus, the remote_write block may have to be adjusted. As noted in the docs Set up your Prometheus remote write integration, for Prometheus versions below 2.26, the format is:

remote_write:
- url: https://metric-api.newrelic.com/prometheus/v1/write?prometheus_server=PromethiusTest
  bearer_token: secretcredentialkeygoeshere

But for versions 2.26 and higher, it's slightly different:

remote_write:
- url: https://metric-api.newrelic.com/prometheus/v1/write?prometheus_server=PromethiusTest
  authorization:
    credentials: secretcredentialkeygoeshere
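If you're not sure which of the two formats applies to you, check the version of the Prometheus binary you're running (this assumes prometheus is on your PATH; adjust the path if your install put it elsewhere):

# Print the Prometheus version to decide which remote_write auth syntax to use
prometheus --version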

In any case, the remote_write block you got from the New Relic installer goes at the top level of the file (the example above places it right after the global: section), not nested inside any other section, and your spacing matters: remote_write and - url: both start at the far left of the file, in column 0.
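Because that indentation is easy to get wrong, it's worth validating the edited file with promtool, which ships alongside Prometheus, before restarting anything. The path below assumes the common /etc/prometheus/prometheus.yml location; adjust it if your configuration lives elsewhere:

# Catch YAML or remote_write mistakes before they take the service down on restart
promtool check config /etc/prometheus/prometheus.yml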

Once you've made this change, restart the Prometheus server with sudo systemctl restart prometheus. Then make sure there are no errors with sudo systemctl status prometheus. Optionally, see if the Prometheus web page is running at http://localhost:9090.

Now go back to the browser tab opened to New Relic and the Prometheus installation guide. If everything is running correctly, you'll get a message about how many metrics have been ingested as the next image shows.

[Image: the setup screen confirming that metrics have been received from Prometheus]

Select Explore your data to see your pre-built dashboard for the Prometheus remote write integration. The next image shows an example of what the dashboard looks like.

[Image: the pre-built Prometheus remote write dashboard in New Relic]

The top two rows of this dashboard contain charts that provide information about the volume of metrics sent to New Relic. The remaining rows show prometheus_remote_storage* metrics. From here, you can explore your data or dive into other pre-built dashboards.

To learn about other ways to send Prometheus data to New Relic, start with these resources:
