How to Get Started with Time-Series Data Analysis: Resources Roundup

Lacey Butler - Apr 9 '20 - Dev Community

Every data point you collect has a timestamp, whether it’s when a customer did something on your website, your infrastructure’s CPU utilization at a given moment, or when a bus last received maintenance.

At Team Timescale, we believe that all data is time-series data, which is why we built TimescaleDB: a time-series database that gives you the tools and services you need to store long-term metrics and gain insights for cost management, capacity planning, root-cause analysis, and more.

To help you get started with time-series data analysis, we’ve rounded up a few of our most popular tutorials, recommended resources, and datasets. While this list isn’t exhaustive, it’s comprehensive enough to get you up and running without leaving you asking, “Which one should I start with?” (Disclaimer: I’m fascinated by psychology and sociology, including the paradox of choice.)

Tutorials & How-Tos

(1) Analyzing Historical NYC Taxicab Data and Making Predictions: this set of tutorials uses NYC taxicab data to walk you through, step by step, how to load data into your database and use SQL to understand past trends (like the number of rides to specific airports, ride duration, and rate). From there, you’ll ask questions and plan for the future (e.g., should NYC plan to have more taxis on the streets before or after midnight on New Year’s Eve?).

(2) Visualizing Data in Grafana and Tableau: if you’re collecting data and want to analyze it, odds are you also need visualizations: the custom dashboards, reports, graphs, and charts that make it easier to see trends and current status at a glance. Grafana is an open-source visualization tool that’s well-suited to time-series data (we use it ourselves at Team Timescale!), while Tableau is a proprietary solution that’s popular with many Timescale customers and global organizations.

  • These tutorials show you, in detail, how to turn your data into powerful visualizations that help you better understand whatever you’re collecting. You’ll see how to connect each tool to your database (TimescaleDB, in this case), run queries, and build the visualizations you’re after.

(3) Use TimescaleDB to Store Prometheus Metrics Data & Visualize Results in Grafana (2 parts!): Prometheus is arguably the most popular open-source systems monitoring solution, and, as mentioned above, Grafana is a much-loved open-source visualization engine. If you’re looking for a lower-cost or more flexible way to store your metrics, correlate them with past trends, and plot results in team dashboards, this tutorial has you covered.

  • In the first part, you’ll learn how to set up TimescaleDB, Prometheus, and Grafana.
  • In the second part, you’ll learn how to use your metrics data to answer questions about infrastructure performance.
  • A sample dataset is included, so you can follow along without setting up a full monitoring harness 🎉.

(4) How to Clean (Normalize) Public Datasets: Public datasets can help us gain insight into our businesses, our users, and our world. If you’re new to time-series data analysis, open datasets give you a pretty rad training ground: they range from municipal and government data and weather almanacs to Super Bowl tweet sentiment and the popularity of new video games.

  • But combining public datasets, or joining them with your own data, often requires a series of steps to clean up (or “normalize”) the data.
  • Fear not: this short how-to walks through two ways to normalize datasets so you can get to the best part: analyzing and using data to better understand your world.
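The how-to’s two specific methods aren’t reproduced here. As a rough illustration of what normalizing can involve, here’s a minimal Python sketch, assuming two hypothetical public sources that describe the same days with different column names, timestamp formats, and number formatting. It maps both onto one shared schema (a date key plus typed values) and then joins them. The data, field names, and helper functions are all made up for the example.

```python
from datetime import datetime, timezone

# Hypothetical source A: ISO-8601 timestamps with a UTC offset, Celsius as strings.
weather_rows = [
    {"Timestamp": "2020-04-01T00:00:00+00:00", "Temp_C": "12.5"},
    {"Timestamp": "2020-04-02T00:00:00+00:00", "Temp_C": "14.0"},
]
# Hypothetical source B: US-style MM/DD/YYYY dates, counts with thousands separators.
ride_rows = [
    {"date": "04/01/2020", "rides": "1,204"},
    {"date": "04/02/2020", "rides": "987"},
]


def normalize_weather(row):
    """Map a weather row onto the shared schema: UTC date key plus a float value."""
    ts = datetime.fromisoformat(row["Timestamp"]).astimezone(timezone.utc)
    return {"day": ts.date(), "temp_c": float(row["Temp_C"])}


def normalize_rides(row):
    """Map a ride row onto the same schema: parse MM/DD/YYYY and strip commas."""
    day = datetime.strptime(row["date"], "%m/%d/%Y").date()
    return {"day": day, "rides": int(row["rides"].replace(",", ""))}


def join_on_day(left, right):
    """Inner-join two lists of normalized dicts on their shared 'day' key."""
    by_day = {row["day"]: row for row in right}
    return [
        {**left_row, **by_day[left_row["day"]]}
        for left_row in left
        if left_row["day"] in by_day
    ]


weather = [normalize_weather(r) for r in weather_rows]
rides = [normalize_rides(r) for r in ride_rows]
joined = join_on_day(weather, rides)
for row in joined:
    print(row)
```

Running it prints one merged row per day that appears in both sources; days missing from either source are dropped, much like a SQL inner join.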

Videos

The examples below use IoT and monitoring scenarios to demonstrate what time-series data analysis and visualization make possible, but there are many other use cases, from geospatial analysis to using historical data to predict sales or market trends.

(1) Time-Series for Developers coding session recap (YouTube recording): For a primer on what time-series data is and what makes time-series databases different, this recap blog and session recording break down the fundamentals while keeping it light on theory. You’ll learn (a) what time-series data is in practical terms and (b) the types of questions you can ask and answer with it.

  • To demo just how powerful time-series analysis can be, Avthar (my fantastic colleague) spends 15 to 20 minutes walking through a mock scenario, where we’re “tasked” with analyzing NYC taxicab data to find ways to cut carbon emissions, suggest routes to travelers, and more.
  • You can follow along with the tutorial and get the dataset here; it’s our “Hello, World!”-esque example.

(2) How to Analyze Your Prometheus Data in SQL: 3 Queries You Need to Know, plus How (and Why) to Use TimescaleDB as a Long-Term Store for Your Prometheus Metrics: Our resident webinar and demo pro Avthar shows you how to set up a long-term store for Prometheus metrics (and why you should), write custom queries, and spin up Grafana dashboards.

  • You'll set up aggregates to roll up hourly and daily summaries of your metrics, create automated downsampling rules to keep metrics around without wasting disk space, and see 3 common monitoring queries that you can use and customize right away.
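The video builds its rollups and downsampling rules inside TimescaleDB; the sketch below doesn’t reproduce that setup and only mimics the idea in plain Python. It assumes metrics arrive as hypothetical (timestamp, value) pairs, rolls raw samples up into hourly and daily min/avg/max summaries, then downsamples by discarding raw samples older than an assumed six-hour retention window while the summaries stay around. The function names, sample data, and retention value are illustrative only.

```python
from collections import defaultdict
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def bucket_start(ts, width):
    """Floor a timestamp to the start of its fixed-width, epoch-aligned bucket."""
    return EPOCH + ((ts - EPOCH) // width) * width


def rollup(samples, width):
    """Summarize (timestamp, value) samples into per-bucket min/avg/max/count."""
    groups = defaultdict(list)
    for ts, value in samples:
        groups[bucket_start(ts, width)].append(value)
    return {
        start: {
            "min": min(vals),
            "avg": sum(vals) / len(vals),
            "max": max(vals),
            "count": len(vals),
        }
        for start, vals in sorted(groups.items())
    }


def drop_raw_older_than(samples, now, retention):
    """Downsampling step: keep only raw samples inside the retention window."""
    cutoff = now - retention
    return [(ts, v) for ts, v in samples if ts >= cutoff]


# Hypothetical CPU-utilization samples every 20 minutes over two days.
start = datetime(2020, 4, 1)
samples = [(start + timedelta(minutes=20 * i), float(i % 10)) for i in range(144)]

hourly = rollup(samples, timedelta(hours=1))
daily = rollup(samples, timedelta(days=1))

# Assumed policy: raw points live for 6 hours; summaries are kept indefinitely.
now = start + timedelta(days=2)
recent_raw = drop_raw_older_than(samples, now, retention=timedelta(hours=6))
print(len(hourly), len(daily), len(recent_raw))
```

With 20-minute samples over two days, `hourly` ends up with 48 summaries and `daily` with 2, while only the last 18 raw samples survive the retention cutoff. The trade-off is the usual one: summaries answer trend questions cheaply, but once raw points are dropped you can no longer compute a different statistic for that period.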

(3) IoT Monitoring and Time-Series Analysis (YouTube): We demoed this at AWS re:Invent 2019 to show how to connect NYC’s Metropolitan Transportation Authority data to your database (TimescaleDB in this instance) and use special SQL functions, like time bucketing and gap filling, to monitor all buses running in NYC and display real-time results in Grafana. You’ll see how to track when buses go off route, deal with missing data points, and more.
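Time bucketing groups readings into fixed-width intervals, and gap filling makes sure every interval appears in the output even when no data arrived, either as an explicit empty value or by carrying the last observation forward. The demo uses TimescaleDB’s SQL functions for this; the sketch below is not that API, just a minimal Python imitation of the technique, assuming hypothetical bus speed readings that leave two five-minute buckets empty. All names and values are made up for illustration.

```python
from datetime import datetime, timedelta


def bucket_start(ts, origin, width):
    """Floor ts to the start of its bucket, counting fixed-width buckets from origin."""
    return origin + ((ts - origin) // width) * width


def bucket_with_gapfill(readings, origin, end, width, carry_forward=False):
    """Average readings per bucket over [origin, end), emitting every bucket.

    Empty buckets get None, or the most recent non-empty bucket's average when
    carry_forward is True (last observation carried forward).
    """
    sums, counts = {}, {}
    for ts, value in readings:
        if origin <= ts < end:
            b = bucket_start(ts, origin, width)
            sums[b] = sums.get(b, 0.0) + value
            counts[b] = counts.get(b, 0) + 1

    result, last = [], None
    b = origin
    while b < end:
        if b in counts:
            last = sums[b] / counts[b]
            result.append((b, last))
        else:
            result.append((b, last if carry_forward else None))
        b += width
    return result


# Hypothetical bus speed readings (km/h); nothing arrives between 08:08 and 08:22.
origin = datetime(2019, 12, 3, 8, 0)
readings = [
    (origin + timedelta(minutes=1), 20.0),
    (origin + timedelta(minutes=4), 24.0),
    (origin + timedelta(minutes=7), 30.0),
    (origin + timedelta(minutes=23), 18.0),
]
width = timedelta(minutes=5)
end = origin + timedelta(minutes=25)

with_nulls = bucket_with_gapfill(readings, origin, end, width)
with_locf = bucket_with_gapfill(readings, origin, end, width, carry_forward=True)
for (b, v1), (_, v2) in zip(with_nulls, with_locf):
    print(b.strftime("%H:%M"), v1, v2)
```

The 08:10 and 08:15 buckets come out as None in the first column and as 30.0 (the 08:05 average carried forward) in the second. Which fill is appropriate depends on the question: None makes missing data visible on a dashboard, while carrying values forward suits signals that are assumed to hold steady until the next report.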

Datasets

The above examples include a few datasets that you can use to practice your skills, but there are thousands of options available. A few places to start:

  • San Francisco Open Data includes comprehensive time-series datasets for health and safety, city services, and much more. Similar projects exist for cities all over the world, including New York, London, and Los Angeles, as well as various Government of India entities.
  • Beyond cities and municipalities, you can find and access data about weather and cryptocurrency, as well as data from the United States Census Bureau.
  • Kaggle offers almost 20K publicly available datasets, with search and sort functionality based on your interests and programming language of choice. You’ll find everything from public education to presidential debate Twitter sentiment and video game sales, as well as tutorials and competitions for all levels.

Wrapping up

Time-series data allows us to better understand our world, whatever that means to you (be it your open source project community, your business operations, or the global population writ large).

🤞 I hope this list gives you a place to start or to brush up on existing skills. If I've piqued your curiosity and you’d like to connect with like-minded folks, I invite you to join us for DataPub, a virtual meetup for data enthusiasts. Our next meeting will be on Tuesday, April 21st at 1pm PT / 4pm ET / 8pm GMT.

  • You’ll hear from community members about their projects and ways developers are using open data to build amazing applications and, most importantly, get a chance to ask questions and share your own.
  • Everyone’s welcome, so invite friends, family, colleagues, or your pets!

Can’t make this month’s session? You’ll receive a recording and follow-up even if you’re unable to attend (we know things come up).

You can also subscribe to our newsletter to learn about next month’s date and guest speakers.
