How soon is now? Why you might want to consider a time series database

When we think about data storage, most of us immediately jump to the trusty relational databases that we’ve used time and again—systems like Microsoft SQL Server, Oracle, or MySQL. They’ve been around for a long time, and they do their job well, particularly when it comes to transactional data. However, not all data fits neatly into that structure, especially when we’re talking about time-sensitive data. That’s where time series databases (TSDBs) come in, and I’m here to explain why you might want to consider them for your next project.

In my years of working on various data-centric projects, I’ve encountered different storage solutions tailored for specific needs. From large-scale data lakes designed for reporting infrastructures to real-time monitoring systems in critical environments, time has always been a key factor. How long did something take? When did an event happen? These are questions that drive many business decisions, but answering them can get complicated—especially when handling large datasets. And if you’ve ever dealt with time-stamped data in traditional systems, you’ll know it’s not always straightforward.

Why not stick to relational databases?

Let’s start with what most of us are familiar with: relational databases. They’re excellent for transactional systems where data integrity and ACID compliance are essential. You know the drill: you’re working with a structured schema, and everything has its place. They’re great at maintaining relationships between data entities, which is why they’re at the heart of so many critical systems. But they’re not built for speed when it comes to time-stamped data, especially when you need to scale quickly.

Relational databases like SQL Server and Oracle don’t scale elegantly when faced with high-speed data insertion requirements, particularly when the data is streaming in from real-world devices or sensors. Each insert involves more than storing a row: the database also has to coordinate with anyone else inserting data at the same time, update indexes, and deal with other behind-the-scenes work such as transaction logging. This overhead can cause performance issues as data volumes grow. When you need to collect and analyse data in real time, as with IoT devices or system monitoring applications, this becomes a significant bottleneck.

Comparing different data stores

Before diving into time series databases, it’s worth understanding where they sit in the broader landscape of data stores. Besides relational databases, we have:

  • Document stores: Great for semi-structured data like JSON or XML, where the data may evolve over time and you don’t want to enforce a strict schema. Think MongoDB or Couchbase.
  • Key-value stores: These are fast and good for scenarios where you need quick reads and writes of individual data points. Redis and DynamoDB fall into this category.
  • Message queues: If you need reliable, ordered delivery of messages, message queues like Kafka or RabbitMQ help buffer the flow of data between systems.

Each of these data stores has its strengths, but when you’re dealing with time-sensitive data that needs to be ingested and queried efficiently, none of these options compares to a TSDB.

The philosophical dilemma of time

One thing that often gets overlooked in data systems is the very nature of time. Time, after all, is not always as straightforward as it seems. When you’re working with data, understanding when something happened can be surprisingly tricky. You have to deal with different time zones, daylight saving changes, and differences in time stamps from different systems. When is time the same, and when is it different?

TSDBs are built to handle the complexities of time more gracefully. Whether it’s making sure events are recorded in the right sequence, reconciling time stamps from different time zones, or dealing with overlapping events, TSDBs take much of the guesswork out of time-sensitive data. They keep events in the correct order, allowing you to make accurate inferences based on the time stamps and event sequence.
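
To make the time zone problem concrete, here’s a minimal Python sketch of the usual remedy: convert every time stamp to UTC before comparing or storing it. The event names and the fixed offsets are assumptions made up for this illustration, not something from a particular TSDB.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets keep the sketch self-contained. They are assumptions for this
# example: a real system would use named zones so daylight saving is applied.
MELBOURNE_SUMMER = timezone(timedelta(hours=11))  # AEDT, UTC+11
LONDON_WINTER = timezone(timedelta(hours=0))      # GMT, UTC+0


def to_utc(stamp: datetime) -> datetime:
    """Convert a timezone-aware time stamp to UTC, rejecting naive ones."""
    if stamp.tzinfo is None:
        raise ValueError("naive time stamp: attach a time zone before storing")
    return stamp.astimezone(timezone.utc)


# Hypothetical events, each reported in the local time of its source system.
events = [
    ("alarm_raised", datetime(2024, 1, 14, 22, 45, tzinfo=LONDON_WINTER)),
    ("door_closed", datetime(2024, 1, 15, 9, 30, tzinfo=MELBOURNE_SUMMER)),
]

# Comparing the wall-clock readings alone puts the alarm first...
wall_clock_order = [
    name for name, stamp in sorted(events, key=lambda e: e[1].replace(tzinfo=None))
]

# ...but once both are expressed in UTC, the door actually closed first.
utc_order = [name for name, stamp in sorted(events, key=lambda e: to_utc(e[1]))]

print(wall_clock_order)  # ['alarm_raised', 'door_closed']
print(utc_order)         # ['door_closed', 'alarm_raised']
```

The two orderings disagree, which is exactly the kind of mistake that creeps in when local time stamps from different systems are compared directly.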

The power of time series databases

That’s where time series databases come into play. TSDBs are designed specifically for handling time-stamped data. The key advantage of a TSDB is its ability to quickly ingest vast amounts of data without slowing down. When you’re dealing with devices that send data every few milliseconds, insert speed is crucial. TSDBs are optimised for this, making them a far better choice for real-time data ingestion than traditional relational databases.

For example, if you’re monitoring a factory’s production lines, you might be gathering sensor data at an incredible rate. You want to capture every event as it happens (every temperature reading, every pressure change, every machine status update) and you need to store it in the exact order it was received. Time series databases are built to handle this kind of workload. They maintain the order of events, and once data is stored it is treated as a fixed record rather than something to be updated in place. This allows for high-speed queries on the most recent data, which is often the most valuable.

One thing I’ve found particularly interesting is how the value of data changes over time. In most applications, the freshest data is the most important. For example, in a system monitoring application, the last 24 hours might be critical, but data from six months ago? Probably not as useful in the same level of detail. TSDBs allow you to store high-resolution data for recent time periods while aggregating older data at lower resolutions to save space and costs.
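
The idea of keeping recent data at full resolution while rolling up older data can be sketched in a few lines of Python. This is only an illustration of the concept: the function name, the (seconds, value) tuple layout, and the hourly bucket size are assumptions, and a real TSDB would normally do this for you through its own retention and downsampling features.

```python
from collections import defaultdict
from statistics import mean


def downsample(points, cutoff, bucket_seconds=3600):
    """Keep points at or after `cutoff` untouched; average older ones per bucket.

    `points` is a list of (unix_seconds, value) pairs. Older points are
    replaced by one (bucket_start, mean_value) pair per time bucket.
    """
    recent = sorted((t, v) for t, v in points if t >= cutoff)
    buckets = defaultdict(list)
    for t, v in points:
        if t < cutoff:
            buckets[t - t % bucket_seconds].append(v)
    rolled_up = [(start, mean(vals)) for start, vals in sorted(buckets.items())]
    return rolled_up + recent


# Hypothetical readings: three older than the cutoff, two recent.
readings = [(0, 10.0), (1800, 20.0), (3600, 30.0), (7200, 40.0), (7260, 50.0)]
compact = downsample(readings, cutoff=7200)
print(compact)  # [(0, 15.0), (3600, 30.0), (7200, 40.0), (7260, 50.0)]
```

The two readings in the first hour collapse into a single average, while everything after the cutoff keeps its original detail.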

Handling sparse and bursty data

Another advantage of TSDBs is their ability to handle “bursty” or sparse data. This is a common problem with IoT devices or systems where data may not be coming in at regular intervals. For instance, a sensor might report temperature changes every minute but go quiet for hours if there’s no significant fluctuation. In a relational database, you’d end up with lots of empty fields or redundant data. TSDBs, however, are built to manage this type of data flow efficiently, storing only what changes and ignoring the rest.
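
Here’s a small Python sketch of the “store only what changes” idea. It isn’t how any particular TSDB is implemented internally; the class name, the tolerance parameter, and the sample readings are all assumptions for illustration. Readings must be recorded in time order for the lookup to work.

```python
from bisect import bisect_right


class SparseSeries:
    """Stores a reading only when it differs from the last stored value."""

    def __init__(self, tolerance=0.0):
        self.tolerance = tolerance
        self.times = []
        self.values = []

    def record(self, t, value):
        """Store the reading if it moved by more than `tolerance`.

        Returns True when the reading was stored, False when it was skipped.
        """
        if self.values and abs(value - self.values[-1]) <= self.tolerance:
            return False
        self.times.append(t)
        self.values.append(value)
        return True

    def value_at(self, t):
        """Return the last stored value at or before `t`, or None if none."""
        i = bisect_right(self.times, t)
        return self.values[i - 1] if i else None


# Hypothetical temperature readings, one per minute (time in seconds).
series = SparseSeries(tolerance=0.1)
for t, temp in [(0, 21.0), (60, 21.05), (120, 21.0), (180, 22.4), (240, 22.4)]:
    series.record(t, temp)

print(len(series.times))     # 2 (only the first reading and the jump to 22.4)
print(series.value_at(150))  # 21.0
print(series.value_at(200))  # 22.4
```

Five readings arrive but only two are stored, and a query for any moment in between still returns the value that was in effect at that time.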

Querying time-based aggregations

If you’ve ever tried to perform time-based queries in SQL, you know it’s no picnic. Time-based calculations like averages, time windows, or event durations are clunky in traditional databases. TSDBs simplify this by providing native support for time-based queries. Want to know the average temperature over the past hour? How about comparing performance data between two specific time frames? These are standard operations in a TSDB, but they can be incredibly painful to implement in a relational database.
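
To show what those two questions boil down to, here’s a minimal Python sketch that averages readings over a time window and compares two windows. The function name and the sample data are assumptions for illustration; in a TSDB you would express the same thing in its query language rather than in application code.

```python
from statistics import mean


def mean_between(points, start, end):
    """Average the values whose time stamp t satisfies start <= t < end.

    `points` is a list of (unix_seconds, value) pairs. Returns None when
    no reading falls inside the window.
    """
    window = [v for t, v in points if start <= t < end]
    return mean(window) if window else None


# Hypothetical temperature readings over two hours (time in seconds).
readings = [(0, 18.0), (1800, 19.0), (3600, 20.0), (5400, 22.0), (7000, 24.0)]
now = 7200

past_hour = mean_between(readings, now - 3600, now)
previous_hour = mean_between(readings, now - 7200, now - 3600)

print(past_hour)                  # 22.0
print(previous_hour)              # 18.5
print(past_hour - previous_hour)  # 3.5
```

The window is half-open (the start is included, the end is not), so back-to-back windows never count the same reading twice.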

A real-world example: Trains, leaves, and skidding

Let me tell you about a project I worked on that really drove home the value of time series databases. A few years ago, I collaborated with a research team that was investigating a recurring problem on a major railway network. Every autumn, trains would start having trouble stopping—yes, really, the trains just couldn’t brake properly. The culprit? Fallen leaves.

When leaves fall on the tracks and get squashed by the train wheels, they create this slippery varnish that makes the rails dangerously slick. As a result, trains would overrun stations, leading to delays, safety concerns, and massive disruptions in the schedule. Because reversing trains isn’t always allowed, this problem caused even more headaches, requiring incident reports and knocking timetables out of whack.

The research team wanted to dig deeper into this problem, so we used a time series database to monitor the data coming in from the trains. The trains were equipped with transport-hardened data gateways that sent data from their black box recorders to a cloud-based system. Using this data, we created a heat map of the rail network, showing where and when the skidding was happening.

This was possible because of the power of the time series database. We were able to aggregate data from thousands of train journeys, across different time periods, and generate heat maps in real time. In fact, we could zoom in from a year’s worth of data down to a specific second when a train started skidding. The dashboard we built using Grafana allowed the rail operators to visualise the problem and start developing solutions. Without a time series database, this kind of real-time analysis simply wouldn’t have been possible.
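
As a rough illustration of the aggregation behind a “where and when” heat map, the Python sketch below counts events per track segment and hour of day. It is not the project’s actual pipeline: the segment identifiers, the event layout, and the choice of hour-of-day cells are all assumptions made for this example.

```python
from collections import Counter
from datetime import datetime, timezone


def heat_map(events):
    """Count events per (segment, hour-of-day in UTC) cell.

    `events` is an iterable of (unix_seconds, segment_id) pairs.
    """
    cells = Counter()
    for ts, segment in events:
        hour = datetime.fromtimestamp(ts, tz=timezone.utc).hour
        cells[(segment, hour)] += 1
    return cells


def ts(*parts):
    """Helper: build a Unix time stamp from UTC date and time parts."""
    return datetime(*parts, tzinfo=timezone.utc).timestamp()


# Hypothetical skid events on two made-up track segments.
skids = [
    (ts(2023, 10, 20, 7, 15), "seg-12"),
    (ts(2023, 10, 21, 7, 40), "seg-12"),
    (ts(2023, 10, 21, 17, 5), "seg-30"),
]

cells = heat_map(skids)
print(cells[("seg-12", 7)])   # 2
print(cells[("seg-30", 17)])  # 1
```

Each cell’s count would then drive the colour intensity on the map; drilling down is a matter of shrinking the cell size from hours to minutes or seconds.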

Integrating time series tools

In that project, we used a combination of InfluxDB (a time series database), Kafka (a message queue to buffer the streams of data from devices), and Grafana (for visualisation). These tools worked together seamlessly, and the beauty of it was that the research team didn’t need to build a whole new infrastructure from scratch. By using software as a service (SaaS) solutions from InfluxDB, Confluent Kafka, and Grafana Cloud, we were able to set up the system quickly and scale it as needed without heavy upfront investment.
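
For a feel of what a data point looks like on its way into InfluxDB, here’s a Python sketch that formats a reading in InfluxDB’s line protocol (measurement, tags, fields, then a nanosecond time stamp). The measurement, tag, and field names are hypothetical and not taken from the rail project, and in practice a client library would normally build these lines for you.

```python
def _escape(text, chars):
    """Backslash-escape each character in `chars` wherever it appears."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field(value):
    """Render a field value: booleans, integers (i suffix), floats, strings."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line(measurement, tags, fields, timestamp_ns):
    """Build one line-protocol record; tags are written in sorted key order."""
    if not fields:
        raise ValueError("line protocol requires at least one field")
    head = _escape(measurement, ", ")
    for key, val in sorted(tags.items()):
        head += f",{_escape(key, ',= ')}={_escape(val, ',= ')}"
    body = ",".join(
        f"{_escape(key, ',= ')}={_format_field(val)}" for key, val in fields.items()
    )
    return f"{head} {body} {timestamp_ns}"


# Hypothetical reading from a train's data gateway.
line = to_line(
    "wheel_slip",
    {"train": "T 101", "line": "north"},
    {"slip_ratio": 0.12, "braking": True, "speed_kmh": 64},
    1697786100000000000,
)
print(line)
# wheel_slip,line=north,train=T\ 101 slip_ratio=0.12,braking=true,speed_kmh=64i 1697786100000000000
```

Tags (indexed metadata such as the train or line) and fields (the measured values) are kept separate, which is what lets the database filter quickly by tag and aggregate over fields.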

When to consider a time series database

So, when should you think about using a time series database? Well, any project that involves a lot of real-time data, such as IoT, system monitoring, or financial transaction analysis, would benefit from a TSDB. If you’re looking to optimise for speed, time series databases allow you to capture data as it happens and query it in real time, whether it’s for performance monitoring, predictive maintenance, or operational efficiency.

Another key area is application performance monitoring. Many tools like Datadog and Splunk use time series databases on the backend because of their ability to handle metrics data effectively. If you’re gathering stats on memory usage, CPU load, or network failures, a TSDB can help you aggregate and analyse that data to identify trends and potential areas for improvement.

The bottom line: Why time series databases matter

Time series databases aren’t the solution to every data problem, but when it comes to time-sensitive data, they’re hard to beat. Whether you’re dealing with IoT devices, monitoring critical systems, or trying to solve quirky problems like trains slipping on wet leaves, TSDBs give you the ability to collect, store, and analyse data in real time, with speed and scalability that traditional relational databases struggle to match.

If you’re starting a project that involves large amounts of real-time data, I’d encourage you to consider a time series database. It might just save you a lot of time—and headaches—down the road.

© 2024 DiUS®. All rights reserved.
