REGISTER NOW: Supercharge Your Cloud Integrations and Watch Our Webinar Recording

Kloudless Blog

Tutorials, case studies and how-tos from our experts

Monitoring PostgreSQL with Telegraf, InfluxDB, and Grafana

Kloudless uses Telegraf and InfluxDB for our system and application metrics as well as Grafana for visualizations and alerting. Telegraf allows us great flexibility in terms of data sources. It supports everything from the StatsD wire-protocol to custom shell scripts and database queries. Together, these tools form InfluxData’s TICK stack.

For context, Kloudless provides an API abstraction layer that enables apps to integrate with any of their users’ SaaS tools like Google Drive, Slack, and Salesforce, with a single implementation. As part of this, Kloudless tracks changes in connected accounts, to offer an activity stream API endpoint and webhooks. A single account can have thousands of events in a couple of minutes that must all be temporarily stored in our database for apps to retrieve from our platform.

Kloudless uses the TICK stack not only to track billions of API requests themselves, but also to monitor our database’s performance.

To exemplify the latter scenario, we can use a real life issue that we encountered with PostgreSQL. Queries involving a certain table were performing poorly. We narrowed down the cause to table bloat, but in order to ensure that our changes were having any effect, we needed to measure it over time. Otherwise, any solution of ours would just be a best guess!

Checking Bloat

In order to manually check the table bloat, we can use the following SQL query adapted from the Postgres wiki (for versions 9.0+):

Configuring Telegraf

With the PostgreSQL Extensible Telegraf plugin, the following configuration will send the results of the above query to the configured outputs:

One caveat is that the user must have read access on all of the relevant tables in that database. Grant read-only permissions with the following SQL query:

Querying the data in InfluxDB is relatively straightforward. The following InfluxQL shows recent measurements:

Viewing Results in Grafana

The easiest way to monitor the stored data is via Grafana. Using dashboard level filters to limit the graph of Bloat Ratio to a particular table, we can easily see that the behavior over time changes:

Bloat ratio over 4 days

There is a minor improvement to the steady-state bloat ratio after updating our code to avoid unnecessary tuple updates. The most dramatic improvement comes from periodically repacking the table with pg_repack. Pg_repack performs a full clean up of dead tuples without blocking the table the way a full vacuum would. The smaller variations in the ratio correspond with the background autovacuuming processes. Autovacuuming does seem to help, but is not sufficient to keep space usage for this particular table under control.

Conclusion

At the end of the day, it’s always important to track metrics to ensure things are behaving as expected. This will definitely not be the last query we monitor, since it has been so useful! This system can be used to track the results of any query, whether for monitoring Postgres itself or your application.

Published By

David Thorman