A Casual Walk Down Bottleneck Road

Let’s imagine your working at a company with the following back-end architecture.

A Rest Server receives client requests, accesses the database and responds back to clients.

Your back-end is working fine, smooth and dandy. Client requests are completing quickly, timeouts are rare occurrences. Users are happy..

Then one day you receive a barrage of alerts. “WEB REQUESTS ARE TAKING SECONDS-MINUTES TO COMPLETE”, “APIS ARE SLOW”. Users are angry…

Developer receiving onset of alerts

Your team have tasked you with fixing this issue. You know the bottleneck lies in the database.

Luckily for you, most Database Management Systems(DBMS) provide the tools and capabilities for diagnosing this issue.

For this blog we will look at the popular PostgreSQL DBMS. Demonstrating how to fix query bottlenecks using the PostgreSQL's diagnostic tooling and capabilities.

Tracking down the query

The pg_stat_statments tool(extension) records queries that are running against your database. Saving query data, such as query execution time, and the number of calls.

Slow queries will show up bright and clear in this table.

pg_stat_statement table output

Here we see our application query taking 7 seconds to complete. This also correlates with the end-to-end latency reported by users.

Tip: Correlate and identify the database query that is holding up the API request.

Understanding Query Execution

With query in hand, the next step is understanding the query execution.

For this we have the explain analyze tool, which tracks the execution steps(“nodes”) of the query.

Just prefix the query with explain analyze, and observe the results.

Explain analyze on query with cost redaction

Executions plans are read in the order the plans are executed in; inside out.

The -> sign of the EXPLAIN command indicates the hierarchy of steps (node). The earlier the step is executed, the greater indent it has.

Each plan has actual time=<startup>..<total>, more on this later.

Query starts with lookup by performing a ‘Sequential Scan’. Finishing after 282ms(“….282.427”).

Step 1: Sequential scan completes after 282ms.

With this step taking 282ms out of a total execution time of 6606ms. We have the ratio of 282ms/6606ms. In other words, this step is only taking 4% of the total execution time. Lets move on to the second step.

The query continues with the sort operation. Here is where we see a jump in latency.

Step 2: Sort completes after 6475ms.

Step two begins at the 282ms mark in the query. You can visualize the execution breakdown with the following time diagram:

It becomes quite clear that it’s indeed this step, the sort step, that is taking most of the total execution time.

Understanding Step Execution

Now that we understand which step is latent, lets understand why.

The answer lies in the sort method: external merge , the most costly sort method in Postgres.

External merge is much like Merge sort, and is used when the sort operation cannot be performed in memory. Instead of an in-memory sort, the data-set is sorted and flushed on disk. As disk access is an order of magnitude slower than memory access, the disk sort operation takes much longer as well.

This disk access is what’s holding up the query.

Optimizing the Step

As the saying goes, “understanding the problem is half the solution”. Since we have identified the issue, we can optimize the query.

There are two optimizations available:

  1. Increase the work_mem parameter so there is enough memory for sorting operations.
  2. Create index on the order by condition. Since B-Trees indexes are already sorted.

For a more detailed explanation see article by the folks at Cybertech.

In Conclusion ..

Combining ‘pg_stat_statement’, with ‘explain analyze’ we were able to identify the bottleneck query and ultimately fix the issue.

This is really a primer on the subject of query optimization. In the future I hope to cover this subject in more depth.

Feel free to drop a comment! Its always appreciated.

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store