Postgres Partitioning Best Practices: Sofia's Story
Thank you to everyone who came to listen to my talk, "Postgres Partitioning Best Practices", at Euruko in Viana do Castelo, Portugal on 18 September 2025.
Thank you for all the questions and conversations, and thank you, especially, to the real-life Sofia - the person who found me to say
"Your talk described exactly what I went through, and I wish I'd been able to watch a talk like this before I started."
I'm sorry this talk came too late to save you some pain, but it's good to know it will be helpful to other people in the same situation.
Here are the slides and transcript from the talk. I'll share the recording once it's available.
Oh, and I fixed the seemingly random chapter numbers to convince folks that I can actually count!
One final note before I get into the actual content: I've included just very brief alt-text for each slide image, but I've also added any code or other essential information into the body of the blog post.
Postgres Partitioning Best Practices

As an aside, you may or may not know that Crunchy Data is now part of Snowflake, and this was the first outing for my Snowflake t-shirt, but that's a story for another day!

I always feel like an interloper at Ruby conferences, because I'm not a developer.
I like to show this diagram of my career so far, just to prove that I do at least know about databases!
I was a DBA for about 20 years before becoming a database consultant and now senior solutions architect.
I’m also on the PostgreSQL Europe board of directors and I’m leading the PostgreSQL Europe Diversity Task Force, so feel free to come and find me if you'd like to talk about that.
Introduction

What am I talking about today, and why should you care about it?
- Maybe you're thinking about partitioning your huge database tables to make them easier to manage.
- Perhaps you want to know how partitioning works and what the best practices are,
- or maybe you're just wondering what table partitioning is all about.
I've worked with lots of customers who have already been through that thought process, and this presentation is based on some of the questions they asked me and the things they learnt along the way.
But instead of just presenting a series of questions and answers, I'd rather tell you a story.

Let me introduce you to the (fictional) heroine of our story: Sofia, a talented and passionate Ruby developer.
Sofia is working on the (equally fictitious) "NavTracker" logistics application, used by shipping companies to track the movement and status of cargo ships.
Sofia and her small team have built "NavTracker" from the ground up. It's an overnight sensation in the shipping industry, with several fleets of ships now using the service.

The shipment_logs table forms the backbone of the NavTracker application.
It keeps track of each ship's current and historical coordinates, status and other important information. The code snippet shows a simplified version of the table.
It's constantly and frequently updated for every ship:
- From every few minutes when anchored,
- to every few seconds whilst underway,
- and also based on events such as loading, unloading, docking etc.
Code from the slide:
# navtracker_app.rb
require 'active_record'
require 'pg'

ActiveRecord::Base.establish_connection(
  adapter: 'postgresql',
  ...
)

# Create shipment_logs table
ActiveRecord::Schema.define do
  create_table :shipment_logs, force: true do |t|
    t.integer :ship_id
    t.decimal :latitude
    t.decimal :longitude
    t.string :current_status
    t.timestamp :log_timestamp
  end
end

class ShipmentLog < ActiveRecord::Base
  ...
end
Chapter 1: The Looming Crisis

NavTracker is a victim of its own success.
With more and more fleets using the application, some of which have hundreds of ships, there's more and more data, bigger and bigger database tables, and longer and longer maintenance processes.

The shipment_logs table in particular is growing at an incredible rate, with millions of new rows created by the constant status updates for thousands of ships.

Clients start to complain about slow dashboards and delayed reports.
Sofia's team notices that the VACUUM process takes many times longer than it used to.
It takes far too long to rebuild indexes, and purging old data takes hours.

The team is spending more time firefighting than building new features.
Chapter 2: A Glimmer of Hope: Discovering Partitioning

Sofia needed to find a better way.

So she asked around, watched some database talks, and learnt about Postgres table partitioning.
She found out that she could break up her huge shipment_logs table into smaller, more manageable pieces.

Just so you can visualise the contents of Sofia’s shipment_logs table, this is an extract containing a few rows each for 3 different ships.

She wondered about partitioning the table by ship_id, in other words using ship_id as her partition key.
She could create one partition per ship, and each partition would contain just the rows for a specific ship.

Any of her queries that select from the main shipment_logs table would still see all the rows, but the main (parent) shipment_logs table wouldn't physically contain any data.
The data would now be stored in the individual partitions.

This type of partitioning is called list partitioning.
Each partition contains rows with one or more specific values of the partition key.
It’s useful if you have a database that’s easily separated by a common field across your entire data set. In Sofia's case, that's ship_id, but you might have data separated by region, by warehouse, by fleet or any number of other values.
Sofia would create a list-partitioned shipment_logs table in two steps:
First create the main table, specifying LIST as the type of partitioning and ship_id as the partition key.
Code from the slide:
CREATE TABLE shipment_logs (...)
PARTITION BY LIST (ship_id);

Then create the individual partitions, indicating which value or values of the partition key (i.e. which ship IDs) should go into each partition.
Code from the slide:
CREATE TABLE shipment_logs_ship_1
PARTITION OF shipment_logs
FOR VALUES IN (001);
CREATE TABLE shipment_logs_ship_2
PARTITION OF shipment_logs
FOR VALUES IN (002);

Sofia realises that most user activity actually happens on the most recent log entries.
For example, the first query below looks at the last 10 updates for ship number 3, and the second looks at all ships that were loaded in the past 24 hours.
What if, instead of searching the whole of the shipment_logs table for that information, Postgres could just look in a smaller table containing just the last day's worth of data?
Code from the slide:
Query 1:
select * from shipment_logs
where ship_id = 003
order by log_timestamp desc
limit 10;
Query 2:
select * from shipment_logs
where current_status = 'loaded'
and log_timestamp > current_timestamp - interval '1 day';

She found a blog post on the Crunchy Data website that talked about the potential performance benefits of table partitioning.
As Matz said in his keynote - everyone loves performance!
If you have a huge table containing data for an entire year, but your query is only looking at a small time period within that, Postgres might have to scan the whole table to find the information it needs.
If the table is partitioned on a date field, i.e. split into multiple smaller table partitions each covering just a small date range, Postgres can scan just the partitions it needs, rather than the entire dataset.
It does this using a technique called partition pruning.
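For example, assuming shipment_logs is range-partitioned by log_timestamp into daily partitions (as described just below), a query that filters on the partition key lets the planner skip every partition outside the requested range:
EXPLAIN (COSTS OFF)
SELECT *
FROM shipment_logs
WHERE log_timestamp >= '2025-09-01'
  AND log_timestamp < '2025-09-02';
-- Only the partition(s) covering 1st September appear in the plan;
-- the other partitions are never scanned.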

Partitioning a table in this way, into “ranges” of values of the partition key, is called range partitioning.
It's probably the most common type of partitioning, typically used with time or integer series data.
Again, the first step to creating a range-partitioned table is to create the parent table, state the type of partitioning (range) and the partition key (in Sofia's case log_timestamp).
Code from the slide:
CREATE TABLE shipment_logs (...)
PARTITION BY RANGE (log_timestamp);

Then, create the partitions for the required ranges.
These are a couple of Sofia's one-day partitions (for 1st September 2025 and for 2nd September 2025).
Note: the ranges of values assigned to partitions mustn’t overlap.
Also, the ranges’ bounds are inclusive at the lower end and exclusive at the upper end, so the 1st September partition includes values for 1st September but not for 2nd September.
Code from the slide:
CREATE TABLE shipment_logs_2025_09_01
PARTITION OF shipment_logs
FOR VALUES FROM ('2025-09-01') TO ('2025-09-02');
CREATE TABLE shipment_logs_2025_09_02
PARTITION OF shipment_logs
FOR VALUES FROM ('2025-09-02') TO ('2025-09-03');

As well as list and range partitioning, Sofia learnt that Postgres also allows hash partitioning (buckets of values) and composite partitioning (a combination of different types of partitioning).

Hash partitioning could be useful if Sofia needs to partition a table where the data doesn’t have a clearly defined pattern.
When creating your hash partitions, you use modulus to say how many partitions you want in total, and remainder to identify each individual partition: a row goes into the partition whose remainder matches the hash of its partition key modulo the modulus.
Code from the slide:
CREATE TABLE shipment_logs (...)
PARTITION BY HASH (id);
CREATE TABLE shipment_logs_p1
PARTITION OF shipment_logs
FOR VALUES WITH (modulus 5, remainder 0);
...
CREATE TABLE shipment_logs_p5
PARTITION OF shipment_logs
FOR VALUES WITH (modulus 5, remainder 4);

Postgres allows you to subpartition a table, i.e. partition the partitions.
Subpartitioning lets you combine different types of partitioning.
One of the team members reminds Sofia that they often need to manage data for each fleet separately, and that they actually have a fleet_id in most of their tables.
They could potentially use list partitioning to partition by fleet, then use range partitioning to subpartition by log timestamp to get a daily partition for each fleet.
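A sketch of what that composite scheme could look like (the column list is simplified, and the fleet ID, partition names and dates are illustrative):
CREATE TABLE shipment_logs (
  fleet_id integer,
  ship_id integer,
  latitude decimal,
  longitude decimal,
  current_status varchar,
  log_timestamp timestamp
) PARTITION BY LIST (fleet_id);
-- Each fleet's partition is itself partitioned by day.
CREATE TABLE shipment_logs_fleet_1
PARTITION OF shipment_logs
FOR VALUES IN (001)
PARTITION BY RANGE (log_timestamp);
CREATE TABLE shipment_logs_fleet_1_2025_09_01
PARTITION OF shipment_logs_fleet_1
FOR VALUES FROM ('2025-09-01') TO ('2025-09-02');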
Chapter 3: Difficult Decisions

Sofia already knows she needs to decide what type of partitioning to use, and what column to use as the partition key.
But there are other decisions to be made as well.

Which tables should she partition?
She won’t necessarily partition all of the Navtracker tables. For example, the Fleets table, and reference tables such as ship_types and status_categories are probably much too small to need partitioning.

Does she or does she not create a default partition?
It’s kind of the junk drawer of the partition world: anything that doesn’t fit neatly into one of the partitions she's defined will go in the default partition.
Sofia wonders whether or not it’s “good practice” to create a default partition.
Fortunately, she found some useful information in another blog post!
She read that on the positive side:
- If you have a default partition, you won’t get errors when inserting data that doesn’t belong to an existing partition.
- That means she wouldn't need to decide in advance exactly which partitions are needed,
- and she wouldn't need to make application changes to deal with any incorrect or unexpected data.
Although, of course, any incorrect data that ends up in the default partition would still need to be cleaned up later.
The main issue is that if there's a lot of data in the default partition, it can take a very long time to add any new partitions to the table.
This is because Postgres has to check every single entry in the default partition to make sure there aren’t any rows that should actually be in the new partition. It takes an exclusive lock on the entire partitioned table during the check.
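For reference, creating a default partition is a single statement (a sketch, based on the list-partitioned version of the table):
-- Rows whose ship_id doesn't match any existing partition land here.
CREATE TABLE shipment_logs_default
PARTITION OF shipment_logs DEFAULT;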

She wonders if she'll need to make application changes.
She's heard that applications should be carefully designed to make the best use of partitioning.
Will there be schema changes?
At the very least, she'll need to add the partition key to all tables that will be partitioned, including child tables that would not necessarily otherwise contain the column.
Will she need to make changes to queries?
If she wants to take advantage of partition pruning, she'll need to make sure that the queries filter on the partition key.

Sofia starts to think about partition management.
She does a quick calculation and realises that if she decides to partition the shipment_logs table by fleet and to subpartition on log_timestamp to create a subpartition for each day's logs, she's going to end up with over 2,000 partitions by the end of the year, just for that one table.
She starts to wonder how the team should manage the partitions. When and how should the daily partitions be created?
She wonders if she should look for a tool that can help automate some of the partition maintenance tasks.

Sofia decides to investigate pg_partman to see if it would be useful for NavTracker.
It's a PostgreSQL extension that enhances Postgres' native partitioning to make managing partitions easier.
pg_partman
- automates creating new partitions,
- can drop old partitions based on a retention policy,
- uses a background worker process to automatically run partition maintenance.
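As a rough sketch of how that might look, assuming pg_partman 5.x is installed in a schema called partman, registering shipment_logs for daily partitions with a 90-day retention policy would be something like:
-- Tell pg_partman to manage daily partitions on log_timestamp.
SELECT partman.create_parent(
  p_parent_table := 'public.shipment_logs',
  p_control := 'log_timestamp',
  p_interval := '1 day'
);
-- Keep 90 days of data; older partitions are removed by the maintenance runs.
UPDATE partman.part_config
SET retention = '90 days',
    retention_keep_table = false
WHERE parent_table = 'public.shipment_logs';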
Chapter 4: The Big Migration

Finally, after making all of the decisions, the team was ready to start actually partitioning the tables.

Sofia finds that it's not as simple as she'd hoped to partition their existing tables.
Unfortunately, she can't just convert a non-partitioned table to a partitioned table (or vice-versa for that matter). She needs to create a new, partitioned table and migrate the data into it.
But, as we've seen, the current shipment_logs table is huge, so that would be very time-consuming, and they can't afford to take it offline for long because it's constantly in use.
She really wishes she'd known about partitioning at the beginning of the project so she could have planned ahead and created partitioned tables from the start.

She was pleased to find, though, that pg_partman has features that can help to turn an existing table into a partitioned table.
There's an offline and an online method.
The offline method
- is the simpler of the two.
- allows data to be moved in small batches.
- but: the data is unavailable during the migration.
The online method
- is more complicated and less flexible.
- There are restrictions such as needing to drop foreign keys to the table and re-create them against the new partitioned table.
- but: it lets you do the migration with minimal downtime.
Sofia uses the detailed instructions in the pg_partman docs to plan a careful migration, moving the data in small batches with the most recent, most frequently-used data first.
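For illustration, a stripped-down, plain-SQL version of the offline idea looks something like this (pg_partman's helper functions do a more careful, batched version of the same thing; table names and dates are just examples):
BEGIN;
ALTER TABLE shipment_logs RENAME TO shipment_logs_old;
-- Recreate shipment_logs as a partitioned table with the same columns.
CREATE TABLE shipment_logs (
  LIKE shipment_logs_old INCLUDING DEFAULTS INCLUDING CONSTRAINTS
) PARTITION BY RANGE (log_timestamp);
CREATE TABLE shipment_logs_2025_09_01
PARTITION OF shipment_logs
FOR VALUES FROM ('2025-09-01') TO ('2025-09-02');
COMMIT;
-- Move the data across in small batches, most recent day first.
INSERT INTO shipment_logs
SELECT * FROM shipment_logs_old
WHERE log_timestamp >= '2025-09-01'
  AND log_timestamp < '2025-09-02';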
Chapter 5: Smooth Sailing: Benefits of Partitioning

Moving full steam ahead to the end of the story to talk about all the good things that Sofia encountered as a result of table partitioning.
Matz also said that joy is greater than performance. Performance is just one of the benefits that partitioning brings.

Sofia found it much easier to manage the NavTracker data lifecycle. This is an image from the "Native Partitioning with Postgres" blog post, which describes how partitioning makes it easy to store just the data you need.
(Image from a Crunchy Data blog post)

Before partitioning the shipment_logs table, if Sofia needed to archive or purge all of the data for one particular fleet, she’d need to do a delete.
That would be time-consuming, would generate lots of WAL (write-ahead log), would leave the table bloated, and would require a vacuum and potentially an index rebuild.

With the table partitioned by fleet,

Sofia could just DROP the fleet_1 partition, which is a much faster and cleaner operation.
Code from the slide:
DROP TABLE fleet_1;

If she didn't want to get rid of that data just yet, but didn't want it to be part of the main table any more, she could use DETACH PARTITION so that it becomes a separate table in its own right.
Code from the slide:
ALTER TABLE shipment_logs
DETACH PARTITION fleet_1;

Similarly, instead of doing a bulk insert into a table - say she needs to load data for an entirely new fleet into the table - Sofia can create a new table called fleet_4.
(She used the LIKE shipment_logs syntax here to avoid having to retype the column names etc.)
She would load the data into the new table, perform any checks etc. ...
Code from the slide:
CREATE TABLE fleet_4 (
LIKE shipment_logs
INCLUDING DEFAULTS INCLUDING CONSTRAINTS );

...and then attach it as a partition of the shipment_logs table.
Code from the slide:
ALTER TABLE shipment_logs
ATTACH PARTITION fleet_4
FOR VALUES IN (004);

Maintenance processes became much easier to manage:
Instead of one interminable vacuum on the whole table, there are multiple smaller vacuums, one per partition.
Instead of one, long index rebuild, the index for each partition is rebuilt separately.
And, of course, they can be run in parallel if you have the resources and want them to go faster.
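For example (the partition name here is just illustrative), each partition can be maintained on its own:
-- Vacuum/analyze and reindex a single daily partition instead of the whole table.
VACUUM (ANALYZE) shipment_logs_2025_09_01;
REINDEX TABLE CONCURRENTLY shipment_logs_2025_09_01;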

Another advantage of partitioning, which Sofia is keeping in mind for future projects, is in a multi-tenant environment, where the database hosts multiple customers or applications.
Each fleet might be a separate tenant, able to access only its own data, which would be in separate partitions.
Row level security (RLS) policies could be used to restrict fleet operators to querying/modifying only their own data.
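A minimal sketch of that idea, assuming a fleet_id column on the table and a session setting (app.current_fleet_id is a made-up name) that the application sets for the connected tenant:
ALTER TABLE shipment_logs ENABLE ROW LEVEL SECURITY;
-- Each fleet operator can only see and change rows for their own fleet.
CREATE POLICY fleet_isolation ON shipment_logs
USING (fleet_id = current_setting('app.current_fleet_id')::integer);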
Chapter 6: Navigating Obstacles

Despite the success, Sofia encountered some obstacles.

There are some limitations that apply to partitioned tables, and some things that she needed to watch out for.
We already saw, for example, that Sofia found partitioning existing tables complex and time consuming.

Postgres table partitioning features are being enhanced in every new version, but:
- You can't (yet) create a global index on a partitioned table. An index on a partitioned table is actually a set of local indexes, one per partition.
- This also means that the primary key constraint on a partitioned table has to include all of the partition key columns (see the sketch after this list).
- It's not (yet) possible to merge or split partitions.
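For example (a sketch, not Sofia's actual schema): with range partitioning on log_timestamp, a primary key on id alone is rejected, so the partition key has to be part of it:
CREATE TABLE shipment_logs (
  id bigserial,
  ship_id integer,
  log_timestamp timestamp NOT NULL,
  PRIMARY KEY (id, log_timestamp) -- must include the partition key column
) PARTITION BY RANGE (log_timestamp);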

Partitioning isn't a magic bullet for performance. Sofia needed to work to make sure partitioning was actually making things better.
The application needs to be aware of the partitions, and queries need to filter on the partition key(s).
She found that it was necessary to test, and to re-write some of the queries to make sure there were no regressions after partitioning.
Epilogue

NavTracker is now faster, more stable, and easier to maintain than ever before.
The team is no longer held back by database issues, and Sofia is back to doing what she loves most: building and releasing amazing features.
She takes a few minutes to sit down with a coffee and reflect on what she's learnt over the past few months.

She's learnt that partitioning can be extremely useful to make huge tables easier to manage:
- It's much easier to manage the data lifecycle, for example purging old data or bulk loading.
- Maintenance tasks such as vacuum/analyze/index rebuilds are faster.
- She might use partitions to separate data for multiple customers/applications to implement multi-tenancy in the future.
- She was pleased to learn that pg_partman can be useful if you need features that don’t (yet) exist in native Postgres partitioning,
- and that a carefully thought-out partitioning strategy can improve performance of certain queries.

She leaves you with a final reminder to think about partitioning right from the design phase of your application if you get the chance.
That way, not only do you avoid the stress and downtime of partitioning existing large tables, but you can also choose the partition key(s) and type(s) of partitioning that will work best with your application's data access patterns, your data retention policies, and the way you need to perform maintenance tasks.

She shared links to the documentation that she used:
- Partitioning
- Creating partitioned tables
- Attaching/detaching partitions
- pg_partman extension
- Row level security (RLS)
- Partitioning an existing table

and links to the blog posts that she read:
- Postgres Partitioning with a Default Partition
- Partitioning with Native Postgres and pg_partman
- Time Partitioning and Custom Time Intervals in Postgres with pg_partman
- Auto-archiving and Data Retention Management in Postgres with pg_partman
- Five Great Features of the PostgreSQL Partition Manager

Thank you for reading!
Get in touch, or follow me for more database chat:
LinkedIn: https://www.linkedin.com/in/karenhjex/
Mastodon: @karenhjex@mastodon.online
Bluesky: @karenhjex.bsky.social