Latest Blog Posts

All Your GUCs in a Row: checkpoint_flush_after and checkpoint_warning
Posted by Christophe Pettus in pgExperts on 2026-05-23 at 01:00
Meet two checkpoint neighbors paired by the alphabet, not the topic: `checkpoint_flush_after` tunes kernel writeback hints, while `checkpoint_warning` logs…

pgvector 0.8.2 and the Trouble With Parallel HNSW
Posted by Christophe Pettus in pgExperts on 2026-05-22 at 15:00
pgvector 0.8.2 fixes CVE-2026-3172, a heap buffer overflow in parallel HNSW index builds that can leak data or crash the server.

Long-running transactions, job queues, and the cascade that wreaks havoc
Posted by Umair Shahid in Stormatics on 2026-05-22 at 11:17

A scheduled PostgreSQL migration step held an open transaction snapshot for hours during the initial data copy. A job queue running at production write velocity began to slow down. Twelve hours later, the queue was seven million rows deep, the primary was pinned at 100 percent CPU across 24 cores, and customer support was fielding complaints about delayed transactions.

Staging had validated the migration plan. Every rehearsal was green. And yet here we were.

The team had the system stable that same day. A joint check the next morning confirmed it was operating cleanly. Three changes to our migration SOP have emerged from the retrospective. This post walks through what happened, why staging validated the procedure cleanly while the production failure mode sat outside its reach, and the changes we have made so this specific cascade stays out of future migrations.

The setup

The database in question was a production OLTP system at a significant scale. The largest table was append-only, sitting at around 11 TB and holding transactional records. The same database hosted a job queue processed via SELECT FOR UPDATE SKIP LOCKED, with worker concurrency tuned for production throughput.

The customer was migrating to a new version of PostgreSQL through logical replication. The initial data sync used COPY. The plan had been validated in staging across multiple rehearsal runs. The COPY completed cleanly there. Replication caught up. Resource footprint stayed within expectations. The plan was as ready as a plan gets.

The migration sequence began with the largest table first. Starting with the biggest workload gives the most runway for monitoring, validation, and adjustment before cutover. It is a reasonable choice for most migrations.

Level one of the cascade

A long-running COPY holds a transaction snapshot open for as long as the copy takes to complete. That part is well known. The next part is the one that hurts: PostgreSQL autovacuum hold

[...]

Why Postgres Lacks Transparent Data Encryption
Posted by Shaun Thomas in pgEdge on 2026-05-22 at 11:14

If you've ever compared database feature matrices, you may have noticed something a bit peculiar. Oracle has Transparent Data Encryption. SQL Server has it. MySQL has it. Even MariaDB has it. But Postgres, which we all consider the best database engine? Conspicuously absent.It’s not that nobody wants TDE. Compliance frameworks like PCI DSS and HIPAA practically demand encryption at rest. Cloud deployments make the “stolen disk” threat model more tangible than ever. And the question comes up constantly on mailing lists, at conferences, and in every database evaluation checklist ever assembled by a procurement department. So what gives?It’s complicated. The real answer involves nearly a decade of mailing list threads, competing proposals, fundamental disagreements about threat models, and a problem scope so vast it makes most contributors quietly back away. Let’s trace the history and find out why the elephant in the room is still unencrypted.

The Grand Promise of TDE

The concept of TDE is straightforward: encrypt all data when it’s written to disk, decrypt it when it’s read back into memory. The “transparent” part means the encryption layer sits below the SQL engine so client applications operate normally without special tooling or requirements.TDE protects against an attacker who gains read access to the physical storage: a stolen or improperly decommissioned drive, a backup tape that fell off a truck, or perhaps a misconfigured cloud storage volume. TDE does not protect against a compromised application, a malicious DBA, those with superuser privileges, or SQL injection. The contents of the database can be fully decrypted while the server is running, which means anyone who can connect and query can read everything.This is the crux of a debate that has raged in the Postgres community for years. If TDE only protects against disk theft, and filesystem encryption (LUKS, ZFS native encryption, dm-crypt) already does the same thing, why should Postgres spend enormous engineering effort to reinvent it inside the[...]

All Your GUCs in a Row: checkpoint_timeout and checkpoint_completion_target
Posted by Christophe Pettus in pgExperts on 2026-05-22 at 01:00
PostgreSQL's default 5-minute checkpoint interval wastes I/O on modern servers.

Contributions for week 19, 2026
Posted by Cornelia Biacsics in postgres-contrib.org on 2026-05-21 at 21:44

On 12 May, 2026 the San Francisco Bay Area PostgreSQL Meetup Group met virtually, organized by Katharine Saar, Stacey Haysler and Christophe Pettus. Alex Yarotsky spoke at the event.

The Swiss PGDay Program Committee met to finalize the schedule:

  • Marion Baumgartner
  • Tobias Bussmann
  • Andreas Geppert
  • Johannes Graën
  • Stefan Keller
  • Michelle Willen

Community Blog Post:

Patch Today: CVE-2026-6473
Posted by Christophe Pettus in pgExperts on 2026-05-21 at 15:00
A critical integer wraparound bug lets unprivileged SQL users trigger heap corruption with potential arbitrary code execution.

From Managed PostgreSQL to Production RAG: Build Your Own Ellie in pgEdge Cloud
Posted by Antony Pegg in pgEdge on 2026-05-21 at 12:27

If you've used docs.pgedge.com recently, you've probably met Ellie. Ask her how to set up multi-master replication, or what port the MCP Server listens on, and she pulls the relevant documentation, assembles it into context, and gives you a grounded answer with source citations. She doesn't guess or hallucinate. She finds the actual docs and synthesizes an answer from them.Ellie is a RAG Server deployment. The pgEdge RAG Server is an API server for retrieval-augmented generation that runs hybrid search (vector similarity plus BM25) over content stored in Postgres, then sends matches to an LLM for grounded answers.pgEdge Cloud Deploy it alongside your database, point it at your tables, and your application gets the same pipeline Ellie uses, running against your data.Prefer to run it yourself? The RAG Server is 100% open source under the PostgreSQL License and available on GitHub. It's a single Go binary you point at any PostgreSQL 14+ database with pgvector installed, configured via YAML, with your own API keys for OpenAI, Anthropic, Voyage, or local Ollama. Hybrid search (vector similarity plus BM25), token budgets, and streaming responses are all in the binary. The Agentic AI Toolkit FAQ has more on how it fits with the rest of the toolkit, including Vectorizer and Docloader.

What kinds of problems do you run into building RAG pipelines from scratch?

If you've tried building a retrieval-augmented generation pipeline from scratch, you already know this: the problems compound on each other. You start with vector similarity search to find semantically relevant documents, but pure vector search misses exact-term queries. Someone asks about "error code 4012" or "order ABC-1234" and the semantic search returns conceptually related results instead of the one document that actually mentions that string. So you add BM25 keyword matching. Now you have two ranked result sets and you need a fusion algorithm to combine them without letting either method dominate the other.That's just retrieval. You also need token bud[...]

How to Cut Over After a PostgreSQL Migration
Posted by semab tariq in Stormatics on 2026-05-21 at 10:39

One Database at a Time, or All at Once?

You have deployed your new cluster. Now comes the work of moving your data and cutting over to it. Reading that sentence, you might assume cutover is something you figure out at the end, after the migration is done. And in practice, that is the order in which things happen. But technically, it is your cutover strategy that decides how you migrate, not the other way around. 

The strategy you pick determines how you configure replication, how many slots you provision, how you handle schema changes, and what your rollback path looks like. 

So before you touch replication, decide how you want to cut over. 

In this post, I will walk through the two most common cutover strategies, what each one costs you, and what each one gives you back.

Approach 1: Cut Over One Database at a Time

Imagine you are moving an entire office to a new location. One way to do it is to move one team at a time, finance this week, engineering next week. Each team settles in before the next one arrives. If something goes wrong with finance’s move, it doesn’t affect engineering. You fix the problem, learn from it, and continue.

That’s exactly how this approach works with databases. You pick one database, migrate it, test it, cut it over, confirm everything is fine, and then move to the next.

Why This Approach Makes Sense

1. It’s Easier to Manage

When you are watching one database go through a cutover, you know exactly where to look if something breaks. Your team isn’t juggling ten things at once. Attention is focused, and problems surface quickly.

2. Issues Show Up Early

The first database you cut over is like a fire drill. You discover what your runbook missed, what monitoring didn’t catch, what your rollback steps actually look like in practice, all with limited impact. By the time you reach database n

[...]

ORDER BY coalesce(x, 0): there is no x
Posted by Lætitia AVROT on 2026-05-21 at 00:00
I was reading Markus Winand’s latest post on ORDER BY history last week. If you haven’t read it yet, go read it. Markus is one of the best writers on SQL standards, and this post is no exception. One line stopped me cold. The compatibility table for “expressions on selected columns.” Postgres: partial. PostgreSQL 18: still partial. That itch needed scratching. What the standard says 🔗SQL:1999 lifted a restriction from the original ORDER BY clause.

All Your GUCs in a Row: check_function_bodies
Posted by Christophe Pettus in pgExperts on 2026-05-20 at 21:06
PostgreSQL validates function bodies at creation time by default, catching syntax errors early.

Data analyst vs width_bucket()
Posted by Kaarel Moppel on 2026-05-20 at 21:00
After helping out a buddy with the job title of Data Analyst, who experienced some light Postgres “bucketing” woes - and given the fact that this was not the first such occasion in that area over the years, though I’d help future googlers / LLM-ers out a bit as well,...

Table Access Methods Wake Up
Posted by Christophe Pettus in pgExperts on 2026-05-20 at 15:00
PostgreSQL's Table Access Method API is finally seeing real action.

All Your GUCs in a Row: bytea_output
Posted by Christophe Pettus in pgExperts on 2026-05-20 at 01:00
PostgreSQL's `bytea_output` parameter controls how binary data is formatted when sent to clients: the modern `hex` format (default since 9.0) or the legacy…

Patch PgBouncer Today
Posted by Christophe Pettus in pgExperts on 2026-05-19 at 15:00
PgBouncer 1.25.2 shipped a patch for a pre-authentication crash (CVE-2026-6664). Any TCP connection can take down your pooler. Patch this week.

Keeping pgBackRest Open, Healthy, and Community Driven
Posted by Jan Wieremjewicz in Percona on 2026-05-19 at 13:08

When the future of pgBackRest suddenly became uncertain, the PostgreSQL ecosystem reacted quickly.

At Percona, we believed the most important question was not:

what replaces it?

but:

how do we ensure pgBackRest remains healthy, sustainable, and open for everyone?

That distinction matters.

pgBackRest is critical infrastructure used by enterprises around the world to protect some of their most important data. When projects like this face maintainership or sustainability challenges, organizations need trusted open source partners that can help provide continuity, stability, and confidence.

Supporting continuity, not fragmentation

From the beginning, Percona believed the best outcome for pgBackRest was not fragmentation, forks, or closed alternatives.

What the project needed was continuity.

That meant working collaboratively across the ecosystem to help strengthen the project itself:

– coordinating funding discussions
– contributing engineering resources
– helping expand the maintainer base
– encouraging participation from multiple organizations

The goal was never to control the project. The goal was to help ensure pgBackRest remained open, healthy, and sustainable for the entire PostgreSQL community.

A healthier future for pgBackRest

Those efforts are already producing results.

A joint effort across maintainers, contributors, and multiple companies is helping ensure pgBackRest returns in a stronger and healthier position than before. Funding, engineering support, and long-term sustainability discussions are now happening collaboratively across the ecosystem.

Percona is proud to play a part in that effort.
Just as importantly, this moment would likely never have happened without David Steele bringing visibility to the sustainability realities behind maintaining critical open source infrastructure.

For more than a decade, David built pgBackRest into one of the most trusted backup and recovery solutions in the PostgreSQL

[...]

PostgreSQL’s Growing Role in AI Infrastructure
Posted by Annie Ghazali in Stormatics on 2026-05-19 at 11:32

PostgreSQL, often through platforms like Supabase, is increasingly becoming part of the default stack for many AI applications. That level of adoption says something important about where engineering teams are placing their trust.

Supabase has become one of the most common starting points for AI products. Most AI frameworks support PostgreSQL and pgvector directly. For many teams, PostgreSQL is already part of the stack before the AI layer is even introduced. 

PostgreSQL, a strong default for AI applications

PostgreSQL didn’t have to reposition itself as an “AI database” for this to happen. The characteristics that made it reliable for traditional applications also made it a natural fit for AI workloads. As more AI products move into production, engineering teams are starting to spend more time thinking about how PostgreSQL behaves under concurrency, how vector workloads scale, and how operational ownership changes once usage grows.

How PostgreSQL became deeply embedded in AI infrastructure

PostgreSQL has been in continuous development since the POSTGRES project began at UC Berkeley in 1986. Over decades, it matured into a database known for transactional reliability, extensibility, replication, and operational stability. That extensibility became especially important for AI applications.

The pgvector extension allows embeddings and similarity search to live inside PostgreSQL alongside transactional application data. For many teams, this keeps the architecture simpler. Instead of introducing a separate vector database, embeddings can be stored and queried inside infrastructure the team already understands and operates.

Supabase accelerated adoption even further. When developers create a Supabase project, they are creating a PostgreSQL deployment underneath. As AI startups increasingly adopted Supabase for authentication, APIs, and backend infrastructure,

[...]

Backrest's back, alright!
Posted by Jan Wieremjewicz in Percona on 2026-05-19 at 11:00

Events unfolded quickly over the course of a couple of weeks starting on 27 April 2026, when a message appeared on the pgBackRest project announcing: that the repository would be archived and active maintenance would stop.

blog/2026/05/Jan-pgb-news-1.png

For many in the PostgreSQL ecosystem, this landed like a shock. pgBackRest is one of the most widely used backup and recovery tools for PostgreSQL, deeply embedded in production environments across enterprises large and small. Now it was suddenly described as “dead”, “EOL”, or “abandoned”. The trigger was clear: its long-time maintainer, after more than a decade of work, announced he could no longer continue without sustainable funding and would archive the repository. i That message spread fast. The interpretation spread even faster.

All Your GUCs in a Row: bonjour and bonjour_name
Posted by Christophe Pettus in pgExperts on 2026-05-19 at 01:00
PostgreSQL's Bonjour parameters let you advertise a server on the local network via Apple's service-discovery protocol—a clever 2002 idea that hasn't aged well.

Beyond Vector Search: Why PostgreSQL Could Become the Memory Layer for Enterprise AI Systems
Posted by Vibhor Kumar on 2026-05-18 at 21:14

The conversation around AI infrastructure today is heavily focused on models, GPUs, inference speed, and vector databases. These are important building blocks, but they often distract from a deeper architectural challenge that is beginning to emerge as enterprises move from experimentation toward operational AI systems.

The challenge is memory.

Not memory in the simplistic sense of storing chat history or embeddings, but memory in the broader sense of maintaining durable context, operational continuity, historical understanding, workflow state, reasoning traceability, and business awareness across long-running AI interactions.

Many of the current AI systems appear intelligent during a single interaction, yet surprisingly fragile across time. They can summarize documents, answer questions, call APIs, generate code, and reason effectively within a bounded context window. However, once interactions become long-running, collaborative, stateful, and operationally significant, the limitations quickly become visible.

The issue is not necessarily that the models lack intelligence. The issue is that most AI systems today lack a coherent memory architecture.

As enterprises begin deploying agentic AI systems capable of acting autonomously across workflows, applications, and business processes, this gap becomes increasingly important. In many ways, modern AI agents resemble highly capable employees who forget large portions of their institutional knowledge every few hours. They may understand the current task extremely well, but they often struggle to consistently retain, prioritize, organize, and retrieve contextual knowledge accumulated over time.

This is where I believe PostgreSQL may become significantly more important in the future AI stack than many people currently realize.

Not simply as a vector database.

Not merely as storage for embeddings.

But potentially as the durable memory, operational state, and governance substrate for enterprise AI systems.

The Emerging Proble

[...]

I think AI can actually help me…
Posted by Henrietta Dombrovskaya on 2026-05-18 at 18:12

Note: this post was not rewritten by AI 🙂

I’ve been saying for a long time that AI can’t help me because no one else codes the way I do, so it doesn’t have any reference points. Then I realized many advantages of having AI perform some boring tasks, like writing tests (we know we need unit tests, and why we are not writing them? because we don’t have time!).

Something changed a couple of weeks ago, after some conversations I had at work, and here is what I think could potentially happen and bring some positive change.

I have been complaining for years about the application developers’ inertia and their overreliance on ORM tools rather than writing high-performing SQL. I am not going to repeat this rant here – I am not the only one, and you all know! And when I asked what I could do to facilitate changing the course, the answer would be: I am used to that way of programming; I know it works and produces the correct result, too bad it’s not always the most performant!

But now that thing have changed, and most developers use Claude Code, I am wondering whether it would be possible to teach just an AI assistant to use better techniques? Will it work, if people are not actually writing the code? I know that AI can use SQL performance tuning tips; would it be possible to teach AI to use NoRM?

Any thoughts? Or any volunteers to give it a try?

PostgreSQL 19 Beta: The Four Features You’ll Actually Feel
Posted by Christophe Pettus in pgExperts on 2026-05-18 at 15:00
PostgreSQL 19 beta arrives with four operational game-changers: 64-bit MultiXact Members kill a decades-old "vacuum or die" failure mode, parallel autovacuum…

CloudNativePG and Crunchy PGO: an honest, opinionated comparison
Posted by Gabriele Bartolini in EDB on 2026-05-18 at 06:54

This article compares CloudNativePG and Crunchy PGO, two of the most adopted open-source operators for running PostgreSQL on Kubernetes. It covers architecture, image design, backup strategy, major version upgrades, observability, licensing and community health. As a co-founder and maintainer of CloudNativePG, I make no claim to neutrality, and I say so upfront. What I can offer is informed bias, grounded in years of daily work on the project and a genuine respect for what Crunchy Data built in this space.

XID Wraparound's Equally-Evil Twin
Posted by Richard Yen on 2026-05-18 at 06:00

Introduction

If you’ve been running PostgreSQL for any length of time, you’ve probably heard about transaction ID (XID) wraparound. It’s one of the most well-known maintenance concerns in Postgres, and there’s no shortage of blog posts, conference talks, and war stories about it. But there’s a quieter, less-discussed cousin that can cause the exact same kind of outage: MultiXact ID wraparound.

I’ve seen this surprise more than a few experienced DBAs. They’ve got their autovacuum tuned, they’re monitoring age(datfrozenxid), and they’re feeling good – and then out of nowhere, Postgres starts refusing certain writes because it’s approaching MultiXact ID wraparound.

The fix is the same as regular XID wraparound – a simple vacuum. But the reason is different, and understanding it can help you keep your monitoring complete.

What’s a MultiXact ID?

In Postgres, every row has a system column called xmax. In the simplest case, xmax holds the transaction ID of the transaction that deleted or updated the row. But what happens when multiple transactions hold locks on the same row at the same time?

Consider SELECT ... FOR SHARE. Multiple transactions can hold a shared lock on the same row concurrently. Postgres needs to record all of those transactions somewhere, but xmax is only wide enough to store a single transaction ID. The solution is the MultiXact mechanism.

A MultiXact ID is essentially a pointer into a separate structure (stored as a file in the pg_multixact/ dir) that maps to a list of transaction IDs and their lock modes. When multiple transactions need to lock a row, Postgres:

  1. Allocates a new MultiXact ID
  2. Records the set of transaction IDs (and their lock types) in the MultiXact member data
  3. Stores the MultiXact ID in the row’s xmax field, with a flag (specifically, the HEAP_XMAX_IS_MULTI infomask bit in the tuple header) indicating it’s a multi-xact reference rather than a plain XID

This lets the xmax field stay a fixed 32-bit value while still representing an arb

[...]

All Your GUCs in a Row: block_size
Posted by Christophe Pettus in pgExperts on 2026-05-18 at 01:00
A parameter you cannot change. block_size lives in the “Preset Options” section of the docs, alongside its read-only cousins like data_checksums, wal_block_size, and server_version. It reports the size of a PostgreSQL page — the fundamental unit of on-disk storage and buffer-pool accounting. Defa…

All Your GUCs in a Row: bgwriter_lru_maxpages and bgwriter_lru_multiplier
Posted by Christophe Pettus in pgExperts on 2026-05-17 at 01:00
These two parameters close out the bgwriter cluster. Together with bgwriter_delay, they govern how the background writer decides what to write each round, and they are where the actual leverage lives — the previous post ended by saying so explicitly. Here is why. The bgwriter’s algorithm, in one …

All Your GUCs in a Row: bgwriter_delay and bgwriter_flush_after
Posted by Christophe Pettus in pgExperts on 2026-05-16 at 01:00
The B cluster shifts gears: from one-off oddities to the background writer parameters, which span four GUCs. We do the first two as a pair because bgwriter_delay introduces the process at all, and bgwriter_flush_after slots cleanly into the writeback tour from backend_flush_after. What the backgr…

Two Decades, Two RCEs: What pgcrypto Has Been Doing Since 2005
Posted by Christophe Pettus in pgExperts on 2026-05-15 at 20:00
Two remote code execution bugs lived in pgcrypto for twenty years until an AI fuzzer found them in a weekend. Here's what you need to know.

All Your GUCs in a Row: backtrace_functions
Posted by Christophe Pettus in pgExperts on 2026-05-15 at 01:00
Debug PostgreSQL errors by capturing C-level stack traces for specific internal functions.

Welcome to ORDER BY jungle
Posted by Radim Marek on 2026-05-15 at 00:00

SQL is fun and not at all boring. The latest article by Markus Winand on Order by Has Come a Long Way sent me on quite a journey.

First, set up a table called nums with one integer column and four rows:

CREATE TABLE nums (a int);
INSERT INTO nums VALUES (0), (1), (2), (3);

Try to guess what these two queries return.

SELECT -a AS a FROM nums ORDER BY a;
SELECT -a AS a FROM nums ORDER BY -a;

Most of us would guess the same rows in a different order. The actual answer is that they produce exactly the same rows in exactly the same order. By the same logic you might expect

SELECT a AS c FROM nums ORDER BY -c;

to do exactly the same. Except it does not. It errors with column "c" does not exist despite the alias being right there in the statement. Welcome to ORDER BY jungle.

Names and expressions are not the same

If you ask most developers how ORDER BY works, they will say "you put a column name there and it sorts the rows". In 99% of queries that is exactly what happens. People sort by created_at or id and move on.

Strictly speaking, three, if you count ORDER BY 1. Positional references are their own can of worms and out of scope for this post.
But `ORDER BY` accepts two different kinds of things:
SELECT created_at, user_id FROM events ORDER BY created_at;
SELECT created_at, user_id FROM events ORDER BY date(created_at);

Both feel natural. And the thing nobody tells you is that they go down completely different code paths in the parser. Different scope rules, different lookups, different error messages. The first looks at your SELECT list. The second looks at your FROM clause. They never look at the same place.

Same answer, two different sorts

Look at the first query again.

SELECT -a AS a FROM nums ORDER BY a;

You wrote ORDER BY a. A bare identifier, no decoration. Postgres goes down the name path. It scans the SELECT list for something called a, finds the aliased column -a AS a, and sorts by its output values. The negated values are -3, -2, -1, 0, ascending is -3, -2, -1,

[...]

Top posters

Number of posts in the past two months

Top teams

Number of posts in the past two months

Feeds

Planet

  • Policy for being listed on Planet PostgreSQL.
  • Add your blog to Planet PostgreSQL.
  • List of all subscribed blogs.
  • Manage your registration.

Contact

Get in touch with the Planet PostgreSQL administrators at planet at postgresql.org.