Latest Blog Posts

REPACK CONCURRENTLY: pg_squeeze Gets a Promotion
Posted by Christophe Pettus in pgExperts on 2026-05-27 at 15:00
PostgreSQL 19 brings REPACK CONCURRENTLY, a native alternative to pg_repack that rewrites tables without crippling locks.

Graph Queries in Postgres with Apache AGE
Posted by Elizabeth Garrett Christensen in Snowflake on 2026-05-27 at 07:00

The Iceberg tables look like normal Postgres tables. You create them with USING iceberg and they're backed by Parquet on S3:

Postgres engines now have access to more data than ever. With extensions like pg_lake, you can connect Postgres to gobs of files in object storage like csv, json, Apache Parquet™ and Apache Iceberg™.

But having access to data in object storage and being able to aggregate data in object storage are two different things. This blog walks through the Postgres extension, Apache AGE™, that makes working with huge files of data sets much friendlier through graph relationships.

Why graph matters for data lakes

Let's consider a healthcare network with providers, patients, facilities and referral chains. The analytical questions are straightforward:

  • What's the total billed amount per region?
  • Which patients have the highest spend?
  • What's the average claim by specialty?

SQL on Iceberg handles all of these beautifully. But if you need to know: "Which in-network providers are referring patients to out-of-network providers through chains of intermediaries, and what's the dollar impact?" Those questions have two parts: a graph traversal (find the referral chains) and an analytical aggregation (sum the costs). Neither a pure graph database nor a pure analytical engine can answer it alone. You need both, working together, in the same query.

This is where data lakes need graphs.

Why Apache AGE

Apache AGE is a PostgreSQL extension that adds openCypher graph query support directly inside Postgres. There are other graph databases out there — Neo4j, Amazon Neptune, TigerGraph — but AGE has a unique advantage for the modern data platform — it runs inside PostgreSQL.

We've recently seen customers increasingly using Apache AGE for a couple of reasons:

  1. No data movement: Your Iceberg tables and your graph live in the same database. You don't extract, transform and load (ETL) data out of your lake into a separate graph database, keep it in sync and mainta
[...]

All Your GUCs in a Row: cluster_name
Posted by Christophe Pettus in pgExperts on 2026-05-27 at 01:00
cluster_name looks like a cosmetic label for process listings, but on a standby it silently becomes the name your primary uses to verify synchronous…

PGConf.dev 2026: Why It Remains My Favorite PostgreSQL Conference
Posted by cary huang in Highgo Software on 2026-05-26 at 23:33

About PGConf.dev

PGConf.dev is an annual developer event focused entirely on contributing to the PostgreSQL ecosystem, including core software development and community building. It serves as a primary hub for PostgreSQL hackers, maintainers, and ecosystem developers to meet, collaborate, and share knowledge.

This year’s conference was hosted once again in Vancouver, which also happens to be my hometown :D. On top of that, 2026 also marks PostgreSQL’s 30th anniversary, giving this year’s conference an even more meaningful atmosphere for long-time contributors and community members around the world.

This is one of my favorite PostgreSQL conferences because it gives people an opportunity to truly meet, learn from, and interact with the brilliant minds behind PostgreSQL. Many of the names we usually only see in hacker mailing list discussions, patch reviews and commit messages suddenly become real people standing right beside you, discussing ideas and working together to make PostgreSQL better.

It feels like the PostgreSQL hacker discussions have come to life in the real world. The conversations are real, the passion is real, and the community spirit is everywhere.

A huge thank you once again to all the organizers, volunteers, speakers, and sponsors who invested countless hours to make this conference possible. This blog is a personal summary of my own experience attending pgconf.dev 2026, and for those who could not attend, I hope it can give you at least a small glimpse of what it feels like to be part of this incredible PostgreSQL community.

Long Post Ahead!

Tuesday No Longer Feels Like a “Pre-Conference” Day

Unlike the “Tuesdays” from previous pgconf.dev conferences, this year’s Tuesday schedule is packed with an impressive variety of engaging sessions, including round-table discussions, interviews, keynote-style talks, brainstorming sessions, and community-led discussions focused on spe

[...]

5 PostgreSQL locking behaviors that trip people up
Posted by Shinya Kato on 2026-05-26 at 20:00

Introduction

PostgreSQL uses MVCC (Multi-Version Concurrency Control) for concurrency control: reads never block writes, and writes never block reads.

Its locking system has 8 table-level lock modes and 4 row-level lock modes, and the conflict tables in the documentation tell you exactly which lock modes conflict with which.

In practice, though, once you actually operate PostgreSQL, locks end up conflicting in places you never expected. Queries take far longer than anticipated, and in the worst case you end up with an outage.

This article walks through five of these counterintuitive locking behaviors.

Environment

  • Version: PostgreSQL 18
  • Transaction isolation level: READ COMMITTED (the default)

1. Once an ACCESS EXCLUSIVE request is queued, subsequent queries get blocked in a chain

The first one: an ALTER TABLE that should finish instantly can bring your entire service to a halt.

Suppose one session is running a long SELECT on table t, and another session runs the following ALTER TABLE:

Session 1

SELECT pg_sleep(600) FROM t LIMIT 1; -- a long-running SELECT

Session 2

ALTER TABLE t ADD COLUMN name text;

Since Session 1 holds an ACCESS SHARE lock on table t, the ACCESS EXCLUSIVE lock that Session 2's ALTER TABLE requires is forced to wait. So far, this is expected behavior.

But PostgreSQL's lock waiting works like a FIFO queue. While the ACCESS EXCLUSIVE lock is waiting, any SELECT issued against table t afterward gets stuck behind it — even though that SELECT does not conflict with Session 1's currently running SELECT at all.

Diagram showing subsequent SELECTs also forced to wait in the lock queue

Besides a long-running SELECT, the same thing happens with a session that ran a SELECT inside a BEGIN transaction and then left it open without COMMIT/ROLLBACK (a so-called idle in transaction session). An ACCESS SHARE lock is held until the transaction ends, so even if the SELECT itself finished in an instant, simply forgetting to close the transaction will keep blocking ALTER TABLE. This is often caused by

[...]

My Reflections on PGConf.dev 2026
Posted by Ashutosh Bapat on 2026-05-26 at 17:07

From Ottawa to Vancouver

For a long time, PGCon in Ottawa was a staple for the PostgreSQL community. I always had a soft spot for Ottawa; it fondly reminded me of my childhood days in Pune. So, when it was first announced that the reincarnated PGConf.dev would take place in Vancouver, I felt a twinge of sadness.

However, the moment I landed in Vancouver, all my reservations dissolved. The city won me over instantly—from riding a bicycle along the seawall and admiring the beautiful totem poles, to watching the Gastown Steam Clock. Returning this year, I was incredibly happy to explore Stanley Park on foot once again.

But as beautiful as Vancouver is, the real magic happened inside the venue.

Tuesday: The "Conference Before the Conference"

This year’s event kicked off with a brand-new addition: the Tuesday community discussions. It turned out to be so packed with value that it felt like getting two conferences in a single week—just like the advertisement in my hotel elevator promised!


1. Graph Databases Developer Meeting

I conducted a session to discuss the priorities and future of graph database offerings in PostgreSQL, specifically focusing on SQL/PGQ and Apache/AGE. Given that SQL/PGQ is a newly-committed feature, I expected a modest turnout of about a dozen people. To my surprise, over 25 people attended, and most actively participated in the debate.

2. Working Groups: Logical Replication & Multithreading

  • Logical Replication: While it has come a long way over the last decade, there is still a long road ahead. We dove deep into DDL replication and replication in multi-tenant, multi-DB clusters. My main takeaway? There are still a few missing pieces before logical replication can be effectively leveraged for high-volume OLAP use cases.

  • Multithreading: This ambitious effort aims to replace Postgres's traditional process-driven architecture with threads. As I fondly like to say: it’s like replacing an elephant’s heart with a cougar’s

[...]

Managed Postgres, Examined: Google AlloyDB for PostgreSQL
Posted by Christophe Pettus in pgExperts on 2026-05-26 at 17:00
Fourth in a series of dispassionate tours of managed PostgreSQL services. Previously: RDS, Aurora, and Cloud SQL. AlloyDB is Google’s distributed-storage PostgreSQL, the closest architectural parallel on GCP to Aurora on AWS, with enough distinctive differences to be worth understanding as its ow…

What Else Is In There?
Posted by Christophe Pettus in pgExperts on 2026-05-26 at 16:00
A 1990s-vintage PostgreSQL extension shipping a critical buffer overflow in 2026 reveals a larger problem: most teams can't even say what's actually installed…

PGConf.dev 2026 Trip Summary
Posted by Jeremy Schneider on 2026-05-26 at 15:46

I’m back home from Vancouver. What a great week – in every way. I’ll try to share a few highlights here.

Updated Happiness Hints

First and foremost: after many years, the Happiness Hints have received a major update! Before the conference, I updated the hints based on all the feedback I’ve collected over the past few years. Then the hints were updated into a poster format and we printed it as part of the pgconf.dev poster session. Throughout the week, I continued collecting more feedback. I used a sharpie during the conference and marked up the poster with ideas. Special thanks to Laurenz Albe, David Rader, Sami Imseih, Ryan Booz and Nik Samokhvalov (Nik you weren’t at the conference but a happiness hint resulted from other discussions we’ve had). Of course I’m forgetting more people who gave feedback making the happiness hints better. After coming home from the conference, I incorporated all the notes I had – and the version that’s now published here at ardentperf.com is the latest & best version I’ve assembled so far.

Physical Replication and Postgres High Availability

An extraordinary number of postgres users rely on physical replication for high availability. It’s been around for a long time and it works well. Nonetheless, there are a few rough edges and over the years there have been various mailing list threads that haven’t fully been resolved.

I proposed a Friday unconference session on this topic, and the topic received enough votes to be selected. Notes from the unconference are available on the Postgres wiki. But the discussion extended far beyond the unconference; there were also hallway discussions over coffee (thanks Thomas Munro) and then continuing discussions over dinner at Joey Burrard and beers at Steamworks (thanks Ants Aasma).

The first question that everybody asks is “should postgres have more HA capabilities in core”? And a discussion starting along these lines consumed the first half of the unconference.

But I thought the most interesting train of

[...]

EXPLAIN Prettier, or Post-Processing Query Plans in Postgres
Posted by Andrei Lepikhov in pgEdge on 2026-05-26 at 11:22

This story started with a book gifted by a colleague. Reading Jimmy Angelakos' «PostgreSQL Mistakes and How to Avoid Them», I realised something that had been bugging me - in Postgres, the EXPLAIN command produces far too much information. The examples that authors typically present when discussing various aspects of database systems make it harder to analyse the problem at hand and distract the reader. That's how the idea of a post-processing for EXPLAIN output was born - to make query plans more readable and problem-focused.

Information Overload

Anyone who has worked with PostgreSQL knows the command, or more precisely . It is typically used to investigate query performance issues or demonstrate optimisation techniques. But there is one problem: its output is packed with highly specific information. For instance, the parameter is rarely needed when analysing EXPLAIN output. Some fields, such as , take up a lot of visual space, are system-dependent, and frequently unnecessary - yet if we want to see , we have to run EXPLAIN with , and inevitably comes along for the ride.So, is it really a big deal to study an EXPLAIN with slightly more information? Ha! Let’s look at a typical query plan from my investigations - here, for example, are two plans for the same query: one is the bad plan, and the other is the good one.Finding problems in such a large plan takes time, and every extraneous detail makes it harder to spot the problematic decision. Sure, with the rise of AI agents, I can just ask Claude to compare a pair of plans, highlight the differences, and analyse what’s wrong. But this doesn’t always work - either there are too many details, or automation is needed across a large stream of queries - so the problem remains.

Regression Test Stability

EXPLAIN output changes between Postgres versions, and if your extension supports 4-5 recent versions, then your tests must pass on each of them. This means we need to filter EXPLAIN output to guarantee stable test runs across different hardware and software conf[...]

AI Without Semantics Is Just Expensive Guessing
Posted by Vibhor Kumar on 2026-05-26 at 11:01

Why Metadata, Taxonomy, Ontology, Knowledge Graphs, and Context Are Becoming the Foundation of Enterprise AI

The technology industry is entering a fascinating phase where many older ideas are suddenly becoming critically important again. Concepts that once lived mostly inside data governance discussions, library sciences, enterprise architecture diagrams, semantic web research, and metadata catalogs are now being pulled directly into the center of modern AI conversations. This shift is not accidental. As organizations move from AI experimentation to real enterprise deployment, they are discovering that intelligence does not come only from models. It also depends on how well the organization understands, organizes, governs, and connects its data.

Over the last two years, terms such as context engineering, AI memory, semantic retrieval, knowledge-driven agents, and context-aware systems have rapidly entered mainstream technology discussions. Vendors now describe platforms as intelligent because they can retrieve documents semantically, maintain conversational memory, or connect AI systems to enterprise knowledge. At first glance, this sounds entirely new. In reality, much of the industry is rediscovering a foundational truth that enterprise architects, database professionals, and information management practitioners have understood for years: AI systems cannot reason reliably without meaningful data.

This is not simply a model problem. It is fundamentally a data architecture problem. Large Language Models are extraordinarily capable at generating language, summarizing content, writing code, and answering questions conversationally. However, enterprises are increasingly discovering that language generation alone does not create trustworthy intelligence. An AI system may generate answers that sound convincing while still lacking operational correctness, governance awareness, relationship understanding, or contextual grounding.

This is precisely why concepts such as metadata, semantics, taxonom

[...]

Critical PMM Alerts Every PostgreSQL DBA Must Track
Posted by SHRIDHAR KHANAL in Stormatics on 2026-05-26 at 09:50

The Night When Things Almost Went Down

Have you ever left for home on a Friday evening feeling confident about your work for the day, at peace knowing your system would survive the coming weekend? We’ve all felt that way at some point.

Meanwhile, the disk on the server had quietly reached 90% utilization. Write-Ahead Log (WAL) files had accumulated enormously, one long-running query had been running for over an hour, and nobody noticed because, some time earlier, the dashboard had looked fine.

After a while, writes to the database started to fail, and you received messages like, “Hey, the DB seems a bit slow?” In reality, the database was no longer slow; it had already gone down.

However, this wasn’t anything new. The signs of trouble had been there the entire time. The monitoring simply wasn’t connected to the right signals to raise the alarm.

And that’s what this post is about.

Dashboards vs Alerts

Let’s be honest. Dashboards are great for screenshots.

However, they make one risky assumption that someone will make this check before it’s too late. Alerts make no such assumption.

  • Dashboards = “Here’s what’s happening.”
  • Alerts = “There is something wrong. Need to address it.”

A dashboard gently shows an increasing CPU. But an alert rings, saying: “This has been bad for 30 minutes, wake up.”

Dashboards are waiting for us. But alerts find us. After a few incidents, we no longer rely on the dashboard to keep us safe while asleep. It’s through alerts that it happens.

Configuring Alerts in PMM

These rules are set up by going to Alerting → Alert Rules in the PMM interface and creating a New Alert Rule. For every alert, we need to configure three things:

Duration: How long the condition must hold before the alert fires. This filters out temporary sp

[...]

The PostgreSQL Conference HOW2026 in Jinan, China
Posted by Josef Machytka in credativ on 2026-05-26 at 09:33

At the end of April, I had the incredible opportunity to represent credativ on the HOW2026 (Hello Open-source World), the PostgreSQL & IvorySQL Eco Conference in Jinan, China. IvorySQL is a Chinese fork of PostgreSQL with extended Oracle compatibility. In recent years, it has been gaining popularity not only in China. The conference brought together PostgreSQL and IvorySQL experts, contributors, and open-source database enthusiasts from around the world. The atmosphere was excellent, and I would like to share several highlights together with a recap of my contributions.

My Contributions

I was honored to contribute to the HOW2026 program with two sessions.

Workshop: The Alchemy of Shared Buffers

On Sunday afternoon, I led a three-hour deep-dive workshop exploring the intricate mechanics of PostgreSQL shared buffers. I discussed how Linux implements shared memory operations through the tmpfs file system, how shared buffers work internally, their practical limitations and usage patterns, why and how huge pages should be used for their allocation, why transparent huge pages negatively affect performance, and practical strategies for balancing high concurrency with optimal system performance.

Talk: Linux and PostgreSQL in the Multiverse of Connections

On Tuesday afternoon, I gave a talk focused on PostgreSQL connection overhead. I discussed how PostgreSQL connections interact with the Linux kernel, covering memory usage, physical memory allocation mapping, context-switching overhead, and PostgreSQL internals to explain why large numbers of connections can significantly degrade performance.
 

Selected Highlights from the Conference
The conference opened on Monday morning in the Golden Hall of the Shandong Hotel in Jinan. The opening sequence featured a series of keynote talks and community presentations:
  • The Development Trends of Databases in the Era of Data Int
[...]

Contributions for week 20, 2026
Posted by Cornelia Biacsics in postgres-contrib.org on 2026-05-26 at 07:50

PGConf.dev 2026 took place from May 19-22 2026, organized by

  • Gwen Shapira
  • Jonathan Katz
  • Kaiting Chen
  • Magnus Hagander
  • Melanie Plageman
  • Paul Ramsey
  • Robert Haas
  • Steve Singer

Program Committee:

  • Jacob Champion
  • Jonathan Katz
  • Dilip Kumar
  • Melanie Plageman
  • Paul Ramsey

Tuesday Planning Committee:

  • Claire Giordano
  • Corey Huinker
  • Matthias van de Meent
  • Paul Jungwirth
  • Robert Haas

Code of Conduct Committee:

  • Pavlo Golub
  • Stacey Haysler
  • Masahiko Sawada

Volunteers:

  • Alena Rybakina
  • Álvaro Herrera
  • Amit Kapila
  • Amit Langote
  • Andrew Dunstan
  • Cary Huang
  • Chirag Dave
  • Claire Giordano
  • Cornelia Biacsics
  • Daniel Gustafsson
  • David Rader
  • Divya Bhargov
  • Euler Taveira
  • Gary Evans
  • Grant Zhou
  • Hari P Kiran
  • Jeremy Schneider
  • Kai Wagner
  • Keiko Oda
  • Koji Annoura
  • Lilian Ontowhee
  • Manni Wood
  • Mark Wong
  • Marshall Bush
  • Miaolai Zhou
  • Nazir Bilal Yavuz
  • Oleksii Kliukin
  • Palak Chaturvedi
  • Pavlo Golub
  • Phil Alger
  • Philippe Noël
  • Rahila Syed
  • Raluca Constantin
  • Rick Lowe
  • Robert Haas
  • Shayon Sanyal
  • Steve Singer
  • Sumedh Pathak
  • Tristan Partin
  • Xuneng Zhou
  • Yogesh Sharma

Speakers (including Tuesday Community Sessions & Lightning Talks):

  • Alastair Turner
  • Alena Rybakina
  • Alexandra Wang
  • Álvaro Herrera
  • Amit Kapila
  • Amit Langote
  • Amul Sul
  • Andreas Scherbaum
  • Andres Freund
  • Andrew Dunstan
  • Andrey Borodin
  • Ants Aasma
  • Ashutosh Bapat
  • Bruce Momjian
  • Christoph Berg
  • Christophe Pettus
  • Claire Giordano
  • Corey Huinkez
  • Cornelia Biacsics
  • Daniel Gustafsson
  • Daniele Varrazzo
  • Dave Page
  • David E. Wheeler
  • David Rowley
  • David Shorten
  • Devrim Gündüz
  • Dilip Kumar
  • Floor Drees
  • Grant Zhou
  • Greg Burd
  • Haibo Yan
  • Hari Kiran
  • Hayato Kuroda
  • He
[...]

All Your GUCs in a Row: client_min_messages
Posted by Christophe Pettus in pgExperts on 2026-05-26 at 01:00
`client_min_messages` controls what your session sees, not what the server logs—a confusion that spawns most of its trouble.

Foreign Tables and Materialized Views: A Dynamic Duo
Posted by Richard Yen on 2026-05-25 at 08:00

Introduction

I recently wrote a post about WAL log shipping and how a standby built on log shipping is a great way to give data analysts production data without putting the primary at risk. Having access to the production data in this way is great, but it’s read-only. How can we create views of this data for better analytics work? I want to make the case today that Foreign Data Wrappers and Materialized Views can make a great solution – not only in accessing production Postgres data, but also working with other data sources.


Moving Beyond FDW Demos

Most people meet foreign data wrappers (FDWs) through a quick demo, and I’ve highlighted some of their features in previous conference talks. There is high novelty in being able to query MySQL from Postgres, but the reality is often that the latency between the local database and the foreign table can be pretty high. Sometimes, predicate push-down isn’t what you’d expect, and indexing may not be very transparent. In the end, setting up and managing FDWs may seem more work than it’s worth, and that’s a mistake. Used correctly, foreign tables are one of the most practical tools for analytics across heterogeneous data sources – especially when paired with materialized views.


The Real Problem: Heterogeneous Data

Modern data rarely lives in one place:

  • Legacy systems in MySQL
  • Operational data in PostgreSQL
  • Flat files sitting in object storage (I’ve seen people do this with AWS Athena)
  • Maybe even some CSVs someone refuses to migrate

Foreign tables give you a unified SQL interface, but under the hood, the query performance can be unpredictable as you may be forced to rely on another engine’s query planner (and in the case of that CSV data source, it might not even be indexed).

In other words, FDWs optimize developer experience, not query performance.


The Pattern: FDW + Materialized Views

Instead of querying foreign tables directly in analytics workloads, we can opt to use FDWs as ingestion points, not as the servi

[...]

All Your GUCs in a Row: client_connection_check_interval
Posted by Christophe Pettus in pgExperts on 2026-05-25 at 01:00
PostgreSQL 14 added a parameter that detects dead client connections during query execution, stopping wasted work before it's too late.

A Correction: Snowflake Postgres Is Just Postgres
Posted by Christophe Pettus in pgExperts on 2026-05-24 at 15:00
Snowflake Postgres is community Postgres, not a fork with a proprietary storage layer—a correction to Wednesday's post on table access methods, and a closer…

TOAST: Where PostgreSQL hides big values
Posted by Radim Marek on 2026-05-24 at 12:15

In earlier posts in this series we established that every heap tuple lives inside a strict 8KB page. Everything else is built on top of that hard limit: MVCC, HOT updates, and indexes that point at (page, line_pointer). And yet this still works:

CREATE TABLE docs (id int PRIMARY KEY, body jsonb);
INSERT INTO docs VALUES (1, (SELECT jsonb_agg(g) FROM generate_series(1, 100000) g));

That body value is somewhere north of half a megabyte. The heap page is still 8KB. Both statements are true at the same time, and the mechanism that makes them coexist is TOAST: The Oversized-Attribute Storage Technique.

The 2KB threshold

The page is 8KB, but PostgreSQL starts shrinking tuples at around 2KB. The goal is to keep at least four tuples per page, so anything bigger triggers TOAST.

Two constants are at play. TOAST_TUPLE_THRESHOLD decides when the toaster runs; TOAST_TUPLE_TARGET is the size it tries to shrink the tuple down to. Both default to 2032 bytes on a standard 8KB build. The threshold is fixed at compile time; the target is per-table tunable via the toast_tuple_target storage parameter.

When an INSERT or UPDATE produces a tuple wider than the threshold, PostgreSQL doesn't wait to see whether the row would actually fail to fit on a page. It starts compressing and relocating attributes one at a time, and stops the moment the tuple drops below the target. A row only modestly over the line often gets away with one cheap compression pass; a much wider one walks the full procedure.

Only variable-length attributes are candidates. A bigint is always 8 bytes, so there is nothing to compress and nowhere to move it. A text, bytea, jsonb, numeric, or array, on the other hand, is a varlena: a length-prefixed structure PostgreSQL can freely rearrange.

The four storage strategies

Each variable-length column has a storage strategy: a flag in pg_attribute.attstorage that answers two yes/no questions for the toaster. Can this value be compressed? Can it be moved out-of-line? PostgreSQL picks a default

[...]

On Polymorphic Associations in Postgres
Posted by Andrei Lepikhov in pgEdge on 2026-05-24 at 11:03

OUTER JOIN is a typical plague of ORM-based PostgreSQL configurations: the planner is still relatively poor at optimising it. At the same time, ORM frameworks — and 1C as a prominent example — often generate outer joins from standard templates, which opens the door to targeted optimisations. In this article, I dig into one such template — polymorphic reference resolution: what the pattern is, where it comes from (Rails, Django, Hibernate, Salesforce — not just 1C), how widespread it is, and why its structural properties make it possible to significantly speed up execution.

Just for the introduction

Think of the homepage of a typical online store — say, Amazon. Personally, on my homepage, I see ads for several products:

  • An air fryer

  • Magnesium Citrate + B6

  • Protein brownies

  • and so on.

Thinking like a database developer, I can roughly imagine that this page is built from the results of a query like:

SELECT name, description FROM products p
  LEFT JOIN kitchen_appliances ka ON (p.id = ka.id)
    LEFT JOIN pharmacy f ON (p.id = f.id)
      LEFT JOIN sports_nutrition sf ON (p.id = sf.id)
      ...
ORDER BY p.popularity DESC
LIMIT N

Planning such a query efficiently is no easy task — and in my experience, this is confirmed by user reports from the 1C world, since PostgreSQL is currently not rich in LEFT JOIN optimisations. At the same time, the properties of this pattern enable the development of various techniques to improve execution efficiency. I've managed to implement several straightforward optimisations of this template. But first, let's understand what polymorphic references actually are, where they come from, and how common they really are. That's the gap I'm trying to fill with this post.

The Pattern

Many real-world data models contain references that can point to one of several target entity types. An order line may reference a physical product, a digital download, a gift card, or a subscrip

[...]

All Your GUCs in a Row: client_encoding
Posted by Christophe Pettus in pgExperts on 2026-05-24 at 01:00
Character encoding conversions happen silently between client and server—until they don't.

Upgrade PostgreSQL from 17 to 18 on Ubuntu 26.04
Posted by Paolo Melchiorre in ITPUG on 2026-05-23 at 22:00

Howto guide for upgrading PostgreSQL from version 17 to 18 on Ubuntu, after its upgrade from version 25.10 (Questing Quokka) to 26.04 (Resolute Raccoon).

All Your GUCs in a Row: checkpoint_flush_after and checkpoint_warning
Posted by Christophe Pettus in pgExperts on 2026-05-23 at 01:00
Meet two checkpoint neighbors paired by the alphabet, not the topic: `checkpoint_flush_after` tunes kernel writeback hints, while `checkpoint_warning` logs…

pgvector 0.8.2 and the Trouble With Parallel HNSW
Posted by Christophe Pettus in pgExperts on 2026-05-22 at 15:00
pgvector 0.8.2 fixes CVE-2026-3172, a heap buffer overflow in parallel HNSW index builds that can leak data or crash the server.

Long-running transactions, job queues, and the cascade that wreaks havoc
Posted by Umair Shahid in Stormatics on 2026-05-22 at 11:17

A scheduled PostgreSQL migration step held an open transaction snapshot for hours during the initial data copy. A job queue running at production write velocity began to slow down. Twelve hours later, the queue was seven million rows deep, the primary was pinned at 100 percent CPU across 24 cores, and customer support was fielding complaints about delayed transactions.

Staging had validated the migration plan. Every rehearsal was green. And yet here we were.

The team had the system stable that same day. A joint check the next morning confirmed it was operating cleanly. Three changes to our migration SOP have emerged from the retrospective. This post walks through what happened, why staging validated the procedure cleanly while the production failure mode sat outside its reach, and the changes we have made so this specific cascade stays out of future migrations.

The setup

The database in question was a production OLTP system at a significant scale. The largest table was append-only, sitting at around 11 TB and holding transactional records. The same database hosted a job queue processed via SELECT FOR UPDATE SKIP LOCKED, with worker concurrency tuned for production throughput.

The customer was migrating to a new version of PostgreSQL through logical replication. The initial data sync used COPY. The plan had been validated in staging across multiple rehearsal runs. The COPY completed cleanly there. Replication caught up. Resource footprint stayed within expectations. The plan was as ready as a plan gets.

The migration sequence began with the largest table first. Starting with the biggest workload gives the most runway for monitoring, validation, and adjustment before cutover. It is a reasonable choice for most migrations.

Level one of the cascade

A long-running COPY holds a transaction snapshot open for as long as the copy takes to complete. That part is well known. The next part is the one that hurts: PostgreSQL autovacuum hold

[...]

Why Postgres Lacks Transparent Data Encryption
Posted by Shaun Thomas in pgEdge on 2026-05-22 at 11:14

If you've ever compared database feature matrices, you may have noticed something a bit peculiar. Oracle has Transparent Data Encryption. SQL Server has it. MySQL has it. Even MariaDB has it. But Postgres, which we all consider the best database engine? Conspicuously absent.It’s not that nobody wants TDE. Compliance frameworks like PCI DSS and HIPAA practically demand encryption at rest. Cloud deployments make the “stolen disk” threat model more tangible than ever. And the question comes up constantly on mailing lists, at conferences, and in every database evaluation checklist ever assembled by a procurement department. So what gives?It’s complicated. The real answer involves nearly a decade of mailing list threads, competing proposals, fundamental disagreements about threat models, and a problem scope so vast it makes most contributors quietly back away. Let’s trace the history and find out why the elephant in the room is still unencrypted.

The Grand Promise of TDE

The concept of TDE is straightforward: encrypt all data when it’s written to disk, decrypt it when it’s read back into memory. The “transparent” part means the encryption layer sits below the SQL engine so client applications operate normally without special tooling or requirements.TDE protects against an attacker who gains read access to the physical storage: a stolen or improperly decommissioned drive, a backup tape that fell off a truck, or perhaps a misconfigured cloud storage volume. TDE does not protect against a compromised application, a malicious DBA, those with superuser privileges, or SQL injection. The contents of the database can be fully decrypted while the server is running, which means anyone who can connect and query can read everything.This is the crux of a debate that has raged in the Postgres community for years. If TDE only protects against disk theft, and filesystem encryption (LUKS, ZFS native encryption, dm-crypt) already does the same thing, why should Postgres spend enormous engineering effort to reinvent it inside the[...]

All Your GUCs in a Row: checkpoint_timeout and checkpoint_completion_target
Posted by Christophe Pettus in pgExperts on 2026-05-22 at 01:00
PostgreSQL's default 5-minute checkpoint interval wastes I/O on modern servers.

Contributions for week 19, 2026
Posted by Cornelia Biacsics in postgres-contrib.org on 2026-05-21 at 21:44

On 12 May, 2026 the San Francisco Bay Area PostgreSQL Meetup Group met virtually, organized by Katharine Saar, Stacey Haysler and Christophe Pettus. Alex Yarotsky spoke at the event.

The Swiss PGDay Program Committee met to finalize the schedule:

  • Marion Baumgartner
  • Tobias Bussmann
  • Andreas Geppert
  • Johannes Graën
  • Stefan Keller
  • Michelle Willen

Community Blog Post:

Patch Today: CVE-2026-6473
Posted by Christophe Pettus in pgExperts on 2026-05-21 at 15:00
A critical integer wraparound bug lets unprivileged SQL users trigger heap corruption with potential arbitrary code execution.

From Managed PostgreSQL to Production RAG: Build Your Own Ellie in pgEdge Cloud
Posted by Antony Pegg in pgEdge on 2026-05-21 at 12:27

If you've used docs.pgedge.com recently, you've probably met Ellie. Ask her how to set up multi-master replication, or what port the MCP Server listens on, and she pulls the relevant documentation, assembles it into context, and gives you a grounded answer with source citations. She doesn't guess or hallucinate. She finds the actual docs and synthesizes an answer from them.Ellie is a RAG Server deployment. The pgEdge RAG Server is an API server for retrieval-augmented generation that runs hybrid search (vector similarity plus BM25) over content stored in Postgres, then sends matches to an LLM for grounded answers.pgEdge Cloud Deploy it alongside your database, point it at your tables, and your application gets the same pipeline Ellie uses, running against your data.Prefer to run it yourself? The RAG Server is 100% open source under the PostgreSQL License and available on GitHub. It's a single Go binary you point at any PostgreSQL 14+ database with pgvector installed, configured via YAML, with your own API keys for OpenAI, Anthropic, Voyage, or local Ollama. Hybrid search (vector similarity plus BM25), token budgets, and streaming responses are all in the binary. The Agentic AI Toolkit FAQ has more on how it fits with the rest of the toolkit, including Vectorizer and Docloader.

What kinds of problems do you run into building RAG pipelines from scratch?

If you've tried building a retrieval-augmented generation pipeline from scratch, you already know this: the problems compound on each other. You start with vector similarity search to find semantically relevant documents, but pure vector search misses exact-term queries. Someone asks about "error code 4012" or "order ABC-1234" and the semantic search returns conceptually related results instead of the one document that actually mentions that string. So you add BM25 keyword matching. Now you have two ranked result sets and you need a fusion algorithm to combine them without letting either method dominate the other.That's just retrieval. You also need token bud[...]

Top posters

Number of posts in the past two months

Top teams

Number of posts in the past two months

Feeds

Planet

  • Policy for being listed on Planet PostgreSQL.
  • Add your blog to Planet PostgreSQL.
  • List of all subscribed blogs.
  • Manage your registration.

Contact

Get in touch with the Planet PostgreSQL administrators at planet at postgresql.org.