Latest Blog Posts

Memories from PGConf.dev 2026
Posted by Stefan Fercot in Data Egret on 2026-05-28 at 15:04

Thanks to the organising team, I had the chance to attend PGConf.dev last week in Vancouver, Canada. And luckily, I wasn’t alone there — Valeria could join as well!

This year’s edition was particularly special: we celebrated 30 years of open source PostgreSQL together! Many activities revolved around the anniversary, including a special celebration-themed conference t-shirt, stickers, commemorative posters, and more.

My personal highlight was definitely Wednesday’s round-table retrospective, which brought together project founders and early contributors to reflect on PostgreSQL’s formative years and its remarkable evolution. Featuring Bruce Momjian, Tom Lane, Thomas Lockhart, Jan Wieck, Vadim Mikheev, and Jolly Chen, the discussion revisited what the Postgres project was like in its earliest open source incarnation — technically, culturally, and socially.

Then, on Thursday, a birthday cake cutting ceremony took place following a short speech from Bruce Momjian.

Even though the project still relies on decisions made in its early days — such as hacker mailing-list-focused discussions and documentation written close to the source code using DocBook — it is hard to imagine where PostgreSQL will be 10 or 20 years from now.

Especially today, with AI-assisted content generation accelerating code writing (though not necessarily code review!), the project will inevitably evolve. But as Jan Wieck concluded during the panel, we should take care to preserve and foster the PostgreSQL spirit.

After the panel, a group photo was organised. I can’t wait to see the result!

Speaking of pictures, a photo booth was set up to let attendees take funny pictures celebrating PostgreSQL’s 30th anniversary. As member of the PGDay Lowlands 2026 talk selection committee, I couldn’t resist taking a picture together with Floor Drees, who was the only representative from the organising team attending the event.

This year’s PGConf.dev schedule was divided into three parts.

Tuesda

[...]

Twenty Years, Three CVEs, One AI
Posted by Christophe Pettus in pgExperts on 2026-05-28 at 15:00
Three heap buffer overflows in PostgreSQL — including a 20-year-old pgcrypto bug — were found by an AI code analyzer. But.

Postgres as an Execution Environment for AI: Failure Modes, Hooks, and the ORBIT Framework
Posted by Vibhor Kumar on 2026-05-28 at 12:50

A field report from PGConf Dev 2026 — and a working framework for everyone who has to keep AI workloads running in production.

It’s 3:47 AM. The Pager Goes Off.

A production AI batch job is stuck. Sixty thousand rows are locked. Your application performance is degrading. The post-mortem the next morning will be filed under “unknown cause.”

Here is what actually happened, minute by minute:

  • T+00:00 — The summarization job kicks off. Two hundred workers open transactions, query rows, and call out to an LLM provider.
  • T+00:32 — The LLM provider hits a rate limit and returns HTTP 429 across all two hundred concurrent calls simultaneously.
  • T+00:32 — Every worker retries. All of them. At the same instant. Still inside their open transactions. Row locks still held.
  • T+01:04 — Retry storm number two. PgBouncer’s pool is exhausted. Normal application traffic now starts failing, too.
  • T+01:36 — The on-call engineer wakes up and finds pg_stat_activity full of “idle in transaction.”
  • T+03:47 — The table is finally vacuumed. Service restored.

The root cause has a name, but it’s a name the industry hasn’t agreed on yet: external call state held inside an open transaction. There is no alert for it. There is no entry in the runbook. There is no metric you can graph. The post-mortem says “unknown cause” because we don’t yet have a shared vocabulary for this failure mode.

I gave a talk at PGConf Dev about this and a class of related production incidents. By the end of the session, I wanted the audience to walk away with two things: enough technical understanding to fix something on Monday morning, and enough conceptual vocabulary to argue for the right primitives in the Postgres community.

This post is the long-form version of that talk. It introduces a five-letter framework — ORBIT — that maps every failure mode I’ll discuss to one of five principles, and shows how the same framework applies whether you’re an SRE staring at a pager, an Enterp

[...]

Automating PostgreSQL Index Tuning Using AI
Posted by warda bibi in Stormatics on 2026-05-28 at 11:45

If you have a slow query, one of the obvious moves is to add an index. So you look at the WHERE clause, pick a column, run CREATE INDEX, and test again. Sometimes it helps, often it doesn’t. And now you have an index sitting there, not helping reads, but slowing down every write, because INSERT, UPDATE, and DELETE all have to maintain it. And it gets worse as your system grows.

Five queries are manageable. You can reason about column choices, test combinations, and check EXPLAIN output. When you are dealing with fifty queries across a dozen tables,  you are evaluating hundreds of possible column combinations manually, each one potentially breaking something in production if you get the locking wrong. 

Why index tuning is harder than it looks

Most people know that indexes speed up reads. Fewer people think carefully about what actually happens when they add one.

First, PostgreSQL might not use it. The planner compares strategies and picks the cheapest one. If your query touches a large fraction of the table, a sequential scan might actually be cheaper than an index scan. Creating the index doesn’t change that math, it just adds overhead.

Second, even with CONCURRENTLY, creating an index on a busy table isn’t free. It competes with your workload, can cause replication lag, and sometimes it times out. People don’t plan for this until it happens at 2am.

Third, and this one is subtle, adding the wrong index is sometimes worse than adding nothing. You pay the write overhead with zero read benefit.

The harder part is column ordering. A composite index on (status, customer_id) and one on (customer_id, status) are completely different things. The planner’s decision about which one to use depends on your data distribution, which conditions appear in your WHERE clause, and how selective each column is. Getting this wrong by hand is easy. Verifying it without touching production is the real challenge.

The idea behind automation

[...]

How to hack Logical Replication in PostgreSQL: Insights from contributors
Posted by Hayato Kuroda in Fujitsu on 2026-05-28 at 01:05

Logical replication has come a long way since its introduction in PostgreSQL. It is now being adopted more widely than ever, powering cross-version upgrades, multi-region deployments, and real-time analytics pipelines. Yet significant opportunities for improvement still remain.

 

All Your GUCs in a Row: commit_delay and commit_siblings
Posted by Christophe Pettus in pgExperts on 2026-05-28 at 01:00
Tune `commit_delay` to batch WAL flushes and trade latency for throughput—but only if `pg_test_fsync` proves sync time is your bottleneck.

REPACK CONCURRENTLY: pg_squeeze Gets a Promotion
Posted by Christophe Pettus in pgExperts on 2026-05-27 at 15:00
PostgreSQL 19 brings REPACK CONCURRENTLY, a native alternative to pg_repack that rewrites tables without crippling locks.

Graph Queries in Postgres with Apache AGE
Posted by Elizabeth Garrett Christensen in Snowflake on 2026-05-27 at 07:00

The Iceberg tables look like normal Postgres tables. You create them with USING iceberg and they're backed by Parquet on S3:

Postgres engines now have access to more data than ever. With extensions like pg_lake, you can connect Postgres to gobs of files in object storage like csv, json, Apache Parquet™ and Apache Iceberg™.

But having access to data in object storage and being able to aggregate data in object storage are two different things. This blog walks through the Postgres extension, Apache AGE™, that makes working with huge files of data sets much friendlier through graph relationships.

Why graph matters for data lakes

Let's consider a healthcare network with providers, patients, facilities and referral chains. The analytical questions are straightforward:

  • What's the total billed amount per region?
  • Which patients have the highest spend?
  • What's the average claim by specialty?

SQL on Iceberg handles all of these beautifully. But if you need to know: "Which in-network providers are referring patients to out-of-network providers through chains of intermediaries, and what's the dollar impact?" Those questions have two parts: a graph traversal (find the referral chains) and an analytical aggregation (sum the costs). Neither a pure graph database nor a pure analytical engine can answer it alone. You need both, working together, in the same query.

This is where data lakes need graphs.

Why Apache AGE

Apache AGE is a PostgreSQL extension that adds openCypher graph query support directly inside Postgres. There are other graph databases out there — Neo4j, Amazon Neptune, TigerGraph — but AGE has a unique advantage for the modern data platform — it runs inside PostgreSQL.

We've recently seen customers increasingly using Apache AGE for a couple of reasons:

  1. No data movement: Your Iceberg tables and your graph live in the same database. You don't extract, transform and load (ETL) data out of your lake into a separate graph database, keep it in sync and mainta
[...]

All Your GUCs in a Row: cluster_name
Posted by Christophe Pettus in pgExperts on 2026-05-27 at 01:00
cluster_name looks like a cosmetic label for process listings, but on a standby it silently becomes the name your primary uses to verify synchronous…

PGConf.dev 2026: Why It Remains My Favorite PostgreSQL Conference
Posted by cary huang in Highgo Software on 2026-05-26 at 23:33

About PGConf.dev

PGConf.dev is an annual developer event focused entirely on contributing to the PostgreSQL ecosystem, including core software development and community building. It serves as a primary hub for PostgreSQL hackers, maintainers, and ecosystem developers to meet, collaborate, and share knowledge.

This year’s conference was hosted once again in Vancouver, which also happens to be my hometown :D. On top of that, 2026 also marks PostgreSQL’s 30th anniversary, giving this year’s conference an even more meaningful atmosphere for long-time contributors and community members around the world.

This is one of my favorite PostgreSQL conferences because it gives people an opportunity to truly meet, learn from, and interact with the brilliant minds behind PostgreSQL. Many of the names we usually only see in hacker mailing list discussions, patch reviews and commit messages suddenly become real people standing right beside you, discussing ideas and working together to make PostgreSQL better.

It feels like the PostgreSQL hacker discussions have come to life in the real world. The conversations are real, the passion is real, and the community spirit is everywhere.

A huge thank you once again to all the organizers, volunteers, speakers, and sponsors who invested countless hours to make this conference possible. This blog is a personal summary of my own experience attending pgconf.dev 2026, and for those who could not attend, I hope it can give you at least a small glimpse of what it feels like to be part of this incredible PostgreSQL community.

Long Post Ahead!

Tuesday No Longer Feels Like a “Pre-Conference” Day

Unlike the “Tuesdays” from previous pgconf.dev conferences, this year’s Tuesday schedule is packed with an impressive variety of engaging sessions, including round-table discussions, interviews, keynote-style talks, brainstorming sessions, and community-led discussions focused on spe

[...]

5 PostgreSQL locking behaviors that trip people up
Posted by Shinya Kato on 2026-05-26 at 20:00

Introduction

PostgreSQL uses MVCC (Multi-Version Concurrency Control) for concurrency control: reads never block writes, and writes never block reads.

Its locking system has 8 table-level lock modes and 4 row-level lock modes, and the conflict tables in the documentation tell you exactly which lock modes conflict with which.

In practice, though, once you actually operate PostgreSQL, locks end up conflicting in places you never expected. Queries take far longer than anticipated, and in the worst case you end up with an outage.

This article walks through five of these counterintuitive locking behaviors.

Environment

  • Version: PostgreSQL 18
  • Transaction isolation level: READ COMMITTED (the default)

1. Once an ACCESS EXCLUSIVE request is queued, subsequent queries get blocked in a chain

The first one: an ALTER TABLE that should finish instantly can bring your entire service to a halt.

Suppose one session is running a long SELECT on table t, and another session runs the following ALTER TABLE:

Session 1

SELECT pg_sleep(600) FROM t LIMIT 1; -- a long-running SELECT

Session 2

ALTER TABLE t ADD COLUMN name text;

Since Session 1 holds an ACCESS SHARE lock on table t, the ACCESS EXCLUSIVE lock that Session 2's ALTER TABLE requires is forced to wait. So far, this is expected behavior.

But PostgreSQL's lock waiting works like a FIFO queue. While the ACCESS EXCLUSIVE lock is waiting, any SELECT issued against table t afterward gets stuck behind it — even though that SELECT does not conflict with Session 1's currently running SELECT at all.

Diagram showing subsequent SELECTs also forced to wait in the lock queue

Besides a long-running SELECT, the same thing happens with a session that ran a SELECT inside a BEGIN transaction and then left it open without COMMIT/ROLLBACK (a so-called idle in transaction session). An ACCESS SHARE lock is held until the transaction ends, so even if the SELECT itself finished in an instant, simply forgetting to close the transaction will keep blocking ALTER TABLE. This is often caused by

[...]

My Reflections on PGConf.dev 2026
Posted by Ashutosh Bapat on 2026-05-26 at 17:07

From Ottawa to Vancouver

For a long time, PGCon in Ottawa was a staple for the PostgreSQL community. I always had a soft spot for Ottawa; it fondly reminded me of my childhood days in Pune. So, when it was first announced that the reincarnated PGConf.dev would take place in Vancouver, I felt a twinge of sadness.

However, the moment I landed in Vancouver, all my reservations dissolved. The city won me over instantly—from riding a bicycle along the seawall and admiring the beautiful totem poles, to watching the Gastown Steam Clock. Returning this year, I was incredibly happy to explore Stanley Park on foot once again.

But as beautiful as Vancouver is, the real magic happened inside the venue.

Tuesday: The "Conference Before the Conference"

This year’s event kicked off with a brand-new addition: the Tuesday community discussions. It turned out to be so packed with value that it felt like getting two conferences in a single week—just like the advertisement in my hotel elevator promised!


1. Graph Databases Developer Meeting

I conducted a session to discuss the priorities and future of graph database offerings in PostgreSQL, specifically focusing on SQL/PGQ and Apache/AGE. Given that SQL/PGQ is a newly-committed feature, I expected a modest turnout of about a dozen people. To my surprise, over 25 people attended, and most actively participated in the debate.

2. Working Groups: Logical Replication & Multithreading

  • Logical Replication: While it has come a long way over the last decade, there is still a long road ahead. We dove deep into DDL replication and replication in multi-tenant, multi-DB clusters. My main takeaway? There are still a few missing pieces before logical replication can be effectively leveraged for high-volume OLAP use cases.

  • Multithreading: This ambitious effort aims to replace Postgres's traditional process-driven architecture with threads. As I fondly like to say: it’s like replacing an elephant’s heart with a cougar’s

[...]

Managed Postgres, Examined: Google AlloyDB for PostgreSQL
Posted by Christophe Pettus in pgExperts on 2026-05-26 at 17:00
Fourth in a series of dispassionate tours of managed PostgreSQL services. Previously: RDS, Aurora, and Cloud SQL. AlloyDB is Google’s distributed-storage PostgreSQL, the closest architectural parallel on GCP to Aurora on AWS, with enough distinctive differences to be worth understanding as its ow…

What Else Is In There?
Posted by Christophe Pettus in pgExperts on 2026-05-26 at 16:00
A 1990s-vintage PostgreSQL extension shipping a critical buffer overflow in 2026 reveals a larger problem: most teams can't even say what's actually installed…

PGConf.dev 2026 Trip Summary
Posted by Jeremy Schneider on 2026-05-26 at 15:46

I’m back home from Vancouver. What a great week – in every way. I’ll try to share a few highlights here.

Updated Happiness Hints

First and foremost: after many years, the Happiness Hints have received a major update! Before the conference, I updated the hints based on all the feedback I’ve collected over the past few years. Then the hints were updated into a poster format and we printed it as part of the pgconf.dev poster session. Throughout the week, I continued collecting more feedback. I used a sharpie during the conference and marked up the poster with ideas. Special thanks to Laurenz Albe, David Rader, Sami Imseih, Ryan Booz and Nik Samokhvalov (Nik you weren’t at the conference but a happiness hint resulted from other discussions we’ve had). Of course I’m forgetting more people who gave feedback making the happiness hints better. After coming home from the conference, I incorporated all the notes I had – and the version that’s now published here at ardentperf.com is the latest & best version I’ve assembled so far.

Physical Replication and Postgres High Availability

An extraordinary number of postgres users rely on physical replication for high availability. It’s been around for a long time and it works well. Nonetheless, there are a few rough edges and over the years there have been various mailing list threads that haven’t fully been resolved.

I proposed a Friday unconference session on this topic, and the topic received enough votes to be selected. Notes from the unconference are available on the Postgres wiki. But the discussion extended far beyond the unconference; there were also hallway discussions over coffee (thanks Thomas Munro) and then continuing discussions over dinner at Joey Burrard and beers at Steamworks (thanks Ants Aasma).

The first question that everybody asks is “should postgres have more HA capabilities in core”? And a discussion starting along these lines consumed the first half of the unconference.

But I thought the most interesting train of

[...]

EXPLAIN Prettier, or Post-Processing Query Plans in Postgres
Posted by Andrei Lepikhov in pgEdge on 2026-05-26 at 11:22

This story started with a book gifted by a colleague. Reading Jimmy Angelakos' «PostgreSQL Mistakes and How to Avoid Them», I realised something that had been bugging me - in Postgres, the EXPLAIN command produces far too much information. The examples that authors typically present when discussing various aspects of database systems make it harder to analyse the problem at hand and distract the reader. That's how the idea of a post-processing for EXPLAIN output was born - to make query plans more readable and problem-focused.

Information Overload

Anyone who has worked with PostgreSQL knows the command, or more precisely . It is typically used to investigate query performance issues or demonstrate optimisation techniques. But there is one problem: its output is packed with highly specific information. For instance, the parameter is rarely needed when analysing EXPLAIN output. Some fields, such as , take up a lot of visual space, are system-dependent, and frequently unnecessary - yet if we want to see , we have to run EXPLAIN with , and inevitably comes along for the ride.So, is it really a big deal to study an EXPLAIN with slightly more information? Ha! Let’s look at a typical query plan from my investigations - here, for example, are two plans for the same query: one is the bad plan, and the other is the good one.Finding problems in such a large plan takes time, and every extraneous detail makes it harder to spot the problematic decision. Sure, with the rise of AI agents, I can just ask Claude to compare a pair of plans, highlight the differences, and analyse what’s wrong. But this doesn’t always work - either there are too many details, or automation is needed across a large stream of queries - so the problem remains.

Regression Test Stability

EXPLAIN output changes between Postgres versions, and if your extension supports 4-5 recent versions, then your tests must pass on each of them. This means we need to filter EXPLAIN output to guarantee stable test runs across different hardware and software conf[...]

AI Without Semantics Is Just Expensive Guessing
Posted by Vibhor Kumar on 2026-05-26 at 11:01

Why Metadata, Taxonomy, Ontology, Knowledge Graphs, and Context Are Becoming the Foundation of Enterprise AI

The technology industry is entering a fascinating phase where many older ideas are suddenly becoming critically important again. Concepts that once lived mostly inside data governance discussions, library sciences, enterprise architecture diagrams, semantic web research, and metadata catalogs are now being pulled directly into the center of modern AI conversations. This shift is not accidental. As organizations move from AI experimentation to real enterprise deployment, they are discovering that intelligence does not come only from models. It also depends on how well the organization understands, organizes, governs, and connects its data.

Over the last two years, terms such as context engineering, AI memory, semantic retrieval, knowledge-driven agents, and context-aware systems have rapidly entered mainstream technology discussions. Vendors now describe platforms as intelligent because they can retrieve documents semantically, maintain conversational memory, or connect AI systems to enterprise knowledge. At first glance, this sounds entirely new. In reality, much of the industry is rediscovering a foundational truth that enterprise architects, database professionals, and information management practitioners have understood for years: AI systems cannot reason reliably without meaningful data.

This is not simply a model problem. It is fundamentally a data architecture problem. Large Language Models are extraordinarily capable at generating language, summarizing content, writing code, and answering questions conversationally. However, enterprises are increasingly discovering that language generation alone does not create trustworthy intelligence. An AI system may generate answers that sound convincing while still lacking operational correctness, governance awareness, relationship understanding, or contextual grounding.

This is precisely why concepts such as metadata, semantics, taxonom

[...]

Critical PMM Alerts Every PostgreSQL DBA Must Track
Posted by SHRIDHAR KHANAL in Stormatics on 2026-05-26 at 09:50

The Night When Things Almost Went Down

Have you ever left for home on a Friday evening feeling confident about your work for the day, at peace knowing your system would survive the coming weekend? We’ve all felt that way at some point.

Meanwhile, the disk on the server had quietly reached 90% utilization. Write-Ahead Log (WAL) files had accumulated enormously, one long-running query had been running for over an hour, and nobody noticed because, some time earlier, the dashboard had looked fine.

After a while, writes to the database started to fail, and you received messages like, “Hey, the DB seems a bit slow?” In reality, the database was no longer slow; it had already gone down.

However, this wasn’t anything new. The signs of trouble had been there the entire time. The monitoring simply wasn’t connected to the right signals to raise the alarm.

And that’s what this post is about.

Dashboards vs Alerts

Let’s be honest. Dashboards are great for screenshots.

However, they make one risky assumption that someone will make this check before it’s too late. Alerts make no such assumption.

  • Dashboards = “Here’s what’s happening.”
  • Alerts = “There is something wrong. Need to address it.”

A dashboard gently shows an increasing CPU. But an alert rings, saying: “This has been bad for 30 minutes, wake up.”

Dashboards are waiting for us. But alerts find us. After a few incidents, we no longer rely on the dashboard to keep us safe while asleep. It’s through alerts that it happens.

Configuring Alerts in PMM

These rules are set up by going to Alerting → Alert Rules in the PMM interface and creating a New Alert Rule. For every alert, we need to configure three things:

Duration: How long the condition must hold before the alert fires. This filters out temporary sp

[...]

The PostgreSQL Conference HOW2026 in Jinan, China
Posted by Josef Machytka in credativ on 2026-05-26 at 09:33

At the end of April, I had the incredible opportunity to represent credativ on the HOW2026 (Hello Open-source World), the PostgreSQL & IvorySQL Eco Conference in Jinan, China. IvorySQL is a Chinese fork of PostgreSQL with extended Oracle compatibility. In recent years, it has been gaining popularity not only in China. The conference brought together PostgreSQL and IvorySQL experts, contributors, and open-source database enthusiasts from around the world. The atmosphere was excellent, and I would like to share several highlights together with a recap of my contributions.

My Contributions

I was honored to contribute to the HOW2026 program with two sessions.

Workshop: The Alchemy of Shared Buffers

On Sunday afternoon, I led a three-hour deep-dive workshop exploring the intricate mechanics of PostgreSQL shared buffers. I discussed how Linux implements shared memory operations through the tmpfs file system, how shared buffers work internally, their practical limitations and usage patterns, why and how huge pages should be used for their allocation, why transparent huge pages negatively affect performance, and practical strategies for balancing high concurrency with optimal system performance.

Talk: Linux and PostgreSQL in the Multiverse of Connections

On Tuesday afternoon, I gave a talk focused on PostgreSQL connection overhead. I discussed how PostgreSQL connections interact with the Linux kernel, covering memory usage, physical memory allocation mapping, context-switching overhead, and PostgreSQL internals to explain why large numbers of connections can significantly degrade performance.
 

Selected Highlights from the Conference
The conference opened on Monday morning in the Golden Hall of the Shandong Hotel in Jinan. The opening sequence featured a series of keynote talks and community presentations:
  • The Development Trends of Databases in the Era of Data Int
[...]

Contributions for week 20, 2026
Posted by Cornelia Biacsics in postgres-contrib.org on 2026-05-26 at 07:50

PGConf.dev 2026 took place from May 19-22 2026, organized by

  • Gwen Shapira
  • Jonathan Katz
  • Kaiting Chen
  • Magnus Hagander
  • Melanie Plageman
  • Paul Ramsey
  • Robert Haas
  • Steve Singer

Program Committee:

  • Jacob Champion
  • Jonathan Katz
  • Dilip Kumar
  • Melanie Plageman
  • Paul Ramsey

Tuesday Planning Committee:

  • Claire Giordano
  • Corey Huinker
  • Matthias van de Meent
  • Paul Jungwirth
  • Robert Haas

Code of Conduct Committee:

  • Pavlo Golub
  • Stacey Haysler
  • Masahiko Sawada

Volunteers:

  • Alena Rybakina
  • Álvaro Herrera
  • Amit Kapila
  • Amit Langote
  • Andrew Dunstan
  • Cary Huang
  • Chirag Dave
  • Claire Giordano
  • Cornelia Biacsics
  • Daniel Gustafsson
  • David Rader
  • Divya Bhargov
  • Euler Taveira
  • Gary Evans
  • Grant Zhou
  • Hari P Kiran
  • Jeremy Schneider
  • Kai Wagner
  • Keiko Oda
  • Koji Annoura
  • Lilian Ontowhee
  • Manni Wood
  • Mark Wong
  • Marshall Bush
  • Miaolai Zhou
  • Nazir Bilal Yavuz
  • Oleksii Kliukin
  • Palak Chaturvedi
  • Pavlo Golub
  • Phil Alger
  • Philippe Noël
  • Rahila Syed
  • Raluca Constantin
  • Rick Lowe
  • Robert Haas
  • Shayon Sanyal
  • Steve Singer
  • Sumedh Pathak
  • Tristan Partin
  • Xuneng Zhou
  • Yogesh Sharma

Speakers (including Tuesday Community Sessions & Lightning Talks):

  • Alastair Turner
  • Alena Rybakina
  • Alexandra Wang
  • Álvaro Herrera
  • Amit Kapila
  • Amit Langote
  • Amul Sul
  • Andreas Scherbaum
  • Andres Freund
  • Andrew Dunstan
  • Andrey Borodin
  • Ants Aasma
  • Ashutosh Bapat
  • Bruce Momjian
  • Christoph Berg
  • Christophe Pettus
  • Claire Giordano
  • Corey Huinkez
  • Cornelia Biacsics
  • Daniel Gustafsson
  • Daniele Varrazzo
  • Dave Page
  • David E. Wheeler
  • David Rowley
  • David Shorten
  • Devrim Gündüz
  • Dilip Kumar
  • Floor Drees
  • Grant Zhou
  • Greg Burd
  • Haibo Yan
  • Hari Kiran
  • Hayato Kuroda
  • He
[...]

All Your GUCs in a Row: client_min_messages
Posted by Christophe Pettus in pgExperts on 2026-05-26 at 01:00
`client_min_messages` controls what your session sees, not what the server logs—a confusion that spawns most of its trouble.

Foreign Tables and Materialized Views: A Dynamic Duo
Posted by Richard Yen on 2026-05-25 at 08:00

Introduction

I recently wrote a post about WAL log shipping and how a standby built on log shipping is a great way to give data analysts production data without putting the primary at risk. Having access to the production data in this way is great, but it’s read-only. How can we create views of this data for better analytics work? I want to make the case today that Foreign Data Wrappers and Materialized Views can make a great solution – not only in accessing production Postgres data, but also working with other data sources.


Moving Beyond FDW Demos

Most people meet foreign data wrappers (FDWs) through a quick demo, and I’ve highlighted some of their features in previous conference talks. There is high novelty in being able to query MySQL from Postgres, but the reality is often that the latency between the local database and the foreign table can be pretty high. Sometimes, predicate push-down isn’t what you’d expect, and indexing may not be very transparent. In the end, setting up and managing FDWs may seem more work than it’s worth, and that’s a mistake. Used correctly, foreign tables are one of the most practical tools for analytics across heterogeneous data sources – especially when paired with materialized views.


The Real Problem: Heterogeneous Data

Modern data rarely lives in one place:

  • Legacy systems in MySQL
  • Operational data in PostgreSQL
  • Flat files sitting in object storage (I’ve seen people do this with AWS Athena)
  • Maybe even some CSVs someone refuses to migrate

Foreign tables give you a unified SQL interface, but under the hood, the query performance can be unpredictable as you may be forced to rely on another engine’s query planner (and in the case of that CSV data source, it might not even be indexed).

In other words, FDWs optimize developer experience, not query performance.


The Pattern: FDW + Materialized Views

Instead of querying foreign tables directly in analytics workloads, we can opt to use FDWs as ingestion points, not as the servi

[...]

All Your GUCs in a Row: client_connection_check_interval
Posted by Christophe Pettus in pgExperts on 2026-05-25 at 01:00
PostgreSQL 14 added a parameter that detects dead client connections during query execution, stopping wasted work before it's too late.

A Correction: Snowflake Postgres Is Just Postgres
Posted by Christophe Pettus in pgExperts on 2026-05-24 at 15:00
Snowflake Postgres is community Postgres, not a fork with a proprietary storage layer—a correction to Wednesday's post on table access methods, and a closer…

TOAST: Where PostgreSQL hides big values
Posted by Radim Marek on 2026-05-24 at 12:15

In earlier posts in this series we established that every heap tuple lives inside a strict 8KB page. Everything else is built on top of that hard limit: MVCC, HOT updates, and indexes that point at (page, line_pointer). And yet this still works:

CREATE TABLE docs (id int PRIMARY KEY, body jsonb);
INSERT INTO docs VALUES (1, (SELECT jsonb_agg(g) FROM generate_series(1, 100000) g));

That body value is somewhere north of half a megabyte. The heap page is still 8KB. Both statements are true at the same time, and the mechanism that makes them coexist is TOAST: The Oversized-Attribute Storage Technique.

The 2KB threshold

The page is 8KB, but PostgreSQL starts shrinking tuples at around 2KB. The goal is to keep at least four tuples per page, so anything bigger triggers TOAST.

Two constants are at play. TOAST_TUPLE_THRESHOLD decides when the toaster runs; TOAST_TUPLE_TARGET is the size it tries to shrink the tuple down to. Both default to 2032 bytes on a standard 8KB build. The threshold is fixed at compile time; the target is per-table tunable via the toast_tuple_target storage parameter.

When an INSERT or UPDATE produces a tuple wider than the threshold, PostgreSQL doesn't wait to see whether the row would actually fail to fit on a page. It starts compressing and relocating attributes one at a time, and stops the moment the tuple drops below the target. A row only modestly over the line often gets away with one cheap compression pass; a much wider one walks the full procedure.

Only variable-length attributes are candidates. A bigint is always 8 bytes, so there is nothing to compress and nowhere to move it. A text, bytea, jsonb, numeric, or array, on the other hand, is a varlena: a length-prefixed structure PostgreSQL can freely rearrange.

The four storage strategies

Each variable-length column has a storage strategy: a flag in pg_attribute.attstorage that answers two yes/no questions for the toaster. Can this value be compressed? Can it be moved out-of-line? PostgreSQL picks a default

[...]

On Polymorphic Associations in Postgres
Posted by Andrei Lepikhov in pgEdge on 2026-05-24 at 11:03

OUTER JOIN is a typical plague of ORM-based PostgreSQL configurations: the planner is still relatively poor at optimising it. At the same time, ORM frameworks — and 1C as a prominent example — often generate outer joins from standard templates, which opens the door to targeted optimisations. In this article, I dig into one such template — polymorphic reference resolution: what the pattern is, where it comes from (Rails, Django, Hibernate, Salesforce — not just 1C), how widespread it is, and why its structural properties make it possible to significantly speed up execution.

Just for the introduction

Think of the homepage of a typical online store — say, Amazon. Personally, on my homepage, I see ads for several products:

  • An air fryer

  • Magnesium Citrate + B6

  • Protein brownies

  • and so on.

Thinking like a database developer, I can roughly imagine that this page is built from the results of a query like:

SELECT name, description FROM products p
  LEFT JOIN kitchen_appliances ka ON (p.id = ka.id)
    LEFT JOIN pharmacy f ON (p.id = f.id)
      LEFT JOIN sports_nutrition sf ON (p.id = sf.id)
      ...
ORDER BY p.popularity DESC
LIMIT N

Planning such a query efficiently is no easy task — and in my experience, this is confirmed by user reports from the 1C world, since PostgreSQL is currently not rich in LEFT JOIN optimisations. At the same time, the properties of this pattern enable the development of various techniques to improve execution efficiency. I've managed to implement several straightforward optimisations of this template. But first, let's understand what polymorphic references actually are, where they come from, and how common they really are. That's the gap I'm trying to fill with this post.

The Pattern

Many real-world data models contain references that can point to one of several target entity types. An order line may reference a physical product, a digital download, a gift card, or a subscrip

[...]

All Your GUCs in a Row: client_encoding
Posted by Christophe Pettus in pgExperts on 2026-05-24 at 01:00
Character encoding conversions happen silently between client and server—until they don't.

Upgrade PostgreSQL from 17 to 18 on Ubuntu 26.04
Posted by Paolo Melchiorre in ITPUG on 2026-05-23 at 22:00

Howto guide for upgrading PostgreSQL from version 17 to 18 on Ubuntu, after its upgrade from version 25.10 (Questing Quokka) to 26.04 (Resolute Raccoon).

All Your GUCs in a Row: checkpoint_flush_after and checkpoint_warning
Posted by Christophe Pettus in pgExperts on 2026-05-23 at 01:00
Meet two checkpoint neighbors paired by the alphabet, not the topic: `checkpoint_flush_after` tunes kernel writeback hints, while `checkpoint_warning` logs…

pgvector 0.8.2 and the Trouble With Parallel HNSW
Posted by Christophe Pettus in pgExperts on 2026-05-22 at 15:00
pgvector 0.8.2 fixes CVE-2026-3172, a heap buffer overflow in parallel HNSW index builds that can leak data or crash the server.

Top posters

Number of posts in the past two months

Top teams

Number of posts in the past two months

Feeds

Planet

  • Policy for being listed on Planet PostgreSQL.
  • Add your blog to Planet PostgreSQL.
  • List of all subscribed blogs.
  • Manage your registration.

Contact

Get in touch with the Planet PostgreSQL administrators at planet at postgresql.org.