In earlier posts in this series we established that every heap tuple lives inside a strict 8KB page. Everything else is built on top of that hard limit: MVCC, HOT updates, and indexes that point at (page, line_pointer). And yet this still works:
CREATE TABLE docs (id int PRIMARY KEY, body jsonb);
INSERT INTO docs VALUES (1, (SELECT jsonb_agg(g) FROM generate_series(1, 100000) g));
That body value is somewhere north of half a megabyte. The heap page is still 8KB. Both statements are true at the same time, and the mechanism that makes them coexist is TOAST: The Oversized-Attribute Storage Technique.
The page is 8KB, but PostgreSQL starts shrinking tuples at around 2KB. The goal is to keep at least four tuples per page, so anything bigger triggers TOAST.
TOAST_TUPLE_THRESHOLD decides when the toaster runs; TOAST_TUPLE_TARGET is the size it tries to shrink the tuple down to. Both default to 2032 bytes on a standard 8KB build. The threshold is fixed at compile time; the target is per-table tunable via the toast_tuple_target storage parameter.
When an INSERT or UPDATE produces a tuple wider than the threshold, PostgreSQL doesn't wait to see whether the row would actually fail to fit on a page. It starts compressing and relocating attributes one at a time, and stops the moment the tuple drops below the target. A row only modestly over the line often gets away with one cheap compression pass; a much wider one walks the full procedure.
Only variable-length attributes are candidates. A bigint is always 8 bytes, so there is nothing to compress and nowhere to move it. A text, bytea, jsonb, numeric, or array, on the other hand, is a varlena: a length-prefixed structure PostgreSQL can freely rearrange.
Each variable-length column has a storage strategy: a flag in pg_attribute.attstorage that answers two yes/no questions for the toaster. Can this value be compressed? Can it be moved out-of-line? PostgreSQL picks a default
OUTER JOIN is a typical plague of ORM-based PostgreSQL configurations: the planner is still relatively poor at optimising it. At the same time, ORM frameworks — and 1C as a prominent example — often generate outer joins from standard templates, which opens the door to targeted optimisations. In this article, I dig into one such template — polymorphic reference resolution: what the pattern is, where it comes from (Rails, Django, Hibernate, Salesforce — not just 1C), how widespread it is, and why its structural properties make it possible to significantly speed up execution.
Think of the homepage of a typical online store — say, Amazon. Personally, on my homepage, I see ads for several products:
An air fryer
Magnesium Citrate + B6
Protein brownies
and so on.
Thinking like a database developer, I can roughly imagine that this page is built from the results of a query like:
SELECT name, description FROM products p
LEFT JOIN kitchen_appliances ka ON (p.id = ka.id)
LEFT JOIN pharmacy f ON (p.id = f.id)
LEFT JOIN sports_nutrition sf ON (p.id = sf.id)
...
ORDER BY p.popularity DESC
LIMIT N
Planning such a query efficiently is no easy task — and in my experience, this is confirmed by user reports from the 1C world, since PostgreSQL is currently not rich in LEFT JOIN optimisations. At the same time, the properties of this pattern enable the development of various techniques to improve execution efficiency. I've managed to implement several straightforward optimisations of this template. But first, let's understand what polymorphic references actually are, where they come from, and how common they really are. That's the gap I'm trying to fill with this post.
Many real-world data models contain references that can point to one of several target entity types. An order line may reference a physical product, a digital download, a gift card, or a subscrip
[...]Howto guide for upgrading PostgreSQL from version 17 to 18 on Ubuntu, after its upgrade from version 25.10 (Questing Quokka) to 26.04 (Resolute Raccoon).
If you've ever compared database feature matrices, you may have noticed something a bit peculiar. Oracle has Transparent Data Encryption. SQL Server has it. MySQL has it. Even MariaDB has it. But Postgres, which we all consider the best database engine? Conspicuously absent.It’s not that nobody wants TDE. Compliance frameworks like PCI DSS and HIPAA practically demand encryption at rest. Cloud deployments make the “stolen disk” threat model more tangible than ever. And the question comes up constantly on mailing lists, at conferences, and in every database evaluation checklist ever assembled by a procurement department. So what gives?It’s complicated. The real answer involves nearly a decade of mailing list threads, competing proposals, fundamental disagreements about threat models, and a problem scope so vast it makes most contributors quietly back away. Let’s trace the history and find out why the elephant in the room is still unencrypted.
On 12 May, 2026 the San Francisco Bay Area PostgreSQL Meetup Group met virtually, organized by Katharine Saar, Stacey Haysler and Christophe Pettus. Alex Yarotsky spoke at the event.
The Swiss PGDay Program Committee met to finalize the schedule:
Community Blog Post:
If you've used docs.pgedge.com recently, you've probably met Ellie. Ask her how to set up multi-master replication, or what port the MCP Server listens on, and she pulls the relevant documentation, assembles it into context, and gives you a grounded answer with source citations. She doesn't guess or hallucinate. She finds the actual docs and synthesizes an answer from them.Ellie is a RAG Server deployment. The pgEdge RAG Server is an API server for retrieval-augmented generation that runs hybrid search (vector similarity plus BM25) over content stored in Postgres, then sends matches to an LLM for grounded answers.pgEdge Cloud Deploy it alongside your database, point it at your tables, and your application gets the same pipeline Ellie uses, running against your data.Prefer to run it yourself? The RAG Server is 100% open source under the PostgreSQL License and available on GitHub. It's a single Go binary you point at any PostgreSQL 14+ database with pgvector installed, configured via YAML, with your own API keys for OpenAI, Anthropic, Voyage, or local Ollama. Hybrid search (vector similarity plus BM25), token budgets, and streaming responses are all in the binary. The Agentic AI Toolkit FAQ has more on how it fits with the rest of the toolkit, including Vectorizer and Docloader.
When the future of pgBackRest suddenly became uncertain, the PostgreSQL ecosystem reacted quickly.
At Percona, we believed the most important question was not:
what replaces it?
but:
how do we ensure pgBackRest remains healthy, sustainable, and open for everyone?
That distinction matters.
pgBackRest is critical infrastructure used by enterprises around the world to protect some of their most important data. When projects like this face maintainership or sustainability challenges, organizations need trusted open source partners that can help provide continuity, stability, and confidence.
From the beginning, Percona believed the best outcome for pgBackRest was not fragmentation, forks, or closed alternatives.
What the project needed was continuity.
That meant working collaboratively across the ecosystem to help strengthen the project itself:
– coordinating funding discussions
– contributing engineering resources
– helping expand the maintainer base
– encouraging participation from multiple organizations
The goal was never to control the project. The goal was to help ensure pgBackRest remained open, healthy, and sustainable for the entire PostgreSQL community.
Those efforts are already producing results.
A joint effort across maintainers, contributors, and multiple companies is helping ensure pgBackRest returns in a stronger and healthier position than before. Funding, engineering support, and long-term sustainability discussions are now happening collaboratively across the ecosystem.
Percona is proud to play a part in that effort.
Just as importantly, this moment would likely never have happened without David Steele bringing visibility to the sustainability realities behind maintaining critical open source infrastructure.
For more than a decade, David built pgBackRest into one of the most trusted backup and recovery solutions in the PostgreSQL
Events unfolded quickly over the course of a couple of weeks starting on 27 April 2026, when a message appeared on the pgBackRest project announcing: that the repository would be archived and active maintenance would stop.
For many in the PostgreSQL ecosystem, this landed like a shock. pgBackRest is one of the most widely used backup and recovery tools for PostgreSQL, deeply embedded in production environments across enterprises large and small. Now it was suddenly described as “dead”, “EOL”, or “abandoned”. The trigger was clear: its long-time maintainer, after more than a decade of work, announced he could no longer continue without sustainable funding and would archive the repository. i That message spread fast. The interpretation spread even faster.
The conversation around AI infrastructure today is heavily focused on models, GPUs, inference speed, and vector databases. These are important building blocks, but they often distract from a deeper architectural challenge that is beginning to emerge as enterprises move from experimentation toward operational AI systems.
The challenge is memory.
Not memory in the simplistic sense of storing chat history or embeddings, but memory in the broader sense of maintaining durable context, operational continuity, historical understanding, workflow state, reasoning traceability, and business awareness across long-running AI interactions.
Many of the current AI systems appear intelligent during a single interaction, yet surprisingly fragile across time. They can summarize documents, answer questions, call APIs, generate code, and reason effectively within a bounded context window. However, once interactions become long-running, collaborative, stateful, and operationally significant, the limitations quickly become visible.
The issue is not necessarily that the models lack intelligence. The issue is that most AI systems today lack a coherent memory architecture.
As enterprises begin deploying agentic AI systems capable of acting autonomously across workflows, applications, and business processes, this gap becomes increasingly important. In many ways, modern AI agents resemble highly capable employees who forget large portions of their institutional knowledge every few hours. They may understand the current task extremely well, but they often struggle to consistently retain, prioritize, organize, and retrieve contextual knowledge accumulated over time.
This is where I believe PostgreSQL may become significantly more important in the future AI stack than many people currently realize.
Not simply as a vector database.
Not merely as storage for embeddings.
But potentially as the durable memory, operational state, and governance substrate for enterprise AI systems.
Note: this post was not rewritten by AI
I’ve been saying for a long time that AI can’t help me because no one else codes the way I do, so it doesn’t have any reference points. Then I realized many advantages of having AI perform some boring tasks, like writing tests (we know we need unit tests, and why we are not writing them? because we don’t have time!).
Something changed a couple of weeks ago, after some conversations I had at work, and here is what I think could potentially happen and bring some positive change.
I have been complaining for years about the application developers’ inertia and their overreliance on ORM tools rather than writing high-performing SQL. I am not going to repeat this rant here – I am not the only one, and you all know! And when I asked what I could do to facilitate changing the course, the answer would be: I am used to that way of programming; I know it works and produces the correct result, too bad it’s not always the most performant!
But now that thing have changed, and most developers use Claude Code, I am wondering whether it would be possible to teach just an AI assistant to use better techniques? Will it work, if people are not actually writing the code? I know that AI can use SQL performance tuning tips; would it be possible to teach AI to use NoRM?
Any thoughts? Or any volunteers to give it a try?
This article compares CloudNativePG and Crunchy PGO, two of the most adopted open-source operators for running PostgreSQL on Kubernetes. It covers architecture, image design, backup strategy, major version upgrades, observability, licensing and community health. As a co-founder and maintainer of CloudNativePG, I make no claim to neutrality, and I say so upfront. What I can offer is informed bias, grounded in years of daily work on the project and a genuine respect for what Crunchy Data built in this space.
If you’ve been running PostgreSQL for any length of time, you’ve probably heard about transaction ID (XID) wraparound. It’s one of the most well-known maintenance concerns in Postgres, and there’s no shortage of blog posts, conference talks, and war stories about it. But there’s a quieter, less-discussed cousin that can cause the exact same kind of outage: MultiXact ID wraparound.
I’ve seen this surprise more than a few experienced DBAs. They’ve got their autovacuum tuned, they’re monitoring age(datfrozenxid), and they’re feeling good – and then out of nowhere, Postgres starts refusing certain writes because it’s approaching MultiXact ID wraparound.
The fix is the same as regular XID wraparound – a simple vacuum. But the reason is different, and understanding it can help you keep your monitoring complete.
In Postgres, every row has a system column called xmax. In the simplest case, xmax holds the transaction ID of the transaction that deleted or updated the row. But what happens when multiple transactions hold locks on the same row at the same time?
Consider SELECT ... FOR SHARE. Multiple transactions can hold a shared lock on the same row concurrently. Postgres needs to record all of those transactions somewhere, but xmax is only wide enough to store a single transaction ID. The solution is the MultiXact mechanism.
A MultiXact ID is essentially a pointer into a separate structure (stored as a file in the pg_multixact/ dir) that maps to a list of transaction IDs and their lock modes. When multiple transactions need to lock a row, Postgres:
xmax field, with a flag (specifically, the HEAP_XMAX_IS_MULTI infomask bit in the tuple header) indicating it’s a multi-xact reference rather than a plain XID
This lets the xmax field stay a fixed 32-bit value while still representing an arb
Number of posts in the past two months
Number of posts in the past two months
Get in touch with the Planet PostgreSQL administrators at planet at postgresql.org.