The Iceberg tables look like normal Postgres tables. You create them with USING iceberg and they're backed by Parquet on S3:
Postgres engines now have access to more data than ever. With extensions like pg_lake, you can connect Postgres to gobs of files in object storage like csv, json, Apache Parquet™ and Apache Iceberg™.
But having access to data in object storage and being able to aggregate data in object storage are two different things. This blog walks through the Postgres extension, Apache AGE™, that makes working with huge files of data sets much friendlier through graph relationships.
Let's consider a healthcare network with providers, patients, facilities and referral chains. The analytical questions are straightforward:
SQL on Iceberg handles all of these beautifully. But if you need to know: "Which in-network providers are referring patients to out-of-network providers through chains of intermediaries, and what's the dollar impact?" Those questions have two parts: a graph traversal (find the referral chains) and an analytical aggregation (sum the costs). Neither a pure graph database nor a pure analytical engine can answer it alone. You need both, working together, in the same query.
This is where data lakes need graphs.
Apache AGE is a PostgreSQL extension that adds openCypher graph query support directly inside Postgres. There are other graph databases out there — Neo4j, Amazon Neptune, TigerGraph — but AGE has a unique advantage for the modern data platform — it runs inside PostgreSQL.
We've recently seen customers increasingly using Apache AGE for a couple of reasons:
PGConf.dev is an annual developer event focused entirely on contributing to the PostgreSQL ecosystem, including core software development and community building. It serves as a primary hub for PostgreSQL hackers, maintainers, and ecosystem developers to meet, collaborate, and share knowledge.
This year’s conference was hosted once again in Vancouver, which also happens to be my hometown :D. On top of that, 2026 also marks PostgreSQL’s 30th anniversary, giving this year’s conference an even more meaningful atmosphere for long-time contributors and community members around the world.
This is one of my favorite PostgreSQL conferences because it gives people an opportunity to truly meet, learn from, and interact with the brilliant minds behind PostgreSQL. Many of the names we usually only see in hacker mailing list discussions, patch reviews and commit messages suddenly become real people standing right beside you, discussing ideas and working together to make PostgreSQL better.
It feels like the PostgreSQL hacker discussions have come to life in the real world. The conversations are real, the passion is real, and the community spirit is everywhere.
A huge thank you once again to all the organizers, volunteers, speakers, and sponsors who invested countless hours to make this conference possible. This blog is a personal summary of my own experience attending pgconf.dev 2026, and for those who could not attend, I hope it can give you at least a small glimpse of what it feels like to be part of this incredible PostgreSQL community.
Unlike the “Tuesdays” from previous pgconf.dev conferences, this year’s Tuesday schedule is packed with an impressive variety of engaging sessions, including round-table discussions, interviews, keynote-style talks, brainstorming sessions, and community-led discussions focused on spe
[...]PostgreSQL uses MVCC (Multi-Version Concurrency Control) for concurrency control: reads never block writes, and writes never block reads.
Its locking system has 8 table-level lock modes and 4 row-level lock modes, and the conflict tables in the documentation tell you exactly which lock modes conflict with which.
In practice, though, once you actually operate PostgreSQL, locks end up conflicting in places you never expected. Queries take far longer than anticipated, and in the worst case you end up with an outage.
This article walks through five of these counterintuitive locking behaviors.
ACCESS EXCLUSIVE request is queued, subsequent queries get blocked in a chain
The first one: an ALTER TABLE that should finish instantly can bring your entire service to a halt.
Suppose one session is running a long SELECT on table t, and another session runs the following ALTER TABLE:
Session 1
SELECT pg_sleep(600) FROM t LIMIT 1; -- a long-running SELECT
Session 2
ALTER TABLE t ADD COLUMN name text;
Since Session 1 holds an ACCESS SHARE lock on table t, the ACCESS EXCLUSIVE lock that Session 2's ALTER TABLE requires is forced to wait. So far, this is expected behavior.
But PostgreSQL's lock waiting works like a FIFO queue. While the ACCESS EXCLUSIVE lock is waiting, any SELECT issued against table t afterward gets stuck behind it — even though that SELECT does not conflict with Session 1's currently running SELECT at all.
[...]Besides a long-running
SELECT, the same thing happens with a session that ran aSELECTinside aBEGINtransaction and then left it open withoutCOMMIT/ROLLBACK(a so-called idle in transaction session). AnACCESS SHARElock is held until the transaction ends, so even if theSELECTitself finished in an instant, simply forgetting to close the transaction will keep blockingALTER TABLE. This is often caused by
For a long time, PGCon in Ottawa was a staple for the PostgreSQL community. I always had a soft spot for Ottawa; it fondly reminded me of my childhood days in Pune. So, when it was first announced that the reincarnated PGConf.dev would take place in Vancouver, I felt a twinge of sadness.
However, the moment I landed in Vancouver, all my reservations dissolved. The city won me over instantly—from riding a bicycle along the seawall and admiring the beautiful totem poles, to watching the Gastown Steam Clock. Returning this year, I was incredibly happy to explore Stanley Park on foot once again.
But as beautiful as Vancouver is, the real magic happened inside the venue.
This year’s event kicked off with a brand-new addition: the Tuesday community discussions. It turned out to be so packed with value that it felt like getting two conferences in a single week—just like the advertisement in my hotel elevator promised!
I conducted a session to discuss the priorities and future of graph database offerings in PostgreSQL, specifically focusing on SQL/PGQ and Apache/AGE. Given that SQL/PGQ is a newly-committed feature, I expected a modest turnout of about a dozen people. To my surprise, over 25 people attended, and most actively participated in the debate.
Logical Replication: While it has come a long way over the last decade, there is still a long road ahead. We dove deep into DDL replication and replication in multi-tenant, multi-DB clusters. My main takeaway? There are still a few missing pieces before logical replication can be effectively leveraged for high-volume OLAP use cases.
Multithreading: This ambitious effort aims to replace Postgres's traditional process-driven architecture with threads. As I fondly like to say: it’s like replacing an elephant’s heart with a cougar’s
I’m back home from Vancouver. What a great week – in every way. I’ll try to share a few highlights here.
First and foremost: after many years, the Happiness Hints have received a major update! Before the conference, I updated the hints based on all the feedback I’ve collected over the past few years. Then the hints were updated into a poster format and we printed it as part of the pgconf.dev poster session. Throughout the week, I continued collecting more feedback. I used a sharpie during the conference and marked up the poster with ideas. Special thanks to Laurenz Albe, David Rader, Sami Imseih, Ryan Booz and Nik Samokhvalov (Nik you weren’t at the conference but a happiness hint resulted from other discussions we’ve had). Of course I’m forgetting more people who gave feedback making the happiness hints better. After coming home from the conference, I incorporated all the notes I had – and the version that’s now published here at ardentperf.com is the latest & best version I’ve assembled so far.
An extraordinary number of postgres users rely on physical replication for high availability. It’s been around for a long time and it works well. Nonetheless, there are a few rough edges and over the years there have been various mailing list threads that haven’t fully been resolved.
I proposed a Friday unconference session on this topic, and the topic received enough votes to be selected. Notes from the unconference are available on the Postgres wiki. But the discussion extended far beyond the unconference; there were also hallway discussions over coffee (thanks Thomas Munro) and then continuing discussions over dinner at Joey Burrard and beers at Steamworks (thanks Ants Aasma).
The first question that everybody asks is “should postgres have more HA capabilities in core”? And a discussion starting along these lines consumed the first half of the unconference.
But I thought the most interesting train of
[...]This story started with a book gifted by a colleague. Reading Jimmy Angelakos' «PostgreSQL Mistakes and How to Avoid Them», I realised something that had been bugging me - in Postgres, the EXPLAIN command produces far too much information. The examples that authors typically present when discussing various aspects of database systems make it harder to analyse the problem at hand and distract the reader. That's how the idea of a post-processing for EXPLAIN output was born - to make query plans more readable and problem-focused.
The technology industry is entering a fascinating phase where many older ideas are suddenly becoming critically important again. Concepts that once lived mostly inside data governance discussions, library sciences, enterprise architecture diagrams, semantic web research, and metadata catalogs are now being pulled directly into the center of modern AI conversations. This shift is not accidental. As organizations move from AI experimentation to real enterprise deployment, they are discovering that intelligence does not come only from models. It also depends on how well the organization understands, organizes, governs, and connects its data.
Over the last two years, terms such as context engineering, AI memory, semantic retrieval, knowledge-driven agents, and context-aware systems have rapidly entered mainstream technology discussions. Vendors now describe platforms as intelligent because they can retrieve documents semantically, maintain conversational memory, or connect AI systems to enterprise knowledge. At first glance, this sounds entirely new. In reality, much of the industry is rediscovering a foundational truth that enterprise architects, database professionals, and information management practitioners have understood for years: AI systems cannot reason reliably without meaningful data.
This is not simply a model problem. It is fundamentally a data architecture problem. Large Language Models are extraordinarily capable at generating language, summarizing content, writing code, and answering questions conversationally. However, enterprises are increasingly discovering that language generation alone does not create trustworthy intelligence. An AI system may generate answers that sound convincing while still lacking operational correctness, governance awareness, relationship understanding, or contextual grounding.
This is precisely why concepts such as metadata, semantics, taxonom
[...]At the end of April, I had the incredible opportunity to represent credativ on the HOW2026 (Hello Open-source World), the PostgreSQL & IvorySQL Eco Conference in Jinan, China. IvorySQL is a Chinese fork of PostgreSQL with extended Oracle compatibility. In recent years, it has been gaining popularity not only in China. The conference brought together PostgreSQL and IvorySQL experts, contributors, and open-source database enthusiasts from around the world. The atmosphere was excellent, and I would like to share several highlights together with a recap of my contributions.
I was honored to contribute to the HOW2026 program with two sessions.
On Sunday afternoon, I led a three-hour deep-dive workshop exploring the intricate mechanics of PostgreSQL shared buffers. I discussed how Linux implements shared memory operations through the tmpfs file system, how shared buffers work internally, their practical limitations and usage patterns, why and how huge pages should be used for their allocation, why transparent huge pages negatively affect performance, and practical strategies for balancing high concurrency with optimal system performance.
On Tuesday afternoon, I gave a talk focused on PostgreSQL connection overhead. I discussed how PostgreSQL connections interact with the Linux kernel, covering memory usage, physical memory allocation mapping, context-switching overhead, and PostgreSQL internals to explain why large numbers of connections can significantly degrade performance.
PGConf.dev 2026 took place from May 19-22 2026, organized by
Program Committee:
Tuesday Planning Committee:
Code of Conduct Committee:
Volunteers:
Speakers (including Tuesday Community Sessions & Lightning Talks):
I recently wrote a post about WAL log shipping and how a standby built on log shipping is a great way to give data analysts production data without putting the primary at risk. Having access to the production data in this way is great, but it’s read-only. How can we create views of this data for better analytics work? I want to make the case today that Foreign Data Wrappers and Materialized Views can make a great solution – not only in accessing production Postgres data, but also working with other data sources.
Most people meet foreign data wrappers (FDWs) through a quick demo, and I’ve highlighted some of their features in previous conference talks. There is high novelty in being able to query MySQL from Postgres, but the reality is often that the latency between the local database and the foreign table can be pretty high. Sometimes, predicate push-down isn’t what you’d expect, and indexing may not be very transparent. In the end, setting up and managing FDWs may seem more work than it’s worth, and that’s a mistake. Used correctly, foreign tables are one of the most practical tools for analytics across heterogeneous data sources – especially when paired with materialized views.
Modern data rarely lives in one place:
Foreign tables give you a unified SQL interface, but under the hood, the query performance can be unpredictable as you may be forced to rely on another engine’s query planner (and in the case of that CSV data source, it might not even be indexed).
In other words, FDWs optimize developer experience, not query performance.
Instead of querying foreign tables directly in analytics workloads, we can opt to use FDWs as ingestion points, not as the servi
[...]
In earlier posts in this series we established that every heap tuple lives inside a strict 8KB page. Everything else is built on top of that hard limit: MVCC, HOT updates, and indexes that point at (page, line_pointer). And yet this still works:
CREATE TABLE docs (id int PRIMARY KEY, body jsonb);
INSERT INTO docs VALUES (1, (SELECT jsonb_agg(g) FROM generate_series(1, 100000) g));
That body value is somewhere north of half a megabyte. The heap page is still 8KB. Both statements are true at the same time, and the mechanism that makes them coexist is TOAST: The Oversized-Attribute Storage Technique.
The page is 8KB, but PostgreSQL starts shrinking tuples at around 2KB. The goal is to keep at least four tuples per page, so anything bigger triggers TOAST.
TOAST_TUPLE_THRESHOLD decides when the toaster runs; TOAST_TUPLE_TARGET is the size it tries to shrink the tuple down to. Both default to 2032 bytes on a standard 8KB build. The threshold is fixed at compile time; the target is per-table tunable via the toast_tuple_target storage parameter.
When an INSERT or UPDATE produces a tuple wider than the threshold, PostgreSQL doesn't wait to see whether the row would actually fail to fit on a page. It starts compressing and relocating attributes one at a time, and stops the moment the tuple drops below the target. A row only modestly over the line often gets away with one cheap compression pass; a much wider one walks the full procedure.
Only variable-length attributes are candidates. A bigint is always 8 bytes, so there is nothing to compress and nowhere to move it. A text, bytea, jsonb, numeric, or array, on the other hand, is a varlena: a length-prefixed structure PostgreSQL can freely rearrange.
Each variable-length column has a storage strategy: a flag in pg_attribute.attstorage that answers two yes/no questions for the toaster. Can this value be compressed? Can it be moved out-of-line? PostgreSQL picks a default
OUTER JOIN is a typical plague of ORM-based PostgreSQL configurations: the planner is still relatively poor at optimising it. At the same time, ORM frameworks — and 1C as a prominent example — often generate outer joins from standard templates, which opens the door to targeted optimisations. In this article, I dig into one such template — polymorphic reference resolution: what the pattern is, where it comes from (Rails, Django, Hibernate, Salesforce — not just 1C), how widespread it is, and why its structural properties make it possible to significantly speed up execution.
Think of the homepage of a typical online store — say, Amazon. Personally, on my homepage, I see ads for several products:
An air fryer
Magnesium Citrate + B6
Protein brownies
and so on.
Thinking like a database developer, I can roughly imagine that this page is built from the results of a query like:
SELECT name, description FROM products p
LEFT JOIN kitchen_appliances ka ON (p.id = ka.id)
LEFT JOIN pharmacy f ON (p.id = f.id)
LEFT JOIN sports_nutrition sf ON (p.id = sf.id)
...
ORDER BY p.popularity DESC
LIMIT N
Planning such a query efficiently is no easy task — and in my experience, this is confirmed by user reports from the 1C world, since PostgreSQL is currently not rich in LEFT JOIN optimisations. At the same time, the properties of this pattern enable the development of various techniques to improve execution efficiency. I've managed to implement several straightforward optimisations of this template. But first, let's understand what polymorphic references actually are, where they come from, and how common they really are. That's the gap I'm trying to fill with this post.
Many real-world data models contain references that can point to one of several target entity types. An order line may reference a physical product, a digital download, a gift card, or a subscrip
[...]Howto guide for upgrading PostgreSQL from version 17 to 18 on Ubuntu, after its upgrade from version 25.10 (Questing Quokka) to 26.04 (Resolute Raccoon).
If you've ever compared database feature matrices, you may have noticed something a bit peculiar. Oracle has Transparent Data Encryption. SQL Server has it. MySQL has it. Even MariaDB has it. But Postgres, which we all consider the best database engine? Conspicuously absent.It’s not that nobody wants TDE. Compliance frameworks like PCI DSS and HIPAA practically demand encryption at rest. Cloud deployments make the “stolen disk” threat model more tangible than ever. And the question comes up constantly on mailing lists, at conferences, and in every database evaluation checklist ever assembled by a procurement department. So what gives?It’s complicated. The real answer involves nearly a decade of mailing list threads, competing proposals, fundamental disagreements about threat models, and a problem scope so vast it makes most contributors quietly back away. Let’s trace the history and find out why the elephant in the room is still unencrypted.
On 12 May, 2026 the San Francisco Bay Area PostgreSQL Meetup Group met virtually, organized by Katharine Saar, Stacey Haysler and Christophe Pettus. Alex Yarotsky spoke at the event.
The Swiss PGDay Program Committee met to finalize the schedule:
Community Blog Post:
If you've used docs.pgedge.com recently, you've probably met Ellie. Ask her how to set up multi-master replication, or what port the MCP Server listens on, and she pulls the relevant documentation, assembles it into context, and gives you a grounded answer with source citations. She doesn't guess or hallucinate. She finds the actual docs and synthesizes an answer from them.Ellie is a RAG Server deployment. The pgEdge RAG Server is an API server for retrieval-augmented generation that runs hybrid search (vector similarity plus BM25) over content stored in Postgres, then sends matches to an LLM for grounded answers.pgEdge Cloud Deploy it alongside your database, point it at your tables, and your application gets the same pipeline Ellie uses, running against your data.Prefer to run it yourself? The RAG Server is 100% open source under the PostgreSQL License and available on GitHub. It's a single Go binary you point at any PostgreSQL 14+ database with pgvector installed, configured via YAML, with your own API keys for OpenAI, Anthropic, Voyage, or local Ollama. Hybrid search (vector similarity plus BM25), token budgets, and streaming responses are all in the binary. The Agentic AI Toolkit FAQ has more on how it fits with the rest of the toolkit, including Vectorizer and Docloader.
Number of posts in the past two months
Number of posts in the past two months
Get in touch with the Planet PostgreSQL administrators at planet at postgresql.org.