Latest Blog Posts

Open Source, Open Nerves
Posted by Vibhor Kumar on 2026-02-27 at 10:22

Trust, Governance, Talent, and the Enterprise Reality of PostgreSQL

Last year at the CIO Summit Mumbai, I had the opportunity to participate in a leadership roundtable with CIOs across banking, fintech, telecom, manufacturing, and digital enterprises.

The session was not a product showcase.

It wasn’t a benchmarking debate.

It wasn’t even primarily about technology.

It was about risk.

Specifically, the evolving role of open source — and particularly PostgreSQL — inside mission-critical enterprise environments.

Over the past week, I revisited those conversations in a LinkedIn series titled “Open Source, Open Nerves.” This blog expands on that series, capturing the deeper strategic undercurrents that surfaced in that room — and why they matter even more today.


The Evolution of the Conversation

There was a time when open source debates revolved around performance and cost. That time has passed.

PostgreSQL has proven itself across:

  • Core banking workloads
  • Telecom billing engines
  • Financial risk platforms
  • Regulatory reporting systems
  • AI-driven analytics platforms
  • Hybrid and multi-cloud architectures

No one in the room questioned whether PostgreSQL could handle enterprise-grade workloads.

The real conversation had shifted.

From capability

to accountability.


1. Trust Is the Real Architecture

One recurring sentiment defined the tone of the discussion:

“Power is no longer the question. Trust is.”

CIOs are not evaluating features in isolation. They are evaluating consequences.

When PostgreSQL becomes the backbone of a regulated enterprise system, the stakes include:

  • Revenue continuity
  • Regulatory exposure
  • Customer trust
  • Brand reputation
  • Executive accountability

Trust in this context has multiple dimensions:

Operational Trust

Will it stay up under stress?

Will failover behave as designed?

Will replication hold during peak load?

Security Trust

Is the

[...]

How Patroni Brings High Availability to Postgres
Posted by Shaun Thomas in pgEdge on 2026-02-27 at 05:33

Let’s face it, there are a multitude of High Availability tools for managing Postgres clusters. This landscape evolved over a period of decades to reach its current state, and there’s a lot of confusion in the community as a result. Whether it’s Reddit, the Postgres mailing lists, Slack, Discord, IRC, conference talks, or any number of venues, one of the most frequent questions I encounter is: How do I make Postgres HA?

My answer has been a steadfast “Just use Patroni,” since about 2017. Unless something miraculous happens in the Postgres ecosystem, that answer is very unlikely to change. But why? What makes Patroni the “final answer” when it comes to Postgres and high availability? It has a lot to do with how Patroni does its job, and that’s what we’ll be exploring in this article.

The elephant in the room

By itself, Postgres is not a cluster in the sense most people visualize. They may envision a sophisticated mass of interconnected servers, furiously blinking their lights at each other, aware of each computation the others make, ready to take over should one fail. In reality, the “official” use of the word “cluster” in the Postgres world is just one or more databases associated with a single Postgres instance. It’s right in the documentation for Creating a Database Cluster:

“A database cluster is a collection of databases that is managed by a single instance of a running database server.”

The concept of multiple such instances interacting is so alien to Postgres that it didn’t even exist until version 9.0 introduced Hot Standbys and streaming replication back in 2010. And how do hot standby instances work? The same way as the primary node: they apply WAL pages to the backend heap files. Those WAL pages may be supplied from archived WAL files or by streaming them from the primary itself, but it’s still just continuous crash recovery by another name.

This matters because each Postgres node still knows little to nothing about other nodes in this makeshift cluster over 15 years later. This isn’t necessarily a p[...]

PostgreSQL Statistics: Why queries run slow
Posted by Radim Marek on 2026-02-26 at 23:01

Every query starts with a plan. Every slow query probably starts with a bad one. And more often than not, the statistics are to blame. But how does it really work? PostgreSQL doesn't run the query to find out — it estimates the cost. It reads pre-computed data from pg_class and pg_statistic and does the maths to figure out the cheapest path to your data.

In the ideal scenario, the numbers are accurate, and you get the plan you expect. But when they are stale, the situation gets out of control. The planner estimates 500 rows, plans a nested loop, and hits 25,000. What seemed like an optimal plan turns into a cascading failure.

How do statistics get stale? It can be a bulk load, a schema migration, faster-than-expected growth, or simply VACUUM not keeping up. Whatever the cause, the result is the same: the planner is flying blind, choosing paths based on a reality that no longer exists.
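One quick way to see whether the planner is working from stale numbers is to check the cumulative statistics views directly. A minimal sketch against pg_stat_user_tables:

SELECT relname,
       n_live_tup,
       n_mod_since_analyze,
       last_analyze,
       last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_mod_since_analyze DESC;

A large n_mod_since_analyze relative to n_live_tup suggests that ANALYZE has not caught up with recent churn on that table.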

In this post we will go inside the two catalogs the planner depends on, understand what ANALYZE actually gets for you from a 30,000-row table, and see how those numbers determine whether your query takes milliseconds or minutes.

Sample schema

For demonstration purposes we will use the same schema as in the article Reading Buffer statistics in EXPLAIN output.

CREATE TABLE customers (
    id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name text NOT NULL
);

CREATE TABLE orders (
    id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    customer_id integer NOT NULL REFERENCES customers(id),
    amount numeric(10,2) NOT NULL,
    status text NOT NULL DEFAULT 'pending',
    note text,
    created_at date NOT NULL DEFAULT CURRENT_DATE
);

INSERT INTO customers (name)
SELECT 'Customer ' || i
FROM generate_series(1, 2000) AS i;

INSERT INTO orders (customer_id, amount, status, note, created_at)
SELECT
    (random() * 1999 + 1)::int,
    (random() * 500 + 5)::numeric(10,2),
    (ARRAY['pending','shipped','delivered','cancelled'])[floor(random()*4+1)::int],
    CASE WHEN random() < 0.3 THEN 'Some note text here
[...]

A responsible role for AI in Open Source projects?
Posted by Alastair Turner in Percona on 2026-02-26 at 14:00

AI-driven pressure on open source maintainers, reviewers, and even contributors has been very much in the news lately. Nobody needs another set of edited highlights on the theme from me. For a Postgres-specific view, and insight into how low-quality AI output affects contributors, Tomas Vondra published a great post on his blog recently, which referenced an interesting talk by Robert Haas at PGConf.dev in Montreal last year. I won’t rehash the content here; they’re both quite quick reads and well worth the time.

The real cost of random I/O
Posted by Tomas Vondra on 2026-02-26 at 13:00

The random_page_cost parameter was introduced ~25 years ago, and it has been set to 4.0 by default since the very beginning. Storage has changed a lot since then, and so has the Postgres code. It’s likely the default no longer matches reality. But what value should you use instead? Flash storage is much better at handling random I/O, so maybe you should reduce the default? Some places go as far as recommending setting it to 1.0, the same as seq_page_cost. Is this intuition right?
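As a hedged illustration of the kind of experiment this question invites (table and column names are placeholders, and 1.1 is an example value, not a recommendation):

-- try a lower random_page_cost for the current session only
SET random_page_cost = 1.1;
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM some_table WHERE some_indexed_col = 42;
RESET random_page_cost;

Comparing the chosen plan and actual timings before and after the SET shows whether the planner’s preference for index scans changes on your own storage.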

Postgres JSONB Columns and TOAST: A Performance Guide
Posted by Paul Ramsey in Crunchy Data on 2026-02-25 at 15:05

Postgres has a big range of user-facing features that work across many different use cases — with complex abstraction under the hood. 

Working with APIs and arrays in the jsonb type has become increasingly popular recently, and storing pieces of application data using jsonb has become a common design pattern.

But why shred a JSON object into rows and columns and then rehydrate it later to send it back to the client?

The answer is efficiency. Postgres is most efficient when working with rows and columns, and hiding data structure inside JSON makes it difficult for the engine to go as fast as it might.

JSONB in Postgres

How does JSON work in a database like Postgres that is optimized for rows and columns?

Like the text, bytea and geometry types, the jsonb type is "variable length" — there is no limit to how big it can be.

Under the covers, the PostgreSQL database stores all data in fixed-size 8-KB pages. But how can a data type with no size limit, like jsonb, exist in a database with such a small fixed storage size limit?

It does this by using The Oversize Attribute Storage Technique, aka "TOAST."

Ordinarily, all the attributes in a row can fit inside a page.

But sometimes, one or more of the attributes are too big to fit.

Under the covers, PostgreSQL quietly cuts up the big attributes, puts them in a side table and replaces them in the original page with a unique identifier.

So even for large attributes, the user does not have to do anything special to store them. The database abstraction remains intact. Or does it?
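To see the mechanism at work, you can ask the catalogs where a table’s TOAST data lives, and compare how much space a value occupies as stored versus serialized. A sketch, with a hypothetical items table holding a jsonb column doc:

-- the side table PostgreSQL uses for oversized attributes
SELECT reltoastrelid::regclass AS toast_table
FROM pg_class
WHERE relname = 'items';

-- stored (possibly compressed or TOASTed) size vs. serialized text size
SELECT pg_column_size(doc) AS stored_bytes,
       octet_length(doc::text) AS text_bytes
FROM items
LIMIT 5;

When stored_bytes is much smaller than text_bytes, the value has been compressed or moved out of line.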

JSONB column types are convenient, but are they fast?

We are going to test JSONB performance for differently sized documents. The function generate_item_json will generate a jsonb object of arbitrary size.

The item_description can be expanded to make the object too big to fit on a page. This example generates a JSON object with a 40-byte description.

Here we create a 10,000-row table with four columns, a key, a name and price, and the o

[...]

February Meetup: slides and recording are available!
Posted by Henrietta Dombrovskaya on 2026-02-25 at 11:30

Thank you, Shaun, for presenting, and huge thanks to all participants for an engaging and productive discussion!

As always, I am glad that people from all over the world can join us virtually, but if you are local, consider coming next time! We have pizza, and you can’t consume it virtually!

Semantic Caching in PostgreSQL: A Hands-On Guide to pg_semantic_cache
Posted by Muhammad Aqeel in pgEdge on 2026-02-25 at 06:03

Your LLM application is probably answering the same question dozens of times a day. It just doesn't realize it because the words are different each time.

The Problem with Exact-Match Caching

If you're running an AI-powered application like a chatbot, a RAG pipeline, an analytics assistant, or others, you've likely added a cache to cut down on expensive LLM calls. Most caches work by matching the exact query string. Same string, cache hit. Different string, cache miss.

The trouble is that humans don't repeat themselves verbatim. These three queries all want the same answer:

A traditional cache sees three unique strings and triggers three separate LLM calls. In production AI applications, research shows that 40-70% of all queries are semantic duplicates: different words, same intent. That translates directly into wasted API calls, wasted latency, and a bloated cloud bill.

Semantic caching fixes this by matching on meaning instead of text. It uses vector embeddings to recognize that "Q4 revenue" and "last quarter's sales" are asking for the same thing, and serves the cached result in milliseconds instead of making another round trip to the LLM.

pg_semantic_cache is a PostgreSQL extension that brings this capability directly into your database. In this post, we'll set it up from scratch in a Docker container using pgEdge Enterprise Postgres 17 and walk through working examples you can run yourself.
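pg_semantic_cache's own API is set up later in the post; the underlying lookup can be sketched with plain pgvector, using a toy 3-dimensional embedding (real systems use hundreds or thousands of dimensions, and the table layout here is hypothetical):

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE llm_cache (
    query_text text,
    embedding  vector(3),
    response   text
);

-- nearest cached entry by cosine distance; serve it when the
-- distance falls under a chosen similarity threshold
SELECT response,
       embedding <=> '[0.12, 0.41, 0.33]'::vector AS distance
FROM llm_cache
ORDER BY distance
LIMIT 1;

The cache hit decision is just “is the nearest neighbour close enough?”, which is why embeddings of differently worded but same-intent queries land on the same cached response.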

What You'll Build

By the end of this post, you'll have:
  • A Docker container running pgEdge Enterprise Postgres 17 with pgvector and pg_semantic_cache.
  • A working semantic cache that matches queries by meaning.
  • Hands-on experience with caching and retrieval.
  • A clear understanding of how semantic similarity matching works in practice.

Setting Up the Environment

For our example, we'll use a Rocky Linux 9 container with pgEdge Enterprise Postgres 17, which bundles pgvector out of the box.

Dockerfile

First, we create a file called Dockerfile that defines the content of our container:

Bui

[...]

Create your PostgreSQL clusters with the "builtin" C collation!
Posted by Laurenz Albe in Cybertec on 2026-02-24 at 05:18

Finally rediscovered: the original design document for a stable database collation

  1. Communist Collation: all strings are equal (PostgreSQL = MySQL = Oracle)
  2. Composite Collation: order by number of characters (PostgreSQL > MySQL = Oracle)
  3. C Collation: byte-wise comparison of encoded strings (PostgreSQL > Oracle > MySQL)

A while ago, I wrote about the index corruption that you can get after an operating system upgrade, and recently I detailed how to keep the pain of having to rebuild indexes minimal. Since this is an embarrassing problem that keeps resurfacing, here is my recommendation on how to avoid the problem entirely by using the C collation.

Management summary of PostgreSQL's collation problem

For those who cannot be bothered to follow the links above: PostgreSQL by default uses the locale support provided by external libraries, either the C library or the ICU library. One aspect of locale is the collation, the rules to compare and sort strings. Upgrading the operating system will upgrade the C and ICU libraries. Sometimes such upgrades will change the collation rules. As a consequence, indexes on string expressions may suddenly end up sorted in the wrong way, which means index corruption.

To deal with the problem, you have to rebuild affected indexes after an operating system upgrade.

Why does PostgreSQL have this problem, when other databases don't?

There are two reasons:

  • Many other databases implement their own collations, so they can avoid breaking changes. When commit 5b1311acfb introduced locale support in 1997, it simply saved development effort by using an existing implementation. In retrospect, this may not have been the best decision.
  • Many other databases use the C collation by default, which avoids the problem altogether. PostgreSQL takes the default value for the locale from the environment of the shell where you run initdb to create the cluster. As a consequence, you often end up using a natural language collation with PostgreSQL.
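On PostgreSQL 17 and later, the builtin locale provider lets you pin this down explicitly when creating the cluster or a database. A sketch (paths and database name are placeholders):

# when creating the cluster:
initdb --locale-provider=builtin --builtin-locale=C -D /path/to/datadir

-- or per database:
CREATE DATABASE appdb
    TEMPLATE template0
    LOCALE_PROVIDER builtin
    BUILTIN_LOCALE 'C';

With the builtin provider, sort order no longer depends on the operating system's C or ICU libraries, so OS upgrades cannot silently corrupt your indexes.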

Why are natural language collations subject to changes, while the C collation is stable?

The pleasant simplicity of the C collation

The C collation is very simple: strings are compared byte by byte, and the numerical value determines the order. That means that a simple call to memcmp() can determine the sorting order of two strings

[...]

Row Locks With Joins Can Produce Surprising Results in PostgreSQL
Posted by Haki Benita on 2026-02-23 at 22:00

Here's a database riddle: you have two tables with data connected by a foreign key. The foreign key field is set as not null and the constraint is valid and enforced. You execute a query that joins these two tables and you get no results! How is that possible? We thought it wasn't possible, but a recent incident revealed an edge case we never thought about.

In this article I show how under some circumstances row locks with joins can produce surprising results, and suggest ways to prevent it.

Weird Join
Image by abstrakt design

The Problem

Imagine you work in the DMV and you are in charge of managing car ownership. You have two tables:

db=# CREATE TABLE owner (
    id int PRIMARY KEY,
    name text NOT NULL
);
CREATE TABLE

db=# CREATE TABLE car (
    id int PRIMARY KEY,
    owner_id int NOT NULL,
        CONSTRAINT car_owner_id_fk FOREIGN KEY (owner_id) REFERENCES owner(id)
);
CREATE TABLE

Add a car and some potential owners:

db=# INSERT INTO owner (id, name) VALUES
    (1, 'haki'),
    (2, 'jerry'),
    (3, 'george')
RETURNING *;
 id │  name
────┼────────
  1 │ haki
  2 │ jerry
  3 │ george
(3 rows)

INSERT 0 3

db=# INSERT INTO car (id, owner_id) VALUES(1, 1) RETURNING *;
 id │ owner_id
────┼──────────
  1 │        1
(1 row)

INSERT 0 1

You have three owners - "haki", "jerry" and "george", and a single car with id 1 which is currently owned by "haki".
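A minimal sketch of the pattern the article examines: a join combined with a row lock, issued in a single statement.

db=# BEGIN;
BEGIN
db=# SELECT car.id, owner.name
FROM car
JOIN owner ON owner.id = car.owner_id
WHERE car.id = 1
FOR UPDATE;

On its own this looks harmless: the foreign key guarantees every car has an owner. The surprise described above is that, under concurrent updates, a statement of this shape can come back with no rows at all.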

Changing Owner

[...]

Contributions for week 7, 2026
Posted by Cornelia Biacsics in postgres-contrib.org on 2026-02-23 at 08:48

New podcast episode “Why it's fun to hack on Postgres performance“ with Tomas Vondra published on February 20 2026 by Claire Giordano and Aaron Wislang from the series “Talking Postgres”.

Hyderabad PostgreSQL User Group met on February 20, organized by Hari Kiran, Ameen Abbas and Rajesh Madiwale.

Speakers:

  • Ameen Abbas
  • Soqra Banu(sughra) Rumi
  • Ashoka Reddy Tatiparthi
  • Vikas Gupta

The programme committee for PGConf.dev 2026 finalized a part of the conference schedule for Wednesday, Thursday and Friday.

  • Melanie Plageman (Microsoft)
  • Dilip Kumar (Google)
  • Jonathan Katz (Databricks)
  • Paul Ramsey (Snowflake)
  • Jacob Champion (EDB)

(Tuesday sessions will follow.)

PGConf.de 2026 talk selection committee met to finalize the session list:

  • Christoph Berg
  • Josef Machytka
  • Olga Kramer
  • Polina Bungina
  • Priyanka Chatterjee

Openclaw is Spam, Like Any Other Automated Email
Posted by Jeremy Schneider on 2026-02-23 at 01:23

Open Source communities are trying to quickly adapt to the present rapid advances in technology. I would like to propose some clarity around something that should be common sense.

Automated emails are spam. They always have been. Openclaw (and whatever new thing surfaces this summer) is no different.

Policies saying automated emails/messages are banned – including anything AI generated – are not only common-sense policies, they aren’t even a change from how we’ve always worked. This includes automated comments on github issues, automated PRs, automated patch submissions, and even any kind of automated review. Copilot automated reviews, snyk, etc. are OK if and only if they’re configured by the owners of the repo/project. Common sense.

Enforcement of these policies – more than ever – depends on trust and relationships. I do think, for example, that non-native-english-speakers should be allowed to use AI to help them check their english. Used responsibly, AI tools can help a lot with language learning! Your grammar checker is probably using some kind of LLM anyway. But I’m saying that a human always presses the “send” button on the message, and this human is responsible for the words they sent. If moderators suspect automated messages, every open source project should have a policy they can cite for blocking/banning the account.

Tomas Vondra’s article “the AI inversion” is the latest of many good and thought-provoking pieces I’ve read – it’s well worth the read – although he’s getting at deeper problems than what I’m writing about here – and he has very good reasons to have a much deeper level of concern for the impact of AI tooling on open source communities. These are interesting times and we don’t have all the answers yet.

A few more things I’ve recently read, which I think are good:

[...]

What Does INSERT 0 1 Actually Tell You?
Posted by Lætitia AVROT on 2026-02-23 at 00:00
If you’ve ever run an INSERT statement in a terminal or an IDE, you’ve seen it: the cryptic INSERT 0 1 message. While it looks like a bit of ancient binary, it’s actually a precise status report from the database engine.

The Anatomy of a Command Tag

In PostgreSQL, every successful command returns a “Command Tag.” For an insertion, the format is: INSERT [oid] [rows]. The “0” (oid): Historically, Postgres could assign an internal Object ID to every row.
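The row count is the part that varies; a quick psql session with a throwaway table makes the format visible:

db=# CREATE TABLE t (val int);
CREATE TABLE
db=# INSERT INTO t VALUES (1);
INSERT 0 1
db=# INSERT INTO t SELECT generate_series(1, 5);
INSERT 0 5

The oid slot stays 0 on modern servers: tables WITH OIDS, the only case where it was ever nonzero, were removed in PostgreSQL 12.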

Upgrading PostgreSQL in place on FreeBSD
Posted by Dan Langille on 2026-02-22 at 16:11

I’ve updated one of my PostgreSQL instances to PostgreSQL 18; it’s time to update the others. This time, I’m going to try pg_upgrade. My usual approach is pg_dump and pg_restore.

As this is my first attempt doing this, I’m posting this mostly for future reference when I try this again. There will be another blog post when I try this again. Which should be soon. This paragraph will link to that post when it is available.

In this post:

  • FreeBSD 15.0
  • PostgreSQL 16.12 (pg03)
  • PostgreSQL 18.2 (pg02)

The names in (brackets) are the names of the jail in question.

If you’re upgrading in place, and not copying data around like me, skip down until you see Saving the old binaries.

I’m reading http://www.unibia.com/unibianet/freebsd/upgrading-between-major-versions-postgresql-freebsd and thinking this might work well for me.

The overview of upgrade-in-place

The PostgreSQL upgrade-in-place needs these main parts:

  1. The old binaries (e.g. postgresql16-server-16.12.pkg)
  2. The new binaries (postgresql18-server-18.2.pkg)
  3. The old data (/var/db/postgres/data16)

Keep that in mind as I go through this. We can’t install both packages at once, so we’ll untar the old package into a safe location.

How you get that package: up to you. Try /var/cache/pkg, or the FreeBSD package servers, or (while you still have the old package), run pkg create postgresql16-server (for example).
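With both sets of binaries in place, the upgrade itself is a single pg_upgrade invocation. A sketch with hypothetical paths (the untarred old package and the installed new one):

# dry run first; drop --check for the real upgrade
/usr/local/bin/pg_upgrade \
    -b /tmp/pg16/usr/local/bin \
    -B /usr/local/bin \
    -d /var/db/postgres/data16 \
    -D /var/db/postgres/data18 \
    --check

Here -b/-B point at the old and new binaries and -d/-D at the old and new data directories, matching the three parts listed above.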

My data

Ignore this section if you have the data. For me, I’m testing this process, and I’m documenting this part here.

This is how the data is laid out. My idea: snapshot line 7 and use it in line 12.

[18:23 r730-01 dvl ~] % zfs list | grep pg
data02/jails/pg01                                                   34.9G   175G  10.8G  /jails/pg01
data02/jails/pg02                                                   12.7G   175G  11.6G  /jails/pg02
data02/jails/pg03                                                   11.5G   175G  10.8G  /jails/pg03
data03/pg01                                   
[...]

Waiting for PostgreSQL 19 – Allow log_min_messages to be set per process type
Posted by Hubert 'depesz' Lubaczewski on 2026-02-21 at 19:05
On 9th of February 2026, Álvaro Herrera committed patch: Allow log_min_messages to be set per process type   Change log_min_messages from being a single element to a comma-separated list of type:level elements, with 'type' representing a process type, and 'level' being a log level to use for that type of process. The list must also … Continue reading "Waiting for PostgreSQL 19 – Allow log_min_messages to be set per process type"

Waiting for PostgreSQL 19 – psql: Add %i prompt escape to indicate hot standby status.
Posted by Hubert 'depesz' Lubaczewski on 2026-02-21 at 14:51
On 3rd of February 2026, Fujii Masao committed patch: psql: Add %i prompt escape to indicate hot standby status.   This commit introduces a new prompt escape %i for psql, which shows whether the connected server is operating in hot standby mode. It expands to standby if the server reports in_hot_standby = on, and primary … Continue reading "Waiting for PostgreSQL 19 – psql: Add %i prompt escape to indicate hot standby status."

Per-worker, and global, IO bandwidth in explain plans
Posted by Hubert 'depesz' Lubaczewski on 2026-02-21 at 13:24
Jeremy Schneider suggested a change to how plans are displayed – adding another bit of information in case we have timing information for IO for an explain node. Took me a while to research, but it finally made its way… Let's consider this simple plan. In its Parallel Seq Scan node we see: -> Parallel Seq … Continue reading "Per-worker, and global, IO bandwidth in explain plans"

Fixing ORM Slowness by 80% with Strategic PostgreSQL Indexing
Posted by Hamza Sajawal in Stormatics on 2026-02-20 at 11:41

Modern applications heavily rely on ORMs (Object-Relational Mappers) for rapid development. While ORMs accelerate development, they often generate queries that are not fully optimized for database performance. In such environments, database engineers have limited control over query structure, leaving indexing and database tuning as the primary performance optimization tools.

In this article, I’ll share how we improved PostgreSQL performance dramatically for one of our enterprise customers by applying strategic indexing techniques, without modifying application queries.

The Challenge: High Read IOPS and Slow Query Performance

One of our customers experienced severe performance degradation, including:

  • High Read IOPS on the database
  • Slow page loads and delayed reports
  • Increasing database load during peak hours

After analyzing PostgreSQL configuration parameters, we confirmed that:

  • Memory parameters were properly tuned
  • Autovacuum was functioning correctly
  • Hardware resources were sufficient

However, performance issues persisted.

Since the application relied entirely on ORM-generated queries, rewriting queries was not an option. We needed a solution at the database level.

Root Cause Analysis: Excessive Sequential Scans

We analyzed PostgreSQL statistics using:

  • pg_stat_user_tables
  • pg_stat_user_indexes
  • pg_constraint
  • pg_index

We discovered extremely high sequential scans on large tables—some exceeding 41 million scans.

Sequential scans on large tables significantly increase disk I/O and slow query execution.
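The starting point was a query along these lines against pg_stat_user_tables (a sketch, not the exact query we ran):

SELECT relname,
       seq_scan,
       seq_tup_read,
       idx_scan
FROM pg_stat_user_tables
ORDER BY seq_scan DESC
LIMIT 10;

Tables with a huge seq_scan count and a comparatively tiny idx_scan count are the prime candidates for missing indexes.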

The primary reason: Missing indexes on foreign key columns and freq

[...]

The AI inversion
Posted by Tomas Vondra on 2026-02-20 at 10:00

If you attended FOSDEM 2026, you probably noticed discussions on how AI impacts FOSS, mostly in detrimental ways. Two of the three keynotes in Janson mentioned this, and I assume other speakers mentioned the topic too. Moreover, it was a very popular topic in the “hallway track.” I myself chatted about it with multiple people, both from the Postgres community and outside of it. And the experience does not seem great …

Inside PostgreSQL's 8KB Page
Posted by Radim Marek on 2026-02-19 at 21:45

If you read the previous post about buffers, you already know PostgreSQL might not necessarily care about your rows. You might be inserting a user profile, or retrieving payment details, but all that Postgres works with are blocks of data. 8KB blocks, to be precise. You want to retrieve one tiny row? PostgreSQL hauls an entire 8,192-byte page off the disk just to give it to you. You update a single boolean flag? Same thing. The 8KB page is THE atomic unit of I/O.

But knowing those pages exist isn't enough. To understand why the database behaves the way it does, you need to understand how it works. Every time you execute INSERT, PostgreSQL needs to figure out how to fit it into one of those 8,192-byte pages.

The buffer pool caches them, Write-Ahead Log (WAL) protects them, and VACUUM cleans them. The deep dive into the PostgreSQL storage internals starts by understanding what happens inside those 8KB pages. Pages that are used by PostgreSQL to organize all data - tables, indexes, sequences, TOAST relations.

The 8KB

Before looking inside, we actually need to discuss why 8KB in the first place. The answer might be surprising: it's a setting that has survived for over 40 years, and nobody has found a good reason to change it. PostgreSQL isn't the only one that thinks that way; Oracle and SQL Server use the same exact number. (In the case of Oracle, the default block size is set at database creation via DB_BLOCK_SIZE, though tablespaces with non-standard block sizes can be created separately.)

The 8KB page size can be traced back to the original Berkeley POSTGRES project, created in the mid-1980s. In those times Unix systems typically used 4KB or 8KB virtual memory pages, and disk sectors were 512 bytes. Choosing 8KB meant a single database page mapped cleanly to OS memory pages and aligned well with filesystem I/O.

And the math still works today. Modern Linux kernels manage memory in 4KB virtual memory pages. SSDs now use 4KB physical sectors instead of 512 bytes. The default filesystem block size on ext4 and XFS is 4KB.
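You can confirm the page size and peek at a page header from SQL; a sketch using the pageinspect extension (the table name is hypothetical):

SHOW block_size;   -- 8192 on a default build

CREATE EXTENSION IF NOT EXISTS pageinspect;

-- header fields of the first page of a table
SELECT lower, upper, special, pagesize
FROM page_header(get_raw_page('orders', 0));

The lower and upper pointers bound the free space in the middle of the page, which is where every new row has to fit.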

[...]

Building Ask Ellie: A RAG Chatbot Powered by pgEdge
Posted by Dave Page in pgEdge on 2026-02-19 at 07:59

If you've visited the pgEdge documentation site recently, you may have noticed a small elephant icon in the bottom right corner of the page. That's Ask Ellie: our AI-powered documentation assistant, built to help users find answers to their questions about pgEdge products quickly and naturally. Rather than scrolling through pages of documentation, you can simply ask Ellie a question and get a contextual, accurate response drawn directly from our docs.

What makes Ellie particularly interesting from an engineering perspective is that she's built on PostgreSQL and pgEdge's ecosystem of extensions and tools, and she serves as both a useful tool for our users and a real-world demonstration of what you can build on top of PostgreSQL when you pair it with the right components. In this post, I'll walk through how we built her and the technologies that power the system.

The Architecture at a Glance

At its core, Ask Ellie is a Retrieval Augmented Generation (RAG) chatbot. For those unfamiliar with the pattern, RAG combines a traditional search step with a large language model to produce answers that are grounded in actual source material, rather than relying solely on the LLM's training data. This is crucial for a documentation assistant, because we need Ellie to give accurate, up-to-date answers based on what's actually in our docs, not what the model happens to remember from its training set.

The architecture breaks down into several layers:
  • Content ingestion: crawling and loading documentation into PostgreSQL
  • Embedding and chunking: automatically splitting content into searchable chunks and generating vector embeddings
  • Retrieval and generation: finding relevant chunks for a user's query and generating a natural language response
  • Frontend: a chat widget embedded in the documentation site that streams responses back to the user

Let's look at each of these in turn.

Loading the Documentation

The first challenge with any RAG system is getting your content into a form[...]

Hacking Workshop for March 2026
Posted by Robert Haas in EDB on 2026-02-19 at 02:43

For next month's hacking workshop, I'm scheduling 2 or 3 discussions of Tomas Vondra's talk, Performance Archaeology, given at 2024.PGConf.EU. If you're interested in joining us, please sign up using this form and I will send you an invite to one of the sessions. Thanks as always to Tomas for agreeing to attend the sessions.

Read more »

PostgreSQL minor release postponed in Q1’ 2026
Posted by Jan Wieremjewicz in Percona on 2026-02-18 at 11:00

In case you are awaiting the February PostgreSQL Community minor update, released as planned on February 12, we want to make sure that our users and customers are up to date and aware of what to expect.

This scheduled PostgreSQL release was delivered by the PostgreSQL Community on time and carried 5 CVE fixes and over 65 bug fixes.

Unfortunately, shortly after, the release team announced that an additional out-of-cycle release is planned for February 26. This follow-up release addresses two regressions identified in the February 12 update.

Lessons Learned Writing an MCP Server for PostgreSQL
Posted by Dave Page in pgEdge on 2026-02-18 at 06:44

Over the past few months or so, we've been building the pgEdge Postgres MCP Server, an open source tool that lets LLMs talk directly to PostgreSQL databases through the Model Context Protocol. It supports Claude, GPT, local models via Ollama, and pretty much any MCP-compatible client you can throw at it. Along the way, we've learned quite a lot about what it takes to make AI and databases work well together, and the single biggest lesson has been about tokens.

If you've used an LLM for any length of time, you'll know that context windows are finite and tokens cost money. When you're working with a database, however, the problem becomes acute in a way that catching up on email or writing prose simply doesn't prepare you for. A single  on a modest table can return tens of thousands of rows, each with a dozen columns, and every character of that output consumes tokens. Multiply that across a conversation where the LLM is exploring a schema, running queries, and refining its understanding, and you can burn through a context window before anything genuinely useful has been accomplished.

This post covers the strategies we developed to keep token usage under control whilst still giving the LLM enough information to be helpful. If you're building an MCP server, or just curious about the practicalities of connecting LLMs to structured data, I hope some of these lessons will save you a few wrong turns.

Choosing the right output format for tabular data

When we first built the tool, we returned results as JSON. It seemed like the obvious choice, since every LLM understands JSON and it's what most APIs speak natively. The problem became apparent almost immediately: JSON is extraordinarily wasteful for tabular data.

Consider a simple query returning employee records. In JSON, every single row repeats the column names as keys, wraps every value in quotes, and adds colons, commas, and braces as structural overhead. For a table with four columns, a ten-row result might look something like this:

Every row carries the full[...]
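The overhead is easy to measure inside Postgres itself. The following sketch is illustrative only (the emp CTE and its columns are invented for the demonstration): it serializes the same ten rows once as a JSON array and once as CSV-style lines, then compares the byte counts. The JSON payload comes out several times larger, purely because every row repeats the key names.

```sql
-- Compare payload sizes for the same ten rows serialized two ways.
WITH emp AS (
    SELECT g                 AS id,
           'Employee ' || g  AS name,
           'Engineering'     AS department,
           50000 + g * 1000  AS salary
    FROM generate_series(1, 10) AS g
)
SELECT (SELECT length(json_agg(emp)::text) FROM emp) AS json_bytes,
       (SELECT length(string_agg(concat_ws(',', id, name, department, salary), E'\n'))
          FROM emp)                                   AS csv_bytes;
```

Bytes are not tokens, of course, but tokenizers charge for structural characters too, so the ratio is a reasonable first approximation of the token savings.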

PostgreSQL 19: part 2 or CommitFest 2025-09
Posted by Pavel Luzanov in Postgres Professional on 2026-02-18 at 00:00

We continue our series of articles reviewing changes in PostgreSQL 19. This time we'll look at what emerged from the September 2025 CommitFest.

The highlights from the first July CommitFest are available here: 2025-07.

  • GROUP BY ALL
  • Window functions: NULL value handling
  • Event triggers in PL/Python
  • More precise error message for incorrect routine parameter names
  • random: random date/time within a specified range
  • base64url format for encode and decode functions
  • New debug_print_raw_parse parameter
  • The log_lock_waits parameter is now enabled by default
  • pg_stat_progress_basebackup: backup type
  • vacuumdb: collecting statistics on partitioned tables
  • Buffer cache: using clock-sweep algorithm to find free buffers
  • Fake table aliases in queries

...

pg_clickhouse v0.1.4
Posted by David Wheeler on 2026-02-17 at 22:24

Just a quick post to note the release of pg_clickhouse v0.1.4. This v0.1 maintenance release can be upgraded in-place and requires no ALTER EXTENSION UPDATE command; as soon as sessions reload the shared library they’ll be good to go.

Thanks in part to reports from attentive users, v0.1.4’s most significant changes improve the following:

  • The binary driver now properly inserts NULL into a Nullable(T) column. Previously it would raise an error.
  • The http driver now properly parses arrays. Previously it improperly included single quotes in string items and would choke on brackets ([]) in values.
  • Both drivers now support mapping ClickHouse String types to Postgres BYTEA columns. Previously they worked only with text types, which is generally preferred. But since ClickHouse explicitly supports binary data in String values (notably hash function return values), pg_clickhouse needs to support it as well.

Get it in all the usual places:

My thanks to pg_clickhouse users like Rahul Mehta for reporting issues, and to my employer, ClickHouse, for championing this extension. Next up: more aggregate function mapping, hash function pushdown, and improved subquery (specifically, SubPlan) pushdown.

PostgreSQL 19: part 1 or CommitFest 2025-07
Posted by Pavel Luzanov in Postgres Professional on 2026-02-17 at 00:00

We’re launching a new series of articles covering the changes coming up in PostgreSQL 19. This first article focuses on the events from last summer’s July CommitFest.

  • Connection service file in libpq parameter and psql variable
  • regdatabase: a type for database identifiers
  • pg_stat_statements: counters for generic and custom plans
  • pg_stat_statements: FETCH command normalization
  • pg_stat_statements: normalizing commands with parameter lists in IN clauses
  • EXPLAIN: Memoize node estimates
  • btree_gin: comparison operators for integer types
  • pg_upgrade: optimized migration of large objects
  • Optimized temporary table truncation
  • Planner: incremental sort in Append and MergeAppend nodes
  • Domain constraint validation no longer blocks DML operations
  • CHECKPOINT command parameters
  • COPY FROM: skipping initial rows
  • pg_dsm_registry_allocations: dynamic shared memory (DSM) usage

...

pg_background: make Postgres do the long work (while your session stays light)
Posted by Vibhor Kumar on 2026-02-16 at 16:25

There’s a special kind of joy in watching a database do something heavy… without making your app threads cry.

That’s the promise of pg_background: execute SQL asynchronously in background worker processes inside PostgreSQL, so your client session can move on—while the work runs in its own transaction. 

It’s a deceptively simple superpower:

  • Kick off a long-running query (or maintenance) without holding the client connection open
  • Run “autonomous transaction”-style side effects (commit/rollback independent of the caller)
  • Monitor, wait, detach, or cancel explicitly
  • Keep the operational model “Postgres-native” instead of adding another job system  

What pg_background is (and what it isn’t)

pg_background enables PostgreSQL to execute SQL commands asynchronously in dedicated background worker processes. Unlike approaches such as opening another connection (e.g., dblink) or doing client-side async orchestration, these workers run inside the server with local resources and their own transaction scope. 

It’s not a full-blown scheduler. It’s not a queueing platform. It’s a sharp tool: “run this SQL over there, and let me decide how to interact with it.”
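In practice the workflow is only a couple of function calls. A minimal sketch follows (the table name and PID are illustrative, and exact signatures may vary between pg_background versions): launch returns the background worker's PID, and the result function blocks until the worker finishes.

```sql
-- Launch a command in a dedicated background worker; returns the worker's PID.
SELECT pg_background_launch('VACUUM ANALYZE orders');

-- The launching session is free immediately. Later, wait for the worker
-- and collect its output (a column definition list is required).
SELECT * FROM pg_background_result(12345) AS (result text);
```

Because the worker has its own transaction, the VACUUM proceeds even if the launching session goes on to do entirely unrelated work.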

Why you’d want this in production

I’m opinionated here: most teams don’t need more moving parts. They need fewer, with better failure modes.

pg_background shines when you need:

  • Non-blocking operations: launch long queries without pinning client connections  
  • Autonomous transactions: commit/rollback independently of the caller’s transaction  
  • Resource isolation & safer cleanup: worker lifecycle is explicit (launch / result / detach / cancel / wait)  
  • Server-side observability: list workers and see state/errors (v2)  

Common production patterns:

  • background VACUUM / ANALYZE / REINDEX
  • async backfills or data repairs
  • fire-and-forget audit/outbox writes
  • “do the expensive part later” workflows (especially in OLTP-heavy apps)
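The fire-and-forget pattern can be a single statement: launch the write in a worker and detach immediately, so it commits in its own transaction even if the caller later rolls back. A hedged sketch (audit_log is a hypothetical table; the launch/detach calls are those the extension exposes):

```sql
-- Autonomous audit write: runs and commits in its own worker transaction.
SELECT pg_background_detach(
    pg_background_launch(
        $$INSERT INTO audit_log(event, at) VALUES ('order.created', now())$$
    )
);
```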

The big deal: v2 API (recomme

[...]

The MCD: Your Rosetta Stone for Turning "We Need a Database" into Actual Requirements
Posted by Lætitia AVROT on 2026-02-16 at 00:00
Part 2 of the MERISE series

New here? Part 1 covers why MERISE exists and how it differs from ER modeling. Short version: it’s a conversation framework that forces business stakeholders to articulate their actual rules before you write a single line of SQL.

Last time, I introduced MERISE and why this French methodology kicks ass for database modeling. Today, we’re diving into the first and most crucial step: the Conceptual Data Model (MCD).

