Planet PostgreSQL

Jimmy Angelakos: FOSDEM 2026 — Defining "Drop-in Replacement" and Beyond

Fri, 06 Feb 2026 18:00:00 +0000

Back from Brussels where I was doing the annual pilgrimage to the awesome FOSDEM gathering. I was very pleased to see the popularity and positive vibe of the (first time) joint Databases Devroom. Community-oriented and community-run conferences are the best IMHO.

It was great to share the stage this time with Daniël van Eeden, an engineer from PingCAP and a MySQL Rockstar. I enjoyed the collaboration because we approached a thorny issue from two different aspects: the PostgreSQL emerging standard and the implementation of MySQL compatibility.

The Talk: "Drop-in Replacement"

Our presentation, "Drop-in Replacement: Defining Compatibility for Postgres and MySQL Derivatives", tackled a problem in our industry: the "wild west" of marketing claims. The success of open source databases has created an ecosystem of derivatives claiming "drop-in compatibility."

The reality, however, is that this often leads to user confusion and brand dilution. As we discussed, compatibility is not an absolute Yes/No situation—even different versions of the same database are not 100% compatible due to deprecated or added features.

The Standard: The Riga Consensus

In my section of the talk, I focused on the governance perspective. I presented the findings from the "Establishing the PostgreSQL Standard" working group held at PGConf.EU 2025 in Riga last October.

We are pivoting from a binary "Pass/Fail" certification to a granular compatibility matrix. We need to ensure that when someone says "Postgres Compatible," they don't just mean matching the wire protocol. We need to look at:

Core SQL & Implicit Behaviours: It's not just about functions; it's about undocumented behaviors users rely on, like how INSERT ... SELECT ... ORDER BY behaves.
System Catalogs: Monitoring tools rely on the pg_catalog being present and predictable.
No Silent Failures: A command like CREATE INDEX must actually build the index, not just return "success" while doing nothing.
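
As a rough illustration of what a "No Silent Failures" check could look like in a test harness, the sketch below creates an index and then asks the catalog whether it really exists. The table and index names are hypothetical, and this is only one possible check, not something taken from the working group's matrix.

```sql
-- Hypothetical table and index names, for illustration only.
CREATE TABLE compat_probe (id integer, payload text);
INSERT INTO compat_probe
SELECT i, 'row ' || i FROM generate_series(1, 1000) AS i;
CREATE INDEX compat_probe_id_idx ON compat_probe (id);

-- On PostgreSQL the index should appear in pg_index, be marked valid
-- and ready, and occupy at least one page on disk.
SELECT c.relname,
       i.indisvalid,
       i.indisready,
       pg_relation_size(c.oid) AS index_bytes
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
WHERE c.relname = 'compat_probe_id_idx';
```

A derivative that accepts CREATE INDEX but returns no row here, or a zero-byte index, would be a candidate for the "silent failure" column of the matrix.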

The Implementation: The TiDB Experience

Daniël provided a very interesting look at the other side of the coin: the engineering reality of maintaining compatibility in TiDB, a distributed database written in Go.

He highlighted the "architectural friction" involved in making a distributed engine speak MySQL. For example, TiDB accepts the ENGINE=InnoDB syntax to remain compatible, but silently ignores it because it uses TiKV (RocksDB) for storage. He also showed how "Explain" formats can diverge significantly because distributed query plans simply look different than local MySQL plans.

Watch the Talk

We had a great turnout and some excellent engagement from audience members after the talk. If you missed it, you can watch the recording below:

Video on YouTube: youtube.com/watch?v=MXkdJH_ztpA
Link to talk slides: vyruss.org/computing/slides/fosdem2026_drop_in_replacement.pdf

Next Stop: Vancouver 🇨🇦

The conversation doesn't stop here. I am very excited to announce that we are taking this work to the next level at PGConf.dev 2026 in Vancouver!

Our session, "Establishing the PostgreSQL standard: What's Postgres compatible?", has been confirmed for the Community Open Discussion track.

As PostgreSQL becomes "the new Linux" for the enterprise, defining compatibility is critical. Building upon the foundations already set in Riga and Brussels, we will be continuing work on the granular compatibility matrix and test harness.

If you are going to be in Vancouver, please join us! We aim to leave the session with refined criteria, progress reports, and next steps for continued collaboration.

Check out current goings-on at the PostgreSQL Wiki: Establishing the PostgreSQL standard — What is Postgres compatible.

Henrietta Dombrovskaya: Prague PostgreSQL Dev Day – a very late follow up

Fri, 06 Feb 2026 16:00:25 +0000

Everyone who was in Prague on January 27-28 has already posted their feedback and moved on, so I am late, as it often happens. However, I still maintain that better late than never!

This year was the second time I attended this event, and this time, I didn’t have to rush out immediately after my training session, and was able to stay longer and to connect with many attendees. Same as last time, I was very impressed with the whole event organization, and a very warm and welcoming atmosphere at the event. Many thanks to Tomas Vondra!

I delivered the same workshop I did last year. Last year, I ran out of time despite my best efforts, and since I hate to be that presenter who doesn’t keep track of time and then rushes through the last twenty slides, I removed in advance the content I presumed I wouldn’t have time to cover. It looks like I overdid it a little bit, because I ended up finishing early, but I think that’s still better than finishing late.

My usual lament about these training sessions is gender uniformity, and I still do not know what the right solution for this problem is!

Also, many thanks to Gülçin Yıldırım Jelínek for extending my 63rd birthday celebration for another week.

As often happens, my only regret is that there were so many interesting talks happening at the same time! I could have avoided FOMO if I had checked the Nordic PG schedule earlier, because some of the talks will be replayed there. I could have planned it better! But in any case, I had a great time.

Radim Marek: Reading Buffer statistics in EXPLAIN output

Fri, 06 Feb 2026 15:00:49 +0000

In the article about Buffers in PostgreSQL we kept adding EXPLAIN (ANALYZE, BUFFERS) to every query without giving much thought to the output. Time to fix that. PostgreSQL breaks down buffer usage for each plan node, and once you learn to read those numbers, you'll know exactly where your query spent time waiting for I/O - and where it didn't have to. That's about as fundamental as it gets when diagnosing performance problems.

PostgreSQL 18: BUFFERS by Default
Starting with PostgreSQL 18, EXPLAIN ANALYZE automatically includes buffer statistics - you no longer need to explicitly add BUFFERS. The examples below use the explicit syntax for compatibility with older versions, but on PG18+ a simple EXPLAIN ANALYZE gives you the same information.

A complete example

For this article we will use the following schema and seeded data.

CREATE TABLE customers (
    id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name text NOT NULL
);

CREATE TABLE orders (
    id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    customer_id integer NOT NULL REFERENCES customers(id),
    amount numeric(10,2) NOT NULL,
    status text NOT NULL DEFAULT 'pending',
    note text,
    created_at date NOT NULL DEFAULT CURRENT_DATE
);

INSERT INTO customers (name)
SELECT 'Customer ' || i
FROM generate_series(1, 2000) AS i;

-- seed data: ~100,000 orders spread across 2022-2025
INSERT INTO orders (customer_id, amount, status, note, created_at)
SELECT
    (random() * 1999 + 1)::int,
    (random() * 500 + 5)::numeric(10,2),
    (ARRAY['pending','shipped','delivered','cancelled'])[floor(random()*4+1)::int],
    CASE WHEN random() < 0.3 THEN 'Some note text here for padding' ELSE NULL END,
    '2022-01-01'::date + (random() * 1095)::int  -- ~3 years of data
FROM generate_series(1, 100000);

-- make sure stats are up to date
ANALYZE customers;
ANALYZE orders;

-- we are going to skip indexes on purpose

-- and fire a sample query (this also pulls the customers pages into shared buffers)
select count(1) from customers;

Let's start with a random query

EXPLAIN (ANALYZE, BUFFERS)
SELECT o.*, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.created_at > '2024-01-01';

and its output.

                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=58.00..2253.87 rows=33784 width=71) (actual time=0.835..26.695 rows=33239.00 loops=1)
   Hash Cond: (o.customer_id = c.id)
   Buffers: shared hit=13 read=857
   ->  Seq Scan on orders o  (cost=0.00..2107.00 rows=33784 width=58) (actual time=0.108..18.106 rows=33239.00 loops=1)
         Filter: (created_at > '2024-01-01'::date)
         Rows Removed by Filter: 66761
         Buffers: shared read=857
   ->  Hash  (cost=33.00..33.00 rows=2000 width=17) (actual time=0.697..0.698 rows=2000.00 loops=1)
         Buckets: 2048  Batches: 1  Memory Usage: 118kB
         Buffers: shared hit=13
         ->  Seq Scan on customers c  (cost=0.00..33.00 rows=2000 width=17) (actual time=0.007..0.231 rows=2000.00 loops=1)
               Buffers: shared hit=13
 Planning:
   Buffers: shared hit=130 read=29 dirtied=3
 Planning Time: 1.585 ms
 Execution Time: 28.067 ms
(16 rows)

As that's quite a bit of information, let's break it down by individual categories.

Shared buffers: hit, read, dirtied & written

As described in the previous article, these are the most common buffer statistics you will see.

shared hit is the number of pages found in shared buffers (i.e. cached). This is the fast path, where no disk I/O is required. Higher is better for performance.

shared read is the number of pages that were not in shared buffers and had to be fetched from disk (or the OS cache). Each one of them adds potential I/O latency.

shared dirtied is the number of pages modified by this query. The query changed data that was already cached (in the buffer pool), and those pages will eventually need to be written to disk.

If you see a SELECT that shows dirtied pages, it's not a bug. PostgreSQL sets hint bits and prunes HOT chains during reads - the first backend to read a page after writes will dirty it. Normal behavior, not a problem.

shared written is the number of pages written to disk during query execution. As a reminder, this happens when the query needs buffer space and has to evict dirty pages synchronously. If you see this repeatedly during a SELECT, it might be a warning sign - your background writer is not keeping up as it should.

Let's have a look at our query's top-level buffer stats:

Buffers: shared hit=13 read=857

Only 13 pages were cached in shared buffers, while 857 had to be fetched from disk (or OS cache). No pages were dirtied or written - expected for a pure SELECT with no side effects.

But where did those 13 hits come from? The breakdown by node tells us:

->  Seq Scan on orders o
      Buffers: shared read=857
->  Seq Scan on customers c
      Buffers: shared hit=13

The customers table (small in this case - 2,000 rows, 13 pages) was fully cached - either because it is accessed frequently or, as in our case, because it was accessed recently. The orders table (100,000 rows, 857 pages) had zero hits - every single page required I/O. This is typical after a restart or when scanning a table that doesn't fit comfortably in shared buffers.
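
To look at the same hit/read split per table rather than per plan node, one option is the pg_statio_user_tables view. This is a minimal sketch: the counters are cumulative since the last statistics reset, so they describe the whole workload and not the single query above.

```sql
-- Cumulative heap block counters per table since the last stats reset.
SELECT relname,
       heap_blks_hit,
       heap_blks_read,
       round(100.0 * heap_blks_hit /
         nullif(heap_blks_hit + heap_blks_read, 0), 1) AS heap_hit_pct
FROM pg_statio_user_tables
WHERE relname IN ('customers', 'orders');
```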

Interpreting the ratio

In the context of this article we're going to consider the ratio between shared hit and the total number of buffers processed. Is there a perfect ratio you should strive for? As we will demonstrate, there's no such universal value.

Let's calculate it for our query:

hit_ratio = shared hit / (shared hit + shared read)
          = 13 / (13 + 857)
          = 1.5%
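
The same arithmetic can be done directly in psql. This is just the formula above with the numbers from our Buffers line plugged in; nullif guards against division by zero when a node touched no shared buffers.

```sql
-- Numbers taken from "Buffers: shared hit=13 read=857".
SELECT round(100.0 * 13 / nullif(13 + 857, 0), 1) AS hit_pct;  -- 1.5
```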

In an OLTP workload, the same small set of rows gets accessed over and over - fetch a customer by ID, look up an order by reference number, check inventory for a product. The working set is a small fraction of the total database. These queries touch a handful of pages each, and those pages stay hot in shared buffers because they keep getting requested. A well-tuned OLTP system naturally converges toward high hit ratios - not because someone set a target, but because the access pattern keeps the relevant data cached.

Honestly, our 1.5% looks terrible. If this were the ratio for most of your OLTP queries, you could easily say there's a problem. But this was run on a freshly loaded dataset with cold caches - every page of the orders table had to be fetched for the first time. Run the same query again and you'll likely see most of those 857 reads become hits as shared buffers and the OS page cache warm up. In a test environment (where nothing else runs) you will most likely hit 100%.

What matters is the hit ratio per query, tracked over time, compared to its own baseline:

A reporting query scanning a large date range might consistently show 10-30% hit ratio. That's fine - it's expected to touch cold data.
A query serving your login page should be near 100%. If it drops to 80%, something changed - maybe the table grew, an index was rebuilt, or shared_buffers is under pressure from a new workload.
A query that ran at 95% hit ratio last week and now runs at 40% deserves investigation, regardless of whether 40% sounds "good" or "bad" in isolation.

The ratio is a diagnostic tool, not a scorecard. Use it to spot regressions, compare before-and-after when tuning, and understand where your query's time is actually going. A low ratio paired with high execution time points you toward I/O as the bottleneck. A high ratio with high execution time tells you to look elsewhere - maybe CPU, maybe row count, maybe a bad plan.

Context matters more than absolute numbers. Compare similar queries over time, not arbitrary benchmarks.

Local buffers

Local buffers track I/O for temporary tables. Unlike regular tables that live in shared buffers, temp tables use per-backend memory - each connection gets its own local buffer pool, controlled by the temp_buffers setting.

CREATE TEMP TABLE temp_large_orders AS
SELECT o.id, o.amount, o.status, o.created_at, c.name AS customer_name
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.amount > 200;

EXPLAIN (ANALYZE, BUFFERS)
SELECT status, count(*), sum(amount)
FROM temp_large_orders
GROUP BY status;

The first thing you should notice is that there are no shared buffers at all, at least in the execution phase. The entire query ran against local buffers because temp tables are invisible to other backends.

                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=1281.60..1284.10 rows=200 width=72) (actual time=24.659..24.661 rows=4.00 loops=1)
   Group Key: status
   Batches: 1  Memory Usage: 32kB
   Buffers: local hit=576
   ->  Seq Scan on temp_large_orders  (cost=0.00..979.20 rows=40320 width=48) (actual time=0.009..5.965 rows=60731.00 loops=1)
         Buffers: local hit=576
 Planning:
   Buffers: shared hit=36 read=5
 Planning Time: 0.294 ms
 Execution Time: 24.708 ms
(10 rows)

The values that might be reported are local hit/read, with the same meaning as their shared counterparts, but for temp tables in the per-backend buffer pool.

The other values are local dirtied/written, representing temp table modifications. "Dirtied" means the query modified pages in the local buffer pool. "Written" means dirty pages had to be flushed to disk to make room for new ones - the same clock-sweep eviction mechanism as shared buffers, but against the local buffer pool. Unlike shared buffers, temp table writes don't generate WAL and aren't subject to checkpointing.

In practice, local written is rare to see - PostgreSQL handles temp table overflow efficiently enough that you're unlikely to encounter it unless your temp_buffers is severely undersized relative to your temp table workload.
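
If you do run into local written, the setting to look at is temp_buffers. A minimal sketch follows; the 64MB value is an arbitrary illustration, not a recommendation.

```sql
-- Current per-session limit for temp table buffers (the default is 8MB).
SHOW temp_buffers;

-- Raise it for this session only. This has to happen before the session
-- touches its first temporary table; after that the value can no longer
-- be changed for the session.
SET temp_buffers = '64MB';
```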

Temp buffers: when work_mem isn't enough

While local buffers are rarely a visible problem, temp buffers track cases where sorts, hashes, and other operations exceed the current work_mem setting and spill from memory to disk.

SET work_mem = '256kB';

EXPLAIN (ANALYZE, BUFFERS)
SELECT o.id, o.amount, o.status, o.created_at, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id
ORDER BY o.amount DESC;

We explicitly forced a lower work_mem to see the impact.

                                                              QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=38374.70..38874.70 rows=200000 width=36) (actual time=109.345..120.574 rows=200000.00 loops=1)
   Sort Key: o.amount DESC
   Sort Method: external merge  Disk: 9736kB
   Buffers: shared hit=1738, temp read=3636 written=3722
   ->  Hash Join  (cost=116.00..4353.56 rows=200000 width=36) (actual time=1.597..34.857 rows=200000.00 loops=1)
         Hash Cond: (o.customer_id = c.id)
         Buffers: shared hit=1738
         ->  Seq Scan on orders o  (cost=0.00..3712.00 rows=200000 width=27) (actual time=0.016..6.973 rows=200000.00 loops=1)
               Buffers: shared hit=1712
         ->  Hash  (cost=66.00..66.00 rows=4000 width=17) (actual time=1.568..1.569 rows=4000.00 loops=1)
               Buckets: 4096  Batches: 1  Memory Usage: 235kB
               Buffers: shared hit=26
               ->  Seq Scan on customers c  (cost=0.00..66.00 rows=4000 width=17) (actual time=0.012..0.629 rows=4000.00 loops=1)
                     Buffers: shared hit=26
 Planning:
   Buffers: shared hit=15
 Planning Time: 1.184 ms
 Execution Time: 123.932 ms

There you can see temp read/written: the number of pages read from and written to temporary files on disk. This indicates the operation couldn't fit in memory.

Naming Confusion Alert

temp read/written in EXPLAIN has nothing to do with the temp_buffers parameter.

temp_buffers = memory for temporary tables (CREATE TEMP TABLE)
temp read/written = disk spill from sorts/hashes (governed by work_mem)

The Sort Method: external merge Disk: 9736kB confirms it - sorting 200,000 rows with only 256kB of work_mem forced PostgreSQL to spill ~9.7MB to temporary files on disk. The temp written=3722 happened during the sort phase as pages were flushed out, and temp read=3636 happened during the merge phase as PostgreSQL read them back to produce the final sorted result.

Notice something else: the Hash Join and everything below it shows only shared hit=1738 with no temp buffers at all. The hash table for 4,000 customers fit comfortably in 235kB of memory. The temp spill is isolated to the Sort node - buffer stats always attribute I/O to the node that caused it.

Try bumping work_mem to something reasonable and the spill disappears:

SET work_mem = '16MB';

You should see no temp buffers at all. The sort completed in memory, execution time dropped, and the only I/O was reading the actual table data.
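
If you'd rather not leave the higher value active for the rest of the session, one way to run the experiment is with SET LOCAL inside a transaction. This is a sketch of that variant, reusing the query from above.

```sql
BEGIN;
-- SET LOCAL lasts until the end of this transaction only.
SET LOCAL work_mem = '16MB';

EXPLAIN (ANALYZE, BUFFERS)
SELECT o.id, o.amount, o.status, o.created_at, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id
ORDER BY o.amount DESC;
COMMIT;
```

In the output, check that the Sort Method line reports an in-memory method and that the Buffers line no longer carries a temp entry.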

To reduce temp file usage you can:

Increase work_mem (careful there: it's a per-operation setting, not a per-query one, so a complex query with multiple sorts or hash joins can allocate work_mem for each of them)
Optimize the query to process fewer rows before the sort
And probably most importantly, consider adding indexes to avoid sorts entirely - an index on orders(amount DESC) would eliminate the sort node altogether
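
The index suggestion from the list above could be tried like this. The index name is hypothetical, and whether the planner actually prefers the index over an explicit sort depends on its cost estimates for the full join, so treat it as an experiment rather than a guaranteed win.

```sql
-- Hypothetical index name; the column and direction follow the suggestion above.
CREATE INDEX orders_amount_desc_idx ON orders (amount DESC);
ANALYZE orders;

-- Re-check the plan: look for an Index Scan on orders replacing the Sort node.
EXPLAIN (ANALYZE, BUFFERS)
SELECT o.id, o.amount, o.status, o.created_at, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id
ORDER BY o.amount DESC;
```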

Planning buffers

Up until now we have avoided planning buffers completely. They were added in PostgreSQL 13 and let you see buffer usage during query planning, separately from execution:

 Planning:
   Buffers: shared hit=36 read=5

Why does planning need buffers? The planner reads system catalogs (pg_class, pg_statistic, pg_index, etc.) to understand table structures and statistics. Complex queries touching many tables can have a non-trivial impact on planning-time I/O.

A high read count in the planning phase suggests that either the system catalogs aren't cached (a cold start, most likely) or your query is touching many tables or columns.

If planning time is a problem, ensure system catalogs stay hot. On systems with many partitions, planning overhead can become significant - this is one reason partition pruning matters.

The blurry line between planning and execution buffers

While writing the articles for this blog I often doubt whether I have everything correct, because with a system as complex as PostgreSQL one is always learning. Recently I learned more about planning buffers after one of my assumptions turned out to be technically imprecise.

PostgreSQL actually does not resolve all metadata during the planning phase. The planner does the minimum work needed to choose the best plan, but defers some catalog lookups to execution time. When a Sort node first runs, it looks up the comparison function from pg_amproc via get_opfamily_proc(). That lookup hits shared buffers and gets counted as execution buffers. On the second run in the same session, the syscache already has that information - no buffer access, fewer reported buffers.

Putting it together

Here's a sample output of a query with problems across every buffer category:

 Buffers: shared hit=50 read=15000 written=847
          temp read=2500 written=2500
 Planning:
   Buffers: shared hit=12 read=156
 Planning Time: 45.678 ms
 Execution Time: 12345.678 ms

Reading this top to bottom: the hit ratio is abysmal (50 hits vs 15,000 reads), so the working set isn't cached. The written=847 means the query forced synchronous evictions - the background writer can't keep up. The temp spill points to an operation exceeding work_mem. Even planning needed 156 reads, suggesting system catalogs got evicted from cache.

Each number points to a specific tuning lever: shared_buffers, bgwriter_lru_maxpages, work_mem, or query optimization to touch less data.
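
Before touching any of these levers, it helps to know where they currently stand. The sketch below reads the current values from pg_settings and, as one possible signal for the background writer, the cleaning counters from pg_stat_bgwriter.

```sql
-- Current values of the settings mentioned above.
SELECT name, setting, unit, context
FROM pg_settings
WHERE name IN ('shared_buffers', 'bgwriter_lru_maxpages', 'work_mem');

-- buffers_clean: buffers written by the background writer.
-- maxwritten_clean: how often it stopped a cleaning round because it
-- had already written bgwriter_lru_maxpages buffers.
SELECT buffers_clean, maxwritten_clean
FROM pg_stat_bgwriter;
```

A steadily growing maxwritten_clean is a hint that the background writer is hitting its per-round limit.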

Looking beyond single queries

Single query analysis is useful, but patterns across your workload matter more. pg_stat_statements exposes the same buffer counters aggregated over time:

SELECT
    substring(query, 1, 60) AS query,
    calls,
    shared_blks_hit,
    shared_blks_read,
    round(100.0 * shared_blks_hit /
      nullif(shared_blks_hit + shared_blks_read, 0), 2) AS hit_pct,
    temp_blks_written
FROM pg_stat_statements
WHERE calls > 100
ORDER BY shared_blks_read DESC
LIMIT 10;

This shows which queries are causing the most disk reads across your system - often more actionable than analyzing one query at a time.

Conclusion

Buffer statistics transform EXPLAIN from "here's the plan" to "here's exactly where the time went." Every number points to a specific cause and a specific fix. Once you start reading them, you stop guessing and start tuning.

If you want the bigger picture on buffer management, check out Introduction to Buffers in PostgreSQL.

Shinya Kato: Reducing row count estimation errors in PostgreSQL

Fri, 06 Feb 2026 05:41:10 +0000

Introduction

PostgreSQL's query planner relies on table statistics to estimate the number of rows (estimated rows) each operation will process, and then selects an optimal execution plan based on these estimates. When the estimated rows diverge significantly from the actual rows, the planner can choose a suboptimal plan, leading to severe query performance degradation.

This article walks through four approaches I used to reduce row count estimation errors, ordered from least to most invasive. Due to confidentiality constraints, I cannot share actual SQL or execution plans, so the focus is on the diagnostic thought process and the techniques applied.

Prerequisites

The approaches in this article are applicable to any modern PostgreSQL version, as the underlying mechanisms (autovacuum, pg_statistic, extended statistics) have been stable across versions
The target table had a high update frequency
Actual SQL and execution plans cannot be shared; this article focuses on methodology

Approach 1: Tuning autovacuum auto-ANALYZE frequency per table

The target table was known to have a very high update rate, so the first hypothesis was that the statistics were simply stale.

In PostgreSQL, the autovacuum daemon automatically runs ANALYZE to update statistics stored in pg_statistic. However, for tables with heavy write activity, auto-ANALYZE may not keep up with the rate of change, causing the statistics to drift from reality.

To address this, I adjusted the auto-ANALYZE frequency for the specific table rather than changing the server-wide settings in postgresql.conf.

The two key parameters are:

autovacuum_analyze_threshold: The minimum number of tuple modifications before auto-ANALYZE is triggered (default: 50)
autovacuum_analyze_scale_factor: The fraction of the table size added to the threshold (default: 0.1, i.e., 10%)

ALTER TABLE table_name SET (
    autovacuum_analyze_threshold = 0,
    autovacuum_analyze_scale_factor = 0.01
);

In this example, autovacuum_analyze_threshold is set to 0 (default: 50) and autovacuum_analyze_scale_factor is reduced from 0.1 to 0.01, so that auto-ANALYZE triggers after just 1% of the table has been modified.
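
To make the effect concrete: auto-ANALYZE triggers once the number of modified tuples exceeds threshold + scale_factor * (number of rows). For a table of one million rows, the defaults give 50 + 0.1 * 1,000,000 = 100,050 modifications, while the settings above give 10,000. The sketch below computes both trigger points from the planner's row estimate in pg_class; table_name is the same placeholder as above, and the query assumes the name is unique across schemas.

```sql
SELECT c.relname,
       c.reltuples::bigint AS estimated_rows,
       -- reltuples can be -1 for a table that has never been analyzed
       round(50 + 0.1  * greatest(c.reltuples, 0)::numeric) AS default_trigger,
       round(0  + 0.01 * greatest(c.reltuples, 0)::numeric) AS tuned_trigger
FROM pg_class c
WHERE c.relname = 'table_name';
```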

Note: You can verify whether auto-ANALYZE is keeping up by querying the pg_stat_user_tables view. Check the last_autoanalyze column for the timestamp of the last auto-ANALYZE, and n_mod_since_analyze for the number of row modifications since the last ANALYZE.
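
A minimal version of that check could look like the following, with table_name again standing in for the real table. The second query confirms that the per-table settings were actually stored.

```sql
SELECT relname,
       n_live_tup,
       n_mod_since_analyze,
       last_analyze,
       last_autoanalyze,
       autoanalyze_count
FROM pg_stat_user_tables
WHERE relname = 'table_name';

-- Per-table storage parameters set via ALTER TABLE ... SET (...)
SELECT relname, reloptions
FROM pg_class
WHERE relname = 'table_name';
```

If n_mod_since_analyze regularly climbs far above the computed trigger point before last_autoanalyze moves, auto-ANALYZE is not keeping up.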

For the full list of per-table storage parameters, see the PostgreSQL documentation on storage parameters.

Approach 2: Increasing the statistics sampling target per column

After adjusting autovacuum frequency, the next hypothesis was that ANALYZE was running often enough but the sample size was too small to produce accurate statistics.

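PostgreSQL's ANALYZE collects samples from each column and stores most-common values (MCVs) and histograms in pg_statistic. The precision of this information is governed by default_statistics_target (default: 100), which controls the maximum number of histogram buckets and MCV entries. It also drives the sample size: ANALYZE reads roughly 300 rows per unit of statistics target, so about 30,000 rows at the default.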

Rather than changing the server-wide setting, I increased the statistics target on a per-column basis for the affected table:

ALTER TABLE table_name ALTER COLUMN column_name SET STATISTICS 500;

As a general guideline, setting the target to 500-1000 for columns frequently used in WHERE clauses is a common tuning step. However, higher values increase ANALYZE execution time and the size of pg_statistic, so there is a trade-off to consider.

Note: After changing the statistics target with SET STATISTICS, you must run ANALYZE (or wait for the next auto-ANALYZE) for the new setting to take effect.
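
One way to confirm that the new target took effect is to re-run ANALYZE and look at what ended up in the pg_stats view. This is a sketch using the same placeholder names as above.

```sql
ANALYZE table_name;

SELECT attname,
       n_distinct,
       -- one frequency is stored per most-common value
       array_length(most_common_freqs, 1) AS mcv_entries
FROM pg_stats
WHERE tablename = 'table_name'
  AND attname = 'column_name';
```

The number of MCV entries can stay below the target if the column simply doesn't have that many common values.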

Approach 3: Using extended statistics

Even after improving the freshness and precision of the base statistics, row count estimation errors can persist. This happens when the planner's estimation model itself has structural limitations.

By default, PostgreSQL's planner assumes that conditions on different columns are independent. When this assumption is violated, the planner multiplies selectivities independently, often resulting in a dramatic underestimation of the actual row count.

This occurs when:

A table has two columns a1 and a2
There is a functional dependency between them (e.g., a1 determines a2)
The WHERE clause includes conditions on both columns

A concrete example would be columns like country and city -- knowing the country largely determines the set of possible cities. The planner treats the selectivities of each condition as independent, producing an estimate far lower than the actual row count.
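
A quick worked example shows the size of the error. The numbers are hypothetical and chosen only for illustration: a table of 1,000,000 rows where the country condition matches 1% of rows, the city condition matches 0.1%, and every row for that city belongs to that country.

```sql
SELECT 1000000 * 0.01 * 0.001 AS estimate_assuming_independence,  -- evaluates to 10
       1000000 * 0.001        AS rows_actually_matching;          -- evaluates to 1000
```

With these numbers the planner's estimate is off by a factor of 100, which is easily enough to flip a join strategy.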

To address this, I created extended statistics on the correlated columns:

CREATE STATISTICS stat_name ON a1, a2 FROM table_name;

CREATE STATISTICS supports three kinds of statistics: ndistinct (multi-column distinct counts), dependencies (functional dependencies), and mcv (multi-column most-common values). If you omit the kind specification, all three are collected, which is what I did as a starting point.

Note: CREATE STATISTICS only defines the statistics object. The actual statistics are not populated until ANALYZE runs on the table.
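
To check that the statistics object was populated, you can run ANALYZE and then read the pg_stats_ext view (available since PostgreSQL 12). A minimal sketch with the same placeholder names:

```sql
ANALYZE table_name;

SELECT statistics_name,
       attnames,
       kinds,
       n_distinct,
       dependencies
FROM pg_stats_ext
WHERE tablename = 'table_name';
```

A dependencies value close to 1.0 for a column pair indicates that one column almost fully determines the other.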

For more details, see the PostgreSQL documentation on CREATE STATISTICS.

Approach 4: Using pg_hint_plan as a last resort

When statistics-based approaches are not sufficient, pg_hint_plan provides a way to directly control the planner's behavior through SQL comment-based hints.

For row count estimation issues specifically, the Rows hint lets you correct the planner's row estimate. pg_hint_plan documents it as a correction for join results, so it takes two or more table names (or aliases):

/*+ Rows(table_a table_b #1000) */ SELECT ...

In the example above, the estimated row count for the join of table_a and table_b is fixed at 1000. You can also use +, -, or * instead of # to add to, subtract from, or multiply the planner's original estimate.

However, hint-based approaches come with significant drawbacks:

Fragile to data changes: Fixed row counts become inaccurate as data volumes change
Reduced maintainability: Team members unfamiliar with the hints may be confused by them
Masks root causes: Hints can hide underlying statistics or schema issues that should be properly addressed

For these reasons, I recommend using hints only when statistics-based approaches have been exhausted, or as a temporary measure while investigating the root cause.

Conclusion

This article covered four approaches to reducing row count estimation errors in PostgreSQL, ordered by increasing invasiveness:

Tune autovacuum frequency: Are the statistics stale?
Increase the statistics target: Is the sample size sufficient?
Create extended statistics: Can the planner account for cross-column correlations?
Apply hint clauses: A last resort when statistics alone cannot solve the problem

When facing row estimation errors, a systematic approach works best: start with EXPLAIN ANALYZE to compare estimated vs. actual row counts, then work through the possible causes in order -- statistics freshness, then precision, then structural limitations of the estimation model. I hope this article serves as a useful reference in your troubleshooting process.

Jeremy Schneider: Postgres client_connection_check_interval

Thu, 05 Feb 2026 04:54:40 +0000

Saw this post on LinkedIn yesterday:

I also somehow missed this setting for years. And it’s crazy timing, because it’s right after I published a blog about seeing the exact problem this solves. In my blog post I mentioned “unexpected behaviors (bugs?) in… Postgres itself.” Turns out Postgres already has the fix; it’s just disabled by default.

It was a one-line change to add the setting to my test suite and verify the impact. As a reminder, here’s the original problematic behavior which I just now reproduced again:

At the T=20sec mark, TPS drops from 700 to around 30. At T=26sec the total connections hit 100 (same as max_connections) and then TPS drops to almost zero. This total system outage continues until T=72sec when the system recovers after the blocking session has been killed by the transaction_timeout setting.

So what happens if we set client_connection_check_interval to 15 seconds? Quick addition to docker-compose.yml and we find out!

Fascinating! The brown line and the red line are the important ones. As before, the TPS drops at T=20sec and zeros out after we hit max_connections. But at T=35sec we start to see the total connection count slowly decrease! This continues until T=42sec when the PgBouncer connections are finally released – and at this point we repeat the whole cycle a second time, as the number of total connections climbs back up to the max.

So we can see that the 15 second client_connection_check_interval setting is working exactly as expected (if a little slowly) – at the 15 second mark Postgres begins to clean up the dead connections.

What if we do a lower setting like 2 seconds?

This looks even better! The total connections climbs to around 30-ish and holds stable there. And more importantly, the TPS never crashes out all the way to zero and the system is able to continue with a small workload until the blocking session is killed.

There is definitely some connection churn happening here (expected due to golang context timeouts) and with Postgres taking 2 seconds to clear them out, equilibrium is apparently around 30. A higher attempted TPS would bring this value higher.

Let’s try one more time with an even lower setting of 500ms:

The TPS seems around the same and this time the connection count seems to stay very low.

Finally, let’s take a look at the networking stack from the OS perspective, at the number of sockets in CLOSE-WAIT state:

This is where the impact of client_connection_check_interval becomes very clear. Postgres is working exactly as expected and cleaning up dead connections based on the delay that’s specified in this parameter.

I find myself agreeing with Marat on LinkedIn, and I feel like there’s a strong case for giving this parameter a default value.

And now please excuse me while I go update my original blog post.

Ibrar Ahmed: MCP Transport: Architecture, Boundaries, and Failure Modes

Thu, 05 Feb 2026 04:51:32 +0000

You can prototype an impressive agent in a notebook, but you can’t run one in production without a transport strategy. The Model Context Protocol standardizes how agents call tools and access memory, but it intentionally does not define how bytes move between systems. That responsibility sits with your architecture. Most teams treat transport as an implementation detail, and default to whatever works in a development container. That shortcut becomes technical debt the moment the system faces real traffic. The symptoms are predictable:

Latency becomes inconsistent under load.

Streams stall without clear failure signals.

Security boundaries blur between internal and external systems.

These are not edge cases. They are the inevitable outcome of using transport designed for convenience rather than reliability.

In a production environment, transport failure creates availability incidents. Model failures create incorrect answers or stop the system from working. Transport failure can also produce a bad output while the system stays online. Because of this, transport design belongs in reliability engineering, not just in application plumbing. Transport defines how quickly agents respond, how failures propagate, how systems recover, and how safely tools and memory services are exposed. When transport design conflicts with infrastructure reality, agents appear unstable even when tool logic and model behavior are correct. Stability in agent systems is determined at boundaries, not inside components.

This post examines the three dominant MCP transport models used in production systems.

stdio MCP provides process-level isolation and the lowest possible latency for same-host execution.

HTTP with Server-Sent Events enables cluster-scale orchestration, shared tool services, and internal memory systems.

HTTPS with Server-Sent Events enforces encryption and identity guarantees across trust boundaries, including customer, partner, and multi-organization environments.

You will learn where each model fits operationally, the real cost of running each at scale, and how mature agent platforms combine all three into a hybrid routing architecture. The goal is not to pick one transport; the goal is to match transport to the boundary, so that performance, reliability, and security remain predictable as systems scale from workstations to clusters to the internet edge.

Postgres as an MCP tool execution platform

Postgres is shifting from a passive storage engine to an active tool execution surface for agent systems. Through MCP servers such as our MCP server for Postgres, database functions, extensions, administrative operations, and query workflows become callable tools exposed through a standardized interface. This turns Postgres into part of the agent control plane, where SQL execution, extension logic, and data access happen as tool invocations rather than indirect application calls. In this model, the database is no longer only a persistence layer. It acts as a deterministic execution environment with transactional guarantees, auditability, and strong consistency, making it a natural backend for agent memory, tool orchestration, and operational automation.

Why transport becomes a database reliability boundary

Once Postgres is exposed through MCP as a tool service, transport directly influences database reliability, latency stability, and failure propagation. Transport failures surface upstream as latency spikes, retry storms, or connection floods against the database, even when Postgres itself is healthy:

Poor streaming behavior can hold connections open and exhaust pools

Misconfigured retries can amplify load and create cascading availability incidents

Certificate or identity failures can block tool access while the database remains operational, creating partial outages that are difficult to diagnose

Because of this, transport selection is not an application detail. It is part of database reliability engineering, affecting SLOs, capacity planning, and operational recovery behavior across agent driven systems.

What MCP Transport Must Handle

Your agent does not just call a tool once. It calls tools in tight loops, streams partial results, retries, runs under load, and runs in places with different trust boundaries.

A transport layer must support:

Low-latency request/response cycles for real-time interactivity.

Streaming tokens or events for long-running processes.

Backpressure controls so a fast sender cannot overwhelm a slow receiver, protecting system stability.

Authentication and authorization to control access to tools.

Observability, tracing, and correlation across calls to diagnose complex failures.

Failure isolation so a problem in one tool cannot take down the entire agent.

If you select a transport that does not match your boundary, scaling model, or security model, you will need to rebuild it later.

Three MCP Transport Models

The models differ in where the server runs and how messages are routed. stdio runs as a child process, with messages flowing through stdin and stdout pipes, delimited by newlines. HTTP with SSE runs as a network service inside a trusted network. HTTPS with SSE runs as a network service across a trust boundary; TLS is not optional there. Treat each model as a tool, and choose the one that fits the edge you are crossing.

The stdio MCP

The stdio MCP runs over a process pipe. The agent launches the MCP server as a child process, writing JSON-RPC messages to the server's stdin, while the server writes responses to stdout. The OS handles buffering and scheduling.

The agent starts the server binary as a subprocess.

Agent sends MCP JSON-RPC frames over stdin.

The server reads, runs the tool, and then writes a JSON-RPC response to stdout.

The agent reads from stdout, matches the response to the request ID, and then continues.
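
The four steps above can be sketched in a few lines. This is a minimal illustration, not the MCP SDK: TOY_SERVER is a hypothetical stand-in that echoes its parameters back, and call_tool and run_demo are names invented for this example. It assumes one JSON-RPC message per line in each direction.

```python
import json
import subprocess
import sys

# Hypothetical stand-in for an MCP server: one JSON-RPC request per line on
# stdin, one JSON-RPC response per line on stdout.
TOY_SERVER = r'''
import json, sys
while True:
    line = sys.stdin.readline()
    if not line:
        break
    req = json.loads(line)
    resp = {"jsonrpc": "2.0", "id": req["id"], "result": {"echo": req["params"]}}
    sys.stdout.write(json.dumps(resp) + "\n")
    sys.stdout.flush()
'''


def call_tool(proc, request_id, method, params):
    """Write one request to the child's stdin and wait for the matching id."""
    request = {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}
    proc.stdin.write(json.dumps(request) + "\n")
    proc.stdin.flush()
    while True:
        line = proc.stdout.readline()
        if not line:
            raise RuntimeError("server closed the pipe")
        response = json.loads(line)
        if response.get("id") == request_id:
            return response


def run_demo():
    """Start the toy server as a subprocess, make one call, then shut it down."""
    proc = subprocess.Popen(
        [sys.executable, "-c", TOY_SERVER],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
        bufsize=1,
    )
    try:
        return call_tool(proc, 1, "tools/call", {"name": "ping"})
    finally:
        proc.stdin.close()      # EOF on stdin tells the toy server to exit
        proc.wait(timeout=5)
        proc.stdout.close()
```

A production agent would add the per-call timeouts and watchdog discussed later; this sketch blocks indefinitely if the child never answers.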

Use stdio when the tool runs on the same host, and you want the lowest-overhead path.

It fits local tool execution for development, supporting database-sidecar tools that run alongside a local database. It fits GPU inference adapters on a workstation, with CLI automation that must run offline. It fits well into secure air-gapped environments.

Latency is the lowest achievable without shared memory, providing strong throughput for request response patterns. Overhead is minimal because there is no network stack. Security is OS process isolation plus any additional sandboxing you add.

Failure Model

When the tool process crashes, the pipe closes. The agent detects it and restarts the child process. Recovery can be fast, depending on the state; if the tool:

stores its state in memory, it is lost on restart.

keeps state on disk, you can recover, but you must design for it.

has long-running work, you need a cancel path so restarts do not duplicate work.

Operational notes you should not skip

You will need a watchdog for the child process, with per-call timeouts, even on a pipe. Make sure you provide a backpressure policy.

Limit concurrent tool calls to prevent resource exhaustion. Limit the max response size to avoid blocking the pipe and stream results when possible.

You also need log capture. In stdio, the spec reserves stderr for logs. Capture it and correlate it with request ids.

MCP messages are strictly newline-delimited JSON-RPC. Ensure your parser handles partial reads and buffering correctly.
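
To make the partial-read point concrete, here is a minimal sketch of a framer that buffers raw bytes and only parses once a full line has arrived. LineFramer and its size limit are hypothetical names and values chosen for illustration.

```python
import json


class LineFramer:
    """Accumulates raw bytes and returns complete newline-delimited JSON messages."""

    def __init__(self, max_message_bytes=1_000_000):
        self._buffer = bytearray()
        self._max = max_message_bytes

    def feed(self, chunk):
        """Add one chunk read from the pipe; return the messages it completed."""
        self._buffer.extend(chunk)
        messages = []
        while True:
            newline = self._buffer.find(b"\n")
            if newline < 0:
                break
            line = bytes(self._buffer[:newline])
            del self._buffer[:newline + 1]
            if line.strip():
                messages.append(json.loads(line))
        # Whatever remains is an unfinished message; bound how large it may grow.
        if len(self._buffer) > self._max:
            raise ValueError("message exceeds size limit")
        return messages
```

Decoding only happens on complete lines, so a multi-byte character or a JSON object split across two reads is never parsed half-finished.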

When stdio is the wrong choice

stdio starts to hurt when you want shared access, scaling, or remote control. stdio doesn't support load balancing across hosts, so you cannot share a single tool server across many agents without extra plumbing. Monitoring stdio configurations from outside the host is also more difficult.

HTTP with SSE MCP

HTTP with SSE exposes MCP endpoints over TCP using Server-Sent Events. The agent connects to a service address, often via Kubernetes DNS. The agent uses two channels; the GET channel establishes a persistent connection for receiving messages via SSE, while the POST channel handles ephemeral client-to-server JSON-RPC messages.

The process is fairly simple:

An agent opens an HTTP GET connection to the SSE endpoint (e.g., /sse).

The server accepts and keeps the connection open, sending an initial endpoint event containing the URI for the message POST endpoint.

The agent sends HTTP POST requests with MCP JSON-RPC messages to the provided endpoint.

The server processes the request and then pushes the JSON-RPC response over the open SSE channel.

Messages are correlated via the JSON-RPC id field. The agent matches the id in the SSE event payload to the id of the POST request.
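
The flow above can be sketched without any network code by parsing a captured SSE stream and matching responses to pending request ids. This is a simplified illustration: parse_sse handles only the event and data fields, and the /messages?session=abc path in SAMPLE_STREAM is a made-up example, not a path defined by any particular server.

```python
import json

# Hypothetical captured stream: an endpoint event, a keep-alive comment,
# then one JSON-RPC response delivered as a default "message" event.
SAMPLE_STREAM = (
    "event: endpoint\n"
    "data: /messages?session=abc\n"
    "\n"
    ": keep-alive\n"
    "\n"
    "event: message\n"
    'data: {"jsonrpc": "2.0", "id": 7, "result": {"ok": true}}\n'
    "\n"
)


def parse_sse(stream_text):
    """Split SSE text into (event_type, data) pairs; a blank line ends an event."""
    events = []
    event_type, data_lines = "message", []
    for line in stream_text.splitlines():
        if line == "":
            if data_lines:
                events.append((event_type, "\n".join(data_lines)))
            event_type, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("event:"):
            event_type = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            data_lines.append(value[1:] if value.startswith(" ") else value)
    return events


def correlate(pending_ids, events):
    """Return the POST endpoint and the responses whose id is still pending."""
    post_endpoint, completed = None, {}
    for event_type, data in events:
        if event_type == "endpoint":
            post_endpoint = data
        elif event_type == "message":
            payload = json.loads(data)
            if payload.get("id") in pending_ids:
                completed[payload["id"]] = payload
    return post_endpoint, completed
```

Here request 7 is resolved while request 8 (if it were pending) would remain outstanding, which is where a per-request timeout belongs.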

This dual-channel approach enables server-push notifications without the complexity of full WebSockets. HTTP keeps the network boundary inside a trusted zone. That zone can be a VPC, a cluster, or a mesh segment. It is still a network. You still need to plan for failure.

HTTP with SSE is suitable for Kubernetes internal services. It fits service mesh environments. It fits shared tool clusters. It fits horizontal scaling for heavy tools.

Performance Characteristics

For this transport model, latency is moderate due to network hop overhead, while throughput stays high when connections are pooled and reused. Overhead includes TCP transport, HTTP headers and request parsing. Security depends on both network policy and service authentication, often using tokens. You must also validate the Origin header to prevent DNS rebinding attacks.

Configuring for Production

You should configure connection pooling to handle concurrency, and use Postgres parameters to streamline performance and functionality. A good starting configuration might set:

the maximum number of connections (max_connections) to roughly 10-15 per CPU core for optimal throughput.

connection timeouts (connect_timeout, which is a client-side libpq connection parameter rather than a server setting) to around 600 seconds.

idle timeouts (idle_session_timeout) to 300 seconds to clean up stale sessions.

hard request timeouts (transaction_timeout) to 30 seconds as a baseline.

Use HTTP/2 or Streamable HTTP if available to multiplex requests and reduce connection overhead by up to 60%.

Failure Model

HTTP gives you a mature ecosystem for retries, load balancing, and timeouts. It also gives you more ways to fail.

Common failure cases include:

DNS issues, incorrect service names, and stale endpoints can prevent agents from reaching the intended service.

Load balancer configuration may route traffic to unhealthy instances, causing intermittent failures.

Slow or overloaded instances can lead to queue buildup and request timeouts.

User retry storms can amplify load and make a bad situation worse. Deliberate retry logic is essential, but you should never retry blindly. You can use bounded retries with jitter to avoid synchronized retry spikes.

Retry only safe operations, or use idempotency keys to prevent duplicate effects.

Implement circuit breakers so a failing tool does not consume all available agent time.

Connection pooling matters: If you don't reuse connections, you will pay extra latency per call. You also risk running out of ephemeral ports under load. Keep connections alive when you can and cap the pool size to avoid overloading the tool server.
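
The bounded-retry and circuit-breaker advice above can be sketched as follows. This is one possible shape under stated assumptions: the class and function names are hypothetical, the thresholds are arbitrary, only ConnectionError is treated as retryable, and the operation passed in is assumed to be idempotent.

```python
import random
import time


class CircuitOpenError(Exception):
    """Raised when the breaker refuses to call a tool that keeps failing."""


class CircuitBreaker:
    """Opens after N consecutive failures; allows a trial call after a cool-down."""

    def __init__(self, failure_threshold=3, reset_after_sec=30.0, clock=time.monotonic):
        self._threshold = failure_threshold
        self._reset_after = reset_after_sec
        self._clock = clock
        self._failures = 0
        self._opened_at = None

    def allow(self):
        if self._opened_at is None:
            return True
        if self._clock() - self._opened_at >= self._reset_after:
            # Half-open: permit one trial call; a single failure re-opens the circuit.
            self._opened_at = None
            self._failures = self._threshold - 1
            return True
        return False

    def record_success(self):
        self._failures = 0
        self._opened_at = None

    def record_failure(self):
        self._failures += 1
        if self._failures >= self._threshold:
            self._opened_at = self._clock()


def call_with_retries(operation, breaker, max_attempts=4, base_delay=0.1,
                      max_delay=2.0, sleep=time.sleep, rng=random.random):
    """Bounded retries with full-jitter exponential backoff (idempotent calls only)."""
    last_error = None
    for attempt in range(max_attempts):
        if not breaker.allow():
            raise CircuitOpenError("circuit open; not calling the tool")
        try:
            result = operation()
        except ConnectionError as exc:
            breaker.record_failure()
            last_error = exc
            if attempt < max_attempts - 1:
                # Random delay in [0, cap) so many clients do not retry in lockstep.
                sleep(rng() * min(max_delay, base_delay * 2 ** attempt))
        else:
            breaker.record_success()
            return result
    raise last_error
```

The sleep and rng parameters exist so the backoff can be tested deterministically; in normal use the defaults apply.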

HTTPS with SSE MCP

HTTPS with SSE MCP is HTTP over TLS: the Model Context Protocol is transported over standard, encrypted web infrastructure. This model is useful when the call crosses a trust boundary. That can be the public internet, a partner network, or a cross-organization boundary inside a large company.

TLS adds two things you need across boundaries:

Encryption in transit

Identity verification through certificates

When using this model:

The agent performs a TLS handshake with the server.

The agent validates the server certificate chain and hostname.

The agent sends HTTP requests over the encrypted channel.

The server replies over the same channel.

Optional mTLS also authenticates the client at the transport layer.

HTTPS with SSE is suitable for customer-facing tool endpoints, and fits well into multi-tenant SaaS agent infrastructure, zero-trust architectures, and compliance-regulated environments.

Latency is higher on new connections due to the TLS handshake, but throughput stays high with keep-alive and session resumption. The overhead for this model includes the CPU cost of encryption and certificate validation. Be sure you're using strong transport security, with durable authentication that provides strong assurances that your users are really who they claim to be.

Failure Model

The hard failure case is certificate expiration; this can take you down fast. You will need to implement certificate expiration alerts, automate certificate rotation, use staging and canary rollouts for new certificates, and maintain clear runbooks for emergency rotation.

You also need to handle:

TLS version mismatch.

Bad cipher suite config.

Clock skew that breaks certificate validity checks.

Revocation and CA chain issues in some environments.
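
A certificate expiration alert can start as something very small. The sketch below uses only the Python standard library; the function names and the 30-day warning threshold are assumptions for illustration, and fetch_not_after needs network access to a real endpoint.

```python
import socket
import ssl
import time


def fetch_not_after(host, port=443, timeout=5.0):
    """Return the 'notAfter' string of the server certificate (needs network access)."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after, now_epoch=None):
    """Convert a 'notAfter' string such as 'Jan  5 09:34:43 2018 GMT' to days remaining."""
    expires = ssl.cert_time_to_seconds(not_after)
    now = time.time() if now_epoch is None else now_epoch
    return (expires - now) / 86400.0


def needs_alert(not_after, warn_days=30, now_epoch=None):
    """True when the certificate expires within the warning window."""
    return days_until_expiry(not_after, now_epoch) < warn_days
```

Because the check compares against the local clock, it is also affected by the clock skew problem listed above, so run it from a host with reliable time synchronization.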

Choosing between stdio, HTTP, and HTTPS

Use this checklist to avoid most bad choices.

Use stdio: When the tool runs on the same host as the agent. You want fast iteration and low overhead, but you do not need shared access across many agents. You can accept process-level isolation as the main boundary.

Use HTTP with SSE: When the tool must serve many agents within a trusted network with a mesh or gateway that provides tracing and retries. You need load balancing and autoscaling, but can enforce service authentication with tokens and network policy.

Use HTTPS with SSE: When the call crosses a trust boundary. You handle customer or regulated data, exposing a tool endpoint to partners or clients. For HTTPS with SSE, you need transport-level identity and encryption.

One hard rule: If the network is not fully trusted, use HTTPS. If you are not sure, treat it as untrusted.
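
The checklist reduces to a small decision function. This is only a sketch of the rules as stated here: the function name, parameter names, and return labels are hypothetical, and None for network_fully_trusted stands for "not sure".

```python
def choose_transport(same_host, shared_by_many_agents, network_fully_trusted):
    """Pick a transport by boundary; anything short of full trust gets HTTPS."""
    if same_host and not shared_by_many_agents:
        return "stdio"
    if network_fully_trusted is True:
        return "http+sse"
    # Untrusted or unknown networks are treated the same way.
    return "https+sse"
```

Note that the hard rule is encoded as the fallthrough case, so an unanswered trust question can never select plain HTTP.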

The pgEdge MCP Server

The pgEdge Postgres MCP server repo contains a real MCP tool server wired for both local and network execution; you can run the same database tool layer through stdio for subprocess speed or through HTTP and HTTPS endpoints for cluster and external access. You get a design suited for Postgres driven agent tools where SQL, extensions, or admin actions sit behind MCP, with token authentication for internal service traffic and TLS support for secure endpoints, which makes your Postgres layer act like a first-class tool service instead of a passive database.

Future direction

As you design your system, there are a few additional considerations that this post did not cover in depth:

WebSockets are not the primary focus of this blog, but they are a logical option to evaluate for heavy bidirectional traffic.

HTTP/3 and QUIC can cut latency for encrypted traffic in some paths. This can also change failure modes and what you can observe in the network.

Persistent streaming transports can cut per-call overhead for chatty tools.

Hardware TLS acceleration can reduce CPU cost for high-throughput TLS services.

Be sure you choose your transport architecture by boundary. Bound retries. Bound concurrency. Observe everything.

Richard Yen: History Repeats Itself

Wed, 04 Feb 2026 17:00:00 +0000

Over 15 years later, some solutions are still great solutions

Introduction

OpenAI recently shared their story about how they scaled to 800 million users on their ChatGPT platform. With the boom of AI in the past year, they’ve certainly had to deal with some significant scaling challenges, and I was curious how they’d approach it. To sum it up, they addressed the following issues with the following solutions:

Reducing load on the primary (offloaded read-only queries to replicas)
Query optimization (query tuning and configuring timeouts like idle_in_transaction_session_timeout)
Single point of failure mitigation (configured hot-standby for high-availability)
Workload isolation (implemented a software solution for load-balancing)
Connection pooling (deployed pgBouncer)
Cache misses (implemented a cache locking mechanism)
Scaling read replicas (implemented cascading replication)
Resource exhaustion (implemented rate-limiting, tuned ORM)
Full table rewrites on schema changes (enforced strict DDL policies)

Indeed, there was a lot of work put in to scale to “millions of queries per second (QPS)” and I applaud their team for implementing these solutions to handle the unique challenges that they faced. 👏👏👏

Taking a Walk Down Memory Lane

While reading through their post, I couldn’t help but think to myself, wow, some of the solutions they used are not much different from ours 15 years ago! Fifteen years ago, I was the head DBA at Turnitin (called iParadigms at the time). Times were different back then, before the massive boom of social media (Instagram wasn’t a thing at the time!), and we were all on-prem, switching from spindle-based disk to SSDs. At that time, we were likewise facing challenges scaling to 3000 QPS to serve up data to students and teachers across the US, Canada, and the UK. Our founders were making a lot of headway in promoting Turnitin to secondary schools and universities, and we were regularly facing the struggle of having “just enough” resources to keep our systems running smoothly.

Some Things Don’t [Need to] Change

To address the challenges that we faced 15 years ago, we employed similar solutions that the OpenAI team devised in 2025, namely:

Reducing load on the primary

To reduce load on the primary, we also implemented a software-based solution to send read-only queries to our replicas. Written in Perl, our Multiplexor listened to all incoming database traffic (port 5432) and directed transactions with DML queries to the primary, while sending other queries to the standbys. This ensured that the primary only received write traffic (though some read traffic was necessary) and kept I/O as low as we could manage.

Connection pooling

To ensure that each database session gets maximum resources for sort, join, and aggregation operations, OpenAI selected pgBouncer as the connection pooler of choice, and then used Kubernetes as a load-balancing mechanism. This is clever (we didn’t have Kubernetes at the time, but I think I might implement it if I find myself in a DBA role again). pgBouncer is a solid choice for connection pooling; with its high configurability and server session management, DBAs get great benefit in keeping operational overhead low and resource availability high.

Workload isolation

To isolate high-tier and low-tier workloads, OpenAI implemented a software solution. They didn’t specifically call this out, but I suppose this is in conjunction with their Kubernetes load-balancing configuration. At the time, we also wanted to ensure that load was balanced across our four replicas, and that no one of them would take the brunt of read traffic. To implement this at the time, we used haproxy and configured it to run some health-checking Bash scripts to determine where to route traffic. Fifteen years later, haproxy might not be a buzzword, but solid scripting and software engineering keep the lights on!

Scaling read replicas

The OpenAI team detailed how they employed cascading replication as the mechanism to scale out to “nearly 50 read replicas” to handle their millions of QPS. I suspect that in addition to adding tremendous load on the databases, the millions of QPS probably caused their network team some headaches in consuming bandwidth, but I digress… At Turnitin, we also employed cascading replication – not just for scaling read traffic, but also as a mechanism for high-availability and disaster recovery. When shipping WAL files to a different region, we were able to have a completely identical cluster of databases – 1 primary and 4 standbys – and performing a failover was just a matter of changing a CNAME to direct write traffic to the new location. From there, we could use tools like pg_rewind to re-attach the old region to the new primary region.

Conclusion

It’s interesting and reassuring to see that 15 years later, some of the same solutions we used at Turnitin are being used by one of the biggest Postgres deployments in the world. This only affirms the fact that Postgres is indeed “The World’s Most Advanced Open Source Relational Database.” The Postgres community is incredibly talented, their expertise is deep, and their code is robust. Even tools like pgBouncer are incredibly reliable, suitable for ultra-heavy, millions-of-QPS workloads. Power to Postgres! 🐘

Esther Minano: pgstream v1.0.0: Stateless schema change replication

Wed, 04 Feb 2026 15:58:34 +0000

A major architectural milestone that removes schema logs and simplifies how pgstream captures and replicates Postgres schema changes

Robins Tharakan: The "Skip Scan" You Already Had Before v18

Wed, 04 Feb 2026 13:05:00 +0000

PostgreSQL 18 introduces native "Skip Scan" for multicolumn B-tree indexes, a major optimizer enhancement. However, a common misconception is that pre-v18 versions purely resort to sequential scans when the leading column isn't filtered. In reality, the cost-based planner has long been capable of leveraging such indexes when efficient.

How it works under the hood:

In pre-v18 versions, if an index is significantly smaller than the table (the heap), scanning the entire index to find matching rows can be cheaper than a sequential table scan, even without using the tree structure to "skip". This is sometimes also referred to as a Full-Index Scan. While this lacks v18’s ability to hop between distinct leading keys, it still speeds up queries on non-leading columns for many common workloads.

As we will see below, this v18 improvement is not a game-changer for all workloads, and you shouldn't assume a speed-up for all kinds of datasets.

Benchmarks: v17 vs v18

To understand when the new v18 Skip Scan helps, we tested two distinct scenarios.

Scenario A: Low Cardinality (The "Success" Case)

The Setup: We created an index on (bid, abalance) and ran the following query:

SELECT COUNT(*) FROM pgbench_accounts WHERE abalance = -2187;

Note: We did not run a read-write workload, so abalance is 0 for all rows. No rows match, so the query returns a count of 0.

Data Statistics:

Table Rows: 1,000,000
Leading Column (bid): ~10 distinct values (Low Cardinality).
Filter Column (abalance): 1 distinct value (Uniform).

Results:

Version	Strategy	TPS	Avg Latency	Buffers Hit
v17	Index Only Scan (Full)	~2,511	0.40 ms	845
v18	Index Only Scan (Skip Scan)	~14,382	0.07 ms	39

Why the massive 5.7x gain?

Pre-v18 Postgres sees that bid is not constrained. It decides its only option is to read the entire index (all leaf pages) to find rows where abalance = -2187. It scans 1,000,000 items (845 pages).

Postgres v18 uses Skip Scan. It knows bid comes first. Instead of scanning sequentially, it:

Seeks to the first bid (e.g., 1). Checks for abalance = -2187.
"Skips" to the next unique bid (e.g., 2). Checks for abalance = -2187.
Repeats for all 10 branches.

It effectively turns one giant scan into 10 small seeks. The EXPLAIN output confirms this with Index Searches: 11 (10 branches + 1 stop condition) and only 39 buffer hits.

v18 Skip Scan Plan (Scenario A):

 Index Only Scan using pgbench_accounts_bid_abalance_idx on public.pgbench_accounts ...
   Index Cond: (pgbench_accounts.abalance = '-2187'::integer)
   Heap Fetches: 0
   Index Searches: 11
   Buffers: shared hit=39
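
To build intuition for the difference, here is a toy model in which a sorted list of (leading, second) tuples stands in for the B-tree leaf entries. It is a conceptual sketch only: the function names are invented, it assumes integer leading keys, it counts one search per distinct leading value (it does not model the extra stop-condition search), and it says nothing about how nbtree actually implements the feature.

```python
import bisect


def full_index_scan(index, target):
    """Pre-v18 style: visit every entry and filter on the second column."""
    visited, matches = 0, 0
    for _, second in index:
        visited += 1
        if second == target:
            matches += 1
    return matches, visited


def skip_scan(index, target):
    """Seek to (lead, target) for each distinct leading value, then jump ahead."""
    matches, searches, pos = 0, 0, 0
    while pos < len(index):
        lead = index[pos][0]
        searches += 1
        i = bisect.bisect_left(index, (lead, target), pos)
        while i < len(index) and index[i] == (lead, target):
            matches += 1
            i += 1
        # (lead + 1,) sorts before every (lead + 1, x), so this lands on the
        # first entry of the next leading value.
        pos = bisect.bisect_left(index, (lead + 1,), i)
    return matches, searches


# Low cardinality: 10 leading values, 1,000 entries each, plus two matching rows.
low_card = sorted([(bid, 0) for bid in range(1, 11) for _ in range(1000)]
                  + [(3, -2187), (7, -2187)])
# High cardinality: every leading value is unique.
high_card = sorted((aid, 0) for aid in range(1, 10001))

print(full_index_scan(low_card, -2187), skip_scan(low_card, -2187))
print(full_index_scan(high_card, -2187), skip_scan(high_card, -2187))
```

With few distinct leading values the skip scan touches a handful of positions instead of every entry; with unique leading values it needs as many searches as there are entries, which is the situation Scenario B explores.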

Scenario B: High Cardinality (The "Failure" Case)

The Setup: We used the same query, but against an index on (aid, abalance).

Data Statistics:

Leading Column (aid): 1,000,000 distinct values (100% Unique / High Cardinality).

Results:

Version	Strategy	TPS	Avg Latency	Buffers Hit
v17	Index Only Scan (Full)	~55.8	17.92 ms	2735
v18	Index Only Scan (Full)	~51.2	19.52 ms	2741

Why no improvement? If v18 attempted to "Skip Scan" here, it would have to skip 1,000,000 times (once for every unique aid). Performing 1 million seeks is significantly heavier than simply reading the 1 million index entries linearly (which benefits from sequential I/O and page pre-fetching). The planner correctly estimated this cost and fell back to the standard "Index Only Scan" used in v17.

v18 Plan (Identical to v17):

 Index Only Scan using pgbench_accounts_aid_abalance_idx on public.pgbench_accounts ...
   Index Cond: (pgbench_accounts.abalance = '-2187'::integer)
   Heap Fetches: 0
   Index Searches: 1
   Buffers: shared hit=2741

How to Identify a Skip Scan

You might notice that the node type in the EXPLAIN output remains Index Scan or Index Only Scan in both cases. PostgreSQL 18 does not introduce a special "Skip Scan" node.

Instead, you must look at the Index Searches line (visible with EXPLAIN (ANALYZE)):

v17 (and older): The Index Searches row does not appear.
v18 (Standard Scan): Index Searches: 1. The scan started at one point and read sequentially (as seen in Scenario B).
v18 (Skip Scan): Index Searches: > 1. The engine performed multiple seeks (as seen in Scenario A which had 11).

In our Scenario A (Success), Index Searches: 11 tells us it performed ~11 hops, confirming the feature was used.
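
If you collect plans in bulk, the rule above is easy to automate. The helper below is a hypothetical sketch that only looks at the first Index Searches line in a text-format plan; it reports "multiple searches" rather than "skip scan" because a count above 1 can also come from other multi-probe conditions such as IN lists.

```python
import re


def classify_index_searches(explain_text):
    """Return (count, label) based on the first 'Index Searches: N' line, if any."""
    match = re.search(r"Index Searches:\s*(\d+)", explain_text)
    if match is None:
        return None, "no Index Searches line (v17 or older, or no index scan)"
    count = int(match.group(1))
    if count > 1:
        return count, "multiple searches (skip scan is likely for a plan like Scenario A)"
    return count, "single search (standard scan)"


scenario_a = """
 Index Only Scan using pgbench_accounts_bid_abalance_idx on public.pgbench_accounts ...
   Index Cond: (pgbench_accounts.abalance = '-2187'::integer)
   Heap Fetches: 0
   Index Searches: 11
   Buffers: shared hit=39
"""
print(classify_index_searches(scenario_a))
```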

How to reproduce:

Note: We use PostgreSQL 13 in this section to demonstrate that even older versions (long before v18) could use multicolumn indexes efficiently for these types of queries: instead of a Full Table Scan, they could fall back to a Full Index Scan. You will need:

PostgreSQL 13 (a running cluster)
A pgbench-initialized database (run pgbench -i to create pgbench_accounts, pgbench_branches, pgbench_tellers, and pgbench_history)
Reasonable data loaded (pgbench -i -s 10 or similar; the default scale factor of 1 creates only 100,000 accounts)

Steps to reproduce the example:

Follow these steps in your terminal to set up the test environment. This assumes you have PostgreSQL 13 (or a similar version) installed and its command-line tools are in your PATH.

Step 1: Create and Initialize the Database

createdb testdb                # Create a new database
pgbench -i -s 10 testdb         # Initialize it with 1 million rows

Step 2: Run a Quick Read-Write Test

To simulate some database activity, run a standard 10-second read-write benchmark.

pgbench -T 10 testdb

Step 3: Connect and Run the Test Query

Now, connect to the database with psql to run the SQL commands that demonstrate the index scan behavior.

SQL steps (run in psql):

psql testdb
-- create a multicolumn index (first column aid, second column abalance)
testdb=# CREATE INDEX CONCURRENTLY IF NOT EXISTS pgbench_accounts_aid_abalance_idx
  ON pgbench_accounts(aid, abalance);

testdb=# select abalance, count(*) from pgbench_accounts group by abalance order by count(*) desc limit 5;
 abalance | count
----------+--------
        0 | 995151
    -2187 |      4
    -4621 |      4
    -4030 |      4
    -2953 |      4
(5 rows)

-- run an explain on a selection that filters only on the second column (abalance)
testdb=# EXPLAIN (ANALYSE, VERBOSE, COSTS)
SELECT COUNT(*) FROM pgbench_accounts WHERE abalance = -2187;

Aggregate  (cost=18480.77..18480.78 rows=1 width=8) (actual time=18.672..18.674 rows=1 loops=1)
   Output: count(*)
   ->  Index Only Scan using pgbench_accounts_aid_abalance_idx on public.pgbench_accounts  (cost=0.42..18480.69 rows=33 width=0) (actual time=8.583..18.659 rows=4 loops=1)
         Output: aid, abalance
         Index Cond: (pgbench_accounts.abalance = '-2187'::integer)
         Heap Fetches: 0
 Planning Time: 0.328 ms
 Execution Time: 18.732 ms
(8 rows)

Notes on the output above:

The important line is Index Only Scan using pgbench_accounts_aid_abalance_idx along with Index Cond: (pgbench_accounts.abalance = '-2187'::integer). This shows that the planner chose an index-only scan over the multicolumn btree, reading the index and keeping the entries where the second column matches the predicate, rather than falling back to a sequential scan of the table.

Why this is not the v18 "skip-scan" feature in full:

The v18 skip-scan adds optimizer logic to efficiently iterate distinct values of leading columns and probe the remainder of the index; it's a targeted algorithmic addition. What we show above is the planner choosing an index scan over a multicolumn index and applying the Index Cond to the second column. That can be effective for many queries and data layouts, but it lacks the specialized skip-scan internals that v18 adds for certain other cases.

Production Tips

If you need queries on a non-leading column to be fast on older Postgres versions, create the right index and keep statistics current (ANALYZE). The planner may prefer an index scan over a seq scan when selectivity and costs align.
Consider partial or expression indexes when appropriate; they let you make an index that directly serves the filter you need.
When portability across versions is important, test on the earliest Postgres version you need to support; planner behavior can vary by release, statistics, and data distribution.

Conclusion

Postgres v18's documented skip-scan addition for B-tree indexes is a welcome and useful optimizer enhancement, specifically for low-cardinality leading columns. However, for high-cardinality leading columns (like the (aid, abalance) index in Scenario B), the standard Full Index Scan remains optimal, and pre-v18 versions handle them just fine.

References

PostgreSQL 18 release notes (skip-scan, indexes): https://www.postgresql.org/docs/18/release-18.html
nbtree skip-scan optimization: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=92fe23d93
Further optimize nbtree search scan key comparisons: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=8a510275d

Kai Wagner: PGDay and FOSDEM Report from Kai

Wed, 04 Feb 2026 10:00:00 +0000

The following thoughts and comments are completely my personal opinion and do not reflect my employer's thoughts or beliefs. If you don’t like anything in this post, reach out to me directly, so I can ignore it ;-).

I’m currently on the train on my way back home from FOSDEM this year and man, I’m exhausted but also happy. Why? Because the PG and FOSDEM community is just crazily awesome. While it’s always too much of everything, it’s at the same time inspiring to see so many enthusiastic IT nerds in one place, discussing and working on what they love - technology and engineering challenges.

Umair Shahid: PostgreSQL Materialized Views: When Caching Your Query Results Makes Sense (And When It Doesn’t)

Tue, 03 Feb 2026 09:17:40 +0000

The Pain and the Real Constraint

Your dashboard queries are timing out at 30 seconds. Your BI tool is showing spinners. Your users are refreshing the page, wondering if something’s broken.

You’ve indexed everything. You’ve tuned shared_buffers. You’ve rewritten the query three times. The problem isn’t bad SQL – it’s that you’re forcing PostgreSQL to aggregate, join, and scan millions of rows every single time someone opens that report.

Here’s a clear stance: repeated heavy computations are a design choice, not a badge of honour. If you’re running the same expensive calculation dozens of times a day, you’re choosing to do more work than necessary.

This post shows you how to turn one expensive query shape into a fast, indexed object with explicit freshness and operational control. Materialized views give you predictable reads when you’re willing to accept a refresh contract.

What a Materialized View Actually Is (and What It Is Not)

Definition in Plain Words

A materialized view is a physical relation that stores the result set of a query.

When you create one, PostgreSQL runs your query, writes the output to disk, and keeps it there until you tell it to refresh. That’s it. No magic. Just a snapshot you control.

Compare the Three Common Patterns

Let’s be precise about what you’re choosing:

View: Computed at read time, always current. PostgreSQL rewrites your query against the underlying tables every time. Zero staleness, full computation cost on every read.

Materialized view: Computed at refresh time, fast reads. You decide when to refresh. Reads are fast and predictable because they’re hitting stored data. Staleness is explicit and bounded by your refresh schedule.

Summary table: You own the update pipeline. Whether it’s ETL jobs, application code, or triggers—you’re writing the insert/update logic and managing incremental changes yourself.

Why the “Physical” Part Matters

Because it’s a physical relation, you can:

Index it like any table
Let the planner treat it like a table (predictable execution plans)
Pay for storage and refresh work in exchange for making reads fast and deterministic

You’re trading computation-on-read for computation-on-schedule. That’s the contract.

When Materialized Views Are a Strong Fit

The Best-Fit Workload Shapes

Materialized views work best when:

Repeated reporting queries with stable patterns hit the same aggregations (BI dashboards, executive summaries, weekly rollups)
Heavy joins and aggregations across large tables that don’t change second-by-second
Precomputed metrics that are “fresh enough” on a schedule your business can accept

If your query shape is stable and your freshness requirement is measured in minutes or hours (not milliseconds), materialized views are worth evaluating.

A Concrete Motivating Example

Let’s use an e-commerce order revenue summary. You’re joining:

orders (10M rows)
order_items (40M rows)
products (500K rows)
customers (2M rows)

Your query aggregates revenue by product category, customer region, and week. It’s grouped, filtered by tenant, and sorted by revenue descending.

Baseline symptoms:

Execution time: 28 seconds
The query runs 40+ times per day (dashboard loads, exports, API calls)
Users complain, BI tool times out, someone opens a ticket

You know the query is expensive. The question is whether you want to keep paying that cost every single time.

Build It Step-by-Step (Copy/Paste SQL)

Start with the Baseline Query

Here’s the query you want to stop recomputing:

SELECT 
    p.category,
    c.region,
    DATE_TRUNC('week', o.order_date) AS week,
    COUNT(DISTINCT o.order_id) AS order_count,
    SUM(oi.quantity * oi.unit_price) AS total_revenue
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.tenant_id = 'acme_corp'
  AND o.order_date >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY p.category, c.region, DATE_TRUNC('week', o.order_date)
ORDER BY total_revenue DESC;

Execution: 28 seconds. Buffers: scanning 80GB+ across four tables.

Create the Materialized View Safely

CREATE MATERIALIZED VIEW mv_order_revenue_summary AS
SELECT 
    o.tenant_id,
    p.category,
    c.region,
    DATE_TRUNC('week', o.order_date) AS week,
    COUNT(DISTINCT o.order_id) AS order_count,
    SUM(oi.quantity * oi.unit_price) AS total_revenue
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.order_date >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY o.tenant_id, p.category, c.region, DATE_TRUNC('week', o.order_date)
WITH NO DATA;

Why WITH NO DATA?

It creates the structure without populating it immediately. This gives you control—you can add indexes first, then populate in a maintenance window. It’s a cleaner rollout.
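
A quick way to see the effect of WITH NO DATA, as a minimal sketch against the view defined above: the pg_matviews system view reports whether a materialized view has been populated, and reading from an unpopulated one raises an error rather than returning zero rows.

```sql
-- ispopulated is false right after CREATE ... WITH NO DATA
-- and becomes true after the first REFRESH.
SELECT schemaname, matviewname, ispopulated, hasindexes
FROM pg_matviews
WHERE matviewname = 'mv_order_revenue_summary';

-- Until the first REFRESH, this fails with an error stating that the
-- materialized view has not been populated.
SELECT count(*) FROM mv_order_revenue_summary;
```

That error is useful during rollout: consumers cannot silently read an empty snapshot.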

Populate It

REFRESH MATERIALIZED VIEW mv_order_revenue_summary;

First refresh: 4.2 seconds. That’s the cost you’ll pay each time you refresh.

Index It Like a Production Object

Here’s where most performance wins happen. Your materialized view is a table—treat it like one.

-- Filter columns (tenant, date bucket)
CREATE INDEX idx_mv_revenue_tenant_week 
ON mv_order_revenue_summary(tenant_id, week);

-- Common grouping dimensions
CREATE INDEX idx_mv_revenue_category 
ON mv_order_revenue_summary(category);

CREATE INDEX idx_mv_revenue_region 
ON mv_order_revenue_summary(region);

Critical guidance: Index design should match the read patterns of the MV consumers, not the base tables. Ask yourself: how will people query this snapshot?

Now your dashboard query becomes:

SELECT category, region, week, order_count, total_revenue
FROM mv_order_revenue_summary
WHERE tenant_id = 'acme_corp'
  AND week >= CURRENT_DATE - INTERVAL '90 days'
ORDER BY total_revenue DESC;

Execution: 180 milliseconds. Index scan, no joins, no aggregation.

You’ve turned a 28-second computation into a 180ms index lookup.

Refresh Strategies (and How to Choose)

Full Refresh (Simple, Predictable)

REFRESH MATERIALIZED VIEW mv_order_revenue_summary;

This rewrites the entire materialized view. While it’s running, reads are blocked. The view is locked until the refresh completes.

Operational notes:

Schedule off-peak if your refresh takes more than a few seconds
Treat it as a batch job with a runtime budget
Monitor duration as data grows
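
One way to enforce a runtime budget is with settings scoped to the refresh transaction. This is a sketch, not a recommendation of specific values: the 2-second and 10-second limits are placeholders to adapt to your own workload.

```sql
BEGIN;
-- Give up quickly if the exclusive lock cannot be acquired, for example
-- because a long-running report is still reading the view.
SET LOCAL lock_timeout = '2s';
-- Cancel the refresh if it runs past its budget. The refresh is
-- transactional, so the previous contents stay in place on failure.
SET LOCAL statement_timeout = '10s';
REFRESH MATERIALIZED VIEW mv_order_revenue_summary;
COMMIT;
```

The lock timeout matters because a refresh waiting for its lock also makes new readers queue behind it; failing fast and retrying later is usually the gentler option.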

Concurrent Refresh (Keeps Reads Available)

REFRESH MATERIALIZED VIEW CONCURRENTLY mv_order_revenue_summary;

This computes the new result set separately, compares it with the current contents, and applies only the rows that changed. Reads stay available throughout.

Requirements:

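A unique index on plain columns (no expressions, no WHERE clause) that reflects the logical uniqueness of MV rows
A materialized view that is already populated (the first refresh after WITH NO DATA cannot be concurrent)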

For our example:

CREATE UNIQUE INDEX idx_mv_revenue_unique 
ON mv_order_revenue_summary(tenant_id, category, region, week);
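
Before creating that unique index, it can help to confirm that the chosen columns really identify each row. The checks below are a sketch using the column names from this example; both queries should come back empty or zero.

```sql
-- Any row returned here would make CREATE UNIQUE INDEX fail.
SELECT tenant_id, category, region, week, count(*) AS copies
FROM mv_order_revenue_summary
GROUP BY tenant_id, category, region, week
HAVING count(*) > 1;

-- A unique index treats NULLs as distinct by default, so rows with a NULL
-- in a key column are not protected against duplication.
SELECT count(*) AS rows_with_null_key
FROM mv_order_revenue_summary
WHERE tenant_id IS NULL
   OR category IS NULL
   OR region IS NULL
   OR week IS NULL;
```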

Trade-offs (framed constructively):

Higher refresh overhead (PostgreSQL does more work to build and merge)
You need to design for uniqueness and accept slightly longer refresh times

I believe concurrent refresh is worth it when your materialized view serves user-facing queries and a 4-second lock would be visible.

Freshness as a Contract (What Teams That Do Well Always Define)

Define “fresh enough” in business terms:

Every 5 minutes for near-real-time dashboards
Hourly for internal reporting
Daily for executive summaries

Define operational SLOs:

Max refresh runtime: 10 seconds
Acceptable staleness window: up to 1 hour

Make staleness visible. Users should know when the data was last refreshed. We’ll cover how in the observability section.

Scheduling Refreshes (Cron, pg_cron, K8s, Managed Cloud)

One Scheduler Owns Refresh

Avoid multiple sources triggering refresh. Choose one mechanism and stick with it.

Scheduling Options

OS cron + psql:

# /etc/cron.d/refresh-mv
0 * * * * postgres psql -d production -c "REFRESH MATERIALIZED VIEW CONCURRENTLY mv_order_revenue_summary;"

pg_cron (when available):

SELECT cron.schedule('refresh-revenue-mv', '0 * * * *', 
  $$REFRESH MATERIALIZED VIEW CONCURRENTLY mv_order_revenue_summary$$);

Kubernetes CronJob:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: refresh-mv-revenue
spec:
  schedule: "0 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: refresh
            image: postgres:16
            command:
            - psql
            - -c
            - "REFRESH MATERIALIZED VIEW CONCURRENTLY mv_order_revenue_summary;"
          restartPolicy: OnFailure

App-driven refresh (only when you control concurrency and backoff):

Use this sparingly. Application-triggered refreshes can lead to refresh storms if you’re not careful.

Managed Cloud Realities (Practical Notes)

RDS, Azure Database for PostgreSQL, and Cloud SQL have different extension policies. pg_cron might not be available, or you might not have permissions to install it.

In that case, scheduling often moves to:

Cloud Scheduler (GCP)
EventBridge (AWS)
Azure Automation
Kubernetes CronJobs in the same environment

These work fine. The important part is having one source of truth for your refresh schedule.

Prevent Overlapping Refresh Runs

If your refresh takes 6 minutes and you schedule it every 5 minutes, you’ll have multiple refresh jobs competing.

Advisory lock pattern (conceptual):

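DO $$
BEGIN
  -- Transaction-level lock: released automatically at commit or rollback,
  -- so a failed refresh cannot leave the lock held by a reused session.
  IF pg_try_advisory_xact_lock(12345) THEN
    REFRESH MATERIALIZED VIEW CONCURRENTLY mv_order_revenue_summary;
  ELSE
    RAISE NOTICE 'Refresh already running, skipping';
  END IF;
END $$;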

Refresh starts only if the lock is acquired. This avoids refresh storms during delays.

Observability: Measure Refresh Cost and Staleness

Before/After Proof (What to Show in the Post)

Let’s put real numbers on this:

Baseline query time: 28 seconds
MV query time: 180 milliseconds
Refresh overhead: 4.2 seconds
Refresh cadence: Every hour

You’re doing 4.2 seconds of work every hour to save 28 seconds on each of 40+ queries per day. The math works.
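
Spelled out as a per-day budget, using only the figures quoted in this post (hourly refresh, 40 reads per day):

```sql
SELECT
    24 * 4.2   AS refresh_seconds_per_day,       -- 24 hourly refreshes
    40 * 28    AS baseline_seconds_per_day,      -- 40 runs of the original query
    40 * 0.180 AS mv_read_seconds_per_day,       -- 40 reads from the MV
    40 * 28 - (24 * 4.2 + 40 * 0.180) AS seconds_saved_per_day;
```

That is roughly 101 seconds of refresh work plus 7 seconds of reads, against 1,120 seconds of repeated computation: a saving of about 1,012 seconds per day, and the gap widens with every additional read.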

EXPLAIN (ANALYZE, BUFFERS) for baseline:

GroupAggregate  (cost=2847392.18..2847395.32 rows=1 width=89) (actual time=27823.445..27823.451 rows=156 loops=1)
  Buffers: shared hit=9234 read=1847234
  ->  Sort  (cost=2847392.18..2847392.68 rows=200 width=57) (actual time=27811.234..27812.891 rows=8923456 loops=1)

After (querying the MV):

Index Scan using idx_mv_revenue_tenant_week on mv_order_revenue_summary  (cost=0.42..23.18 rows=156 width=89) (actual time=0.034..0.178 rows=156 loops=1)

Buffers: shared hit=12

That’s the difference. You’ve moved the heavy lifting to a scheduled job.

Tracking “Last Refresh Time” (Correct Approach)

Important: pg_matviews does not store last_refresh. That column doesn’t exist.

Practical patterns:

A small mv_refresh_log table updated by your refresh job:

CREATE TABLE mv_refresh_log (
    mv_name TEXT PRIMARY KEY,
    last_refresh_at TIMESTAMPTZ,
    refresh_duration_ms INTEGER
);

DO $$
DECLARE
  start_time TIMESTAMPTZ := clock_timestamp();
  end_time TIMESTAMPTZ;
  duration_ms INTEGER;
BEGIN
  REFRESH MATERIALIZED VIEW CONCURRENTLY mv_order_revenue_summary;
  end_time := clock_timestamp();
  duration_ms := EXTRACT(EPOCH FROM (end_time - start_time)) * 1000;

  INSERT INTO mv_refresh_log (mv_name, last_refresh_at, refresh_duration_ms)
  VALUES ('mv_order_revenue_summary', end_time, duration_ms)
  ON CONFLICT (mv_name) 
  DO UPDATE SET last_refresh_at = EXCLUDED.last_refresh_at,
                refresh_duration_ms = EXCLUDED.refresh_duration_ms;
END $$;

Now you can expose freshness to users:

SELECT 
    mv_name,
    last_refresh_at,
    NOW() - last_refresh_at AS staleness,
    refresh_duration_ms
FROM mv_refresh_log
WHERE mv_name = 'mv_order_revenue_summary';

Scheduler job history (pg_cron job run details, or platform logs)
Logging refresh duration in PostgreSQL logs and aggregating in your observability stack
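
If pg_cron owns the schedule, its run history can serve as the second pattern. A sketch, assuming a pg_cron version that provides the cron.job_run_details table and a job command that mentions the view by name:

```sql
-- Most recent refresh runs, with outcome and duration.
SELECT jobid,
       status,
       start_time,
       end_time - start_time AS duration,
       return_message
FROM cron.job_run_details
WHERE command ILIKE '%mv_order_revenue_summary%'
ORDER BY start_time DESC
LIMIT 10;
```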

Watch the Impact on the Rest of the System

Monitor:

Refresh CPU and IO (does it spike? does it compete with writes?)
Temp file usage (large sorts/hashes during refresh can spill to disk)
Replica lag sensitivity (if refresh competes with write workload on the primary, replicas might fall behind)

Materialized view refreshes are queries. They use resources. Plan accordingly.
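
A rough way to measure this is to sample PostgreSQL's cumulative statistics before and after a refresh and compare the two readings. This is a sketch; the counters cover the whole database, so it works best when the refresh runs in a quiet window.

```sql
-- Temp file spill and block reads for the current database (cumulative).
SELECT datname,
       temp_files,
       pg_size_pretty(temp_bytes) AS temp_written,
       blks_read,
       blks_hit
FROM pg_stat_database
WHERE datname = current_database();

-- Replica lag as seen from the primary (one row per connected standby).
SELECT application_name, state, replay_lag
FROM pg_stat_replication;
```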

Performance Tuning That Consistently Pays Off

Indexes on the MV aligned to read patterns: We covered this. It’s the single biggest lever.

Ensure base tables stay healthy:

ANALYZE keeps statistics fresh
VACUUM and autovacuum prevent bloat
Bloated base tables make refresh slower
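
A simple health check for the base tables in this example, as a sketch built on the standard pg_stat_user_tables view:

```sql
-- Dead tuples and last maintenance times for the tables feeding the MV.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_autovacuum,
       last_autoanalyze
FROM pg_stat_user_tables
WHERE relname IN ('orders', 'order_items', 'products', 'customers')
ORDER BY n_dead_tup DESC;
```

A large n_dead_tup relative to n_live_tup, or maintenance timestamps that are days old on a busy table, suggests autovacuum is not keeping up.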

Reduce refresh work:

Simplify the MV query (do you really need all those joins?)
Pre-filter with partitions where relevant (if your base tables are partitioned by date, your refresh can scan fewer partitions)

Resource guardrails:

Schedule refresh during low-traffic windows
Understand sort and hash behavior in the refresh query (use EXPLAIN to spot large temp writes)

Trade-Offs (Framed as Design Decisions)

Every materialized view comes with trade-offs. Frame them as conscious decisions:

Freshness window: Staleness is a contract. You’re accepting data that’s up to X minutes or hours old. If you need second-by-second accuracy, you will get better outcomes with a different pattern—streaming aggregates, app-level cache with invalidation, or summary tables maintained incrementally.

Refresh cost and operational ownership: Someone has to own the refresh schedule, monitor it, and tune it as data grows.

Storage and backup footprint: You’re duplicating data. Factor this into disk capacity and backup windows.
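
To put a number on that footprint, the built-in size functions work on materialized views just as they do on tables:

```sql
SELECT pg_size_pretty(pg_relation_size('mv_order_revenue_summary'))       AS data_size,
       pg_size_pretty(pg_indexes_size('mv_order_revenue_summary'))        AS index_size,
       pg_size_pretty(pg_total_relation_size('mv_order_revenue_summary')) AS total_size;
```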

Maintenance surface area: Another object to index, refresh, monitor, and document.

I believe the trade-offs are worth it when your workload matches the pattern. If it doesn’t, don’t force it.

Common Gotchas and Troubleshooting (Symptom → Cause → Fix)

MV Is Fast at First, Then Slows Down

Symptom: Queries against the MV start fast, then degrade over weeks.

Cause: Data growth + indexes not aligned to access patterns. As the MV grows, full scans become expensive.

Fix: Add indexes for filter columns and common joins. Run EXPLAIN ANALYZE on MV queries to confirm index usage.
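
A sketch of that check, reusing the dashboard query from earlier in this post. The second query relies on the standard pg_stat_user_indexes view to spot indexes on the MV that are never used.

```sql
-- Confirm the plan uses an index scan rather than a sequential scan.
EXPLAIN (ANALYZE, BUFFERS)
SELECT category, region, week, order_count, total_revenue
FROM mv_order_revenue_summary
WHERE tenant_id = 'acme_corp'
  AND week >= CURRENT_DATE - INTERVAL '90 days'
ORDER BY total_revenue DESC;

-- Indexes with idx_scan = 0 are candidates for removal or redesign.
SELECT indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE relname = 'mv_order_revenue_summary'
ORDER BY idx_scan;
```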

Concurrent Refresh Fails

Symptom: REFRESH MATERIALIZED VIEW CONCURRENTLY errors out.

ERROR:  cannot refresh materialized view "public.mv_order_revenue_summary" concurrently
HINT:  Create a unique index with no WHERE clause on one or more columns of the materialized view.

Cause: Missing or incorrect unique index for logical uniqueness.

Fix: Identify the columns that make each row unique and create a unique index:

CREATE UNIQUE INDEX idx_mv_revenue_unique
ON mv_order_revenue_summary(tenant_id, category, region, week);

Refresh Jobs Overlap

Symptom: Multiple refresh processes running at the same time, competing for resources.

Cause: Scheduler runs without a guardrail. Refresh duration exceeds the schedule interval.

Fix: Implement an advisory lock pattern (shown earlier) or increase the schedule interval.

Refresh Workload Disrupts Primary Workload

Symptom: Writes slow down during refresh. Replica lag spikes.

Cause: Refresh scheduled during peak hours or competing for IO/CPU.

Fix:

Move refresh to off-peak windows
Simplify the refresh query
Consider resource limits (statement timeout, work_mem tuning)
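
For the work_mem part, a setting can be raised for the refresh alone instead of globally. This is a sketch; the 256MB value is a placeholder, and the right figure depends on the sort and hash sizes you see in EXPLAIN and on how much memory the server can spare.

```sql
BEGIN;
-- Applies only inside this transaction; other sessions keep the default.
SET LOCAL work_mem = '256MB';
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_order_revenue_summary;
COMMIT;
```

Keeping large sorts and hashes in memory reduces temp file IO during the refresh, which is often what competes with the write workload.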

MV Results Surprise People

Symptom: Users report “wrong” data. Confusion about why numbers don’t match real-time queries.

Cause: Freshness contract not explicit and not visible.

Fix: Expose last_refresh_at in your application. Add a timestamp to the dashboard showing when data was last updated. Communicate the refresh schedule clearly.

Alternatives Worth Considering (and When They Win)

Normal views: Use when you have few reads and an always-current requirement. No storage overhead, no refresh jobs, but you pay computation cost on every read.

Summary tables + ETL: Use when incremental updates are feasible and you want full control over the update pipeline. More work to build, but you own the logic and can optimize for incremental changes.

Partitioning + indexing: Use when the real constraint is data layout and pruning. If your query scans too many partitions, materialized views won’t help. Fix the partitioning strategy first.

Cache layer (Redis, Memcached): Use when app-level latency goals dominate and you need sub-millisecond response times. Caches are great for key-value lookups, less so for complex aggregations.

Timescale continuous aggregates: Use when time-series rollups are the core pattern. Continuous aggregates in TimescaleDB handle incremental refresh automatically for time-bucketed data.

Pick the tool that fits the constraint. Materialized views are one option, not the only option.

Conclusion

Materialized views give you predictable reads, controlled computation, and clear freshness. You’re trading real-time accuracy for speed and resource efficiency – on purpose, with a contract.

When the pattern fits, the results are immediate: dashboard queries drop from 28 seconds to 180 milliseconds, users stop complaining, and your database does less work.

The operational commitment is real – you own the refresh schedule, the indexes, and the monitoring. But the alternative is running the same expensive query over and over, hoping the database can keep up.

The post PostgreSQL Materialized Views: When Caching Your Query Results Makes Sense (And When It Doesn’t) appeared first on Stormatics.

Jobin Augustine: Importance of Tuning Checkpoint in PostgreSQL

Mon, 02 Feb 2026 15:04:30 +0000

The topic of checkpoint tuning is frequently discussed in many blogs. However, I keep coming across cases where it is kept untuned, resulting in huge wastage of server resources, struggling with poor performance and other issues. So it’s time to reiterate the importance again with more details, especially for new users. What is a checkpoint? […]

Akhil Reddy Banappagari: Null and Empty String in Oracle vs SQL Server vs PostgreSQL

Mon, 02 Feb 2026 11:46:07 +0000

When you are planning database migrations to PostgreSQL, it is usually the small things that cause the biggest production bugs. One of the most common traps for developers is how different databases handle NULL and empty strings ('').

Cornelia Biacsics: Contributions for week 1-4, 2026

Mon, 02 Feb 2026 09:39:37 +0000

The Nordic PGDay 2026 Call for Papers Committee met to finalize the talk selection:

Georgios Kokolatos
Louise Leinweber
Liisa Hämäläinen
Thea Stark

PGDay Paris 2026 schedule has also been announced — talk selection was made by:

Pavlo Golub
Sarah Conway
Valeria Kaplan

On Monday, January 26, the Prague PostgreSQL Meetup: January Edition met. It was organized by Mayuresh B. and Gulcin Yildirim Jelinek.

Speaker:

Teresa Lopes
Josef Machytka
Luigi Nardi

Prague PostgreSQL Dev Day 2026 (P2D2) happened from January 27 - January 28.

Organized by:

Pavel Hák
Matěj Klonfar
Jan Pěček
Ellyne Phneah
Pavel Stěhule
Tomáš Vondra
Aleš Zelený

Talk selection committee:

Pavlo Golub
Pavel Hák
Hana Litavská
Teresa Lopes
Mayur B
Esther Miñano
Josef Šimánek
Pavel Stěhule
Tomáš Vondra

Workshops:

Hettie Dombrovskaya
Jonathan, Danish
Tomas Vondra
Nazir Bilal Yavuz
Josef Machytka
Pavlo Golub

Talks:

Cagri Biroglu
Jakub Kuzela
Alexander Kukushkin
Kranthi Kiran Burada
Narendra Tawar
Bruce Momjian
Anton Borisov
Mohsin Ejaz
Petr Šmejkal
Michal Bartak
Gulcin Yildirim Jelinek
Teresa Lopes
Ants Aasma
Robert Treat
Luigi Nardi
Peter Zaitsev
Grant Fritchey
Radim Marek
Adam Wolk

Lightning talks:

Mayuresh B.
Michal Bartak
Sergey Chehuta
Alijaz Mur Erzen
Josef Machytka
Henrietta Dombrovskaya
Ants Aasma
Jonathan Battiato
Ellyne Phneah
Mohsin Ejaz
Anton Borisov
Luigi Nardi

PostgreSQL was represented at FOSDEM’26 which took place from January 31 - February 1

Databases Devroom Speakers:

Rohit Nayak
Shlomi Noach
Ben Dicken
Pep Pla
Jimmy Angelakos
Daniël van Eeden
Nicoleta Lazar
Charly Batista
Greg Potter
Kevin Biju
Georgi Kodinov
Sunny Bains

Databases Devroom Volunteers:

Raymond Paik
Frédéric Descamps
Rohit Nayak
Alastair Turner
Floor Drees
Hari Krishna Sunder
Matt Lord
Anna Widenius
Edith Puclla
Matthias Crauwels
Mattias Jonsson
Peter Eisentraut

On January 30, PostgreSQL developers met at the FOSDEM PGDay 2026 Developer Meeting. It was organized by Peter Eisentraut.

FOSDEM PGDay 2026 happened on January 31.

Organized by:

Andreas Scherbaum
Celeste Horgan
Dave Page
Ilya Kosmodemiansky
Karen Jex
Magnus Hagander
Stefan Fercot
Teresa Lopes

Speaker:

Daniel Gustafsson
Alexander Sosna
Tomas Vondra
Stefan Fercot
Jonathan Battiato
Pierre Ducroquet
Claire Giordano
Amit Langote
Thomas Munro
Derk van Veen
Bruce Momjian
Matt Cornillon

Lightning Talks:

Jimmy Angelakos
Floor Drees with Jonathan Gonzalez
Devrim Gunduz
Mayuresh B.

At FOSDEM PGDay 2026 & FOSDEM’26 the following volunteers helped with several tasks (incl. the PostgreSQL Booth):

Chris Ellis
Dave Pitts
Devrim Gündüz
Ilaria Battiston
Jan Wieremjewicz
Jimmy Angelakos
Joris Pelgröm
Kai Wagner
Matt Cornillon
Michael Banck
Pavlo Golub
Pep Pla
Sebastiaan Mannem
Tobias Bussmann
Boriss Mejias
Yoann La Cancellera
plus more as during the day, more PostgreSQL community members showed up to help (without signing up before)

A new podcast episode, “How I got started with DBtune (& why we chose Postgres) with Luigi Nardi”, was published on January 16, 2026 by Claire Giordano as part of her series “Talking Postgres”.

On January 14 there was another NYC PostgreSQL meetup, with Chelsea Dole speaking. Organizers were Jonathan Hinds (from sponsor Datadog), Mason Sharp, Mila Zhou, and Justin Iso took photos. The event had over 90 attendees. The organizers posted some highlights on LinkedIn.

Community Blog posts:

PostgreSQL Meetup in Frankfurt December 2025 by Andreas Scherbaum
I Can’t Close 2025 Without Saying Thank You to the People Behind Postgres by Haider Z
In Defense of Beautiful Uselessness by Ellyne Phneah

Luca Ferrari: pgagroal 2.0.0 is available!

Mon, 02 Feb 2026 00:00:00 +0000

The new major release of the fast connection pooler for PostgreSQL has been released!

pgagroal 2.0.0 is available!

It took quite a lot of time to get from version 1.6.0 to the new major version 2.0.0, but the new pgagroal is finally here! The project went through two Google Summer of Code editions (GSoC 2024 and 2025) before this great new release was completed. The project chose code stability over a rushed release, and I think you are going to be amazed by how many improvements have been packed into this new version.

The official release note has been sent today.

The new features

There are a lot of new features and small gems in this 2.0.0 release. It is pretty much impossible to describe all of them here, but here is a concise list of what you can expect from this new version.

The new event system

pgagroal has been event-driven from the very beginning, using the libev library to handle input/output in a fast and scalable way.

The project decided to move from libev to something more modern and better maintained. The natural choice for Linux was io_uring (yes, the same interface used in PostgreSQL 18), with kqueue for BSD systems. io_uring is an asynchronous method for read and write operations that aims at performance, while kqueue is an event-driven approach for FreeBSD and OpenBSD systems.

The key point here is event-driven and, obviously, asynchronous. The whole event management has been rewritten to wrap compatible structures and functions wherever possible. Performance has improved a lot over the 1.x releases.

The new management protocol

The management protocol is the way pgagroal-cli and pgagroal interact with each other: the client sends commands to the daemon and gets back responses.

The new release provides a fully rewritten management protocol that now speaks JSON exclusively. The protocol is also more robust and error tolerant, and every command and response now includes the application and daemon versions, so that it is possible to get details about the endpoints.

The immediate effect of this change is that you are going to see a very different text output from each command.

Improved metrics

pgagroal provides Prometheus-style metrics, and this release includes new metrics that help you monitor your pooler instance better. Moreover, it is now possible to configure TLS for Prometheus.

Improved Grafana Dashboards

There is a new set of dashboards that are Grafana 12 compatible and that show a lot more gauges than before. This helps in monitoring the pgagroal instance.

The Vault

There is now a vault that can manage frontend users’ passwords, changing them (rotating) depending on the configuration. The vault exposes metrics too, that can be consumed by Prometheus.

UTF-8 passwords

There is now full support for UTF-8 passwords, narrowing the gap between pgagroal’s split security model and PostgreSQL.

Improved testing environment

A lot of work has been done on the testing and Continuous Integration (CI) side, so that now every contribution is strongly validated.

The project also switched its source code formatting tool from uncrustify to the more modern clang-format.

Documentation

There is now a complete manual that is built automatically from Markdown documents. It replaces the old tutorials and provides much more complete and mature documentation for the usage and configuration of pgagroal.

Bug Fixing

A lot of bugs were discovered and fixed while working on all the above features. Thanks to the new testing infrastructure and containerization, it has been simpler to discover problems and fix them. We hope to do even better in this regard in the future.

A new website

This release also brings a new website and the migration of the GitHub repository.

How much?

There have been 221 commits that led to this new release. That may not sound like much, given the long list of new features and gems, but please consider that the project does almost one commit per feature:

% git log 1.6.0..2.0.0 --oneline | wc -l
221

One important thing to note

If you ask me what the most important change in pgagroal’s trip to version 2 is, my quick answer is the growth of the community:

% git checkout 1.6.0 && wc -l AUTHORS && git checkout 2.0.0 && wc -l AUTHORS
10 AUTHORS
29 AUTHORS

As you can see, the number of authors grew by 190%, almost tripling! Not bad, eh? Clearly, not all the authors are active today, just as not all were in the past, but the increase in size means that pgagroal has attracted interest and is seen as valuable.

Quite frankly, Jesper did and is doing a great job in pushing and pulling contributions to and from the so-called sister projects like pgmoneta and pgexporter. This also helps in bringing attention to the project, and increases the capacity for code reviews and quality.

Conclusions

We are really happy with this new release of pgagroal; the pooler is getting more and more attention every day, and the code quality and stability have improved a lot. We all learned during this process and are contributing well to the codebase, which is becoming more and more enterprise-level every day.

You will be amazed by the features the pooler now provides, and the performance is better than ever!

Please start by visiting the new pgagroal website, reading the documentation, and firing up your own pooler instance!

Kai Wagner: Hackorum - A Forum-Style View of pg-hackers

Mon, 02 Feb 2026 00:00:00 +0000

Last year at pgconf.dev, there was a discussion about improving the user interface for the PostgreSQL hackers mailing list, which is the main communication channel for PostgreSQL core development. Based on that discussion, I want to share a small project we have been working on:

https://hackorum.dev/

Hackorum provides a read-only (for now) web view of the mailing list with a more forum-like presentation. It is a work-in-progress proof of concept, and we are primarily looking for feedback on whether this approach is useful and what we should improve next.

Lætitia AVROT: FOSDEM 2026: €400 Repetto Heels, Recursive CTEs, and Europe's Tech Sovereignty Wake-Up Call

Sun, 01 Feb 2026 00:00:00 +0000

The Honor Part (That I’m Still Processing) 🔗Let me start with the big one: I was invited to the European Open Source Awards ceremony on Thursday evening. Not “bought a ticket” invited—actually invited. If you know the European Union, you know this is invitation-only, and honestly, I’m still a bit stunned. So, picture me, finally seated in my seat on the train, ready to go and checklisting everything. I have the dress, the belt, the purse, the jewels, even the hair accessories.

Deepak Mahto: Same SQL, Different Results: A Subtle Oracle vs PostgreSQL Migration Bug

Fri, 30 Jan 2026 14:52:35 +0000

Read time: ~6 minutes

A real-world deep dive into operator precedence, implicit casting, and why database engines “don’t think the same way”.

The Database Migration Mystery That Started It All

You migrate a perfectly stable Oracle application to PostgreSQL.

The SQL runs
The tests pass
The syntax looks correct
Nothing crashes

And yet… the numbers or query calculations are wrong.

Not obviously wrong. Not broken. Just different.
Those are the worst bugs: the ones that quietly ship to production. This is a story about one such bug, hiding behind familiar operators, clean-looking conversions, and false confidence.

The Original Business Logic (Oracle)

Here’s a simplified piece of real production logic used to compute a varhour value from timestamp data:

			
CASE 
    WHEN TO_CHAR(varmonth,'MI') + 1 = 60 
    THEN varhr - 1 || TO_CHAR(varmonth,'MI') + 1 + 40 
    ELSE varhr - 1 || TO_CHAR(varmonth,'MI') + 1 
END AS varhour

		

At first glance, this feels routine:

Extract minutes
Perform arithmetic
Concatenate values

Nothing here screams “migration risk”.

The Migration Illusion: “Looks Correct, Right?”

During migration, teams don’t blindly copy Oracle SQL. They do the right thing: make types explicit and clean up the logic.

Here’s the PostgreSQL converted version, already “fixed” with necessary casts:

			
SELECT
CASE WHEN TO_CHAR(varmonth, 'MI') :: integer + 1 = 60 
  THEN 
  (end_hr -1) :: text || TO_CHAR(varmonth, 'MI')::integer + 1 + 40
  ELSE 
  (end_hr -1)::text || TO_CHAR(varmonth, 'MI') ::integer + 1
  END varhour
FROM sample_loads
ORDER BY id;

		

No syntax errors. Explicit casting. Clean and readable. At this point, most migrations move on.

Side-by-Side: Oracle vs PostgreSQL (At First Glance)

Let’s compare the two versions:

Aspect	Oracle	PostgreSQL
Concatenation operator	`||`	`||`
Arithmetic operators	`+`, `-`	`+`, `-`
Minute extraction	`TO_CHAR(varmonth,'MI')`	`TO_CHAR(varmonth,'MI')::integer`
Type casting	Implicit	Explicit
Query runs successfully	Yes	Yes
Logic looks identical	Yes	Yes

Everything appears aligned.
Same operators. Same order. Same intent. So naturally, we expect the same result.

Let’s test with a real value:

			
end_hr  = 15
minutes = 59

Output:

Database	varhour
Oracle	`1500`
PostgreSQL	`14100`

Same logic. Same data. Different result. Now the real question appears:

How can two “explicit” queries still behave differently?

What Your Brain Thinks Is Happening

When most of us read this expression:

(end_hr - 1)::text || TO_CHAR(varmonth,'MI')::integer + 1 + 40

Our brain assumes:

Arithmetic happens first (+, -)
Concatenation happens last (||)

That assumption is correct in PostgreSQL. It is not correct in Oracle.

Oracle’s Behavior: “Let Me Help You”

Oracle aggressively applies implicit type conversion. Internally, Oracle rewrites the expression to something closer to:

			
TO_NUMBER
  (
   TO_CHAR(varhr - 1) || TO_CHAR(varmonth,'MI')
  ) + 1 + 40

Concatenation happens before arithmetic, because Oracle gives binary +, -, and || the same precedence and evaluates them left to right.

Step by step:

varhr - 1 → 14
TO_CHAR(14) → '14'
TO_CHAR(varmonth,'MI') → '59'
'14' || '59' → '1459'
TO_NUMBER('1459') → 1459
1459 + 1 + 40 → 1500

Oracle silently guessed your intent.

PostgreSQL’s Behavior: “Be Explicit”

PostgreSQL does no guessing. It follows strict operator precedence, in which + and - bind more tightly than ||:

TO_CHAR(varmonth,'MI')::integer → 59
59 + 1 + 40 → 100
(end_hr - 1)::text → '14'
'14' || 100 → '14100'

Different grouping. Different result. No error.
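
The grouping is easy to reproduce in PostgreSQL without any table. The sketch below substitutes the literals 15 and 59 for end_hr and the extracted minutes (an assumption made purely for illustration) and adds parentheses to show both groupings side by side.

```sql
SELECT
    (15 - 1)::text || 59 + 1 + 40            AS default_grouping,      -- '14100'
    (15 - 1)::text || (59 + 1 + 40)          AS same_grouping_spelled, -- '14100'
    ((15 - 1)::text || 59)::integer + 1 + 40 AS oracle_style_grouping; -- 1500
```

The first two columns are identical because PostgreSQL evaluates the additions before the concatenation; only the third, with the concatenation explicitly parenthesized and cast back to integer, matches Oracle’s result.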

Proof: Oracle’s Execution Plan

Oracle doesn’t hide this; it just doesn’t advertise it.

			
EXPLAIN PLAN FOR
SELECT CASE 
        WHEN TO_CHAR(varmonth,'MI')+1=60 
        THEN varhr-1||TO_CHAR(varmonth,'MI')+1+40 
        ELSE varhr-1||TO_CHAR(varmonth,'MI')+1 
    END
FROM sample_loads;
SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY(NULL, NULL, 'ALL'));

		

The projection shows:

			
TO_NUMBER(
    TO_CHAR("VARHR"-1)||TO_CHAR(INTERNAL_FUNCTION("VARMONTH"),'MI')
    )

That TO_NUMBER() wrapping the concatenation is the smoking gun.

Why This Bug Is So Hard to Catch

It never throws an error
The SQL looks correct
Early test data rarely hits edge cases
Automated migration tools miss it
The behavior difference is undocumented in most migration guides

This is not a syntax problem. It’s a behavioral difference.

The Real Issue Isn’t the Concat Operator (||) or Implicit Casting

This comes down to philosophy:

Aspect	Oracle	PostgreSQL
Type handling	Implicit type coercion	Explicit casting
Operator behavior	Flexible, context-driven	Strict and deterministic
Operator precedence	May group expressions implicitly	Fixed, well-defined precedence
Developer experience	Convenience-oriented	Precision-oriented
Error tolerance	Tries to “make it work”	Forces you to be explicit
Core philosophy	“Make it work”	“Say what you mean”

Neither is wrong. But assuming they behave the same is dangerous.

The Fix: Make Intent Explicit

			
SELECT
CASE WHEN TO_CHAR(varmonth, 'MI') :: integer + 1 = 60 
  THEN ((end_hr -1) :: text || TO_CHAR(varmonth, 'MI'))::integer + 1 + 40
ELSE ((end_hr -1)::text || TO_CHAR(varmonth, 'MI')) ::integer + 1 
  END varhour
FROM sample_loads
ORDER BY id;

		

This version:

Produces identical results
Documents intent
Survives migrations
Prevents silent data corruption

Real-World Impact

I’ve seen this exact pattern cause:

Financial miscalculations
Audit timestamp mismatches
Reconciliation failures weeks after go-live
“The numbers don’t add up” production escalations

The worst part? These bugs surface after trust is already established.

Key Takeaways

Execution plans reveal truth, not source code
|| mixed with + is a migration red flag
Explicit casting doesn’t guarantee identical behavior
Migration is about semantics, not syntax

The Bottom Line

Database migration isn’t translation. It’s interpretation.

When Oracle silently rewrites logic and PostgreSQL refuses to guess, you must be explicit. And once you start writing SQL that works the same everywhere, you don’t just migrate safely; you migrate confidently.

Try it Yourself

			
-- Oracle
DROP TABLE sample_loads;
CREATE TABLE sample_loads (
    id INTEGER,
    varmonth TIMESTAMP,
    varhr INTEGER
);
INSERT INTO sample_loads VALUES (1, TIMESTAMP '2024-01-16 23:59:59', 15);
INSERT INTO sample_loads VALUES (2, TIMESTAMP '2024-01-15 23:59:59', 24);
SELECT varhr, 
       TO_CHAR(varmonth,'MI') as minutes,
       varhr-1||TO_CHAR(varmonth,'MI')+1+40 as loadhour
FROM sample_loads;
-- Check the execution plan
EXPLAIN PLAN FOR
SELECT varhr-1||TO_CHAR(varmonth,'MI')+1+40 FROM sample_loads;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY(NULL, NULL, 'ALL'));

		

Jimmy Angelakos: Panel Discussion: How to Work with Other Postgres People — PGConf.EU 2025

Thu, 29 Jan 2026 13:37:00 +0000

I have to apologise — it's been months since PGConf.EU 2025 in Riga, and I'm only now publishing this video. The delay was due to wanting to create accurate captions for the recording, which unfortunately took longer than expected.

In this session, Floor Drees, Karen Jex, and I joined host Boriss Mejias to examine how diverse minds work together in the PostgreSQL ecosystem. We touched upon the psychology of teamwork and the importance of accommodating neurodiverse conditions like ADHD and ASD.

A pleasant surprise for us during the session was the level of engagement from the audience. People connected deeply with the subject matter, turning the panel talk into a real conversation where we shared practical hacks — body doubling, "Pomodoro playlists", tactile focus tools like knitting, crocheting, and full-body fidget toys, and experiences with managers who actually "get it".

Building awareness is the first step on a journey that can lead to better outcomes for everyone. We do believe some things need to be adapted, and we can work together to make this gradual change happen.

Without further ado, I present the panel discussion below. I will be very happy to hear back from you at @vyruss@fosstodon.org — your comments, your experiences, your testimonials. This is how we continue to raise awareness together.

Video on YouTube: youtube.com/watch?v=PsxNhcBTrTU

warda bibi: Unlocking High-Performance PostgreSQL: Key Memory Optimizations

Thu, 29 Jan 2026 07:30:12 +0000

PostgreSQL can scale extremely well in production, but many deployments run on conservative defaults that are safe yet far from optimal. The crux of performance optimization is to understand what each setting really controls, how settings interact under concurrency, and how to verify impact with real metrics.

This guide walks through the two most important memory parameters:

shared_buffers
work_mem

shared_buffers

Let’s start with shared_buffers, because this is one of the most important concepts in PostgreSQL. When a client connects to PostgreSQL and asks for data, PostgreSQL does not read directly from disk and stream it back to the client. Instead, it pulls the required data page into shared memory first and then serves it from there. The same design applies to writes. When the client updates a row, PostgreSQL does not immediately write that change to disk. It loads the page into memory, updates it in RAM, and marks that page as dirty. Disk writes come later.

And this design is intentional because reading and writing in memory are orders of magnitude faster than reading from or writing to disk, and it dramatically reduces random I/O overhead.

So what exactly is shared_buffers?

shared_buffers defines the size of the shared memory region that PostgreSQL uses as its internal buffer cache. And all the reads and writes go through shared_buffers. Disk interaction happens later asynchronously through background writing and checkpoints. So shared_buffers is the layer between the database processes and the disk.

By default, PostgreSQL sets shared_buffers to 128MB. That might be fine for local environments; however, it is not enough cache for real working sets, which means more disk reads, more I/O pressure, and less stable latency.

How do we size shared_buffers?

A common starting rule of thumb is:

If the server has more than 1GB RAM, start with 20–25% of total RAM on a dedicated PostgreSQL server and increase gradually if needed. Values above ~40% usually stop helping much.

There’s a reason we don’t just set it to ‘as high as possible’. If you give PostgreSQL too much buffer cache, you can start competing with the OS page cache, and you can also increase the volume of dirty data that must be flushed during checkpoints, which can increase checkpoint pressure and write spikes.

One more important thing to remember is that shared_buffers is a postmaster-level parameter. That means PostgreSQL allocates it at startup, and changing it requires a server restart.
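
To make the rule of thumb concrete, here is a small Python sketch of the arithmetic. The function name shared_buffers_start_mb is hypothetical, and returning None for machines with 1 GB of RAM or less is an assumption, since the rule above only covers servers with more than 1 GB.

```python
from typing import Optional


def shared_buffers_start_mb(total_ram_mb: int, fraction: float = 0.25) -> Optional[int]:
    """Suggest a starting shared_buffers value in MB for a dedicated server.

    fraction is kept inside the 20-40% band discussed above.
    Returns None when the rule of thumb does not apply (RAM <= 1 GB).
    """
    if not 0.20 <= fraction <= 0.40:
        raise ValueError("fraction should stay between 0.20 and 0.40")
    if total_ram_mb <= 1024:
        return None
    return int(total_ram_mb * fraction)


if __name__ == "__main__":
    # A dedicated server with 32 GB of RAM, starting at 25%.
    print(shared_buffers_start_mb(32 * 1024))  # 8192
```

The result is only a starting point; the measurements described in the following sections decide whether it should move up or down.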

How do I know if my current value is good?

As database engineers, our job is to size shared_buffers correctly:

large enough to reduce disk reads
but not so large that it harms the OS cache or makes checkpoints heavier

Step 1: Look at the cache hit ratio

One simple way is to look at the cache hit ratio using pg_stat_database.

SELECT
    sum(blks_hit) / nullif(sum(blks_hit + blks_read), 0) AS cache_hit_ratio
FROM
    pg_stat_database;

If the cache hit ratio is close to 1, it means most reads are being served from memory, and this is generally what we want. If it’s low, it means PostgreSQL is doing more physical reads from disk, and that’s a signal to investigate.
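
As a quick illustration of what the query computes, the same ratio can be reproduced by hand. This is a minimal Python sketch with made-up counter values; the function name cache_hit_ratio is hypothetical, and the None result mirrors what nullif() produces when no blocks have been requested yet.

```python
from typing import Optional


def cache_hit_ratio(blks_hit: int, blks_read: int) -> Optional[float]:
    """Share of block requests served from shared_buffers."""
    total = blks_hit + blks_read
    if total == 0:
        return None  # same effect as nullif(..., 0) in the SQL version
    return blks_hit / total


if __name__ == "__main__":
    # Illustrative counters: 990,000 hits and 10,000 reads.
    print(cache_hit_ratio(990_000, 10_000))  # 0.99
    print(cache_hit_ratio(0, 0))             # None
```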

Step 2: Verify it at the query level

To see whether a specific query is using cache, run:

EXPLAIN (ANALYZE, BUFFERS) 
SELECT …

In the output, look for:

buffer hits – served from shared_buffers
buffer reads – pulled from disk

If you run the same query again, most of the time, the second run shows far more hits because now the pages are already in shared_buffers.

Important Note

In large production workloads, not everything can or should fit in memory. So you will see disk reads, and that’s normal. The goal isn’t that everything must be a cache hit. The goal is:

Disk I/O shouldn’t be your bottleneck
Reads and writes should be smooth
Latency shouldn’t spike because the cache is too small or mis-sized

If you want deeper visibility into what is currently stored in shared_buffers or which tables are occupying memory, PostgreSQL gives you tools for that. Extensions like:

pg_buffercache
pageinspect

let you inspect shared buffers directly and understand memory usage patterns.

work_mem

After shared_buffers, the next memory parameter we need to focus on is work_mem.

And this is probably the most dangerous memory setting in PostgreSQL if you don’t fully understand how it works – not because it’s bad, but because it multiplies quietly. Many production outages caused by out-of-memory errors can be traced back to a misunderstanding of work_mem.

work_mem defines the limit or the maximum amount of memory allocated for executing operations such as:

Sorting, when performing operations like ORDER BY, DISTINCT, and GROUP BY.
Joins that use hashing, for example a hash join building its in-memory hash table.
Set operations like UNION, INTERSECT, and EXCEPT.
Creating the bitmap arrays for the bitmap scan method

This parameter affects the efficiency of query execution and the overall performance of the database. It’s important to note that work_mem is allocated for each operation, not per PostgreSQL session. This is a crucial detail, as a single SQL query can perform multiple sorting or join operations, each of which will consume its own area of memory. Some of these can be parallelized by PostgreSQL, and when that happens, each parallel worker uses up to work_mem per operation. If an operation runs sequentially, it can use up to work_mem. But if the same operation runs under a Gather node with, say, five parallel workers, then that single operation can consume:

5 × work_mem

This is exactly how databases run out of memory, even when the application hasn’t changed, because work_mem multiplies across:

Parallel workers
Multiple memory-intensive operations in a query
Concurrent queries running at the same time

This is why the most important thing to remember is that work_mem is per operation, and it can be used multiple times inside a single query, across many concurrent queries.
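
The multiplication can be sketched as a simple upper bound. The Python below is an illustration only: the function name and the example figures (64 MB, 3 operations, 5 workers, 40 concurrent queries) are assumptions, and the estimate ignores hash_mem_multiplier, which lets hash-based operations use more than work_mem.

```python
def worst_case_work_mem_mb(
    work_mem_mb: int,
    ops_per_query: int,
    processes_per_op: int,
    concurrent_queries: int,
) -> int:
    """Upper bound if every operation in every process fills its work_mem."""
    return work_mem_mb * ops_per_query * processes_per_op * concurrent_queries


if __name__ == "__main__":
    total_mb = worst_case_work_mem_mb(
        work_mem_mb=64,
        ops_per_query=3,        # e.g. two sorts and one hash join
        processes_per_op=5,     # five parallel workers, as in the text above
        concurrent_queries=40,
    )
    print(total_mb)             # 38400
    print(total_mb / 1024)      # 37.5 (GB)
```

Real workloads rarely reach this bound, but it shows why a value that looks small per operation can still exhaust memory on a busy server.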

How do we tune work_mem?

By default, PostgreSQL sets work_mem to 4MB. For many simple OLTP workloads with high concurrency, this is actually fine. But for analytical or reporting queries, 4MB is often too small.

If the work_mem is too small, PostgreSQL starts spilling to disk, and you’ll typically see:

Temporary files are being created
Sorts switching to disk-based algorithms
Increased disk I/O and latency spikes

If work_mem is too large, it can cause memory pressure or, at worst, OOM kills.

We can measure if work_mem needs tuning using:

EXPLAIN (ANALYZE, BUFFERS)
SELECT …

In the output, look for:

Sort Method: external merge
temp file usage
temp reads and writes
disk usage reported in the plan

These signals describe exactly which operations are memory-bound, and those are the places worth tuning. The good thing about work_mem is that it is not a postmaster-level parameter, so you can tune it:

per session
per role
per transaction

For systems with less than 64 GB of RAM, you can start with:

work_mem = 0.25% of total system RAM

On smaller systems, this translates to about 2.5 MB per GB of RAM. This is because on smaller machines, concurrency and parallelism are usually limited, so this sizing is aggressive enough to reduce unnecessary disk spills without creating memory risk. On large machines, however, scaling work_mem linearly with RAM becomes dangerous. Parallel queries, many concurrent sessions, and multiple operations per query multiply memory usage quickly. So for larger systems (>64 GB), we can switch to a more conservative formula:

work_mem = max(162MB, 0.125% of RAM + 80MB)

This approach does two important things:

It still allows work_mem to grow with system size
But it slows down the growth rate as RAM increases. In other words, it avoids giving every query an unnecessarily large memory area just because the machine is big.
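
The two formulas can be put side by side in a short Python sketch. The function name suggested_work_mem_mb is hypothetical, and treating exactly 64 GB as a "larger system" is an assumption, since the text only specifies "less than 64 GB" and ">64GB".

```python
def suggested_work_mem_mb(total_ram_gb: float) -> float:
    """Starting point for work_mem in MB, following the two formulas above."""
    ram_mb = total_ram_gb * 1024
    if total_ram_gb < 64:
        return ram_mb * 0.25 / 100                 # 0.25% of RAM
    return max(162.0, ram_mb * 0.125 / 100 + 80)   # 0.125% of RAM + 80 MB


if __name__ == "__main__":
    for ram_gb in (16, 64, 256):
        print(f"{ram_gb} GB -> {suggested_work_mem_mb(ram_gb):.2f} MB")
    # 16 GB -> 40.96 MB
    # 64 GB -> 162.00 MB
    # 256 GB -> 407.68 MB
```

The 162 MB floor lines up with the first formula at the boundary: 0.25% of 64 GB is about 164 MB, so the suggested value does not jump when a machine crosses 64 GB.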

Conclusion

Start with conservative, safe defaults. Measure behavior using real metrics like EXPLAIN (ANALYZE, BUFFERS) and system statistics. Tune selectively, especially for high-impact queries, instead of applying aggressive global changes.

The post Unlocking High-Performance PostgreSQL: Key Memory Optimizations appeared first on Stormatics.

Antony Pegg: How to Use the pgEdge MCP Server for PostgreSQL with Claude Cowork

Thu, 29 Jan 2026 05:18:20 +0000

The rise of agentic AI is transforming how we build applications, and databases are at the center of this transformation. As AI agents become more sophisticated, they need reliable, real-time access to data.

If you’ve decided to use an MCP server for exposing data to large language models (LLMs) to build internal tools for trusted users, apply sophisticated database schema changes, or translate natural language into SQL, you might find the pgedge-postgres-mcp project (available on GitHub) useful to try.

This 100% open source Natural Language Agent for PostgreSQL provides a connection between any MCP-compatible client (including AI assistants like Claude) and any standard flavor of Postgres, whether you’re creating a new greenfield project or are using an existing database.

Connecting AI agents to PostgreSQL with pgedge-postgres-mcp

The Model Context Protocol (MCP) is a standardized way for AI assistants to communicate with external data sources. Think of MCP as a universal adapter; just as USB-C provides a standard connection for devices, MCP provides a standard way for AI agents to connect to tools, databases, and services.

pgedge-postgres-mcp implements this protocol specifically for PostgreSQL, creating a bridge between AI assistants and your data. It enables users to:

Query databases using natural language

Execute SQL queries safely in read-only transactions

Access local or distributed PostgreSQL instances

Work with production data safely (read-only by default)

Interact with database schemas and metadata

Instead of writing custom integration code for each AI application, you get a ready-to-use connection between AI agents and your PostgreSQL database. It works with any PostgreSQL instance, whether you're running locally for development or hosting remotely.

Benefits for AI Development

The pgEdge MCP Server solves a specific problem: giving AI assistants database access without building custom middleware, with configurable controls for authentication, read-only transactions, and per-database permissions.

For developers, it means:

No custom database connectors to write and maintain

Faster prototyping of AI-powered applications

Natural language database operations through Claude

The ability to test and iterate on AI workflows without writing integration code

For teams running production systems:

Read-only transaction enforcement for safety

Works with both single-node and distributed PostgreSQL

Open source with production-ready performance

Support for complex queries with semantic search capabilities

The pgEdge MCP Server also provides flexibility. You can start with quick prototypes or enterprise-grade production AI applications.

Claude vs. Claude Cowork: What's the Difference?

Claude is Anthropic's AI assistant available through claude.ai and the API. It's conversational and helps with analysis, writing, code generation, and problem-solving. When you chat with Claude in your browser or integrate it into applications through the API, you're using the core AI model.

Claude Cowork is a research preview feature within the Claude Desktop application that brings agentic capabilities to Claude. Unlike standard chat interactions where Claude responds to one message at a time, Cowork enables Claude to take on complex, multi-step tasks and execute them autonomously on your behalf.

The key differences:

Agentic execution: Cowork can break complex projects into parallel workstreams and coordinate sub-agents to complete them

Extended processing: Work on complex tasks for extended periods without conversation timeouts or context limits

Direct file access: Read and write files on your local system without manual uploads and downloads

Professional outputs: Generate Excel files with formulas, PowerPoint presentations, formatted documents, and more

When you connect the pgEdge MCP Server to Claude Cowork, the AI can query your database, analyze the results, and produce complete deliverables—all autonomously. For example, you could ask Cowork to "analyze our customer data and create a quarterly report with charts," then return later to find a finished document ready for review.

Getting Started: The Simplest Example

If you have Go installed and a PostgreSQL database accessible from your machine, the fastest way to get started is to build the MCP server from source and connect it directly to Claude Desktop using stdio.

Prerequisites

Claude Desktop application installed. The MCP Server works with Claude Desktop, even without Cowork

Claude Pro, Max, Team, or Enterprise subscription with access to Cowork

Go 1.24.0 or later installed

PostgreSQL running locally or on your network

Step 1: Build the MCP Server

Clone the repository and build the server binary:

This produces the binary at

Step 2: Configure Claude Desktop

Edit the Claude Desktop configuration file at

Replace the path and database credentials with your own. Restart Claude Desktop, and the pgEdge MCP Server will appear as an available tool.

This approach runs the MCP server as a subprocess of Claude Desktop, communicating over stdio. There is no HTTP server, no authentication tokens, and no Node.js dependency. The trade-off is that the database must be directly reachable from the machine running Claude Desktop.

Getting Started: The Deep Dive

The stdio approach is ideal for quick experimentation, but for team environments, remote access, or production deployments, you will want to run the MCP server as a network service. The following sections cover deploying the server in HTTP mode—either as a standalone binary or as a Docker container—and configuring authentication, multiple databases, and Claude Desktop connectivity.

Step 1: Install the pgEdge MCP Server

The pgEdge MCP Server can be deployed as a standalone binary managed by systemd, or as a Docker container. Choose the option that best fits your infrastructure.

Option A: Download the Binary

Download the pre-built binary for your platform from the pgEdge Enterprise repository. This option works well with systemd for service management.

You will need to configure authentication tokens for Claude Desktop to connect. See Step 2 for configuration details.

Option B: Docker Compose

For a containerized deployment, use the Docker Compose setup from the GitHub repository:

Edit the file with your configuration. At minimum, you need to set:

The setting creates an API token that you will use to connect Claude Desktop to the MCP server. Choose a secure token value and save it; you will need it in Step 3.

For local development only, you can disable authentication by not setting , but this is not recommended for production or network-accessible deployments.

After configuring your file, start the containers:

This deploys both the MCP server (port 8080) and a web interface (port 8081). The web interface at is useful for testing your connection and exploring the server's capabilities.

The approach is convenient for development, but for production deployments you will want to use a YAML configuration file instead. The repository includes a that mounts a local directory for the server configuration file, giving you full control over all server options. If you use the production compose file, read Step 2 below for guidance on the YAML configuration format and token creation. For details, see the Docker deployment guide.

Step 2: Configure the MCP Server (Binary Deployments)

If you are using the standard with an file, you can skip this step; configuration is handled through environment variables.

For binary deployments, the MCP server reads its configuration from a YAML file named . By default, the server searches for this file in first, then in the same directory as the binary. You can specify a different location using the flag:

Basic Configuration for HTTP Mode

If you are deploying the binary as a network service (e.g., with systemd), create a configuration file with HTTP mode enabled:

Creating Authentication Tokens

For HTTP mode deployments, you need to create authentication tokens before clients can connect. Use the built-in token management command:

This generates a new token and displays it. Save this token; you will need it to configure Claude Desktop in Step 3.

You can also list existing tokens:

For local development, you can disable authentication by setting in your configuration file, or by passing the flag when starting the server. This is not recommended for production or network-accessible deployments.

Configuring Multiple Databases

In beta 3 or later the MCP server supports multiple database connections with agents like Claude, allowing users to switch between different databases at runtime. This is useful for environments with separate development, staging, and production databases:

Each database must have a unique that users reference when switching connections. The field controls which users can access each database; an empty list means all authenticated users have access.

The and options (available from v1.0.0-beta3) allow agents such as Claude to select a different database connection using MCP Tools (unlike the pgEdge Natural Language Agents which use more tightly controlled REST APIs).

For a complete configuration reference with all available options, see the server configuration documentation.

Step 3: Connect Claude Desktop to the MCP Server

How you connect Claude Desktop to the MCP server depends on your deployment method.

Claude Desktop Configuration File

Claude Desktop reads MCP server configurations from a JSON file. On macOS, this file is located at:

If this file does not exist, create it with a text editor.

For HTTP Mode (Docker or Binary with systemd)

When the MCP server is running in HTTP mode, you need to use the proxy to connect. This requires passing the authentication token you configured in Step 1 or Step 2:

Replace with the token you configured in your file (for Docker) or created with (for binary deployments).

Replace with your server's address if it is running on a different host or port. The path is required.

Note: This method requires Node.js to be installed on your system. The command (included with Node.js) automatically downloads and runs the package. If you don't have Node.js installed, you can install it on macOS with Homebrew:

Alternatively, download it from nodejs.org.

After saving the file, restart Claude Desktop for the changes to take effect.

For Remote HTTPS Servers

When the MCP server is deployed remotely with HTTPS enabled, you can use Claude Desktop's built-in connector feature instead of editing the configuration file:

Open Claude Desktop and go to Settings > Connectors

Click Add custom connector

Enter your MCP server URL (e.g., https://mcp.example.com:8080)

Click Add

This method requires HTTPS: Claude Desktop's custom connector feature does not support plain HTTP URLs.

Step 4: Query Your Database

Now you can ask Claude Cowork to work with your database using plain English:

You: "Show me the top 10 customers by revenue."

Claude Cowork:

Connects to the database and executes a query
Returns results in a formatted table

The AI understands your request, translates it into appropriate SQL (inferring the semantics of the database from the schema object names it finds), executes the query in a read-only transaction, and presents the results in a readable format.

Step 5: Combine Database Queries with File Operations

The real power of Cowork emerges when you combine database queries with its file system capabilities:

"Pull last month's sales data and create an Excel spreadsheet with charts showing trends by region"

"Analyze customer purchase patterns and generate a PowerPoint presentation with key insights"

"Query the orders table for delayed shipments and create a formatted report I can send to the logistics team"

"Compare this quarter's revenue to last quarter and save a summary document to my Desktop"

Because Claude Cowork can interact with both your database and your file system, it can turn database insights into actionable outputs—reports, charts, presentations, or spreadsheets—all without manual intervention. You can describe the outcome you want, let Cowork run, and return to find completed deliverables.

Step 6: Explore Your Database Structure

You can also ask Claude Cowork to help you understand your databases:

You: "What tables are in the database and what do they contain?"

You: "Show me the schema for the orders table."

You: "What columns have foreign key relationships?"

This is helpful when working with unfamiliar databases or when you need to understand how data is structured before writing more complex queries.

Common Use Cases

Here are some practical ways to use the pgEdge MCP Server with Claude Cowork.

Ad-hoc reporting: "Pull all orders from Q4 where the shipping cost exceeded 10% of the order value, and put it in a spreadsheet with a summary tab." Instead of writing the query, exporting to CSV, and formatting in Excel, you describe the output and get a finished file.

Investigating issues: A customer reports a billing discrepancy. Ask Cowork to trace their order history, compare invoice amounts to order totals, and flag any mismatches. You get a summary of what it found rather than spending an hour joining tables.

Understanding unfamiliar databases: You inherit a database with 200 tables and no documentation. Ask Cowork to explore the schema, identify the core entities, and explain how they relate. It's faster than reading through pg_catalog yourself.

Data quality checks: "Are there any orders with a ship date before the order date? Any customers with duplicate email addresses?" Run sanity checks in plain English and get results you can act on.

Building queries: When you know what you want but not the exact SQL, describe it and let Cowork write the query. Review what it produces, learn the schema as you go, and iterate until you have what you need.

Why This Matters

The pgEdge MCP Server combined with Claude Cowork changes how you interact with databases. Instead of SQL being a barrier between you and your data, an AI agent becomes an intelligent intermediary that understands both what you want and how your database is structured—and can deliver complete, polished outputs.

Enterprise-Grade Postgres Features for Production Use

As you move from experimentation to production use, the foundation matters. The pgEdge MCP Server is built with enterprise requirements in mind:

Security: Read-only transactions by default, TLS support, token authentication

Reliability: Production-tested connection handling and query execution

Performance: Connection pooling and efficient query processing

Compatibility: Support for standard PostgreSQL features, extensions, and tools including pgvector for semantic search

The Distributed Postgres Advantage

This becomes particularly important for distributed PostgreSQL deployments. pgEdge's infrastructure is built for multi-region, active-active database architectures. As your database scales across nodes and regions, having an AI agent that can intelligently query the right data becomes increasingly valuable.

Getting Started Today

There are ever more possibilities for interacting with databases in new ways as technology progresses. With Claude Cowork and the pgEdge MCP Server, you can describe the outcome you want (a report, an analysis, a data export) and let an AI agent handle the queries, analysis, and document creation. It’s a new way to work with data that can help expedite information management in the database.

The pgEdge MCP Server is open source under the PostgreSQL license and is ready to use. Visit the pgEdge MCP Server GitHub repository for documentation and configuration examples.

To download the MCP Server binary, visit the pgEdge Enterprise repository. For containerized deployments, clone the repository and use the included Docker Compose setup, or follow the Docker deployment guide.

The pgEdge MCP Server is part of the larger pgEdge Agentic AI Toolkit, which includes additional tools and integrations for building AI-powered applications on PostgreSQL. Learn more at pgedge.com/ai.

Andrei Lepikhov: 500 Milliseconds on Planning: How PostgreSQL Statistics Slowed Down a Query 20 Times Over

Wed, 28 Jan 2026 15:25:29 +0000

A query executes in just 2 milliseconds, yet its planning phase takes 500 ms. The database is reasonably sized, the query involves 9 tables, and the default_statistics_target is set to only 500. Where does this discrepancy come from?

This question was recently raised on the pgsql-performance mailing list, and the investigation revealed a somewhat surprising culprit: the column statistics stored in PostgreSQL's pg_statistic table.

The Context

In PostgreSQL, query optimisation relies on various statistical measures, such as MCV, histograms, distinct values, and others - all stored in the pg_statistic table. By default, these statistics are based on samples of up to 100 elements. For larger tables, however, we typically need significantly more samples to ensure reliable estimates. A thousand to 5000 elements might not seem like much when representing billions of rows, but this raises an important question: could large statistical arrays, particularly MCVs on variable-sized columns, seriously impact query planning performance, even if query execution itself is nearly instantaneous?

Investigating the Problem

We're examining a typical auto-generated 1C system query. '1C' is a typical object-relational mapping framework for accounting applications. PostgreSQL version is 17.5. Notably, the default_statistics_target value is set to only 500 elements, even below the recommended value for 1C systems (2500). The query contains 12 joins, but 9 are spread across subplans, and the join search space is limited by three JOINs, which is quite manageable. Looking at the EXPLAIN output, the planner touches only 5 buffer pages during planning - not much.

Interestingly, the alternative PostgreSQL fork (such forks have become increasingly popular these days) executed this query with nearly identical execution plans, and the planning time is considerably shorter - around 80 milliseconds. Let's use this as our control sample.

The Hunt for Root Cause

The first suspicion was obvious: perhaps the developers expanded the optimiser's search space, and it's simply passing through multiple extra paths. A flamegraph comparison between the slow planning case and the alternative fork showed remarkably similar patterns. Both exhibited search space expansion from features standard in 1C-related PostgreSQL forks (Joinsel and 'Append of IndexScans'), but nothing surprising beyond that.

However, the detailed analysis of the flamegraph revealed something more telling: a performance bottleneck in the byteaeq() comparison operation, triggered by the cost_index() function's cost estimation and toast_raw_datum_size() calls. The optimiser invokes this repeatedly while evaluating all possible index combinations across various expressions - not just those explicitly mentioned in the query, but also derived ones through 'equivalence classes' created by equality operations.

The query references just three columns: inforg10621::fld10622rref, inforg10621::fld15131rref, and inforg8199::fld8200_rrref. Yet these are involved in 20 different expressions, 15 of which are join clauses. When you factor in the number of indexes on these tables - eight between the two - it becomes clear that the number of possible combinations can explode. But how can we confirm this suspicion? How many times does the optimiser actually consult table statistics?

Unfortunately, standard PostgreSQL doesn't provide this information directly. So I turned to my own project - pg_index_stats, which uses PostgreSQL's internal hooks (relation_stats_hook and get_index_stats_hook) to collect precisely this data and display it in EXPLAIN output.

Here's what we found (1c and alternative):

The statistics for four columns are being accessed more than 100 times each. Remarkably, for the fld10622rref column, the optimiser fetches, decompresses, and uses the statistics 217 times! While this is less critical for the fld809 column (which has no histogram or MCV due to its nearly unique nature), other columns require repeatedly decompressing substantial arrays. The alternative fork accesses statistics roughly half as often - a significant improvement, though not quite enough to fully explain the planning time difference.

Digging Deeper

What statistics do we actually have, and in what volume? Comparing statistics dumps from both PostgreSQL versions (here and there) shows that our tables indeed contain MCV and histogram arrays with up to 500 elements for several columns. Their uncompressed size reaches tens of kilobytes (compressed, over 2KB), and extracting them requires decompression before use. Surely we don't need to fetch and decompress these large arrays repeatedly?
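To get a rough idea of how large these arrays are on your own system, the standard pg_stats view and the pg_statistic catalog can be queried directly. The following is only a sketch: it reuses the table name from this post, reading pg_statistic normally requires superuser rights, and pg_column_size() reports the size as stored, which may be the compressed size.

```sql
-- Number of MCV and histogram entries per column (readable by ordinary roles
-- for tables they can access).
SELECT attname,
       array_length(most_common_freqs, 1) AS mcv_entries,
       array_length(histogram_bounds, 1)  AS histogram_entries
FROM pg_stats
WHERE tablename = '_inforg10621'
ORDER BY mcv_entries DESC NULLS LAST;

-- Stored size of the first two statistics slots (superuser only).
-- stakind = 1 marks an MCV slot, stakind = 2 a histogram slot.
SELECT a.attname,
       s.stakind1,
       pg_column_size(s.stavalues1) AS slot1_stored_bytes,
       s.stakind2,
       pg_column_size(s.stavalues2) AS slot2_stored_bytes
FROM pg_statistic s
JOIN pg_attribute a ON a.attrelid = s.starelid
                   AND a.attnum   = s.staattnum
WHERE s.starelid = '_inforg10621'::regclass
ORDER BY slot1_stored_bytes DESC NULLS LAST;
```

Columns whose arrays are close to the default_statistics_target limit are the first candidates to look at.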

After all, PostgreSQL does have caching that should calculate selectivity for a given expression only once …

We have two obvious suspects: columns fld10622rref and fld8201rref. Let's test our hypothesis by mechanically zeroing out their statistics and seeing what happens:

-- stakind = 1 marks an MCV slot; clear it in the first two slots.
UPDATE pg_statistic
SET stanumbers1 = CASE WHEN stakind1 = 1 THEN NULL ELSE stanumbers1 END,
    stavalues1 = CASE WHEN stakind1 = 1 THEN NULL ELSE stavalues1 END,
    stakind1 = CASE WHEN stakind1 = 1 THEN 0 ELSE stakind1 END,
    stanumbers2 = CASE WHEN stakind2 = 1 THEN NULL ELSE stanumbers2 END,
    stavalues2 = CASE WHEN stakind2 = 1 THEN NULL ELSE stavalues2 END,
    stakind2 = CASE WHEN stakind2 = 1 THEN 0 ELSE stakind2 END
WHERE (starelid = '_inforg10621'::regclass AND staattnum = (
    SELECT attnum FROM pg_attribute
    WHERE (attrelid = '_inforg10621'::regclass AND attname = '_fld10622rref')))
OR (starelid = '_inforg8199'::regclass AND staattnum = (
    SELECT attnum FROM pg_attribute
    WHERE (attrelid = '_inforg8199'::regclass AND attname = '_fld8201rref')));

The result? EXPLAIN now shows planning time at around 30ms:

 Planning:
   Buffers: shared hit=5
   Memory: used=4030kB  allocated=4096kB
 Planning Time: 31.347 ms
 Execution Time: 0.237 ms

If we delete all statistics entirely with:

DELETE FROM pg_statistic;

We get the theoretical minimum planning time for this query:

 Planning:
   Buffers: shared hit=5
   Memory: used=3932kB  allocated=4096kB
 Planning Time: 18.477 ms
 Execution Time: 0.421 ms

This aligns perfectly with the alternative fork's planning time.

However, in the current master branch, since commit 057012b, Postgres employs a hashing technique to reduce the N^2 overhead of long MCV array passes. Let's backport that commit to our build and check the EXPLAIN output:

 Planning:
   Buffers: shared hit=5
   Memory: used=3984kB  allocated=4096kB
 Planning Time: 64.603 ms
 Execution Time: 0.197 ms

It is definitely better than before, but we still see overhead that may grow with larger statistical arrays and repeated detoasting/decompression attempts.

The Verdict

Statistics do indeed cause the excessive planning time, but the question remains: is it the overhead of decompressing statistics, or the overhead of repeatedly iterating through long MCV and histogram arrays? The answer is likely both.

We can indirectly confirm the impact of repeatedly traversing MCV arrays by noting that changing the storage type of columns in pg_statistic from EXTENDED to EXTERNAL produces no measurable difference:

DELETE FROM pg_statistic;
SET allow_system_table_mods = 'on';
ALTER TABLE pg_statistic ALTER COLUMN stavalues1 SET STORAGE EXTERNAL;
…
VACUUM ANALYZE;
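Before re-running the test, it may be worth confirming that the storage change actually took effect. A minimal check against pg_attribute, assuming the stavalues columns were the ones altered (the elided statements above are not reproduced here):

```sql
-- attstorage codes: 'x' = EXTENDED (compressible, out-of-line),
--                   'e' = EXTERNAL (out-of-line, never compressed).
SELECT attname, attstorage
FROM pg_attribute
WHERE attrelid = 'pg_catalog.pg_statistic'::regclass
  AND attname LIKE 'stavalues%'
ORDER BY attnum;
```

Only values written after the change use the new storage strategy, which is why the statistics are deleted first and then regenerated by VACUUM ANALYZE.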

Conclusion and Solutions

The root cause is clear: the optimiser's search space expanded due to increased index counts and statistics sizes - both entirely legitimate scenarios that can occur beyond ORM applications. The execution itself remains efficient and doesn't consume significant disk or memory resources, so it doesn't significantly impact neighbouring operations. However, the planning time can become problematic.

What Can Be Done?

First approach: Implement a caching system for frequently accessed, extensive statistics. This could even be implemented as an extension (similar to how I collected statistics access patterns in pg_index_stats). The code wouldn't be overly complex - just a standard module allocating a DSM segment for a hash table and decompressed statistics. Additionally, it's worth exploring a balance and perhaps storing MCVs in sorted order (when the data type allows), enabling fast element matching on both sides during JOIN estimation and quick lookup during filter estimation.

Second approach: You can just reduce the statistics size on problematic tables or columns:

ALTER TABLE table_name ALTER COLUMN column_name SET STATISTICS 0;
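Note that a target of 0 tells ANALYZE to stop collecting statistics for that column altogether. A less drastic variant is to lower the per-column target and re-analyze. The sketch below uses the same placeholder names as the command above, and the value 100 is only an illustrative assumption, not a recommendation from this investigation:

```sql
-- Shrink the MCV list and histogram for one column instead of disabling them.
ALTER TABLE table_name ALTER COLUMN column_name SET STATISTICS 100;

-- The new target only takes effect when the column is analyzed again.
ANALYZE table_name (column_name);

-- To go back to default_statistics_target later:
ALTER TABLE table_name ALTER COLUMN column_name SET STATISTICS -1;
```

Comparing planning time and plan shape before and after each change shows whether the smaller arrays cost anything in estimation quality.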

Of course, the challenge here is detecting the problematic spots (columns, clauses) inside the query. There's no universal answer - you need to run EXPLAIN on suspicious queries with and without statistics, then perform the same analysis I did above. And naturally, report findings to the vendor, because there's always room for improvement!

THE END.
Istanbul, Turkey. January 26, 2026.

Bruce Momjian: New Presentation

Wed, 28 Jan 2026 14:00:01 +0000

I just gave a new presentation at Prague PostgreSQL Developer Day titled What's Missing in Postgres? It's an unusual talk because it explains the missing features of Postgres, and why. One thing I learned in writing the talk is that the majority of our missing features are performance-related, rather than functionality-related. I took many questions:

some pointed out that extensions supply much of this missing functionality
some supported the lack of features because the features are either unnecessary or harmful
some features are in-progress

Thanks to Melanie Plageman for the idea of this talk.

Avi Vallarapu: Migrating Sybase ASE aka SAP ASE to PostgreSQL

Wed, 28 Jan 2026 11:04:06 +0000

Legacy Sybase ASE/SAP ASE databases are still powering mission-critical OLTP workloads, but modernization pressure keeps rising. Witness the differences between SAP ASE and PostgreSQL, and the migration path to PostgreSQL.

Lætitia AVROT: Why Your HA Architecture is a Lie (And That's Okay)

Wed, 28 Jan 2026 00:00:00 +0000

“If Darth Vader existed and decided to do to Earth what he did to Alderaan, everyone would lose data.” I love this quote from Robert Haas because it’s a reality check we all need. In the database world, we’re constantly sold the dream of “Five Nines” (99.999% uptime) and “Zero Data Loss” (RPO 0). We spend months building complex clusters to achieve it. Let’s be honest: these are fairy tales. Beautiful to imagine, but they don’t exist in production.

semab tariq: Unused Indexes In PostgreSQL: Risks, Detection, And Safe Removal

Tue, 27 Jan 2026 09:57:39 +0000

Indexes exist to speed up data access. They allow PostgreSQL to avoid full table scans, significantly reducing query execution time for read-heavy workloads.

From real production experience, we have observed that well-designed, targeted indexes can improve query performance by 5× or more, especially on large transactional tables.

However, indexes are not free.

In this blog, we discuss the issues unused indexes can cause and how to remove them safely from production systems, with a rollback plan.

1. Why Unused Large Indexes Become a Long-Term Problem

Over time, unused indexes can silently degrade database performance. Below are some of the most common issues they cause in production systems.

1.1. Slower INSERT, UPDATE, And DELETE Operations

Every write operation must update all indexes on a table, including those that are never used by queries.

1.2. Increased Vacuum And Autovacuum Overhead

Indexes accumulate dead tuples just like tables. These must be vacuumed, increasing I/O usage and extending vacuum runtimes.

1.3. Longer Maintenance Windows

Operations such as VACUUM and REINDEX take longer as the number and size of indexes grow.

1.4. Disk Space Waste And Cache Pollution

Large unused indexes consume disk space and can evict useful data from shared buffers, reducing cache efficiency.

Because of these reasons, it is always recommended to periodically identify and safely remove unused indexes from production systems, but only through a controlled and well-validated process.

2. How To Safely Drop Unused Indexes In PostgreSQL

Below is a step-by-step, production-safe checklist that should be followed before dropping any index.

2.1. Check When System Statistics Were Last Reset

If statistics were reset recently, an index may appear unused even though it is actively required by workloads.

SELECT
  datname,
  stats_reset
FROM pg_stat_database
WHERE datname = current_database();

An older stats_reset timestamp (or NULL, meaning statistics were never reset) provides more confidence in index usage data.

2.2. Check Whether The Index Backs Any Constraint

A large index can appear unused in statistics, but must not be dropped if it backs a PRIMARY KEY, UNIQUE, or FOREIGN KEY constraint.

PostgreSQL uses these indexes to guarantee data integrity and will not allow them to be dropped unless the constraint itself is explicitly removed.

SELECT
  i.relname AS index_name,
  c.conname AS constraint_name,
  c.contype AS constraint_type,
  c.conrelid::regclass AS table_name
FROM pg_constraint c
JOIN pg_class i ON i.oid = c.conindid
WHERE i.relname = '<IDX_NAME>';

If this query returns rows, the index cannot be dropped without first removing the constraint.

2.3. Check Index Usage Statistics

This confirms whether PostgreSQL has run any scans on the index since the statistics were last reset.

SELECT
  s.indexrelname AS index_name,
  s.relname AS table_name,
  s.idx_scan,
  s.idx_tup_read,
  s.idx_tup_fetch
FROM pg_stat_user_indexes s
WHERE s.indexrelname = '<IDX_NAME>';

All three counters must be 0 for the index to be considered unused.
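When no specific index is under suspicion yet, the same view can be used to produce a list of candidates. This is a sketch only: it filters on idx_scan alone, excludes unique indexes as a rough stand-in for the constraint check in 2.2, and the result still has to go through every step of this checklist.

```sql
-- Never-scanned, non-unique indexes in the current database, largest first.
SELECT s.schemaname,
       s.relname      AS table_name,
       s.indexrelname AS index_name,
       pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size,
       s.idx_scan
FROM pg_stat_user_indexes s
JOIN pg_index i ON i.indexrelid = s.indexrelid
WHERE s.idx_scan = 0
  AND NOT i.indisunique
ORDER BY pg_relation_size(s.indexrelid) DESC;
```

These counters are kept per server, so an index that looks unused on the primary may still be serving queries on a read replica.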

3. Rollback Preparation

Before dropping any index, always capture its definition so it can be recreated quickly if needed.

SELECT pg_get_indexdef('<IDX_NAME>'::regclass) AS create_index_sql;

Store this output as part of your rollback plan.

4. Drop The Index Safely

Using DROP INDEX CONCURRENTLY avoids blocking reads and writes on the table, making it safe for production environments.

DROP INDEX CONCURRENTLY <IDX_NAME>;

If performance issues are observed after dropping the index, the rollback plan can be used to recreate the index concurrently without impacting availability.
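A rollback along these lines might look as follows. The index, table, and column names are hypothetical placeholders, and the statement is assumed to be the output saved in step 3 with the CONCURRENTLY keyword added by hand, since pg_get_indexdef() does not emit it.

```sql
-- Recreate the dropped index without blocking writes on the table.
-- Like DROP INDEX CONCURRENTLY, this cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY idx_orders_customer_id
    ON public.orders USING btree (customer_id);

-- A failed concurrent build leaves an invalid index behind; check for any.
SELECT c.relname AS invalid_index
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
WHERE NOT i.indisvalid;
```

If the second query lists the index, drop it and run the build again.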

5. Final thoughts

Dropping unused indexes can deliver meaningful performance and maintenance benefits, but only when done carefully.

Never rely on statistics alone; always validate constraints, understand workload patterns, and prepare a rollback plan.

In production systems, correctness and stability must always take priority over cleanup speed.

The post Unused Indexes In PostgreSQL: Risks, Detection, And Safe Removal appeared first on Stormatics.

Hubert 'depesz' Lubaczewski: How to render timestamp with a timezone that is different from current?

Tue, 27 Jan 2026 09:25:04 +0000

This question appeared on IRC, and while I wasn't there when it happened, it caught my eye:

» Can I not render this with timezone offset: select '2026-01-09 04:35:46.9824-08'::timestamp with time zone at time zone 'UTC';
» Returns '2026-01-09 12:35:46.9824' which is without the offset.

Let's see what can be done about it. First, let's …

Dave Stokes: Is the future of MySQL PostgreSQL (Or MariaDB, or TiDB, or ...)?

Sun, 25 Jan 2026 16:45:00 +0000

I am not intentionally trying to upset anyone with this blog post or minimize the efforts of many brilliant people whom I admire. However, I connected with several people over the 2025 holidays who all had the same question: What is the future of MySQL? At the upcoming FOSDEM conference, several events will discuss this subject and push a particular solution. And in several ways, they are all wrong.

Oracle has not been improving the community edition for a long time now. They have laid off many of their top performers in the MySQL group. We got almost a good decade and a half out of Oracle's stewardship of the "world's most popular database", and we should be thankful for that. However, now that time is over, it is time to consider future options, given that there will be no updates, CVE fixes, or innovation for what is the MySQL Community Edition.

There are several choices available.

Nothing!

The first choice is nothing. Many folks run old, end-of-life versions of MySQL. There are many instances of MySQL 5.7. There are some fantastic features in later versions of the software. But if those features are not needed or desired, then why upgrade? MySQL has always had a minimalist appeal for those who have little need for features like JSON, materialized views, and the like. This vanilla approach will be the default for many from the "do not change it if it is still working" school of software management.

Pros: You do not have to make any changes.
Cons: You are taking on technical debt like the Titanic took on water. You may get a few years out of this, but this path is fraught with hungry dragons.

The Elephant

PostgreSQL? This is a great database that offers numerous valuable features, making it a solid choice. It is reasonably easy to port schemas and data from MySQL to PostgreSQL. You will want to run a connection pooler. You will need to keep an eye on the vacuum status. You will need to learn to pick from an embarrassing number of indexing options, with B-tree probably still being your primary choice. A significant benefit is that PostgreSQL becomes a little faster each year. This is a sound choice for the future. But this is a big change, and a bridge too far for many.

Why? To a long-time MySQL user, PostgreSQL is much more complex to set up. It is more than capable, with its Swiss-Army-knife-like set of options and extensions. The replication setup seems like putting together tinker toys. And if you do not know how to monitor for transaction ID wrap-around, it can bite you. MySQL has been set-and-forget operationally, whereas PostgreSQL needs more attention.

Pros: This is the way to go for a robust, really open-source database. This is my recommendation most of the time.
Cons: Some of the long-term issues seem archaic, and you have to pay attention to its operation.
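For readers coming from MySQL, the transaction ID wrap-around monitoring mentioned above can start with a single query. This is a minimal sketch; which thresholds to alert on is left to you and is not something this post prescribes.

```sql
-- Age of the oldest unfrozen transaction ID in each database.
-- Autovacuum forces freezing once this passes autovacuum_freeze_max_age
-- (200 million by default); the hard limit is about 2 billion.
SELECT datname,
       age(datfrozenxid) AS xid_age,
       round(100.0 * age(datfrozenxid) / 2147483647, 2) AS pct_toward_wraparound
FROM pg_database
ORDER BY xid_age DESC;
```

A value that keeps climbing well past autovacuum_freeze_max_age usually means something is preventing vacuum from freezing rows, such as a long-running transaction or an abandoned replication slot.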

What about the forks and/or branches?

MariaDB

MariaDB forked from MySQL fifteen years ago. In many cases, it is an easy port from MySQL to MariaDB. That is, unless you make heavy use of JSON or MySQL Cluster. MariaDB recently acquired Galera Cluster, a similar replication concept to MySQL Cluster, but not interchangeable. MariaDB has been hiring some of the top talent (Joro, Jeb, et al) that Oracle MySQL let slip away. MariaDB shares a lot of DNA with MySQL, but it has since diverged. This is a 'may be close enough' option for some.

Pros: This is probably the second easiest port. Has some cool features not in Community MySQL.
Cons: MariaDB has some clunky parts, like its JSON implementation, which may not affect you. And many have been turned off by the soap opera foundation, corporation, crippled stock IPO, SkySQL dramas of the last few years.

Percona

Percona has been enhancing upstream Oracle MySQL for years, and the majority of its revenue was from MySQL. It has some clever features that Oracle should have added years ago. It is rock solid and as close to Oracle's product as possible. But if the upstream dries up, what happens downstream? And with Galera being purchased by MariaDB, will the Percona fork users be forced to move to the seal? Percona has also had employee turnover that could impact its branch.

Pros: As close to mainstream Oracle MySQL as you can get.
Cons: Can Percona carry the torch for MySQLdom if or when Oracle walks away? Doubtful at this time.

Oracle

Well, you could actually pay for Enterprise Oracle MySQL. One of Percona's founders has been quoted as saying that Oracle really does not have customers, only hostages.

The last fifty years of computing history are full of folks who bet against Larry Ellison and lost.

But having to pay for something that was once free rubs me and others the wrong way.

Pros: The change from the community edition to the enterprise edition is easy.
Cons: That change is usually felt as a sharp pinching feeling in the wallet. Oracle audits.

Cloud

The past decade has been great for those who wanted to get out of running their own databases and pay for someone else to run them. They take on the responsibility, and you get to pick which product you want. And at a certain level, a relational database from A is like the one from B. Add in an ORM or two, and the database is abstracted ad absurdum.

You can surrender all and let the cloud do the work. It used to be that DBAs fought hard for a 'golden truth of data' in which there was one copy of the data, with no redundancies, and that information was 100% correct. Then we replicated to read-only servers to take the load off the central read-write instance. Then the data was exported into NoSQL, eventually consistent, dunked in lakes, time-sliced, and sharded so that the one golden truth was gone. I see many big projects today trying to align slight variations in data to achieve a higher degree of confidence, and I know they are going to fail.

So you could just move everything to the cloud and let your cloud vendor worry about the icky backend stuff, just like you offloaded your RAID worries.

Pros: The database issue is now someone else's issue.
Cons: Expensive, and you put a high level of trust in someone else doing everything right.

Other MySQL Compatible Options

Yes, there are MySQL-compatible options. My fav is TiDB. But compatible is not precisely the same as MySQL. They work well and often do things better than MySQL. But if you need to fire up a quick instance for a proof of concept or your kids' sports league, they are overkill. At the bigger end of the continuum, they may be precisely what you need.

Pros: Compatible at a certain level with what you are used to with MySQL.
Cons: May not be exactly what you need, and may not be compatible enough.

Conclusion

Time and the marketplace will pick winners. They may not be from the list above, as some new hero will ride in on their unicorn with a new database that solves all our problems. But do not hold your breath.

My prediction: A mix of the above will take over where Oracle is surrendering MySQL Community Edition's market share. The two predominant options will be a continued shift to PostgreSQL or a non-shift, keeping the version of the Community Edition currently running. The first is for those who strive for the best technological solution, with an eye on future improvements, which the PostgreSQL contributors routinely deliver. The second is for those from the school of thought that has found you do not mess with what is running well.

Most of you will go with PostgreSQL. It is a solid technical choice, is getting better each year thanks to an amazing, dedicated team, and will do more than you need.

But I will still miss the spirit of MySQL AB, which I joined in 2007, where the feeling was that anything was possible. Almost two decades later, the current status is a sad way for such a dynamic project to limp to a halt.

damien clochard: PostgreSQL Anonymizer, available in all good shops

Sun, 25 Jan 2026 12:30:36 +0000

As we prepare for the upcoming release of PostgreSQL Anonymizer 3.0, I took some time to check which platforms now support the extension. What I discovered brought me a sense of achievement that I wanted to share with the community.

More and More Platforms Are Embracing Data Anonymization

Over the past months, several major Cloud Service Providers have adopted the PostgreSQL Anonymizer extension, making it easier than ever for organizations to protect sensitive data.

The new adopters include:

They add to the current list composed of Alibaba Cloud, Crunchy Bridge, Google Cloud SQL, Microsoft Azure Database, Neon and others.

Growing Support Across PostgreSQL Forks

Perhaps even more remarkable is the adoption by major PostgreSQL forks and enterprise distributions. Each of these platforms has its own specific requirements and user base, and seeing PostgreSQL Anonymizer integrated across this ecosystem is truly humbling:

Please refer to their own documentation on how to activate the extension as they might have a platform-specific install procedure.

Beyond PostgreSQL: The Django Integration

I’ve also noticed a Django plugin for PostgreSQL Anonymizer, making it easier for Python developers to integrate data anonymization into their applications.

Reflecting on our journey

When we started working on PostgreSQL Anonymizer in 2018, the goal was simple: provide a straightforward way to mask personal information directly within PostgreSQL. We wanted to make privacy-preserving techniques accessible to anyone using PostgreSQL, without requiring complex and expensive external tools or ETLs.

Seeing this level of adoption across cloud providers, enterprise distributions, and even extending into application frameworks is incredibly rewarding. But it’s important to remember that this success belongs to everyone who contributed to the project.

I want to express my deepest gratitude to all the contributors who have submitted patches, reported bugs, improved documentation, and provided feedback over the years. And especially to my colleagues who have supported and encouraged this work.

Looking Ahead

As we move toward the 3.0 release, this growing adoption motivates us to keep improving and maintaining the extension to the highest standards. The diversity of platforms now supporting the extension shows that protecting users' privacy is now a global concern.

The journey continues: we are actively working on supporting even more platforms such as SUSE, the CNPG operator and others. If you want to use the extension for your project, but you can’t install it for whatever reason, please don’t hesitate to reach out at contact@dalibo.com!

David Wheeler: 🛠️ PGXN Tools v1.7

Sat, 24 Jan 2026 22:53:11 +0000

Today I released v1.7.0 of the pgxn-tools OCI image, which simplifies Postgres extension testing and PGXN distribution. The new version includes just a few updates and improvements:

Upgraded the Debian base image from Bookworm to Trixie
Set the PGUSER environment variable to postgres in the Dockerfile, removing the need for users to remember to do it.
Updated pg-build-test to set MAKEFLAGS="-j $(nproc)" to shorten build runtimes.
Also updated pgrx-build-test to pass -j $(nproc), for the same reason.
Upgraded the pgrx test extension to v0.16.1 and test it on Postgres versions 13-16.

Just a security and quality of coding life release. Ideally existing workflows will continue to work as they always have.