PGBLR Meetup met on February 7, organized by Amit Kapila, Kuntal Ghosh, Sivji Kumar Jha and Vigneshwaran C.
Speaker:
Mumbai PostgreSQL UserGroup met on February 7 - organized by Ajit Gadge, Sovenath Shaw and Deepak Mahto
Speaker:
CERN PGDay 2026 took place on February 6, organized by:
Speaker:
The PG DATA Conference 2026 CfP committee selected the talks. Committee members are listed below:
Community Blog Posts:
Kai Wagner - PGDay and FOSDEM Report from Kai
Henrietta Dombrovskaya - Prague PostgreSQL Dev Day – a very late follow up
PostgreSQL uses a process-based architecture where each connection is handled by a separate process. Some data structures are shared between these processes, for example, the shared buffer cache or the write-ahead log (WAL). To coordinate access to these shared resources, PostgreSQL uses several locking mechanisms, including spinlocks. Spinlocks are intended for very short-term protection of shared structures: rather than immediately putting a waiting process to sleep, they busy-wait and repeatedly check whether the lock is free. Under contention, PostgreSQL also applies an adaptive backoff that can include brief sleeps.
This article explains what spinlocks are and how they are implemented in PostgreSQL. It also describes how spinlocks can be monitored and demonstrates how my new pg_spinlock_tracer tool can be used to trace spinlock internals using eBPF.
When multiple processes need to access a shared resource, locks are used to ensure that only one process can modify the resource at a time. If a lock is not available, the waiting process is put to sleep until the lock can be acquired. This reduces CPU usage since the waiting process does not consume CPU cycles while sleeping. However, putting a process to sleep and waking it up again involves context switches, which take time and add latency to the operation. If the lock is expected to be held for a very short time, it may be more efficient for the waiting process to continuously check if the lock is available instead of sleeping. That is what spinlocks do: the waiting process spins in a loop, repeatedly checking the lock’s status until it can be acquired. Using a spinlock avoids the sleep/wakeup latency but can consume CPU cycles while spinning. If the hardware has only a few CPU cores, spinning can waste CPU cycles and lead to worse overall performance.
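Spinlock waits are mostly invisible from SQL, but recent PostgreSQL versions report the backoff path as a wait event. A minimal sketch, assuming a server version that exposes the SpinDelay wait event (the eBPF tracer described below goes much deeper):
-- Hedged sketch: sample backends currently sleeping in the spinlock
-- backoff path. Assumes a version that reports the SpinDelay wait event
-- (wait_event_type 'Timeout') in pg_stat_activity.
SELECT pid, wait_event_type, wait_event, state, query
FROM pg_stat_activity
WHERE wait_event = 'SpinDelay';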
The PostgreSQL implementation of spinlocks is mainly in src/include/storage/s_lock.h and src/backend/storage/lmgr/s_lock.c. The spinlock API provides
Back from Brussels where I was doing the annual pilgrimage to the awesome FOSDEM gathering. I was very pleased to see the popularity and positive vibe of the (first time) joint Databases Devroom. Community-oriented and community-run conferences are the best IMHO.
It was great to share the stage this time with Daniël van Eeden, an engineer from PingCAP and a MySQL Rockstar. I enjoyed the collaboration because we approached a thorny issue from two different aspects: the PostgreSQL emerging standard and the implementation of MySQL compatibility.
Our presentation, "Drop-in Replacement: Defining Compatibility for Postgres and MySQL Derivatives", tackled a problem in our industry: the "wild west" of marketing claims. The success of open source databases has created an ecosystem of derivatives claiming "drop-in compatibility."
The reality, however, is that this often leads to user confusion and brand dilution. As we discussed, compatibility is not an absolute Yes/No situation—even different versions of the same database are not 100% compatible due to deprecated or added features.
In my section of the talk, I focused on the governance perspective. I presented the findings from the "Establishing the PostgreSQL Standard" working group held at PGConf.EU 2025 in Riga last October.
We are pivoting from a binary "Pass/Fail" certification to a granular compatibility matrix. We need to ensure that when someone says "Postgres Compatible," they don't just mean matching the wire protocol. We need to look at:
How INSERT ... SELECT ... ORDER BY behaves.
pg_catalog being present and predictable.
CREATE INDEX must actually build the index, not just return "success" while doing nothing.
Everyone who was in Prague on January 27-28 has already posted their feedback and moved on, so I am late, as often happens. However, I still maintain that it's better late than never!
This year was the second time I attended this event, and this time I didn’t have to rush out immediately after my training session, so I was able to stay longer and connect with many attendees. Same as last time, I was very impressed with the whole event organization and the warm and welcoming atmosphere. Many thanks to Tomas Vondra!
I delivered the same workshop I did last year. Last year, I ran out of time despite my best efforts, and since I hate to be that presenter who doesn’t keep track of time and then rushes through the last twenty slides, I made an effort to remove, in advance, the content I presumed I wouldn’t have time to cover. It looks like I overdid it a little bit, because I ended up finishing early, but I think that’s still better than finishing late.
My usual lament about these training sessions is gender uniformity, and I still do not know what the right solution to this problem is!
Also, many thanks to Gülçin Yıldırım Jelínek for extending my 63rd birthday celebration for another week!
As often happens, my only regret is that there were so many interesting talks happening at the same time! I could have avoided FOMO if I had checked out the Nordic PG schedule earlier, because some of the talks will be replayed there. I could have planned it better! But in any case, I had a great time.
In the article about Buffers in PostgreSQL we kept adding EXPLAIN (ANALYZE, BUFFERS) to every query without giving much thought to the output. Time to fix that. PostgreSQL breaks down buffer usage for each plan node, and once you learn to read those numbers, you'll know exactly where your query spent time waiting for I/O - and where it didn't have to. That's about as fundamental as it gets when diagnosing performance problems.
As of PostgreSQL 18, EXPLAIN ANALYZE automatically includes buffer statistics - you no longer need to explicitly add BUFFERS. The examples below use the explicit syntax for compatibility with older versions, but on PG18+ a simple EXPLAIN ANALYZE gives you the same information.
For this article we will use the following schema and seed data.
CREATE TABLE customers (
id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
name text NOT NULL
);
CREATE TABLE orders (
id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
customer_id integer NOT NULL REFERENCES customers(id),
amount numeric(10,2) NOT NULL,
status text NOT NULL DEFAULT 'pending',
note text,
created_at date NOT NULL DEFAULT CURRENT_DATE
);
INSERT INTO customers (name)
SELECT 'Customer ' || i
FROM generate_series(1, 2000) AS i;
-- seed data: ~100,000 orders spread across 2022-2025
INSERT INTO orders (customer_id, amount, status, note, created_at)
SELECT
(random() * 1999 + 1)::int,
(random() * 500 + 5)::numeric(10,2),
(ARRAY['pending','shipped','delivered','cancelled'])[floor(random()*4+1)::int],
CASE WHEN random() < 0.3 THEN 'Some note text here for padding' ELSE NULL END,
'2022-01-01'::date + (random() * 1095)::int -- ~3 years of data
FROM generate_series(1, 100000);
-- make sure stats are up to date
ANALYZE customers;
ANALYZE orders;
-- we are going to skip indexes on purpose
-- and fire a sample query
select count(1) from customers;
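Before the article's own example, here is a hedged illustration of the kind of output to look for; the query below is mine, not from the original post. Each plan node's Buffers line separates pages found in shared_buffers ("shared hit") from pages that had to be read in from the operating system ("shared read").
-- Illustrative query (an assumption, not from the article): aggregate over
-- the seeded orders table and read the Buffers line under the Seq Scan node.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM orders
WHERE status = 'shipped';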
Let's start with a random query
EXPLAIN (ANALYZE, BUFFERS)
SELECT [...]
PostgreSQL's query planner relies on table statistics to estimate the number of rows (estimated rows) each operation will process, and then selects an optimal execution plan based on these estimates. When the estimated rows diverge significantly from the actual rows, the planner can choose a suboptimal plan, leading to severe query performance degradation.
This article walks through four approaches I used to reduce row count estimation errors, ordered from least to most invasive. Due to confidentiality constraints, I cannot share actual SQL or execution plans, so the focus is on the diagnostic thought process and the techniques applied.
pg_statistic, extended statistics) have been stable across versions
The target table was known to have a very high update rate, so the first hypothesis was that the statistics were simply stale.
In PostgreSQL, the autovacuum daemon automatically runs ANALYZE to update statistics stored in pg_statistic. However, for tables with heavy write activity, auto-ANALYZE may not keep up with the rate of change, causing the statistics to drift from reality.
To address this, I adjusted the auto-ANALYZE frequency for the specific table rather than changing the server-wide settings in postgresql.conf.
The two key parameters are:
autovacuum_analyze_threshold: The minimum number of tuple modifications before auto-ANALYZE is triggered (default: 50)
autovacuum_analyze_scale_factor: The fraction of the table size added to the threshold (default: 0.1, i.e., 10%)
ALTER TABLE table_name SET (
autovacuum_analyze_threshold = 0,
autovacuum_analyze_scale_factor = 0.01
);
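Auto-ANALYZE fires once the number of modified tuples exceeds autovacuum_analyze_threshold + autovacuum_analyze_scale_factor * reltuples, so the settings above make it trigger after roughly 1% of the table has changed. A hedged sketch for checking whether statistics are actually keeping up; table_name is a placeholder:
-- Hedged sketch: compare how many rows changed since the last ANALYZE
-- with when the statistics were last refreshed.
SELECT relname,
       n_mod_since_analyze,   -- modifications since the last ANALYZE
       last_analyze,          -- last manual ANALYZE
       last_autoanalyze       -- last autovacuum-driven ANALYZE
FROM pg_stat_user_tables
WHERE relname = 'table_name';  -- placeholder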
In this ex
[...]Saw this post on LinkedIn yesterday:
I also somehow missed this setting for years. And it’s crazy timing, because it’s right after I published a blog about seeing the exact problem this solves. In my blog post I mentioned “unexpected behaviors (bugs?) in… Postgres itself.” Turns out Postgres already has the fix; it’s just disabled by default.
It was a one-line change to add the setting to my test suite and verify the impact. As a reminder, here’s the original problematic behavior which I just now reproduced again:
At the T=20sec mark, TPS drops from 700 to around 30. At T=26sec the total connections hit 100 (same as max_connections) and then TPS drops to almost zero. This total system outage continues until T=72sec when the system recovers after the blocking session has been killed by the transaction_timeout setting.
So what happens if we set client_connection_check_interval to 15 seconds? Quick addition to docker-compose.yml and we find out!
Fascinating! The brown line and the red line are the important ones. As before, the TPS drops at T=20sec and zeros out after we hit max_connections. But at T=35sec we start to see the total connection count slowly decrease! This continues until T=42sec when the PgBouncer connections are finally released – and at this point we repeat the whole cycle a second time, as the number of total connections climbs back up to the max.
So we can see that the 15 second client_connection_check_interval setting is working exactly as expected (if a little slowly) – at the 15 second mark Postgres begins to clean up the dead connections.
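For reference, a hedged sketch of the equivalent server-side change (the post itself applies it through docker-compose.yml):
-- Hedged sketch, assuming PostgreSQL 14 or later where this setting exists.
-- The value is a duration, so '15s' works; a reload is enough to apply it.
ALTER SYSTEM SET client_connection_check_interval = '15s';
SELECT pg_reload_conf();
SHOW client_connection_check_interval;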
What if we do a lower setting like 2 seconds?
This looks even better! The total connection count climbs to around 30 and holds stable there. And more importantly, the TPS never crashes out all the way to zero and the system is able to continue with a small workload until the blocking session is killed.
There is definitely some connection churn happening here (expected due to golang context timeouts) an
[...]
You can prototype an impressive agent in a notebook, but you can’t run one in production without a transport strategy. The Model Context Protocol standardizes how agents call tools and access memory, but it intentionally does not define how bytes move between systems. That responsibility sits with your architecture. Most teams treat transport as an implementation detail, and default to whatever works in a development container. That shortcut becomes technical debt the moment the system faces real traffic. The symptoms are predictable:
OpenAI recently shared their story about how they scaled to 800 million users on their ChatGPT platform. With the boom of AI in the past year, they’ve certainly had to deal with some significant scaling challenges, and I was curious how they’d approach it. To sum it up, they addressed the following issues with the following solutions:
idle_in_transaction_session_timeout)
Indeed, there was a lot of work put in to scale to “millions of queries per second (QPS)” and I applaud their team for implementing these solutions to handle the unique challenges that they faced. 👏👏👏
While reading through their post, I couldn’t help but think to myself, wow, some of the solutions they used are not much different from ours 15 years ago! Fifteen years ago, I was the head DBA at Turnitin (called iParadigms at the time). Times were different back then, before the massive boom of social media (Instagram wasn’t a thing at the time!), and we were all on-prem, switching from spindle-based disk to SSDs. At that time, we were likewise facing challenges scaling to 3000 QPS to serve up data to students and teachers across the US, Canada, and the UK. Our founders were making a lot of headway in promoting Turnitin to secondary schools and universities, and we were regularly facing the struggle of having “just enou
[...]PostgreSQL 18 introduces native "Skip Scan" for multicolumn B-tree indexes, a major optimizer enhancement. However, a common misconception is that pre-v18 versions simply resort to sequential scans when the leading column isn't filtered. In reality, the cost-based planner has long been capable of leveraging such indexes when doing so is efficient.
How it works under the hood:
In pre-v18 versions, if an index is significantly smaller than the table (the heap), scanning the entire index to find matching rows, even without using the tree structure to "skip", can be cheaper than a sequential table scan. This is sometimes also referred to as a Full-Index Scan. While this lacks v18’s ability to hop between distinct leading keys, it still effectively speeds up queries on non-leading columns for many common workloads.
Notably, we will see why this v18 improvement is not a game-changer for all workloads, and why you shouldn't assume a speed-up for all kinds of datasets.
To understand when the new v18 Skip Scan helps, we tested two distinct scenarios.
The Setup: We created an index on (bid, abalance) and ran the following query:
SELECT COUNT(*) FROM pgbench_accounts WHERE abalance = -2187;
Note: We did not run a read-write workload, so abalance is 0 for all rows. The query returns 0 rows.
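For anyone wanting to reproduce the setup, a hedged sketch; the index name and pgbench scale are assumptions, since the excerpt does not give them:
-- Composite index on (bid, abalance) over a standard pgbench_accounts table.
CREATE INDEX pgbench_accounts_bid_abalance_idx
    ON pgbench_accounts (bid, abalance);
-- Then check which strategy the planner picks for the benchmark query:
EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(*) FROM pgbench_accounts WHERE abalance = -2187;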
Data Statistics:
Leading column (bid): ~10 distinct values (Low Cardinality).
Second column (abalance): 1 distinct value (Uniform).
Results:
| Version | Strategy | TPS | Avg Latency | Buffers Hit |
|---|---|---|---|---|
| v17 | Index Only Scan (Full) | ~2,511 | 0.40 ms | 845 |
| v18 | Index Only Scan (Skip Scan) | ~14,382 |
The following thoughts and comments are completely my personal opinion and do not reflect my employer's thoughts or beliefs. If you don’t like anything in this post, reach out to me directly, so I can ignore it ;-).
I’m currently on the train on my way back home from FOSDEM this year and man, I’m exhausted but also happy. Why? Because the PG and FOSDEM community is just crazily awesome. While it’s always too much of everything, it’s at the same time inspiring to see so many enthusiastic IT nerds in one place, discussing and working on what they love - technology and engineering challenges.
The topic of checkpoint tuning is frequently discussed in many blogs. However, I keep coming across cases where it is left untuned, resulting in wasted server resources, poor performance, and other issues. So it’s time to reiterate its importance with more details, especially for new users. What is a checkpoint? […]
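As a starting point for that kind of tuning, a hedged sketch of what to check first (view and column names differ across major versions):
-- Hedged sketch: a high ratio of requested to timed checkpoints usually means
-- max_wal_size is too small for the write rate. On PostgreSQL 17+ this lives
-- in pg_stat_checkpointer; older versions expose checkpoints_timed and
-- checkpoints_req in pg_stat_bgwriter instead.
SELECT num_timed, num_requested
FROM pg_stat_checkpointer;
-- The knobs most often discussed for checkpoint tuning:
SHOW checkpoint_timeout;
SHOW max_wal_size;
SHOW checkpoint_completion_target;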
The Nordic PGDay 2026 Call for Papers committee met to finalize the talk selection:
PGDay Paris 2026 schedule has also been announced — talk selection was made by:
On Monday, January 26, the Prague PostgreSQL Meetup: January Edition met. It was organized by Mayuresh B. and Gulcin Yildirim Jelinek.
Speaker:
Prague PostgreSQL Dev Day 2026 (P2D2) happened from January 27 - January 28.
Organized by:
Talk selection committee:
Workshops:
Talks:
Lightning talks:
PostgreSQL was represented at FOSDEM’26, which took place from January 31 to February 1.
Databases Devroom Speakers:
The new major version of the fast connection pooler for PostgreSQL has been released!
It took quite a lot of time to get from version 1.6.0 to the new major version 2.0.0, but the new pgagroal is finally here! The project went through two Google Summer of Code editions (GSoC 2025 and 2024) before this new great release was completed, but the project preferred code stability over rushing the release, and I think you are going to be amazed by how many improvements have been packed into this new version.
The official release announcement has been sent out today.
There are a lot of new features and small gems in this 2.0.0 release; it is pretty much impossible to describe all of them here, but here is a concise list of what you can expect from this new version.
pgagroal has been event-driven from the very beginning, using the libev library to handle input/output in a fast and scalable way.
The project decided to move from libev to something more modern and better maintained, and the natural choice was io_uring on Linux (yes, the same interface used in PostgreSQL 18) and kqueue on BSD systems. io_uring is an asynchronous method for read and write operations that aims at performance, while kqueue is an event-driven approach for FreeBSD and OpenBSD systems.
The key point here is event-driven and, obviously, asynchronous. The whole event management has been rewritten to wrap compatible structures and functions wherever possible. Performance has improved a lot compared to the 1.x releases.
The management protocol is the way pgagroal-cli and pgagroal interact with each other: the CLI sends commands to the daemon and gets back responses.
The new release provides a fully rewritten management protocol that now speaks entirely JSON. The protocol is also more robust and error tolerant. Moreover, every command and response now includes the application and daemon version, so that it
[...]Last year at pgconf.dev, there was a discussion about improving the user interface for the PostgreSQL hackers mailing list, which is the main communication channel for PostgreSQL core development. Based on that discussion, I want to share a small project we have been working on:
Hackorum provides a read-only (for now) web view of the mailing list with a more forum-like presentation. It is a work-in-progress proof of concept, and we are primarily looking for feedback on whether this approach is useful and what we should improve next.
Read time: ~6 minutes
A real-world deep dive into operator precedence, implicit casting, and why database engines “don’t think the same way”.
You migrate a perfectly stable Oracle application to PostgreSQL.
And yet… the numbers or query calculations are wrong.
Not obviously wrong. Not broken. Just different.
Those are the worst bugs: the ones that quietly ship to production. This is a story about one such bug, hiding behind familiar operators, clean-looking conversions, and false confidence.
Here’s a simplified piece of real production logic used to compute a varhour value from timestamp data:
CASE
WHEN TO_CHAR(varmonth,'MI') + 1 = 60
THEN varhr - 1 || TO_CHAR(varmonth,'MI') + 1 + 40
ELSE varhr - 1 || TO_CHAR(varmonth,'MI') + 1
END AS varhour
At first glance, this feels routine:
Extract minutes
Perform arithmetic
Concatenate values
Nothing here screams “migration risk”.
The Migration Illusion: “Looks Correct, Right?”
During migration, teams don’t blindly copy Oracle SQL. They do the right thing: make types explicit and clean up the logic.
Here’s the converted PostgreSQL version, already “fixed” with the necessary casts:
SELECT
CASE WHEN TO_CHAR(varmonth, 'MI') :: integer + 1 = 60
THEN
(end_hr -1) :: text || TO_CHAR(varmonth, 'MI')::integer + 1 + 40
ELSE
(end_hr -1)::text || TO_CHAR(varmonth, 'MI') ::integer + 1
END varhour
FROM sample_loads
ORDER BY id;
No syntax errors. Explicit casting. Clean and readable. At this point, most migrations move on.
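A hedged toy example of where such expressions diverge (my own illustration, not from the article): in PostgreSQL, the arithmetic operators + and - bind more tightly than ||, while Oracle evaluates +, - and || at the same precedence level, left to right.
-- PostgreSQL: + binds tighter than ||, so the addition happens first.
SELECT '2' || 3 + 1;   -- parsed as '2' || (3 + 1)  =>  text '24'
-- Oracle (for comparison): +, - and || share one precedence level and
-- evaluate left to right, so the same expression concatenates first:
--   '2' || 3 + 1  =>  ('2' || 3) + 1  =>  '23' + 1  =>  number 24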
Side-by-Side: Oracle vs PostgreSQL (At First Glance)
Let’s compare the two versions:
I have to apologise — it's been months since PGConf.EU 2025 in Riga, and I'm only now publishing this video. The delay was due to wanting to create accurate captions for the recording, which unfortunately took longer than expected.
In this session, Floor Drees, Karen Jex, and I joined host Boriss Mejias to examine how diverse minds work together in the PostgreSQL ecosystem. We touched upon the psychology of teamwork and the importance of accommodating neurodiverse conditions like ADHD and ASD.
A pleasant surprise for us during the session was the level of engagement from the audience. People connected deeply with the subject matter, turning the panel talk into a real conversation where we shared practical hacks — body doubling, "Pomodoro playlists", tactile focus tools like knitting, crocheting, and full-body fidget toys, and experiences with managers who actually "get it".
Building awareness is the first step on a journey that can lead to better outcomes for everyone. We do believe some things need to be adapted, and we can work together to make this gradual change happen.
Without further ado, I present the panel discussion below. I will be very happy to hear back from you at @vyruss@fosstodon.org — your comments, your experiences, your testimonials. This is how we continue to raise awareness together.
Video on YouTube: youtube.com/watch?v=PsxNhcBTrTU
The rise of agentic AI is transforming how we build applications, and databases are at the center of this transformation. As AI agents become more sophisticated, they need reliable, real-time access to data. If you’ve decided to use an MCP server for exposing data to large language models (LLMs) to build internal tools for trusted users, apply sophisticated database schema changes, or translate natural language into SQL, you might find the pgedge-postgres-mcp project (available on GitHub) useful to try. This 100% open source Natural Language Agent for PostgreSQL provides a connection between any MCP-compatible client (including AI assistants like Claude) and any standard flavor of Postgres, whether you’re creating a new greenfield project or are using an existing database.
A query executes in just 2 milliseconds, yet its planning phase takes 500 ms. The database is reasonably sized, the query involves 9 tables, and the default_statistics_target is set to only 500. Where does this discrepancy come from?
This question was recently raised on the pgsql-performance mailing list, and the investigation revealed a somewhat surprising culprit: the column statistics stored in PostgreSQL's pg_statistic table.
In PostgreSQL, query optimisation relies on various statistical measures, such as MCV, histograms, distinct values, and others - all stored in the pg_statistic table. By default, these statistics are based on samples of up to 100 elements. For larger tables, however, we typically need significantly more samples to ensure reliable estimates. A thousand to 5000 elements might not seem like much when representing billions of rows, but this raises an important question: could large statistical arrays, particularly MCVs on variable-sized columns, seriously impact query planning performance, even if query execution itself is nearly instantaneous?
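A hedged way to get a feel for how large those per-column arrays actually are; the table and column names below are placeholders, and the text/text[] double cast is the usual workaround for pg_stats exposing anyarray columns:
-- Hedged sketch: how many MCV and histogram entries each column carries,
-- and roughly how wide its values are.
SELECT attname,
       array_length(most_common_vals::text::text[], 1) AS mcv_entries,
       array_length(histogram_bounds::text::text[], 1) AS histogram_entries,
       avg_width
FROM pg_stats
WHERE tablename = 'some_table';   -- placeholder
-- If a wide column's MCV list costs more in planning than it saves,
-- its per-column target can be lowered without touching the global setting:
ALTER TABLE some_table ALTER COLUMN wide_text_col SET STATISTICS 100;
ANALYZE some_table;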
We're examining a typical auto-generated 1C system query. '1C' is a typical object-relational mapping framework for accounting applications. PostgreSQL version is 17.5. Notably, the default_statistics_target value is set to only 500 elements, even below the recommended value for 1C systems (2500). The query contains 12 joins, but 9 are spread across subplans, and the join search space is limited by three JOINs, which is quite manageable. Looking at the EXPLAIN output, the planner touches only 5 buffer pages during planning - not much.
Interestingly, the alternative PostgreSQL fork (such forks have become increasingly popular these days) executed this query with nearly identical execution plans, and the planning time is considerably shorter - around 80 milliseconds. Let's use this as our control sample.
The first suspicion was obvious: perhaps the developer
[...]I just gave a new presentation at Prague PostgreSQL Developer Day titled What's Missing in Postgres? It's an unusual talk because it explains the missing features of Postgres, and why. One thing I learned in writing the talk is that the majority of our missing features are performance-related, rather than functionality-related. I took many questions:
Thanks to Melanie Plageman for the idea of this talk.
I am not intentionally trying to upset anyone with this blog post or minimize the efforts of many brilliant people whom I admire. However, I connected with several people over the 2025 holidays who all had the same question: What is the future of MySQL? At the upcoming FOSDEM conference, several events will discuss this subject and push a particular solution. And in several ways, they are all wrong.
Oracle has not been improving the community edition for a long time now. They have laid off many of their top performers in the MySQL group. We got almost a good decade and a half out of Oracle's stewardship of the "world's most popular database", and we should be thankful for that. However, that time is now over, and it is time to consider future options, because the road ahead for the MySQL Community Edition involves no updates, no CVE fixes, and no innovation.
There are several choices available.
The first choice is nothing. Many folks run old, end-of-life versions of MySQL. There are many instances of MySQL 5.7. There are some fantastic features in later versions of the software. But if those features are not needed or desired, then why upgrade? MySQL has always had a minimalist appeal for those who have little need for features like JSON, materialized views, and the like. This vanilla approach will be the default for the many who subscribe to the "if it is still working, do not change it" school of software management.
Pros: You do not have to make any changes.
Cons: You are taking on technical debt like the Titanic took on water. You may get a few years out of this, but this path is fraught with hungry dragons.
PostgreSQL? This is a great database, offering numerous valuable features and making it a solid choice. It is reasonably easy to port schemas and data from MySQL to PostgreSQL. You will want to run a connection pooler. You will need to keep an eye on the vacuum status. You will need to learn to pick from an embarrassing number of indexing options, with B+ tree probably still being your prima
[...]Number of posts in the past two months
Get in touch with the Planet PostgreSQL administrators at planet at postgresql.org.