While implementing an optimization for derived clause lookup, Amit Langote, David Rowley and I debated the initial size of the hash table that would hold the clauses. See the discussion around this email on pgsql-hackers.
The hash_create() API in PostgreSQL takes the initial size as an argument. It allocates memory for that many hash entries upfront and expands that memory later if more entries are added. The point of contention was what the initial size of the hash table introduced by that patch, containing the derived clauses, should be. During the discussion, David hypothesised that the size of the hash table affects the efficiency of hash table operations, depending on whether the hash table fits in a cache line. While I thought that was a reasonable assumption, I expected the practical impact to be unnoticeable: beyond saving a few bytes, choosing the right hash table size wasn't going to have any visible effect. If a derived clause lookup or insert became a bit slower, nobody would even notice it. In practice it was easy to address David's concern by using the number of derived clauses at the time of creating the hash table to decide its initial size. The patch was committed.
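As a rough illustration of sizing a table from the number of entries known at creation time, here is a minimal Python sketch. The helper name, the power-of-two rounding and the minimum of 16 entries are assumptions made for the example; they are not taken from the patch or from hash_create() itself.

```python
def initial_hash_size(num_clauses: int, minimum: int = 16) -> int:
    """Pick an initial entry count that can hold num_clauses.

    Hypothetical policy: start at `minimum` and double until the
    count of known clauses fits.
    """
    size = minimum
    while size < num_clauses:
        size *= 2
    return size


# A handful of clauses keeps the table small; many clauses start it larger.
for n in (3, 16, 17, 100):
    print(n, "clauses ->", initial_hash_size(n), "entries")
```

The point of such a rule is only that the starting size follows the workload already known, rather than a fixed constant.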
Within a few months, I faced the same problem again when working on resizing shared buffers without a server restart. The buffer manager maintains a buffer lookup table in the form of a hash table to map a page to a buffer. When the number of configured buffers changes upon a server restart, the size of the buffer lookup table also changes. Doing that in a running server would be significant work. To avoid that, we could create a buffer lookup table large enough to accommodate future buffer size needs. Even if the buffer pool shrinks or expands, the size of the buffer lookup table would not change. As long as the expansion is within the buffer lookup table size limit, it could be done without a restart. The buffer lookup table isn't as large as the buffer pool itself, thus wasting a bit of memory can be consi
[...]
In Part 1 of this series, we discussed what active-active databases are and identified some “good” reasons for considering them, primarily centered around extreme high availability and critical write availability during regional outages. Now, let’s turn our attention to the less compelling justifications and the substantial challenges that come with implementing such a setup.
Last week I posted about how we often don’t pick the optimal plan. I got asked about difficulties when trying to reproduce my results, so I’ll address that first (I forgot to mention a couple of details). I also got questions about how best to spot this issue and how to mitigate it. I’ll discuss that too. I don’t have any great solutions, but I’ll briefly cover a couple of possible planner/executor improvements that might allow handling this better.
PostgreSQL 19 development is now officially under way, so from now on any new features will be committed to that version. Any significant PostgreSQL 18 changes (e.g. reversions or substantial changes to already committed features) will be noted here separately (there were none this week).
PostgreSQL 19 changes this week
The first round of new PostgreSQL 19 features is here:
* new object identifier type regdatabase, making it easier to look up a database's OID
* COPY FROM now supports multi-line headers
* cross-type operator support added to contrib module btree_gin
* non-array variants of function width_bucket() now permit operand input to be NaN
We are excited to announce that the schedule for PGDay UK 2025 has been published. We've got an exciting lineup of talks across a range of topics. There will be something for everyone attending.
Take a look at what we have going on: https://pgday.uk/events/pgdayuk2025/schedule/
We'd like to extend our gratitude to the whole CFP team, who did an amazing job selecting the talks to make up the schedule.
Thank you to all the speakers who submitted talks. It's always a shame that we can't accept them all, and as ever it was a tough job choosing the talks for the schedule. Be it your 100th time or 1st time submitting a talk, we hope you submit again in the future and at other PostgreSQL Europe events.
PGDay UK 2025 is taking place in London on September 9th, so don't forget to register for PGDay UK 2025, before it's too late!
The shared presentations are online, as are a couple of recordings and turtle-loading have-a-cup-of-tea locally stored photos.
Using the well-known and widespread technique of inductive reasoning, we came to the conclusion that this fourth PGConf.be conference was a success, as was the artwork. No animals or elephants were hurt during this event.
The statistics are:
60 attendees
depending on the session, an extra 60 to 150 students attended as well
10 speakers
2 sponsors
This conference wouldn’t have been possible without the help of volunteers.
To conclude, a big thank you to all the speakers, sponsors and attendees.
Without them a conference is just like a tea party.
Having attended PGConf.DE'2025 and discussed the practice of using Postgres on large databases there, I was surprised to regularly hear the opinion that query planning time is a significant issue. As a developer, it was surprising to learn that this factor can, for example, slow down the decision to move to a partitioned schema, which seems like a logical step once the number of records in a table exceeds 100 million. Well, let's figure it out.
The obvious way out of this situation is to use prepared statements, initially intended for reusing labour-intensive parts such as parse trees and query plans. For more specifics, let's look at a simple table scan with a large number of partitions (see initialisation script):
EXPLAIN (ANALYZE, COSTS OFF, MEMORY, TIMING OFF)
SELECT * FROM test WHERE y = 127;
/*
...
-> Seq Scan on l256 test_256
Filter: (y = 127)
Planning:
Buffers: shared hit=1536
Memory: used=3787kB allocated=4104kB
Planning Time: 61.272 ms
Execution Time: 4.929 ms
*/
In this scenario involving a selection from a table with 256 partitions, my laptop's PostgreSQL took approximately 60 milliseconds for the planning phase and only 5 milliseconds for execution. During the planning process, it allocated 4 MB of RAM and accessed 1,500 data pages. Quite substantial overhead for a production environment! In this case, PostgreSQL has generated a custom plan that is compiled anew each time the query is executed, choosing an execution strategy based on the query parameter values during optimisation. To improve efficiency, let's parameterise this query and store it in the 'Plan Cache' of the backend by executing PREPARE:
PREPARE tst (integer) AS SELECT * FROM test WHERE y = $1;
EXPLAIN (ANALYZE, COSTS OFF, MEMORY, TIMING OFF) EXECUTE tst(127);
/*
...
-> Seq Scan on l256 test_256
Filter: (y = $1)
Planning:
Buffers: shared hit=1536
Memory: used=3772kB allocated=4120kB
Planning Time: 59.525 ms
Execution Time: 5.184 ms
*/
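To put the two runs above in proportion, the following Python sketch computes how much of the total time went into planning, using the Planning Time and Execution Time figures from the two EXPLAIN outputs. The helper name is made up for the example.

```python
def planning_share(planning_ms: float, execution_ms: float) -> float:
    """Fraction of total query time spent in the planner."""
    return planning_ms / (planning_ms + execution_ms)


# (planning ms, execution ms) copied from the two EXPLAIN outputs above.
runs = {
    "plain SELECT": (61.272, 4.929),
    "EXECUTE tst(127)": (59.525, 5.184),
}

shares = {label: planning_share(p, e) for label, (p, e) in runs.items()}
for label, share in shares.items():
    print(f"{label}: planning takes {share:.1%} of the total time")
```

In both runs planning accounts for more than nine tenths of the elapsed time, which is why the planning phase is the part worth optimising here.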
The planning workload remains the same s
[...]
Over the last decade, the databases I've worked on where UUID Version 4 was picked as the primary key data type have usually had bad performance and excessive IO.
UUID is a native data type that can be stored as binary data, with various versions outlined in the RFC. Version 4 is mostly random bits, obfuscating information like when the value was created, or where it was generated.
Version 4 UUIDs are easy to work with in Postgres, as the gen_random_uuid() function has generated values natively since version 13 (2020).
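To make "mostly random bits" concrete, here is a small Python sketch that generates version 4 UUIDs in sequence and checks how often a new value sorts after the one created just before it. The helper make_uuid4 and the fixed seed are assumptions used only to keep the run repeatable; real code would call uuid.uuid4(), and Postgres would call gen_random_uuid().

```python
import random
import uuid


def make_uuid4(rng: random.Random) -> uuid.UUID:
    """Build a version 4 UUID from a seeded RNG (hypothetical helper)."""
    return uuid.UUID(int=rng.getrandbits(128), version=4)


rng = random.Random(42)
ids = [make_uuid4(rng) for _ in range(1000)]

# Count how often a value sorts after the value generated right before it.
ascending = sum(1 for prev, cur in zip(ids, ids[1:]) if cur > prev)

print("version:", ids[0].version)
print(f"{ascending} of {len(ids) - 1} values sort after their predecessor")
```

Roughly half of the values sort before their predecessor, so creation order and key order are unrelated, unlike with an incrementing integer key.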
I’ve learned there are misconceptions about UUID Version 4, and sometimes the reasons users pick this data type are based on them.
Because of the poor performance, misconceptions, and available alternatives, I’ve come around to a simple position: Avoid UUID Version 4 for primary keys.
My more controversial take is to avoid UUIDs in general, but I understand there are some legitimate scenarios where there aren’t practical alternatives.
As a database enthusiast, I wanted to have an articulated position on this classic “Integer v. UUID” debate.
Among database folks, debating these alternatives may be tired and clichéd. However, from my consulting work, I can say that I’m working on databases with UUID v4 as the primary key in 2024 and 2025, and seeing the issues discussed in this post.
Let’s dig in.
uuid data type in Postgres
Although unreleased as of this writing, and pulled from Postgres 17 previously, UUID V7 is part of Postgres 18, scheduled for release in the Fall of 2025.
What kind of app database
[...]
PostgreSQL user groups are a fantastic way to build new connections and engage with the local community. Last week, I had the pleasure of speaking at the Stuttgart meetup, where I gave a talk on “Operating PostgreSQL as a Data Source for Analytics Pipelines.”
Below are my slides and a brief overview of the talk. If you missed the meetup but would be interested in an online repeat, let me know in the comments below!
As modern analytics pipelines evolve beyond simple dashboards into real-time and ML-driven environments, PostgreSQL continues to prove itself as a powerful, flexible, and community-driven database.
In my talk, I explored how PostgreSQL fits into modern data workflows and how to operate it effectively as a source for analytics.
PostgreSQL is widely used for OLTP workloads – but can it serve OLAP needs as well? With physical or logical replication, PostgreSQL can act as a robust data source for analytics, enabling teams to offload read-intensive queries without compromising production.
Physical replication provides an easy-to-operate, read-only copy of your production PostgreSQL database. It lets you use the full power of SQL and relational features for reporting – without the risk of data scientists or analysts impacting production. It offers strong performance, though with some limitations: no materialized views, no temporary tables, and limited schema flexibility. Honestly, there are more ways analysts could harm production even from the replica side.
Logical replication offers a better solution:
However, it also brings complexity – especially around DDL handling, failover, and more awareness from participating teams.
Data analytics in 2025 is more than jus
[...]
Housekeeping announcements:
* this website's PostgreSQL installation is now on version 17 (insert champagne emoji here)
* the search function now works properly with non-ASCII characters (there was an embarrassing oversight which went unnoticed until someone kindly pointed it out)
PostgreSQL 18 changes this week
This week there have been a couple of renamings:
* psql's meta-command \close was renamed \close_prepared
* pg_createsubscriber's option --remove was renamed to --clean
Both of these items were added during the PostgreSQL 18 development cycle so do not have any implications for backwards compatibility.
There have also been some fixes related to:
* comments on NOT NULL constraints
* virtual/generated columns
See below for other commits of note.
The basic promise of a query optimizer is that it picks the “optimal” query plan. But there’s a catch - the plan selection relies on cost estimates, calculated from selectivity estimates and cost of basic resources (I/O, CPU, …). So the question is, how often do we actually pick the “fastest” plan? And the truth is we actually make mistakes quite often.
Consider the following chart, with durations of a simple SELECT query with a range condition. The condition is varied to match different fractions of the table, shown on the x-axis (fraction of pages with matching rows). The plan is forced to use different scan methods using enable_ options, and the dark points mark runs when the scan method “won” even without using the enable_ parameters.
It shows that for selectivities ~1-5% (the x-axis is logarithmic), the planner picks an index scan, but this happens to be a poor choice. It takes up to ~10 seconds, and a simple “dumb” sequential scan would complete the query in ~2 seconds.
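The gap between "cheapest estimate" and "fastest run" can be modelled in a few lines of Python. The durations are the approximate figures quoted above (~10 seconds and ~2 seconds); the cost values are hypothetical numbers invented for the example, chosen only so that the index scan looks cheaper to the planner.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    name: str
    estimated_cost: float  # planner cost units (hypothetical values)
    actual_seconds: float  # approximate durations quoted in the text


plans = [
    Plan("index scan", estimated_cost=80_000.0, actual_seconds=10.0),
    Plan("sequential scan", estimated_cost=100_000.0, actual_seconds=2.0),
]

# The planner only sees estimates, so it picks the lowest estimated cost.
chosen = min(plans, key=lambda p: p.estimated_cost)
# With hindsight, the best plan is the one with the shortest duration.
fastest = min(plans, key=lambda p: p.actual_seconds)

slowdown = chosen.actual_seconds / fastest.actual_seconds
print(f"chosen: {chosen.name}, fastest: {fastest.name}, slowdown: {slowdown:.1f}x")
```

Whenever the ordering by estimated cost differs from the ordering by actual duration, the planner's choice is a misprediction, which is the situation the chart shows for the 1-5% range.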
PgPedia Week has been delayed this week due to malaise and other personal circumstances.
PostgreSQL 18 changes this week
* pg_dump has gained the ability to dump statistics on foreign tables
* various bugfixes for the new amcheck function gin_index_check()
PostgreSQL 18 articles
* PostgreSQL 18 just dropped: 10 powerful new features devs need to know (2025-06-20) - Devlink Tips
* Preserve optimizer statistics during major upgrades with PostgreSQL v18 (2025-06-17) - Laurenz Albe / CYBERTEC
When running a PostgreSQL database in a High Availability (HA) cluster, it’s easy to assume that having multiple nodes means your data is safe. But HA is not a replacement for backups. If someone accidentally deletes important data or runs a wrong update query, that change will quickly spread to all nodes in the cluster. Without proper safeguards, that data is gone everywhere. In these cases, only a backup can help you restore what was lost.
The case mentioned above isn’t the only reason backups are important. In fact, many industries have strict compliance requirements that make regular backups mandatory. This makes backups essential not just for recovering lost data, but also for meeting regulatory standards.
Barman is a popular tool in the PostgreSQL ecosystem for managing backups, especially in High Availability (HA) environments. It’s known for being easy to set up and for offering multiple types and modes of backups. However, this flexibility can also be a bit overwhelming at first. That’s why I’m writing this blog to break down each backup option in a simple and clear way, so you can choose the one that best fits your business needs.
Barman primarily supports three backup types through various combinations of backup methods. The table below highlights the key differences between them:
Feature | Full Backup | Incremental Backup | Differential Backup |
Data Captured | An entire database at a specific point in time. | Changes since the last backup (full or incremental). | Changes since the last full backup. |
Relative To | N/A (it’s the foundation) | Previous backup (full or incremental). | The last full backup. |
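The "Relative To" row has a practical consequence for restores, sketched below in Python. This is a generic model of the three backup types as the table defines them, not a description of how Barman itself tracks or restores backups; the function name and the backup labels are made up for the example.

```python
from typing import List, Tuple

# (label, kind), where kind is "full", "incremental" or "differential".
Backup = Tuple[str, str]


def restore_chain(history: List[Backup]) -> List[str]:
    """Return the backups needed to restore the newest one, oldest first."""
    chain: List[str] = []
    i = len(history) - 1
    while i >= 0:
        label, kind = history[i]
        chain.append(label)
        if kind == "full":
            break  # the foundation: nothing older is needed
        if kind == "incremental":
            i -= 1  # relative to the previous backup, whatever its kind
        elif kind == "differential":
            i -= 1  # relative to the last full backup: skip back to it
            while i >= 0 and history[i][1] != "full":
                i -= 1
        else:
            raise ValueError(f"unknown backup kind: {kind!r}")
    else:
        raise ValueError("no full backup found in history")
    return list(reversed(chain))


incremental_week = [("mon", "full"), ("tue", "incremental"), ("wed", "incremental")]
differential_week = [("mon", "full"), ("tue", "differential"), ("wed", "differential")]

print(restore_chain(incremental_week))
print(restore_chain(differential_week))
```

An incremental chain needs every backup since the last full one, while a differential restore needs only the full backup plus the latest differential.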
Even the most experienced database professionals are known to feel a little anxious when peering into an unfamiliar database. Hopefully, they inspect to see how the data is normalized and how the various tables are combined to answer complex queries. Entity Relationship Maps (ERM) provide a visual overview of how tables are related and can document the structure of the data.
The Community Edition of DBeaver can provide an ERM easily. First, connect to the database. Right-click on the 'Tables' and then select 'View Diagram'.
The ERM is then displayed.
And it gets better! Not sure how two tables are joined? Click on the link between the tables, and the names of the columns you will want to use in your JOIN statement are highlighted.
Exploring an unfamiliar database can be daunting. But Entity Relationship Maps provide a way to navigate the territory. And DBeaver is a fantastic tool for working with databases.
Explore the new readiness probe introduced in CloudNativePG 1.26, which advances Kubernetes-native lifecycle management for PostgreSQL. Building on the improved probing infrastructure discussed in my previous article, this piece focuses on how readiness probes ensure that only fully synchronised and healthy instances, particularly replicas, are eligible to serve traffic or be promoted to primary. Special emphasis is placed on the streaming probe type and its integration with synchronous replication, giving administrators fine-grained control over failover behaviour and data consistency.
The registration for PGConf.EU 2025, which will take place on 21-24 October in Riga, is now open.
We have a limited number of tickets available for purchase with the discount code EARLYBIRD.
This year, the first day of training sessions has been replaced with a Community Events Day. This day has a more limited space, and can be booked as part of the conference registration process or added later, as long as seats last.
We hope you will be able to join us in Riga in October!
With First Steps with Logical Replication we set up a basic working replication between a publisher and a subscriber and were introduced to the fundamental concepts. In this article, we're going to expand on the practical aspects of logical replication operational management, monitoring, and dive deep into the foundations of logical decoding.
As we demonstrated in the first part, when setting up the subscriber you can choose whether to rely on the initial data copy; the option WITH (copy_data = false) skips it. While the default copy is incredibly useful behavior, it has characteristics you should understand before using it in a production environment.
The mechanism effectively asks the publisher to copy the table data by taking a snapshot (courtesy of MVCC), sending it to the subscriber, and thanks to the replication slot "bookmark," seamlessly continues streaming the changes from the point the snapshot was taken.
Simplicity is the key feature here, as a single command handles the snapshot, transfer, and transition to ongoing streaming.
The trade-off is performance, because the copy uses a single process per table. While it works almost instantly for test tables, you will encounter notable delay and overhead when dealing with tables holding gigabytes of data.
Although parallelism can be controlled by the max_sync_workers_per_subscription configuration parameter, it still might leave you waiting for hours (or days) for any real-life database to get replicated. You can monitor whether the tables have already been synchronized or are still waiting/in progress using the pg_subscription_rel catalog.
SELECT srrelid::regclass AS table_name, srsubstate
FROM pg_subscription_rel;
Each table will be in one of the following states:
i: not yet started
d: copy is in progress
s: syncing (or waiting for confirmation)
r: done & replicating
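The state codes listed above can be turned into a small progress report. The Python sketch below assumes the rows have already been fetched from pg_subscription_rel as (table_name, srsubstate) pairs; the table names are hypothetical, and only the four states described here are mapped.

```python
from typing import List, Tuple

# Descriptions taken from the state list above.
STATE_LABELS = {
    "i": "not yet started",
    "d": "copy is in progress",
    "s": "syncing (or waiting for confirmation)",
    "r": "done & replicating",
}


def pending_tables(rows: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return (table, description) for every table not yet in state 'r'."""
    return [
        (table, STATE_LABELS.get(state, f"unknown state {state!r}"))
        for table, state in rows
        if state != "r"
    ]


# Hypothetical result of the pg_subscription_rel query.
rows = [("orders", "r"), ("customers", "d"), ("events", "i")]

for table, description in pending_tables(rows):
    print(f"{table}: {description}")
```

An empty result means every table in the subscription has finished its initial synchronization.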
Luckily, the state r indicates that the s
On June 9th, Andrzej Nowicki gave a talk titled “From Queries to Pints: Building a Beer Recommendation System with pgvector” at the Malmö PostgreSQL User Group.
On June 3rd, the 5th PostgreSQL User Group NRW MeetUp took place in Germany.
Speakers:
* Josef Machytka about “Boldly Migrate to PostgreSQL - Introducing credativ-pg-migrator”
* Mathis Rudolf about “PostgreSQL Locking – Das I in ACID”
* Christoph Berg about “Modern VACUUM”
POSETTE: An Event for Postgres 2025 took place June 10-12 online. Organized by:
Talk Selection Team: Claire Giordano, Daniel Gustafsson, Krishnakumar “KK” Ravi, Melanie Plageman
Hosts: Adam Wolk, Boriss Mejías, Claire Giordano, Derk van Veen, Floor Drees, Melanie Plageman, Thomas Munro
Speakers: Abe Omorogbe, Adam Wolk, Alexander Kukushkin, Amit Langote, Ashutosh Bapat, Bohan Zhang, Boriss Mejías, Bruce Momjian, Charles Feddersen, Chris Ellis, Cédric Villemain, David Rowley, Derk van Veen, Ellyne Phneah, Gayathri Paderla, Heikki Linnakangas, Jan Karremans, Jelte Fennema-Nio, Jimmy Zelinskie, Johannes Schuetzner, Karen Jex, Krishnakumar “KK” Ravi, Lukas Fittl, Marco Slot, Matt McFarland, Michael John Pena, Nacho Alonso Portillo, Neeta Goel, Nitin Jadhav, Palak Chaturvedi, Pamela Fox, Peter Farkas, Philippe Noël, Polina Bungina, Rahila Syed, Robert Haas, Sandeep Rajeev, Sarah Conway, Sarat Balijapelli, Shinya Kato, Silvano Coriani, Taiob Ali, Tomas Vondra, Varun Dhawan, Vinod Sridharan
Number of posts in the past two months
Get in touch with the Planet PostgreSQL administrators at planet at postgresql.org.