Latest Blog Posts

Contributions for the week of 2025-06-02 (Week 23)
Posted by Boriss Mejias in postgres-contrib.org on 2025-06-12 at 11:12

PG Day France 2025 took place on June 3 and 4 in Mons, Belgium, organized by Leila Bakkali, Matt Cornillon, Stefan Fercot, Flavio Gurgel, Anthony Nowocien, and Julien Riou, with the help of Jean-Paul Argudo, Laetitia Avrot, Sylvain Beorchia, Damien Clochard, Yves Collin, Bertrand Drouvot, Cédric Duprez, Maxime Longuet, Helene Nguyen, and Anaïs Oberto as volunteers.

Speakers:

First steps with Logical Replication in PostgreSQL
Posted by Radim Marek on 2025-06-11 at 00:00

Most applications start with a single PostgreSQL database, but over time, the need to scale out, distribute the load, or integrate naturally arises. PostgreSQL's logical replication is one of the features that meets these demands by streaming row-level changes from one PostgreSQL instance to another, all using a publish-subscribe model. Logical replication is more than an advanced feature; it provides a flexible framework you can build on to further distribute and integrate PostgreSQL within your architecture.

In this article, we will start with the foundation, explore the core ideas behind logical replication, and learn how to use it.
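
As a taste of the publish-subscribe model described above, here is a minimal sketch; the table, user and connection details are hypothetical, and the table must already exist on both sides:

```sql
-- On the publisher (hypothetical table name):
CREATE PUBLICATION orders_pub FOR TABLE orders;

-- On the subscriber (connection string and names are placeholders):
CREATE SUBSCRIPTION orders_sub
    CONNECTION 'host=publisher.example.com dbname=shop user=replicator'
    PUBLICATION orders_pub;
```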

Physical vs. Logical Replication

Before we can dive deeper, let's understand the role of replication in PostgreSQL and how it's built on top of the Write-Ahead Log (WAL).

The WAL is a sequential, append-only log that records every change made to the cluster data. For durability, all modifications are first recorded in the WAL and only afterwards applied to the data files. This allows PostgreSQL to recover from crashes by replaying the logged changes.

Versioned changes, necessitated by concurrent transactions, are managed through Multi-Version Concurrency Control (MVCC). Instead of overwriting data directly, MVCC creates multiple versions of rows, allowing each transaction to see a consistent snapshot of the database. It is the WAL that captures these versioned changes along with the transactional metadata to ensure data consistency at any given point in time.

Physical replication is built directly on the Write-Ahead Log. It streams the binary WAL data from the primary server to one or more standby servers (replicas), effectively creating a byte-for-byte copy of the entire cluster. This also makes the replicas read-only, which is what makes them ideal candidates for failover or for scaling read workloads.

Logical replication, by comparison, while also built on top of the WAL, takes a fundamentally different approach. Instead of streaming ra

[...]

Checklist: Is Your PostgreSQL Deployment Production-Grade?
Posted by Umair Shahid in Stormatics on 2025-06-10 at 14:00

One of the things I admire most about PostgreSQL is its ease of getting started.

I have seen many developers and teams pick it up, launch something quickly, and build real value without needing a DBA or complex tooling. That simplicity is part of what makes PostgreSQL so widely adopted.

However, over time, as the application grows and traffic increases, new challenges emerge. Queries slow down, disk usage balloons, or a minor issue leads to unexpected downtime.

This is a journey I have witnessed unfold across many teams. I don’t think of it as a mistake or an oversight; it is simply the natural progression of a system evolving from development to production scale.

The idea behind this blog is to help you assess your current situation and identify steps that can enhance the robustness, security, and scalability of your PostgreSQL deployment.

1. Architecture: Is Your Deployment Designed to Withstand Failure?

As systems grow, so does the need for resilience. What worked fine on a single node during development might not hold up in production.

Questions to ask:

  • Are you still on a single-node setup?
  • Do you have at least one streaming replica?
  • Is failover possible — and tested?

Setting up high availability is about pre-emptive measures to ensure that your users continue to be serviced even in the face of software or hardware failures. Even a basic primary-replica setup can make a big difference. Add a failover tool like Patroni or repmgr, and you are well on your way to building a more resilient PostgreSQL foundation.
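
If you are unsure whether a streaming replica is attached and keeping up, a quick look at pg_stat_replication on the primary is a reasonable starting point (a sketch, not a full health check):

```sql
-- Run on the primary: one row per connected standby
SELECT application_name,
       state,
       sync_state,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;
```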

2. Configuration: Is PostgreSQL Tuned for Your Workload?

PostgreSQL’s defaults are intentionally conservative — they prioritize compatibility, not performance. That is great for getting started, but less ideal for scaling.
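
Before changing anything, it helps to see what the server is actually running with; a minimal sketch (the values shown are examples only, not recommendations):

```sql
-- Inspect a few headline settings
SELECT name, setting, unit, source
FROM pg_settings
WHERE name IN ('shared_buffers', 'work_mem', 'effective_cache_size');

-- work_mem can be changed with a reload; shared_buffers needs a restart
ALTER SYSTEM SET work_mem = '64MB';   -- example value only
SELECT pg_reload_conf();
```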

What to look for:

  • Is shared_buffers still set to 128MB?
  • Have you tuned work_mem or
[...]

PgPedia Week, 2025-06-08
Posted by Ian Barwick on 2025-06-10 at 10:11
PostgreSQL 18 changes this week

Of note:

  • new GUC log_lock_failure renamed to log_lock_failures
  • commit e5a3c9d9 ("postgres_fdw: Inherit the local transaction's access/deferrable modes.") was reverted

PostgreSQL 18 articles

  • What's New in PostgreSQL 18 - a DBA's Perspective (2025-05-23) - Tianzhou

more...

Using regular expressions and arrays in PostgreSQL
Posted by Hans-Juergen Schoenig in Cybertec on 2025-06-10 at 08:00

Regular expressions and PostgreSQL have been a great team for many many years. The same is true for PostgreSQL arrays, which have been around for a long time as well. However, what people rarely do is combine those two technologies into something more powerful that can be used for various purposes.

Using ANY and ALL

One of the most widely adopted ways of dealing with arrays is the idea of ANY and ALL, which has been supported by PostgreSQL since God knows when. Those two keywords allow us to figure out if a certain value and an array are a match.

Here are some examples:

```sql
test=# SELECT array_agg(x) FROM generate_series(1, 5) AS x;
  array_agg  
-------------
 {1,2,3,4,5}
(1 row)

test=#  SELECT 3 = ANY(array_agg(x)) 
        FROM   generate_series(1, 5) AS x;
 ?column? 
----------
 t
(1 row)

test=#  SELECT 3 = ALL(array_agg(x)) 
        FROM   generate_series(1, 5) AS x;
 ?column? 
----------
 f
(1 row)
```

The first statement simply generates an array of numbers, which can be used in my example to demonstrate how ANY and ALL work. The idea is simple: ANY will check if one of the values in the array matches - ALL will check if all of the values are a match. So far this is quite common.

However, what happens if we try to apply this concept to regular expressions?

PostgreSQL and regular expressions

Many readers might be surprised to learn that combining those two techniques is indeed possible and actually fairly straightforward. Note that in the example above, ANY and ALL were essentially used in combination with the = operator. However, we can also apply the ~ operator, which is the PostgreSQL way of handling regular expressions:

```sql
test=# SELECT 'my fancy string' ~ '.*ancy.*ri.+$';
 ?column? 
----------
 t
(1 row)
```

What this essentially does is match the regular expression on the right-hand side of the operator against the string on the left. So far so good, but what happens if we expand on this a bit?

Here is what we can do:

```sql
test=# SELECT   array_agg(exp)
```

[...]
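
The post's own example is truncated above, but combining ~ with ANY or ALL generally looks something like this (an illustrative sketch of the technique, not the author's actual query):

```sql
-- Does the string match at least one of the patterns in the array?
SELECT 'my fancy string' ~ ANY (ARRAY['^my', 'bogus', 'nothing']);   -- true

-- Does the string match all of the patterns?
SELECT 'my fancy string' ~ ALL (ARRAY['^my', 'fancy', 'ring$']);     -- true
```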

pgsql_tweaks 0.11.3 Released
Posted by Stefanie Janine on 2025-06-09 at 22:00

pgsql_tweaks is a bundle of functions and views for PostgreSQL

The source code is available on GitLab, a mirror is hosted on GitHub.
One could install the whole package, or just copy what is needed from the source code.

The extension is also available on PGXN.

General changes

PostgreSQL 18 Support

No code has been changed. Tests against PostgreSQL 18 beta 1 have been successful.

Therefore PostgreSQL 18 is now supported by pgsql_tweaks.

The reason it took some time to test against the upcoming version is that there were problems getting the Docker containers to run PostgreSQL 18; recent changes made it a bit hard to get that version running.
I execute the tests always against Docker containers.

Teresa Lopes
Posted by Andreas 'ads' Scherbaum on 2025-06-09 at 14:00
PostgreSQL Person of the Week Interview with Teresa Lopes: I am Teresa Lopes, born in Lisbon, Portugal. While living in Portugal, I always divided my time between Lisbon (house/school/work) and Sertã, Castelo Branco (weekends/holidays). At the end of 2023, I decided to move to Berlin and more recently relocated to Amsterdam (got to love the EU 💙💛).

PGConf.EU 2025 - Call for Presentations
Posted by Magnus Hagander in PostgreSQL Europe on 2025-06-09 at 13:04

The Call for Presentations for PostgreSQL Conference Europe, which will take place in Riga, Latvia, from October 21 to 24, is now open.

Key dates

  • Submission deadline: 30 June (at midnight, local time Riga, Latvia; GMT+3)
  • Speakers notified: before 1 August
  • Conference: 21-24 October

To submit your proposals and for more information see our website. Speakers can submit up to three (3) proposed talks each.

Session format

There will be a mix of shorter (25 min) and longer (45 min) sessions, which will be held in English. They may be on any topic related to PostgreSQL.

Selection process

The proposals will be considered by a committee, which will produce a schedule to be published closer to the conference date. The members of the talk selection committee will be listed soon on the conference website.

For more details about sponsoring the event, see the website.

For any questions, contact us at contact@pgconf.eu.

CORE Database Schema Design: Constraint-driven, Optimized, Responsive, and Efficient
Posted by Andrew Atkinson on 2025-06-09 at 00:00

Introduction

In this post, we’ll cover some database design principles and package them up into a catchy mnemonic acronym.

Software engineering is loaded with acronyms like this. For example, SOLID describes five principles (Single responsibility, Open-closed, Liskov substitution, Interface segregation, and Dependency inversion) that promote good object-oriented design.

Databases are loaded with acronyms too, for example “ACID” for the properties of a transaction, but I wasn’t familiar with one that the schema designer could keep in mind while working.

Thus, the motivation for this acronym was to help the schema designer by packaging up some principles of good design practices for database schema design. It’s not based on research or academia though, so don’t take this too seriously. That said, I’d love your feedback!

Let’s get into it.

Picking a mnemonic acronym

In picking an acronym, I wanted it to be short and have each letter describe a word that’s useful, practical, and grounded in experience. I preferred a real word for memorability!

The result was “CORE.” Let’s explore each letter and the word behind it.

Constraint-Driven

The first word (technically two) is “constraint-driven.” Relational databases offer rigid structures, but also the ability to change them while online, a form of flexibility in their evolution. We evolve their structure through DDL, adding data types and constraints as new entities and relationships are added.

Constraint-driven refers to leveraging all the constraint objects available, designing for our needs today, but also, in a more general sense, applying restrictions to designs in pursuit of data consistency and quality.

Let’s look at some examples. Choose the appropriate data types, like a numeric data type and not a character data type when storing a number. Use NOT NULL for columns by default. Create foreign key constraints for table relationships by default.
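
A small sketch of what these defaults can look like in practice (table and column names are hypothetical, and a customers table is assumed to exist):

```sql
CREATE TABLE orders (
    id           bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    customer_id  bigint  NOT NULL REFERENCES customers (id),  -- FK by default
    quantity     integer NOT NULL CHECK (quantity > 0),       -- numeric type, not text
    status       text    NOT NULL DEFAULT 'pending'
                 CHECK (status IN ('pending', 'shipped', 'cancelled')),
    created_at   timestamptz NOT NULL DEFAULT now()
);
```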

Validate expected data inputs using check constraints. F

[...]

Understanding High Water Mark Locking Issues in PostgreSQL Vacuums
Posted by Shane Borden on 2025-06-06 at 20:49

I recently had a customer that wanted to leverage read replicas to ensure that their read queries would not be impeded by work being done on the primary instance, while also requiring an SLA of, at worst, a few seconds. Ultimately they weren’t meeting the SLA, and my colleagues and I were asked to look at what was going on.

The first thing we came to understand is that the pattern of work on the primary is a somewhat frequent large DELETE statement followed by a data refresh accomplished by a COPY from STDIN command against a partitioned table with 16 hash partitions.

The problem being observed was that periodically the SELECTs occurring on the read replica would time out and not meet the SLA. Upon investigation, we found that the “startup” process on the read replica would periodically request an “exclusive lock” on some random partition. This exclusive lock would block the SELECT (which is partition unaware) and then cause the timeout. But what is causing the timeout?

After spending some time investigating, the team was able to correlate the exclusive lock with a routine “autovacuum” occurring on the primary. But why was it locking? After inspection of the WAL, it turns out that the issue was due to a step in the vacuum process whereby it tries to return free pages at the end of the table back to the OS: truncation of the High Water Mark (HWM). Essentially, the lock is requested on the primary and then transmitted to the replica via the WAL so that the tables can be kept consistent.

To confirm that it was in fact the step in VACUUM that truncates the HWM, we decided to alter each partition of the table to allow VACUUM to skip that step:

ALTER TABLE [table name / partition name] SET (vacuum_truncate = false);
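
To double-check which relations have truncation disabled, the reloptions column in pg_class can be inspected; the parent table name below is a placeholder. A one-off alternative is VACUUM's TRUNCATE option:

```sql
-- Which partitions of a (hypothetical) parent table have storage options set?
SELECT c.relname, c.reloptions
FROM pg_class c
JOIN pg_inherits i ON i.inhrelid = c.oid
WHERE i.inhparent = 'my_partitioned_table'::regclass;

-- Skip the truncation step for a single run only
VACUUM (TRUNCATE FALSE) my_partitioned_table;
```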

After letting this run for 24 hours, we in fact saw no further blocking locks causing SLA misses on the replicas. Should we worry about shrinking the High Water Mark (HWM)? Well, as with everything in IT, it depends. Other DBMS engines like Oracle do not shrink the

[...]

Just announced: PG DATA 2026
Posted by Henrietta Dombrovskaya on 2025-06-06 at 13:21

Chicago is better in summer! Our inaugural event, PG DATA 2026, will kick off 363 days from today, and we want you to be a part of it!

PG DATA’s mission is to encourage the growth of the PostgreSQL community in the Midwest through learning and networking opportunities, to foster collaboration with the developers’ community and academia, and to bring the best global Postgres speakers to our local community.

More details are coming soon! Watch for further announcements, and do not plan your vacation for the first week of June 2026. Alternatively, make Chicago your vacation destination: our beautiful city has a lot to offer, especially in summer!

Call for Sponsors is open!

Avoiding disk spills due to PostgreSQL's logical replication
Posted by Ashutosh Bapat on 2025-06-06 at 09:40

Logical replication is a versatile feature offered in PostgreSQL. I have discussed the theoretical background of this feature in detail in my POSETTE talk. At the end of the talk, I emphasize the need for monitoring a logical replication setup. If you are using logical replication and have set up monitoring, you will be familiar with pg_stat_replication_slots. In some cases this view shows high values for spill_txns, spill_count and spill_bytes, which indicates that the WAL sender corresponding to that replication slot is using a large amount of disk space. This increases load on the IO subsystem, affecting performance. It also means that there is less disk available for user data and regular transactions to operate. This is an indication that logical_decoding_work_mem has been configured too low. That's the subject of this blog: how to decide the right configuration value for logical_decoding_work_mem. Let's first discuss the purpose of this GUC. The blog might serve as a good background before reading further.
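
A hedged example of the kind of monitoring query that surfaces these counters:

```sql
-- Spill activity per logical replication slot (PostgreSQL 14 and later)
SELECT slot_name,
       spill_txns,
       spill_count,
       pg_size_pretty(spill_bytes) AS spilled
FROM pg_stat_replication_slots
ORDER BY spill_bytes DESC;
```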

Reorder buffer and logical_decoding_work_mem

When decoding WAL, a logical WAL sender accumulates transactions in an in-memory data structure called the reorder buffer. For every transaction the WAL sender encounters, it maintains a queue of changes in that transaction. As it reads each WAL record, it finds the transaction ID to which it belongs and adds the change to the corresponding queue. As soon as it sees a COMMIT record for a transaction, it decodes all the changes in the corresponding queue and sends them downstream. If the reorder buffer fills up with transactions whose COMMIT records have not yet been seen, it spills the queues to disk. We see such disk spills accounted for in spill_txns, spill_count and spill_bytes. The amount of memory allocated to the reorder buffer is determined by the logical_decoding_work_mem GUC. If the value is too low, it will cause frequent disk spills; if it is too high, it will waste memory. Every WAL sender in the server will allocate logical_decoding_work_mem amount of memory, thu
[...]
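
Continuing the excerpt above: once the spill counters point at an undersized reorder buffer, inspecting and raising logical_decoding_work_mem might look like this (the value is illustrative only):

```sql
SHOW logical_decoding_work_mem;                        -- default is 64MB

ALTER SYSTEM SET logical_decoding_work_mem = '256MB';  -- example value only
SELECT pg_reload_conf();                               -- no restart required
```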

CNPG Recipe 18 - Getting Started with pgvector on Kubernetes Using CloudNativePG
Posted by Gabriele Bartolini in EDB on 2025-06-05 at 20:42

Learn how to set up a PostgreSQL cluster with the pgvector extension on Kubernetes using CloudNativePG—all in a fully declarative way. This article walks you through the process in just a few minutes, from cluster creation to extension installation.

Benchmarking is hard, sometimes ...
Posted by Tomas Vondra on 2025-06-05 at 10:00

I do a fair number of benchmarks, not only to validate patches, but also to find interesting (suspicious) stuff to improve. It’s an important part of my development workflow. And it’s fun ;-) But we’re dealing with complex systems (hardware, OS, DB, application), and that brings challenges. Every now and then I run into something that I don’t quite understand.

Consider a read-only pgbench, the simplest workload there is, with a single SELECT doing lookup by PK. If you do this with a small data set on any machine, the expectation is near linear scaling up to the number of cores. It’s not perfect, CPUs have frequency scaling and power management, but it should be close.

Some time ago I tried running this on a big machine with 176 cores (352 threads), using scale 50 (about 750MB, so tiny - it actually fits into L3 on the EPYC 9V33X CPU). And I got the following chart for throughput with different client counts:

results for read-only pgbench on a system with 176 cores

This is pretty awful. I still don’t think I entirely understand why this happens, or how to improve the behavior. But let me explain what I know so far, what I think may be happening, and perhaps someone will correct me or have an idea how to fix it.

Approximate the p99 of a query with pg_stat_statements
Posted by Michael Christofides on 2025-06-04 at 15:58

Cover photo by Luca Upper

I recently saw a feature request for pg_stat_statements to be able to track percentile performance of queries, for example the p95 (95th percentile) or p99 (99th percentile).

That would be fantastic, but isn’t yet possible. In the meantime, there is a statistically-dodgy-but-practically-useful (my speciality) way to approximate them using the mean and standard deviation columns in pg_stat_statements.

Why bother?

When wondering what our user experience is like across different queries, we can miss issues if we only look at things by the average time taken.

For example, let’s consider a query that takes on average 100ms but 1% of the time it takes over 500ms (its p99), and a second query that takes on average 110ms but with a p99 of 200ms. It is quite possible that the first query is causing more user dissatisfaction, despite being faster on average.

Brief statistics refresher

The standard deviation is a measure of the amount of variation from the mean. Wider distributions of values have larger standard deviations.

pg_stat_statements has mean_exec_time (mean execution time) and mean_plan_time (mean planning time) columns, but no median equivalents. The other columns we’ll be using for our approximation calculation are stddev_exec_time and stddev_plan_time.

In a perfectly normally distributed data set, the p90 is 1.28 standard deviations above the mean, the p95 is 1.65, and the p99 is 2.33.

Our query timings are probably not normally distributed, though. In fact, many will have a longer tail on the slow end, and some will have a multimodal distribution (with clustering due to things like non-evenly distributed data and differing query plans).

Having said that, even though many of our query timings are not normally distributed, queries with a high p99 are very likely to also have a high mean-plus-a-couple-of-standard-deviations, so if we approximate the p99 assuming a normal distribution, the results should be directionally correct.
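
Here is a sketch of that approximation against pg_stat_statements (the 2.33 multiplier is the normal-distribution z-score for the p99 mentioned above; requires the pg_stat_statements extension):

```sql
-- Rough p99 of execution time per statement, assuming normality
SELECT queryid,
       calls,
       round(mean_exec_time::numeric, 2)                             AS mean_ms,
       round((mean_exec_time + 2.33 * stddev_exec_time)::numeric, 2) AS approx_p99_ms
FROM pg_stat_statements
ORDER BY approx_p99_ms DESC
LIMIT 20;
```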

Ju

[...]

Ultimate Guide to POSETTE: An Event for Postgres, 2025 edition
Posted by Claire Giordano in CitusData on 2025-06-04 at 15:26

POSETTE: An Event for Postgres 2025 is back for its 4th year—free, virtual, and packed with deep expertise. No travel needed, just your laptop, internet, and curiosity.

This year’s 45 speakers are smart, capable Postgres practitioners—core contributors, performance experts, application developers, Azure engineers, extension maintainers—and their talks are as interesting as they are useful.

The four livestreams (42 talks total) run from June 10-12, 2025. Every talk will be posted to YouTube afterward (un-gated, of course). But if you can join live, I hope you do! On the virtual hallway track on Discord, you’ll be able to chat with POSETTE speakers—as well as other attendees. And yes, there will be swag.

This “ultimate guide” blog post is your shortcut to navigating POSETTE 2025. In this post you’ll get:

“By the numbers” summary for POSETTE 2025

Here’s a quick snapshot of what you need to know about POSETTE:

About POSETTE: An Event for Postgres 2025

  • 3 days: June 10-12, 2025
  • 4 livestreams: in Americas & EMEA time zones (but of course you can watch from anywhere)
  • 42 talks: all free, all virtual
  • 2 keynotes: from Bruce Momjian & Charles Feddersen
  • 45 speakers: PG contributors, users, application developers, community members, & Azure engineers
[...]

Contributions for the week of 2025-05-19 (Week 21)
Posted by Boriss Mejias in postgres-contrib.org on 2025-06-04 at 10:06

[PGDay Blumenau 2025](https://pgdayblumenau.com.br/) took place May 24 in Blumenau, Brazil, organized by João Foltran and Gustavo Lemos, with Sara Kruger and Leonardo Corsini as volunteers.

Speakers:

Boldly Migrate to PostgreSQL – Introducing credativ-pg-migrator
Posted by Josef Machytka in credativ on 2025-06-03 at 06:00

Many companies these days are thinking about migrating their databases from legacy or proprietary systems to PostgreSQL. The primary aim is to reduce costs, enhance capabilities, and ensure long-term sustainability. However, even just the idea of migrating to PostgreSQL can be overwhelming. Very often, knowledge about the legacy applications is limited or even lost. In some cases, vendor support is diminishing, and expert pools and community support are shrinking. Legacy databases are also often running on outdated hardware and old operating systems, posing further risks and limitations. (more…)

PgPedia Week, 2025-06-01
Posted by Ian Barwick on 2025-06-02 at 23:00

If you, like me, set up read access to the PostgreSQL Git repository many years ago, and have been wondering why it's been returning fatal: Could not read from remote repository errors for the past few days, it's because git:// protocol support has been deactivated and may be withdrawn entirely (see the pgsql-www thread git repo "https://" working but "git://" is not).

The resolution is to convert the remote to https://, e.g.:

git remote set-url origin https://git.postgresql.org/git/postgresql.git

PostgreSQL 18 changes this week

A very quiet week, with commits-of-interest consisting of smaller changes together with a bunch of back-patched fixes.

PostgreSQL 18 articles

  • Postgres Extensions: Use PG_MODULE_MAGIC_EXT (2025-05-29) - David E. Wheeler discusses the changes to extension metadata added in commit 9324c8c5

more...

Yes, Postgres can do session vars - but should you use them?
Posted by Kaarel Moppel on 2025-06-02 at 21:00
Prompted by some comments / complaints about Postgres’ missing user-variables story on a Reddit post about PostgreSQL pain points in the real world - I thought I’d elaborate a bit on session vars - which is indeed a little-known Postgres functionality. Although this “alley” has existed for ages...
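
For readers who haven't met this corner of Postgres: the usual trick is a custom GUC namespace with set_config()/current_setting(), roughly like this (the parameter name is made up):

```sql
-- Store a value for the rest of the session ("false" = not transaction-local)
SELECT set_config('myapp.current_user_id', '42', false);

-- Read it back; "true" = return NULL instead of erroring if it was never set
SELECT current_setting('myapp.current_user_id', true);
```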

The Fun of Open Source: Roman Numerals in PostgreSQL
Posted by Laurenz Albe in Cybertec on 2025-06-02 at 18:39

Roman numeral support in PostgreSQL taken too far: a screen with psql executing a query from generate_series(), and all numbers including the "n rows" are roman
© Laurenz Albe 2025

Recently, I wrote about the power of open source. Now, power is good and important, but open source software has other good sides as well. One of these aspects is fun. I will showcase that with the recently introduced support for converting Roman numerals to numbers.
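
For context, to_char() has long been able to produce Roman numerals; the new piece referred to here is, as I understand it, going the other way with to_number() in PostgreSQL 18 (hedged sketch):

```sql
SELECT trim(to_char(2025, 'RN'));   -- MMXXV (to_char pads the result)
SELECT to_number('MMXXV', 'RN');    -- 2025; expected to work on PostgreSQL 18
```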

The driving forces behind PostgreSQL development

Before we turn to the Roman numerals, let's take a step back. People with little knowledge about open source software sometimes believe that open source software is something that computer nerds write in their spare time. But developing open source software is not primarily a pastime for people with nothing better to do. The driving force behind most open source software development is actually money, directly or indirectly. Sometimes customers sponsor the development of a feature they need. Companies that live by selling PostgreSQL services or add-on products invest time and money into development so that the software is attractive to users.

But what drives the people who actually write the software? Well, certainly it is the money they get paid for it. But there is more to the picture:

  • Contributions to open source software are visible and earn you the respect of others
  • You want the tools you have to work with to be as good as possible (Did you ever wonder why psql has so many features?)
  • You develop emotional ties to the software and want it to be good
  • There is actually some fun to be had

Fun in PostgreSQL

Reading the PostgreSQL documentation or using the software you probably don't see a lot of fun. Both speak to you in a technical, no-nonsense fashion. You have to look to the places where people communicate about PostgreSQL. This is primarily the mailing list pgsql-hackers. Sure enough, discussions there are fact-oriented and sometimes heated, but there is room for fun.

But most of the fun with PostgreSQL happens at conferences. Read my article about PostgreSQL conferences and you will know what I mean.

Why do we need su

[...]

Dilan Tek
Posted by Andreas 'ads' Scherbaum on 2025-06-02 at 14:00
PostgreSQL Person of the Week Interview with Dilan Tek: My name is Dilan. I was born in Ankara, the capital of Turkiye. I love my city very much! Although I lived in Istanbul for a while for work, I returned to Ankara.

Postgres Partitioning Best Practices
Posted by Karen Jex in Crunchy Data on 2025-06-02 at 08:49

Slides and transcript from my talk, "Postgres Partitioning Best Practices", at PyCon Italia in Bologna on 29 May 2025.

Thank you to everyone who came to listen, apologies to the people who were turned away because the room was full (who knew so many people would want to learn about Partitioning!), and thank you for all the questions, which have given me lots of ideas for improvements.

I'll share the recording as soon as it's available.

Postgres Partitioning Best Practices



whoami: photo of Karen and representation of DBA career

I always feel the need to get this confession out of the way before I get too far into the talk [at developer conferences] - I’m not a developer. Sorry!

But as you can see in this diagram of my career so far, I at least know about databases. I was a DBA for 20 years before becoming a database consultant, and now a senior solutions architect. I don't have "database" in my job title any more, but I still only work with database systems, specifically PostgreSQL.

I’ve worked with a lot of developers during that time and I’ve learnt a lot from them, and in return I try to share some of my database knowledge.

The photo is me in my happy place, on a local bike trail, splattered in mud!

I’m also on the PostgreSQL Europe board of directors and I’m leading the PostgreSQL Europe Diversity Task Force so feel free to find me to talk to me about that.



Looking at a huge table, wondering what to do

If you have huge database tables, or if you expect to have huge database tables in the future, you'll probably start to think about partitioning to make them easier to manage. You probably have lots of questions about how to do that and what the best practices are.

I work with lots of customers who have already been through that thought process, and this presentation is based on the questions they asked along the way, and the things they learnt.
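
Before the agenda, here is a minimal declarative partitioning sketch of the kind the talk covers (all object names are hypothetical):

```sql
-- Range-partitioned parent table
CREATE TABLE measurements (
    sensor_id   integer     NOT NULL,
    recorded_at timestamptz NOT NULL,
    value       numeric
) PARTITION BY RANGE (recorded_at);

-- One partition per month; create the next one before you need it
CREATE TABLE measurements_2025_06 PARTITION OF measurements
    FOR VALUES FROM ('2025-06-01') TO ('2025-07-01');
```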



Agenda. Text repeated below image.

We'll look at:

  • An introduction to partitioning:
    What is table partitioning, and how does it work?
  • What are the advantages of partitioning?
  • How do you choose a Partition Key?
    How
[...]

SCaLE 22x: Bringing the Open Source Community to Pasadena
Posted by Sarah Conway in Data Bene on 2025-06-02 at 00:00

The Southern California Linux Expo (SCaLE) 22x, recognized as being North America’s largest community-run open source and free software conference, took place at the Pasadena Convention Center from March 6-9, 2025. When I say community-run, I mean it—no corporate overlords dictating the agenda, just pure open source enthusiasm driving four days of technical discussions and collaboration.

This year’s conference focused on the topics of AI, DevOps and cloud-native technologies, open source community engagement, security and compliance, systems and infrastructure, and FOSS @ home (exploring the world of self-hosted applications and cloud services).

The conference drew attendees from around the world to talk about everything open source, revolving around Linux at the core (of course) while continuing the discussion across topics such as embedded systems & IoT. As always, there was a unique blend of cutting-edge tech talk and practical problem-solving in every space, which is what makes SCaLE special.

Herding Elephants: PostgreSQL@SCaLE22x

PostgreSQL@SCaLE22x ran as a dedicated two-day, two-track event on March 6-7, 2025, recognized under the PostgreSQL Global Development Group community event guidelines. The selection team included Gabrielle Roth, Joe Conway, and Mark Wong, ensuring the quality you’d expect from the PostgreSQL community.

The speaker lineup was impressive: Magnus Hagander, Christophe Pettus, Peter Farkas, Devrim Gündüz, Hamid Akhtar, Henrietta Dombrovskaya, Shaun Thomas, Gülçin Yıldırım Jelínek & Andrew Farries, Nick Meyer, and Jimmy Angelakos. One particularly memorable session was titled “Row-Level Security Sucks. Can We Make It Usable?”—a refreshingly honest take on PostgreSQL’s RLS feature that probably resonated with more than a few database administrators in the audience.

The community “Ask Me Anything” panel was hosted by Stacey Haysler and featured Christophe Pettus, Devrim Gündüz, Jimmy Angelakos, Magnus Hagander, and Mark Wong. These sessions are where the re

[...]

Postgres Extensions Day Montréal 2025
Posted by Andreas Scherbaum on 2025-06-01 at 22:00
On Monday before PGConf.dev, the Postgres Extensions Day 2025 took place. Same venue, same floor. A day fully packed with talks and interesting discussions. This event shows once again that extensions in PostgreSQL are thriving. It also shows that there is a lot of work to do to make extensions more usable and discoverable for users. Just one of the problems that were discussed: a database can only load one extension version.

Understanding Split-Brain Scenarios in Highly Available PostgreSQL Clusters
Posted by semab tariq in Stormatics on 2025-05-30 at 10:49

High Availability (HA) refers to a system design approach that ensures a service remains accessible even in the event of hardware or software failures. In PostgreSQL, HA is typically implemented through replication, failover mechanisms, and clustering solutions to minimize downtime and ensure data consistency. Hence, HA is very important for your mission-critical applications. 

In this blog post, we will explore a critical failure condition known as a split-brain scenario that can occur in PostgreSQL HA clusters. We will first see what split-brain means, then how it can impact PostgreSQL clusters, and finally discuss how to prevent it through architectural choices and tools available in the PostgreSQL ecosystem.

What is a Split-Brain Scenario?

A split-brain scenario occurs when two or more nodes in a cluster lose communication with each other but continue operating as if they are the primary (or leader) node.

We know that the standby node receives data from the primary in real time. However, if a network glitch occurs between the primary and the standby, and the standby is unable to receive data for a certain timeout period, it may assume that the primary has failed. 

As a result, the standby might promote itself to a new primary. But since the issue was only a temporary network problem, the original primary is still active. This leads to a situation where two primary nodes are accepting writes at the same time, which is a highly dangerous state for the cluster.

Split-brain is particularly dangerous because it breaks the fundamental guarantees of consistency in an HA cluster. Writes can happen independently on multiple primary nodes, and reconciling those changes later is often impossible without data loss or manual intervention.
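
During an incident, one quick (if crude) sanity check is to ask each node whether it believes it is a primary; two nodes both answering false is a red flag. This is a sketch only, not a substitute for proper fencing or quorum:

```sql
-- Run on every node: returns false on a primary, true on a standby
SELECT pg_is_in_recovery();
```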

Common Scenarios That Lead to Split-Brain

As we have seen earlier, the most common cause of split-brain is a network failure between the primary and standby nodes. However, there are several other situations where a cluster might also encounte

[...]

Bridged Indexes in OrioleDB: architecture, internals & everyday use?
Posted by Alexander Korotkov on 2025-05-30 at 00:00

Since version beta10 OrioleDB supports building indexes other than B-tree. Bridged indexes are meant to support these indexes on OrioleDB tables.

1. Why OrioleDB needs a “bridge”

OrioleDB stores its table rows inside a B-tree built on the table's primary key and keeps MVCC information in an undo log, so it can't simply plug PostgreSQL's existing Index Access Methods (GiST, GIN, SP-GiST, BRIN, …) into that structure, because PostgreSQL's Index Access Methods:

  • reference a 6-byte ctid (block number and offset in the heap) -- not a logical key;
  • keep every live version of a row in the index, leaving visibility checks to the executor;
  • support inserts only in the index and rely on VACUUM for physical deletion.

OrioleDB indexes, in contrast, are MVCC-aware: they point to rows via primary-key values and support logical updates/deletes directly in the index. To remain heap-free while still letting users build on the rich ecosystem of non-B-tree indexes, OrioleDB introduces a bridge index layer.

Illustration of a bridge index

2. How the bridge works under the hood

  1. Virtual iptr column -- an incrementally increasing "index pointer" automatically added to the table. The new value of iptr is assigned each time any column referenced by a bridged index is updated, ensuring the pointer remains stable for the indexed data.
  2. Bridge index -- a lightweight secondary index that maps iptr to primary-key value. It behaves like a normal OrioleDB secondary B-tree, except it doesn't use undo log for MVCC.
  3. PostgreSQL indexes (GIN/GiST/...) are built on the iptr values instead of ctids, so their structure stays compatible with the IndexAM API. During scans, the engine looks up iptr, translates it through the bridge index, and then fetches the row by primary key.
  4. The vacuum process collects stale iptr-s that are not visible to any snapshot, and asks the underlying IndexAM to clean up; then physically deletes the same pointers from the bridge index.

The result is a tri-level lookup path: Inde

[...]

Postgres Extensions: Use PG_MODULE_MAGIC_EXT
Posted by David Wheeler in Tembo on 2025-05-29 at 22:09

A quick note for PostgreSQL extension maintainers: PostgreSQL 18 introduces a new macro: PG_MODULE_MAGIC_EXT. Use it to name and version your modules. Where your module .c file likely has:

PG_MODULE_MAGIC;

Or:

#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif

Change it to something like:

#ifdef PG_MODULE_MAGIC_EXT
PG_MODULE_MAGIC_EXT(.name = "module_name", .version = "1.2.3");
#else
PG_MODULE_MAGIC;
#endif

Replace the name of your module and the version as appropriate. Note that PG_MODULE_MAGIC was added in Postgres 8.2; if for some reason your module still supports earlier versions, use a nested #ifdef to conditionally execute it:

#ifdef PG_MODULE_MAGIC_EXT
PG_MODULE_MAGIC_EXT(.name = "module_name", .version = "1.2.3");
#else
#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif
#endif

If you manage the module version in your Makefile, as the PGXN Howto suggests, consider renaming the .c file to .c.in and changing the Makefile like so:

  • Replace .version = "1.2.3" with .version = "__VERSION__"

  • Add src/$(EXTENSION).c to EXTRA_CLEAN

  • Add this make target:

    src/$(EXTENSION).c: src/$(EXTENSION).c.in
            sed -e 's,__VERSION__,$(EXTVERSION),g' $< > $@
    
  • If you use Git, add /src/*.c to .gitignore

For an example of this pattern, see semver@3526789.

That’s all!

Tip: Put your Rails app on a SQL Query diet
Posted by Andrew Atkinson on 2025-05-29 at 17:29

Introduction

Much of the time taken processing HTTP requests in web apps is processing SQL queries. To minimize that, we want to avoid unnecessary or duplicate queries, and generally perform as few queries as possible.

Think of the work that needs to happen for every query. The database engine parses it, creates a query execution plan, executes it, and then sends the response to the client.

When the response reaches the client, there’s even more work to do. The response is transformed into application objects in memory.

How do we see how many queries are being created for our app actions?

Count the queries

When doing backend work in a web app like Rails, monitor the number of queries being created directly, by the ORM, or by libraries. ORMs like Active Record can generate more than one query from a given line of code. Libraries can generate queries that are problematic and may be unnecessary.

Over time, developers may duplicate queries unknowingly. These are all real causes of unnecessary queries from my work experience.

Why are excessive queries a problem?

Why reduce the number of queries?

Besides parsing, planning, executing, and serializing the response, the client is subject to a hard upper limit on the number of TCP connections it can open to the database server.

In Postgres that’s configured as max_connections. The application will have a variable number of open connections based on use, and its configuration of processes, threads and its connection pool. Keeping the query count low helps avoid exceeding the upper limit.
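
A quick, hedged way to see how close the application is getting to that limit (pg_stat_activity also counts background processes, so treat the number as approximate):

```sql
-- Compare current connections against the configured ceiling
SELECT count(*)                                 AS current_connections,
       current_setting('max_connections')::int  AS max_connections
FROM pg_stat_activity;
```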

What about memory use?

What about app server memory?

With Ruby on Rails, the cost of repeated queries is shifted because the SQL Cache is enabled by default, which stores and serves results for matching repeated queries, at the cost of some memory use.

As an aside, from Rails 7.1 the SQL Cache uses a least recently used (LRU) algorithm. We can also configure the max number of queries to cache, 100 by default, to control how much

[...]

Don't mock the database: Data fixtures are parallel safe, and plenty fast
Posted by Brandur Leach in Crunchy Data on 2025-05-29 at 13:00

The API powering our Crunchy Bridge product is written in Go, a language that provides a good compromise between productivity and speed. We're able to keep good forward momentum on getting new features out the door, while maintaining an expected latency of low double digits of milliseconds for most API endpoints.

A common pitfall for new projects in fast languages like Go is that their creators, experiencing a temporary DX sugar high of faster compile and runtime speeds than they've previously encountered in their career, become myopically focused on performance above anything else, and start making performance optimizations with bad cost/benefit tradeoffs.

The textbook example of this is the database mock. Here's a rough articulation of the bull case for this idea: CPUs are fast. Memory is fast. Disks are slow. Why should tests have to store data to a full relational database with all its associated bookkeeping when that could be swapped out for an ultra-fast, in-memory key/value store? Think of all the time that could be saved by skipping that pesky fsync, not having to update that plethora of indexes, and foregoing all that expensive WAL accounting. Database operations measured in hundreds of microseconds or even *gasp*, milliseconds, could plausibly be knocked down to 10s of microseconds instead.

Mock everything, test nothing

Anyone who's substantially journeyed down the path of database mocks will generally tell you that it leads nowhere good. They are fast (although disk speed has improved by orders of magnitude over the last decade), but every other one of their aspects leaves something to be desired.

A fatal flaw is that an in-memory mock bears no resemblance to a real database and the exhaustive constraints that real databases put on input data. Consider for example, whether a mock would fail like a database in any of these scenarios:

  • A value is inserted for a column that doesn't exist.
  • A value of the wrong data type for a column is inserted.
  • Duplicate values are inse
[...]
