Latest Blog Posts

SCaLE 22x, live streams and Row-Level Security Sucks
Posted by Jimmy Angelakos on 2025-03-21 at 19:30

Obligatory hallway arch photo from SCaLE22x

The 22nd edition of the Southern California Linux Expo, or SCaLE 22x, took place from March 6-9, 2025, in Pasadena, California. It was a fantastic event packed with insightful talks, community spirit, and legendary figures from the computing world.

One of the highlights of SCaLE 22x was getting the chance to attend talks by two icons: Leslie Lamport and Jon "maddog" Hall. Leslie Lamport, whom many know for Paxos, TLA+, and generally being the "father of distributed computing", delivered the closing keynote titled "Coding isn't Programming". Meant to shake up developers, it offered thought-provoking and engaging arguments. It was a rare privilege to hear directly from someone whose work has influenced so much of modern computing.

Earlier, Jon "maddog" Hall's talk "The JOY of Free Software: It should be fun to fish" was a great reminder of why we do what we do in the open-source world, and that we should not lose track of how much fun this journey of creation is. maddog's passion for free software and its ethical implications is always inspiring, and I've mentioned before on this blog how his example has guided my career choices.

PostgreSQL @ SCaLE 22x

Postgres again had a strong presence at SCaLE with a dedicated track of talks (thanks PgUS for the booth in the Expo hall), and I had the pleasure of being part of several related sessions:

  • On Thursday, March 6th, I participated in the PostgreSQL Ask Me Anything session alongside some incredible PostgreSQL experts. The discussion touched on internals, performance, best practices, and the future of Postgres. It's always great to interact with the community and share insights from our experiences.

  • On Friday, March 7th, I delivered my talk titled "Row-Level Security Sucks. Can We Make It Usable?". It was a deep dive into PostgreSQL's Row-Level Security (RLS), highlighting its strengths and shortcomings, and exploring ways to make it more accessible to empower developers in real-world applications; a minimal example of what an RLS policy looks like is sketched below.
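
For readers who have not used RLS before, here is a minimal illustrative policy (hypothetical table, column and role names, not taken from the talk):

-- Hypothetical table; each user should only see their own rows:
CREATE TABLE documents (id int PRIMARY KEY, owner text, body text);

ALTER TABLE documents ENABLE ROW LEVEL SECURITY;

CREATE POLICY documents_owner_only ON documents
    USING (owner = current_user);

Note that table owners and superusers bypass such policies unless FORCE ROW LEVEL SECURITY is set on the table, which is exactly the kind of usability wrinkle the talk's title hints at.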

[...]

Mentoring Applications and Hacking Workshop for April 2025
Posted by Robert Haas in EDB on 2025-03-19 at 15:04
Here are a few mentoring-related updates. Read more »

Postgres Troubleshooting: Fixing Duplicate Primary Key Rows
Posted by Greg Sabino Mullane in Crunchy Data on 2025-03-19 at 14:00

Someone recently asked on the Postgres mailing lists about how to remove unwanted duplicate rows from their table. They are “unwanted” in the sense that the same value appears more than once in a column designated as a primary key. We’ve seen an uptick in this problem since glibc was kind enough to change the way it sorts things. This can lead to invalid indexes when one upgrades their OS and modifies the underlying glibc library.

One of the main effects of a corrupted unique index is that it allows rows to be added which should have been rejected by the primary key. In other words, for a table with a primary key on a column named “id”, you might observe things like this:

-- Can you spot the problem?
SELECT id FROM mytable ORDER BY id LIMIT 5;

 id
----
  1
  2
  3
  3
  4
(5 rows)

Without knowing anything else about the problem, what’s the first step to solving it? If you said take a backup, you are correct! Make a fresh backup anytime you think something is wrong with your database, or before any attempt to fix such a problem.

Aside: can we simply reindex? No - a unique index cannot be created (or recreated) as long as there are duplicate rows. Postgres will simply refuse to do so, as it violates the uniqueness we are trying to enforce. So we must delete the rows from the table.

Here is one careful recipe for fixing the problem of duplicated primary key entries in your Postgres table. Obviously you need to adjust this as needed for your situation, but just walk through it slowly and make sure you understand each step. Especially if this happens in production (spoiler: it almost always happens in production).

1. Debugging aids

The first thing we want to do may look a little strange:

-- Encourage not using indexes:
set enable_indexscan = 0;
set enable_bitmapscan = 0;
set enable_indexonlyscan = 0;

Because bad indexes are the primary way bad rows like this get into our database, we cannot trust them. These low-level debug aids are a way of telling the Postgres planner to prioritize other ways

[...]
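
To make the idea concrete (an illustrative sketch, not the article's exact recipe), the offending values can then be located with an aggregate that does not depend on the broken index:

-- With index scans discouraged, this reads the heap directly:
SELECT id, count(*) AS copies
FROM mytable
GROUP BY id
HAVING count(*) > 1
ORDER BY id;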

PostgreSQL's COPY and \COPY
Posted by Dave Stokes on 2025-03-19 at 13:59
PostgreSQL is equivalent to a Swiss Army Knife in the database world. There are things in PostgreSQL that are very simple to use, while in another database, they take many more steps to accomplish. But sometimes, the knife has too many blades, which can cause confusion. This is one of those cases.

COPY and \COPY

I needed a quick dump of a table as part of an ETL (Extract, Transform, and Load - official definition; Exorcism, Trauma, and Lost-sleep - unofficially) to port some data. The COPY command is a quick way to output and write the data to a file.  The two COPY commands below illustrate adding a delimiter and then writing the output to a file.

demo=# select * from z;
 a | b  |  c  
---+----+-----
 1 | 10 | 100
 2 | 20 | 200
 3 | 30 | 300
 4 | 40 | 400
(4 rows)
demo=# copy z to stdout (DELIMITER ',');
1,10,100
2,20,200
3,30,300
4,40,400
demo=# copy z to '/tmp/z' (DELIMITER ',');
COPY 4
demo=# 

The trick is that you must REMEMBER that the PostgreSQL server is writing the file and will need permission to write in the desired directory. If you try to write to your current directory with a relative path, you will probably see this:

demo=# copy z to 'z' (DELIMITER ',');
ERROR:  relative path not allowed for COPY to file
demo=# 

ETL

The table I needed to 'fix' as part of the ETL process was edited and was now ready to load. I created a new table.

demo=# create table zz (a1 int, b1 int, c1 int);
CREATE TABLE

The tricky part is that you can not use COPY to load the data. You need to use psql's \COPY! Consider \COPY a wrapper on the client side around the server's COPY command.

demo=# \copy zz from '/tmp/z' (delimiter ',');
COPY 4
demo=# 

I find the COPY - \COPY pairing very handy.

More?


I highly recommend Wagner Bianchi's COPY vs. \copy in PostgreSQL and their Security Implications for DBAs for those seeking more information on this subject and its healthy
[...]

How to Use regexp_matches and regexp_match in PostgreSQL
Posted by Colin Copeland on 2025-03-19 at 08:00

Introduction

regexp_matches() and regexp_match() are two similar string functions that support regular expression matching directly in the PostgreSQL database. regexp_matches() was added in PostgreSQL 8.3, and regexp_match() was added in PostgreSQL 10 (keep reading to see how ChatGPT struggled to answer this question).

As noted in the PostgreSQL 10 release notes, regexp_match():

"... only returns results from the first match so it does not need to return a set, making it easier to use for simple cases."

The differences between these two functions are nuanced, however, and they are easiest to understand with some examples.

Test data

You can create a short table with some test data in a local PostgreSQL database like so:

$ psql
psql (16.1)
Type "help" for help.
tobias=# CREATE TABLE patterns (value text);
CREATE TABLE
tobias=# INSERT INTO patterns VALUES ('foo'), ('bar'), ('foobar'), ('foo1barfoo2bar');
INSERT 0 4
tobias=# SELECT * FROM patterns;
    value
----------------
foo
bar
foobar
foo1barfoo2bar
(4 rows)

regexp_matches()

Since regexp_matches() was added first to PostgreSQL, let's see how it works and perhaps why its companion regexp_match() was added later on.

tobias=# SELECT
tobias-#     value
tobias-#     , regexp_matches(value, 'foo\d?', 'g')
tobias-#     , regexp_matches(value, 'foo\d?', 'g') is null AS is_null
tobias-# FROM patterns;
    value      | regexp_matches | is_null
----------------+----------------+---------
foo            | {foo}          | f
foobar         | {foo}          | f
foo1barfoo2bar | {foo1}         | f
foo1barfoo2bar | {foo2}         | f
(4 rows)

Observations:

  1. The example uses the g flag, since according to the documentation, "if you only want the first match, it's easier and more efficient to use regexp_match()."
  2. One row each is returned for the value
[...]

PGDay UK 2025 - Call for Papers and Call for Sponsors Open!
Posted by Chris Ellis on 2025-03-18 at 14:30

We're happy to announce that the Call for Papers and Call for Sponsors is open for PGDay UK 2025!

Call for Papers is open from today until May 12, 2025 at 23:59:59 in London, UK. Selected speakers will be notified before May 26, 2025. We accept proposals from both established and new speakers on a range of PostgreSQL-related topics. You don't need to have the talk written beforehand; we just need an abstract of your talk, so don't hold back, and tell your friends. Find out more: https://2025.pgday.uk/call-for-papers/ .

Call for Sponsors is open from today until 2025-09-02. Our Benefactor level is limited to 8 sponsors and will sell out, so best grab your company's spot sooner rather than later. Find out more here: https://2025.pgday.uk/become-sponsor/ .

Citus: The Misunderstood Postgres Extension
Posted by Craig Kerstiens in Crunchy Data on 2025-03-18 at 13:50

Citus is in a small class of the most advanced Postgres extensions that exist. While there are many Postgres extensions out there, few have as many hooks into Postgres or change the storage and query behavior in such a dramatic way. Most people who come to Citus arrive with the wrong assumptions. Citus turns Postgres into a sharded, distributed, horizontally scalable database (that's a mouthful), but it does so for very specific purposes.

Citus, in general, is a fit for these types of applications, and only these types:

  • Sharding a multitenant application: a SaaS/B2B style app, where data is never joined between customers
  • Low user facing, high data volume analytics: specifically where the dashboards are hand-curated with minimal levers-and-knobs for the user to change (i.e. customer cannot generate unknown queries)

Common but mistaken use cases, which are not a great fit for Citus:

  • Lack of rigid control over queries sent to database
  • Geographic residency goals or requirements; Citus is distributed for scale, not distributed for edge.

Let's look closer at each of the two use cases that Citus is a good fit for.

Multitenant/SaaS applications

Multitenant or SaaS applications typically follow a pattern: 1) tenant data is siloed and does not intermingle with any other tenant's data, and 2) a "tenant" is a larger entity like a "team" or "organization".

An example of this could be Salesforce. Within Salesforce you have the notion of an organization, and the organization has accounts, customers, and opportunities within them. When you create a Salesforce account, all of your customers and opportunities are solely yours — data is not shared with other Salesforce organizations.
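
In Citus terms, and as a hedged sketch rather than anything quoted from this article, co-locating a tenant's data means distributing every table by the tenant column (hypothetical schema):

CREATE TABLE accounts (
    org_id     bigint,
    account_id bigint,
    name       text,
    PRIMARY KEY (org_id, account_id)
);

CREATE TABLE opportunities (
    org_id     bigint,
    opp_id     bigint,
    account_id bigint,
    amount     numeric,
    PRIMARY KEY (org_id, opp_id)
);

SELECT create_distributed_table('accounts', 'org_id');
SELECT create_distributed_table('opportunities', 'org_id');
-- Rows with the same org_id land in the same shard, so per-tenant joins stay local.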

For these types of applications, Citus distributes the data for each tenant into a shard. Citus handles the splitting of data by creating placement groups that know they are grouped together, and placing the data within shards on specific nodes. A physical node may contain multiple shards. Let me restate that to under

[...]

Debugging PostgreSQL more easily
Posted by Hans-Juergen Schoenig in Cybertec on 2025-03-18 at 06:00

PostgreSQL is the foundation for thousands and thousands of applications. The system has long proven its worth and is operating reliably. However, the question people often ask is: What is the best way to actually debug a database application? 

For a couple of years now, I have been using a simple method that greatly speeds up the debugging process of typical applications.

Using inheritance to store data

One of the core features of PostgreSQL, which has been around for decades already, is the idea of "inheritance". So what does that mean, exactly? Simply speaking, tables can inherit columns from each other. A child table will simply have all the columns of the parent table, plus its own additional ones.

But what does this mean in real life, and what does it have to do with debugging? Let us take a closer look:

CREATE TABLE t_global 
(
        id              serial, 
        tstamp          timestamptz     DEFAULT now() 
);
CREATE TABLE t_product 
(
        name            text, 
        price           numeric
) INHERITS (t_global);
INSERT INTO t_product (name, price) 
        VALUES  ('Shoes', 113.98), 
                ('Sausage', 4.58);

The t_global table contains a sequence-backed id column as well as a timestamp. Those two columns are passed on to t_product. What this means is that the sequence now also applies to the child table.

Here is a second table:

CREATE TABLE t_country 
(       
        country_name    text
) INHERITS (t_global);
INSERT INTO t_country (country_name) 
        VALUES  ('Austria'), 
                ('Germany'), 
                ('Japan');

We are now using the same parent table for the country table. Why does this matter? What we have just produced is two things: first, there is ONE global sequence for all tables, which means that all IDs across the entire system are unique (we will need this later). Second, all tables have a timestamp with an identical default value.
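
A quick way to see both properties (a sketch building on the tables above, not quoted from the article):

-- A SELECT on the parent includes the rows of every inheriting table:
SELECT id, tstamp FROM t_global ORDER BY id;
-- ids 1-2 come from t_product, ids 3-5 from t_country: one shared sequence.

-- The parent itself holds no rows of its own:
SELECT count(*) FROM ONLY t_global;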

Using the parent table for debugging applications

Let us now imagine for

[...]

Driving PostgreSQL forward: Fujitsu's key contributions
Posted by vignesh C in Fujitsu on 2025-03-17 at 23:52

Fujitsu has participated in the PostgreSQL community for over 20 years and has contributed to PostgreSQL development by developing new features, as well as creating and reviewing correction patches. And last year we took our team contributions to a whole new level.

Emma Saroyan
Posted by Andreas 'ads' Scherbaum on 2025-03-17 at 14:00
PostgreSQL Person of the Week Interview with Emma Saroyan: I’m Emma Saroyan, a Developer Advocate from Yerevan, Armenia. Since graduating from university, I’ve been working with tech startups and actively contributing to open-source communities by mentoring, giving talks, and organising meetups.

PgPedia Week, 2025-03-16
Posted by Ian Barwick on 2025-03-16 at 19:45

CommitFest 52 (2025-03), the final CommitFest in the PostgreSQL 18 development cycle, is past the halfway stage, and this week has seen a number of new features.

PostgreSQL 18 changes

Improvements and new features:

  • EXPLAIN: display of window functions improved; index search count now reported
  • log_lock_failure: new GUC for logging lock acquisition failures
  • effective_io_concurrency: default increased to 16
  • log_connections: now accepts a range of options to provide more granular connection logging
  • lower(), upper(), casefold(): performance improvements
  • pg_stat_get_backend_wal(): new function to retrieve per-backend WAL statistics
  • reverse(): support for the bytea datatype added
  • pg_dump, pg_dumpall, pg_restore: --no-policies option added
  • PL/Python: Python "Limited API" activated
  • isn: GUC isn.weak added, replacing the isn_weak() function

Reversions:

  • support for temporal RESTRICT foreign key constraints reverted (this was added as part of commit 89f908a6, which otherwise remains in place)

PostgreSQL 18 articles:

  • Waiting for PostgreSQL 18 – Allow json{b}_strip_nulls to remove null array elements (2025-03-13) - Hubert 'depesz' Lubaczewski
  • Waiting for PostgreSQL 18 – Allow parallel CREATE INDEX for GIN indexes (2025-03-11) - Hubert 'depesz' Lubaczewski

more...

Selective asynchronous commits in PostgreSQL - balancing durability and performance
Posted by Shayon Mukherjee on 2025-03-16 at 15:24
I was recently looking into some workloads that generate a lot of I/O and CPU contention on some very high-write code paths and came across synchronous_commit (https://www.postgresql.org/docs/current/wal-async-commit.html). It can be very tempting to turn this off globally because the performance gains in terms of I/O, CPU, and TPS (transactions per second) are very hard to overlook. I saw the I/O contention disappear entirely, CPU drop by 20% (at peak), and TPS increase by 30%.
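
The "selective" part is the interesting bit: synchronous_commit can be relaxed at a much finer granularity than globally, for example per transaction or per role. An illustrative sketch (hypothetical table and role names, not necessarily the author's approach):

-- Only this transaction gives up the synchronous WAL flush guarantee:
BEGIN;
SET LOCAL synchronous_commit = off;
INSERT INTO events (payload) VALUES ('high-volume, low-value write');  -- hypothetical table
COMMIT;

-- Or relax it only for the role used by the noisy write path:
ALTER ROLE ingest_worker SET synchronous_commit = off;  -- hypothetical role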

Can Artificial Intelligence Create Better Tables Than You?
Posted by Dave Stokes on 2025-03-15 at 18:08

Artificial Intelligence is one of those conundrums where the ability to have some tasks handled for you contrasts with a mix of ego and pride over the possibility that it may be better than you at those tasks. I recently wrote a blog post, using another database, about AI-generated SQL that was quite sophisticated. But what about asking an AI to create a table from the kind of specification a DBA/SRE/Analyst might receive?

I used Grok and entered the following prompt:

Write the SQL to create a table on a PostgreSQL version 17 server to record data on customers. The data will include a first name, a last name, an address, a birthdate, and a unique UUID primary key. Partition the data by the year of birth. And produce some test data of at least ten records.

I am sure many of us have started large projects given less instruction. Notice: I did not specify the format for the address (US Post Office format, UK, or other), nor did I set a name length. I wanted to see what assumptions the AI would make.

The Answer

The response came very quickly, much more quickly than I could have interactively typed the CREATE TABLE statement.

-- First, create the parent table with partitioning

CREATE TABLE customers (
    customer_id UUID PRIMARY KEY,
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL,
    address TEXT NOT NULL,
    birthdate DATE NOT NULL
) PARTITION BY RANGE (EXTRACT(YEAR FROM birthdate));


-- Create partitions for specific year ranges
CREATE TABLE customers_y1960_1980 PARTITION OF customers
    FOR VALUES FROM (1960) TO (1980);
CREATE TABLE customers_y1980_2000 PARTITION OF customers
    FOR VALUES FROM (1980) TO (2000);
CREATE TABLE customers_y2000_2020 PARTITION OF customers
    FOR VALUES FROM (2000) TO (2020);
CREATE TABLE customers_y2020_2040 PARTITION OF customers
    FOR VALUES FROM (2020) TO (2040);
-- Insert test data (10 records)
INSERT INTO customers (customer_id, first_name, last_name, address, birthdate) VALUES
    -- 1960-1980 partiti
[...]

Contributions for the week of 2025-03-03 (Week 9)
Posted by Cornelia Biacsics in postgres-contrib.org on 2025-03-14 at 18:56

On March 6 and 7, PostgreSQL@SCaLE22x took place in Pasadena, California. The PostgreSQL team organized and volunteered at the event, with PGUS hosting the booth. The list of involved volunteers:

Speakers in the PostgreSQL track:

We also hosted a community panel, “Ask Me Anything”, with contributions from:

PG Day Chicago: The talks I am most excited about. Part II
Posted by Henrietta Dombrovskaya on 2025-03-14 at 14:26

Now – to the afternoon sessions (see Part I for morning sessions)

Paul Whalen. Who needs Kafka?.. Now, I am really excited about this talk for a number of reasons. First, Paul is an active Chicago PUG member, and he first presented this talk at the Chicago PUG meetup in January. Second, I am happy that I encouraged him to submit this talk :). And third and most important: this is a talk about Postgres from an application developer’s perspective, and I love this kind of talk!

Dave Pitts and Teresa Lopes: Fun with Postgres High Availability Poker. I saw this talk in Amsterdam in September, and notice that I said I saw the talk! That’s the talk you actually need to see! So much fun!

Next is my security talk 🙂 Am I excited? Yes! Although presenting at a conference which I am also organizing is challenging, to say the least :). But I had so few opportunities to present this talk that I am embracing this one! For those who listened to this talk before – it was significantly updated!

Jay Miller. Getting Advice from myself… One thing I want to mention here is that I am so upset that this talk is scheduled at the same time as mine! I would so much love to listen to it!!! As I already mentioned, I love talks that present Postgres from the application developer’s perspective!

Gabriele Quaresima. Cloudnative 101. I have to admit, I know nothing about this topic! Seriously, I am entitled to have at least some topics I know nothing about! But it’s never too late to learn something new!

Jimmy Zelinskie. Implementing strict serializability with pg_xact. Excited about this talk because transactions are complicated, and the majority of application developers pay very little attention to how their applications behave from a transactional perspective, so it’s exciting to see that there are cases when it’s done right!

Melih Mutlu. The Journey of your query. A surprisingly high number of Postgres users have a very vague understanding of how queries are processed. We know what indexes do and how they help to improve

[...]

Pi Day PostGIS Circles
Posted by Paul Ramsey in Crunchy Data on 2025-03-14 at 14:00

What's your favourite infinite sequence of non-repeating digits? There are some people who make a case for e, but to my mind nothing beats the transcendental and curvy utility of π, the ratio of a circle's circumference to its diameter.

Drawing circles is a simple thing to do in PostGIS -- take a point, and buffer it. The result is circular, and we can calculate an estimate of pi just by measuring the perimeter of the unit circle.

SELECT ST_Buffer('POINT(0 0)', 1.0);

The default ST_Buffer output in PostGIS

Except, look a little more closely -- this "circle" seems to be made up of short straight lines. What is the ratio of its circumference to its diameter?

SELECT ST_Perimeter(ST_Buffer('POINT(0 0)', 1.0)) / 2;
3.1365484905459406

That's close to pi, but it's not pi. Can we generate a better approximation? What if we make the edges even shorter? The third parameter to ST_Buffer() is the "quadsegs", the number of segments to build each quadrant of the circle.

SELECT ST_Perimeter(ST_Buffer('POINT(0 0)', 1.0, quadsegs => 128)) / 2;
3.1415729403671087

Much closer!

We can crank this process up a lot more, keep adding edges, but at some point the process becomes silly. We should just be able to say "this edge is a portion of a circle, not a straight line", and get an actual circular arc.

Good news, we can do exactly that! The CIRCULARSTRING is the curvy analogue to a LINESTRING wherein every connection is between three points that define a portion of a circle.

circular arc

The circular arc above is the arc that starts at A and ends at C, passing through B. Any three points define a unique circular arc. A CIRCULARSTRING is a connected sequence of these arcs, just as a LINESTRING is a connected sequence of linear edges.

How does this help us get to pi though? Well, PostGIS has a moderate amount of support for circular arc geometry, so if we construct a circle using "natively curved" objects, we should get an exact representation of a circle rather than an approximation.

circle

So, what is an arc that starts and ends at the

[...]

Testing loadBalancerSourceRanges with CloudNativePG on Azure Kubernetes
Posted by Jeremy Schneider on 2025-03-14 at 08:28

This option didn’t seem super widely documented from my initial searches online; it should be able to basically enforce layer 4 ingress/firewall rules at the individual service level. This is a quick test to check if it works.

Steps were generated with ChatGPT, and mostly worked. It missed Azure provider registration, but I figured that out easily from the Azure error message. GPT was creating the VMs after the CNPG cluster … I had to reverse that so I’d know the IP for loadBalancerSourceRanges. I had to switch the VMs to the “westus” region because of quota limits. There were a couple more tweaks but overall I got this done in an hour or two – couldn’t have done that with just Google. I used an Azure free account; never had to set up a credit card with Azure.

Step 1: Create an AKS Cluster

Create resource group

az group create --name myResourceGroup --location eastus

{
  "id": "/subscriptions/7460cc93-ed07-42e7-a246-3b87e52a3ad7/resourceGroups/myResourceGroup",
  "location": "eastus",
  "managedBy": null,
  "name": "myResourceGroup",
  "properties": {
    "provisioningState": "Succeeded"
  },
  "tags": null,
  "type": "Microsoft.Resources/resourceGroups"
}

Register the provider

az provider register --namespace Microsoft.ContainerService

Registering is still on-going. You can monitor using 'az provider show -n Microsoft.ContainerService'

Create AKS cluster

az aks create \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --network-plugin kubenet \
  --network-policy calico \
  --load-balancer-sku standard \
  --node-count 2 \
  --enable-managed-identity \
  --generate-ssh-keys

{
  "aadProfile": null,
  "addonProfiles": null,
  "agentPoolProfiles": [
    {
      "availabilityZones": null,
      "capacityReservationGroupId": null,
      "count": 2,

...
  "upgradeSettings": null,
  "windowsProfile": null,
  "workloadAutoScalerProfile": {
    "keda": null,
    "verticalPodAutoscaler": null
  }
}

get cluster credentials

az aks get-credentials --resource-
[...]

PostgreSQL: 1 trillion rows in Citus
Posted by Hans-Juergen Schoenig in Cybertec on 2025-03-13 at 08:57

"PostgreSQL scales" - we have all heard this phrase over and over again. However, the question is: What does this actually mean? Does it mean 1 million rows? Or maybe even 1 billion rows? So, on a rainy weekend, I decided to do a little experiment to figure out if it is possible to squeeze 1 trillion rows (= 1000 billion rows) into my local personal computer.

For those of you who don't believe this to be possible - here is proof:

test=# SELECT count(*) FROM t_data ;
     count     
---------------
 1024000000000
(1 row)

Using Citus columnar storage, this translates to:

test=# \d t_data
               Table "public.t_data"
 Column |  Type   | Collation | Nullable | Default 
--------+---------+-----------+----------+---------
 key_id | integer |           |          | 
 data   | bigint  |           |          | 

test=# SELECT pg_size_pretty(citus_table_size('t_data'));
 pg_size_pretty 
----------------
 1563 GB
(1 row)

But, before looking at the final result, we need to understand how we got here in the first place.

Loading one trillion rows into PostgreSQL

Obviously, I don't have a real world data set on hand that contains this much data, so I decided to generate this data synthetically. However, before we take a look at how this can be done, we have to discuss the basic setup.

Loading this much data into row storage would lead to insane amounts of data. Here is some math:

test=# SELECT pg_size_pretty(1024000000000::int8 * (24 + 12));
 pg_size_pretty 
----------------
 34 TB
(1 row)

Assuming a tuple header (for each row) of about 24 bytes and 12 bytes data, we would end up with roughly 34 TB, which is larger than the maximum size of a table in PostgreSQL (assuming 8k blocks).

This basically leaves us with a couple of choices:

  • PostgreSQL table partitioning (row store)
  • Columnar store (single table)
  • Sharding using Citus
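
As a rough sketch of combining the last two options (assumptions on my part, not the article's exact commands):

-- Columnar table, then distributed across Citus workers by key_id:
CREATE TABLE t_data (key_id integer, data bigint) USING columnar;
SELECT create_distributed_table('t_data', 'key_id');

-- One small synthetic batch; the real load would repeat and scale this massively:
INSERT INTO t_data
    SELECT x % 1000, x FROM generate_series(1, 1000000) AS x;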

To make sure that I can query the data a bit faster, I decided on a sharded Citus columnar storage, consisting of a coordinator and

[...]

Waiting for PostgreSQL 18 – Allow json{b}_strip_nulls to remove null array elements
Posted by Hubert 'depesz' Lubaczewski on 2025-03-13 at 06:24
On 5th of March 2025, Andrew Dunstan committed patch: Allow json{b}_strip_nulls to remove null array elements   An additional parameter ("strip_in_arrays") is added to these functions. It defaults to false. If true, then null array elements are removed as well as null valued object fields. JSON that just consists of a single null is not … Continue reading "Waiting for PostgreSQL 18 – Allow json{b}_strip_nulls to remove null array elements"
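
Based on that description, usage would look roughly like this (a sketch against the PostgreSQL 18 development tree):

SELECT jsonb_strip_nulls('{"a": null, "b": [1, null, 2]}'::jsonb);
-- {"b": [1, null, 2]}   (default: null array elements are kept)

SELECT jsonb_strip_nulls('{"a": null, "b": [1, null, 2]}'::jsonb, true);
-- {"b": [1, 2]}         (strip_in_arrays => true removes them as well)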

Chicago PUG February Meetup recording
Posted by Henrietta Dombrovskaya on 2025-03-13 at 02:22

Hi Postgres community, apologies for the delay one more time – here is the recording of our February meetup! If you didn’t have a chance to attend in person or virtually, please take a moment to watch! You won’t be disappointed!

Also, one more reminder that our March meetup will take place later in the month than usual: on March 25. That will be the last meetup before PG Day Chicago, and I hope you will be able to attend!

Waiting for PostgreSQL 18 – Allow parallel CREATE INDEX for GIN indexes
Posted by Hubert 'depesz' Lubaczewski on 2025-03-11 at 13:06
On 3rd of March 2025, Tomas Vondra committed patch: Allow parallel CREATE INDEX for GIN indexes   Allow using parallel workers to build a GIN index, similarly to BTREE and BRIN. For large tables this may result in significant speedup when the build is CPU-bound.   The work is divided so that each worker builds … Continue reading "Waiting for PostgreSQL 18 – Allow parallel CREATE INDEX for GIN indexes"
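
In practice (an illustrative sketch, not taken from the commit), the degree of parallelism for index builds is driven by the same knob as for B-tree builds:

-- Allow up to 4 parallel workers for maintenance commands in this session:
SET max_parallel_maintenance_workers = 4;

-- Hypothetical table with a tsvector column; the GIN build can now use parallel workers:
CREATE INDEX docs_body_tsv_gin ON docs USING gin (body_tsv);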

stats_fetch_consistency: caching PostgreSQL monitoring statistics?
Posted by Laurenz Albe in Cybertec on 2025-03-11 at 07:52
stats_fetch_consistency in real life: an officer answers the phone and says that the official population count is still the same, because they don't hold a census whenever you call them
© Laurenz Albe 2025

I have worked with PostgreSQL since 2006. Still, I keep learning about new aspects of PostgreSQL regularly. Sometimes it feels like the more I know, the more clearly I see that I still have a lot to learn. Often, beginners' questions in a training session make me learn something new! Today, my co-worker Pavlo told me about the PostgreSQL parameter stats_fetch_consistency. Probably not a parameter that you want to tune every day, but you never know.

Surprise: stale statistics with the default stats_fetch_consistency

Let's imagine you are trying to investigate checkpoint statistics, which you can find in the view pg_stat_checkpointer since PostgreSQL v17:

-- start a transaction
BEGIN;

SELECT num_timed, num_requested FROM pg_stat_checkpointer;

 num_timed │ num_requested 
═══════════╪═══════════════
        52 │             4
(1 row)

-- explicitly trigger a checkpoint
CHECKPOINT;

SELECT num_timed, num_requested FROM pg_stat_checkpointer;

 num_timed │ num_requested 
═══════════╪═══════════════
        52 │             4
(1 row)

Huh? Why did num_requested not jump to 5? If we commit the transaction, it does:

COMMIT;

SELECT num_timed, num_requested FROM pg_stat_checkpointer;

 num_timed │ num_requested 
═══════════╪═══════════════
        52 │             5
(1 row)

It looks like the statistics only get updated in a new transaction. That is somewhat surprising, as I didn't use the REPEATABLE READ isolation level. Until today, I didn't know that I can change this behavior with stats_fetch_consistency!
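
For example (a sketch of how the parameter can be exercised), with stats_fetch_consistency set to 'none' each access re-reads the shared statistics instead of reusing the value cached at the first access in the transaction:

SET stats_fetch_consistency = 'none';

BEGIN;
SELECT num_requested FROM pg_stat_checkpointer;
CHECKPOINT;
-- No longer served from the transaction's cache, so it can already
-- reflect the checkpoint we just requested:
SELECT num_requested FROM pg_stat_checkpointer;
COMMIT;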

Possible settings for stats_fetch_consistency

The parameter stats_fetch_consistency governs how PostgreSQL caches data for the following statistics views:

[...]

When HASH partitioning works better than RANGE
Posted by Umair Shahid in Stormatics on 2025-03-11 at 07:51

I have always been a fan of RANGE partitioning using a date/time value in PostgreSQL. This isn't always possible, however, and I recently came across a scenario where a table had grown large enough that it had to be partitioned, and the only reasonable key to use was a UUID styled identifier.

The goal of this post is to highlight when and why hashing your data across partitions in PostgreSQL might be a better approach.

Range vs. Hash Partitioning in PostgreSQL

Range Partitioning (A Quick Recap)

Range partitioning in PostgreSQL uses boundary values that define slices of the data, often by date or numeric ranges. If you have a transactions table, you might create monthly partitions based on a transaction_date column. This is intuitive for time-series data because each partition holds rows from a specific date range.
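
As a minimal illustration (hypothetical table, not from the post):

-- Monthly range partitions on a transaction_date column:
CREATE TABLE transactions (
    transaction_date date NOT NULL,
    amount           numeric
) PARTITION BY RANGE (transaction_date);

CREATE TABLE transactions_2025_03 PARTITION OF transactions
    FOR VALUES FROM ('2025-03-01') TO ('2025-04-01');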

Advantages of Range Partitioning:

  • Easy pruning for date-based queries.
  • Straightforward approach to archiving old data: drop an entire partition for a past month, rather than issuing a massive DELETE.
  • Pairs nicely with time-based ingestion pipelines, where every day or month gets its own partition.

But as convenient as that is, there are cases where range partitioning runs into problems.

Why Range Partitioning Can Fall Short

  • Data Skew: If a huge portion of data lands in a single time interval—say, because of a traffic spike in the current month—that monthly partition might end up significantly larger than the others.
  • Complex Backfills: Not everyone ingests data in an orderly, daily manner. Sometimes you need to re-ingest or correct data that spans multiple periods. Merging or splitting range partitions can get cumbersome.
  • Non-Date Dimensions: Some tables aren’t naturally tied to a sequential numeric or date dimension. If your queries center on user IDs or device IDs, dividing by date might not solve your performance issues.
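
For contrast with the range shortfalls above, here is a hash-partitioned layout keyed by UUID (an illustrative sketch, not taken from the post); rows are spread evenly across partitions without any natural ranges:

CREATE TABLE events (
    id         uuid NOT NULL,
    payload    jsonb,
    created_at timestamptz DEFAULT now(),
    PRIMARY KEY (id)
) PARTITION BY HASH (id);

CREATE TABLE events_p0 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE events_p1 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE events_p2 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE events_p3 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 3);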

The post When HASH partitioning works better than RANGE appeared first on Stormatics.

Doug Ortiz
Posted by Andreas 'ads' Scherbaum on 2025-03-10 at 14:00
PostgreSQL Person of the Week Interview with Doug Ortiz: Consider myself a Technologist first and foremost that enjoys immersing into new and emerging technologies and trends. As to where I am from, I am originally a New Yorker that has moved around multiple continents and the USA.

Introducing pgNow: A free, point-in-time diagnostic tool for PostgreSQL
Posted by Ryan Booz on 2025-03-10 at 13:09
pgNow is a free, cross-platform desktop tool created by Redgate that helps you identify key performance metrics and configuration optimizations in your running Postgres instance. Available now as a public preview application, it’s designed to help when you’re in a pinch and don’t have the Postgres experience or monitoring solution already in place to help ... Read more

pgenv 1.4.0 is out!
Posted by Luca Ferrari on 2025-03-10 at 00:00

A new version with an interesting improvement in the configuration management.

pgenv 1.4.0 is out!

pgenv 1.4.0 is out with an interesting improvement regarding the configuration management.

When you install, and then use, a specific PostgreSQL version, pgenv loads the configuration used to start the instance from a configuration file named after that specific PostgreSQL version. For instance, if you are running version 17.1, pgenv will load the configuration from a file named 17.1.conf. If that file does not exist, the pgenv script will try to load the default configuration file default.conf.

Now, thanks to recent development work in pgenv, it is possible to use multiple configuration files with overrides. In particular, pgenv will load more than one configuration file, narrowing the context down to the specific PostgreSQL version. Therefore, using PostgreSQL version 17.1 will trigger the loading of the following files:

  • default.conf
  • 17.conf
  • 17.1.conf

Note the addition of the major-version-specific configuration file (17.conf in the list above).

This new configuration loading chain makes pgenv load configuration from the default down to the most specific context, and also allows configuration to be shared more easily when you only care about the major-version settings.

OpenDay 2025 by PgTraining
Posted by Luca Ferrari on 2025-03-10 at 00:00

There are still seats available for this entire day dedicated to PostgreSQL!

OpenDay 2025 by PgTraining



PgTraining is organizing a free, full-day event dedicated to PostgreSQL, where people will meet face to face.

The event, which will be held at the great NOI Techpark in Bolzano (Italy), will be organized in two parts:

  • a talk session in the morning
  • a laboratory session in the afternoon.

Please note that the event will be held in Italian only.

The schedule for the day is available, and there are still a few seats left (but you need to register in order to participate).

PgPedia Week, 2025-03-09
Posted by Ian Barwick on 2025-03-09 at 20:15

CommitFest 52 (2025-03), the final CommitFest in the PostgreSQL 18 development cycle, is underway.

PostgreSQL 18 changes:

  • ALTER TABLE: ALTER CONSTRAINT ... SET [NO] INHERIT syntax added
  • CREATE INDEX: parallel support for creating GIN indexes added
  • table statistics will not be updated if autovacuum is set to off
  • json_strip_nulls() and jsonb_strip_nulls() can now remove null array elements
  • support for casting between bytea and integer types added
  • PL/pgSQL: => syntax for named cursor arguments supported
  • postgres_fdw: column remote_backend_pid added to output of postgres_fdw_get_connections()
  • Trigger more frequent autovacuums with relallfrozen
  • Add more monitoring data for WAL writes in the WAL receiver

more...

Automated Management of Extended Statistics in PostgreSQL
Posted by Andrei Lepikhov in Postgres Professional on 2025-03-09 at 15:34

Here, I am describing the results of a Postgres extension I developed out of curiosity. This extension focuses on the automatic management of extended statistics for table columns. The idea originated while I was finishing another "smart" query-driven project aimed at enhancing the quality of Postgres query planning. I realised that Postgres is not yet equipped for fully autonomous detection and adjustment of poor query plans. Therefore, it might be beneficial to approach the problem from a different angle and create an autonomous, data-driven helper.

What is extended statistics?

The extended statistics tool allows you to tell Postgres that additional statistics should be collected for a particular set of table columns. Why is this necessary? - I will try to quickly explain using the example of an open power plant database. For example, the fuel type (primary_fuel) used by a power plant is implicitly associated with the country's name. Therefore, when executing a simple query:

SELECT count(*) FROM power_plants
WHERE country = '' AND primary_fuel = 'Solar';

we see that this number is zero for Norway and 243 for Spain. This is obvious to us, since it is largely determined by latitude, but the DBMS does not know this, and at the query planning stage it incorrectly estimates the row count: 93 for Norway and 253 for Spain. If the query turns out to be a little more complex and the estimated data are the input for a JOIN operator, this can lead to unfortunate consequences. Extended statistics calculate the joint distribution of values in the columns and allow us to detect such dependencies.
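
For reference, declaring such a dependency by hand in stock PostgreSQL looks roughly like this (the extension described here aims to automate exactly this step; the statistics object name below is chosen only for illustration):

-- Tell the planner to gather joint statistics on the correlated columns:
CREATE STATISTICS power_plants_country_fuel (dependencies, mcv)
    ON country, primary_fuel FROM power_plants;

ANALYZE power_plants;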

In fact, there are worse situations with ORMs. In the power plant database example, this could be the joint use of conditions on the country and country_long fields. After reading their descriptions, anyone can see that there is a direct correlation between these fields, and when the ORM groups by both of them, we get a significant error:

EXPLAIN (ANALYZE, COSTS ON, TIMING OFF, B
[...]

SCaLE 22x
Posted by Henrietta Dombrovskaya on 2025-03-08 at 16:29

I am behind the rest of the Postgres Community members attending SCaLE 22x, who have already posted a lot! In my defense, all the talks were so interesting that I could not skip any, even though I had to beat the Arctic cold of Ballroom G!

In addition to the regular program, Jimmy Angelakos had a live stream before the start of the talks both on Thursday and Friday and here are the links (I am in the first one).

As I said, I loved all the talks, although one or two ended up being not what I thought they would be, but interesting nevertheless.

I asked about a million questions during Magnus’ Postgres 18 talk, and then some more during the “Ask me anything” session.

While we are on this topic, I wanted to point to the talks’ live streams:

Thursday:

Friday:

I am OK with how my talk went (for the record, I think that the same talk at PG Day Lowlands was better received).

I do not know how to resolve this problem: even when we present at a Linux conference rather than a Postgres conference, most people who attend are either already using Postgres or complete strangers who have heard about Postgres for the first time, while the target audience is still somewhere else.
