Latest Blog Posts

Is pg_dump a Backup Tool?
Posted by Robert Haas in EDB on 2024-10-15 at 20:03
Recently, I've been hearing a lot of experienced PostgreSQL users reiterate this line: "pg_dump is not a backup tool." In fact, the documentation has recently been updated to avoid saying that it is a backup tool, to widespread relief. Experienced PostgreSQL users and developers have been publicly called out for having the temerity to assert that pg_dump is, in fact, a backup tool. I find this narrative deeply frustrating, for two reasons. Read more »

Dealing with trigger recursion in PostgreSQL
Posted by Laurenz Albe in Cybertec on 2024-10-15 at 13:04

A father tells his son to hit back the next time, but the son is afraid of trigger recursion.
© Laurenz Albe 2024

Many a beginner falls into the trap of trigger recursion at some point. Usually, the solution is to avoid recursion altogether. But for some use cases, you may have to handle trigger recursion. This article tells you what you need to know about the topic. If you were ever troubled by the error message “stack depth limit exceeded”, here is the solution.

The beginner's mistake leading to trigger recursion

Triggers are the only good way to change data automatically. Constraints are the “policemen” that make sure rules are not violated, but triggers are the workers that make the data stay in line. A beginner who has understood that may (quite correctly) wish to use a trigger to set the updated_at column in the following table:

CREATE TABLE data (
   id bigint
      GENERATED ALWAYS AS IDENTITY
      PRIMARY KEY,
   value text NOT NULL,
   updated_at timestamp with time zone
      DEFAULT current_timestamp
      NOT NULL
);

The column default will set updated_at when the row is inserted, but won't change the value when you update the row. For that, our beginner writes a trigger:

CREATE FUNCTION set_updated_at() RETURNS trigger
   LANGUAGE plpgsql AS
$$BEGIN
   UPDATE data
   SET updated_at = current_timestamp
   WHERE data.id = NEW.id;

   RETURN NEW;
END;$$;

CREATE TRIGGER set_updated_at
   AFTER UPDATE ON data FOR EACH ROW
   EXECUTE FUNCTION set_updated_at();

But that won't work as intended:

INSERT INTO data (value) VALUES ('initial') RETURNING id;

 id 
════
  1
(1 row)

UPDATE data SET value = 'changed' WHERE id = 1;
ERROR:  stack depth limit exceeded
HINT:  Increase the configuration parameter "max_stack_depth" (currently 2048kB), after ensuring the platform's stack depth limit is adequate.
CONTEXT:  SQL statement "UPDATE data
   SET updated_at = current_timestamp
   WHERE data.id = NEW.id"
PL/pgSQL function set_updated_at() line 2 at SQL statement
SQL statement "UPDATE data
   SET updated_at = current_timestamp
   WHERE data.id = NEW.id"
PL/pgSQL function set_updated_
[...]
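
For reference, the standard escape from this trap is a BEFORE trigger that assigns NEW.updated_at directly: the row is modified in flight, no second UPDATE is issued, and no recursion can occur. A minimal sketch, reusing the table and names from the excerpt:

CREATE OR REPLACE FUNCTION set_updated_at() RETURNS trigger
   LANGUAGE plpgsql AS
$$BEGIN
   -- modify the row before it is written; no extra UPDATE, no recursion
   NEW.updated_at := current_timestamp;
   RETURN NEW;
END;$$;

DROP TRIGGER set_updated_at ON data;

CREATE TRIGGER set_updated_at
   BEFORE UPDATE ON data FOR EACH ROW
   EXECUTE FUNCTION set_updated_at();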

PostgreSQL Anonymizer 2.0 - Generating Fake Data
Posted by damien clochard in Dalibo on 2024-10-15 at 10:17

After several months of development, version 2.0 of PostgreSQL Anonymizer has entered the beta phase, and this is an opportunity for us to launch a series of articles to present its new capabilities in preview!

For this first technical overview, let’s see how to generate fake data (also known as “synthetic data”).


Why is it important?

PostgreSQL Anonymizer 2.0 offers a wide range of functions to generate fake but realistic data. These functions are useful for writing masking rules and replacing sensitive data with data that “looks real.”

But that’s not the only benefit of these functions. In fact, they can also be used from the very first steps of a new project: when designing a data model, it is essential to “populate” the tables with data so as not to start development “empty-handed.”

Consider a new application that needs to rely on a classic customer table:

CREATE TABLE public.customer (
    id          INT PRIMARY KEY,
    firstname   TEXT,
    lastname    TEXT,
    email       TEXT
);

We want to insert 2000 people into this table, but how can we do it since the application doesn’t exist yet?

Why is it complicated?

Obviously, inserting John Doe 2000 times into the table is not really an option! The challenge is to produce realistic and context-appropriate data.

For example, if the application is intended for adult users, we want the birth dates to be between 1950 and 2006.

This is where PostgreSQL Anonymizer comes into play with its wide range of randomization functions and fake data generators.

INSERT INTO customer
SELECT
        i*100+pg_catalog.random(0,99), -- avoid collisions!
        anon.dummy_first_name(),
        anon.dummy_last_name(),
        anon.dummy_free_email()
FROM generate_series(1,2000) i;

Note: The random(x,y) function is one of the new features of PostgreSQL 17! For earlier versions, the anon.random_int_between(x,y) function is an equivalent alternative.
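
For the adult birth dates mentioned earlier, a sketch of one option, assuming the anon.random_date_between() generator known from earlier releases is still available in 2.0:

ALTER TABLE customer ADD COLUMN birthdate date;

-- populate with random dates between 1950 and 2006 (assumed function)
UPDATE customer
SET birthdate = anon.random_date_between('1950-01-01'::timestamptz,
                                         '2006-12-31'::timestamptz);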

In total, PostgreSQL Anonymizer provides more than 70 fake data

[...]

PGDay Lowlands 2024 Reflections
Posted by Pavlo Golub in Cybertec on 2024-10-15 at 05:13

Introduction

This year, PGDay Lowlands 2024 was paired with PGDay UK 2024, which gave me the chance to try something new—taking the train directly from London to Amsterdam. Naturally, I was curious about passing through the famous tunnel under La Manche (some folks call it the English Channel, but I am still unsure why 😜). Spoiler alert: there's nothing cool to see, just darkness. 🙂

If you want to read about PGDay UK 2024, I recommend the blog post of my friend Stefan Fercot. Thank God he did a great job, so I can focus only on one event instead of two! 😅

A Special Connection with Amsterdam

PGDay Lowlands 2024 marks the first year of this event, but Amsterdam holds a special place in my professional journey. My first ever PostgreSQL community conference was in this beautiful city, way back in 2011, during pgconf.eu. So, Amsterdam isn't just another city for me; it's a key part of my community and professional life. I always love coming back here.

The conference schedule was pretty packed, so I only had a little time to visit my favorite places in Amsterdam this time. But the experience of the conference itself made up for that.

My Talk: "Customizing the Wordle Game Experience with PostgreSQL"

My presentation was titled "Customizing the Wordle Game Experience with PostgreSQL." Despite the fun name, the talk covered some serious and interesting topics, like full-text search dictionaries and customizing psql. The best part? We ended the talk by playing Wordle with the audience, where they had to guess PostgreSQL 5-letter keywords. I'm happy to report that Gianni Ciolli is officially the best keyword professional among us! 😄

A Second Chance

I initially presented this talk at PgConf in New York, but unfortunately, my slot ran in parallel with Bruce Momjian's session. Naturally, many attendees chose to go to his talk (honestly, I would have done the same!). That left me with little feedback and some doubts about the talk's appeal. So, I'm incredibly

[...]

Tuning the glibc memory allocator (for Postgres)
Posted by Tomas Vondra on 2024-10-14 at 10:00

If you’ve done any Postgres development in C, you’re probably aware of the concept of memory contexts. The primary purpose of memory contexts is to absolve the developers of having to track every single piece of memory they allocated. But it’s about performance too, because memory contexts cache the memory to save on malloc/free calls. But malloc gets the memory from another allocator in libc, and each libc has its own thing. The glibc allocator has some concurrency bottlenecks (which I learned the hard way), but it’s possible to tune that.

pgsql_tweaks 0.10.7 Released
Posted by Stefanie Janine on 2024-10-13 at 22:00

pgsql_tweaks is a bundle of functions and views for PostgreSQL

The source code is available on GitLab; a mirror is hosted on GitHub.
One could install the whole package, or just copy what is needed from the source code.

The extension is also available on PGXN.

General changes

It is only a minor release; no function or view has been changed.

What has changed is the creation of the release files, especially the test result output. The file test/sql/out/pgsql_tweaks_test--0.10.7.out is now created with captions and messages set to en_EN via the LC_MESSAGES parameter.
This makes the test results comparable.
The README also explains how to compare local test results with the released version's test results.

In addition, the description of how to create the extension from source has been extended.

Contributions of w/c 2024-10-07 (week 41)
Posted by Jimmy Angelakos in postgres-contrib.org on 2024-10-12 at 19:45

SQL/JSON is here! (kinda “Waiting for Pg 17”)
Posted by Hubert 'depesz' Lubaczewski on 2024-10-11 at 21:37
Amazing. Awesome. Well, but what is it? We have been able to store JSON data in Pg since PostgreSQL 9.2 – so it's been there for over 12 years now. How is the new shiny thing different? What does it allow you to do? Let's see if I can shed some light on it… For starters: SQL/JSON is … Continue reading "SQL/JSON is here! (kinda “Waiting for Pg 17”)"

CNPG Recipe 14 - Useful Command-Line Tools
Posted by Gabriele Bartolini in EDB on 2024-10-11 at 16:53

In this CNPG recipe, we explore three essential command-line tools that simplify working with CloudNativePG in Kubernetes: view-secret, view-cert, and stern. These tools enhance tasks such as inspecting secrets, verifying certificates, and tailing logs across multiple pods, streamlining your PostgreSQL management experience in a cloud-native environment. Whether you’re troubleshooting or optimising workflows, these utilities will help you boost productivity and gain better control over your Kubernetes-based PostgreSQL deployments.

Convert JSON into Columns and Rows with JSON_TABLE
Posted by Paul Ramsey in Crunchy Data on 2024-10-11 at 14:30

JSON_TABLE, new in Postgres 17

If you missed some of the headlines and release notes, Postgres 17 added another huge JSON feature to its growing repository of strong JSON support: JSON_TABLE. JSON_TABLE lets you query JSON and display the results as if they were native relational data. So you can easily take JSON data feeds and work with them like you would any other Postgres data in your database.

Shaking the Earth with JSON_TABLE

A few days ago, I was awakened in the middle of the night when my house started to shake. Living in the Cascadia subduction zone, when things start to shake I wake up really fast, because you never know if this one is going to be the Big One.

Fortunately this one was only a little one, a 4.0 magnitude quake several miles to the north of the city, captured and memorialized on its own USGS earthquake page almost as soon as it had finished shaking.

The USGS keeps a near-real-time collection of information about the latest quakes online, served up in a variety of formats including GeoJSON.


The weekly feed of magnitude 4.5 quakes has a nice amount of data in it.

If we could import this feed into the database, we could use it for other queries, like finding potential customers to sell tents and emergency supplies to! (When the big one hits me, sell me some tents and emergency supplies.)

This readily available GeoJSON earthquake file seems like the perfect chance to try out the new JSON_TABLE. And maybe give me something to do in the middle of night.

Retrieving a JSON file with HTTP

The first step is to retrieve the feed. The simplest way is to use the http extension, which provides a simple functional API to making HTTP requests.

CREATE EXTENSION http;

The http_get(url) function returns an http_response, with a status code, content_type, headers and content. We could write a wrapper to check the status code, but for this example we will just assume the feed works and look at the content.

SELECT jsonb_pretty(content::jsonb)
  FRO
[...]
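
To give a flavour of the syntax, here is a minimal sketch with inline data standing in for the USGS feed (the real feed carries many more properties):

-- turn a GeoJSON-like document into rows
SELECT jt.*
FROM json_table(
  '{"features": [
     {"properties": {"mag": 4.0, "place": "11 km N of the city"}},
     {"properties": {"mag": 4.6, "place": "offshore"}}
   ]}'::jsonb,
  '$.features[*]'
  COLUMNS (
    mag   numeric PATH '$.properties.mag',
    place text    PATH '$.properties.place'
  )
) AS jt;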

PostgreSQL Berlin October 2024 Meetup
Posted by Andreas Scherbaum on 2024-10-11 at 02:36

On October 9th, 2024, we had the PostgreSQL October Meetup in Berlin. Amazon hosted and sponsored the Meetup at their office in the Brüderstraße, in Berlin Mitte.

More Release Note Details
Posted by Bruce Momjian in EDB on 2024-10-10 at 20:30

Over the years, we occasionally get requests for more detail on release note items, particularly for the major release notes. Historically, our only response was to tell people to view the SGML/XML file used to generate the web and PDF versions of the release notes. In the file, above each release note item, is a comment block which contains the commit headers responsible for the item, e.g.:

<!--
Author: David Rowley <drowley(at)postgresql(dot)org>
2024-01-23 [b262ad440] Add better handling of redundant IS [NOT] NULL quals
Author: David Rowley <drowley(at)postgresql(dot)org>
2024-04-12 [3af704098] Fix IS [NOT] NULL qual optimization for inheritance tabl
-->

Continue Reading »

Transform PostgreSQL into a Columnar Database Using Citus
Posted by semab tariq in Stormatics on 2024-10-10 at 09:23

Columnar databases are transforming the way we handle large datasets by storing data in columns rather than rows. This approach enhances performance, especially for analytical queries, by allowing faster data retrieval and efficient storage. As businesses generate more data than ever, understanding the benefits of columnar databases becomes crucial. In this blog, we'll explore how these databases work, their advantages over traditional row-based systems, and why they are becoming a popular choice for data-driven organizations.

The post Transform PostgreSQL into a Columnar Database Using Citus appeared first on Stormatics.
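
By way of illustration, a minimal sketch, assuming the Citus extension (which provides the columnar table access method) is installed and preloaded; the table is hypothetical:

CREATE EXTENSION IF NOT EXISTS citus;

-- columnar storage is chosen per table via the access method
CREATE TABLE events_columnar (
    id      bigint,
    payload jsonb
) USING columnar;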

Comparing Columnar to Heap Performance in Postgres with pg_timeseries
Posted by Shaun Thomas in Tembo on 2024-10-10 at 09:00
The pg_timeseries extension uses Hydra Columnar storage for historical data. Check this out if you want to know how this kind of storage compares to standard Postgres Heap storage during database writes.

PostgreSQL 17 - A Major Step Forward in Performance, Logical Replication and More
Posted by Ahsan Hadi in pgEdge on 2024-10-10 at 07:01

After a successful 3rd beta in August 2024, the PostgreSQL development group released the GA version on September 26th. Recently, I blogged about some of the key logical replication features that you'll see in PostgreSQL 17: https://www.pgedge.com/blog/logical-replication-features-in-Postgres 17. In this blog I'll describe a couple of new performance features that you'll find in Postgres 17, as well as another important logical replication feature that I didn't cover in my earlier blog of this series.

PostgreSQL has grown remarkably over the years, and with each major release has become a more robust, reliable, and responsive database for both mission-critical and non-mission-critical enterprise applications. The global and vibrant PostgreSQL community is contributing to PostgreSQL's success, diligently ensuring that all changes are carefully scrutinized and reviewed before they are added to the project source code. It is also very encouraging to see big technology names like Microsoft, Google, Apple, and others investing in Postgres by developing in-house expertise and giving back to the open source community.

Improvements to logical replication are paving the way for adding distributed PostgreSQL support to the core functionality. Distributed PostgreSQL refers to the implementation of PostgreSQL in a distributed architecture, allowing for enhanced scalability, fault tolerance, and improved performance across multiple nodes. A pgEdge fully distributed PostgreSQL cluster already provides essential enterprise features like improved performance with low latency, ultra-high availability, data residency, and fault tolerance. Now, without further ado, let's discuss some PostgreSQL 17 performance features:

Improved Query Performance with Materialized CTEs

Common Table Expressions (CTEs) in PostgreSQL are temporary result sets that can be referenced within a SELECT, INSERT, UPDATE, or DELETE statement. They enhance the readability and organization of complex queries and can be recursive, making them particularly useful for hierarchical [...]
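
As a small illustration of the construct under discussion, a sketch with hypothetical tables; the MATERIALIZED keyword forces the CTE to be computed once and reused:

WITH customer_totals AS MATERIALIZED (
    SELECT customer_id, sum(amount) AS total
    FROM orders
    GROUP BY customer_id
)
SELECT customer_id, total
FROM customer_totals
WHERE total > 1000;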

sparql_fdw Foreign Data Wrapper Tested Against PostgreSQL 17
Posted by Stefanie Janine on 2024-10-09 at 22:00

sparql_fdw Foreign Data Wrapper

The sparql_fdw is a foreign data wrapper, written in Python, to query web databases with the SPARQL protocol from inside PostgreSQL.

Test Against PostgreSQL 17

As multicorn2 is already working with PostgreSQL 17, I tested the sparql_fdw, too.

It worked like a charm, and I have added PostgreSQL 17 to the supported versions in the README file; the merge request is pending.

Enhanced Postgres Release Notes
Posted by Greg Sabino Mullane in Crunchy Data on 2024-10-09 at 21:30

There is something new you may not have seen in the release notes for Postgres 17. No, not a new feature - I mean inside the actual release notes themselves! The Postgres project uses the git program to track commits to the project, and now each item in the release notes has a link to the actual commit (or multiple commits) that enabled it.

You may have missed it if you were scanning the release notes, but after the end of each specific item in the release note is a small “section” symbol which looks like this: §. Each of these symbols is a link leading to the relevant commit for that item. Here’s what it looks like on the Postgres 17 release notes page:

[screenshot: pg_commit_plain.png — the § commit links in the release notes]

Clicking the section symbol will send you to the git commit page for each individual patch, for example, this one:

[screenshot: hackers email message]

Note that there’s a “Discussion” link in each commit linking back to the full thread on the hackers channel.

Writing the release notes is hard work, and involves a good bit of debate in the community. We have to make sure we list all the changes, in a concise yet comprehensible manner, and decide what level of detail to include. Oftentimes, this level is not sufficient for people interested in learning more about a feature. That’s where these new commit links are invaluable. They link to the git commit, which not only lets you see the exact code changes that were made, but also shows the actual commit message, which has more detail than what can be provided in the release notes.

Postgres Notes in Lots of Places

Postgres release notes appear in lots of different places so this addition will surely make its way into other downstream projects. This new link also now appears on “postgres all versions” - a project that I maintain that collates the information from all of the release notes for every version of Postgres (over 500 now!) into a single page. To make the link more visible and easier to use, I converted it to a larger Unicode scroll symbol, then added some tooltip tech to make it show information about the link like so:

[...]

PGXN Certifications RFC
Posted by David Wheeler in Tembo on 2024-10-09 at 16:26

A couple weeks ago, I drafted PGXN RFC–5 — Release Certification, which proposes to replace the simple inclusion of SHA-1 hash digests in PGXN release META.json files with a JWS-signed release payload. From the introduction:

This RFC therefore proposes to extend v2 distribution metadata with a single additional property, certs, that contains one or more certifications that attest to the authenticity or other characteristics of a release on PGXN.

The certs value is an object that contains at least one property, pgxn, which itself contains a PGXN-generated RFC 7515 JSON Web Signature in the JWS JSON Serialization format. The pgxn property will allow clients not only to assemble the release URL and verify the downloaded file against checksums, but also validate it against a public key provided by PGXN.

The design allows multiple signatures, certifications, or other attestations, which in the future MAY allow authors or other entities to sign releases with their own keys. The new format appends a structure such as this to the distribution META.json file:

{
  "certs": {
    "pgxn": {
      "payload": "eyJ1c2VyIjoidGhlb3J5IiwiZGF0ZSI6IjIwMjQtMDktMTNUMTc6MzI6NTVaIiwidXJpIjoiZGlzdC9wYWlyLzAuMS43L3BhaXItMC4xLjcuemlwIiwiZGlnZXN0cyI6eyJzaGE1MTIiOiJiMzUzYjVhODJiM2I1NGU5NWY0YTI4NTllN2EyYmQwNjQ4YWJjYjM1YTdjMzYxMmIxMjZjMmM3NTQzOGZjMmY4ZThlZTFmMTllNjFmMzBmYTU0ZDdiYjY0YmNmMjE3ZWQxMjY0NzIyYjQ5N2JjYjYxM2Y4MmQ3ODc1MTUxNWI2NyJ9fQ",
      "signature": "cC4hiUPoj9Eetdgtv3hF80EGrhuB__dzERat0XF9g2VtQgr9PJbu3XOiZj5RZmh7AAuHIm4Bh-rLIARNPvkSjtQBMHlb1L07Qe7K0GarZRmB_eSN9383LcOLn6_dO--xi12jzDwusC-eOkHWEsqtFZESc6BfI7noOPqvhJ1phCnvWh6IeYI2w9QOYEUipUTI8np6LbgGY9Fs98rqVt5AXLIhWkWywlVmtVrBp0igcN_IoypGlUPQGe77Rw"
    }
  }
}

Review and feedback would be very much appreciated, especially on the list of unresolved questions toward the end.

Thanks to David Christensen and Steven Miller for the early review

[...]

⛰️ Postgres Ecosystem Summit EU
Posted by David Wheeler in Tembo on 2024-10-09 at 15:27

Given the success of the Extension Ecosystem Summit at PGConf.dev back in May, my colleague Floor Drees has organized a sequel, the Extension Ecosystem Summit EU on Tuesday, October 22, at the Divani Caravel Hotel in Athens. That’s “Day 0” at the same hotel as PGConf.eu. Tembo, Percona, Xata, and Timescale co-sponsor.

While the May event took the form of an open-space technology (OST)-style unconference aimed at extension developers, the EU event aims to inform an audience of Postgres users about the history and some exemplary use cases for extensions. From the invite:

Join us for a gathering to explore the current state and future of Postgres extension development, packaging, and distribution. Bring your skills and your devices and start contributing to tooling underpinning many large Postgres installations.

  • Jimmy Angelakos - pg_statviz: pg_statviz is a minimalist extension and utility pair for time series analysis and visualization of PostgreSQL internal statistics.
  • Adam Hendel (Tembo) - pgmq: pgmq is a lightweight message queue. Like AWS SQS and RSMQ but on Postgres. Adam is pgmq’s maintainer since 2023, and will present a journey from pure Rust → pgrx → pl/pgsql.
  • Alastair Turner (Percona) - pg_tde: pg_tde offers transparent encryption of table contents at rest, through a Table Access Method extension. Percona has developed pg_tde to deliver the benefits of encryption at rest without requiring intrusive changes to the Postgres core.
  • Gülçin Yıldırım Jelínek (Xata) - pgzx: pgzx is a library for developing PostgreSQL extensions written in Zig.
  • Mats Kindahl (Timescale) - TimescaleDB (C), pgvectorscale (Rust) and pgai (Python): maintaining extensions written in different languages.

I will also deliver the opening remarks, including a brief history of Postgres extensibility. Please join us if you’re in the area or planning to attend PGConf

[...]

RDS Connectivity
Posted by Grant Fritchey on 2024-10-07 at 14:04

I travel quite a lot for work. Most of it is in the US and Europe, but I get around to other places as well. Most of the time, connectivity, while not the greatest, isn’t that bad or hard. I run a VPN on my phone and laptop. I can set up my RDS connectivity […]

The post RDS Connectivity appeared first on Grant Fritchey.

Yurii Rashkovskii
Posted by Andreas 'ads' Scherbaum on 2024-10-07 at 14:00
PostgreSQL Person of the Week Interview with Yurii Rashkovskii: I was born and raised in Ukraine, and moved to Canada in 2008 without knowing much about it. I wanted to go as far west as I could to explore. That move changed my life a lot, and I now live on beautiful Vancouver Island in British Columbia.

Contributions of w/c 2024-09-30 (week 40)
Posted by Floor Drees in postgres-contrib.org on 2024-10-07 at 08:15

Contributions of w/c 2024-09-23 (week 39)
Posted by Jimmy Angelakos in postgres-contrib.org on 2024-10-06 at 00:54

Automating Highly Available PostgreSQL Clusters
Posted by Douglas Hunley on 2024-10-04 at 14:07
In March of ‘23, I took over as the Lead Architect of my employer’s Ansible-based automation for creating highly-available PostgreSQL clusters.

Transitioning from Oracle to PostgreSQL: Tablespaces
Posted by Umair Shahid in Stormatics on 2024-10-04 at 08:07

Tablespaces play an important role in database management systems, as they determine where and how database objects like tables and indexes are stored. Both Oracle and PostgreSQL have the concept of tablespaces, but they implement them differently based on the overall architecture of each database.

The post Transitioning from Oracle to PostgreSQL: Tablespaces appeared first on Stormatics.
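
In PostgreSQL terms, a minimal sketch (the directory path is illustrative and must already exist, owned by the postgres OS user):

CREATE TABLESPACE fast_ssd LOCATION '/mnt/nvme/pg_tblspc';

-- place a table (or index) in the new tablespace explicitly
CREATE TABLE orders (id bigint PRIMARY KEY) TABLESPACE fast_ssd;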

Failover Slots in PostgreSQL-17: Ensuring High Availability with Logical Replication
Posted by Amit Kapila in Fujitsu on 2024-10-04 at 05:51

With the introduction of failover slots in PostgreSQL-17, logical replication can now be effectively utilized in high-availability environments. This feature ensures that logical replication continues seamlessly even if the publisher node fails and its physical standby is promoted as the new publisher.

To maintain continuous replication after a failover, you need to configure the following parameters and options for both logical and physical replication setups:

1. Enable Failover Property on Slots:

•  Set the failover property for the slots corresponding to the subscriptions that need to continue receiving changes after a failover. This can be done using the failover option in CREATE SUBSCRIPTION or the failover parameter in the pg_create_logical_replication_slot() SQL API. These slots will be referred to as failover slots.

2. Synchronize Replication Slots:

•  Set the sync_replication_slots parameter to on on the standby. This ensures that the failover slots are periodically synchronized by the slotsync worker.

3. Configure Physical Replication Slot:

•  Ensure a physical replication slot exists between the primary and the standby. Set primary_slot_name on the standby and ensure the slot with that name exists on the primary. Also, enable hot_standby_feedback on the standby to prevent the removal of rows still needed by the standby.

4. Prevent Subscriber Overrun:

•  On the primary, set the synchronized_standby_slots parameter to include the physical replication slot. This prevents the subscriber from consuming changes faster than the hot standby can handle.

5. Update Connection Info:

•  Modify primary_conninfo on the standby to include dbname=postgres. This is necessary for the slot sync worker to connect to the primary.
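
Condensing the five steps above into SQL, a sketch in which the node, slot, publication, and subscription names are illustrative (ALTER SYSTEM changes also need a configuration reload):

-- step 1, on the subscriber (Node-C): a failover-enabled subscription
CREATE SUBSCRIPTION sub_c
    CONNECTION 'host=node-a dbname=postgres'
    PUBLICATION pub_a
    WITH (failover = true);

-- steps 2, 3, and 5, on the standby (Node-B)
ALTER SYSTEM SET sync_replication_slots = on;
ALTER SYSTEM SET primary_slot_name = 'phys_slot_b';
ALTER SYSTEM SET hot_standby_feedback = on;
ALTER SYSTEM SET primary_conninfo = 'host=node-a dbname=postgres';

-- steps 3 and 4, on the primary (Node-A)
SELECT pg_create_physical_replication_slot('phys_slot_b');
ALTER SYSTEM SET synchronized_standby_slots = 'phys_slot_b';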

Example Setup

Here's an example demonstrating logical replication continuity after failover using three nodes: Node-A (primary server/publisher), Node-B (physical standby for Node-A), and Node-C (subscriber for Node-A).

Node-A (Publisher)
Initialize th
[...]

Recap of OSS Vienna: What is Happening in the Open Source World
Posted by Gülçin Yıldırım Jelínek in Xata on 2024-10-04 at 00:00
My notes from the Open Source Summit Europe in Vienna, featuring fresh updates from open source communities, insights from the Xata booth, and highlights from our Postgres Happy Hour

PostgreSQL 'VALUES -> ANY' transformation
Posted by Andrei Lepikhov in Postgres Professional on 2024-10-03 at 23:58

Introduction

As usual, this project was prompted by multiple user reports with typical complaints, like 'SQL server executes the query times faster' or 'Postgres doesn't pick up my index'. The underlying issue that united these reports was frequently used VALUES sequences, typically transformed in the query tree into a SEMI JOIN.

I also want to raise one general question: should an open-source DBMS correct user errors? I mean optimising a query even before the search for an optimal plan begins, by eliminating self-joins and subqueries and simplifying expressions: everything that can be achieved by proper query tuning. The question is not that simple, since DBAs point out that the cost of query planning in Oracle grows rapidly with the complexity of the query text, which is most likely caused, among other things, by the extensive range of optimisation rules.

Now, let's turn our attention to the VALUES construct. Interestingly, it's not just used with the INSERT command but also frequently appears in SELECT queries in the form of a test of inclusion in a set:

SELECT * FROM something WHERE x IN (VALUES (1), (2), ...);

and in the query plan, this syntactic construct is transformed into a SEMI JOIN. To demonstrate the essence of the problem, let's generate a test table with an uneven distribution of data in one of the columns:

CREATE EXTENSION tablefunc;
CREATE TABLE norm_test AS
  SELECT abs(r::integer) AS x, 'abc'||r AS payload
  FROM normal_rand(1000, 1., 10.) AS r;
CREATE INDEX ON norm_test (x);
ANALYZE norm_test;

here, the value x of the norm_test table has a normal distribution with a mean of 1 and a standard deviation of 10 [1]. There are not too many distinct values, which will all be included in the MCV statistics. As a result, it will be possible to calculate the number of duplicates accurately for each value despite the uneven distribution. Also, we naturally introduced an index on this column, easing the table’s scanning. Now, let's execute the query:

EXPLAIN ANALYZE
SELECT * FROM norm_test W
[...]
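
The transformation under discussion can be tried by hand: expressing the same predicate with = ANY(array) instead of IN (VALUES ...) avoids the SEMI JOIN and lets the planner use the index directly. A sketch against the norm_test table above:

EXPLAIN ANALYZE
SELECT * FROM norm_test
WHERE x = ANY (ARRAY[1, 2, 42]);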

AI-assisted Postgres experiment: number of partitions vs. planning time
Posted by Nikolay Samokhvalov in Postgres.ai on 2024-10-03 at 21:45

In one of the recent PostgresFM episodes, Michael Christofides and Nikolay discussed planning time and what can affect it. One of the obvious negative factors we've discussed is the large number of partitions a partitioned table might have.

In this post, we're using our Postgres.AI assistant to see how planning time depends on the number of partitions.

This is the very first blog post of its kind: it has an integrated AI that you can use to explore the topic discussed here further, to repeat the experiment, alter it in any direction, and study the Postgres behavior more effectively.

If you, like some of us, prefer reading summaries, please jump straight to the AI chat section and ask for the "TL;DR" version. We don't mind.

Why

It's quite a well-known fact that with the growth in the number of partitions, planning time also grows. The official documentation says:

It is important to consider the overhead of partitioning during query planning and execution. The query planner is generally able to handle partition hierarchies with up to a few thousand partitions fairly well, provided that typical queries allow the query planner to prune all but a small number of partitions. Planning times become longer and memory consumption becomes higher when more partitions remain after the planner performs partition pruning.

There is a long thread in the pgsql-hackers mailing list where this problem is discussed, and where patches have been proposed to reduce this overhead.

This work is still in progress (currently, the patches need another round of reviews). Meanwhile, fast-growing OLTP projects need to find a way to deal with multi-terabyte tables, and partitioning is widely used. We thought it would be a good idea to conduct a simple database experiment to show how exactly the planning time depends on the

[...]
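
A minimal sketch for reproducing the effect yourself (table and partition count are illustrative; plain EXPLAIN with the SUMMARY option reports planning time without executing the query):

-- a hash-partitioned table with 1000 partitions
CREATE TABLE events (id bigint, payload text) PARTITION BY HASH (id);

-- in psql, generate and execute the partition definitions
SELECT format(
  'CREATE TABLE events_%s PARTITION OF events FOR VALUES WITH (MODULUS 1000, REMAINDER %s);',
  i, i)
FROM generate_series(0, 999) AS i \gexec

-- "Planning Time" appears in the summary
EXPLAIN (SUMMARY) SELECT * FROM events WHERE id = 42;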

Waiting for PostgreSQL 18 – Add temporal FOREIGN KEY contraints
Posted by Hubert 'depesz' Lubaczewski on 2024-10-03 at 18:19
On 17th of September 2024, Peter Eisentraut committed patch: Add temporal FOREIGN KEY contraints   Add PERIOD clause to foreign key constraint definitions. This is supported for range and multirange types. Temporal foreign keys check for range containment instead of equality.   This feature matches the behavior of the SQL standard temporal foreign keys, but … Continue reading "Waiting for PostgreSQL 18 – Add temporal FOREIGN KEY contraints"
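
A sketch of the committed syntax (names are illustrative; the referenced side needs a temporal key, and the btree_gist extension is needed for the scalar part of the underlying GiST index):

CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE rooms (
    id       integer,
    valid_at daterange,
    PRIMARY KEY (id, valid_at WITHOUT OVERLAPS)
);

-- the PERIOD clause makes the FK check range containment, not equality
CREATE TABLE reservations (
    room_id  integer,
    valid_at daterange,
    FOREIGN KEY (room_id, PERIOD valid_at)
        REFERENCES rooms (id, PERIOD valid_at)
);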
