Latest Blog Posts

To Preload, or Not to Preload
Posted by David Wheeler in Tembo on 2024-07-24 at 14:00

To preload, or not to preload, that is the question:
Whether ’tis nobler in the ram to suffer
The slings and arrows of pointer functions,
Or to take arms against a sea of backends,
And by alter role limit them: to session, to user

— William Shakespeare, DBA (probably)

Recently I’ve been trying to figure out when Postgres extension shared libraries should be preloaded. By “shared libraries” I mean libraries provided or used by Postgres extensions, whether LOADable libraries or CREATE EXTENSION libraries written in C or pgrx. By “preloaded” I mean under what conditions they should be added to one of the Shared Library Preloading variables, especially shared_preload_libraries.

The answer, it turns out, comes very much down to the extension type. Read on for details.

Normal Extensions

If your extension includes and requires no shared libraries, congratulations! You don’t have to worry about this question at all.

If your extension’s shared library provides functionality only via functions called from SQL, you also don’t need to worry about preloading. Custom types, operators, and functions generally follow this pattern. The DDL that creates objects, such as CREATE FUNCTION, uses the AS 'obj_file', 'link_symbol' syntax to tell PostgreSQL what library to load when SQL commands need them.
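As a rough illustration of that form (the library name and C symbol below are made up for the example, not taken from the post):

CREATE FUNCTION add_one(integer) RETURNS integer
    AS 'my_extension', 'add_one'
    LANGUAGE C STRICT;

PostgreSQL resolves and loads the library the first time a backend calls the function, with no preloading involved.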

For certain extensions used by nearly every connection, there may be performance benefits to preloading them in shared_preload_libraries, but it’s not required. See below for details.
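For reference, preloading a library cluster-wide can look like this; pg_stat_statements is just a stand-in example, and the change only takes effect after a restart:

-- Append to the existing list in postgresql.conf, or use ALTER SYSTEM:
ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements';
-- A restart of PostgreSQL is required for this setting to take effect.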

Initializing Extensions

If your shared library needs to perform tasks before PostgreSQL would load it, or if it would never be loaded implicitly by SQL statements, then it must be explicitly loaded before it’s used. This is typically the case for libraries that modify the server’s behavior through hooks rather than providing a set of functions.

To accommodate these requirements, PostgreSQL provides three preloading levels that correspond to the configuration variables for

[...]

The fastest way to copy data between Postgres tables
Posted by Anthony Sotolongo León in OnGres on 2024-07-24 at 11:00

Introduction

Data cloning from one table to another in relational databases is a commonly used process to copy data from an existing table to a new or pre-existing table definition within the same database. This process can be performed for various reasons, such as data backup, information replication, and testing, among other purposes. Postgres and other third-party tools offer several techniques to clone data from one table to another.

The purpose of this blog is to try to find the fastest way to clone data in Postgres to be prepared for this type of request in the future. To do this, several examples and the results will be shown.

For the sake of the example, the following table definitions will be used. To simplify the testing activity, the tables will not have indexes or triggers:

postgres=# CREATE TABLE origin_table (i bigint, d date, j int, t character varying (10));
CREATE TABLE
postgres=# INSERT INTO origin_table SELECT i,'2024-02-15', i/10, md5(i::text)::character varying(10) FROM generate_series(1,20000000) as i; 
INSERT 0 20000000
postgres=# ANALYZE origin_table ;
ANALYZE
postgres=# CREATE TABLE target_table (i bigint, d date, j int, t character varying (10));
CREATE TABLE

Tests

INSERT INTO SELECT (IIS)

This is one of the most common ways to clone data using almost standard SQL: an IIS statement selects the data from the original table and inserts it into the target table.

postgres=# INSERT INTO target_table SELECT * FROM origin_table ;
INSERT 0 20000000
Time: 12795.623 ms (00:12.796)
--query plan
postgres=# EXPLAIN (ANALYZE,VERBOSE) INSERT INTO target_table SELECT * FROM origin_table ;
                                                                QUERY PLAN                                                           
------------------------------------------------------------------------------------
 Insert on public.target_table  (cost=0.00..347059.24 rows=0 width=0) (actual time=12757.949..12767.355 rows=0 loops=1)
   ->  Seq Scan on public.origin_table  
[...]

Speeding up index creation in PostgreSQL
Posted by Hans-Juergen Schoenig in Cybertec on 2024-07-23 at 08:00

Indexes are, by far, the most important performance-related feature provided by every general purpose relational database. Without indexes, there are no such things as efficient search operations, no unique constraints, and no primary keys. Achieving good performance on reasonable amounts of data without indexes is therefore totally impossible.

Generating sample data

The main question now is: what happens during index creation, and how can we speed up the process? Creating indexes on billions of rows is quite costly because data has to be sorted and turned into an efficient search structure. To demonstrate how we can make indexes quickly, I have created a simple set of demo data:

blog=# CREATE TABLE t_data AS 
SELECT  id::int4, 
(random() * 1000000000)::int4 AS i, 
(random() * 1000000000) AS r, 
(random() * 1000000000)::numeric AS n 
FROM    generate_series(1, 1000000000) AS id;
Time: 1569002.499 ms (26:09.002)

This sample data has a couple of interesting properties: the “id” column is an ascending number, which makes a huge difference during index creation. The second column contains a random value multiplied by the number of rows, stored as an integer. The third column contains a “double precision” number, and finally, at the end, we are storing similar data as “numeric”, a type that does not use the FPU (floating point unit of the CPU) internally.

Once these billion rows have been created, we can easily check the size of the table using the following command:

blog=# SELECT pg_size_pretty(pg_relation_size('t_data'));
 pg_size_pretty
----------------
 56 GB
(1 row)

Once the test data has been created, it makes sense to set PostgreSQL’s “hint bits”, which will allow us to make a fair comparison between the various runs. If you are interested in hint bits and their relation to VACUUM, consider checking out our blog post about this topic.

blog=# VACUUM ANALYZE;
VACUUM
Time: 91293.971 ms (01:31.294)

While VACUUM is running, we can check the progr

[...]

Wait a minute! — PostgreSQL extension pg_wait_sampling
Posted by Andrew Atkinson on 2024-07-23 at 00:00

PostgreSQL uses a complex system of locks to balance concurrent operations and data consistency across many transactions. Those intricacies are beyond the scope of this post. Here we want to specifically look at queries that are waiting, whether on locks or for other resources, and learn how to get more insights about why.

Balancing concurrency with consistency is an inherent part of the MVCC system that PostgreSQL uses. One of the operational problems that can occur with this system is that queries get blocked waiting to acquire a lock, and that wait time can be excessive, causing errors.

In order to understand what’s happening with near real-time visibility, PostgreSQL provides system views like pg_locks and pg_stat_activity that can be queried to see what is currently executing. Is that level of visibility enough? If not, what other opportunities are there?

Knowledge and Observability

When a query is blocked and waiting to acquire a lock, we usually want to get more information when debugging.

The query holding the lock is the “blocking” query. A waiting query and a blocking query don’t always form a one-to-one relationship though. There may be multiple levels of blocking and waiting.

Real-time observability

In Postgres, we have “real-time” visibility using pg_stat_activity.

We can find queries in a “waiting” state:

SELECT
    pid,
    wait_event_type,
    wait_event,
    LEFT (query,
        60) AS query,
    backend_start,
    query_start,
    (CURRENT_TIMESTAMP - query_start) AS ago
FROM
    pg_stat_activity
WHERE
    datname = 'rideshare_development';

We can combine that information with lock information from the pg_locks catalog.

Combining lock information from pg_locks and active query information from pg_stat_activity becomes powerful. The query below joins these sources together.

https://github.com/andyatkinson/pg_scripts/blob/main/lock_blocking_waiting_pg_locks.sql

The result row fields include:

  • blocked_pid
  • blocked_user
[...]
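As a rough sketch of the kind of query that pairs waiting and blocking sessions (this is not the linked script, which works directly against pg_locks; here pg_blocking_pids() does the pairing):

SELECT
    blocked.pid              AS blocked_pid,
    blocked.usename          AS blocked_user,
    blocking.pid             AS blocking_pid,
    blocking.usename         AS blocking_user,
    LEFT(blocked.query, 60)  AS blocked_query,
    LEFT(blocking.query, 60) AS blocking_query
FROM pg_stat_activity AS blocked
JOIN pg_stat_activity AS blocking
  ON blocking.pid = ANY (pg_blocking_pids(blocked.pid));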

A Follow up on Key PostgreSQL Configuration Parameters for Enhanced Performance – Part 2
Posted by semab tariq in Stormatics on 2024-07-22 at 14:44

In a previous blog post, we configured an EC2 instance and installed PostgreSQL on it. After the initial setup, we ran pgbench to measure the initial TPS (transactions per second). Then, we tuned PostgreSQL and reran pgbench to see how much we could improve the TPS. To tune PostgreSQL, we adjusted various parameters based on the system's available resources. In this blog, we will identify which of those parameters contributed the most to the performance improvements.

We will use the same instance size on AWS EC2 as before, which is t2.large. Here is the list of parameters that we initially tuned...

The post A Follow up on Key PostgreSQL Configuration Parameters for Enhanced Performance – Part 2 appeared first on Stormatics.

pg_statviz 0.7 released with new features, PG17 support
Posted by Jimmy Angelakos on 2024-07-22 at 12:37


I'm pleased to announce release 0.7 of pg_statviz, the minimalist extension and utility pair for time series analysis and visualization of PostgreSQL internal statistics.

This release comes with a huge new feature: the implementation of resampling with pandas to permit effectively unlimited data analysis. Now you can take thousands, potentially millions of full statistics snapshots and squash all those data points to just 100 plot points (by default, using downsampling with mean values) to enable you to get a long-term view of your system's performance and statistics.

[Chart: over 7 months of wait event statistics squashed down to 100 plot points]
The chart above shows what 7 months of wait event statistics (over 300,000 data point snapshots) look like, downsampled to just 100 plot points.

Other new features:

  • The addition of server I/O stats from the view pg_stat_io, which was added in PostgreSQL 16
  • Update for upcoming PostgreSQL 17 release (pg_stat_bgwriter is broken up!)
  • The JSONB in the conf table, which stores Postgres configuration snapshots, has been changed from "setting": "work_mem", "value": "4MB" to "work_mem": "4MB" to save approximately half the storage space on disk.

pg_statviz takes the view that everything should be light and minimal. Unlike commercial monitoring platforms, it doesn't require invasive agents or open connections to the database — it all lives inside your database. The extension is plain SQL and pl/pgSQL and doesn't require modules to be loaded, the visualization utility is separate and can be run from anywhere, and your data is free and easy to export.

  • You can download and install pg_statviz from the PostgreSQL repositories or PGXN.
  • The utility can also be installed from PyPI.
  • Manual installation is also possible.

Which cloud providers support auto_explain?
Posted by Michael Christofides on 2024-07-22 at 09:34


From time to time I need to check whether a managed Postgres provider supports auto_explain, and if so, which auto_explain parameters they support. Since it can be time-consuming to set up and test each of these, I wanted to create a reference for myself, and thought it might also be of interest to others.

I did my testing over the past week, but plan to keep this up to date. If you notice a mistake, or that a provider changes their support, please do let me know.

What is auto_explain?

Firstly, in case you’re not familiar, auto_explain is an extension that ships with PostgreSQL as a contrib module, allowing you to log the execution plans for queries that ran on your database. This is particularly useful for queries that are intermittently slow. We have a guide for configuring auto_explain, in case that’s of interest, and also recorded a podcast episode about it, if you would like a more casual introduction.
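On a self-managed server with sufficient privileges, enabling it for a single session can look roughly like this (the threshold is an arbitrary example, not a recommendation from the post):

-- Load auto_explain into the current session only
LOAD 'auto_explain';
-- Log the plan of any statement that runs longer than 250 ms
SET auto_explain.log_min_duration = '250ms';
-- Include actual row counts and timings, like EXPLAIN ANALYZE
SET auto_explain.log_analyze = on;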

Overview

Here is a roundup of the providers I’ve looked into so far, with a top level summary of whether they support auto_explain, and if so how many of its parameters.

Provider           auto_explain   Parameters
Amazon RDS         Yes            9/13
Google Cloud SQL   Yes            12/13
Azure Database     Yes            12/13
Crunchy Bridge     Yes            9/13
Timescale Cloud    Yes            13/13
Aiven              No
Digital Ocean      No
EDB Big Animal     Yes            13/13
Heroku
[...]

Looking for hidden hurdles when Postgres faces partitions
Posted by Andrei Lepikhov in Postgres Professional on 2024-07-22 at 01:31

Preface

This post was initially intended to introduce my ‘one more Postgres thing’: a built-in, fully transparent re-optimisation feature, which I'm really proud of. However, during benchmarking, I discovered that partitioning the table causes performance issues that are hard to tackle. So, let's see the origins of these issues and how PostgreSQL struggles with them.

Here, I do quite a simple thing: taking a non-trivial benchmark, I just run it over a database with plain tables, do precisely the same thing over a database where all these tables are partitioned by HASH, and watch how it ends up.

When I finished writing the post, I found out that benchmarking data looked a bit boring. So, don’t hesitate to skip the main text and go to the conclusion.

Preliminary runs

The Join-Order Benchmark has some specifics that make it challenging to stabilise execution time through repeated executions. Processing many tables and most of their data makes query execution time intricately dependent on the shared buffer size and its filling. The frequent involvement of parallel workers further complicates the process, with the potential for one worker to start after a long delay, leading to performance slumps.

What’s more, it turned out that the ANALYZE command is not entirely stable on this benchmark's data, and I constantly observe that rebuilding table statistics causes significant changes in estimations, followed by different query plans.

To manage these complexities, I used pg_prewarm with more than 4GB of shared buffers (the total data size is about 8GB). We still can't pass statistics through the pg_upgrade process, although I feel it will be possible soon. So, I just analysed the tables one time after filling them with data. I passed all 113 benchmark queries ten times, discarding the whole first run because of instability. The scripts to create schemas, benchmarking scripts, and data can be found in the repository on GitHub.
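For readers unfamiliar with it, pg_prewarm usage is roughly this (the table name is an illustration, not taken from the benchmark scripts):

CREATE EXTENSION IF NOT EXISTS pg_prewarm;
-- Load a table's blocks into shared buffers before the benchmark run
SELECT pg_prewarm('some_table');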

To gauge the stability of the execution time, I conducted the benchmark over

[...]

You make a good point! — PostgreSQL Savepoints
Posted by Andrew Atkinson on 2024-07-22 at 00:00

This post will look at the basics of PostgreSQL Savepoints within a Transaction.

A transaction forms a non-separable unit of work that is committed, or not, as a whole. Transactions are opened using the BEGIN keyword, then either committed with COMMIT or rolled back; use ROLLBACK without any arguments to do the latter.

Dividing Up a Transaction

Within the concept of a transaction, there is a smaller unit that allows for incremental persistence, scoped to the transaction, called “savepoints.” Savepoints create sub-transactions with some similar properties to a transaction.

Savepoints

Savepoints mark a particular state of the transaction as a recoverable position. In a similar way to how ROLLBACK rolls back an entire transaction, ROLLBACK TO restores the state of the data to a position within the transaction marked by a savepoint.

After restoring to a savepoint, querying the data will show its state at the time the savepoint was created.

Commands

Savepoints have verbs to know about:

  • “Savepoint” may be used as a noun or verb depending on the context. Running the command SAVEPOINT a where a is the name of the savepoint, uses “savepoint” as a command verb that creates savepoint “a”. The savepoint “a” (a noun) was created.
  • The savepoint name can be reused, creating a new savepoint with the same name, reflecting a new state of the data.
  • Savepoints can be rolled back to, using the ROLLBACK TO command, specifying a named savepoint: https://www.postgresql.org/docs/current/sql-rollback-to.html
  • Savepoints can be “released” by using the RELEASE command. Releasing a savepoint does not change the state of the data, which is what ROLLBACK TO may do. Releasing a savepoint frees up the savepoint name and releases the resources used to create the savepoint (see the sketch after this list). Read more: https://www.postgresql.org/docs/current/sql-release-savepoint.html
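Here is a minimal sketch of RELEASE, reusing the vehicles table from the example further below:

BEGIN;
SAVEPOINT a;
INSERT INTO vehicles (name) VALUES ('bike');
-- Keep the insert, but free the name "a" and the savepoint's resources
RELEASE SAVEPOINT a;
COMMIT;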

Let’s look at SQL commands for creating and rolling back to a savepoint:

BEGIN;

INSERT INTO vehicles (name)
[...]

sparql_fdw tested against PostgreSQL 15 und 16
Posted by Stefanie Janine on 2024-07-21 at 22:00

sparql_fdw

sparql_fdw is a foreign data wrapper to access data available over the internet in the SPARQL format from within PostgreSQL based on Multicorn2.

The source code and documentation are available on GitHub.
A merge request to the original repository has been opened.

General changes

I have tested the extension, after recent changes in Multicorn2, against PostgreSQL 15 and 16, which had not been supported until now.

The installation examples had to be changed due to changes in Python 3.11; see PEP 668.
The problem here is that Python’s pip3 needs the --break-system-packages parameter to install packages globally instead of inside a virtual environment, and a virtual environment is the last thing you’d like to have while accessing Python scripts from within a database server.

In addition, the set of supported PostgreSQL versions has changed. Currently PostgreSQL 12, 13, 14, 15, and 16 are supported.

There have been no changes on source code of sparql_fdw.

What tables were touched within given range of wal LSN?
Posted by Hubert 'depesz' Lubaczewski on 2024-07-19 at 15:00
We had a case recently where one of our DBs failed over to a new primary. To this server (old primary, and then new primary) we had a connection from some kind of CDC tool (Debezium, I think). The thing is that while there was a failover, this tool read (using logical decoding) changes on the old primary … Continue reading "What tables were touched within given range of wal LSN?"

Magic Tricks for Postgres psql: Settings, Presets, Echo, and Saved Queries
Posted by Elizabeth Garrett Christensen in Crunchy Data on 2024-07-19 at 12:00

As I’ve been working with the Postgres psql CLI, I’ve picked up a few good habits from my Crunchy Data co-workers that make my terminal database environment easier to work with. I wanted to share a couple of my favorite things I’ve found that make getting around Postgres better. If you’re just getting started with psql, or haven’t ventured too far out of the defaults, this is the post for you. I’ll walk you through some of the friendliest psql settings and how to create your own preset settings file.

Some of the most helpful psql commands

Formatting psql output

Postgres has an expanded display mode, which presents your query results as batches of column names and values, instead of a huge wide list of columns that expands the display to the right.

A sample expanded display looks like this:

-[ RECORD 1 ]------------------------------
id         | 1
name       | Alice Johnson
position   | Manager
department | Sales
salary     | 75000.00
-[ RECORD 2 ]------------------------------
id         | 2
name       | Bob Smith
position   | Developer
department | Engineering
salary     | 65000.00
--Automatically format expanded display for wide columns
\x auto

I have a tutorial up about using basic psql if you’re just getting started and want to try these commands out.

Table column borders in psql output

If you’re not using the extended display, you can have psql do some fancy column outlines with the \pset linestyle.

--Outline table borders and separators using Unicode characters
\pset linestyle unicode

That will get you query output that looks like this:

┌────┬───────┬─────┐
│ id │ name  │ age │
├────┼───────┼─────┤
│  1 │ Alice │  30 │
│  2 │ Bob   │  25 │
└────┴───────┴─────┘

Show query run times in psql

This will print, at the bottom of each result, the time the query took to run in milliseconds:

-- Always show query time
\timing

Create a preset for your null values in psql

This will work with emojis or really anything utf-8 compatible:

-- Set Null char output to differentiate 
[...]

Row pattern recognition feature for PostgreSQL
Posted by Tatsuo Ishii in SRA OSS LLC on 2024-07-19 at 06:14

What is row pattern recognition feature?

Row pattern recognition (RPR) is a feature defined in the SQL standard. It allows you to search for a sequence of rows that matches a pattern. Since I am working on this feature for PostgreSQL, I would like to give a brief introduction to RPR.
 
Consider a table holding the date and the daily stock price of a company.
 company  |   tdate    | price
----------+------------+-------
 company1 | 2024-07-01 |   100
 company1 | 2024-07-02 |   200
 company1 | 2024-07-03 |   150
 company1 | 2024-07-04 |   140
 company1 | 2024-07-05 |   150
 company1 | 2024-07-06 |    90
 company1 | 2024-07-07 |   110
 company1 | 2024-07-08 |   130
 company1 | 2024-07-09 |   120
 company1 | 2024-07-10 |   130
(10 rows)
Suppose you want to find a sequence of rows in which the stock price rises once or more, then falls once or more. For example, the stock price on July 1 is 100, rises to 200 on July 2, and then falls to 150 and 140 on July 3 and July 4. RPR allows users to write this kind of query in an intuitive way.
 

RPR syntax

To express the query in RPR, you first define row pattern variables.
 
DEFINE
START AS TRUE,
UP AS price > PREV(price),
DOWN AS price < PREV(price)

Here DEFINE is a keyword to start the definition. START, UP and DOWN are row pattern variable names. The right-hand side of AS is a logical expression that the pattern variable needs to satisfy. The logical expression "TRUE" means always true. PREV() is a special function that can only be used in RPR (it cannot be used elsewhere, for example in a SELECT target list or WHERE clause). PREV takes a column name as an argument and returns the previous row's value for that column. So if today's price is higher than yesterday's price, "price > PREV(price)" returns TRUE. DOWN has the opposite meaning.

Once the row pattern variables are defined, you can define patterns to search for by using the row pattern variables.

PATTERN (START, UP+, DOWN+)

Here PATTERN is a k

[...]

pgroll 0.6.0 update
Posted by Andrew Farries on 2024-07-19 at 00:00
Learn about the latest changes in pgroll in the 0.6.0 release as we continue to build and turn it into a first-class open-source schema migration tool for Postgres.

PostgreSQL Performance Farm 2024 Progress Update
Posted by Mark Wong on 2024-07-18 at 00:26

It felt like it was time to revisit the PostgreSQL Performance Farm that Tomas Vondra envisioned back in 2010.  Between that and several Google Summer of Code iterations starting in 2018, the project just didn't seem to gain enough traction.  See Ilaria Battiston's presentation in 2022 for a demo.

On a whim, I spent a few days proofing whether something like Buildbot might work.  The goal was to get something basic working in order to get a feel for whether leveraging an established continuous integration framework might be worth the trade-offs compared to building our own system.  The result was a system running 4 TPC-derived workloads that simply run the tests quickly, not necessarily interestingly, just to produce data.

I added some additional desired functionality after a few more days:

  • Trigger only when there are code changes.
  • Define Buildbot worker specific test and PostgreSQL parameters.
  • Overrides to the Buildbot worker defined test and PostgreSQL parameters.
  • Submit patches against PostgreSQL and the test kits.
How I implemented the added functionality probably needs some review, but I liked how things were looking enough to spend a little more time on it.

Next I wanted to see how much effort is needed to mine the data, especially since the results of the tests are not captured in a structured way.  I'm pretty sure it would take me much longer than a few days to write a new Buildbot view, so I instead opted to try scripting something to munge data and plot metrics.  I declared success by being able to quickly do so.  Here's an example of results from one system, vanillaleaf, a Buildbot worker, showing the DBT-2 test metric per commit per branch:


The PostgreSQL community has access to a couple of systems for processor and memory intensive workloads, thanks to Equinix Metal, OSUOSL and IBM.  Both systems are continuing to run small tests so if building an interface to peruse results is successful, then I'll have more to show in the near future.

Postgres major version upgrades with minimal downtime
Posted by Carlos Pérez-Aradros Herce on 2024-07-18 at 00:00
With the beta release of dedicated clusters, we've added the ability to move branches between clusters. Xata customers can now perform Postgres major version upgrades with minimal downtime.

PostgreSQL Berlin July 2024 Meetup
Posted by Andreas Scherbaum on 2024-07-17 at 15:09

On July 16th, 2024, we had the PostgreSQL July Meetup in Berlin. Adjust hosted and Neon sponsored the Meetup at their Berlin headquarters in Prenzlauer Berg, near the TV Tower.

Exploring PostgreSQL 17: A Developer’s Guide to New Features – Part 3: The COPY Command Gets More User-Friendly
Posted by Deepak Mahto on 2024-07-17 at 14:12

PostgreSQL 17 Beta was released on May 23, 2024, introducing a host of exciting new features anticipated to be part of the official PostgreSQL 17 release. In this blog series, we’ll delve into these features and explore how they can benefit database developers and migration engineers transitioning to the latest PostgreSQL version.

The first part of the blog covered newer features in PL/pgSQL, the procedural language in PostgreSQL 17.

The second part of the blog covered newer features related to NULL constraints and performance in PostgreSQL 17.

PostgreSQL 17 – Enhancements with the COPY Command

The COPY command is the default native option for loading flat files, primarily in text or CSV format, into PostgreSQL. If you are a data engineer or database developer looking for ways to load flat files into PostgreSQL, you must take care of quality checks and encapsulate a mechanism to report or discard failed records.

Until PostgreSQL 16, if quality checks were planned to be done on the database side, one option was to preload flat files into staging tables with all columns as text or a generic data type and later move legitimate and discarded data to the concerned tables.

With PostgreSQL 17, the COPY command is more user-friendly and provides options for handling data type incompatibilities and logging failed rows or records. Based on your use case, you now have the feasibility to avoid staging tables and can load directly into your preferred target tables with all quality checks on data types.
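In its simplest form the new syntax looks roughly like this (the table name and file path are placeholders, not from the post):

COPY my_table FROM '/path/to/data.csv'
WITH (FORMAT csv, ON_ERROR ignore, LOG_VERBOSITY verbose);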

PostgreSQL 17 – ON_ERROR/LOG_VERBOSITY

Let’s start by creating sample target tables and a CSV file with intentional data type issues. The sample table also has check and NOT NULL constraints defined on it.

CREATE TABLE copy17 (
    col1 integer,
    col2 text DEFAULT 'x'::text,
    col3 text NOT NULL,
    col4 text,
    col5 boolean,
    col6 date,
    col7 timestamp without time zone,
    CONSTRAINT copy17_col4_check CHECK (length(col4) > 2)
);

The CSV file contain

[...]

PGDay UK 2024 - Schedule published
Posted by Dave Page in PGDay UK on 2024-07-16 at 10:35

Join us on 11th September 2024 in London, for a day of talks on the World's Most Advanced Open Source Database coupled with the usual valuable hallway track. This event is aimed at all users and developers of PostgreSQL and is your chance to meet and exchange ideas and knowledge with like-minded database fanatics in London.

Schedule

We are pleased to announce that the schedule for PGDay UK 2024 has now been published. You can see the fantastic selection of talks we have planned at:

https://2024.pgday.uk/schedule/

The team would like to thank all those who submitted talks, as well as the program committee who had a long and difficult job selecting the talks!

Registration

Registration for attendees is now open. For more information and to secure your seat, please visit:

https://2024.pgday.uk/registration/

Sponsors

Sponsor the event and take your chance to present your services or products to the PostgreSQL community - or see it as a give-back opportunity. The benefactor sponsorship level also includes a free entrance ticket. Please head to:

https://2024.pgday.uk/become-sponsor/

for more details.

See you there!

As usual, if you have any questions, don't hesitate to contact us at contact@pgday.uk.

We look forward to seeing you in London in September!

PGDay UK 2024 is a PostgreSQL Europe event run according to the PostgreSQL Community Conference Recognition programme. Local logistics and other services are provided by Slonik Enterprises Ltd. on a not-for-profit basis.


Keyset pagination with descending order
Posted by Laurenz Albe in Cybertec on 2024-07-16 at 05:49
[no-offset banner, © Markus Winand 2014]

Keyset pagination is the most performant way to retrieve a large result set page by page. However, the neat trick with composite type comparison doesn't always work. This article explains why and how you can work around that shortcoming.

An example table for paginated queries

We create a table with a million rows:

CREATE TABLE sortme (
   id bigint PRIMARY KEY,
   val1 integer NOT NULL,
   val2 timestamp with time zone NOT NULL,
   val3 text NOT NULL
);

-- deterministic, but sort of random values
INSERT INTO sortme
SELECT i,
       abs(hashint8(i)) % 200 + 1,
       TIMESTAMPTZ '2024-01-01 00:00:00' +
          (abs(hashint8(-i)) % 200) * INTERVAL '1 hour',
       substr(md5(i::text), 1, 2)
FROM generate_series(1, 1000000) AS i;

-- get statistics and set hint bits
VACUUM (ANALYZE) sortme;

Normal keyset pagination

If we want to ORDER BY val1, val2, the first query would look like

SELECT val1, val2, val3, id
FROM sortme
ORDER BY val1, val2, id
LIMIT 50;

The primary key id was added to the result list and the ORDER BY clause to provide uniqueness and thereby guarantee a stable order. Assuming that the last row returned from that query was (1, '2024-01-01 01:00:00+01', 'e3', 920198), the query for the next page would be

SELECT val1, val2, val3, id
FROM sortme
WHERE (val1, val2, id) > (1, '2024-01-01 01:00:00+01', 920198)
ORDER BY val1, val2, id
LIMIT 50;

With a multi-column index on (val1, val2, id), both queries would be lightning fast, as would be the queries for all following pages.
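For completeness, that index could be created like this (the index name is my own choice):

CREATE INDEX sortme_val1_val2_id_idx ON sortme (val1, val2, id);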

The problem with keyset pagination in descending and mixed order

Matters become more complicated as soon as we need descending sort order. A fairly simple case is if we need to sort the result by a single column (for example, val3). Again, we need the unique column id as a tie-breaker. The solution is to use descending sort order for both columns:

-- first page
SELECT val1, val2, val3, id
FROM sortme
ORDER BY val3 DESC, id DESC
LIMIT 50;

-- next pa
[...]

Introducing multi-version schema migrations
Posted by Andrew Farries on 2024-07-16 at 00:00
Today's release of multi-version schema migrations addresses one of the most common pain points of application deployment - keeping your application code and database schema in sync. You can now present two versions of your schema, both old and new, to client applications.

RFC: PGXN Meta Spec v2
Posted by David Wheeler in Tembo on 2024-07-15 at 19:15

Two bits of news on the “PGXN v2” project.

PGXN RFCs: The Book

First, I’ve moved the RFC process (again, sorry) from PGXN Discussions, which were a bit fussy about Markdown formatting and don’t support inline comments, to the PGXN RFCs project, where use of pull requests on CommonMark Markdown documents address these issues. This process borrows heavily from the Rust RFCs project, right down to publishing accepted RFCs as a “book” site.

So I’d also like to introduce rfcs.pgxn.org, a.k.a., the PGXN RFCs Book.

It currently houses only one RFC: Meta Spec v1, dating from 2010. This document defines the structure of the META.json file required in archives published on PGXN.

But I expect many more RFCs to be drafted in the coming years, starting with draft RFC–2, the binary distribution RFC I POCed a few weeks ago. There has already been some great feedback in that pull request, in addition to the previous discussion. More eyes will make it even better.

PGXN Meta Spec v2 RFC

Last week I also iterated on the PGXN Metadata Sketch several times to produce draft RFC–3: Meta Spec v2. This represents a major reworking of the original spec in an attempt to meet the following goals:

  • Allow more comprehensive dependency specification, to enable packagers to identify and install system dependencies and dependencies from other packaging systems, like PyPI and CPAN
  • Adopt more industry-standard formats like SPDX License Expressions and purls.
  • Improve support for multiple types of Postgres extensions, including apps, LOADable modules, background workers, and TLEs.
  • Improve curation and evaluation via categories, badging, and additional download links.

There’s a lot here, but I hope the result can better serve the community for the next decade, and enable lots of new services and features.

The proof will be in the applicatio

[...]

Implementing UUIDs v7 in pure SQL
Posted by Daniel Vérité on 2024-07-15 at 18:14
In May 2024, the IETF standard on UUIDs (Universally Unique IDentifiers) was updated with RFC 9562, finally making UUID Version 7 official. This version is known to be a much better choice for database indexes than previous ones, since consecutively generated values are already sorted. PostgreSQL does not yet have a built-in function to generate UUIDs v7, but of course several extensions do exist. The ones I found tend to require a compilation step and superuser privileges to install, as they’re written in “untrusted languages” like C or Rust. However, UUID-v7 functionalities can be implemented in pure SQL so they can be installed easily everywhere. In this post, let’s see how to do that.
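The post walks through the details; as a flavor of what “pure SQL” means here, a commonly circulated sketch (not necessarily the author's implementation, and assuming PostgreSQL 13+ for gen_random_uuid()) overlays a millisecond timestamp onto a random UUID and fixes the version bits:

CREATE OR REPLACE FUNCTION uuid_generate_v7() RETURNS uuid
LANGUAGE sql VOLATILE AS $$
  -- Replace the first 48 bits of a random UUID with the Unix epoch in
  -- milliseconds, then flip two bits so the version nibble reads 7.
  SELECT encode(
           set_bit(
             set_bit(
               overlay(uuid_send(gen_random_uuid())
                       placing substring(int8send((extract(epoch FROM clock_timestamp()) * 1000)::bigint) from 3)
                       from 1 for 6),
               52, 1),
             53, 1),
           'hex')::uuid;
$$;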

Autovacuum Tuning Basics
Posted by Tomas Vondra in EDB on 2024-07-15 at 17:16

A few weeks ago I covered the basics of tuning checkpoints, and in that post I also mentioned autovacuum as the second common source of performance issues (based on what we see on the mailing list and at our customers). Let me follow up on that with this post about how to tune autovacuum, to minimize the risk of performance issues. In this post I'll briefly explain why we even need autovacuum (dead rows, bloat, and how autovacuum deals with them), and then move to the main focus of this blog post: tuning. I’ll go over all the relevant configuration options, and some basic rules for tuning them.
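As a taste of the kind of knobs involved, per-table settings can be adjusted like this (the table name and values are illustrative, not recommendations from the post):

ALTER TABLE some_big_table SET (
    autovacuum_vacuum_scale_factor = 0.01,  -- trigger vacuum after ~1% dead rows
    autovacuum_vacuum_cost_delay   = 2      -- milliseconds; throttle vacuum less
);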

Ozgun Erdogan
Posted by Andreas 'ads' Scherbaum on 2024-07-15 at 13:15
PostgreSQL Person of the Week Interview with Ozgun Erdogan: I’m originally from Istanbul. After college, I moved to the Bay Area for grad school and then up to Seattle for my first programming gig at Amazon. After about four years, I wanted to learn more and co-founded Citus Data with two of the smartest guys I know.

How expensive is it to maintain extended statistics?
Posted by Andrei Lepikhov in Postgres Professional on 2024-07-14 at 23:24

In the previous post, I passionately advocated for integrating extended statistics and, moreover, creating them automatically. But what if it is too computationally demanding to keep statistics fresh?

This time, I will roll up my sleeves, get into the nitty-gritty and shed light on the burden extended statistics put on the digital shoulders of the database instance. Let's set aside the cost of using this type of statistics during planning and focus on one aspect - how much time we will spend in an ANALYZE command execution.

I understand how boring numbers look sometimes, as well as benchmarks. However, my vast experience in computational physics and analysing long listings full of numbers shows that it can be a fantastic source of inspiration.

So, let's start and create a test table:

DROP TABLE IF EXISTS bench;
CREATE TABLE bench (
  x1 integer, x2 integer, x3 integer, x4 integer,
  x5 integer, x6 integer, x7 integer, x8 integer
) WITH (autovacuum_enabled = false);
INSERT INTO bench (x1,x2,x3,x4,x5,x6,x7,x8) (
  SELECT x%11,x%13,x%17,x%23,x%29,x%31,x%37,x%41
  FROM generate_series(1,1E6) AS x
);

Why eight columns, you might wonder? This deliberate choice is due to a hard limit of extended statistics, STATS_MAX_DIMENSIONS, which allows only eight columns or expressions in a statistics definition clause.

Let me compare the performance of plain statistics with extended ones. I'll consider different variations of 'ndistinct', 'MCV' and 'dependencies' types. Additionally, I'll include a comparison with a statistic type called 'Joinsel', which builds and uses histograms, MCV, and distinct statistics over a predefined set of columns, treating them as a single value of a composite type. It can be found in standard and enterprise variants of the private Postgres Professional fork.
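For reference, building one of the extended statistics objects being compared looks like this (the statistics name and column subset are arbitrary):

CREATE STATISTICS bench_stat_4col (ndistinct, dependencies, mcv)
    ON x1, x2, x3, x4 FROM bench;
-- The cost being measured is incurred when the statistics are (re)built:
ANALYZE bench;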

To measure execution time, use "\timing on". I'm going to observe how much time it takes if we build statistics over two, four and eight columns. I will take the surrogate test for plain statistics by creating the 'bench'

[...]

Use pg_easy_replicate for setting up Logical Replication and Switchover in PostgreSQL
Posted by Shayon Mukherjee on 2024-07-13 at 13:11
Logical replication is a powerful feature in PostgreSQL that allows for real-time data replication between databases. It can be used for performing major version upgrades using a blue/green setup where you have two databases, allowing you to test and switch over to a new version with minimal downtime. Logical replication can also be used to facilitate database migrations between different environments, using the same technique and tooling. In this post, I will describe the process of setting up simple replication and switchover between two databases using pg_easy_replicate.

SaaS on Rails on PostgreSQL — POSETTE 2024
Posted by Andrew Atkinson on 2024-07-13 at 00:00

In this talk attendees will learn how Ruby on Rails and PostgreSQL can be used to create scalable SaaS applications, focusing on schema and query design, and leveraging database capabilities.

We’ll define SaaS concepts, B2B, B2C, and multi-tenancy. Although Rails doesn’t natively support SaaS or multi-tenancy, solutions like Bullet Train and Jumpstart Rails can be used for common SaaS needs.

Next we’ll cover database designs from the Apartment and acts_as_tenant gems which support multi-tenancy concepts, then connect their design concepts to Citus’s row and schema sharding capabilities from version 12.0.

We’ll also cover PostgreSQL’s LIST partitioning and how to use it for efficient detachment of unneeded customer data.

We’ll cover the basics of leveraging Rails 6.1’s Horizontal Sharding for database-per-tenant designs.

Besides the benefits for each tool, limitations will be described so that attendees can make informed choices.

Attendees will leave with a broad survey of building multi-tenant SaaS applications, having reviewed application level designs and database designs, to help them put these into action in their own applications.

💻 Slide Deck

🎥 YouTube Recording

Checking Your Privileges, 2
Posted by Christophe Pettus in pgExperts on 2024-07-10 at 17:48

I turned the last blog post into a talk; you can get the slides here.

Guide to Auditing and Monitoring Access in PostgreSQL
Posted by Umair Shahid in Stormatics on 2024-07-10 at 08:52

In the data-driven world of today, maintaining the security and integrity of your database is paramount. Auditing and monitoring access to your database are critical components of an effective security strategy. These processes help ensure that only authorized users are accessing sensitive information and that any unauthorized access attempts are detected and addressed promptly.

PostgreSQL is renowned for its robustness, extensibility, and adherence to standards. It offers a rich set of features for auditing and monitoring, which can be leveraged to enhance security and compliance.

Purpose and Scope of the Blog:

This blog aims to provide an in-depth guide on auditing and monitoring access in PostgreSQL. We will explore various tools, techniques, and best practices to help you implement effective auditing and monitoring strategies in your PostgreSQL environment.

The post Guide to Auditing and Monitoring Access in PostgreSQL appeared first on Stormatics.


Feeds

Planet

  • Policy for being listed on Planet PostgreSQL.
  • Add your blog to Planet PostgreSQL.
  • List of all subscribed blogs.
  • Manage your registration.

Contact

Get in touch with the Planet PostgreSQL administrators at planet at postgresql.org.