The world's most advanced open source database
Top posters
Number of posts in the past two months
Top teams
Number of posts in the past two months
  • Policy for being listed on Planet PostgreSQL.
  • Add your blog to Planet PostgreSQL.
  • List of all subscribed blogs.
  • Manage your registration.
  • Get in touch with the Planet PostgreSQL administrators at planet at

 Query cache in Pgpool-II

Pgpool-II has built-in in memory query result cache which may dramatically enhance performance in certain use cases. If your workload is mostly read only SELECT and certain query is frequently used, then you might be able to enjoy the feature. When a SELECT comes to Pgpool-II, it saves the result of the query in shared memory or memcached (you can configure which one to be used). If same query, meaning the query string is  absolutely identical to  the previous query,  then Pgpool-II does not send the query to PostgreSQL. Instead, it fetches the query result from the shared memory (or memcached) and returns to client. Note that in order to this, the user and database must be also identical. In the process following tasks to generate query result are eliminated thus takes less time comparing with standard query processing:

  • Parsing query
  • Generating query plan
  • Executing the query plan
  • Network round-trip between Pgpool-II and PostgreSQL

Note , however, when a table is modified or dropped, the query cache corresponding to the table is removed. If this frequently happens, query cache may be slower than ordinal query processing because of its overhead. You can monitor the query cache hit ratio by using "SHOW pool_cache" command to make a decision  whether you should use the query cache feature or not.

Bottle neck in query cache

The query cache has one weakness until Pgpool-II 4.3. As you can easily guess, the shared memory is heavily accessed by concurrent users. In order to avoid the confusion, Pgpool-II uses strong lock which is called "exclusive" lock. Once a Pgpool-II process acquires the lock, other process have to wait until the process release the lock. As a result when there are many concurrent users, the query cache performance rapidly drops.

Pgpool-II 4.4 will mitigate the problem

Upcoming Pgpool-II 4.4, which is expected to be released by the end of this year, will use "shared lock" in certain cases.

There are two types of query cac

Posted by Lætitia AVROT in EDB on 2022-09-24 at 00:00
I was browsing the Postgres slack channel when I ran into this intriguing question from a member: How come my query SELECT TO_CHAR(POW(2,54), ‘9999999999999999999’) AS V0, TO_CHAR(POW(2,54)-1, ‘9999999999999999999’) AS V1; gives that result? v0 | v1 ------------------------------+---------------------- 18014398509481984 | 18014398509481984 Of course, I did some IT support, so my first reaction was to test by myself: laetitia=# select laetitia-# power(2,54) as v1, laetitia-# power(2,54)-1 as v2; v1 | v2 ------------------------+------------------------ 1.

Our waitlist has now been activated, as the conference is close to selling out.

We have seats left, and may get a couple of extra ones if outstanding invoices are not paid on time. However, we have no ability to further extend the total number of seats (as they are limited by the venue). So if you want a chance to get one of these seats, make sure you sign up as soon as possible! The waitlist ensures fair allocation of the remaining seats.

A PostGIS cluster is essentially a PostgreSQL cluster with a primary and an arbitrary number of standby instances where the running operand image contains all the required PostGIS related extensions. [Continue reading...]

Compared to PostgreSQL alone, TimescaleDB can dramatically improve query performance by 1000x or more, reduce storage utilization by 90%, and provide features essential for time-series and analytical applications. Some of these features even benefit non-time-series data–increasing query performance just by loading the extension.

PostgreSQL is today’s most advanced and most popular open-source relational database. We believe this as much today as we did 5 years ago, when we chose PostgreSQL as the foundation of TimescaleDB because of its longevity, extensibility, and rock-solid architecture.

By loading the TimescaleDB extension into a PostgreSQL database, you can effectively “supercharge” PostgreSQL, empowering it to excel for both time-series workloads and classic transactional ones.

This article highlights how TimescaleDB improves PostgreSQL query performance at scale, increases storage efficiency (thus lowering costs), and provides developers with the tools necessary for building modern, innovative, and cost-effective time-series applications – all while retaining access to the full Postgres feature-set and ecosystem.

(To show our work this article also presents the benchmarks that compare query performance and data ingestion for 1 billion rows of time-series data between PostgreSQL 14.4 and TimescaleDB 2.7.2.  For PostgreSQL, we benchmarked both using a single-table and declarative partitioning)

Better Performance at Scale

With orders of magnitude better performance at scale, TimescaleDB enables developers to build on top of PostgreSQL and “future-proof” their applications.

1000x Faster Performance for Time-series Queries

The core concept in TimescaleDB is the notion of the “hypertable”: seamless partitioning of data, while presenting the abstraction of a single, virtual table across all your data.

This partitioning enables faster queries by quickly excluding irrelevant data, as well as enabling enhancements to the query planner and execution process. In this way


PostgreSQL 15 introduces a new feature that allows specifying column lists in publications, to restrict the amount of data replicated. Let's see what are the advantages of this feature and how to use it.

I don’t know if you’re aware that PostgreSQL Conference Europe (in Berlin this year) will have a daycare option for children. It’s been published here. I’m really happy with this and I hope this will help more parents to attend the event. Please follow the instructions if you need to add this option to your registration for the event. What will children do? The professionals who will take care of the children suggested the following activities:
Posted by Craig Kerstiens in Crunchy Data on 2022-09-21 at 15:00

Postgres is an awesome database for a lot of reasons, put simply it's a really feature rich, powerful, and reliable database. One of those rich areas is the many built-in functions in Postgres.

Let's take a look at a few of my favorites.

Date functions

First, let's start with one of the most simple built-in ones but when doing things with time is quite handy.

Want to know what time it is according to your database?

SELECT now();

Want to know the biggest date in your data?

select max(day)
from births;

Now let's look at one that is a little bit more interesting. If you want to do a rollup by month, by week, by day, really any time interval you could do comparison of greater than or less than various months, or you could truncate the date. A simple example of this could be calculating your signups per week, where day is the data you have per row.

SELECT date_trunc('week', day),
FROM users

Shifting a little bit, let’s take a look at something even more complicated. Earlier this week I wanted to calculate what percentage of time remained in the month. For example if it was the very start of the day on the 15th of the month, in February we'd be 50% of the way through the month, but in January we'd only be 45% of the way through the month.

My table events has a column period which is a range type (a single datatype that has a from and a to with two timestamp values). On the query below we do a few things:

  1. We calculate two different intervals, to dissect the first:
  2. We get the start of the month lower(period) and increase it by 1 month so it's the exact end of the month
  3. Then we get the timestamp of how much time has elapsed into the month
  4. Then we subtract the two, leaving us with something similar to: 14 days 3 hrs 5 minutes
  5. The second interval we get as the full length of the month, so February 28 days or January 31 days
  6. We extract the epoch of the time interval EXTRACT ( epoch FROM intervalhere )
  7. And final
Around a month ago I wrote about new pretty-printer for SQL queries that I created. Today, figured I'll add command line tool for putting queries through it, to make my life a bit easier. Tool is named pg_prettify, and you can find in in my shell_utils repo. Example usage: =$ cat bad.sql SELECT n.nspname as … Continue reading "Prettify SQL queries from command line"
Posted by Stefan Fercot in EDB on 2022-09-20 at 16:00

With pgBackRest 2.41 just released, a new feature called backup annotations is now available. Let’s see in this blog post what this is about.

Backup annotations

This new feature adds an --annotation option to the backup command. We can now attach informative key/value pairs to the backup and the option may be used multiple times to attach multiple annotations.


$ pgbackrest backup --stanza=X --type=full \
  --annotation=comment="this is our initial backup" \
  --annotation=some-other-key="any text you'd like"

Internally, the initial annotation is stored in the shared backup information:

$ pgbackrest repo-get backup/X/ |grep annotation
..."backup-annotation":{"comment":"this is our initial backup","some-other-key":"any text you'd like"}...

That allows to show those annotations in the info command.

First, in the text output, for a specific backup --set:

$ pgbackrest info --stanza=X --set=20220920-140720F
full backup: 20220920-140720F
    timestamp start/stop: 2022-09-20 14:07:20 / 2022-09-20 14:07:24
    wal start/stop: 00000002000000000000002F / 00000002000000000000002F
    lsn start/stop: 0/2F000028 / 0/2F000100
    database size: 185.3MB, database backup size: 185.3MB
    repo1: backup set size: 13.7MB, backup size: 13.7MB
    database list: bench (16394), postgres (13631)
        comment: this is our initial backup
        some-other-key: any text you'd like

$ pgbackrest info --stanza=X --set=20220920-140740F
full backup: 20220920-140740F
    timestamp start/stop: 2022-09-20 14:07:40 / 2022-09-20 14:07:46
    wal start/stop: 000000020000000000000031 / 000000020000000000000031
    lsn start/stop: 0/31000028 / 0/31000138
    database size: 185.3MB, database backup size: 185.3MB
    repo1: backup set size: 13.7MB, backup size: 13.7MB
    database list: bench (16394), postgres (13631)
        comment: just another backup

Then also in the global json output:

$ pgbackrest info --stanza=X --output=json
Posted by Emil Shkolnik in Awide Labs on 2022-09-20 at 15:18

This topic describes how you can detect and solve PostgreSQL checkpoints ratio issues

What happened?

To detect checkpoints ratio issues, and have an ability to look in some historical metrics of this – you must have some kind of monitoring solution. Today, there is a rich selection of monitoring solutions – you can use any you already have, or install another one. Here we will discuss samples based on Awide management and monitoring solution.

This alert description includes the following terms: WAL and Checkpoint. You can click on links to see and familiarize with those terms.

Checkpoint operation forces a transaction log checkpoint, flushing all dirty data pages to disk and writing a special checkpoint record to the WAL file. It’s done to minimize the amount of WAL REDO in the process of crash recovery.

There is two checkpoints types:

  • Automatic (or scheduled) – It’s a desirable event type
  • Required – It’s a problematic event type

Often a checkpoint of required type can cause a significant I/O load, so the system measures the ratio of checkpoints.

If 50% of checkpoints in the measured period were required type – system will raise warning alert. If this value exceeds 75% the system will raise a problem alert.

Why did it happen?

Checkpoint tile located at right side of instance overview.

PostgreSQL - Checkpoints Ratio

PostgreSQL – Checkpoints Ratio Dashboard

Open the pop out window (click on three dots) and check the chart for the last 3 or 8 hours. Your system raised required checkpoints instead of automatic. So you will see most red line graph, like this:

PostgreSQL Checkpoints Ratio

PostgreSQL Checkpoints Ratio – Required Checkpoints

Correct checkpoint work should consist of automatic checkpoints representing with blue graph:

PostgreSQL Checkpoints Ratio

PostgreSQL Checkpoints Ratio – Scheduled Checkpoints

The alert reason is that the checkpoint chart should be like a blue, instead of the red above.

How to solve it?

Two followi

Posted by Akshay Joshi in EDB on 2022-09-20 at 09:33
In this blog we look at how server definitions can be imported and exported from pgAdmin using the command line or the graphical user interface. This allows server definitions to be migrated between pgAdmin installations and shared with colleagues.

Sequences are a core feature of SQL. However, some users are tempted to implement sequences to generate invoices. That’s dangerous and should be avoided. The core question is: Why? What’s the problem with using database-side sequences to put unique invoice numbers to send to clients? Let’s dive in and find out.

Getting started with CREATE SEQUENCE

Before we try to figure out the best way to handle invoice numbers it makes sense to take a look at the inner workings for CREATE SEQUENCE. The syntax specification is quite straightforward and allows us to achieve many different things:

Description: define a new sequence generator
[ IF NOT EXISTS ] name
[ AS data_type ]
[ INCREMENT [ BY ] increment ]
[ START [ WITH ] start ] [ CACHE cache ] [ [ NO ] CYCLE ]
[ OWNED BY { table_name.column_name | NONE } ]


For the sake of simplicity, I have created a basic sequence without any fancy parameters. Once this is done you can call the nextval function to increment the sequence and return the value:

test=# CREATE SEQUENCE seq_a;
test=# SELECT nextval('seq_a');
(1 row)

test=# SELECT nextval('seq_a');
(1 row)

What’s important to mention here is that the sequence generates an ever increasing number. If you want to figure out which value has already been used, you can use the currval function:

test=# SELECT currval('seq_a');
(1 row)

currval will ensure that the latest value produced by YOUR session is returned. It is NOT the last value issued by the sequence in your database – PostgreSQL guarantees that it is indeed the last value consumed by your session. This is important because we can rely on the fact that we will never get somebody else’s value.

The behavior just described bec

Posted by Egor Rogov in Postgres Professional on 2022-09-20 at 00:00

I’m excited to announce that the translation of Part III of the “PostgreSQL 14 Internals” book is finished. This part is about a diverse world of locks, which includes a variety of heavyweight locks used for all kinds of purposes, several types of locks on memory structures, row locks which are not exactly locks, and even predicate locks which are not locks at all.

Please download the book freely in PDF. We have two more parts to come, so stay tuned!

Your comments are much appreciated. Contact us at

Posted by Luca Ferrari on 2022-09-20 at 00:00

A new release of the PostgreSQL virtual environment manager.

pgenv 1.3.2 is out!

Today we released version 1.3.2 of pgenv, the binary manager for PostgreSQL.
This release fixes a quite subtle bug in the handling of the configuration that prevented custom settings to be correctly loaded back into the running system. Users are encouraged to upgrade as soon as possible.

A description of the problem

17bob17 noticed the issue: when you edited your configuration file, either the default or a per-version one, and changed settings in a (Bash) array, the configuration was not correctly loaded.
It took a lot of time to figure out that the problem was not directly in the way the configuration was loaded, rather in the way the configuration was stored.
When pgenv acquired the configuration settings as arrays, it started using declare -p as a way to print out a Bash compatible representation of the array, and such representation was stored in the configuration file. The problem was that declare -p assumes you want to use declare back when you re-evaluate the variable (array), and so placed a declare -a as the output.
The configuration is then loaded within the pgenv_configuration_load function, and declare run into a function has the same effect as local, that is it lexically scope the variables. Therefore, as soon as pgenv_configuration_load ends its job, the lexically scoped variables are gone and the old (previous) one are kept with their default value. It is a boring masquerading problem due to inner contexts.
One possible solution could have been to use -g as a flag to declare, so to force the variable to be global and therefore not lexically scoped, but such flag is not everywhere in different Bash versions and implementation.
The -x flag to declare, to export the variable, did not have any effect too.

Therefore, the current release removes the use of declare at all when the configuration is sourced back (loaded).

Citus is a distributed database that is built entirely as an open source PostgreSQL extension. In fact, you can install it in your PostgreSQL server without changing any PostgreSQL functionality. Citus simply gives PostgreSQL additional superpowers.

Being an extension also means we can keep adding new Postgres superpowers at a high pace. In the last release (11.0), we focused on giving you the ability to query from any node, opening up Citus for many new use cases, and we also made Citus fully open source. That means you can see everything we do on the Citus GitHub page (and star the repo if you’re a fan 😊). It also means that everyone can take advantage of shard rebalancing without write-downtime.

In the latest release (11.1), our Citus database team at Microsoft improved the application’s experience and avoided blocking writes during important operations like distributing tables and tenant isolation. These new capabilities built on the experience gained from developing the shard rebalancer, which uses logical replication to avoid blocking writes. In addition, we made the shard rebalancer faster and more user-friendly; also, we prepared for the upcoming PostgreSQL 15 release. This post gives you a quick tour of the major changes in Citus 11.1, including:

If you want to know the full details of what changes in Citus 11.1, check out our Updates page. Also check out the end of this blog post for details on the upcoming livestream of the Citus 11.1 release party which will include 3 live demos.

Distribute tables without interruption using create_distributed_table_concurrently

If there’s one thing you need to know when you want to start to scale out Postgres using Citus,

Posted by David Christensen in Crunchy Data on 2022-09-19 at 15:00

At Crunchy we talk a lot about memory, shared buffers, and cache hit ratios. Even our new playground tutorials can help users learn about memory usage. The gist of many of those conversations is that you want to have most of your frequently accessed data in the memory pool closest to the database, the shared buffer cache.

There's a lot more to the data flow of an application using Postgres than that. There could be application-level poolers and Redis caches in front of the database. Even on the database server, data exists at multiple layers, including the kernel and various on-disk caches. So for those of you that like to know the whole story, this post pulls together the full data flow for Postgres reads and writes, stem-to-stern.

Application Server

The application server sends queries to the individual PostgreSQL backend and gets the result set back. However there may in fact be multiple data layers at play here.

Application caching

Application caching can have many layers/places:

  • Browser-level caching: a previously-requested resource can be re-used by the client without needing to request a new copy from the application server.
  • Reverse proxy caches, i.e. Cloudflare, Nginx: a resource is requested by a user, but does not even need to hit an application server to return the result.
  • Individual per-worker process caches: Within specific application code, each backend could store some state to reduce querying against the database.
  • Framework-specific results or fragment caching: Whole parts of resources could be stored and returned piecemeal, or entire database result sets could be stored locally or in a shared resource outside of the database itself to obviate the need for accessing the database at all. This could be something like Redis or memcached to name a few examples.

Application connection pooler

When the application requests data that is not cached with one of the above methods, the application initiates an upstream connection to the database. Rather tha

Posted by Andreas 'ads' Scherbaum on 2022-09-19 at 14:00
PostgreSQL Person of the Week Interview with Peter Smith: My name is Peter Smith. Originally from New Zealand, I now live and work from home on the NSW Central Coast, just north of Sydney Australia. My employer is Fujitsu Australia Software Technology (FAST), and for the last couple of years I have been a member of Fujitsu’s PostgreSQL open-source team.

In my previous post I wrote about  how to configure and manage virtual IP for Pgpool-II on AWS. 

As I mentioned in my previous post, there are four methods to assign the virtual IP on AWS:

  • Assigning Elastic IP
  • Assigning secondary private IP
  • Updating route table
  • Updating Route53 record

This post describes how to use "updating route table" method to configure and manage virtual IP on AWS.


In this post, we create three Pgpool-II instances across multiple Availability Zones within one region.

We assume that application is running in the same VPC. The figure below shows the application architecture, with the application instance in a public subnet and the Pgpool-II instances in private subnets.



Before you start, ensure that:

  • You have created a PVC with one public subnet and three private subnets and each private subnet is in a different Availability Zone.
  • You have created four EC2 instances.
  • You have installed Pgpool-II in each Pgpool-II instance.
  • You have created a Unix user postgres in each Pgpool-II instance. In this tutorial we run Ppgool-II using Unix user postgres.
  • You have created a PostgreSQL cluster.
  • Determine an IPv4 address used for virtual IP which is used by the application to connect to Pgpool-II. This IP address should not belong to the IP addresses range for VPC.

In this post, we use the following configurations:

Application Instance Pgpool-II #1 Instance Pgpool-II #2 Instance Pgpool-II #3 Instance
Virtual IP
Posted by Denis Laxalde in Dalibo on 2022-09-19 at 05:00

Toulouse, 19 September 2022

Psycopg, the PostgreSQL database adapter for Python, recently added support for libpq pipeline mode thus bringing significant performance boost, especially when network latency is important. In this article, we’ll briefly describe how it works from users’ perspective and under the hood while also providing a few implementation details.

Logo de psycopg

Psycopg 3.1, released in late August, is the second major release of the Psycopg 3 project, which is a complete rewrite of the venerable psycopg2. Supporting libpq pipeline mode involved significant changes to the query processing logic in the driver. Yet, the challenge was to make it compatible with the “normal” query mode in order to keep the API almost unchanged and thus bring performance benefits to users without exposing the complexity of the batch query mode.

For the impatient, head out to the pipeline mode documentation of Psycopg: it’s self-consistent, explains nicely the details for client/server communication, as well as how things work from the user’s perspective.

Using the pipeline mode in Psycopg

Connection objects gained a pipeline() method to enable the pipeline mode through a context manager (with statement); so using it is as simple as:

conn = psycopg.connect()
with conn.pipeline():
   # do work

What is the pipeline mode for?

Postgres documentation contains advices on when the pipeline mode is useful. One particular case is when the application is doing many write operations (INSERT, UPDATE, DELETE).

For instance, let’s consider the following schema:

CREATE TABLE t (x numeric, d timestamp, p boolean)

and assume an application does a lot of queries like:

INSERT INTO t (x, d, p) VALUES ($1, $2, $3)

with distinct values, and with x possibly being a large integer (n!, n<1000). Maybe the application could make use of batch inserts such as executemany(), maybe not (e.g. because it needs to do some other operations between inserts, like querying another resource): this does not matt

Lately in couple of places I recommended people that they can solve their problem with queries using LATERAL. In some cases recipient of such suggestion indicated that they had no idea what LATERAL is. Which made me think that it might be good idea to write more about them (lateral queries)… Also – I know … Continue reading "What is LATERAL, what is it for, and how can one use it?"
Posted by Ryan Booz on 2022-09-15 at 13:30

Not quite a year ago, I had the opportunity to give a keynote address at PGConf NYC 2021 as part of Timescale’s sponsorship. Thankfully the leadership at Timescale supported me addressing the topic of community and how, particularly in light of the upward trend of PostgreSQL adoption, investing in the growing community can help everyone involved.

In that talk, I drew on a few of my experiences as a SQL Server developer, specifically within the PASS and #sqlfamily community. Although I understand that each community has its ethos, I wondered aloud if some of the initiatives that had drawn me further into the SQL Server community could provide ideas on ways to engage the growing PostgreSQL user base.

One of those initiatives was focused on the monthly T-SQL Tuesday blogging event started by community members more than a decade ago. The long-running event is simple.

Each month a community member volunteers to host the event. They select a topic and “invite” anyone to participate through a short blog post addressing the topic. Then, on the first Tuesday of each month, anyone in the community can post their own blog about the topic, linking back to the original invitation blog. The main purpose of the monthly event is to provide a safe, open opportunity for anyone to blog with a group of community members. For many people, the biggest hurdle to writing down and sharing their experience with a specific technology is selecting topics… and believing that anyone cares about what they have to say. T-SQL Tuesday has started many successful blogs over the years because someone had a good prompt and others received encouraging feedback about what they shared. 

As more people got involved, the community grew, spreading the knowledge and support of the community across a larger, more close-knit group.

I don’t know about you, but that sounds 🔥!

What does the PostgreSQL community think?

In the 2022 State of PostgreSQL survey, short and long-form blog posts were among the top three ways users prefer t

Posted by Luca Ferrari on 2022-09-15 at 00:00

A new release of the pgagroal connection pooler.

pgagroal 1.5.0 released!

[pgagroal]({:target=_blank} is a fast connection pooler for PostgreSQL, written in the C language.
A couple of weeks ago, a new release, the 1.5.0 was released. I’m writing about this just now because I was on holidays!
The new release brings a new set of features, in particular a lot of small checks within the configuration file setup (e.g., avoiding duplicated servers or wrong parameters) and a lot of new loggin capabilities, including log rotation and log line prefix.
Other areas of improvements include code clean-up, shell completion for command line tools, and portability towards FreeBSD and OpenBSD systems.
Last but not least, a new set of tutorials will help the newcomers to correctly start using pgagroal!

Posted by Emil Shkolnik in Awide Labs on 2022-09-13 at 14:45

This topic describes how you can detect and solve PostgreSQL memory availability issues.


What happened?

To detect memory availability issue on time, and have an ability to look in some historical metrics of this – you must have some kind of monitoring solution. Today, there is a rich selection of monitoring solutions – you can use any you already have, or install another one. Here we will discuss samples based on Awide management and monitoring solution.

The alert is based on the collected “Available memory” metric value and the specific thresholds this value can reach. Available memory is the amount of memory that is available for allocation to new or existing processes. This is the memory that can be given instantly to processes without the system going into swap.

This is calculated by summing different memory values depending on the platform and it is supposed to be used to monitor actual memory usage in a cross platform fashion. The metric is collected in bytes.

There are two alert’s levels:

  • Warning – on reaching by default 30% of the total device memory
  • Problem – on reaching by default 10% of the total device memory

The percentage is calculated by comparing the Available and Total memory metrics values. The platform will create a new alert only if the average value  reached remains in the defined threshold more than 300 seconds. The platform will automatically change the alert level, e.g. from Warning to Problem, in case the metric value reaches the next threshold value.

The platform will automatically close the alert in case the value reaches by default 40% of the total device memory and remains in the defined threshold for more than 300 seconds.

Why did it happen?

Memory can be used by processes requiring a lot of memory allocation.

Most popular reason for memory shortage is slow queries.

Another reason is an increase in the number of parallel server processes can lead to memory shortages.

How to solve it?

Two cases for solv

Posted by Robert Bernier in Percona on 2022-09-13 at 11:49
Working With Snapshots in PostgreSQL

One of the reasons I’ve never strayed far from PostgreSQL is that I am always discovering features that overcome all sorts of interesting problems. Although, to be honest, from time to time I’ve had the feeling that sometimes some features are a solution in search of a problem. Take, for example, exporting transaction snapshots

Originally implemented in version 9.2, snapshot exports allow multiple transactions to share identical views of the same database state. Back then I wasn’t certain how I could use it until pg version 9.3 came out presenting a true use-case when multi-threaded downloads were introduced to pg_dump.

Here’s a simple example; let’s start by creating two tables with lots of data:

    create table t1(
        c1 serial primary key,
        c2 text default lpad('',500,md5(random()::text)),
        c3 timestamp default clock_timestamp()

    create table t2(like t1 including all);

    insert into t1 select generate_series(1,5e6);
    insert into t2 select generate_series(1,5e6);

List of relations
 Schema | Name | Type  |  Owner   |  Size   | Description 
 public | t1   | table | postgres | 2791 MB | 
 public | t2   | table | postgres | 2791 MB |

TIP: Changing the arguments in the generate_series function will populate the tables that best fits your hardware’s capabilities.

The traditional invocation of a logical dump uses a single CPU/thread archiving the tables one at a time. The CLI time generates timing statistics returning the time it takes to perform the dump using the “compressed” format:

# execution time: 60 seconds
time pg_dump -Fc db01 > db01.db

real    1m0.322s
user    0m50.203s
sys     0m1.309s

An alternate invocation generating the dump uses the “directory” format:

# execution time: 52 seconds
time pg_dump -Fd db01 -f db01_dirA

real    0m52.304s
user    0m50.908s
sys     0m1.152s

Thus far the execution utilizes a single CPU. Now look at the execution time when


A join is a concept in IT which is widely used and often referred to but rarely really understood. What are the differences between inner joins, outer joins, semi joins and so on? Let’s shed some light on them and see how inner and outer joins really work.

Producing sample data

Before we can get started we need to create some sample data:

test=# CREATE TABLE a (aid int);
test=# CREATE TABLE b (bid int);
test=# INSERT INTO a
VALUES (1), (2), (3), (4);
test=# INSERT INTO b
VALUES (2), (3), (4), (4), (5);

The structure is actually quite simple. It consists of two tables containing a handful of rows which will be used for our little demo.

Explicit vs implicit joins

The question people often ask is: What is the difference between an implicit and an explicit join? The answer is simply syntactic sugar. Here is an implicit join:

test=# SELECT * FROM a, b WHERE aid = bid;
aid | bid
2 | 2
3 | 3
4 | 4
4 | 4
(4 rows)

In this the join actually happens in the WHERE-clause. In the FROM clause we only list the tables we want to join. The alternative is to use an explicit join:

test=# SELECT * FROM a JOIN b ON (aid = bid);
aid | bid
2 | 2
3 | 3
4 | 4
4 | 4
(4 rows)

Both syntax variations are identical. It is merely a question of taste and style that is relevant here. If you look at the execution plan you will see that both plans are absolutely identical.

test=# explain SELECT * FROM a, b WHERE aid = bid;
Merge Join (cost=393.42..893.85 rows=32512 width=8)
Merge Cond: (a.aid =
-> Sort (cost=179.78..186.16 rows=2550 width=4)
Sort Key: a.aid
-> Seq Scan on a (cost=0.00..35.50 rows=2550 width=4)
-> Sort (cost=179.78..186.16 rows=2550 width=4)
Sort Key:
-> Seq Scan on b (cost=0.00..35.50 rows=2550 width=4)
(8 rows)

Here is the second plan …

test=# explain SELECT * FROM a JOIN b ON (aid = bid);
Posted by Peter Eisentraut in EDB on 2022-09-13 at 04:00

A little while ago, we had a few PostgreSQL hackers in a room and someone oversaw me typing something like

git diff REL_14_STABLE...REL_15_STABLE

and they wondered, “oh, I didn’t know about three dots”. My flippant explanation was, “you use three dots when two dots don’t give you the right answer”.

But let’s unpack this.


git diff REL_14_STABLE REL_15_STABLE

gives you the complete difference between (the tip of) PostgreSQL 14 and PostgreSQL 15. This will be a huge diff.

This diff does not include changes that were made on both REL_14_STABLE and REL_15_STABLE. For example, b9b21acc766db54d8c337d508d0fe2f5bf2daab0 was backpatched to both branches, so it doesn’t show up in the above diff at all.

This is almost never what you want when looking at diverging stable branches like this. It might make sense to look at the cross-version diff of a particular file, perhaps, like this:

git diff REL_14_STABLE REL_15_STABLE -- src/test/regress/parallel_schedule

But the full diff is usually not useful, at least for manual inspection.

Then there is

git diff REL_14_STABLE..REL_15_STABLE

This is exactly the same as the above command without the dots. The two forms are equivalent.

So now let’s look at the three dots:

git diff REL_14_STABLE...REL_15_STABLE

The git-diff documentation says that this is equivalent to git diff $(git merge-base REL_14_STABLE REL_15_STABLE) REL_15_STABLE. The result of that git merge-base is:

$ git merge-base REL_14_STABLE REL_15_STABLE

So the original command is equivalent to

git diff e1c1c30f635390b6a3ae4993e8cac213a33e6e3f REL_15_STABLE

So what is the point of that? Note that the commit immediately after e1c1c30f635390b6a3ae4993e8cac213a33e6e3f on branch REL_15_STABLE is

commit 596b5af1d3675b58d4018acd64217e2f627da3e4
Author: Andrew Dunstan 
Date:   Mon Jun 28 17:31:16 2021

Stamp HEAD as 15devel.

Let the hacking begin ...

So e1c1c30f635390

Posted by Frits Hoogland in Yugabyte on 2022-09-12 at 19:15

Understanding postgres performance fundamentally relies on the communication between the client and the database side process called 'backend' in postgresql. I created a small utility to profile the postgres database connection network communication: connection-profiler

The first question to answer is why create a connection-profiler tool while there is pgbench. The answer to that is that pgbench is for running benchmarks, which means run SQL against postgres, and not so much for examining individual requests. So it serves a different function than the connection-profiler. It is important to use the right tool for the job!

The connection-profiler takes a connection URL, such as host= port=5433 sslmode=disable user=yugabyte password=yugabyte, and connects to the database using it, and optionally can execute a SQL statement.

It allows to specify the connection protocol (simple, extended or prepared). The simple setting uses the postgres simple query protocol, the extended setting uses the postgres extended protocol, and the prepared setting uses the postgres extended protocol and the usage of a prepared statement.

For a single execution of SQL, which connection-profiler by default does, using a prepared statement does not make sense: the purpose of a prepared statement is to have a statement in parsed state so that the parse step (which includes the syntactic and semantic parsing and rewriting) can be skipped after the prepared statement is created (and thus parsed). It additionally enables the postgres planner to decide to use a generic plan (which requires at least 5 executions of the default "custom plan").

So why provide this option? This is because the purpose of the tool is to measure what is happening inside the connection, and using a prepared statement is a distinct pattern.

Optionally, connection-profiler can be configured to execute the specified SQL multiple times inside the same connection, or terminate the connection and connect again to execute the specified SQ

Posted by Jelte Fennema in CitusData on 2022-09-12 at 15:41

A few months ago we made Citus fully open source. This was a very exciting milestone for all of us on the Citus database engine team. Contrary to folks who say that Postgres is a monolith that can’t scale—Postgres in fact has a fully open source solution for distributed scale, one that’s also native to Postgres. It’s called Citus! This post will go into more detail on why we open sourced our few remaining enterprise features in Citus 11, what exactly we open sourced, and finally what it took to actually open source our code. If you’re more interested in the code instead, you can find it in our GitHub repo (feel free to give the Citus project a star.)

Why make the final pieces of Citus open source now?

One of the reasons we open sourced the last few enterprise features in Citus 11 is that our business model has changed. When Citus was first started back in ~2011, the business model consisted of selling enterprise licenses and support contracts. In 2016 we un-forked Citus from Postgres and open sourced the bulk of Citus. After doing that, we differentiated our enterprise-licensed software from the open source version by including a few extra closed source enterprise features. But over time our business model has moved away from selling enterprise licenses.

Currently our business model revolves around our managed service: Azure Database for PostgreSQL - Hyperscale (Citus). As you might imagine, our managed service builds on top of Citus open source by adding in the “managed” features, aka the features that save you time and make it so you no longer have to worry about your database.

With the Azure service, you can create and scale a Postgres cluster with the click of a button; your Postgres settings are already tuned to get extra performance out of the hardware; you get automatic backups, from which you can restore with ease; and if a node in your cluster crashes, you automatically failover to another one assuming you enabled High Availability (HA). You’ll also get easy integrations with other Azu


Ha ha, made you look!

This post is not a crazy scam (you be the judge) but just a practical description of using cryptographical algorithms to encrypt and decrypt data inside PostgreSQL.

Encryption in Crunchy Bridge

There's already a lot of encryption in Crunchy Bridge!

First, your data are "encrypted at rest". That means that the "volumes" (what in an earlier era would be called the disk drives) your data is saved to are encrypted. Also all the backup files generated by your server are encrypted. In practice, this means that it is not possible for someone to backdoor access to your data by mounting the volumes or reading the backups.

Second, your connections to the database are encrypted. Transport Layer Security ("TLS", what used to be called "SSL") ensures that all the traffic between your database server and you client software is also encrypted.

What does all this churning of bytes achieve? If we didn't do it, it would mean we'd have to trust that nobody could ever get access to the data volumes or backup files. Those items are all protected by the cloud provider login rules, so it would not be easy for someone to do, but it's not impossible, and there's one entity that can always get access to the volumes and the backups: the cloud provider itself.

Encrypting your data at rest is a way of ensuring that your system administrator (or someone who has hacked your system administrator) cannot directly read your raw data.


We aren't going to be talking about volume encryption or connection encryption, but encryption of the data stored inside the database tables, with pgcrypto.

Why use pgcrypto?

Maybe don't! It is just a lot simpler to not encrypt things. It adds complexity and moving parts. Don't do it unless you need it.

Why might you need it?

Just as volume encryption and network encryption are useful if you want your data to remain private even if someone obtains access to the underlying storage, encryption of data in tables is useful if you w