The world's most advanced open source database
Top posters
Number of posts in the past two months
Top teams
Number of posts in the past two months
  • Policy for being listed on Planet PostgreSQL.
  • Add your blog to Planet PostgreSQL.
  • List of all subscribed blogs.
  • Manage your registration.
  • Get in touch with the Planet PostgreSQL administrators at planet at

In PostgreSQL and many other relational databases, constraints are an integral part of the feature set. Many people are aware of primary keys, foreign keys, CHECK-constraints, table constraints and so on. However, from time to time, the situation is way more complicated. That’s when some more advanced techniques are needed to enforce integrity in the way it is desired by end users.
In this post, we will elaborate on restrictions spanning more than one row. What does that mean? Let us envision a simple scenario: We want to store information about airplane ownership. In aviation, it happens more often than not that a single plane has more than one owner. In this example, we want to make sure that the ownership parts actually add up to 100% of the whole. Similar challenges can be found over and over again in real business applications.

Creating sample data

Let’s first create some sample data, and see how we can actually solve the problem of ensuring that the ownership always has to end up at 100% for a specific airplane:

    id     int    UNIQUE,
    call_sign    text    NOT NULL UNIQUE

    plane_id    int    REFERENCES    t_plane (id)
    owner     text,
    fraction    numeric

INSERT INTO t_plane (id, call_sign)
VALUES     (1, 'D-EHWI'),
           (2, 'ES-TEEM'),
           (3, 'D-ONUT');

In this case, we’ve got two tables: The t_plane table contains a unique ID and the call sign of the plane. The call sign is similar to a license plate – it identifies the aircraft. “D” indicates Germany, OE means Austria and “N” would indicate that we are dealing with a US-registered aircraft.

As you can see, we have used a 1 : N relationship here. One plane can have many owners. The trouble is, if we add up the owners’ percentages of a plane, we always have to end up with 100%. The nasty part is concurrency. What if many people make changes at the same time? Let’s take a look at the basic problem:


At a company we have literally thousands of Pg servers. The layout is also kinda non-obvious. Each database is named the same, but contains different data. And in front of it all, we have pgbouncers. After some talk, it was suggested that perhaps we could make psql prompt show which database it is connected to. … Continue reading "A tale of making company-wide standard psqlrc"
Posted by Kat Batuigas in Crunchy Data on 2021-06-22 at 19:39
In the last several months, we've featured simple yet powerful tools for optimizing PostgreSQL queries . We've walked through how the pg_stat_statements extension can show which queries are taking up the most time to run system-wide . We've also looked at how to use the EXPLAIN command to uncover query plans for individual queries.
You can get a lot out of those two, but you may have also wondered, "What about logs? Surely I can use Postgres' logs to help me find and track slow queries too?" Today we're going to take a look at a useful setting for your Postgres logs to help identify performance issues. We'll take a walk through integrating a third-party logging service such as LogDNA with Crunchy Bridge PostgreSQL and setting up logging so you're ready to start monitoring and watching for performance issues .
Posted by Laurenz Albe in Cybertec on 2021-06-22 at 08:00

scram-sha-256 rejects a client connection
© Laurenz Albe 2021

Since v10, PostgreSQL has provided support for scram-sha-256 for password hashing and authentication. This article describes how you can adapt your application safely.

Why do we need scram-sha-256?

PostgreSQL uses cryptographic hashing for two purposes:

  • The actual database password is a hash of the clear text password entered by the user. This prevents a thief from using a stolen password on other systems.
  • During password authentication, the client has to hash the (hashed) password with a random salt provided by the server. The password check is successful if the server receives the correct hashed response from the client.

Now, the MD5 hashing method has weaknesses that make it unsuitable for cryptography. In particular, it is too easy to construct a string with a given MD5 hash. These shortcomings do not apply to the way PostgreSQL uses MD5, but it still makes sense to use a better hashing algorithm:

  • an expensive hash function makes brute force password attacks more difficult
  • during a security audit, it looks better if PostgreSQL doesn’t use a hash function with weaknesses

Hence the introduction of scram-sha-256 support in v10. If you can, start using the new hashing method. The increased difficulty of brute force password attacks makes it worth the effort.

Problems with switching over to scram-sha-256

There are two problems that make it hard to switch over from MD5 to scram-sha-256:

  • Since PostgreSQL does not know the original clear text password, the user has to set the password again, after you change the password encryption method to scram-sha-256.
  • The PostgreSQL client has to support scram-sha-256 authentication, so authentication with older client software will fail.

The error message you get with an old version of libpq when you attempt to connect to a server that requires scram-sha-256 authentication is:

authentication method 10 not supported

An old JDBC driver will tell you:

The authentication type 10 is
Posted by Bruce Momjian in EDB on 2021-06-21 at 17:15

Vacuum is a routine database maintenance tasks that is handled manually or by autovacuum. Over the years, people have wondered if there is a way to eliminate the vacuum requirement.

First, since autovacuum is enabled by default, most sites don't even need to know that vacuum is running, and that is a key feature — vacuum tasks always run in the background, not as part of foreground queries like INSERT or UPDATE. (There are a few cleanup operations that are so inexpensive that they are even performed by SELECT queries.) By having vacuum tasks run in the background, foreground queries are almost never affected by vacuum, except perhaps by vacuum I/O requirements. Second, vacuum improves code simplicity by centralizing vacuum tasks in a single set of processes. Third, since Postgres does not have an undo segment, there is no centralized place where changes must be recorded, improving database concurrency.

So, we keep improving vacuum, its performance, and its minimal interference with foreground queries, but it hard to see how it can be fundamentally improved without adding many negatives.

Rocky Linux 8.4 is released today. I already added support and wrote down instructions about how to install PostgreSQL on Rocky Linux 8, and now it is time for a short blog post about migrating from CentOS 8 to Rocky Linux 8. Please note that if you are using CentOS 8 Stream, this blog post is not suitable for you.

Please note that these steps are experimental for now -- use these instructions at your own risk. It worked for me, though.

Continue reading "How to migrate from CentOS 8 to Rocky Linux 8 (experimental!)"
Posted by Andreas 'ads' Scherbaum on 2021-06-21 at 14:00
PostgreSQL Person of the Week Interview with Federico Campoli: Despite I’m looking younger, I’ve been around this planet for almost 49 years. My hometown is Napoli, Italy, I moved to Tuscany to start my first job as ASP developer on MS-SQL Server in 1999.
Posted by Robert Haas in EDB on 2021-06-21 at 13:00

Professor Andy Pavlo, at CMU, seems to be a regular organizer of technical talks about databases; this year, he organized the vaccination database tech talks, and invited me to give one about the PostgreSQL query optimizer. So I did. It was great. There were a few PostgreSQL community members present, but more importantly, a bunch of smart people who know a lot about other database systems showed up to the talk, including Andy Pavlo himself, and I got some feedback on where PostgreSQL could perhaps be improved.  Here are the highlights, with links to the relevant portion of the YouTube video.

Read more »
Back in 2013 I wrote a series of 5 posts about how to read explain analyze output. Figured that there is one big part missing – buffers info. You don't see this part in normal explain analyze, you have to specifically enable it: explain (analyze on, buffers on). Well, technically you don't need analyze part, … Continue reading "Explaining the unexplainable – part 6: buffers"
Posted by Andrew Dunstan in EDB on 2021-06-19 at 20:53

Recent discussion set me thinking about what would be involved in setting up standalone TAP tests for Postgres, to do, say, more code coverage than we get from the current core tests. So I started experimenting. And with a tiny bit of Makefile magic it turns out to be absurdly easy.

You need just two things: an entirely boilerplate Makefile and a tiny bit of glue at the top of your TAP test files.

First let's look at the Makefile. Here it is in its entirety:


PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)

# install_path for PostgreSQL binaries etc
PG_INSTALL_ROOT := $(dir $(shell $(PG_CONFIG) --bindir))

# where to find PostgresNode and friends
PG_NODE_LOC = $(top_srcdir)/src/test/perl
export PG_NODE_LOC

Then in your TAP test perl file(s) (which should reside in a subdirectory called "t") you put this:

use lib "$ENV{PG_NODE_LOC}";

use PostgresNode;
use Test::More;

local $ENV{PG_REGRESS} = "/bin/true";

my $node_inst = $ENV{PG_INSTALL_ROOT};

# for pre--release-14 releases you would possibly set LD_LIBRARY_PATH
# based on this. For release 14 and up PostgresNode does that for you
# via the install_path parameter.

my $node = PostgresNode->get_new_node('dummy', install_path => $node_inst);


That's all you need. Given that you can run your TAP tests with just a Postgres installation, as in this example:

andrew@emma:tests $ make PG_CONFIG=../inst.head.5701/bin/pg_config installcheck
rm -rf '/home/andrew/pgl/tests'/tmp_check
/usr/bin/mkdir -p '/home/andrew/pgl/tests'/tmp_check
cd ./ && TESTDIR='/home/andrew/pgl/tests' PATH="/home/andrew/pgl/inst.head.5701/bin:$PATH" PGPORT='65701' \
top_builddir='/home/andrew/pgl/tests//home/andrew/pgl/inst.head.5701/lib/postgresql/pgxs/src/makefiles/../..' \
PG_REGRESS='/home/andrew/pgl/tests//home/andrew/pgl/inst.head.5701/lib/postgresql/pgxs/src/makefiles/../../src/test/regress/pg_regress' \
REGRESS_SHLIB='/src/test/regress/' \
/usr/bin/prove -I /home/andrew/pgl/inst.head.5701/lib/postgresql/pgxs

When it comes to monitoring in PostgreSQL, progress reports, able to give the state of an operation at a given point in time, exist since 9.6 and pg_stat_process_vacuum for VACUUM. PostgreSQL 14 is adding a new feature in this area with progress reporting for COPY, as of this commit:

commit: 8a4f618e7ae3cb11b0b37d0f06f05c8ff905833f
author: Tomas Vondra 
date: Wed, 6 Jan 2021 21:46:26 +0100
Report progress of COPY commands

This commit introduces a view pg_stat_progress_copy, reporting progress
of COPY commands.  This allows rough estimates how far a running COPY
progressed, with the caveat that the total number of bytes may not be
available in some cases (e.g. when the input comes from the client).

Author: Josef Šimánek
Reviewed-by: Fujii Masao, Bharath Rupireddy, Vignesh C, Matthias van de Meent

This was the initial commit of the feature, and it got improved in a second commit to have more information.

COPY can be long, very long depending on the amount of data to load with users having no idea how the operation is going to last. So more monitoring capabilities in this area is welcome. The state of operations can be tracked in a new system view called pg_stat_progress_copy that returns one row per backend running a COPY. Several fields are tracked in that:

  • The PID of the backend running the operation.
  • The type of operation: COPY FROM, TO
  • The relation operated on, or just 0 if using a SELECT with COPY FROM.
  • The amount of data processed, thanks to the size of the rows aggregated each time a tuple is processed.
  • The number of tuples processed, or even skipped as an effect of a WHERE clause specified in COPY FROM.
  • The total amount of data from the original file of a COPY FROM. This is not known if the data is provided through a pipe like stdin or ps
Posted by Tatsuo Ishii in SRA OSS, Inc. on 2021-06-18 at 11:43


image by Gerd Altmann from Pixabay

Promoting a standby node

 Pgpool-II manages a streaming replication primary node and multiple standby nodes. Suppose we shutdown the primary node. Pgpool-II detects the event and starts a command called "failover command". The failover command is a user supplied script (usually written in shell or other scripting language), and it choose one of standby nodes to be promoted. Typical script chooses the "next main node", which is the next live node to the former primary node: e.g. if the former primary node is 0, and there are node 1 and 2 standby nodes, node 1 will chosen.

Another way to promote a standby node is to use "pcp_detach_node". The command is one of the control commands for Pgpool-II. pcp_detach_node can "detach" a node so that Pgpool-II  changes the internal status of the specified node to be down (but the node is actually up and running). In this case the failover command is also called and the next node - in the example above node 1 - will be chosen.

Promoting specified node

What if we want to promote node 2, instead of node 1? Actually there had been no way to achieve this until today. The next version of Pgpool-II 4.3, supposed to be released in this winter will allow you do it.

Existing pcp_promote_node command now has extra argument: --switchover or -s.

pcp_promote node --switchover 2

will do followings:

  1.  detach the current primary node. This makes the current primary node to be in "down" status (but the node is actually up and running).
  2.  failover command is triggered. Node 2 is passed to the failover command as the "new main node".
  3.  failover command promotes node 2.
  4.  "follow primary command" runs against node 0 and node 1. Follow primary command is a user supplied script to make a node to be standby node, "following" current primary.
  5. As a result, node 0, 1 are standby following new primary node 2.


Pgpool-II 4.3 will have one of the long awaited features: promoting

Posted by Greg Sabino Mullane in Crunchy Data on 2021-06-17 at 19:36

Data checksums are a great feature in PostgreSQL. They are used to detect any corruption of the data that Postgres stores on disk. Every system we develop at Crunchy Data has this feature enabled by default. It's not only Postgres itself that can make use of these checksums. Some third party tools such as the awesome pgBackRest program can also use them for enhanced data integrity.

Sadly, enabling data checksums is still not the default behavior when creating a new Postgres cluster. When you invoke the initdb program, add the ‑‑data‑checksums flag (or ‑k if you prefer obscure arguments), and your shiny new Postgres cluster will have data checksums enabled.

Contributing to open-source projects can be intimidating – and PostgreSQL is no exception. As a Postgres contributor, I share my hard-earned tips to help you make your first contribution (or contribute more).

The article is published in Timescale blog:

How (and why) to become a PostgreSQL contributor

Posted by Alexey Lesovsky in Data Egret on 2021-06-17 at 10:45

Since the last pgCenter release I’ve been working on some new improvements and now it’s time to introduce you to pgCenter 0.9.0.

Here are a few key features and fixes introduced in this release:

  • session statistics from pg_stat_database (Postgres 14);
  • WAL usage statistics from pg_stat_wal (Postgres 14);
  • progress statistics about running COPY commands from pg_stat_progress_copy (Postgres 14);
  • filesystem stats for top utility;
  • extended statistics about tables sizes;
  • support for millisecond resolution for record and report utilities;
  • naming convention for columns names.

The above items and other changes are explained in more detail in the release notes.

As always, if you have any issues or suggestions write in comments below or open discussions.

The post Upgrade: pgCenter 0.9.0 is out! appeared first on Data Egret.

A “materialized view” is a database object which stores the result of a precalculated database query and makes it easy to refresh this result as needed. Materialized views are an integral feature of pretty much all advanced database systems. Naturally, PostgreSQL also provides support for materialized views, and offers the end-user a powerful tool to handle more time-consuming requests.

The main questions are now: What are the pitfalls, and how can you make use of materialized views in the first place? Let’s dive in and find out.

Creating a materialized view

Before we can actually take a look at materialized views and figure out how they work, we have to import some sample data which we can use as the basis for our calculations:

demo=# CREATE TABLE t_demo (grp int, data numeric);
demo=# INSERT INTO t_demo SELECT 1, random()
FROM generate_series(1, 5000000);
INSERT 0 5000000
demo=# INSERT INTO t_demo SELECT 2, random()
FROM generate_series(1, 5000000);
INSERT 0 5000000

We have created 10 million rows organized in 2 groups. To create a materialized view in PostgreSQL, we can make use of the following syntax specification:

Description: define a new materialized view
     [ (column_name [, ...] ) ]
     [ USING method ]
     [ WITH ( storage_parameter [= value] [, ... ] ) ]
     [ TABLESPACE tablespace_name ]
     AS query
     [ WITH [ NO ] DATA ]

Basically, a materialized view has a name, some parameters, and is based on a query. Here is an example:

           SELECT   grp, avg(data), count(*)
           FROM     t_demo
           GROUP BY 1;

What is really important to note here is the size of the materialized view compared to the underlying table:

demo=# \d+
List of relations
Schema | Name     | Type              | Owner | Persistence | Size   | Description
Posted by Egor Rogov in Postgres Professional on 2021-06-17 at 00:00

The previous two series of articles covered isolation and multiversion concurrency control and logging.

In this series, we will discuss locks.

This series will consist of four articles:

  1. Relation-level locks (this article).
  2. Row-level locks.
  3. Locks on other objects and predicate locks.
  4. Locks in RAM.

The material of all the articles is based on training courses on administration that Pavel Luzanov and I are creating (mostly in Russian, although one course is available in English), but does not repeat them verbatim and is intended for careful reading and self-experimenting.

Many thanks to Elena Indrupskaya for the translation of these articles into English.

General information on locks

PostgreSQL has a wide variety of techniques that serve to lock something (or are at least called so). Therefore, I will first explain in the most general terms why locks are needed at all, what kinds of them are available and how they differ from one another. Then we will figure out what of this variety is used in PostgreSQL and only after that we will start discussing different kinds of locks in detail.


Posted by Bruce Momjian in EDB on 2021-06-16 at 14:45

Having worked with databases for 32 years, I have always lived in a world of locks, sharing, and complexity, but sometimes, I look out the window and wonder, "Why does everything have to be so hard?" After a while, I just give up figuring it out and get back work.

However, a few months ago I watched a video that gave me a new perspective on that question. The video was "The Forgotten Art of Structured Programming" by Kevlin Henney, and was given on a C++-themed cruise. (Postgres needs one of those someday. ) Anyway, the interesting part starts at 1:22:00, and this slide holds the key to the complexity I could not previously express.

Kevlin explains that database programming is hard because shared, mutable state is hard. If data is not shared, or not immutable, things are much easier. This means that no matter what new magical tool or language is created, database programming will continue be hard, and I just have to accept that. One comforting discovery is that someone else had the same reaction to that slide, which makes me feel better.

Posted by Gilles Darold in MigOps on 2021-06-16 at 01:43

One of the common problems while migrating from Oracle to PostgreSQL is the need of Global Temporary Tables in PostgreSQL (GTT). There is no PostgreSQL equivalent for global temporary tables. Due to this reason, I have originally started the developed of an extension called : PGTT, to reduce the overall efforts involved while migrating from Oracle to PostgreSQL. In this article, I am going to discuss about the concept of Global temporary tables in PostgreSQL and how this extension : pgtt works followed by its limitations.

Concept of a Global Temporary Table (GTT)

The rows registered into a global temporary table are private to the session that inserts them, which means that they are only visible to that session but Postgres Global Temporary Tablesnot any other sessions. These rows can be preserved for the session time or only for the current transaction. This is defined at temporary table creation time using the clauses – ON COMMIT PRESERVE ROWS or ON COMMIT DELETE ROWS. If rows are removed at commit or session close time, the Global Temporary table persists among the sessions or DBMS shutdown. In this way, it is like any other standard table, just that it resides in the temporary tablespace and can be accessed by all sessions individually. You can see a short description of the Oracle implementation here

PostgreSQL has the concept of Local Temporary Table but not a Global temporary table. The temporary table must be created by each session that uses it and the lifetime of the rows are either commit or session based exactly like it is with the GTT. PostgreSQL has also an additional clause ON COMMIT DROP that automatically drops the temporary table at the end of a transaction. The temporary table is always dropped when the session ends. Oracle, since 18c, has something quite similar which is called Private Temporary Tables, see here for a short explanation. 

There is some work in progress to implement the Global Temporary Table in PostgreSQL core, see [1], [2] and [3].

How to deal with GTT in a migration ?

While wait

Posted by Denish Patel on 2021-06-15 at 15:14

On a particular project, Materialized Views (MVs) were being used quite extensively in the Apps that were being migrated from Oracle to PostgreSQL.  One thing I noticed was missing was VACUUM ANLYZE on these MVs after being created or refreshed.  It is extremely important to integrate VACUUM ANALYZE commands into this process.  Here is the reasoning behind it.

1. ANALYZE – updates the statistics for the MV so that the PG Optimizer can make the best decision possible for access path determination for SQL statements running against these MVs.

2. VACUUM – creates/updates the PG Visibility Map (VM) so that index-only scans can be used.  Furthermore, if MV refreshes are done using the CONCURRENTLY option, which avoids locking the table in exclusive mode, then PG uses DML statements to update the original MV, thus causing bloat.  Bloat can only be alleviated by a normal vacuum operation which makes dead rows visible again for reuse.

Michael Vitale, PG DBA

It's interesting to know how big is your jsonb and it's not trivial as it looks, since jsonb is a binary format and also may be TOASTed (compressed).

Assume, that jb - is an attribute of type jsonb, than

raw_size = pg_column_size(jb::text::jsonb) -- the size of jsonb in memory
compressed_size = pg_column_size(jb) -- stored size of jsonb (raw_size if not TOAST-ed and non-compressed)
Posted by Laurenz Albe in Cybertec on 2021-06-15 at 09:00

CROSS JOINS - promiscuity extreme!
© Laurenz Albe 2021

For many people, “cross join” is something to be afraid of. They remember the time when they forgot the join condition and the DBA was angry, because the query hogged the CPU and filled the disk. However, there are valid use cases for cross joins which I want to explore in this article.

What is a cross join?

The term comes from relational algebra, which also calls the Cartesian product between two relations cross product and denotes it by A × B. This is the most basic kind of join: it combines every row of one table with every row of another table.

A simple example:

Table A Table B
name birthday street city
John 1989-08-24 Roman Road Kexborough
Paul 2001-10-03 Earl Street Fitzwilliam
Claude 1996-04-18

The cross product of the tables would be:

A × B
name birthday street city
John 1989-08-24 Roman Road Kexborough
Paul 2001-10-03 Roman Road Kexborough
Claude 1996-04-18 Roman Road Kexborough

Part of the release engineering process at Swarm64 is to run performance benchmarks that assure us new versions of Swarm64 DA and PG Nitrous cause PostgreSQL performance to improve and not regress.

We run TPC-H and TPC-DS benchmarks to measure query-intensive analytic (OLAP) performance. For mixed workloads such as hybrid analytic transaction processing (HTAP) or transaction-enhanced analytics (TEA), we’ve developed a new open source benchmark that I’ll describe below.

All of our benchmarks are open source and available in the Swarm64 DA benchmark toolkit on Github.

What is HTAP and why does it matter to PostgreSQL users?

An HTAP database system is able to support transaction processing (OLTP) while simultaneously answering complex queries of the data. HTAP is easier and less costly than running two copies of the database (e.g., a system of record vs. a system of insight), to support OLTP vs. OLAP workloads, respectively. HTAP can also enable queries to return more real-time answers by eliminating the time it takes to replicate or ETL data from one database to another.

The need for benchmarks specific to mixed workloads arises as OLTP and OLAP pose different demands on database architectures, and the combination of both is particularly challenging. This is because a mixed workload is either primarily OLTP (but also running analytical queries), or it’s primarily OLAP (but also executing OLTP updates to the data and maintaining ACID properties). These are called HTAP and TEA, respectively. Since Swarm64 DA users run both HTAP or TEA, we need a way to benchmark Swarm64 DA-accelerated PostgreSQL for both of these scenarios.

Existing HTAP benchmarks

Two HTAP benchmark implementations stand out: CH-benCHmark and HTAPBench. Both, as well as the Swarm64 HTAP benchmark, have a transactional workload based on the TPC-C benchmark, and an analytical workload based on the TPC-H benchmark.

These HTAP benchmark implementations use the same hybrid schema, containing all TPC-C relations along wi

Posted by Andreas 'ads' Scherbaum on 2021-06-14 at 14:00
PostgreSQL Person of the Week Interview with Josh Berkus: I’m from America. I mean, seriously, I grew up in DC, Ohio, Florida, Texas, and California, and now I live in Oregon. So really all over the USA.

I mentioned the problem of safety systems causing safety failures last year. There is a corollary to that that I want to mention related to planned vs unplanned database downtime.

Databases are critical parts of most information technology infrastructure, so downtime is a big concern. There are a few cases where you know your Postgres database will need to be down, if only briefly:

  • Operating system upgrades
  • Minor database upgrades
  • Major database upgrades
  • Application upgrades

You can reduce the downtime duration of these operations by using replica servers, but there is still the complexity of session migration during switchover. Multi-master replication can simplify application switchover, but it can also have a higher probability of unplanned downtime if its complex systems fail.

Continue Reading »

Relational databases started as simple data containers with a relational structure. Over the decades, SQL matured into a language that allows complex processing inside relational databases. The typical life-cycle is that once a client-side feature become well-understood and established, it often moves into a relational database for efficiency and consistency among applications. This has happened for data warehouse workloads (slide 30), full text search, geographic information systems (GIS), non-relational workloads, and JSON. Odds are this migration will continue. Artificial intelligence might be the next area of integration.

Posted by Michael Aboagye on 2021-06-11 at 17:01

Learn how to grant or revoke rows privileges in PostgreSQL. Continue reading Row-level Security Policy in PostgreSQL

The post Row-level Security Policy in PostgreSQL appeared first on Vettabase.

Posted by Greg Smith in Crunchy Data on 2021-06-11 at 17:00

By default Linux uses a controversial (for databases) memory extension feature calledovercommit. How that interacts with PostgreSQL is covered in the Managing Kernel Resources section of the PG manual.

PostgreSQL HA with Patroni

A couple of weeks ago, Jobin and I did a short presentation during Percona Live Online bearing a similar title as the one for this post: “PostgreSQL HA With Patroni: Looking at Failure Scenarios and How the Cluster Recovers From Them”. We deployed a 3-node PostgreSQL environment with some recycled hardware we had lying around and set ourselves at “breaking” it in different ways: by unplugging network and power cables, killing main processes, attempting to saturate processors. All of this while continuously writing and reading data from PostgreSQL. The idea was to see how Patroni would handle the failures and manage the cluster to continue delivering service. It was a fun demo!

We promised a follow-up post explaining how we set up the environment, so you could give it a try yourselves, and this is it. We hope you also have fun attempting to reproduce our small experiment, but mostly that you use it as an opportunity to learn how a PostgreSQL HA environment managed by Patroni works in practice: there is nothing like a hands-on lab for this!

Initial Setup

We recycled three 10-year old Intel Atom mini-computers for our experiment but you could use some virtual machines instead: even though you will miss the excitement of unplugging real cables, this can still be simulated with a VM. We installed the server version of Ubuntu 20.04 and configured them to know “each other” by hostname; here’s how the hosts file of the first node looked like:

$ cat /etc/hosts localhost node1 node1 node2 node3


Patroni supports a myriad of systems for Distribution Configuration Store but etcd remains a popular choice. We installed the version available from the Ubuntu repository on all three nodes:

sudo apt-get install etcd

It is necessary to initialize the etcd cluster from one of the nodes and we did that from node1 using the following configuration file:

$ cat /etc/default/etcd
Posted by Weaponry Weaponry on 2021-06-11 at 04:34

New pgSCV 0.6.0 has been released, with two new features and with minor fixes and improvements.

pgSCV is a Prometheus exporter and monitoring agent for PostgreSQL environment. Project’s goal is to provide a single tool (exporter) for collecting metrics about PostgreSQL and related services.

Global filters. Sometimes you may want to filter some of exposed metrics. For example, metrics about particular users, databases, tables or whatever else. Using filters it is possible to define ‘include’ or ‘exclude’ filtering rules for metric collectors. A tiny example:

exclude: "^.*_(test|staging)$"

In this example, all metrics collected by “postgres/statements” collector with databases which match with specified regular expression, will not be exposed.

Environment variables. The usual way to configure pgSCV is a YAML configuration which is stored in file. Such approach might tricky when using pgSCV with containers — in such case, volumes are required to use. Environment variables allow to configure pgSCV and avoid YAML configs and volumes.

DATABASE_DSN="postgresql://postgres@db/postgres" pgscv

In this example, Postgres connection settings are passed with DATABASE_DSN variable. It is also possible to specify more than one DSN and collect metrics from many Postgres instances. The most of configuration settings could be defined with environment variables.

For more information and examples checkout settings reference and docker tutorial.

This release make pgSCV is more configurable and container-friendly, I hope you will find it useful.