PostgreSQL
The world's most advanced open source database

Relational databases started as simple data containers with a relational structure. Over the decades, SQL matured into a language that allows complex processing inside relational databases. The typical life-cycle is that once a client-side feature becomes well-understood and established, it often moves into the relational database for efficiency and consistency among applications. This has happened for data warehouse workloads (slide 30), full text search, geographic information systems (GIS), non-relational workloads, and JSON. Odds are this migration will continue. Artificial intelligence might be the next area of integration.

Posted by Greg Smith in Crunchy Data on 2021-06-11 at 17:00

By default Linux uses a controversial (for databases) memory extension feature called overcommit. How that interacts with PostgreSQL is covered in the Managing Kernel Resources section of the PG manual.

PostgreSQL HA with Patroni

A couple of weeks ago, Jobin and I did a short presentation during Percona Live Online bearing a similar title as the one for this post: “PostgreSQL HA With Patroni: Looking at Failure Scenarios and How the Cluster Recovers From Them”. We deployed a 3-node PostgreSQL environment with some recycled hardware we had lying around and set about “breaking” it in different ways: by unplugging network and power cables, killing main processes, and attempting to saturate the processors. All of this while continuously writing and reading data from PostgreSQL. The idea was to see how Patroni would handle the failures and manage the cluster to continue delivering service. It was a fun demo!

We promised a follow-up post explaining how we set up the environment, so you could give it a try yourselves, and this is it. We hope you also have fun attempting to reproduce our small experiment, but mostly that you use it as an opportunity to learn how a PostgreSQL HA environment managed by Patroni works in practice: there is nothing like a hands-on lab for this!

Initial Setup

We recycled three 10-year-old Intel Atom mini-computers for our experiment, but you could use virtual machines instead: even though you will miss the excitement of unplugging real cables, this can still be simulated with a VM. We installed the server version of Ubuntu 20.04 and configured them to know “each other” by hostname; here's what the hosts file of the first node looked like:

$ cat /etc/hosts
127.0.0.1 localhost node1
192.168.1.11 node1
192.168.1.12 node2
192.168.1.13 node3

etcd

Patroni supports a myriad of systems as its Distributed Configuration Store (DCS), but etcd remains a popular choice. We installed the version available from the Ubuntu repository on all three nodes:

sudo apt-get install etcd

It is necessary to initialize the etcd cluster from one of the nodes and we did that from node1 using the following configuration file:

$ cat /etc/default/etcd
ETCD_NAME=node1
ETCD_INITIAL_CLUSTER="node1=http://192.168.1.11:2380"
ETCD_INI
[...]
Posted by Weaponry Weaponry on 2021-06-11 at 04:34

New pgSCV 0.6.0 has been released, with two new features and with minor fixes and improvements.

pgSCV is a Prometheus exporter and monitoring agent for PostgreSQL environments. The project’s goal is to provide a single tool (exporter) for collecting metrics about PostgreSQL and related services.

Global filters. Sometimes you may want to filter some of the exposed metrics, for example metrics about particular users, databases, tables or whatever else. Using filters it is possible to define ‘include’ or ‘exclude’ filtering rules for metric collectors. A tiny example:

collectors:
  postgres/statements:
    filters:
      database:
        exclude: "^.*_(test|staging)$"

In this example, metrics collected by the “postgres/statements” collector for databases matching the specified regular expression will not be exposed.

Environment variables. The usual way to configure pgSCV is a YAML configuration stored in a file. This approach might be tricky when running pgSCV in containers, because volumes are then required. Environment variables allow you to configure pgSCV without YAML configs and volumes.

DATABASE_DSN="postgresql://postgres@db/postgres" pgscv

In this example, the Postgres connection settings are passed with the DATABASE_DSN variable. It is also possible to specify more than one DSN and collect metrics from many Postgres instances. Most configuration settings can be defined with environment variables.

For more information and examples, check out the settings reference and the Docker tutorial.

This release makes pgSCV more configurable and container-friendly; I hope you will find it useful.

Posted by Jędrzej Biedrzycki on 2021-06-11 at 03:55

0. Problem statement

In my opinion, one of the features most sorely missing in Postgres has always been the possibility to ignore nulls in analytic functions. Other databases support the ignore nulls clause, which makes the task much easier. Let’s start by creating a simple, two-column table and populating it with data:

create table tbl (
    id bigint primary key,
    a int
);
insert into tbl (id, a)
select s, case when random() < .5 then s end
from generate_series(0, 100) s;

The problem is to generate a column (lnn – last not null) with the last non-null value ordered by id:

id | a | lnn
 0 |   |
 1 |   |
 2 | 2 | 2
 3 |   | 2
 4 |   | 2
 5 |   | 2
 6 | 6 | 6
 7 | 7 | 7
 8 |   | 7
 9 | 9 | 9

Table with a nullable column and a column with gaps filled with the last non-null value
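For reference, one common way to emulate this in plain PostgreSQL is to build groups from a running count of non-null values; this is only a sketch and not necessarily the approach the post goes on to develop:

-- fill gaps with the last non-null value: count(a) ignores nulls, so it
-- forms a group key that only advances on non-null rows
select id, a,
       max(a) over (partition by grp) as lnn
from (
    select id, a,
           count(a) over (order by id) as grp
    from tbl
) s
order by id;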

1. How we would do it in Oracle

As mentioned, in Oracle the solution is straightforward:

select id, a, last_value(a) ignore nulls over (order by id) lnn
from
[...]
Posted by Luca Ferrari on 2021-06-11 at 00:00

How I implemented a kind of Oracle-to-PostgreSQL backup.

Using ora2pg to do a kind of backup

Disclaimer: ora2pg is an amazing tool, but is not supposed to be used as a backup tool!
In this article I’m going to show you how I decided to implement a kind of Oracle-to-PostgreSQL backup by means of ora2pg.

It all started as a simple need: migrate an Oracle database to PostgreSQL to do some experiments.
Therefore I fired up an ora2pg project, and started from there in order to do the migration.
End of the story.
But then I was asked to migrate again the same database, because in the meantime something changed.
And then again, and again.
I’m not saying I was asked to keep the database synchronized, but to sometimes load an updated batch of data (and structures) from Oracle into PostgreSQL.
As lazy as I am, after a couple of requests I produced a simple shell script to automate the job, at least the part about running ora2pg. Yes, this could be less trivial than you think, since ora2pg relies on the Oracle Instant Client being installed (with all the environment set), and on Perl being ready with DBD::Oracle, DBI and the other stuff in the right place. And this is a little complicated on my machines because I tend to experiment, and so I have a lot of different stuff installed; I have to fire up the right Perl, with the right modules, and the right environment (I do use perlbrew, in case you are wondering). In other words, there was some setup work necessary before I could run ora2pg, and that was a perfect candidate for a real shell script.
Then, the number of the databases to do this work on became two, and this was a call for a parametric script…you get the point!
Last but not least, I was not sure about when the migration would happen and when I would be asked to load a new bunch of stuff into PostgreSQL, and since my memory is lazier than me, I don't always remember all the required steps to load the output of ora2pg into our beloved database.
And therefore I decided to write a

[...]

PostgreSQL contains some hidden gems which have been around for many years and help to silently speed up your queries. They optimize your SQL statements in a clever and totally transparent way. One of those hidden gems is the ability to synchronize sequential scans. Actually, this feature has been around for 15+ years, but has gone mostly unnoticed by many end-users. However, if you are running data warehouses and analytical workloads, you might have already used synchronized seq scans without actually knowing it.

Reducing I/O for large SQL queries

Before we draw any conclusions, it is necessary to understand the problem we’re trying to solve in the first place. Consider the following scenario: 10 users are concurrently running analytics on a large table. For the sake of simplicity, we assume that the size of the table is 1 TB and we can read 1 GB / second. In case there is just 1 user, we can read the data in around 17 minutes (assuming there is zero caching). But what if people are reading the table concurrently and there is still zero caching, because those reads are spread all over the PostgreSQL table? In that case, we face a problem:

Number of users | MB / second | Time
1               | 1000        | 16.6 minutes
2               | 500         | 33.2 minutes
4               | 250         | 67 minutes
10              | 100         | 166 minutes

Assuming that the overall throughput is a constant (which of course it is not – especially not on spinning disks) we really have a big problem.

Queries take longer the more users are added to the warehouse!

What is important to understand is that queries take longer and longer as more users are added to your warehouse.
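For reference, this behavior is governed by the synchronize_seqscans parameter, which is enabled by default. A quick sketch of how to inspect it and, if needed, disable it for the current session:

SHOW synchronize_seqscans;        -- 'on' by default

SET synchronize_seqscans = off;   -- affects the current session only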

Synchronized sequential scans in PostgreSQL

Synchronized sequential scans have been around for

[...]
Posted by Joshua Drake in CommandPrompt on 2021-06-09 at 19:39

In many Open Source communities it is difficult to determine who counts as a contributor. Some projects take an exclusive view, requiring a direct contribution to be made to be considered a contributor. Looking at this holistically, we find that the success of a project is found only when there is a mutual connection between the hands-on team and those who support it. Without that connection, PostgreSQL would just be a fever dream of academic pursuit.

Where would Postgres be without the tireless contributions made by volunteers, meetup organizers, the diligent chat and email forum support, the attendance taker at a conference, or free technical blog writers? Every one of these people is important and deserves to be acknowledged. While they may not be on the front lines, so to speak, they are building the community in the most critical way: by using and being excited about Postgres. With this in mind I am recognizing the following individuals as Postgres Contributors:

Grant Zhou

A Director of the Chinese PostgreSQL Association

Lloyd Albin

One of the kindest men I have ever met, he organizes Seattle Postgres and has for over a decade. He has also assisted in bug reporting to Postgresql.org, assists in organizing PostgresConf.org, and has been a great resource for users.

Taha Kadado

The founder and organizer of Dallas - Fort Worth Postgres.

Viral Shah

An Organizer of PostgresConf.org and consummate professional behind the scenes.

Debra Cerda

Goat wrangler and Austin Postgres organizer. Debra has been a behind the scenes Postgres advocate and resource for the user community for years.

Amanda M. Nystrom

A PostgresWarrior that only other conference organizers will understand. She is a co-chair of PostgresConference and manages the relationships with PostgresConf.org sponsors.

Mason Sharp

A long time community member and founder of NYC Postgres.

Ryan Lambert

One of the most energetic members of the community I have met and an expert with PostGIS. He has provid

[...]
Posted by Bruce Momjian in EDB on 2021-06-09 at 18:30

Postgres history goes back 35 years, and it is always interesting for me to hear details about the early years and the project decisions that were made. Fortunately, Joseph M. Hellerstein, the author of GiST, has written and released a public copy of his recollections, which are part of a larger book about Michael Stonebraker. This quote summarizes the paper:

The signature theme of Postgres was the introduction of what he eventually called Object-Relational database features: support for object-oriented programming ideas within the data model and declarative query language of a database system. But Stonebraker also decided to pursue a number of other technical challenges in Postgres that were independent of object-oriented support, including active database rules, versioned data, tertiary storage, and parallelism.

Hellerstein explains how each of these features developed during the Berkeley years, and often adds how they were later enhanced by the PostgreSQL project. He closes by analyzing the reasons for the success of the PostgreSQL project and its impact on the database industry.

I can talk about the benefits of PostgreSQL for application development and operations all day. But there are two enduring topics that are close to my heart: SCRAM (you need to update your passwords to use SCRAM) and range types.

I've been stoked about range types since they were released in PostgreSQL 9.2. Before I joined Crunchy Data, I deployed them heavily in production to help manage a mature scheduling and space booking application. Performance wise, range types provided a fast way to perform sophisticated lookups over temporal data sets!

With all the power and flexibility of range types, there was an unfortunate limitation: working with non overlapping (or noncontiguous) ranges. Now don't get me wrong, it afforded me the opportunity to write some very advanced recursive SQL and build some cool real-time data management systems. However, this was at the cost of convenience and maintainability.

Almost a decade after range types were first introduced, PostgreSQL 14 now adds the ability to write some "boring SQL" to further unlock the power of working with range data. Meet the "multi range" data type.
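As a small taste of the feature, here is a quick sketch against a PostgreSQL 14 beta (not taken from the original post):

-- a multirange literal is a set of ranges in curly braces
SELECT '{[2021-06-01,2021-06-05), [2021-06-10,2021-06-12)}'::datemultirange;

-- containment works across the whole (noncontiguous) set of ranges
SELECT datemultirange(
           daterange('2021-06-01', '2021-06-05'),
           daterange('2021-06-10', '2021-06-12')
       ) @> '2021-06-11'::date;   -- true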

PostgreSQL 14: Déjà Vu All Over Again

Posted by Vik Fearing in EDB on 2021-06-09 at 17:45
Continuing my monthly PostgreSQL benchmarks series, these latest findings are aimed at helping developers improve PostgreSQL performance stability. I discovered the performance stability issue when I had to reset the AWS instance I used for the Daily 500 benchmarks and lost 20% performance. The only explanation I have is that the reset instance physically moved in the data center. From an ongoing benchmarking perspective, this is unacceptable. [Continue reading...]
Posted by Robert Treat in credativ on 2021-06-09 at 13:06

In the recent Postgres Community Survey by Timescale, nearly 50% of users reported that they started using Postgres within the last 5 years, including almost 20% within just the last 2 years. With the pandemic and subsequent lock-downs wiping out so many local user groups and in-person conferences, being able to get help and network with other Postgres users online has never been more important.

Of course the Postgres website lists several community resources which I would encourage you to check out, but given the recent kerfuffle with the freenode irc community I thought it might be good to highlight some additional options. Luckily there are a bunch of them out there, and they are all free to join. The following list is not exhaustive by any means, but these are the regular postgres gatherings that I visit at least occasionally and I think you’ll be able to get something out of as well.

PostgresTeam.Slack.com - Ok, this one is listed on the community page, but since I use slack regularly for work, the Postgres slack team has become my daily driver. The #general channel serves as the primary spot for general Q&A, but there are also topic specific channels like #pgbackrest and #patroni, not to mention general information channels like #postgres-job-offers and #pgsql-commits. So far we’ve had over 10,000 people sign-up for the Postgres slack, and the community continues to grow at a steady pace. If you’ve not used slack before, one of the nice things about it is it has excellent clients for the desktop, through your favorite web browser, and even on mobile; just in case you need to get your postgres fix on the go. (What? I can’t be the only one!)

Stackoverflow - Well, technically https://dba.stackexchange.com, but in any case, your favorite technical question / answer site has a site dedicated to databases. While I have a lot of concerns about the recent purchase, I don't see anything that comes close to an alternative given what it offers. One of the nice things about stackexchange is that it works

[...]

Real life is running in autocommit mode
© Laurenz Albe 2021

When analyzing customers’ problems, I have seen the mess you can get into if you disable autocommit in your interactive client, so I’d like to bring this topic to a wider audience.

What is autocommit?

In PostgreSQL, like in any other ACID-compliant database, each statement runs in a transaction: if the statement causes an error, PostgreSQL undoes all its effects.

You can start transactions explicitly using the START TRANSACTION statement (or BEGIN in PostgreSQL). However, if you don’t do that, different RDBMS behave differently:

  • some, like Oracle, implicitly start a transaction when you issue the first statement, and you have to run a COMMIT statement to end the transaction
  • others, like PostgreSQL, run each statement in its own transaction until you explicitly start a transaction

The latter mode is called autocommit mode.
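A minimal illustration of the difference, assuming a table t exists:

-- autocommit mode: each statement is its own transaction and is
-- committed (or rolled back on error) immediately
INSERT INTO t VALUES (1);

-- explicit transaction: nothing becomes permanent until COMMIT
BEGIN;
INSERT INTO t VALUES (2);
ROLLBACK;   -- the second row is never visible to other sessions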

PostgreSQL and autocommit

PostgreSQL operates in autocommit mode, and there is no way to change that behavior on the server side.

Back in 2002, release 7.3 introduced a database parameter, autocommit, that allowed users to turn autocommit mode off. This proved to be a bad mistake: now the effects of SQL statements varied depending on that database setting, which led to all kinds of confusion and problems. Consequently, release 7.4 in 2003 removed that feature again.

Instead, PostgreSQL now implements autocommit mode on the client side. Almost all database clients and APIs offer a way to turn autocommit off. A few examples:

  • in psql, you can run “\set AUTOCOMMIT off”
  • with JDBC, you can call java.sql.Connection.setAutoCommit(boolean)
  • in psycopg2, you can call connection.set_session(autocommit=True)
  • with pgAdmin 4, you can click the “down” arrow next to the execute icon in the query tool to turn off autocommit
  • with DBeaver, you can click the autocommit icon in the SQL editor to disable autocommit

Note that in the above list, all clients except for psycopg2 run in autocommit mode by default.

The way that cli

[...]
Some time ago Eugen Konkov mailed me that he'd like to have some changes on explain.depesz.com. One of the changes was an actual bug, but the rest were improvements to functionality. I kinda didn't want to do it, but when I looked closer it appeared to me that there are some subtle bugs, and when I'll … Continue reading "Many changes on explain.depesz.com"
Posted by Dimitri Fontaine in CitusData on 2021-06-08 at 11:40
The MACI French podcast honoured me with an invitation to a guest appearance on their weekly schedule. As you can imagine, we talked about many things related to PostgreSQL… and also reacted to some newsworthy articles carefully curated by the MACI team. One of the topics we discussed in the podcast started with looking at PostgreSQL through the angle of it being one of the microservices that your application would be composed of.
Posted by Luca Ferrari on 2021-06-08 at 00:00

PostgreSQL relies on the concept of template databases when creating a new database.

Template Databases

PostgreSQL relies on the concept of templates as a way to create a new database. The idea is similar to that of /etc/skel on Unix operating systems: whenever you create a new user, their home directory is cloned from /etc/skel. In PostgreSQL the idea is similar: whenever you create a new database, it is cloned from a template database.

PostgreSQL ships with two template databases, namely template1 and template0.
template1 is the default template, meaning that when you execute a CREATE DATABASE the system clones that database to create the new one. In other words:



CREATE DATABASE foo;



is the same as

CREATE DATABASE foo WITH TEMPLATE template1;



One advantage of this technique is that whatever objects you put into template1, you will find in the new database(s). This can be handy when you have to deal with multiple databases with similar or identical objects, but it can be a nightmare if you screw up your template database.
Then there is template0, which is used as a backup for template1 (in case you screw it up) or as a special template database for handling particular situations, such as a different encoding.
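You can see which databases are flagged as templates by querying pg_database; a quick sketch:

SELECT datname, datistemplate, datallowconn
FROM pg_database
ORDER BY datname;
-- template0 also has datallowconn = false, so you cannot connect to it directly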

Working with different templates

You can create your own template database, which you can then use as a base to create other databases:



template1=# CREATE DATABASE my_template WITH
           IS_TEMPLATE = true;
CREATE DATABASE

template1=# CREATE DATABASE a_new_database
            WITH TEMPLATE my_template;
CREATE DATABASE



Having templates is handy; however, a database does not have to be flagged as a template in order to be used as the base for a new one. Change the previous database so that it is no longer a template database, and then build another database from it:



template1=# ALTER DATABASE my_template
            WITH IS_TEMPLATE = false;
ALTER DATABASE

template1=# CREATE DATABASE a_new_database_from_no_template
            WITH TEMPLATE my_template;
CR
[...]
Posted by Bruce Momjian in EDB on 2021-06-07 at 18:30

Postgres is a convenient platform to manage and manipulate date and timestamp information, and included in that is the ability to manage time zones. Postgres can be compiled to use pre-installed time zone definitions, or --with-system-tzdata can be used to specify an external time zone definition directory.

Once installed, the system view pg_timezone_names can report the supported time zones:

SELECT * FROM pg_timezone_names ORDER BY name LIMIT 5;
        name        | abbrev | utc_offset | is_dst
--------------------+--------+------------+--------
 Africa/Abidjan     | GMT    | 00:00:00   | f
 Africa/Accra       | GMT    | 00:00:00   | f
 Africa/Addis_Ababa | EAT    | 03:00:00   | f
 Africa/Algiers     | CET    | 01:00:00   | f
 Africa/Asmara      | EAT    | 03:00:00   | f

Continue Reading »

Posted by David Christensen in Crunchy Data on 2021-06-07 at 17:12

We spend time day in, day out, answering the questions that matter and coming up with solutions that make the most sense. However, sometimes a question comes up that is just so darn…interesting that even if there are sensible solutions or workarounds, it still seems like a challenge just to take the request literally. Thus was born this blog series, Devious SQL.

Posted by Andreas 'ads' Scherbaum on 2021-06-07 at 14:00
PostgreSQL Person of the Week Interview with Tom Kincaid: I live outside Boston MA. I was born here and spent the majority of my life around here.
Posted by Weaponry Weaponry on 2021-06-07 at 09:12

A week ago I announced pgSCV — a new metrics exporter for PostgreSQL. After that, some people asked me about dashboards — it would be nice to provide dashboards for pgSCV. Making dashboards was already on my todo list, so the request didn’t surprise me. This is understandable: pgSCV provides a lot of metrics, and even seasoned DBAs need a lot of time to understand all of them and then build a dashboard.

I’ve made and published dashboards which cover most of the metrics from pgSCV, but not all of them. In this post I am going to give a quick review and tell what is inside the dashboards.

I would like to emphasize that these are initial versions of the dashboards; they will be extended or changed in the future. There are currently three dashboards, because pgSCV can expose metrics about PostgreSQL, Pgbouncer and the operating system. In future releases of pgSCV it is planned to collect metrics about other Postgres-related tools, like pgBackRest, Patroni and similar, hence the list of dashboards will also be extended.

As the title suggests, all dashboards are made for Grafana. I am quite familiar with Grafana, and IMHO this is a well-known and widely used tool for visualizing metrics. All created dashboards use a Prometheus data source. To be honest, I made them using VictoriaMetrics storage, but all queries are written in the PromQL dialect. Perhaps later I will make extra dashboards with MetricsQL.

Currently, three dashboards are available:
- PostgreSQL
- Pgbouncer
- System

What is in the PostgreSQL dashboard?

Overview — this is a panel with RED metrics — requests, errors, duratio

[...]
Posted by Michael Paquier on 2021-06-06 at 06:42

PostgreSQL has gained two features that help in getting information about the memory usage of sessions. First, as of this commit, there is one new system view that reports the memory usage for a session:

commit: 3e98c0bafb28de87ae095b341687dc082371af54
author: Fujii Masao 
date: Wed, 19 Aug 2020 15:34:43 +0900
Add pg_backend_memory_contexts system view.

This view displays the usages of all the memory contexts of the server
process attached to the current session. This information is useful to
investigate the cause of backend-local memory bloat.

[...]

Author: Atsushi Torikoshi, Fujii Masao
Reviewed-by: Tatsuhito Kasahara, Andres Freund, Daniel Gustafsson, Robert Haas, Michael Paquier
Discussion: https://postgr.es/m/72a656e0f71d0860161e0b3f67e4d771@oss.nttdata.com

This view, called pg_backend_memory_contexts, displays the current memory usage of the session accessing the view. The implementation of such a feature is possible out-of-core, as one has access to the low-level APIs and structures for memory contexts, mainly via src/include/nodes/memnodes.h, but it is nice to get access to this facility without a dependency on an external module.
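For example, a quick way to spot the largest memory contexts of the current session (a sketch based on the view's columns):

SELECT name, parent, level, total_bytes, used_bytes
FROM pg_backend_memory_contexts
ORDER BY total_bytes DESC
LIMIT 5;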

The logic starts from the TopMemoryContext and cascades down to each child memory context, calling on the way a set of callbacks to grab the statistics for each memory context, printing one tuple per memory context. There are two extra things to track down the pyramidal structure of the contexts: a depth level of each child and the name of the parent context. Getting the memory statistics was also possible with a debugger after logging into the host, but that can prove to be annoying in our cloud-ish days where logging into the host is not possible. Aggregating this data is a nice bonus as well here.

Note that by default access to this view is restricted to superusers, but it can be granted to other roles. One area where this is useful is the possibility to track the amount of cache used by the session currently

[...]
Posted by Andrew Dunstan in EDB on 2021-06-05 at 16:01

In keeping with modern git practice, I have renamed the default branch on both the client-code and server-code repositories to main. If you have a clone of either of these repositories, you can adapt to the change by doing the following commands in your local repo:

git checkout master
git branch -m main
git fetch
git branch --unset-upstream
git branch -u origin/main
git symbolic-ref refs/remotes/origin/HEAD refs/remotes/origin/main
git fetch -p

There is still a bunch of work to do to ensure that things run smoothly when source repositories such as The PostgreSQL Git repository make this change, as will surely happen at some stage. I will be testing this on a small source repository I control - the plan is that when it happens it will be transparent to buildfarm animals.

Posted by Ryan Lambert on 2021-06-05 at 05:01

This post examines a common database design decision involving the choice of using BIGINT versus INT data types. You may already know that the BIGINT data type uses twice the storage on disk (8 bytes per value) compared to the INT data type (4 bytes per value). Knowing this, a common decision is to use INT wherever possible, only resorting to BIGINT when it was obvious* that the column will be storing values greater than 2.147 billion (the max of INT).

That's what I did too, until 2-3 years ago! I started changing my default mindset to using BIGINT over INT, reversing my long-held habit. This post explains why I default to using BIGINT and examines the performance impacts of the decision.

TLDR;

As I conclude at the end:

The tests I ran here show that a production-scale database with properly sized hardware can handle that slight overhead with no problem.

Why default to BIGINT?

The main reason to default to BIGINT is to avoid INT to BIGINT migrations. The need to do an INT to BIGINT migration comes up at the least opportune time and the task is time consuming. This type of migration typically involves at least one column used as a PRIMARY KEY and that is often used elsewhere as a FOREIGN KEY on other table(s) that must also be migrated.

In the spirit of defensive database design, BIGINT is the safest choice. Remember the *obvious part mentioned above? Planning and estimating is a difficult topic and people (myself included) get it wrong all the time! Yes, there is overhead for using BIGINT, but I believe the overhead associated with the extra 4 bytes is trivial for the majority of production databases.
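The 4-byte versus 8-byte difference is easy to see directly; a small sketch:

SELECT pg_column_size(2147483647::int)    AS int_bytes,     -- 4
       pg_column_size(2147483647::bigint) AS bigint_bytes;  -- 8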

Posted by Bruce Momjian in EDB on 2021-06-04 at 16:15

I wrote a blog entry in 2018 about how to sign data rows so their authenticity can be verified by anyone using a public key. In talking to Chapman Flack, he mentioned that another option is to store data signatures in the database even if the data is stored outside the database. This allows the database to serve as a central location to verify the authenticity of externally-stored data. This is particularly useful if you trust your database storage but not your external storage.

Posted by Stefan Fercot in EDB on 2021-06-03 at 00:00

I got the question the other day from a friend: “does pgBackRest work on SUSE?”. Having to admit I never really used SUSE, and not knowing what to answer, I decided to give it a try. Let’s see in this short post how far we can go.


Vagrant box

The openSUSE project provides several vagrant boxes including Leap 15.2:

$ vagrant init opensuse/Leap-15.2.x86_64
$ vagrant up
$ vagrant ssh

vagrant@localhost:~> cat /etc/os-release 
NAME="openSUSE Leap"
VERSION="15.2"

PostgreSQL

Thanks to the packaging team, we can use the zypper package manager to install PostgreSQL and/or other components.

In case you’d use SLES (aka. SUSE Linux Enterprise Server), some package dependencies requiring access to specific repositories not hosted at postgresql.org might be needed.

Let’s now install PostgreSQL 13 on the vagrant box:

root# zypper addrepo https://download.postgresql.org/pub/repos/zypp/repo/pgdg-sles-15-pg13.repo
root# zypper refresh
root# zypper install postgresql13-server
root# export PGSETUP_INITDB_OPTIONS="--data-checksums"
root# /usr/pgsql-13/bin/postgresql-13-setup initdb
root# systemctl enable postgresql-13
root# systemctl start postgresql-13

Check if the cluster is running:

postgres$ ps -o pid,cmd fx
  PID CMD
 5359 ps -o pid,cmd fx
 5194 /usr/pgsql-13/bin/postmaster -D /var/lib/pgsql/13/data/
 5195  \_ postgres: logger 
 5197  \_ postgres: checkpointer 
 5198  \_ postgres: background writer 
 5199  \_ postgres: walwriter 
 5200  \_ postgres: autovacuum launcher 
 5201  \_ postgres: stats collector 
 5202  \_ postgres: logical replication launcher

pgBackRest packages

The latest pgBackRest release is available in the PostgreSQL community repository:

root# zypper install pgbackrest
postgres$ pgbackrest version
pgBackRest 2.33

Configuration

We’ll now prepare the configuration for our demo stanza:

# /etc/pgbackrest.conf
[global]
repo1-path=/var/lib/pgbackrest
repo1-retention-full=1
process-max=2
log-level-console=info
log-level-file=debug
sta
[...]
Posted by Luca Ferrari on 2021-06-03 at 00:00

pgbackrest has been inserted into the FreeBSD ports!

pgbackrest lands on FreeBSD!

At last it happened: pgbackrest, my favourite backup solution for PostgreSQL, is now available in the FreeBSD ports tree, my favourite operating system!

Thanks to the efforts of people involved in this issue it is now possible to get pgbackrest installed easily (or in a simpler way) on FreeBSD!

Did you know PostgreSQL ships with a pre-built trigger function that can speed up UPDATES?

PostgreSQL Builtin Trigger Function to Speed Up Updates

PostgreSQL ships with an internal trigger function, named suppress_redundant_updates_trigger, that can be used to avoid idempotent updates on a table.
The online documentation explains very well how to use it, including the fact that the trigger should be fired last in a trigger chain, and so the trigger name should be alphabetically the last one in natural sorting.
But is it worth using such a function?
Let’s find out with a very trivial example on the well-known pgbench database. First of all, let’s consider the initial setup:



pgbench=> SELECT count(*), 
          pg_size_pretty( pg_relation_size( 'pgbench_accounts' ) ) 
          FROM pgbench_accounts;
  count   | pg_size_pretty 
----------|----------------
 10000000 | 1281 MB
(1 row)



Now, let’s execute an idempotent UPDATE, that is, one that does not change anything, and monitor the timing:



pgbench=> \timing
Timing is on.
pgbench=> UPDATE pgbench_accounts SET filler = filler;
UPDATE 10000000
Time: 307939,763 ms (05:07,940)

pgbench=> SELECT pg_size_pretty( pg_relation_size( 'pgbench_accounts' ) );
 pg_size_pretty 
----------------
 2561 MB
(1 row)

Time: 180,732 ms




Note how the table has doubled in size: this is because of the bloat caused by every row being replaced by an exact copy of itself.
Now, let’s create the trigger using the suppress_redundant_updates_trigger function and run the same update again, after a server restart to also clear the memory.



pgbench=> CREATE TRIGGER tr_avoid_idempotent_updates
BEFORE UPDATE ON pgbench_accounts
FOR EACH ROW
EXECUTE FUNCTION suppress_redundant_updates_trigger();

-- restart the server

pgbench=> \timing
Timing is on.
pgbench=> UPDATE pgbench_accounts SET filler = filler;
UPDATE 0
Time: 287588,607 ms (04:47,589)

pgbench=> SELECT pg_size_pretty( pg_relation_size( 'pgbench_accou
[...]
Posted by Ibrar Ahmed in Percona on 2021-06-02 at 19:10
postgres_fdw Postgresql 14

It’s exciting times in the PostgreSQL world with the version 14 beta released a few days ago. It’s now time to look under the hood and find out what has changed from version 13, see what has improved, and what behaviors are changed that we should be aware of. Putting it all in a single blog would take weeks to write and days to read, so this one will focus solely on the changes expected in the Foreign Data Wrapper in the GA release of PostgreSQL version 14.

Foreign Data Wrappers (FDWs) provide a mechanism by which regular SQL queries can be used to access data that resides outside PostgreSQL. There are many different FDWs available; however, PostgreSQL itself comes with a “File FDW” and a “PostgreSQL FDW”. A PostgreSQL FDW may seem counterintuitive, but it is an extremely useful feature. And there have been some very useful updates to this FDW.

So, let’s start understanding what has changed.

Performance Feature

If you are already using PostgreSQL FDW for any use case, take note of the performance improvements.

1 – Parallel /Async Foreign Scans

(Allow a query referencing multiple foreign tables to perform foreign table scans in parallel)

Remote aggregations and remote joins might have been a performance nightmare when performed across multiple servers. The performance benefit comes from the parallelization of ForeignScan, which can now be executed asynchronously in parallel. The sequential execution previously was very slow, and in some cases, too slow. For this, a new server option, “async_capable”, has been added, which allows for parallel planning and execution of ForeignScan.

Create Servers and User-Mappings

-- Create foreign server 1.
CREATE SERVER postgres_svr1
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host '127.0.0.1', async_capable 'true');

-- Create foreign server 2.
CREATE SERVER postgres_svr2
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host '127.0.0.1', async_capable 'true');

CREATE USER MAPPING FOR vagrant
SERVER postgres_svr1
OPTIONS (user 'postgres', password 'pass');

CREATE
[...]
Posted by Bruce Momjian in EDB on 2021-06-02 at 14:45

I created pg_cryptokey in 2019, which allows encryption key management at the SQL-level. A more sophisticated project is pgsodium, which uses libsodium for encryption and decryption. It is similar to pgcrypto, but has key management like pg_cryptokey. If you have ever wanted pgcrypto with integrated key management, pgsodium is definitely worth considering.

Autovacuum has been part of PostgreSQL for a long time. But how does it really work? Can you simply turn it on and off? People keep asking us these questions about enabling and disabling autovacuum a lot. PostgreSQL relies on MVCC to handle concurrency in a multiuser environment. The problem which arises when handling concurrent transactions is that dead tuples must be cleaned up. In PostgreSQL this is handled by the VACUUM command, which we already have covered in some other posts. However, running VACUUM manually is a thing of the past. Most people rely on the PostgreSQL autovacuum daemon to handle cleanup.

How autovacuum works

The first thing to understand is that autovacuum really does what it says: basically, it is automation around manual VACUUM. All it does is to sleep for a while and check periodically if a table requires processing. There are three things autovacuum takes care of:

  • Creating statistics for the PostgreSQL optimizer (ANALYZE)
  • Cleaning out dead rows
  • Handling wraparound protection

In PostgreSQL, autovacuum is a server-side daemon which is always there. Yes, that’s right: ALWAYS. Even if you turn autovacuum off in postgresql.conf (or by using ALTER SYSTEM to adjust postgresql.auto.conf), the daemon will still be around – by design – to help with wraparound protection.

The way autovacuum works is: it periodically checks if work has to be done, and notifies the postmaster in case new workers have to be launched to take care of a table. Autovacuum does not launch a worker directly, but works indirectly through the postmaster to make sure that all processes are on one level. The fact that it works through the postmaster clearly helps to make the system more reliable.
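A quick way to confirm that the daemon is enabled and to see when it last processed your tables (a small sketch using the standard statistics views):

SHOW autovacuum;

SELECT relname, last_autovacuum, last_autoanalyze, n_dead_tup
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 5;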

Let’s take a closer look at what autovacuum does.

Autovacuum creates optimizer statistics

The PostgreSQL optimizer relies heavily on statistics. It estimates the number of rows returned by various operations and tries to guess the best way to optimize a query.
The statistical distributi

[...]