
Read our interview with Fabrice Vaillant who will talk about going From a Legacy Relational DB to an Event Queue.

This article gives a step-by-step guide to using the Machine Learning capabilities of 2UDA. In this article, we’ll use the example of animals to predict whether they are mammals, birds, fish or insects. Software versions: we’re going to use 2UDA version 11.6-1 to implement the Machine Learning model. This version 11.6-1 combines: PostgreSQL 11.6, Orange […]

Read our interview with Daniel Vérité who will talk about Staying Safe from ACID Rains.

Posted by Florian Nadler in Cybertec on 2020-02-25 at 10:30

Last time we imported OpenStreetMap datasets from Iceland to PostGIS.
To quickly visualize our results, we will now use QGIS to render our datasets and generate some nice maps on the client.

Let’s start with our prerequisites:

  1. A PostGIS-enabled PostgreSQL database preloaded with OpenStreetMap datasets, as described in the previous blogpost
  2. An up-and-running QGIS client, see for further instructions
  3. A Git client to check out some predefined styles for QGIS

Import datasets into QGIS

QGIS supports PostgreSQL/PostGIS out of the box.
We can import our datasets by defining our database connection (1), browsing the database catalog (2) and adding the datasets of interest (3) to our workspace.

[Screenshots: Visualizing OSM data in QGIS, steps 1 to 3]


QGIS will automatically apply basic styles for the given geometry types and will start loading features for the current viewport and extent. To do so, the tool uses information from PostGIS’s geometry_columns view to extract the geometry column, geometry type, spatial reference identifier and number of dimensions. To check what PostGIS is offering to QGIS, let’s query this view:

select f_table_name as tblname, f_geometry_column as geocol, coord_dimension as geodim, srid, type
from geometry_columns
where f_table_schema = 'osmtile_iceland';


This brings up the following results:

 tblname          | geocol | geodim | srid | type
------------------+--------+--------+------+-------
 planet_osm_point | way    |      2 | 3857 | POINT
 planet_osm_line  | way    |      2 | 3857 | L

Read our interview with Christophe Pettus who will talk about Database Disasters and How to Find Them.

A PostgreSQL procedural language handler needs to look up the body of the user defined function or stored procedure for the code to execute.  Remember that the C function executed to handle PL/Julia user defined functions and stored procedures is defined in the SQL file. Thus the pljulia_call_handler() function needs to be updated to retrieve […]

A very non-scientific comparison of the two database engines.

Usage of disk space in Oracle and PostgreSQL

A few days ago I built a table in Oracle (11, if that matters) to store a few hundred megabytes of data. But I don’t feel at home using Oracle, so I decided to export the data and import it back in PostgreSQL 12.
Surprisingly, PostgreSQL requires more disk space to store the same amount of data.

I’m not saying anything about which is best, and I don’t know the exact reasons why this happens; this is just what I’ve observed, hoping it can be useful to someone else!
So please don’t flame!

Table structure

The table is really simple, and holds data about files on a disk. It does not even have a key, since it is just data I must mangle and then throw away.

testdb=> \d my_schema.my_files
                     Table "my_schema.my_files"
  Column   |          Type           | Collation | Nullable | Default
-----------+-------------------------+-----------+----------+---------
 filename  | character varying(200)  |           |          |
 directory | character varying(2048) |           |          |
 md5sum    | character varying(128)  |           |          |
 bytes     | bigint                  |           |          |

I’ve seen no difference between using text and varchar; I used the latter just to keep the definition as similar as possible to Oracle’s.
The table is populated with 1872529...
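As an aside not taken from the original post, the space consumed on the PostgreSQL side can be measured with the built-in size functions (the table name matches the example above):

```sql
-- Heap size alone, and total size including indexes and TOAST
SELECT pg_size_pretty(pg_relation_size('my_schema.my_files'))       AS heap_size,
       pg_size_pretty(pg_total_relation_size('my_schema.my_files')) AS total_size;
```

For cross-engine comparisons, pg_total_relation_size is usually the fairer number, since TOAST and index overhead also occupy disk.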

Posted by movead li in Highgo Software on 2020-02-21 at 01:32

Locks are an essential part of a database system. In PostgreSQL there are various locks, such as table locks, row locks, page locks, transaction locks, advisory locks, etc. Some of these locks are acquired automatically during the operation of the database system to implement database functionality, and some are taken manually through SQL commands. This blog explores the locks in PostgreSQL.

1. A Little Bit About pg_locks

It is the locks view in PostgreSQL. Except for row locks taken by the SELECT … FOR commands, we can observe all other locks existing in the database through this view. There is a granted attribute in the view: if it is true, the process has acquired the lock; otherwise, the process is waiting for the lock.

2. Table-level Lock

ACCESS SHARE, ROW SHARE, ROW EXCLUSIVE, SHARE UPDATE EXCLUSIVE, SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE and ACCESS EXCLUSIVE are all the table-level locks, and below is the blocking relationship among them.
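As a minimal sketch (not part of the original post), the pg_locks view from section 1 can be joined against pg_class to watch table-level locks:

```sql
-- Which table-level locks are held or awaited, and by which backend
SELECT c.relname,
       l.mode,     -- e.g. AccessShareLock, RowExclusiveLock
       l.pid,
       l.granted   -- false: the backend is still waiting for the lock
FROM pg_locks l
JOIN pg_class c ON c.oid = l.relation
WHERE l.locktype = 'relation'
ORDER BY c.relname;
```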

Posted by Andreas 'ads' Scherbaum on 2020-02-20 at 17:54
This website will publish interviews with people who make the PostgreSQL Project what it is today.

PostgreSQL Conference Germany 2020 in Stuttgart, Germany, on May 15th is now open for registration.

The Call for Papers is already closed. We are working with the last speakers to confirm their talks, and a full schedule will be published soon.

There are still a few "EARLYBIRD" tickets available until the end of February.

See you in Stuttgart!

If you are using Azure PostgreSQL and have upgraded your client side libpq to version 12 (which can happen automatically for example if you use the PostgreSQL apt repositories), you may see connection attempts fail with symptoms like:

$ psql -dpostgres -UXXXXX_dba@ZZZ-db01
psql: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

With no log information whatsoever available. This can happen if your client is in a Kerberos environment and has valid Kerberos credentials (which can be verified with the klist command). In this case, PostgreSQL 12 will attempt to negotiate GSSAPI encryption with the server, and it appears the connection handler in Azure PostgreSQL is unable to handle this and just kills the connection.

When running the same thing against a local PostgreSQL server prior to version 12, a message like the following will show up in the log:

2020-02-20 10:48:08 CET [35666]: [2-1] client= FATAL:  unsupported frontend protocol 1234.5680: server supports 2.0 to 3.0

This is a clear indicator of what's going on, but unfortunately the information isn't always available when connecting to a managed cloud service, such as Azure PostgreSQL. The hard error from Azure also prevents libpq from retrying without GSSAPI encryption, which is what would happen when connecting to a regular PostgreSQL backend or for example through pgbouncer.

The fix/workaround? Disable GSSAPI encryption in the client:

$ export PGGSSENCMODE=disable
$ psql -dpostgres -UXXXXX_dba@ZZZ-db01
Password for user XXXXX_dba@ZZZ-db01:
psql (11.6 (Ubuntu 11.6-1.pgdg16.04+1), server 9.5.20)
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.


If you have this type of issue, it's probably worth putting this environment variable in your startup scripts. It can also be set using the gssencm

Posted by Darafei Praliaskouski in PostGIS on 2020-02-20 at 00:00

The PostGIS Team is pleased to release PostGIS 3.0.1.

Best served with PostgreSQL 12.2, GEOS 3.8.0, SFCGAL 1.3.7, GDAL 3.0.4, PROJ 6.3.1, protobuf-c 1.3.3, json-c 0.13.1.

Posted by Amit Kapila in EnterpriseDB on 2020-02-19 at 11:07
This blog post is about the journey of parallelism in PostgreSQL so far and what is in store for the future. Since PostgreSQL 9.6, where the first parallel query feature arrived, each release has improved it. Below is a brief overview of the parallel query features added in each release.

PG9.6 has added Parallel execution of sequential scans, joins, and aggregates.

PG10 has added (a) Support parallel B-tree index scans, (b) Support parallel bitmap heap scans, (c) Allow merge joins to be performed in parallel, (d) Allow non-correlated subqueries to be run in parallel, (e) Improve ability of parallel workers to return pre-sorted data and (f) Increase parallel query usage in procedural language functions.

PG11 has added (a) Allow parallel building of a btree index, (b) Allow hash joins to be performed in parallel using a shared hash table, (c) Allow parallelization of commands CREATE TABLE ... AS, SELECT INTO, and CREATE MATERIALIZED VIEW, (d) Allow UNION to run each SELECT in parallel if the individual SELECTs cannot be parallelized, (e) Allow partition scans to more efficiently use parallel workers, (f) Allow LIMIT to be passed to parallel workers, this allows workers to reduce returned results and use targeted index scans, (g) Allow single-evaluation queries, e.g. WHERE clause aggregate queries, and functions in the target list to be parallelized.

PG12 has added Allow parallelized queries when in SERIALIZABLE isolation mode.

Now for the progress in PG13 with respect to parallelism. Some of the important advancements are:
(a) Parallel vacuum - This feature allows the vacuum to leverage multiple CPUs in order to process indexes.  This enables us to perform index vacuuming and index cleanup with background workers.  This adds a PARALLEL option to VACUUM command where the user can specify the number of workers that can be used to perform the command which is limited by the number of indexes on a table.  Specifying zero as a number of workers will disable parallelism.  For more informatio
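The PARALLEL option described above can be sketched as follows (the table name is hypothetical):

```sql
-- Vacuum the indexes of a table with up to 4 background workers
VACUUM (PARALLEL 4) my_big_table;

-- A worker count of zero disables parallelism for this command
VACUUM (PARALLEL 0) my_big_table;
```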

The PostgreSQL caching system has always been a bit of a mystery to many people, and many have asked me during consulting or training sessions: how can I figure out what the PostgreSQL I/O cache really contains? What is in shared buffers, and how can one find out? This post will answer these kinds of questions as we dive into the PostgreSQL cache.


Creating a simple sample database

Before we can inspect shared buffers, we have to create a little database. Without data, what we are going to do is not too useful:

 hs@hansmacbook ~ % createdb test 

To keep it simple I have created a standard pgbench database containing 1 million rows as follows:

 hs@hansmacbook ~ % pgbench -i -s 10 test
dropping old tables...
NOTICE: table "pgbench_accounts" does not exist, skipping
NOTICE: table "pgbench_branches" does not exist, skipping
NOTICE: table "pgbench_history" does not exist, skipping
NOTICE: table "pgbench_tellers" does not exist, skipping
creating tables...
generating data...
100000 of 1000000 tuples (10%) done (elapsed 0.14 s, remaining 1.25 s)
200000 of 1000000 tuples (20%) done (elapsed 0.27 s, remaining 1.10 s)
300000 of 1000000 tuples (30%) done (elapsed 0.41 s, remaining 0.95 s)
400000 of 1000000 tuples (40%) done (elapsed 0.61 s, remaining 0.91 s)
500000 of 1000000 tuples (50%) done (elapsed 0.79 s, remaining 0.79 s)
600000 of 1000000 tuples (60%) done (elapsed 0.92 s, remaining 0.62 s)
700000 of 1000000 tuples (70%) done (elapsed 1.09 s, remaining 0.47 s)
800000 of 1000000 tuples (80%) done (elapsed 1.23 s, remaining 0.31 s)
900000 of 1000000 tuples (90%) done (elapsed 1.37 s, remaining 0.15 s)
1000000 of 1000000 tuples (100%) done (elapsed 1.49 s, remaining 0.00 s)
creating primary keys...

Deploying 1 million rows is pretty fast. In my case it took around 1.5 seconds (on my laptop).

Deploying pg_buffercache

Now that we have some data we can install the pg_buffercache extension which is ideal if you want to inspect the content of the PostgreSQL I/O cache:
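The excerpt breaks off here; the installation it leads up to, and a first look at the cache, might look like this (a sketch: the grouping query is the commonly documented way to inspect pg_buffercache, not necessarily the author's):

```sql
CREATE EXTENSION pg_buffercache;

-- Count cached 8kB buffers per relation in the current database
SELECT c.relname, count(*) AS buffers
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 10;
```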

Recently, on irc, there were couple of cases where someone wanted to use uuid as datatype for their primary key. I opposed, and tried to explain, but IRC doesn't really allow for longer texts, so figured I'll write a blogpost. First problem – UUID values are completely opaque. That means – uuids generated for table … Continue reading "Why I’m not fan of uuid datatype"
Couple of days ago RhodiumToad reported, on irc, a bug in Specifically – if explain was done using JSON/XML/YAML formats, and node type was Aggregate, the site didn't extract full info. In text explains the node type is one of: Aggregate HashAggregate GroupAggregate But in non-text formats, type of Aggregate was ignored. As of … Continue reading "Fix for displaying aggregates on"
Some time ago I wrote blogpost which showed how to list tables that should be autovacuumed or autoanalyzed. Query in there had one important problem – it didn't take into account per-table settings. Specifically – it only used system-wide values for: autovacuum_analyze_scale_factor autovacuum_analyze_threshold autovacuum_vacuum_scale_factor autovacuum_vacuum_threshold but these can be also set per table using syntax … Continue reading "Which tables should be auto vacuumed or auto analyzed – UPDATE"
On 12th of February 2020, Michael Paquier committed patch: Add %x to default PROMPT1 and PROMPT2 in psql   %x can be used to track if the current connection is in a transaction block or not, and adding it by default to the prompt has the advantage to not need a modification of .psqlrc, something … Continue reading "Waiting for PostgreSQL 13 – Add %x to default PROMPT1 and PROMPT2 in psql"
On 6th of February 2020, Michael Paquier committed patch: Add leader_pid to pg_stat_activity   This new field tracks the PID of the group leader used with parallel query. For parallel workers and the leader, the value is set to the PID of the group leader. So, for the group leader, the value is the same … Continue reading "Waiting for PostgreSQL 13 – Add leader_pid to pg_stat_activity"

To avoid having concurrent transactions interfere with each other, SQL engines implement isolation as a feature. This property corresponds to the I letter in the well known ACID acronym, the other properties being Atomicity, Consistency and Durability.

Isolation happens to be configurable, with different levels that correspond to different behaviors when executing concurrent transactions.

The SQL-1992 standard defines four levels of isolation, from the weakest to the strongest:

  • Read Uncommitted: a transaction sees the changes of other transactions before they are committed. PostgreSQL does not implement this mode.

  • Read Committed: a transaction sees changes from others as soon as they have committed.

  • Repeatable Read: when a transaction reads back a row that has been already read by a previous query, it must read the same values, even if the row has been changed by another transaction that has committed in the meantime.

  • Serializable: a transaction cannot see or produce results that could not have occurred if other transactions were not concurrently changing the data. PostgreSQL implements true serializability since version 9.1.

If you’re porting code from MySQL or writing code targeting both PostgreSQL and MySQL/MariaDB, you might want to care about the default level being Read Committed in PostgreSQL and Repeatable Read in MySQL or MariaDB (with the InnoDB engine). By the way, the SQL standard says that Serializable should be used by default, so both engines depart from this recommendation (just like Oracle, MS SQL Server, DB2, Sybase ASE…, which ignore it as well).
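As a small illustrative sketch (not from the original article), the session default can be inspected, and a stricter level chosen per transaction:

```sql
-- Show the session default (read committed in PostgreSQL)
SHOW default_transaction_isolation;

-- Opt into a stricter level for a single transaction
BEGIN ISOLATION LEVEL REPEATABLE READ;
-- ... queries that need a stable snapshot ...
COMMIT;
```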

Anyway, let’s see first a very simple example to illustrate the difference between MySQL and PostgreSQL in their default isolation levels:

Example 1

Say we have a single-column table with 4 values:

CREATE TABLE a(x int);
INSERT INTO a VALUES (1),(2),(3),(4);

A transaction Tx1 computes the sum and average of the values in two distin

Small note - I finished support of statement and branch coverage metrics calculation for plpgsql_check.
cloudfs_fdw now supports .xls (Excel 97-2003), .xlsx, and .ods (Open Document Format) Spreadsheets via pandas, xlrd, and odfpy. It requires pandas >= 1.0.1, so Multicorn must be compiled against Python 3.

Since pandas provides sorting and filtering capabilities, cloudfs_fdw tries to push down SQL qualifiers and sort keys when they can be translated into pandas notation.

Take a look and have fun.
Posted by Robert Haas in EnterpriseDB on 2020-02-13 at 19:13
In previous blog posts that I've written about VACUUM, and I seem to be accumulating an uncomfortable number of those, I've talked about various things that can go wrong with vacuum, but one that I haven't really covered is when autovacuum seems to be running totally normally but you still have a VACUUM problem. In this blog post, I'd like to talk about how to recognize that situation, how to figure out what has caused it, how to avoid it via good monitoring, and how to recover if it happens.
Compression of PostgreSQL WAL Archives

As hardware and software evolve, the bottlenecks in a database system also shift. Many old problems might disappear, and new types of problems pop up.

Old Limitations

There were days when CPU and memory were the limitation. More than a decade back, servers with 4 cores were “high end”, and as a DBA my biggest worry was managing the available resources. For an old DBA like me, Oracle’s RAC architecture, which pooled CPU and memory from multiple host machines for a single database, was a great attempt to solve it.

Then came the days of storage speed limitations, triggered by multi-core, multi-threaded processors becoming common, as well as memory sizes and bus speeds increasing. Enterprises tried to solve it with sophisticated SAN drives, specialized storage with caches, etc. But it has remained a limitation for many years, even now as enterprises increasingly shift to NVMe drives.

Recently we started observing a new bottleneck which is becoming a pain point for many database users. As the capability of a single-host server increased, it started processing a huge number of transactions. There are systems that produce thousands of WAL files in a couple of minutes, and there have been a few reported cases where WAL archiving to a cheaper, slower disk system was not able to catch up with WAL generation. To add more complexity, many organizations prefer to store WAL archives over a low-bandwidth network. (There is an inherent problem in Postgres archiving: if it lags behind, it tends to lag more, because the archive process needs to search among .ready files; this won’t be discussed here.)

In this blog post, I would like to bring to your attention the fact that compressing WALs can be easily achieved if you are not already doing it, as well as a query to monitor the archiving gap.
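One possible way to watch the archiving gap mentioned above is the standard pg_stat_archiver view (a sketch, not necessarily the query the author has in mind):

```sql
-- Compare the WAL segment currently being written with the last one archived
SELECT pg_walfile_name(pg_current_wal_lsn()) AS current_wal,
       last_archived_wal,
       last_failed_wal,
       failed_count
FROM pg_stat_archiver;
```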

Compressing PostgreSQL WALs

The demands and requirements for compressing WALs before archiving are increasing day by day. Luckily, most of the PostgreSQL backup tools like pgbackre


The right way to get the current PostgreSQL configuration is by means of pg_settings.

Take advantage of pg_settings when dealing with your configuration

I often see messages on PostgreSQL related mailing lists where the configuration is obtained with a Unix-style approach. For example, imagine you have been asked to provide your autovacuum configuration in order to see if there’s something wrong with it; one approach I often see is the copy and paste of the following:

% sudo -u postgres grep autovacuum /postgres/12/postgresql.conf
#autovacuum_work_mem = -1               # min 1MB, or -1 to use maintenance_work_mem
#autovacuum = on                        # Enable autovacuum subprocess?  'on'
#log_autovacuum_min_duration = -1       # -1 disables, 0 logs all actions and
autovacuum_max_workers = 7              # max number of autovacuum subprocesses
autovacuum_naptime = 2min               # time between autovacuum runs
autovacuum_vacuum_threshold = 500       # min number of row updates before
autovacuum_analyze_threshold = 700      # min number of row updates before
#autovacuum_vacuum_scale_factor = 0.2   # fraction of table size before vacuum
#autovacuum_analyze_scale_factor = 0.1  # fraction of table size before analyze
#autovacuum_freeze_max_age = 200000000  # maximum XID age before forced vacuum
#autovacuum_multixact_freeze_max_age = 400000000  # maximum multixact age
#autovacuum_vacuum_cost_delay = 2ms     # default vacuum cost delay for
                                        # autovacuum, in milliseconds;
#autovacuum_vacuum_cost_limit = -1      # default vacuum cost limit for
                                        # autovacuum, -1 means use

While this can be a correct approach and makes it simple to provide a full set of configuration values, it has a few drawbacks:

  • it produces verbose output (e.g., there are comments on the right of each line);
  • it could not be the whole story about the configuration, for example because something is in;
  • it includes commented-out lines;
  • it might not be the configuration your cluster is actually running with.

Let’s examine all the drawbacks, one...
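A minimal sketch of the pg_settings approach the article advocates (the autovacuum filter is illustrative):

```sql
-- Effective autovacuum settings, with the origin of each value
SELECT name, setting, unit, source
FROM pg_settings
WHERE name LIKE '%autovacuum%'
ORDER BY name;
```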

Posted by Dave Conlin on 2020-02-12 at 14:38

One of the worst performance hits a Postgres query can take is having to perform a sort or hash operation on disk. When these space-intensive operations require more memory than is available, Postgres uses disk space instead. Disk space is much slower to read and write than RAM, so this generally takes significantly longer.

The best solution to this problem is to avoid having to perform the operation entirely, for example by adding a judicious index.

The second best solution is to reduce the amount of space required for the operation, by working on fewer rows or less data per row. If you can reduce the amount of memory needed, the operation can take place in RAM rather than requiring slow disk access.

However, these options are not always available. Assuming that the server has enough memory, it often makes sense to allow Postgres to use more RAM for these operations before choosing to use disk space. This is done by adjusting the work_mem system config parameter.

The default value for work_mem is 4MB. This is generally acknowledged to be too small for most modern systems. For example, Christophe Pettus suggests that 16MB is a good starting point for most people. So it’s pretty normal to at least consider increasing it. You can check what the current value is with the query:

SHOW work_mem;

There are dangers in changing the work_mem value. It refers to the memory available to a single operation – one individual hash, bitmap scan or sort – and so even a single query can use several times the defined value. When you consider that servers are often serving many queries simultaneously, you’ll see why setting the value too high can lead to the server running out of memory. This is, to put it mildly, something you probably want to avoid.

A value too small, on the other hand, will result in too many operations taking place on disk, which is in turn much less efficient than using RAM.

Therefore it’s usually a good idea to experiment with values before changing them for the whole server. You
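The per-session experimentation suggested here can be sketched like this (the values are illustrative; 16MB echoes the starting point quoted above):

```sql
-- Try a larger value for the current session only
SET work_mem = '16MB';

-- Or scope the change to a single transaction
BEGIN;
SET LOCAL work_mem = '64MB';
-- ... run the query being tuned, e.g. under EXPLAIN (ANALYZE) ...
COMMIT;
```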


When migrating from MS SQL to PostgreSQL, one of the first things people notice is that in MS SQL, object names such as tables and columns all appear in uppercase. While that is possible on the PostgreSQL side as well, it is not really that common. The question therefore is: how can we rename all those things to lowercase, easily and fast?


MS SQL PostgreSQL Migration


Finding tables to rename in PostgreSQL

The first question is: How can you find the tables which have to be renamed? In PostgreSQL, you can make use of a system view (pg_tables) which has exactly the information you need:

SELECT  'ALTER TABLE public."' || tablename || '" RENAME TO ' || lower(tablename) 
FROM    pg_tables 
WHERE   schemaname = 'public'
        AND tablename <> lower(tablename);

This query does not merely return a list of tables that have to be renamed; it also generates the SQL commands to do so.

If you happen to use psql directly, it is possible to call …


… directly after running the SQL above. \gexec will take the result of the previous statement and consider it to be SQL which has to be executed. In short: PostgreSQL will already run the ALTER TABLE statements for you.
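Put together in psql, the query and \gexec look roughly like this (a sketch; \gexec executes each row of the preceding result as an SQL statement):

```sql
SELECT 'ALTER TABLE public."' || tablename || '" RENAME TO ' || lower(tablename)
FROM    pg_tables
WHERE   schemaname = 'public'
        AND tablename <> lower(tablename) \gexec
```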

The commands created by the statement form the list of instructions needed to rename the tables.

Avoid SQL injection at all costs

However, the query I have just shown has a problem: It does not protect us against SQL injection. Consider the following table:

test=# CREATE TABLE "A B C" ("D E" int);

In this case the name of the table contains blanks. However, it could also contain more evil characters, causing security issues. Therefore it makes sense to adapt the query a bit:

test=# SELECT 'ALTER TABLE public.' || quote_ident(tablename) || ' RENAME TO ' || lower(quote_ident(tablename))
       FROM    pg_tables
       WHERE   schemaname = 'public'
               AND   tablename <> lower(table
Julia provides an API so that Julia functions can be called from C.  PL/Julia will use this C API to execute Julia code from its user defined functions and stored procedures. Julia’s documentation provides an example C program that starts up the Julia environment, evaluates the expression sqrt(2.0), displays the resulting value to the standard […]
This is the second installment of a two-part series on 2ndQuadrant’s repmgr, an open-source high-availability tool for PostgreSQL. In the first part, we set up a three-node PostgreSQL 12 cluster along with a “witness” node. The cluster consisted of a primary node and two standby nodes. The cluster and the witness node were hosted in […]

Parcel sorting Photo by @kelvyn on Unsplash

The End Point development team has completed a major application migration from one stack to another. Many years ago, the vendor maintaining the old stack abandoned support and development. This led to a stack evolution riddled with independent custom changes and new features in the following years.

The new application was developed by a consortium that created migration scripts to transfer data to a fresh stack, resulting in a completely restructured database schema. While we could not apply those consortium migration scripts directly to our client’s application, creating migration scripts from scratch would have been tedious due to the many labor-intensive and time-consuming tasks involved. So we looked to reuse and customize the scripts, ensuring an exact match with the customized changes in the client’s application.

Liquibase: A Schema Comparison Tool

After an arduous hunt for a suitable solution, we came across Liquibase, an open-source schema comparison tool that utilizes the diff command to assess missing, changed, and unexpected objects.

Installation and Usage

Let’s see how to use Liquibase and review the insights and results offered by the diff command.

Before beginning, download the latest version of Liquibase. As the default package doesn’t have its own driver, it would be wise to add the PostgreSQL driver to the Liquibase lib folder. (You’ll need to do this with any other database types and their necessary libraries and drivers.)

$ wget
$ tar xvzf liquibase-3.8.5.tar.gz
$ wget -P lib/

$ ./liquibase \
--classpath="/path/to/home/apps/liquidiff/lib" \
--outputFile=liquibase_output.txt \
--driver=org.postgresql.Driver \
--url=jdbc:postgresql://localhost:5432/schema_two \
--username=postgres \
--password=CHANGEME \
--defaultSchemaName=public \
Diff \