The world's most advanced open source database
Top posters
Number of posts in the past month
Top teams
Number of posts in the past month
  • Policy for being listed on Planet PostgreSQL.
  • Add your blog to Planet PostgreSQL.
  • List of all subscribed blogs.
  • Manage your registration.
  • Get in touch with the Planet PostgreSQL administrators at planet at
Posted by Shaun M. Thomas on 2016-09-23 at 19:56

When it comes to putting Postgres through its paces, we often turn to benchmarks to absolutely bury it under a torrent of oppressive activity. It’s a great way to obtain maximum performance metrics and also observe how Postgres reacts and breaks down under such pressure. But these kinds of tests aren’t really practical, are they? After all, many such simulated workloads are nothing but bragging rights measured against previous Postgres releases, or for hardware comparisons. But while functionality beyond defaults is often overlooked, tools like pgbench are actually critical to the development process.

I’ve used pgbench frequently in the past for examples in these articles. It’s much like Apache’s own ab for bombarding a server with various workloads, except pgbench must bootstrap by creating a set of test tables. Or does it? What if we have our own existing tables from a project that we want to measure, either for a hardware migration, or for scalability concerns?

Let’s see how we can utilize pgbench with our trusty sensor_log table. It’s not really designed to experience any updates, so we need something else as well. Let’s say the sensor log is an endpoint capture for a very busy sensor table that is only updated when sensors detect a change in the environment. That should provide us with plenty of opportunities for contention!

  location       VARCHAR NOT NULL,
  reading        BIGINT NOT NULL,
  modified_date  TIMESTAMP NOT NULL
INSERT INTO sensor (location, reading, modified_date)
SELECT, % 100,
                    - ( % 60 || 's')::INTERVAL
  FROM generate_series(1, 5000000) s(id);
CREATE INDEX idx_sensor_location
    ON sensor (location);
ANALYZE sensor;
CREATE TABLE sensor_log (
  sensor_id     BIGINT NOT NULL,
  location      VARCHAR NOT NULL,
  reading       BIGINT NOT NULL,
  reading_date  TIMESTAMP NOT NULL
CREATE UNIQUE INDEX udx_sensor_log_sensor_id_reading_date

In the previous post, we provided a business and architectural
background for the Postgres FDWs that we are developing for Spark,
Hadoop and Cassandra. In particular, we highlighted the key benefits of
bringing Cassandra and PostgreSQL together.

With this post, we will start taking a more technical look at the
Cassandra FDW.

The C* FDW speaks natively with Cassandra on two levels; it:

  • uses the binary CQL protocol instead of the legacy Thrift protocol.
  • directly relies on the DataStax Native C++ driver for Cassandra.

The DataStax C++ driver is performant and feature-rich; various load
balancing and routing options are available and configurable. We are
already making use of some of these features and plan to provide more of
these to our users.

While exploring Cassandra as a Postgres user, the defaults such as
automatic inclusion of the ALLOW FILTERING clause are useful as they
allow gradual familiarity; especially useful in small development
environments. Our intent is to support tuning for large environments
but to default to a configuration geared toward existing PostgreSQL

At this point, let us consider whether we are introducing a new SPOF by
using PostgreSQL with a Cassandra system. We believe not; a PostgreSQL
node at the edge of a Cassandra cluster – as a transactional or open-SQL
end point – is not at all the same as a central master node critical to
the operation of an entire cluster. We see some trade-offs but mostly
we see benefits of bringing PostgreSQL to Cassandra in this way as we
intend to elucidate through this series.

In the next post, we will show you how to get started with the Cassandra

Posted by Ernst-Georg Schmid on 2016-09-23 at 05:59
OpenBabel 2.4.0 is released, the build process worked flawlessly, 100% tests passed.

Now I have to see if the Tigress still likes it...

Well, pgchem::tigress builds against OpenBabel 2.4.0 without errors, but will it work?

Yes, pgchem::tigress works with OpenBabel 2.4.0 without any notable alterations, except changing:


in the Makefile to:


As always with .0 releases, subtle issues might be lurking below the surface, though.
Posted by Bruce Momjian in EnterpriseDB on 2016-09-21 at 13:45

I have written a presentation covering the important features in Postgres 9.6 and some of the features we hope for in Postgres 10.

After about 100 hours of packaging and testing work, pgadmin4 RPMs hit PostgreSQL 9.4, 9.5 and 9.6 repos, for RHEL/CentOS 7 and Fedora 23, 24!

First of all, I'd like to write down the list of packages that entered git repo for pgadmin4 dependency: Continue reading "Installing pgadmin4 to Red Hat, CentOS, and Fedora"
Posted by Craig Ringer in 2ndQuadrant on 2016-09-21 at 00:52

I’m pleased to say that Postgres-BDR is on its way to PostgreSQL 9.6, and even better, it works without a patched PostgreSQL.

BDR has always been an extension, but on 9.4 it required a heavily patched PostgreSQL, one that isn’t fully on-disk-format compatible with stock community PostgreSQL 9.4. The goal all along has been to allow it to run as an extension on an unmodified PostgreSQL … and now we’re there.

The years of effort we at 2ndQuadrant have put into getting the series of patches from BDR into PostgreSQL core have paid off. As of PostgreSQL 9.6, the only major patch that Postgres-BDR on 9.4 has that PostgreSQL core doesn’t, is the sequence access method patch that powers global sequences.

This means that Postgres-BDR on 9.6 will not support global sequences, at least not the same way they exist in 9.4. The 9.6 version will incorporate a different approach to handling sequences on distributed systems, and in the process address some issues that arose when using global sequences in production.

Since Postgres-BDR on 9.6 is now passing regression tests, I expect to have an alpha release out not long after the release of PostgreSQL 9.6 itself. Keep an eye out!

There’s also a pglogical 1.2.0 update coming to coincide with the release of PostgreSQL 9.6.

At the same time, Petr has submitted another revision of logical replication for core PostgreSQL to the 10.0 development series.

Along with updating Postgres-BDR in concurrence with 9.6, there is work in progress to enhance Postgres-BDR’s HA capabilities. I’ll have more to say on that in an upcoming post.

We’ve been busy here at 2ndQuadrant.

Posted by Martín Marqués in 2ndQuadrant on 2016-09-20 at 14:11


PostgreSQL gives developers the chance of choosing between two possible storage facilities for large binary data: Bytea and LargeObjects.

Large Objects have been around for a long time, and PostgreSQL has a smart way of storing large binary data. It does so by splitting it into chunks of LOBLKSIZE (a forth of BLCKSZ). That way the tuples from pg_largeobject don’t spill on the toast table.

On the other hand bytea stores the binary data directly in the tuple, which may lead to poor performance depending on how your schema looks.

This sounds great if you have an intelligent interface for dealing with the manipulation of these binary files, specially if update modify just a small portion of the whole binary file.

But normally we don’t bother writing code that takes advantage of this, and instead we write again of the whole binary data.

One of the things that I believe make people adopt large objects are the functions available for importing and exporting files directly from the database server to it’s filesystem. There’s a con to this: if the application is on a different server, you’ll need more code to move the file to the location where it’s needed.

A problem you might face

The past days I had to examine a database used to store information of user sessions from a Java CAS system. I found there were almost 100 million large objects in the database, not very big ones.

I went over the user tables checking the fields that had an oid field, and then I cross-referencing the values in those fields with the pg_largeobject_metadata table. I found that 96% of those large objects where orphan ones. Those are large objects which weren’t referenced by any tuple from the user tables.

Further investigation concluded that Hibernate didn’t take care of purging the largeobjects it created when deleting or updating tuples with oid fields. So it was generating a great amount of bloat which could not be clean up by vacuuming, but had to be purged from pg_largeobjects table manually.

In the particular case of the CAS database, t


I was in Jakarta a couple of weeks ago and there happened to be a meetup of the Indonesia PUG in Bandung while I was there. Because it is just a 2 hour, rather picturesque drive, from Jakarta, I thought it was too good of an opportunity to miss. So I showed up.



The meetup was hosted by Julyanto Sutandang of Equnix Business Solutions and the conversation was mostly centered around convincing the local technology industry about PostgreSQL in comparison to Oracle. We got into fairly detailed discussions on the typical challenges of moving an Oracle production database to PostgreSQL. I especially love talking about hierarchical queries – Oracle’s CONNECT BY PRIOR and PostgreSQL’s WITH RECURSIVE.

It was very interesting to find out how popular BDR – the Multi Master Replication solution from 2ndQuadrant – was in Indonesia. I got a lot of questions about the technology (which, admittedly, I am not an expert at) and its roadmap (which I was able to answer more confidently). Not only is BDR already being used in production at multiple locations but many corporates, including large banks, are actively doing POCs using the technology.

We also got talking about ACID compliance, and how that figures into the whole NoSQL equation. We talked about the major NoSQL-like features of PostgreSQL: JSON/JSONB, HSTORE, XML, etc. and the fact that they are all ACID compliant, unlike exclusively NoSQL data stores. The main takeaway was that NoSQL Databases and RDBMS complement each other, they can’t replace each other. NoSQL is good for POCs and massive data writes at high speeds, but if you hold your data valuable and want to derive analytical value from it, you have to eventually move it to an RDBMS. Of course, our database of choice is PostgreSQL :-)



Oh, and did I mention that Julyanto insisted on stuffing me full of traditional Indonesian food? ;-)


It sounds like click-bait, or one of those late night TV promotions – “Improve your database performance by 100% – by changing just this one setting!” But in this case, it’s true – you can drastically improve PostgreSQL on Windows performance by changing one configuration setting – and we made this the default in our Postgres by BigSQL distribution for 9.2 thru 9.6.

tl;dr – if you have high query load, change “update_process_title” to ‘off’ on Windows, and get 100% more throughput.

Improve postgresql performance by turning off update_process_title

Performance Improvement by turning off update_process_title

Most Postgres DBA’s already know that they need to tune settings for shared buffers, WAL segments, checkpoints, etc, to get the best performance from their database. If you are running PostgreSQL on Windows, there’s another setting that you need to look at, specifically “update_process_title”. Changing this setting from “on” to “off” can improve throughput on a heavy query load by close to 100%

We ran a series of benchmark tests in our performance lab and you can see the dramatic improvement in the graphs displayed. We tested PostgreSQL 9.5 on a 16-core Windows server with fast SSD drives using a standard pgbench run in both read-only and read-write modes. Scaling from 4 to 40 clients shows a plateau in throughput (measured by TPS) after 8 clients when the setting is set to “on”. Changing the update_process_title setting to “off” allows PostgreSQL to continue to scale throughput, showing increasing TPS up to 40 clients. The throughput at 32 read-only clients increases from 20K TPS to 58K TPS (180% higher) and at 40 clients continues to climb to 76K TPS (270% higher).

Improvement in read-write transactions turning off update_process_title

Improvement in read-write transactions turning off update_process_title

This performance gain is seen for both read-only and read-write workloads. With 32 clients, the write throughput increases from 2,700 TPS to 7,700 TPS (180% higher) and at 40 clients continues to climb to 8,200 (200% higher).

The update_process_title setting controls whether or not Postgres will update the process description that you

Posted by Federico Campoli on 2016-09-20 at 05:38
Back in the 2013 I started playing with sqlalchemy to create a simple extractor from heterogeneous systems to be pushed in postgresql.
I decided to give the project a name which recalled the transformation and I called pg_chameleon.

To be honest I didn't like sqlalchemy.  Like any other ORM adds an interface to the data layer with a mental approach to the data itself. I lost the interest to developing a migrator very soon, and after all there are thousands of similar tools thousands of times better than mine (e.g. the awesome pgloader)

However recently I revamped the project after discovering a python library capable to read the mysql replication protocol. In few weeks I cleaned all the sqlalchemy stuff, rebuilt the metadata extraction using the information_schema and finally I had an usable tool to replicate the data across the two systems.

I've also changed the license from GPL to the 2 clause BSD.

The tool requires testing. I'm absolutely sure is full of bugs and issues, but it seems to work quite nice.

Some key aspects:

  • Is developed in python 2.7. Bear with me, I'll build a port to python 3.x when/if the project will get to an initial  release.
  • I use tabs (4 space tabs). Bear with me again. I tried to use spaces and I almost thrown my laptop out of the window
  • is not working. I'll fix this as soon as I'll do a release.
  • Yes, the sql part use the "hungarian notation" and the keywords are uppercase with strange indentation on the statements .  
  • The DDL are not yet replicated. I'm thinking to a clever approach to the problem.

That's it. If you want to test it please do and try to break the tool :)

The tool is on github here:

Sometimes, the elephant gets hurt - inducing database errors! Data corruption is a fact of life in working with computers, and Postgres is not immune. With the addition of the "data checksums" feature, detecting such corruption is now much easier. But detection is not enough - what happens after the corruption is detected? What if Postgres could fix the problem all by itself - what if we could give the elephant a mutant healing power?!?

Now we can. I wrote an extension named pg_healer that does just that - detects corruption issues, and automatically repairs them. Let's see how it works with a demonstration. For this, we will be purposefully corrupting the "pgbench_branches" table, part of the venerable pgbench utility.

For the initial setup, we will create a new Postgres cluster and install the pgbench schema. The all-important checksum feature needs to be enabled when we initdb, and we will use a non-standard port for testing:

$ initdb --data-checksums dojo
The files belonging to this database system will be owned by user "greg".
Data page checksums are enabled.

creating directory dojo ... ok
creating subdirectories ... ok
$ echo port=9999 >> dojo/postgresql.conf
$ pg_ctl start -D dojo -l log.dojo.txt
server starting
$ createdb -p 9999 $USER
$ pgbench -p 9999 -i
NOTICE:  table "pgbench_history" does not exist, skipping
NOTICE:  table "pgbench_tellers" does not exist, skipping
NOTICE:  table "pgbench_accounts" does not exist, skipping
NOTICE:  table "pgbench_branches" does not exist, skipping
creating tables...
100000 of 100000 tuples (100%) done (elapsed 0.35 s, remaining 0.00 s)
set primary keys...

Next, we install the pg_healer extension. As it needs to access some low-level hooks, we need to load it on startup, by adding a line to the postgresql.conf file:

$ git clone git://
Cloning into 'pg_healer'...
$ cd pg_healer
$ make install
gcc -Wall -Wmissing-prototypes ... -c -o pg_healer.o pg_healer.c
gcc -Wall -Wmissing-
Posted by Regina Obe in PostGIS on 2016-09-19 at 00:00

PostGIS 2.3.0rc1 is feature complete, so we’re looking for testing and feedback! Best served with PostgreSQL 9.6rc1 and pgRouting 2.3.0-rc1

Please give this release candidate a try and report back any issues you encounter. New things since 2.3.0beta1 release

Please report bugs that you find in this release.

Important / Breaking Changes

  • 3466, Casting from box3d to geometry now returns a 3D geometry (Julien Rouhaud of Dalibo)

  • 3604, pgcommon/ orders CFLAGS incorrectly leading to wrong liblwgeom.h (Greg Troxel)

  • 3396, ST_EstimatedExtent, now throws WARNING instead of ERROR (Regina Obe)

    New Features and Performance Enhancements

  • Add support for custom TOC in (Christoph Moench-Tegeder)

  • Add support for negative indexing in STPointN and STSetPoint (Rémi Cura)
  • Numerous new function additions and enhancements: New Functions and Enhancements

  • 3549, Support PgSQL 9.6 parallel query mode, as far as possible (Paul Ramsey, Regina Obe)

  • 3557, Geometry function costs based on query stats (Paul Norman)
  • 3591, Add support for BRIN indexes (Giuseppe Broccolo of 2nd Quadrant, Julien Rouhaud and Ronan Dunklau of Dalibo)
  • 3496, Make postgis non-relocateable (for extension install), schema qualify calls in functions (Regina Obe) Should resolve once and for all for extensions #3494, #3486, #3076

  • 3547, Update tiger geocoder to support TIGER 2016 and use http or ftp (Regina Obe)

See the full list of changes in the news file and please report bugs that you find in the release. Binary packages will appear in repositories over the coming weeks as packagers roll out builds.

View all closed tickets for 2.3.0.

It’s been a long time since my last post. It’s time to write something useful :)

When people start working with PostgreSQL they sometimes make mistakes which are really difficult to fix later. For example during initdb of your first DB you don’t really understand whether you need checksums for data or not. Especially that by default they are turned off and documentation says that they “may incur a noticeable performance penalty”.

And when you already have several hundred databases with a few hundred terabytes of data on different hardware or (even worse) in different virtualization systems, you do understand that you are ready to pay some performance for identification of silent data corruption. But the problem is that you can’t easily turn checksums on. It is one of the things that is adjusted only once while invoking initdb command. In the bright future we hope for logical replication but until that moment the only way is pg_dump, initdb, pg_restore that is with downtime.

And if checksums may be not useful for you (e.g. you have perfect hardware and OS without bugs), lc_collate is important for everyone. And now I will prove it.

Sort order

Suppose you have installed PostgreSQL from packages or built it from sources and initialized DB by yourself. Most probably, in the modern world of victorious UTF-8 you would see something like that:

d0uble ~ $ psql -l
                               List of databases
   Name    | Owner  | Encoding |   Collate   |    Ctype    | Access privileges
 postgres  | d0uble | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template0 | d0uble | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/d0uble        +
           |        |          |             |             | d0uble=CTc/d0uble
 template1 | d0uble | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/d0uble        +
           |        |          |             |             | d0uble=CTc/d0uble
(3 rows)

d0uble ~ $

If you don’t specify explicitly, initdb will take settings

Posted by Luca Ferrari in ITPUG on 2016-09-18 at 16:42
As you probably already know the Call For Papers for the PGDay.IT 2016 is now open. Please see the details here and send your contribution following the instructions. The organizing committee will review each proposal in order to deliver a great program for the tenth edition of the italian PostgreSQL based conference.

Since version 1.7, Django has natively supported database migrations similar to Rails migrations. The biggest difference fundamentally between the two is the way the migrations are created: Rails migrations are written by hand, specifying changes you want made to the database, while Django migrations are usually automatically generated to mirror the database schema in its current state.

Usually, Django’s automatic schema detection works quite nicely, but occasionally you will have to write some custom migration that Django can’t properly generate, such as a functional index in PostgreSQL.

Creating an empty migration

To create a custom migration, it’s easiest to start by generating an empty migration. In this example, it’ll be for an application called blog:

$ ./ makemigrations blog --empty -n create_custom_index
Migrations for 'blog':

This generates a file at blog/migrations/ that will look something like this:

# -*- coding: utf-8 -*-                                                                                                                                                                                             
# Generated by Django 1.9.4 on 2016-09-17 17:35                                                                                                                                                                     
from __future__ import unicode_literals                                                                                                                                                                             
from django.db import migrations                                                                                                                                                                         
Posted by Shaun M. Thomas on 2016-09-16 at 21:00

There seem to be quite a few popular Postgres conferences peppering the globe these days. This year, Simon Riggs of 2ndQuadrant gave the sponsored keynote at Postgres Open. I’m not entirely sure it was intentional since it wasn’t the title of his presentation, but he uttered the words “working together to make Postgres better for everyone” at one point. The phrase “Working Together” really stood out, because that’s a significant part of what makes Postgres so great. It resonated acutely with the impetus behind the Unconference track that remains a regular fixture at PGCon.

Then Simon dropped another bomb that shocked everyone in attendance. Everywhere I looked were visions of absolute disbelief and awe. He suggested that somewhere in the twisting catacombs of 2ndQuadrant was a beta version of WordPress running on Postgres instead of MySQL. Considering its roots in PHP before there were readily available database abstraction layers, and the fact many extensions call the MySQL functions outright, this is clearly an impossibility. This is clearly a cruel hoax, and I demand proof that this particular unicorn exists for reasons completely unrelated to exorcising MySQL from my own website.

Perhaps primarily accidental, but I seem to be a regular fixture at Postgres Open since it began in 2011. Most of this is because it started in Chicago, which made it extremely convenient to attend for the first four years of its existence. Now I’m just operating on pure momentum. So what did I learn this time around? I’ve been doing this too long to learn much regarding Postgres, and short of diving into writing extensions in C, that isn’t going to change. But maybe I can offer a useful perspective on the conference itself and the talks I visited.

PGLogical – the logical replication for PostgreSQL

First on the agenda was a talk on PGLogical by Petr Jelinek, yet another 2ndQuadrant contribution. It works by interpreting the Postgres WAL stream and decoding activity into generic SQL statements that can be replayed on a remote databas

Posted by Julien Rouhaud in Dalibo on 2016-09-16 at 12:03
VCF, the Variant Call Format, is a clever idea. Instead of storing all genetic information of a sampled genome, it only stores the delta against some reference genome. This squeezes a lot of redundancy out of the data and thus occupies a lot less storage space.

Unfortunately, VCF is also a unwieldy format. Only a part is fixed, with metadata in the header describing fields in the body which in turn describe the actual data format of the samples.

This makes VCF especially hard to grasp for systems like Hadoop that work on chunked files spread over many compute nodes. With VCF, every chunk has to carry a copy of the header to make sense of the data it carries.

Formats like ADAM are under development that tackle this (any many more) problems, but when I was given the task to make VCF files accessible to a database last year, I took the direct route:

Since there have been already quite powerful tools and libraries to work with VCF files around for some time, I used Multicorn and PyVCF and wrote a foreign data wrapper for PostgreSQL that understands the VCF format.

After it was finished, I realized that it had been done before...

However, my implementation is different in some parts. Most notably it abstracts more from the actual storage of the VCF files and it works with vanilla PyVCF and does not need any special modifications.

A few days ago, I was granted permission to release the code into the wild by the people who paid for it in the first place, so now you have the freedom of choice. ;-)

Ladies and Gentlemen, please take a look at just another multicorn based foreign data wrapper for VCF files for PostgreSQL, the one and only truly integrarelational DBMS.
Posted by Denish Patel on 2016-09-15 at 15:36

Today, I presented at PgOpen 2016 (Postgres conference) in Dallas, TX.

Here is the slide deck :

Posted by US PostgreSQL Association on 2016-09-14 at 20:13

The CFP for PGConf US 2017 is now open. You can read all about it and submit here. This coming conference is set to be the largest we have had. As we continue to grow we continue to add options for our attendees including a Job Fair and an additional day.

Today many organizations struggle to keep up with their database
requirements, for example, to:

  • store and analyze high-velocity and rapidly-growing data such as logs,
    package tracking events, sensor readings and machine-generated
  • ensure 24/7 availability of customer-facing websites, services and
    apps even when a subset of their data centers, servers or data are
  • support fast-growing internet-scale businesses by adding relatively
    inexpensive data servers rather than requiring million-dollar
    investments in high-end servers and storage.

Our industry is increasingly producing and exploring various Open Source
systems to provide solutions for requirements like these. However, many
such systems intending to offer degrees of Scalability and
Availability choose architectures that impose inherent limitations.

Many of these architectures have a node or a collection of nodes that
are treated as special. Think Master-Slave, NameNode-DataNode and so
forth. While each of these models serves a different set of use cases,
a common attribute across them is that they have a SPOF (Single Point
of Failure). Even when they offer some level of multiplicity to deal
with the SPOF issue, the problems continue: these special nodes can
become bottlenecks for the operations that only they are allowed to
carry out. Capacity Planning, Backup and Recovery, Fault
, Disaster Recovery and similar areas of operation all
become more complex. Moreover, the non-special nodes are typically
underutilized or entirely passive. Many of these architectures make it
virtually impossible to achieve peta-scale, multi-thousand-node clusters
with linear growth and failure tolerance atop today’s
dynamically-orchestrated infrastructure.

Enter Cassandra – A peer-to-peer, multi-datacenter active-active,
peta-scale, fault-tolerant distributed database system. Nowadays, it is
hard not to have heard of this excellent system as its user-base
continues to grow. The key to realize is that its peer-to-peer
architecture is the basis for its SPOF


When performing point-in-time recovery, Postgres offers a variety of ways to stop recovery, or WAL replay at a given point using different ways of estimating the stop point:

  • Timestamp, with recovery_target_time.
  • Name, with recovery_target_name, which is a recovery target defined by a user with pg_create_restore_point().
  • XID, with recovery_target_xid, a transaction ID that will make recovery go up to the point where the transaction where this ID was assigned has been committed.
  • ‘immediate’, which is a special case using recovery_target = ‘immediate’. Using that the recovery will stop when a consistent state has been reached by the server.

The replay position can as well be influenced by recovery_target_inclusive, which is true by default (list of recovery parameters is here).

Today’s post is about a new recovery target type, that has been added in Postgres 10 by this commit:

commit: 35250b6ad7a8ece5cfe54c0316c180df19f36c13
author: Simon Riggs <>
date: Sat, 3 Sep 2016 17:48:01 +0100
New recovery target recovery_target_lsn

Michael Paquier

An LSN (logical sequence number) is a position in a WAL stream, in short a set of locations to know where a record is inserted, like ‘0/7000290’. So with this new parameter what one is able to do is to set at a record-level up to where recovery has to run. This is really helpful in many cases, but the most common one is where for example WAL has been corrupted up to a given record and a user would like to replay data as much as possible. With this parameter there is no need to do a deep analysis of the WAL segments to look at which transaction ID or time the target needs to be set: just setting it to a record is fine. And one can even look at such a LSN position via the SQL interface with for example pg_current_xlog_location() that would give the current LSN position that a server is using.

Let’s take a small example with this cluster from which a base backup has already been taken (important to be able to replay forward):

=# CREATE TABLE data_to_reco


Since PostgreSQL 9.5, pg_rewind has been able to make a former master follow up a promoted standby although, in the meantime, it proceeded with its own timeline. Consider, for instance, the case of a switchover that didn’t work properly.

Have you ever experienced a “split brain” during a switchover operation? You know, when the goal is to switch the roles of the master and the standby, but instead you end up with two independent masters – each one with its own timeline? For PostgreSQL DBAs in HA contexts, this where pg_rewind comes in handy!

Until PostgreSQL 9.5, there was only one solution to this problem: re-synchronise the PGDATA of the downgraded master with a new base backup and add it to the HA cluster as a new standby node. Generally, this is not a problem, unless your database is relatively big. For instance, if there are hundreds of GBs, it is not easy to do this type of operation while trying to keep downtime as low as possible.

Restoring a database to a previous point in time can create some complexities that can be handled in different ways. For an in depth explanation of the evolution and components of fault tolerance in PostgreSQL I suggest you check out Gulcin’s series Evolution of Fault Tolerance in PostgreSQL, including a Time Travel feature that mentions the use of pg_rewind.

how pg_rewind works

pg_rewind scans the “old” master’s PGDATA folder, identifies the data blocks changed during the switch to the new timeline, then copies only these blocks from the promoted slave. This is then used to replace the changes. As a “collateral” effect, the configuration files are also copied from the promoted standby (so the DBA has to be careful to adapt them to the new role of the node in the HA cluster). However, this allows the prevention of re-syncing PGDATA completely.

To do this, it is necessary to have all the WALs produced in the final moments before the switchover from the old master. Changes are identified by comparing the status of the data blocks present in the PGDATA with the changes logged i


In any as-a-service business, which bills monthly, a key metric you track is MRR or monthly recurring revenue. It’s good practice to have this on a dashboard and check it on a monthly, weekly, or even daily basis. If you have a simple pricing model that has set monthly plans, say like Netflix this is pretty easy to calculate:

SELECT sum(user_subscriptions.price)
FROM user_subscriptions
WHERE users_subscriptions.ended_at IS null;

The above will give you the run rate as of this exact moment in time. It gets a little more complicated to do this in a single query that gives it to you over time. First, you’ll want to check the run rate for each day, then you’ll want to check what plan they were on at that time. Whether you’re Netflix or a smaller site users will upgrade/downgrade–especially in the case of a usage based service such as Citus Cloud.

We track run rate for ourselves as well as month over month growth all in the same query. While we’ve written more complex SQL before we thought this might be an interesting example of a number of things you can do with SQL and helpful to publicly share for others to benefit from.

First the schema. We have a users and orgs table which overall are pretty un-interesting. The reason they’re in the query is we want to filter out internal usage, as we heavily use Citus Cloud to run Citus Cloud and don’t want that skewing numbers. The interesting table is billings:

# \d billings
                      Table "public.billings"
     Column      |   Type    |              Modifiers
 id              | uuid      | not null default uuid_generate_v4()
 formation_id    | uuid      | not null
 org_id          | uuid      | not null
 validity_period | tstzrange | not null
 price_per_month | integer   | not null

You’ll notice the validity_period has a tstzrange type. This is a range type which allows you to have a from and a to range value of timestamps directly in a single column.

So on to the query. First I’m going t


SQL can be a powerful language for reporting. Whether you’re just exploring some data, or generating reports that show month over month revenue growth it’s the lingua franca for data analysis. But, your data isn’t always in a SQL database, even then if you’re using Postgres you can still likely use SQL to analyze, query, even joing with that data. Foreign data wrappers have been around for years in Postgres, but are continuing to mature and be a great option for joining disparate systems.

Overview of foreign data wrappers

If you’re unfamiliar, foreign data wrappers, or FDW, allow you to connect from within Postgres to a remote system. Then you can query them from directly within Postgres. While there is an official Postgres FDW that ships with Postgres itself, that allows you to connect from one Postgres DB to another, there’s also a broad community of others.

At the core of it Postgres provides certain APIs under the covers which each FDW extension can implement. This can include the ability to map SQL to whatever makes sense for a given system, push down various operators like where clauses, and as of Postgres 9.3 can even write data.

To setup a FDW you first would install the extension, then provide the connection to the remote system, setup your schema/tables, and then you’re off to the races–or well ready to query. If you’ve got more than 2-3 databases or systems in your infrastructure, you’ll often benefit from FDWs as opposed to introducing a heavyweight ETL pipeline. Don’t mistake FDWs as the most performant method for joining data, but they are often the developer time efficient means of joining these data sets.

Let’s look at just a few of the more popular and interesting ones.

Postgres FDW

The Postgres one is the easiest to get started with. First you’ll just enable it with CREATE EXTENSION, then you’ll setup your remote server:

CREATE EXTENSION postgres_fdw;

 OPTIONS (host 'foo', dbname 'core_db', port '5432');

Then you’ll create the use

Posted by Denish Patel on 2016-09-09 at 22:50

A couple of weeks ago, I had to deal with corrupted Postgres database cluster. At the end, we couldn’t able to recover some of the data but managed to recover most part of it. Having experience working with dozens of database systems, I’m pleasantly surprised to experience resiliency of Postgres database.

Kudos to Postgres Development team for building the most resilience database in the world :)

Here is my Postgres database recovery story :)


I'm posting the steps carried out during the recovery process for information purpose only. This post doesn't provide any guarantee that it will work for your use-case and/or environment.

Note: The actual database name has been replaced with “dbname”  and actual table names with “tablename”.

One of the Postgres DB cluster database experienced disk level corruption thus we were hitting this error:

postgres=# \c dbname
FATAL: could not read block 0 in file "base/16389/11930": Input/output error

Uh oh?? Really bad, isn’t it? Fortunately, it wasn’t mission critical system so we managed to take extended  outage and work on partial recovery process because we didn’t want to loose all the data!!

Once we received the complaint, we immediately backed up corrupted  database and created recovery cluster to bring up on different  server so we can go through recovery efforts!!

Trial 1:

As many of you know, the first option is to bring up recovery database cluster with  zero_damaged_pages=on   . You can set the value in Postgres config file and try to  reindex system catalog:

reindexdb -p 5433 --system dbname
reindexdb: could not connect to database dbname: FATAL: index "pg_index_indexrelid_index" contains unexpected zero page at block 0 
HINT:  Please REINDEX it.

Doh! Still, we could still not be able to connect to database !!

Trial 2:

If you aren’t aware, you should note down that there is a way to ignore indexes at system level. We started up recovery cluster with ignore_system_indexes=true setting:

pg_ctl -D /data -o '-c ignore_system_indexes=true'

Sometimes in your PostgreSQL code you want to read an environment variable – whether to get the input file location for a data load, or check that the postgres user has the right path set, or verify that the TDSDUMP environment variable is set when configuring the tds_fdw to connect to SQL Server. Here’s a plpython based function that will do just that (or the gist):

create extension plpythonu;
create type py_environ_type as (name text, value text);

create or replace function py_environ(name varchar DEFAULT NULL)
  returns setof py_environ_type
as $$
  import os
  aev = []
  if name is None:
    for k, v in os.environ.items():
      aev.append((k, v))
    v = os.getenv(name)
    if v is not None:
  return aev;
$$ language plpythonu;

You can use it to get a list of all environment variables:

select * from py_environ();

Or to get just a single variable:

select value from py_environ('PATH');

If you want other filtering, just add a where clause:

select name, value from py_environ() where name like 'TDS%';

Inspired by an idea from Scott Mead and the pal_environ() function, tested to work on Linux, OS/X, and Windows.

ITPUG is proud to announce the tenth edition of the italian PostgreSQL conference, namely PGDay.IT 2016, that will take place in Prato, Tuscany, on December 13th.
The organizing committee is working to provide another great edition of the famous italian day dedicated to PostgreSQL.
Very soon the Call For Papers will start (see for more details).
In the meantime...stay tuned!

Posted by Shaun M. Thomas on 2016-09-09 at 16:32

Say hi to Princess Kittybutt. She’ll be our mascot (and subject) for today. We’ll get to her in a minute.

Princess Kittybutt

When the only tool you have is a hammer, everything becomes a nail, right? With Postgres becoming more of an environment than simply a database engine, this colloquialism is starting to resemble reality. Of course, that’s not necessarily a bad thing! As Postgres accumulates copious and varied extensions, its role as an adaptive middleware solidifies. When Postgres can do something for itself, the need for sprawling harnesses of support scripts steadily decreases.

Yet there’s also the accusation that’s often levied at Emacs regarding its role as a UNIX kitchen sink. Extensions aid in preventing Postgres from descending into bloated madness. So for now at least, we’re safe to contemplate the tools available before they gain sentience and destroy humanity. For instance, can we build a game using only Postgres?

Not some contrived simulation where every interactive function calls a maintenance routine to advance the game world and respond to input. Can Postgres be the game engine, with an independently operating universe that churns along regardless of a player’s presence? Can we do that without a mature LISTEN / NOTIFY external resource that does all of the heavy lifting in some other language? Do we have all of the parts?

A new extension named pg_cron recently hit the web which makes this much easier. Postgres has had job scheduling for a while thanks to pgAgent, but it always ran as an external daemon which just happened to use Postgres tables to store schedules. Now we have something a bit more native that leverages the Postgres 9.3+ background worker functionality to integrate scheduling more directly. True to its name, Postgres now has a cron service.

Does it work?

CREATE TABLE cron_test (message TEXT);
SELECT cron.schedule(
  '* * * * *',
  $$INSERT INTO cron_test VALUES ('Hello World')$$
SELECT * FROM cron.job;
-[ RECORD 1 ]--------------------------------------
Posted by gabrielle roth on 2016-09-09 at 01:00

When: 6-8pm Thursday September 15, 2016
Where: iovation
Who: Toby Fee
What: Connecting Your Pg DB to Node Apps

Toby will tell us about his experiences connecting Postgres DBs to Node apps using the Node libraries Bookshelf and Knex. He ran into some of limitations on the node side, and wrote some methods for allowing full query flexibility while still remaining injection-secure.

Toby is currently a developer at NWEA.

If you have a job posting or event you would like me to announce at the meeting, please send it along. The deadline for inclusion is 5pm the day before the meeting.

Our meeting will be held at iovation, on the 32nd floor of the US Bancorp Tower at 111 SW 5th (5th & Oak). It’s right on the Green & Yellow Max lines. Underground bike parking is available in the parking garage; outdoors all around the block in the usual spots. No bikes in the office, sorry!

iovation provides us a light dinner (usually sandwiches or pizza).

Elevators open at 5:45 and building security closes access to the floor at 6:30.

See you there!