
In my last blog, we looked at the benchmark results from a bulk-load test for a Postgres-XL database cluster. Using a 16-datanode, 2-coordinator cluster running on EC2 instances, we could easily clock 9M rows/sec, or 3TB/hr, of ingestion rate. That’s a significant number in itself. In this blog, we’ll see whether the ingestion rate is scalable in Postgres-XL. In particular, we’ll try to answer whether adding more nodes to the cluster results in a linear increase in performance.

Let’s use the same lineitem table from the TPC-H benchmark for these tests. We’ll increase the cluster size from 16 datanodes to 20 datanodes and then further to 24 datanodes. We’ll also repeat the tests with 1, 2 and 3 coordinators. For all these tests, we are using i2.xlarge EC2 instances for the datanodes and c3.8xlarge EC2 instances for the coordinators. Also, we are keeping the number of concurrent COPY processes at 24 for all these tests.

Sustained Bulk-load Rate

To measure if the bulk-load rate can be sustained over longer period of time and when the data significantly overflows the combined RAM on all machines, let’s measure time to load 3 billion, 6 billion and 9 billion rows on a 16-datanode cluster. The following chart shows the data load time and a steady ingestion rate even for a very large table.

At the end of loading 9 billion rows, the table is close to 1TB in size, but that does not seem to have any negative impact on the load rate.

Scaling with Coordinators

Let’s first see if adding more coordinators to the cluster has any positive impact on the load time. To measure the impact of additional coordinators, we keep the number of datanodes constant at 16 and vary coordinators from 1 to 3.

So there is a significant improvement in the ingestion rate when 1 more coordinator is added to the cluster. The rate remains almost the same for 3 coordinators, suggesting that the bottleneck probably shifts to some other place and hence adding the third coordinator does not help much.

Scaling with Datanodes

This is the most interesting bit


Postgres 9.6 has added a really cool infrastructure called wait events. This allows a developer to define a wait point in the code being run that gets reported to the system statistics via its PGPROC entry. In short, a custom wait point in the code gets reported and is then visible in system catalogs, in this case pg_stat_activity via the columns wait_event_type and wait_event. The 9.6 infrastructure shows information for backend processes holding lightweight locks, heavyweight locks, and buffer pins. While getting a look at this infrastructure, I was surprised by the fact that nothing was reported for latches, more or less the code paths calling WaitLatch to wait for a timeout, a postmaster death, a socket event or just for the latch to be set. As 9.6 was close to shipping when I bumped into this limitation, nothing much could be done about it. So this got delayed to Postgres 10, and has been committed recently with the following change:

commit: 6f3bd98ebfc008cbd676da777bb0b2376c4c4bfa
author: Robert Haas <>
date: Tue, 4 Oct 2016 11:01:42 -0400
Extend framework from commit 53be0b1ad to report latch waits.

WaitLatch, WaitLatchOrSocket, and WaitEventSetWait now taken an
additional wait_event_info parameter; legal values are defined in
pgstat.h.  This makes it possible to uniquely identify every point in
the core code where we are waiting for a latch; extensions can pass

Because latches were the major wait primitive not previously covered
by this patch, it is now possible to see information in
pg_stat_activity on a large number of important wait events not
previously addressed, such as ClientRead, ClientWrite, and SyncRep.

Unfortunately, many of the wait events added by this patch will fail
to appear in pg_stat_activity because they're only used in background
processes which don't currently appear in pg_stat_activity.  We should
fix this either by creating a separate view for such information, or
else by deciding to include them in pg_stat_activity after all.

Michael Paquier
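
With this change in place, the new wait points show up directly in pg_stat_activity. A minimal query to see which backends are currently waiting, and on what:

-- List backends that are currently at a reported wait point
SELECT pid, state, wait_event_type, wait_event
  FROM pg_stat_activity
 WHERE wait_event IS NOT NULL;
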
It's almost time for pgconf 2016. This year the conference is in Tallinn, the capital of Estonia. The conference is packed with super interesting talks which I really look forward to listening to.

I'll also present the talk "life on a rollercoaster", which tells the story of the last 4 years of a DBA (just guess who he is :P) dealing with very large PostgreSQL installations.

I decided to give the talk a narrative and, hopefully, entertaining form, in order to avoid boring the audience to death.

I will also submit a lightning talk for the pg_chameleon tool I'm developing.

I'm pretty excited to meet old friends from the PostgreSQL community.

Well, that's all for now.

See you in Tallinn very soon :)
Posted by Shaun M. Thomas on 2016-10-21 at 16:48

Partitioning tables in Postgres can be an extremely risky endeavor. Unfortunately, on many larger systems it’s also essentially a requirement; the maximum size of a Postgres table is 32TB. The risk isn’t just that converting an existing table to a series of partitions is expensive or time consuming. We must consider how the query planner will react to the partitioned version of a table. There’s also the very real risk that we will implement (or already have implemented) flaws in the trigger or constraint logic.

Tools like pg_partman or pg_pathman are great for new systems, but existing deployments require retrofitting. So what happens if we have a large warehouse that’s already using a slightly flawed partition engine? And what other concerns remain even after achieving a pristine setup?

Let’s start with a relatively large sample table with three partitions. Each partition will consist of roughly 26M rows representing sensor readings every 100ms from 1000 sensors.

CREATE TABLE sensor_log (
  sensor_id     INT NOT NULL,
  location      VARCHAR NOT NULL,
  reading       INT NOT NULL,
  reading_date  TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE sensor_log_201608 () INHERITS (sensor_log);
CREATE TABLE sensor_log_201609 () INHERITS (sensor_log);
CREATE TABLE sensor_log_201610 () INHERITS (sensor_log);

INSERT INTO sensor_log_201608 (sensor_id, location, reading, reading_date)
SELECT s.id % 1000, (random() * 100)::INT, s.id % 100,
       '2016-09-01'::TIMESTAMPTZ - (s.id * '100ms'::INTERVAL)
  FROM generate_series(1, 36000*24*31) s(id);

INSERT INTO sensor_log_201609 (sensor_id, location, reading, reading_date)
SELECT s.id % 1000, (random() * 100)::INT, s.id % 100,
       '2016-10-01'::TIMESTAMPTZ - (s.id * '100ms'::INTERVAL)
  FROM generate_series(1, 36000*24*30) s(id);

INSERT INTO sensor_log_201610 (sensor_id, location, reading, reading_date)
SELECT s.id % 1000, (random() * 100)::INT, s.id % 100,
       '2016-11-01'::TIMESTAMPTZ - (s.id * '100ms'::INTERVAL)
  FROM generate_series(1, 36000*24*31) s(id);
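
For context, here is a minimal sketch (not from the original post) of the kind of INSERT-routing trigger such an inheritance setup usually relies on, and where the subtle flaws mentioned above tend to hide:

CREATE OR REPLACE FUNCTION sensor_log_route_insert()
RETURNS TRIGGER AS $$
BEGIN
  -- Route each row to the child table matching its month.
  IF NEW.reading_date >= '2016-10-01' AND NEW.reading_date < '2016-11-01' THEN
    INSERT INTO sensor_log_201610 VALUES (NEW.*);
  ELSIF NEW.reading_date >= '2016-09-01' AND NEW.reading_date < '2016-10-01' THEN
    INSERT INTO sensor_log_201609 VALUES (NEW.*);
  ELSIF NEW.reading_date >= '2016-08-01' AND NEW.reading_date < '2016-09-01' THEN
    INSERT INTO sensor_log_201608 VALUES (NEW.*);
  ELSE
    RAISE EXCEPTION 'no partition for reading_date %', NEW.reading_date;
  END IF;
  RETURN NULL;  -- suppress the insert into the parent table
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER t_sensor_log_insert
BEFORE INSERT ON sensor_log
FOR EACH ROW EXECUTE PROCEDURE sensor_log_route_insert();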

PostgreSQL 9.6 is now out and so is an updated version of pglogical that works with it.

For quick guide on how to upgrade the database with pglogical you can check my post which announced 9.6beta support.

The main change, besides support for the 9.6.x release of PostgreSQL, is in the way we handle the output plugin and the apply plugin. They have now been merged into a single code base and a single package, so there is no need to track pglogical_output separately for users and developers alike.

We fixed several bugs this time and also made upgrades from 9.4 much easier.

Here is a more detailed list of changes:

  • keepalive is tuned to much smaller values by default so that pglogical will notice network issues earlier
  • better compatibility when upgrading from PostgreSQL 9.4 – it’s now safe to use the internal schema synchronization for that
  • improved conflict handling
  • improved support for replica triggers
  • several issues with initial synchronizations were fixed
  • it’s easier to monitor the initial synchronization in pg_stat_replication
  • as usual, there are several documentation improvements
Posted by David Rader in OpenSCG on 2016-10-19 at 11:16

Relocatable PostgreSQL RPMs – You asked and we delivered!

In our years of dealing with PostgreSQL users, one of the main gripes they have with PostgreSQL is not a missing feature or capability, but rather the fact that the provided RPMs are non-relocatable. The reasons for wanting relocatable RPMs are many: the company they work for has specific standards, they are installing in a lab alongside other installations and want to keep their install separate, or they simply want to place things in a more logical/accessible place.

Before now, the only ways available to users to do this were to use the graphical installer or to build Postgres from source.  If you are setting up a DevOps environment on Linux or the cloud, using a graphical installer doesn’t work well and, let’s be honest, who wants to be pestered with advertising while doing an installation? Building Postgres from source is definitely another option, but most DBAs don’t have the time to work through that process and handle any errors that might occur.  They could hand off the task to their dev or sysadmin team, but then they would need to wait for that to happen and/or lose control of the way it is built. And besides, you want a consistent binary build on each server in your organization, not a compiled-from-source version that uses different versions of libraries or config settings on different servers over time.

So, in our Postgres by BigSQL distribution, we decided to make it easy for users to install Postgres wherever they want, simply by using the 100% open source RPM that we build and make available to everyone. Our RPM does not modify system libraries and by default installs to a user-friendly location:


But you have total control - you simply use the --prefix <path> option to tell the package manager to install the software where you want.  For example:
# rpm -ivh --prefix /home/postgres/PG96 postgresql-9.6.0-1-x64-bigsql.rpm
warning: postgresql-9.6.0-1-x64-bigsql.rpm: Header V3 DSA/SHA1 Signa

Posted by US PostgreSQL Association on 2016-10-19 at 03:45

Hello, everyone! I would like to announce the opening of nominations and the voting procedure for the 2016 PgUS board election.

Our nomination period is open until October 25th; candidates may be nominated or may self-nominate by emailing the corporation secretary (me) at:
I will be posting platforms/bios on the PgUS website by October 26. PgUS members may vote via email to:


PGConf Silicon Valley, November 14-16, is barely under a month away and it’s shaping up to be another great conference this year. Last year we had over 300 attendees, making it one of the larger Postgres conferences around, and this year we’re expecting even more. While we’ve had the schedule online for a few months now, we thought it might be interesting to take a step back and see if any strong themes emerged. Perhaps unsurprisingly, there were a few strong themes that start to tell a bit of a story about some of Postgres’ recent momentum as well as its future direction. Here’s a look at a few of the strong themes that are highlighted at the conference:

NoSQL and Postgres

Postgres has had some form or another of NoSQL support stretching back for years, to hstore or even the XML datatype. Then came JSONB, which is extremely flexible and can also be performant. Given all the excitement around it, it’s no surprise there are a lot of great talks, but not all of them simply tout the benefits of JSONB. Some talks will take a hard look at the trade-offs as well as the benefits of JSONB vs. other systems for similar problems. Just a few that will touch on Postgres and when it makes sense for NoSQL:


And while Postgres continues with each major release to make a number of improvements, there are a lot of advancements in the form of extensions. Postgres extension APIs make it possible to add lots of rich functionality to Postgres, without having to have code committed to core. This year we have a strong line-up of talks that surface some of the power of these extensions, which should only continue to grow as an ecosystem over time:

Postgres internals

Of course no Postgres conference would be complete without a number of committers talking about all

Posted by Craig Ringer in 2ndQuadrant on 2016-10-18 at 03:08

BDR is both a patch to PostgreSQL core and an extension on top of PostgreSQL core. How did that come about, and what’s its future?

Development of BDR was initiated around the time PostgreSQL 9.2 was in development. Arguably earlier if you count things like the extension mechanism. The goal of BDR is, and has always been, to add necessary features to core PostgreSQL to perform asynchronous loosely-coupled multi-master logical replication.

BDR improvements to core PostgreSQL

Since it’s such a large set of changes it was necessary to structure development as a series of discrete features. A natural dividing line was “things that require changes to the core PostgreSQL code” vs “things that can be done in an extension”. So the code was structured accordingly, making BDR a set of patches to core plus an extension that uses the added features to implement a multimaster node. The required features were developed in a branch of PostgreSQL and then extracted and submitted one by one to PostgreSQL core. BDR is, and has always been, much too big to simply commit to PostgreSQL in one go.


Getting the BDR extension running on unmodified PostgreSQL

BDR 1.0 is still in two parts – a set of patches to add necessary features to core PostgreSQL and an extension that uses the features to implement multimaster. The original goal was to have BDR running on stock 9.4 without patches, but it just took too long to get all the required patches through the community process and into core. This isn’t entirely a bad thing, since it’s led to the eventual features being of higher quality when they were committed to 9.4, 9.5 and 9.6.

Now, as of PostgreSQL 9.6, all of the patches required to make it possible to implement BDR-style replication have been included in PostgreSQL core. As mentioned in an earlier post, BDR on 9.6 should run as an extension on an unmodified PostgreSQL. The implementation of BDR itself is still an outside extension, so this doesn’t mean “BDR is in 9.6”, but it’s big progress.

Other enhancements related to BDR




This is the third and last part of the series of blog articles dedicated to pg_rewind. In the two previous articles we have seen how pg_rewind is useful for fixing split-brain events caused by mistakes in the switchover procedures, avoiding the need for new base backups. We have also seen that this holds for simple replication clusters; where more standby nodes are involved, only two nodes can be fixed, and the other ones need a new base backup to be re-synchronised. pg_rewind for PostgreSQL 9.6 is now able to work with complex replication clusters.

Indeed, pg_rewind has been extended so it can view the timeline history graph of an entire HA cluster, like the one mentioned in my previous blog article. It is able to find out the most recent, shared point in the timeline history between a target and a source node, and then it performs a rewind operation from this point – not simply from the last checkpoint of the master before the standby promotion, as in PostgreSQL 9.5.

So, let’s consider again the same cluster, but now based on PostgreSQL 9.6:

~$ # Set PATH variable
~$ export PATH=/usr/pgsql-9.6/bin:${PATH}
~$ # This is the directory where we will be working on
~$ # Feel free to change it and the rest of the script
~$ # will adapt itself
~$ WORKDIR=/var/lib/pgsql/9.6
~$ # Environment variables for PGDATA and archive directories
~$ MASTER_PGDATA=${WORKDIR}/master
~$ ARCHIVE_DIR=${WORKDIR}/archive
~$ # Create the archive directory
~$ mkdir -p ${ARCHIVE_DIR}
~$ # Create the HA cluster
~$ initdb --data-checksums -D ${WORKDIR}/master
~$ cat >> ${MASTER_PGDATA}/postgresql.conf <<EOF
~$ archive_command = 'cp %p ${ARCHIVE_DIR}/%f'
~$ archive_mode = on
~$ wal_level = hot_standby
~$ max_wal_senders = 10
~$ min_wal_size = '32MB'
~$ max_wal_size = '32MB'
~$ hot_standby = on
~$ wal_log_hints = on
~$ EOF
~$ cat >> ${MASTER_PGDATA}/pg_hba.conf <<EOF
~$ # Trust local access for replication


In the last blog post, pgAudit was configured to audit entire classes of statements (session auditing). Session auditing works great, but it can generate a lot of logs, and not every administrator needs all that information. In this blog post, pgAudit will be configured to use an auditing role to watch only specific objects.
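
For reference, object auditing hinges on the pgaudit.role setting: a statement against a table is logged only if it matches a privilege granted to that role. A minimal sketch, with the role and table names being illustrative assumptions rather than the post's own example:

CREATE ROLE auditor;                        -- dedicated audit role, no login needed
ALTER SYSTEM SET pgaudit.role = 'auditor';  -- tell pgAudit which role to consult
SELECT pg_reload_conf();

-- Only SELECT and DELETE on this one table will now be audited
GRANT SELECT, DELETE ON orders TO auditor;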

Hey, the CFP for PGDay.IT 2016 (the tenth Italian edition of the PGDay) has been extended until next Saturday, 22 October, at 23:59 (Rome time).
Don't miss the opportunity to be a speaker at one of the best-known PGDays!
Posted by Shaun M. Thomas on 2016-10-14 at 18:29

One of the things Postgres has been “missing” for a while is logical replication based on activity replay. Until fairly recently, in order to replicate single tables from one database to another, we had to encumber the table with performance-robbing triggers coupled to a third party daemon to manage transport. Those days might finally be behind us thanks to pglogical.

But is it easy to use? Let’s give it a try on our trusty sensor_log table. First, we’ll need to create everything on the donor node:

CREATE TABLE sensor_log (
  id            SERIAL PRIMARY KEY NOT NULL,
  location      VARCHAR NOT NULL,
  reading       BIGINT NOT NULL,
  reading_date  TIMESTAMP NOT NULL
);

INSERT INTO sensor_log (location, reading, reading_date)
SELECT s.id % 1000, s.id % 100,
       CURRENT_DATE - (s.id || 's')::INTERVAL
  FROM generate_series(1, 1000000) s(id);

SELECT pglogical.create_node(
    node_name := 'prod_sensors',
    dsn := 'host=localhost port=5432 dbname=postgres'
);

SELECT pglogical.create_replication_set(
    set_name := 'logging',
    replicate_insert := TRUE, replicate_update := FALSE,
    replicate_delete := FALSE, replicate_truncate := FALSE
);

SELECT pglogical.replication_set_add_table(
    set_name := 'logging', relation := 'sensor_log',
    synchronize_data := TRUE
);
After all of that, we have a replication set containing a single table representing one million rows of sensor data. At this point, any number of subscribers could connect to the replication set and request its contents. After that, all inserts would also be replayed on the subscriber once they are detected in the Postgres transaction log.

In a second Postgres instance, we would do this to “consume” the table:

CREATE TABLE sensor_log (
  id            INT PRIMARY KEY NOT NULL,
  location      VARCHAR NOT NULL,
  reading       BIGINT NOT NULL,
  reading_date  TIMESTAMP NOT NULL
);

SELECT pglogical.create_node(
    node_name := 'sensor_warehouse',
    dsn := 'host
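
On the subscriber, the typical next step is to create a subscription pointing at the provider and pulling in the 'logging' replication set. A hedged sketch (the connection string is illustrative):

SELECT pglogical.create_subscription(
    subscription_name := 'sensor_warehouse_sub',
    provider_dsn := 'host=localhost port=5432 dbname=postgres',
    replication_sets := ARRAY['logging']
);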

Comparing the schemas of two or more different Postgres databases is a common task, but can be tricky when those databases are running different versions of Postgres. The quick and canonical way to compare schemas is by using the exact same pg_dump program to query each database via the --schema-only option. This works great, but there are some gotchas, especially when dumping database views.


First, some background as to how this issue was discovered. We have a client that is in the process of upgrading from Postgres 9.2 to Postgres 9.6 (the latest version as of this writing). Using the pg_upgrade program was not an option, because not only are data checksums going to be enabled, but the encoding is being moved to UTF-8. A number of factors, especially the UTF-8 change, meant that the typical upgrade process of pg_dump old_database | psql new_database was not possible. Thus, we have a very custom program that carefully migrates pieces over, performing some transformations along the way.


As a final sanity check, we wanted to make sure the final schema for the upgraded 9.6 database was as identical as possible to the current production 9.2 database schema. When comparing the pg_dump outputs, we quickly encountered a problem with the way that views were represented. Version 9.2 uses a very bare-bones, single-line output, while 9.6 uses a multi-line pretty printed version. Needless to say, this meant that none of the views matched when trying to diff the pg_dump outputs.

The problem stems from the system function pg_get_viewdef(), which is used by pg_dump to give a human-readable and Postgres-parseable version of the view. To demonstrate the problem and the solution, let's create a simple view on a 9.2 and a 9.6 database, then compare the differences via pg_dump:

$ psql -p 5920 vtest -c \
'create view gregtest as select count(*) from pg_class where reltuples = 0'
$ psql -p 5960 vtest -c \
'create view gregtest as select count(*) from pg_class whe
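
For reference, pg_get_viewdef() itself exposes both formats; its optional second argument toggles pretty-printing. A small hedged illustration against the view created above:

-- Compact, single-line form vs. pretty-printed, multi-line form
SELECT pg_get_viewdef('gregtest'::regclass, false);
SELECT pg_get_viewdef('gregtest'::regclass, true);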
Posted by David Rader in OpenSCG on 2016-10-14 at 14:48

Holy Easy PostgreSQL deployment!

In case you missed it, the BigSQL team released an awesome package manager for installing and configuring PostgreSQL and many related, useful components. The package manager can be found here:

Playfully named pgc, for 'pretty good command line', pgc is a utility similar to yum or apt-get that allows you to install, configure, update and manage Postgres-related components, including foreign data wrappers, stored procedure languages, connectors, devops tools, HA tools and monitoring tools. Common uses:

  • Provision Postgres (9.2 through 9.6, including multiple versions on same server)
  • Installing pgBouncer, Backrest, and other community projects
  • Scripting configurations in chef or other devops tools

PGC runs on Linux, Windows and OS X and supports the exact same CLI, so it is an ideal provisioning/management tool for multi-OS environments.

PGC not only allows you to get and install these components, but also lets you update each component as new updates become available.

PGC’s syntax is very intuitive and easy to use:

Use the help command to see the possible commands:

$ ./pgc help
Usage: pgc command [component1 component2 ...]
Common commands:
help - Show this help file
info - Display OS or component information
list - Display installed & available components
status - Display status of installed server components
start - Start server components
stop - Stop server components
enable - Enable a server component
disable - Disable a server component from starting automatically
update - Retrieve new list of available components
upgrade - Upgrade installed components to newer (compatible) versions
install - Install a component
remove - Remove a component
Advanced commands:
init - Initialize a component
download - Download a component
config - Configure a component
clean - Remove component files from download cache
enable - Enable a server component
disable - Disable a server component from starting automatically

Example workflow:

1) Checking to


We are faced with this question: “What’s the ingestion rate of Postgres-XL?”, and I realised I don’t have a very good answer to that. Since we recently made some good improvements in this area, I was curious to know too. So I decided to benchmark.

Hardware and Software

For the tests, I used a Postgres-XL cluster running on EC2 instances. Since COPY has to go through the coordinator, it seemed reasonable to use a compute-optimised c3.8xlarge instance for running the coordinator. Similarly, for datanodes, storage-optimised i2.xlarge instances are more appropriate. Both instance types have attached SSD disks, though the i2.xlarge instance has more storage than the c3.8xlarge instance.

So the next question was how to generate data for the test. We’d used the TPC-H benchmark for previous performance tests, and it had everything I needed for this simple, focused test. I just picked one table from the TPC-H benchmark, the lineitem table. It’s a very wide table, with many columns and a wide VARCHAR column. The ingestion benchmarks I’d seen earlier used a very narrow table, so for fairness I stripped away a few columns. The resulting table was still much wider, each row being about 90 bytes including tuple headers.

CREATE TABLE lineitem (
   l_orderkey BIGINT NOT NULL,
   l_partkey BIGINT NOT NULL,
   l_suppkey BIGINT NOT NULL,
   l_linenumber BIGINT NOT NULL,
   l_shipdate DATE NOT NULL,
   l_receiptdate DATE NOT NULL,
   l_shipinstruct CHAR(25) NOT NULL
) DISTRIBUTE BY HASH (l_orderkey);

The TPC-H benchmark comes with a data generation tool called dbgen. You can generate data for just a specific table, of the required size, and split it into multiple chunks if needed. So I created sample data of 3 billion rows, split into 24 chunks. Since the c3.8xlarge instance has two SSDs attached to it, the data files were split across the disks to utilise the full IO bandwidth on the coordinator side.

for i in `seq 1 12`; do
  dbgen -C 24 -S $i -T L -s 500 -o | cut -d '|' -f 1,2,3,4,11,13,14 > /data1/pgxl/lineitem_part_$i.csv&
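
Each of the 24 concurrent load jobs then pushes one chunk through the coordinator with COPY. A hedged sketch of a single invocation (the options are illustrative; the dbgen output is pipe-delimited):

COPY lineitem FROM '/data1/pgxl/lineitem_part_1.csv' WITH (FORMAT csv, DELIMITER '|');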


Last week a new PostgreSQL major version, numbered 9.6, was released, and I already covered my favorite parts of the official release notes from a DBA’s point of view in a blog post here. Now I would like to look at the same list of changes from a different angle, transforming it in my mind into […]

The post Best of PostgreSQL 9.6 for the Developer appeared first on Cybertec - The PostgreSQL Database Company.

Posted by Joe Nelson in CitusData on 2016-10-12 at 00:00

Everybody counts, but not always quickly. This article is a close look into how PostgreSQL optimizes counting. If you know the tricks there are ways to count rows orders of magnitude faster than you do already.

The problem is actually underdescribed – there are several variations of counting, each with its own methods. First, think whether you need an exact count or whether an estimate suffices. Next, are you counting duplicates or just distinct values? Finally, do you want a lump count of an entire table, or will you want to count only those rows matching extra criteria?

We’ll analyze the techniques available for each situation and compare their speed and resource consumption. After learning about techniques for a single database we’ll use Citus to demonstrate how to parallelize counts in a distributed database.


Preparing the DB for tests

The sections below use the following table for benchmarks.

-- create a million random numbers and strings
CREATE TABLE items AS
  SELECT
    (random()*1000000)::integer AS n,
    md5(random()::text) AS s
  FROM generate_series(1,1000000);

-- inform planner of big table size change
VACUUM ANALYZE items;

Counts With Duplicates

Exact Counts

Let’s begin at the beginning: exact counts allowing duplication over some or all of a table, good old count(*). Measuring the time to run this command provides a basis for evaluating the speed of other types of counting.

Pgbench provides a convenient way to run a query repeatedly and collect statistics about performance.

# Tests in this article were run against PostgreSQL 9.5.4

echo "SELECT coun
According to OWASP, SQL injection is the #1 attack vector against web applications. Well, at least it was in 2013, but a 2016 survey is under way. And since it also was #1 in 2010, it might well make it three in a row.

But even if injection does not work, an amazing number of database servers can be discovered by Shodan - and it's not only MongoDB...

Quite a few years ago, I read about the idea of detecting leaking data by using sentinel values. The concept is to put values into the data that are never queried and thus never emitted if a certain application uses the data as it should.
But if an attacker bypasses security and slurps all of the data, e.g. with SELECT * FROM sometable;, the sentinels are emitted and detected, and countermeasures can be triggered.

As far as I can remember, the original idea was to put sensors on the network to detect the sentinels, but since then I always wondered if it could be applied to the PostgreSQL server itself.

Where to put the sensor then?

The application is certainly the wrong place, since the attacker might bypass it altogether. The driver isn't an option either, because the attacker might bring his own driver. Even a middleman like pgbouncer or pgpool-II could be bypassed, so the only way is to sink the detector as deep as possible into the heart of the server.

To cut a long story short, as proof-of-concept I chose to write a PostgreSQL module that hooks into ExecutorRun(): pg_sentinel.

Documentation comes with the module.

If you find things to improve, feel free to do so. After all, except for a minuscule contribution to pg_plan_filter, I had no previous experience programming modules and hooks and started eight hours ago from zero.
Posted by gabrielle roth on 2016-10-11 at 02:06
It’s been a few weeks now since we wrapped up the most recent PostgresOpen in Dallas. Marianne Bellotti’s keynote was a fantastic and very interesting look into the current state of government IT.  I hope we get to see her at more conferences! Other quick highlights: – 2ndQ has a running beta of WordPress on […]

For those folks on Windows who want to do HTTP GETs and POSTs directly from your PostgreSQL server, we've made binaries for the http extension v1.1.2 release for PostgreSQL 9.6, 9.5, and 9.4 Windows 64-bit.

We haven't gotten around to building the 32-bit variants but plan to and will add them here once done.

These should work fine on both the PostgreSQL EDB Windows distributions and the BigSQL Windows 64-bit distributions. Instructions are in the ReadMe.txt for where to copy the files for each distribution. Caution should be taken, especially with BigSQL, not to overwrite existing BigSQL files. Many of the files in this package already exist in the BigSQL distribution, since we both compile using the Mingw64 chain.

If you already have PostGIS installed, you will already have many of these files, since things like libcurl and posix are also packaged with PostGIS.



In a previous post, we've seen how PostGIS and OpenStreetMap can be used to leverage geographical data.

A common task in GIS is land subdivision. This involves taking different shapes, polygons usually, and cutting them into smaller polygons. This is often used in urban planning.

This post will focus on describing an algorithm for partitioning a land polygon into parcels of a given area.

Algorithm description

We have a polygon P and the nearest road R. We get the bounding box B for P and all our searches for cutting points/lines will be confined to B. We compute the extreme points on the bounding box and we label them with the cardinal directions they represent.

Our goal is to cut a corner C (also called a subdivision) from P such that it contains the boundary point nearest to a road. The cut will be done using two lines, one horizontal and one vertical. We want C to have a given area A.

Now we get the two extreme points on P that are closest to R and, depending on them, decide which corner to cut.

We'll use sweeping-lines for this. Any sweeping-lines mentioned will be moving away from the corner (in other words, away from the closest road point).

In what follows, we assume the north-west corner needs to be cut.

We place an inset (a horizontal line) that will be located sqrt(A) to the south (relative to the north edge). The inset is positioned there because we anticipate the target area to have a rectangular shape.
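
As a rough illustration of this step, placing the inset and measuring the area that lies above it could be expressed in PostGIS along these lines (the table name, id and target area are assumptions, not from the post):

WITH params AS (
  SELECT p.geom,
         ST_XMin(p.geom) AS x_min, ST_XMax(p.geom) AS x_max,
         ST_YMax(p.geom) AS y_max,
         10000.0::float8 AS target_area      -- A, in the geometry's units
    FROM parcels p
   WHERE p.id = 1
)
SELECT ST_Area(
         ST_Intersection(
           geom,
           -- everything above the inset line y = y_max - sqrt(A)
           ST_MakeEnvelope(x_min, y_max - sqrt(target_area), x_max, y_max, ST_SRID(geom))
         )
       ) AS area_above_inset
  FROM params;

Comparing this area against the target then drives the two cases described below.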

If the area above the inset (the one we aim for) is larger than our target, we split the polygon, take the upper half and use another sweeping line that goes from west to east, to find another cutting line that allows us to get to target area.

If the area above the inset is insufficient (below the target area), we search for a better position for it, using binary search, along the north-south direction.

Additional details: The way the cut search works, using the inset, is such that we avoid getting thi


repmgr 3.2 has recently been released with a number of enhancements, particularly support for 2ndQuadrant’s Barman archive management server, additional cluster monitoring functionality and improvements to the standby cloning process.

One aim of this release is to remove the requirement to set up passwordless SSH between servers, which means when using repmgr’s standard functionality to clone a standby, this is no longer a prerequisite. However, some advanced operations do require SSH access to be enabled.

Barman support

repmgr 3.2 can now clone a standby directly from the Barman backup and recovery manager. In particular it is now possible to clone a standby from a Barman archive, rather than directly from a running database server. This means the server is not subjected to the I/O load caused by a backup, and there’s no need to manage WAL retention on the database server other than to ensure WALs are archived to Barman. The standby can also use the Barman archive as a fallback source of WAL files in case streaming replication is interrupted.

To clone from Barman, the following prerequisites must be met:

  • the Barman server’s host name must be set in repmgr.conf with the parameter barman_server
  • the Barman server must be accessible by SSH
  • the name of the server configured in Barman must be equal to the cluster_name setting in repmgr.conf
  • Barman must have at least one valid backup for this server

With barman_server set, repmgr will automatically attempt to clone from Barman. --without-barman overrides this behaviour and allows repmgr to clone directly from another PostgreSQL server. See the README for further details. This feature was implemented by my colleague Gianni Colli.

Cluster status monitoring

Previous repmgr versions provided the command repmgr cluster show, which displays a simple overview of the cluster status as seen from the node it was executed on. However, this only provides a partial picture of the cluster’s state – in particular it could indicate that a node appears to be down, even though the node is ru

Posted by Kaarel Moppel in Cybertec on 2016-10-10 at 07:35

Last week a new PostgreSQL major version with the number 9.6 was released! The announcement, release notes and the official “What’s new” overview can be found here, here and here – they’re highly recommended reading, so check them out. But as always, a slew of blog posts from excited members of the global Postgres community follows […]

The post Best of PostgreSQL 9.6 for the DBA appeared first on Cybertec - The PostgreSQL Database Company.

pg_ctl is a control utility on top of the postgres binary that allows many operations to be performed on a server. A couple of its subcommands offer a way to wait for the operation to finish before pg_ctl exits, which is useful to ensure the state of the server before moving on with other things in the upper application layer. The sub-commands start, restart and register (Windows-only, to register Postgres as a service in the in-core SCM) do not wait for the operation to finish by default, but this can be requested by using the -w switch. Conversely, the sub-command stop waits by default, and this can be disabled by using the -W switch. In Postgres 10, the sub-command promote has gained this option, per the following commit:

commit: e7010ce4794a4c12a6a8bfb0ca1de49b61046847
author: Peter Eisentraut <>
date: Wed, 21 Sep 2016 12:00:00 -0400
pg_ctl: Add wait option to promote action

When waiting is selected for the promote action, look into pg_control
until the state changes, then use the PQping-based waiting until the
server is reachable.

Before this commit, what pg_ctl did was simply write a file called “promote” in PGDATA to let the startup process know that it needs to exit recovery, take a couple of end-of-recovery actions, and jump to a new timeline before switching the server to read-write mode. Once pg_ctl was done, it was necessary to have some additional logic, for example querying pg_is_in_recovery() on the newly-promoted server, to see if the server was ready for read-write queries or not. While not complicated, that is always an additional task for the server maintainer when doing a failover.
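
That extra logic typically amounted to polling a query like the following until it returned false, meaning the node had left recovery and was accepting writes:

-- true while the server is still in recovery, false once promotion is complete
SELECT pg_is_in_recovery();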

With pg_ctl promote -w, such additional logic becomes unnecessary and the whole process is actually more responsive. pg_ctl periodically checks the control file of the server to see if it has been switched to an in-production state, and then considers the promotion completed; this check happens every second until a user-definable timeout is reached. However, the control file is updated closer to the poin

Posted by Shaun M. Thomas on 2016-10-07 at 19:05

For all of those warehouse queries that never seem to complete before the heat death of the universe, there’s often a faster version. Sometimes this is due to a fundamental misunderstanding of how queries work, or how Postgres specifically functions. The trick is knowing when to back away slowly from an ugly but efficient query, and when to inject a flurry of predicates to fully illustrate the original intent of the query so the planner makes better decisions. When partitions are involved, this can be quite an adventure.

Sometimes optimizing partitioned plans is an art. Other times, things are a little easier to isolate. Imagine we have a sensor log partitioned by month, and our app is fairly new, so there are only three months available. Anything illustrated in this case would only be much worse as time progresses.

Just to make things simple, there will be no trigger logic or anything to actually move rows around. We’re just looking at query performance, not the feasibility of the structure itself.

CREATE TABLE sensor_log (
  sensor_id     INT NOT NULL,
  location      VARCHAR NOT NULL,
  reading       INT NOT NULL,
  reading_date  TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE sensor_log_201607 () INHERITS (sensor_log);
CREATE TABLE sensor_log_201608 () INHERITS (sensor_log);
CREATE TABLE sensor_log_201609 () INHERITS (sensor_log);

INSERT INTO sensor_log_201607 (sensor_id, location, reading, reading_date)
SELECT s.id % 1000, (random() * 100)::INT, s.id % 100,
       '2016-08-01'::TIMESTAMPTZ - (s.id || 's')::INTERVAL
  FROM generate_series(1, 3600*24*30) s(id);

INSERT INTO sensor_log_201608 (sensor_id, location, reading, reading_date)
SELECT s.id % 1000, (random() * 100)::INT, s.id % 100,
       '2016-09-01'::TIMESTAMPTZ - (s.id || 's')::INTERVAL
  FROM generate_series(1, 3600*24*30) s(id);

INSERT INTO sensor_log_201609 (sensor_id, location, reading, reading_date)
SELECT s.id % 1000, (random() * 100)::INT, s.id % 100,
       '2016-10-01'::TIMESTAMPTZ - (s.id || 's')::INTERVAL
  FROM generate_se
Posted by gabrielle roth on 2016-10-07 at 02:14

When: 6-8pm Thursday October 20, 2016
Where: iovation
Who: Kristin Tufte
What: Urban Analytics: Connecting People to Data

Data from connected and automated vehicles, from connected intersections, positional or probe data from cell phones, crowd-sourced data, fixed-sensor vehicle speed and volume data, air quality data, and more… The list of data sources available to Smart Cities for use in Urban Analytics is almost endless. But data by itself is just data; what’s powerful is using data to solve people’s problems. This premise was central to the City of Portland’s Ubiquitous Mobility for Portland proposal – one of seven finalists in the U.S. Department of Transportation’s $40 million Smart City Challenge.

In this talk, Dr. Tufte will discuss possibilities for using data and analytics for solving people’s problems including technical, institutional and social barriers to such work. She will also talk about the Smart City Challenge itself – she was a core member of the team and will share insight into (really stories about) the process, the team, the work and the presentation in DC. In the end, Columbus, OH won the challenge, but Portland put together a great proposal and a great team and she’ll also talk about how she expects Portland to continue to move Smart Cities forward.

Dr. Kristin Tufte is a transportation data management expert with expertise in the application areas of transportation and smart cities. Dr. Tufte has collaborated closely with transportation agencies in the Portland-Vancouver area for the past ten years and manages PORTAL — the longest-running public transportation data archive in the nation. Dr. Tufte is a key member of the highly-regarded city-university partnership between Portland State University and City of Portland and was the data lead on the City of Portland’s Smart Cities proposal, one of seven finalists in the US Department of Transportation’s Smart City Challenge. Dr. Tufte’s passion is using data and technology to change people’s lives.

If you have a job posting or event you

Posted by Bruce Momjian in EnterpriseDB on 2016-10-06 at 19:45

Having spoken at many Postgres events over the years, I have accumulated 25+ talks, whose slides are all on my website under a Creative Commons license. Fortunately, many of these conferences have recorded my presentations. In the past, I have added video links next to each event as I found them. Unfortunately, unless website visitors chose to view event locations using the link at the top of the page, they wouldn't see the video links. Using some JavaScript trickery, all video links now appear by default, and also next to locations if you choose to view them. This should make it easier for website visitors to find my presentation videos. (If you are aware of any missing videos, please let me know.)

The community has done a great job of creating wiki pages for most of the Postgres conferences with links to the slides. Unfortunately there are no links to videos matching the slide decks. It would be great if someone would add such links. For example, the PGConf US YouTube page has 89 videos from the 2015 and 2016 conferences. It would be great if those could be linked to from the Postgres wiki. The same could be done for other conferences that publish videos.

In addition, it would be nice if presentation and video URLs were loaded into a database (maybe Postgres) so they can be categorized and searched; e.g. it would be great to see all the performance or security slides and videos listed together on one web page. In summary, I feel we are not fully utilizing the work that has been done by conferences to disseminate their proceedings. These high-quality proceedings are valuable resources for our growing user base.


With PostgreSQL 9.6 now out, many of our users and customers are asking us about the plans for Postgres-XL 9.6. The new release of PostgreSQL implements some new features, especially around parallel query processing and many are interested to know when those capabilities will be available in Postgres-XL.

Postgres-XL itself is a massively parallel processing system. It has an efficient distributed planner and executor which can distribute work to tens of datanodes, process the result and send it back to the client. It allows the user to either shard or replicate a table for efficient query execution. On top of that, Postgres-XL also provides global consistency of the data, all the time, so that users don’t need to worry about consistency while designing their applications.

One major difference between PostgreSQL 9.6’s parallel query capability and Postgres-XL’s parallel query capability is that PostgreSQL’s parallel engine tries to utilise multiple cores or CPUs within a single physical or virtual machine. While that does not much help workloads where a single large server is just not enough, or where the resources on a single server are fully utilised, the new parallel query capabilities will still help certain workloads, especially workloads that can be served by a single large server. Before, these workloads could not utilise the full power of a large server, but with PostgreSQL 9.6 they will be able to do so.


So why would parallel query capabilities of PostgreSQL help Postgres-XL at all? A short and simple answer to the question is that if these capabilities exist, then each Postgres-XL datanode can parallelise part of the workload assigned to it. That can significantly improve query execution performance for workloads where datanode resources are not fully utilised. It will also allow users to provision larger servers for datanodes and get the full benefit of two levels of parallelism. So if you’re computing an aggregate for a multi-terabyte table, sharded on 64 nodes, Postgres-XL will first