
My slides from today's presentation at #PGConfUS 2017 in Jersey City.

My slides from my PGConf US 2017 tutorial, PostgreSQL When It’s Not Your Job, are available now.

Currently I'm doing a short series of videos about data analysis with PostgreSQL. Since it is aimed at people who do not directly deal with databases every day, it starts with the basics: installation on Linux and Windows.

What do you think, is it worth the effort?
Posted by Jeff McCormick in Crunchy Data on 2017-03-28 at 17:48


Last November the team at CoreOS introduced the concept of an "application-specific controller" for Kubernetes called an Operator.  In their announcement, CoreOS suggested Operators as a means to more efficiently manage database infrastructure.

Crunchy Data has previously released a suite of containers for deploying, administering and monitoring PostgreSQL and leveraging the Operator concept to further advance the deployment and management of PostgreSQL functions within Kubernetes was a natural extension of our work to date.

To that end, Crunchy Data is pleased to announce an initial implementation of a PostgreSQL Operator.

Turtle turtle by WO1 Larry Olson from US Army

The PostgreSQL database system uses the write-ahead logging method to ensure that a log of changes is saved before being applied to the actual data. The log files that are created are known as WAL (Write Ahead Log) files, and by default are 16 MB in size each. Although this is a small size, a busy system can generate hundreds or thousands of these files per hour, at which point disk space becomes an issue. Luckily, WAL files are extremely compressible. I examined different programs to find one that offered the best compression (as indicated by a smaller size) at the smallest cost (as indicated by wall clock time). All of the methods tested worked better than the venerable gzip program, which is suggested in the Postgres documentation for the archive_command option. The best overall solution was using the pxz program inside the archive_command setting, followed closely by use of the 7za program. Use of the built-in wal_compression option was an excellent solution as well, although not as space-saving as using external programs via archive_command.
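For illustration, the settings involved might look like this in postgresql.conf (the archive path and compression flags here are illustrative assumptions, not the benchmark's exact configuration):

```
# Compress each completed WAL file while archiving it (path is illustrative)
archive_mode = on
archive_command = 'pxz --compress --stdout %p > /mnt/wal_archive/%f.xz'

# Or let the server compress full-page writes inside the WAL itself
wal_compression = on
```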

A database system is a complex beast, involving many trade-offs. An important issue is speed: waiting for changes to get written to disk before letting the client proceed can be a very expensive solution. Postgres gets around this with the use of the Write Ahead Log, which generates WAL files indicating what changes were made. Creating these files is much faster than performing the actual updates on the underlying files. Thus, Postgres is able to tell the client that the work is "done" when the WAL file has been generated. Should the system crash before the actual changes are made, the WAL files are used to replay the changes. As these WAL files represent a continuous unbroken chain of all changes to the database, they can also be used for Point in Time Recovery - in other words, the WAL files can be used to rewind the database to any single point in time, capturing the state of the database at a specific moment.


Posted by Jobin Augustine in OpenSCG on 2017-03-28 at 11:32

The C++14 specification came with huge improvements: auto type deduction, lambdas, movable rvalue expressions, etc.; the list is huge. It's a dream come true for lazy programmers like me: write less code and create fewer bugs 😉 And get huge performance gains for free.

This time I thought I'd try writing some functions in C++ rather than PL/SQL, since it could be easier for development. Oh well... PostgreSQL is a world of C code. Other than the list of restrictions, I couldn't find anything in the documentation, and no working examples.

Looks like there is a lot of confusion around. I see many queries on different forums that remain unanswered. I thought I’d give it a try.

After the first couple of attempts, I figured out that wrapping my code in an extern "C" block could do the magic.

Here is a simple sample function, where I am trying to use type deduction, the STL, and a range-based for loop. My file is named extension.cpp and contains the following lines:

extern "C" {
#include <postgres.h>    //all C headers and macros go inside extern "C"
#include <utils/rel.h>
}

#include <vector>        //C++ headers go outside

extern "C" int sumofall() {    //wrap the C++ function in an extern "C" block
    auto sum_of_all = 0;
    std::vector<int> arr {1, 2, 3, 4, 5, 6, 7};
    for (auto& i : arr)
        sum_of_all += i;
    return sum_of_all;
}

I know that this could be construed as dumb code. A C++ coder may say that this function must be a C++ “constexpr” (Constant expression) so that the calculation is done at compile time and there is zero overhead at runtime. But here my intention is to show that we won’t have any restriction in using C++ features.

I used the Postgres sandbox created with the BigSQL Package Manager for this demonstration, as it comes with all the header files and libraries. At the same time, it is also portable (relocatable).

This can be compiled with:

g++ -c -fPIC -Wall -Werror -g3 -O0 -I/home/vagrant/bigsql/pg96/include/postgresql/server extension.cpp 

Then make it a shared object:

g++ -shared -o extension.so extension.o


Posted by Bruce Momjian in EnterpriseDB on 2017-03-27 at 14:15

Many database administrators use databases as simple data stores. However, relational systems can do much more, with advanced querying, analysis, and transaction control capabilities. Another often-overlooked area is constraints, which check new and updated data against defined rules and prevent changes that would violate them.

Constraints are odd in that they don't do anything if the data is consistent — it is more like an insurance policy against invalid data being entered into the database. If constraints are missing, there often are no initial problems, but over time erroneous or unexpected data gets in, causing problems with applications and reporting.

Do yourself a favor the next time you create a table — take the insurance and create useful CHECK, NOT NULL, DEFAULT, UNIQUE, PRIMARY KEY, and FOREIGN KEY constraints. If your tables are already created, you can use ALTER TABLE to add constraints to existing tables.
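As a sketch of what taking that insurance looks like (table and constraint names invented for illustration):

```sql
CREATE TABLE orders (
    id          integer PRIMARY KEY,
    customer_id integer NOT NULL REFERENCES customers (id),  -- FOREIGN KEY
    order_code  text UNIQUE,
    status      text NOT NULL DEFAULT 'new',
    quantity    integer NOT NULL CHECK (quantity > 0)
);

-- constraints can also be added to an existing table:
ALTER TABLE orders
    ADD CONSTRAINT status_valid CHECK (status IN ('new', 'paid', 'shipped'));
```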

Continue Reading »

Posted by Joe Conway in Crunchy Data on 2017-03-27 at 13:18

Crunchy Data recently announced the publication of the PostgreSQL Security Technical Implementation Guide (STIG) by the United States Defense Information Systems Agency (DISA), making PostgreSQL the first open source database to provide a published STIG.

Everything you always wanted to know about Postgres stats

Today, I would like to make a little detour from the main series and dive into pg_stat_bgwriter. If you've been following my previous posts, you will remember that the pg_stat_bgwriter view has summary statistics about the bgwriter and checkpointer. Here I would like to show an interesting report query based on pg_stat_bgwriter. The sources of the query were found in the postgres mailing lists, shared by my colleague Victor Yegorov, and slightly modified by me. The report provides comprehensive information about bgwriter and checkpointer activity and helps to configure them better.
A tiny recommendation: run this query with expanded output in psql. The report produces only one row and looks like this:

-[ RECORD 1 ]--------------------------+---------------------------------------
Uptime                                 | 112 days 12:58:21.394156
Since stats reset                      | 132 days 14:37:34.149519
Forced checkpoint ratio (%)            | 7.6
Minutes between checkpoints            | 56.61
Average write time per checkpoint (s)  | 775.36
Average sync time per checkpoint (s)   | 0.21
Total MB written                       | 4194915.7
MB per checkpoint                      | 105.26
Checkpoint MBps                        | 0.03
Bgwriter MBps                          | 0.24
Backend MBps                           | 0.10
Total MBps                             | 0.37
New buffer allocation ratio            | 34.925
Clean by checkpoints (%)               | 8.5
Clean by bgwriter (%)                  | 64.3
Clean by backends (%)                  | 27.2
Bgwriter halt-only length (buffers)    | 0.00
Bgwriter halt ratio (%)                | 0.25
-------------------------------------- | --------------------------------------
checkpoints_timed                      | 3117
checkpoints_req                        | 256
checkpoint_write_time                  | 2615284047
checkpoint_sync_time                   | 716686
buffers_checkpoint                     | 4544
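To give a flavor of how the report's numbers are derived from those raw counters, here is a much-reduced sketch (constructed here; the full report query computes many more ratios):

```sql
SELECT checkpoints_timed,
       checkpoints_req,
       round(100.0 * checkpoints_req
             / nullif(checkpoints_timed + checkpoints_req, 0), 1)
           AS forced_checkpoint_ratio,
       date_trunc('second', now() - stats_reset) AS since_stats_reset
FROM pg_stat_bgwriter;
```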

As part of the project to translate the PostgreSQL documentation into Hebrew, I'm proud to announce the completion of the PostgreSQL 9.6.2 Release Notes translation.

The Hebrew Notes are available on link.

Comments are welcome.

I will continue posting the translation project progress.




Posted by REGINA OBE in PostGIS on 2017-03-26 at 04:42

A reminder: the PGConfUS 2017 conference is just days away, and we'll be giving a training on March 28th, 2017 in Jersey City, NJ at 1 PM. If you are coming, keep an eye on this page: PGConf 2017 US Getting Stuff done with PostGIS materials.

If you haven't signed up already, there are still spots; make sure to buy your tickets at

Continue reading "PGConfUS 2017 Getting Stuff done in PostGIS"
Posted by Shaun M. Thomas on 2017-03-24 at 17:20

Ever wonder why using SELECT column aliases in WHERE clauses fails?

SELECT random() AS confusion
FROM generate_series(1,10)
WHERE confusion > 0.5;
ERROR:  column "confusion" does not exist
LINE 3: WHERE confusion > 0.5;
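The full post explains the why; one common workaround (not necessarily the one the post settles on) is to wrap the query so the alias becomes a real column:

```sql
SELECT confusion
FROM (SELECT random() AS confusion
      FROM generate_series(1,10)) AS sub
WHERE confusion > 0.5;
```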

Continue Reading »

Yet another edition of PGConf India came to a conclusion in early March. You may have noticed the change from PGDay to PGConf, which signals a much larger gathering of PostgreSQL enthusiasts, now and in the future. What started as a small meet-up of like-minded people 4 years ago has now grown into a 2-day conference with a dedicated training day and a multi-track main event.

The keynote this year was delivered by Simon Riggs, a major developer and committer at PostgreSQL. He presented his thoughts on why Persistence is key to PostgreSQL’s success. Persistence is why users trust PostgreSQL to manage their critical data and persistence is why PostgreSQL community is able to deliver a solid product, release after release.

This year's conference was attended by more than 250 delegates, coming from 20 different cities around the world and representing over 80 different companies. This is at least 40% growth over last year's conference, and shows the growing popularity of the conference, but even more so of PostgreSQL as a preferred choice of database.

The other key achievement this year was the success of a dedicated training day. The program saw much higher interest than what we anticipated. Over 75 participants from 30 different companies attended this day long session and got a chance to learn from some of the best trainers in the world. The training topics covered things such as performance tuning, high availability, physical/logical replication and backup and recovery.

This is the first time we conducted 3 parallel tracks at the main conference. Until 2 years ago we used to struggle to get even a handful of topics submitted for the talks. This year we had to reject 2 submissions for every accepted talk, even with a 3-track event.

We hope this journey continues its positive growth and we see more participation from our community. If you’ve any comments, please write to the organisers at

Leaving you with a photo moment.


I spent the better part of the morning figuring out why a colleague could not import a PostgreSQL dump in plain format made on Linux on his Windows machine.

According to the documentation, this works like so (OS-agnostic):

psql dbname < infile

However, this gave the following error:

ERROR:  missing data for ...

Fortunately, the documentation for psql gives an alternative way to read commands from a file: the -f switch.

Read commands from the file filename, rather than standard input. This option can be repeated and combined in any order with the -c option. When either -c or -f is specified, psql does not read commands from standard input; instead it terminates after processing all the -c and -f options in sequence. Except for that, this option is largely equivalent to the meta-command \i.
If filename is - (hyphen), then standard input is read until an EOF indication or \q meta-command. This can be used to intersperse interactive input with input from files. Note however that Readline is not used in this case (much as if -n had been specified).
Using this option is subtly different from writing psql < filename. In general, both will do what you expect, but using -f enables some nice features such as error messages with line numbers. There is also a slight chance that using this option will reduce the start-up overhead. On the other hand, the variant using the shell's input redirection is (in theory) guaranteed to yield exactly the same output you would have received had you entered everything by hand.

What this doesn't tell you is that on Windows, CMD.exe apparently somehow tries to interpret the file it reads. And by doing so, it destroyed data in the dump so that COPY was unable to understand it anymore. So the last sentence of the quote above is just theory on Windows.

Long story short: with psql -f all went fine. Don't use I/O redirection with psql on Windows!
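In other words, the working invocation is simply:

```
psql -d dbname -f infile
```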

The BDR and pglogical apt repository GnuPG signing keys have been renewed.

Users should re-import the existing keys. You can verify that it’s still the same key as referenced in the documentation, just with a later expiry date.

Simply run:

wget --quiet -O - | sudo apt-key add -
sudo apt-get update

If you get an error like:

GPG error: jessie-2ndquadrant
InRelease: The following signatures were invalid: KEYEXPIRED 1490229886 KEYEXPIRED 1490229886 KEYEXPIRED 1490229886
WARNING: The following packages cannot be authenticated!
postgresql-bdr-client-9.4 postgresql-bdr-9.4 postgresql-bdr-contrib-9.4 postgresql-bdr-9.4-bdr-plugin

… then re-import your keys per the above instructions.

Understanding how character sets, encodings, and collations work together can be confusing. I would like to explain them in this blog post:

Character Sets

Postgres databases can be initialized to support a variety of character sets. A character set is a full set of the characters or glyphs that can be represented. Popular character sets are ASCII (127 characters), Latin1 (ISO 8859-1, 255 characters), and Unicode (128k+ characters).
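For example, the character set is chosen when a database is created (names and locales here are illustrative):

```sql
CREATE DATABASE demo
    WITH ENCODING 'UTF8'
         LC_COLLATE 'en_US.UTF-8'
         LC_CTYPE 'en_US.UTF-8'
         TEMPLATE template0;
```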

Continue Reading »


Row Level Security is one of the lesser-known great additions of PostgreSQL 9.5. Its documentation is of PostgreSQL documentation quality, of course, but there is a lack of online examples and usages that do not rely on distinct PostgreSQL users for multi-tenant websites. I recently built a proof of concept using RLS to secure access to a specific table in a multi-site Django application, and I will explain here how to do it. It is a very simple trick, but it can improve your security a lot, especially in bigger applications where auditing the whole source code can become tedious.

What is Row Level Security ?

Row Level Security, aka RLS, is a great PostgreSQL feature that allows you, for each (user, table, action) combination, to specify additional rules that restrict access to the rows. The PostgreSQL documentation, and most online documentation, shows the feature with SQL users: a table with an “owner” column would be automatically filtered on the criterion owner = current_user. It's very efficient (the criterion is not applied after the query but pushed into the query, and can thus use more indexes if needed), and for a very low maintenance cost you get security pushed down to the lowest level of your stack.
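As a minimal sketch of that SQL-user-based pattern (table and policy names invented for illustration):

```sql
CREATE TABLE documents (
    id    serial PRIMARY KEY,
    owner name NOT NULL DEFAULT current_user,
    body  text
);

ALTER TABLE documents ENABLE ROW LEVEL SECURITY;

-- each SQL user only sees (and can only modify) their own rows
CREATE POLICY documents_owner ON documents
    USING (owner = current_user);
```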

With RLS, suddenly customer A can no longer steal from customer B, even if there is a SQL injection in your server; it would also require a security issue or misconfiguration on the PostgreSQL side. Of course, you should keep checks in your code (better safe than sorry), but it's the extra security belt that could save you from the next “data leaks” headlines 🙂

Limitation of the common RLS demonstrations and workaround

Usually, most RLS demos apply restrictions per SQL user. It is by far the easiest and safest way to do it, because it is very hard to alter the data used to apply the rules. But it's not going to work very well for most, if not all, web applications, which seldom use different SQL users per authenticated web user. Sometimes one SQL user is used per “site” (we


AWS RDS has a pretty good web UI in their console. It’s even got a log file watcher built in. But sometimes you just want to download your PostgreSQL log files from your command line, maybe because you’re looking for slow queries and want to run pgBadger. Or you want to script grabbing the last log file after you run a batch data load. Or you just like the command line better.

You should already know that AWS has a command line interface that uses the same APIs their web console is built with and has extensive documentation. Make sure you have it installed and configured to access your AWS account. Since most of the CLI output is JSON, we're going to use the jq tool to pick apart the responses and get the bits we want.

Step 1: Find your instance identifier

Hopefully you gave your RDS instance a nice, easy-to-remember name when you created it. To find out, list all of your instances, filter for postgres engines (in case you also have some “other” databases for some reason) and show their status. You use the describe-db-instances API for this:

aws rds describe-db-instances 

And this will give you a big blob of JSON that makes your eyes bleed as you try to find the information you want.

To make this usable, fire up JQ, filter down to postgres engines (let’s pretend we don’t need MySQL for WordPress or anything), and check the availability status:

aws rds describe-db-instances | jq ' .DBInstances[] | select( .Engine | contains("postgres")) | .DBInstanceIdentifier + ": " + .DBInstanceStatus' 
"bigsql-on-rds: available" 
"openwatch: available"

Much better!

Step 2: Find the most recent log file

Now you want to find the most recent log file for your instance. Use the describe-db-log-files command with the instance identifier for your db and find the one that was most recently written:

aws rds describe-db-log-files --db-instance-identifier bigsql-on-rds | jq ' .DescribeDBLogFiles | max_by( .LastWritten ) | .LogFileName'

Step 3: Retrieve the log file

Download it! Use the text out
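The excerpt is cut off here, but the command it leads up to is presumably download-db-log-file-portion with text output; a sketch combining it with step 2 (the instance name and flags are assumptions for illustration):

```shell
LOGFILE=$(aws rds describe-db-log-files --db-instance-identifier bigsql-on-rds \
          | jq -r '.DescribeDBLogFiles | max_by(.LastWritten) | .LogFileName')

aws rds download-db-log-file-portion \
    --db-instance-identifier bigsql-on-rds \
    --log-file-name "$LOGFILE" \
    --starting-token 0 --output text > postgres.log
```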


The following feature has landed in Postgres 10 to help system administrators:

commit: 19dc233c32f2900e57b8da4f41c0f662ab42e080
author: Robert Haas <>
date: Fri, 3 Mar 2017 11:43:11 +0530
Add pg_current_logfile() function.

The syslogger will write out the current stderr and csvlog names, if
it's running and there are any, to a new file in the data directory
called "current_logfiles".  We take care to remove this file when it
might no longer be valid (but not at shutdown).  The function
pg_current_logfile() can be used to read the entries in the file.

Gilles Darold, reviewed and modified by Karl O.  Pinc, Michael
Paquier, and me.  Further review by Álvaro Herrera and Christoph Berg.

When “stderr” or “csvlog” is defined as log_destination, most users have no real way to know which file the PostgreSQL backends are currently writing to. There are configurations where this can be guessed automatically, for example by tweaking log_filename to use only a day or month number and then having some client application layer guess the file currently being written to based on the current date, but this adds extra complexity by creating a dependency between an upper application layer and a setting value in PostgreSQL.

The above patch, as mentioned in the commit message, shows the current files that logs are being written to, depending on the log destinations defined. Once run, the function shows the file currently in use:

=# SELECT pg_current_logfile();
(1 row)

This function actually parses a file, $PGDATA/current_logfiles, that gets updated each time a log file is rotated, or when parameters are reloaded and there is a modification of the log destinations. The first entry shows up if no argument is given. Note as well that the entry for “stderr” is generated first, followed by the one for “csvlog”, so the order of things written in current_logfiles is pre-defined and does not


I have just committed a series of patches that provide support for enum types in the btree_gin and btree_gist standard extensions. This is something I first started work on about a year ago. It turned out to be more involved than I had first thought it would be, as it requires some additional core code due to the way that enum comparisons work, which is a bit more complex than for most data types, and involves use of PostgreSQL's internal caching mechanism.

The practical upshot of this, however, is that starting with PostgreSQL 10 you will be able to use enum columns in exclusion constraints. That’s something that could be very useful – I started this work when I found, somewhat to my surprise, that it wasn’t possible.
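As a sketch of what this enables (example constructed here, assuming the btree_gist extension on PostgreSQL 10):

```sql
CREATE EXTENSION btree_gist;
CREATE TYPE room AS ENUM ('red', 'green', 'blue');

-- an enum column in an exclusion constraint:
-- no two bookings of the same room may overlap in time
CREATE TABLE booking (
    which  room,
    during tsrange,
    EXCLUDE USING gist (which WITH =, during WITH &&)
);
```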



Posted by Bruce Momjian in EnterpriseDB on 2017-03-20 at 17:00

Postgres supports both traditional join syntax, which uses the WHERE clause to specify joined columns, and ANSI join syntax, which uses the word JOIN in the FROM clause. While both syntaxes can be used for inner joins, only the ANSI join syntax supports outer joins in Postgres.

Because column restrictions like col = 4 are not related to joins, you would think that restrictions have the same effect whether they appear in the WHERE clause or as part of a join clause, e.g. a OUTER JOIN b ON a.x = b.x AND col = 4. However, this is not always true. Restrictions in the JOIN clause are processed during joins, while WHERE clause restrictions are processed after joins.

This is only significant in outer joins (and CROSS joins) because columns from unjoined rows are manufactured by outer joins. Here is an example:
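The excerpt ends before the example; a constructed illustration of the difference (not necessarily the post's own):

```sql
-- restriction in the JOIN clause: applied during the join;
-- every row of a is kept, with NULLs for unmatched b columns
SELECT * FROM a LEFT OUTER JOIN b ON a.x = b.x AND b.col = 4;

-- restriction in the WHERE clause: applied after the join;
-- rows whose b.col ended up NULL are filtered away
SELECT * FROM a LEFT OUTER JOIN b ON a.x = b.x WHERE b.col = 4;
```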

Continue Reading »

Posted by Jim Mlodgenski in OpenSCG on 2017-03-20 at 15:30

More and more I'm seeing people use a NUMERIC for their primary keys. This is a direct result of people leveraging automated tools for their Oracle to PostgreSQL migration. Oracle's NUMBER data type is frequently used as the primary key for a table, so the tools just map it to a PostgreSQL NUMERIC and call it a day. The PostgreSQL NUMERIC is a purpose-built data type for arbitrary precision numbers where exactness is required, think monetary values. It can hold up to 131072 digits before the decimal point and up to 16383 after the decimal point. To me, that does not sound like the right choice for the surrogate key of your country code table. It really is more like using a cement truck for your daily commute. It gets the job done, but it is not even close to the most efficient way to do it.

Let’s take a look.

First, let’s create 2 tables with the only difference being the data type of the primary key.

test=# \d bar_int
               Table "public.bar_int"
 Column |       Type        | Collation | Nullable | Default 
--------+-------------------+-----------+----------+---------
 a      | integer           |           | not null | 
 b      | character varying |           |          | 
Indexes:
    "bar_int_pkey" PRIMARY KEY, btree (a)

test=# \d bar_numeric
             Table "public.bar_numeric"
 Column |       Type        | Collation | Nullable | Default 
--------+-------------------+-----------+----------+---------
 a      | numeric           |           | not null | 
 b      | character varying |           |          | 
Indexes:
    "bar_numeric_pkey" PRIMARY KEY, btree (a)

Then, let’s fill those tables with 10 million rows of data.

test=# INSERT INTO bar_int 
test-# SELECT a, repeat('x', 100) 
test-# FROM generate_series(1, 10000000) a;

Looking at the sizes on disk of those 2 tables, they both take up exactly the same amount of space. When people are trying to decide if a NUMERIC is a safe choice for their primary keys, this is often where they stop the analysis. The storage differe

Posted by Federico Campoli on 2017-03-20 at 05:40
After fixing a few bugs I decided to release the first pg_chameleon beta.

The package details with the changelog are available here

These are the notable changes.

  • Switch to psycopg2 2.7.x. Thanks to Daniele's work, the requirement now installs without needing the python or postgresql headers, making installs in a virtualenv much simpler.
  • System-wide and virtualenv installs now follow the same rules. The configuration files and the package files are installed in python's site-packages/pg_chameleon.
  • The configuration directory is created at first run. If not present, the $HOME/.pg_chameleon/ directory is created the first time the script is executed, along with the required config dir and the config-example.yaml file.
  • The write_batch function now uses copy_expert in order to speed up the batch load. The fallback to inserts is still present.
The release is already available on pypi.

The documentation is available here

Please report any issue on the project's github page

However, if you'd like to get in touch, you can ping me on twitter @4thdoctor_scarf or, if you prefer to chat, there is a dedicated irc channel #pgchameleon (you can find me logged in at night).

Posted by Michael Goldberg in Postgres Miktzoanim on 2017-03-19 at 08:14

pgDay Israel 2017, the first event of its kind in Israel fully dedicated to PostgreSQL, was held at Google Campus Tel Aviv on March 2. The event was run by the PostgreSQL Israel Community.

The event raised great interest among the high-tech and security industries, and was attended by more than 70 participants from a variety of domains, companies and ages.

Speakers from Israeli companies and from the global PostgreSQL Community, Oleg Bartunov and Ivan Panchenko, made the event atmosphere interesting and valuable for all the participants.

As open source gains popularity in the enterprise, there is increased study of open source communities and how they function. Those studying such things often ask about Postgres because of its unusually healthy community and recent successes.

While I was in India in February, I was interviewed by an open source magazine; an excerpt from that interview is now online. It covers open source leadership, encouraging new developers, and healthy software ecosystems.

Posted by Shaun M. Thomas on 2017-03-17 at 16:33

PostgreSQL has supported hash indexes for a long time, but they are not much used in production, mainly because they are not durable.  Now, with the next version of PostgreSQL, they will be durable.  The immediate question is how they perform compared to btree indexes. A lot of work has been done in the coming version to make them faster. There are multiple ways to compare the performance of hash and btree indexes, such as the time taken for index creation, search, or insertion.  This blog will mainly focus on the search operation. By definition, hash indexes are O(1) and btree indexes are O(log n); however, with duplicates that is not exactly true.
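For context, choosing between the two is just a matter of the USING clause (table and index names here are illustrative):

```sql
CREATE INDEX accounts_aid_btree ON accounts USING btree (aid);
CREATE INDEX accounts_aid_hash  ON accounts USING hash  (aid);
```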

To start with let us see the impact of work done to improve the performance of hash indexes. Below is the performance data of the pgbench read-only workload to compare the performance difference of Hash indexes between 9.6 and HEAD on IBM POWER-8 having 24 cores, 192 hardware threads, 492GB RAM.

The workload is such that all the data fits in shared buffers (the scale factor is 300 (~4.5GB) and shared_buffers is 8GB).  As we can see from the above graph, performance has increased at all client counts, in the range of 7% to 81%, and the impact is more pronounced at higher client counts. The main work that led to this improvement is 6d46f478 (Improve hash index bucket split behavior.) and 293e24e5 (Cache hash index's metapage in rel->rd_amcache.).

The first commit, 6d46f478, changed the heavyweight locks (locks that are used for logical database objects to ensure the database's ACID properties) to lightweight locks (locks that protect shared data structures) for scanning the bucket pages.  In general, acquiring a heavyweight lock is costlier compared to a lightweight lock.  In addition to reducing the locking cost, this also avoids locking out scans and inserts for the lifetime of the split.

The second commit 293e24e5 avoids a significant amount of contention for accessing metapage. Each search operation needs to access metap
Posted by Dan Robinson on 2017-03-16 at 21:02

Amazon Redshift is a data warehouse that’s orders of magnitudes cheaper than traditional alternatives. Many companies use it, because it’s made data warehousing viable for smaller companies with a limited budget. Since so many Heap customers use Redshift, we built Heap SQL to allow them to sync their Heap datasets to their own Redshift clusters. […]

The post Redshift Pitfalls And How To Avoid Them appeared first on Heap Blog.

On the operational issues side, one thing that quite commonly comes up when dealing with customers using Postgres, especially with smaller setups, is Streaming Replication and its failures. Failure here not as a bug or design failure, but more as a misunderstood “feature”: the problems encountered are mostly things that work as intended and with […]

The post Number one thing to watch out for when doing Postgres Streaming Replication appeared first on Cybertec - The PostgreSQL Database Company.