
Please join me on September 9th at 10 am Pacific Time for my webinar entitled "Columnar Store for PostgreSQL using cstore_fdw". This webinar will give you a comprehensive overview of cstore_fdw, an open source PostgreSQL extension from Citus Data that lets you store tables in columnar form. During this live webinar, we will discuss how columnar storage differs from row-based storage: data is stored in compressed form, column by column, in contiguous blocks on disk, as opposed to storing each row separately.

You will learn which use-cases are the best fit for cstore_fdw along with performance tuning best practices. We will take a deeper dive into the Optimized Row Columnar (ORC) storage format that cstore_fdw uses and discuss key features such as skip-list indexes. We will also look at practical examples of how to make effective use of cstore_fdw, such as using it to reduce the storage size of tables with JSONB columns over 6x. Finally, I will give a live demo of how to set up cstore_fdw tables yourself and demonstrate some of the latest and upcoming features. PostgreSQL 9.5 adds foreign table inheritance, which lets you create tables that are part row-based and part columnar and move data between the different partitions, giving you the best of both worlds.
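As a taste of what the setup looks like, creating a columnar table with cstore_fdw follows the usual foreign data wrapper pattern (a minimal sketch; the table and column names here are illustrative, not taken from the webinar):

```sql
-- Load the extension and create a cstore_fdw server (once per database).
CREATE EXTENSION cstore_fdw;
CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;

-- An illustrative columnar table; the 'pglz' option enables
-- built-in compression of the column data on disk.
CREATE FOREIGN TABLE events_columnar (
    event_time  timestamptz,
    payload     jsonb
)
SERVER cstore_server
OPTIONS (compression 'pglz');
```

Once created, the table is queried like any other table, but reads only touch the columns the query actually references.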

You can learn more by visiting the Citus Data Webinars page or use the button below to register now.

Webinar on Columnar Store for PostgreSQL using cstore_fdw on September 9th

Learn how columnar storage can significantly reduce storage size, memory footprint, and I/O utilization for online analytical processing (OLAP) workloads

Posted by Hans-Juergen Schoenig in Cybertec on 2015-09-01 at 09:34:33
NVIDIA’s CUDA is definitely a great thing and I have to admit that I got excited years ago when I first learned about it. For many operations a nicely optimized GPU implementation definitely seems the way to go. GPUs are traditionally used for scientific operations and massively parallel tasks. However, some important work is also […]
Posted by Josh Berkus in pgExperts on 2015-09-01 at 05:07:00
One of PostgreSQL's advantages over many other SQL databases is letting users modify database objects with a minimum of locking and fuss, and do it in a transaction so it can be rolled back.  However, a minimum of locking isn't no locking, so one still has to plan for getting a lock to push through the database migration.

For example, say I wanted to add two new columns to a table.  I already know that by making the columns nullable, and not setting a default, I can add them with a metadata-only change which requires only an update to the system catalogs.  However, this table change does require an ACCESS EXCLUSIVE lock for a few milliseconds to go through, which can be tricky.  See, an access exclusive lock blocks everything, including reads and autovacuum, which can be a bit messy if your timing is bad.  You can end up waiting for that lock behind a long-running query, and then all the other database traffic can pile up behind you.

Now, if we had ALTER TABLE CONCURRENTLY it would take care of this for you.  But we don't (yet), so you need to fake it with a DO script.  Here's an example DO script which polls for a lock every 2 seconds until it can get one, then pushes through the table change:

    DO $f$
    DECLARE ntries INT := 10;
        sleepytime INT := 2;
    BEGIN
        FOR get_lock IN 1 .. ntries LOOP
            BEGIN
                LOCK TABLE mytable IN ACCESS EXCLUSIVE MODE NOWAIT;
                ALTER TABLE mytable ADD COLUMN new_col1 INT,
                        ADD COLUMN new_col2 VARCHAR;
                RAISE INFO 'table updated';
                RETURN;
            EXCEPTION
                WHEN lock_not_available THEN
                    PERFORM pg_sleep(sleepytime);
            END;
        END LOOP;
        RAISE INFO 'unable to obtain lock after % tries', ntries;
    END;
    $f$;


The idea here is that you keep trying to LOCK ... NOWAIT, which will throw an error if it can't get the lock immediately.  Then it sleeps and tries again 2 seconds later

[continue reading]

Posted by David E. Wheeler on 2015-08-31 at 21:48:54

I took a little time this summer to finally address some nagging issues in PGXN Manager, the site to which extensions are uploaded and added to the master repository. Most of the changes were internal, improving the interface through which I sometimes have to re-index a release. There are also a couple of minor changes to the sample Makefile in the How To. But the most important change for new releases going forward is that version ordering is now enforced. That means two things:

  • Distribution versions must be greater than the version of the previous release. You can no longer release v1.2.0 today and v1.1.0 tomorrow.
  • Extension versions must be greater than or equal to versions in previous releases. It’s pretty common to have a new release version but have embedded extensions be the same version as before. But they can’t be any less than before.

The changes have been applied to the reindexing code, as well, which also prevents versions from being greater than in subsequent releases. That won’t come up very often—most of the time, I end up reindexing something that has just been released. But in the future I expect to add an interface for release managers to reindex their own distributions, so it may be that someone reindexes a release that’s a few versions old. Or not. But just in case, it’ll be handled.

So what are you going to release on PGXN today? Get to it!

Helsinki PostgreSQL User Group will convene tomorrow, on Tuesday September 1st 2015 at 15:00.

I will be babbling about the various index types in PostgreSQL: GIN, GiST, and the humble B-tree. What does the structure of each one look like? What are the strengths and weaknesses of each? And of course, a lot of free chatter on whatever other topics you want to bring up.

We have also created a Meetup group. Registration at Meetup is recommended to ensure you get a seat:

This is the fourth meeting of the Helsinki PostgreSQL User Group. Come and meet fellow PostgreSQL users and developers. Welcome!

September 1st 2015

Rikhardinkatu Library
Rikhardinkatu 3

Posted by Oleg Bartunov in Postgres Professional on 2015-08-30 at 20:14:37
Our company Postgres Professional has conducted performance testing on a big POWER8 (P8) server provided by IBM. One exciting result is that we were able to identify a scalability issue and write a patch (a kludge for now), which brought Postgres performance close to a million TPS. We'll discuss the approach used in the patch on -hackers.


More details are in Alexander's Korotkov blog post.

In the next post we'll compare the performance of Postgres and MariaDB :)
Posted by gabrielle roth on 2015-08-30 at 19:00:00
PostgresOpen is coming up in just a few weeks! (Disclosure: I am on the conference committee.) We are still working out the room assignments; your conference badge/printed program will have the right info. The committee is very excited to have snagged Lacey Williams Henschel to give our keynote, on Open Source & Higher Ed. I’m looking […]
A few days ago, I released a brand-new PostgreSQL extension called "sql_firewall". sql_firewall is intended to protect a PostgreSQL database from SQL injection attacks by limiting which SQL queries can be executed on the database. In this entry, I would like to introduce how it works and how to use it. How
Aquameta Layer 0: meta - Writable System Catalog for PostgreSQL

In the introduction, we talked about Aquameta's first principle, datafication. We wield the broadsword of datafication as we charge the many-headed hydra that is unnecessary programming complexity. In this writeup, we describe our first, and in some ways most challenging, foe: datafication of the database itself.

Layer zero of Aquameta is called meta, a writable system catalog for PostgreSQL. It exposes PostgreSQL administration commands through data manipulation commands, enabling schema creation, table creation, column renaming, role dropping, and much more via INSERTs, UPDATEs, and DELETEs. In other words, it makes the DDL accessible via the DML.

Meta unifies "normal" data with schema, database configuration, everything that is PostgreSQL, into a coherent and synthesized information model. Everything is accessible as data. This adds the needed paths to make PostgreSQL truly homoiconic, which breaks down the wall between schema and data, and opens the doors for all manner of meta-programming.

PostgreSQL already has two system catalogs, INFORMATION_SCHEMA and pg_catalog. Meta is different because:

  • It's writable, and changes to the data take effect in the database
  • It has friendly names for relations and columns
  • It's normalized: the views are sensibly laid out in parallel with PostgreSQL's architecture
  • It has a type system of meta-identifiers for primary keys

Here's a simple example of how to use it. Instead of doing:

aquameta=# create schema test_schema;  

You can do:

aquameta=# insert into meta.schema (name) values ('test_schema');  
INSERT 0 1  

These two commands perform identical operations under the hood; meta just makes them accessible through a different interface.
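Presumably the inverse works the same way; a hypothetical sketch following the same naming convention (not verified against Aquameta itself):

```sql
-- Hypothetical: the DML equivalent of DROP SCHEMA test_schema,
-- assuming meta.schema supports DELETE the way it supports INSERT.
DELETE FROM meta.schema WHERE name = 'test_schema';
```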

Here is an ER diagram of meta's schema:


aquameta=# set search_path=meta;  
aquameta=# \d  
 Schema |         Name         | Type | Owner
--------+----------------------+------+-------
 meta   | cast                 | view | eric
 meta   | column               | view | eric
 meta   | connection           | view | eric

[continue reading]

Posted by Eric Hanson on 2015-08-28 at 23:12:00

I'm happy to announce Aquameta, a web application development platform built entirely in PostgreSQL.

Why We Must Rethink Our Stack

1. Software Development Complexity Bottlenecks Human Progress

The complexity and inaccessibility of software development is inhibiting businesses, education and social progress.

In businesses, especially large ones, software has become the digital nervous system that allows them to operate at scale. But because only programmers can change the software, experimentation and evolution of business practices have actually become slower and more expensive. Most employees are powerless to make even simple software changes, and yet developers often don't have the domain expertise required to write good software for someone else's job.

Education is feeling it too. The demand for programmers is quickly outpacing supply, and for students, learning computer science is becoming more and more of an essential skill. Educators are well aware of this problem, but the complexity of programming makes it very challenging to teach.

Then there is social progress generally. End users have so many great ideas for software systems that could help us all collaborate with each other in deeper, richer ways. But to create them they either need the money or the skill to bring them into reality. Software complexity is a huge bottleneck for social innovation. Not every idea has a business model behind it, and there must be so many good ideas that could exist if programming were easier.

We believe that it is a social imperative, arguably one of the greatest demands of our time, to make software development more accessible to everyone.

2. Today's Web Is Better At Making Money Than Fun

Two decades of profit-driven evolution have led to the web we have today. But this outcome wasn't inevitable, and we can still change it. We have learned a lot since the inception of the web, and when we look at it knowing what we know now, we see some ways things could be a lot cooler. Here are a few.

Conflation of Data and Presentation


[continue reading]

Posted by Josh Berkus in pgExperts on 2015-08-28 at 22:16:00
I write a lot more PL/pgSQL than I'd like to.  Not that I don't like SQL, but as a language PL/pgSQL really shows its thrown-together origin; it's like 80's primitive.  One thing that PL/pgSQL lacks is good string manipulation tools, which is particularly tragic given that the #1 thing to do in PL/pgSQL is to generate queries from parameters and run EXECUTE.

Postgres has two built-in ways to do string substitution: concatenation and format().  Both have drawbacks.  Let me give you an example:

EXECUTE 'SELECT ' || col1 || ', ' || col2 || ' FROM ' || userschema ||
  '.accounts WHERE ' || filterclause || ' ORDER BY ' || col1 || ',' || col2;

EXECUTE format('SELECT %s, %s FROM %s.accounts WHERE %s ORDER BY %s, %s', col1, col2, userschema, filterclause, col1, col2);

You can see the problem here.  Both formats are hard to read and hard to maintain.  Python and Perl have a good fix for this: dictionary/hash-based string substitution, where you can swap in the dictionary keys for the values.  So I wrote up a quick hack to do this in PL/pgSQL.

Here's the dict-replace function.
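(The link to the function isn't preserved in this excerpt. Roughly, such a function can be sketched in a few lines of PL/pgSQL using json_each_text; this is a simplified illustration, not necessarily Josh's actual implementation:)

```sql
CREATE OR REPLACE FUNCTION replace_vars(str TEXT, vardict JSON)
RETURNS TEXT LANGUAGE plpgsql AS $$
DECLARE
    pair RECORD;
BEGIN
    -- Swap each ${key} placeholder in the string for its value
    -- from the JSON dictionary.
    FOR pair IN SELECT key, value FROM json_each_text(vardict) LOOP
        str := replace(str, '${' || pair.key || '}', pair.value);
    END LOOP;
    RETURN str;
END;
$$;
```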

Using it, you'd replace the above with:

EXECUTE replace_vars('SELECT ${col1}, ${col2} FROM ${userschema}.accounts
  WHERE ${filterclause} ORDER BY ${col1}, ${col2}', vardict);

Of course, you need to first set up the vardict as a JSON value, in the form:

vardict := '{ "col1" : "username", "col2" : "branchname", ...}'

Still, much more readable, eh?  No refcounting, no repeating variables, no string breaks.  I used Bash's variable substitution syntax of ${var} because it seemed like the thing least likely to conflict with user text, unlike anything involving %.

There are some caveats, though: it's not performant, and you could probably find strings which will break it, particularly if you're swapping in JSON values.  You still have to do your own quoting of strings.  And if you have access to PL/Perl or PL/Python you don't need this nonsense.

But for the minimal case, it should help.
Posted by Shaun M. Thomas on 2015-08-28 at 17:00:23

This week we’ll be covering another method of Postgres partitioning. This is a technique I personally prefer and try to use and advocate at every opportunity. It’s designed to straddle the line between traditional partitioning and standard monolithic table structure by using table inheritance as a convenience factor. The assumption here is that end-user applications either:

  1. Know that partitioning is in use.
  2. Only load “current” data and don’t care about partitions.

These two things may seem mutually exclusive, but they can actually be complementary. There are also notable performance benefits to this approach that we’ll explore later.

The SQL for this structure is similar to what we used last week, but there are a couple of notable omissions.

TRUNCATE TABLE sensor_log;
CREATE TABLE sensor_log_part_2012 (
    CHECK (reading_date >= '2012-01-01' AND
           reading_date < '2013-01-01')    
) INHERITS (sensor_log);
CREATE TABLE sensor_log_part_2013 (
    CHECK (reading_date >= '2013-01-01' AND
           reading_date < '2014-01-01')    
) INHERITS (sensor_log);
CREATE TABLE sensor_log_part_2014 (
    CHECK (reading_date >= '2014-01-01' AND
           reading_date < '2015-01-01')    
) INHERITS (sensor_log);

This time, we don’t need the function that distributes the rows to the correct partition, or the associated trigger. Further, note that the 2015 partition is missing. This is possible because we’ve made two assumptions about how partitions will be used:

  1. Current data will be stored in the base table.
  2. Old or “archived” data will be moved into partitions.
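The second assumption implies a periodic archival step. A minimal sketch of what that might look like, moving a year of old rows out of the base table and into its partition (using the sensor_log names from this series):

```sql
-- Move 2014 rows from the base table into the 2014 partition.
-- ONLY restricts the DELETE to the parent table itself, leaving
-- rows already in child partitions untouched.
WITH moved AS (
    DELETE FROM ONLY sensor_log
     WHERE reading_date >= '2014-01-01'
       AND reading_date < '2015-01-01'
    RETURNING *
)
INSERT INTO sensor_log_part_2014
SELECT * FROM moved;
```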

To fulfill the first requirement, we made a slight change to the Python loader script introduced at the beginning of this series. We could have done this with the partitioning introduced last week as well, but while that is merely a performance tweak for that format, this method depends on it.

# Set these to modify how large the C

[continue reading]

PostgreSQL JDBC Driver Version 9.4-1202 released

Lots of bug fixes and some awesome performance enhancements, including statement caching

Version 9.4-1201 (2015-02-25)

  • ResultSet positioning methods in some particular cases PR #296 (282536b)
Author: Craig Ringer
  • Disable binary xfer on batches returning generated keys PR #273 (763ae84)
  • Add a new test case demonstrating a bug in returning support PR #273 (4d2b046)
  • Always Describe a query in a batch that returns generated keys PR #273 (a6bd36f)
Author: Dave Cramer
  • chore: fix build.xml to allow releasing to maven PR #262 (34f9361)
  • fix: BlobInputStream ignores constructor parameters #263 PR #273 (c1c6edc)
  • don't reset forceBinary transfer if we setPreparedThreshold (937a11c)
  • Revert "perf: Remove expensive finalize methods from Statement and Connection" PR #293 (a0d3997)
  • updated copyright PR #312 (263375c)
  • Revert "Issue 250 -- Adding setURL/getURL to" PR #312 (a1ac380)
  • fixed mailing list href PR #326 (c3e86a6)
  • increment driver version PR #346 (b8ee75d)
Author: David R. Bild:
  • feat: add equality support to PSQLState PR #277 (7698cd9)
  • Improve version checking PR #355 (f7a84db)
Author: Eugene Koontz
  • Add support within "private Object buildArray (PgArrayList input, int index, int count)" for array elements whose type is jsonb PR #349 (d313138)
  • Added setter method for logging level. The method exactly matches property name in documentation. PR #282 (d9595d1)
  • Added getter method. PR #282 (65759f0)
  • Adding XML catalog to help unit tests avoid remote entity resolution. PR #284 (cb87067)
  • Added support to locally resolve dtd or entity files. PR #284 (017970d)
  • Disable verbose logging of the catalog resolver. PR #284 (fcc34f5)
Author: Kris Jurka
  • Improve error message for failure to update multicolumn primary key RSs. PR #284 (05ff811)
  • Remove all JDBC3 code as JDK 1.4/1.5 are no longer supported. PR #284 (f9a956b)
  • Add preliminary support for JDBC4.2. PR #284 (bd05fd2)
Author: Lo

[continue reading]

I've been writing a little provisioning script for a Vagrant CentOS 7 machine that will be a buildfarm client.

Stripped of the vagrant stuff and some stuff that's special for the machine, here is what I did to get it running:

sudo yum install -y wget gcc make flex bison ccache git \
perl-libwww-perl perl-Digest-SHA perl-devel perl-ExtUtils-Embed \
zlib-devel openssl-devel readline-devel python-devel tcl-devel \
libxml2-devel libxslt-devel openldap-devel

mkdir bf
cd bf
mv latest-client.tgz buildfarm-latest-client.tgz
tar -z --strip-components=1 -xf buildfarm-latest-client.tgz
mkdir root
sed -i "s!build_root =>.*!build_root => '$HERE/root',!" build-farm.conf
At this stage the buildfarm client will run quite successfully, as can be tested using:
perl ./run_build.pl --test

While hacking PostgreSQL it’s very useful to know the pid of the backend you are working with. You need the pid of the process to attach a debugger, profiler etc. Luckily, .psqlrc provides us an elegant way to define shortcuts for psql. Using the config line below one can find out the backend pid just by typing :pid.

\set pid 'SELECT pg_backend_pid();'

=# :pid
 pg_backend_pid
----------------
          <pid>
(1 row)

In 9.6 it becomes possible to even include the backend pid in the psql prompt.
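For example, using the %p prompt escape added in 9.6 (placement of %p in the prompt string is up to you):

```
\set PROMPT1 '%/ [%p] %R%# '
```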

However, it’s possible to automate more complex actions in psql. I’ve configured my psql to run gdb attached to the current backend in a new tab of iTerm2 just by typing :gdb.

The :gdb command selects the pid of the current backend and pipes it to the pg_debug script.

\set gdb 'SELECT pg_backend_pid() \\g |pg_debug'

pg_debug extracts the pid from its input and then runs an OSA script which opens gdb in a new tab of iTerm2.



while read line
do
    # Extended display off
    if [[ $line =~ ^\ +([0-9]+) ]]; then PID=${BASH_REMATCH[1]}; fi
    # Extended display on
    if [[ $line =~ ^pg_backend_pid.*\ ([0-9]+) ]]; then PID=${BASH_REMATCH[1]}; fi
done

# Open gdb session
osascript -e "
tell application \"iTerm\"
    tell the current terminal
        set mysession to (the current session)
        launch session \"Default Session\"
        tell the last session
            write text \"gdb --pid=$PID -x <(echo continue)\"
        end tell
        select mysession
    end tell
end tell"

This script works for Mac OS X and iTerm2, but the same approach should work for other platforms and terminal emulators.

Posted by Pavan Deolasee in 2ndQuadrant on 2015-08-26 at 11:12:40

It’s been a busy few months as we work towards merging Postgres-XL with the latest and greatest release of PostgreSQL. Postgres-XL is an open source fork of PostgreSQL that provides a scalable platform for OLTP and Business Intelligence. The current release of Postgres-XL is based on PostgreSQL 9.2, so it lacks all the improvements made to PostgreSQL over the last three years.

2ndQuadrant and other companies are working on bringing distributed scalability into PostgreSQL core as well as building tools and extensions outside the core. As part of that, Postgres-XL has a number of features that we’d like to bring back into core PostgreSQL, so 2ndQuadrant has picked up the task of updating the Postgres-XL code base to the latest PostgreSQL release as the first step. After more than 3 months of work, PostgreSQL 9.5 is still in alpha, so we wanted to give a progress report on how the work is proceeding. I also need to say the magic words: This ongoing work on Postgres-XL is part of the AXLE project, funded by the European Union under grant agreement 318633.

Preparation for the Merge

Since PostgreSQL and Postgres-XL both use GIT as source control system, it makes the merge process much simpler since GIT provides many tools to assist the process. But as soon as we tried the merge, we faced the first hurdle.

We realised that the current Postgres-XL repository is based on an older minor 9.2 release of PostgreSQL. That means there were commits and changes in the Postgres-XL master branch which either never made it into PostgreSQL’s master branch or had different commit ids. So merging with the PostgreSQL master branch threw up far more conflicts than we would otherwise have expected. So the first task we had to accomplish was to rebase the Postgres-XL 9.2 repository on a later commit point. This obviously required careful treading, making sure that nothing broke during the process. Once we had the basic rebase done, we also merged all Postgres-XL bug fixes and enhancements, created a Postgres-XL 9.2 stable branch and merged the 9.2 br

[continue reading]

Posted by Yann Larrivee on 2015-08-26 at 00:50:48

ConFoo is once more seeking passionate speakers for the upcoming conference.

The event is happening in Montreal, Canada, between February 24th and 26th, 2016. It is an exciting conference for web developers with speakers from all over the world. It unites many web programming languages under one roof, as well as other topics related to web development. The call for papers closes on September 20th.

ConFoo renews 50% of its speakers each year. If you’re new to this conference, you should definitely submit.

If you would just like to attend, there is a discount until October 13th.

Posted by Terry Erisman in CitusData on 2015-08-25 at 17:13:49

We are pleased to announce that the PGConf Silicon Valley breakout sessions and tutorials are now posted on the conference website. Our outstanding Conference Committee has worked through the large pool of submissions and selected the sessions which will be presented November 17-18, 2015 at the South San Francisco Conference Center. As the first dedicated PostgreSQL conference in the Bay Area, the conference is a great opportunity for PostgreSQL community members to exchange technical knowledge and make new connections. Early Bird registration prices are available through October 4, 2015.

Breakout Sessions

The breakout sessions are organized into five tracks:

  • DevOps
  • In the Trenches
  • Features
  • Hackery
  • PostgreSQL at Scale

The outstanding list of speakers includes representatives from companies including TripAdvisor, Urban Airship, Square, Heap, Chartio, AWS, Rackspace, Joyent, and Heroku. The breakout sessions, speakers, and talk descriptions are available on the conference website.


The tutorials will be presented by leading PostgreSQL practitioners and range from beginner to advanced level. Presenters include representatives from 2ndQuadrant, EnterpriseDB, Citus Data, OmniTI, and PostgreSQL Experts. Check the conference website for talk descriptions and speaker information.

More Information Coming Soon

The schedule for the tutorials and breakout sessions will be posted to the PGConf Silicon Valley conference website in the near future. We will also post a series of interviews with conference speakers who will provide insights into their sessions and what they are looking forward to at the conference. 

Take advantage of Early Bird pricing before it ends on October 4th. I hope to see you at PGConf Silicon Valley!


PGConf Silicon Valley Breakout Sessions and Tutorials are Now Posted

November 17-18, 2015 at the South San Francisco Conference Center

ALTER TABLE has been known for many years in the Postgres ecosystem as a command that systematically takes an ACCESS EXCLUSIVE lock on the relation being modified, preventing all operations on the relation in parallel. Those locks are getting more and more relaxed; for example, in Postgres 9.4 the following commands got improvements:

  • ALTER COLUMN SET and ALTER COLUMN RESET for attribute options

In 9.5 as well those commands have been improved:


Now, Postgres 9.6, which is currently in development, brings in more lock reduction, with the following commit:

commit: 47167b7907a802ed39b179c8780b76359468f076
author: Simon Riggs <>
date: Fri, 14 Aug 2015 14:19:28 +0100
Reduce lock levels for ALTER TABLE SET autovacuum storage options

Reduce lock levels down to ShareUpdateExclusiveLock for all
autovacuum-related relation options when setting them using ALTER TABLE.

Add infrastructure to allow varying lock levels for relation options in
later patches. Setting multiple options together uses the highest lock
level required for any option. Works for both main and toast tables.

Fabrízio Mello, reviewed by Michael Paquier, mild edit and additional
regression tests from myself

Code-wise, ALTER TABLE SET has been improved to be able to define different types of locks depending on the parameter touched, and in the case of this commit all the parameters tuning autovacuum and auto-analyze at relation level have been updated to use SHARE UPDATE EXCLUSIVE. In short, this allows read as well as write operations to occur in parallel with the ALTER TABLE, something that will definitely help on relations with heavy activity.
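Concretely, with this commit a statement like the following (table name illustrative) no longer blocks concurrent reads and writes:

```sql
-- In 9.6 this takes only ShareUpdateExclusiveLock, not
-- AccessExclusiveLock, so SELECTs and DML proceed in parallel.
ALTER TABLE my_table SET (autovacuum_vacuum_scale_factor = 0.05,
                          autovacuum_vacuum_threshold = 200);
```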

Note as well that when multiple subcommands are used, the stronger lock of the whole set is taken for the duration of the ALTER TABLE command. So for example, should an update on the parameter fillfactor be mixed with a

[continue reading]

Posted by Gulcin Yildirim in 2ndQuadrant on 2015-08-23 at 16:01:35

PgBouncer is a lightweight connection pooler for PostgreSQL.

The latest release of PgBouncer was announced on 1st of August 2015. In this blog post we’ll talk about the major new improvements of PgBouncer.

Main new features of PgBouncer

Load user password hash from postgres database

With this hot feature, PgBouncer can load a user’s password from the database, using two config parameters: auth_user and auth_query.

Note: These config parameters are defined in the configuration file pgbouncer.ini.

  • auth_user
    If auth_user is set, any user not specified in auth_file will be looked up in pg_shadow in the database, using auth_user. The auth_user’s own password will be taken from auth_file. This parameter can also be set per-database.

    Note: auth_file is the name of the file to load user names and passwords from.

  • auth_query
    This parameter allows us to write a SQL query to load a user’s password from the database. It runs as auth_user. See the default query below:
    SELECT usename, passwd FROM pg_shadow WHERE usename=$1
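Put together, the relevant pgbouncer.ini fragment might look like this (the auth_user value is illustrative; the auth_query shown is the default):

```ini
[pgbouncer]
auth_user = pgbouncer
auth_query = SELECT usename, passwd FROM pg_shadow WHERE usename=$1
```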

Pooling mode can be configured both per-database and per-user

With this feature, independent of the main pooling mode, clients can now connect to different databases with one of the 3 pooling modes described below. The same applies to users. For example, if the main pooling mode is session pooling, a specific user can be configured to use transaction pooling. This gives us database-level and user-level flexibility to apply more appropriate pooling options.

PgBouncer provides 3 connection pooling modes:

  • Session pooling
    During the lifetime of a client connection, an existing server connection is assigned to the client and after the client disconnection, the assigned server connection is put back to the connection pool.
  • Transaction pooling
    In this mode, a server connection is not assigned to a connected client immediately: it is assigned only for the duration of a transaction. As soon as the transaction is over, the connection is put back into the pool.
  • Statement pooling
    This is similar to the transa

[continue reading]

Posted by Shaun M. Thomas on 2015-08-21 at 16:32:15

PG Phriday: Basic Partitioning

Most PGDB (PostgreSQL) users who are familiar with partitioning use the method described in the partitioning documentation. This architecture comes in a fairly standard stack:

  • One empty base table for structure.
  • At least one child table that inherits the base design.
  • A trigger to redirect inserts based on the partitioning scheme.
  • A constraint on each child table to enforce the partition scheme, and help the planner exclude child partitions from inapplicable queries.

It looks simple, but there’s a lot of SQL involved. Let’s take the sensor_log table we introduced a couple of weeks ago and turn it into a partitioned structure.

TRUNCATE TABLE sensor_log;
CREATE TABLE sensor_log_part_2012 (
    CHECK (reading_date >= '2012-01-01' AND
           reading_date < '2013-01-01')    
) INHERITS (sensor_log);
CREATE TABLE sensor_log_part_2013 (
    CHECK (reading_date >= '2013-01-01' AND
           reading_date < '2014-01-01')    
) INHERITS (sensor_log);
CREATE TABLE sensor_log_part_2014 (
    CHECK (reading_date >= '2014-01-01' AND
           reading_date < '2015-01-01')    
) INHERITS (sensor_log);
CREATE TABLE sensor_log_part_2015 (
    CHECK (reading_date >= '2015-01-01' AND
           reading_date < '2016-01-01')    
) INHERITS (sensor_log);
CREATE OR REPLACE FUNCTION sensor_partition()
RETURNS TRIGGER AS $$
BEGIN
  CASE extract(YEAR FROM NEW.reading_date)
    WHEN 2015 THEN
      INSERT INTO sensor_log_part_2015 VALUES (NEW.*);
    WHEN 2014 THEN
      INSERT INTO sensor_log_part_2014 VALUES (NEW.*);
    WHEN 2013 THEN
      INSERT INTO sensor_log_part_2013 VALUES (NEW.*);
    WHEN 2012 THEN
      INSERT INTO sensor_log_part_2012 VALUES (NEW.*);
  END CASE;
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER t_correct_partition
BEFORE INSERT ON sensor_log
   FOR EACH ROW EXECUTE PROCEDURE sensor_partition();

[continue reading]

Posted by Andreas Scherbaum on 2015-08-20 at 20:17:49
Andreas 'ads' Scherbaum

What happens if you ask the Chinese PostgreSQL community for a Meetup-like event with one or two speakers? You end up getting a full-day conference ;-)


On September 12, Saturday, a full day PostgreSQL conference will take place at the Pivotal office in Beijing. If you want to attend, please sign up here.

The full address is: 17/F, South Block, Tower C, Raycom Info Tech Park, NO.2, Kexueyuan South Road, Beijing 100190, China.


Continue reading "PostgreSQL Day in Beijing"

At this year's PGCon, we gave a talk on pg_shard that included a futuristic pg_shard demo: a distributed table with JSONB fields, backed by a dynamically changing row and columnar store. The demo is based on the Github archive data, which comprises a log of all events across all public Github repositories with detailed metadata in JSON format.

The pg_shard extension can store very large event logs in a PostgreSQL table by transparently sharding the table across many servers. We envision a use-case where new events are added to the table in real-time and frequent look-ups occur on data less than 7 days old. Data older than 7 days is compressed inside the database using cstore_fdw. A video of the pg_shard talk and demo is available on our website and at the end of this post.

In our demo, the events table was range-partitioned by time, such that each shard contains events for a specific hour of the day. One of the advantages of range-partitioning a table by time is that older shards are read-only, which makes it easier to perform certain operations on the shards, such as replacing the shard with an equivalent cstore_fdw table. We consider this demo to be futuristic since pg_shard does not yet have functions to create a range-partitioned table. It can only be set up by manually changing the pg_shard metadata tables, for which we created several PL/pgSQL functions.
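
The shard-replacement step might look roughly like the sketch below. All object names here (events_102008 and friends) are hypothetical, and the pg_shard metadata update is elided, since at the time of the demo it had to be done by hand:

```sql
-- One-time setup for cstore_fdw (assumes the extension is installed).
CREATE EXTENSION IF NOT EXISTS cstore_fdw;
CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;

-- Columnar copy of a read-only shard (hypothetical schema).
CREATE FOREIGN TABLE events_102008_cstore (
    event_id   bigint,
    created_at timestamptz,
    payload    jsonb
) SERVER cstore_server OPTIONS (compression 'pglz');

-- Bulk-load the old shard into the compressed table, then swap names.
INSERT INTO events_102008_cstore SELECT * FROM events_102008;
DROP TABLE events_102008;
ALTER FOREIGN TABLE events_102008_cstore RENAME TO events_102008;
```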

The architecture used in the demo resembles a shard assembly line, as shown below. In the first stage of the assembly line the shard is filled with data; after that the shard is frequently read by SELECT queries, and after 7 days the shard is compressed for archival. When pg_shard receives an INSERT, it compares the event's timestamp to the time ranges of the shards in the metadata tables to find the shard for the current hour. Assuming INSERTs only happen for the current time, there is only ever one shard to which new data is being written. At the end of the hour, INSERTs will start going into a new shard. The shard for the next hour needs to be set up in advance

[continue reading]

On 30th of July, Andrew Dunstan committed patch: Add IF NOT EXISTS processing to ALTER TABLE ADD COLUMN   Fabrízio de Royes Mello, reviewed by Payal Singh, Alvaro Herrera and Michael Paquier. Nice. I always like it when there is a new "IF EXISTS" or "IF NOT EXISTS", because it makes my life as a DBA simpler. This […]
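
For illustration (the table and column names here are made up), the new clause turns a duplicate ADD COLUMN into a notice instead of an error:

```sql
CREATE TABLE demo (id integer);

ALTER TABLE demo ADD COLUMN IF NOT EXISTS payload jsonb;
-- Running it again is now harmless:
ALTER TABLE demo ADD COLUMN IF NOT EXISTS payload jsonb;
-- NOTICE:  column "payload" of relation "demo" already exists, skipping
```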
Posted by Josh Berkus on 2015-08-18 at 23:58:57

For September, two of the founders of PipelineDB are going to be introducing this streaming PostgreSQL fork. Come learn what PipelineDB is about and how you'd use it alongside PostgreSQL. This meetup will be at a new venue, If(we), a social networking company which uses PostgreSQL together with internal databases. RSVP on meetup.

P.S. we are looking for a lightning talk for this meetup! Contact

Posted by Glyn Astill on 2015-08-18 at 18:52:44
Prior to pg 9.1, you could connect to any database with psql regardless of encoding and you’d get the server encoding as your client encoding unless you set it: That wasn’t quite right; the client_encoding is a lie. On a modern psql version that’s quite rightly prevented: This is not an issue if you want […]
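
For reference (not part of the excerpt above), on a modern server you can inspect both encodings and set the client side explicitly; the conversion is then performed by the server:

```sql
SHOW server_encoding;             -- fixed at CREATE DATABASE time, e.g. UTF8
SHOW client_encoding;             -- what the client negotiated for this session
SET client_encoding TO 'LATIN1';  -- ask the server to convert for this client
```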
Posted by Julien Rouhaud in Dalibo on 2015-08-18 at 14:03:34


The work memory, or work_mem, is one of the hardest things to configure. It can be used for various purposes: it’s mainly used when sorting data or building hash tables, but it can also be used by set-returning functions that use a tuplestore, like the generate_series() function. And each node of a query can use this amount of memory. Set this parameter too low, and a lot of temporary files will be used; set it too high, and you may encounter errors, or even an Out Of Memory (OOM) kill, depending on your OS configuration.

I’ll focus here on the amount of memory needed when sorting data, to help you understand how much memory is required when PostgreSQL runs a sort operation.
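
As a quick way to see this in action (the exact numbers will vary on your system), EXPLAIN ANALYZE reports which sort strategy was used at the current work_mem setting:

```sql
SET work_mem = '64kB';   -- the minimum value; forces a spill for this data set
EXPLAIN (ANALYZE, COSTS OFF)
SELECT i FROM generate_series(1, 100000) AS g(i) ORDER BY i DESC;
-- ... Sort Method: external merge  Disk: ...kB

SET work_mem = '16MB';   -- now the whole sort fits in memory
EXPLAIN (ANALYZE, COSTS OFF)
SELECT i FROM generate_series(1, 100000) AS g(i) ORDER BY i DESC;
-- ... Sort Method: quicksort  Memory: ...kB
```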

Truth is out

I often hear people say there is a correlation between the size of the temporary files generated and the amount of memory needed. That’s wrong: you can’t make any assumption about the value of work_mem based on the size of a sort’s temporary file.

That’s because when the data to be sorted doesn’t fit in the allowed memory, PostgreSQL switches to an external sort. In addition to the memory currently in use, a temporary file is reused multiple times to avoid wasting disk space. If you want more details on this, the relevant source code is in tuplesort.c and logtape.c. As a brief introduction, the header of tuplesort.c says:

[…] This module handles sorting of heap tuples, index tuples, or single Datums (and could easily support other kinds of sortable objects, if necessary). It works efficiently for both small and large amounts of data. Small amounts are sorted in-memory using qsort(). Large amounts are sorted using temporary files and a standard external sort algorithm.

See Knuth, volume 3, for more than you want to know about the external sorting algorithm. We divide the input into sorted runs using replacement selection, in the form of a priority tree implemented as a heap (essentially his Algorithm 5.2.3H), then merge the runs using polyphase merge, Knuth’s Algorithm 5.4.2D. The logical “tapes” used by Algorithm D are imp

[continue reading]

Q: So, what is TABLESAMPLE actually good for?

A: To speed things up that can be answered from a small sample instead of the whole population!

Like so:

CREATE TABLE public.ts_test
(
  id serial,
  val double precision,
  CONSTRAINT ts_test_pkey PRIMARY KEY (id)
);
Insert 100 million random values in the range 0.0 <= x < 1.0:

INSERT INTO public.ts_test (val)
SELECT random()
  FROM generate_series(1, 100000000);

And now, some queries to get the standard deviation.

For the whole population:

SELECT round(stddev_pop(val)::numeric,5) FROM ts_test;

0.28869 in 15005 ms
Error: 0%
Speedup: 1x

For a sample of size 20000 (good for 1% error and 99% confidence):

SELECT round(stddev_samp(val)::numeric,5) FROM ts_test TABLESAMPLE BERNOULLI(0.02);

0.28867 in 2387 ms
Error: < 1%
Speedup: 6.3x

And for a SYSTEM sample of the same size:

SELECT round(stddev_samp(val)::numeric,5) FROM ts_test TABLESAMPLE SYSTEM(0.02);

0.28791 in 25 ms
Error: < 1%
Speedup: 600x

With a table that large, both methods BERNOULLI and SYSTEM are accurate enough to give results well below the 1% error target.

(For potential problems with SYSTEM, see this post)

SYSTEM aces BERNOULLI with a speedup of 600x vs. 6.3x relative to a full population query.

And this is hard to beat...
On 17th of July, Robert Haas committed patch: Add new function pg_notification_queue_usage.   This tells you what fraction of NOTIFY's queue is currently filled.   Brendan Jurd, reviewed by Merlin Moncure and Gurjeet Singh. A few further tweaks by me. LISTEN / NOTIFY are very cool features, that don't get enough recognition. For completeness sake […]
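
For the curious, the new function takes no arguments and returns the fraction (between 0 and 1) of the NOTIFY queue currently occupied by pending notifications:

```sql
SELECT pg_notification_queue_usage();
-- ~0 on an idle system; approaches 1 as listeners fall behind
```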
Posted by Federico Campoli on 2015-08-17 at 21:19:00
Friday 14th August we kicked off the Brighton PostgreSQL Meetup.
We had a nice evening with cool people, all together discussing PostgreSQL and how we can run the meetup effectively.

We decided to have a regular monthly meetup hosted around Brighton, possibly by local companies or any other suitable venue.

The next meetup will be on the 25th of September, and this time there will be some PostgreSQL talks. General interest favours standby servers and streaming replication; another topic could be query planning and execution.
I can cover both topics. We just need to decide how many talks will fit in a one-hour session.

After the meetup we’ll head to the nearest pub for a nice beer.

I can only say a big thank you, to all the meetup members.

That’s all for now.