
Just to make people aware of it, the following commit landed in Postgres a couple of days ago in all the active branches of the project, down to 9.1:

commit: 7e2a18a9161fee7e67642863f72b51d77d3e996f
author: Tom Lane <>
date: Tue, 6 Oct 2015 17:15:52 -0400
Perform an immediate shutdown if the postmaster.pid file is removed.

The postmaster now checks every minute or so (worst case, at most two
minutes) that postmaster.pid is still there and still contains its own PID.
If not, it performs an immediate shutdown, as though it had received SIGQUIT.

The original goal behind this change was to ensure that failed buildfarm
runs would get fully cleaned up, even if the test scripts had left a
postmaster running, which is not an infrequent occurrence.  When the
buildfarm script removes a test postmaster's $PGDATA directory, its next
check on postmaster.pid will fail and cause it to exit.  Previously, manual
intervention was often needed to get rid of such orphaned postmasters,
since they'd block new test postmasters from obtaining the expected socket address.

However, by checking postmaster.pid and not something else, we can provide
additional robustness: manual removal of postmaster.pid is a frequent DBA
mistake, and now we can at least limit the damage that will ensue if a new
postmaster is started while the old one is still alive.

Back-patch to all supported branches, since we won't get the desired
improvement in buildfarm reliability otherwise.

While the commit log is already very descriptive on the matter, the idea is that if postmaster.pid has been replaced by something else or has simply been removed, the postmaster will decide by itself to perform hara-kiri and stop as if an immediate shutdown had been initiated. At the next restart this instance will then perform recovery actions, for example like this:

$ rm $PGDATA/postmaster.pid
[wait a bit]
$ tail -n 4 $PGDATA/pg_log/some_log_file
LOG:  performing immediate shutdown because data directory lock file is invalid
LOG:  received immed

[continue reading]

On 3rd of October, Andres Freund committed patch: Add CASCADE support for CREATE EXTENSION.   Without CASCADE, if an extension has an unfulfilled dependency on another extension, CREATE EXTENSION ERRORs out with "required extension ... is not installed". That is annoying, especially when that dependency is an implementation detail of the extension, rather than something […]
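As a rough sketch of what the new clause does (using the stock earthdistance extension, which depends on cube; the session below is illustrative rather than output from the patch):

-- Without CASCADE: fails if the required extension is missing.
CREATE EXTENSION earthdistance;
ERROR:  required extension "cube" is not installed

-- With CASCADE: the required extension is installed automatically first.
CREATE EXTENSION earthdistance CASCADE;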
Posted by Christophe Pettus in pgExperts on 2015-10-08 at 19:41:26

SERIAL (32 bit integer) or BIGSERIAL (64 bit integer) are the first choice for most people for a synthetic primary key. They’re easy, they’re comprehensible, and they’re transaction-safe. The values that come out of them are, at least to start, manageable and human-scale. They can also provide an easy sortation on creation order.

They’re not perfect, though: If you have to merge together two tables that were generated using SERIALs, you have a massive key update ahead of you to avoid conflicts. If you use SERIAL, exhausting the range is a possibility. If you have a sharded database, you need some way of keeping the sequences independent, such as different starting offsets (but what if you get the offset wrong?) or creating them using different increments (but what if you add another server)

A good alternative is using UUIDs, generated using the uuid_generate_v4() function in PostgreSQL’s uuid-ossp contrib module. This makes mergers much easier, and guarantees independence across multiple tables.
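For example (a minimal sketch; the table and column names are made up), the extension is created once per database and the function can then serve as a column default:

CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

CREATE TABLE items (
    id uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
    payload text
);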

But UUIDs are 128 bits, not 64, and require a function call to generate. How much of a problem is that, really?

As a test, I created a table with a single primary key column and a single float field:

CREATE TABLE test_table (   -- table name is illustrative; the original name was not given
   id <type> PRIMARY KEY,
   f float
);

<type> could be one of three possibilities:

  • BIGSERIAL.
  • UUID, using uuid_generate_v4().
  • BIGINT, using the next_id function from Instagram.
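Concretely, the three variants would look something like this (a sketch with illustrative table names; next_id() stands in for Instagram's published sharded-ID function):

-- BIGSERIAL: auto-incrementing 64-bit key.
CREATE TABLE t_bigserial (id bigserial PRIMARY KEY, f float);

-- UUID: random version-4 UUIDs from the uuid-ossp module.
CREATE TABLE t_uuid (id uuid PRIMARY KEY DEFAULT uuid_generate_v4(), f float);

-- BIGINT: 64-bit keys produced by a next_id()-style function.
CREATE TABLE t_bigint (id bigint PRIMARY KEY DEFAULT next_id(), f float);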

The test inserted 10,000,000 rows into the table. In one run, it did a COMMIT after each INSERT; in the other, a single COMMIT after all INSERTs. This was on PostgreSQL 9.4.4 on an AWS i2.2xlarge instance, with the two SSDs in a RAID-0 as the database volume.

The results were:

COMMITing after each INSERT:

column type   time (s)   size (MB)
BIGSERIAL       4262       636.7
UUID            4367       890.0
BIGINT          4624       636.7

Single COMMIT after all INSERTs:

column type   time (s)   size (MB)
BIGSERIAL        898       636.7
UUID             991       890.0
BIGINT          1147       636.7

Overall, the INSERT time for UUIDs was slightly longer than that for BIGSERIAL, but not appreciably. The BIGINT colu

[continue reading]

Posted by Andrew Dunstan in pgExperts on 2015-10-08 at 17:27:00
One of the things that is nice about Fedora is how easy it is to test something very quickly. They provide a considerable number of cloud images suitable for spinning up instances fast. In my case I have a machine with Vagrant and VirtualBox installed, and when we got a complaint recently about PostgreSQL misbehaving with some glibc locales in Fedora 23, I wanted to test it. After downloading the image I wanted, I was able to get up and running with this:

vagrant box add --name=f23
mkdir f23
cd f23
vagrant init f23
vagrant up
vagrant ssh

It took me about 5 minutes from the time I discovered the image I wanted to test to starting on setting up to build PostgreSQL on it, and most of that was download time.
Posted by Andrew Dunstan in pgExperts on 2015-10-08 at 16:36:00
Redis Cluster is a way of sharding data across a group of federated Redis nodes. I was asked recently if this is supported by the Redis Foreign Data Wrapper. Unfortunately, the answer is "No." The Redis site coyly states that "At this stage one of the problems with Redis Cluster is the lack of client libraries implementations." What they don't mention is that their own C client library, hiredis, has no support for it at all, happily reporting back redirects to the client rather than following them. It's rather as if a web client library refused to follow an HTTP redirect. The Redis FDW is based on hiredis, so as the library doesn't support Redis Cluster, neither do we. Of course, at the expense of very large amounts of code we could make the FDW handle it, but this seems quite wrong architecturally. The FDW should not have to know or care that the Redis server it's connected to is sharded.
Posted by Luca Ferrari in ITPUG on 2015-10-07 at 22:22:00
The ninth edition of the Italian PGDay (PGDay.IT 2015) is really close, and ITPUG is proud to announce that the schedule is available on-line.
As in the previous editions we have a rich set of talks and contributions, as well as the third edition of the ITPUG's own Open Space named ITPUG-Lab.

Check out the official website and see you soon at PGDay.IT 2015!
Posted by Paul Ramsey on 2015-10-07 at 00:00:00

PostGIS 2.2.0 is released! Over the last two years a number of interesting new features have been added.

See the full list of changes in the news file and please report bugs that you find in the release.

The source download is available here:

Binary packages will appear in repositories over the coming weeks as packagers roll out builds.

View all closed tickets for 2.2.0.

Square's Ryan Lowe, production engineer, and John Cesario, platform engineer, spoke to us recently about PGConf Silicon Valley which is November 17-18 at the South San Francisco Conference Center. 

Square creates technology that helps sellers of all sizes unlock the value of every sale. Square's register service is a full point-of-sale with tools for every part of running a business, from accepting credit cards and tracking inventory, to real-time analytics and invoicing.

Ryan and John will co-present a session titled "Postgres for MySQL DBAs." I talked with them recently about their session and the conference.

Terry: Ryan, please tell us a bit about the professional journey you've traveled to get where you are today at Square.

Ryan: I started off as a MS SQL Server DBA in the healthcare space before moving on to manage Oracle databases at a telco. It was there that I was first tasked with running PostgreSQL (6.5) as a backend to our VoIP suite. Since then, I bounced around industries managing primarily Postgres and/or MySQL before finding my way to Square, where I manage both as part of the Platform Engineering team.

Terry: John, the same question to you. Please tell us a bit about your background.

John: I started working in tech with Drupal hosted solution providers, where I got my first experiences with databases. I then moved on to larger hosting providers and helped to manage the database platform. Now, I'm part of the Online Data Storage team supporting multiple database platforms at Square.

Terry: Ryan, John, your session is titled, appropriately enough, "Postgres for MySQL DBAs." Why should a MySQL DBA attend this talk?

Ryan: Postgres can seem very unfamiliar to people who have used MySQL for many years. Although the general RDBMS concepts remain largely similar, implementation and terminology differences tend to cause steeper learning curves than necessary.

John: MySQL DBAs who intend to explore PostgreSQL as an alternative, or who are only casually administering it now, will avoid a lot of frustration

[continue reading]

Barman 1.5.0 enhances the robustness and business continuity capabilities of PostgreSQL clusters, integrating the get-wal command with any standby server’s restore_command.

In this blog article I will go over the reasons behind this feature and briefly describe it.

One of the initial ideas we had in mind when conceiving Barman was to make it, one day, a very large basin of WAL files, collected from one or more PostgreSQL servers within the same organisation.

The internal codename of this feature was “WAL hub” and, in our mind, its main purpose was to allow any application (e.g. standby) to easily request and receive any WAL file for a given server, by enhancing Barman’s command line interface and, ultimately, by implementing server support for PostgreSQL’s streaming replication.

The first leg of this journey has now been completed, and it is part of Barman 1.5.0: the get-wal command.

Firstly, why would such a feature be needed?

Barman currently supports only WAL shipping from the master using archive_command (Postgres 9.5 will allow users to ship WAL files from standby servers too through “archive_mode = always” and we’ll be thinking of something for the next Barman release).

However, very frequently we see users that prefer to ship WAL files from Postgres to multiple locations, such as one or more standby servers or another file storage destination.

In these cases, the logic of these archive command scripts can easily become quite complex and, in the long term, dangerously slip through your grasp (especially if PostgreSQL and its operating system are not under strict monitoring).

Also, if you too are a long time user of PostgreSQL, you are probably empathising with me while remembering the “pre-9.0 era” when file-based log shipping was the only available option for replication.

Nowadays, on the contrary, the amount of possibilities you have in terms of design of PostgreSQL architectures in business continuity environments is practically endless, as you can combine native features such as streaming replication (bo

[continue reading]

The reason why I do benchmarks is that, in my experience, many recommended settings are effectively myths. Many are perfectly fine, but a surprisingly large part either was never quite true or is based on data that became obsolete for various reasons (e.g. changes in hardware or software). So verification of the facts is rather important for making good decisions, which is why I've recently been benchmarking the file systems available on Linux, briefly discussed in the posts on EXT4 vs. XFS, BTRFS and Reiser.

In this post, I'd like to briefly look at a potential myth when it comes to databases and I/O schedulers. The purpose of the scheduler is to decide in what order to submit I/O requests (received from the user) to the hardware - it may decide to reorder them, coalesce requests into larger continuous chunks, prioritize requests from some processes etc.

Linux currently (kernel 4.0.4) offers three schedulers (there used to be an anticipatory scheduler, but it was removed in kernel 2.6.33):

  • cfq (default) - attempts to do "completely fair queueing" for various processes, with the possibility to lower/increase priority of some processes using ionice, and also applies a fair amount of magic (reordering, coalescing etc.)
  • noop - probably the simplest scheduler, pretty much just a FIFO with trivial merging of requests
  • deadline - attempts to guarantee start time of serving the request, prioritises reads over writes and performs various other things

I'm not going to discuss cfq here - it's a fairly complex scheduler attempting to reorder the requests in various ways, support priorities, handle synchronous and asynchronous requests differently, and so on. Instead, let's look at the simpler schedulers and why they might work better in some cases.


The usual wisdom used to be that for databases, deadline scheduler works best. The reasoning behind this recommendation is quite nicely explained for example here, but let me quote the main part:

This tends to reduce the merging opportunity for reads as they are dispatched q

[continue reading]

One of the upcoming features in PostgreSQL 9.5 is foreign table inheritance, which offers a suite of new possibilities for users of foreign data wrappers. An exciting possibility is that you will be able to create a single table in which recent data is stored in a row-based format for fast appending and look-ups, while older data is stored in a columnar format for compression and fast analytics using cstore_fdw.

An example use-case might be a news organization which has frequent look-ups of recent data (e.g. to generate news feeds), while older data is used for analytics (e.g., find correlations in events). We use the gdelt dataset containing hundreds of millions of news events as a reference. This use-case will benefit greatly from using row-based storage for recent data, and columnar storage for older data, so let's see how we will be able to do this in PostgreSQL 9.5. The current release of cstore_fdw is not yet 9.5 compatible, but a preview is available on the 9.5-compatible branch.

To get started, we can create the events table and the row-based events_current table that inherits from it (see full source). When querying the events table, postgres includes inherited tables in the results.

CREATE TABLE events (
    event_date int   -- ... plus the many remaining gdelt columns
);
CREATE TABLE events_current () INHERITS(events);

We also set up a trigger that routes INSERTs on the events table to the events_current table.

CREATE FUNCTION events_insert_trigger() RETURNS trigger AS $$
BEGIN
    INSERT INTO events_current VALUES (NEW.*);
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER events_insert_trigger BEFORE INSERT ON events
    FOR EACH ROW EXECUTE PROCEDURE events_insert_trigger();

We can now start adding new data to our event table, which will use row-based storage. The gdelt dataset comprises over 100GB of news events in easy-to-digest CSV format. It can be copied into the events table using:

\COPY events FROM '20150101.export.CSV' WITH (NULL '')

After a while we can migrate older data to a cstore_fdw partition. We first cr

[continue reading]

Posted by Shaun M. Thomas on 2015-10-02 at 21:33:10

I’ve been talking about partitions a lot recently, and I’ve painted them in a very positive light. Postgres partitions are a great way to distribute data along a logical grouping and work best when data is addressed in a fairly isolated manner. But what happens if we direct a basic query at a partitioned table in such a way that we ignore the allocation scheme? Well, what happens isn’t pretty. Let’s explore in more detail.

Let’s use the fancy partitioning I introduced a while back. Using this structure, I ran a bunch of tests with a slight modification to my python script. I changed the reading_date granularity to daily, and had the script itself create and destroy the partitions. This allowed me to run several iterations and then meter the performance of queries on the resulting output.

The relevant python chunk looks like this:

end_stamp = stamp + timedelta(days = 1)
part = ''
if j > 0:
    part = '_part_%d%02d%02d' % (stamp.year, stamp.month, stamp.day)
    cur.execute("DROP TABLE IF EXISTS sensor_log%s" % part)
    cur.execute(
      "CREATE TABLE sensor_log%s (" % part +
      "  LIKE sensor_log INCLUDING ALL," +
      "  CHECK (reading_date >= '%s' AND" % stamp +
      "    reading_date < '%s')" % end_stamp +
      ") INHERITS (sensor_log)"
    )

With that, the script will assume daily granularity and build the check constraints properly so constraint exclusion works as expected. To get an idea of how partitions scale with basic queries, I ran the script with 10, 20, 50, 100, 150, and 250 days. This gave a wide distribution of partition counts and the numbers I got made it fairly clear what kind of drawbacks exist.
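For a single day, the DDL generated by that snippet comes out roughly like this (the dates below are just an example):

DROP TABLE IF EXISTS sensor_log_part_20150101;

CREATE TABLE sensor_log_part_20150101 (
    LIKE sensor_log INCLUDING ALL,
    CHECK (reading_date >= '2015-01-01' AND
           reading_date < '2015-01-02')
) INHERITS (sensor_log);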

All tests on the partition sets used these two queries:

-- Check 100 distributed values in the partitions.
  FROM sensor_log
 WHERE sensor_log_id IN (
         SELECT generate_series(1, 10000000, 100000)
-- Check 100 random values in the partitions.
  FROM sensor_log
 WHERE sensor_

[continue reading]

PostgreSQL 9.5 will have support for a feature that is popularly known as "UPSERT" - the ability to either insert or update a row according to whether an existing row with the same key exists. If such a row already exists, the implementation should update it. If not, a new row should be inserted. This is supported by way of a new high level syntax (a clause that extends the INSERT statement) that more or less relieves the application developer from having to give any thought to race conditions. This common operation for client applications is set to become far simpler and far less error-prone than legacy ad-hoc approaches to UPSERT involving subtransactions.
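For reference, the new clause looks roughly like this (a minimal sketch against a hypothetical users table with a unique constraint on username):

INSERT INTO users (username, visits)
VALUES ('alice', 1)
ON CONFLICT (username)
DO UPDATE SET visits = users.visits + 1;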

When we worked on UPSERT, many edge-cases were carefully considered. A technique called "unique index inference" allows DML statement authors to be very explicit about what condition they want to take the alternative (UPDATE or NOTHING) path on. That alternative path can only be taken in the event of a would-be duplicate violation (for the DO NOTHING variant, a would-be exclusion violation is also a possible reason to take the alternative NOTHING path). The ability to write UPSERT statements explicitly and safely while also having lots of flexibility is an important differentiator for PostgreSQL's UPSERT in my view.

As the 9.5 INSERT documentation explains, the inference syntax contains one or more column_name_index (columns) and/or expression_index expressions (expressions), and perhaps an optional index_predicate (for partial unique indexes, which are technically not constraints at all). This is internally used to figure out which of any available unique indexes ought to be considered as an arbiter of taking the alternative path. If none can be found, the optimizer raises an error.
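Here is a sketch of inference against a partial unique index (the table and index are hypothetical); the WHERE clause in the conflict target names the index predicate so the partial index can be chosen as the arbiter:

CREATE UNIQUE INDEX accounts_live_email_idx
    ON accounts (email) WHERE NOT deleted;

INSERT INTO accounts (email, name)
VALUES ('alice@example.com', 'Alice')
ON CONFLICT (email) WHERE NOT deleted
DO UPDATE SET name = EXCLUDED.name;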

The inference syntax is very flexible, and very tolerant of variations in column ordering, whether or not a partial unique index predicate is satisfied, and several other things. It can infer multiple unique indexes at a time, which is usually not necessary, but can be in th

[continue reading]

Posted by Andrew Dunstan in pgExperts on 2015-10-02 at 15:03:00
A conversation yesterday with a client: "What changes were made in the PostgreSQL configs?" "I don't recall exactly." This is why you should keep all the configs under version control, and have a regular monitoring check that the controlled version is the same as the operating version, so that if someone makes a manual change without checking it in, the monitoring alarm will go off.

If you keep your config in the data directory, in the default PostgreSQL style, probably the simplest way to do this is to create a git repository with a symbolic link to the data directory and a .gitignore that ignores everything in the data directory but the config files. Then add the config files and you're done. If you keep the config files separately, as Debian does, then you can probably just create a git repository right in the config directory if you don't already have one at a higher level.

I have been guilty of not doing this in the past, but I have come to the strong conclusion that this should be a part of standard best practice.
Posted by gabrielle roth on 2015-10-02 at 00:54:00

When: 6-8pm Thursday October 15, 2015
Where: Iovation
Who: Gavin McQuillan
What: Sharded Postgres with PLProxy

Data of Future Past: Postgres as Distributed Online Processing Analytics Engine

Postgres has long been a great analysis tool, but as organizations seek to decentralize their data, it’s overlooked for other solutions. Maybe this doesn’t need to be the case. Using a novel combination of tools, old and new, Postgres can also decentralize its data. We can maintain the immensely powerful interface that we love, while meeting (or exceeding) the throughput decentralized datastores provide. In this talk we see how this is possible using Foreign Data Wrappers (PLProxy) and HyperLogLog (Postgres-HLL) extensions, detailing benefits and constraints to this method. We explore a reimplementation of a complex Online Processing Analytics hypercube in Hbase with Postgres and measure its effectiveness, throughput, latency, and cost savings.

Gavin is a longtime PostgreSQL enthusiast, specializing in building automation and analytics solutions, distributed systems, scaling and securing web applications. His experience includes Google SRE, software development and leading teams in young technology startups. He lives in Portland with his family, bicycles, and hand planes.

If you have a job posting or event you would like me to announce at the meeting, please send it along. The deadline for inclusion is 5pm the day before the meeting.

Our meeting will be held at Iovation, on the 32nd floor of the US Bancorp Tower at 111 SW 5th (5th & Oak). It’s right on the Green & Yellow Max lines. Underground bike parking is available in the parking garage; outdoors all around the block in the usual spots. No bikes in the office, sorry!

Elevators open at 5:45 and building security closes access to the floor at 6:30.

See you there!

Andreas 'ads' Scherbaum
Pivotal hosted the September PostgreSQL Conference in Beijing. I posted about this event in the Pivotal blog.
When I first started working with docker last year, there was a clear pattern already out there: the docker image itself only consists of the application binary (and, depending on the philosophy, the entire OS libraries that are required) and all application data goes in a volume.

The concept called "Data Container" also seemed to be not very popular at that time. Not everyone bought into that philosophy, and there were various other patterns emerging then on how people used volumes with their docker containers.

One of the emerging patterns was (and still is) "data initialization if it does not exist" during container startup.
Let's face it, when we first start a docker container consisting of, say, a PostgreSQL 9.4 database, the volume is an empty file system. We then do an initdb and set up a database so that it is ready to serve.
The simplest way is to check if the data directory has data in it and, if it does not, run initdb, set up the most common best practices of the database, and serve it up.

Where's the simplest place to do this? In the entrypoint script of the docker container, of course.

I made the same mistake in my jkshah/postgres:9.4 image too. In fact I still see the same pattern in the official postgres docker image, where it looks for PG_VERSION and, if it does not exist, runs initdb.

if [ ! -s "$PGDATA/PG_VERSION" ]; then
    gosu postgres initdb
fi


This certainly has advantages:
1. Very simple to code the script.
2. Great out-of-the-box experience: you start the container up, the container sets itself up, and it is ready to use.

Let's look at what happens next in real-life enterprise usage.

We ran into scenarios where the applications using such databases were running, but had lost all the data in them. Hmm, what's going wrong here? The application is working fine, the database is working fine, but all the data looks like it was freshly deployed, not something that had been running well for 3-5 months.

Let's look at various activities that an enterprise will typically do with such

[continue reading]

Matthew Kelly, TripAdvisor's in-house PostgreSQL guru, spoke to us recently about PGConf Silicon Valley which is November 17-18 at the South San Francisco Conference Center. Early Bird pricing ends October 4th.

Matthew's talk at PGConf Silicon Valley 2015 is At the Heart of a Giant: Postgres at TripAdvisor. Matthew said the PostgreSQL database server has been at the core of the company's infrastructure since the "early days as a scrappy start up" 15 years ago.

PostgreSQL is the database powering TripAdvisor’s 250 million reviews from inquisitive travellers around the world. With more than 375 million unique monthly visitors, TripAdvisor is the world's largest travel site and has been helping travelers plan and book the perfect trip since 2000.

Matthew joined TripAdvisor first as an intern during his computer science studies at the University of Massachusetts Amherst and was hired full-time following his graduation in 2012. I sat down with him virtually recently and here's what he had to say…

Terry: How is Postgres used at TripAdvisor today?

Matthew: Postgres is the backbone of our entire core production infrastructure here at TripAdvisor. On any given day, our Postgres infrastructure throughout the company might have serviced more than 1 billion queries. These will range from the OLTP sort of operations you would expect from a high-end website, to analytics, to real-time anomaly processing, to fraud detection. We even use it to hold the photos that are in our upload queue. We run numerous different hardware configurations and workloads. Sure, we have Hadoop for some back office processing, but Postgres really is the primary workhorse of our data infrastructure.

Terry: TripAdvisor gets a massive amount of traffic. Similar companies operating at that scale are using or considering WebscaleSQL – which as you know is a collaboration among engineers from several companies facing similar challenges in running MySQL at scale. How does PostgreSQL stack up against it in terms of forming the backbone of a popular site like you

[continue reading]

Since its introduction in 9.1, synchronous replication, or the ability to wait for a WAL flush confirmation from a standby before committing a transaction on the master node (depending on synchronous_commit, whose default value is on), ensures that a committed transaction has not been lost on the standby should a failover need to be done. It has seen wide adoption in many production environments for applications that need a no-data-loss scenario. Still, such applications or clients may have seen something like this:

Cancel request sent
WARNING:  01000: canceling wait for synchronous replication due to user request
DETAIL:  The transaction has already committed locally, but might not have been replicated to the standby.
LOCATION:  SyncRepWaitForLSN, syncrep.c:217

What happens here is that the commit query has remained stuck for a couple of seconds because it kept waiting for a flush confirmation that was not coming. Depending on the outage of the standby, this could take a while, so in this case the query was manually cancelled, and the transaction was then committed on the master without the confirmation coming from the standby. One of the ways to check if a backend is stuck in this way is to have a look at the output of ps, to find something like this:

$ ps x | grep postgres | grep waiting
15323   ??  Ss     0:00.06 postgres: easteregg easteregg [local] COMMIT waiting for 0/1797488

But that's not particularly useful when a user is not able to connect directly to the node involved (usually that should be the case for a superuser of this database instance), and the catalog table pg_stat_activity does not offer a way to know if the backend is really stuck at commit because of synchronous replication or because of another lock.

=# SELECT pid, state, waiting FROM pg_stat_activity
   WHERE lower(query) = 'commit;';
  pid  | state  | waiting
-------+--------+---------
 15323 | active | f
(1 row)

Note that there is a patch submitted for integration t

[continue reading]

On October 8th at 17:30 the second Dutch PostgreSQL meetup will be held in Amsterdam. Please join us for some very interesting talks.

Andres Freund, a PostgreSQL committer and developer working at Citus Data, will speak about major scalability improvements in recent PostgreSQL versions and what the biggest remaining scalability concerns are. Some of the remaining issues can mainly be addressed by changing Postgres, others can be worked around on the application side.

Marco Slot from Citus Data will give a talk about the open source pg_shard extension, which lets you seamlessly shard and scale out PostgreSQL tables across many machines. We will look at how to use it for building a scalable document store and demonstrate some upcoming features.

Finally, Reiner Peterke from Splendid Data will show pg_insight, a flexible tool for collecting historical statistics information using the Postgres statistics infrastructure. It integrates with elastic search for graphical representation of the data from performance monitoring and trending.

You can RSVP on the PostgreSQL Meetup. Hope to see you there!


Modern servers provide people with more and more CPU cores. 16 cores or more are not uncommon these days. Unfortunately PostgreSQL cannot scale a single query to more than one core. A lot of effort is going into this issue already. Cybertec experiments: To address the issue we have done some experiments to see what […]
Posted by gabrielle roth on 2015-09-29 at 01:37:00
We’ve wrapped up another PostgresOpen! The official numbers aren’t in yet, but we were up a significant percentage from last year. We had so many last-minute registrants and walk-ups that we ran out of extra badges. Even better, I got to attend most of the talks on my list. I can’t pick a favorite, but […]
Posted by Tomas Vondra in 2ndQuadrant on 2015-09-28 at 17:40:00

In the last two posts, I've briefly discussed the results of a benchmark measuring PostgreSQL performance with different filesystems. The first post compared EXT4 and XFS, and the second one was mostly dealing with the problems I ran into when benchmarking BTRFS.

In the discussion, I've been asked to benchmark ReiserFS. I hadn't really planned to benchmark this file system because ... let's say ReiserFS does not have the best reputation when it comes to reliability, for historical reasons. According to some sources the motto of the development was essentially "performance over correctness", which is an approach database folks don't really like. I don't know how much of that is true for the current state of ReiserFS, or how it applies to the new ReiserFS 4 (which is not merged, and I don't think it ever will be).

But database people are rather conservative when it comes to storage reliability, so even if all the issues got fixed and the current version is reliable, it's unlikely to become very popular in this group soon. Factor in that the development of ReiserFS has pretty much stopped (in favor of Reiser4, which did not get merged for years), so the code is pretty much in maintenance mode only. That is not necessarily bad, but it also means no significant reworks for new features and performance improvements (which was the purpose of Reiser4).

So let's see if ReiserFS 3 actually performs so much better than the common file systems (EXT4 and XFS), as the users often believe ...

The short answer is "no". At least not on the workload measured by pgbench (a lot of random I/O) and on this particular hardware (good SSD), ReiserFS 3 actually performs worse than both XFS and EXT4.


On read-only benchmarks, all the filesystems perform pretty much exactly the same, irrespective of the amount of data. For example on the large data set, you'll get something like this:


The chart shows throughput (transactions per second) with different number of clients, for those three filesystems. And there's pret

[continue reading]

Posted by Pavel Stehule on 2015-09-28 at 04:32:00
I wrote a simple extension, session_exec. This extension enforces running a specified function when a user connects to PostgreSQL. This function can be used for some complex user management.


session_preload_libraries = 'session_exec'
session_exec.login_name = 'login'


# first login
[pavel@dhcppc4 plpgsql_check]$ psql
WARNING: function "login()" does not exist

postgres=# CREATE OR REPLACE FUNCTION login()
postgres-# RETURNS void AS $$
postgres$# BEGIN
postgres$# IF current_database() <> 'admindb' AND session_user = 'postgres' THEN
postgres$# RAISE EXCEPTION 'user postgres is blocked';
postgres$# ELSE
postgres$# RAISE NOTICE 'Hello, %', session_user;
postgres$# END IF;
postgres$# RETURN;
postgres$# END;
postgres$# $$ LANGUAGE plpgsql;

# new login
[pavel@dhcppc4 plpgsql_check]$ psql
NOTICE: Hello, pavel

-- try to login as Postgres
postgres=# \c postgres postgres
FATAL: unhandled exception in login function "login"
DETAIL: user postgres is blocked
CONTEXT: PL/pgSQL function login() line 4 at RAISE
session_exec: perform login function "login"
Previous connection kept

This is only an example. For this specific use case a modification of pg_hba.conf is better, but you can implement more complex logic in plpgsql. The login function can be specific to a database or to a user via ALTER DATABASE xx SET or ALTER ROLE xx SET.


When you use a login function, connecting to the database will be a little bit slower. Use it only when it is necessary.

There are two PostgreSQL FDWs (currently maintained) that I know of for connecting to SQL Server from a Linux/Unix PostgreSQL box. There is the TDS Foreign Data Wrapper (tds_fdw), which relies on the FreeTDS driver. This is a fairly light-weight FDW since it just relies on TDS, which is commonly already available on Linux installs or an easy install away. Unfortunately, when I tried to use it on windows (compiling my usual mingw64 way), while it compiled and installed, it crashed when I attempted to connect to a table on my SQL Server 2008 R2 box, so I gave up on it for the time being as a cross-platform solution. One thing I will say for it is that, from what I can see, it accepts ad-hoc queries as a data source, which is pretty nice. So we may revisit it in the future to see if we can get it to work on windows. I'm not sure if tds_fdw would support SQL Server spatial geometry columns, though it would be interesting to try.

The second option, which as you may have noticed we have spent much time talking about, is the ogr_fdw foreign data wrapper. ogr_fdw utilizes UnixODBC on Linux, iODBC on MacOSX, and Windows ODBC on windows for connecting to SQL Server. The big downside of ogr_fdw is that it has a dependency on GDAL, which is a hefty FOSS swiss-army-knife ETL tool that is a staple of all sorts of spatial folks doing both open source and proprietary development. The good thing about ogr_fdw is that since it is a spatial driver, it knows how to translate SQL Server geometry to its equivalent PostGIS form, in addition to being able to handle most of the other not-so-spatial columns.

Continue reading "Connecting to SQL Server from Linux using FDWs"
Posted by Josh Berkus on 2015-09-26 at 03:49:03

On October 15th Josh Berkus will be reprising EXPLAIN Explained, his presentation from Postgres Open, at the Pandora offices in downtown Oakland. This talk will cover how to read EXPLAIN and EXPLAIN ANALYZE, and is aimed at developers and new DBAs. Pandora staff will also do a lightning talk.

On November 17th and 18th is pgConfSV. Registration is open and the full schedule is up. Be there!

On November 19th, the day after pgConfSV, Tableau is hosting an extended meetup which will cover using Tableau with PostgreSQL, Joe Conway on PL/R, and Álvaro Hernández Tortosa on Logical Decoding. This meetup will be in Palo Alto, and is expected to fill up very soon.

We are still looking to schedule a meetup in the city of San Francisco for late October or early November, but need a speaker for it. If you wanted to present and didn't submit in time for pgConfSV, now's your chance. Contact the organizers through the meetup.

Posted by Shaun M. Thomas on 2015-09-25 at 17:07:41

This PG Phriday is going to be a bit different. During my trip to Postgres Open this year, I attended a talk I had originally written off as “some Red Hat stuff.” But I saw the word “containers” in the PostgreSQL in Containers at Scale talk and became intrigued. A few days later, I had something of an epiphany: I’ve been wrong about servers for years; we all have.

That’s a pretty bold claim, so it needs some background.

Fairly often in infrastructure and development, we see servers like this diagram:


But what’s the most important thing in the diagram? Can you even tell? Well, everything except the AppData component can be replaced. The primary goal of highly available design is building an architecture to ensure this is the case. I don’t care where the data lives, or what server is hosting it, so long as certain performance metrics are supplied by the underlying hardware. But from that diagram, the most important element would appear to be the servers themselves.

This kind of thinking is a trap, and one pretty much everyone I’ve met in the industry is guilty of falling into. In a two-server Postgres stack, where one server is the primary database and the other is a replica, we tend to gravitate toward preserving the original 1->2 orientation. This may be the case even if retaining this arrangement requires a second failover after an outage or some kind of maintenance. For some reason, the physical server gets the focus. How many times have you seen this? I can’t even count the number of times I’ve encountered servers named after their primary role as a database host, yet I have to physically log in to find out what the databases are. Sure, sometimes there’s a Wiki page of some kind outlining the data to host relationship, but is that really putting the data first?

That’s what I mean when I say we’ve all had it wrong. This obviously isn’t absolutely true, but the prevailing momentum still treats servers as important. They’re not. They’re a medium for whatever is running on them. Businesses with dozens or hundre

[continue reading]

Posted by Markus Winand on 2015-09-25 at 15:26:45

I've just launched my new website. It's very small right now—just six content pages—but it will grow over the next months and years and might eventually become a book like my other site did. So I thought I'd better get you on board right now so you can grow as the site grows.

So what is modern SQL about? Yet another SQL reference? Absolutely not. There are definitely enough references out there. And they all suffer from the same issues:

  • They hardly cover more than entry-level SQL-92. Some tutorials don't even mention that they describe a proprietary dialect.

  • The examples just show the syntax but not how to solve real-world problems.

  • They don't document the availability of the features amongst different database products.

No question, the latter ones are a consequence of the first one. However, modern SQL is different:

  • modern SQL covers SQL:2011—the current release of the ISO SQL standard. But it adheres to the fact that many databases are developed against older releases or only support a subset of the standard.

  • modern SQL shows how to use the most recent SQL features to solve real-world problems. It also provides “conforming alternatives” as a way to solve the same problem using other (often older) standard methods. If necessary and possible it will also show “proprietary alternatives.”

    Note that there is a hierarchy: the recommended (most idiomatic) approach is shown first, standard conforming alternatives next and proprietary alternatives last but only if inevitable.

  • modern SQL documents the availability of the features amongst six SQL databases (the example is about values):

[Availability chart: the values feature across six SQL databases]

Click on a feature (e.g., “valid where select is valid”) to see how the availability of the features has evolved.

There you can see that SQLite just recently started to accept values where select is valid. The timeline view also documents the last checked version for those databases that don't support a feature.
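For reference, the values feature in question is the standalone table value constructor, which PostgreSQL accepts in most places a select is allowed, for example:

VALUES (1, 'one'), (2, 'two');

-- or as part of a larger query:
SELECT * FROM (VALUES (1, 'one'), (2, 'two')) AS t(id, name);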

Inevitably, modern SQL will become both: an homage

[continue reading]

The recent release of PgBouncer 1.6, a connection pooler for Postgres, brought a number of new features. The two I want to demonstrate today are the per-database and per-user pool_modes. To get this effect previously, one had to run separate instances of PgBouncer. As we shall see, a single instance can now run different pool_modes seamlessly.

There are three pool modes available in PgBouncer, representing how aggressive the pooling becomes: session mode, transaction mode, and statement mode.

Session pool mode is the default, and simply allows you to avoid the startup costs for new connections. PgBouncer connects to Postgres, keeps the connection open, and hands it off to clients connecting to PgBouncer. This handoff is faster than connecting to Postgres itself, as no new backends need to be spawned. However, it offers no other benefits, and many clients/applications already do their own connection pooling, making this the least useful pool mode.

Transaction pool mode is probably the most useful one. It works by keeping a client attached to the same Postgres backend for the duration of a transaction only. In this way, many clients can share the same backend connection, making it possible for Postgres to handle a large number of clients with a small max_connections setting (each of which consumes resources).
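One quick way to observe this from a client (a sketch; it assumes you are connected through a PgBouncer port whose pool_mode is transaction) is to watch the backend PID, which may change from one transaction to the next:

-- Each transaction may be served by a different Postgres backend,
-- even though the client connection to PgBouncer stays the same.
SELECT pg_backend_pid();

BEGIN;
SELECT pg_backend_pid();
COMMIT;

SELECT pg_backend_pid();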

Statement pool mode is the most interesting one, as it makes no promises at all to the client about maintaining the same Postgres backend. In other words, every time a client runs a SQL statement, PgBouncer releases that connection back to the pool. This can make for some great performance gains, although it has drawbacks, the primary one being no multi-statement transactions.

To demonstrate the new pool_mode features, I decided to try out a new service mentioned by a coworker, called Scaleway. Like Amazon Web Services (AWS), it offers quick-to-create cloud servers, ideal for testing and demonstrating. The unique thing about Scaleway is that the servers are all ARM-based with SSDs. Mini-

[continue reading]

Posted by Paul Ramsey on 2015-09-24 at 00:00:00

This is it, PostGIS 2.2 is almost out the door, so we’re looking for testing and feedback! Please give the release candidate a try and report back any issues you encounter.

View all closed tickets for 2.2.0.