Posted by Christophe Pettus in pgExperts on 2015-03-29 at 02:42:19

The slides from my talk at PGConf US 2015 are now available.

Posted by Josh Berkus in pgExperts on 2015-03-28 at 18:34:00
I have a data analytics project which produces multiple statistical metrics for a large volume of sensor data.  This includes percentiles (like median and 90%) as well as average, min and max.  Originally this worked using PL/R, which was pretty good except that some of the R modules were crashy, which was not so great for uptime.

This is why, two years ago, I ripped out all of the PL/R and replaced it with PL/Python and SciPy.  I love SciPy because it gives me everything I liked about R, without most of the things I didn't like.  But now, I've ripped out the SciPy as well.  What am I replacing it with?  Well, SQL.

In version 9.4, Andrew Gierth added support for percentiles to PostgreSQL via WITHIN GROUP aggregates. As far as I'm concerned, this is second only to JSONB in reasons to use 9.4.
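
For illustration, the 9.4 ordered-set aggregate syntax looks roughly like this (the table and column names here are made up, not from the project described):

```sql
-- Median and 90th percentile per device, alongside the classic aggregates
-- (sensor_readings / device_id / reading are hypothetical names)
SELECT device_id,
       percentile_cont(0.5) WITHIN GROUP (ORDER BY reading) AS median,
       percentile_cont(0.9) WITHIN GROUP (ORDER BY reading) AS pct_90,
       avg(reading) AS mean,
       min(reading) AS low,
       max(reading) AS high
  FROM sensor_readings
 GROUP BY device_id;
```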

Now, one of the more complicated uses I make of aggregates is doing "circular" aggregates, that is producing percentiles for a set of circular directions in an effort to determine the most common facings for certain devices.  Here's the PL/Python function I wrote for this, which calculates circular aggregates using the "largest gap" method.  This algorithm assumes that the heading measurements are essentially unordered, so to find the endpoints of the arc we look for two measurements which are the furthest apart on the circle.  This means shifting the measurements to an imaginary coordinate system where the edge of this gap is the low measurement, calculating percentiles, and then shifting it back.  Note that this method produces garbage if the device turned around a complete circle during the aggregate period.

Now, that SciPy function was pretty good and we used it for quite a while.  But we were unhappy with two things: first, SciPy is rather painful as a dependency because the packaging for it is terrible; second, having PostgreSQL call out to SciPy for each iteration isn't all that efficient.

So, since 9.4 has percentiles now, I started writing a function based on the built-in SQL percentiles.  Initially I was thinking

[continue reading]

High availability of PostgreSQL databases is incredibly important to me. You might even say it’s a special interest of mine. It’s one reason I’m both excited and saddened by a feature introduced in 9.4. I’m excited because it’s a feature I plan to make extensive use of, and saddened because it has flown under the radar thus far. It’s not even listed in the What’s new in PostgreSQL 9.4 Wiki page. If they’ll let me, I may have to rectify that.

What is this mysterious change that has me drooling all over my keyboard? The new recovery_min_apply_delay standby server setting. In name and intent, it forces a standby server to delay application of upstream changes. The implications, however, are much, much more important.
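
As a sketch, the delay is a single line in the standby's recovery.conf (the connection details below are assumed):

```
# recovery.conf on the standby (PostgreSQL 9.4)
standby_mode = 'on'
primary_conninfo = 'host=master1 user=replicator'
recovery_min_apply_delay = '1h'   # replay WAL no sooner than one hour after commit
```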

Let me tell you a story; it’s not long, actually. A couple years ago, I had to help a client that was using a hilariously over-engineered stack to prevent data loss. I only say that because at first glance, the number of layers and duplicate servers would shock most people, and the expense would finish the job. This was one of my full recommended stacks, plus a few extra bits for the truly paranoid. DRBD-bonded servers, Pacemaker failover, off-site disaster recovery streaming clones, nightly backup, off-site backup and historical WAL storage, and long-term tape archival in a vault for up to seven years. You would need to firebomb several cities to get rid of this data.

But data permanence and availability are not synonymous. All it took was a single misbehaving CPU to take out the entire constellation of database servers, and corrupt a bunch of recent WAL files for good measure. How this is possible, and how difficult it is to avoid, is a natural extension of using live streaming replicas for availability purposes. We always need to consider one important factor: immediacy applies to everything.

Here’s what actually happened:

  1. A CPU on master-1 went bad.
  2. Data being written to database files was corrupt.
  3. DRBD copied the bad blocks, immediately corrupting master-2.
  4. Shared memory became corrupt.
  5. Streamin

[continue reading]

This week the following commit has landed in PostgreSQL code tree, introducing a new feature that will be released in 9.5:

commit: cb1ca4d800621dcae67ca6c799006de99fa4f0a5
author: Tom Lane <>
date: Sun, 22 Mar 2015 13:53:11 -0400
Allow foreign tables to participate in inheritance.

Foreign tables can now be inheritance children, or parents.  Much of the
system was already ready for this, but we had to fix a few things of
course, mostly in the area of planner and executor handling of row locks.


Shigeru Hanada and Etsuro Fujita, reviewed by Ashutosh Bapat and Kyotaro
Horiguchi, some additional hacking by me

As mentioned in the commit message, foreign tables can now be part of an inheritance tree, be it as a parent or as a child.

Well, seeing this commit, one word immediately comes to mind: in-core sharding. This feature opens up exactly such possibilities, for example a parent table locally managing a partition set made of foreign child tables located on a set of foreign servers.

PostgreSQL already offers a way to do partitioning by using CHECK constraints (a non-intuitive system, though there may be improvements in this area in the near future). Combined with the feature just committed, here is a small example of how to do sharding without the need for any external plugin or tools, only postgres_fdw being needed to define foreign tables.

Now let's take the example of 3 Postgres servers, running on the same machine for simplicity, using ports 5432, 5433 and 5434. The server on 5432 holds a parent table with two child tables, both of them foreign tables located on the servers listening on 5433 and 5434. The test case is simple: a log table partitioned by year.

First on the foreign servers, let's create the child tables. Here it is for the table on server 5433:

=# CREATE TABLE log_entry_y2014(log_time timestamp,
       entry text,
       check (date(log_time) >= '2014-01-01' AND
              date(log_time) < '2015-01-01'));

And the second one on 5434:

=# CREATE TABLE log_entry_y2015(log_tim

[continue reading]
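
The post is truncated above, but based on the setup it describes, the parent side on 5432 would look roughly like this (server names and options are assumed, and the `CREATE FOREIGN TABLE ... INHERITS` syntax relies on the 9.5 feature discussed here):

```sql
-- On the parent server (5432): postgres_fdw pointing at one shard
CREATE EXTENSION postgres_fdw;
CREATE SERVER shard_2014 FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'localhost', port '5433', dbname 'postgres');
CREATE USER MAPPING FOR CURRENT_USER SERVER shard_2014;

-- Local parent table
CREATE TABLE log_entries (log_time timestamp, entry text);

-- Foreign child inheriting from the local parent
CREATE FOREIGN TABLE log_entry_y2014 (log_time timestamp, entry text)
    INHERITS (log_entries)
    SERVER shard_2014
    OPTIONS (table_name 'log_entry_y2014');
```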

Posted by Josh Berkus on 2015-03-26 at 19:51:28

On November 18th, 2015, we will have an independent, multi-track conference all about high performance PostgreSQL: pgConf SV. This conference is being organized by CitusData at the South San Francisco Convention Center. Stay tuned for call for presentations, sponsorships, and more details soon.

Posted by David Fetter in Farmers Business Network on 2015-03-26 at 13:27:51
SQL is code.

This may seem like a simple idea, but out in the wild, you will find an awful lot of SQL programs which consist of a single line, which makes them challenging to debug.

Getting it into a format where debugging was reasonably easy used to be tedious and time-consuming, but no more!
Continue reading "Formatting!"
Posted by Peter Eisentraut on 2015-03-26 at 00:00:00

PgBouncer has a virtual database called pgbouncer. If you connect to it, you can run special SQL-like commands, for example:

$ psql -p 6432 pgbouncer
=# SHOW pools;
┌─[ RECORD 1 ]───────────┐
│ database   │ pgbouncer │
│ user       │ pgbouncer │
│ cl_active  │ 1         │
│ cl_waiting │ 0         │
│ sv_active  │ 0         │
│ sv_idle    │ 0         │
│ sv_used    │ 0         │
│ sv_tested  │ 0         │
│ sv_login   │ 0         │
│ maxwait    │ 0         │

This is quite nice, but unfortunately, you cannot run full SQL queries against that data. So you couldn’t do something like

SELECT * FROM pgbouncer.pools WHERE maxwait > 0;

Well, here is a way: From a regular PostgreSQL database, connect to PgBouncer using dblink. For each SHOW command provided by PgBouncer, create a view. Then that SQL query actually works.
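
A minimal sketch of that trick, assuming dblink is installed and PgBouncer listens on port 6432 (note that the column list of SHOW pools varies between PgBouncer versions):

```sql
CREATE EXTENSION dblink;

CREATE VIEW pgbouncer_pools AS
  SELECT *
    FROM dblink('port=6432 dbname=pgbouncer user=pgbouncer',
                'SHOW pools')
      AS t(database text, "user" text,
           cl_active int, cl_waiting int,
           sv_active int, sv_idle int, sv_used int,
           sv_tested int, sv_login int, maxwait int);

-- Now plain SQL works against PgBouncer's stats
SELECT * FROM pgbouncer_pools WHERE maxwait > 0;
```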

But before you start doing that, I have already done that here:

[continue reading]

Posted by Rajeev Rastogi on 2015-03-25 at 05:00:00
In PostgreSQL 9.5, we can see improved performance for index scans on a ">" condition.

In order to explain this optimization, consider the below schema:
create table tbl2(id1 int, id2 varchar(10), id3 int);
create index idx2 on tbl2(id2, id3);

Query as:
                select count(*) from tbl2 where id2>'a' and id3>990000;

Prior to this patch, the above query used the following steps to retrieve index tuples:

  • Find the scan start position by searching the BTree for the first position satisfying the first key condition, i.e. id2 > 'a'.
  • Fetch each tuple starting from the position found in step 1.
  • For each tuple, check every scan key condition (in our example, both conditions).
  • If the conditions match, return the tuple; otherwise the scan stops.

The problem is that the first scan key condition was already matched in order to find the scan start position (step 1), so any further tuple is also guaranteed to match it (as the records are sorted).

Comparing the first scan key condition again in step 3 is therefore redundant.

So the BTree scan algorithm was changed to avoid the redundant check, i.e. to skip the first key comparison for each tuple, since it is guaranteed to always be true.

Performance result summary:

I would like to thank Simon Riggs for verifying and committing this patch. He confirmed an improvement of about 5% with both short and long indexes on the least beneficial data type, and considered it a very positive win overall.
Posted by Josh Berkus on 2015-03-24 at 19:30:10


On March 10th, we had our third ever pgDay for SFPUG, which was a runaway success. pgDaySF 2015 was held together with FOSS4G-NA and EclipseCon; we were especially keen to join FOSS4G because of the large number of PostGIS users attending the event. In all, around 130 DBAs, developers and geo geeks joined us for pgDay SF ... so many that the conference had to reconfigure the room to add more seating!

standing room only

The day started out with Daniel Caldwell showing how to use PostGIS for offline mobile data, including a phone demo.

Daniel Caldwell setting up

Ozgun Erdogan presented pg_shard with a short demo.

Ozgun presents pg_shard with PostGIS

Gianni Ciolli flew all the way from London to talk about using Postgres' new Logical Decoding feature for database auditing.

Gianni Ciolli presenting

Peak excitement of the day was Paul Ramsey's "PostGIS Feature Frenzy" presentation.

Paul Ramsey making quotes

We also had presentations by Mark Wong and Bruce Momjian, and lightning talks by several presenters. Slides for some sessions are available on the FOSS4G web site. According to FOSS4G, videos will be available sometime soon.

Of course, we couldn't have done it without our sponsors: Google, EnterpriseDB, 2ndQuadrant, CitusDB and pgExperts. So a big thank you to our sponsors, our speakers, and the staff of FOSS4G-NA for creating a great day.

Last Thursday, I had this short and one-sided conversation with myself: “Oh, cool, Pg 9.4 is out for RDS. I’ll upgrade my new database before I have to put it into production next week, because who knows when else I’ll get a chance. Even though I can’t use pg_dumpall, this will take me what, 20 […]
Posted by Heikki Linnakangas on 2015-03-23 at 20:05:31

Before PostgreSQL got streaming replication, back in version 9.0, people kept asking when we’re going to get replication. That was a common conversation-starter when standing at a conference booth. I don’t hear that anymore, but this dialogue still happens every now and then:

- I have streaming replication set up, with a master and standby. How do I perform failover?
- That’s easy, just kill the old master node, and run “pg_ctl promote” on the standby.
- Cool. And how do I fail back to the old master?
- Umm, well, you have to take a new base backup from the new master, and re-build the node from scratch..
- Huh, what?!?

pg_rewind is a better answer to that. One way to think of it is that it’s like rsync on steroids. Like rsync, it copies files that differ between the source and target. The trick is in how it determines which files have changed. Rsync compares timestamps, file sizes and checksums, but pg_rewind understands the PostgreSQL file formats, and reads the WAL to get that information instead.
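
Usage is roughly like this (the paths and connection string below are made up; check the pg_rewind reference page for the exact options):

```shell
# Stop the old master cleanly, then rewind it against the promoted standby
pg_ctl -D /data/old_master stop -m fast
pg_rewind --target-pgdata=/data/old_master \
          --source-server='host=new-master port=5432 dbname=postgres'
# Afterwards, point recovery.conf at the new master and start the node as a standby
```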

I started hacking on pg_rewind about a year ago, while working for VMware. I got it working, but it was a bit of a pain to maintain. Michael Paquier helped to keep it up-to-date, whenever upstream changes in PostgreSQL broke it. A big pain was that it has to scan the WAL, and understand all different WAL record types – miss even one and you might end up with a corrupt database. I made big changes to the way WAL-logging works in 9.5, to make that easier. All WAL record types now contain enough information to know what block it applies to, in a common format. That slashed the amount of code required in pg_rewind, and made it a lot easier to maintain.

I have just committed pg_rewind into the PostgreSQL git repository, and it will be included in the upcoming 9.5 version. I always intended pg_rewind to be included in PostgreSQL itself; I started it as a standalone project to be able to develop it faster, outside the PostgreSQL release cycle, so I’m glad it finally made it into the main distribution now. Please give it a l

[continue reading]

Posted by Marco Slot in CitusData on 2015-03-23 at 12:53:12
Posted by Umair Shahid on 2015-03-23 at 12:36:53


Developers have been really excited about the addition of JSON support starting PostgreSQL v9.2. They feel they now have the flexibility to work with a schema-less unstructured dataset while staying within a relational DBMS. So what’s the buzz all about? Let’s explore below …

Why is NoSQL so attractive?

Rapid turnaround time … it is as simple as that. With the push to decrease time-to-market, developers are under constant pressure to turn POCs around very quickly. It is actually not just POCs; marketable products are increasingly getting the same treatment. The attitude is, “If I don’t get it out, someone else will.”

Any decent sized application will need to store data somewhere. Rather than going through the pains of designing schemas and the debates on whether to normalize or not, developers just want to get to the next step. That’s how databases like MongoDB gained such tremendous popularity. They allow for schema-less, unstructured data to be inserted in document form and the developers find it easy to convert class objects within their code into that document directly.

There is a trade-off, however. The document (and key/value store) databases are very unfriendly to relations. While retrieving data, you will have a very hard time cross referencing between different tables making analytics nearly impossible. And, nightmare of nightmares for mission critical applications, these databases are not ACID compliant.

In walks PostgreSQL with JSON and HSTORE support.

NoSQL in PostgreSQL

While the HSTORE contrib module has been providing key/value data types in standard PostgreSQL table columns since v8.2, the introduction of native JSON support in v9.2 paves the way for the true power of NoSQL within PostgreSQL.

Starting v9.3, not only do you have the ability to declare JSON data types in standard tables, you now have functions to encode data to JSON format and also to extract data elements from a JSON column. What’s more, you can also interchange data between JSON and HSTORE using simple & intuitive fun

[continue reading]
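
As a small taste of the 9.3-era functionality (the table and data here are invented for illustration):

```sql
CREATE TABLE docs (id serial PRIMARY KEY, body json);
INSERT INTO docs (body)
VALUES ('{"name": "ada", "tags": ["db", "nosql"]}');

SELECT body ->> 'name'      AS name,       -- extract a field as text
       body -> 'tags' ->> 0 AS first_tag,  -- index into a JSON array
       row_to_json(d)       AS whole_row   -- encode an entire row as JSON
  FROM docs d;
```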

Posted by Paul Ramsey on 2015-03-21 at 16:16:00

I did a new PostGIS talk for FOSS4G North America 2015, an exploration of some of the tidbits I've learned over the past six months about using PostgreSQL and PostGIS together to make "magic" (any sufficiently advanced technology...)


Posted by Paul Ramsey on 2015-03-20 at 20:07:00

Somehow I've gotten through 10 years of SQL without ever learning this construction, which I found while proof-reading a colleague's blog post and looked so unlikely that I had to test it before I believed it actually worked. Just goes to show, there's always something new to learn.

Suppose you have a GPS location table:

  • gps_id: integer
  • geom: geometry
  • gps_time: timestamp
  • gps_track_id: integer

You can get a correct set of lines from this collection of points with just this SQL:

SELECT gps_track_id,
       ST_MakeLine(geom ORDER BY gps_time ASC) AS geom
  FROM gps_points
 GROUP BY gps_track_id;

Those of you who already knew about placing ORDER BY within an aggregate function are going "duh", and the rest of you are, like me, going "whaaaaaa?"

Prior to this, I would solve this problem by ordering all the groups in a CTE or sub-query first, and only then pass them to the aggregate make-line function. This, is, so, much, nicer.
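
That older approach would look something like this sketch (assuming the gps_points table described above, and relying on the subquery's ordering reaching the aggregate):

```sql
-- Order first in a subquery, then aggregate
SELECT gps_track_id,
       ST_MakeLine(geom) AS geom
  FROM (SELECT gps_track_id, geom
          FROM gps_points
         ORDER BY gps_track_id, gps_time) AS ordered_points
 GROUP BY gps_track_id;
```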

PostgreSQL has provided table partitions for a long time. In fact, one might say it has always had partitioning. The functionality and performance of table inheritance has increased over the years, and there are innumerable arguments for using it, especially for larger tables consisting of hundreds of millions of rows. So I want to discuss a quirk that often catches developers off guard. In fact, it can render partitioning almost useless or counter-productive.

PostgreSQL has a very good overview in its partitioning documentation. And the pg_partman extension at PGXN follows the standard partitioning model to automate many of the pesky tasks for maintaining several aspects of partitioning. With modules like this, there’s no need to manually manage new partitions, constraint maintenance, or even some aspects of data movement and archival.

However, existing partition sets exist, and not everyone knows about extensions like this, or have developed in-house systems instead. Here’s something I encountered recently:

CREATE TABLE sys_order (
    order_id     SERIAL       PRIMARY KEY,
    product_id   INT          NOT NULL,
    item_count   INT          NOT NULL,
    order_dt     TIMESTAMPTZ  NOT NULL DEFAULT now()
);

CREATE TABLE sys_order_part_201502 ()
       INHERITS (sys_order);

ALTER TABLE sys_order_part_201502
  ADD CONSTRAINT chk_order_part_201502
      CHECK (order_dt >= '2015-02-01'::DATE AND
             order_dt < '2015-02-01'::DATE + INTERVAL '1 mon');

This looks innocuous enough, but PostgreSQL veterans are already shaking their heads. The documentation alludes to how this could be a problem:

Keep the partitioning constraints simple, else the planner may not be able to prove that partitions don’t need to be visited.

The issue in this case is that adding the interval of a month changes the right boundary of this range constraint into a dynamic value. PostgreSQL will not use dynamic values in evaluating check constraints. Here's a query plan from PostgreSQL 9.4.1, which is the most recent release as of

[continue reading]
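
The post is truncated, but the remedy implied by the quoted advice is to keep both bounds of the constraint constant; a sketch of that fix:

```sql
-- Constant boundaries let the planner prove the partition can be skipped
ALTER TABLE sys_order_part_201502
  DROP CONSTRAINT chk_order_part_201502;

ALTER TABLE sys_order_part_201502
  ADD CONSTRAINT chk_order_part_201502
      CHECK (order_dt >= '2015-02-01' AND
             order_dt < '2015-03-01');
```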

The BDR team has recently introduced support for dynamically adding new nodes to a BDR group from SQL into the current development builds. Now no configuration file changes are required to add nodes and there’s no need to restart the existing or newly joining nodes.

This change does not appear in the current 0.8.0 stable release; it’ll land in 0.9.0 when that’s released, and can be found in the bdr-plugin/next branch in the mean time.

New nodes negotiate with the existing nodes for permission to join. Soon they’ll be able to join the group without disrupting any DDL locking, global sequence voting, etc.

There’s also an easy node removal process so you don’t need to modify internal catalog tables and manually remove slots to drop a node anymore.

New node join process

With this change, the long-standing GUC-based configuration for BDR has been removed. bdr.connections no longer exists and you no longer configure connections with bdr.[conname]_dsn etc.

Instead, node addition is accomplished with the bdr.bdr_group_join(...) function. Because this is a function in the bdr extension, you must first CREATE EXTENSION bdr;. PostgreSQL doesn’t have extension dependencies and the bdr extension requires the btree_gist extension so you’ll have to CREATE EXTENSION btree_gist first.

Creating the first node

Creation of the first node must now be done explicitly using bdr.bdr_group_create. This promotes a standalone PostgreSQL database to a single-node BDR group, allowing other nodes to then be joined to it.

You must pass a node name and a valid externally-reachable connection string for the dsn parameter, e.g.:



SELECT bdr.bdr_group_create(
  local_node_name := 'node1',
  node_external_dsn := 'host=node1 dbname=mydb'
);
Note that the dsn is not used by the root node itself. It’s used by other nodes to connect to the root node, so you can’t use a dsn like host=localhost dbname=mydb if you intend to have nodes on multiple machines.

Adding other nodes

You can now join other nodes to f

[continue reading]

Posted by Paul Ramsey on 2015-03-20 at 00:00:00

The 2.1.6 release of PostGIS is now available.

The PostGIS development team is happy to release a patch for PostGIS 2.1, the 2.1.6 release. As befits a patch release, the focus is on bugs, breakages, and performance issues. Users with large tables of points will want to prioritize this patch, for substantial (~50%) disk space savings.

Continue Reading by clicking title hyperlink ..
Posted by Josh Berkus in pgExperts on 2015-03-19 at 20:55:00
Since the folks at Tilt aren't on Planet Postgres, I thought I'd link their recent blog post on cool data migration hacks.  Tilt is a YCombinator company, and a SFPUG supporter.
Posted by Jason Petersen in CitusData on 2015-03-19 at 19:53:36

Last winter, we open-sourced pg_shard, a transparent sharding extension for PostgreSQL. It brought straightforward sharding capabilities to PostgreSQL, allowing tables and queries to be distributed across any number of servers.

Today we’re excited to announce the next release of pg_shard. The changes in this release include:

  • Improved performance — INSERT commands run up to four times faster
  • Shard repair — Easily bring inactive placements back up to speed
  • Copy script — Quickly import data from CSV and other files from the command line
  • CitusDB integration — Expose pg_shard’s metadata for CitusDB’s use
  • Resource improvements — Execute larger queries than ever before

For more information about recent changes, you can view all the issues closed during this release cycle on GitHub.

Upgrading or installing is a breeze: see pg_shard’s GitHub page for detailed instructions.

Whether you want a distributed document store alongside your normal PostgreSQL tables or need the extra computational power afforded by a sharded cluster, pg_shard can help. We continue to grow pg_shard’s capabilities and are open to feature requests.

Got questions?

If you have any questions about pg_shard, please contact us using the pg_shard-users mailing list.

If you discover an issue when using pg_shard, please submit it to our issue tracker on GitHub.

Further information is available on our website, where you are free to contact us with any general questions you may have.

A nice feature extending the usage of pgbench, the in-core tool of Postgres aimed at doing benchmarks, has landed in 9.5 with this commit:

commit: 878fdcb843e087cc1cdeadc987d6ef55202ddd04
author: Robert Haas <>
date: Mon, 2 Mar 2015 14:21:41 -0500
pgbench: Add a real expression syntax to \set

Previously, you could do \set variable operand1 operator operand2, but
nothing more complicated.  Now, you can \set variable expression, which
makes it much simpler to do multi-step calculations here.  This also
adds support for the modulo operator (%), with the same semantics as in

Robert Haas and Fabien Coelho, reviewed by Álvaro Herrera and
Stephen Frost

pgbench has for ages supported custom input files using -f, with custom variables that can be set with, for example, \set or \setrandom, and then used in a custom set of SQL queries:

\set id 10 * :scale
\setrandom id2 1 :id
SELECT name, email FROM users WHERE id = :id;
SELECT capital, country FROM world_cities WHERE id = :id2;

Up to 9.4, those custom variables could only be computed with simple rules of the form "var operator var2" (the commit message above is explicit enough), resulting in many intermediate steps and variables when doing more complicated calculations (note as well that any additional operands and variables provided after the first three are simply ignored):

\setrandom ramp 1 200
\set scale_big :scale * 10
\set min_big_scale :scale_big + :ramp
SELECT :min_big_scale;

In 9.5, such cases become much easier because pgbench now embeds a parser for complicated expressions. The calculation written above can be done more simply, and far fancier things are possible:

\setrandom ramp 1 200
\set min_big_scale :scale * 10 + :ramp
SELECT :min_big_scale;

With pgbench run for a couple of transactions, here is what you could get:

$ pgbench -f test.sql -t 5
$ tail -n5 $PGDATA/pg_log/postgresql.log
LOG:  statement: SELECT 157;
LOG:  statement: SELECT 53;

[continue reading]

Amit Kapila and I have been working very hard to make parallel sequential scan ready to commit to PostgreSQL 9.5.  It is not all there yet, but we are making very good progress.  I'm very grateful to everyone in the PostgreSQL community who has helped us with review and testing, and I hope that more people will join the effort.  Getting a feature of this size and complexity completed is obviously a huge undertaking, and a significant amount of work remains to be done.  Not a whole lot of brand-new code remains to be written, I hope, but there are known issues with the existing patches where we need to improve the code, and I'm sure there are also bugs we haven't found yet.
Read more »
On 18th of March, Alvaro Herrera committed patch: array_offset() and array_offsets()   These functions return the offset position or positions of a value in an array.   Author: Pavel Stěhule Reviewed by: Jim Nasby It's been a while since my last “waiting for" post – mostly because while there is a lot of work happening, […]
The Turkish PostgreSQL Users' and Developers' Association is organizing the 4th PGDay.TR on May 9, 2015 in Istanbul. Dave Page, one of the community leaders, will be giving the keynote.

This year, we are going to have 1 full English track along with 2 Turkish tracks, so if you are close to Istanbul, please join us for a wonderful city, event and fun!

We are also looking for sponsors for this great event. Please email to for details.

See you in Istanbul!

Conference website:
Last night I attended the second WhatcomPUG. This meeting was about Sqitch, an interesting database revision control mechanism. The system is written in Perl and was developed by David Wheeler of pgTAP fame. It looks and feels like git. As it is written in Perl, it definitely has too many options. That said, what we were shown works, works well, and appears to be a solid and thorough system for the job.

I also met a couple of people from CoinBeyond. They are a point-of-sale software vendor that specializes in letting "regular" people (read: not I or likely the people reading this blog) use Bitcoin!

That's right folks, the hottest young currency in the market today is using the hottest middle aged technology for their database, PostgreSQL. It was great to see that they are also located in Whatcom County. The longer I am here, the more I am convinced that Whatcom County (and especially Bellingham) is a quiet tech center working on profitable ventures without the noise of places like Silicon Valley. I just keep running into people doing interesting things with technology.

Oh, for reference:

Posted by Julien Rouhaud in Dalibo on 2015-03-18 at 08:09:10

Last month, I had the chance to talk about PostgreSQL monitoring, and present some of the tools I’m working on at

This talk was a good opportunity to work on an overview of existing projects dealing with monitoring or performance, see what may be lacking and what can be done to change this situation.

Here are my slides:

If you’re interested in this topic, or if you developed a tool I missed while writing these slides (my apologies if that's the case), the official wiki page is the place you should go first.

I’d also like to thank all the staff for their work; this conference was a big success, and the biggest postgresql-centric event ever organized.

Talking About OPM and PoWA at was originally published by Julien Rouhaud at rjuju's home on March 18, 2015.

Posted by Rajeev Rastogi on 2015-03-18 at 07:01:00
PostgreSQL is an open-source, full-featured relational database. This blog gives an overview of how the PostgreSQL engine processes queries received from the user.
The typical, simplified flow of the PostgreSQL engine is:

SQL Engine Flow

As part of this blog, I am going to cover all modules marked in yellow colour.

The parser module is responsible for the syntactical analysis of the query. It consists of two sub-modules:
1. Lexical scanner
2. Bison rules/actions

Lexical Scanner:
The lexical scanner reads each character from the given query and returns the appropriate token based on the matching rules.

The name given in <> is a state name; in the example rules, <xc> is the state for the start of a comment. Once the scanner sees the comment-start characters, the comment body token is read in the <xc> state only.

Bison reads the tokens returned by the scanner, matches them against the grammar rules for the query, and performs the associated actions.

Each returned token is matched against the rule in left-to-right order; if at any point no rule matches, bison either moves on to the next possible matching rule or throws an error.

The analyzer module is responsible for the semantic analysis of the given query. The raw query information received from the parser module is transformed into the database's internal object form to obtain the corresponding object IDs; e.g. the relation name "tbl" gets replaced with its object id.
The output of the analyzer module is the query tree, whose structure can be seen in the "Query" struct in src/include/nodes/parsenodes.h.
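
For the curious, PostgreSQL can print these internal trees to the server log via a few debug settings:

```sql
SET debug_print_parse = on;      -- log the raw parse tree
SET debug_print_rewritten = on;  -- log the tree after rewriting
SET debug_print_plan = on;       -- log the planner's output
SELECT 1;                        -- the trees appear in the server log
```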

The optimizer module, often considered the brain of the SQL engine, is responsible for choosing the best path for executing the query. The best path is chosen based on cost: the path with the least cost is the winner.
Based on the winner path, plan is created which is used by executor to execut

[continue reading]

After discussing the pgbench and TPC-DS results, it's time to look at the last benchmark, testing performance of built-in fulltext (and GIN/GiST index implementation in general).

The one chart you should remember from this post is this one, GIN speedup between 9.3 and 9.4:


Interpreting this chart is a bit tricky - x-axis tracks duration on PostgreSQL 9.3 (log scale), while y-axis (linear scale) tracks relative speedup 9.4 vs. 9.3, so 1.0 means 'equal performance', and 0.5 means that 9.4 is 2x faster than 9.3.

The chart pretty much shows exponential speedup for vast majority of queries - the longer the duration on 9.3, the higher the speedup on 9.4. That's pretty awesome, IMNSHO. What exactly caused that will be discussed later (spoiler: it's thanks to GIN fastscan). Also notice that almost no queries are slower on 9.4, and those few examples are not significantly slower.


While both pgbench and TPC-DS are well-established benchmarks, there's no such benchmark for testing fulltext performance (as far as I know). Luckily, I had played with the fulltext features a while ago, implementing archie - an in-database mailing list archive.

It's still quite experimental and I use it for testing GIN/GiST related patches, but it's suitable for this benchmark too.

So I've taken the current archives of the PostgreSQL mailing lists, containing about 1 million messages, loaded them into the database and then executed 33k real-world search queries. I can't publish those queries because of privacy concerns (there's no info on users, but still ...), but the queries look like this:

SELECT id FROM messages
 WHERE body_tsvector @@ ('optimizing & bulk & update')::tsquery
 ORDER BY ts_rank(body_tsvector, ('optimizing & bulk & update')::tsquery)
          DESC LIMIT 100;

The number of search terms varies quite a bit - the simplest queries have a single letter, while the most complex ones often have tens of words.

PostgreSQL config

The PostgreSQL configuration was mostly default, with only minor changes:


[continue reading]

I submitted to PgConf.US. I submitted talks from my general pool. All of them have been recently updated. They are also all solid talks that have been well received in the past. I thought I would end up giving my, "Practical PostgreSQL Performance: AWS Edition" talk. It is a good talk, is relevant to today and the community knows of my elevated opinion of using AWS with PostgreSQL (there are many times it works just great, until it doesn't and then you may be stuck).

I also submitted a talk entitled "Suck it! Webscale is Dead; PostgreSQL is King!". This talk was submitted as a joke. I never expected it to be accepted; it hadn't been written, and the abstract was submitted on the fly, improvised and in one take. Guess which talk was accepted? "Webscale is Dead; PostgreSQL is King!". They changed the first part of the title, which is absolutely acceptable. The conference organizers know their audience best and what should be presented.

What I have since learned is that the talk submission committee was looking for dynamic talks, dynamic content, and new, inspired ideas. A lot of talks that would have been accepted in years past weren't, and my attempt at humor fit the desired outcome. At first I thought they were nuts, but then I premiered the talk at SDPUG/PgUS PgDay @ Southern California Linux Expo.

I was the second to last presenter on Thursday. I was one hour off the plane. I was only staying the night and flying home the next morning, early. The talk was easily the best received talk I have given. The talk went long, the audience was engaged, and laughter, knowledge and opinions abounded. When the talk was over, it was given enthusiastic applause, and with a definite need for water, I left the room.

I was followed by at least 20 people, if not more. I don't know how many there were but it was more than I have ever had follow me after a talk before. I was deeply honored by the reception. One set of guys that approached me said something to the effect of: "You seem like you don't mind expressing your opinions".

[continue reading]

Database management systems use MVCC to avoid the problem of
writers blocking readers and vice versa, by making use of multiple
versions of data.

There are essentially two approaches to multi-version concurrency.

Approaches for MVCC
The first approach is to store multiple versions of records in the
database, and garbage-collect records when they are no longer
required. This is the approach adopted by PostgreSQL and
Firebird/Interbase. SQL Server uses a somewhat similar approach,
with the difference that old versions are stored in tempdb
(a database separate from the main database).

The second approach is to keep only the latest version of data in
the database, but reconstruct older versions of data dynamically
as required by using undo. This is the approach adopted by Oracle
and MySQL/InnoDB.

MVCC in PostgreSQL
In PostgreSQL, when a row is updated, a new version (called a tuple)
of the row is created and inserted into the table. The previous version
is given a pointer to the new version. The previous version is
marked "expired", but remains in the database until it is garbage-collected.

In order to support multi-versioning, each tuple has additional data
recorded with it:
xmin - the ID of the transaction that inserted/updated the
row and created this tuple.
xmax - the ID of the transaction that deleted the row, or created a
new version of this tuple. Initially this field is null.
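Both fields are hidden system columns and can be inspected directly; a quick illustration (the table name is arbitrary):

```sql
CREATE TABLE t (id int);
INSERT INTO t VALUES (1);      -- sets xmin to the inserting xact's ID
SELECT xmin, xmax, id FROM t;
UPDATE t SET id = 2;           -- the old tuple gets its xmax set,
SELECT xmin, xmax, id FROM t;  -- the new tuple carries a fresh xmin
```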

Transaction status is maintained in CLOG, which resides in $Data/pg_clog.
This table contains two bits of status information for each transaction;
the possible states are in-progress, committed, or aborted.

PostgreSQL does not undo changes to database rows when a transaction
aborts - it simply marks the transaction as aborted in CLOG. A PostgreSQL
table therefore may contain data from aborted transactions.
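The consequence of recording aborts only in CLOG can be sketched with a toy visibility check. This is a deliberate simplification of the real rules - it ignores snapshots and in-progress transactions, and all names are mine:

```python
COMMITTED, ABORTED, IN_PROGRESS = "committed", "aborted", "in-progress"

# Toy CLOG: transaction ID -> status
clog = {100: COMMITTED, 101: ABORTED, 102: COMMITTED}

def tuple_visible(xmin, xmax):
    """A tuple is visible if its inserting transaction committed and
    no committed transaction has deleted/updated it since."""
    if clog.get(xmin) != COMMITTED:
        return False          # inserted by an aborted/in-progress xact
    if xmax is None:
        return True           # never deleted or updated
    return clog.get(xmax) != COMMITTED  # deleter aborted => still visible

print(tuple_visible(100, None))  # True
print(tuple_visible(101, None))  # False: inserting xact aborted
print(tuple_visible(100, 101))   # True: deleting xact aborted
print(tuple_visible(100, 102))   # False: deleted by a committed xact
```

Note how rows written by aborted transaction 101 simply stay in the table but are filtered out at read time - exactly why vacuum is needed to reclaim them.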

A vacuum cleaner process is provided to garbage-collect expired/aborted
versions of a row. The vacuum cleaner also deletes index entries
associated with tuples that are garbage-collected.

A tuple is visible if its xmin is valid and xmax is

[continue reading]