The world's most advanced open source database
Top posters
Number of posts in the past month
Top teams
Number of posts in the past month
  • Policy for being listed on Planet PostgreSQL.
  • Add your blog to Planet PostgreSQL.
  • List of all subscribed blogs.
  • Manage your registration.
  • Get in touch with the Planet PostgreSQL administrators at planet at
Posted by Andrew Dunstan in pgExperts on 2014-08-22 at 14:51:00
I mentioned to a client that I was going to write a function they had asked me to write using PLPerl, because it has good dynamic string handling capabilities that make it a better fit for this particular task than PLPgsql. "Oh," said the client, "we don't do much Perl. But ever since you hooked us up with PLV8 we do everything in JavaScript. We all know that, and use it every day." So I'm now writing it in Javascript. Proof once again that no good deed goes unpunished. It remains to be seen if it's going to be quite such a good fit as Perl would be, but at least it will give me a thorough refresher in JS.
Posted by Vasilis Ventirozos in OmniTI on 2014-08-22 at 13:54:00
A while ago I wanted to explore the options i had for multi master replication in postgres, I have previously worked with Oracle MMR and to be honest i don't like the idea of mmr, mostly because of the all conflicts that you sometime get, most of the times these conflicts are easy to resolve though and mmr, or in our case BDR can be a very nice way to scale up your database writes.
So, while searching i came across a project called postgresql BDR (bi-directional replication) developed by 2ndQuadrant, it allows users to create a geographically distributed asynchronous multi-master database using Logical Log Streaming Replication based on the changeset extraction feature introduced in PostgreSQL 9.4.
This post is about showing how to set this up.
For this setup i used 2 debian VMs (debian-7.5.0-amd64-netinst.iso)
the extra packages i installed were :
git, mc, joe,sudo, curl, libreadline-dev, zlib1g-dev, bison, flex
(by the way, comments about joe wont be tolerated !!)

With the vms ready and a user postgres1 created :

git clone git://
cd 2ndquadrant_bdr
git checkout bdr/0.6
./configure --prefix=$HOME/bdr
make install
cd ~/2ndquadrant_bdr/contrib/btree_gist && make && make install && cd ../../contrib/bdr && make && make install
add to user's profile:
export PATH=/home/postgres1/bdr/bin:$PATH

check version :
psql (PostgreSQL) 9.4beta1_bdr0601

sudo mkdir /opt/pgdata1
sudo mkdir /opt/pgdata2
sudo chown postgres1:postgres1 /opt/pgdata1/
sudo chown postgres1:postgres1 /opt/pgdata2/

as postgres1
initdb -D /opt/pgdata1/ -U postgres1 --auth-host=md5 --auth-local=peer

edit /opt/pgdata1/pg_hba.conf and add :
host    replication     postgres1          trust
edit /opt/pgdata1/postgresql.conf and change :
listen_addresses = '*'

wal_level = 'logical'
max_replication_slots = 3
max_wal_senders = 4
shared_preload_libraries = 'bdr'
bdr.connections = 'postgres1'
bdr.postgres2_dsn = 'dbname=postgres host= user=postgres p

[continue reading]

PostgreSQL has proven to be one of the most reliable Open Source databases around. In the past we have dealt with customers who have achieved superior uptimes and who are really happy with PostgreSQL. But even if PostgreSQL does its job day after day after day – there are some components, which can just fail […]
Posted by gabrielle roth on 2014-08-21 at 23:24:00

I updated our speaker and topic tag clouds for our 8th anniversary.

I've been looking for a way to edit mixed mode files in emacs, so I can edit a plperl function, and have the create statement part (and everything except the body) use standard SQL mode and the body use CPerl mode or whatever mode some magic marker tells it to use. I've taken a bit of a look at mmm-mode, but haven't been able to get it to work, and don't have more time to spend on it. If someone has a good recipe for this please let me know.

Some good news for packagers of PostgreSQL on Windows with many improvements in versioning that are showing up in 9.5. This is the result of a couple of months of work, concluded with the two following commits. The first commit is covering a major portion of executables and libraries:

commit: 0ffc201a51395ca71fe429ef86c872850a5850ee
author: Noah Misch <>
date: Mon, 14 Jul 2014 14:07:52 -0400
Add file version information to most installed Windows binaries.

Prominent binaries already had this metadata.  A handful of minor
binaries, such as pg_regress.exe, still lack it; efforts to eliminate
such exceptions are welcome.

This has added versioning for contrib modules, conversion_procs, most of the ecpg thingies, WAL receiver and PL languages. And then the final shot has been done with this commit, for utilities like regression tools or even zic.exe, part of the timezone code path:

commit: ee9569e4df1c3bdb6858f4f65d0770783c32a84d
author: Noah Misch <>
date: Mon, 18 Aug 2014 22:59:53 -0400
Finish adding file version information to installed Windows binaries.

In support of this, have the MSVC build follow GNU make in preferring
GNUmakefile over Makefile when a directory contains both.

Michael Paquier, reviewed by MauMau.

This work has basically needed the following things to get correct versioning coverage when building with either MinGW and MSVC:

  • Addition of PGFILEDESC to have a file description
  • Addition of some WIN32RES in the object list being built by make to compile the version number.
  • For MSVC, some refactoring of the scripts used for build to have them pick up correctly PGFILEDESC, and create version files.

Now, the result of this work is directly visible on the files themselves by looking at the file details in menu "Property" by left-clicking on a given file, tab "Details". With that, it is possible to see fields for:

  • File Description, for example for adminpack.dll, the description is "adminpack - support functions for pgAdmin".
  • Version number, made of 4 integer di

[continue reading]

There is a new release - version 4.14 - of the buildfarm client, now available at

The only change of note is that a bug which only affects MSVC clients (such that the client will not complete a run) and is present in releases 4.12 and 4.13 is fixed. Clients on other platforms do not need to upgrade.
Posted by Chris Travers on 2014-08-19 at 12:29:00
The SELECT statement is the workhorse of SQL.  Updates and inserts are necessary, but selects are where the power is.  One of the significant issues many people have in understanding these and using them is a clear understanding of the math involved, in part because there are a large number of implicit possibilities in the syntax.  In general folks learn to avoid the implicit aspects of the statement, but there are some implicit odditities that can't go away because they are baked into the language.  One of these is the confusion between selection and projection, and because SQL operates on bags instead of sets, neither of these work in SQL quite like they do in relational math.

In "Relational Theory and SQL," Chris Date says that tuples are unordered.  Tuples are strongly ordered, but what Date is getting at is that a relation has no natural ordering of columns, and therefore from any relation with a given ordering, a new relation can be created with a different ordering.  This process is called 'projection' and it is entirely implicit in SQL.  In the interest of being concise, in that book, Date does not stick with his typical approach of starting from clear definitions and discussing what these mean.

This being said, understanding projection, selection, and the difference between them makes understanding relational databases far easier in my view.

Because SQL operates on bags, I will use both SQL (for bag examples) and Set::Relation (for set examples) in order to discuss the math differences.  I will finally offer an idea of what a clearer English rendition of SQL's math semantics would be.

I:  Projection - Transform One Relation into Vertical Subset

Projection, represented by a Pi character (π) creates a set of ordered tuples with an ordering based on the operation.  What projection does, essentially, is take a subset (or the whole set) from each tuple, possibly re-ordering it in the process, in order to transform one relation into a derived relation.

In relational algebra, projection takes a set of tup

[continue reading]

Posted by gabrielle roth in EnterpriseDB on 2014-08-19 at 02:20:51
We had so many good submissions for Postgres Open this year, we had to make some very difficult choices. While I haven’t quite achieved Dan Langille’s (of PgCon fame) level of conference-running zen and rarely get to all the talks I’d like to see, here are my picks: Wednesday: The PostGIS tutorials. I am really […]
Posted by Tomas Vondra on 2014-08-19 at 00:00:00

Almost a year ago, I wrote a custom experimental aggregate replacing COUNT(DISTINCT). The problem with the native COUNT(DISTINCT) is that it forces a sort on the input relation, and when the amount of data is significant (say, tens of millions rows), that may be a significant performance drag. And sometimes we really need to do COUNT(DISTINCT).

Extensibility is one of the great strengths of PostgreSQL - users of most other databases can only dream about things like defining custom aggregates. And this extension point was exactly the foundation for my idea was - implementing a custom aggregate, counting the distinct items in a more efficient way.

That's how count_distinct was conceived, and the principle was really simple. For each group, a small hash table is maintained, making it trivial to keep track of distinct values (and counting them). And it worked quite well - instead of the COUNT(DISTINCT) query


you can call the customa aggregate


and in most cases it was much faster (usually ~3x), without doing the sort, etc. The main disadvantage was memory consumption - the overhead of the additional hash table structure and palloc overhead was bad (and in some cases quite terrible - consuming an order of magnitude more memory than the amount of data being processed). I experimented with various hash table variants, allocation schemes, but either the impact on performance was unacceptable, or the memory consumption was not much lower. Until about a month ago ...

Luckily, I've been messing with the hash join implementation recently, and it occured to me that a hash table may not be the right tool for this task. Hash tables are great for lookups, but that's not really the point of count_distinct - it only needs to detect and remove duplicates, and lookups are only one of the ways to achieve that goal.

Also, as the hash table grows, the L2/L3 cache hit ratio gets worse and worse. And after a bit of experimenting, I managed to come up

[continue reading]

Posted by Pavel Stehule on 2014-08-18 at 08:39:00
I released new version of plpgsql_check.

What is new?

lot of new checks:

  • assignment stmt
  • performance warnings
  • return stmts

create or replace function fx()
returns t2 as $$
return (10,20,30)::t1;
$$ language plpgsql;
select * from plpgsql_check_function('fx()', performance_warnings := true);
error:42804:3:RETURN:returned record type does not match expected record type
Detail: Returned type integer does not match expected type numeric in column 2.
(2 rows)

create or replace function f1()
returns setof int as $$
return next 1::numeric; -- tolerant, doesn't use tupmap
$$ language plpgsql;
select * from plpgsql_check_function('f1()', performance_warnings := true);
performance:42804:3:RETURN NEXT:target variable has different type then expression result
Detail: assign "numeric" value to "integer" variable
Hint: Hidden casting can be a performance issue.
(3 rows)

create or replace function f1()
returns int as $$
return current_date;
$$ language plpgsql;
select * from plpgsql_check_function('f1()', performance_warnings := true);
warning:42804:3:RETURN:target variable has different type then expression result
Detail: assign "date" value to "integer" variable
Hint: There are no possible explicit coercion between those types, possibly bug!
(3 rows)

Last month was a discussion about speed of plpgsql. There was example of synthetic tests - for these tests plpgsql is strongly penalized. There is zero optimization for intensive mathematics calculations :
DO LANGUAGE plpgsql $$ DECLARE n real;
DECLARE f integer;
FOR f IN 1..10000000 LOOP
if 0=0 then
n = SQRT (f);
end if;
RAISE NOTICE 'Result => %',n;
END $$;

There is two issues: a) useless test 0=0 (Oracle throws it, Postgres has no any optimiz

[continue reading]

Posted by Fabien Coelho on 2014-08-17 at 17:20:00

I have posted recently performance figures with varying page size using pgbench on SSD, which show a +10% improvement with smaller 4 kB page size over the default 8 kB page size.

Josh Berkus pointed out that pgbench test uses rather small 100-bytes rows, and that changing the tuple size might induce a different conclusion. To assess this point, I ran some tests with different row sizes: 1 kB, 3 kB, 7 kB and 15 kB.

Test Setup

I ran prepared transactions (-M prepared) to avoid cpu cycles and focus on write performance. The scale is 10 for 1 kB rows (about 1 GB base), 10 for 3 kB rows, 5 for 7 kB rows and 2 for 15 kB rows (about 4 GB bases): bases are small enough to fit in memory, so that reads should not require disk accesses. I ran 200 seconds tests with 4 clients (-c 4) on Postgresql 9.4b2. Although this is short, I am confident that it is enough because the rather good performance of SSDs implies dozens of checkpoints during that time, and also prior tests I did with 1000 vs 200 seconds on SSDs showed that the later were consistent with the former, although slightly on the optimistic side. I lowered autovacuum_naptime to 20 seconds and checkpoint_timeout to 1min and raised checkpoint_completion_target to 0.8 to shake things a bit, but I do not think that there is a great influence of these parameters given the overall performance. I used the default configuration for everything else. The hardware is the same as the previous test.

The row size is changed by updating the filler attributes of all tables with commands like:

ALTER TABLE pgbench_accounts
  DROP COLUMN filler,
VACUUM FULL pgbench_accounts;

The key point is to change the STORAGE setting, otherwise with the default EXTENDED value TOAST compression kicks in after 2 kB and with the default empty string rows end up smaller than with the standard test…


Here are the performance results for two series of test.

The first serie uses simple updates (-N) t

[continue reading]

Posted by Craig Kerstiens on 2014-08-15 at 07:00:00

As I followed along with the 9.4 release of Postgres I had a few posts of things that I was excited about, some things that missed, and a bit of a wrap-up. I thought this year (year in the sense of PG releases) I’d jump the gun and lay out areas I’d love to see addressed in PostgreSQL 9.5. And here it goes:


Merge/Upsert/Insert or Update whatever you want to call it this is still a huge wart that it doesn’t exist. There’s been a few implementations show up on mailing lists, and to the best of my understanding there’s been debate on if it’s performant enough or that some people would prefer another implementation or I don’t know what other excuse. The short is this really needs to happen, until that time you can always implement it with a CTE which can have a race condition.

Foreign Data Wrappers

There’s so much opportunity here, and this has easily been my favorite feature of the past 2-3 years in Postgres. Really any improvement is good here, but a hit list of a few valuable things:

  • Pushdown of conditions
  • Ability to accept a DSN to a utility function to create foreign user and tables.
  • Better security around creds of foreign tables
  • More out of the box FDWs


Today there’s madlib for machine learning, and 9.4 got support for ordered set aggregates, but even still Postgres needs to keep moving forward here. PL-R and PL-Python can help a good bit as well, but having more out of the box functions for stats can continue to keep it at the front of the pack for a database that’s not only safe for your data, but powerful to do analysis with.


This is definitely more of a dream than not. Full multi-master replication would be amazing, and it’s getting closer to possible. The sad truth is even once it lands it will probably require a year of maturing, so even more reason for it to hopefully hit in 9.5

Logical Replication

The foundation made it in for 9.4 which is huge. This means we’ll probably see a good working out of the box logical replication in 9.5. For those less familiar this m

[continue reading]

Posted by Josh Berkus in pgExperts on 2014-08-15 at 00:10:00
Some of you may be following the discussion on JSONB compression on -hackers, which is the remaining major issue for 9.4.  As part of this, I needed to run some statistics on average column sizes for JSON vs JSONB tables. This makes a good example of how incredibly useful WITHIN GROUP will be in 9.4.

First, I created two versions of the data, one called "jsonic" which has text-JSON data, and one called "jsonbish" which has the JSONB version.  Then I ran some comparisons using pg_column_size on the JSON columns.

with colsizes_b as (
    select pg_column_size(jsonbish.jcol) as colsize
    from jsonbish
distrib_b as (
    select percentile_cont(array [0,0.25,0.5,0.75,1])
    within group (order by colsize) as colsize_distribution
    from colsizes_b
colsizes_j as (
    select pg_column_size(jsonic.jcol) as colsize
    from jsonic
distrib_j as (
    select percentile_cont(array [0,0.25,0.5,0.75,1])
    within group (order by colsize) as colsize_distribution
    from colsizes_j
select 'json' as thetype,
    colsize_distribution from distrib_j
union all
select 'jsonb',
    from distrib_b;

So I'm taking the column size of each row, then sorting them by size, and then doing a percentile distribution using the new WITHIN GROUP.  There's no group there because the group is actually the whole table.  I've chosen the usual box plot percentages: minimum, 25%, median, 75%, and maximum.

And the results:

 thetype |    colsize_distribution   
 json    | {1741,1767,1854,1904,2292}
 jsonb   | {3551,5866,5910,5958,6168}

... which demonstrates that we do indeed have a major issue with compressability.

A reasonable understanding of the relational model requires understanding the basic data types which make it up, both in the idealized model and in real-world applications.  This post discusses both the idealized model and the accommodations the standard implementations of it make to the messiness of the real world.  We won't deal with NULLs here.  That will be the subject of a future entry.

I Relation and Set

At its root, the relational model is about relations.  Consequently a relation needs to be understood on a mathematical level.  Since a relation is a form of set, let's explore sets first.

A set is an unordered list of unique items.  Thus, {a, b} is a set, but {a, b, b} is not (it is a multi-set or bag, however, see below).  Additionally since it is unordered, {a, b} = {b, a}.  We can project ordering onto a list for our convenience but this does not affect the underlying mathematics.  Sets can be finite or infinite.  In databases, we focus on finite sets (infinite sets can be calculated, but cannot be stored in set form on a finite filesystem).  Infinite sets are for math libraries and don't really concern us here.

A relation is a set of propositions, of correlating facts.   These are represented in tuples (see below). At this point, just recognize that a tuple represents a correlation between two or more facts.  The individual facts are also trivial functions of the tuple as a whole, because each tuple is necessarily unique, each fact is a function of the correlation.

Consider a point in the format of (x, y). For any given point, x and y are trivial functions of it.  For (2, 2), x=2 is trivially true, as is y=2.  This seems tautological (and on one level it is) but it is also important because it gets to the basic operation of a relational database.  For more on this, see part 2 of this series.

II Row and Tuple

A tuple is an ordered set of items.  Unlike a set the order is important and repetition is allowed.  (1, 2, 3), (1, 3, 3), and (1, 1, 1) are all valid tuples.  In tuples, the order ascribes

[continue reading]

Posted by Hans-Juergen Schoenig in Cybertec on 2014-08-14 at 12:30:52
With PostgreSQL 9.4 just around the corner and 8.4 just retired I thought it is time to review those years of development to see how the size of the PostgreSQL code base has evolved over time. In the past couple of years (I even already got to say “decades”) a lot of development has happened […]
Posted by Denish Patel in OmniTI on 2014-08-14 at 01:07:08

I thought to share conference details that I’m going to attend next month. If you aren’t aware but interested attending any of them.

  1. PostgresOpen at Chicago
    • Postgres conference (Sept 17th – Sept 19th)
  2. Surge at National Harbor,MD
    • Scalability and Performance Conference  (Sept 24th – Sept  26th)

The speakers and talks lineup is looking interesting for both conferences. If you haven’t registered for them, it’s not too late yet!!  If you are planning to attend anyone of them, looking forward to meet you in-person.

See you soon!

Posted by Daniel Pocock on 2014-08-13 at 18:29:56

In October this year I'll be visiting the US and Canada for some conferences and a wedding. The first event will be xTupleCon 2014 in Norfolk, Virginia. xTuple make the popular open source accounting and CRM suite PostBooks. The event kicks off with a keynote from Apple co-founder Steve Wozniak on the evening of October 14. On October 16 I'll be making a presentation about how JSCommunicator makes it easy to add click-to-call real-time communications (RTC) to any other web-based product without requiring any browser plugins or third party softphones.

Juliana Louback has been busy extending JSCommunicator as part of her Google Summer of Code project. When finished, we hope to quickly roll out the latest version of JSCommunicator to other sites including, the WebRTC portal for the Debian Developer community. Juliana has also started working on wrapping JSCommunicator into a module for the new xTuple / PostBooks web-based CRM. Versatility is one of the main goals of the JSCommunicator project and it will be exciting to demonstrate this in action at xTupleCon.

xTupleCon discounts for developers

xTuple has advised that they will offer a discount to other open source developers and contributers who wish to attend any part of their event. For details, please contact xTuple directly through this form. Please note it is getting close to their deadline for registration and discounted hotel bookings.

Potential WebRTC / JavaScript meet-up in Norfolk area

For those who don't or can't attend xTupleCon there has been some informal discussion about a small WebRTC-hacking event at some time on 15 or 16 October. Please email me privately if you may be interested.

Posted by Denish Patel in OmniTI on 2014-08-13 at 02:00:45

After having decade of experience managing small to large scale and/or varieties of database  systems, here is my first trial to come up with the top ten commands of database management !

  1. Thou shalt always learn database systems strengths and weaknesses
  2. Thou shalt choose appropriate database to store and process data to empower business
  3. Thou shalt always build systems to resilience/cope with failures
  4. Thou shalt implement automated restore procedures to test backups
  5. Thou shalt always trend and monitor database performance and maitenance metrics
  6. Thou shalt  document and follow database change management procedure
  7. Thou shalt plan to upgrade database systems in timely manner
  8. Thou shalt always build tools to automate processes
  9. Thou shalt implement security policy and processes to secure data
  10. Thou shalt educate developers to write efficient code against databases

If you have your own 10 commandments list, don’t hesitate to post in comment.  Thanks!


Posted by Josh Berkus in pgExperts on 2014-08-12 at 18:17:00
TL;DR: SFPUG video available on YouTube, including streaming video tonight.  I need help converting old videos and making new ones.

First, we have a few SFPUG videos available on YouTube:
While JSONB Deep Dive was also recorded, there were issues with the audio, so I don't recommend watching it, sorry.

We will have streaming video for tonight's SFPUG, which will be on Postgres-XL.   Join the Google Event to get a link to the streaming video once it starts; expected start time is 7:15PM PST.

This brings me to the third thing, which is that I could use some help with video, in two areas:
  1. I desperately need someone to help take better-quality videos of SFPUG meetings, with better focus and sound.  Right now I'm using a Logitech webcam, and it's just not cutting it.  Video will need to be compatible with Google Hangouts, unless you have another plan for broadcasting.
  2. I have 7 or 8 videos of SFPUG meetings from 2013, including Tom Lane explaining the query planner, in FLV format.  I need someone to transcode these to a YouTube format and upload them.
If you can help, email me.  Thanks!
Posted by Tomas Vondra on 2014-08-12 at 00:00:00

Every now and then, someone gets confused about resetting stats in PostgreSQL, and the impact it may have on planner and other parts of the database. Perhaps the documentation might be a tad more clear on this, because while it certainly can be confusing for those who never had to deal with stats in PostgreSQL before. But it's not just about newbies - I wrote a patch in this area for 9.3, and I get occasionally confused to.

The most surprising fact for most users is that 'stats' may actually mean one of two things - statistics describing distribution of the data, and monitoring statistics, tracking some interesting counters about the operation of the database system itself. Each of the kinds has different purposes, is stored differently, and the impact when the data is missing is very different.

So let's see what is the purpose of the two kinds of stats, what are the common issues and what happens when the data get missing for some reason.

Data distribution stats

The first kind od stats tracks distribution of data - number of distinct values, most common values in the column, histogram of data etc. This is the information used when planning the queries - it allows answering questions like:

  • How many rows match the condition? (selectivity estimation)
  • How many rows is produced by the join? (selectivity estimation)
  • How much memory will be needed for the aggregate?

Essentially those are some of the questions asked by the planner/optimizer when deciding what is the best way to execute a query.

This kind of statistics is collected by ANALYZE (or autovacuum) and stored in pretty much "regular" tables, protected by a transaction log just like the regular data. Check pg_statistic system catalog, or rather pg_stats which is a view on top of pg_statistic making the stats easier to read for humans.

A lot of things can go wrong with the stats, mostly resulting in choice of a bad plan and bad query performance. There are various reasons why that may happen (inaccurate stats, complex conditions, correlated columns, ...) - I

[continue reading]

While first normal form is pretty much restricted to relational operations, this piece is more general.  It looks at the basic promise of functional programming, and the way that this applies to PostgreSQL.  In the mean time, we will also look at a fully functional approach to a key-value store in Perl as a close analogue.

I The Basic Promise of a Function

We should start by reminding ourselves of what a function represents: the promise that if you know one or more pieces of information, you can derive or look up another piece of information.

That promise is behind all functional programming, and in areas which are close to the math (such as relational databases) it is especially important.  This basic promise is important, but it also changes the way we have to think about programs.  Rather than computer programs doing things, they merely calculate things.  Whatever is done in the real world as a result is a byproduct of that calculation.

In general, a simple way to think of functional programming is whether output is solely dependent on input, but in relational math, this actually becomes a remarkably blurry line.  With data which can be looked up or stored, this line is often drawn at the point of mutability, because otherwise it is very difficult for routines to share data structures.  This means that data and state which cannot change during the function's existence does not count against it being a function.

Now, obviously a functional view of programming is pretty limited.  While a network server might listen to requests and return a function of input, side effects in the control flow become impossible in a purely function-oriented approach.  For this reason many functional programming environments I have worked with have imperative programs which link the calculations to real-world actions and side effects.

II How State Complicates Programming and Testing

Application state is a major source of bugs in any program.  Very often unforeseen circumstances arise because of slow changes the application's

[continue reading]

Posted by Hadi Moshayedi on 2014-08-11 at 09:43:15

We are excited to announce the release of cstore_fdw 1.1, Citus Data's open source columnar store extension for PostgreSQL. The changes in this release include:

  • Automatic file management. The filename foreign table option has become optional, and cstore_fdw uses a default directory inside PostgreSQL’s data directory to manage cstore tables.
  • Automatically delete table files on DROP FOREIGN TABLE. In cstore_fdw v1.0 it was a user's responsibility to delete the files created by cstore_fdw after dropping a table. Failure to properly delete the files could result in unexpected behavior in the future. For example, if a user dropped a table and then created another table with the same filename option they could get errors when querying the new table. cstore_fdw now automatically deletes table files on DROP FOREIGN TABLE and eliminates these kinds of problems.
  • cstore_table_size. The new cstore_table_size('tablename') function can be used to get the size of a cstore table in bytes.
  • Improved documentation. “Using Skip Indexes” and “Uninstalling cstore_fdw” sections were added to the README file.
  • Bug fixes:
    • Previously querying empty tables errored out. These tables can now be queried as expected.
    • Previously cost estimation functions overestimated number of columns. The source of estimation error has been fixed.

For installation and update instructions, please see cstore_fdw’s page in GitHub.

What’s next?

We continue to improve cstore_fdw and there are several tasks in progress, among which are:

  • Improve memory usage. Users have reported out-of-memory problems when using cstore_fdw. We have conducted a detailed study of how we can improve cstore_fdw’s memory usage and are planning to improve the memory usage in coming releases.
  • Vectorized execution. Can Güler who is doing his internship this summer in Citus Data, is prototyping vectorized execution primitives in cstore_fdw. We expect this to lead to significant performance improvements for queries which involve filtering and aggregation. Making this production-ready is

[continue reading]

pg_receivexlog is an in-core utility of Postgres able to recover WAL files through a stream using the replication protocol. It is particularly useful when for example using it to transfer some WAL files to a proxy node when standby node cannot connect directly to a root node for whatever reason. The standby can then replay the WAL files obtained. The reliability of this utility has been improved in Postgres 9.5 with the following commit:

commit: 3dad73e71f08abd86564d5090a58ca71740e07e0
author: Fujii Masao <>
date: Fri, 8 Aug 2014 16:50:54 +0900
Add -F option to pg_receivexlog, for specifying fsync interval.

This allows us to specify the maximum time to issue fsync to ensure
the received WAL file is safely flushed to disk. Without this,
pg_receivexlog always flushes WAL file only when it's closed and
which can cause WAL data to be lost at the event of a crash.

Furuya Osamu, heavily modified by me.

Thanks to the addition of a new option called -F/--fsync-interval, user can now control the interval of time between which WAL records are flushed to disk with fsync calls.

The default value, 0, makes flush occur only when a WAL file is closed. This is the same flush strategy as in the previous versions of this utility (since 9.2 precisely).

On the contrary, specifying -1 will make sure that WAL data is flushed as soon as possible, in this case at the moment when WAL data is available.

Now, using this option is rather simple:

pg_receivexlog -v -D /path/to/raw_wal/ -F -1 # For maximum flush
pg_receivexlog -v -D /path/to/raw_wal/ -F 0  # For default
pg_receivexlog -v -D raw_wal/ -F 10          # For interval of 10s

The level of information printed in verbose mode has not changed as well, so you can continue to rely on that as before.

$ pg_receivexlog -D raw_wal/ -v
pg_receivexlog: starting log streaming at 0/4D000000 (timeline 1)
pg_receivexlog: finished segment at 0/4E000000 (timeline 1)
pg_receivexlog: finished segment at 0/4F000000 (timeline 1)
pg_receivexlog: finished segment at 0/50000

[continue reading]

When I am doing training here at Cybertec Schönig & Schönig GmbH people often ask, how to load data fast and efficiently. Of course, there are many ways to achieve this. One way not too many know about is VACUUM FREEZE. It makes sure that PostgreSQL can organize data nicely straight away by instantly writing […]
Posted by Fabien Coelho on 2014-08-08 at 06:31:00

In a previous post, I have outlined the time required by a PostgreSQL database to warm-up from a HDD on a read-only load for a database that fits in memory.

In this post, I want to look at write performance on SSD, focusing on the impact of PostgreSQL page size (blocksize), and on checking whether the current 8 kB default is relevant.

Other PostgreSQL SSD Tests

I got interested by reading a post by Hans-Jürgen Schönig who tested very large 1 MB block as well as standard page sizes. What struck me is that there is a significant impact of smaller page size on OLTP performance (6% better for 4 kB, 10% better for 2 kB, with 8 kB as the reference), suggesting that the 8 kB default is not necessarily the best choice, especially when using SSD. However the test is on a relatively small database (pgbench scale factor 10 gives about 150 MB), a short time (typically 1 minute), and with poor performance (less than 150 tps) given that a SSD was used.

I also found an older but excellent post by Tomas Vondra who played with the file system options as well as PostgreSQL page and WAL block size. The results show a +40% performance boost for 4 kB pages wrt 8 kB pages for read-write loads on a SSD. The 300 pgbench scaling is significant, reported runs seem to be 5 minutes long, and the +1000 tps performances are consistent with the SSD hardware. The downside is that both page and WAL block sizes are moved together, and only one shot performances seem to be reported, although my past experience with pgbench makes me wary of performance reproducibility and stability.

Test Setup

I choose pgbench simple updates with prepared transactions (-N -M prepared) to focus on write performance without contention interference, scale 100 so that the database is reasonably large but can be kept in memory, 4 clients (-c 4) because my laptop has 2 HT cores. I run a number of 1000 seconds tests so as to derive hopefully meaningful statistics. With the SSD performance the 1000 seconds tests basically overwrite a majority of rows in the account table.

[continue reading]

Posted by gabrielle roth on 2014-08-08 at 00:27:01

When: 7-9pm Thu Aug 21, 2014
Where: iovation
Who: Jim Newman, iovation
What: Elasticsearch & Postgres, a working relationship

Topics to be covered:
– A brief intro to Elasticsearch
– Indexing Postgres with Elasticsearch using the JDBC river plugin
– SQL versus the Elasticsearch Query DSL, similarities (there are some) and differences
– Using Postgres Perl functions for some SQLish access to Elasticsearch
– Feasibility of a foreign data wrapper for Elasticsearch?, pg2elasticsearch is still just a GitHub readme

Jim Newman has 30 years of assorted database-related activity and shooting pool. He’s currently employed as Principal Data Architect at iovation.

This month we get to try yet another meeting location while the iovation offices are being renovated. We’re still in the US Bancorp Tower at 111 SW 5th (5th & Oak), but we’ll be in the iovation temporary office on the 36th floor, in the “West Hills” conference room. Once off the elevator, enter through the door with the small “iovation” sign, then just go right and keep right until you hit a dead end, which is where “West Hills” is. Signs will be posted.

The building is on the Green & Yellow Max lines. Underground bike parking is available in the parking garage; outdoors all around the block in the usual spots.

See you there!

We are looking for endorsement quotes for the 9.4 release.
Specifically, we want quotes from end-users or ISVs* who have not been
featured for a previous release, and enthuse about one of the following
  • Streaming Changesets/Bi-Directional Replication
  • Performance improvements in 9.4
  • Something else about Flexibility, Scalability or Performance
We need these quotes by August 15th.  Thanks!

(* while we love our PostgreSQL consulting and support companies,
endorsements from them aren't news in a PR sense)
Posted by Robert Haas in EnterpriseDB on 2014-08-06 at 18:35:00
Database performance and hardware selection are complicated topics, and a great deal has been written on that topic over the years by many very smart people, like Greg Smith, who wrote a whole book about PostgreSQL performance.  In many cases, the answers to performance questions require deep understanding of software and hardware characteristics and careful study and planning.

But sometimes the explanation is something very simple, such as "you don't have enough memory".

Read more »
There is probably no piece of relational database theory which is so poorly understood in professional literature than first normal form.  This piece seeks to ground an understanding of 1NF in set mathematics not only so that it can be better leveraged but also so that one can be aware of when (and when not) to violate it.

I:  Defining First Normal Form

In the previous post, we talked about the definition of function and relation.  This is key to understanding first normal form.

First Normal Form requires two things, as defined by Codd:
  1. Each relation is a well-formed relation (i.e. a well-formed set of fact correlations). 
  2. No element of the relation (i.e. correlated fact) is itself a set.  Therefore sets within sets are not allowed.
By extension the second prong prevents elements of the relation from having forms which are readily reduced to sets.  For example, an unordered comma separated list of values violates 1NF.  A clearer way of putting the second prong is:

Each element of a relation must contain a single value for its domain.

A clear mathematical way of looking at First Normal Form is:

  1. Each relation is a set of tuples.
  2. No tuple contains a set.

II: Sets and Ordinality

Let's further note that one defining feature of a set is that it has no ordering.  More formally we would say that sets have no ordinality.  {1, 2, 3} = {2, 1, 3}.  For this reason, no duplicate rows may be allowed, as that means that a relation is not a well-formed set.  

Tuples (the correspondences which make up the relation) do have mandatory ordering, and allow duplication.  However the ordering is not "natural" and therefore can be specified in relational operations.  In relational algebra, the project operation allows you to generate one tuple ordering from another.

Ordinality is an extremely important concept regarding normalization and arrays in the database, as we will see with SQL arrays.

III: SQL Arrays, Ordinality and 1NF

An SQL array is actually a mathematical matrix, carrying with it all the principle propert

[continue reading]