PostgreSQL has json support – but you shouldn’t use it for the great majority of what you’re doing. This goes for hstore too, and the new jsonb type. These types are useful tools where they’re needed, but should not be your first choice when modelling your data in PostgreSQL, as it’ll make querying and manipulating it harder.

Some readers will be familiar with the EAV (entity-attribute-value) anti-pattern. EAV has been a necessary evil in database schemas for a long time. It’s something we all advise against and look down on, then resort to when reality and business requirements mean not everything can fit into rigid fields in properly modelled relations. For those who don’t know it, it’s a schema design where you turn a relational database into a poor man’s key/value store using a table with object_id (“entity”), key (“attribute”) and value columns. Querying it usually involves lots and lots of self-joins.
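To make the pain concrete, here is a minimal EAV sketch (illustrative only; it uses SQLite via Python's standard library, and the table name and sample data are invented): fetching just two attributes per entity already takes a self-join.

```python
import sqlite3

# Minimal illustration of the EAV shape described above (names invented).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE eav (object_id INTEGER, "key" TEXT, value TEXT)')
conn.executemany("INSERT INTO eav VALUES (?, ?, ?)", [
    (1, "name", "Bob"),  (1, "town", "Thornlie"),
    (2, "name", "Jane"), (2, "town", "Auckland"),
])

# One self-join per attribute fetched: the query keeps growing as more
# "columns" are needed, which is why querying EAV data gets painful.
rows = conn.execute("""
    SELECT n.value AS name, t.value AS town
      FROM eav AS n
      JOIN eav AS t ON t.object_id = n.object_id
     WHERE n."key" = 'name' AND t."key" = 'town'
     ORDER BY n.object_id
""").fetchall()
```

Every additional attribute you want back means another join against the same table — and json blobs invite exactly the same query gymnastics.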

json is the new EAV – a great tool when you need it, but not something you should use as a first choice.

(Most of this also applies to PostgreSQL arrays, which are great tools for building queries, but not usually an appropriate way to store authoritative data).

JSON blob tables

Since the release of PostgreSQL 9.3, and even more since jsonb in 9.4, I’m seeing more and more Stack Overflow questions where people ask how to do joins on json objects, how to query for a nested key inside any array element, etc. All reasonable things to do, but when asked for schema and example data they’ll often post something like this:

CREATE TABLE people(
    id serial primary key,
    data jsonb not null
);
INSERT INTO people(data) VALUES ($$
{
    "name": "Bob",
    "addresses": [
        {
            "street": "Centre",
            "streetnumber": 24,
            "town": "Thornlie",
            "state": "WesternAustralia",
            "country": "Australia"
        },
        {
            "street": "Example",
            "streetnumber": "4/311-313",
            "town": "Auckland",
            "country": "NewZealand"
        }
    ]
}
$$);

[continue reading]

Posted by John DeSoi on 2015-05-05 at 03:07:40

I came across this motorcycle leaving the movies on Friday. Very cool!

Lately I've been experimenting with building semi-schemaless apps. These are apps where much of the data may never be used for reporting aside from storytelling, and where, as time goes by, some of the data may be revisited and converted to more structured fields for easier roll-up and reporting. For the front-end UI I'm using AngularJS, which naturally spits out data as JSON and can autobind to JSON data of any complexity. My stored functions in PostgreSQL take JSON blobs as input, split them out into various tables, and also throw the whole thing into a jsonb field for later consumption (it's a bit redundant). Similarly, they return JSON back. One of the things I wanted to be able to do was take this jsonb blob and tack on additional properties from well-structured fields, or even a whole set of data like sub-recordsets, to feed back to my app as JSON. While there are lots of functions in PostgreSQL 9.3/9.4 that can easily build JSON objects from records, aggregate rows, etc., I couldn't find a function that allowed me to just add a property to an existing JSON object, so I went to my tried-and-true old pal PL/V8 for some comfort. Here is a quickie function I created in PL/V8 that did what I needed. Hopefully it will be of use to others, or others might have other ideas of doing this that I missed.
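As a rough sketch of what such a helper boils down to (plain Python here, not the PL/V8 function from the post, and the function name is made up): parse the object, set the property, serialize it back.

```python
import json

def json_object_set_key(obj_text, key, value):
    # Parse the JSON object, tack on one property, serialize it back.
    obj = json.loads(obj_text)
    obj[key] = value
    return json.dumps(obj)
```

Inside PL/V8 the same three steps run in JavaScript, with JSON.parse and JSON.stringify playing the roles of json.loads and json.dumps.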

Continue reading "Adding properties to existing JSON object with PLV8"
On 26th of April, Peter Eisentraut committed patch: Add transforms feature   This provides a mechanism for specifying conversions between SQL data types and procedural languages. As examples, there are transforms for hstore and ltree for PL/Perl and PL/Python.   reviews by Pavel Stěhule and Andres Freund After this was committed, there were subsequent patches […]

I’ve been asked this question in some shape or form at least 3 times, mostly from people puzzled why they get this error. The last iteration went something like this:

I can’t use ST_AsPNG when doing something like

    SELECT ST_AsPNG(rast)
    FROM  sometable;

Gives error: Warning: pg_query(): Query failed: ERROR: rt_raster_to_gdal: Could not load the output GDAL driver.

Continue Reading by clicking title hyperlink ..

Last week, I covered how MVCC, PGDB’s storage system, works on a very high level. Near the end, I also mentioned that it doesn’t quite lend itself well to certain use cases, such as rapidly mutating session storage. Well, there is one caveat to that statement that I’d forgotten about because of its relatively limited utility: unlogged tables.

Here’s what the PGDB documentation has to say about unlogged tables:

Data written to unlogged tables is not written to the write-ahead log, which makes them considerably faster than ordinary tables. However, they are not crash-safe: an unlogged table is automatically truncated after a crash or unclean shutdown. The contents of an unlogged table are also not replicated to standby servers.

But what does this actually mean in the back end? That comes down to how databases interact with the underlying filesystem. In the case of PGDB, all writes to the database are actually first saved to a binary file that is decoupled from the actual data files. Then a periodic event called a checkpoint will commit those changes to the target table files. This way, crashes do not damage the database, the binary logs can be used to replicate to other instances, and backups can be rolled forward in time by applying transaction logs manually.
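The mechanism can be caricatured in a few lines (a deliberately crude sketch of the write-ahead idea, nothing like PostgreSQL's actual WAL format):

```python
wal = []    # append-only log: every change lands here first
table = {}  # stand-in for the actual data files

def write(key, value):
    wal.append((key, value))  # 1. record the change in the log
    table[key] = value        # 2. only then touch the "data file"

def replay(log):
    # After a crash, replaying the log reconstructs the table.
    rebuilt = {}
    for key, value in log:
        rebuilt[key] = value
    return rebuilt
```

An unlogged table effectively skips step 1 — that is why it is faster to write, and also why there is nothing to replay after a crash.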

This is why unlogged tables have absolutely no crash protection, will not be copied to the replication stream, and get truncated if there’s any doubt regarding the contents. There’s no way to verify the data they contain, and they’ve been pruned from the very infrastructure that gives PGDB its durability. So what are they good for, then?

Let’s run a very short experiment. I commonly recommend any PostgreSQL server be set up with these three options in postgresql.conf:

wal_level = hot_standby
archive_mode = on
archive_command = 'exit 0'

Since archive mode can only be modified by restarting the server, it’s a good idea to just leave it enabled, but neutralized. With these settings PGDB still does not archive transaction logs, just as with the defaults. Ho

[continue reading]

On Unix/Linux platforms, Postgres binaries are not authorized to run as root, for security reasons, when those utilities directly manipulate files in the data directory. See for example initdb:

initdb: cannot be run as root
Please log in (using, e.g., "su") as the (unprivileged) user that will
own the server process.

Or the backend binary:

"root" execution of the PostgreSQL server is not permitted.
The server must be started under an unprivileged user ID to prevent
possible system security compromise.  See the documentation for
more information on how to properly start the server.

On Windows as well, a platform where everything is different, the error that the "postgres" binary returns is a bit different, but similar:

Execution of PostgreSQL by a user with administrative permissions is not
permitted.
The server must be started under an unprivileged user ID to prevent
possible system security compromises.  See the documentation for
more information on how to properly start the server.

Logically, the PostgreSQL server cannot be started if it has administrator privileges. But now there is an exception for frontend utilities like initdb, which can be run by a user with Administrator privileges through the use of a restricted token, ensuring that the process runs under a secured context (see CreateRestrictedToken, referring to DISABLE_MAX_PRIVILEGE).

Most of the frontend utilities of Postgres make use of a restricted token, but this was actually not the case for two of them, as reported a couple of months back. The discussion regarding the bug report resulted in the following commit:

commit: fa1e5afa8a26d467aec7c8b36a0b749b690f636c
author: Andrew Dunstan <>
date: Mon, 30 Mar 2015 17:07:52 -0400
Run pg_upgrade and pg_resetxlog with restricted token on Windows

As with initdb these programs need to run with a restricted token, and
if they don't pg_upgrade will fail when run as a user with Administrator
privileges.

Backpatch to all live branches. On the development branch the code is

[continue reading]

Posted by Josh Berkus on 2015-04-29 at 17:37:16

pgConf Silicon Valley is now accepting proposals for talks. The conference is primarily looking for talks about high-performance PostgreSQL, including scale-out, data warehousing, analytics, case studies, and more. Both regular talks and tutorials are solicited.

pgConfSV is also looking for sponsors. The conference will be on November 17th and 18th. Talk submissions are due by June 15th.

pgConf Silicon Valley is organized under the San Francisco PostgreSQL User Group, and is produced by CitusData.

A previous blog I wrote on JSON functionality in PostgreSQL became wildly popular, and it got me thinking about writing a follow-up HOWTO about JSONB.

JSONB, like JSON, is a native data type in PostgreSQL and was introduced in version 9.4. The major difference between the types is signified in the name; the extra ‘B’ stands for binary format. A more detailed comparison is given below:


JSON:

  • Stores data in text format
  • Input is fast, as no conversion is required
  • Processing functions must re-parse the data on each execution
  • Indexing is not supported
  • All white space and line feeds in the input are preserved as-is
  • Duplicate keys are retained; processing functions only consider the last value
  • Order of the keys is preserved

JSONB:

  • Stores data in decomposed binary format
  • Input is slightly slower, as there is conversion overhead involved
  • Re-parsing is not needed, making data processing significantly faster
  • Indexing is supported
  • Extra white space and line feeds are stripped
  • Duplicate keys are purged at input; only the last value is stored
  • Order of the keys is not preserved

In general, unless there are some highly specialized needs (like legacy applications designed to assume an order of keys), it is highly recommended that JSONB be used.
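Conveniently, Python's standard json module behaves like JSONB on the duplicate-key and whitespace points, so the effect of a decomposed representation is easy to demonstrate outside the database:

```python
import json

doc = '{ "key": "first",  "key": "last" }'

# Decomposing into a dict can keep only one value per key; like jsonb,
# Python's parser keeps the last value and discards the input whitespace.
parsed = json.loads(doc)

# Serializing back shows the normalized, whitespace-stripped form.
normalized = json.dumps(parsed)
```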

Basic handling provided by PostgreSQL for both JSON and JSONB is pretty much the same. Because I have already talked about JSON in a previous blog, I will focus on the difference JSONB provides below.

Ignoring extra white space

Let’s start with the following valid object:

     {
      "key1": "value1",
      "key2": [1, 2],
      "key3": "value3"
     }

JSON data type shows the following:

json_sample=# SELECT '{
                             "key1": "value1",
                             "key2": [1, 2],
                             "key3": "value3"
                      }'::json;
                  {                    +
                      "key1": "value1",+
                      "key2": [1, 2]

[continue reading]

Posted by Shaun M. Thomas on 2015-04-24 at 16:48:06

As a DBA, I strive not to live in an isolated ivory tower, away from the developers that are destined to fill our databases with volumes of data from a myriad of applications. It’s important, I think, to occasionally come back to the basics. So I’d like to discuss one of the core elements that PGDB DBAs might be intimately familiar with, but comes up often enough that some clarification is warranted. How does PostgreSQL store data? The answer may surprise you, and is critical to understand before regularly interacting with a PGDB system.

The storage engine PGDB uses is called Multi Version Concurrency Control, or MVCC for short. There is excellent documentation of the subject, along with further resources that attempt to convey a thorough understanding, and succeed in that regard. I strongly encourage anyone who works with PostgreSQL in any context, to read these until they can recite the basics from memory. It is the very bedrock of the database.

But we need an executive summary, so let’s start with an extremely simple table with an id and generic column:

ID   Stuff
1₁   foo
2₁   bar
3₁   baz

Let’s say that some time in the future, we update ID 2. This would be the table at that point:

ID    Stuff
1₁    foo
2₁    bar
3₁    baz
2₁₈   moo

Notice that there are now two copies of the row for ID 2. How is it possible for PGDB to differentiate between them? Which one is “correct”?

To answer those two questions, I’ll need a short aside to explain transaction counters. PGDB keeps a counter that it assigns to all database activities, regardless of origin. Because these numbers can’t cycle endlessly, it currently wraps after two billion values have been used. It then applies an “age” algorithm to produce an abstract representation of data antiquity. For the purposes of this discussion, the subscript in the examples represents the transaction age.
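The "age" comparison alluded to above can be sketched as modular arithmetic (modeled on the logic of PostgreSQL's TransactionIdPrecedes, simplified here to ignore the reserved special transaction IDs):

```python
def xid_precedes(a, b):
    """Wraparound-safe ordering of 32-bit transaction IDs: interpret the
    difference as a signed 32-bit value, so each ID is "older" than the
    roughly two billion IDs that follow it, even across the wrap point."""
    diff = (a - b) & 0xFFFFFFFF
    if diff >= 0x80000000:      # treat as a negative signed 32-bit value
        diff -= 0x100000000
    return diff < 0
```

With this comparison, an ID just below the wrap point still counts as older than a small ID issued right after the counter cycled.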

Every action gets a transaction ID, and every transaction ID acts like a snapshot because it provides a stable comparison point. Even if you don’t explicitly call BEGIN T

[continue reading]

On 5th of April, Simon Riggs committed patch: Reduce lock levels of some trigger DDL and add FKs   Reduce lock levels to ShareRowExclusive for the following SQL CREATE TRIGGER (but not DROP or ALTER) ALTER TABLE ENABLE TRIGGER ALTER TABLE DISABLE TRIGGER ALTER TABLE … ADD CONSTRAINT FOREIGN KEY   Original work by Simon […]
The PUG meeting was good. We now have a consistent if small group that are attending. Before the presentation we spoke about possibly moving the group to meetup to get a little better visibility. G+ Communities are awesome but Meetup seems to be where the people in the area look.

The presentation was provided by Eric Worden, who happens to be a CMD employee. The talk overall was very good and provided a lot of information that I didn't know about dates. It also led to the development of a new PostgreSQL extension (more on that at a later time).

The most interesting part of the talk to me was the use of a dimensions table to make date queries much, much faster. Either he or I will be submitting a blog post on that feature alone.

If you are interested in seeing what he has to say you can visit us at the Whatcom PgDay being hosted at LinuxFestNorthwest this weekend!

In the second part of the Automating Barman with Puppet series we configured, via Puppet, two virtual machines: a PostgreSQL server and a Barman server to back it up. However, human intervention was required to perform the SSH key exchange and most of the manifest was written to allow the servers to access each other. In this third and final part of the series, we will look at how to configure a third VM that will act as the Puppet Master and use it to simplify the configuration of PostgreSQL and Barman.

The entire code of this tutorial is on GitHub at

Configuring the Puppet Master: Vagrant

First, change the Vagrantfile to boot a third VM, called “puppet”, which will be our Puppet Master. To ensure that the machine is instantly accessible by the Puppet agents present on each VM, we add a “puppet” entry in the /etc/hosts file with the first script we run. We need also to enable the Puppet agent, as Debian-like distributions disable it by default.
Finally, within the Vagrantfile, let’s make a distinction between master and agents. The master will initially load its configuration straight from the manifest files, then the agents running on each host will apply the configuration sent from the master. Agents will also send data back to the master allowing other nodes to use it to build their configuration. For this reason, an agent is also set to run on the master.

The Vagrantfile is as follows:

Vagrant.configure("2") do |config|
  {
    :puppet => {
      :ip      => '',
      :box     => 'ubuntu/trusty64',
      :role    => 'master'
    },
    :pg => {
      :ip      => '',
      :box     => 'ubuntu/trusty64',
      :role    => 'agent'
    },
    :backup => {
      :ip      => '',
      :box     => 'ubuntu/trusty64',
      :role    => 'agent'
    }
  }.each do |name,cfg|
    config.vm.define name do |local|
      local.vm.box = cfg[:box]
      local.vm.hostname = name.to_s + '.local.lan'

[continue reading]

Similarly to the post of a couple of weeks back about the new memory allocation routine able to provide a plan-B route in case of OOM, here is a follow-up commit adding more infrastructure in the same area, this time for some widely-used memory allocation routines:

commit: 8c8a886268dfa616193dadc98e44e0715f884614
author: Fujii Masao <>
date: Fri, 3 Apr 2015 17:36:12 +0900
Add palloc_extended for frontend and backend.

This commit also adds pg_malloc_extended for frontend. These interfaces
can be used to control at a lower level memory allocation using an interface
similar to MemoryContextAllocExtended. For example, the callers can specify
MCXT_ALLOC_NO_OOM if they want to suppress the "out of memory" error while
allocating the memory and handle a NULL return value.

Michael Paquier, reviewed by me.

palloc_extended() is an equivalent of palloc() that operates on CurrentMemoryContext (understand by that the current memory context a process is using) with a set of flags, named the same way for frontend and backend:

  • MCXT_ALLOC_HUGE for allocations larger than 1GB. This flag has an effect on the backend side only, as frontend routines use malloc directly.
  • MCXT_ALLOC_ZERO for zeroed allocation.
  • MCXT_ALLOC_NO_OOM to bypass the ERROR message in case of an out-of-memory failure and return NULL to the caller instead. This is the real meat. In the case of frontends, not using this flag results in leaving with exit(1) immediately.

The advantage of this routine is that it is made available for both frontends and backends, so when sharing code between the two — as with xlogreader.c, used by pg_xlogdump and pg_rewind on the frontend side and by the Postgres backend — consistent code can be used everywhere, making maintenance far easier.

A last thing to note is the addition of pg_malloc_extended(), available only for frontends, a natural extension of what already exists with pg_malloc0() and pg_realloc().

Posted by Joshua Drake in CommandPrompt on 2015-04-20 at 18:20:25
Saturday the 18th of April, I woke up to the following:

It was one of those moments where you realize just how blessed a life you have. A moment where you stop and realize that you must have done something right, at least once. I was with all of my ladies, there were no other people at the camp site, the weather was clear and it was set to hit 68F. The only sound was the gentle lapping of water and the occasional goose.

It was at this time that I was able to finally take a step back and reflect on PgConf.US. This conference meant a lot to me professionally. I didn't organize it. I only spoke at it, Command Prompt sponsored, and I occasionally offered feedback to the conference committee (as it is run by PgUS) via PgUS board meetings. This conference has become everything (and more) I tried to make the United States PostgreSQL Conference series. It is a conference of the best, the brightest and most importantly the users of our beloved elephant. In the United States, it is "The" PostgreSQL Conference.

That isn't to say there aren't other PostgreSQL conferences in the states. There are at least two others that run, but somehow after this latest conference I feel they are destined to niche attendance. There is nothing wrong with that and frankly we need the niche conferences. They fill a hole, else people wouldn't attend. It is just that experiencing the level of effort and greatness that was created by Jonathan and Jim was truly something awesome to behold.

They aren't paid to run these conferences. They do it because they want to help our community. They do it for free and speaking as somebody who has run more PostgreSQL conferences than any other current PostgreSQL conference organizer in the U.S., it is a thankless job. It is a volunteer effort that everyone should take a moment to thank them for. Jonathan, Jimmy: Thank you for all your efforts. The community could not have had such a great conference without you.

I have only two constructive feedback points for the organizers:

  1. Do not allow tutorials base

[continue reading]

I previously blogged about NoSQL support in PostgreSQL and then later wrote a tutorial on using JSON data type. Today, I will be talking about another angle of NoSQL in PostgreSQL, handling key-value data using the HSTORE contrib module.

HSTORE was introduced back in 2006 as part of PostgreSQL 8.2, and I doubt the authors had any idea at the time how NoSQL popularity would skyrocket in the coming years. Using HSTORE, developers can emulate a schema-less, semi-structured data store while staying within the fully ACID-compliant domain of PostgreSQL, a relational database.

Without further ado, here is how some basic HSTORE functionality can be handled …

Create the extension

The very first step is to create the HSTORE extension, essentially loading the contrib module to your PostgreSQL instance.

hstore_db=# CREATE EXTENSION hstore;

Create a table with HSTORE data type

Once the extension is loaded, creating a table with a column of HSTORE data type is fairly straightforward. The syntax is exactly the same as any native data type.

hstore_db=# CREATE TABLE hstore_test ( id SERIAL, sale HSTORE );

Insert data into HSTORE column

You can use the ‘=>’ operator to assign values to keys when you insert data.

hstore_db=# INSERT INTO hstore_test (sale)
            VALUES ('"milk"=>"4",
                     "bread"=>"2",
                     "cereal"=>"1",
                     "bananas"=>"12"');

Retrieve data from an HSTORE column

HSTORE data can be retrieved like any other native PostgreSQL data type.

hstore_db=# SELECT sale FROM hstore_test;
                           sale
----------------------------------------------------------
 "milk"=>"4", "bread"=>"2", "cereal"=>"1", "bananas"=>"12"
(1 row)
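That text format is simple enough to pick apart client-side. Here is a toy parser (plain Python, invented helper name; it assumes no embedded quotes, escapes, or NULL values):

```python
import re

def parse_hstore(text):
    # Split '"k"=>"v", ...' pairs into a dict. Real hstore values can
    # contain escapes and NULLs; this sketch deliberately ignores those.
    return dict(re.findall(r'"([^"]+)"\s*=>\s*"([^"]*)"', text))
```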

Retrieve value for a particular key

HSTORE provides the ‘->’ operator to retrieve value of a particular key from the table.

hstore_db=# SELECT sale -> 'bread' AS quantity FROM hstore_test;
 quantity
----------
 2
(1 row)

Use value of a key in the WHERE cl

[continue reading]

Posted by Andrew Dunstan in pgExperts on 2015-04-18 at 15:22:00
Unfortunately there was a small bug in yesterday's buildfarm client release. The bug only affects MSVC builds, which would fail silently on the HEAD (master) branch.

There is a bug fix release available at or you can just pick up the fixed version of (the only thing changed) at

Sorry for the inconvenience.
We all know that the \copy command does not return anything when you load data. The idea here is to capture how many records got loaded into the table through \copy.
Here's a shell script that should work:
echo number of rows in input: $(wc -l < input_file)
( echo "\copy test from stdin delimiter '|';" ; cat input_file ) | psql -v ON_ERROR_STOP=1
echo psql exit code $?

If the exit code printed is 0, everything went well, and the value printed by the first echo indicates how many rows were inserted. If the printed exit code is non-zero, no rows were inserted, of course. If the exit code printed is 3, the data being copied had some error.

From the docs: If the exit code printed is 1 or 2 then something went wrong in psql (like it ran out of memory) or the server connection was broken, respectively. Following facts play a role in the above script:

.) COPY (and hence \copy) expects the input records to be terminated by a newline. So counting the number of newlines in the input is a reliable way of counting the records inserted.
.) psql will exit with code 3 iff there's an error in the script and ON_ERROR_STOP is set. 
Note: This seems to not apply to the `psql -c "sql command"` construct.

# Example clean input

$ psql -c "create table test(a text,b int);"
$ cat

$ echo number of rows in input: $(wc -l < input_file); ( echo "\copy test from stdin delimiter '|';" ; cat input_file ) | psql -v ON_ERROR_STOP=1 ; echo psql exit code $?
number of rows in input: 6
psql exit code 0

# Example malformed input
$ cat

$ echo number of rows in input: $(wc -l < input_file); ( echo "\copy test from stdin delimiter '|';" ; cat input_file ) | psql -v ON_ERROR_STOP=1 ; echo psql exit code $?
number of rows in input: 6
ERROR: invalid input syntax for integer: "2c"
CONTEXT: COPY test, line 3, column b: "2c"
psql exit code 3
I hope this helps someone.
(Baji is trying to impress 'X')
Baji: Packt Publishing has published a book on troubleshooting PostgreSQL database.
 _X_: Uh, so what(!?). It published 4 other PostgreSQL books this year!
Baji: yeah, I know !
 _X_: then why do you care about thisssss.
Baji: I should care about it as I was part of technical reviewing team.. :(
 _X_: Oh really! That's fantastic.. Congratulations!

Note: Finally, Baji impressed _X_ :-)

Ok, in reality, I am glad to announce that "My first book as a Technical Reviewer has been published by Packt Publishing" ;-)

Author of this book is Hans-Jürgen Schönig, he has couple of other PostgreSQL Books as well.

This book provides a series of valuable troubleshooting solutions to database administrators responsible for maintaining a PostgreSQL database. It is aimed at PostgreSQL administrators who have developed an application with PostgreSQL and need solutions to common administration problems they encounter when managing a database instance. So give it a try ;-)

I would like to thank my loving parents for everything they did for me. Personal time always belongs to family, and I did this in my personal time.

I want to thank the Packt Publishing for giving me this opportunity and thanks to Sanchita Mandal and Paushali Desai for choosing me and working with me for this project.

Last but not least, I would like to thank Dinesh Kumar, who taught me PostgreSQL and inspired me to do this. :)
Posted by Amit Kapila in EnterpriseDB on 2015-04-18 at 03:38:00

I have run some benchmark tests to see the write performance/scalability in
PostgreSQL 9.5 and thought it would be good to share the results with others,
so I am writing this blog post.

I ran pgbench tests (TPC-B (sort of) load) to compare the performance
difference between different modes and scale factors in HEAD (e5f455f5) on
IBM POWER-8 having 24 cores, 192 hardware threads, and 492GB RAM,
and here are the performance results.

Some of the default settings used in all the tests are:
checkpoint_timeout = 35min
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.9

I kept autovacuum off to reduce fluctuation, and dropped and re-created the
database after each run.  I kept high values of min_wal_size and max_wal_size
to reduce the effect of checkpoints; probably somewhat lower values could have
served the purpose for this workload, but I haven't tried it.

The data is mainly taken for two kinds of modes (synchronous_commit = on | off) and
at two different scale factors, to cover the cases when all the data fits in shared
buffers (scale_factor = 300) and when it can't fit in shared buffers but can fit in
RAM (scale_factor = 3000).

First let's talk about the synchronous_commit = off case. Here, when all the data fits
in shared_buffers (scale_factor = 300), we see scalability up to 64 clients, with TPS
approximately 75 percent higher at 64 clients than at 8 clients, which doesn't look bad.
When all the data doesn't fit in shared buffers but fits in RAM (scale_factor = 3000),
we see scalability up to 32 clients, with TPS 64 percent higher than at 8 clients,
falling off from there on.

One major difference for writes when the data doesn't fit in shared_buffers is
that backends performing transactions need to write the dirty buffers themselves
when they can't find a clean buffer to read a page into; this can hamper
the TPS.

Now let's talk about synch

[continue reading]

PGDB has had anonymous blocks since the release of 9.0 in late 2010. But it must either be one of those features that got lost in the shuffle, or is otherwise considered too advanced, because I rarely see it used in the wild. If that’s the case, it’s a great shame considering the raw power it conveys. Without committing to a function, we can essentially execute any code in the database, with or without SQL input.

Why is that good? One potential element of overhead when communicating with a database is network transfer. If processing millions of rows, forcing PGDB to allocate and push those results over the network will be much slower than manipulating them locally within the database itself. However, the looming specter of ad-hoc scripts is always a threat as well.

It was the latter scenario that prompted this particular discussion. A few weeks ago, I addressed date-based constraints and how they’re easy to get wrong. Knowing this, there’s a good chance we have objects in our database that need revision in order to operate properly. In one particular instance, I needed to correct over 800 existing check constraints that an automated system had built over the last year.

I hope you can imagine that’s not something I would want to do by hand. So it was a great opportunity to invoke an anonymous block, because there’s very little chance I’d need to do this regularly enough to justify a fully-fledged function. In the end, I came up with something like this:

DO $$
DECLARE
  chk TEXT;
  col TEXT;
  edate DATE;
  sdate DATE;
  tab TEXT;
  ym TEXT;
BEGIN
  FOR tab, chk, col IN 
      SELECT i.inhrelid::REGCLASS::TEXT AS tab,
             co.conname AS cname,
             substring(co.consrc FROM '\w+') AS col
        FROM pg_inherits i
        JOIN pg_constraint co ON (co.conrelid = i.inhrelid)
       WHERE co.contype = 'c'
  LOOP
    ym := substring(tab FROM '......$');
    sdate := to_date(ym, 'YYYYMM01');
    edate := sdate + INTERVAL '1 mon';


[continue reading]

Posted by Andrew Dunstan in pgExperts on 2015-04-17 at 14:34:00
I have just released version 4.15 of the PostgreSQL Buildfarm Client. Here's what's changed:
  • support the new location for pg_upgrade
  • support running tests of client programs
  • support building, installing and running testmodules
  • use a default ccache directory
  • improve logging when running pg_upgrade tests
  • handle odd location of Python3 regression tests
  • add timestamp to default log_line_prefix
  • make qw() errors in the config file fatal (helps detect errors)
  • minor bug fixes for web script settings.
  • allow for using linked git directories in non-master branches
The last item might need a little explanation.  Essentially this can reduce quite dramatically the amount of space required if you are building on more than one branch. Instead of keeping, say, 6 checked out repos for the current six tested branches, we keep one and link all the others to it. This works almost exactly the way git-new-workdir does (I stole the logic from there). This doesn't work in a couple of situations: if you are using Windows or if you are using git-reference. In these cases the new setting is simply ignored.

To enable this new setting in an existing installation, do the following after installing the new release:
  • in your config file, add this setting:
    git_use_workdirs => 1,
  • remove the pgsql directory in each branch directory other than HEAD
Another good thing to do in existing installations would be to add "%m" to the beginning of the log_line_prefix setting in extra_config stanza.

Posted by Peter Eisentraut on 2015-04-17 at 00:00:00

About two months ago, this happened:

And a few hours later:

It took a few more hours and days after this to refine some details, but I have now tagged the first release of this extension. Give it a try and let me know what you think. Bug reports and feature requests are welcome.

(I chose to name the data type uri instead of url, as originally suggested, because that is more correct and matches what the parsing library calls it. One could create a domain if one prefers the other name or if one wants to restrict the values to certain kinds of URIs or URLs.)

(If you are interested in storing email addresses, here is an idea.)

Posted by Andreas 'ads' Scherbaum on 2015-04-16 at 23:30:13

As Josh posted before, the 2nd South Bay PostgreSQL Meetup will take place at Adobe in San Jose, on April 28th.

Happy to announce that CK Tan from VitesseData will speak about how Vitesse DB speeds up analytics queries in PostgreSQL. Heikki Linnakangas will speak about pg_rewind.

Please register for the event; only registered attendees will be allowed into the building by security.

Also, if you want Wifi access, please send an email to no later than the 24th; please include "SBPUG" in the subject.

See you there!

Posted by Josh Berkus in pgExperts on 2015-04-16 at 19:07:00
So ... you're using some of 9.4's new advanced aggregates, including FILTER and WITHIN GROUP.  You want to take some statistical samples of your data, including median, mode, and a count of validated rows.  However, your incoming data is floats and you want to store the samples as INTs, since the source data is actually whole numbers.  Also, COUNT(*) returns BIGINT by default, and you want to round it to INT as well.  So you do this:

    SELECT
        device_id,
        count(*)::INT as present,
        count(*)::INT FILTER (WHERE valid) as valid_count,
        mode()::INT WITHIN GROUP (order by val) as mode,
        percentile_disc(0.5)::INT WITHIN GROUP (order by val)
          as median
    FROM dataflow_0913
    GROUP BY device_id
    ORDER BY device_id;

And you get this unhelpful error message:

    ERROR:  syntax error at or near "FILTER"
    LINE 4:         count(*)::INT FILTER (WHERE valid)
            as valid_count,

And your first thought is that you're not using 9.4, or you got the filter clause wrong.  But that's not the problem.  The problem is that "aggregate() FILTER (where clause)" is a syntactical unit, and cannot be broken up by other expressions.  Hence the syntax error.  The correct expression is this one, with parens around the whole expression and then a cast to INT:

    SELECT
        device_id,
        count(*)::INT as present,
        (count(*) FILTER (WHERE valid))::INT as valid_count,
        (mode() WITHIN GROUP (order by val))::INT as mode,
        (percentile_disc(0.5) WITHIN GROUP (order by val))::INT
           as median
    FROM dataflow_0913
    GROUP BY device_id
    ORDER BY device_id;

If you don't understand this, and you use calculated expressions, you can get a worse result: one which does not produce an error but is nevertheless wrong.  For example, imagine that we were, for some dumb reason, calculating our own average over validated rows.  We might do this:

    SELECT
        device_id,
        sum(val)/count(*) FILTER (WHERE valid) as avg
    FROM dataflow_0913

[continue reading]
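The excerpt cuts off above, but the trap it sets up is clear: FILTER binds only to the adjacent count(*), so that expression divides the sum of val over all rows by the count of only the valid ones. A corrected version (a sketch, not quoted from the original post) applies the filter to both aggregates:

```sql
SELECT
    device_id,
    (sum(val) FILTER (WHERE valid))
      / (count(*) FILTER (WHERE valid)) as avg
FROM dataflow_0913
GROUP BY device_id
ORDER BY device_id;
```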

Posted by Andreas 'ads' Scherbaum

The German-speaking PostgreSQL Conference 2015 is the sequel of the highly successful conferences in 2011 and 2013. Due to space limitations at the old location, we are moving to Hamburg. The conference takes place on Friday, November 27th. We are also adding a training day on the 26th.

Registration for the conference will be possible well in advance. Tickets must be purchased online. For sponsors, we have put together a package that includes, among other things, a number of discounted tickets. More in the Call for Sponsors in a separate announcement.

Yesterday the pg_upgrade program was moved from contrib to bin in the source tree. Unfortunately this broke most of those buildfarm members which check pg_upgrade. There is a hot fix for the TestUpgrade buildfarm module that can be downloaded from github. I will work on cutting a new buildfarm release in the next few days, but this file can just be dropped in place on existing installations.

log_autovacuum_min_duration is a system-wide parameter setting the runtime threshold above which autovacuum activity is logged in the system logs. Anyone who has investigated a system where a given set of tables is bloated has surely been annoyed by the fact that even a high value of log_autovacuum_min_duration offers no guarantee against log spam: not-much-bloated tables whose autovacuum runs exceed the threshold still get their activity logged (and it was after working on such a system that the author of this feature wrote a patch for it). Postgres 9.5 comes with a new feature allowing this logging threshold to be controlled at the relation level, introduced by this commit:

commit: 4ff695b17d32a9c330952192dbc789d31a5e2f5e
author: Alvaro Herrera <>
date: Fri, 3 Apr 2015 11:55:50 -0300
Add log_min_autovacuum_duration per-table option

This is useful to control autovacuum log volume, for situations where
monitoring only a set of tables is necessary.

Author: Michael Paquier
Reviewed by: A team led by Naoya Anzai (also including Akira Kurosawa,
Taiki Kondo, Huong Dangminh), Fujii Masao.

This parameter can be set via CREATE TABLE or ALTER TABLE, with the default value being the one defined by the equivalent server-level parameter, for example:

=# CREATE TABLE vac_table (a int) WITH (log_autovacuum_min_duration = 100);
=# ALTER TABLE vac_table SET (log_autovacuum_min_duration = 200);

Note that, like the other relation-level options, this parameter cannot be given a unit suffix; its value is always interpreted as milliseconds. So after the CREATE TABLE above, the autovacuum activity of relation vac_table is logged if a run takes more than 100ms, and more than 200ms after the ALTER TABLE.
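To check what is currently set on a relation, the storage parameters can be read back from the pg_class catalog (a standard catalog query, not specific to this feature):

```sql
SELECT relname, reloptions
  FROM pg_class
 WHERE relname = 'vac_table';
-- reloptions is a text[] holding entries such as
-- {log_autovacuum_min_duration=200}
```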

Thinking more widely, there are basically two cases where this parameter is useful, an inclusive one and an exclusive one:

  • when system-wide log_autovacuum_min_duration is -1, meaning that all the autovacuum activity is ignored for all th

[continue reading]

Posted by Rajeev Rastogi on 2015-04-14 at 05:47:00
I recently got back from the Indian PGDay conference 2015. It was interesting and motivating, with a lot of knowledge sharing, both in attending and in speaking at the conference.

I spoke about the various kinds of "Optimizer Hint" provided by many database engines, and also a new idea for a hint that can be provided to the optimizer. Some of the speakers shared their work with PostgreSQL as users.
Also, it was interesting to learn that many companies are evaluating a migration, or are in the process of migrating, from other databases to PostgreSQL. This is really encouraging for all PostgreSQL experts.

Some of the details from the presentation are below (for the complete presentation please visit Full Optimizer Hint).

Statistics Hint:

A statistics hint provides the optimizer with statistics about the query that it cannot gather on its own, letting it produce an even better plan than it would otherwise.
Most databases store statistics for individual columns or relations, but do not store statistics for joins of columns or relations. Instead, they simply multiply the statistics of the individual columns or relations to estimate the join, which is not always correct.

Let's say there is a query with predicates on both GRADE and SALARY, and we calculate independent statistics for each:
suppose sel(GRADE) = .2 and sel(SALARY) = .2;

then sel(GRADE and SALARY) =
sel(GRADE) * sel(SALARY) = .04.
In practice these two columns will often be highly dependent, i.e. if the first predicate is satisfied, the second will be satisfied too. In that case sel(GRADE and SALARY) should be .2, not .04; the current optimizer gets this wrong and may produce a bad plan.
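The arithmetic behind this example can be sketched directly (hypothetical numbers matching the ones above):

```python
# Selectivity estimation under the independence assumption versus
# the correlated reality described above.
sel_grade = 0.2
sel_salary = 0.2

# What the optimizer computes today: multiply independent selectivities.
independent_estimate = sel_grade * sel_salary   # 0.04

# If SALARY is fully determined by GRADE, the true combined selectivity
# is just the selectivity of a single predicate.
correlated_actual = 0.2

# The row-count estimate is off by a factor of 5, which can be enough
# to push the planner toward the wrong join strategy.
underestimate_factor = correlated_actual / independent_estimate

print(independent_estimate, correlated_actual, underestimate_factor)
```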

Data Hint:

This kind of hint provides information about relationships or dependencies among relations or columns to influence the plan, rather than directly dictating a desired plan or a selectivity value. The optimizer can consider the dependency information to derive the actual selectivi

[continue reading]

Posted by Josh Berkus on 2015-04-13 at 18:24:03

The South Bay PUG is having a meetup on April 28th. Speakers will include CK Tan of PostgreSQL enhancement company Vitesse Data, and Heikki Linnakangas, PostgreSQL Committer. We do not expect to have live video for this meetup, sorry!

RSVP on the Meetup Page.