Posted by Dan Langille on 2014-04-15 at 17:12:43
I remember a time when I’d never been to a conference related to my passions. Once I went, things changed. I realized that making strong working relationships with others who share my passion is important. Not only does this solidify the community of which you are a member, it also helps you personally. Every conference [...]
Posted by Hans-Juergen Schoenig in Cybertec on 2014-04-15 at 07:07:19
When cleaning up some old paperwork this weekend I stumbled over a very old tutorial. In fact, I received this little handout during a UNIX course I attended voluntarily during my first year at university. It seems that those two days really changed my life – the price tag: 100 Austrian Schillings which […]
Posted by gabrielle roth on 2014-04-15 at 00:44:00

When: 7-9pm Thu April 17, 2014
Where: Iovation
Who: Brian Panulla
What: Intro to Graph Databases

Brian Panulla is a Business Intelligence Developer at Acureo. He’s been fulfilling PDXPUG’s rhombic triacontahedron needs for over a year. Next week, he’ll be giving us an Intro to Graph Databases!

Our meeting will be held at Iovation, on the 32nd floor of the US Bancorp Tower at 111 SW 5th (5th & Oak). It’s right on the Green & Yellow Max lines. Underground bike parking is available in the parking garage; outdoors all around the block in the usual spots. No bikes in the office, sorry!

Building security will close access to the floor at 7:30.

See you there!

On 13th of April, Tom Lane committed patch: Provide moving-aggregate support for a bunch of numerical aggregates.   First installment of the promised moving-aggregate support in built-in aggregates: count(), sum(), avg(), stddev() and variance() for assorted datatypes, though not for float4/float8.   In passing, remove a 2001-vintage kluge in interval_accum(): interval array elements have been […]
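
As a rough illustration of what "moving-aggregate support" buys (my example, with a hypothetical table, not from the commit): when an aggregate runs over a sliding window frame, PostgreSQL can now remove departing rows from the running state instead of recomputing each frame from scratch, so a query like this gets cheaper:

SELECT ts,
       avg(val) OVER (ORDER BY ts ROWS BETWEEN 9 PRECEDING AND CURRENT ROW) AS moving_avg
FROM measurements;
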
Posted by Michael Paquier in VMware on 2014-04-12 at 05:07:04

Unless you have been cut off from the Internet for the last week, you have already heard of Heartbleed. This strong candidate for the "Bug of the Year 2014" prize is already costing an untold amount of money in maintenance and development for many companies around the world.

It has already been mentioned in which cases a PostgreSQL server is vulnerable, but you might want to run some tests to check that the openssl update on your server is effective. After the bug went public on 2014/04/07, many scripts popped up around the net to help you check whether a server is vulnerable. However, be aware that you may not be able to run them directly against a Postgres server, as Postgres uses a custom protocol before handing the connection over to openssl. A connection needs to first send a message called SSLRequest, described here, consisting of two 32-bit integers, 8 and 80877103. The server then answers with a single byte: 'S' if SSL connections are supported, or 'N' if not. Once 'S' is received, the SSL startup handshake message can be sent for further processing.

Taking that into account, an example of a script usable to test the Heartbleed vulnerability on a Postgres server can be found here, extended by my colleague Heikki Linnakangas and forked from here. In particular, note this portion of the code handling the PostgreSQL custom protocol:

sslrequest = h2bin('''
00 00 00 08 04 D2 16 2F
''')

print 'Sending PostgreSQL SSLRequest...'
s.send(sslrequest)
print 'Waiting for server response...'

sslresponse = recvall(s, 1)
if sslresponse == None:
  print 'Server closed connection without responding to SSLRequest.'
  return
# Server responds 'S' if it accepts SSL, or 'N' if SSL is not supported.
pay = struct.unpack('>B', sslresponse)
if pay[0] == 0x4E: # 'N'
  print 'PostgreSQL server does not accept SSL connections.'
  return
if pay[0] != 0x53: # 'S'
  print 'Unexpected response to SSLRequest: %d.' % pay
  return

# Continue with SSL startup handshake...

[continue reading]

The Problem

Sometimes you need to generate sample data, like random data for tests. Sometimes you need to generate it with the huge amount of code you have in your ORM mappings, just because an architect decided that all the logic must live in the ORM and the database should be just a dummy data container. The real reason is not important - the problem is: let's generate lots of rows, millions of them, for a sample table from ORM mappings.

Sometimes the data is read from a file, but because of the business logic kept in the ORM, you need to load the data from the file into the ORM and then save the millions of ORM objects to the database.

This can be done in many different ways, but here I will concentrate on making that as fast as possible.

I will use PostgreSQL and SQLAlchemy (with psycopg2) as the ORM stack, so all the code will be implemented in Python. I will create a couple of functions, each implementing a different solution for saving the data to the database, and I will test them using 10k and 100k generated ORM objects.

Sample Table

The table I used is quite simple, just a simplified blog post:

CREATE TABLE posts (
  id SERIAL PRIMARY KEY,
  title TEXT NOT NULL,
  body TEXT NOT NULL,
  payload TEXT NOT NULL
);

SQLAlchemy Mapping

I'm using SQLAlchemy as the ORM, so I need a mapping. I will use this simple one:

class BlogPost(Base):
    __tablename__ = "posts"

    id = Column(Integer, primary_key=True)
    title = Column(Text)
    body = Column(Text)
    payload = Column(Text)

The payload field is just to make the object bigger, to simulate real life where objects can be much more complicated, and thus slower to save to the database.

Generating Random Object

The main idea of this test is to have a randomly generated object. However, what I really check is the database speed; all the randomness happens on the client side, so having a randomly generated object doesn't really matter at this moment. The overhead of a fully random function is the same regardless of the method of saving the data to the database. So instead of randomly ge

[continue reading]
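
Since the excerpt ends before the solutions, here is a minimal sketch of the kind of fast path the post is heading toward (my own code, not the author's): bypassing per-object ORM saves and streaming rows with psycopg2's COPY support. The table and columns follow the mapping above; the DSN is a placeholder.

import io

import psycopg2

def bulk_load_copy(conn, rows):
    # rows: iterable of (title, body, payload) tuples.
    # COPY streams everything in one round trip instead of one INSERT per object.
    buf = io.StringIO()
    for title, body, payload in rows:
        # Naive escaping; real data containing tabs/newlines needs proper quoting.
        buf.write("%s\t%s\t%s\n" % (title, body, payload))
    buf.seek(0)
    with conn.cursor() as cur:
        cur.copy_from(buf, "posts", columns=("title", "body", "payload"))
    conn.commit()

conn = psycopg2.connect("dbname=test")  # placeholder DSN
bulk_load_copy(conn, [("title %d" % i, "body", "x" * 100) for i in range(100000)])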

Posted by Francisco Figueiredo Jr on 2014-04-09 at 21:13:00
This version restores the @@ operator support removed in the previous version. Thanks to Glen Parker, who provided a fix.

Downloads can be found at the usual locations:

github download page
the nuget repository.

Pgfoundry will be updated soon.
Posted by Oleg Bartunov on 2014-04-08 at 22:30:25
Just to let people know about our experiments.

Consider this top-level query, which is well supported by GIN in 9.4.

postgres=# select count(*) from jb where jb @> '{"tags":[{"term":"NYC"}]}'::jsonb;
(1 row)

What if I want to find just {"term":"NYC"}, or even "NYC"? Now, with some improvements, jsonb_hash_ops supports such queries with a less than 5% bigger index size and without sacrificing performance!

postgres=# select count(*) from jb where jb @>> '{"term":"NYC"}'::jsonb;
(1 row)
postgres=# select count(*) from jb where jb @>> '"NYC"'::jsonb;
(1 row)

I can see the use-case of wildcard queries for shop aggregators, which combine different hierarchies, so it's difficult to say whether some specific key is on the same level in different sources.

Posted by Szymon Guz in EndPoint on 2014-04-08 at 09:06:00

I found an interesting problem. There was a table with some data, among which were a date and an integer value. The problem was to get a cumulative sum for all the dates, including dates for which we don't have entries. For such dates we should use the last calculated sum.

Example Data

I will use an example table:

# CREATE TABLE test (d DATE, v INTEGER);

with sample data:

# INSERT INTO test(d,v)
  VALUES('2014-02-01', 10),
        ('2014-02-02', 30),
        ('2014-02-05', 10),
        ('2014-02-10', 3);

Then the data in the table looks like:

# SELECT * FROM test;
     d      |  v
------------+----
 2014-02-01 | 10
 2014-02-02 | 30
 2014-02-05 | 10
 2014-02-10 |  3
(4 rows)

What I want is a cumulative sum for each day. A cumulative sum is the sum of all the earlier numbers, so for the above data I want to get:

     d      |  v
------------+----
 2014-02-01 | 10
 2014-02-02 | 40
 2014-02-05 | 50
 2014-02-10 | 53
(4 rows)

The simple query for getting the data set shown above is:

# SELECT d, sum(v) OVER (ORDER BY d) AS v
  FROM test
  ORDER BY d;

Filling The Gaps

The query calculates the cumulative sum for each row. Unfortunately, this way there are gaps between the dates, and the request was to fill those in using the values from previous days.

What I want to get is:

     d      |  v
------------+----
 2014-02-01 | 10
 2014-02-02 | 40
 2014-02-03 | 40
 2014-02-04 | 40
 2014-02-05 | 50
 2014-02-06 | 50
 2014-02-07 | 50
 2014-02-08 | 50
 2014-02-09 | 50
 2014-02-10 | 53

My first idea was to use the generate_series() function, which can generate a series of data. What I need is a series of all dates between min and max dates. This can be done using:

# SELECT generate_series(
    '2014-02-01'::date,
    '2014-02-10'::date,
    '1 day')::date;

The generate_series() function arguments are (begin, end, interval). The function returns all timestamps from beginning to end with gi

[continue reading]
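
The finished query is behind the link, but a minimal sketch that produces the gap-filled output above (my own, not necessarily the author's final version) joins the generated date series against the table and takes the window sum over it:

# SELECT days.d::date AS d,
         sum(coalesce(t.v, 0)) OVER (ORDER BY days.d) AS v
  FROM generate_series('2014-02-01'::date,
                       '2014-02-10'::date,
                       '1 day') AS days(d)
  LEFT JOIN test t ON t.d = days.d;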

Is your PostgreSQL installation vulnerable to the Heartbleed bug in OpenSSL? The TL;DR version is "maybe, it depends, you should read this whole thing to find out". If you are vulnerable, it is a high risk vulnerability!

The slightly longer version is that it will be vulnerable if you are using SSL, and not vulnerable if you are not. But the situation is not quite that simple, as you may be using SSL without even planning to. PostgreSQL also does not provide any extra protection against the bug - if you are using SSL, you are vulnerable just as with any other service.

As the bug is in OpenSSL, however, what you need to get patched is your OpenSSL installation and not PostgreSQL itself. And of course, remember to restart your services (this includes both PostgreSQL and any other services using SSL on your system). You will then have to consider in your scenario if you have to replace your SSL keys or not - the same rules apply as to any other service.

It depends on if SSL is enabled

PostgreSQL by default ships with SSL turned off on most platforms. The most notable exception is Debian and derivatives (such as Ubuntu), which enable SSL by default.

If SSL is disabled globally, your installation is not vulnerable.

The easiest way to check this is to just use a simple SQL query:

postgres=# show ssl;
(1 row)

If this parameter returns off, you are not vulnerable. If it returns on, you are.

If you do not need SSL, the easiest fix is to turn this off and restart PostgreSQL. This also brings the additional benefit of not paying the overhead of encryption if you don't need it. If you actually use SSL, this is of course not an option.
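
For reference, that is a one-line change in postgresql.conf (the file location depends on your installation), followed by a restart:

# postgresql.conf
ssl = off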

It depends on your installation

If you have installed PostgreSQL using a packaging system such as yum (on RedHat/Fedora), apt (on Debian/Ubuntu and derivatives) or FreeBSD ports, it is up to your operating system to provide a patch. Most major distributions have already done this - you just need to install it (and rest

[continue reading]

As pointed out by Peter Eisentraut in a blog post named
Schema Search Paths Considered Pain in the Butt, you need to make sure the search_path is explicitly set for all SECURITY DEFINER functions in PostgreSQL.

Fixing this manually for, in my case, 2106 functions is indeed a "pain in the butt", so I crafted a little query to automate the job:

\pset format unaligned
\t
\o /tmp/fix_search_path_for_security_definer_functions.sql
select regexp_replace(
           pg_catalog.pg_get_functiondef(oid),
           -- inject SET search_path in-between LANGUAGE and SECURITY DEFINER in the declaration
           E'(LANGUAGE [a-z]+)\\s+(SECURITY DEFINER)',
           E'\\1\n SET search_path TO public, pg_temp\n \\2'
       ) || ';'
from pg_proc
where prosecdef is true -- SECURITY DEFINER functions
-- don't include functions for which we have already specified a search_path
and not (coalesce(array_to_string(proconfig,''),'') like '%search_path%')
-- public schema
and pronamespace = 2200;
\o
\i /tmp/fix_search_path_for_security_definer_functions.sql
-- If all goes well you should see a lot of CREATE FUNCTION being spammed on the screen
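
For illustration, a function declaration rewritten by the generated script should come out looking roughly like this (function name and body invented for the example):

CREATE OR REPLACE FUNCTION lookup_secret(secret_id int)
 RETURNS text
 LANGUAGE sql
 SET search_path TO public, pg_temp
 SECURITY DEFINER
AS $$ SELECT secret FROM vault WHERE id = secret_id $$;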

Posted by Bruce Momjian in EnterpriseDB on 2014-04-07 at 02:30:01

I just returned from attending PGConf NYC. They had 259 participants, more than double last year's total. The conference was in a hotel near Wall Street.

While I have been to many Postgres user conferences, this felt like my first corporate Postgres conference. Presenters from multinational banks Goldman Sachs and Morgan Stanley explained how and why they use Postgres in their organizations. These are trend-setting organizations, and their public embrace of Postgres will have lasting benefits.

In fact, one hallway discussion was about how to enable large organizations like these, particularly those with significant legal and regulatory requirements, to work with the Postgres community. Some companies have employees post from non-company email accounts like Gmail, while others contract with consulting companies to work with the Postgres community on their behalf. Unfortunately, neither of these approaches has the companies working with the community openly.

Continue Reading »

PostgreSQL 9.4 is shipping with a new feature called jsonb, a new data type able to store JSON data with support for GIN indexing (!). In short, this feature, one of the most important of the upcoming release, if not the most important, puts Postgres in a strong position in the field of document-oriented database systems.

Since 9.2, an integrated JSON datatype has existed, complete with a set of functions (data generation and parsing functions), as well as operators added in 9.3. When using the "json" data type, data is stored as an exact copy of the input text, which functions working on it need to reparse, causing some processing overhead.

The new jsonb data type stores data in a decomposed binary format; inserting it is less performant than json because of the overhead necessary to put it in shape, but it is faster to process as it does not need reparsing, and it has the advantage of supporting GIN indexing. For this last reason it is actually recommended to use jsonb for your applications instead of json (you might need only json depending on your needs though). Note as well that jsonb has the same operators and functions as json; you can refer to my previous posts on the matter to get some insight on them, or directly to the documentation of Postgres.

Now let's see how jsonb works, comparing it with json using as a data sample a dump of geobase, worth 8.6 million tuples and 1.1GB, with many fields like the city name, country code (you can refer to a complete list of the fields here). After storing the data into a new table with a raw COPY, let's transform it into json/jsonb in a set of tables with a fillfactor of 100 to see how much space they use:

=# COPY geodata FROM '$HOME/Downloads/allCountries.txt';
COPY 8647839
=# CREATE TABLE geodata_jsonb (data jsonb) with (fillfactor=100);
=# CREATE TABLE geodata_json (data json) with (fillfactor=100);
=# \timing
Timing is on.
=# INSERT INTO geodata_json SELECT row_to_json(geodata) FROM geodata;
INSERT 0 8647839
Time: 287158.457 ms

[continue reading]
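
The GIN indexing mentioned above is then a one-liner on the loaded table (a sketch; choosing between the default opclass and jsonb_hash_ops depends on your query mix):

=# CREATE INDEX geodata_jsonb_idx ON geodata_jsonb USING gin (data);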

Yes, this is truly possible and handled smartly by PostgreSQL. To demo it, I first follow the standard Point in Time Recovery technique in PostgreSQL. It has been demoed extremely well in various books/articles/blogs by extraordinary authors, hence I am not going into the details of how to do it, but heading directly to the subject, i.e., how to pause while recovering with the same technique. Roughly, I would put forth a mathematical expression for PITR as "PITR = (Last Filesystem Backup (LFB) + WAL archives generated after LFB + un-archived WALs in the current $PGDATA/pg_xlog)". For better understanding, I have put this into a graph, since that makes the idea clearer. (Sorry, this blog is a bit long; unknowingly it happened while going into the details of the concept.)

The PITR steps, which I am going to follow with slight changes that I will talk about soon:

Step 1. Restore the most recent file system-level backup (FSB) to any location where the recovery is planned to be performed.
Step 2. If the FSB is a tar archive, untar it, and clean the pg_xlog directory leaving archive_status. If the backup has excluded this directory, then create an empty pg_xlog directory in the FSB.
Step 3. Copy un-archived WALs from the crashed cluster's $PGDATA/pg_xlog into $FSB/pg_xlog (Step 2).
Step 4. Delete the from FSB directory.
Step 5. Create a recovery.conf file in the FSB directory (a minimal sketch follows this list).
Step 6. Start the cluster (FSB).
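
A minimal recovery.conf sketch for step 5 (paths and target time are placeholders, not from the post; pause_at_recovery_target is the pre-9.5 setting, and connecting while paused also requires hot_standby = on in postgresql.conf):

restore_command = 'cp /path/to/wal_archive/%f %p'
recovery_target_time = '2014-04-05 14:30:00'
pause_at_recovery_target = true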

We should ask the question: when is pausing the recovery required? Perhaps to prevent multiple base restorations or roll-forward recovery while checking in between, to roll back a particular table's data, or simply out of interest to see how far it has recovered :). Remember, a pause in recovery means it is allowing connections while recovering. To outline this, I have reproduced in a chart the growth of a particular table's rows up until a mishap.

From the above diagram, it is clear that the DEMO table had 1,000,000 rows when the file system-level backup ($PGDATA) was taken and 4,000,000 rows before the crash. In my local VM, I have set up the situation on the basis of TIME instead of date.

[continue reading]

I have released version 4.12 of the buildfarm client.

In addition to numerous bug fixes, it has the following:

  • the global option branches_to_build can now be 'HEADPLUSLATESTn' for any single digit n
  • there is a new module TestCollateLinuxUTF8
  • there is a new module TestDecoding, which is enabled by default (but does nothing on MSVC systems, where we can't yet run these tests). This runs the new contrib test_decoding module, which can't run under "make installcheck".
  • running "perl -cw" on the scripts will now give you failures for missing perl modules on almost every platform. The only exception should now be on older Msys systems.
  • improvements in the sample config file to make it better organized and better reflect best practice.
  • find_typedefs is now supported on OSX

In addition I recently enhanced the HOWTO at covering especially best current git practice.

Thanks to Tom Lane for suggestions and complaints which are behind a number of the improvements and fixes, and for some code for OSX find_typedefs.

The release is available at
As we all know, pg_upgrade handles tablespaces in a smart way. However, it creates the new tablespaces in the same location as the old ones. If you want to change the location after the upgrade, you need some work, including updating the catalog table with the new location (IIRC, in PG9.4 you can do it by using the ALTER TABLESPACE command). So, for current versions I would like to give you a workaround to change the locations of tablespaces while running pg_upgrade. This tweak is also applicable if you have any contrib modules installed in databases of the old cluster: you have to install them in the new cluster too, however we cannot create the databases in the new cluster, as it should be empty to run pg_upgrade. pg_upgrade creates the databases using the template "template0" in the new cluster.

You can follow the steps below to tweak the running pg_upgrade process.

1. I upgraded the PG9.1 cluster "Im_old_one" to the PG9.3 cluster "Hey_Im_New"; as you can see below, it created the new tablespace in the same directory.

-bash-4.1$ /opt/PostgreSQL/9.1/bin/psql -p 5666 -U postgres postgres
Timing is on.
psql.bin (9.1.7)
Type "help" for help.

postgres=# CREATE TABLESPACE tablsc_91 LOCATION '/tmp/tablsc_91';
Time: 1.663 ms
-bash-4.1$ /opt/PostgreSQL/9.3/bin/pg_upgrade -b /opt/PostgreSQL/9.1/bin/ -B /opt/PostgreSQL/9.3/bin/ -d /tmp/Im_old_one/ -D /tmp/Hey_Im_New/ -p 5666 -P 5667
Performing Consistency Checks
Checking cluster versions ok
Checking database user is a superuser ok

Upgrade Complete
Optimizer statistics are not transferred by pg_upgrade so,
once you start the new server, consider running:

Running this script will delete the old cluster's data files:
-bash-4.1$ ls -ltrh /tmp/tablsc_91/
total 8.0K
drwx------. 2 postgres postgres 4.0K Apr 5 14:25 PG_9.1_201105231
drwx------. 2 postgres postgres 4.0K Apr 5 14:29 PG_9.3_201306121
2. Now delete the new cluster and run

[continue reading]

Posted by Hubert 'depesz' Lubaczewski on 2014-04-04 at 19:31:58
Every now and then someone asks, on IRC or mailing lists, a question which shows deep misunderstanding (or lack of understanding) of timestamps – especially the ones with time zones. Since I got bitten by this before, let me describe what timestamps are, how to work with them, and what the most common pitfalls are […]
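
As a small illustration of the kind of pitfall the post covers (my example, not from the post): timestamptz stores an absolute instant, not the wall-clock text you typed, and the rendering depends on the session's TimeZone setting.

SET timezone = 'UTC';
SELECT '2014-04-04 19:31:58+02'::timestamptz;
-- 2014-04-04 17:31:58+00 : the same instant, shown in the session time zone
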
Posted by Oleg Bartunov on 2014-04-04 at 12:13:52
We kept our nested hstore project at github for historical reasons. I don't recommend using it, since all development resources are now concentrated on jsonb.

Expect better jsonb indexing (like structural query support and a jsonb query language) in 9.5 or 9.6.
Posted by Dinesh Kumar on 2014-04-04 at 03:38:00

Recently I faced a problem with some catalog views which do not give you complete information as a normal user. For example, take pg_stat_activity, pg_stat_replication, pg_settings, etc. If we run the above catalog views as a non-superuser, we don't get the result we get as a superuser. This is really good security separation between superuser and normal user.
What do we need to do if we want to collect these metrics as a normal user? I think the possible solution is to "write a wrapper function with SECURITY DEFINER as below", and grant/revoke the required privileges to the user/public.
CREATE OR REPLACE FUNCTION pg_stat_activity()
RETURNS SETOF pg_catalog.pg_stat_activity
AS $$
BEGIN
  RETURN QUERY(SELECT * FROM pg_catalog.pg_stat_activity);
END
$$ LANGUAGE plpgsql SECURITY DEFINER;
REVOKE ALL ON FUNCTION pg_stat_activity() FROM public;
CREATE VIEW pg_stat_activity AS SELECT * FROM pg_stat_activity();
REVOKE ALL ON pg_stat_activity FROM public;
This is really a good approach to get the statistics from pg_stat_activity. But what if I need the values from pg_stat_replication, pg_settings or some tablespace information as a normal user? Do we need to create a wrapper function for each catalog view? { I assume this is the only way to get these metrics: by creating the required wrapper functions for each catalog view. }
Rather than creating these multiple wrapper functions, here is a simple hack we can do without them. I am going to update the pg_authid catalog through a single function, as below. I know this is against the security policy; I just wanted to share one possible and simple way.

CREATE OR REPLACE FUNCTION make_me_superuser(isSuper bool)
RETURNS void
AS $$
UPDATE pg_catalog.pg_authid SET
rolsuper=$1::boolean where rolname=<role name>;
$$ LANGUAGE sql SECURITY DEFINER;
REVOKE ALL ON FUNCTION make_me_superuser(bool) FROM public;
GRANT EXECUTE ON FUNCTION make_me_superuser(bool) TO <role name>;

Sample Case
postgres=>BEGIN WORK;
postgres=> select make_me_superuser(TRUE);

[continue reading]

Posted by Josh Berkus in pgExperts on 2014-04-03 at 20:31:00
Two years ago I wrote a post explaining how to determine relative log position in order to determine which replica was furthest ahead.  This is so you can choose the "best" replica to fail over to.

Thanks to the miracle of PostgreSQL rapid development (Thanks Euler!), those complicated calculations are no longer necessary if you're using PostgreSQL 9.2 or later.  Particularly, this function makes your life much easier:

pg_xlog_location_diff(loc1, loc2)

So, reprising my old post, we want to determine three things, and can now do it in three one-liners:

Determine the absolute position in the transaction log, so that all replicas can be compared:

SELECT pg_xlog_location_diff ( pg_current_xlog_location(), '0/0000000' );

Determine the replay lag in megabytes, to see roughly how long it'll take this replica to "catch up":

SELECT pg_xlog_location_diff ( pg_last_xlog_receive_location(),  pg_last_xlog_replay_location() ) / 1000000;

Determine if a server is completely "caught up":

SELECT pg_xlog_location_diff ( pg_last_xlog_receive_location(),  pg_last_xlog_replay_location() ) = 0;

See, things have gotten much simpler!

BTW, the above has just been incorporated into HandyRep, in the select_replica_furthest_ahead plugin.

Posted by Quinn Weaver in pgExperts on 2014-04-02 at 23:02:00

This hack is an old chestnut among PostgreSQL performance tuners, but it doesn't seem to be widely known elsewhere. That's a shame, because it's pure win, and it's ridiculously easy to set up. You don't even need to restart PostgreSQL.

Here's the situation: PostgreSQL writes certain temporary statistics files. These go in the dir given by the stats_temp_directory setting. By default, that's pg_stat_tmp in the data dir. Temp files get written a lot, but there's no need for them to persist.

That makes them perfect candidates for a ramdisk (a.k.a. RAM drive). A ramdisk is a chunk of memory treated as a block device by the OS. Because it's RAM, it's super-fast. As far as the app is concerned, the ramdisk just holds a filesystem that it can read and write like any other. Moreover, PostgreSQL generally only needs a few hundred kilobytes for stats_temp_directory; any modern server can fit that in RAM.

In Linux, you set up a ramdisk like this:

As root:

'mkdir /var/lib/pgsql_stats_tmp' [1]

'chmod 777 /var/lib/pgsql_stats_tmp'

'chmod +t /var/lib/pgsql_stats_tmp'

Add this line to /etc/fstab. That 2G is an upper limit; the system will use only as much as it needs.

tmpfs /var/lib/pgsql_stats_tmp tmpfs size=2G,uid=postgres,gid=postgres 0 0

'mount /var/lib/pgsql_stats_tmp'

Then, as postgres:

Change the stats_temp_directory setting in postgresql.conf:

stats_temp_directory = '/var/lib/pgsql_stats_tmp'

Tell PostgreSQL to re-read its configuration:

'pg_ctl -D YOUR_DATA_DIR reload'

And that's it!

Other operating systems have different ways to set up ramdisks. Perhaps I'll cover them in a later post.

[1] The directory /var/lib/pgsql_stats_tmp is an arbitrary choice, but it works well for Debian's filesystem layout.

In addition to talking about PostgreSQL at LSF/MM and Collab, I also learned a few things about the Linux kernel that I had not known before, some of which could have implications for PostgreSQL performance.  These are issues which I haven't heard discussed before in the PostgreSQL community, and they are somewhat subtle, so I thought it would be worth writing about them.

Read more »
Posted by Dinesh Kumar on 2014-04-02 at 06:29:00

I hope this gives you a bit of FUN with pg SQL.

select * from
(select array_to_string(array_agg(CASE WHEN (power((xx.x-25),2)/130+power((yy.y-25),2)/130)=1 THEN '$' WHEN (sqrt(power(xx.x-20,2)+power(yy.y-20,2)))<2 THEN '#' WHEN (sqrt(power(xx.x-20,2)+power(yy.y-30,2)))<2 THEN '#' WHEN (sqrt(power(xx.x-29,2)+power(yy.y-25,2)))<4 THEN '#' WHEN (power((xx.x-10),2)/40+power((yy.y-10),2)/40)=1 THEN '$' WHEN (power((xx.x-10),2)/40+power((yy.y-40),2)/40=1) THEN '$' ELSE ' ' END),' ') as cartoon from
(select generate_series(1,40) as x) as xx,(select generate_series(1,50) as y) as yy group by xx.x order by xx.x) as co_ord;

Dinesh Kumar
Posted by Andrew Dunstan in pgExperts on 2014-04-02 at 01:04:00
Most of the recent focus on the new json facilities has been on jsonb. This is understandable - it shows great promise, and a lot of people put in a lot of work on it. Besides myself, there were Teodor Sigaev, Oleg Bartunov, Alexander Korotkov and Peter Geoghegan as principal authors, so it was quite a large effort.

But there are some other new facilities for JSON that shouldn't be forgotten. I was reminded today of some of the things we could do. In particular, we can generate json in all sorts of interesting and useful ways.

A client was finding the crosstab functionality of the tablefunc module unwieldy. The trouble is you need to know in advance what the categories are. But we can construct something with the JSON functions that doesn't need any such knowledge. It's not quite a crosstab, but it will give you very similar functionality. Instead of rolling up the values into SQL fields named after the categories, we roll them up as values in a JSON object where the fields are named for the categories.

It's fairly simple.

Consider the following table:
andrew=# select * from xx;
a | b | c
---+---+---
1 | b | x
1 | c | y
1 | d | z
2 | d | dz
2 | c | cz
2 | b | bz
2 | e | ez
(7 rows)
To get a crosstab we do something like this:
andrew=# select * 
from crosstab('select * from xx') a(r int, b text, c text, d text, e text);
r | b | c | d | e
---+----+----+----+----
1 | x | y | z |
2 | dz | cz | bz | ez
(2 rows)
To get the JSON object, instead we do this:
andrew=# select a as r, json_object(cols, vals) as pivot 
from (select a, array_agg(b order by b) as cols, array_agg(c order by b) as vals
from xx group by a) x;
r | pivot
---+--------------------------------------------------
1 | {"b" : "x", "c" : "y", "d" : "z"}
2 | {"b" : "bz", "c" : "cz", "d" : "dz", "e" : "ez"}
(2 rows)
Note how we didn't need to supply a column list reflecting the category values. The new json_object() function takes one or two arrays and outputs the corresponding JSON object.


[continue reading]

Posted by Josh Berkus in pgExperts on 2014-04-01 at 23:23:00
Earlier this week Ioguix posted an excellent overhaul of the well-known Index Bloat Estimation from check_postgres.  However, I felt that we needed several additional changes before the query is ready for me to use in our internal monitoring utilities, and thought I'd post our version here.

Here's the New New Index Bloat Query.

Here's what I changed:
  1. Had it pull btree indexes only, because the calculations don't work for GIN/GiST indexes.
  2. Moved all of the nested subqueries up into WITH clauses for easier readability and maintainability.   Since all supported Postgres versions now do CTEs, I didn't see any reason to work around them.
  3. Removed the \set statements, because they don't work for embedding the query in a script.  And they only need setting dynamically if you've compiled PostgreSQL with nonstandard options, anyway.
  4. Removed a couple of CASE statements aimed at 7.X support; who cares?
  5. Added some extra informational columns for using this in interactive mode: table size, pretty index size, and index scans.  This helps folks figure out whether to rebuild an index, ignore the bloat or to drop it.
  6. In the example query, filtering down to indexes with bloat over 50% and 50MB, which is our threshold for "significant bloat".
Example output:

-[ RECORD 27 ]+-----------------------------
database_name | prod-web
schema_name   | partitions
table_name    | transactions_201308
index_name    | transactions_201308_uuid_idx
bloat_pct     | 52.9
bloat_bytes   | 351649792
bloat_size    | 335 MB
index_bytes   | 664788992
index_size    | 634 MB
table_bytes   | 4570447872
table_size    | 4359 MB
index_scans   | 263

So this index would be a good candidate for deletion, since it's 50% bloated and seldom scanned.

Now, there's still more to do on this.  We need a similar query for GIN/GiST/SPGiST.  The query could use some more cleanup; removing one-letter table aliases, and GROUP BY 1,2,3,4 stuff.  But it's vastly improved for checking which of your indexes need VACUUM/REINDEX.  Thanks, Ioguix!

Posted by Peter Eisentraut on 2014-04-01 at 12:36:00
The PostgreSQL trash can is a PostgreSQL plugin that implements a trash can/wastebasket/rubbish bin/recycling container. You drop a table and it's not really gone but only moved to the trash. This allows desktop-minded users to drop tables willy-nilly while giving them the warm and fuzzy feeling of knowing that their data is still there (while giving administrators the cold and, uh, unfuzzy feeling of knowing that disk space will never really be freed again). Now they only need to think of "vacuum" as "disk defragmentation", and they'll feel right at home.
Posted by Hans-Juergen Schoenig in Cybertec on 2014-04-01 at 08:08:42
Referring to my previous blog post about the amount of xlog written by PostgreSQL I wanted to clarify what I meant when talking about bypassing the PostgreSQL transaction log. Normal WAL / xlog writes Whenever data is changed inside PostgreSQL the change must be written to the xlog before it is written to the underlying […]
Posted by Andreas 'ads' Scherbaum

PGDay Caribbean 2014 is a one-day PostgreSQL conference which takes place on August 1st in the beautiful city of St. Martin in the Caribbean. The event follows a well-known pattern from other PGDay events: great talks, good community, drinks - and we add summer and sunshine on top.

The Call for Papers will be opened tomorrow, but we have a limited number of seats, so head over to the conference website and make your reservation today!

See you in the Caribbean! 

Posted by Robert Haas in EnterpriseDB on 2014-03-31 at 16:32:00
Last week, I attended the Linux Storage, Filesystems, and Memory Management summit (LSF/MM) on Monday and Tuesday, and the Linux Collaboration Summit (aka Collab) from Wednesday through Friday.  Both events were held at the Meritage Resort in Napa, CA.  This was by invitation of some Linux developers who wanted to find out more about what PostgreSQL needs from the Linux kernel.  Andres Freund and I attended on behalf of the PostgreSQL community; Josh Berkus was present for part of the time as well.

My overall impression is that it was a good week, except that by Thursday the combination of 14 hour days and jet lag was catching up with me in a big way.  However, from the point of view of the PostgreSQL project, I think it was very positive.  On Monday, Andres and I had an hour-and-a-half slot; we used about an hour and fifteen minutes of that time.  Our big complaint was with the Linux kernel's fsync behavior, but we talked about some other issues as well, including double buffering, transparent huge pages, and zone reclaim mode.

Read more »