PostgreSQL
The world's most advanced open source database
Posted by Andrew Dunstan in pgExperts on 2014-12-19 at 00:09:00
PostgreSQL 9.4 is released today, and once again my main input has been in the JSON area. We've introduced a new spiffy JSON type called "jsonb" that makes performing operations on JSON much, much faster, because it's not stored as plain text, but rather decomposed into the structures and scalar values that go to make up JSON. Much of the credit for this goes to Oleg Bartunov and Teodor Sigaev, the original authors of the hstore module, who created a "nested hstore" module that we eventually based jsonb on. Credit also goes to Alexander Korotkov and Peter Geoghegan, especially for their indexing work, which will make many operations super fast. My role was to make sure that all (or almost all) of the operations that are available for the original json type are also available for the jsonb type.

There have been some benchmark tests suggesting that our jsonb is faster in many respects than other database engines that use JSON as their native storage.

The nice thing about this is that you can now have a highly performing mixture of traditional relational data and semi-structured treeish data with proper transactional semantics, all in one engine. That's a very attractive proposition for a lot of people looking to design new data stores.

My other JSON contribution was to add a number of functions that allow the construction of arbitrarily complex json documents, with data from the database. This will be particularly useful for clients that expect json in a particular, non-regular shape, and is based on the needs of some actual json-using customers.
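
For instance (not from the post itself, and with invented table names), the 9.4 construction functions can be combined to emit one document per customer in whatever shape a client expects:

    -- one JSON document per customer, with a nested, non-regular shape
    SELECT json_build_object(
               'name',   c.name,
               'orders', json_agg(o.total ORDER BY o.placed_at)
           ) AS doc
    FROM customers c
    JOIN orders o ON o.customer_id = c.id
    GROUP BY c.id, c.name;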

Already committed for 9.5, to be released some time next year, are jsonb versions of those json functions we didn't provide jsonb versions of in 9.4, and functions to allow the stripping of object fields with 'null' as the value from json and jsonb values (this can result in much more compact storage in some cases, with no significant semantic loss).
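
As a quick illustration of the second item (my own sketch against 9.5, not taken from the post):

    -- null-valued object fields are removed; array elements are left alone
    SELECT jsonb_strip_nulls('{"a": 1, "b": null, "c": [null]}'::jsonb);
    -- result: {"a": 1, "c": [null]}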

I made a few other contributions to 9.4, but they are not worth mentioning in detail.

I'm now looking for new Postgr

[continue reading]

Posted by Josh Berkus in pgExperts on 2014-12-19 at 00:05:00
For the third night of Hanukkah, we have a nice present for you: PostgreSQL 9.4.0.  Now eat your latkes.



Let's talk about 9.4 by talking about one of my favorite features: JSONB.  A lot of people worked to bring you JSONB in this release, including my colleague Andrew Dunstan, Oleg Bartunov and Teodor Sigaev (sponsored by Engine Yard), Peter Geoghegan of Heroku.com, and several others.

For example, imagine we have a large set of JSON documents which contain publication data about various books.  One reason we might have this is that we receive the data in JSON or HTML format from various publishers, and it is not normalized, so we keep it in its original format for data mining.  We can store them in a JSONB column like so:

    create table booksdata (
        -- citext comes from the citext extension, isbn from the isn extension
        title citext not null,
        isbn isbn not null primary key,
        pubinfo jsonb not null
    );

Some example pubinfo might be:

    {"authority" : [ "James Michaels", "Nina Totenberg" ], "cost": 16, "edition" : 3,
     "format": "Trade PB", "publisher": "HarperOne",
     "published_on": "1995-08-05", "revised" : [ "1996-06-01", "1999-01-01" ] }

You'll note that the keys are stored in a canonical, sorted order.  This is a feature of JSONB; whitespace gets cleaned up and keys get sorted on import.

Then we can create a general index on the JSONB like so:

    CREATE INDEX ON booksdata USING GIN (pubinfo);

or:

    CREATE INDEX ON booksdata USING GIN (pubinfo jsonb_path_ops);

There are two different versions depending on the operations you expect to run.  The standard GIN index supports every kind of operation we support for JSONB.  The path_ops index supports only the search path operator "@>" (see below), but produces a smaller and faster index for these kinds of searches.  However, we're going to do a bunch of different JSON operations for data mining, so we'll use the regular GIN index.
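
For example, a containment search against the sample row above would look something like this (my own sketch, since the post's own examples are cut off below):

    -- find books from a given publisher; either GIN index can serve this
    SELECT isbn, title
    FROM booksdata
    WHERE pubinfo @> '{"publisher": "HarperOne"}';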

Once we have the index, we can do arbitrary path searches on JSONB, and these will be index searches, depending on what the planner picks.  Examples of JSONB qu

[continue reading]

Posted by gabrielle roth on 2014-12-16 at 01:04:00
I had reason recently to suspect that autovacuum jobs weren’t being properly logged on my RDS instance. Instead of compulsively re-running “SELECT relname, last_autovacuum FROM pg_stat_user_tables” while I waited for one to happen, I set up this quick test: connect to database and do the following: -- verify settings -- log_autovacuum_min_duration should be 0 ("log […]
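
The excerpt is cut off, but the gist is easy to sketch: with log_autovacuum_min_duration = 0, any table autovacuum touches should show up in the log. Something like the following (table name and thresholds are mine) gives autovacuum work to do quickly:

    -- a small "bait" table with aggressive per-table autovacuum settings
    CREATE TABLE av_test (id int)
      WITH (autovacuum_vacuum_threshold = 50, autovacuum_vacuum_scale_factor = 0);
    INSERT INTO av_test SELECT generate_series(1, 10000);
    UPDATE av_test SET id = id + 1;  -- generate dead rows
    -- then watch pg_stat_user_tables.last_autovacuum and the server log
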
Posted by Robins Tharakan on 2014-12-14 at 21:50:00
Recently, someone asked for some inputs before going ahead with a PostgreSQL Certification. From what I know, outside Japan, EnterpriseDB is the only consultancy offering certification services consistently over the past few years. Having recently cleared one from EnterpriseDB, I had a few things to say, and thought I'd rather list them as a full-blown post here. An important
Posted by Pavel Stehule on 2014-12-14 at 17:52:00
mscottie found a magic option, -undefined dynamic_lookup. With this option we are able to compile plpgsql_check on the OS X platform.
Every new release of PostgreSQL comes packed with exciting features. To benefit from the new features, the database server should be upgraded. Choosing traditional upgrade paths like pg_dump/pg_restore or pg_upgrade requires significant application downtime. Today, if you are looking for a minimum-downtime upgrade path between major PostgreSQL versions with a solid rollback plan, it can be accomplished with asynchronous Slony-I replication. Since Slony-I (know more about it here) has the capability to replicate easily between different PostgreSQL versions, OSes and bit architectures, upgrades are doable without requiring substantial downtime. In addition, it has consistent switchover and switchback functionality in its design.

IMO, while doing major version upgrades there should be a proper fallback plan: in case the application turns out to be buggy or fails to perform well on the upgraded version, we should be able to roll back to the older version immediately. Slony-I provides such functionality in the form of switchback. This post demonstrates a minimum-downtime upgrade, including switchover/switchback steps.

Before going to the demo, one important point should be noted: prior to PG 9.0.x, bytea datatype columns stored data in ESCAPE format, while later versions use HEX format. When performing a switchback (newer version to older version), this bytea format difference is not supported by Slony-I, hence the ESCAPE format should be maintained throughout the upgrade; otherwise you may encounter an error:
ERROR  remoteWorkerThread_1_1: error at end of COPY IN: ERROR:  invalid input syntax for type bytea
CONTEXT: COPY sl_log_1, line 1: "1 991380 1 100001 public foo I 0 {id,500003,name,"A ",b,"\\\\x41"}"
ERROR remoteWorkerThread_1: SYNC aborted
To fix this, no changes are required on PG 8.4.x, but on PG 9.3.5 the bytea_output parameter should be changed from HEX to ESCAPE as shown below. We can set it at the cluster level ($PGDATA/postgresql.conf) or at the user level (ALTER USER ... SET), I have pre
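
A sketch of that setting change on the 9.3.5 node (the role name is my own; adjust it to whichever role Slony-I connects as):

    -- per-role, so only the replication sessions are affected:
    ALTER USER slony SET bytea_output = 'escape';
    -- or cluster-wide: bytea_output = 'escape' in $PGDATA/postgresql.conf, then reload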

[continue reading]

Posted by Ian Barwick on 2014-12-11 at 21:48:00

Japan has been an early and vigorous adopter of PostgreSQL (back in 2006, when PostgreSQL was still...

permalink

Posted by Josh Berkus in pgExperts on 2014-12-11 at 07:40:00
Of course, the first thing which occurred to me with pg_shard was to create a distributed "NoSQL" database using 9.4's JSONB.  I had to wait a bit because there were some issues with 9.4 compatibility which needed resolving, but this week I was able to build it.  I was hoping to show it off at the SFPUG meeting, but sadly the weather gods (in the form of the PostgreSQL-powered NWS) interfered.

So we'll go over it here because it makes a good introduction to pg_shard and to JSONB.

First step was to spin up an AWS m3.xlarge instance to be our master.  While I don't need much in other capabilities, I do want the master node to have AWS's best network option since it'll be doing a LOT of networking.  Do, however, make sure to configure it with instance storage because you'll want that space for temp tables and any tables which aren't sharded.  This also becomes our template for the shards, because it's easier that way.  I created the instance running Ubuntu 14.04.  Then I had to install all of the PostgreSQL 9.4 components and a few others, including:

  • postgresql-9.4, postgresql-contrib-9.4, postgresql-server-dev-9.4
  • python-psycopg2 (for my benchmark test)
  • git, make

After that I "git cloned" the pg_shard repo and did a make and make install in it.  Currently it emits some spurious warnings with 9.4; ignore those.

Next was configuring PostgreSQL for pg_shard.  This is the master node, so it's where we're going to use pg_shard.  In addition to the usual PostgreSQL configuration, you want to change two settings:

shared_preload_libraries = 'pg_shard' #load pg_shard
temp_buffers = 64MB #pg_shard uses temp tables internally


Then start (or restart) PostgreSQL.  After that, we create a "psocial" database to represent our dummy social networking application, and set up a .pgpass file and an authorized_hosts file for passwordless access by the postgres user.  Then we shut PostgreSQL down and use the AWS API to create eight r3.large instances based on that master instance to be our shards.  This way they alr

[continue reading]

Posted by Ian Barwick in 2ndQuadrant on 2014-12-11 at 04:51:04

Japan has been an early and vigorous adopter of PostgreSQL (back in 2006, when PostgreSQL was still emerging from obscurity in the western hemisphere, I noted that in Tokyo bookstores, PostgreSQL books outweighed MySQL ones by about 5:3), and it’s no surprise that by nationality, only the USA and Germany have more committers. The Japan PostgreSQL User Group (JPUG) has been around a while too and is one of the oldest and most active worldwide (and has been around long enough to have established a Japanese logo for PostgreSQL which is often used in place of Slonik [*]). This year JPUG celebrates its 15th anniversary, and the 2014 conference – held in Tokyo on December 5th – was the largest yet with over 20 speakers, 5 (five!) parallel tracks and around 280 attendees.

Continuing a tradition started in 2012, the keynote speaker has been a foreign guest, and this year’s speaker was 2ndQuadrant’s very own Simon Riggs, who gave a well-received talk about the open source development process and PostgreSQL. (The interpreter did a fantastic job despite the occasional idiomatic diversion from the script).

A key theme of this year’s conference was international exchange in general, and Asia in particular. As you can see from the schedule below, apart from the keynote speech there was an international track (rightmost column) with talks in English by speakers from a variety of Asian countries, and it’s hoped this will develop into a PgCon Asia.

Another theme to emerge was the need for increased communication between the Japanese PostgreSQL community and the rest of the planet – Japan is home to many interesting developments which often go unnoticed in the wider world – and 2ndQuadrant hopes to play a part in improving their visibility.

Finally, no modern conference is complete without lightning talks, which this year took place during the post-conference party. By an ancient Japanese tradition dating back to the earlier part of this year, each speaker (including yours truly, who burbled on about the ideas contained in the p

[continue reading]

Posted by Dan Robinson on 2014-12-10 at 17:45:57

Did you know PostgreSQL supports indexing a subset of your table? This enables very fast reads from that subset with almost no index overhead. It’s often the best way to index your data if you want to repeatedly analyze rows that match a given WHERE clause. This makes PostgreSQL a great fit for workflows that involve pre-aggregation with additional ad hoc analysis. In this post, I’ll walk through an example query optimization in which partial indexes are well suited.

Consider a table of events with the following schema:

Each event is associated with a user and has an ID, a timestamp, and a JSON representation of the event. The JSON includes the page path, the event type (e.g. click, page view, form submission), and any other properties that describe the event.
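
The schema itself is embedded as a snippet in the original post and didn't survive here; a plausible sketch matching that description (table and column names are my guesses):

    CREATE TABLE event (
        id      bigserial   PRIMARY KEY,
        user_id bigint      NOT NULL,
        time    timestamptz NOT NULL,
        data    jsonb       NOT NULL  -- page path, event type, other properties
    );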

We can use this table to store many different kinds of events, and let’s assume we have some automatic event tracking in place that logs every click, page view, and form submission so that we can analyze it later. We might want to have an internal dashboard that displays some high-value metrics, such as the number of weekly signups or the amount of revenue we are collecting per day. The events that are relevant to this dashboard make up a small fraction of this table — a very small percentage of clicks on your website are to finalize purchases! But they are mixed in with the rest of the table, so our signal-to-noise ratio is low.

We would like to index our data to make these dashboard queries fast.[1] Let’s start with a signup event, which we define to be a form submission on our /signup/ page. Getting the number of such signups in the first week of September translates to:
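
The query is also embedded as a snippet in the original; reusing the hypothetical schema above, it presumably looks something like:

    -- 'submit' stands in for whatever value the tracker logs for form submissions
    SELECT count(*)
    FROM event
    WHERE (data->>'type') = 'submit'
      AND (data->>'path') = '/signup/'
      AND time BETWEEN '2014-09-01' AND '2014-09-08';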

On a synthetic dataset with 10 million events, of which 3000 are signups, and without any indexes, this query takes 45 seconds.

Full Indexes On Single Columns: A Mixed Bag

A naive way to improve this performance is by creating single-column indexes for each of the relevant event features: (data->>'type'), (data->>'path'), and time. We can use a bitmap join between results from three i
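
Those three single-column (expression) indexes would be created roughly like this, still against the hypothetical table sketched above:

    CREATE INDEX ON event ((data->>'type'));
    CREATE INDEX ON event ((data->>'path'));
    CREATE INDEX ON event (time);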

[continue reading]

On 8th of December, Simon Riggs committed patch: REINDEX SCHEMA   Add new SCHEMA option to REINDEX and reindexdb.   Sawada Masahiko   Reviewed by Michael Paquier and Fabrízio de Royes Mello This is very simple thing, so the description will not be long. But – I'm mostly a DBA, so all things that give […]
Posted by Josh Berkus in pgExperts on 2014-12-09 at 06:09:00
One of the limitations with both RDS and Heroku is that you can only install the extensions they've approved and made available to you.  Except ... you can install "SQL-only" extensions which don't have a binary component.  Like pg_partman.

pg_partman is a set of functions and tables which help you manage your partitioned tables, including creating new partitions and triggers on the partition master.  I love it because it means I never have to write ad-hoc code for a client to do that again.  So of course I want to install it on our clients who are running on Heroku and RDS and need partitioning.  However, I have no access to "create extension partman" on the RDS command line, so how do I make it work?  Mostly by loading the extension the old way.

1) Create a new schema on the target database called "partman".

2) Install PostgreSQL 9.3 on my laptop, including the development files.

3) git clone pg_partman.

4) make & make install pg_partman on my laptop.  Inside the pg_partman source directory, that will create a file in the "sql" directory called (currently): pg_partman--1.7.2.sql.

5) Since that file is designed to be installed as an extension, I need to alter it.  First, I replace all occurrences of "@extschema@" with "partman".  Second, I delete the line "SELECT pg_catalog.pg_extension_config_dump('part_config', '');"

6) Load partman into your database by using "psql -f pg_partman--1.7.2.sql YOUR_DB_NAME -h YOUR_HOST".  In the RDS case, this needs to be done as the instance superuser.

pg_partman can now be used in the target database by calling all of the functions as "partman.function", or by putting the partman schema in your search_path.
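
For example (the create_parent() call shape is from my memory of pg_partman 1.x and may not match your version exactly; check the docs that ship with it):

    SET search_path = partman, public;
    -- hypothetical: daily partitions on a timestamp column
    SELECT partman.create_parent('public.my_log_table', 'created_at', 'time-static', 'daily');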

Now, this has the major drawback that upgrading pg_partman once 1.7.3 comes out will be a very manual process.  But that's better than not using it at all.



Posted by gabrielle roth on 2014-12-09 at 04:27:00
For the last PDXPUG lab of the year, we tried out BiDirectional Replication (BDR). Five of us just set up VMs on our laptops and followed the instructions on the wiki. We only had about 90 minutes for this lab, so the goal was to get a basic configuration up & running, understand the […]

PostgreSQL 9.5 has added a new mode for REINDEX with this commit:

commit: fe263d115a7dd16095b8b8f1e943aff2bb4574d2
author: Simon Riggs <simon@2ndQuadrant.com>
date: Tue, 9 Dec 2014 00:28:00 +0900
REINDEX SCHEMA

Add new SCHEMA option to REINDEX and reindexdb.

Sawada Masahiko

Reviewed by Michael Paquier and Fabrízio de Royes Mello

Up to 9.4, REINDEX is able to run on different things:

  • INDEX, to reindex a given index.
  • TABLE, to reindex a table entirely, including its underlying toast index if the table has one.
  • DATABASE, to reindex all the relations of the database the user is currently connected to, including catalog tables.
  • SYSTEM, to reindex all the system catalogs.

A couple of things to note though:

  • Indexes being reindexed are locked with a strong exclusive lock, preventing any other session from touching them.
  • Parent tables are locked with a share lock.
  • DATABASE and SYSTEM cannot run in a transaction, and process tables one-by-one, pg_class being run first as REINDEX updates it (this last point is an implementation detail, not mentioned in the docs).

The new mode for SCHEMA mixes those things, but behaves similarly to DATABASE and SYSTEM, for a schema:

  • It cannot run in a transaction.
  • Each table of the schema is processed one-by-one.
  • pg_class is reindexed first only if pg_catalog is processed.

That's actually what you can see here; note first that pg_class is at the top of the relations reindexed.

=# REINDEX SCHEMA pg_catalog;
NOTICE:  00000: table "pg_catalog.pg_class" was reindexed
LOCATION:  ReindexObject, indexcmds.c:1942
[...]
NOTICE:  00000: table "pg_catalog.pg_depend" was reindexed
LOCATION:  ReindexObject, indexcmds.c:1942
REINDEX

And that this operation is non-transactional:

=# BEGIN;
BEGIN
=# REINDEX SCHEMA pg_catalog;
ERROR:  25001: REINDEX SCHEMA cannot run inside a transaction block
LOCATION:  PreventTransactionChain, xact.c:2976

A last thing to note is that a user that has no access on a schema will logically not be able to run REINDEX on it.

=# CREATE USER foo;
CREATE ROLE
=# SET SE

[continue reading]

On 7th of December, Simon Riggs committed patch: Event Trigger for table_rewrite   Generate a table_rewrite event when ALTER TABLE attempts to rewrite a table. Provide helper functions to identify table and reason.   Intended use case is to help assess or to react to schema changes that might hold exclusive locks for long periods. […]
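
For context, a sketch of how the new event and its helper functions can be used (trigger and function names are mine; the helpers are the 9.5 additions the commit mentions):

    CREATE FUNCTION log_table_rewrite() RETURNS event_trigger AS $$
    BEGIN
        RAISE NOTICE 'ALTER TABLE is rewriting % (reason code %)',
            pg_event_trigger_table_rewrite_oid()::regclass,
            pg_event_trigger_table_rewrite_reason();
    END;
    $$ LANGUAGE plpgsql;

    CREATE EVENT TRIGGER track_rewrites ON table_rewrite
        EXECUTE PROCEDURE log_table_rewrite();
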
Posted by Craig Ringer in 2ndQuadrant on 2014-12-08 at 03:27:44

For a couple of years now a team at 2ndQuadrant led by Andres Freund have been working on adding bi-directional asynchronous multi-master replication support for PostgreSQL. This effort has become known as the BDR project.

We’re really excited to see these efforts leading to new PostgreSQL features and have a great deal more still to come.

Incremental Development

As a large development project it is neither practical nor desirable to deliver all the changes to PostgreSQL as a single huge patch. That way lies madness and unmaintainable code. It would also be resoundingly rejected by the PostgreSQL community, as it should be.

Instead, BDR has been developed as a series of discrete changes to core PostgreSQL, plus an extension that uses those core changes to implement multi-master asynchronous logical replication.

To make this separation clearer we’ve split BDR into two separate trees since the last official release. We now manage and develop the set of patches to PostgreSQL separately from the BDR extension itself.

BDR is destined for community PostgreSQL

2ndQuadrant’s goal is to get all of BDR – the changes to the core server as well as the extension or its functionality – included in a future open source community PostgreSQL release.

When that’s achieved you’ll be able to use all the features currently provided by the BDR patches and extension with no 3rd party (open source or otherwise) patches, extensions and tools.

All of BDR is open source, released under the PostgreSQL license, and is developed on PostgreSQL’s public infrastructure. The git repository containing the authoritative current version of BDR is on git.postgresql.org.

Getting BDR into users’ hands

With a project as big as BDR and with the PostgreSQL project’s safety-first, conservative and careful development style it’ll be a long time before all of BDR is included in community PostgreSQL. As a result we’ve had to make some compromises in order to get BDR functionality out to users prior to the expected end of the 9.5/9.6 release cycles in a cou

[continue reading]

Posted by Dave Page in EnterpriseDB on 2014-12-07 at 12:30:00
I’m often asked how I first became involved in PostgreSQL, and how the pgAdmin project got started. Much as I’m happy to tell the story over beer, it becomes tedious to do so over email after the first half-dozen or so times. So in a vain attempt to save my fingers from future carpal tunnel syndrome, here’s the tale…

I first discovered PostgreSQL in 1997. I was working at a social housing provider in the UK where, at the time, we were running Informix SE and Pick databases. We wanted to start building in-house web applications for managing additional areas of the business. There were no useful interfaces for Pick that could be used with any of the web technologies available to us at the time and the licensing model for Informix and its ODBC driver was prohibitively expensive, so I started looking for alternatives. 

After spending some time researching mSQL and MySQL, I eventually found PostgreSQL, which seemed to offer everything we needed—a SQL-based DBMS with ODBC and C interfaces, a liberal license, a rich set of features, and a vibrant community supporting it that was responsive to bug reports and open to feature requests. My colleagues and I developed a number of applications using PostgreSQL in the following years and the database became a critical part of our business. We had applications written in PHP for the web as well as Visual Basic for the desktop users. 

However, it was early on in our journey with Postgres that, as a Windows shop (on the desktop at least), we found ourselves wishing for a good graphical management tool for the database. At the time, the only option was pgAccess, which was a basic tool written in TCL/TK that was not easy to get to work on Windows. So I decided to write my own in Visual Basic. The first version was never released to the public and was modeled on the pgAccess design. Called pgManager, it was quickly abandoned as we found the UI to be restrictive and, well, not overly usable. The second incarnation was pgAdmin, which was redesigned from the ground up. It was relea

[continue reading]

Posted by Hubert 'depesz' Lubaczewski on 2014-12-06 at 18:47:33
The general knowledge is that numerics are slower than integers/floats, but offer better precision and ranges. While I understand what is slow, I don't really know how much slower numerics are. So let's test it. There are a couple of things to consider: table size (related to disk usage per column), speed of inserts […]

Oracle FDW is a foreign data wrapper PostgreSQL extension that allows you to read Oracle database tables from a PostgreSQL database. You can get it via the PGXN network or the main website http://laurenz.github.io/oracle_fdw/.

What is new about the latest 1.1.0 release is that there is now support for the Oracle SDO_GEOMETRY type, which allows you to map the most common geometry types POINT, LINE, POLYGON, MULTIPOINT, MULTILINE and MULTIPOLYGON to the PostGIS geometry type. Much of the spatial plumbing work was done by Vincent Mora of Oslandia. If we have any Windows Oracle users out there, yes, there are binaries available for Windows for PostgreSQL 9.1-9.4, both 32-bit and 64-bit. The FDW does have a dependency on OCI.dll, which I think comes shipped with Oracle products. Unfortunately, we are not Oracle database users so we can't kick the tires on this one.
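
For reference, the general usage pattern for oracle_fdw looks roughly like this (connection details and table names are invented; the geometry column assumes PostGIS is installed):

    CREATE EXTENSION oracle_fdw;
    CREATE SERVER oradb FOREIGN DATA WRAPPER oracle_fdw
        OPTIONS (dbserver '//oracle-host:1521/ORCL');
    CREATE USER MAPPING FOR CURRENT_USER SERVER oradb
        OPTIONS (user 'scott', password 'tiger');
    CREATE FOREIGN TABLE ora_parcels (
        id   integer,
        name text,
        geom geometry            -- mapped from SDO_GEOMETRY
    ) SERVER oradb OPTIONS (schema 'GIS', table 'PARCELS');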

Posted by Josh Berkus on 2014-12-04 at 18:59:38

First, SFPUG December meeting has been scheduled. It's about pg_shard, which has been covered elsewhere.

Second, a reminder that talk submissions for pgDay SF 2015 are due Monday for regular length talks. Please submit your proposals ASAP!

Posted by Josh Berkus in pgExperts on 2014-12-04 at 18:50:00
I've been working with CitusData for a while, and I'm excited that they've finally released pg_shard, a generic "sharding" extension for PostgreSQL databases.  The press releases have all of the noise about this, so this blog post is for the nuts-and-bolts of pg_shard.

 

What does pg_shard do?


pg_shard takes one or more tables in your database and distributes them over multiple databases on multiple servers, or "shards".  Rows are distributed based on a hash function on the primary key which you define.  The hash function used is Postgres' own hash function for hash indexes.

A single node is the "master node", which contains templates for the distributed tables and transparently handles making sure that inserts, updates, and selects go to the right shards.  To ensure redundancy, sharded tables can be defined as having multiple copies across multiple shards.
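
Defining a distributed table looks roughly like this (the call shapes are from my reading of the pg_shard README; treat them as a sketch and check the project docs for the exact signatures):

    CREATE EXTENSION pg_shard;
    CREATE TABLE customer_reviews (
        customer_id text NOT NULL,
        review_date date,
        rating      int
    );
    -- hash-distribute on customer_id ...
    SELECT master_create_distributed_table('customer_reviews', 'customer_id');
    -- ... into 16 shards, with 2 copies of each across the worker nodes
    SELECT master_create_worker_shards('customer_reviews', 16, 2);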

SELECTs are distributed based on comparing constants in your WHERE clause and other filter conditions against the hashed key, allowing you to query against only the shards which have compliant data.  This should speed up queries for really large tables a great deal.

Where there are multiple copies of shards, pg_shard notices timeouts in contacting the shards and marks certain shards as offline. 

All of this is done inside PostgreSQL, most of it using our defined APIs and hooks.  The only thing the Citus team had to fork was ruleutils.c, and they're hoping to change that in 9.5 so forking that isn't required either.

 

What does pg_shard not do?


First, since the rows are distributed using a hash function, only equality comparisons are currently supported.  This means that if you query your table on ranges (such as BETWEEN two timestamps) it will scan all shards for that data.  Adding range partitioning to pg_shard is planned.

Second, unlike Postgres-XC/Postgres-XL, transactions between shards and distributed JOINs are not supported.  Those features are among the things which distinguish CitusData's proprietary product.

Third, currently only one master

[continue reading]

PostgreSQL offers many ways to define how a node reaches the end of recovery, with several parameters related to the recovery target, like a timestamp, a transaction ID, a target name, or a recovery timeline (using recovery_target_timeline). A parameter called pause_at_recovery_target, which has existed since 9.1, also allows putting a standby in pause when the recovery target is reached. The upcoming release 9.5 has made some improvements in this area with the following commit:

commit: aedccb1f6fef988af1d1a25b78151f3773954b4c
author: Simon Riggs <simon@2ndQuadrant.com>
date: Tue, 25 Nov 2014 20:13:30 +0000
action_at_recovery_target recovery config option

action_at_recovery_target = pause | promote | shutdown

Petr Jelinek

Reviewed by Muhammad Asif Naeem, Fujii Masao and
Simon Riggs

Its use is rather simple: when a standby has hot_standby enabled in postgresql.conf, meaning that it is able to execute read queries while in recovery, it is possible to perform one of the actions defined above by setting action_at_recovery_target in recovery.conf:

  • pause, acting the same as when pause_at_recovery_target is enabled: the standby pauses any replay actions so that it is possible to check what state it is in once the recovery target has been reached. Note as well that recovery replay can be resumed using pg_xlog_replay_resume().
  • promote, to automatically promote the node and have it jump to the next timeline, making it available for write queries as well. This is the same as when pause_at_recovery_target or action_at_recovery_target are not used, or when only pause_at_recovery_target is used and set to false.
  • shutdown, to simply shut down the standby once the target is reached. This is the real new addition this feature brings, because it can be used to make a standby instance immediately ready for use after it has finished its recovery. Note that in this case the node will also need to re-apply all the WAL since the last checkpoint, so there is some cost to this mode. Moreover, recovery.conf is not renamed

[continue reading]

Posted by Paul Ramsey on 2014-12-02 at 21:22:00

Most users of PostGIS are safely ensconced in the world of Linux, and their build/deploy environments are pretty similar to the ones used by the developers, so any problems they might experience are quickly found and removed early in development.

Some users are on Windows, but they are our most numerous user base, so we at least test that platform preemptively before release and make sure it is as good as we can make it.

And then there's the rest. We've had a passel of FreeBSD bugs lately, and I've found myself doing Solaris builds for customers, and don't get me started on the poor buggers running AIX. One of the annoyances of trying to fix a problem for a "weird platform" user is just getting the platform setup and running in the first place.

So, having recently learned a bit about vagrant, and seeing that some of the "weird" platforms have boxes already, I thought I would whip off a couple vagrant configurations so it's easy in the future to throw up a Solaris or FreeBSD box, or even a quick Centos box for debugging purposes.

I've just been setting up my Solaris Vagrantfile and using my favourite Solaris crutch: the OpenCSW software repository. But as I use it, I'm not just adding the "things I need", I'm implicitly choosing an environment:

  • my libxml2 is from OpenCSW
  • so is my gcc, which is version 4, not version 3
  • so is my postgres

This is convenient for me, but what are the chances that it'll be the environment used by someone on Solaris having problems? They might be compiling against libraries from /usr/sfw/bin, or using the Solaris gcc-3 package, or any number of other variants. At the end of the day, when testing on such a Solaris environment, will I be testing against a real situation, or a fantasyland of my own making?

For platforms like Ubuntu (apt) or Red Hat (yum) or FreeBSD (port) where there is One True Way to get software, the difficulties are less, but even then there is no easy way to get a "standard environment", or to quickly replicate the combinations of versions a user might have run int

[continue reading]

Posted by Dan Robinson on 2014-12-02 at 19:15:31

PostgreSQL 9.3 has a new join type! Lateral joins arrived without a lot of fanfare, but they enable some powerful new queries that were previously only tractable with procedural code. In this post, I’ll walk through a conversion funnel analysis that wouldn’t be possible in PostgreSQL 9.2.

What is a LATERAL join?

The best description in the documentation comes at the bottom of the list of FROM clause options:

The LATERAL key word can precede a sub-SELECT FROM item. This allows the sub-SELECT to refer to columns of FROM items that appear before it in the FROM list. (Without LATERAL, each sub-SELECT is evaluated independently and so cannot cross-reference any other FROM item.)

When a FROM item contains LATERAL cross-references, evaluation proceeds as follows: for each row of the FROM item providing the cross-referenced column(s), or set of rows of multiple FROM items providing the columns, the LATERAL item is evaluated using that row or row set’s values of the columns. The resulting row(s) are joined as usual with the rows they were computed from. This is repeated for each row or set of rows from the column source table(s).

This is a bit dense. Loosely, it means that a LATERAL join is like a SQL foreach loop, in which PostgreSQL will iterate over each row in a result set and evaluate a subquery using that row as a parameter.

What can we do with this?

Consider a table of click events with the following schema:

Each event is associated with a user and has an ID, a timestamp, and a JSON blob with the event’s properties. At Heap, these properties might include the DOM hierarchy of a click, the window title, the session referrer, and so forth.
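
The schema is embedded as a snippet in the original post; a plausible sketch of it, plus a toy LATERAL query in the "foreach" spirit described above (all names are my own):

    CREATE TABLE event (
        id      bigserial   PRIMARY KEY,
        user_id bigint      NOT NULL,
        time    timestamptz NOT NULL,
        data    jsonb       NOT NULL  -- DOM hierarchy, window title, referrer, ...
    );

    -- for each user, find their earliest view of /signup/; the sub-SELECT may
    -- reference u.user_id only because of LATERAL
    SELECT u.user_id, first_view.time
    FROM (SELECT DISTINCT user_id FROM event) u,
         LATERAL (
            SELECT time
            FROM event e
            WHERE e.user_id = u.user_id
              AND (e.data->>'type') = 'view'
              AND (e.data->>'path') = '/signup/'
            ORDER BY time
            LIMIT 1
         ) first_view;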

Let’s say we want to optimize our landing page to increase signups. The first step is to figure out where we’re losing users in our conversion funnel.

An example conversion funnel between four steps in a signup flow.

We’ll assume that we’ve instrumented our frontend to log events along this flow and that all of the data lives in the event table specified above.[1] As an initi

[continue reading]

Posted by Peter Eisentraut on 2014-12-02 at 01:00:00

In part 1 and part 2 I investigated how to use ccache with clang. That was more than three years ago.

I got an email the other day that ccache bug 8118, which I filed while writing part 1, was closed, as ccache 3.2 was released. The release notes of ccache 3.2 contain several items related to clang. So it was time to give this another look.

Basically, the conclusions from part 2 still stand: You cannot use ccache with clang without using CCACHE_CPP2. And it is now becoming clear to me that this is an issue that is not going to go away, and it’s not really even Clang’s fault.

Warnings!

The problem is that clang’s -Wall can cause warnings when compiling the preprocessed version of otherwise harmless C code. This can be illustrated by this piece of C code:

int
foo()
{
        int *p, *q;

        p = q = 0;
        p = p;
        if (p == p)
                return 1;
        if ((p == q))
                return 2;
        return 0;
}

When compiled by gcc-4.9 -Wall, this gives no warnings. When compiled by clang-3.5 -Wall, this results in:

test.c:7:4: warning: explicitly assigning value of variable of type 'int *' to itself [-Wself-assign]
test.c:8:8: warning: self-comparison always evaluates to true [-Wtautological-compare]
test.c:10:9: warning: equality comparison with extraneous parentheses [-Wparentheses-equality]
test.c:10:9: note: remove extraneous parentheses around the comparison to silence this warning
test.c:10:9: note: use '=' to turn this equality comparison into an assignment

You wouldn’t normally write code like this, but the C preprocessor could create code with self-assignments, self-comparisons, extra parentheses, and so on.

This example represents the issues I saw when trying to compile PostgreSQL 9.4 with ccache and clang; there might be others.

You can address this issue in two ways:

  1. Use CCACHE_CPP2, as discussed in part 2. With ccache 3.2, you can now also put this into a configuration file: run_second_cpp = true in ~/.ccache/ccache.conf

  2. Turn off the warnings mentioned above: -Wno-parenth

[continue reading]

Posted by Tomas Vondra on 2014-12-01 at 01:30:00

About two weeks ago I posted a performance comparison of PostgreSQL compiled using various compilers and compiler versions. The conclusion was that for pgbench-like workloads (lots of small transactions), the compilers make very little difference. Maybe one or two percent, if you're using a reasonably fresh compiler version. For analytical workloads (queries processing large amounts of data), the compilers make a bit more difference - gcc 4.9.2 giving the best results (about 15% faster than gcc 4.1.2, with the other compilers/versions somewhere between those two).

Those results were however measured with the default optimization level (which is -O2 for PostgreSQL), and one of the questions in the discussion below that article was what difference the other optimization levels (like -O3, -march=native and -flto) would make. So here we go!

I repeated the two tests described in the previous post - pgbench and TPC-DS, with almost exactly the same configuration. If you're interested in the details, read that post.

The only thing that really changed is that while compiling the PostgreSQL sources, I modified the optimization level or enabled the additional options. In total, I decided to test these combinations:

  • clang: -O3, and -O4 (available since clang-3.4)
  • gcc: -O3, -O3 -march=native (since gcc-4.2), and -O3 -march=native -flto (since gcc-4.5)

When combined with all the clang and gcc versions, this amounts to 43 combinations. I haven't done the tests for the Intel C Compiler.

BTW if you're interested in more detailed results, see this spreadsheet: https://docs.google.com/spreadsheets/d/1DMn3e1oaYeBAhcCgbAzH4JfkBCjc06qBEqOgh9n7nkE/edit, or download this OpenDocument spreadsheet (same data).

pgbench

For the small dataset (~150MB), the results of the read-only test are depicted on the following chart, showing the number of transactions per second (50k-58k range) - so the higher the number, the better.

The results are sorted by compiler (clang, gcc), optimization level and finally compiler version. The bars depict minimum, avera

[continue reading]

I'm currently in love with Ghost, a Node.js blogging platform.  It's simple and small, uses Markdown for blog editing, and has scads of themes available, most of which are responsive and typography-centric.  If there's a project which might be a Wordpress-killer -- especially for PostgreSQL users -- Ghost is probably it.  I've set up both my food and pottery blog and the new SFPUG page using Ghost.

Ghost comes configured by default to use SQLite, though, and of course I had to run it on Postgres.  Also, I needed to set up Apache vhosts to redirect to the various Ghost instances, since one Node instance only runs one Ghost instance, and needed supervisord for autostart.  At some point later, I'll repackage all this to use docker containers for each Ghost instance, but these instructions don't cover that.

First, install Node.js.  OS packages are fine for Ubuntu 14.04 or Fedora; for other OSes you may need to get funky with source installs.  I'm not going to cover that here.  You'll also need to install the Node PostgreSQL driver, and the postgresql-server-dev package.
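
Somewhere along the way you will also want a database for Ghost to connect to; a minimal sketch (role and database names are my own — point Ghost's config.js at whatever you choose):

    CREATE ROLE ghost LOGIN PASSWORD 'change-me';
    CREATE DATABASE ghost_prod OWNER ghost;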

Second, "git clone" the stable branch of the Ghost code into an appropriate directory.  This can be directly your web directory for Ghost, or you can download, build, and then copy.  If you do the latter, be aware that Ghost has several "dot" files it needs which a regular "cp -r" won't pick up.

Now, before you build Ghost, you're going to need to make some changes to support PostgreSQL.  Edit "package.json" in the main directory, and add PostgreSQL into the "optional dependencies" section like so:

    "optionalDependencies": {
        "mysql": "2.1.1",
        "pg" : "latest"
    },


Now, build Ghost by running the build commands from the home directory:

    npm install -g grunt-cli
    npm install
    grunt init
    grunt prod


Make sure you scroll through the output of the above commands and look for errors; they don't always fail on error.

As a note, Ghost, like Rails, has "dev", "stage", and "production" modes.  Each of these modes can use

[continue reading]

Posted by Petr Jelínek in 2ndQuadrant on 2014-11-28 at 17:35:20

In the last couple of months I’ve been working on online upgrade for very large databases as part of the AXLE project, and I would like to share my thoughts on the topic and what progress we have made recently.

Before joining 2ndQuadrant I used to work in Skype where the business would not allow a maintenance window for our databases. This meant no downtime was allowed for deployments, upgrades, etc. That kind of rule makes you change the way you do things. Most changes are small, you don’t do any heavy locks, you have replicas to allow for fast fail-over. But while you can make your releases small and non-blocking, what happens when you need to do a major version upgrade of the PostgreSQL database?

You might be in a different situation, as most companies do have an upgrade window, and so you might afford some downtime during the upgrade. This however brings two problems. For one, no company actually likes downtime even if it is allowed. And more importantly, once your database grows beyond gigabytes in size into the range of terabytes or hundreds of terabytes, the downtime can take days or even weeks and nobody can afford to stop their operations for that long. The result is that many companies skip important upgrades, making the next one even more painful. And the developers are missing new features and performance improvements. The companies sometimes even risk running a PostgreSQL version that is no longer supported and has known data corruption or security problems. In the following paragraphs I will talk a little about my work on making upgrades less time consuming and as a result less painful and hopefully more frequent.

Let me start with a little history first. Before PostgreSQL 9.0 the only way to do a major version upgrade was to run pg_dump and restore the dump into an instance running a newer version of PostgreSQL. This method required the structure and all data to be read from the database and written into a file. Then read from the file and inserted into a new database, indexes have

[continue reading]

As is now our tradition, we will not have a regular meeting this month. Go to the Winter Coders’ Social instead.

We are, however, having a lab night to close out the year!

When: Thu Dec 4, 6pm-8pm
Where: Renewable Funding, 400 SW 6th Ave

Mark Wong is going to take us on a tour of Bi-Directional Replication.

This lab will require some prep work, as BDR uses a patched version of 9.4. Don’t let that put you off attending though, as you can always pair with someone else.

We do need to know who’s coming so we can notify building security, so please sign up in advance here.

Grab dinner at the carts & come on over!


The extension module pageinspect has already been bumped to version 1.3 in PostgreSQL 9.5 with the addition of functions for BRIN indexes. A couple of days back a new set of functions was added for GIN indexes with this commit.

commit: 3a82bc6f8ab38be3ed095f1c86866900b145f0f6
author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
date: Fri, 21 Nov 2014 11:46:50 +0200
Add pageinspect functions for inspecting GIN indexes.

Patch by me, Peter Geoghegan and Michael Paquier, reviewed by Amit Kapila.

This consists of a set of 3 functions that take as argument the raw content of a relation page (fetched using get_raw_page):

  • gin_page_opaque_info, able to fetch information about the flags located in the opaque area of a GIN page.
  • gin_metapage_info, able to translate information located in the meta page of a GIN index.
  • gin_leafpage_items, giving some information about the content of a GIN leaf page.

Such functions are useful for developers. First let's create an index using the GIN operators of pg_trgm on the book "Les Miserables", English translation.

=# CREATE EXTENSION pg_trgm;
CREATE EXTENSION
=# CREATE TABLE les_miserables (num serial, line text);
CREATE TABLE
=# COPY les_miserables (line) FROM '/path/to/les_miserables.txt';
COPY 68116
=# CREATE INDEX les_miserables_idx ON les_miserables
USING gin (line gin_trgm_ops);
CREATE INDEX

First, gin_page_opaque_info provides information about the status of a page (plus extra details like the right-link page, if any). Here is for example the status of the meta page of the previous index and of one of its leaf pages.

=# SELECT * FROM gin_page_opaque_info(get_raw_page('les_miserables_idx', 0));
 rightlink  | maxoff | flags
------------+--------+--------
 4294967295 |      0 | {meta}
(1 row)
=# SELECT * FROM gin_page_opaque_info(get_raw_page('les_miserables_idx', 3));
 rightlink | maxoff |         flags
-----------+--------+------------------------
         5 |      0 | {data,leaf,compressed}
(1 row)

Using gin_metapage_info, a direct visual of what is stored in Gin

[continue reading]