Posted by Andrew Dunstan in pgExperts on 2015-04-18 at 15:22:00
Unfortunately there was a small bug in yesterday's buildfarm client release. The bug only affects MSVC builds, which would fail silently on the HEAD (master) branch.

There is a bug fix release available at http://www.pgbuildfarm.org/downloads/releases/build-farm-4_15_1.tgz or you can just pick up the fixed version of run_build.pl (the only thing changed) at https://raw.githubusercontent.com/PGBuildFarm/client-code/b80efc68c35ef8a1ced37b57b3d19a98b8ae5dd2/run_build.pl

Sorry for the inconvenience.
We all know that the \copy command does not return anything when you load data. The idea here is to capture how many records got loaded into the table through the \copy command.
Here's a shell script that should work:
echo number of rows in input: $(wc -l data.in)
( echo "\copy test from stdin delimiter '|';" ; cat data.in ) | psql -v ON_ERROR_STOP=1
echo psql exit code $?

If the exit code printed is 0, everything went well, and the value printed by the first echo can be used to indicate how many rows were inserted. If the printed exit code is non-zero, no rows were inserted, of course. If the exit code printed is 3, then the data being copied had some error.

From the docs: if the exit code printed is 1 or 2, then something went wrong in psql itself (for example, it ran out of memory) or the server connection was broken, respectively. The following facts play a role in the above script:

.) COPY (and hence \copy) expects the input records to be terminated by a newline, so counting the number of newlines in the input is a reliable way of counting the records inserted.
.) psql exits with code 3 if and only if there's an error in the script and ON_ERROR_STOP is set.
Note: This seems to not apply to the `psql -c "sql command"` construct.
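As a side note, the same load can also be driven from a psql script file instead of a shell pipeline. Here is a minimal sketch, assuming the script is saved as load.sql and run with psql -f load.sql (the file name is illustrative):

-- load.sql: stop at the first error so the exit code is meaningful
\set ON_ERROR_STOP on
\copy test from 'data.in' delimiter '|'

With ON_ERROR_STOP set, psql -f exits with the same codes described above, so the surrounding shell logic does not change.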

# Example clean input

$ psql -c "create table test(a text,b int);"
CREATE TABLE
$ cat data.in
column1|2
column1|2
column1|2
column1|2
column1|2
column1|2

$ echo number of rows in input: $(wc -l data.in); ( echo "\copy test from stdin delimiter '|';" ; cat data.in ) | psql -v ON_ERROR_STOP=1 ; echo psql exit code $?
number of rows in input: 6 data.in
psql exit code 0

# Example malformed input
$ cat data.in
column1|2
column1|2
column1|2c
column1|2
column1|2
column1|2

$ echo number of rows in input: $(wc -l data.in); ( echo "\copy test from stdin delimiter '|';" ; cat data.in ) | psql -v ON_ERROR_STOP=1 ; echo psql exit code $?
number of rows in input: 6 data.in
ERROR: invalid input syntax for integer: "2c"
CONTEXT: COPY test, line 3, column b: "2c"
psql exit code 3
 
I hope this helps someone.
(Baji is trying to impress 'X')
==========
Baji: Packt Publishing has published a book on troubleshooting PostgreSQL databases.
 _X_: Uh, so what(!?). It published 4 other PostgreSQL books this year!
Baji: Yeah, I know!
 _X_: Then why do you care about thisssss?
Baji: I should care about it, as I was part of the technical reviewing team.. :(
 _X_: Oh really! That's fantastic.. Congratulations!
==========

Note: Finally, Baji impressed _X_ :-)

Ok, in reality, I am glad to announce that "My first book as a Technical Reviewer has been published by Packt Publishing" ;-)

https://www.packtpub.com/big-data-and-business-intelligence/troubleshooting-postgresql
http://my.safaribooksonline.com/book/databases/postgresql/9781783555314/troubleshooting-postgresql/pr02_html

The author of this book is Hans-Jürgen Schönig; he has a couple of other PostgreSQL books as well.

This book provides a series of valuable troubleshooting solutions for database administrators responsible for maintaining a PostgreSQL database. It is aimed at PostgreSQL administrators who have developed an application with PostgreSQL and need solutions to the common administration problems they encounter when managing a database instance. So give it a try ;-)

I would like to thank my loving parents for everything they did for me. Personal time always belongs to family, and I did this in my personal time.

I want to thank Packt Publishing for giving me this opportunity, and thanks to Sanchita Mandal and Paushali Desai for choosing me and working with me on this project.

Last but not least, I would like to thank Dinesh Kumar, who taught me PostgreSQL and inspired me to do this. :)
Posted by Amit Kapila in EnterpriseDB on 2015-04-18 at 03:38:00

I have run some benchmark tests to see the write performance/scalability in
PostgreSQL 9.5 and thought it would be good to share the results with others,
so I am writing this blog post.

I ran pgbench tests (a TPC-B (sort of) load) to compare the performance
difference between different modes and scale factors in HEAD (e5f455f5) on
IBM POWER-8 having 24 cores, 192 hardware threads, and 492GB of RAM,
and here are the performance results.

Some of the default settings used in all the tests are:
min_wal_size=15GB
max_wal_size=20GB
checkpoint_timeout = 35min
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.9
autovacuum=off

I kept autovacuum off to reduce the fluctuation it causes, and I dropped and
re-created the database after each run.  I kept high values of min_wal_size and
max_wal_size to reduce the effect of checkpoints; probably somewhat lower values
could have served the purpose for this workload, but I haven't tried it.

The data is mainly taken for 2 kinds of modes (synchronous_commit = on | off) and
at 2 different scale factors, to cover the case when all the data fits in shared buffers
(scale_factor = 300) and the case when all the data can't fit in shared buffers but can fit in
RAM (scale_factor = 3000).

First let's talk about the synchronous_commit = off case. Here, when all the data fits in
shared_buffers (scale_factor = 300), we see scalability up to 64 clients,
with TPS approximately 75 percent higher at 64 clients than at 8 clients,
which doesn't look bad. When all the data doesn't fit in shared buffers,
but fits in RAM (scale_factor = 3000), we see scalability up to 32 clients, with
TPS 64 percent higher than at 8 clients, and it falls off from there.

One major difference for writes when the data doesn't fit in shared_buffers is
that backends performing transactions need to write dirty buffers themselves
when they cannot find a clean buffer to read a page into; this can hamper
the TPS.

Now let's talk about synch

[continue reading]

PGDB has had anonymous blocks since the release of 9.0 in late 2010. But it must either be one of those features that got lost in the shuffle, or is otherwise considered too advanced, because I rarely see it used in the wild. If that’s the case, it’s a great shame considering the raw power it conveys. Without committing to a function, we can essentially execute any code in the database, with or without SQL input.
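For anyone who has never tried one, here is about the smallest possible anonymous block; it is a throwaway sketch (the RAISE NOTICE body is purely illustrative) that runs once, returns nothing, and leaves no function object behind:

DO $$
BEGIN
  -- any procedural code can go here; this just reports the current database
  RAISE NOTICE 'connected to %', current_database();
END;
$$;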

Why is that good? One potential element of overhead when communicating with a database is network transfer. If processing millions of rows, forcing PGDB to allocate and push those results over the network will be much slower than manipulating them locally within the database itself. However, the looming specter of ad-hoc scripts is always a threat as well.

It was the latter scenario that prompted this particular discussion. A few weeks ago, I addressed date-based constraints and how they’re easy to get wrong. Knowing this, there’s a good chance we have objects in our database that need revision in order to operate properly. In one particular instance, I needed to correct over 800 existing check constraints that an automated system had built over the last year.

I hope you can imagine that’s not something I would want to do by hand. So it was a great opportunity to invoke an anonymous block, because there’s very little chance I’d need to do this regularly enough to justify a fully-fledged function. In the end, I came up with something like this:

DO $$
DECLARE
  chk TEXT;
  col TEXT;
  edate DATE;
  sdate DATE;
  tab TEXT;
  ym TEXT;
BEGIN
  FOR tab, chk, col IN 
      SELECT i.inhrelid::REGCLASS::TEXT AS tab,
             co.conname AS cname,
             substring(co.consrc FROM '\w+') AS col
        FROM pg_inherits i
        JOIN pg_constraint co ON (co.conrelid = i.inhrelid)
       WHERE co.contype = 'c'
  LOOP
    ym := substring(tab FROM '......$');
    sdate := to_date(ym, 'YYYYMM01');
    edate := sdate + INTERVAL '1 mon';

    EXECUTE 'ALTER TABLE ' || tab || ' DROP CONSTRAINT ' ||
        quote_ident(ch

[continue reading]

Posted by Andrew Dunstan in pgExperts on 2015-04-17 at 14:34:00
I have just released version 4.15 of the PostgreSQL Buildfarm Client. Here's what's changed:
  • support the new location for pg_upgrade
  • support running tests of client programs
  • support building, installing and running testmodules
  • use a default ccache directory
  • improve logging when running pg_upgrade tests
  • handle odd location of Python3 regression tests
  • add timestamp to default log_line_prefix
  • make qw() errors in the config file fatal (helps detect errors)
  • minor bug fixes for web script settings.
  • allow for using linked git directories in non-master branches
The last item might need a little explanation.  Essentially, this can quite dramatically reduce the amount of space required if you are building on more than one branch. Instead of keeping, say, six checked-out repos for the six currently tested branches, we keep one and link all the others to it. This works almost exactly the way git-new-workdir does (I stole the logic from there). This doesn't work in a couple of situations: if you are using Windows or if you are using git-reference. In those cases the new setting is simply ignored.

To enable this new setting in an existing installation, do the following after installing the new release:
  • in your config file, add this setting:
    git_use_workdirs => 1,
  • remove the pgsql directory in each branch directory other than HEAD
Another good thing to do in existing installations would be to add "%m" to the beginning of the log_line_prefix setting in the extra_config stanza.

Enjoy!
Posted by Peter Eisentraut on 2015-04-17 at 00:00:00

About two months ago, this happened:

And a few hours later:

It took a few more hours and days after this to refine some details, but I have now tagged the first release of this extension. Give it a try and let me know what you think. Bug reports and feature requests are welcome.

(I chose to name the data type uri instead of url, as originally suggested, because that is more correct and matches what the parsing library calls it. One could create a domain if one prefers the other name or if one wants to restrict the values to certain kinds of URIs or URLs.)
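For example, a minimal sketch of that domain idea might look like the following (this assumes the extension and its data type are both installed under the name uri, and the text-based check is purely illustrative):

-- assumed extension/type name: uri
CREATE EXTENSION uri;

-- provide the alternative name, if you prefer "url"
CREATE DOMAIN url AS uri;

-- or restrict values to http(s) URIs via a simple check on the text form
CREATE DOMAIN http_uri AS uri
    CHECK (VALUE::text ~* '^https?://');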

(If you are interested in storing email addresses, here is an idea.)

Posted by Josh Berkus in pgExperts on 2015-04-16 at 19:07:00
So ... you're using some of 9.4's new advanced aggregates, including FILTER and WITHIN GROUP.  You want to take some statistical samples of your data, including median, mode, and a count of validated rows.  However, your incoming data is floats and you want to store the samples as INTs, since the source data is actually whole numbers.  Also, COUNT(*) returns BIGINT by default, and you want to round it to INT as well.  So you do this:

    SELECT
        device_id,
        count(*)::INT as present,
        count(*)::INT FILTER (WHERE valid) as valid_count,
        mode()::INT WITHIN GROUP (order by val) as mode,
        percentile_disc(0.5)::INT WITHIN GROUP (order by val)
          as median
    FROM dataflow_0913
    GROUP BY device_id
    ORDER BY device_id;


And you get this unhelpful error message:

    ERROR:  syntax error at or near "FILTER"
    LINE 4:         count(*)::INT FILTER (WHERE valid)
            as valid_count,


And your first thought is that you're not using 9.4, or you got the filter clause wrong.  But that's not the problem.  The problem is that "aggregate() FILTER (where clause)" is a syntactical unit, and cannot be broken up by other expressions.  Hence the syntax error.  The correct expression is this one, with parens around the whole expression and then a cast to INT:

    SELECT
        device_id,
        count(*)::INT as present,
        (count(*) FILTER (WHERE valid))::INT as valid_count,
        (mode() WITHIN GROUP (order by val))::INT as mode,
        (percentile_disc(0.5) WITHIN GROUP (order by val))::INT
           as median
    FROM dataflow_0913
    GROUP BY device_id
    ORDER BY device_id;


If you don't understand this and you use calculated expressions, you can get a worse result: one which does not produce an error but is nevertheless wrong.  For example, imagine that we were, for some dumb reason, calculating our own average over validated rows.  We might do this:

    SELECT
        device_id,
        sum(val)/count(*) FILTER (WHERE valid) as avg
    FROM dataflow_0913
  &nbs

[continue reading]

Author
Andreas 'ads' Scherbaum

PGConf.de 2015 is the sequel to the highly successful German-speaking PostgreSQL Conferences of 2011 and 2013. Due to space limitations at the old location, we are moving to Hamburg. The conference takes place on Friday, November 27th. We are also adding a day of training sessions on the 26th.

http://2015.pgconf.de/

Registration for the conference will open well in advance. Tickets must be purchased online. For sponsors, we have put together a package that includes, among other things, a number of discounted tickets. More details will follow in the Call for Sponsors in a separate announcement.

Yesterday the pg_upgrade program was moved from contrib to bin in the source tree. Unfortunately this broke most of those buildfarm members which check pg_upgrade. There is a hot fix for the TestUpgrade buildfarm module that can be downloaded from github. I will work on cutting a new buildfarm release in the next few days, but this file can just be dropped in place on existing installations.

log_autovacuum_min_duration is a system-wide parameter controlling the threshold above which autovacuum activity is logged in the system logs. Anyone who has looked at a system where a given set of tables is bloated has surely been annoyed by the fact that even a high value of log_autovacuum_min_duration offers no guarantee of reducing log spam: not-much-bloated tables whose autovacuum runs take longer than the threshold still get their activity logged (and it was after working on such a system that the author of this feature wrote a patch for it). Postgres 9.5 is coming with a new feature allowing this logging threshold to be controlled at the relation level, introduced by this commit:

commit: 4ff695b17d32a9c330952192dbc789d31a5e2f5e
author: Alvaro Herrera <alvherre@alvh.no-ip.org>
date: Fri, 3 Apr 2015 11:55:50 -0300
Add log_min_autovacuum_duration per-table option

This is useful to control autovacuum log volume, for situations where
monitoring only a set of tables is necessary.

Author: Michael Paquier
Reviewed by: A team led by Naoya Anzai (also including Akira Kurosawa,
Taiki Kondo, Huong Dangminh), Fujii Masao.

This parameter can be set via CREATE TABLE or ALTER TABLE, with the default value being the one defined by the equivalent parameter at the server level, for example:

=# CREATE TABLE vac_table (a int) WITH (log_autovacuum_min_duration = 100);
CREATE TABLE
=# ALTER TABLE vac_table SET (log_autovacuum_min_duration = 200);
ALTER TABLE

Note that this parameter cannot be given with a unit suffix like some other options can; its value is always interpreted as milliseconds. So after the CREATE TABLE above, the autovacuum activity of relation vac_table is logged if a run takes more than 100ms, and more than 200ms after the ALTER TABLE.
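If you want to double-check what has been stored for a relation, the option is visible in its reloptions; a quick catalog query, shown here as a hedged example:

=# SELECT reloptions FROM pg_class WHERE relname = 'vac_table';

After the ALTER TABLE above, this should list log_autovacuum_min_duration=200 among the options.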

Thinking more broadly, there are basically two cases where this parameter is useful, an inclusive and an exclusive one:

  • when system-wide log_autovacuum_min_duration is -1, meaning that all the autovacuum activity is ignored for all th

[continue reading]

Posted by Rajeev Rastogi on 2015-04-14 at 05:47:00
I recently got back from the Indian PGDay conference 2015. It was interesting and motivating, with a lot of knowledge sharing, both in attending and in speaking at the conference.

I spoke about the various kinds of "Optimizer Hints" provided by many database engines, and also about a new idea for a "Hint" that can be provided to the optimizer. Some of the speakers shared their work on PostgreSQL as users.
It was also interesting to learn that many companies are evaluating migration, or are in the process of migrating, from other databases to PostgreSQL. This is really encouraging for all PostgreSQL experts.



Some of the details from the presentation are below (for the complete presentation, please visit Full Optimizer Hint).

Statistics Hint:

A Statistics Hint is used to provide any kind of statistics related to the query, which the optimizer can use to yield an even better plan than it would have produced otherwise.
Most databases store statistics for a particular column or relation, but they do not store statistics related to a join of columns or relations. Instead, these databases just multiply the statistics of the individual columns/relations to get the statistics of the join, which is not always correct.

Example:
Let's say there is a query such as
SELECT * FROM EMPLOYEE WHERE GRADE>5 AND SALARY > 10000;

Suppose we calculate independent stats for GRADE and SALARY, and
sel(GRADE) = .2 and sel(SALARY) = .2;

then sel (GRADE and SALARY) =
sel(GRADE) * sel (SALARY) = .04.
 
In many practical cases, these two predicates will be highly correlated, i.e. if the first column satisfies the condition, the second will also satisfy it. In that case sel(GRADE and SALARY) should be .2, not .04, so the current optimizer will be incorrect in this case and may produce a poor plan.

Data Hint:

This kind of hint provides information about relationships/dependencies among relations or columns to influence the plan, instead of directly hinting at a desired plan or a selectivity value. The optimizer can consider the dependency information to derive the actual selectivi

[continue reading]

Posted by Josh Berkus on 2015-04-13 at 18:24:03

The South Bay PUG is having a meetup on April 28th. Speakers will include CK Tan of PostgreSQL enhancement company Vitesse Data, and Heikki Linnakangas, PostgreSQL Committer. We do not expect to have live video for this meetup, sorry!

RSVP on the Meetup Page.

For 2015, once again, Gartner’s Magic Quadrant for Business Intelligence and Analytics Platforms ranks Tableau pretty much at the top. How powerful would it be to combine Tableau with the world’s most advanced open source database, PostgreSQL, to create reports and analytics for businesses? This HOWTO walks you through the process of connecting Tableau with PostgreSQL and creating a simple report.

This HOWTO is based on the following configuration:

  • Windows 7
  • Tableau 9 Desktop
  • PostgreSQL 9.4.1

You can download the database used in this post from here: Database SQL.

Let’s create our first report!

Connecting to PostgreSQL

1) Launch Tableau. Click ‘More Servers …’ and select PostgreSQL from the menu as illustrated in the following screenshot:

 

2) Fill in the PostgreSQL connection properties and hit OK. If you don’t already have the required connection libraries installed, you will get an error as seen in the following screenshot.

 

3) The (rather helpful) error dialog provides a link to download the libraries, click on it (requires working internet connection). This should take you to the driver’s section on Tableau’s official website. Locate PostgreSQL and download the corresponding setup file. See following screenshot for reference:

 

4) Run the downloaded file (may require administrator privileges). This will set up the ODBC drivers and all system configuration required for PostgreSQL connectivity. Once setup is complete, run Tableau Desktop again and connect to the PostgreSQL database you downloaded before the 1st step.

Creating a simple report

1) Once connected, you’ll find yourself on the Data Source tab. It lists the server, database and the tables as shown in the screen below.

 

2) You can drag and drop the tables you want to use. Alternatively, you can write a custom SQL for the required dataset. For the purpose of this blog, I have dragged and dropped ‘sales’ table. You can take a peek at a preview of the dataset in the result section of the window. Please note that you may have to c

[continue reading]

Posted by David E. Wheeler on 2015-04-11 at 04:34:08

Looks like it’s been close to two years since my last post on the PGXN blog. Apologies for that. I’ve thought for a while maybe I should organize an “extension of the week” series or something. Would there be interest in such a thing?

Meanwhile, I’m finally getting back to posting to report on a fun thing you can now do with your PGXN distributions. Thanks to the Version Badge service from the nice folks at Gemfury, you can badge your distributions! Badges look like this:

PGXN version

You’ve no doubt seem simlar badges for Ruby, Perl, and Python modules. Now the fun comes to PGXN. Want in? Assuming you have a disribution named pgfoo, just put code like this into the README file:

[![PGXN version](https://badge.fury.io/pg/pgfoo.svg)](https://badge.fury.io/pg/pgfoo)

This is Markdown format; use the syntax appropriate to your preferred README format to get the badge to show up on GitHub and PGXN.

That’s it! The badge will show the current release’s version on PGXN, and the button will link through to PGXN.

Use Travis CI? You can badge your build status, too, as I’ve done for pgTAP, like this:

Build Status

[![Build Status](https://travis-ci.org/theory/pgtap.png)](https://travis-ci.org/theory/pgtap)

Coveralls provides badges, too. I’ve used them for Sqitch, though I’ve not yet taken the time to figure out how to do coverage testing with PostgreSQL extensions. If you have, you can badge your current coverage like so:

Coverage Status

[![Coverage Status](https://coveralls.io/repos/theory/sqitch/badge.svg)](https://coveralls.io/r/theory/sqitch)

So get badging, and show off your PGXN distributions on GitHub and elsewhere!

Fairly recently, a friend of mine presented a problem he wanted to solve with some JSON he had in a table. After he presented the end result he was trying to reach, I made the assumption that this would be pretty easy to do. But then I looked at the JSON Functions to try and find that quick fix. Though I read extensively and used rather liberal interpretations of the functions, there’s no way to directly manipulate JSON object contents with PostgreSQL.

Wait! Before you start yelling at me for being an idiot, I know what you’re going to say. I thought the same thing… at first. Go ahead and look, though. As of PostgreSQL 9.4, there is no built-in functionality to add or remove JSON elements without one or more intermediate transformation steps through PostgreSQL arrays or records. But that isn’t necessarily unexpected. Why?

Because PostgreSQL is a database. Its primary purpose is to store data and subsequently extract and view it. From this perspective, there’s no reason for PostgreSQL to have an entire library of JSON modification functions or operators. Regardless of this however, actions such as data merges and bulk updates still need to be possible. Yet all other fields allow a single update statement to append information, or otherwise perform a native calculation to replace the value in-line. There must be a way to do this with JSON too, without jumping through too many burning hoops.

Luckily there is, but it does require some preliminary work. Let’s start with a simple JSON document, as seen by PostgreSQL:

SELECT '{"Hairy": true, "Smelly": false}'::JSON;

               json
----------------------------------
 {"Hairy": true, "Smelly": false}

Ok. Now, how would I add an attribute named “Wobbly”? Well, I could pull the data into an external application, add it, and store the result. But suppose this was in a table of millions of records? That’s probably the least efficient way to modify them. This could be parallelized to a certain extent, but that requires a lot of scaffolding code and is way too much wor

[continue reading]

Posted by gabrielle roth on 2015-04-10 at 01:00:00

When: 6-8pm Thursday April 16, 2015
Where: Iovation
Who: Eric Hanson
What: Aquameta release!

Eric Hanson will give a tutorial for how to build applications with Aquameta, an open source web development platform built entirely in PostgreSQL. Aquameta is about to be launched as open source, so we’ll do a quick launch recap, and then dive into the tutorial.


Our meeting will be held at Iovation, on the 32nd floor of the US Bancorp Tower at 111 SW 5th (5th & Oak). It’s right on the Green & Yellow Max lines. Underground bike parking is available in the parking garage; outdoors all around the block in the usual spots. No bikes in the office, sorry!

Elevators open at 5:45 and building security closes access to the floor at 6:30.

See you there!


Recently Simon Riggs committed a patch by himself and Andreas Karlsson to reduce the lock strength required by certain ALTER TABLE commands, including those to enable or disable triggers. Now the lock level required is SHARE ROW EXCLUSIVE instead of ACCESS EXCLUSIVE. That means it doesn't block SELECT commands any more, and isn't blocked by them, although it will still block and be blocked by INSERT, UPDATE and DELETE operations. Very nice.

However, without formally disabling a trigger you can tell it dynamically not to do anything in the current session without taking any locks at all. Here's a little bit of PLpgsql code I wrote recently for this sort of operation in an INSERT trigger:
    begin
        disabled := current_setting('mypackage.foo_trigger_disabled');
    exception
        when others then disabled := 'false';
    end;
    if disabled = 'true' then
        return NEW;
    end if;
Note that this will only block the trigger from doing anything in sessions where this variable is set. But that's often exactly what you want. In the case this was written for, the trigger is redundant (and expensive) for certain bulk operations, but required for normal operations.  So in a session where we are performing the bulk operation, we can simply set this and avoid taking out a heavy lock on the table, and do this instead, before running our bulk operation:
    set mypackage.foo_trigger_disabled = 'true';
The code above is a bit ugly because of the requirement for the exception handler. There's a cure for that coming, too. David Christensen has submitted a patch to provide a form of current_setting() which will return NULL for unset variables instead of raising an exception.

Note, too, that you could use a value in a one-row one-column table if you wanted something that could apply in all sessions, not just the current session. It would be a bit less efficient, though. This mechanism is pretty light-weight.
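For context, here is a hedged sketch of how a fragment like the one above might sit inside a complete INSERT trigger (all names, including the table and the custom setting, are illustrative rather than the exact code from the post):

CREATE OR REPLACE FUNCTION foo_insert_trigger()
RETURNS trigger
LANGUAGE plpgsql
AS $body$
DECLARE
    disabled text;
BEGIN
    BEGIN
        disabled := current_setting('mypackage.foo_trigger_disabled');
    EXCEPTION
        WHEN others THEN disabled := 'false';
    END;
    IF disabled = 'true' THEN
        RETURN NEW;  -- skip the expensive work in sessions that opted out
    END IF;

    -- ... the normal (expensive) trigger logic would go here ...
    RETURN NEW;
END;
$body$;

CREATE TRIGGER foo_insert_trg
    BEFORE INSERT ON foo
    FOR EACH ROW EXECUTE PROCEDURE foo_insert_trigger();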
Posted by Umair Shahid in Stormatics on 2015-04-07 at 11:51:10

In a previous post, I talked about the excitement that surrounds NoSQL support in PostgreSQL. Today, I will dive a little deeper into the technical details of native JSON support that makes NoSQL possible.

Below, I explain how to use some basic functions and operators.

Creating a table with JSON column

So this is fairly simple. You can declare a JSON column just like a column of any other data type. Below, I create a table ‘sales’ with 2 columns, ‘id’ and ‘sale’, with the latter being a JSON.

json_sample=# CREATE TABLE sales (id INT, sale JSON);
CREATE TABLE

Inserting JSON data

Insertion of data is pretty close to that of any other data type as well, except that you have to make sure the data is in a valid JSON format. Below, I am inserting 4 records into the table, each with a JSON containing nodes for ‘customer_name’ and a nested JSON for ‘items’ containing a ‘description’ and purchased ‘quantity’.
John bought 4 cartons of milk:

json_sample=# INSERT INTO sales 
                   VALUES (1,'{ "customer_name": "John", "items": { "description": "milk", "quantity": 4 } }');
INSERT 0 1

Susan bought 2 loaves of bread:

json_sample=# INSERT INTO sales 
                   VALUES (2,'{ "customer_name": "Susan", "items": { "description": "bread", "quantity": 2 } }');
INSERT 0 1

Mark bought a dozen bananas:

json_sample=# INSERT INTO sales 
                   VALUES (3,'{ "customer_name": "Mark", "items": { "description": "bananas", "quantity": 12 } }');
INSERT 0 1

Jane bought a pack of cereal:

json_sample=# INSERT INTO sales 
                   VALUES (4,'{ "customer_name": "Jane", "items": { "description": "cereal", "quantity": 1 } }');
INSERT 0 1

Retrieving JSON data

Like insertion, retrieving the JSON formatted data is fairly straightforward as well. Below, I am retrieving the data I inserted in the previous section.

json_sample=# SELECT * FROM sales;
id |                                      sale
----+------------------------------------------------------------------------------------
1 | { "c

[continue reading]

On 31st of March, Bruce Momjian committed patch: psql: add asciidoc output format   Patch by Szymon Guz, adjustments by me   Testing by Michael Paquier, Pavel Stehule To be honest, when Szymon posted first mail about asciidoc – it was the first time I heard about it. Immediately I thought: “why not markdown? or […]
Posted by Andrew Dunstan in pgExperts on 2015-04-06 at 23:29:00
Here's a picture of my two Raspberry Pi 2 boxes, both running headless and wireless.


One is running Raspbian, installed via NOOBS, and the other Fidora, a remix of Fedora 21 for Raspberry Pi 2. It turned out that Pidora doesn't work on the Raspberry Pi 2, a fact that is extremely well hidden on the Raspberry Pi web site.

I have set up test buildfarm animals on both of these. But something odd is happening. They are both getting intermittent failures of the stats regression test. Sometimes it happens during "make check", sometimes during "make installcheck" and sometimes during testing of pg_upgrade (which in turn runs "make installcheck").

These machines are not anything like speed demons. Far from it. But we also run other slow machines without getting this happening all the time. So I'm a bit perplexed about what might be going on.

Incidentally, if you want to play with one of these, I do recommend getting a starter kit from Amazon or elsewhere. It's probably cheaper than buying everything separately, and gets you everything you need to get started. Well worth the $69.99.

Posted by David Christensen in End Point on 2015-04-06 at 22:10:00
I recently got back from PGConf 2015 NYC.  It was an invigorating, fun experience, both attending and speaking at the conference.

What follows is a brief summary of some of the talks I saw, as well as some insights/thoughts:

On Thursday:

"Managing PostgreSQL with Puppet" by Chris Everest.  This talk covered experiences by CoverMyMeds.com staff in deploying PostgreSQL instances and integrating with custom Puppet recipes.

"A TARDIS for your ORM - application level timetravel in PostgreSQL" by Magnus Hagander. Demonstrated how to construct a mirror schema of an existing database and manage (via triggers) a view of how data existed at some specific point in time.  This system utilized range types with exclusion constraints, views, and session variables to generate a similar-structured schema to be consumed by an existing ORM application.

"Building a 'Database of Things' with Foreign Data Wrappers" by Rick Otten.  This was a live demonstration of building a custom foreign data wrapper to control such attributes as hue, brightness, and on/off state of Philips Hue bulbs.  Very interesting live demo, nice audience response to the control systems.  Used a python framework to stub out the interface with the foreign data wrapper and integrate fully.

"Advanced use of pg_stat_statements: Filtering, Regression Testing & More" by Lukas Fittl.  Covered how to use the pg_stat_statements extension to normalize queries and locate common performance statistics for the same query.  This talk also covered the pg_query tool/library, a Ruby tool to parse/analyze queries offline and generate a JSON object representing the query.  The talk also covered the example of using a test database and the pg_stat_statements views/data to perform query analysis to theorize about planning of specific queries without particular database indexes, etc.

On Friday:

"Webscale's dead! Long live Postgres!" by Joshua Drake.  This talk covered improvements that PostgreSQL has made over the years, specific technologies that they have incorporated

[continue reading]

On 27th of March, Andrew Dunstan committed patch: Add stats for min, max, mean, stddev times to pg_stat_statements.   The new fields are min_time, max_time, mean_time and stddev_time.   Based on an original patch from Mitsumasa KONDO, modified by me. Reviewed by Petr Jelínek. While pg_stat_statements provides a lot of information about statements, it's timing […]
Posted by Paul Ramsey on 2015-04-06 at 00:00:00

Due to a critical bug in GeoJSON ingestion we have made an early release of versions 2.0.7 and 2.1.7. If you are running an earlier version on a public site and accepting incoming GeoJSON, we recommend that you update as soon as possible.

http://download.osgeo.org/postgis/source/postgis-2.1.7.tar.gz

http://download.osgeo.org/postgis/source/postgis-2.0.7.tar.gz

View all closed tickets for 2.0.7.

On 26th of March, Heikki Linnakangas committed patch: Add support for index-only scans in GiST.   This adds a new GiST opclass method, 'fetch', which is used to reconstruct the original Datum from the value stored in the index. Also, the 'canreturn' index AM interface function gains a new 'attno' argument. That makes it possible […]
Andrew Gierth's numeric abbreviated keys patch was committed recently. This commit added abbreviation/sortsupport for the numeric type (the PostgreSQL type which allows practically arbitrary precision, typically recommended for representing monetary values).

The encoding scheme that Andrew came up with is rather clever - it has an excellent tendency to concentrate entropy from the original values into the generated abbreviated keys in real world cases. As far as accelerating sorts goes, numeric abbreviation is at least as effective as the original text abbreviation scheme. I easily saw improvements of 6x-7x with representative queries that did not spill to disk (i.e. that used quicksort). In essence, the patch makes sorting numeric values almost as cheap as sorting simple integers, since that is often all that is actually required during sorting proper (the abbreviated keys compare as integers, except that the comparison is inverted to comport with how abbreviation builds abbreviated values from numerics as tuples are copied into local memory ahead of sorting - see the patch for exact details).

Separately, over lunch at pgConf.US in New York, Corey Huinker complained about a slow, routine data warehousing CREATE INDEX operation that took far too long. The indexes in question were built on a single text column. I suggested that Corey check out how PostgreSQL 9.5 performs, where this operation is accelerated by text abbreviation, often very effectively.

Corey chose an organic set of data that could be taken as a reasonable proxy for how PostgreSQL behaves when he performs these routine index builds. In all cases maintenance_work_mem was set to 64MB, meaning that an external tapesort is always required - those details were consistent. This was a table with 1.8 million rows. Apparently, on PostgreSQL 9.4, without abbreviation, the CREATE INDEX took 10 minutes and 19 seconds in total. On PostgreSQL 9.5, with identical settings, it took only 51.3 seconds - a 12x improvement! This was a low cardinality pre-sorted col

[continue reading]

This week, I wanted to share something that may impact many users of Postgres, with this commit changing a behavior whose default had remained the same for a long time:

commit: 0badb069bc9f590dbc1306ccbd51e99ed81f228c
author: Bruce Momjian <bruce@momjian.us>
date: Tue, 31 Mar 2015 11:46:27 -0400
pg_ctl:  change default shutdown mode from 'smart' to 'fast'

Retain the order of the options in the documentation.

pg_ctl has three shutdown modes:

  • smart, the polite one, waits patiently for all active client connections to disconnect before shutting down the server. This has been the default mode of Postgres for ages.
  • immediate, the brute-force one, aborts all the server processes without further thought, leading to crash recovery when the instance is next restarted.
  • fast, takes an intermediate approach, rolling back all existing connections and then shutting down the server.

Simply, the "smart" mode has been considered the default because it is the least distuptive, particularly it will wait for a backup to finish before shutting down the server. It has been (justly) discussed that it was not enough aggresive, users being sometimes surprised that a shutdown requested can finish with a timeout because a connection has been for example left open, hence the default has been switched to "fast".

This is not complicated stuff; however, be careful when switching to 9.5 if you have scripts that relied on the default behavior of pg_ctl, particularly for online backups, which will be immediately terminated at shutdown with the new default.

On 23rd of March, Heikki Linnakangas committed patch: Add pg_rewind, for re-synchronizing a master server after failback.   Earlier versions of this tool were available (and still are) on github.   Thanks to Michael Paquier, Alvaro Herrera, Peter Eisentraut, Amit Kapila, and Satoshi Nagayasu for review. So, we have a situation, where we have master […]
It’s generally a bad idea to do logical dump/restores of slony nodes, and for this reason slony provides the CLONE PREPARE/CLONE FINISH action commands to clone subscriber nodes. In this instance though, I’ve a test environment where I’d stopped the slons, dumped out and dropped a subscriber database and then gone on to do some […]
On 22nd of March, Tom Lane committed patch: Allow foreign tables to participate in inheritance.   Foreign tables can now be inheritance children, or parents. Much of the system was already ready for this, but we had to fix a few things of course, mostly in the area of planner and executor handling of row […]