
Big Data is a buzzword doing the rounds nowadays, and the tool people generally associate with Big Data is Hadoop. While it is true that the sheer volume of data Hadoop can efficiently and cost-effectively manage is unparalleled, not all businesses actually need Hadoop. What if you don’t want to adopt yet another fairly complex technology (Hadoop) that you would need to maintain and have in-house expertise in? What if you could do it all in PostgreSQL?

Thanks to its many rich and diverse features, I have found that PostgreSQL can do a good enough job for most business cases. I have already talked about handling unstructured data with JSON, JSONB, and HSTORE. Another feature that you can use for scaling is table partitioning.

Let me walk you through a very simple tutorial on how to partition tables in PostgreSQL to enable massive scale-out.

Before starting off, let’s create a new database and then a table that tracks events in it. The table simply stores the type of event, its time, and comments associated with it.

postgres=# CREATE DATABASE partition_sample;
CREATE DATABASE
partition_sample=# CREATE TABLE events (id SERIAL, type INTEGER, time TIMESTAMP WITH TIME ZONE, comments TEXT);
CREATE TABLE

 

1. Tables with inheritance and CHECK CONSTRAINT

The first step is to create the partitions, which will essentially be tables that inherit from ‘events’. Each of these tables will have a CHECK CONSTRAINT specifying exactly what range of values is allowed in it. Below, I am setting up 9 such tables.

Table to contain events of type ‘1’ occurring in January 2015:

partition_sample=# create table events_1_2015_01 
                           (CONSTRAINT events_type_time_check 
                           CHECK (time >= '2015-01-01' 
                             AND time < '2015-02-01' 
                             AND type = 1)) 
                           inherits(events);
CREATE TABLE

 

Table to contain events of type ‘2’ occurring in January 2015:

partition_sample=# create table events_2_2015_01 
          

[continue reading]

Most PostgreSQL developers don't use Windows, and many fairly openly regard the Windows port as a nuisance. But we've had it for more than a decade, now, and it's long since time we got over that. And checking that at least your changes compile is easy. It doesn't need a Windows box at all - Postgres builds for Windows quite happily via a cross-compiler. Here's exactly what I did on a Fedora 20 machine this morning:
sudo yum install mingw64-gcc mingw64-zlib
git clone git://git.postgresql.org/git/postgresql.git
cd postgresql
./configure --host=x86_64-w64-mingw32
make world
And it just worked. I haven't tested it on Windows (it might even be possible to test it using WINE), but even without testing, this procedure is useful to see if there are compilation errors.

I assume all this could be (and possibly has been) built for other platforms such as OSX and FreeBSD. If not, it might be worth putting together some packages for those platforms - it shouldn't be too hard.

PostgreSQL has json support – but you shouldn’t use it for the great majority of what you’re doing. This goes for hstore too, and the new jsonb type. These types are useful tools where they’re needed, but should not be your first choice when modelling your data in PostgreSQL, as it’ll make querying and manipulating it harder.

Some readers will be familiar with the EAV (anti-)pattern. EAV has been a necessary evil in database schemas for a long time. It’s something we all advise against using and look down on, then resort to when reality and business requirements mean not everything can be fit into rigid fields in properly modelled relations. For those who don’t know it, it’s a schema design where you turn a relational database into a poor-man’s key/value store using a table with object_id (“entity”), key (“attribute”) and value columns. Querying it usually involves lots and lots of self-joins.
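
For those who have never met it, here is a minimal sketch of a hypothetical EAV layout and the kind of self-join it forces (table and column names are illustrative only, not from any real schema):

CREATE TABLE entity_attributes (
    object_id integer NOT NULL,   -- the "entity"
    attribute text    NOT NULL,   -- the "attribute", i.e. the key
    value     text,               -- everything squeezed into text
    PRIMARY KEY (object_id, attribute)
);

-- Finding entities named "Bob" who live in "Thornlie" already needs one
-- self-join per extra attribute:
SELECT name.object_id
FROM entity_attributes AS name
JOIN entity_attributes AS town
  ON town.object_id = name.object_id AND town.attribute = 'town'
WHERE name.attribute = 'name'
  AND name.value = 'Bob'
  AND town.value = 'Thornlie';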

json is the new EAV – a great tool when you need it, but not something you should use as a first choice.

(Most of this also applies to PostgreSQL arrays, which are great tools for building queries, but not usually an appropriate way to store authoritative data).

JSON blob tables

Since the release of PostgreSQL 9.3, and even more since jsonb in 9.4, I’m seeing more and more Stack Overflow questions where people ask how to do joins on json objects, how to query for a nested key inside any array element, etc. All reasonable things to do, but when asked for schema and example data they’ll often post something like this:

CREATE TABLE people(
    id serial primary key,
    data jsonb not null
);

INSERT INTO people(data) VALUES ($$
{
    "name": "Bob",
    "addresses": [
        {
            "street": "Centre",
            "streetnumber": 24,
            "town": "Thornlie",
            "state": "WesternAustralia",
            "country": "Australia"
        },
        {
            "street": "Example",
            "streetnumber": "4/311-313",
            "town": "Auckland",
            "country": "NewZealand"
        }
    ],
    "p

[continue reading]

Posted by John DeSoi on 2015-05-05 at 03:07:40

I came across this motorcycle leaving the movies on Friday. Very cool!

Lately I've been experimenting with building semi-schemaless apps. These are apps where much of the data may never be used for reporting aside from storytelling, and where, as time goes by, some of it may be revisited and converted to more structured fields for easier roll-up and reporting. For the front-end UI, I'm using AngularJS, which naturally spits out data as JSON and can autobind to JSON data of any complexity. My stored functions in PostgreSQL take JSON blobs as input, split them out into various tables, and throw the whole thing in a jsonb field for later consumption (it's a bit redundant). Similarly, they return JSON back.

One of the things I wanted to be able to do was take this jsonb blob and tack on additional properties from well-structured fields, or even a whole set of data like sub-recordsets, to feed back to my app in JSON. While there are lots of functions in PostgreSQL 9.3/9.4 that can easily build json objects from records, aggregate rows, etc., I couldn't find a function that allowed me to just add a property to an existing JSON object, so I went to my tried and true old pal PL/V8 for some comfort. Here is a quickie function I created in PL/V8 that did what I needed. Hopefully it will be of use to others, or others might have other ideas of doing this that I missed.
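
As a rough sketch of the idea (assuming the plv8 extension is installed, and that plv8 converts json arguments to JavaScript objects as it does in recent versions; this is not necessarily the exact function behind the link below), it can be as small as this:

CREATE OR REPLACE FUNCTION json_add_property(obj json, key text, val json)
RETURNS json AS $$
    // plv8 hands the json argument over as a parsed JavaScript object
    obj[key] = val;
    return obj;   // and serializes the object back to json on the way out
$$ LANGUAGE plv8 IMMUTABLE;

SELECT json_add_property('{"name": "Bob"}'::json, 'active', 'true'::json);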


Continue reading "Adding properties to existing JSON object with PLV8"
On 26th of April, Peter Eisentraut committed patch: Add transforms feature. This provides a mechanism for specifying conversions between SQL data types and procedural languages. As examples, there are transforms for hstore and ltree for PL/Perl and PL/Python. Reviews by Pavel Stěhule and Andres Freund. After this was committed, there were subsequent patches […]
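
As a quick illustration of what the feature enables (a sketch assuming the hstore, plperl and hstore_plperl extensions that ship with this work are installed; this is not an excerpt from the patch itself):

CREATE EXTENSION hstore;
CREATE EXTENSION plperl;
CREATE EXTENSION hstore_plperl;

-- Thanks to the transform, the hstore argument arrives as a Perl hash
-- reference instead of a flat 'key=>value' string.
CREATE FUNCTION hstore_keys(h hstore) RETURNS text[]
TRANSFORM FOR TYPE hstore
AS $$
    my $h = shift;
    return [ sort keys %$h ];
$$ LANGUAGE plperl;

SELECT hstore_keys('a=>1, b=>2'::hstore);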

I’ve been asked this question in some shape or form at least 3 times, mostly by people puzzled about why they get this error. The last iteration went something like this:

I can’t use ST_AsPNG when doing something like

SELECT ST_AsPNG(rast) 
    FROM  sometable;
 

Gives error: Warning: pg_query(): Query failed: ERROR: rt_raster_to_gdal: Could not load the output GDAL driver.
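
The usual culprit (a hedged note here, not necessarily the full answer given in the post) is that PostGIS 2.1.3 and later disable all GDAL drivers by default, so raster export functions such as ST_AsPNG fail until the drivers are re-enabled:

-- On PostGIS 2.2+ the drivers can be enabled per session via a GUC;
-- on 2.1.x, set the POSTGIS_GDAL_ENABLED_DRIVERS environment variable
-- (e.g. to ENABLE_ALL) in the server's environment and restart.
SET postgis.gdal_enabled_drivers = 'ENABLE_ALL';
SELECT ST_AsPNG(rast) FROM sometable;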

Continue Reading by clicking title hyperlink ..

Last week, I covered how MVCC, PGDB’s storage system, works on a very high level. Near the end, I also mentioned that it doesn’t quite lend itself well to certain use cases, such as rapidly mutating session storage. Well, there is one caveat to that statement that I’d forgotten about because of its relatively limited utility: unlogged tables.

Here’s what the PGDB documentation has to say about unlogged tables:

Data written to unlogged tables is not written to the write-ahead log, which makes them considerably faster than ordinary tables. However, they are not crash-safe: an unlogged table is automatically truncated after a crash or unclean shutdown. The contents of an unlogged table are also not replicated to standby servers.

But what does this actually mean in the back end? That comes down to how databases interact with the underlying filesystem. In the case of PGDB, all writes to the database are actually first saved to a binary file that is decoupled from the actual data files. Then a periodic event called a checkpoint will commit those changes to the target table files. This way, crashes do not damage the database, the binary logs can be used to replicate to other instances, and backups can be rolled forward in time by applying transaction logs manually.

This is why unlogged tables have absolutely no crash protection, will not be copied to the replication stream, and get truncated if there’s any doubt regarding the contents. There’s no way to verify the data they contain, and they’ve been pruned from the very infrastructure that gives PGDB its durability. So what are they good for, then?
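
For reference, creating one takes nothing more than an extra keyword; here is a minimal sketch with a hypothetical session-storage table (not the test setup used below):

-- Same syntax as an ordinary table, it just skips the write-ahead log.
CREATE UNLOGGED TABLE session_store (
    session_id text PRIMARY KEY,
    payload    jsonb,
    updated_at timestamptz NOT NULL DEFAULT now()
);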

Let’s run a very short experiment. I commonly recommend any PostgreSQL server be set up with these three options in postgresql.conf:

wal_level = hot_standby
archive_mode = on
archive_command = 'exit 0'

Since archive mode can only be modified by restarting the server, it’s a good idea to just leave it enabled, but neutralized. PGDB does not archive transaction logs with these settings, just as imposed by the defaults. Ho

[continue reading]

On Unix/Linux platforms, Postgres binaries are not authorized to run as the root user, for security reasons, when those utilities directly manipulate files in the data directory. See for example initdb:

initdb: cannot be run as root
Please log in (using, e.g., "su") as the (unprivileged) user that will
own the server process.

Or the backend binary:

"root" execution of the PostgreSQL server is not permitted.
The server must be started under an unprivileged user ID to prevent
possible system security compromise.  See the documentation for
more information on how to properly start the server.

On Windows as well, a platform where everything is different, the error that the "postgres" binary returns is a bit different but similar:

Execution of PostgreSQL by a user with administrative permissions is not
permitted.
The server must be started under an unprivileged user ID to prevent
possible system security compromises.  See the documentation for
more information on how to properly start the server.

The PostgreSQL server logically cannot be started if it has administrator privileges. But there is now an exception for frontend utilities like initdb, which can be run as a user with Administrator privileges by using a restricted token, ensuring that the process is being run under a secured context (see CreateRestrictedToken, particularly DISABLE_MAX_PRIVILEGE).

Most of the frontend utilities of Postgres make use of a restricted token, but this was actually not the case for two of them, as reported a couple of months back. The discussion regarding the bug report has resulted in the following commit:

commit: fa1e5afa8a26d467aec7c8b36a0b749b690f636c
author: Andrew Dunstan <andrew@dunslane.net>
date: Mon, 30 Mar 2015 17:07:52 -0400
Run pg_upgrade and pg_resetxlog with restricted token on Windows

As with initdb these programs need to run with a restricted token, and
if they don't pg_upgrade will fail when run as a user with Adminstrator
privileges.

Backpatch to all live branches. On the development branch the code is
reorg

[continue reading]

In this post I will present my findings that show a 12x compression of PostgreSQL JSONB data using the cstore_fdw extension. Cstore_fdw is an open source PostgreSQL extension for reducing the storage footprint and speeding up analytical queries. Cstore_fdw is maintained by Citus Data and is available on GitHub.

The JSONB data type is one of the major additions to PostgreSQL in the 9.4 release and also in CitusDB 4.0. JSONB gives the ability to store and query JSON documents in PostgreSQL without having to parse them at run-time. This addition instantly made PostgreSQL one of the most powerful document databases, introducing new possibilities for unstructured and big data use cases. However, there is a significant cost to using JSONB: Storage space. Even when using JSONB, the keys in the JSON are repeated in every row, which creates a significant storage overhead (e.g., 400%) compared to regular tables. So what if we compressed JSONB data using our cstore_fdw extension?

For this experiment, we set up 4 "customer_reviews" tables using different storage methods.

  • A regular table (12 columns)
  • A cstore_fdw table (12 columns)
  • A regular table with a single JSONB column
  • A cstore_fdw table with a single JSONB column

We loaded example CSV and JSON data into the tables using COPY and measured the table size using pg_relation_size and cstore_table_size. We found that using JSONB with cstore_fdw reduced the storage size by over 12 times compared to ordinary JSONB, and it's even much smaller than a regular table.
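
For the curious, the jsonb variants can be set up along these lines (a sketch rather than the exact benchmark schema; it assumes the cstore_fdw extension is installed and a hypothetical /tmp/customer_reviews.json file with one JSON document per line):

CREATE EXTENSION cstore_fdw;
CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;

CREATE TABLE customer_reviews_jsonb (review jsonb);
CREATE FOREIGN TABLE customer_reviews_jsonb_cstore (review jsonb)
    SERVER cstore_server
    OPTIONS (compression 'pglz');

-- Load the same documents into both tables, then compare sizes.
COPY customer_reviews_jsonb FROM '/tmp/customer_reviews.json';
COPY customer_reviews_jsonb_cstore FROM '/tmp/customer_reviews.json';

SELECT pg_relation_size('customer_reviews_jsonb');
SELECT cstore_table_size('customer_reviews_jsonb_cstore');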

The reduced storage size does come at a cost. Data loading time goes up a bit because of the time spent compressing the data. Below are the load times on a c3.large EC2 instance.

Since the data needs to be decompressed while running queries, query times go up as well. However, the good news is that they do not go up by nearly as much as the storage space goes down and CPU time can be reduced by scaling out using CitusDB. Below are the average query times over 10 runs on a c3.large EC2 instance for the following ex

[continue reading]

Posted by Josh Berkus on 2015-04-29 at 17:37:16

pgConf Silicon Valley is now accepting proposals for talks. The conference is primarily looking for talks about high-performance PostgreSQL, including scale-out, data warehousing, analytics, case studies, and more. Both regular talks and tutorials are solicited.

pgConfSV is also looking for sponsors. The conference will be on November 17 and 18th. Talk submissions are due by June 15th.

pgConf Silicon Valley is organized under the San Francisco PostgreSQL User Group, and is produced by CitusData.

A previous blog I wrote on JSON functionality in PostgreSQL went wildly popular and it got me thinking about writing a follow-up HOWTO about JSONB.

JSONB, like JSON, is a native data type in PostgreSQL and was introduced in version 9.4. The major difference between the types is signified in the name; the extra ‘B’ stands for binary format. Some more detailed comparison is given in the table below:

JSON

  • Stores data in text format
  • Input is fast, as no conversion is required
  • Processing functions must re-parse the data on each execution
  • Indexing is not supported
  • All white space and line feeds in the input are preserved as-is
  • Duplicate keys are retained, processing functions only consider the last value
  • Order of the keys is preserved

JSONB

  • Stores data in decomposed binary format
  • Input is slightly slower, as there is conversion overhead involved
  • Re-parsing is not needed, making data processing significantly faster
  • Indexing is supported
  • Extra white space and line feeds are stripped
  • Duplicate keys are purged at input, only the last value is stored (see the example after this list)
  • Order is not preserved
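
The duplicate-key point from the list above is easy to see in a quick session (a hypothetical example):

json_sample=# SELECT '{"key1": "a", "key1": "b"}'::JSON;
            json
----------------------------
 {"key1": "a", "key1": "b"}
(1 row)

json_sample=# SELECT '{"key1": "a", "key1": "b"}'::JSONB;
     jsonb
---------------
 {"key1": "b"}
(1 row)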

In general, unless there are some highly specialized needs (like legacy applications designed to assume an order of keys), it is highly recommended that JSONB be used.

Basic handling provided by PostgreSQL for both JSON and JSONB is pretty much the same. Because I have already talked about JSON in a previous blog, I will focus on the difference JSONB provides below.

Ignoring extra white space

Let’s start with the following valid object:

{
     "key1": "value1",
     "key2": [1, 2],
     "key3": "value3"
 }

JSON data type shows the following:

json_sample=# SELECT '{
                             "key1": "value1",
                             "key2": [1, 2],
                             "key3": "value3"
                         }'::JSON;
                          json
                 -----------------------
                  {                    +
                      "key1": "value1",+
                      "key2": [1, 2]

[continue reading]

Posted by Shaun M. Thomas on 2015-04-24 at 16:48:06

As a DBA, I strive not to live in an isolated ivory tower, away from the developers that are destined to fill our databases with volumes of data from a myriad of applications. It’s important, I think, to occasionally come back to the basics. So I’d like to discuss one of the core elements that PGDB DBAs might be intimately familiar with, but comes up often enough that some clarification is warranted. How does PostgreSQL store data? The answer may surprise you, and is critical to understand before regularly interacting with a PGDB system.

The storage engine PGDB uses is called Multi Version Concurrency Control, or MVCC for short. There is excellent documentation of the subject, along with further resources that attempt to convey a thorough understanding, and succeed in that regard. I strongly encourage anyone who works with PostgreSQL in any context, to read these until they can recite the basics from memory. It is the very bedrock of the database.

But we need an executive summary, so let’s start with an extremely simple table with an id and generic column:

ID   Stuff
1₁   foo
2₁   bar
3₁   baz

Let’s say that some time in the future, we update ID 2. This would be the table at that point:

ID    Stuff
1₁    foo
2₁    bar
3₁    baz
2₁₈   moo

Notice that there are now two copies of the row for ID 2. How is it possible for PGDB to differentiate between them? Which one is “correct”?

To answer those two questions, I’ll need a short aside to explain transaction counters. PGDB keeps a counter that it assigns to all database activities, regardless of origin. Because these numbers can’t cycle endlessly, it currently wraps after two billion values have been used. It then applies an “age” algorithm to produce an abstract representation of data antiquity. For the purposes of this discussion, the subscript in the examples represents the transaction age.
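
You can actually watch this from SQL through the hidden xmin system column, which records the transaction that created each visible row version. A quick sketch (not one of the article's examples): after the UPDATE, the row for ID 2 shows a different xmin than the others.

CREATE TABLE mvcc_demo (id int, stuff text);
INSERT INTO mvcc_demo VALUES (1, 'foo'), (2, 'bar'), (3, 'baz');
UPDATE mvcc_demo SET stuff = 'moo' WHERE id = 2;

-- xmin exposes the creating transaction of each visible row version.
SELECT xmin, id, stuff FROM mvcc_demo;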

Every action gets a transaction ID, and every transaction ID acts like a snapshot because it provides a stable comparison point. Even if you don’t explicitly call BEGIN T

[continue reading]

On 5th of April, Simon Riggs committed patch: Reduce lock levels of some trigger DDL and add FKs. Reduce lock levels to ShareRowExclusive for the following SQL: CREATE TRIGGER (but not DROP or ALTER), ALTER TABLE ENABLE TRIGGER, ALTER TABLE DISABLE TRIGGER, ALTER TABLE … ADD CONSTRAINT FOREIGN KEY. Original work by Simon […]
The PUG meeting was good. We now have a consistent if small group that are attending. Before the presentation we spoke about possibly moving the group to meetup to get a little better visibility. G+ Communities are awesome but Meetup seems to be where the people in the area look.

The presentation was provided by Eric Worden, who happens to be a CMD employee. The talk overall was very good and provided a lot of information that I didn't know about dates. It also led to the development of a new PostgreSQL extension (more on that at a later time).

The most interesting part of the talk to me was the use of a dimensions table to make date queries much, much faster. Either he or I will be submitting a blog post on that feature alone.

If you are interested in seeing what he has to say you can visit us at the Whatcom PgDay being hosted at LinuxFestNorthwest this weekend!

In the second part of the Automating Barman with Puppet series we configured, via Puppet, two virtual machines: a PostgreSQL server and a Barman server to back it up. However, human intervention was required to perform the SSH key exchange and most of the manifest was written to allow the servers to access each other. In this third and final part of the series, we will look at how to configure a third VM that will act as the Puppet Master and use it to simplify the configuration of PostgreSQL and Barman.

The entire code of this tutorial is on GitHub at https://github.com/2ndquadrant-it/vagrant-puppet-barman.

Configuring the Puppet Master: Vagrant

First, change the Vagrantfile to boot a third VM, called “puppet”, which will be our Puppet Master. To ensure that the machine is instantly accessible by the Puppet agents present on each VM, we add a “puppet” entry in the /etc/hosts file with the first script we run. We also need to enable the Puppet agent, as Debian-like distributions disable it by default.
Finally, within the Vagrantfile, let’s make a distinction between master and agents. The master will initially load its configuration straight from the manifest files, then the agents running on each host will apply the configuration sent from the master. Agents will also send data back to the master allowing other nodes to use it to build their configuration. For this reason, an agent is also set to run on the master.

The Vagrantfile is as follows:

Vagrant.configure("2") do |config|
  {
    :puppet => {
      :ip      => '192.168.56.220',
      :box     => 'ubuntu/trusty64',
      :role    => 'master'
    },
    :pg => {
      :ip      => '192.168.56.221',
      :box     => 'ubuntu/trusty64',
      :role    => 'agent'
    },
    :backup => {
      :ip      => '192.168.56.222',
      :box     => 'ubuntu/trusty64',
      :role    => 'agent'
    }
  }.each do |name,cfg|
    config.vm.define name do |local|
      local.vm.box = cfg[:box]
      local.vm.hostname = name.to_s + '.local.lan'
      local.vm.network :

[continue reading]

Similarly to the post from a couple of weeks back about the new memory allocation routine able to give a plan B route in case of OOM, here is a follow-up commit adding more infrastructure in the same area, but this time for some widely-used memory allocation routines:

commit: 8c8a886268dfa616193dadc98e44e0715f884614
author: Fujii Masao <fujii@postgresql.org>
date: Fri, 3 Apr 2015 17:36:12 +0900
Add palloc_extended for frontend and backend.

This commit also adds pg_malloc_extended for frontend. These interfaces
can be used to control at a lower level memory allocation using an interface
similar to MemoryContextAllocExtended. For example, the callers can specify
MCXT_ALLOC_NO_OOM if they want to suppress the "out of memory" error while
allocating the memory and handle a NULL return value.

Michael Paquier, reviewed by me.

palloc_extended() is an equivalent of palloc() that operates on CurrentMemoryContext (that is, the current memory context a process is using) with a set of flags, named the same way for frontend and backend:

  • MCXT_ALLOC_HUGE for allocations larger than 1GB. This flag has an effect on the backend side only, as frontend routines use malloc directly.
  • MCXT_ALLOC_ZERO to zero-fill the allocated memory.
  • MCXT_ALLOC_NO_OOM to bypass the ERROR message in case of an out-of-memory failure and return NULL to the caller instead. This is the real meat. For frontends, not using this flag means the process exits immediately with exit(1) when an allocation fails.

The advantage of this routine is that it is made available for both frontends and backends, so when code is shared between the two, like xlogreader.c, which is used by pg_xlogdump and pg_rewind on the frontend side and by the Postgres backend, consistent code can be used for everything, making maintenance far easier.

A last thing to note is the addition of pg_malloc_extended(), which is available only for frontends and is a natural extension of what already exists with pg_malloc0() and pg_realloc().

Posted by Joshua Drake in CommandPrompt on 2015-04-20 at 18:20:25
Saturday the 18th of April, I woke up to the following:

It was one of those moments when you realize just how blessed a life you have. A moment where you stop and realize that you must have done something right, at least once. I was with all of my ladies, there were no other people at the camp site, the weather was clear, and it was set to hit 68F. The only sound was the gentle lapping of water and the occasional goose.

It was at this time that I was able to finally take a step back and reflect on PgConf.US. This conference meant a lot to me professionally. I didn't organize it. I only spoke at it, Command Prompt sponsored, and I occasionally offered feedback to the conference committee (as it is run by PgUS) via PgUS board meetings. This conference has become everything (and more) I tried to make the United States PostgreSQL Conference series. It is a conference of the best, the brightest and most importantly the users of our beloved elephant. In the United States, it is "The" PostgreSQL Conference.

That isn't to say there aren't other PostgreSQL conferences in the States. There are at least two others that run, but somehow after this latest conference I feel they are destined for niche attendance. There is nothing wrong with that, and frankly we need the niche conferences. They fill a hole, else people wouldn't attend. It is just that experiencing the level of effort and greatness created by Jonathan and Jim was truly something awesome to behold.

They aren't paid to run these conferences. They do it because they want to help our community. They do it for free and speaking as somebody who has run more PostgreSQL conferences than any other current PostgreSQL conference organizer in the U.S., it is a thankless job. It is a volunteer effort that everyone should take a moment to thank them for. Jonathan, Jimmy: Thank you for all your efforts. The community could not have had such a great conference without you.

I have only two constructive feedback points for the organizers:

  1. Do not allow tutorials base

[continue reading]

I previously blogged about NoSQL support in PostgreSQL and then later wrote a tutorial on using JSON data type. Today, I will be talking about another angle of NoSQL in PostgreSQL, handling key-value data using the HSTORE contrib module.

HSTORE was introduced back in 2006 as part of PostgreSQL 8.2, and I doubt the authors had any idea at the time how NoSQL popularity would skyrocket in the coming years. Using HSTORE, developers can emulate a schema-less semi-structured data store while staying within the fully ACID-compliant domain of PostgreSQL, a relational database.

Without further ado, here is how some basic HSTORE functionality can be handled …

Create the extension

The very first step is to create the HSTORE extension, essentially loading the contrib module to your PostgreSQL instance.

hstore_db=# CREATE EXTENSION hstore;
CREATE EXTENSION

Create a table with HSTORE data type

Once the extension is loaded, creating a table with a column of HSTORE data type is fairly straightforward. The syntax is exactly the same as any native data type.

hstore_db=# CREATE TABLE hstore_test ( id SERIAL, sale HSTORE );
CREATE TABLE

Insert data into HSTORE column

You can use the ‘=>’ operator to assign values to keys while you insert data.

hstore_db=# INSERT INTO hstore_test (sale) 
            VALUES ('"milk"=>"4", 
                     "bread"=>"2", 
                     "bananas"=>"12", 
                     "cereal"=>"1"');
INSERT 0 1

Retrieve data from an HSTORE column

HSTORE data can be retrieved like any other native PostgreSQL data type.

hstore_db=# SELECT sale FROM hstore_test;
                           sale
-----------------------------------------------------------
"milk"=>"4", "bread"=>"2", "cereal"=>"1", "bananas"=>"12"
(1 row)

Retrieve value for a particular key

HSTORE provides the ‘->’ operator to retrieve value of a particular key from the table.

hstore_db=# SELECT sale -> 'bread' AS quantity FROM hstore_test;
quantity
----------
2
(1 row)
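
One more thing worth noting, though it is not covered in the excerpt above: HSTORE columns can also be GIN-indexed so that key/value searches stay fast on larger tables. A minimal sketch:

hstore_db=# CREATE INDEX hstore_test_sale_idx ON hstore_test USING GIN (sale);
CREATE INDEX
hstore_db=# -- The @> containment operator (among others) can use this index.
hstore_db=# SELECT * FROM hstore_test WHERE sale @> '"bread"=>"2"';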

Use value of a key in the WHERE cl

[continue reading]

Posted by Andrew Dunstan in pgExperts on 2015-04-18 at 15:22:00
Unfortunately there was a small bug in yesterday's buildfarm client release. The bug only affects MSVC builds, which would fail silently on the HEAD (master) branch.

There is a bug fix release available at http://www.pgbuildfarm.org/downloads/releases/build-farm-4_15_1.tgz or you can just pick up the fixed version of run_build.pl (the only thing changed) at https://raw.githubusercontent.com/PGBuildFarm/client-code/b80efc68c35ef8a1ced37b57b3d19a98b8ae5dd2/run_build.pl

Sorry for the inconvenience.
We all know that the \copy command does not return anything when you load data. The idea here is to capture how many records got loaded into the table through the \copy command.
Here's a shell script that should work:
echo number of rows in input: $(wc -l data.in)
( echo "\copy test from stdin delimiter '|';" ; cat data.in ) | psql -v ON_ERROR_STOP=1
echo psql exit code $?

If the exit code printed is 0, everything went well, and the value printed by the first echo can be used to indicate how many rows were inserted. If the printed exit code is non-zero, no rows were inserted, of course. If the exit code printed is 3 then the data being copied had some error.

From the docs: If the exit code printed is 1 or 2 then something went wrong in psql (like it ran out of memory) or the server connection was broken, respectively. The following facts play a role in the above script:

.) COPY (and hence \copy) expects the input records to be terminated by a newline. So counting the number of newlines in the input is a reliable way of counting the records inserted.
.) psql will exit with code 3 iff there's an error in the script and ON_ERROR_STOP is set.
Note: This seems to not apply to the `psql -c "sql command"` construct.

# Example clean input

$ psql -c "create table test(a text,b int);"
CREATE TABLE
$ cat data.in
column1|2
column1|2
column1|2
column1|2
column1|2
column1|2

$ echo number of rows in input: $(wc -l data.in); ( echo "\copy test from stdin delimiter '|';" ; cat data.in ) | psql -v ON_ERROR_STOP=1 ; echo psql exit code $?
number of rows in input: 6 data.in
psql exit code 0

# Example malformed input
$ cat data.in
column1|2
column1|2
column1|2c
column1|2
column1|2
column1|2

$ echo number of rows in input: $(wc -l data.in); ( echo "\copy test from stdin delimiter '|';" ; cat data.in ) | psql -v ON_ERROR_STOP=1 ; echo psql exit code $?
number of rows in input: 6 data.in
ERROR: invalid input syntax for integer: "2c"
CONTEXT: COPY test, line 3, column b: "2c"
psql exit code 3
 
I hope this helps someone.
(Baji is trying to impress 'X')
==========
Baji: Packt Publishing has published a book on troubleshooting PostgreSQL database.
 _X_: Uh, so what(!?). It published other 4 PostgreSQL books this year !
Baji: yeah, I know !
 _X_: then why do you care about thisssss.
Baji: I should care about it as I was part of technical reviewing team.. :(
 _X_: Oh really !, thats fantastic.. Congratulations !
==========

Note: Finally, Baji impressed _X_ :-)

Ok, in reality, I am glad to announce that "My first book as a Technical Reviewer has been published by Packt Publishing" ;-)

https://www.packtpub.com/big-data-and-business-intelligence/troubleshooting-postgresql
http://my.safaribooksonline.com/book/databases/postgresql/9781783555314/troubleshooting-postgresql/pr02_html

The author of this book is Hans-Jürgen Schönig; he has a couple of other PostgreSQL books as well.

This book provides a series of valuable troubleshooting solutions to database administrators responsible for maintaining a PostgreSQL database. It is aimed at PostgreSQL administrators who have developed an application with PostgreSQL and need solutions to common administration problems they encounter when managing a database instance. So give it a try ;-)

I would like to thank my loving parents for everything they did for me. Personal time always belongs to family, and I did this in my personal time.

I want to thank Packt Publishing for giving me this opportunity, and thanks to Sanchita Mandal and Paushali Desai for choosing me and working with me on this project.

Last but not least, I would like to thank Dinesh Kumar, who taught me PostgreSQL and inspired me to do this. :)
Posted by Amit Kapila in EnterpriseDB on 2015-04-18 at 03:38:00

I have run some benchmark tests to see the write performance/scalability in PostgreSQL 9.5 and thought it would be good to share the results with others, so I am writing this blog post.

I ran pgbench tests (TPC-B (sort of) load) to compare the performance difference between different modes and scale factors in HEAD (e5f455f5) on IBM POWER-8 having 24 cores, 192 hardware threads and 492GB RAM, and here are the performance results.

Some of the default settings used in all the tests are:
min_wal_size=15GB
max_wal_size=20GB
checkpoint_timeout = 35min
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.9
autovacuum=off

I have kept autovacuum off to reduce the fluctuation due to it, and I am dropping and re-creating the database after each run. I have kept high values of min_wal_size and max_wal_size to reduce the effect of checkpoints; probably somewhat lower values could have served the purpose of this workload, but I haven't tried it.

The data is mainly taken for 2 kinds of modes (synchronous_commit = on | off) and at 2 different scale factors, to cover the case when all the data fits in shared buffers (scale_factor = 300) and the case when all the data can't fit in shared buffers but can fit in RAM (scale_factor = 3000).

First let's talk about the synchronous_commit = off case. Here, when all the data fits in shared_buffers (scale_factor = 300), we can see scalability up to 64 clients, with TPS being approximately 75 percent higher at a 64 client-count as compared to an 8 client-count, which doesn't look bad. When all the data doesn't fit in shared buffers but fits in RAM (scale_factor = 3000), we can see scalability up to 32 clients, with TPS being 64 percent higher than at an 8 client-count, and it falls off from there.

One major difference in the case of writes when the data doesn't fit in shared_buffers is that backends performing transactions need to write the dirty buffers themselves when they are not able to find a clean buffer to read a page into, and this can hamper the TPS.

Now let's talk about synch

[continue reading]

PGDB has had anonymous blocks since the release of 9.0 in late 2010. But it must either be one of those features that got lost in the shuffle, or is otherwise considered too advanced, because I rarely see it used in the wild. If that’s the case, it’s a great shame considering the raw power it conveys. Without committing to a function, we can essentially execute any code in the database, with or without SQL input.
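
For anyone who has never seen one, the syntax is minimal. A trivial sketch (not the script discussed below):

DO $$
BEGIN
    -- The body is a one-off PL/pgSQL block, with no CREATE FUNCTION involved.
    RAISE NOTICE 'This block ran and left nothing behind in the catalogs.';
END;
$$ LANGUAGE plpgsql;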

Why is that good? One potential element of overhead when communicating with a database is network transfer. If processing millions of rows, forcing PGDB to allocate and push those results over the network will be much slower than manipulating them locally within the database itself. However, the looming specter of ad-hoc scripts is always a threat as well.

It was the latter scenario that prompted this particular discussion. A few weeks ago, I addressed date-based constraints and how they’re easy to get wrong. Knowing this, there’s a good chance we have objects in our database that need revision in order to operate properly. In one particular instance, I needed to correct over 800 existing check constraints that an automated system built over the last year.

I hope you can imagine that’s not something I would want to do by hand. So it was a great opportunity to invoke an anonymous block, because there’s very little chance I’d need to do this regularly enough to justify a fully-fledged function. In the end, I came up with something like this:

DO $$
DECLARE
  chk TEXT;
  col TEXT;
  edate DATE;
  sdate DATE;
  tab TEXT;
  ym TEXT;
BEGIN
  FOR tab, chk, col IN 
      SELECT i.inhrelid::REGCLASS::TEXT AS tab,
             co.conname AS cname,
             substring(co.consrc FROM '\w+') AS col
        FROM pg_inherits i
        JOIN pg_constraint co ON (co.conrelid = i.inhrelid)
       WHERE co.contype = 'c'
  LOOP
    ym := substring(tab FROM '......$');
    sdate := to_date(ym, 'YYYYMM01');
    edate := sdate + INTERVAL '1 mon';

    EXECUTE 'ALTER TABLE ' || tab || ' DROP CONSTRAINT ' ||
        quote_ident(ch

[continue reading]

Posted by Andrew Dunstan in pgExperts on 2015-04-17 at 14:34:00
I have just released version 4.15 of the PostgreSQL Buildfarm Client. Here's what's changed:
  • support the new location for pg_upgrade
  • support running tests of client programs
  • support building, installing and running testmodules
  • use a default ccache directory
  • improve logging when running pg_upgrade tests
  • handle odd location of Python3 regression tests
  • add timestamp to default log_line_prefix
  • make qw() errors in the config file fatal (helps detect errors)
  • minor bug fixes for web script settings.
  • allow for using linked git directories in non-master branches
The last item might need a little explanation.  Essentially this can reduce quite dramatically the amount of space required if you are building on more than one branch. Instead of keeping, say, 6 checked out repos for the current six tested branches, we keep one and link all the others to it. This works almost exactly the way git-new-workdir does (I stole the logic from there). This doesn't work in a couple of situations: if you are using Windows or if you are using git-reference. In these cases the new setting is simply ignored.

To enable this new setting in an existing installation, do the following after installing the new release:
  • in your config file, add this setting:
    git_use_workdirs => 1,
  • remove the pgsql directory in each branch directory other than HEAD
Another good thing to do in existing installations would be to add "%m" to the beginning of the log_line_prefix setting in extra_config stanza.

Enjoy!
Posted by Peter Eisentraut on 2015-04-17 at 00:00:00

About two months ago, this happened:

And a few hours later:

It took a few more hours and days after this to refine some details, but I have now tagged the first release of this extension. Give it a try and let me know what you think. Bug reports and feature requests are welcome.

(I chose to name the data type uri instead of url, as originally suggested, because that is more correct and matches what the parsing library calls it. One could create a domain if one prefers the other name or if one wants to restrict the values to certain kinds of URIs or URLs.)

(If you are interested in storing email addresses, here is an idea.)

Posted by Andreas 'ads' Scherbaum on 2015-04-16 at 23:30:13

As Josh posted before, the 2nd South Bay PostgreSQL Meetup will take place at Adobe in San Jose, on April 28th.

Happy to announce that CK Tan from VitesseData will speak about how Vitesse DB speeds up analytics queries in PostgreSQL. Heikki Linnakangas will speak about pg_rewind.

Please register for the event, only registered users will be allowed into the building by security.

Also, if you want to have Wifi access, please send an email to mustain@adobe.com no later than the 24th, and please include "SBPUG" in the subject.

See you there!

Posted by Josh Berkus in pgExperts on 2015-04-16 at 19:07:00
So ... you're using some of 9.4's new advanced aggregates, including FILTER and WITHIN GROUP.  You want to take some statistical samples of your data, including median, mode, and a count of validated rows.  However, your incoming data is floats and you want to store the samples as INTs, since the source data is actually whole numbers.  Also, COUNT(*) returns BIGINT by default, and you want to round it to INT as well.  So you do this:

    SELECT
        device_id,
        count(*)::INT as present,
        count(*)::INT FILTER (WHERE valid) as valid_count,
        mode()::INT WITHIN GROUP (order by val) as mode,
        percentile_disc(0.5)::INT WITHIN GROUP (order by val)
          as median
    FROM dataflow_0913
    GROUP BY device_id
    ORDER BY device_id;


And you get this unhelpful error message:

    ERROR:  syntax error at or near "FILTER"
    LINE 4:         count(*)::INT FILTER (WHERE valid)
            as valid_count,


And your first thought is that you're not using 9.4, or you got the filter clause wrong.  But that's not the problem.  The problem is that "aggregate() FILTER (where clause)" is a syntactical unit, and cannot be broken up by other expressions.  Hence the syntax error.  The correct expression is this one, with parens around the whole expression and then a cast to INT:

    SELECT
        device_id,
        count(*)::INT as present,
        (count(*) FILTER (WHERE valid))::INT as valid_count,
        (mode() WITHIN GROUP (order by val))::INT as mode,
        (percentile_disc(0.5) WITHIN GROUP (order by val))::INT
           as median
    FROM dataflow_0913
    GROUP BY device_id
    ORDER BY device_id;


If you don't understand this, and you use calculated expressions, you can get a worse result: one which does not produce an error but is nevertheless wrong.  For example, imagine that we were, for some dumb reason, calculating our own average over validated rows.  We might do this:

    SELECT
        device_id,
        sum(val)/count(*) FILTER (WHERE valid) as avg
    FROM dataflow_0913

[continue reading]

Posted by Andreas 'ads' Scherbaum

PGConf.de 2015 is the sequel to the highly successful German-speaking PostgreSQL Conferences of 2011 and 2013. Due to space limitations at the old location, we are moving to Hamburg. The conference takes place on Friday, November 27th. We are also adding a training day on the 26th.

http://2015.pgconf.de/

Registration for the conference will be possible well in advance. Tickets must be purchased online. For sponsors, we have put together a package that includes, among other things, a number of discounted tickets. More on that in the Call for Sponsors in a separate announcement.

Yesterday the pg_upgrade program was moved from contrib to bin in the source tree. Unfortunately this broke most of those buildfarm members which check pg_upgrade. There is a hot fix for the TestUpgrade buildfarm module that can be downloaded from github. I will work on cutting a new buildfarm release in the next few days, but this file can just be dropped in place on existing installations.