Posted by Hans-Juergen Schoenig in Cybertec on 2014-04-24 at 08:41:27
The goal of the PostgreSQL optimizer is to provide you with a plan that executes as fast as possible and returns all the data as rapidly as possible. In short: the game is about the overall execution time of queries. However, there are cases in which you are not interested in receiving the entire result as […]
Our team was recently given a problem: how to set up N-node multi-master replication in PostgreSQL.

We all know there are other database engines, like Postgres-XC, which work this way. But in PostgreSQL we don't have any tool available for it except Bucardo.

Bucardo is a nice solution for two nodes. Is there a way we can extend this limitation from 2 to N?

As an initial step, I have done this with 3 nodes, and I believe we can extend it up to N. (I might be wrong here.)

Please follow the steps below to set up 1-1 multi-master replication.

1. Install all the prerequisites for Bucardo.

yum install perl-DBIx-Safe
or
apt-get install libdbix-safe-perl

Install the following components from CPAN.

DBI
DBD::Pg
Test::Simple
boolean (Bucardo 5.0 and higher)

Download the latest tarball from here.

tar xvfz Bucardo-4.4.8.tar.gz
cd Bucardo-4.4.8
perl Makefile.PL
make
sudo make install
2. We need to create the plperl extension in the databases. For this, download the required ActivePerl to set it up, or simply do as below.
apt-get install postgresql-plperl-9.3
or
yum install postgresql-plperl-9.3

Make a copy of the old $PGLIBPATH/plperl.so.
Move the new plperl.so to $PGLIBPATH/plperl.so.
3. Create the plperl extension.
CREATE EXTENSION plperl;
4. Create 3 databases like "node1", "node2", "node3".
CREATE DATABASE node1;
CREATE DATABASE node2;
CREATE DATABASE node3;
5. Execute the statements below on 2 of the databases (node1, node2).
CREATE TABLE test1(t INT PRIMARY KEY);
INSERT INTO test1 VALUES(generate_series(1, 10));
6. Install the Bucardo catalog database using the command below.
postgres@dinesh-VirtualBox:~/Bucardo-4.5.0$ ./bucardo_ctl install
Postgres version is: 9.3
Attempting to create and populate the bucardo database and schema
Database creation is complete
Connecting to database 'bucardo' as user 'bucardo'
Updated configuration setting "piddir"
Installation is now complete.
7. Add the databases that will be part of multi-master replication.
postgres@dinesh-VirtualBox:~/Bucardo-4.5.0$ ./bucardo_ctl add db db1 dbname="node1"
Added d

[continue reading]

Posted by Marko Tiikkaja on 2014-04-23 at 00:50:00
In my previous post I benchmarked a number of different UPSERTy implementations.  However, a number of people (rightfully) asked me how the results would change under concurrency.  So I had to come up with a way of testing that. I wanted reproducible numbers, so I figured that just throwing a lot of UPSERTs at a database was not acceptable, not to mention the problems with keeping that up while
Posted by David Fetter in Disqus on 2014-04-22 at 21:21:29
PostgreSQL 9.4 has an amazing feature hidden underneath a complicated syntax. Here's a neat way to use it.
Continue reading "9.4: Quantiles!"
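The feature being teased here is presumably 9.4's new ordered-set (WITHIN GROUP) aggregates; as a rough illustration of the syntax (my own sketch, not taken from the post, with a made-up table and column):

-- median and 90th percentile of a column using ordered-set aggregates
SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY duration) AS median,
       percentile_cont(0.9) WITHIN GROUP (ORDER BY duration) AS p90
FROM measurements;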
As you all know, after upgrading a database server from one major version to another, ANALYZE should be executed to update the system catalogs with statistics on the newly populated data. On a huge upgraded database, it is a challenge for the application to get its performance back without updated statistics. In PostgreSQL 9.4, the option "vacuumdb --analyze-in-stages" makes this work faster, producing usable statistics for the optimizer sooner. It analyzes the database in three stages with different configuration settings (default_statistics_target/vacuum_cost_delay).

If the database was upgraded using a procedure like pg_dump/pg_restore or pg_upgrade, then it's recommended to use "vacuumdb --analyze-in-stages".

Sample output:
bash-4.1$ /usr/local/pgpatch/pg/bin/vacuumdb -p 9999 --analyze-in-stages -d tester
Generating minimal optimizer statistics (1 target)
Generating medium optimizer statistics (10 targets)
Generating default (full) optimizer statistics
Posted by Marko Tiikkaja on 2014-04-21 at 01:01:00
As I'm sure a lot of people know already, PostgreSQL 9.4 will eventually be released without "native" support for UPSERT.  That said, it's still one of the most common questions on the excellent #postgresql IRC channel, and people are sometimes tempted to try and solve it via TRIGGERs (hell, even RULEs), unsuccessfully. Even though people are often correctly told to "use a function" and are
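For reference, the "use a function" advice usually refers to a retry loop along the lines of the example in the PostgreSQL documentation; here is a sketch of that pattern (table and column names are made up, and this is not code from the post):

CREATE TABLE counters (key TEXT PRIMARY KEY, value INT);

CREATE FUNCTION upsert_counter(k TEXT, v INT) RETURNS void AS
$$
BEGIN
    LOOP
        -- first try to update an existing row
        UPDATE counters SET value = v WHERE key = k;
        IF found THEN
            RETURN;
        END IF;
        -- no row yet: try to insert, retrying if someone else inserted concurrently
        BEGIN
            INSERT INTO counters (key, value) VALUES (k, v);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- another session won the race; loop back and try the UPDATE again
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;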

We all know about replication and its types in PostgreSQL. There are three basic types of replication in PostgreSQL: warm standby, hot standby and streaming replication. I used to get confused between these three and was unable to find the difference in my initial stages of learning Postgres; I only understood it later by reading the docs. So I would like to post the key differences between these 3 types of replication in short (it also helps if anyone wants to find out which one is configured in his/her environment).

Warm Standby:
==========
It was introduced in PostgreSQL 8.3 (IIRC).

1. It is based on WAL log shipping, which typically means WAL archives generated on the Master are transferred to and applied on the Standby side. The Warm Standby always waits for the WAL archive the Master is currently writing to, and keeps throwing messages like "cp: cannot stat <archive>: No such file or directory". So it is always one archive behind the Master, and data loss will be at most 16MB (assuming a healthy warm standby :-) ).

2. In the postgresql.conf file, you would need to change just three parameters on the master: wal_level (to archive), archive_mode and archive_command; nothing needs to change in postgresql.conf on the standby side.
On Master:
wal_level = archive
archive_mode = on
archive_command = 'cp %p /path_to/archive/%f'

3. In the recovery.conf file, three parameters: standby_mode, restore_command and trigger_file (a minimal example follows this list).
4. You cannot connect to the Standby, so the database is not even open for read operations (read operations are not permitted on the db).
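On the standby, a minimal recovery.conf for this setup would look roughly like the following (a sketch; the restore path mirrors the archive_command above and the trigger file location is just a placeholder):

standby_mode = 'on'
restore_command = 'cp /path_to/archive/%f %p'
trigger_file = '/tmp/postgresql.trigger'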

Detailed explanation and related docs are here
Hot Standby:
========
It was introduced in PostgreSQL 9.0.

1. It is also based on WAL log shipping (same as warm standby). And of course WALs will be transferred to and applied on the Standby, so it is one WAL behind and always waits for the WAL archive the Master is currently writing to.
2. In the postgresql.conf file, you would need to change wal_level to hot_standby, plus archive_mode and archive_command. Since you'll likely want to use pg_basebackup you should a

[continue reading]

Posted by gabrielle roth on 2014-04-19 at 01:45:29

PDXPUG’s official logo is a mask-wearing Slonik:
pdxpug_logo

Hir name is Chelnik. Sort of like Slonik, but different.*

A couple years ago, my co-leader Mark Wong started making amigurumi, little crocheted stuffed animals. I think he started with Angry Birds, but pretty soon after, he made a Chelnik (the one in the banner on this blog, in fact). And then another…

He took one to PgCon 2012, and donated it to the charity auction, where it netted $500 for The Ottawa Mission.

To date, Mark’s made a dozen of these elephants for sale or auction at various Pg conferences, raising money for donations to the following organizations:
PGUS ($350), The Ottawa Mission ($680), Chicago Hopes ($640), and most recently at PGConf NYC, $1200 for Heartsong.

Thanks to Mark and everyone who’s participated in these auctions.

stowaway_cropped

* No, it doesn’t really mean anything. Beer was involved in the naming.
There’s a tumblr with some photos of the various Chels he’s distributed.


Sometimes I find myself wanting to grep all the query durations out of a file for sorting.  I recently discovered that I can use gnu grep's lookbehind syntax like so:

grep -oP '(?<=duration: )[0-9\.]+' postgresql-Mon.csv

and then sort them numerically like so:

grep -oP '(?<=duration: )[0-9\.]+' postgresql-Mon.csv | sort -n | tail
Using PostgreSQL JDBC with Spring.

This article came about after a user filed an issue on the JDBC list

The specific issue is that I’m using SimpleJdbcCall.execute() to call the database and getting back a Jdbc4Array. When I then try to do something like Jdbc4Array.getArray() I get a SQL error that can be tracked down to the Jdbc driver trying to use a connection object which has already been closed by the Spring Framework.

The problem is that once you get the array back, more work has to be done to get the values out of it. The following code is an example of how to use transactions with Spring and PostgreSQL JDBC.

  • A simple interface to get a value out of an array.
  • The implementation which does the actual work. Note the @Transactional annotation on line 22: it is required to ensure that the connection is not closed after the first call on line 42; without this annotation Spring would close the connection and the next line, 43, would throw an exception. The constructor on line 27 is required for Spring to create the implementation and inject the datasource defined in the context.
  • The Spring context.
  • The SQL to create the function and data.
Code for this can be found here: SpringTransactionExample. Thanks to Michael Miller for the initial code used in this example.
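As a rough illustration of the "SQL to create the function and data" piece, the database side might look something like the hypothetical function below (my sketch, not the gist linked from the post); SimpleJdbcCall would invoke it and the driver would return the result as a Jdbc4Array:

-- hypothetical function returning an array for the Spring example to fetch
CREATE FUNCTION get_numbers() RETURNS int[] AS $$
    SELECT ARRAY[1, 2, 3, 4, 5];
$$ LANGUAGE sql;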
Posted by Dimitri Fontaine in 2ndQuadrant on 2014-04-17 at 11:53:00

A couple of weeks ago I had the chance to participate in the PGConf NYC 2014 Conference, one of the biggest conferences about PostgreSQL worldwide.

I presented one of my favourite talks over there, where the whole goal is to blow the mind of innocent developers and show them how much they can do in just SQL.

PostgreSQL for developers, window functions galore!

The basis for the talk is this detailed blog entry about the Reset Counter application and how to leverage SQL to write the code for it.

The second huge feature coming in PostgreSQL 9.4, along with jsonb, is called logical decoding. In short, it is a new plugin facility that can be used to decode changes that happen on a database and stream them to external sources. It can be used for many things like replication, auditing or even online upgrade solutions.

Logical decoding has been introduced in the core of PostgreSQL incrementally with a set of features that could roughly be listed as follows:

  • Logical replication slots, similar to physical slots except that they are attached to a single database.
  • WAL level "logical" in wal_level, level of WAL generated by server to be able to decode changes to the database into a coherent format.
  • Creation of a SQL interface to view the changes of a replication slot.
  • Extension of the replication protocol to support logical replication (in particular the possibility to provide a database name in the "replication" parameter of a connection string).
  • Addition of REPLICA IDENTITY, a table parameter to modify how updated and deleted tuple data is written to WAL.

Then, two new utilities are present to help users get an understanding of how things work (both are exercised in the sketch after this list):

  • test_decoding, an example output plugin for decoding.
  • pg_recvlogical, an example utility that can be used to receive changes from a logical replication slot.
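As a quick illustration (my own sketch with an arbitrary slot name, not an example from the post), a logical slot using the test_decoding plugin can be created and consumed directly from SQL; the first call creates the slot, and the second returns the changes decoded since then:

=# SELECT * FROM pg_create_logical_replication_slot('custom_slot', 'test_decoding');
=# SELECT * FROM pg_logical_slot_get_changes('custom_slot', NULL, NULL);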

Logical decoding introduces a lot of new concepts and features, making it impossible to write everything in a single post. Remember however that it is possible to customize both the decoding plugin (test_decoding in this post) and the remote source receiving the changes (pg_recvlogical in the case of this post). So for now, using what Postgres core offers, let's see how to simply set up logical replication. First, be sure that the following parameters are set in postgresql.conf:

wal_level = logical
max_replication_slots = 1

max_replication_slots needs to be at least 1, and test_decoding needs to be installed on your server as well. In order to work, logical replication first needs a logical replication sl

[continue reading]

Posted by Robert Haas in EnterpriseDB on 2014-04-16 at 18:58:00
Last month, ZDNet published an interview with MongoDB CEO Max Schireson which took the position that the document databases, such as MongoDB, are better-suited to today's applications than traditional relational databases; the title of the article implies that the days of relational databases are numbered. But it is not, as Schireson would have us believe, that the relational database community is ignorant of or has not tried the design paradigms which he advocates, but that they have been tried and found, in many cases, to be anti-patterns. Certainly, there are some cases in which the schemaless design pattern that is perhaps MongoDB's most distinctive feature is just the right tool for the job, but it is also misleading to think that such designs must use a document store. Relational databases can also handle such workloads, and their capabilities in this area are improving rapidly.

Read more »
Posted by Jim Mlodgenski in OpenSCG on 2014-04-16 at 14:21:50

With all of the new users to PostgreSQL, one of the things we wanted to do for PGConf NYC this year was to have a video showing the history of PostgreSQL. We ended up with a very professional video showing how the features have grown over the years and, judging by the applause of the attendees, it delivered on our goals. You can see it at:
https://www.youtube.com/watch?v=2RSkI9dxdbo

However, our proof of concept video was very different. Since it was done by me and my video editing skills are very limited, it's not nearly as clean and professional. I also have the feeling that Jonathon Katz was afraid we'd freak out some people with the Five Finger Death Punch background music, so we ended up going the more professional route. I still think it had some potential, so I posted it up to YouTube:
http://youtu.be/AYn2ukNEmQM

Also, thanks to Kirk Roybal for putting together the animation of PostgreSQL source code changes over the years.
https://www.youtube.com/watch?v=HsxwNvlKZRU

 

Posted by Dan Langille on 2014-04-15 at 17:12:43
I remember a time when I’d never been to a conference related to my passions. Once I went, things changed. I realized that making strong working relationships with others who share my passion is important. Not only does this solidify the community of which you are a member, it also helps you personally. Every conference [...]
Posted by Hans-Juergen Schoenig in Cybertec on 2014-04-15 at 07:07:19
When cleaning up some old paperwork this weekend I stumbled over a very old tutorial. In fact, I have received this little handout during a UNIX course I attended voluntarily during my first year at university. It seems that those two days have really changed my life – the price tag: 100 Austrian Schillings which […]
Posted by gabrielle roth on 2014-04-15 at 00:44:00

When: 7-9pm Thu April 17, 2014
Where: Iovation
Who: Brian Panulla
What: Intro to Graph Databases

Brian Panulla is a Business Intelligence Developer at Acureo. He’s been fulfilling PDXPUG’s rhombic triacontahedron needs for over a year. Next week, he’ll be giving us an Intro to Graph Databases!

Our meeting will be held at Iovation, on the 32nd floor of the US Bancorp Tower at 111 SW 5th (5th & Oak). It’s right on the Green & Yellow Max lines. Underground bike parking is available in the parking garage; outdoors all around the block in the usual spots. No bikes in the office, sorry!

Building security will close access to the floor at 7:30.

See you there!


On 13th of April, Tom Lane committed patch: Provide moving-aggregate support for a bunch of numerical aggregates.   First installment of the promised moving-aggregate support in built-in aggregates: count(), sum(), avg(), stddev() and variance() for assorted datatypes, though not for float4/float8.   In passing, remove a 2001-vintage kluge in interval_accum(): interval array elements have been […]
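For context (my own illustration, not part of the commit message; the table and columns are made up), moving-aggregate support matters for aggregates evaluated over a sliding window frame, where the frame advances row by row:

-- a 7-row moving average, the kind of frame that benefits from moving-aggregate support
SELECT ts, avg(amount) OVER (ORDER BY ts ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
FROM payments;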
Posted by Michael Paquier in VMware on 2014-04-12 at 05:07:04

Unless you have been cut off from the Internet for the last week, you have already heard of Heartbleed. This good candidate for the "Bug of the Year 2014" prize is already costing a better-not-to-count amount of money in maintenance and development for many companies around the world.

It has already been mentioned in which cases a PostgreSQL server would be vulnerable, but you might want to run some tests to check that the update of openssl on your server is effective. After this bug went public on 2014/04/07, many scripts popped up around the net to help you check whether a server is vulnerable to this bug or not. However, you need to know that you may not be able to use them directly against a Postgres server, as Postgres uses a custom protocol before handing the connection over to openssl. A connection first needs to send a message called SSLRequest, described here, consisting of two 32-bit integers, 8 and 80877103. The server then answers with a single byte, either 'S' if SSL connections are supported, or 'N' if not. Once 'S' is received, the SSL startup handshake message can be sent for further processing.

Taking that into account, an example of a script usable to test the Heartbleed vulnerability on a Postgres server can be found here, extended by my colleague Heikki Linnakangas and forked from here. In particular, note this portion of the code handling the PostgreSQL custom protocol:

sslrequest = h2bin('''
00 00 00 08 04 D2 16 2F
''')

[...]

print 'Sending PostgreSQL SSLRequest...'
sys.stdout.flush()
s.send(sslrequest)
print 'Waiting for server response...'
sys.stdout.flush()

sslresponse = recvall(s, 1)
if sslresponse == None:
  print 'Server closed connection without responding to SSLRequest.'
  return
# Server responds 'S' if it accepts SSL, or 'N' if SSL is not supported.
pay = struct.unpack('>B', sslresponse)
if pay[0] == 0x4E: # 'N'
  print 'PostgreSQL server does not accept SSL connections.'
  return
if pay[0] != 0x53: # 'S'
  print 'Unexpected response to SSLRequest: %d.', pay
  return

# Continue with SSL start hands

[continue reading]

The Problem

Sometimes you need to generate sample data, like random data for tests. Sometimes you need to generate it with the huge amount of code you have in your ORM mappings, just because an architect decided that all the logic needs to be stored in the ORM, and the database should be just a dummy data container. The real reason is not important - the problem is: let's generate lots of rows, millions of them, for a sample table from ORM mappings.

Sometimes the data is read from a file, but due to business logic kept in ORM, you need to load the data from file to ORM and then save the millions of ORM objects to database.

This can be done in many different ways, but here I will concentrate on making that as fast as possible.

I will use PostgreSQL and SQLAlchemy (with psycopg2) for the ORM, so all the code will be implemented in Python. I will create a couple of functions, each implementing a different solution for saving the data to the database, and I will test them using 10k and 100k generated ORM objects.

Sample Table

The table I used is quite simple, just a simplified blog post:

CREATE TABLE posts (
  id SERIAL PRIMARY KEY,
  title TEXT NOT NULL,
  body TEXT NOT NULL,
  payload TEXT NOT NULL
);

SQLAlchemy Mapping

I'm using SQLAlchemy for the ORM, so I need a mapping; I will use this simple one:

from sqlalchemy import Column, Integer, Text
from sqlalchemy.ext.declarative import declarative_base

# declarative base that the mapping below builds on
Base = declarative_base()

class BlogPost(Base):
    __tablename__ = "posts"

    id = Column(Integer, primary_key=True)
    title = Column(Text)
    body = Column(Text)
    payload = Column(Text)

The payload field is just to make the object bigger, to simulate real life where objects can be much more complicated, and thus slower to save to the database.

Generating Random Object

The main idea for this test is to have a randomly generated object; however, what I really check is the database speed, and all the randomness is used on the client side, so having a randomly generated object doesn't really matter at this moment. The overhead of a fully random function is the same regardless of the method of saving the data to the database. So instead of randomly ge

[continue reading]

Posted by Francisco Figueiredo Jr on 2014-04-09 at 21:13:00
This version restores the @@ operator support removed in the previous version. Thanks to Glen Parker, who provided a fix.

Downloads can be found at the usual locations:

github download page
and
the nuget repository.

Pgfoundry will be updated soon.
Posted by Oleg Bartunov on 2014-04-08 at 22:30:25
Just to let people know about our experiments.


Consider this top-level query, which is well supported by GIN in 9.4.

postgres=# select count(*) from jb where jb @> '{"tags":[{"term":"NYC"}]}'::jsonb;
count
-------
285
(1 row)

What if I want to find just {"term":"NYC"}, or even "NYC"? Now, with some improvements, jsonb_hash_ops supports such queries with a less than 5% bigger index size and without sacrificing performance!

postgres=# select count(*) from jb where jb @>> '{"term":"NYC"}'::jsonb;
count
-------
285
(1 row)
postgres=# select count(*) from jb where jb @>> '"NYC"'::jsonb;
count
-------
285
(1 row)


I can see the use case for wildcard queries in shop aggregators, which combine different hierarchies, so it's difficult to say whether some specific key is on the same level in different sources.

Posted by Szymon Guz in EndPoint on 2014-04-08 at 09:06:00

I found an interesting problem. There was a table with some data, including a date and an integer value. The problem was to get a cumulative sum for all the dates, including dates for which we don't have entries. For such dates we should use the last calculated sum.

Example Data

I will use an example table:

# CREATE TABLE test (d DATE, v INTEGER);

with sample data:

# INSERT INTO test(d,v)
  VALUES('2014-02-01', 10),
        ('2014-02-02', 30),
        ('2014-02-05', 10),
        ('2014-02-10', 3);

Then the data in the table looks like:

# SELECT * FROM test;
     d      |  v
------------+----
 2014-02-01 | 10
 2014-02-02 | 30
 2014-02-05 | 10
 2014-02-10 |  3
(4 rows)

What I want is a cumulative sum for each day. A cumulative sum is the sum of all the earlier values, so for the above data I want to get:

     d      |  v
------------+----
 2014-02-01 | 10
 2014-02-02 | 40
 2014-02-05 | 50
 2014-02-10 | 53
(4 rows)

The simple query for getting the data set like shown above is:

SELECT DISTINCT d, SUM(v) OVER (ORDER BY d) v
FROM test
ORDER BY d ASC;

Filling The Gaps

The query calculates the cumulative sum for each row. Unfortunately this way there are gaps between dates, and the request was to fill those in using the values from previous days.

What I want to get is:

     d      |  v
------------+----
 2014-02-01 | 10
 2014-02-02 | 40
 2014-02-03 | 40
 2014-02-04 | 40
 2014-02-05 | 50
 2014-02-06 | 50
 2014-02-07 | 50
 2014-02-08 | 50
 2014-02-09 | 50
 2014-02-10 | 53

My first idea was to use the generate_series() function, which can generate a series of data. What I need is a series of all dates between min and max dates. This can be done using:

# SELECT generate_series(
    '2014-02-01'::timestamp,
    '2014-02-05'::timestamp,
    '1 day')::date;
 generate_series 
-----------------
 2014-02-01
 2014-02-02
 2014-02-03
 2014-02-04
 2014-02-05

The generate_series() function arguments are (begin, end, interval). The function returns all timestamps from beginning to end with gi

[continue reading]

Is your PostgreSQL installation vulnerable to the Heartbleed bug in OpenSSL? The TL;DR; version is "maybe, it depends, you should read this whole thing to find out". If you are vulnerable, it is a high risk vulnerability!

The slightly longer version is that it will be vulnerable if you are using SSL, and not vulnerable if you are not. But the situation is not quite that easy, as you may be using SSL even without planning to. PostgreSQL also does not provide any extra protection against the bug - if you are using SSL, you are vulnerable to the bug just as with any other service.

As the bug is in OpenSSL, however, what you need to get patched is your OpenSSL installation and not PostgreSQL itself. And of course, remember to restart your services (this includes both PostgreSQL and any other services using SSL on your system). You will then have to consider in your scenario if you have to replace your SSL keys or not - the same rules apply as to any other service.

It depends on if SSL is enabled

PostgreSQL by default ships with SSL turned off on most platforms. The most notable exception is Debian and derivatives (such as Ubuntu), which enable SSL by default.

If SSL is disabled globally, your installation is not vulnerable.

The easiest way to check this is to just use a simple SQL query:


postgres=# show ssl;
 ssl
-----
 off
(1 row)
 

If this parameter returns off, you are not vulnerable. If it returns on, you are.

If you do not need SSL, the easiest fix is to turn this off and restart PostgreSQL. This also brings the additional benefit of not paying the overhead of encryption if you don't need it. If you actually use SSL, this is of course not an option.

It depends on your installation

If you have installed PostgreSQL using a package based system, such as yum (from redhat/fedora or from yum.postgresql.org), apt (from debian/ubuntu/etc or from apt.postgresql.org), FreeBSD ports etc, it is up to your operating system to provide a patch. Most major distributions have already done this - you just need to install it (and rest

[continue reading]

As pointed out by Peter Eisentraut in a blog post named
Schema Search Paths Considered Pain in the Butt, you need to make sure the search_path is explicitly set for all SECURITY DEFINER functions in PostgreSQL.

Fixing this manually for, in my case, 2106 functions, is, indeed a “pain in the butt”, so I crafted a little query to automate the job:

\t
\pset format unaligned
\o /tmp/fix_search_path_for_security_definer_functions.sql
select
    array_to_string(
        array_agg(
            -- inject SET search_path in-between LANGUAGE and SECURITY DEFINER in the declaration
            regexp_replace(
                pg_get_functiondef(oid),
                E'(LANGUAGE [a-z]+)\\s+(SECURITY DEFINER)',
                E'\\1\n SET search_path TO public, pg_temp\n \\2'
            )
        ),
        ';'
    )
from pg_proc
where prosecdef is true -- SECURITY DEFINER functions
-- don't include functions for which we have already specified a search_path
and not (coalesce(array_to_string(proconfig,''),'') like '%search_path%')
-- public schema
and pronamespace = 2200
;
\t
\o
\i /tmp/fix_search_path_for_security_definer_functions.sql
-- If all goes well you should see a lot of CREATE FUNCTION being spammed on the screen

Posted by Bruce Momjian in EnterpriseDB on 2014-04-07 at 02:30:01

I just returned from attending PGConf NYC. They had 259 participants, more than double last year's total. The conference was in a hotel near Wall Street.

While I have been to many Postgres user conferences, this felt like my first corporate Postgres conference. Presenters from multinational banks Goldman Sachs and Morgan Stanley explained how and why they use Postgres in their organizations. These are trend-setting organizations, and their public embrace of Postgres will have lasting benefits.

In fact, one hallway discussion was how to enable large organizations like these, particularly those with significant legal and regulatory requirements, to work with the Postgres community. Some companies have employees post from non-company email accounts like Gmail, while others contract with consulting companies to work with the Postgres community on their behalf. Unfortunately, neither of these approaches have the companies working with the community openly.

Continue Reading »

PostgreSQL 9.4 is shipping with a new feature called jsonb, a new data type able to store JSON data with support for GIN indexing (!). In short, this feature, one of the most important of the upcoming release, if not the most important, puts Postgres in a good position in the field of document-oriented database systems.

Since 9.2, an integrated JSON data type already exists, complemented with a set of functions (for data generation and parsing) as well as operators added in 9.3. When using the "json" data type, data is stored as an exact copy of the input text, which functions working on it need to reparse, causing some processing overhead.

The new jsonb data type stores data in a decomposed binary format, so inserting it is less performant than json because of the overhead necessary to put it into shape, but reading it is faster as it does not need reparsing, and it has the advantage of supporting GIN indexing. For this last reason it is actually recommended to use jsonb for your applications instead of json (you might need only json depending on your needs though). Note as well that jsonb has the same operators and functions as json; you can refer to my previous posts on the matter to get some insight on them, or directly to the documentation of Postgres.

Now let's see how jsonb works and compare it with json, using as a data sample a dump of geobase, worth 8.6 million tuples and 1.1GB, with many fields like city name and country code (you can refer to a complete list of the fields here). After storing the data into a new table with a raw COPY, let's transform it into json/jsonb in a set of tables with a fillfactor of 100 to see how much space they use:

=# COPY geodata FROM '$HOME/Downloads/allCountries.txt';
COPY 8647839
=# CREATE TABLE geodata_jsonb (data jsonb) with (fillfactor=100);
CREATE TABLE
=# CREATE TABLE geodata_json (data json) with (fillfactor=100);
CREATE TABLE
=# \timing
Timing is on.
=# INSERT INTO geodata_json SELECT row_to_json(geodata) FROM geodata;
INSERT 0 8647839
Time: 287158.457 ms
=# INSER

[continue reading]

Yes, this is truly possible and handled smartly by PostgreSQL. To demo it, I first need to follow the standard technique of Point in Time Recovery in PostgreSQL. It has been demoed extremely well in various books/articles/blogs by extraordinary authors, hence I am not going into the details of how to do it, but heading directly to the subject, i.e. how to pause while recovering with the same technique. Roughly, PITR can be expressed as "PITR = Last Filesystem Backup (LFB) + WAL archives generated after LFB + un-archived WALs in the current $PGDATA/pg_xlog". For better understanding, I have put this into a graph, since it makes the idea clearer. (Sorry, this blog is a bit long; it happened unknowingly while going into the details of the concept.)


The PITR steps, which I am going to follow with slight changes that I will talk about soon:

Step 1. Restore the most recent file system-level backup (FSB) to any location where the recovery is planned to be performed.
Step 2. If the FSB is a tar, untar it, and clean the pg_xlog directory, leaving archive_status. If the backup excluded this directory, create an empty pg_xlog directory in the FSB.
Step 3. Copy the un-archived WALs from the crashed cluster's $PGDATA/pg_xlog into $FSB/pg_xlog (from Step 2).
Step 4. Delete postmaster.pid from the FSB directory.
Step 5. Create a recovery.conf file in the FSB directory (a minimal example follows these steps).
Step 6. Start the cluster (FSB).
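For the pause itself, the recovery.conf from Step 5 would contain something along these lines (a sketch; the archive path and target timestamp are placeholders, and pause_at_recovery_target is only useful with hot_standby enabled so that connections are allowed while paused):

restore_command = 'cp /path_to/archive/%f %p'
recovery_target_time = '2014-04-04 10:00:00'
pause_at_recovery_target = 'true'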

We should ask the question: when is pausing the recovery required? Maybe to prevent multiple base restorations, to check in between roll-forward recovery, to roll back a particular table's data, or simply out of interest to see how far it has recovered :). Remember, a pause in recovery means it allows connections while recovering. To outline this, I have reproduced in a chart the growth of a particular table's rows up until a mishap.


From the above diagram, we can see that the DEMO table had 1,000,000 rows when the file system-level backup ($PGDATA) was taken and 4,000,000 rows before the crash. In my local VM, I have based the situation on TIME instead of date.

[continue reading]

I have released version 4.12 of the buildfarm client.

In addition to numerous bug fixes, it has the following:

  • the global option branches_to_build can now be 'HEADPLUSLATESTn' for any single digit n
  • there is a new module TestCollateLinuxUTF8
  • there is a new module TestDecoding which is enabled by default (but does nothing on MSVC systems, where we can't yet run these tests). This runs the new contrib test_decoding module, which can't run under "make installcheck".
  • running "perl -cw" on the scripts will now give you failures for missing perl modules on almost every platform. The only exception should now be on older Msys systems.
  • improvements in the sample config file to make it better organized and better reflect best practice.
  • find_typedefs is now supported on OS X

In addition I recently enhanced the HOWTO at http://wiki.postgresql.org/wiki/PostgreSQL_Buildfarm_Howto covering especially best current git practice.

Thanks to Tom Lane for suggestions and complaints which are behind a number of the improvements and fixes, and for some code for OSX find_typedefs.

The release is available at http://www.pgbuildfarm.org/downloads/releases/build-farm-4_12.tgz
As we all know, pg_upgrade handles tablespaces in a smart way. However, it creates the tablespaces in the same location as the old ones. If you want to change the location after the upgrade, you need some extra work, including updating the catalog table with the new location (IIRC, in PG 9.4 you can do it using the ALTER TABLESPACE command). So, for current versions I would like to give you a workaround to change the locations of tablespaces while running pg_upgrade. This tweak also applies if you have any contrib modules installed in databases of the old cluster: you have to install them in the new cluster as well, however we cannot create the databases in the new cluster, as it should be empty to run pg_upgrade. pg_upgrade creates the databases in the new cluster using the template "template0".

You can follow the steps below to tweak the running pg_upgrade process.

1. I've upgraded the PG 9.1 cluster "Im_old_one" to the PG 9.3 cluster "Hey_Im_New"; as you can see below, it created the new tablespace in the same directory.

-bash-4.1$ /opt/PostgreSQL/9.1/bin/psql -p 5666 -U postgres postgres
Timing is on.
psql.bin (9.1.7)
Type "help" for help.


postgres=# CREATE TABLESPACE tablsc_91 LOCATION '/tmp/tablsc_91';
CREATE TABLESPACE
Time: 1.663 ms
-bash-4.1$ /opt/PostgreSQL/9.3/bin/pg_upgrade -b /opt/PostgreSQL/9.1/bin/ -B /opt/PostgreSQL/9.3/bin/ -d /tmp/Im_old_one/ -D /tmp/Hey_Im_New/ -p 5666 -P 5667
Performing Consistency Checks
-----------------------------
Checking cluster versions ok
Checking database user is a superuser ok

.
.
Upgrade Complete
----------------
Optimizer statistics are not transferred by pg_upgrade so,
once you start the new server, consider running:
analyze_new_cluster.sh

Running this script will delete the old cluster's data files:
delete_old_cluster.sh
-bash-4.1$
-bash-4.1$
-bash-4.1$ ls -ltrh /tmp/tablsc_91/
total 8.0K
drwx------. 2 postgres postgres 4.0K Apr 5 14:25 PG_9.1_201105231
drwx------. 2 postgres postgres 4.0K Apr 5 14:29 PG_9.3_201306121
2. Now delete the new cluster and run

[continue reading]