The world's most advanced open source database
Top posters
Number of posts in the past month
Top teams
Number of posts in the past month
  • Policy for being listed on Planet PostgreSQL.
  • Add your blog to Planet PostgreSQL.
  • List of all subscribed blogs.
  • Manage your registration.
  • Get in touch with the Planet PostgreSQL administrators at planet at
The next version of Parallel Query is set to hit in PostgreSQL v10, and you can learn all about it at PgConf.US next month!

Until then, we invite you to catch up on what Parallel Query can do now and Robert Haas's general thoughts on the feature. Watch the video from PgConf.US 2016 here.

Once you are done make sure and grab a ticket to the largest PostgreSQL conference in North America.

In my previous blog post, I showed how statistics generated on expression indexes can be used to produce more accurate row counts, and potentially better plans. While I did show more accurate row counts via EXPLAIN, I did not show changed query plans. I plan to do so in this blog post. First, the setup:

        SELECT * FROM generate_series(1, 100000) AS f(x);
        SELECT * FROM generate_series(1, 2) AS f(x);
ANALYZE test1;
ANALYZE test2;

Continue Reading »

Posted by Kaarel Moppel in Cybertec on 2017-02-22 at 09:40

I was recently asked if Postgres was “covered” in the area of common reporting needs? By reporting I mean ad hoc or predefined chart generation,  mostly drag-and-drop style software, used most commonly by Business Intelligence people. The answer is of course “yes” – you can do reporting on PostgreSQL as easily as with other popular […]

The post PostgreSQL and reporting software appeared first on Cybertec - The PostgreSQL Database Company.

It has taken a lot of planning and efforts, but I’m happy to see that PGConf India 2017 is coming along very well and promises to be a great event. This is our third year in a row in Bengaluru, but there are many distinguishing factors that make this year stand out:

  1. For the first time, we’re having a dedicated training day ahead of the main conference. All seats for the trainings are sold out and we had to turn away many interested folks.
  2. For the first time, we’re having a multi-track conference.
  3. And for the first time, we’ve crossed 225 conference delegate registrations, and there are still a few days left.

When we started planning for the conference, we were a bit nervous about whether we could get a good response to the training programme, but we’re almost stumped by the positive response we’ve received. It clearly shows PostgreSQL’s growing popularity and how companies and developers are adopting this technology.

The keynote this year will be delivered by Simon Riggs. He will share his thoughts on why “Persistence” is “Key to PostgreSQL’s Success“. Other than that, there are several topics which will interest developers and administrators, as well as IT managers. So if you want to know how to extract maximum performance from your PostgreSQL database or how to port from Oracle to PostgreSQL or want to know more about parallel query processing, then you must attend the conference. You’ll get to hear from not only some of the best developers, but also some very interesting case studies from real users of the database.

What started as a small meetup of interested folks 4 years back has now turned into a full scale conference with a large venue, great sponsors and amazing list of speakers.

Visit the conference website to know more details or contact the organisers at See you at the conference.

Everything you always wanted to know about PostgreSQL stats

This blogpost is all about postgres activity stats. Stats are the very important since they allow us to understand what's going on with the system. With that in mind, stats also have some weak points and that’s what I would like to discuss here.

In these blog series I will try to explain how to use stats effectively and how to detect and solve problems with their help. I will start with an overview and will move on discussion on specific usages with receipts and built in tools.

I hope my posts will provide you with some hands-on tools that will help you using postgres stats and will reassure you that stats aren’t as scary as they seems at first glance.
What is postgres? For people who are not familiar with postgres, it's a bunch of processes in the 'ps auxf' output. What they can see are operating system’s metrics, such as CPU usage, memory or swap consumed by these processes, but nothing more. This, however, is not sufficient to effectively troubleshoot postgres. To do that one needs to know how postgres stats are collected throughout postgres lifetime and to also be able to properly use them.

If we look inside postgres and try to understand what it consists of, we see that there are many subsystems that work together and problems in one part may cause problems in the other parts. Generally, postgres can be splitted with two abstract parts, first is the servicing clients and second is the background service operations, e.g. vacuums, checkpoints, write ahead logs. Thus slow down in background operations will negatively affect servicing clients tasks and vice versa. Postgres activity stats are used for observing, predicting or eliminating possible problems, and almost all postgres parts have specific stats which describe what's going on there.

Even provided all these strengths, stats also have a few weak points. First, is the fact that there are so many of them and one should know which source to use in each particular case. Second, almost all stats
Posted by Bruce Momjian in EnterpriseDB on 2017-02-20 at 17:15

Most people know that Postgres allows the creation of indexes on expressions. This is helpful if you need index lookups of expressions used in WHERE clauses.

However, there is another benefit to expression indexes, and that is optimizer statistics. Not only do expression indexes allow rapid lookups of matching expressions, but they also provide optimizer statistics, which improve row estimates and hence query plans. Here is an example:

INSERT INTO test SELECT x FROM generate_series(1, 100) AS t(x);
SELECT COUNT(*) FROM test WHERE x % 2 = 1;
                     QUERY PLAN
 Seq Scan on test  (cost=0.00..2.50 rows=1 width=4)
   Filter: ((x % 2) = 1)

Continue Reading »

Every so often someone needs solution to getting first (or couple of first) values for given column. Or last. For some cases (when there is not many groups) you can use recursive queries. But it's not always the best choice. Let's try to implement first() and last() aggregates, so these could be easily used by […]
Posted by Pavel Stehule on 2017-02-20 at 07:30
More, more times I had to import some XML documents to Postgres. How do it simply? More, some XML documents are not in UTF8 encoding, so some conversion and modification is necessary.

<?xml version="1.0" encoding="windows-1250"?>
<nazev>Alter Ego de Palmer</nazev>

It is not a hard work, but it is manual work, and it is terrible work, because PostgreSQL has enough functionality, but this functionality is not accessible from psql console.

I wrote simple tool pgimportdoc, that can helps with import any text, json, binary (to bytea field) or XML document to Postgres.

cat ~/Stažené/enprimeur.xml | ./pgimportdoc postgres -c 'insert into xmldata values($1)' -t XML
./pgimportdoc postgres -f ~/Stažené/enprimeur.xml -c 'insert into xmldata values($1)' -t XML

Supported formats are XML, BYTEA and TEXT. The TEXT format can be used for text, json, jsonb target formats.

Posted by Shaun M. Thomas on 2017-02-17 at 18:57

There are a lot of languages available for authoring Postgres functions, but there’s nothing quite like the the classic PL/pgSQL. It’s SQL! It’s not SQL! It’s a kind of horrifying mutant crossbreed suitable only for terrifying small children and generating complex reports from a cavalcade of dubious sources! And deep within its twisted entrails is an often overlooked feature usually only available in far more mature entities.

Just wait until they add sharding

That’s right, it’s obvious we’re referring to the ASSERT statement.

When we say “often overlooked”, we’re not kidding. The pertinent documentation offers a meager overview following a far more comprehensive summary of the RAISE statement. We thought it deserved better.

Let’s begin with a simple table example that can form the basis of a more complex function:

CREATE TABLE sensor_log (
  id            SERIAL PRIMARY KEY,
  location      VARCHAR NOT NULL,
  reading       BIGINT NOT NULL,
  reading_date  TIMESTAMP NOT NULL

INSERT INTO sensor_log (location, reading, reading_date) SELECT % 1000, % 100, CURRENT_DATE - (( * 10) || 's')::INTERVAL FROM generate_series(1, 5000000) s(id);
CREATE INDEX idx_sensor_log_date ON sensor_log (reading_date);
ANALYZE sensor_log;

What can we do with five million rows of simulated data from 1000 sensors that collect data every ten seconds? A good place to start is a summary or report table of some kind. We’re only doing cursory analysis, so we should begin with aggregating the daily average, minimum, and maximum. If we get adventurous later, we can add standard deviations, jitter, deltas, and other fancy logic report owners might find useful.

Since this type of data is easy to tabulate, we require one single table and a function to manage its contents.

CREATE TABLE sensor_report (
  id            SERIAL PRIMARY KEY,
  location      VARCHAR NOT NULL,
  reading_date  DATE NOT NULL,
  avg_reading   BIGINT NOT NULL,
  min_reading   BIGINT NOT NULL,
  max_reading   BIGINT NOT NULL

CREATE INDEX idx_sensor_report_date ON sensor_repo
Posted by Bruce Momjian in EnterpriseDB on 2017-02-17 at 16:45

So, it is 2017 and Postgres still doesn't support query hints like other relational databases? Yep, it's true, and probably will be forever since "'Oracle-style' optimizer hints" is listed in the "Features We Do Not Want" section of the Postgres TODO list. A wiki page outlines the reasons for this.

While this seems perfectly logical to people who have used Postgres for years, it strikes new users as rigid and extreme. There are several reasons for this divergence.

First, what new users don't realize is that there are ways to control the optimizer, just not with inline query hints. The most useful ways are by changing planner constants and other planner options, which allow you to tailor the optimizer to your specific hardware and query types. This is a more holistic approach to optimizer tuning because, if set properly, they improve all queries, compared to "use this index"-style hints which must be added to every query.

Continue Reading »

Microservices and NoSQL get a lot of hype, but in many cases what you really want is a relational database that simply works, and can easily scale as your application data grows. Microservices can help you split up areas of concern, but also introduce complexity and often heavy engineering work to migrate to them. Yet, there are a lot of monolithic apps out that do need to scale. If you don’t want the added complexity of microservices, but do need to continue scaling your relational database then you can with Citus. With Citus 6.1 we’re continuing to make scaling out your database even easier with all the benefits of Postgres (SQL, JSONB, PostGIS, indexes, etc.) still packed in there.

With this new release customers like Heap and Convertflow are able to scale from single node Postgres to horizontal linear scale. Citus 6.1 brings several improvements, making scaling your multi-tenant app even easier. These include:

  • Integrated reference table support
  • Tenant Isolation
  • View support on distributed tables
  • Distributed Vaccum / Analyze

All of this with the same language bindings, clients, drivers, libraries (like ActiveRecord) that Postgres already works with.

Give Citus 6.1 a try today on Citus Cloud, our fully managed database-as-a-service on top of AWS, or read on to learn more about all that’s included in this release.

Reference table support: Sharing data across tenants

Applications that are B2B fit smoothly into a model of sharding by customer. This means your customer only interacts with their own data, and all that data can be automatically co-located together–giving you all the power of SQL while still maintaining flexibility. Still in cases you may have smaller lookup or reference tables that don’t make sense to distribute. This could be something like a list of countries or an order status table. These type of tables don’t have the same large write volume as tables you’ll want to shard, but may have a relationship to them still. As of today you now have cleanly defined APIs to create these tables. To creat

Congratulations to Hans-Jürgen Schönig, who sold out his training: Detecting performance problems and fixing them. This marks one of the quickest training sales in PgConf US history.

Hans-Jürgen Schönig has been in professional PostgreSQL for over 16 years now and has dealt with customers around the globe. His company Cybertec Schönig & Schönig GmbH offers professional services for PostgreSQL, which includes training, 24x7 support, as well as consulting.

We at PgConf US are appreciative of all contributors willing to help us grow the largest PostgreSQL Conference in North America. It is quality content like this training that allows PostgreSQL to continue its upward growth. What a great community we have! Now might be the time to get your tickets!

Posted by Joshua Drake on 2017-02-15 at 18:37
At PgConf US 2016, PostgreSQL core team member Peter Eisentraut  presented: How PostgreSQL is tested. Below is the video for that presentation. Join us at the end of March to participate in knowledge craft that is rarely seen outside of the PostgreSQL ecosystem.

Posted by Bruce Momjian in EnterpriseDB on 2017-02-15 at 14:00

Postgres includes lots of monitoring tools that allow you to see what is happening, but there are a few settings that really go into detail, e.g. log_statement_stats:

SET client_min_messages = log;
SET log_statement_stats = true;
DETAIL:  ! system usage stats:
1 !       0.000000 s user, 0.000000 s system, 0.000201 s elapsed
2 !       [0.000000 s user, 0.000000 s system total]
3 !       0/0 [0/0] filesystem blocks in/out
4 !       0/14 [0/363] page faults/reclaims, 0 [0] swaps
5 !       0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
6 !       0/0 [3/0] voluntary/involuntary context switches

Continue Reading »

Two weeks back we discovered an ancient bug in PostgreSQL which may cause index corruption when index is built via CREATE INDEX CONCURRENTLY (aka CIC). All supported releases of PostgreSQL are affected by this bug. The bug could be a source of index corruption reports that we receive from the field once in awhile.

Now before we discuss the problem, let me make it absolutely clear that PostgreSQL is known for its gold standards as far as code quality and product stability is concerned. But just like any other software, the database engine can also have unnoticed bugs. But PostgreSQL community takes every known bug very seriously and try to fix it on priority. No wonder that the bug fix went into the scheduled 9.6.2 release (and all other stable releases) at a very short notice, but not before community discussed and deliberated pros and cons of adding a crucial bug fix very close to a planned release.

To understand origin of the bug, you must first understand how CREATE INDEX CONCURRENTLY works. So if you haven’t, read my previous blog on the topic.

Relation Cache

You would also need to know about another technical detail. PostgreSQL engine internally uses something known as “Relation Cache” or relcache to store information about database objects, such as tables and indexes. These objects belong to system catalogs which are nothing more than a bunch of tables, but created in a system schema and whose presence is well known to the database engine. These catalogs are accessed very frequently since almost everything that get executed in the engine need access to the catalogs. It would be grossly damaging if these catalogs are scanned and queried like normal tables, given how frequently they are accessed. Also, catalogs don’t change too often. So information about various database objects is cached so that it can be quickly accessed without going to the database. There are variety of caches, but the cache that stores information about tables, their attributes, index columns etc is called a relcache. Caches are maint

Posted by Joshua Drake on 2017-02-14 at 18:23
Regulated Industry Summit

We just released the schedule for the Regulated Industry Summit!

The "Third Annual Regulated Industry Summit: Finance, Government, Healthcare, and PostgreSQL" is a community summit that brings together PostgreSQL contributors and enterprise developers to discuss challenges and solutions in utilizing PostgreSQL in regulated environments.

Some of the topics of the Regulated Industry Summit are:
  • Engineering cloud infrastructure for HIPAA-compliance
  • Blockchain, Distributed Security and Regulated Industries
  • Secure Technical Implementation Guide (STIG) For PostgreSQL
  • Security Enhancements in Postgres 10 

Entrance to the Regulated Industry Summit is part of conference admission, but if you are just interested in attending the Summit, you can purchase a ticket for the Regulated Industry Summit for $99.

The Largest PostgreSQL Conference in North America 

We are inching closer to March 28 - 31. This year PGConf US 2017 will feature the familiar conference format on March 29 - 31 with a mix of specialized sessions, keynotes, and the ability to connect with PostgreSQL fans at various receptions throughout the day. As with past PostgreSQL conferences, we will feature presentations from the perspective of developers, DBAs, systems administrators, business-intelligence analysts, and decisions makers.

If you haven't purchased your tickets yet, buy them before they sell out!

Get your tickets


Reach out to our sponsors for a special discount code.

For those planning to attend, don’t forget to book your room soon for the best rates as our group block is only available until March 6. We have reserved a block of rooms at the special conference rate of $236/night plus applicable taxes for attendees who make their reservations via the Westin website. You can book your room at:

PGConf US 2017 is hosted by the United States PostgreSQL Association, a nonprofit 501(c)(3) created t

A lot has been said and written about scaling PostgreSQL to many cores and subsequently to many machines. Running PostgreSQL in a server farm is something, which has not been possible for many years. However, this has definitely changed. A lot of development has gone into scaling PostgreSQL and to running a single query on […]

The post Experimenting with scaling and full parallelism in PostgreSQL appeared first on Cybertec - The PostgreSQL Database Company.

Posted by Joshua Drake on 2017-02-13 at 18:51
The following presentation took place at PgConf US 2016. Join us in 2017 to see other excellent content like this! We have a full line up and an extra day.

Posted by Bruce Momjian in EnterpriseDB on 2017-02-13 at 13:00

I often get the question, "How do I get good performance out of Postgres?," and sometimes I am kind of mifted because I get the feeling that they expect a short answer. Those who are familiar with Postgres know there is no short answer, and it really isn't Postgres's fault because there isn't a simple answer to that question for any relational database.

I guess it comes down to the complexity of relational systems. There is the SQL parser, the optimizer, and background workers. No matter how much auto-tuning we do, there are still going to be things that administrators must adjust for optimal performance. And then there is the hardware, which is stressed by all relational systems. Proper tuning of Postgres must happen at all three levels:

  • Hardware
  • Database parameter configuration
  • SQL (both data modeling and queries)

Unless all three are tuned, you can't say the system is optimized. I have already covered hardware tuning in my Database Hardware Selection Guidelines presentation. Database parameter configuration is a tough one because every site has different requirements, though there are a few common tuning cases. Tuning at the SQL level involves proper schema and index setup, plus monitoring with something like log_min_duration_statement, pg_stat_statements, or pgBadger. Once you identify a problem SQL query, you often have to do detective work with EXPLAIN.

Continue Reading »

The PostgreSQL at 10 TB And Beyond talk has now been released on Youtube. Feel free to watch.  For the folks seeing this on Planet Perl Iron Man, there is a short function which extends SQL written in Perl that runs in PostgreSQL in the final 10 minutes or so of the lecture.

This lecture discusses human and technical approaches to solving volume, velocity, and variety problems on PostgreSQL in the 10TB range on a single, non-sharded large server.

As a side but related note, I am teaching a course through Edument on the topics discussed in Sweden discussing many of the technical aspects discussed here, called Advanced PostgreSQL for Programmers.  You can book the course for the end of this month.  It will be held in Malmo, Sweden.

Here is a feature for Postgres 10 that a couple of people will find useful regarding the handling of password files:

commit: ba005f193d88a8404e81db3df223cf689d64d75e
author: Tom Lane <>
date: Tue, 24 Jan 2017 17:06:34 -0500
Allow password file name to be specified as a libpq connection parameter.

Formerly an alternate password file could only be selected via the
environment variable PGPASSFILE; now it can also be selected via a
new connection parameter "passfile", corresponding to the conventions
for most other connection parameters.  There was some concern about
this creating a security weakness, but it was agreed that that argument
was pretty thin, and there are clear use-cases for handling password
files this way.

Julian Markwort, reviewed by Fabien Coelho, some adjustments by me


Connection strings can be used to connect to a PostgreSQL instance and can be customized in many ways to decide how the client should try to connect with the backend server. The documentation offers a large list nicely documented, most of them being as well overridable using mapping environment variables listed here.

The commit above enables the possibility to override the position of a password file directly using a path, without the need of an environment variable. This is a major advantage for some class of users. For example imagine the case where Postgres is used on a host shared by many users, where trusted connections cannot be used even with local Unix domains path under the control of a specific group or user because those users rely on default paths like /tmp or default localhost (the limitation here being that pg_hba.conf assumes that “local” entries map to all local Unix domains). When creating a service that links to PostgreSQL, monitored by some higher-level application, this service may not be able to use the environment variables at its disposal to find the path to a password file. While it is necessary to har


In a previous post I discussed a way to find out which transaction dropped a table by examining the transaction log, in order to set a restore point to right before the table was dropped.

But what if we have the luxury of planning ahead (right? Well, let's call it the second time it happens?). Shouldn't we be able to log which transaction dropped a table, and use that? Of course we should.

The first thing one tries is then of course something like this in postgresql.conf:

log_line_prefix = '%t [%u@%d] <%x> '

to include the transaction id of the table. Unfortunately:

2017-02-12 12:16:39 CET [mha@postgres] <0> LOG:  statement: drop table testtable;

The 0 as a transaction id indicates that this command was run in a virtual transaction, and did not have a real transaction id. The reason for this is that the statement logging happens before the statement has actually acquired a transaction. For example, if I instead drop two tables, and do so in a transaction:

postgres=# BEGIN;
postgres=# DROP TABLE test1;
postgres=# DROP TABLE test2;
postgres=# COMMIT;

I get this interesting output:

2017-02-12 12:17:43 CET [mha@postgres] <0> LOG:  statement: DROP TABLE test1;
2017-02-12 12:17:45 CET [mha@postgres] <156960> LOG:  statement: DROP TABLE test2;

Which shows two different transaction ids (one real and one not) for statements in the same transaction. That's obviously not true - they were both dropped by transaction 156960. The transaction id just wasn't available at the time of logging.

So what can we do about that? Event triggers to the rescue!

I just co-authored a blog for IBM's Linux on Power developer site. While there is similar content in my YeSQL slides, the IBM blog explains the four options available to application developers, from pure relational to pure NoSQL, and the options in between possible with Postgres.

Posted by Andreas Scherbaum on 2017-02-10 at 20:59
Andreas 'ads' Scherbaum

FOSDEM 2017 is over, and hopefully everyone is back home. Time for a short review.

Once again, PostgreSQL ran a pre-conference to FOSDEM, called PGDay. One day just filled with awesome PostgreSQL talks, and meeting community members.

Many speakers uploaded or linked their talks to the PostgreSQL Wiki. If the slides for a talk are missing, please contact the speaker and ask them to upload or link the slides. (For the record, my slides are here)

The FOSDEM video team already uploaded the recordings, and the PostgreSQL Volunteers reviewed every single video. Find them here. Devrim also did short recordings of most of the talks, and posted them on Twitter.

The Hacker Public Radio did interviews with many projects, including PostgreSQL. Find the interview here.

The PostgreSQL Project had a Devroom on Sunday. The schedule is part of the regular PGDay/FOSDEM schedule. And of course we managed to fill the room to the last seat, standing room only. If only people would obey the "Room is full" sign on the door ;-)

We also had a booth (pic 2, pic 3) in the main building, where you can talk with community members about anything around PostgreSQL. Also we had some nice merchandise. Matching the bad weather, the umbrellas went very well. We promise to have more next year.

Last but not least, Slonik visited FOSDEM. Here is just a short selection of pictures.

Hope to see you all next year!


P.S.: The next PostgreSQL Europe Conference will be in Warsaw, in Orctober. Details can be found here.

Posted by Shaun M. Thomas on 2017-02-10 at 20:14

Recently we switched the PostgreSQL Buildfarm to use https for its front-facing web transactions. That’s worked pretty well, but one thing got broken in the change. The registration page uses a Captcha to help avoid web spam, which has been a problem in the past. However, the code to fetch the relevant html was still getting plain http, and thus browsers were refusing to display the Captcha because it came from an untrusted source. Without the Captcha the registration was rejected. This was discovered yesterday, and has now been fixed. The Captcha code now uses https. If you have recently found a buildfarm registration rejected like this, please resubmit it.

Posted by Jason Owen on 2017-02-09 at 08:00

In order to set a strong password for the PostgreSQL database I provisioned on Amazon RDS, I looked up the limits. In my case, there are two sources of constraints:

  1. Amazon RDS limits
    • Must contain 8 to 128 characters
    • The password for the master database user can be any printable ASCII character except /, `, or @.
  2. Characters allowed in Amazon Lambda environment variables
    • Member must satisfy regular expression pattern: [^,]* (I cannot find documentation for this, except the error message when you try to save a value that has a comma in it.)

We can generate a password that meets these restrictions with makepasswd(1):

$ makepasswd --chars=128 \ --string

Note the '\' at the end: that means "close the single-quoted string, and append an escaped single-quote."

You can then save this to your ~/.pgpass file, being sure to escape \ and : characters:

$ sed -e 's/\\/\\\\/g;s/:/\\:/g'

I am building a web application that delegates authentication to a third party. Once the third party authenticates the user, the app create a session for the user - and maybe create the user, too, if they don't already exist!

My first draft of this had all the SQL queries in the code. The logic is something like:

does user exist?
  yes: create session
  no: create user, then create session

I wasn't very happy with the code, for a couple of reasons. First, the SQL queries were rather ugly string constants. Multi-line strings aren't really great in any language, and embedding SQL in another language's source file makes it harder for editors to do syntax highlighting. Second, handling errors and encoding the (fairly simple) logic above was obscuring my intent.

Stored procedures are a way to keep database logic in the database. Among other benefits, this can dramatically simplify the calling code.

I ended up with a function like the following:

CREATE FUNCTION create_session(
  external_user_id bigint,
  external_user_name text,
  OUT session_id uuid,
  OUT session_expiration TIMESTAMP WITH TIME ZONE
) AS $$
  existing_user_id INTEGER;
  new_user_id INTEGER;
  SELECT INTO existing_user_id user_id
    FROM users_external
    WHERE users_external.external_user_id = external_user_id;

  IF existing_user_id IS NULL THEN
    INSERT INTO users_external (external_user_id, external_user_name)
      VALUES (external_user_id, external_user_name)
      RETURNING user_id
      INTO new_user_id;
    INSERT INTO sessions (user_id)
      VALUES (new_user_id)
      RETURNING session_id, session_expiration
      INTO session_id, session_expiration;
    INSERT INTO sessions (user_id)
      VALUES (existing_user_id)
      RETURNING session_id, expires
      INTO session_id, session_expiration;
$$ LANGUAGE plpgsql;

This is a syntactically correct function and will be accepted by PostgreSQL, but fails at runtime:

> select * from create_session(12345, 'example');
ERROR:  column reference "externa

This is part of a series of blogs on Java & PostgreSQL. For links to other blogs in this series, please scroll to the end of this post.


Starting v9.2, PostgreSQL is providing native data type support for JSON objects. Subsequent releases introduced JSONB (binary formatted JSON objects) and many data manipulation functions for JSONs, making it a very powerful tool for NoSQL operations.

String manipulation and parsing are very expensive operations in a database, so although you could have potentially stored JSON objects in strings in PostgreSQL before, introduction of the native data type has taken away overheads and made throughput a lot faster for JSON manipulation.


JSONB was introduced as a native data type in v9.4 of PostgreSQL and it stores JSON objects in binary format. Major differences between JSON & JSONB are highlighted in the table below:





Stores data in text format

Stores data in decomposed binary format


Input is fast, as no conversion are required

Input is slightly slower, as there is an overhead related to binary conversion


Processing functions must re-parse the data on each execution

Re-parsing is not needed, making data processing significantly faster


All white space and line feeds in the input are preserved as-is

Extra white space and line feeds are stripped


Indexing is not supported

Indexing is supported


Duplicate keys are retained, processing functions only consider the last value

Duplicate keys are purged at input, only the last value is stored


Order of the keys is preserved

Order is not preserved

JSON data definition

A JSON column is created just like any other data type. We create a table ‘sales’ below (which we will use in subsequent examples) containing 2 columns, ‘id’ and ‘sale’, with the latter being a JSON:

json_sample=# CREATE TABLE sales (id INT, sale JSON); 

JSON data insertion

The JSON data type checks for a valid JSON format, so insert statements should be mindf

Andreas 'ads' Scherbaum

I had the opportunity to talk about the many datatypes in PostgreSQL, in the PostgreSQL Devroom at FOSDEM 2017. It's quite interesting how many people only use the basic data types and don't know the broad range of types which PostgreSQL is offering.

My slides are available here: *click*