Posted by Shaun M. Thomas on 2015-04-24 at 16:48:06

As a DBA, I strive not to live in an isolated ivory tower, away from the developers who are destined to fill our databases with volumes of data from a myriad of applications. It’s important, I think, to occasionally come back to the basics. So I’d like to discuss one of the core elements that PGDB DBAs might be intimately familiar with, but which comes up often enough that some clarification is warranted. How does PostgreSQL store data? The answer may surprise you, and it is critical to understand before regularly interacting with a PGDB system.

The storage engine PGDB uses is called Multi Version Concurrency Control, or MVCC for short. There is excellent documentation on the subject, along with further resources that attempt to convey a thorough understanding, and succeed in that regard. I strongly encourage anyone who works with PostgreSQL in any context to read these until they can recite the basics from memory. It is the very bedrock of the database.

But we need an executive summary, so let’s start with an extremely simple table with an id and generic column:

ID    Stuff
1₁    foo
2₁    bar
3₁    baz

Let’s say that some time in the future, we update ID 2. This would be the table at that point:

ID    Stuff
1₁    foo
2₁    bar
3₁    baz
2₁₈   moo

Notice that there are now two copies of the row for ID 2. How is it possible for PGDB to differentiate between them? Which one is “correct”?

To answer those two questions, I’ll need a short aside to explain transaction counters. PGDB keeps a counter that it assigns to all database activities, regardless of origin. Because this counter can’t grow endlessly, it currently wraps around after two billion values have been used. PGDB then applies an “age” algorithm to produce an abstract representation of data antiquity. For the purposes of this discussion, the subscript in the examples represents the transaction age.
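If you want to see these markers on a real system, the xmin system column exposes the transaction ID that created each visible row version. Here is a quick scratch-table sketch (the table name is made up for illustration):

CREATE TABLE mvcc_demo (id INT, stuff TEXT);
INSERT INTO mvcc_demo VALUES (1, 'foo'), (2, 'bar'), (3, 'baz');

-- xmin holds the ID of the transaction that created each row version.
SELECT xmin, id, stuff FROM mvcc_demo;

UPDATE mvcc_demo SET stuff = 'moo' WHERE id = 2;

-- The row for id 2 now reports a newer xmin, because the UPDATE created a
-- fresh version of it; the old version remains on disk until VACUUM reclaims it.
SELECT xmin, id, stuff FROM mvcc_demo;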

Every action gets a transaction ID, and every transaction ID acts like a snapshot because it provides a stable comparison point. Even if you don’t explicitly call BEGIN T

[continue reading]

On 5th of April, Simon Riggs committed patch:

Reduce lock levels of some trigger DDL and add FKs

Reduce lock levels to ShareRowExclusive for the following SQL
  CREATE TRIGGER (but not DROP or ALTER)
  ALTER TABLE ENABLE TRIGGER
  ALTER TABLE DISABLE TRIGGER
  ALTER TABLE … ADD CONSTRAINT FOREIGN KEY

Original work by Simon […]
The PUG meeting was good. We now have a consistent, if small, group attending. Before the presentation we spoke about possibly moving the group to Meetup to get a little better visibility. G+ Communities are awesome but Meetup seems to be where the people in the area look.

The presentation was provided by Eric Worden, who happens to be a CMD employee. The talk overall was very good and provided a lot of information that I didn't know about dates. It also led to the development of a new PostgreSQL extension (more on that at a later time).

The most interesting part of the talk to me was the use of a dimensions table to make date queries much, much faster. Either he or I will be submitting a blog post on that feature alone.

If you are interested in seeing what he has to say you can visit us at the Whatcom PgDay being hosted at LinuxFestNorthwest this weekend!

In the second part of the Automating Barman with Puppet series we configured, via Puppet, two virtual machines: a PostgreSQL server and a Barman server to back it up. However, human intervention was required to perform the SSH key exchange and most of the manifest was written to allow the servers to access each other. In this third and final part of the series, we will look at how to configure a third VM that will act as the Puppet Master and use it to simplify the configuration of PostgreSQL and Barman.

The entire code of this tutorial is on GitHub at https://github.com/2ndquadrant-it/vagrant-puppet-barman.

Configuring the Puppet Master: Vagrant

First, change the Vagrantfile to boot a third VM, called “puppet”, which will be our Puppet Master. To ensure that the machine is instantly accessible by the Puppet agents present on each VM, we add a “puppet” entry in the /etc/hosts file with the first script we run. We also need to enable the Puppet agent, as Debian-like distributions disable it by default.
Finally, within the Vagrantfile, let’s make a distinction between master and agents. The master will initially load its configuration straight from the manifest files, then the agents running on each host will apply the configuration sent from the master. Agents will also send data back to the master allowing other nodes to use it to build their configuration. For this reason, an agent is also set to run on the master.

The Vagrantfile is as follows:

Vagrant.configure("2") do |config|
  {
    :puppet => {
      :ip      => '192.168.56.220',
      :box     => 'ubuntu/trusty64',
      :role    => 'master'
    },
    :pg => {
      :ip      => '192.168.56.221',
      :box     => 'ubuntu/trusty64',
      :role    => 'agent'
    },
    :backup => {
      :ip      => '192.168.56.222',
      :box     => 'ubuntu/trusty64',
      :role    => 'agent'
    }
  }.each do |name,cfg|
    config.vm.define name do |local|
      local.vm.box = cfg[:box]
      local.vm.hostname = name.to_s + '.local.lan'
      local.vm.network :

[continue reading]

Similar to the post of a couple of weeks back about the new memory allocation routine able to offer a plan B route in case of OOM, here is a follow-up commit adding more infrastructure in the same area, but this time for some widely-used memory allocation routines:

commit: 8c8a886268dfa616193dadc98e44e0715f884614
author: Fujii Masao <fujii@postgresql.org>
date: Fri, 3 Apr 2015 17:36:12 +0900
Add palloc_extended for frontend and backend.

This commit also adds pg_malloc_extended for frontend. These interfaces
can be used to control at a lower level memory allocation using an interface
similar to MemoryContextAllocExtended. For example, the callers can specify
MCXT_ALLOC_NO_OOM if they want to suppress the "out of memory" error while
allocating the memory and handle a NULL return value.

Michael Paquier, reviewed by me.

palloc_extended() is an equivalent of palloc() that operates on CurrentMemoryContext (that is, the memory context the process is currently using) and takes a set of flags, named the same way for frontend and backend:

  • MCXT_ALLOC_HUGE for allocations larger than 1GB. This flag has an effect on the backend side only, as the frontend routines use malloc directly.
  • MCXT_ALLOC_ZERO to zero the allocated memory.
  • MCXT_ALLOC_NO_OOM to bypass the ERROR message in case of an out-of-memory failure and return NULL to the caller instead. This is the real meat. In the case of frontends, not using this flag results in exiting immediately with exit(1).

The advantage of this routine is that it is made available for both frontends and backends, so when sharing code between the two (like xlogreader.c, which is used by pg_xlogdump and pg_rewind on the frontend side and by the Postgres backend), consistent code can be used everywhere, making maintenance far easier.

A last thing to note is the addition of pg_malloc_extended(), which is available only for frontends and is a natural extension of what already exists with pg_malloc0() and pg_realloc().

Posted by Joshua Drake in CommandPrompt on 2015-04-20 at 18:20:25
Saturday the 18th of April, I woke up to the following:

It was one of those moments when you realize just how blessed a life you have. A moment where you stop and realize that you must have done something right, at least once. I was with all of my ladies, there were no other people at the camp site, the weather was clear and it was set to hit 68F. The only sound was the gentle lapping of water and the occasional goose.

It was at this time that I was able to finally take a step back and reflect on PgConf.US. This conference meant a lot to me professionally. I didn't organize it. I only spoke at it, Command Prompt sponsored, and I occasionally offered feedback to the conference committee (as it is run by PgUS) via PgUS board meetings. This conference has become everything (and more) I tried to make the United States PostgreSQL Conference series. It is a conference of the best, the brightest and most importantly the users of our beloved elephant. In the United States, it is "The" PostgreSQL Conference.

That isn't to say there aren't other PostgreSQL conferences in the States. There are at least two others that run, but somehow after this latest conference I feel they are destined for niche attendance. There is nothing wrong with that, and frankly we need the niche conferences. They fill a hole, else people wouldn't attend. It is just that experiencing the level of effort and greatness that was created by Jonathan and Jim was truly something awesome to behold.

They aren't paid to run these conferences. They do it because they want to help our community. They do it for free and speaking as somebody who has run more PostgreSQL conferences than any other current PostgreSQL conference organizer in the U.S., it is a thankless job. It is a volunteer effort that everyone should take a moment to thank them for. Jonathan, Jimmy: Thank you for all your efforts. The community could not have had such a great conference without you.

I have only two constructive feedback points for the organizers:

  1. Do not allow tutorials base

[continue reading]

I previously blogged about NoSQL support in PostgreSQL and then later wrote a tutorial on using JSON data type. Today, I will be talking about another angle of NoSQL in PostgreSQL, handling key-value data using the HSTORE contrib module.

HSTORE was introduced back in 2006 as part of PostgreSQL 8.2, and I doubt the authors had any idea at the time how NoSQL popularity would skyrocket in the coming years. Using HSTORE, developers can emulate a schema-less, semi-structured data store while staying within the fully ACID-compliant domain of PostgreSQL, a relational database.

Without further ado, here is how some basic HSTORE functionality can be handled …

Create the extension

The very first step is to create the HSTORE extension, essentially loading the contrib module to your PostgreSQL instance.

hstore_db=# CREATE EXTENSION hstore;
CREATE EXTENSION

Create a table with HSTORE data type

Once the extension is loaded, creating a table with a column of HSTORE data type is fairly straightforward. The syntax is exactly the same as any native data type.

hstore_db=# CREATE TABLE hstore_test ( id SERIAL, sale HSTORE );
CREATE TABLE

Insert data into HSTORE column

You can use the ‘=>’ operator to assign values to keys as you insert data.

hstore_db=# INSERT INTO hstore_test (sale) 
            VALUES ('"milk"=>"4", 
                     "bread"=>"2", 
                     "bananas"=>"12", 
                     "cereal"=>"1"');
INSERT 0 1

Retrieve data from an HSTORE column

HSTORE data can be retrieved like any other native PostgreSQL data type.

hstore_db=# SELECT sale FROM hstore_test;
                           sale
-----------------------------------------------------------
"milk"=>"4", "bread"=>"2", "cereal"=>"1", "bananas"=>"12"
(1 row)

Retrieve value for a particular key

HSTORE provides the ‘->’ operator to retrieve the value of a particular key from the table.

hstore_db=# SELECT sale -> 'bread' AS quantity FROM hstore_test;
quantity
----------
2
(1 row)

Use value of a key in the WHERE cl

[continue reading]

Posted by Andrew Dunstan in pgExperts on 2015-04-18 at 15:22:00
Unfortunately there was a small bug in yesterday's buildfarm client release. The bug only affects MSVC builds, which would fail silently on the HEAD (master) branch.

There is a bug fix release available at http://www.pgbuildfarm.org/downloads/releases/build-farm-4_15_1.tgz or you can just pick up the fixed version of run_build.pl (the only thing changed) at https://raw.githubusercontent.com/PGBuildFarm/client-code/b80efc68c35ef8a1ced37b57b3d19a98b8ae5dd2/run_build.pl

Sorry for the inconvenience.
We all know that the \copy command does not return anything when you load data. The idea here is to capture how many records got loaded into the table through the \copy command.
Here's a shell script that should work:
echo number of rows in input: $(wc -l data.in)
( echo "\copy test from stdin delimiter '|';" ; cat data.in ) | psql -v ON_ERROR_STOP=1
echo psql exit code $?

If the exit code printed is 0, everything went well, and the value printed by the first echo can be used to indicate how many rows were inserted. If the printed exit code is non-zero, no rows were inserted, of course. If the exit code printed is 3 then the data being copied had some error.

From the docs: if the exit code printed is 1 or 2 then something went wrong in psql (like it ran out of memory) or the server connection was broken, respectively. The following facts play a role in the above script:

.) COPY (and hence \copy) expects the input records to be terminated by a newline. So counting the number of newlines in the input is a reliable way of counting the records inserted.
.) psql will exit with code 3 iff there's an error in the script and ON_ERROR_STOP is set.
Note: This seems to not apply to the `psql -c "sql command"` construct.

# Example clean input

$ psql -c "create table test(a text,b int);"
CREATE TABLE
$ cat data.in
column1|2
column1|2
column1|2
column1|2
column1|2
column1|2

$ echo number of rows in input: $(wc -l data.in); ( echo "\copy test from stdin delimiter '|';" ; cat data.in ) | psql -v ON_ERROR_STOP=1 ; echo psql exit code $?
number of rows in input: 6 data.in
psql exit code 0

# Example malformed input
$ cat data.in
column1|2
column1|2
column1|2c
column1|2
column1|2
column1|2

$ echo number of rows in input: $(wc -l data.in); ( echo "\copy test from stdin delimiter '|';" ; cat data.in ) | psql -v ON_ERROR_STOP=1 ; echo psql exit code $?
number of rows in input: 6 data.in
ERROR: invalid input syntax for integer: "2c"
CONTEXT: COPY test, line 3, column b: "2c"
psql exit code 3
 
I hope this helps someone.
(Baji is trying to impress 'X')
==========
Baji: Packt Publishing has published a book on troubleshooting PostgreSQL database.
 _X_: Uh, so what(!?). It published 4 other PostgreSQL books this year!
Baji: yeah, I know !
 _X_: then why do you care about thisssss.
Baji: I should care about it as I was part of the technical reviewing team.. :(
 _X_: Oh really! That's fantastic.. Congratulations!
==========

Note: Finally, Baji impressed _X_ :-)

Ok, in reality, I am glad to announce that "My first book as a Technical Reviewer has been published by Packt Publishing" ;-)

https://www.packtpub.com/big-data-and-business-intelligence/troubleshooting-postgresql
http://my.safaribooksonline.com/book/databases/postgresql/9781783555314/troubleshooting-postgresql/pr02_html

The author of this book is Hans-Jürgen Schönig; he has a couple of other PostgreSQL books as well.

This book provides a series of valuable troubleshooting solutions for database administrators responsible for maintaining a PostgreSQL database. It is aimed at PostgreSQL administrators who have developed an application with PostgreSQL and need solutions to common administration problems they encounter when managing a database instance. So give it a try ;-)

I would like to thank my loving parents for everything they did for me. Personal time always belongs to family, and I did this in my personal time.

I want to thank Packt Publishing for giving me this opportunity, and thanks to Sanchita Mandal and Paushali Desai for choosing me and working with me on this project.

Last but not least, I would like to thank Dinesh Kumar, who taught me PostgreSQL and inspired me to do this. :)
Posted by Amit Kapila in EnterpriseDB on 2015-04-18 at 03:38:00

I have run some benchmark tests to see the write performance/scalability in PostgreSQL 9.5 and thought it would be good to share them with others, so I am writing this blog post.

I ran pgbench tests (TPC-B (sort of) load) to compare the performance difference between different modes and scale factors in HEAD (e5f455f5) on an IBM POWER-8 with 24 cores, 192 hardware threads and 492GB of RAM. Here are the performance results:

[Performance charts omitted: pgbench TPS at various client counts, for synchronous_commit = on and off, at scale factors 300 and 3000]

Some of the default settings used in all the tests are:
min_wal_size = 15GB
max_wal_size = 20GB
checkpoint_timeout = 35min
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.9
autovacuum = off

I have kept autovacuum off to reduce the fluctuation it causes, and I am dropping and re-creating the database after each run. I have kept high values of min_wal_size and max_wal_size to reduce the effect of checkpoints; somewhat lower values could probably have served the purpose for this workload, but I haven't tried that.

The data was mainly taken for 2 kinds of modes (synchronous_commit = on | off) and at 2 different scale factors, to cover the case when all the data fits in shared buffers (scale_factor = 300) and the case when the data can't fit in shared buffers but can fit in RAM (scale_factor = 3000).

First, let's talk about the synchronous_commit = off case. Here, when all the data fits in shared_buffers (scale_factor = 300), we can see scalability up to 64 clients, with TPS approximately 75 percent higher at 64 clients compared to 8 clients, which doesn't look bad. When the data doesn't fit in shared buffers but fits in RAM (scale_factor = 3000), we see scalability up to 32 clients, with TPS 64 percent higher than at 8 clients, and then it falls off from there.

One major difference for writes when the data doesn't fit in shared_buffers is that backends performing transactions need to write dirty buffers themselves when they can't find a clean buffer to read a page into; this can hamper TPS.

Now let's talk about synch

[continue reading]

PGDB has had anonymous blocks since the release of 9.0 in late 2010. But it must either be one of those features that got lost in the shuffle, or is otherwise considered too advanced, because I rarely see it used in the wild. If that’s the case, it’s a great shame considering the raw power it conveys. Without committing to a function, we can essentially execute any code in the database, with or without SQL input.
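For anyone who has never used one, here is a minimal sketch of an anonymous block; nothing is stored in the catalogs, the code simply runs once and disappears:

DO $$
BEGIN
    -- Any procedural logic can go here, with or without SQL statements.
    RAISE NOTICE 'Anonymous block executed at %', now();
END;
$$;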

Why is that good? One potential element of overhead when communicating with a database is network transfer. If processing millions of rows, forcing PGDB to allocate and push those results over the network will be much slower than manipulating them locally within the database itself. However, the looming specter of ad-hoc scripts is always a threat as well.

It was the latter scenario that prompted this particular discussion. A few weeks ago, I addressed date-based constraints and how they’re easy to get wrong. Knowing this, there’s a good chance we have objects in our database that need revision in order to operate properly. In one particular instance, I needed to correct over 800 existing check constraints that an automated system built over the last year.

I hope you can imagine that’s not something I would want to do by hand. So it was a great opportunity to invoke an anonymous block, because there’s very little chance I’d need to do this regularly enough to justify a fully-fledged function. In the end, I came up with something like this:

DO $$
DECLARE
  chk TEXT;
  col TEXT;
  edate DATE;
  sdate DATE;
  tab TEXT;
  ym TEXT;
BEGIN
  FOR tab, chk, col IN 
      SELECT i.inhrelid::REGCLASS::TEXT AS tab,
             co.conname AS cname,
             substring(co.consrc FROM '\w+') AS col
        FROM pg_inherits i
        JOIN pg_constraint co ON (co.conrelid = i.inhrelid)
       WHERE co.contype = 'c'
  LOOP
    ym := substring(tab FROM '......$');
    sdate := to_date(ym, 'YYYYMM01');
    edate := sdate + INTERVAL '1 mon';

    EXECUTE 'ALTER TABLE ' || tab || ' DROP CONSTRAINT ' ||
        quote_ident(ch

[continue reading]

Posted by Andrew Dunstan in pgExperts on 2015-04-17 at 14:34:00
I have just released version 4.15 of the PostgreSQL Buildfarm Client. Here's what's changed:
  • support the new location for pg_upgrade
  • support running tests of client programs
  • support building, installing and running testmodules
  • use a default ccache directory
  • improve logging when running pg_upgrade tests
  • handle odd location of Python3 regression tests
  • add timestamp to default log_line_prefix
  • make qw() errors in the config file fatal (helps detect errors)
  • minor bug fixes for web script settings.
  • allow for using linked git directories in non-master branches
The last item might need a little explanation.  Essentially this can reduce quite dramatically the amount of space required if you are building on more than one branch. Instead of keeping, say, 6 checked out repos for the current six tested branches, we keep one and link all the others to it. This works almost exactly the way git-new-workdir does (I stole the logic from there). This doesn't work in a couple of situations: if you are using Windows or if you are using git-reference. In these cases the new setting is simply ignored.

To enable this new setting in an existing installation, do the following after installing the new release:
  • in your config file, add this setting:
    git_use_workdirs => 1,
  • remove the pgsql directory in each branch directory other than HEAD
Another good thing to do in existing installations would be to add "%m" to the beginning of the log_line_prefix setting in the extra_config stanza.

Enjoy!
Posted by Peter Eisentraut on 2015-04-17 at 00:00:00

About two months ago, this happened:

And a few hours later:

It took a few more hours and days after this to refine some details, but I have now tagged the first release of this extension. Give it a try and let me know what you think. Bug reports and feature requests are welcome.

(I chose to name the data type uri instead of url, as originally suggested, because that is more correct and matches what the parsing library calls it. One could create a domain if one prefers the other name or if one wants to restrict the values to certain kinds of URIs or URLs.)

(If you are interested in storing email addresses, here is an idea.)

Posted by Andreas 'ads' Scherbaum on 2015-04-16 at 23:30:13

As Josh posted before, the 2nd South Bay PostgreSQL Meetup will take place at Adobe in San Jose, on April 28th.

Happy to announce that CK Tan from VitesseData will speak about how Vitesse DB speeds up analytics queries in PostgreSQL. Heikki Linnakangas will speak about pg_rewind.

Please register for the event, only registered users will be allowed into the building by security.

Also, if you want to have Wifi access, please send an email to mustain@adobe.com no later than the 24th, and please include "SBPUG" in the subject.

See you there!

Posted by Josh Berkus in pgExperts on 2015-04-16 at 19:07:00
So ... you're using some of 9.4's new advanced aggregates, including FILTER and WITHIN GROUP.  You want to take some statistical samples of your data, including median, mode, and a count of validated rows.  However, your incoming data is floats and you want to store the samples as INTs, since the source data is actually whole numbers.  Also, COUNT(*) returns BIGINT by default, and you want to round it to INT as well.  So you do this:

    SELECT
        device_id,
        count(*)::INT as present,
        count(*)::INT FILTER (WHERE valid) as valid_count,
        mode()::INT WITHIN GROUP (order by val) as mode,
        percentile_disc(0.5)::INT WITHIN GROUP (order by val)
          as median
    FROM dataflow_0913
    GROUP BY device_id
    ORDER BY device_id;


And you get this unhelpful error message:

    ERROR:  syntax error at or near "FILTER"
    LINE 4:         count(*)::INT FILTER (WHERE valid)
            as valid_count,


And your first thought is that you're not using 9.4, or you got the filter clause wrong.  But that's not the problem.  The problem is that "aggregate() FILTER (where clause)" is a syntactical unit, and cannot be broken up by other expressions.  Hence the syntax error.  The correct expression is this one, with parens around the whole expression and then a cast to INT:

    SELECT
        device_id,
        count(*)::INT as present,
        (count(*) FILTER (WHERE valid))::INT as valid_count,
        (mode() WITHIN GROUP (order by val))::INT as mode,
        (percentile_disc(0.5) WITHIN GROUP (order by val))::INT
           as median
    FROM dataflow_0913
    GROUP BY device_id
    ORDER BY device_id;


If you don't understand this, and you use calculated expressions, you can get a worse result: one which does not produce an error but is nevertheless wrong.  For example, imagine that we were, for some dumb reason, calculating our own average over validated rows.  We might do this:

    SELECT
        device_id,
        sum(val)/count(*) FILTER (WHERE valid) as avg
    FROM dataflow_0913

[continue reading]

Posted by Andreas 'ads' Scherbaum

PGConf.de 2015 is the sequel to the highly successful German-speaking PostgreSQL Conferences 2011 and 2013. Due to space limitations in the old location, we are moving to Hamburg. The conference takes place on Friday, November 27th. We are also adding a training day on the 26th.

http://2015.pgconf.de/

Registration for the conference will be possible well in advance. Tickets must be purchased online. For sponsors, we have put together a package that includes, among other things, a number of discounted tickets. More on that in the Call for Sponsors in a separate announcement.

Yesterday the pg_upgrade program was moved from contrib to bin in the source tree. Unfortunately this broke most of those buildfarm members which check pg_upgrade. There is a hot fix for the TestUpgrade buildfarm module that can be downloaded from github. I will work on cutting a new buildfarm release in the next few days, but this file can just be dropped in place on existing installations.

log_autovacuum_min_duration is a system-wide parameter controlling the threshold above which autovacuum activity is logged in the system logs. Anyone who has looked at a system where a given set of tables is bloated has surely been annoyed by the fact that even a high value of log_autovacuum_min_duration offers no guarantee of reducing log spam: not-much-bloated tables whose autovacuum runtime exceeds the threshold still get their activity logged (and it was after working on such a system that the author of this feature wrote a patch for it). Postgres 9.5 is coming with a new feature allowing this logging threshold to be controlled at the relation level, introduced by this commit:

commit: 4ff695b17d32a9c330952192dbc789d31a5e2f5e
author: Alvaro Herrera <alvherre@alvh.no-ip.org>
date: Fri, 3 Apr 2015 11:55:50 -0300
Add log_min_autovacuum_duration per-table option

This is useful to control autovacuum log volume, for situations where
monitoring only a set of tables is necessary.

Author: Michael Paquier
Reviewed by: A team led by Naoya Anzai (also including Akira Kurosawa,
Taiki Kondo, Huong Dangminh), Fujii Masao.

This parameter can be set via CREATE TABLE or ALTER TABLE, with the default value being the one defined by the equivalent server-level parameter. For example:

=# CREATE TABLE vac_table (a int) WITH (log_autovacuum_min_duration = 100);
CREATE TABLE
=# ALTER TABLE vac_table SET (log_autovacuum_min_duration = 200);
ALTER TABLE

Note that, like the other relation-level options, this parameter has no unit and cannot be given one; its value is interpreted as milliseconds. So after the CREATE TABLE above, the autovacuum activity of relation vac_table is logged when a run takes more than 100ms, and more than 200ms after the ALTER TABLE.
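To double-check what was actually stored for a table, the per-relation options can be inspected in pg_class (a quick sketch; after the ALTER TABLE above, the array should contain log_autovacuum_min_duration=200):

SELECT relname, reloptions
  FROM pg_class
 WHERE relname = 'vac_table';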

Thinking more broadly, there are basically two cases where this parameter is useful, an inclusive and an exclusive one:

  • when system-wide log_autovacuum_min_duration is -1, meaning that all the autovacuum activity is ignored for all th

[continue reading]

Posted by Rajeev Rastogi on 2015-04-14 at 05:47:00
I recently got back from the Indian PGDay conference 2015. It was interesting and motivating, with a lot of knowledge sharing, both attending and speaking at the conference.

I spoke about the various kinds of "Optimizer Hints" provided by many database engines, and also a new idea of a "Hint" that can be provided to the optimizer. Some of the speakers shared their work on PostgreSQL as users.
It was also interesting to learn that many companies are evaluating migration, or are in the process of migrating, from other DBs to PostgreSQL. This is really encouraging for all PostgreSQL experts.



Some of the details from the presentation are below (for the complete presentation, please visit Full Optimizer Hint).

Statistics Hint:

A Statistics Hint is used to provide the optimizer with any kind of statistics related to the query, which it can use to yield an even better plan than it would otherwise have produced.
Most databases store statistics for a particular column or relation, but don't store statistics for a join of columns or relations; they just multiply the statistics of the individual columns/relations to get the join statistics, which may not always be correct.

Example:
Let's say there is a query such as:

SELECT * FROM EMPLOYEE WHERE GRADE > 5 AND SALARY > 10000;

If we calculate independent statistics for GRADE and SALARY, suppose sel(GRADE) = .2 and sel(SALARY) = .2;

then sel(GRADE and SALARY) = sel(GRADE) * sel(SALARY) = .04.

In practically all such cases, these two columns will be highly dependent, i.e. if the first predicate is satisfied, the second will also be satisfied. In that case sel(GRADE and SALARY) should be .2, not .04, and the current optimizer will be incorrect here and may produce a wrong plan.
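To see this independence assumption in action, here is a hedged sketch using hypothetical, perfectly correlated data (the selectivities come out near .5 each rather than the .2 used above), comparing the planner's estimate with the actual row count:

CREATE TABLE employee AS
    SELECT g AS grade, g * 2000 AS salary
      FROM generate_series(1, 10) g,
           generate_series(1, 1000);
ANALYZE employee;

-- Both predicates select the same 50% of rows, but the planner multiplies the
-- individual selectivities and estimates roughly 25% of the table instead.
EXPLAIN ANALYZE
SELECT * FROM employee WHERE grade > 5 AND salary > 10000;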

Data Hint:

This kind of hint provides information about the relationship/dependency among relations or columns to influence the plan, instead of directly hinting at the desired plan or giving a direct selectivity value. The optimizer can consider the dependency information to derive the actual selectivi

[continue reading]

Posted by Josh Berkus on 2015-04-13 at 18:24:03

The South Bay PUG is having a meetup on April 28th. Speakers will include CK Tan of PostgreSQL enhancement company Vitesse Data, and Heikki Linnakangas, PostgreSQL Committer. We do not expect to have live video for this meetup, sorry!

RSVP on the Meetup Page.

For 2015, once again, Gartner’s Magic Quadrant for Business Intelligence and Analytics Platforms ranks Tableau pretty much at the top. How powerful would it be to combine Tableau with the world’s most advanced open source database, PostgreSQL, to create reports and analytics for businesses? This HOWTO walks you through the process of connecting Tableau with PostgreSQL and creating a simple report.

This HOWTO is based on the following configuration:

  • Windows 7
  • Tableau 9 Desktop
  • PostgreSQL 9.4.1

You can download the database used in this post from here: Database SQL.

Let’s create our first report!

Connecting to PostgreSQL

1) Launch Tableau. Click ‘More Servers …’ and select PostgreSQL from the menu as illustrated in the following screenshot:

 

2) Fill in the PostgreSQL connection properties and hit OK. If you don’t already have the required connection libraries installed, you will get an error as seen in following screenshot.

 

3) The (rather helpful) error dialog provides a link to download the libraries; click on it (requires a working internet connection). This should take you to the driver’s section on Tableau’s official website. Locate PostgreSQL and download the corresponding setup file. See the following screenshot for reference:

 

4) Run the downloaded file (may require administrator privileges). This will setup ODBC drivers and all system configurations required for PostgreSQL connectivity. Once you have completed setup, run Tableau Desktop again and connect to the PostgreSQL database you downloaded before the 1st step.

Creating a simple report

1) Once connected, you’ll find yourself on the Data Source tab. It lists the server, database and the tables as shown in the screen below.

 

2) You can drag and drop the tables you want to use. Alternatively, you can write a custom SQL for the required dataset. For the purpose of this blog, I have dragged and dropped ‘sales’ table. You can take a peek at a preview of the dataset in the result section of the window. Please note that you may have to c

[continue reading]

Posted by David E. Wheeler on 2015-04-11 at 04:34:08

Looks like it’s been close to two years since my last post on the PGXN blog. Apologies for that. I’ve thought for a while maybe I should organize an “extension of the week” series or something. Would there be interest in such a thing?

Meanwhile, I’m finally getting back to posting to report on a fun thing you can now do with your PGXN distributions. Thanks to the Version Badge service from the nice folks at Gemfury, you can badge your distributions! Badges look like this:

PGXN version

You’ve no doubt seen similar badges for Ruby, Perl, and Python modules. Now the fun comes to PGXN. Want in? Assuming you have a distribution named pgfoo, just put code like this into the README file:

[![PGXN version](https://badge.fury.io/pg/pgfoo.svg)](https://badge.fury.io/pg/pgfoo)

This is Markdown format; use the syntax appropriate to your preferred README format to get the badge to show up on GitHub and PGXN.

That’s it! The badge will show the current release version on PGXN, and the button will link through to PGXN.

Use Travis CI? You can badge your build status, too, as I’ve done for pgTAP, like this:

Build Status

[![Build Status](https://travis-ci.org/theory/pgtap.png)](https://travis-ci.org/theory/pgtap)

Coveralls provides badges, too. I’ve used them for Sqitch, though I’ve not yet taken the time to figure out how to do coverage testing with PostgreSQL extensions. If you have, you can badge your current coverage like so:

Coverage Status

[![Coverage Status](https://coveralls.io/repos/theory/sqitch/badge.svg)](https://coveralls.io/r/theory/sqitch)

So get badging, and show off your PGXN distributions on GitHub and elsewhere!

Fairly recently, a friend of mine presented a problem he wanted to solve with some JSON he had in a table. After he presented the end result he was trying to reach, I made the assumption that this would be pretty easy to do. But then I looked at the JSON Functions to try and find that quick fix. Though I read extensively and used rather liberal interpretations of the functions, there’s no way to directly manipulate JSON object contents with PostgreSQL.

Wait! Before you start yelling at me for being an idiot, I know what you’re going to say. I thought the same thing… at first. Go ahead and look, though. As of PostgreSQL 9.4, there is no built-in functionality to add or remove JSON elements without one or more intermediate transformation steps through PostgreSQL arrays or records. But that isn’t necessarily unexpected. Why?

Because PostgreSQL is a database. Its primary purpose is to store data and subsequently extract and view it. From this perspective, there’s no reason for PostgreSQL to have an entire library of JSON modification functions or operators. Regardless of this however, actions such as data merges and bulk updates still need to be possible. Yet all other fields allow a single update statement to append information, or otherwise perform a native calculation to replace the value in-line. There must be a way to do this with JSON too, without jumping through too many burning hoops.
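For contrast, here is the kind of single-statement, in-line modification other column types already allow (a trivial, hypothetical table where quantity is an integer and tags is a text array):

UPDATE inventory
   SET quantity = quantity + 1,          -- native calculation in-line
       tags = tags || 'clearance'        -- append to an array in-line
 WHERE id = 42;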

Luckily there is, but it does require some preliminary work. Let’s start with a simple JSON document, as seen by PostgreSQL:

SELECT '{"Hairy": true, "Smelly": false}'::JSON;

               json
----------------------------------
 {"Hairy": true, "Smelly": false}

Ok. Now, how would I add an attribute named “Wobbly”? Well, I could pull the data into an external application, add it, and store the result. But suppose this was in a table of millions of records? That’s probably the least efficient way to modify them. This could be parallelized to a certain extent, but that requires a lot of scaffolding code and is way too much wor

[continue reading]

Posted by gabrielle roth on 2015-04-10 at 01:00:00

When: 6-8pm Thursday April 16, 2015
Where: Iovation
Who: Eric Hanson
What: Aquameta release!

Eric Hanson will give a tutorial for how to build applications with Aquameta, an open source web development platform built entirely in PostgreSQL. Aquameta is about to be launched as open source, so we’ll do a quick launch recap, and then dive into the tutorial.


Our meeting will be held at Iovation, on the 32nd floor of the US Bancorp Tower at 111 SW 5th (5th & Oak). It’s right on the Green & Yellow Max lines. Underground bike parking is available in the parking garage; outdoors all around the block in the usual spots. No bikes in the office, sorry!

Elevators open at 5:45 and building security closes access to the floor at 6:30.


See you there!


Recently Simon Riggs committed a patch by himself and Andreas Karlsson to reduce the lock strength required by certain ALTER TABLE commands, including those to enable or disable triggers. Now the lock level required is SHARE ROW EXCLUSIVE instead of ACCESS EXCLUSIVE. That means it doesn't block SELECT commands any more, and isn't blocked by them, although it will still block and be blocked by INSERT, UPDATE and DELETE operations. Very nice.
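To make that concrete, here is a hedged sketch (hypothetical table and trigger names) of what the reduced lock level means in practice:

-- Session 1: on 9.5, this now takes SHARE ROW EXCLUSIVE rather than ACCESS EXCLUSIVE.
BEGIN;
ALTER TABLE foo DISABLE TRIGGER foo_audit_trigger;
-- (transaction intentionally left open)

-- Session 2, meanwhile:
SELECT count(*) FROM foo;       -- proceeds; SELECT is no longer blocked
INSERT INTO foo VALUES (1);     -- waits until session 1 commits or rolls back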

However, without formally disabling a trigger you can tell it dynamically not to do anything in the current session without taking any locks at all. Here's a little bit of PLpgsql code I wrote recently for this sort of operation in an INSERT trigger:
    begin
        disabled := current_setting('mypackage.foo_trigger_disabled');
    exception
        when others then disabled := 'false';
    end;
    if disabled = 'true' then
        return NEW;
    end if;
Note that this will only block the trigger from doing anything in sessions where this variable is set. But that's often exactly what you want. In the case this was written for, the trigger is redundant (and expensive) for certain bulk operations, but required for normal operations. So in a session where we are performing the bulk operation, instead of taking out a heavy lock on the table, we can simply set this before running the bulk operation:
    set mypackage.foo_trigger_disabled = 'true';
The code above is a bit ugly because of the requirement for the exception handler. There's a cure for that coming, too. David Christensen has submitted a patch to provide a form of current_setting() which will return NULL for unset variables instead of raising an exception.

Note, too, that you could use a value in a one-row one-column table if you wanted something that could apply in all sessions, not just the current session. It would be a bit less efficient, though. This mechanism is pretty light-weight.
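To put the fragment above in context, here is a fuller sketch of the pattern with hypothetical table and function names; only sessions that set the variable skip the expensive work:

CREATE OR REPLACE FUNCTION foo_insert_trigger() RETURNS trigger AS $$
DECLARE
    disabled text;
BEGIN
    BEGIN
        disabled := current_setting('mypackage.foo_trigger_disabled');
    EXCEPTION
        WHEN others THEN disabled := 'false';
    END;
    IF disabled = 'true' THEN
        RETURN NEW;   -- skip the expensive work in this session
    END IF;
    -- ... the normally required (and expensive) per-row work goes here ...
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- In the session performing the bulk operation only:
SET mypackage.foo_trigger_disabled = 'true';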
Posted by Umair Shahid in Stormatics on 2015-04-07 at 11:51:10

In a previous post, I talked about the excitement that surrounds NoSQL support in PostgreSQL. Today, I will dive a little deeper into the technical details of native JSON support that makes NoSQL possible.

Below, I explain how to use some basic functions and operators.

Creating a table with JSON column

So this is fairly simple. You can declare a JSON column just like a column of any other data type. Below, I create a table ‘sales’ with 2 columns, ‘id’ and ‘sale’, with the latter being a JSON.

json_sample=# CREATE TABLE sales (id INT, sale JSON);
CREATE TABLE

Inserting JSON data

Insertion of data is pretty close to that of any other data type as well, except that you have to make sure the data is in a valid JSON format. Below, I am inserting 4 records into the table, each with a JSON containing nodes for ‘customer_name’ and a nested JSON for ‘items’ containing a ‘description’ and purchased ‘quantity’.
John bought 4 cartons of milk:

json_sample=# INSERT INTO sales 
                   VALUES (1,'{ "customer_name": "John", "items": { "description": "milk", "quantity": 4 } }');
INSERT 0 1

Susan bought 2 loaves of bread:

json_sample=# INSERT INTO sales 
                   VALUES (2,'{ "customer_name": "Susan", "items": { "description": "bread", "quantity": 2 } }');
INSERT 0 1

Mark bought a dozen bananas:

json_sample=# INSERT INTO sales 
                   VALUES (3,'{ "customer_name": "Mark", "items": { "description": "bananas", "quantity": 12 } }');
INSERT 0 1

Jane bought a pack of cereal:

json_sample=# INSERT INTO sales 
                   VALUES (4,'{ "customer_name": "Jane", "items": { "description": "cereal", "quantity": 1 } }');
INSERT 0 1

Retrieving JSON data

Like insertion, retrieving the JSON formatted data is fairly straightforward as well. Below, I am retrieving the data I inserted in the previous section.

json_sample=# SELECT * FROM sales;
id |                                      sale
----+------------------------------------------------------------------------------------
1 | { "c

[continue reading]

On 31st of March, Bruce Momjian committed patch: psql: add asciidoc output format   Patch by Szymon Guz, adjustments by me   Testing by Michael Paquier, Pavel Stehule To be honest, when Szymon posted first mail about asciidoc – it was the first time I heard about it. Immediately I thought: “why not markdown? or […]
Posted by Andrew Dunstan in pgExperts on 2015-04-06 at 23:29:00
Here's a picture of my two Raspberry Pi 2 boxes, both running headless and wireless.


One is running Raspbian, installed via NOOBS, and the other Fidora, a remix of Fedora 21 for Raspberry Pi 2. It turned out that Pidora doesn't work on the Raspberry Pi 2, a fact that is extremely well hidden on the Raspberry Pi web site.

I have set up test buildfarm animals on both of these. But something odd is happening. They are both getting intermittent failures of the stats regression test. Sometimes it happens during "make check", sometimes during "make installcheck" and sometimes during testing of pg_upgrade (which in turn runs "make installcheck").

These machines are not anything like speed demons. Far from it. But we also run other slow machines without getting this happening all the time. So I'm a bit perplexed about what might be going on.

Incidentally, if you want to play with one of these, I do recommend getting a starter kit from Amazon or elsewhere. It's probably cheaper than buying everything separately, and gets you everything you need to get started. Well worth the $69.99.

Posted by David Christensen in End Point on 2015-04-06 at 22:10:00
I recently got back from PGConf 2015 NYC. It was an invigorating, fun experience, both attending and speaking at the conference.

What follows is a brief summary of some of the talks I saw, as well as some insights/thoughts:

On Thursday:

"Managing PostgreSQL with Puppet" by Chris Everest.  This talk covered experiences by CoverMyMeds.com staff in deploying PostgreSQL instances and integrating with custom Puppet recipes.

"A TARDIS for your ORM - application level timetravel in PostgreSQL" by Magnus Hagander. Demonstrated how to construct a mirror schema of an existing database and manage (via triggers) a view of how data existed at some specific point in time.  This system utilized range types with exclusion constraints, views, and session variables to generate a similar-structured schema to be consumed by an existing ORM application.

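As a hedged sketch of the general idea behind that approach (not the speaker's actual schema): a history table whose rows carry a validity period, with an exclusion constraint preventing overlapping periods for the same id:

-- btree_gist is needed so plain equality on id can be part of the GiST constraint.
CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE customer_history (
    id           INT NOT NULL,
    name         TEXT,
    valid_during TSTZRANGE NOT NULL,
    EXCLUDE USING gist (id WITH =, valid_during WITH &&)
);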
"Building a 'Database of Things' with Foreign Data Wrappers" by Rick Otten.  This was a live demonstration of building a custom foreign data wrapper to control such attributes as hue, brightness, and on/off state of Philips Hue bulbs.  Very interesting live demo, nice audience response to the control systems.  Used a python framework to stub out the interface with the foreign data wrapper and integrate fully.

"Advanced use of pg_stat_statements: Filtering, Regression Testing & More" by Lukas Fittl.  Covered how to use the pg_stat_statements extension to normalize queries and locate common performance statistics for the same query.  This talk also covered the pg_query tool/library, a Ruby tool to parse/analyze queries offline and generate a JSON object representing the query.  The talk also covered the example of using a test database and the pg_stat_statements views/data to perform query analysis to theorize about planning of specific queries without particular database indexes, etc.

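As a small, hedged illustration of the kind of query that talk revolves around (standard pg_stat_statements columns only, nothing specific to the speaker's examples):

-- Top five normalized statements by cumulative execution time.
SELECT query, calls, total_time / calls AS avg_ms
  FROM pg_stat_statements
 ORDER BY total_time DESC
 LIMIT 5;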
On Friday:

"Webscale's dead! Long live Postgres!" by Joshua Drake.  This talk covered improvements that PostgreSQL has made over the years, specific technologies that they have incorporated

[continue reading]