PostgreSQL
The world's most advanced open source database
On 16th of May, Andres Freund committed patch: Support GROUPING SETS, CUBE and ROLLUP.   This SQL standard functionality allows to aggregate data by different GROUP BY clauses at once. Each grouping set returns rows with columns grouped by in other sets set to NULL.   This could previously be achieved by doing each grouping […]
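The commit message is truncated above; purely as an illustration of the syntax it describes (the sales table and its columns are hypothetical, not from the commit):

-- aggregate by region, by product, and a grand total in one pass;
-- columns not grouped by in a given set come back as NULL
SELECT region, product, sum(amount)
  FROM sales
 GROUP BY GROUPING SETS ((region), (product), ());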

Things are shaping up nicely in PostGIS 2.2 development. We are going to hit feature freeze around June 30th 2015, and plan to ship late August or early September to be in line with the PostgreSQL 9.5 release. So far we have committed a couple of neat features, most itemized in PostGIS 2.2 New Functions. Many of the really sought-after ones will require PostgreSQL 9.5 and GEOS 3.5, and the geography measurement enhancements will require Proj 4.9.0+. Here are the things I'd like to highlight now and later dedicate full-length articles to in our BostonGIS Waiting for PostGIS 2.2 series once they've been stress tested.


Continue reading "PostGIS 2.2 leveraging power of PostgreSQL 9.5"
On 15th of May, Simon Riggs committed patch: TABLESAMPLE, SQL Standard and extensible   Add a TABLESAMPLE clause to SELECT statements that allows user to specify random BERNOULLI sampling or block level SYSTEM sampling. Implementation allows for extensible sampling functions to be written, using a standard API. Basic version follows SQLStandard exactly. Usable concrete use […]
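The commit message is truncated above; for illustration, the new clause looks like this (sampling roughly one percent of a hypothetical measurements table):

-- BERNOULLI samples individual rows; SYSTEM samples whole blocks (faster, lumpier)
SELECT count(*) FROM measurements TABLESAMPLE BERNOULLI (1);
SELECT count(*) FROM measurements TABLESAMPLE SYSTEM (1);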

For the second of my ten part series on hidden PGDB (PostgreSQL) performance killers, I’m going to talk about something called an anti-join. It’s not a well-known approach outside of the database world, but due to how it works, it can impart almost magical plan revisions that drastically improve query performance in the right scenario. Developers can add it to a growing bag of tricks when working on database-driven content, since it comes in handy more often than you might expect.

Let’s build a test-case, shall we?

CREATE TABLE sys_product
(
    product_id   SERIAL  PRIMARY KEY,
    prod_name    TEXT    NOT NULL,
    quality      INT     NOT NULL,
    descr        TEXT    NOT NULL DEFAULT now()
);
 
INSERT INTO sys_product (prod_name, quality, descr)
SELECT 'Product ' || a.id::TEXT,
       log((a.id % 100) + 1)::INT,
       'It does stuff.'
  FROM generate_series(1, 100000) a(id);
 
CREATE TABLE sys_order
(
    order_id     SERIAL       NOT NULL,
    product_id   INT          NOT NULL,
    item_count   INT          NOT NULL,
    order_dt     TIMESTAMPTZ  NOT NULL DEFAULT now(),
    valid_dt     TIMESTAMPTZ  NULL
);
 
INSERT INTO sys_order (product_id, item_count, order_dt, valid_dt)
SELECT (a.id % 100000) + 1, (a.id % 100) + 1,
       now() - (id % 1000 || 'd')::INTERVAL,
       CASE WHEN a.id % 499 = 0
            THEN NULL
            ELSE now() - (id % 999 || 'd')::INTERVAL
       END
  FROM generate_series(1, 1000000) a(id);
 
ALTER TABLE sys_order ADD CONSTRAINT pk_order_order_id
      PRIMARY KEY (order_id);
 
CREATE INDEX idx_order_product_id
    ON sys_order (product_id);
 
CREATE INDEX idx_order_valid_dt
    ON sys_order (valid_dt);
 
ANALYZE sys_product;
ANALYZE sys_order;

This is a very basic product and order table structure, and we’ve used it before in the last installment of this series. The only columns we’ve added since last time are the quality column in sys_product and the valid_dt column in sys_order. This way, we can introduce some variability into the query plan due to data correlation.
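The excerpt ends here; purely as an illustration of the pattern named above (not the article's own example), this is the kind of NOT EXISTS query that the planner can execute as an anti-join against the test tables:

-- orders whose product has no higher-quality entry; the planner can turn
-- the NOT EXISTS into an anti-join instead of a per-row subquery
SELECT o.order_id
  FROM sys_order o
 WHERE NOT EXISTS (
        SELECT 1
          FROM sys_product p
         WHERE p.product_id = o.product_id
           AND p.quality > 4
       );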

[continue reading]

Posted by Marco Slot in CitusData on 2015-05-22 at 12:03:14

The pg_shard extension helps you scale out PostgreSQL for large data sets by transparently distributing tables across many nodes, storing (replicated) shards of the data as regular PostgreSQL tables on worker nodes. With pg_shard, you can perform INSERT, UPDATE, DELETE, SELECT on a distributed table as if you were using a regular table.

While first-class COPY support is still on the horizon, we've already added a copy_to_distributed_table tool for loading data files, which supports the same formats as COPY. When using this tool, pg_shard creates a temporary 'insert proxy' table which has a trigger that INSERTs into the distributed table. The copy_to_distributed_table script copies a file into the insert proxy.

Below we give an example of loading customer reviews data using the example from the pg_shard github page and the copy_to_distributed_table script that comes with pg_shard.

copy_to_distributed_table -C customer_reviews_1998.csv customer_reviews

Some users have noticed that pg_shard performs poorly when performing many INSERTs over a single connection and the same applies to copy_to_distributed_table. The reason is simple: pg_shard has to wait for a network round-trip on every INSERT. Since there is no notion of parallelism within a single PostgreSQL connection, consecutive INSERTs on the same connection will have to wait. However, pg_shard can still handle more INSERTs on other connections while waiting.

When loading a large data file, it's therefore recommended to split the file into multiple chunks first and use xargs -P 64 to load the chunks concurrently. The commands below have the same effect as the command above, but run much faster.

mkdir chunks
split -n l/64 customer_reviews_1998.csv chunks/
find chunks/ -type f | xargs -n 1 -P 64 sh -c 'echo $0 `copy_to_distributed_table -C $0 customer_reviews`'

On a c3.8xlarge EC2 instance, using parallel data loading for pg_shard is roughly 40x faster and inserts around 50,000 rows per second in the above example, but there are many ways of scaling out pg_shard

[continue reading]

On 14th of May, Stephen Frost committed patch: Add pg_audit, an auditing extension   This extension provides detailed logging classes, ability to control logging at a per-object level, and includes fully-qualified object names for logged statements (DML and DDL) in independent fields of the log output.   Authors: Ian Barwick, Abhijit Menon-Sen, David Steele Reviews […]
Posted by Feng Tian on 2015-05-21 at 23:33:00
Say you have carefully tuned your database and wow! it is ten times faster -- but, before you push it to the production cluster, how do you know the answer of the query is correct? Or at least, the same as before? :-)  You may have the same question if you want to upgrade or migrate your database, or, as in our case, if you want to make sure a new great join algorithm actually produces correct answers. Well, let's just put the result before, and after, in two tables, and compare the two tables.

This is great, but it turns out to be quite tricky. You need to take care of the following cases:
  •  Either table may not have a primary key
  •  Rows in the two tables may be physically stored in different orders.
  • Columns could contain nulls. 
  • The two tables could be huge, so performance must be good.
Here are some things you could do.

Method 1: You can dump the tables to text or csv files, then run diff -- but you need to take care of ordering, so you have to copy the tables out with an ORDER BY clause. And good luck diff-ing a 2TB text file.

Method 2: Use EXCEPT. How about this query?
select * from (
    select * from ta except select * from tb 
    union all 
    select * from tb except select * from ta
) foo; 

It is totally wrong! EXCEPT and UNION ALL have equal precedence and associate left to right, so this actually parses as ((ta EXCEPT tb) UNION ALL tb) EXCEPT ta. See the attached sql script for some surprises. And this?

select * from (
    select * from (select * from ta except select * from tb) xxx
    union all
    select * from (select * from tb except select * from ta) yyy
) foo;
Better, but it did not take care of duplicate rows -- and, checking select count(*) is not good enough.

Method 3: Join by ourselves!  Here is a script from my friend CK.

with 
A as ( 
    select hashtext(textin(record_out(ta))) as h, count(*) as c 
    from ta group by h
),
B as (
    select hashtext(textin(record_out(tb))) as h, count(*) as c 
    from tb group by h
)
select * from A full outer join B on (A.h + A.c = B.h + B.c)
where A.h is null or B.h is null limit 5;

The hashtext part is optional -- for very wide rows, using hashtext will save some cycles and memory.   This is

[continue reading]

Posted by Joshua Drake in CommandPrompt on 2015-05-21 at 17:48:32
I spent a great deal of time working through the SQL DUMP portion of the 9.5devel docs this past week. Below is the current text of what I have and it would be great if my readers would take a look and offer some thoughtful feedback. What would you like to see added? What would you like to see changed? Please note that this is reference documentation not tutorial documentation.

This is just the straight HTML dump generated from DocBook, so since it is inline the links won't work. The current -devel docs are at http://www.postgresql.org/docs/devel/static/backup-dump.html and the updated version I am working on is below:

24.1. SQL Dump

PostgreSQL provides the program pg_dump for generating a backup file with SQL commands that, when fed back to the server, will recreate the database in the same state as it was at the time of the dump. The basic usage of pg_dump is:

pg_dump -C -F p -f outfile dbname

The use of -C ensures that the dump file will contain the requisite CREATE DATABASE command. The use of -Fp ensures that you are using the plain text format and the use of -f allows you to specify the name of the file the dump will be written to. It is also possible for pg_dump to create files in other formats that allow for parallelism and fine-grained control of object backup or restoration. For more details on all options available to pg_dump please refer to the pg_dump reference page.

The pg_dump application requires read access to all objects within the database that it will be operating on. This generally requires database super-user access. However, it is possible for any database user to use pg_dump to back up the objects that they own, regardless of super-user access, using options such as -n schema or -t table.
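For example (an illustrative invocation, not part of the docs being quoted), a non-superuser could dump just their own schema with:

pg_dump -F p -f myschema.sql -n myschema dbname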

The primary advantage of using pg_dump over the other backup methods described is that pg_dump

[continue reading]

In this entry, I would like to explain how to use the pt-table-usage and pt-index-usage commands in the Postgres Toolkit which I introduced in the previous entry. If you have never heard about Postgres Toolkit, please visit the previous entry: A Hacker's Diary: Postgres Toolkit 0.2 released. pt-table-usage and pt-index-usage are commands that can be used to obtain usage information for tables and indexes
On 15th of May, Peter Eisentraut committed patch: Add pg_settings.pending_restart column   with input from David G. Johnston, Robert Haas, Michael Paquier This is, for me, great. It might not sound like much, but it is actually something that I'm very excited about. Basically – let's assume you change system settings. Either by modifying postgresql.conf, […]
Posted by Andrew Dunstan in pgExperts on 2015-05-20 at 16:47:00
There has been some debate going on on the hackers list about problems certain drivers have with operators that contain the ? character. Since drivers such as the JDBC driver consider any such character outside of quote marks to be a parameter placeholder, its use as part of an operator makes life difficult for them. Part of the problem has been that we allow almost every available non-alphanumeric ASCII character to be part of an operator. Why, for example, do we allow `? I've never seen it used that I can recall, but the fact that we allow it means we can't use it for something else (such as marking an operator). That makes doing anything about the problem hard. Some people want to deprecate the use of ? in operators. I think it's far too late for that - its use is far too widespread.
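For context, an example of the kind of operator at issue, the jsonb key-existence operator, which a driver that treats a bare ? as a bind parameter can easily misparse:

-- the ? here is an operator (does the key 'b' exist?), not a placeholder
SELECT '{"a": 1, "b": 2}'::jsonb ? 'b';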
On 12th of May, Andrew Dunstan committed patch: Additional functions and operators for jsonb   jsonb_pretty(jsonb) produces nicely indented json output. jsonb || jsonb concatenates two jsonb values. jsonb - text removes a key and its associated value from the json jsonb - int removes the designated array element jsonb - text[] removes a key […]

A couple of days back, the following commit landed in the Postgres world, for the upcoming 9.5 release:

commit: a486e35706eaea17e27e5fa0a2de6bc98546de1e
author: Peter Eisentraut <peter_e@gmx.net>
date: Thu, 14 May 2015 20:08:51 -0400
Add pg_settings.pending_restart column

with input from David G. Johnston, Robert Haas, Michael Paquier

Particularly useful for systems doing a lot of server parameter updates, this allows tracking parameters whose value has been updated but which need a server restart before the new value takes effect. Note that this applies to all the parameters marked as PGC_POSTMASTER in guc.c, shared_buffers being one, as well as the custom parameters a system may have after they are loaded by a plugin. This information is tracked by a new column called pending_restart in the system view pg_settings, with a boolean value set to "true" if a given GUC parameter is indeed waiting for a server restart.

In order to make visible the fact that a parameter is waiting for a restart, the server's parameters can be reloaded with either pg_reload_conf(), "pg_ctl reload" or a SIGHUP signal. Of course, modifications made in postgresql.conf, in any included configuration files, or via ALTER SYSTEM are all taken into account. See for example:

=# \! echo "port = 6666" > $PGDATA/postgresql.conf
=# ALTER SYSTEM SET shared_buffers TO '1GB';
ALTER SYSTEM
=# SELECT pg_reload_conf();
 pg_reload_conf
----------------
 t
(1 row)
=# SELECT name FROM pg_settings WHERE pending_restart;
      name
----------------
 port
 shared_buffers
(2 rows)

This will prove useful for many systems, like those doing automatic tuning of server parameters, or containers (though it is of course not limited to those).

Posted by Giulio Calacoci in 2ndQuadrant on 2015-05-19 at 08:30:45

The release of PostgreSQL 9.5 is imminent so the time has come to analyse what’s new in this latest version.

A very interesting feature of version 9.5 is the ability to import a schema from a remote database, using Foreign Data Wrapper and the IMPORT FOREIGN SCHEMA command.


Foreign Data Wrappers (FDW)

Before the introduction of Foreign Data Wrappers, the only way to connect a Postgres database with an external data source was using the dblink module.

In 2003, the set of rules for the standard management of external data sources was defined within the SQL language: SQL/MED (Management of External Data).

PostgreSQL 9.1 introduced a first implementation of the SQL/MED standard with Foreign Data Wrappers, which provided Postgres with direct access to data sources such as files or other databases (Oracle, MySQL…), allowing their use as tables.

The advantage of this approach is obvious: it gives you the ability to connect to an external data source to extract data natively by running a simple query. The fact that it is not necessary to use external modules to obtain this result considerably simplifies the DBAs’ work.

If you want to know more, take a look at our 2011 blog post: PostgreSQL 9.1: Tabelle esterne con SQL/MED (Warning: blog post in Italian).

Small example of an FDW

PostgreSQL 9.3 introduced support for writable Foreign Data Wrappers (Warning: blog post in Italian) and also added a foreign data wrapper for PostgreSQL itself. Let’s have a look at a simple example of how to use an FDW by connecting two Postgres databases.

First we create two databases:

CREATE DATABASE source;
CREATE DATABASE destination;

Within the source we create a test table with test data:

\c source
CREATE TABLE test1 AS SELECT id, md5(random()::text) FROM generate_series(1,5) id;

Now we connect to the destination db and then connect the two databases:

\c destination
CREATE EXTENSION postgres_fdw ;
CREATE SERVER src_srv FOREIGN DATA WRAPPER postgres_fdw OPTIONS( dbname 'source' );
CREATE 
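The excerpt is cut off mid-statement; a hedged sketch of how the example presumably continues (the user name and options here are assumptions, not from the original post):

-- map the local user to a remote role, then import the remote schema
CREATE USER MAPPING FOR CURRENT_USER SERVER src_srv OPTIONS ( user 'postgres' );
IMPORT FOREIGN SCHEMA public FROM SERVER src_srv INTO public;
SELECT * FROM test1;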

[continue reading]

Tom Lane introduced a new state for TOAST types - the expanded state. The side effect of this change, together with some changes in plpgsql internals, has a big impact on the speed of array operations in plpgsql.
I have two plpgsql blocks with loops. The first performs updates of array fields, the second appends to an array.
DO $$
DECLARE a int[] = array_fill(10, ARRAY[10000]);
BEGIN
  FOR i IN 1 .. 10000 LOOP
    a[i] := 0;
  END LOOP;
END;
$$;

DO $$
DECLARE a int[] = '{}';
BEGIN
  FOR i IN 1 .. 10000 LOOP
    a := a || 10;
  END LOOP;
END;
$$;
You can try this code on PostgreSQL 9.4 - you get times of about 450 and 220 ms. The same code needs only 6 and 5 ms on PostgreSQL 9.5! That is more than an order of magnitude speedup.
Posted by gabrielle roth on 2015-05-19 at 02:07:00
I do a lot of “is this !#@(* really working correctly” testing. (See also: simple test for autovacuum log messages.) Here’s a simple test to verify that log_lock_waits is configured correctly. This is essential for troubleshooting lock contention. -- verify settings -- log_lock_waits should be 'on'; it is off by default. SHOW log_lock_waits; -- lock_wait […]
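The excerpt is truncated above; as a minimal sketch of the setup it verifies (assuming superuser access and 9.4+ for ALTER SYSTEM):

-- log_lock_waits is off by default; enable it and reload the configuration
ALTER SYSTEM SET log_lock_waits = on;
SELECT pg_reload_conf();
SHOW log_lock_waits;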
Posted by Greg Sabino Mullane in End Point on 2015-05-18 at 12:00:00

Determining if your current database connection is using PgBouncer, or going directly to Postgres itself, can be challenging, as PgBouncer is a very low-level, transparent interface. It is possible, and here are some detection methods you can use.

This was inspired by someone asking on the Perl DBD IRC channel if it was possible to easily tell if your current database handle (usually "$dbh") is connected to PgBouncer or not. Since I've seen this question asked in other venues, I decided to take a crack at it.

There are actually two questions to be answered: (1) are we connected to PgBouncer, and if so, (2) what pool_mode is being run? The quickest and easiest wat I found to answer the first question is to try and connect to a non-existent database. Normally, this is a FATAL message, as seen here:

$ psql testdb -p 5432
testdb=# \c ghostdb
FATAL:  database "ghostdb" does not exist
Previous connection kept
testdb=# 

However, a slightly different ERROR message is returned if the same thing is attempted while connected to PgBouncer:

$ psql testdb -p 6432
testdb=# \c ghostdb
ERROR:  No such database: ghostdb
Previous connection kept
testdb=# 

Thus, an ERROR will always indicate that you are connected to PgBouncer and not directly to Postgres, which will always issue a FATAL.

In the future, there will be an even simpler method. As of this writing, pgBouncer 1.6 has not been released, but it will have the ability to customize the application_name. This is a configurable session-level variable that is fairly new in Postgres. Andrew Dunstan wrote a patch which enables adding this to your pgbouncer.ini file:

application_name_add_host = 1

This will make PgBouncer modify the application_name to append some information to it, such as the remote host, the remote port, and the local port. This is a feature many PgBouncer users will appreciate, as it offers an escape from the black hole of connection information that PgBouncer suffers from. Here is what it looks like on both a normal Postgres connection

[continue reading]

Posted by Andrew Dunstan in pgExperts on 2015-05-17 at 17:20:00
One of the things I like about data recovery jobs is that each one is different. Right now I'm wrestling with this problem: how can I recover whatever is possible from a backup that's missing a handful of xlog files? We know this will result in a probably corrupt database, but we still want to see what data is available.

My first two approaches didn't work too well. I tried mangling the control file, and creating dummy xlog files to replace the handful that are missing.

Right now I'm trying to follow a suggestion from Andres Freund to drive the process via the debugger, moving it on to the next available log file when it gets to where it would want the first missing log file.  The trouble is it's hard to find what to set exactly and where to set it (this is Postgres 9.3). Every time I try something I get errors like "unexpected page_addr". If I were more familiar with this section of the code it might be easier, but it's something I've pretty much never dabbled in.

It all makes for an exciting life.
Posted by Robins Tharakan on 2015-05-16 at 14:50:00
While reviewing an existing application, I thought it'd be worthwhile to review how good / bad PostgreSQL is in terms of OLAP. This (growing) post is going to be my (un)learning of how ready PostgreSQL is. Row Numbering Support: Yes. Use: The row_number() function numbers rows generated in a result set. Example: SELECT row_number() OVER (ORDER BY marks DESC) AS rn, name FROM x; Review: Some
Finally! A *much* awaited feature: this attempt at adding GROUPING SETS / ROLLUP / CUBE to PostgreSQL has been in the works for about a year (besides the many attempts over the past decade and a half that didn't get through), and thankfully it has finally got the approval of the powers that be, so the upcoming Postgres 9.5 will finally have this long-pending SQL feature. MSSQL and
Posted by Joel Jacobson on 2015-05-16 at 10:08:21

By coincidence I stumbled upon a table where one of its columns didn’t have a foreign key. I found this strange, since our convention is to always add foreign keys when appropriate.

But humans are humans, and will eventually forget to add a foreign key, so we better have both belt and suspenders and find a way to deal with this inevitable problem in an efficient way.

It would be a tedious job to manually look for missing foreign keys in all tables and columns.
But if you’re lucky enough to have the pleasure of working with a system where all tables and columns have been given their names by following a strict naming convention, it might be possible to fully automate the task.

This is the naming convention we use:

  1. Table names are always in plural form with a trailing “s”, e.g. users
  2. Primary key column names are always the table name in singular form with a trailing “id”, e.g. userid.
  3. Foreign key columns are always named after the primary key they are referencing, e.g. transactions.userid -> users.userid

This means you always know, based on the column name alone, which table (if any) that column might be referencing.

Example: If the table transactions has a column named userid, and there is a table where userid is also the primary key, but there isn’t any foreign key on transactions.userid, then it’s a missing foreign key, or else someone has not followed the naming convention.

Thanks to PostgreSQL’s fantastic pg_catalog system tables, we can write a query which uses the rules from the naming convention and returns all the columns which appear to be missing foreign keys. It doesn’t support multi-column keys, but we don’t have many of those, so it’s not a problem in my case.

Thanks to the view below, I automatically found three more missing foreign keys of the same type, which saves me many hours of boring work today.
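The actual view definition falls past the excerpt's cut-off; purely as an illustration (the catalog query below is a sketch built on the naming convention above, not the author's view), it could look something like this:

CREATE VIEW view_missing_foreign_keys AS
SELECT n.nspname, c.relname, a.attname
  FROM pg_attribute a
  JOIN pg_class c ON c.oid = a.attrelid AND c.relkind = 'r'
  JOIN pg_namespace n ON n.oid = c.relnamespace
 WHERE a.attnum > 0
   AND NOT a.attisdropped
   AND n.nspname NOT IN ('pg_catalog', 'information_schema')
   -- the column is named like some other table's single-column primary key
   AND EXISTS (
       SELECT 1
         FROM pg_constraint pk
         JOIN pg_attribute pa ON pa.attrelid = pk.conrelid
                             AND pa.attnum = pk.conkey[1]
        WHERE pk.contype = 'p'
          AND array_length(pk.conkey, 1) = 1
          AND pk.conrelid <> c.oid
          AND pa.attname = a.attname
   )
   -- ...but no foreign key constraint covers it in this table
   AND NOT EXISTS (
       SELECT 1
         FROM pg_constraint fk
        WHERE fk.contype = 'f'
          AND fk.conrelid = c.oid
          AND a.attnum = ANY (fk.conkey)
   );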

pg1:joel=#* SELECT * FROM view_missing_foreign_keys;
 nspname |      relname    | attname 
---------+-----------------+---------
 public  | transactions    | userid
 public  | someohtertable1 | use

[continue reading]

Yesterday I gave the developers at my company what I call a DBA Chat. It’s something I try to do every month to keep them apprised of various features, caveats, performance considerations, and so on. I find that educating the folks who regularly work with the database does wonders for application performance and my sanity. The benefit of this long format is that I can go over more information than a time-constrained set of slides.

This month we went over one of my slightly older talks, and even though the material was three years old from my perspective, it was all new to them. And why not? Developers have a job to do, and while they do work with the database, it generally isn’t their responsibility to research the voluminous minutia and quirks commonly associated with a specific platform. That’s my job.

So here is the first of a ten part series on anti-patterns that can annihilate PGDB (PostgreSQL) performance, some of which are outright insidious. This week we’ll start slow with a short discussion on the importance of EXPLAIN. This is PGDB’s first line of defense against bad queries, and it’s overlooked more than you might expect, as well as somewhat difficult to interpret for the uninformed. I’ll probably have a more in-depth version of this article in the future for truly interpreting EXPLAIN output, but this time we’ll be focusing on using it in general.
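As a generic illustration (not the article's own example; the table name is hypothetical), EXPLAIN is simply prefixed to the query you want to inspect:

-- show the planner's chosen plan; ANALYZE actually runs the query and
-- reports real row counts and timings alongside the estimates
EXPLAIN ANALYZE
SELECT * FROM some_table WHERE some_id = 42;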

First, we need to set up a use case to illustrate a couple of variants. Two tables with a close relationship should do it:

CREATE TABLE sys_product
(
    product_id   SERIAL  PRIMARY KEY,
    prod_name    TEXT    NOT NULL,
    descr        TEXT    NOT NULL DEFAULT now()
);
 
INSERT INTO sys_product (prod_name, descr)
SELECT 'Product ' || a.id::TEXT, 'It does stuff.'
  FROM generate_series(1, 1000) a(id);
 
CREATE TABLE sys_order
(
    order_id     SERIAL       NOT NULL,
    product_id   INT          NOT NULL,
    item_count   INT          NOT NULL,
    order_dt     TIMESTAMPTZ  NOT NULL DEFAULT now()
);
 
INSERT INTO sys_order (product_id, item_count)
S

[continue reading]

Posted by Josh Berkus on 2015-05-15 at 19:13:28

We're having two meetups in May: one in Oakland, and one in San Francisco:

Unfortunately, the Oakland one already has a waiting list, but get on it if you're interested in case of cancellations. And a big thank-you to Rackspace.com and Turnitin.com for sponsoring these meetups.

Posted by Andrew Dunstan in pgExperts on 2015-05-15 at 13:04:00
Every so often I've wanted a SET metatype in Postgres.

We do have bit fields, which can be used as sets with a little work, and this is how MySQL does its SET types as I understand it. But what if you want a set of enums (our enums can have very large lists of values), or a set of text values, or of integers? Bit fields won't work for these - we would need some sort of sparse representation.

One obvious way would be to store the set as a one-dimensional array, kept in sorted order for efficient searching. For text values, we could probably use some smart form of sorting such as is used by jsonb and hstore to store object keys, to reduce the need for string comparisons.

Obviously, I have done no work on this, but it seems like it would be a worthwhile project.

In the meantime, you can use hstore more or less in this way. Just treat it as a mapping from whatever values you have to 1.  It's a bit of a hack, but it should work. You'll just need to cast your values to text.
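A quick hedged sketch of that hack, using hstore's own functions and operators with made-up values:

-- build a "set" by mapping each member to 1, then test membership with ?
CREATE EXTENSION IF NOT EXISTS hstore;
SELECT hstore(ARRAY['red','green','blue'], ARRAY['1','1','1']) ? 'green';   -- true
-- add an element with concatenation; duplicate keys simply overwrite
SELECT hstore(ARRAY['red','green'], ARRAY['1','1']) || hstore('blue', '1');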
Posted by Andrew Dunstan in pgExperts on 2015-05-13 at 16:32:00
Yesterday I committed some new features for jsonb. These are:
  • jsonb_pretty(jsonb) produces nicely indented json output.
  • jsonb || jsonb concatenates two jsonb values.
  • jsonb - text removes a key and its associated value, or a matching array element, from the json
  • jsonb - int removes the designated array element
  • jsonb - text[] removes a key and associated value or array element at the designated path
  • jsonb_replace(jsonb,text[],jsonb) replaces the array element designated by the path or the value associated with the key designated by the path with the given value.
This was based on Dmitry Dolgov's jsonbx extension for 9.4, which in turn was based on earlier work by Oleg Bartunov, Teodor Sigaev and Alexander Korotkov on nested hstore. Petr Jelinek also did some useful review and tidy-up work on the core features. So, a good team effort.
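For illustration, a few one-liners exercising the new operators described above (on a 9.5 development build):

SELECT jsonb_pretty('{"a": 1, "b": {"c": 2}}'::jsonb);   -- indented output
SELECT '{"a": 1}'::jsonb || '{"b": 2}'::jsonb;           -- concatenation
SELECT '{"a": 1, "b": 2}'::jsonb - 'b';                  -- remove a key
SELECT '["x", "y", "z"]'::jsonb - 1;                     -- remove array element 1 ("y")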
Posted by Amit Kapila in EnterpriseDB on 2015-05-13 at 14:26:00

As of now, one can't reliably use the tar format to take a backup on Windows, because it can't restore tablespace data, which is stored in the form of symbolic links in <data_directory>/pg_tblspc/. The reason is that native Windows utilities are not able to create symbolic links while extracting files from tar. It might be possible to create symbolic links if Cygwin is installed on your system; however, we need this feature to work for native Windows as well.

In PostgreSQL 9.5, a new feature (commit id - 72d422a5) extending the existing tar format makes it possible to reliably take a backup in tar mode. From the user's perspective, nothing much changes about taking the backup, except that tar format mode (--format=tar) in pg_basebackup (of the PostgreSQL 9.5 version) will only work with server version 9.5 or later. This feature is mainly required for Windows, but for the sake of consistency it has been changed for all platforms, and it should also enable long (length greater than 99) symbolic link targets for the tar format (I think the changes for that are still not done, but we can do them now that this feature is committed).

The basic idea behind the feature is that it forms a tablespace map of all the tablespace symbolic links present inside <data_directory>/pg_tblspc/ and stores it in the data directory for exclusive backups (backups taken via the pg_start_backup() and pg_stop_backup() functions), or in the backup archive for non-exclusive backups (backups taken by pg_basebackup).

The format of tablespace_map file is:
16384 E:\WorkSpace\PostgreSQL\master\tbs
16388 E:\WorkSpace\PostgreSQL\master\tbs              2               3

The tablespace symbolic links are restored during archive recovery, and the tablespace_map file will be renamed to tablespace_map.old at the end of recovery, similar to the backup_label file.
Posted by Satoshi Nagayasu on 2015-05-12 at 00:34:00
Today, we are pleased to announce the release of Postgres Toolkit 0.2. http://www.github.com/uptimejp/postgres-toolkit/ https://postgres-toolkit-ja.readthedocs.org/ ■What is "Postgres Toolkit"? Postgres Toolkit is a collection of scripts and utilities intended to help PostgreSQL DBAs improve the quality and productivity of their daily jobs and operations. With Postgres Toolkit,
Posted by Josh Berkus in pgExperts on 2015-05-12 at 00:24:00
In prior posts, I've gone over some methods to prevent cycles from being added to your database.  However, imagine that someone has handed you an existing adjacency list tree -- perhaps migrated from another DBMS -- and you need to find all of the cycles as part of data cleaning. How do you do that?

One way, obviously, would be just to explore all paths and flag the ones where any ID appeared twice:

    WITH RECURSIVE prev AS (
        SELECT folders.id, 1 AS depth, array[id] as seen, false as cycle
        FROM folders
        UNION ALL
        SELECT folders.id, prev.depth + 1, seen || folders.id as seen,
            folders.id = any(seen) as cycle
        FROM prev
        INNER JOIN folders on prev.id = parent_id
    )
    SELECT *
    FROM prev;



However, the above has a serious issue: the query itself will cycle and never complete (in fact, it will error out). So we need to terminate each cycle when the first repeat happens.  Fortunately, that's easy to do, and we'll filter for only the cycles while we're at it:

    WITH RECURSIVE prev AS (
        SELECT folders.id, 1 AS depth, array[id] as seen, false as cycle
        FROM folders
        UNION ALL
        SELECT folders.id, prev.depth + 1, seen || folders.id as seen,
            folders.id = any(seen) as cycle
        FROM prev
        INNER JOIN folders on prev.id = parent_id
        AND prev.cycle = false
    )
    SELECT *
    FROM prev
    WHERE cycle = true;


The results of the above query look like this:

    id | depth |       seen       | cycle
   ----+-------+------------------+-------
    21 |     2 | {21,21}          | t
    13 |     5 | {13,14,15,11,13} | t
    14 |     5 | {14,15,11,13,14} | t
    15 |     5 | {15,11,13,14,15} | t
    11 |     5 | {11,13,14,15,11} | t
    (5 rows)


One thing to notice is that you'll get a row for every node in a cycle loop.  That's because with cycles, the choice of starting point is arbitrary.  So the above query isn't the best choice for a deep tree where you have a lot of existing cycles.  There's a 2nd

[continue reading]

On 12th of April, Magnus Hagander committed patch: Add system view pg_stat_ssl   This view shows information about all connections, such as if the connection is using SSL, which cipher is used, and which client certificate (if any) is used.   Reviews by Alex Shulgin, Heikki Linnakangas, Andres Freund & Michael Paquier It was committed […]
I had to set it up relatively recently, and hit some roadblocks, so I figured I'd write about my experiences – for myself in the future, or for anyone else who might want to set it up. First, let's state the goals: remote communication to PostgreSQL (as in: not within localhost) should go via ssl encrypted channels to […]