
We've been playing around with the new phrase search feature of PostgreSQL 9.6 full text search. In doing so, I was curious how big a number one can designate as the maximum distance that words can be apart. I discovered through trial and error that the magic number is 16384, which is much bigger than I had suspected.
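
For context, a minimal sketch of the 9.6 phrase-distance syntax (my example, not from the post): the <N> operator matches when the second lexeme appears N positions after the first, and per the post <16384> is the largest distance accepted.

-- 'fence' sits 4 positions after 'cat' ('jumped over the' fills positions 2-4)
SELECT to_tsvector('english', 'cat jumped over the fence')
       @@ to_tsquery('english', 'cat <4> fence');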


Continue reading "PostgreSQL 9.6 phrase text searching how far apart can you go"

A few days ago I’ve blogged about the common issues with roles and privileges we discover during security reviews.

Of course, PostgreSQL offers many advanced security-related features, one of them being Row Level Security (RLS), available since PostgreSQL 9.5.

As 9.5 was released in January 2016 (just a few months ago), RLS is a fairly new feature and we're not really dealing with many production deployments yet. Instead, RLS is a common subject of "how to implement" discussions, and one of the most common questions is how to make it work with application-level users. So let's see what possible solutions there are.

Introduction to RLS

Let’s see a very simple example first, explaining what RLS is about. Let’s say we have a chat table storing messages sent between users – the users can insert rows into it to send messages to other users, and query it to see messages sent to them by other users. So the table might look like this:

-- uuid_generate_v4() is provided by the uuid-ossp extension
CREATE TABLE chat (
    message_uuid    UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    message_time    TIMESTAMP NOT NULL DEFAULT now(),
    message_from    NAME      NOT NULL DEFAULT current_user,
    message_to      NAME      NOT NULL,
    message_subject VARCHAR(64) NOT NULL,
    message_body    TEXT
);

The classic role-based security only allows us to restrict access to either the whole table or vertical slices of it (columns). So we can’t use it to prevent users from reading messages intended for other users, or sending messages with a fake message_from field.

And that’s exactly what RLS is for – it allows you to create rules (policies) restricting access to subsets of rows. So for example you can do this:

-- policies have no effect until row level security is enabled on the table
ALTER TABLE chat ENABLE ROW LEVEL SECURITY;

CREATE POLICY chat_policy ON chat
    USING ((message_to = current_user) OR (message_from = current_user))
    WITH CHECK (message_from = current_user);

This policy ensures a user can only see messages sent by him or intended for him – that's what the condition in the USING clause does. The second part of the policy (WITH CHECK) ensures a user can only insert messages with his use

[...]
When you create a table in postgres, you actually create up to two relations at once.
If you create a table with fixed-length attributes only, a single relation is created: a heap relation.
If you have at least one variable-length attribute in your table, then both heap and TOAST relations will be created.

Relations also have options: reloptions. You can set them while creating or altering a table. To set options for the TOAST relation, use the toast. prefix before the reloption name:
CREATE TABLE reloptions_test (s varchar) WITH (toast.autovacuum_vacuum_cost_delay = 23 );

The only problem is that if your table has no varlen attributes, postgres will accept the toast reloption but will not write it anywhere:
# CREATE TABLE reloptions_test (i int) WITH (toast.autovacuum_vacuum_cost_delay = 23 );
CREATE TABLE
# select reltoastrelid from pg_class where oid = 'reloptions_test'::regclass;
 reltoastrelid 
---------------
             0
(1 row)

There is no TOAST relation and the reloption is not saved at all, yet postgres reports that everything is OK.

Same for alter table:
# ALTER TABLE reloptions_test SET (toast.autovacuum_vacuum_cost_delay = 24 );
ALTER TABLE
# select reltoastrelid from pg_class where oid = 'reloptions_test'::regclass;
 reltoastrelid 
---------------
             0
(1 row)
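
A quick way to confirm that nothing was stored anywhere (the toast reloption would live on the TOAST relation's pg_class row, which doesn't exist here):

# select reloptions from pg_class where oid = 'reloptions_test'::regclass;
 reloptions 
------------
 
(1 row)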

This is not nice behavior, is it?

PS: when writing a comment, please log in with any account you have, or just leave a name and/or e-mail, so I will be able to answer that comment ;-)

PostgreSQL 9.5 has introduced new JSONB functionalities, greatly improving its already present NoSQL characteristics. With the inclusion of new operators and functions, now it is possible to easily modify the JSONB data. In this article these new modifiers will be presented together with some examples of how to use them.

With the inclusion of the JSON data type in its 9.2 release, PostgreSQL finally started supporting JSON natively. Although that release made it possible to use Postgres as a "NoSQL" database, not much could actually be done at the time due to the lack of operators and interesting functions. Since 9.2, JSON support has improved significantly in each new version of PostgreSQL, to the point that today the initial limitations have been completely overcome.


Probably, the most remarkable improvements were the addition of the JSONB data type in Postgres 9.4 and, in the current Postgres 9.5 release, the introduction of new operators and functions that permit you to modify and manipulate JSONB data.

In this article we will focus on the new capabilities brought by Postgres 9.5. However, before diving into that, if you want to know more about the differences between the JSON and JSONB data types, or if you have doubts about whether a "NoSQL" database is a good solution for your use case (and you should ;)), I suggest you read the previous articles we have written on these topics:

The new JSONB Operators

The operators and functions present in PostgreSQL up to 9.4 only made it possible to extract JSONB data. Therefore, to actually modify this data, one would have to extract it, modify it, and then reinsert it. Not too practical, some would say.
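
For a taste of the difference, a minimal sketch of the 9.5 additions (the || and - operators and the jsonb_set function):

-- merge two jsonb values (|| is new in 9.5)
SELECT '{"name":"ann","age":30}'::jsonb || '{"city":"Austin"}'::jsonb;
-- delete a key (- is new in 9.5)
SELECT '{"name":"ann","age":30}'::jsonb - 'age';
-- update a value at a path (jsonb_set is new in 9.5)
SELECT jsonb_set('{"a":{"b":1}}'::jsonb, '{a,b}', '2');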

The new operators included in PostgreSQL 9.5, which were based on the jsonbx extension for PostgreSQL 9.4, have changed this, greatly improving how to han

[...]
Posted by Oleg Bartunov in Postgres Professional on 2016-05-28 at 22:03
I don't know when our FTS slides will be available on the PGCon page, but meanwhile the presentation can be downloaded directly:
http://www.sai.msu.su/~megera/postgres/talks/pgcon-2016-fts.pdf

We'll update the GitHub repository for rum once we've fixed some small bugs.

Now that we've decided to really start embracing horizontally scaled builds, there is a critically important engine-agnostic element we need to examine. Given an existing table, how exactly should we split up the contents across our various nodes during the conversion process? Generally this is done by selecting a specific column and applying some kind of hash or custom distribution mechanism to ensure all node contents are reasonably balanced. But how do we go about figuring that out?

This question is usually answered with "use the primary key!" But this gets a bit more complex in cases where tables rely on composite keys. This doesn't happen often, but it can really throw a wrench into the works. Imagine, for example, that we're using Postgres-XL and have four nodes numbered data0000 through data0003. Then we find this table:

CREATE TABLE comp_event
(
  group_code   TEXT NOT NULL,
  event_id     BIGINT NOT NULL,
  entry_tm     TIMETZ NOT NULL,
  some_data    TEXT NOT NULL
);
 
INSERT INTO comp_event
SELECT a.id % 10, a.id % 100000,
       '08:00'::TIMETZ + (a.id % 43200 || 's')::INTERVAL,
       repeat('a', a.id%10)
  FROM generate_series(1, 1000000) a(id);
 
ALTER TABLE comp_event ADD CONSTRAINT pk_comp_event
      PRIMARY KEY (group_code, event_id, entry_tm);
 
ANALYZE comp_event;

The default for Postgres-XL is to simply use the first column for distribution. This tends to fit most cases, as the first column is usually either the primary key, or a reasonable facsimile of it. We can even use a system view to confirm this is the case:

SELECT pcrelid::regclass AS TABLE_NAME, a.attname AS column_name
  FROM pgxc_class c
  JOIN pg_attribute a ON (a.attrelid = c.pcrelid)
 WHERE a.attnum = c.pcattnum;
 
 TABLE_NAME | column_name 
------------+-------------
 comp_event | group_code

But is this what we want? What would happen if we naively went ahead with the default value and converted the database? Well, the major problem is that we don’t know the hash algorithm Postgres-XL is using. It’s entirely possible t

[...]
Posted by Keith Fiske in OmniTI on 2016-05-27 at 15:55

My post almost 2 years ago about checking for PostgreSQL bloat is still one of the most popular ones on my blog (according to Google Analytics anyway). Since that’s the case, I’ve gone and changed the URL to my old post and reused that one for this post. I’d rather people be directed to correct and current information as quickly as possible instead of adding an update to my old post pointing to a new one. I’ve included my summary on just what exactly bloat is again below since that seemed to be the most popular part.

The intent of the original post was to discuss a python script I'd written for monitoring bloat status: pg_bloat_check.py. Since that time, I've been noticing that the query used in v1.x of that script (obtained from the check_postgres.pl module) was not always accurate and often did not report on bloat that I knew for a fact was there (e.g. I just deleted over 300 million rows, vacuumed & analyzed the table, and still no bloat? Sure, it could happen, but it's highly unlikely). So I continued looking around and discovered the pgstattuple contrib module that comes with PostgreSQL. After discussing it with several of the core developers at recent PostgreSQL conferences (PGConfUS & PGCon), I believe this is a much, much better way to get an accurate assessment of the bloat situation. This encouraged me to do a near-complete rewrite of my script, and v2.0.0 is now available. It's not a drop-in replacement for v1.x, so please check the --help for new options.

pgstattuple is a very simple, but powerful extension. It doesn’t require any additional libraries to be loaded and just adds a few functions you can call on database objects to get some statistics about them. The key function for bloat being the default one, pgstattuple(regclass), which returns information about live & dead tuples and free space contained in the given object. If you read the description below on what bloat actually is, you’ll see that those data points are exactly what we’re looking for. The difference between what this function is doing and

[...]
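(For reference, the pgstattuple call described above looks like this; a minimal sketch against a hypothetical table:)

-- the extension ships in contrib; no extra libraries needed
CREATE EXTENSION pgstattuple;
-- live/dead tuple counts and free space for a single object
SELECT tuple_count, dead_tuple_count, dead_tuple_percent, free_space
  FROM pgstattuple('public.my_table');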
Posted by Markus Winand on 2016-05-27 at 00:00

The filter clause extends aggregate functions (sum, avg, count, …) by an additional where clause. The result of the aggregate is built from only the rows that satisfy the additional where clause too.

Syntax

The filter clause follows an aggregate function:

SUM(<expression>) FILTER(WHERE <condition>)

With the exception of subqueries and window functions, the <condition> may contain any expression that is allowed in regular where clauses.

The filter clause works for any aggregate function: besides the well-known functions such as sum and count, it also works for array_agg and sorted set functions (e.g., percentile_cont).
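
For example, a minimal sketch (assuming a products table with a price column):

SELECT COUNT(*)                            AS all_products,
       COUNT(*) FILTER (WHERE price > 100) AS expensive_products
  FROM products;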

If an aggregate function is used as a window function (over clause), the syntactic order is: aggregate function, filter clause, over clause:

SUM(...) FILTER(WHERE ...) OVER (...)

However, the filter clause is not generally allowed before over; rather, it is only allowed after an aggregate function, but not after other window functions. It is not allowed before ranking functions (rank, dense_rank, etc.), for example.

Use Cases

The following articles describe common use cases of filter:

Compatibility

SQL:2003 introduced the filter clause as part of the optional feature “Advanced OLAP operations” (T612). It is barely supported today, but is easy to emulate using case (see Conforming Alternatives).

Availability of FILTER

Conforming Alternatives

Generally, the filter clause can be implemented as a case expression inside the aggregate function: the filter condition has to be put into the when clause, the value to be aggregated into the then clause. Because aggregate functions generally skip over null values, the implicit else null clause is enough to ignore non-matching rows. The following two expressions are equivalent:

SUM(<expression>) FILTER(WHERE <condition>)
SUM(CASE WHEN <condition> THEN <expression> END)

Count(*) needs some special treatment because “*” cannot be put into the then clause. Instead, it is enough to

[...]
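(The emulation being described: put a non-null constant into the then clause, so every matching row contributes one non-null value to count. A sketch:)

COUNT(*) FILTER (WHERE <condition>)
COUNT(CASE WHEN <condition> THEN 1 END)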
Posted by Robert Haas in EnterpriseDB on 2016-05-26 at 20:49
Yesterday evening, I ran the PostgreSQL regression tests (make check-world) on master and on each supported back-branch three times on hydra, a community test machine provided by IBM.  Here are the median results:

9.1 - 3m49.942s
9.2 - 5m17.278s
9.3 - 6m36.609s
9.4 - 9m48.211s
9.5 - 8m58.544s
master, or 9.6 - 13m16.762s
Read more »
It seems to me that I found a bug in the current implementation of reloptions: when you create a custom index with its own reloptions, you have no way to prevent them from being changed using ALTER INDEX .... SET (op=value);
For example, if you do this for a bloom index:
alter index bloomidx set ( length=15 );

postgres will successfully run this, change the value of the reloptions attribute in pg_class, and the bloom index will work incorrectly afterwards.
And there is no way to forbid this from inside an extension.

I think I would add a flag to the reloption descriptor telling whether it is allowed to change this reloption using ALTER INDEX or not.

In my previous blog, I talked about using Java arrays to work with PostgreSQL arrays. This blog goes one step further in using Java against the database. Hibernate is an ORM implementation available for use with PostgreSQL. Here we discuss its query language, HQL.

The syntax of HQL is very close to that of SQL, so anyone who knows SQL should be able to ramp up very quickly. The major difference is that rather than addressing tables and columns, HQL deals in objects and their properties. Essentially, it is a complete object-oriented language for querying your database using Java objects and their properties. As opposed to SQL, HQL understands inheritance, polymorphism, and association. Code written in HQL is translated by Hibernate to SQL at runtime and executed against the PostgreSQL database.

An important point to note here is that references to objects and their properties in HQL are case-sensitive; all other constructs are case-insensitive.
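
For instance, a minimal HQL sketch (my example, assuming a mapped LargeCity entity with a rank property):

from LargeCity c where c.rank <= 5 order by c.rank

Here LargeCity and c.rank must match the class and property names exactly, while the keywords from, where and order by could just as well be written in any case.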

Why Use HQL?

The main driver for using HQL would be database portability. Because its implementation is designed to be database agnostic, if your application uses HQL for querying the database, you can swap the underlying database by making simple changes to the configuration XML file. As opposed to native SQL, the actual code will remain largely unchanged if your application starts talking to a different database.

Prominent Features

A complete list of features implemented by HQL can be found on their website. Here, we present examples of some basic and salient features that will help you get going with HQL. These examples use a table by the name of 'largecities' that lists the 10 largest metropolitan areas of the world. The descriptor and data are:

postgres=# \d largecities
 Table "public.largecities"
 Column |          Type          | Modifiers 
--------+------------------------+-----------
 rank   | integer                | not null
 name   | character varying(255) | 
Indexes:
 "largecities_pkey" PRIMARY KEY, btree (rank)
postgres=# select * from largecities;
[...]

We've just flipped the switch on www.postgresql.org to be served on https only. This has been done for a number of reasons:

  • In response to popular request
  • Google, and possibly other search engines, have started to give higher scores to sites on https, and we were previously redirecting accesses to cleartext
  • Simplification of the site code, which now doesn't have to keep track of which pages need to be secure and which do not
  • Prevention of evil things like WiFi hotspot providers injecting ads or javascript into the pages

We have not yet enabled HTTP Strict Transport Security, but will do so in a couple of days once we have verified all functionality. We have also not enabled HTTP/2 yet; this will probably come at a future date.

Please help us out with testing this, and let us know if you find something that's not working, by emailing the pgsql-www mailing list.

There are still some other postgresql.org websites that are not available over https, and we will be working on those as well over the coming weeks or months.

Today, organizations store information (data) in different database systems. Each database system has a set of applications that run against it. This data is just bits and bytes on a file system, and only a database can turn the bits and bytes of data into business information. Integration and consolidation of such information (data) into one database system is often difficult, because many of the applications that run against one database may not have an equivalent application that runs against another. To consolidate the information into one database system, we need a heterogeneous database connection. In this post, I'll demo how you can connect PostgreSQL to Oracle, one such heterogeneous database, using different methods.

Below are a few methods to make a connection to an Oracle database from PostgreSQL.
  • Using ODBC Driver
  • Using Foreign DataWrappers
  • Using Oracle Call Interface(OCI) Driver

Using ODBC Driver

Open DataBase Connectivity (ODBC) is a standard software API for using a DBMS. The ODBC driver/ODBC data source (API) is a library that allows applications to connect to any database for which an ODBC driver is available. It's a middle layer that translates the application's data queries into commands that the DBMS understands. To use this method, the open source unixODBC and Oracle ODBC driver (Basic/ODBC/Devel) packages are required, along with a module in PostgreSQL that can communicate with the DSN created using unixODBC and the Oracle ODBC driver. A few years back, Cybertec released a module, ODBC-Link; at present it is obsolete, however it has a dblink-style implementation for PostgreSQL to connect to any other ODBC-compliant database. We can use this module for basic connectivity to Oracle. Let's see.

Install unixODBC
tar -xvf unixODBC-2.3.4.tar.gz
cd unixODBC-2.3.4/
./configure --sysconfdir=/etc
make
make install
Binary/Libraries
[...]
Posted by Pavel Stehule on 2016-05-24 at 12:55
New version of plpgsql_check is available. The most important change is support for the upcoming PostgreSQL 9.6.

https://manager.pgxn.org/distributions/plpgsql_check/1.0.5
https://github.com/okbob/plpgsql_check/releases/tag/v1.0.5
Posted by Daniel Pocock on 2016-05-23 at 17:35

PostBooks 4.9.5 was recently released and the packages for Debian (including jessie-backports), Ubuntu and Fedora have been updated.

Postbooks at pgDay.ch in Rapperswil, Switzerland

pgDay.ch is coming on Friday, 24 June. It is at the HSR Hochschule für Technik Rapperswil, at the eastern end of Lake Zurich.

I'll be making a presentation about Postbooks in the business track at 11:00.

Getting started with accounting using free, open source software

If you are not currently using a double-entry accounting system or if you are looking to move to a system that is based on completely free, open source software, please see my comparison of free, open source accounting software.

Free and open source solutions offer significant advantages: flexibility, as businesses can choose any programmer to modify the code, while SQL back-ends, multi-user support and multi-currency support come standard. These are all things that proprietary vendors charge extra money for.

Accounting software is the lowest common denominator in the world of business software; people keen on the success of free and open source software may find that encouraging businesses to use one of these solutions is a great way to lay a foundation where other free software solutions can thrive.

PostBooks new web and mobile front end

xTuple, the team behind Postbooks, has been busy developing a new Web and Mobile front-end for their ERP, CRM and accounting suite, powered by the same PostgreSQL backend as the Linux desktop client.

More help is needed to create official packages of the JavaScript dependencies before the Web and Mobile solution itself can be packaged.

[Image: Goldfields Pipeline, Western Australia, by SeanMac (Wikimedia Commons)]

If you're trying to optimise the performance of your PostgreSQL-based application, you're probably focusing on the usual tools: EXPLAIN (BUFFERS, ANALYZE), pg_stat_statements, auto_explain, log_min_duration_statement, etc.

Maybe you’re looking into lock contention with log_lock_waits, monitoring your checkpoint performance, etc too.

But did you think about network latency? Gamers know about network latency, but did you think it mattered for your application server?

Latency matters

Typical client/server round-trip network latencies can range from 0.01ms (localhost) through the ~0.5ms of a switched network, 5ms of WiFi, 20ms of ADSL, 300ms of intercontinental routing, and even more for things like satellite and WWAN links.

A trivial SELECT can take in the order of 0.1ms to execute server-side. A trivial INSERT can take 0.5ms.

Every time your application runs a query it has to wait for the server to respond with success/failure and possibly a result set, query metadata, etc. This incurs at least one network round trip delay.

When you’re working with small, simple queries network latency can be significant relative to the execution time of your queries if your database isn’t on the same host as your application.

Many applications, particularly ORMs, are very prone to running lots of quite simple queries. For example, if your Hibernate app is fetching an entity with a lazily fetched @OneToMany relationship to 1000 child items, it's probably going to do 1001 queries thanks to the n+1 select problem, if not more. That means it's probably spending 1000 times your network round-trip latency just waiting. You can use a left join fetch to avoid that… but then you transfer the parent entity 1000 times in the join and have to deduplicate it.

Similarly, if you’re populating the database from an ORM, you’re probably doing hundreds of thousands of trivial INSERTs… and waiting after each and every one for the server to confirm it’s OK.
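
An easy mitigation sketch: batch many rows into one statement, so a single round trip carries them all (hypothetical table and values):

-- one network round trip for three rows, instead of three round trips
INSERT INTO items (id, val)
VALUES (1, 'x'), (2, 'y'), (3, 'z');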

It’s easy to try to focus on query execution time

[...]

I ran into an issue while trying to set up a simple test system for myself to move from CentOS 6 to 7. When I was about to start a standby slave, systemd reported a PostgreSQL startup timeout and stopped it, although it was running perfectly fine doing recovery.

Issue

In /var/log/messages, systemd reports the failure:

systemd: postgresql-9.5.service start operation timed out. Terminating.
systemd: Failed to start PostgreSQL 9.5 database server.
systemd: Unit postgresql-9.5.service entered failed state.
systemd: postgresql-9.5.service failed.

Meanwhile, in the postgresql log:

[1342]: [3-1] host=,user=,db=,tx=0,vtx= LOG:  received smart shutdown request
[1383]: [3-1] host=,user=,db=,tx=0,vtx= LOG:  shutting down
[1383]: [4-1] host=,user=,db=,tx=0,vtx= LOG:  database system is shut down

Cause

In the systemd script, postgresql is started with the -w flag, which means "wait until operation completes" (per pg_ctl --help). Hence systemd fails after the configured timeout.

ExecStart=/usr/pgsql-9.5/bin/pg_ctl start -D ${PGDATA} -s -w -t 300

Fix

Change the -w flag to -W in the systemd service file (/usr/lib/systemd/system/postgresql-9.5.service) and reload the daemon.
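
The resulting line should look like this (-W tells pg_ctl not to wait for startup to complete):

ExecStart=/usr/pgsql-9.5/bin/pg_ctl start -D ${PGDATA} -s -W -t 300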

systemctl daemon-reload
service postgresql-9.5 start

Hopefully this will save you a couple of minutes of debugging.

pgRouting 2.2.3 was released last week. The main change is that this version now supports PostgreSQL 9.6. Many thanks to Vicky Vergara for working through the issues with PostgreSQL 9.6 and getting it to work. Vicky has also been doing a good chunk of the coding (a lot of Boost refactoring and integrating more Boost features), testing, and documentation in pgRouting, osm2pgrouting, and QGIS pgRoutingLayer in general for pgRouting 2.1, 2.2, and the upcoming 2.3. We are very indebted to her for her hard work.

If you are a Windows user testing the waters of PostgreSQL 9.6beta1, we have pgRouting 2.2.3 binaries and PostGIS 2.3.0dev binaries at http://postgis.net/windows_downloads.


Continue reading "pgRouting 2.2.3 released with support for PostgreSQL 9.6beta1"
Posted by Shaun M. Thomas on 2016-05-20 at 16:00

I always advocate breaking up large Postgres tables for a few reasons. Beyond query performance concerns, maintaining one monolithic structure is always more time consuming and consequently more dangerous. The time required to create a dozen small indexes may be slightly longer than a single larger one, but we can treat the smaller indexes as incremental. If we want to rebuild, add more indexes, or fix any corruption, why advocate an all-or-nothing proposition? Deleting from one large table will be positively glacial compared to simply dropping an entire expired partition. The list just goes on and on.

On the other hand, partitioning in Postgres can be pretty intimidating. There are so many manual steps involved, that it’s easy to just kick the can down the road and tackle the problem later, or not at all. Extensions like the excellent pg_partman remove much of the pain involved in wrangling an army of partitions, and we strongly suggest using some kind of tool-kit instead of reinventing the wheel.

The main limitation with most existing partition management libraries is that they never deviate from the examples listed in the Postgres documentation. It’s always: create inherited tables, add redirection triggers, automate, rinse, repeat. In most cases, this is exactly the right approach. Unfortunately triggers are slow, and especially in an OLTP context, this can introduce sufficient overhead that partitions are avoided entirely.

Well, there is another way to do partitioning that’s almost never mentioned. The idea is to actually utilize the base table as a storage target, and in lieu of triggers, schedule data movement during low-volume time periods. The primary benefit to this is that there’s no more trigger overhead. It also means we can poll the base table itself for recent data with the ONLY clause. This is a massive win for extremely active tables, and the reason tab_tier was born.
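
As a sketch of that polling pattern (assuming, per the example below, a sensor_log base table with a reading_date column):

-- scan just the base table, skipping all inherited partitions
SELECT *
  FROM ONLY sensor_log
 WHERE reading_date > CURRENT_DATE - INTERVAL '1 day';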

Let’s create some data for testing this out:

CREATE TABLE sensor_log (
  id            INT PRIMARY KEY,
  location      V
[...]

One of the services we offer is security reviews (or audits, if you want), covering a range of areas related to security. It may be a bit surprising, but a topic that often yields the most serious issues is roles and privileges. Perhaps the reason roles and privileges are such a frequent source of issues is that the subject seems quite simple and similar to things engineers are already familiar with (e.g. the Unix system of users and groups), but it turns out there are a few key differences with major consequences.

The other parts are either very straightforward and understandable even for sysadmins without much PostgreSQL experience (e.g. authentication config in pg_hba.conf), or the engineers recognize the complexity and take their time to familiarize with the details (a good example of this is Row Level Security).

That is not to say there are no other interesting topics (e.g. how to use RLS with application-level users), but I'll leave that for another blog post, as this one is about roles and privileges.

So let's look at roles and privileges a bit closer…

Owner is a small superuser

When it comes to roles, the initial checks are mostly expected. The role should not be a superuser (as superusers simply bypass various checks), and in general should not have any excessive privileges (e.g. CREATEDB, CREATEROLE and so on).

But it also should not own the database objects (tables, functions, …), since owners can simply grant themselves arbitrary privileges on the objects they own, which turns them into small superusers.

Consider the following example, where we attempt to protect the table from the owner by revoking all the privileges from that role:

db=# CREATE USER u;
CREATE ROLE

db=# SELECT rolsuper FROM pg_roles WHERE rolname = 'u';
 rolsuper
----------
 f
(1 row)

db=# \c 'user=u dbname=db'
You are now connected to database "db" as user "u".

So we have created a user who is not a superuser, and we have connected using that account (that’s the slightly cryptic psql command). Let’s create a table (so the user is an owner) and r

[...]
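(The rest of the demo, sketched from the argument above: any such revocation can simply be undone by the owner.)

db=> CREATE TABLE t (a INT);
CREATE TABLE
db=> REVOKE ALL ON t FROM u;
REVOKE
db=> GRANT ALL ON t TO u;   -- owners can always grant privileges back to themselves
GRANT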
Posted by Scott Mead in OpenSCG on 2016-05-19 at 17:32

I use PostgreSQL every single day.  I develop against multiple versions of the database (4 versions running in parallel as I write), I have apps that use it, and there's my daily DBA-ing.  The biggest question I get from newbies and veterans alike is: “What are your PostgreSQL defaults?”

If you follow postgres, you already know that the default configuration (postgresql.conf) is very conservative from a resource (CPU, IOPS, Disk Space) perspective.  Over the last 10 years, we [the community] have developed some straightforward and easy to understand formulas that will help you tune… shared_buffers for example.  The item that always gets left out though is logging.  As a developer, I’m always looking for ways to see “How the database is answering the questions I ask”.  When I get a new postgres instance set up, I have a personal (somewhere in the cobwebs) checklist that I run.  Some are based on the purpose of the deployment (shared_buffers, work_mem, etc…), some are things that I always set.  Aside from memory, the biggest set of “standard” items I set are all related to logging.  I’m big on monitoring (my pg_stat_activity patch was accepted back for 9.2) and having the right detail presented to me is important.

TL;DR

logging_collector = on
log_filename = 'postgresql-%a.log'
log_truncate_on_rotation = on
log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h '
log_checkpoints = on
log_autovacuum_min_duration = 0
log_temp_files = 0
log_lock_waits = on
The goal here is to have postgres 'bubble up' details about what's going on, so that we can see exactly what is happening and when.  In addition to setting the log files, you can pick up a great log analyzer (like pgBadger or pgCluu) that will break all of this down into a gorgeous, easy-to-use report.

Scott’s logging defaults

logging_collector = on

By default, the logging_collector is disabled; this means that all of the info / error / warning, etc… logs that come out of postgres are sent to stderr.  Problem is, when your postgres forks from t
[...]
Posted by Pallavi Sontakke in 2ndQuadrant on 2016-05-19 at 12:31

The purpose of this blog is to explain the process of Quality Analysis that Postgres-XL goes through internally at 2ndQuadrant. Here, I describe the bare minimum tests that each release goes through, not to mention the other tests carried out by many 2ndQuadrant and other community members.

  • Regression tests are carried out to ensure that the defect fixes or enhancements to PostgreSQL have not affected Postgres-XL.

We want to keep up to date with PostgreSQL features and performance enhancements. Also, many tools may work only with newer PostgreSQL releases. So we merge each PostgreSQL minor release in a timely fashion. When merging the regression tests therein, we need to continuously and consciously verify that the new features are supported in Postgres-XL. Sometimes there may be a gap between the expected outputs of PostgreSQL and Postgres-XL, which has to be tracked.

  • Functional tests are done to validate that the functionalities of Postgres-XL are working as per business requirements. We ensure that all features are functioning as they are expected to.

Initially we created the ‘Major differences and Limitations’ tests module. The Postgres-XL project (initially Postgres-XC) was based on PostgreSQL 9.2. It picked up speed again during the PostgreSQL 9.5 development timeframe. Due to this, there is a known gap of features that Postgres-XL would like to support, but does not currently. We keep track of these with our xl_* functional tests. These tests cover limitations like materialized views, event triggers, foreign data wrappers, etc. On the other hand, they also cover positive functional tests for features like BRIN, logical decoding of WAL data, jsonb, etc.

Additionally, each time a new feature or enhancement is added, we keep adding tests to validate the functionality.

  • Usability tests are performed to validate the ease with which the user interfaces can be used.

We have cluster-setup-utility tests. Creating a Postgres-XL cluster manually requires quite a few steps. We have automated these for ease-of-use wi

[...]
Posted by Vasilis Ventirozos in OmniTI on 2016-05-19 at 09:23
Yesterday I got a page from a client about a possible corruption issue affecting one of his clients, so I decided to share how I dealt with it. Before starting with how I approached the problem, I want to say that corruption doesn't just happen in postgres. At OmniTI we manage A LOT of databases, all with different major versions and on different operating systems, and in my time I haven't seen (many) cases where corruption happened because of a postgres bug (I've seen indexes getting corrupted, but I don't remember ever seeing data being corrupted). What I want to say is that corruption doesn't come out of nowhere: hardware failures, power outages, and disks getting full are common causes.
A replica and backups should always be in place, and the server should be properly monitored. Unfortunately this server was not managed by us, so none of the above was in place..

At first, I saw log entries like these:
2016-05-18 15:50:06 EDT::@:[10191]: ERROR:  could not access status of transaction 128604706
2016-05-18 15:50:06 EDT::@:[10191]: DETAIL:  Could not open file "pg_clog/007A": No such file or directory.
2016-05-18 15:50:06 EDT::@:[10191]: CONTEXT:  automatic vacuum of table "pe.pg_toast.pg_toast_4409334"
...
much more to be expected...
...

At this point you know that something went wrong. Ideally, you want to copy your whole $PGDATA somewhere else and work there, while at the same time you start uncompressing your most recent backup. In my case, there was no backup and not enough disk space to copy $PGDATA.
FUN TIMES :)

I started by recreating all the missing clog files that I found in the logs:
dd if=/dev/zero of=/var/db/pgdata/pg_clog/0114 bs=256k count=1
dd if=/dev/zero of=/var/db/pgdata/pg_clog/00D1 bs=256k count=1
dd if=/dev/zero of=/var/db/pgdata/pg_clog/0106 bs=256k count=1
...
...
keep creating files until the logs are clean; there can be a lot of them, in my case more than 100....
...
...

From the logs I also found the affected tables:

pg_toast.pg_toast_18454
pg_toast.pg_toast_35597
pg_toast.pg_toast_35607
pg
[...]
Posted by Oleg Bartunov in Postgres Professional on 2016-05-18 at 23:44
Sometimes, using too specific a query can be risky, since the search could return zero or very few results.
It is better to search with a broader query and use ranking to sort the results.

Example:
SELECT title,ts_rank_cd(fts, to_tsquery('english','x-ray & russian')) AS rank 
FROM apod WHERE fts  @@ to_tsquery('english','x-ray&russian') ORDER BY rank 
DESC LIMIT 5;
                 title                  |   rank
----------------------------------------+-----------
 The High Energy Heart Of The Milky Way | 0.0240938
(1 row)

SELECT title,ts_rank_cd(fts, to_tsquery('english','x-ray & russian')) AS rank 
FROM apod WHERE fts  @@ to_tsquery('english','x-ray') ORDER BY rank DESC LIMIT 5;
                 title                  |   rank
----------------------------------------+-----------
 The High Energy Heart Of The Milky Way | 0.0240938
 X-Ray Jet From Centaurus A             |         0
 Barnacle Bill And Sojourner            |         0
 The Crab Nebula in X-Rays              |         0
 M27: The Dumbbell Nebula               |         0
(5 rows)
Posted by US PostgreSQL Association on 2016-05-18 at 16:29

Continuing my Building a Better Community series, I contacted Steve Atkins. There was an interesting twist to this particular Unsung Hero: I initially contacted lluad on #postgresql (irc.freenode.net) due to his continuous and untiring efforts to help people on the channel. What I didn't know is that Steve Atkins is actually lluad. Read on for some interesting bits.

How do you use PostgreSQL?

I use PostgreSQL almost any time I need to store or process structured data.

read more

Posted by Josh Berkus in pgExperts on 2016-05-18 at 16:02
Per yesterday's developer meeting, the PostgreSQL Project is contemplating a change to how we do version numbers.  First, let me explain how we do version numbers now.  Our current version number composition is:

9 . 5 . 3 
Major1 . Major2 . Minor


That is, the second number is the "major version" number, reflecting our annual release.  The third number is the update release number, reflecting cumulative patch releases.  Therefore "9.5.3" is the third update to version 9.5.

The problem is the first number, in that we have no clear criteria when to advance it.  Historically, we've advanced it because of major milestones in feature development: crash-proofing for 7.0, Windows port for 8.0, and in-core replication for 9.0.  However, as PostgreSQL's feature set matures, it has become less and less clear on what milestones would be considered "first digit" releases.  The result is arguments about version numbering on the mailing lists every year which waste time and irritate developers.

As a result, the PostgreSQL Project is proposing a version numbering change, to the following:

10 . 2
Major . Minor

Thus "10.2" would be the second update release for major version 10.   The version we release in 2017 would be "10" (instead of 10.0), and the version we release in 2018 will be "11".

The idea is that this will both put an end to the annual arguments, as well as ending the need to explain to users that 9.5 to 9.6 is really a major version upgrade requiring downtime.

Obviously, there is potential for breakage of a lot of tools, scripts, automation, packaging and more in this.  That's one reason we're discussing this now, almost a year before 10 beta is due to come out.

The reason for this blog post is that I'm looking for feedback on what this version number change will break for you.  Particularly, I want to hear from driver authors, automation engineers, cloud owners, application stack owners, and other folks who are "downstream" of PostgreSQL.  Please let us know what technical problems this will cause for you[...]
Posted by Joshua Drake in CommandPrompt on 2016-05-18 at 15:53
This is just a short blog to thank EDB and Robert Haas for this blog post. It is great to see all aspects of the community coming together. This is how we defeat the true competitors. Open and closed source products will, in the long run, fail to materialize a true return on investment anywhere close to what a coopetition-based, true open source project can.
Posted by US PostgreSQL Association on 2016-05-17 at 18:21

Austin, Texas – May 16, 2016 – The PostgreSQL community announces their participation at OSCON 2016, the largest open source conference in North America. OSCON takes place in Austin, Texas, May 16 – 19, 2016 and is the best opportunity to meet the open source community.

read more