Planet PostgreSQL

Oh my goodness, Data Days!


When we rescheduled PGConf US Local: Seattle from August to November we did so due to attendee feedback. It was amazing - people didn't want to go to a conference on Saturday in August (I wonder why). I know, we should have known but it was a new model and we tried. We are extremely pleased with the results of the shift in schedule. The conference now takes place during "professional hours" on "professional days."


Because of the shift and sponsor support we have added three new tracks, reopened the CFP, and created Data Days. The new tracks are: Big Data, AWS/Cloud, and Data Science. As these three content areas are not exclusive to Postgres, we are also inviting all communities within this realm to submit talks. Let's turn PGConf US Local: Seattle into not only the best West Coast Postgres Conference but also the most highly integrated, heterogeneous data event in the Pacific Northwest.

CFP Dates:

  • Open until: 10/15/2017
  • Notification:  10/18/2017
  • CFP Link

1. Introduction

Logical replication uses a publish/subscribe model and so we create publications on the upstream (or publisher) and subscriptions on downstream (or subscriber). For more details about it, please refer to this blog post from my colleague Petr Jelinek, and also to the PostgreSQL documentation.
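For reference, on plain PostgreSQL 10 this model boils down to SQL along the lines of the sketch below (the table, database, and connection details are hypothetical; OmniDB drives the same mechanism for you in the steps that follow):

-- on the upstream (publisher)
CREATE PUBLICATION my_pub FOR TABLE customers;

-- on the downstream (subscriber), which needs a matching table definition
CREATE SUBSCRIPTION my_sub
  CONNECTION 'host=192.168.56.101 dbname=testdb user=omnidb'
  PUBLICATION my_pub;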

Here we will show how to build a test environment to play with this new feature from PostgreSQL 10, and how to configure it using OmniDB 2.1.

2. Building test environment

Let’s build a 2-node test environment to illustrate how to configure PG10 logical replication feature within OmniDB.

2.1. Pull OmniDB repo

The first thing you need to do is clone the OmniDB repository from GitHub and make sure you are on the development branch.

2.2. Create 2 virtual machines with PostgreSQL 10

On your host machine, you need to have installed:

  • VirtualBox
  • Vagrant
  • Vagrant plugin vbguest

Please refer to the VirtualBox and Vagrant websites for more information.

For this test environment to work, you need to put both machines in the same internal network. If necessary, please edit the file Vagrantfile for both machines like this:

config.vm.network "private_network", ip: '192.168.56.101', :name => 'vboxnet0', :adapter => 2
Now, on your terminal (assuming you are on the root directory of OmniDB repo), to create the first virtual machine with PostgreSQL 10, you need to do:
cd OmniDB/OmniDB_app/tests/vagrant/multinode/pg10_node1
vagrant up
While inside this directory, you can connect via SSH to the machine with vagrant ssh. User vagrant has root access with sudo su. Now let’s create the second virtual machine with PostgreSQL 10:
cd ../pg10_node2
vagrant up
You can connect to this machine via SSH the same way. Also, port 5432 of the first machine is mapped to port 5401 of the host machine, and port 5432 of the second machine is mapped to port 5402 of the host machine. From the host machine, you can connect to both virtual machines through OmniDB, using the user omnidb.

3. Configuring logical replication

[...]

SQL extract provides access to the components of temporal data types—i.e. date, time, timestamp, and interval.

SQL extract uses the keyword from to separate the field name from the value.

EXTRACT(<field> FROM <expression>)

The field names are SQL keywords too—you must not put them in double or single quotes.

SQL extract returns an exact numeric value. For second, it also includes fractions. The following table lists the extract fields defined by the SQL standard.

Meaning                         Extract field
Year                            YEAR
Month                           MONTH
Day of month                    DAY
24 hour                         HOUR
Minute                          MINUTE
Seconds (including fractions)   SECOND
Time zone hour                  TIMEZONE_HOUR
Time zone minute                TIMEZONE_MINUTE
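For example, a quick illustration (not from the original article; the column aliases are arbitrary):

SELECT EXTRACT(YEAR   FROM DATE '2017-09-18')                  AS yr,
       EXTRACT(MONTH  FROM DATE '2017-09-18')                  AS mon,
       EXTRACT(SECOND FROM TIMESTAMP '2017-09-18 12:34:56.78') AS sec;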

Related Features

Extract can only get single fields. To extract the full date (year, month, day) or time (hour, minute, second) from a timestamp, cast can be used:

CAST(<timestamp> AS [DATE|TIME])

This is particularly useful for the group by clause. In the where clause, it is often the wrong choice. For more on this, see Inappropriate Use in The Where Clause below.
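As a small sketch of the group by case, assuming a hypothetical sales table with a timestamp column sale_ts:

SELECT CAST(sale_ts AS DATE) AS sale_day,
       SUM(amount)           AS total
  FROM sales
 GROUP BY CAST(sale_ts AS DATE);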


Caution: Oracle Database

The Oracle database doesn’t have a predefined date type without time components. Even the Oracle type date has a time component—in this regard, Oracle date is more like the standard timestamp. A type conversion (cast) to date therefore does not drop the time component in the Oracle database.

To use the date only—without time components—it is common practice to use the proprietary trunc function to set all time fields to zero (0):

trunc(<timestamp>)

Note that the result still has the time components—they are just set to zero. The effect is basically like the following cast expression in standard SQL:

CAST(CAST(<timestamp> AS DATE) AS TIMESTAMP)

Compatibility

SQL extract was available in SQL-92 (intermediate) and is now part of the optional feature F052, “Intervals and datetime arithmetic”. Despite its maturity and relevance, extract is still not supported by all major databases.

Availability of EXTRACT

Related Anti-Patterns

String Formatting Functions

A very common ant

[...]

In our last blog post, set_user: Installation and Set-Up, we introduced you to the set_user component, installed the extension in a database, created a new user, and granted it set_user function execution privileges.

If you recall, the set_user documentation states the main goal is to:

grant the EXECUTE privilege to the set_user() and/or set_user_u() function to otherwise unprivileged postgres users. These users can then transition to other roles, possibly escalating themselves to superuser through use of set_user_u(), when needed to perform specific actions.

In an upcoming post we will address the set_user function and what it allows you to do.

In this post, we are focusing on the set_user_u function and how its magic happens. Wave this magic wand and poof! Your normal worker bee user has become a superuser!

But not so fast! We don’t trust you without some parental controls – enhanced logging – to make sure you aren’t taking advantage of your new powers.

How it works

Syntax

set_user_u(text rolename) returns text
reset_user() returns text

Inputs

“rolename” is the role to be transitioned to.

What happens

When an allowed user executes set_user_u('rolename'), several actions occur:

  • The current effective user becomes rolename.
  • The role transition is logged, with specific notation if rolename is a superuser.
  • The log_statement setting is set to 'all', meaning every SQL statement executed while in this state will also get logged.

When finished with the required actions as rolename, the reset_user() function is executed and these actions occur (a combined usage sketch follows the list):

  • The user role is restored to the original user.
  • The role transition is logged.
  • The log_statement setting is set to its original value.
  • Blocked command behaviors return to normal.
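Put together, a typical escalation round trip looks roughly like this (a sketch using the two functions quoted above; the target role name is only an example):

-- escalate: the current effective user becomes postgres, log_statement goes to 'all'
SELECT set_user_u('postgres');

-- ... perform the privileged work that needed superuser ...

-- drop back: original role, logging, and blocked-command behavior are restored
SELECT reset_user();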

So what’s the diff: current vs session user?

The session user is the user that started the session. The current user is the user that has been set via set local session authorization, or via the SECURITY DEFINER attribute of a function. The current user is what is used for permission checks. So, to put it in the context, c

[...]
Posted by Bruce Momjian in EnterpriseDB on 2017-09-18 at 15:30

The Postgres hackers list is a steady-stream of great ideas and discussion, but occasionally something comes along that really makes you sit back and think, "Wow, where did that come from?" Such was a February 2017 email from Konstantin Knizhnik presenting a proof-of-concept vectorization optimization for the executor.

In May Andres Freund presented a plan for speeding up the executor using Just In Time Compilation (JIT) and LLVM. This work is pending for Postgres 11. (In fact, it was almost committed to Postgres 10 on April Fool's Day.)

Konstantin's work adds vectorization to the executor, which can be revolutionary. Once Andres's work is in, we can research how to make the executor even faster using vectorization. This would open up Postgres to an entirely new class of big-data applications.

Posted by Dimitri Fontaine on 2017-09-18 at 08:49

PostgreSQL has had proper json support for a while now. The unique extensibility approach of the PostgreSQL system allows it to offer native, SQL-friendly JSON processing.

In this article we’ll play with the Magic: the Gathering card data set, provided in JSON format under a CC0 licence, and process the information provided. We also see how to normalize a JSON document into a proper database model that benefits from some PostgreSQL advanced features, and how to then inject the JSON documents into the normalized database schema. Finally, we compare some non-trivial processing done against both versions of the database schema.

Managing connections in Postgres is a topic that seems to come up several times a week in conversations. I’ve written some about scaling your connections and the right approach when you truly need a high level of connections, which is to use a connection pooler like pgBouncer. But what do you do before that point and how can you better track what is going on with your connections in Postgres?

Postgres under the covers has a lot of metadata about both historical and current activity against a system. Within Postgres you can run the following query which will give you a few results:

```sql
SELECT count(*),
       state
FROM pg_stat_activity
GROUP BY 2;

 count |             state
-------+-------------------------------
     7 | active
    69 | idle
    26 | idle in transaction
    11 | idle in transaction (aborted)
(4 rows)

Time: 30.337 ms
```

Each of these is useful in determining what you should do to better manage your connection count. All of these numbers can be useful to record, say, every 30 seconds and chart on your own internal monitoring. Let's break down each:

  • active – These are currently running queries; in a sense, this is truly how many connections you may require at a time
  • idle – This is where you have opened a connection to the DB (most frameworks do this and maintain a pool of them), but nothing is happening. This is the one area that a connection pooler like pgBouncer can most help.
  • idle in transaction – This is where your app has run a BEGIN but it’s now waiting somewhere in a transaction and not doing work.

For idle, as mentioned above, it's one that you do want to monitor, and if you see a high number here it's worth investing in setting up a pgBouncer.

For idle in transaction this one is a bit more interesting. Here what you likely want to do when first investigating is get an idea of how old those are. You can do this by querying pg_stat_activity and filtering for where the state is idle in transaction and checking how old those queries are.
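A sketch of such a check (all columns come from pg_stat_activity; the ordering simply surfaces the oldest transactions first):

SELECT pid,
       usename,
       now() - xact_start AS transaction_age,
       query
  FROM pg_stat_activity
 WHERE state = 'idle in transaction'
 ORDER BY transaction_age DESC;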

If you find that you have some stale transactions hanging around this could be for days, hours, or eve

[...]
Posted by gabrielle roth on 2017-09-16 at 04:30

… ok, maybe a bit less than a week.

When: 6-8pm Thursday Sept 21, 2017
Where: iovation
Who: Mark Wong
What: Postgres BI Performance & News From the Pg-XL Project

If you missed Postgres Open, here’s your chance to catch one of the talks!

Mark will share performance improvements in analytical / BI workloads in recent PostgreSQL releases, illustrated with results from multiple major versions, and talk about what additional improvements might happen in the near future. He'll also discuss news from Postgres-XL, the distributed PostgreSQL fork, and how it compares to the current PostgreSQL in this area.

Mark leads the 2ndQuadrant performance practice as a Performance Consultant for English Speaking Territories, based out of Oregon in the USA.


If you have a job posting or event you would like me to announce at the meeting, please send it along. The deadline for inclusion is 5pm the day before the meeting.

Our meeting will be held at iovation, on the 32nd floor of the US Bancorp Tower at 111 SW 5th (5th & Oak). It’s right on the Green & Yellow Max lines. Underground bike parking is available in the parking garage; outdoors all around the block in the usual spots. No bikes in the office, sorry!

iovation provides us a light dinner (usually sandwiches or pizza).

Elevators open at 5:45 and building security closes access to the floor at 6:30.

See you there!


Posted by Bruce Momjian in EnterpriseDB on 2017-09-16 at 01:30

When setting up a database schema, indexing is always a consideration. While Postgres supports traditional btree indexes for most data types, they can be quite heavy, often requiring a significant percentage of the table size for index storage.

There are two options for schemas that need to index a large percentage of columns. The first is BRIN indexes, which allow for small, low-overhead indexes that can be created on many columns. One downside of BRIN indexes is that they must be defined on each column that needs one.

A more generic solution is to place all data in a JSONB column and create a GIN index. The GIN index indexes every JSON key and value, and only stores the keys and values once in the index. This is a great "index everything" option for administrators that aren't sure of their indexing needs.
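As a sketch, with a hypothetical table:

CREATE TABLE docs (id serial PRIMARY KEY, body jsonb);
CREATE INDEX docs_body_gin ON docs USING gin (body);

-- the single GIN index then serves containment queries on any key
SELECT * FROM docs WHERE body @> '{"status": "active"}';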

Posted by Holly Orr in OpenSCG on 2017-09-15 at 19:21

If you haven’t installed the BigSQL Postgres Distribution, you can start here to use an installer or here if you want to create a sandbox via command line.

Next, via command line, navigate to the BigSQL or PostgreSQL directory where pgc is located and install the latest set_user component by running the following command (Windows users don’t prefix the pgc command with ./ as shown in the below examples):

./pgc install setuser1-pg96

Invoke session environment variables

Navigate to the pg96 directory and run the environment variable file.

Linux / OSX:

 source pg96.env

Windows:

 pg96-env.bat

Edit postgresql.conf

Add set_user to the shared_preload_libraries line in postgresql.conf using your favorite editor. There are many ways of doing this (including the ALTER SYSTEM command):

vi $PGDATA/postgresql.conf

SHIFT + G (to get to end of file) and add this line:

shared_preload_libraries = 'set_user'
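If you would rather use the ALTER SYSTEM route mentioned above instead of hand-editing the file, the equivalent is roughly the following (it still only takes effect after the restart in the next step):

ALTER SYSTEM SET shared_preload_libraries = 'set_user';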

Save the change you made and restart PostgreSQL by navigating to the directory (bigsql or PostgreSQL) and running:

./pgc restart

Create a database

psql -U postgres postgres
postgres=# CREATE DATABASE testdb;
    CREATE DATABASE

Create the set_user extension

postgres=# \connect testdb;
You are now connected to database "testdb" as user "postgres".

testdb=# CREATE EXTENSION set_user;
    CREATE EXTENSION

testdb=# \dx
                     List of installed extensions
   Name   | Version |   Schema   |                Description
----------+---------+------------+--------------------------------------------
 plpgsql  | 1.0     | pg_catalog | PL/pgSQL procedural language
 set_user | 1.4     | public     | similar to SET ROLE but with added logging

Create a new user

Create user with a password:

testdb=# CREATE USER dba_user WITH PASSWORD 'passuser';

Give the new user privileges to execute set_user functions

From https://github.com/pgaudit/set_user

The concept is to grant the EXECUTE privilege to the set_user() and/or set_user_u() function to otherwise unprivileged postgres users. These users can then trans

[...]
Posted by Craig Kerstiens in CitusData on 2017-09-15 at 17:55

In the beginning there was Postgres

We love Postgres at Citus. And rather than create a newfangled database from scratch, we implemented Citus as an extension to Postgres that allows you to seamlessly scale out your database. We’ve talked a lot on our blog here about how you can leverage Citus, about key use cases Citus is good for, and different data models and sharding approaches that make sense with Citus. But we haven’t spent a lot of time explaining how Citus works. So if you want to dive deeper into how Citus actually works under the covers, here we’re going to walk through how Citus works from sharding the data all the way through to how the executors perform queries to keep things snappy.

Distributing data within Citus

Citus gets its benefits from sharding your data, which allows us to split the data across multiple physical nodes. When your tables are significantly smaller due to sharding, your indexes are smaller, vacuum runs faster, and everything works like it did when your database was smaller and easier to manage.

To setup your initial shards you first create a table, then you tell Citus to shard it with a function:

CREATE TABLE events (id serial, data jsonb);
SELECT create_distributed_table('events', 'id');

When you run the create_distributed_table function we then create a number of other tables under the covers and spread those out across the other nodes within your Citus cluster:

events_102008
events_102009
events_102010
events_102011
...
events_102039

Most of the time you never need to know the names of these tables, look at them, or even be aware that they exist. Once you’ve done this initial setup Citus then starts to kick in for all new queries that come in. Let’s first look at INSERT then we can move on to SELECT queries and see how they’re handled.

Getting data in

When you insert data to Citus you’re inserting directly into Postgres. As an extension Citus on the fly determines how to route that insert. In order to do this we first get a hash value for the column you sharded on. We use the i

[...]

Postgres: The center of your data universe

This talk is proving to be great content for those who are not necessarily Postgres users. Last night I presented this talk at the Seattle Web Developers Meetup. The location was Adobe, next to Google and Tableau. I didn't even know there was a small tech complex on N. 34th in Seattle. There were about 27 people, which falls in line with the 50% rule of Meetups[1]. Here are my observations from the audience.

  1. Surprisingly, the majority of developers were Python web developers (as a group)
  2. Many of them do not use Postgres and wanted to hear more about it
  3. There were several government employees that were wondering how to get Postgres deployed
  4. Presenting with two screens next to each other is difficult because I walk when I present
  5. People are still interested in PLphp

I wasn't surprised that there were Python web developers. I was surprised that there were not more Node or Ruby developers. In fact, when asked there wasn't a single Ruby person in the room and Node was only a hobby for one other.

About half the room raised their hand when asked if they used Postgres. However, the rest were very engaged, asked intelligent questions and most were particularly interested in our JSON and FDW support.

The government employee was trying to figure out how to get out of MSSQL. My response was the classic, "Just don't tell them". Obviously I don't want to get anyone in trouble, but more often than people like to admit, the boss doesn't care if you are using MSSQL; they care that you aren't going to lose data and the application is going to perform well. The spec could just say SQL DB, which MSSQL and PostgreSQL both are. We laughed at the idea but I also informed her of Crunchy's Postgres with Common Criteria and other governmental agencies that are running Postgres.

The two screen thing is weird. I do o.k. when the screens are on the left and right of me, but both screens were to the left, so I ended up cutting off the screen anytime I walked. I might need to figure out how to stand still

[...]
Posted by Pavel Stehule on 2017-09-15 at 04:58
plpgsql_check 1.2.1 is available. It now includes some analysis of dead code and detection of a missing RETURN command.
create or replace function fx(x int)
returns int as $$
begin
  begin
    if (x > 0) then
      raise exception 'xxx' using errcode = 'XX888';
    else
      raise exception 'yyy' using errcode = 'YY888';
    end if;
    return -1; -- dead code;
  end;
  return -1;
end;
$$ language plpgsql;

select * from plpgsql_check_function_tb('fx(int)');

 functionid | lineno | statement | sqlstate |     message      | detail | hint |     level     | position | query | context
------------+--------+-----------+----------+------------------+--------+------+---------------+----------+-------+---------
 fx         |      9 | RETURN    | 00000    | unreachable code |        |      | warning extra |          |       |
 fx         |     11 | RETURN    | 00000    | unreachable code |        |      | warning extra |          |       |
(2 rows)

create or replace function fx(x int)
returns int as $$
begin
  begin
    if (x > 0) then
      raise exception 'xxx' using errcode = 'XX888';
    else
      raise exception 'yyy' using errcode = 'YY888';
    end if;
  exception
    when sqlstate 'XX888' then
      null;
    when sqlstate 'YY888' then
      null;
  end;
end; -- missing return;
$$ language plpgsql;

select * from plpgsql_check_function_tb('fx(int)');

 functionid | lineno | statement | sqlstate |                    message                     | detail | hint | level | position | query | context
------------+--------+-----------+----------+------------------------------------------------+--------+------+-------+----------+-------+---------
 fx         |        |           | 2F005    | control reached end of function without RETURN |        |      | error |          |       |
(1 row)

Source code: https://github.com/okbob/plpgsql_check/releases
Posted by Holly Orr in OpenSCG on 2017-09-15 at 02:34

We are excited to announce the newest addition to the components available with the BigSQL PostgreSQL distributions for Windows, Linux, and OSX: set_user 1.4.0!

And… to kick it off, we will be writing multiple posts about this essential security extension.

But first (for the uninitiated) some context…

Since the Sarbanes–Oxley Act passed in 2002 as a response to the Enron/Tyco/WorldCom scandals, auditing a user’s (esp. superuser’s) access and interactions with data has become a requirement by law for publicly owned companies:

Section 302.4.B – Establish verifiable controls to track data access. Requires internal controls over data, so that officers are aware of all relevant data for reporting purposes. Data must exist in a verifiably secure framework which is internally controlled.

Companies storing financial data are particularly vulnerable to hacking attempts as has been reiterated in the latest Equifax breach.

In addition, organizations that deal with human subject data (e.g. health and research) are bound to HIPAA and/or IRB rules that if violated, can result in jail time.

And.. as fear of hacking, invasion of privacy, and possibly worse in an uncertain technological future creeps into our collective consciousness… securing our data has become a top priority for all IT professionals.

No company (or DBA) wants to end up as a headline in the daily news or the inspiration for the plot of another modern day dystopian television show.

set_user extension

Panicking yet? Well, you can take some proactive measures to better secure your data and avoid disaster. One component you can add to your security toolbox is the PostgreSQL set_user extension which provides:

Privilege escalation control at a granular level; and the required audit trail of actions taken by a user while its privileges have been escalated.

Anyone who has worked on modern Linux distros is familiar with the use of sudo to control access to root (the OS superuser). set_user is the same, but for your database.

And… it is now available with the BigSQL dist

[...]
Posted by Paul Ramsey in PostGIS on 2017-09-14 at 16:00

A couple of days back, a new feature landed in the PostgreSQL world for the development of version 11, in the shape of the following commit:

commit: 6d9fa52645e71711410a66b5349df3be0dd49608
author: Peter Eisentraut <peter_e@gmx.net>
date: Mon, 11 Sep 2017 16:48:30 -0400
pg_receivewal: Add --endpos option

This is primarily useful for making tests of this utility more
deterministic, to avoid the complexity of starting pg_receivewal as a
deamon in TAP tests.

While this is less useful than the equivalent pg_recvlogical option,
users can as well use it for example to enforce WAL streaming up to a
end-of-backup position, to save only a minimal amount of WAL.

Use this new option to stream WAL data in a deterministic way within a
new set of TAP tests.

Author: Michael Paquier

pg_receivewal is mainly used in production deployments as a standalone daemon continuously streaming WAL from a given Postgres instance, be it a primary or a standby, to archive WAL which is afterwards used for recovery of backups or PITR purposes. This is also useful to allow standbys to catch up with a primary when they have been disconnected for a long time, even if the wanted WAL segments have been recycled on the primary after too many checkpoints. Replication slots can help with that, but they show their limits as well if the local pg_wal data gets too bloated, leading to a replication slot being forcibly dropped (note that there is a patch proposed for integration with Postgres 11 allowing the oldest WAL position of a physical slot to be advanced).

Though this option was firstly designed for testing purposes (thanks to it, the code coverage of, for example, pg_receivewal.c has been increased from 15% to 55% with a set of dedicated TAP tests, greatly simplified by the fact that pg_receivewal can be controlled easily without signals), it is also useful for backup purposes. When taking a base backup using the set of low-level APIs pg_start_backup() and pg_stop_backup(), the WAL position (LSN) returned by pg_stop_bac

[...]
This is one of the quickest blog posts I am publishing :). I am publishing my terminal output as-is from my testing to create the language plpython3u.

Using trusted or untrusted distributions of Python, we can create the plpython3u language in PostgreSQL. In my testing, I am trying the SCL distribution of Python 3.3 (not recommending it; I tried it only for testing) to create the plpython3u language.

Let's begin by creating the language on a binary PostgreSQL 9.6 installation without any tweaking.
-bash-4.2$ psql
psql.bin (9.6.4)
Type "help" for help.

postgres=# CREATE LANGUAGE plpython3u;
ERROR: could not load library "/opt/PostgreSQL/9.6/lib/postgresql/plpython3.so": libpython3.3m.so.1.0: cannot open shared object file: No such file or directory
Hmmm, "/opt/PostgreSQL/9.6/lib/postgresql/plpython3.so" is looking for a library "libpython3.3m.so.1.0". To confirm, run the "ldd" command:
-bash-4.2$ cd /opt/PostgreSQL/9.6/lib/postgresql/
-bash-4.2$ ldd plpython3.so
linux-vdso.so.1 => (0x00007fff9db12000)
libpython3.3m.so.1.0 => not found
libc.so.6 => /lib64/libc.so.6 (0x00007fe75e42f000)
/lib64/ld-linux-x86-64.so.2 (0x00007fe75ea27000)
Now, it's clear we need libpython3.3m.so.1.0 to create the plpython3u language.

Let's get started as the root user and install python3.3 from the SCL repo by enabling it.
#yum install centos-release-scl
#yum install python33

After installing, search for the library "libpython3.3m.so.1.0" required by plpython3.so:
[root@tools ~]# find / -name libpython3.3m.so.1.0
/opt/rh/python33/root/usr/lib64/libpython3.3m.so.1.0

Cool. To make use of the Python 3 bundle, switch to the postgres user and set the environment variables PYTHONPATH, PYTHONHOME, PATH and LD_LIBRARY_PATH.
-bash-4.2$ export PYTHONPATH=/opt/rh/python33/root/usr
-bash-4.2$ export PYTHONHOME=/opt/rh/python33/root/usr
-bash-4.2$ export LD_LIBRARY_PATH=/opt/rh/python33/root/usr/lib64:$LD_LIBRARY_PATH
-bash-4.2$ export PATH=$PYTHONPATH:$PATH
Try running "ldd" on "/opt/PostgreSQL/9.6/lib/postgresql/plpython3.so" again to check that the libraries are properly picked up.

-bash-4.2$ cd /opt/PostgreSQL/9.
[...]

Upcoming community awesomeness

Now that summer is over and we have officially decided never to schedule anything in August again, we need to share a bunch of upcoming community goodness!

  • Postgres the center of your data universe at the Seattle Web Developers meetup on September 14th. This is an updated presentation that I gave at Datalayer last May. I am adding some goodies specifically for Web developers.
  • PGConf US and NYCPUG are hosting: PGConf US Mini: NYC tomorrow! Bruce Momjian will be speaking on Postgres v10. This is the second mini that PGConf US has organized. The first was last May in Austin. It looks like a great lineup and turnout.
  • I will be speaking at PGConf US Local: Ohio in conjunction with Ohio Linux Fest. I will also be training on Postgres Performance and Maintenance. It is great to integrate with other communities, and finally bringing a formal Postgres event to Ohio is exciting. I hope we can continue to grow it.
  • Debbie Cerda will be hosting the Silicon Valley Postgres Meetup on September 19th with speakers Erik Brandsberg of Heimdall Data and Roland Lee on: How to Auto-cache Postgres with no code changes. Hiring Debbie was one of the best decisions Command Prompt ever made. She is allowing us to focus much more on community and business development than ever before. Her ability to connect with people and her willingness to travel has made it a lot easier to meet one of Command Prompt's long term goals: Building out more Postgres community. The Silicon Valley meetup is just one instance. We also launched Denver which has a meeting in November (and possibly October).
  • And last but certainly not least, PostgreSQL version 10 is set to be released very, very soon!

It is easy to forget that although code contribution to an Open Source project is vital, so is building the people within and external to the community. The majority of people that go to meetups do not follow -hackers or -general. They are professionals trying to get a job done. It takes a different approach to continue to grow that sid

[...]
Posted by Bruce Momjian in EnterpriseDB on 2017-09-13 at 14:15

The Postgres optimizer has improved dramatically over the years. However, there are still a few areas where it can be improved.

First, it would be nice if users could be informed when an increase of default_statistics_target is wise. Second, while multivariate statistics will be added in Postgres 10, there are still other multivariate optimizations to add.

A more sophisticated improvement, which bypasses the problem of inaccurate statistics, would be to have the executor send feedback on selectivity found during query execution to the optimizer to improve the future query plans.

Posted by Regina Obe in PostGIS on 2017-09-13 at 00:00

The PostGIS development team is pleased to announce the release of PostGIS 2.4.0rc1. Best served with PostgreSQL 10beta4, or rc1 which is coming soon, and pgRouting 2.5.0rc (the 2.5.0 release is imminent). See the full list of changes in the news file.

We encourage everyone to test, and in particular package maintainers, to ensure no surprises at final release time.

IMPORTANT NOTES

If you are upgrading from an existing PostGIS install, make sure after installing the PostGIS 2.4.0rc1 binaries to do:

ALTER EXTENSION postgis UPDATE;

-- if you have the additional postgishy extensions below, upgrade them too

ALTER EXTENSION postgis_sfcgal UPDATE;
ALTER EXTENSION postgis_topology UPDATE;
ALTER EXTENSION postgis_tiger_geocoder UPDATE;
--pgRouting 2.5.0 is imminent
ALTER EXTENSION pgrouting UPDATE;

In order to have Map Box Vector Tiles support enabled, you’ll need to compile with protobuf support and pkg-config to verify the correct minimum version of protobuf-c; see protobuf for details. The ST_FrechetDistance function will not be enabled if PostGIS is compiled against a GEOS version lower than 3.7.0. GEOS 3.7.0 will probably not be released before PostGIS 2.4.0 is out.

Continue Reading by clicking title hyperlink ..

Postgres 10 is adding partitioning syntax to Postgres. This ends the previous Frankenstein-style partitioning setup of having to configure child tables, CHECK constraints, and triggers or rules.
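For reference, the new declarative syntax looks roughly like this (a sketch with hypothetical table and column names):

CREATE TABLE measurements (
    city_id  int  NOT NULL,
    logdate  date NOT NULL,
    peaktemp int
) PARTITION BY RANGE (logdate);

CREATE TABLE measurements_y2017 PARTITION OF measurements
    FOR VALUES FROM ('2017-01-01') TO ('2018-01-01');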

You would think that the partitioning feature is now complete. Oh, no! It seems there are now more partitioning enhancement requests than we had before adding the syntax. I knew there would be new optimizations once there was a canonical representation of partitions in the backend, but I didn't expect this feature-request growth.

Ottawa's PGCon conference had a talk about partitioning, but the major partition feature list was created during its unconference. The good news is that Amit Langote recorded all the requests. I expect it will be a few years until we can say we are done improving partitioning in Postgres.

Continue Reading »

Mastering PostgreSQL in Application Development is the full title of the book I am currently writing. Running the PostgreSQL is YeSQL series of blog posts has shown me that developers need a PostgreSQL book for developers. A book with the same properties as the YeSQL series articles in this blog: we use real world data sets to put every query and SQL technique we learn in the context of a user story or business case.

While it’s generally well known how to create physical replicas (i.e. block level copies) using the super simple and really effortless pg_basebackup tool, not so many know actually what can be done when the process is painfully slow for bigger databases. The reason for slowness can be typically linked to machine hardware (slow disks, high […]

The post Speeding up creation of Postgres replicas appeared first on Cybertec - The PostgreSQL Database Company.

Best Practices with Managed PostgreSQL in the Cloud @ Postgres Open SV 2017 (#pgopen2017)



As your database grows and scales there are some operations that you need to take more care of than you did when you were just starting. When working with your application in your dev environment you may not be fully aware of the cost of some operations until you run them against production. And at some point most of us have been guilty of it, running some migration that starts at 5 minutes, then 15 minutes in it’s still running, and suddenly production traffic is impacted.

There are two operations that tend to happen quite frequently, each with some straightforward approaches to mitigate any noticeable amount of downtime. Let’s look at each of the operations, how they work, and then how you can approach them in a safer way.

Adding new columns

Adding a new column is actually quite cheap in Postgres. When you do this it updates its underlying tracking of the columns that exist, which is almost instant. The part that becomes expensive is when you have some constraint against the column. A constraint could be a primary or foreign key, or some uniqueness constraint. Here Postgres has to scan through all the records in the table to ensure that it’s not being violated. Adding a constraint such as not null does happen sometimes, but it is not the most common cause.

The most common reason for slowness of adding a new column is that most frameworks make it very simple for you to set a default value for the new column. It’s one thing to do this for all new records, but when you do this on an existing table it means the database has to read all the records and re-write them with the new default value attached. This isn’t so bad for a table with a few hundred records, but for a few hundred million, run it and then go get yourself coffee, or lunch, or a 5 course meal because you’ll be waiting for a while.
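A sketch of the difference on a hypothetical events table (this is the behavior of Postgres 10 and earlier, which is what this post describes; later releases optimized the default case):

-- cheap: only the catalog is updated, existing rows are left untouched
ALTER TABLE events ADD COLUMN source text;

-- expensive on a big table: every existing row is rewritten with the default
ALTER TABLE events ADD COLUMN source text DEFAULT 'web';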

In short, not null and setting a default value (on creation) of your new column will cause you pain. The solution is to not do those things. But, what if you want to have a default value and don’t want to allow nulls? There’s

[...]

A graphical overview of Postgres ecosystem

Have you ever wondered how Postgres resources are distributed?

With the below illustration I attempted to capture key processes that encompass Postgres workflow. Here is a quick overview.

There are two main flows that happen in Postgres:
  1. Servicing client’s connections
  2. Execution of background processes
Let’s take a look at the diagram and start from the first flow (right left corner). Once a client is connected to Postgres, it gets a dedicated backend that will service all queries of that client. Each client can send read and update queries to Postgres, which in turn generates planning and execution.

During query execution, the process of data input/output is performed at the level of tables and indexes. This process is directly involved in updating the Shared Buffers area. If data required for the query execution is not found in Shared Buffers, it will be loaded from disk, which in turn generates additional input/output.

Each and every change of the data placed in the Shared Buffers area (with the exception of some special cases, such as unlogged tables) will be logged in the WAL (Write Ahead Log), while the result of the query is reported back to the client.

In addition to servicing clients' queries, another function of Postgres is the execution of background processes, of which there are several, for example Autovacuum Launcher/Worker, Background Writer, Checkpointer, Logger, Stats Collector and so on. The main purpose of these background processes is to support the healthy working of PostgreSQL as a system. These background services are not necessarily connected to each other; however, there are two important points to remember:
a. to be able to exchange data they use Shared Buffer area
b. if, as a result of their performance, they modify any data within Shared Buffers, these changes are then logged in the WAL.

Among background processes there are those connected to streaming replication – these wholly rely on the data in
[...]

“Thirty years ago, my older brother was trying to get a report on birds written that he’d had three months to write. It was due the next day.

We were out at our family cabin in Bolinas, and he was at the kitchen table close to tears, surrounded by binder paper and pencils and unopened books on birds, immobilized by the hugeness of the task ahead. Then my father sat down beside him, put his arm around my brother’s shoulder, and said, ‘Bird by bird, buddy. Just take it bird by bird.’”

Bird by Bird: Some Instructions on Writing and Life, by Anne LaMott

When we started working on Citus, our vision was to combine the power of relational databases with the elastic scale of NoSQL. To do this, we took a different approach. Instead of building a new database from scratch, we leveraged PostgreSQL’s new extension APIs. This way, Citus would make Postgres a distributed database and integrate with the rich ecosystem of tools you already use.

When PostgreSQL is involved, executing on this vision isn’t a simple task. The PostgreSQL manual offers 3,558 pages of features built over two decades. The tools built around Postgres use and combine these features in unimaginable ways.

After our Citus open source announcement, we talked to many of you about scaling out your relational database. In every conversation, we’d hear about different Postgres features that needed to scale out of the box. We’d take notes from our meeting and add these features into an internal document. The list would keep getting longer, and longer, and longer.

Like the child writing a report on birds, the task ahead felt insurmountable. So how do you take a solid relational database and make sure that all those complex features scale? You take it bird by bird. We broke down the problem of scaling into five hundred smaller ones and started implementing these features one by one.

Citus 7: Transactions, Framework Integration, and Postgres 10

Citus 7 marks a major milestone in our journey to scale out Postgres. Over the past year, many SaaS (B2B) businesses

[...]

I’ll be presenting “Humans do not have a primary key” at PGOpen SV today. Two links from the presentation are:

I wrote an article about migrating a large set of PL/SQL procedures to Postgres. The article is in Czech, but Google Translate works.