
Long story short: using PostgreSQL 11 and a RUM index, you can run both TOP-N queries and COUNT(*) queries for non-selective FTS queries without fetching all the matching results from the heap (and that is certainly much faster). If you’re interested in the details, please read the detailed description below.

On November 1st, 2017, Tom Lane committed a patch enabling bitmap scans to behave like index-only scans when possible. In particular, since PostgreSQL 11, COUNT(*) queries can be evaluated using bitmap scans without accessing the heap when the corresponding bit in the visibility map is set.

commit 7c70996ebf0949b142a99c9445061c3c83ce62b3
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Wed Nov 1 17:38:12 2017 -0400

Allow bitmap scans to operate as index-only scans when possible.

If we don't have to return any columns from heap tuples, and there's
no need to recheck qual conditions, and the heap page is all-visible,
then we can skip fetching the heap page altogether.

Skip prefetching pages too, when possible, on the assumption that the
recheck flag will remain the same from one page to the next.  While that
assumption is hardly bulletproof, it seems like a good bet most of the
time, and better than prefetching pages we don't need.

This commit installs the executor infrastructure, but doesn't change
any planner cost estimates, thus possibly causing bitmap scans to
not be chosen in cases where this change renders them the best choice.
I (tgl) am not entirely convinced that we need to account for this
behavior in the planner, because I think typically the bitmap scan would
get chosen anyway if it's the best bet.  In any case the submitted patch
took way too many shortcuts, resulting in too many clearly-bad 
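With this in place, a COUNT(*) over a non-selective full-text query can be served from a bitmap index-only scan whenever the visibility map allows it. As a sketch (the table, column, and index names here are hypothetical):

```sql
-- hypothetical schema: documents with a tsvector column indexed by RUM
CREATE INDEX docs_fts_idx ON docs USING rum (fts rum_tsvector_ops);

-- VACUUM first, so the visibility map marks heap pages all-visible
VACUUM docs;

-- since PostgreSQL 11 this count can skip heap fetches for all-visible pages
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM docs WHERE fts @@ to_tsquery('english', 'common & word');
```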

In databases, concurrency control is the concept that ensures that transactions can be performed concurrently without violating data integrity.

There is a lot of theory and different approaches around this concept and how to accomplish it, but we will briefly refer to the way that PostgreSQL and MySQL (when using InnoDB) handle it, and a common problem that can arise in highly concurrent systems: deadlocks.

These engines implement concurrency control by using a method called MVCC (Multiversion Concurrency Control). In this method, when an item is being updated, the changes will not overwrite the original data, but instead a new version of the item (with the changes) will be created. Thus we will have several versions of the item stored.

One of the main advantages of this model is that locks acquired for querying (reading) data do not conflict with locks acquired for writing data, and so reading never blocks writing and writing never blocks reading.

But, if several versions of the same item are stored, which version of it will a transaction see? To answer that question we need to review the concept of transaction isolation. Transactions specify an isolation level, which defines the degree to which one transaction must be isolated from resource or data modifications made by other transactions. This degree is directly related to the locking generated by a transaction, and so, as it can be specified at the transaction level, it can determine the impact that a running transaction can have on other running transactions.

This is a very interesting and long topic, although we will not go into too much detail in this blog. We’d recommend the PostgreSQL and MySQL official documentation for further reading on this topic.

So, why are we going into the above topics when dealing with deadlocks? Because SQL commands will automatically acquire locks to ensure the MVCC behaviour, and the lock type acquired depends on the transaction isolation level defined.
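To make the deadlock scenario concrete, here is a sketch of the classic case (the accounts table and values are invented for illustration): two transactions update the same two rows in opposite order, and each ends up waiting on a row lock the other holds:

```sql
-- Session 1:
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;

-- Session 2:
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 2;

-- Session 1 (blocks, waiting for session 2's row lock):
UPDATE accounts SET balance = balance + 100 WHERE id = 2;

-- Session 2 (closes the cycle; the deadlock detector aborts one transaction):
UPDATE accounts SET balance = balance + 100 WHERE id = 1;
-- ERROR:  deadlock detected
```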

There are several types of locks (again another long and inter

Posted by Payal Singh in OmniTI on 2018-05-20 at 23:39
For new blog posts (as well as these old ones) visit my website

Parsing logs is easy. You can create a Perl script to do all the hard work for you. But what if the files you want to parse vary greatly in content and format? Then you in turn need to do hard work on your script before it can work for you. Sounds fair. But again, what if you don't even have a fixed set of files, and you can never be sure which type of file you'll need to parse? In this case it's probably overkill to maintain a script able to parse all these files. It might just be faster to create a piped command on the go, and just change the order of variables as and when required.

While relying on single line commands to parse your files is not the best idea when you need the same information from the same kind of files over time, this can be much less of a headache if you just want a quick report on a particular attribute value in a log file. As a CLI noob, I was blown away by the flexibility and range of these tools, especially when used with pipes. Think of these individual tools as Planeteers having their own powers, but when a task requires greater force, they combine their powers to summon Captain Planet, or in our case... a log parser!

Grep piped with awk is probably the easiest and most commonly used combination for selective extraction. For instance, the following file is produced from an extremely useful tool called system_monitoring. This file has the form:

Let's say we only want the wkB/s information for disk 'sdb' along with the time, so we can feed this to any graph generator and get a plot of the write blocks every 5 seconds for our server. Creating such an input file can be accomplished with the following two commands:

Creating headers:
echo "time"$'\t'"wKBps" > writes.tsv - This just creates a new file 'writes.tsv' and echoes the headers separated by tabs.

Parsing content:
cat iostat.log | grep sdb | awk '{print $2"\t"$11}' >> writes.tsv - The first part of this piped command just outputs the logfile. Th
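Putting both steps together on a tiny fabricated sample (the column layout here, with $2 as the time and $11 as wkB/s, is an assumption about the system_monitoring output format):

```shell
# fabricate a minimal iostat-style log; field 2 is the time, field 11 the wkB/s
cat > iostat.log <<'EOF'
2018-05-20 10:00:00 sda 0.0 1.2 3.4 5.6 7.8 9.0 1.1 150.0
2018-05-20 10:00:00 sdb 0.0 1.2 3.4 5.6 7.8 9.0 1.1 300.5
2018-05-20 10:00:05 sdb 0.0 1.2 3.4 5.6 7.8 9.0 1.1 280.0
EOF

# header first, then only the sdb lines, keeping time and wkB/s
echo "time"$'\t'"wKBps" > writes.tsv
grep sdb iostat.log | awk '{print $2"\t"$11}' >> writes.tsv

cat writes.tsv
```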
pg_hexedit recently gained the ability to annotate the space taken up by each individual column/attribute within each individual tuple. This works with tables, and with B-Tree indexes.

I had to come up with a way of passing the pg_hexedit frontend utility the relevant pg_attribute metadata to make this work. This metadata describes the "shape" of individual tuples in a relation (backend code uses a closely related structure called a "tuple descriptor"). My approach works seamlessly in simple cases, but can still be used when manually running the pg_hexedit command line tool.

pg_attribute system catalog table with column annotations/tags

This new capability could be applied to optimizing the data layout of a table that is expected to eventually have a massive number of rows. Carefully choosing the order and type of each column can reduce the total on-disk footprint of a table by an appreciable amount, especially when the final table ends up with several 1 byte columns that get packed together.

I am aware of several PostgreSQL users that found it worthwhile to have a highly optimized tuple layout, going so far as to use their own custom datatypes. Alignment-aware micro-optimization of a Postgres client application's schema won't help much in most cases, but it can help noticeably with things like fact tables, or tables that contain machine-generated event data. Developing a sense of proportion around storage overhead should now be more intuitive.
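The alignment effect itself is easy to demonstrate without pg_hexedit (table names invented): the same four columns in a bad and a good order produce different row widths, because padding is inserted before each 8-byte-aligned value:

```sql
-- poor order: each int4/bool is padded out to align the following int8
CREATE TABLE padded (a int4, b int8, c bool, d int8);
-- better order: 8-byte columns first, small columns packed at the end
CREATE TABLE packed (b int8, d int8, a int4, c bool);

INSERT INTO padded VALUES (1, 2, true, 3);
INSERT INTO packed VALUES (2, 3, 1, true);

-- comparing the row sizes shows the saved padding bytes
SELECT pg_column_size(p.*) AS padded_row FROM padded p;
SELECT pg_column_size(p.*) AS packed_row FROM packed p;
```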

For those of you out there working with PostgreSQL in a professional way, migrating from Oracle to PostgreSQL might be one of the most beloved tasks available. One of the first things most people will notice, however, is that the data types available in Oracle might not be quite the same in PostgreSQL. This blog will try to shed some light and show how things work.

Data types in Oracle and PostgreSQL

While there are many similarities between Oracle and PostgreSQL, there are a couple of differences which are quite noteworthy. The first thing many people might notice is: PostgreSQL has many more data types than Oracle. As of version 10.0 an empty PostgreSQL database will expose the staggering number of 92 data types. Of course not all of them are useful, and many of them are purely for internal purposes.

Still: At the end of the day there are just more data types, which can be used by applications, which of course

Let us take a look and see which types can be matched. The following table contains a lot of potential options:


Oracle type                      Possible PostgreSQL types
CHAR                             char, varchar, text
NCHAR                            char, varchar, text
VARCHAR                          char, varchar, text
VARCHAR2                         char, varchar, text, json
NVARCHAR2                        char, varchar, text
CLOB                             char, varchar, text, json
LONG                             char, varchar, text
RAW                              uuid, bytea
BLOB                             bytea
BFILE                            bytea (read-only)
LONG RAW                         bytea
NUMBER                           numeric, float4, float8, char, varchar, text
NUMBER(n,m) with m<=0            numeric, float4, float8, int2, int4, int8, boolean, char, varchar, text
FLOAT                            numeric, float4, float8, char, varchar, text
BINARY_FLOAT                     numeric, float4, float8, char, varchar, text
BINARY_DOUBLE                    numeric, float4, float8, char, varchar, text
DATE                             date, timestamp, timestamptz, char, varchar, text
TIMESTAMP                        date, timestamp, timestamptz, char, varchar, text
TIMESTAMP WITH TIME ZONE         date, timestamp, timestamptz, char, varchar, text
TIMESTAMP WITH LOCAL TIME ZONE   date, timestamp, timestamptz, char, varchar, text
INTERVAL YEAR TO MONTH           interval, char, varc
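As a quick sketch of applying such a mapping (the table and column names are invented), an Oracle table using VARCHAR2, NUMBER, and DATE might translate like this:

```sql
-- Oracle:  CREATE TABLE invoice (id NUMBER(10,0), descr VARCHAR2(200), created DATE);
-- One common PostgreSQL translation:
CREATE TABLE invoice (
    id      int8,           -- NUMBER(10,0) fits comfortably in a 64-bit integer
    descr   varchar(200),   -- VARCHAR2(200)
    created timestamptz     -- Oracle DATE carries a time component, unlike PostgreSQL date
);
```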
PostgreSQL allows setting a time limit for each SQL statement. For example,

SET statement_timeout to '4s';

will set the upper limit of execution time for each subsequent query to 4 seconds. So,

SELECT pg_sleep(5);

will be canceled after 4 seconds:

ERROR:  canceling statement due to statement timeout

Simple and pretty easy to understand.

But when applied to extended queries, things are not so simple any more. Each extended query is divided into multiple phases:

  1. Parse: parse the SQL statement
  2. Bind: bind parameters to the parsed SQL statement
  3. Execute: run the query
  4. Sync: ask PostgreSQL to return the query results
So when does the statement timeout fire? The answer is 4, not 3. Until the Sync message is received, the statement timeout is not checked for extended queries.

OK. Consider the next example. We assume that statement_timeout has been set to 4 seconds.
  1. Parse: SELECT pg_sleep(2) (Query A)
  2. Bind: bind parameters to the parsed SQL statement A
  3.  Execute: run the query A
  4. Parse: SELECT pg_sleep(3) (Query B)
  5. Bind B
  6.  Execute B
  7.  Sync
This time, the statement timeout will fire even though queries A and B each finish within 4 seconds. This is not very intuitive behavior, I think. Moreover, if duration logging is enabled, the logged time for queries A and B will of course be 2 seconds and 3 seconds. So users will be confused: despite the fact that each query definitely finishes within 4 seconds, the statement timer fires.

From PostgreSQL 11, this behavior will be changed. The statement timeout will be checked when the Execute message is processed (steps 3 and 6 above). So the statement timer will not fire in the example.

In summary, statement timeout in PostgreSQL 11 will show more intuitive behavior than previous releases.
Posted by Bruce Momjian in EnterpriseDB on 2018-05-18 at 00:15

I have completed the draft version of the Postgres 11 release notes. Consisting of 167 items, this release makes big advances in partitioning, parallelism, and server-side transaction control via procedures. One of the more unexpected yet useful features is "Allow 'quit' and 'exit' to exit psql when used in an empty buffer".

The release notes will be continually updated until the final release, which is expected in September or October of this year.

Posted by Robert Haas in EnterpriseDB on 2018-05-17 at 17:03
Built-in sharding is something that many people have wanted to see in PostgreSQL for a long time. It would be a gross exaggeration to say that PostgreSQL 11 (due to be released this fall) is capable of real sharding, but it seems pretty clear that the momentum is building. The capabilities already added are independently useful, but I believe that some time in the next few years we're going to reach a tipping point. Indeed, I think in a certain sense we already have. Just a few years ago, there was serious debate about whether PostgreSQL would ever have built-in sharding. Today, the question is about exactly which features are still needed.
Read more »
Posted by Pavel Stehule on 2018-05-17 at 09:12
I did one indicative benchmark of popular interpreted languages, and I was surprised how fast modern PHP is now. This test is pretty simple and stupid: syntactical, unrealistic, and I know it. It says nothing about the speed of any interpreter in practical usage. But it can be interesting to see how surprisingly close some interpreters are to C, and which engines can be used for intensive numeric calculations.
#include <stdio.h>

int main(void)
{
    long int s = 0;
    int i;

    for (i = 1; i < 10000000; i++)
        s += i;

    printf("%ld\n", s);
    return 0;
}
optimized C 3ms
LuaJIT 20ms
unoptimized C 30ms
Lua 100ms
PHP 200ms
JavaScript V8 engine 500ms
Perl 600ms
JavaScript Mozilla 900ms
Python2 1200ms
Python3 1700ms
PostgreSQL SQL 1700ms
PLpgSQL 2200ms

I repeat, this test has very little value for real life - only the C language on this list is designed for heavy numeric operations. All the other languages were designed for different purposes, and in their typical domains the bottleneck will be elsewhere than in simple numeric calculation. But it is interesting how fast modern computers are - for example, PLpgSQL is designed as SQL glue (I know, it is absolutely without any optimization - I hope nobody uses PLpgSQL for heavy numeric calculations, and still), and it is able to do 10M operations in 2 seconds.
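For reference, the PLpgSQL flavour of the same loop might look like this (summing i from 1 up to but not including 10000000, matching the C version):

```sql
DO $$
DECLARE
  s bigint := 0;
  i int;
BEGIN
  FOR i IN 1..9999999 LOOP
    s := s + i;
  END LOOP;
  RAISE NOTICE '%', s;  -- 49999995000000
END
$$;
```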
Posted by Laurenz Albe in Cybertec on 2018-05-16 at 07:57
caged elephant
© Laurenz Albe 2018


In a recent wrestling match with the Linux “out-of-memory killer” for a Cybertec customer I got acquainted with Linux control groups (“cgroups”), and I want to give you a short introduction how they can be used with PostgreSQL and discuss their usefulness.

Warning: This was done on my RedHat Fedora 27 system running Linux 4.16.5 with cgroups v1 managed by systemd version 234. Both cgroups and systemd's handling of them seem to be undergoing changes, so your mileage may vary considerably. Still, it should be a useful starting point if you want to explore cgroups.

What are Linux cgroups?

From the cgroups manual page:

Control groups, usually referred to as cgroups, are a Linux kernel feature which allow processes to be organized into hierarchical groups whose usage of various types of resources can then be limited and monitored.

cgroups are managed with special commands that start with “cg”, but can also be managed through a special cgroups file system and systemd.

Now a running PostgreSQL cluster is a group of processes, so that’s a perfect fit.

There are several subsystems defined (also called “controllers” in cgroups terminology). Of these, the following are interesting for PostgreSQL:

  • memory: useful for limiting the total memory usage
  • blkio: useful for limiting the I/O throughput
  • cpu: useful to define upper and lower limits to the CPU time available for the processes
  • cpuset: useful for binding the processes to a subset of the available CPU cores

Configuring cgroups

During system startup, cgroups are created as defined in the /etc/cgconfig.conf configuration file.

Let’s create a cgroup to build a cage for a PostgreSQL cluster:

group db_cage {
    # user and group "postgres" can manage these cgroups
    perm {
        task {
            uid = postgres;
            gid = postgres;
            fperm = 774;
        }
        admin {
            uid = postgres;
            gid = postgres;
            dperm = 775;
            fperm = 774;
        }
    }

    # limit memory to 1 GB and di

Common Table Expressions (CTEs) are a powerful construct within SQL. In day to day conversation, you may hear CTEs referred to as WITH clauses. You can think of CTEs as similar to a view that is materialized only while that query is running and does not exist outside of that query. CTEs can be very useful building blocks for allowing your large SQL queries to be more readable. But, they can also be used recursively allowing you to create some very complex queries without having to drop down to a procedural language like plpgsql or plv8.

Recursive CTEs allow themselves to be called until some condition is met. Let’s jump right in and explore a recursive CTE—a basic one, and using PostgreSQL of course—and then let’s dissect the recursive CTE a bit further to see what we can use it for:

    WITH RECURSIVE tens (n) AS (
        SELECT 10
      UNION ALL
        SELECT n+10 FROM tens WHERE n+10 <= 100
    )
    SELECT n FROM tens;

When the above is run we’ll get the following result:

  n
-----
  10
  20
  30
  40
  50
  60
  70
  80
  90
 100
(10 rows)

With the above we could also easily do this with a generate_series. But stick with us and you’ll see the more complex things we can do that aren’t possible with generate_series. First let’s take a closer look at how it works.

The first part you’ll notice is WITH RECURSIVE. This tells Postgres the CTE can recursively call itself. The next portion you’ll notice is that it takes parameters. In this case it takes one, (n); it can also take more than one should you need them.

Moving further into the CTE, we have the first query that is executed, SELECT 10, which generates the first value. The second portion is where all the fun begins. The UNION ALL specifies that we’re going to return all the records that are produced from the loop. Then SELECT n+10 FROM tens WHERE n+10<= 100 will keep calling the tens CTE that is created until the condition is met.

So those are the basics, but the interesting question is: when would you use a recursive CTE? When you have a tree or hierarchical structure to your data, re
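For instance, a sketch of walking an employee/manager hierarchy (the employees table and its columns are hypothetical):

```sql
WITH RECURSIVE org (id, name, depth) AS (
    -- anchor: the top of the tree, employees with no manager
    SELECT id, name, 1
    FROM employees
    WHERE manager_id IS NULL
  UNION ALL
    -- recursive step: everyone reporting to a row already found
    SELECT e.id, e.name, org.depth + 1
    FROM employees e
    JOIN org ON e.manager_id = org.id
)
SELECT * FROM org ORDER BY depth;
```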

Posted by pgCMH - Columbus, OH on 2018-05-15 at 04:00

The May meeting will be held at 18:00 EST on Tues, the 22nd. Once again, we will be holding the meeting in the community space at CoverMyMeds. Please RSVP on MeetUp so we have an idea on the amount of food needed.



Due to technical issues last month, CoverMyMeds’ very own Andy will be re-presenting on PostgreSQL data restores this month. We’ll continue to use our VMs and pgBackRest to dive into the topic as well as the concepts and commands underneath. Please bring your laptop as this will be hands-on.


CoverMyMeds has graciously agreed to validate your parking if you use their garage so please park there:

You can safely ignore any sign saying to not park in the garage as long as it’s after 17:30 when you arrive.

Park in any space that is not marked ‘24 hour reserved’.

Once parked, take the elevator/stairs to the 3rd floor to reach the Miranova lobby. Once in the lobby, the elevator bank is in the back (West side) of the building. Take a left and walk down the hall until you see the elevator bank on your right. Grab an elevator up to the 11th floor. (If the elevator won’t let you pick the 11th floor, contact Doug or CJ (info below)). Once you exit the elevator, look to your left and right; one side will have visible cubicles, the other won’t. Head to the side without cubicles. You’re now in the community space:

Community space as seen from the stage

The kitchen is to your right (grab yourself a drink) and the meeting will be held to your left. Walk down the room towards the stage.

If you have any issues or questions with parking or the elevators, feel free to text/call Doug at +1.614.316.5079 or CJ at +1.740.407.7043

Posted by Paul Ramsey in PostGIS on 2018-05-14 at 16:00

I presented my “PostGIS for Managers” talk for the last time (at least in this form) today at FOSS4G North America. The reason it’s for the last time is that the central conceit it’s built around, that a core decision is between a proprietary and an open source database, isn’t really operative anymore. The real decisions are now being driven by other considerations, like cloud platforms, and the services available there. So, it’s not really PostgreSQL vs Oracle anymore.

I also presented my “SQL Festival” talk, for the first time! New material is always a little challenging: will it work, is it the right level for the audience? It seemed to be well received, a mix of generic SQL tidbits, and some specific interesting queries you can do with PostGIS.

In the previous part I dared to play with a not-yet-implemented feature, fantasising about how it would work. Well, HA is in the first place a matter of design and only then implementation. That does not excuse bad implementation, nor does it make naive design look smart. Yet after you cover all possible scenarios and find an adequate best rule for most cases, sometimes a very primitive small change can ruin the stronghold. Below I want to sandbox that.

What Happens When pgpool Should Failover, But Can’t?

When the health check fails for the master, the failover_command is fired to degenerate all backends or promote the next slave to primary. Sounds solid. But what if the failover itself fails, e.g. the ssh connection fails (because some careless admin removed the key from ~/.ssh/authorized_keys)? What do we have then?

As soon as health_check_timeout (default 20) runs out (also affected by retry delay, max retries and so on), the node turns dead, so:

t=# select nid,port,st from dblink('host=localhost port=5433','show pool_nodes') as t (nid int,hostname text,port int,st text,lb_weight float,role text,cnt int,cur_node text,del int);
 nid | port |  st
-----+------+------
   0 | 5400 | down
   1 | 5401 | up
   2 | 5402 | up
(3 rows)

So no retries left and the failover failed. The first option obviously is doing failover manually. But if failover failed because of some stupid error, master is back on rails, and the only problem you have is pgpool thinking the master is offline - you would probably want to leave things as they used to be before the accident instead - right? Of course just moving master back online is not enough. Pgpool already “degenerated” the primary. Just adding it as a new node will not help either. The worst thing is that, after the event, pgpool will not try to check whether the old master is pg_is_in_recovery() or not, thus will never accept it as Primary. According to bug track you have to “Discard pgpool_status file and do not restore previous status” with pgpool -D command.

After discarding the status, we reconnect to avoid seeing server closed the connection un

Posted by Don Seiler on 2018-05-11 at 05:00

The Incident

While cleaning up some obsolete user accounts, one of my DROP USER commands failed with:

ERROR:  2BP01: role "joe" cannot be dropped because some objects depend on it

Querying the usual catalog views, I couldn’t find any objects owned by this user. That’s when I noticed the line just after the ERROR:

DETAIL:  privileges for schema public

The \dn+ command confirmed that user joe had explicit USAGE privileges on the public schema:

                           List of schemas
   Name   |  Owner   |  Access privileges   |      Description
 public   | postgres | postgres=UC/postgres+| standard public schema
          |          | =UC/postgres        +|
          |          | joe=U/postgres       |

I revoked this and was then able to drop that user.
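The revoke in question would have looked something like this:

```sql
REVOKE USAGE ON SCHEMA public FROM joe;
```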

> DROP USER joe;

Replicating The Problem

It’s simple enough to replicate the problem, and it isn’t limited to the special public schema (although that does bring up another question for later).


We already have the public schema, let’s create another one for testing and then create our three lab rat users:

-- Create a test schema, we'll use this in addition to public
> CREATE SCHEMA appstuff;

-- Create our three users
> CREATE USER larry;
> CREATE USER curly;
> CREATE USER moe;

Now let’s grant USAGE on the two schemas to two of the users:

> GRANT USAGE ON SCHEMA appstuff TO larry;
> GRANT USAGE ON SCHEMA public TO curly;

-- Show the access privileges on the schemas
> \dn+
                           List of schemas
   Name   |  Owner   |  Access privileges   |      Description
 appstuff | postgres | postgres=UC/postgres+|
          |          | larry=U/postgres     |
 public   | postgres | postgres=UC/postgres+| standard public schema
          |          | =UC/postgres       

The slides from my talk on PostgreSQL Replication at PerconaLive 2018 are available.

Posted by gabrielle roth on 2018-05-10 at 15:59

When: 6-8pm Thursday May 17, 2018
Where: iovation
Who: Dylan Hornstein
What: Learning SQL

During this PDXPUG meetup, we will talk about one person’s journey to learn SQL. From joining iovation’s Reporting Team without any experience with SQL or relational databases, to using SQL every day for adhoc data analysis and bulk data dumps, Dylan Hornstein will talk about his experience getting familiar with SQL and learning to navigate a relational database, as well as some challenges and tips he’s found along the way. The presentation is geared towards those starting out in data roles and we will likely expand into a wider conversation around using SQL and understanding data.

Dylan has been a Data Analyst at iovation for three years now. Having spent six months as a Client Manager before moving to his Data Analyst role, Dylan has experience working directly with iovation’s clients as well as working behind the scenes with the data. In his current role, he is responsible for providing reports, adhoc analysis and bulk data dumps for clients and internal teams. While Dylan had prior work experience as a Data Analyst, his move to iovation’s Reporting Team came with a steep learning curve, as he was new to working with SQL and relational databases.

If you have a job posting or event you would like me to announce at the meeting, please send it along. The deadline for inclusion is 5pm the day before the meeting.

Our meeting will be held at iovation, on the 3rd floor of the US Bancorp Tower at 111 SW 5th (5th & Oak). It’s right on the Green & Yellow Max lines. Underground bike parking is available in the parking garage; outdoors all around the block in the usual spots. No bikes in the office, sorry! For access to the 3rd floor of the plaza, please either take the lobby stairs to the third floor or take the plaza elevator (near Subway and Rabbit’s Cafe) to the third floor. There will be signs directing you to the meeting room. All attendees must check in at the iovation front desk.

See you there!


There are several ways of building PostgreSQL under Windows. The official manual states that using Visual Studio is the simplest one; the wiki describes how to use MinGW and MinGW-w64.

As for me, using the new MSYS2 building platform for Windows is a blessing. With its help not only are the PostgreSQL sources built smoothly, but even extensions are not a problem anymore. And if you have been playing with extensions under Windows, you know what I’m talking about.

At its core MSYS2 is an independent rewrite of MSYS, based on modern Cygwin (POSIX compatibility layer) and MinGW-w64 with the aim of better interoperability with native Windows software. It provides a bash shell, Autotools, revision control systems and the like for building native Windows applications using MinGW-w64 toolchains.


It is pretty straightforward. The official page has a step-by-step guide, but in case of trouble you may check the detailed install guide.

After installation and upgrading, run in a cmd console:

C:\msys64\msys2_shell.cmd -mingw64

or simply just click on the Start menu “MSYS2 MinGW 64-bit” shortcut:
MSYS2 MinGW 64-bit shortcut

Then update packages:

pacman -Syu

Install only needed packages:

pacman --needed -S git mingw-w64-x86_64-gcc base-devel

Let’s build it!

Get the PostgreSQL sources. Here you have two options:
1. Download them from the official site.
2. Use git repository:

git clone git://

Enter the source folder:

cd postgresql

And run configure followed by make and make install:

./configure --host=x86_64-w64-mingw32 --prefix=/c/pgsql/ && make && make install

Here we tell configure that our target machine is 64-bit and that we want our binaries copied to C:\pgsql\.


As you may see, building PostgreSQL with MSYS2 for Windows is simple enough. But the main benefit of such a toolchain is building extensions for PostgreSQL. And there it shines in all its glory. Stay tuned.

The post Building PostgreSQL with MSYS2 and MinGW under Windows appeared first on Cybertec.

The following commit adds a new feature which is part of Postgres 11, and matters a lot for a couple of tools:

commit: e79350fef2917522571add750e3e21af293b50fe
author: Stephen Frost <>
date: Fri, 6 Apr 2018 14:47:10 -0400
Remove explicit superuser checks in favor of ACLs

This removes the explicit superuser checks in the various file-access
functions in the backend, specifically pg_ls_dir(), pg_read_file(),
pg_read_binary_file(), and pg_stat_file().  Instead, EXECUTE is REVOKE'd
from public for these, meaning that only a superuser is able to run them
by default, but access to them can be GRANT'd to other roles.

Reviewed-By: Michael Paquier

This is rather a simple thing: a set of in-core functions used a hardcoded superuser check to make sure that they could not run with unprivileged user rights. For the last couple of releases, an effort has been made to remove those hardcoded checks so that one can GRANT execution access to one or more functions, so that actions which would otherwise need a full superuser (a user who theoretically can do anything on the cluster and administers it) are delegated to extra users with rights limited to those actions.

This commit, while making lookups to the data directory easier, is actually very useful for pg_rewind as it removes the need of having a database superuser in order to perform the rewind operation when the source server is up and running.

In order to get to this state, one can create a dedicated user and then grant execution to a subset of functions, which can be done as follows:

CREATE USER rewind_user LOGIN;
GRANT EXECUTE ON function pg_catalog.pg_ls_dir(text, boolean, boolean) TO rewind_user;
GRANT EXECUTE ON function pg_catalog.pg_stat_file(text, boolean) TO rewind_user;
GRANT EXECUTE ON function pg_catalog.pg_read_binary_file(text) TO rewind_user;
GRANT EXECUTE ON function pg_catalog.pg_read_binary_file(text, bigint, bigint, boolean) TO rewind_user;

Once run, then this



PostgreSQL natively supplies a rich diversity of data types supporting many practical use cases. This article introduces the special implementation of serial data types typically used for creation of synthetic primary keys.

Unique Keys

A foundational precept of database design theory is that each tuple (i.e., row) of a relation (i.e., table) must be uniquely identified from other tuples. The attributes, or columns, that together distinctly identify one tuple from all the others are called a "key". Some purists maintain that any modeled object or concept inherently possesses an attribute or set of attributes that can serve as a key and that it is important to identify this set of key attributes and utilize them for the unique selection of tuples.

But as a practical matter, identifying a sufficiently large set of attributes assuring uniqueness for a modeled object may be impractical, and so for real-world implementations, developers often turn to synthetic keys as a surrogate. That is, rather than relying on some combination of actual attributes, a value internal to the database, typically an incremented integer value otherwise having no physical meaning, is defined as a key. In addition to the simplicity of a single-column key, the fact that there is no real-world dependency means that external factors can never force a need to change the value, as might be the case, for instance, if a person's name were used as a key ... and then the person married or entered a federal government witness protection program and changed their name. Even some values commonly thought by laypersons to be unique and immutable, such as the U.S. social security number, are neither: a person can obtain a new SSN, and SSNs sometimes are re-used.

Declaring a Serial Data Type

PostgreSQL provides a special datatype declaration to satisfy this need for synthetic keys
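As a minimal sketch of where the article is headed (table and column names here are illustrative), a serial column declares an auto-incrementing integer backed by an automatically created sequence:

```sql
-- "serial" is shorthand for an integer column whose default value
-- is drawn from an automatically created sequence
CREATE TABLE items (
    id   serial PRIMARY KEY,   -- synthetic key: 1, 2, 3, ...
    name text NOT NULL
);

INSERT INTO items (name) VALUES ('first'), ('second');
-- id values are assigned automatically by the sequence
```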

Posted by Federico Campoli on 2018-05-09 at 00:00

In the previous post we introduced the PostgreSQL’s dependency system.

At first sight the implementation can look like a maze where the succession of relationships is not clear.

This post will try to give a practical example to show how pg_depend can act like an Ariadne’s thread in order to resolve the dependencies.

The scenario presented is very simple but can be used as a starting point for more complex requirements.
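To give a flavor of what such a walk looks like, a query along these lines (the table name my_table is illustrative) lists the catalog entries that depend on a given table:

```sql
-- Each row of pg_depend records that object (classid, objid) depends
-- on object (refclassid, refobjid), with a dependency type in deptype
SELECT classid::regclass AS dependent_catalog,
       objid,
       deptype            -- 'n' = normal, 'a' = auto, 'i' = internal, ...
FROM   pg_depend
WHERE  refclassid = 'pg_class'::regclass
  AND  refobjid   = 'my_table'::regclass;
```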

Posted by Craig Kerstiens in CitusData on 2018-05-08 at 14:36

Hi. I work as a data architect in San Francisco, and I'm auditing Dr. Jones' class to stay up to date on the latest technologies. She mentioned you might be able to help me before I get too deep into the design of a new system.

I would be happy to help. Can you give me an overview of where you’re at?

Well my default was just to use Postgres. I had a few questions on what schema designs might make most sense.

Well I’m working with more interesting data architectures. Really getting excited about what’s possible with neomodern data architectures, they make it so my app devs can build any feature their hearts desire.

I thought your expertise used to be relational databases?

It was, but neomodern data architectures are better. Neomodern data architectures allow it so app devs can build any feature they like without having to think about data models. Really, it’s the future of databases.

Hmmm, my app is a pretty straightforward web app that allows salespeople to send campaigns and track metrics on the campaigns. My app should be fine in a Postgres database right?

That might work for you, but really, if you have to define a data model up front, then you’re limiting yourself.

How do you mean limiting? Can’t I just add new tables as I need them? Or add columns to existing tables?

Well you could, but meta document stores can take care of that for you

Meta document stores?

Yeah, you have a custom document for each data model, but then have a document store that auto parses each document and can tell you the structure without having to read the document itself

Oh nice, so I can have an index of all my data.

Exactly, and because you have a bunch of smaller document stores you can distribute them around the world and read the data from the nearest one

So reading from the nearest copy of data would be faster?

Well it’s supposed to be, but to make sure the data is correct you have to read from at least 3 nodes to get consensus

Consensus?

Yeah, consensus. Since the underlying storage is all in blockchain, there has to be an agre

The PostgresConf team wanted to provide some information on the performance of PostgresConf US 2018 and events over the past year, as well as potentially answer some pending questions. Ultimately our goals are about people, which is why our motto is, "People, Postgres, Data." With each event we hold, each talk we accept, and how we train our volunteers, we make sure people (the benefit for and to), postgres, and data are considered and included. If there is no benefit or consideration to the growth of people, it is not an option.

With that in mind, please read on to see how our focus on people, Postgres, and data had an impact on the community over the last year.

Since PostgresConf US 2017 we have had events in:
  • Philadelphia 
  • Ohio (in combination with Ohio Linux Fest) 
  • South Africa 
  • Seattle 
  • Austin 
  • Jersey City (PostgresConf US 2018) 
  • Nepal 
All of these events are non-profit and volunteer organized.

PostgresConf US 2018


  • Days: 5, 2 for training, 3 for Breakout sessions and summits
  • Official Attendance #: 601
  • Content: Over 207 sessions submitted
  • Sessions: Over 108 sessions provided 

Partner Support (Sponsors): 28

We had a record level of support from partners this year, and due to this support we are going to be forced to find a new venue for next year. Our Jersey City location no longer has the capacity to hold us. This will increase costs, but initial indications are that our partners understand this and are willing to support us financially to help continue the success of our efforts and keep costs reasonable for attendees.


This year we were able to work with Women Who Code NYC. They provided many volunteers and we provided them with the ability to experience some of the best Postgres based content available, at no charge. We expect great things from this budding relationship in the future.

Professional Growth

We held a Career and Talent Fair. A dozen companies were present to connect with potential employees.

We also held a surprisingly well attended speed mentoring

Posted by Dimitri Fontaine in CitusData on 2018-05-07 at 08:46

Continuing our series of PostgreSQL Data Types today we’re going to introduce the PostgreSQL Point type.

In order to put the Point datatype in a context where it makes sense, we’re going to download a complete geolocation data set and normalize it, thus making good use of both the normalization good practice and those other PostgreSQL data types we’ve been learning about in the previous articles of this series.

Buckle-up, this is a long article with a lot of SQL inside.
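As a quick preview of the type the article builds toward (the coordinate values here are illustrative), points can be constructed and measured directly:

```sql
-- A point is an (x, y) pair; the <-> operator computes Euclidean distance
SELECT point(2, 3);                        -- (2,3)
SELECT point '(2,3)' <-> point '(5,7)';    -- a 3-4-5 triangle: distance 5
```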

To be OR not to be...
© Laurenz Albe 2018


PostgreSQL query tuning is our daily bread at Cybertec, and once you have done some of that, you’ll start bristling whenever you see an OR in a query, because it is often the cause of bad query performance.

Of course there is a reason why there is an OR in SQL, and if you cannot avoid it, you have to use it. But you should be aware of the performance implications.

In this article I’ll explore “good” and “bad” ORs and what you can do to avoid the latter.

A little sample schema

We’ll use this simple setup for demonstration:

CREATE TABLE a(id integer NOT NULL, a_val text NOT NULL);

INSERT INTO a
   SELECT i, md5(i::text)
   FROM generate_series(1, 100000) i;

ALTER TABLE a ADD PRIMARY KEY (id);

CREATE TABLE b(id integer NOT NULL, b_val text NOT NULL);

INSERT INTO b
   SELECT i, md5(i::text)
   FROM generate_series(1, 100000) i;

ALTER TABLE b ADD PRIMARY KEY (id);



Suppose that we want to run queries with equality and LIKE conditions on the text columns, so we need some indexes:

CREATE INDEX a_val_idx ON a(a_val text_pattern_ops);
CREATE INDEX b_val_idx ON b(b_val text_pattern_ops);

Have a look at the documentation if you don’t understand text_pattern_ops.

The “good” OR

An OR is fine in most parts of an SQL query: if it is not used to filter out rows from your query result, it will have no negative effect on query performance.

So if your OR appears in a CASE expression in the SELECT list, don’t worry.
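For instance (reusing the table a from the sample schema above), an OR that only classifies result rows never restricts the scan itself:

```sql
-- The OR here only shapes output values; every row is returned either way,
-- so it has no effect on which rows are scanned
SELECT id,
       CASE WHEN id < 10 OR id > 99990 THEN 'edge'
            ELSE 'middle'
       END AS bucket
FROM a;
```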

Unfortunately you usually find the OR where it hurts: in the WHERE clause.

The “bad” OR

Now for an example of an OR in a WHERE clause that is still pretty nice:

SELECT id FROM a
WHERE id = 42
   OR a_val = 'value 42';

                        QUERY PLAN                         
-----------------------------------------------------------
 Bitmap Heap Scan on a
   Recheck Cond: ((id = 42) OR (a_val = 'value 42'::text))
   ->  BitmapOr
         ->  Bitmap Index Scan on a_pkey
Posted by Shaun M. Thomas in 2ndQuadrant on 2018-05-04 at 17:00

With the addition of logical replication in Postgres 10, it’s natural to ask "what’s next"? Though not directly supported yet, would it be possible to subscribe two Postgres 10 nodes to each other? What kind of future would that be, and what kind of scenarios would be ideal for such an arrangement?

As it turns out, we already have a kind of answer thanks to the latency inherent to the speed of light: locality. If we can provide a local database node for every physical app location, we also reduce latency by multiple orders of magnitude.

Let’s explore the niche BDR was designed to fill.

What is Postgres-BDR?

Postgres-BDR is simply short for Postgres Bi-Directional Replication. Believe it or not, that’s all it needs to be. The implications of the name itself are numerous once fully explored, and we’ll be doing plenty of that.

So what does it do?

  • Logical replication
  • Multi-Master
  • Basic conflict resolution (last update wins)
  • Distributed locking
  • Global sequences
  • High latency replay (imagine a node leaves and comes back later)

How can two systems interact?

The key to having Multi-Master and associated functionality is logical replication. Once we can attach to the logical replication stream, we just need a piece of software to communicate between all participating nodes to consume those streams. This is necessary to prevent nodes from re-transmitting rows received from another system and resolve basic conflicts.
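BDR ships that software itself, but the stream it consumes is the same one stock PostgreSQL 10 exposes. As a hedged sketch (names and the connection string are illustrative), a single one-directional link in plain Postgres 10 looks like:

```sql
-- On the publishing node:
CREATE PUBLICATION app_pub FOR ALL TABLES;

-- On the subscribing node:
CREATE SUBSCRIPTION app_sub
    CONNECTION 'host=node-a dbname=app user=replicator'
    PUBLICATION app_pub;
```

BDR's contribution on top of this is making such links bi-directional while tracking row origin, so changes received from another node are not re-transmitted in a loop.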

Why Geographical Distribution?


  • Local database instances
  • Inter-node communication is high latency
  • Eventual consistency is the only consistency (for now)
  • That’s a Good Thing!(tm)

We don’t want latency between our application and the database, so it’s better to operate locally. This moves the latency into the back-end between the database nodes, and that means the dreaded "eventual consistency" model. It’s a natural consequence, but we can use it to our advantage.

Consider This

We have an application stack that operates in four locations: Sydney, Dubai, Dallas, and Tokyo.

Applications in several countries

We tried to get a data center in Antarctica, but t

Posted by Luca Ferrari on 2018-05-04 at 00:00

plperl does not allow direct sub invocation, so the only way is to execute a query.

plperl: invoking other subroutines

The official plperl documentation shows a way to use a subref to invoke code shared across different plperl functions via the special global hash %_SHARED. While this is a good approach, it only works for code attached to the hash, that is, a kind of closure (e.g., a dispatch table), and it requires initializing the %_SHARED hash each time, since plperl interpreters share nothing across sessions.

The other approach, which always works, is to execute a query that performs the SELECT invoking the function. As an example:

CREATE OR REPLACE FUNCTION plperl_trampoline( fun_name text )
RETURNS TEXT AS $PERL$
   my ( $fun_name ) = @_;
   return undef if ( ! $fun_name );
   elog( DEBUG, "Calling [$fun_name]" );
   my $result_set = spi_exec_query( "SELECT $fun_name() AS result;" );
   return $result_set->{ rows }[ 0 ]->{ result };
$PERL$ LANGUAGE plperl;

so that you can simply do:

> select plperl_trampoline( 'now' );

      plperl_trampoline
------------------------------
 2018-05-04 13:09:17.11772+02

The problem with this solution should be clear: it can work only for a...

This is the second part of the blog “A Guide to Pgpool for PostgreSQL”. The first part covering load balancing, session pooling, in memory cache and installation can be found here.

Many users look towards pgpool specifically for its High Availability features, and it has plenty to offer. There are quite a lot of instructions for pgpool HA on the web (e.g. a longer one and a shorter one), so it would not make sense to repeat them. Neither do we want to provide yet another blind set of configuration values. Instead I suggest playing against the rules and trying to do it the wrong way, so we’ll see some interesting behaviour.

One of the top expected features (at least it’s at the top of the page) is the ability to recognise the usability of a “dead” ex-master and re-use it with pg_rewind. It could save hours of bringing back a new standby with big data (as we skip rsync or pg_basebackup, which effectively copies ALL files over from the new master). Strictly speaking, pg_rewind is meant for planned failover (during an upgrade or a migration to new hardware). But we’ve seen that it greatly helps with an unplanned yet graceful shutdown and automated failover - e.g., ClusterControl makes use of it when performing automatic failover of replication slaves.

Let’s assume we have this case: we need (any) master to be accessible as much as possible. If for some reason (network failure, max connections exceeded or any other “failure” that forbids new sessions to start) we can no longer use a master for RW operations, we have a failover cluster configured, with slaves that can accept connections. We can then promote one of the slaves and fail over to it.
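As a reminder of the mechanics (the data directory path and connection string are illustrative, and the target cluster must be shut down first), rewinding a former master against the promoted node looks roughly like:

```shell
# Dry-run first: reports what pg_rewind would do without touching the
# target data directory; drop --dry-run to actually perform the rewind
pg_rewind --target-pgdata=/pg/10/m \
          --source-server='host=node2 port=5432 user=postgres dbname=postgres' \
          --progress --dry-run
```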

First let’s assume we have three nodes:

  • with /pg/10/m (pgpool spins here as well)
  • with /pg/10/m2
  • with /pg/10/s2

Those are effectively the same nodes as in part one, but the failover node is moved to a different host and $PGDATA. I did this to make sure I did not make a typo or forget some extra quote in the remote ssh command. Also the


Recently I did some PostgreSQL consulting in the Berlin area (Germany) when I stumbled over an interesting request: how can data be shared across function calls in PostgreSQL? I recalled one of the older features of PostgreSQL (15+ years old or so) that solves the issue. Here is how it works.

Stored procedures in PostgreSQL

As many of you might know, PostgreSQL allows you to write stored procedures in many different languages. Two of the more popular ones are Perl and Python, which have been around for quite some time. The cool thing is: both languages offer a way to share variables across function calls. In Perl you can make use of the %_SHARED hash, which is always there.

Here is an example:

CREATE FUNCTION set_var(int) RETURNS int AS $$
   $_SHARED{'some_name'} = $_[0];
   return $_[0];
$$ LANGUAGE plperl;

What the code does is to assign a value to some_name and returns the assigned value. Some other function can then make use of this data, which is stored inside your database connection. Here is an example:

CREATE FUNCTION increment_var() RETURNS int AS $$
   $_SHARED{'some_name'} += 1;
   return $_SHARED{'some_name'};
$$ LANGUAGE plperl;

This function will simply increment the value and return it. As you can see the code is pretty simple and easy to write.

Assigning shared variables

The following listing shows, how the code can be used. The first call will assign a value to the function while the second one will simply increment that value:

test=# SELECT set_var(5);
 set_var
---------
       5
(1 row)

test=# SELECT increment_var(), increment_var();
 increment_var | increment_var
---------------+---------------
             6 |             7
(1 row)

It is especially noteworthy here that the second column will already see the changes made by the first column, which is exactly what we want here.

Shared variables and transactions

When working with shared variables in PL/Perl or PL/Python you have to keep in mind that those changes will not be transactional, unlike everything else in PostgreSQL. Even if you

Posted by pgCMH - Columbus, OH on 2018-05-03 at 04:00

Hey everyone! Just a quick note to let you all know that both CJ and myself are manning the PostgreSQL community booth at this year’s PyCon next week in Cleveland, OH.

The Python and PostgreSQL communities have a long history together, and we’re super excited to be able to represent PostgreSQL at the largest annual gathering of Pythonistas. If you can make the drive, come on up to Cleveland, say hi, and learn some Python!!