Posted by Denish Patel on 2019-08-19 at 18:35

Postgres doesn’t have a built-in function to clone a schema. However, many backend developers would like to perform a number of operations at the schema level directly from command-line tools such as psql or database developer tools (e.g. DBeaver). One of the common operations is to copy or clone an entire schema. While researching existing solutions, I stumbled upon a postgres-general thread where one of the developers provided a clone_schema.sql function. I uploaded the existing function provided in the thread to my git repo here.

However, when I tried to use it, it was throwing an error related to sequences. I updated the function to solve the reported error. Additionally, I added support for copying materialized views as well. I have uploaded the final working function here.

I have tested the public.clone_schema function on a complex schema, and it seems to work both for copying the schema ONLY and for copying the schema WITH DATA.

You can call the function like this to copy the schema with data:

select clone_schema('development', 'development_clone', true);

Alternatively, if you want to copy only the schema, without data:

select clone_schema('development', 'development_clone', false);

Hope this helps someone. If you have a better or easier approach, please don’t hesitate to leave a comment so I can improve the process on my end!

Parallel query has been a part of PostgreSQL since 2016 with the release of version 9.6 and in theory PostGIS should have been benefiting from parallelism ever since.

In practice, the complex nature of PostGIS has meant that very few queries would parallelize under normal operating configurations -- they could only be forced to parallelize using oddball configurations.

With PostgreSQL 12 and PostGIS 3, parallel query plans will be generated and executed far more often, because of changes to both pieces of software:

  • PostgreSQL 12 includes a new API that allows extensions to modify query plans and add index clauses. This has allowed PostGIS to remove a large number of inlined SQL functions that were previously acting as optimization barriers to the planner.
  • PostGIS 3 has taken advantage of the removal of the SQL inlines to re-cost all the spatial functions with much higher costs. The combination of function inlining and high costs used to cause the planner to make poor decisions, but with the updates in PostgreSQL that can now be avoided.
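As a rough illustration (the table and column names here are hypothetical, and the exact plan depends on data size and settings), a spatial join like the following can now come out parallel under default settings:

EXPLAIN (COSTS OFF)
SELECT count(*)
FROM parcels p
JOIN roads r ON ST_Intersects(p.geom, r.geom);
-- A parallel plan contains nodes such as:
--  Finalize Aggregate
--    ->  Gather
--          Workers Planned: 2
--          ->  Partial Aggregate
--                ->  Nested Loop (using the spatial index)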
In a developer’s ideal world there’s only one stable and one development version of the product. Everyone updates immediately to the next stable version and nobody cares about anything older. As far as I can tell this has never actually happened, ever, in the history of anything.

Working on many versions

PostgreSQL is a […]

2ndQuadrant has written before about how to build your own extension for Windows. This time we take a look at what it takes to port one written by someone else. I am building this for PostgreSQL 11 on Windows using Visual Studio 2019 and clang.

Here we are working with the hashtypes extension. Unfortunately, at the time of writing, version 0.1.5 has not been released with support for PostgreSQL 11, so we build it from git.

git clone https://github.com/adjust/hashtypes.git

The first thing we do is take a look at the C source files. There are only four of them (common.c, crc32.c, md5.c, and sha.c), so we can be reasonably sure the porting effort will be quick.

Then we use the directions from 2ndQuadrant to build our own makefile, called windows.mak. We start by setting the compiler and PostgreSQL directory in variables. Here we are using clang to compile so the flags will be familiar to Unix programmers but alien to those used to cl.

## You will need to change these two settings for your own system.
CC="C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\Llvm\8.0.0\bin\clang.exe"

POSTGRES="C:\Program Files\PostgreSQL\11"

Then we set the compilation flags in CFLAGS.

CFLAGS=-m64 -Wall -Wextra -O3 \
        -I$(POSTGRES)\include\server\port\win32_msvc \
        -I$(POSTGRES)\include\server\port\win32 \
        -I$(POSTGRES)\include\server \
        -I$(POSTGRES)\include

The CFLAGS are set to build 64-bit binaries with -m64; then we add the usual warnings, plus some extra warnings for extra credit, with -Wall and -Wextra. Since we’re compiling production code, we add optimization with -O3. The various -I flags are include directories, and we take their names directly from the tutorial by 2ndQuadrant.

The link flags are easy: we just add the PostgreSQL lib directory with -L and link against postgres with -l.

LDFLAGS=-L$(POSTGRES)\lib -lpostgres

Now we can start our make rules. We make a generic rule for the object files.

.c.obj:
        $(CC) $(CF
[...]

Windowing functions and analytics have been around for quite some time, and many people already make use of this awesome stuff in the PostgreSQL world. Timeseries are an especially important area in this context. However, not all features have been widely adopted, and thus many developers implement functionality at the application level in a painful way instead of just using some of the more advanced SQL techniques.

The idea of this blog is to demonstrate some of the advanced stuff so that more people out there can make use of PostgreSQL’s true power.

Preparing data for analytics

For the purpose of this post I have created a simple data set:

test=# CREATE TABLE t_demo AS 
        SELECT   ordinality, day, date_part('week', day) AS week 
        FROM    generate_series('2020-01-02', '2020-01-15', '1 day'::interval) 
                        WITH ORDINALITY AS day;
SELECT 14

In PostgreSQL the generate_series function will return one row for each day spanning January 2nd, 2020 to January 15th, 2020. The WITH ORDINALITY clause tells PostgreSQL to add an “id” column to the result set of the function. The date_part function will extract the number of the week out of our date. The purpose of this column is to have a couple of identical values in our timeseries.

In the next list you see the data set we will use:

test=# SELECT * FROM t_demo;
 ordinality |          day           | week 
------------+------------------------+------
          1 | 2020-01-02 00:00:00+01 |    1
          2 | 2020-01-03 00:00:00+01 |    1
          3 | 2020-01-04 00:00:00+01 |    1
          4 | 2020-01-05 00:00:00+01 |    1
          5 | 2020-01-06 00:00:00+01 |    2
          6 | 2020-01-07 00:00:00+01 |    2
          7 | 2020-01-08 00:00:00+01 |    2
          8 | 2020-01-09 00:00:00+01 |    2
          9 | 2020-01-10 00:00:00+01 |    2
         10 | 2020-01-11 00:00:00+01 |    2
         11 | 2020-01-12 00:00:00+01 |    2
         12 | 2020-01-13 00:00:00+01 |    3
         13 | 2020-01-14 00:00:00+01 |    3
   
[...]
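To give a flavour of the windowing queries this data set is built for, here is a minimal sketch over the t_demo table created above:

SELECT day, week,
       row_number() OVER (PARTITION BY week ORDER BY day) AS day_in_week,
       count(*)     OVER (ORDER BY day)                   AS running_count
FROM   t_demo
ORDER BY day;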
Posted by Robert Treat in credativ on 2019-08-15 at 21:17

This post marks the official launch of the Charm City Postgres Meetup group. Our goal is to help bring together the growing community of Developers, DBAs, DBREs, and other technologist in the greater Baltimore area who are working with Postgres. We're currently working to organize folks and would encourage interested people to sign-up on the meetup site, and have set an initial meetup at Smart Office @ BWI. If you have questions or want to speak, sponsor, or host a meeting in the future, you can contact me either through the meetup page or find me on the Postgres Team Slack.

Trigger Function in C for PostgreSQL

We have been writing blog posts about how to write simple extensions in C, and a slightly more complex one by Ibrar, which were well received by the PostgreSQL user community. Then we observed that many PostgreSQL users create simple triggers for small auditing requirements and then feel the pain triggers inflict on transactions. So we were discussing how simple, lightweight, and fast a trigger function can be when written in C. Generally, trigger functions are written in high-level languages like PL/pgSQL, but those have a higher overhead during execution and can impact transactions – and thereby application performance.

This blog post is an attempt to create a simple trigger function to address one of the common use-cases of triggers, which is to update auditing columns in a table.

In this post, we are going to introduce SPI (Server Programming Interface) functions for novice users. Towards the end of the blog, we share some of the quick benchmark results for understanding the benefits.

Example of Audit timestamp

Let’s take up a case and assume that we have a table to hold transaction details. Auditing requirements say that each tuple should carry a timestamp of when it was inserted and when it was last updated.

CREATE TABLE transdtls(
  transaction_id int,
  cust_id int,
  amount  int,
...
  insert_ts timestamp,
  update_ts timestamp
);

For demonstration purposes, let’s trim away the other columns and create a table with only the 3 essential columns.

CREATE TABLE transdtls(
  transaction_id int,
  insert_ts timestamp,
  update_ts timestamp
);

Developing Trigger Function

The trigger function can also be developed and packaged as an extension, which we discussed in a previous blog post here, so we are not going to repeat those steps. The difference is that the files are named “trgr” instead of “addme” as in the previous blog, and the Makefile is also modified to refer to the “trgr” files. This need not be the same as the function name “trig_test” in the C source detailed

[...]
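For reference, the SQL side of wiring up such a C trigger function is short; a sketch, assuming the function is called trig_test and ships in the trgr extension (the exact script in the post may differ):

CREATE FUNCTION trig_test() RETURNS trigger
    AS 'MODULE_PATHNAME', 'trig_test' LANGUAGE C;   -- in the extension's SQL script

CREATE TRIGGER transdtls_audit
    BEFORE INSERT OR UPDATE ON transdtls
    FOR EACH ROW EXECUTE FUNCTION trig_test();      -- PostgreSQL 11+ syntax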

With the availability of MVT tile format in PostGIS via ST_AsMVT(), more and more people are generating tiles directly from the database. Doing so usually involves a couple common steps:

Timeseries are an increasingly important topic – not just in PostgreSQL. Recently I gave a presentation @AGIT in Salzburg about timeseries, and I demonstrated some super simple examples. The presentation was well received, so I decided to share this stuff in the form of a blog post, so that more people can learn about windowing functions and SQL in general. A link to the video is available at the end of the post so that you can listen to the original material in German.

Loading timeseries data the easy way

To show how data can be loaded, I compiled a simple dataset, which can be found on my website. Here is how it works:

test=# CREATE TABLE t_oil 
(
        region          text, 
        country         text, 
        year            int, 
        production      int, 
        consumption     int
);
CREATE TABLE
test=# COPY t_oil FROM 
        PROGRAM 'curl https://www.cybertec-postgresql.com/secret/oil_ext.txt';
COPY 644

The cool thing is that if you happen to be a superuser you can easily load the data from the web directly. COPY FROM PROGRAM allows you to execute code on the server and pipe it directly to PostgreSQL, which is super simple. But keep in mind: This only works if you are a PostgreSQL superuser (for security reasons).

lag: The backbone of timeseries analysis

If you are dealing with timeseries, calculating the difference to the previous period is really important and is needed in many cases. Fortunately, SQL allows you to do that pretty easily. Here is how it works:

test=# SELECT year, production, 
                lag(production, 1) OVER (ORDER BY year) 
        FROM    t_oil 
        WHERE   country = 'USA' 
        LIMIT 5;
 year | production |  lag  
------+------------+-------
 1965 |       9014 |      
 1966 |       9579 |  9014
 1967 |      10219 |  9579
 1968 |      10600 | 10219
 1969 |      10828 | 10600
(5 rows)

The lag function takes two parameters: the first one defines the column which should be used in this case. The second parameter is optional. I

[...]
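A common next step, shown here as a minimal sketch rather than a quote from the post, is to subtract the lagged value to get the year-over-year difference:

SELECT year, production,
       production - lag(production, 1) OVER (ORDER BY year) AS diff
FROM   t_oil
WHERE  country = 'USA'
LIMIT 5;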

The PostGIS development team is pleased to provide the enhancements/features release 3.0.0alpha4 for the 3.0 feature major branch, and the bug fix releases 2.5.3, 2.4.8, and 2.3.10 for the 2.5, 2.4, and 2.3 stable branches.

3.0.0alpha4: This release works with PostgreSQL 9.5-12beta3 and GEOS >= 3.6.

Best served with PostgreSQL 12beta3. Designed to take advantage of features in PostgreSQL 12 and Proj 6.

2.5.3: This release supports PostgreSQL 9.3-12. You are encouraged to use the unreleased PostGIS 3.0 branch with PostgreSQL 12, which has features specifically designed to take advantage of what is new in PostgreSQL 12.

2.4.8: This release supports PostgreSQL 9.3-10.

2.3.10: This release supports PostgreSQL 9.2-10.

View all closed tickets for 3.0.0, 2.5.3, 2.4.8, 2.3.10.

After installing the binaries or after running pg_upgrade, make sure to do:

ALTER EXTENSION postgis UPDATE;

If you use the other extensions packaged with postgis, make sure to upgrade those as well:

ALTER EXTENSION postgis_sfcgal UPDATE;
ALTER EXTENSION postgis_topology UPDATE;
ALTER EXTENSION postgis_tiger_geocoder UPDATE;

If you use legacy.sql or legacy_minimal.sql, make sure to rerun the version packaged with these releases.

Posted by Alexander Hagerman on 2019-08-08 at 13:47

Testing Postgres advisory locks with asyncio and asyncpg.

Recently, here on the Cloud team at Elastic we started working on building a new service in Python 3.7. This service fetches data from a Postgres database, transforms it, and then submits that data to another service. Like many cloud-based services, ours runs in an orchestrated container environment where N instances can be running at any time. Often that's a good thing, but our service has a few critical sections where only one instance should be able to process data. Since we are retrieving data from Postgres, we decided to go ahead and make use of advisory locks to control these critical sections. In this article I want to explain what advisory locks are, provide an implementation, and test to verify functionality.

Advisory locks

Postgres provides the ability to create locks that only have meaning within the context of your application. These are advisory locks. You use advisory locks to control an application’s ability to process data. Anytime your application is about to enter a critical path, you attempt to acquire the lock. When you acquire the lock, you can safely continue processing.
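Underneath any client wrapper, these are ordinary server-side functions; a minimal sketch in plain SQL (the key name mirrors the example below, and the exact calls the wrapper issues are not shown in this excerpt):

SELECT pg_try_advisory_lock(hashtext('gold_leader'));  -- returns true if the lock was acquired
-- ... critical section ...
SELECT pg_advisory_unlock(hashtext('gold_leader'));    -- release the session-level lock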

async with AdvisoryLock(self.dbconfig, "gold_leader") as connection:

If it fails, then your application may retry, wait, or exit. Since this lock is external to the application, this allows for multiple instances of the application to run while providing safe critical path concurrency.

Building the lock

As part of our work, we wanted to make using advisory locks easy. To do this, we created the PostgresAdvisoryLock context manager. Since this is meant to be used with asyncio and asyncpg, we control the acquisition and release of the lock via __aenter__ and __aexit__.

class AdvisoryLock:
    async def __aenter__(self) -> asyncpg.connection.Connection:
        self.locked_connection = await asyncpg.connect(...)
        await self._set_lock()
        if self.got_lock:
            return self.locked_connection
        else:
            
[...]

Recently, I did some work with Sawada-san on TDE, so I studied the encryption algorithms. So far, I have studied five modes of AES. In this document, I will introduce the differences between these five modes.

General

Block ciphers are schemes for encryption or decryption where a block of plaintext is treated as a single unit and is used to obtain a block of ciphertext of the same size. Today, AES (Advanced Encryption Standard) is one of the most widely used algorithms for block encryption. It was standardized by NIST (National Institute of Standards and Technology) in 2001 in order to replace DES and 3DES, which were used for encryption before that. The size of an AES block is 128 bits, whereas the size of the encryption key can be 128, 192 or 256 bits. Please note: there are three possible key lengths, but the size of the encryption block is always 128 bits. Block cipher algorithms should also enable encryption of plaintext whose size is not a multiple of the defined block size. We can use padding algorithms such as PKCS5 or PKCS7 when the plaintext does not fill a complete block, which can also defend against the PA attack if we use ECB or CBC mode. Or we can use an AES mode that supports a stream of plaintext, like CFB, OFB, or CTR mode.

Now let’s introduce the five modes of AES.

  • ECB mode: Electronic Code Book mode
  • CBC mode: Cipher Block Chaining mode
  • CFB mode: Cipher FeedBack mode
  • OFB mode: Output FeedBack mode
  • CTR mode: Counter mode

The attack models:

  • PA: Padding attack
  • CPA: Chosen Plaintext Attack
  • CCA: Chosen Ci

ECB Mode

The ECB (Electronic Code Book) mode is the simplest of all. Due to obvious weaknesses, it is generally not recommended. A block scheme of this mode is presented in Fig. 1.


As we can see in Fig. 1, the plaintext is divided into blocks of the AES block length, 128 bits, so the ECB mode needs to pad the data until it matches the block length. Then every block

[...]
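The weakness is easy to demonstrate from SQL with the pgcrypto extension (an illustration added here; pgcrypto is not part of the original post): identical plaintext blocks produce identical ciphertext blocks under ECB, but not under CBC.

CREATE EXTENSION IF NOT EXISTS pgcrypto;

WITH p AS (
  SELECT decode(repeat('00112233445566778899aabbccddeeff', 2), 'hex') AS data  -- two identical 16-byte blocks
)
SELECT encode(encrypt(data, '0123456789abcdef'::bytea, 'aes-ecb/pad:none'), 'hex') AS ecb,
       encode(encrypt(data, '0123456789abcdef'::bytea, 'aes-cbc/pad:none'), 'hex') AS cbc
FROM p;
-- In the ecb column the two 32-hex-digit halves repeat; in the cbc column they differ.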

A few months ago, I wrote about how SortSupport works in Postgres to vastly speed up sorting on large data types like numeric, text, and varchar. It works by generating abbreviated keys for values that are representative of them for purposes of sorting, but which fit nicely into the pointer-sized value (called a “datum”) in memory that Postgres uses for sorting. Most values can be sorted just based on their abbreviated key, saving trips to the heap and increasing sorting throughput. Faster sorting leads to speedups in common operations like DISTINCT, ORDER BY, and CREATE INDEX.

A patch of mine was recently committed to add SortSupport for the inet and cidr types, which, by my measurement, a little more than doubles sorting speed on them. inet and cidr are the types used to store network addresses or individual hosts, in either IPv4 or IPv6 (they generally look something like 1.2.3.0/24 or 1.2.3.4).

inet and cidr have some important subtleties in how they’re sorted, which made designing an abbreviated key that is faithful to those subtleties but still efficient a non-trivial problem. Because their size is limited, abbreviated keys are allowed to indicate equality even for values that aren’t equal (Postgres will fall back to an authoritative comparison to confirm equality or break the tie), but they must never falsely indicate inequality.
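To see the ordering the abbreviated keys must reproduce, a minimal sketch (values chosen only for illustration):

SELECT a
FROM   unnest(ARRAY['1.2.3.4/32', '1.2.3.0/24', '1.0.0.0/8', '1.2.3.5/32']::inet[]) AS t(a)
ORDER BY a;
-- The SortSupport patch speeds this comparison up; it does not change the order.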

Network type anatomy, and inet vs. cidr

A property that’s not necessarily obvious to anyone unfamiliar with them is that network types (inet or cidr) can either address a single host (what most people are used to seeing) or an entire subnetwork of arbitrary size. For example:

  • 1.2.3.4/32 specifies a 32-bit netmask on an IPv4 value, which is 32 bits wide, which means that it defines exactly one address: 1.2.3.4. /128 would work similarly for IPv6.

  • 1.2.3.0/24 specifies a 24-bit netmask. It identifies the network at 1.2.3.*. The last byte may be anywhere in the range of 0 to 255.

  • Similarly, 1.0.0.0/8 speci

[...]
Materialized views are a convenient and efficient way to retrieve information from a database. Unlike ordinary views, materialized views save the query result and provide faster access to the data. For example, a user can create a simple materialized view containing the number of rows in a table:

 CREATE MATERIALIZED VIEW mv1 AS SELECT count(*) FROM pgbench_accounts;

Obtaining the number of rows using the materialized view mv1 is much faster than directly counting the rows in pgbench_accounts.

test=# SELECT * FROM mv1;
 count 
--------
 100000
(1 row)

Time: 0.810 ms
test=# SELECT count(*) FROM pgbench_accounts;
 count 
--------
 100000
(1 row)

Time: 13.945 ms


However, if you delete a row from pgbench_accounts, the number of rows in mv1 remains unchanged. To reflect the change to the base table (in this case pgbench_accounts), you need to recreate or refresh the materialized view (which actually rebuilds its contents from scratch), and that may take a long time.

To overcome the problem, SRA OSS is proposing to add a new feature to existing materialized views: "incremental materialized view maintenance". It works like this:

  • add new syntax to allow the materialized view to be updated incrementally when it is created.
  • install triggers automatically to be attached to the base table(s).
  • when UPDATE/DELETE/INSERT is applied to the base tables, the triggers calculate the difference and update the rows in the materialized view.
This updates the materialized view in near real time. The price is the overhead of trigger invocation, so frequently updated tables are not the best fit for incremental materialized view maintenance. Here is a demonstration of how to use an incrementally updated materialized view.

First create incrementally updating materialized view.

CREATE INCREMENTAL MATERIALIZED view mv2 AS SELECT count(*) FROM pgbench_accounts;

The view initially says there are 100,000 rows.

SELECT * FROM mv2;
 count 
--------
 100000
(1 row)


Delete a row from pgbench_accounts. Now t[...]

The raster functionality in PostGIS has been part of the main extension since it was introduced. When PostGIS 3 is released, if you want raster functionality you will need to install both the core postgis extension, and also the postgis_raster extension.
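In other words, on PostGIS 3 the setup would look roughly like this:

CREATE EXTENSION postgis;
CREATE EXTENSION postgis_raster;  -- raster support now lives in its own extension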

When I wrote about Reason #3: No Vendor Lock-in, I leaned on the liberal PostgreSQL License and the fact that there are many vendors that can Support PostgreSQL in production. Another aspect that deserves its own mention is the fact that PostgreSQL can not be bought.  Let me explain the significance of that statement.  MySQL.  […]
Posted by Mark Wong on 2019-08-06 at 04:41

When: 6-8pm Thursday August 22, 2019
Where: PSU Business Accelerator (Parking is open after 5pm.)
Who: Jackson Voelkel

Jackson will discuss the basics of GIS and the current environment in which enterprise spatial analytics are performed. Considering the many pitfalls of this current system, he will discuss how FOSS tools – especially PostgreSQL/PostGIS – are vitally important for modern spatial analytics. In addition to the PostGIS extension, Jackson will talk about network routing using the pgRouting extension and the interface between R and PostgreSQL. This talk will act more as a showcase of GIS and “spatial SQL” within PostgreSQL than as a dive into nitty-gritty database development.

Jackson Voelkel is a Health Data Analyst on Kaiser Permanente’s GIS Team as well as an Adjunct Professor of Geography at Portland State University. He focuses on developing infrastructure for and performing advanced spatial analytics across healthcare, environment, urban planning, utilities, and econometrics. He teaches courses on advanced spatial analytics in R as well as spatial database design with PostgreSQL/PostGIS.

Temporary tables are a useful concept present in most DBMSs, even though they often work differently from one system to another.

This blog describes the technical features of this kind of table in both PostgreSQL (version 11) and Oracle (version 12c) databases, with some specific examples. Although the purpose of these tables could be the same for all DBMSs, their specifics, and the way they are implemented and manipulated, are completely different.

This feature can be used by developers or database administrators to store intermediate results that will be needed for further processing, in order to achieve good performance.

Temporary Tables in PostgreSQL

In PostgreSQL these objects are valid only for the current session: they are created, used, and dropped within the same session. The structure of the table and its data are visible only to the current session, so other sessions don’t have access to the temporary tables created in other sessions.

Below is a simple example of creating a temporary table:

CREATE TEMPORARY TABLE tt_customer
(
     customer_id INTEGER
)
ON COMMIT DELETE ROWS;
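A quick illustration of the ON COMMIT DELETE ROWS clause declared above (a minimal sketch):

BEGIN;
INSERT INTO tt_customer VALUES (1), (2);
SELECT count(*) FROM tt_customer;   -- returns 2 inside the transaction
COMMIT;
SELECT count(*) FROM tt_customer;   -- returns 0: the rows were removed at commit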

Temporary tables are created in a temporary schema (pg_temp_nn), and it’s possible to create indexes on these tables:

CREATE INDEX tt_customer_idx_1 ON tt_customer(customer_id);

As the data rows in these tables can also be deleted, it’s possible to release the occupied storage by executing the VACUUM command:

VACUUM VERBOSE tt_customer;

The ANALYZE command can be executed on temporary tables as well, in order to collect statistics:

ANALYZE VERBOSE tt_customer;

Both commands can be executed on this kind of table as SQL commands; however, the autovacuum daemon, which normally executes them, does not act on temporary tables.

Another important point to consider is related to permanent and temporary tables with the same name: once this happens, the permanent table is only taken into account when ca

[...]
Posted by Ibrar Ahmed in Percona on 2019-08-02 at 13:29
Out-Of-Memory Killer

In a Linux world, whenever your database server crashes or gets terminated, you need to find its cause. There can be several reasons for this. It can be SIGSEGV, which is a crash due to some bug in the backend server, but this is the least likely reason. The most common reason is running out of disk space or running out of memory. If you are getting the “running out of space” error, the only solution is to clear some space and restart your database.

Out-Of-Memory Killer

Whenever your server/process is out of memory, Linux has two ways to handle that: the first is an OS (Linux) crash that brings your whole system down, and the second is to kill the process (application) that is making the system run out of memory. The best bet for the second option is to kill the process and save the OS from crashing. In short, the Out-Of-Memory Killer is the process responsible for terminating applications in order to save the kernel from crashing; it kills only the application and saves the entire OS. Let’s first discuss how the OOM mechanism works and how to control it, and later discuss how the OOM Killer decides which application to kill.

One of the primary jobs of a Linux operating system is to allocate memory to a process when it requests memory. In most cases, the process/application will request memory from the OS but will not use all of the memory it requested. If the OS allocated memory to every process that requests it, even though the process doesn’t plan to use it all, it would soon run out of memory and the system would crash. To handle this scenario, the operating system has a feature that enables it to commit memory to a process without actually allocating it. The allocation is done only when the process actually plans to use that memory. At times the OS may not have available memory but it will commit the memory to the process, and when the process plans to use the memory, the OS will allocate the committed memory if it is available. The downside of this featur

[...]
CREATE TABLE varchar_demo(description varchar(200));

INSERT INTO varchar_demo VALUES('こんにちは');
INSERT INTO varchar_demo VALUES('ABCDE');

SELECT * FROM varchar_demo;
 description 
-------------
 こんにちは
 ABCDE
(2 rows)
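-- The dump() helper below reads one stored value (selected by column name,
-- table name, and row offset), walks it byte by byte with get_byte() on its
-- bytea representation, and returns its octet length followed by the bytes in
-- hex, which makes the UTF-8 encoding of the multibyte string above visible.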


CREATE OR REPLACE FUNCTION dump(columnname TEXT, tablename TEXT,
                                rowid INT) RETURNS TEXT
LANGUAGE plpgsql
AS
$function$
DECLARE 
  hex_str TEXT;
  hex_byte TEXT;
  query TEXT;
  i INT;
  len INT;
BEGIN
  query := format('SELECT octet_length(%I) FROM %I offset %s
                   LIMIT 1', columnname, tablename, rowid);
  EXECUTE query INTO len;
  hex_str := 'Len=' || len || ' ';
  FOR i in 1..len LOOP
    query := format('SELECT to_hex(get_byte(%I::bytea, %s)) 
                     FROM %I offset %s LIMIT 1', 
                     columnname, i - 1, tablename, rowid);
    EXECUTE query INTO hex_byte;
    hex_str := hex_str || hex_byte || ',';
  END LOOP;
  RETURN hex_str;
END;
$function$;


select dump('description', 'varchar_demo', 0);
                         dump                         
------------------------------------------------------
 Len=15 e3,81,93,e3,82,93,e3,81,ab,e3,81,a1,e3,81,af,
(1 row)

select dump('description', 'varchar_demo', 1);
         dump          
-----------------------
 Len=5 41,42,43,44,45,
(1 row)

OrderedAppend: An optimization for range partitioning

With this feature, we’ve seen up to 100x performance improvements for certain queries.

In our previous post on implementing constraint exclusion, we discussed how TimescaleDB leverages PostgreSQL’s foundation and expands on its capabilities to improve performance. Continuing with the same theme, in this post we will discuss how we’ve added support for ordered appends which optimize a large range of queries, particularly those that are ordered by time.

We’ve seen performance improvements up to 100x for certain queries after applying this feature, so we encourage you to keep reading!

Optimizing Appends for large queries

PostgreSQL represents how plans should be executed using “nodes”. There are a variety of different nodes that may appear in an EXPLAIN output, but we want to focus specifically on Append nodes, which essentially combine the results from multiple sources into a single result.

PostgreSQL has two standard Appends that are commonly used that you can find in an EXPLAIN output:

  • Append: appends results of child nodes to return a unioned result
  • MergeAppend: merge output of child nodes by sort key; all child nodes must be sorted by that same sort key; accesses every chunk when used in TimescaleDB

When MergeAppend nodes are used with TimescaleDB, we necessarily access every chunk to figure out if the chunk has keys that we need to merge. However, this is obviously less efficient since it requires us to touch every chunk.

To address this issue, with the release of TimescaleDB 1.2 we introduced OrderedAppend as an optimization for range partitioning. The purpose of this feature is to optimize a large range of queries, particularly those that are ordered by time and contain a LIMIT clause. This optimization takes advantage of the fact that we know the range of time held in each chunk, and can stop accessing chunks once we’ve found enough rows to satisfy the LIMIT clause. As mentioned above, with this optimization we see performance improvements of up to 100x
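The queries that benefit look like the following sketch (the hypertable and column names are hypothetical):

SELECT time, device_id, temperature
FROM   conditions
ORDER BY time DESC
LIMIT 100;
-- With OrderedAppend, only the most recent chunk(s) needed to produce 100 rows
-- are read, instead of touching every chunk.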

[...]
PostgreSQL simple C extension development

One of the great features of PostgreSQL is its extensibility. My colleague and senior PostgreSQL developer Ibrar has blogged about developing an extension with much broader capabilities, including callback functionality. But in this blog post, I am trying to address a complete novice user who has never tried but wants to develop a simple function with business logic. Towards the end of the post, I want to show how lightweight the function is by doing simple, repeatable benchmarking, which should act as a strong justification for why end users should do such development.

Generally, PostgreSQL and extension developers work on a PostgreSQL source build. For a novice user, that may not be required; instead, the dev/devel packages provided for the Linux distro are sufficient. Assuming that you have installed PostgreSQL already, the following steps will get you the additional development libraries required.

On Ubuntu/Debian

$ sudo apt install postgresql-server-dev-11

On RHEL/CentOS

sudo yum install postgresql11-devel

The next step is to add the PostgreSQL binary path to your environment, to ensure that pg_config is in the PATH. On my Ubuntu laptop, it looks like this:

export PATH=/usr/lib/postgresql/11/bin:$PATH

The paths mentioned above may vary according to the environment.

Please make sure that pg_config executes without specifying the full path:

$ pg_config

The PostgreSQL installation provides a build infrastructure for extensions, called PGXS, so that simple extension modules can be built against an already-installed server. It automates common build rules for simple server extension modules.

$ pg_config --pgxs
/usr/lib/postgresql/11/lib/pgxs/src/makefiles/pgxs.mk

Now let’s create a directory for development. I am going to develop a simple extension addme with a function addme to add 2 numbers.

$ mkdir addme

Now we need to create a Makefile which builds the extension. Luckily, we can use all PGXS macros.

MODULES = addme
EXTENSION = addme
DATA = addme--0.0.1.sql
PG_CO
[...]
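Once the extension is built and installed, using it would look roughly like this (a sketch; the function name follows the post, the rest is assumed):

CREATE EXTENSION addme;
SELECT addme(2, 3);   -- expected to return 5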
A regular expression is a special text string used to describe a search pattern. PostgreSQL supports three separate approaches to pattern matching: POSIX-style regular expressions (BREs and EREs), the SIMILAR TO operator added in SQL:1999, and the SQL LIKE operator. There are some more advanced techniques for advanced pattern matching requirements, but those will very likely […]
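A quick illustration of the three approaches (examples added here, not quoted from the post):

SELECT 'abc123' ~ '^[a-z]+[0-9]+$';         -- POSIX regular expression, returns true
SELECT 'abc123' SIMILAR TO '[a-z]+[0-9]+';  -- SQL:1999 SIMILAR TO, must match the whole string
SELECT 'abc123' LIKE 'abc%';                -- SQL LIKE, simple wildcard matching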

Vector tiles are the new hotness, allowing large amounts of dynamic data to be sent for rendering right on web clients and mobile devices, and making very beautiful and highly interactive maps possible.

Since the introduction of ST_AsMVT(), people have been generating their tiles directly in the database more and more, and as a result wanting tile generation to go faster and faster.

Every tile generation query has to carry out the following steps:

  • Gather all the relevant rows for the tile
  • Simplify the data appropriately to match the resolution of the tile
  • Clip the data to the bounds of the tile
  • Encode the data into the MVT protobuf format

For PostGIS 3.0, performance of tile generation has been vastly improved.
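A sketch of what such a tile query typically looks like (the roads table and the tile coordinates here are hypothetical; ST_TileEnvelope is new in PostGIS 3):

WITH bounds AS (
  SELECT ST_TileEnvelope(12, 1208, 1539) AS geom                         -- the tile's bounds in web mercator
),
mvtgeom AS (
  SELECT ST_AsMVTGeom(ST_Transform(r.geom, 3857), bounds.geom) AS geom,  -- clip and quantize to the tile
         r.name
  FROM   roads r, bounds
  WHERE  ST_Transform(r.geom, 3857) && bounds.geom                       -- gather the relevant rows
)
SELECT ST_AsMVT(mvtgeom.*) FROM mvtgeom;                                 -- encode into the MVT protobuf format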

On 27th of July 2019, Michael Paquier committed patch: Add support for --jobs in reindexdb. When doing a schema-level or a database-level operation, a list of relations to build is created which gets processed in parallel using multiple connections, based on the recent refactoring for parallel slots in src/bin/scripts/. System catalogs are processed first … Continue reading "Waiting for PostgreSQL 13 – Add support for --jobs in reindexdb"
Posted by Paul Ramsey in PostGIS on 2019-07-30 at 20:16

And, late on a Friday afternoon, the plaintive cry was heard!

And indeed, into the sea they do go!

And ‘lo, the SQL faeries were curious, and gave it a shot!

##### Commandline OSX/Linux #####

# Get the Shape files
# http://www.elections.bc.ca/index.php/voting/electoral-maps-profiles/
wget http://www.elections.bc.ca/docs/map/redis08/GIS/ED_Province.exe

# Exe? No prob, it's actually a self-extracting ZIP
unzip ED_Province

# Get a PostGIS database ready for the data
createdb ed_clip
psql -c "create extension postgis" -d ed_clip

# Load into PostGIS
# The .prj says it is "Canada Albers Equal Area", but they
# lie! It's actually BC Albers, EPSG:3005
shp2pgsql -s 3005 -i -I ED_Province ed | psql -d ed_clip

# We need some ocean! Use Natural Earth...
# http://www.naturalearthdata.com/downloads/
wget http://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/physical/ne_10m_ocean.zip
unzip ne_10m_ocean.zip

# Load the Ocean into PostGIS!
shp2pgsql -s 4326 -i -I ne_10m_ocean ocean | psql -d ed_clip

# OK, now we connect to PostGIS and start working in SQL
psql -e ed_clip
-- How big is the Ocean table?
SELECT Count(*) FROM ocean;

-- Oh, only 1 polygon. Well, that makes it easy... 
-- For each electoral district, we want to difference away the ocean.
-- The ocean is a one big polygon, this will take a while (if we
-- were being more subtle, we'd first clip the ocean down to 
-- a reasonable area around BC.)
CREATE TABLE ed_clipped AS
SELECT 
  CASE 
  WHEN ST_Intersects(o.geom, ST_Transform(e.geom,4326))
  THEN ST_Difference(ST_Transform(e.geom,4326), o.geom)
  ELSE ST_Transform(e.geom,4326)
  END AS geom,
  e.edabbr,
  e.edname
FROM ed e, ocean o;

-- Check our geometry types...
SELECT DISTINCT ST_GeometryType(geom) FROM ed_clipped;

-- Oh, they are heterogeneous. Let's force them all multi
U
[...]
plpgsql_check

There is always a need for profiling tools in databases, for admins or developers. While it is easy to see where an SQL statement is spending more time using EXPLAIN or EXPLAIN ANALYZE in PostgreSQL, the same does not work for functions. Recently, Jobin published a blog post where he demonstrated how plprofiler can be useful in profiling functions. plprofiler builds call graphs and creates flame graphs which make the report very easy to understand. Similarly, there is another interesting project called plpgsql_check, which can be used for a similar purpose as plprofiler, while it also looks at the code and points out compilation errors. Let us see all of that in action in this blog post.

Installing plpgsql-check

You can use yum on RedHat/CentOS to install this extension from the PGDG repository. Steps to perform a source installation on Ubuntu/Debian are also shown below.

On RedHat/CentOS

$ sudo yum install plpgsql_check_11

On Ubuntu/Debian

$ sudo apt-get install postgresql-server-dev-11 libicu-dev gcc make 
$ git clone https://github.com/okbob/plpgsql_check.git 
$ cd plpgsql_check/ 
$ make && make install
Creating and enabling this extension

There are 3 advantages of using plpgsql_check:

  1. Checking for compilation errors in function code
  2. Finding dependencies in functions
  3. Profiling functions
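For the first of these, checking a function is a single call (a sketch; the function signature passed in is hypothetical):

CREATE EXTENSION plpgsql_check;

SELECT * FROM plpgsql_check_function('myfunc(int)');  -- reports errors and warnings found in the body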

When using plpgsql_check for the first 2 requirements, you may not need to add any entry to shared_preload_libraries. However, if you need to use it for profiling functions (3), then you should add appropriate entries to shared_preload_libraries so that it can load both plpgsql and plpgsql_check. Due to dependencies, plpgsql must be before plpgsql_check in the shared_preload_libraries config, as you see in the following example:

shared_preload_libraries = plpgsql, plpgsql_check

Any change to shared_preload_libraries requires a restart. You may see the following error when you do not have plpgsql before plpgsql_check in the shared_preload_libraries config.
$ g
[...]
Posted by Ibrar Ahmed in Percona on 2019-07-30 at 13:31
Parallelism in PostgreSQL

PostgreSQL is one of the finest object-relational databases, and its architecture is process-based instead of thread-based. While almost all current database systems utilize threads for parallelism, PostgreSQL’s process-based architecture was implemented prior to POSIX threads. PostgreSQL launches a process called the “postmaster” on startup, and after that spawns a new process whenever a new client connects to PostgreSQL.

Before version 9.6 there was no parallelism within a single connection. It is true that multiple queries from different clients could run concurrently because of the process architecture, but a single query ran serially and could not benefit from multiple cores. This was a huge limitation. Intra-query parallelism was introduced in version 9.6: a single query can now be assisted by multiple worker processes, utilizing the multiple cores in a system.

Parallelism in PostgreSQL was implemented as part of multiple features which cover sequential scans, aggregates, and joins.
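A parallel plan is easy to spot in EXPLAIN output (a sketch; the table name is hypothetical and the plan shape depends on configuration and table size):

EXPLAIN SELECT count(*) FROM big_table;
--  Finalize Aggregate
--    ->  Gather
--          Workers Planned: 2
--          ->  Partial Aggregate
--                ->  Parallel Seq Scan on big_table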

Components of Parallelism in PostgreSQL

There are three important components of parallelism in PostgreSQL: the process itself (the leader), the gather node, and the workers. Without parallelism, the process itself handles all the data; however, when the planner decides that a query or part of it can be parallelized, it adds a Gather node within the parallelizable portion of the plan and makes Gather the root node of that subtree. Query execution starts at the process (leader) level, and all the serial parts of the plan are run by the leader. However, if parallelism is enabled and permissible for any part (or the whole) of the query, then a Gather node with a set of workers is allocated for it. Workers are the processes that run in parallel, executing the part of the tree (the partial plan) that needs to be parallelized. The relation’s blocks are divided amongst the workers su

[...]
Posted by Paul Ramsey in PostGIS on 2019-07-30 at 08:00

One question that comes up often during our PostGIS training is “how do I do an overlay?” The terminology can vary: sometimes they call the operation a “union” sometimes an “intersect”. What they mean is, “can you turn a collection of overlapping polygons into a collection of non-overlapping polygons that retain information about the overlapping polygons that formed them?”

So an overlapping set of three circles becomes a non-overlapping set of 7 polygons.

Calculating the overlapping parts of a pair of shapes is easy, using the ST_Intersection() function in PostGIS, but that only works for pairs, and doesn’t capture the areas that have no overlaps at all.

How can we handle multiple overlaps and get out a polygon set that covers 100% of the area of the input sets? By taking the polygon geometry apart into lines, and then building new polygons back up.

Let’s construct a synthetic example: first, generate a collection of random points, using a Gaussian distribution, so there’s more overlap in the middle. The crazy math in the SQL below just converts the uniform random numbers from the random() function into normally distributed numbers.

CREATE TABLE pts AS
WITH rands AS (
  SELECT generate_series as id, random() AS u1, random() AS u2 FROM generate_series(1,100)
)
SELECT
  id,
  ST_SetSRID(ST_MakePoint(
    50 * sqrt(-2 * ln(u1)) * cos(2*pi()*u2),
    50 * sqrt(-2 * ln(u1)) * sin(2*pi()*u2)),4326) AS geom
FROM rands;

The result looks like this:

Now, we turn the points into circles, big enough to have overlaps.

CREATE TABLE circles AS
SELECT id, ST_Buffer(geom, 10) AS geom FROM pts;

Which looks like this:

Now it’s time to take the polygons apart. In this case we’ll take the exterior ring of the circles, using ST_ExteriorRing(). If we were dealing with complex polygons with holes, we’d have to use ST_DumpRings(). Once we have the rings, we want to make sure that everywhere rings cross the lines are broken, so that no lines cross, they only touch at their e

[...]

A “composite index”, also known as a “concatenated index”, is an index on multiple columns of a table. Many people wonder what is more beneficial: using separate indexes or a composite index. Whenever we do training, consulting, or support, this question is high up on the agenda and many people keep asking it. Therefore, I decided to shed some light on this question.

Which indexes shall one create?

To discuss the topic on a more practical level, I created a table consisting of three columns. Then I loaded 1 million rows and added a composite index covering all three columns:

test=# CREATE TABLE t_data (a int, b int, c int);
CREATE TABLE
test=# INSERT INTO t_data 
                SELECT random()*100000, 
                        random()*100000, 
                        random()*100000 
                FROM generate_series(1, 1000000);
INSERT 0 1000000
test=# CREATE INDEX idx_data ON t_data(a, b, c);
CREATE INDEX

The layout of the table is therefore as follows:

test=# \d t_data
               Table "public.t_data"
 Column |  Type   | Collation | Nullable | Default 
--------+---------+-----------+----------+---------
 a      | integer |           |          | 
 b      | integer |           |          | 
 c      | integer |           |          | 
Indexes:
    "idx_data" btree (a, b, c)

Let us run ANALYZE now to ensure that optimizer statistics are there. Usually autovacuum will kick in and create statistics for your table, but just to be sure, running ANALYZE does not hurt in this case.

test=#  ANALYZE t_data;
ANALYZE

PostgreSQL will rearrange filters for you

The first important thing to observe is that PostgreSQL will try to arrange the filters in your query for you. The following query will filter on all indexed columns:

test=# explain SELECT * 
       FROM   t_data 
       WHERE  c = 10 
              AND b = 20 
              AND a = 10;
                   QUERY PLAN                                  
---------------------------------------------------
 Index Only Scan 
[...]