
PostgreSQL table partitioning is by far one of the most popular new PostgreSQL features. Many people look to partitioned tables as a way to improve performance and broaden scalability. However, partitioning comes with a little problem: How can you partition an existing table without locking up the database? The answer is: pg_rewrite can help you with PostgreSQL table partitioning. Here you will learn how to implement pg_rewrite to help you solve partitioning problems in the most elegant way possible.

Installing pg_rewrite

pg_rewrite is Open Source and can be quickly and easily downloaded from our Github profile. Cloning the repository works as shown in the next listing:

[hs@fedora src]$ git clone https://github.com/cybertec-postgresql/pg_rewrite.git
Cloning into 'pg_rewrite'...
remote: Enumerating objects: 22, done.
remote: Counting objects: 100% (22/22), done.
remote: Compressing objects: 100% (17/17), done.
remote: Total 22 (delta 3), reused 22 (delta 3), pack-reused 0
Receiving objects: 100% (22/22), 44.51 KiB | 1.78 MiB/s, done.
Resolving deltas: 100% (3/3), done.

Once this is done, we can enter the directory:

[hs@fedora src]$ cd pg_rewrite/

To build the code, we have to make sure that the correct version of pg_config is in the path. In my case, I have a working binary in my home directory, so I can effortlessly build the code:

[hs@fedora pg_rewrite]$ which pg_config

~/pg14/bin/pg_config

If pg_config is in the path, all you have to do is to run “make install” to compile and install the code:

[hs@fedora pg_rewrite]$ make install

gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -g -O2 -fPIC -I. -I./ -I/home/hs/pg14/include/postgresql/server -I/home/hs/pg14/include/postgresql/internal  -D_GNU_SOURCE 
[...]
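A hedged sketch of the step that typically follows the build (the configuration values below are assumptions based on pg_rewrite's README rather than something shown in this excerpt): since pg_rewrite captures concurrent data changes via logical decoding, the server generally needs to be configured accordingly before the extension can be created in the target database.

-- postgresql.conf (assumed prerequisites; a restart is required)
--   wal_level = logical
--   shared_preload_libraries = 'pg_rewrite'

-- then, in the database holding the table to be partitioned:
CREATE EXTENSION pg_rewrite;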
Posted by Michael Christofides on 2021-12-06 at 17:36

Cover photo: Clark Young

We all know that indexing is important, but it can be difficult to know where to start. In this post, my aim is to collate some of the best advice I’ve seen for indexing Postgres, and simplify it for folks just getting started. 

In a previous post, we covered a very basic introduction to indexing, so I won’t go over that again here. However, I will start with a super quick reminder on why we index. 

Why index anything at all?

Postgres is perfectly able to run queries without indexes, by scanning tables and filtering out results.

But, as our data grows, this gets more and more inefficient, and as a result, slower. Indexes allow us to look up the rows we need efficiently, and therefore quickly.

With that in mind, let’s take a look at some do’s and don’ts.

Some indexing best practices

1. Don’t index every column

If indexes are so useful, why don’t we add them to every column? 

There are a few good reasons, but a big one is that indexes add overhead to writes. Not only do indexes need to be kept up to date when we write to the table, but they can also prevent optimizations like HOT (Heap-Only Tuple) updates.

As such, the more write-heavy a table is, the more carefully you should think about the benefit of adding an index.

If you’ve inherited a database with too many indexes, you can start to understand how much each one is used with the view pg_stat_user_indexes. Remember to check any read replicas too!
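As a rough illustration (my query, not from the original post), sorting that view by scan count surfaces indexes that are rarely or never used:

SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC, schemaname, relname;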

2. Index columns that you filter on

The best candidates for indexes are columns that you filter on regularly. Primary keys are indexed by Postgres automatically, but foreign keys, and other columns, are not.

Let’s consider a simple users table with only two columns:

CREATE TABLE users (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    email text NOT NULL
);
INSERT INTO users
(email)
select n || '@gmail.com'
from generate_series(1, 100000) AS n;
ANALYZE users;

If we want to look up a user by their email address, here

[...]
Posted by Andreas 'ads' Scherbaum on 2021-12-06 at 14:00
PostgreSQL Person of the Week Interview with Emre Hasegeli: I was born and grew up in İzmir, Turkey, studied in İstanbul, lived and worked in Germany and in the UK for a while, and moved back to my hometown this year. I am currently working remotely for End Point, a US based software consultancy company which develops Bucardo.
Posted by Luca Ferrari on 2021-12-06 at 00:00

How to kill a backend process, the right way!

kill that backend!

Sometimes it happens: you need, as a DBA, to be harsh and terminate a backend, that is a user connection.
There are two main ways to do that:

  • use the operating system kill(1) command to, well, kill such process;
  • use PostgreSQL administrative functions like pg_terminate_backend() or the more polite pg_cancel_backend().

PostgreSQL pg_cancel_backend() and pg_terminate_backend()

What is the difference between the two functions?
Quite easy to understand: pg_cancel_backend() sends a SIGINT to the backend process, that is, it asks it politely to stop. It is the equivalent of a standard kill -INT against the process.
But what does it mean to ask politely? It means cancelling the current query: it does not terminate the user session, only the current user interaction. That is why it is mapped to SIGINT, the equivalent of CTRL-c (interrupt from the keyboard).
On the other hand, pg_terminate_backend() sends a SIGTERM to the process, the equivalent of kill -TERM, and brutally forces the process to exit.
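As a quick illustration (a sketch of mine, not from the post), both functions take the backend's pid, which you can look up in pg_stat_activity:

-- cancel the running query of the active backends of a (hypothetical) database,
-- keeping their sessions alive
SELECT pg_cancel_backend(pid)
FROM pg_stat_activity
WHERE datname = 'testdb'
  AND state = 'active'
  AND pid <> pg_backend_pid();

-- terminate a whole session instead (12345 is a hypothetical pid)
SELECT pg_terminate_backend(12345);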

Now, Kill it!

Which method should you use?
If you are absolutely sure about what you are doing, you can use whatever method you want!
But when the caffeine level in your body is too low to do it right, you should use the PostgreSQL way! There are at least two good reasons to use the PostgreSQL administrative functions:

  • you don’t need access to the server, i.e., you don’t need an operating system shell;
  • you will not accidentally kill another process.


The first reason is really simple to understand, and improves the security of the machine hosting PostgreSQL, at least in my opinion.
The second reason is a little less obvious, and relies on the fact that pg_cancel_backend() and pg_terminate_backend() act only against processes within the PostgreSQL space, that is, only processes spawned by the postmaster.
Let’s see this in action: imagine we select the wrong process to kill, like 17

[...]
Posted by Luca Ferrari on 2021-12-06 at 00:00

A non-scientific look at how to compress a set of SQL dumps.

pgdump, text and xz

I have a database that contains around 50 GB of data. I do continuous backups through pgBackRest, and I also do regular pg_dump runs in directory format via multiple jobs, so I'm fine with backups.
However, why not have a look at SQL backups?
First of all: the content of the database is mostly numeric, being quite a large container of sensor data. This means that the data should compress very well.
Moreover, tables are partitioned on a per-year and per-month basis, so I have a regular structure with one year table and twelve monthly children. For instance, for the current year there is a table named y2021 with partitions named y2021m01 through y2021m12.
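To make that layout concrete, the structure described above could look roughly like this (a sketch: the table and column definitions are my guesses, only the respi schema and the y2021/y2021m01 naming come from the post):

CREATE TABLE respi.sensors (
    ts    timestamptz NOT NULL,
    value numeric
) PARTITION BY RANGE (ts);

CREATE TABLE respi.y2021 PARTITION OF respi.sensors
    FOR VALUES FROM ('2021-01-01') TO ('2022-01-01')
    PARTITION BY RANGE (ts);

CREATE TABLE respi.y2021m01 PARTITION OF respi.y2021
    FOR VALUES FROM ('2021-01-01') TO ('2021-02-01');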

pg_dump in text mode

I did a simple for loop in my shell to produce a few backup files, separating every single file by its year:



% for y in $(echo 2018 2019 2020 2021 2022 ); do
echo "Backup year $y"
time pg_dump -h miguel -U postgres -f sensorsdb.$y.sql -t "respi.y${y}*" sensorsdb
done



This produces the following amount of data:



% ls -sh1 *.sql     
3,5G sensorsdb.2018.sql
 13G sensorsdb.2019.sql
 12G sensorsdb.2020.sql
 10G sensorsdb.2021.sql
 20K sensorsdb.2022.sql

The following is a table that summarizes the file size and the time required to create it:


year   SQL size   time
2018   3.5 GB     7 minutes
2019   13 GB      20 minutes
2020   12 GB      20 minutes
2021   10 GB      17 minutes


Compress them!

Use xz with the default settings, which according to my installation means compression level 6:



% for y in $(echo 2018 2019 
[...]
Posted by Regina Obe in PostGIS on 2021-12-04 at 00:00

The PostGIS Team is pleased to release the third beta of the upcoming PostGIS 3.2.0 release.

Best served with PostgreSQL 14. This version of PostGIS can utilize the faster GiST building support API introduced in PostgreSQL 14. If compiled with recently released GEOS 3.10.1 you can take advantage of improvements in ST_MakeValid and numerous speed improvements. This release also includes many additional functions and improvements for postgis, postgis_raster and postgis_topology extensions and a new input/export format FlatGeobuf.

Continue Reading by clicking title hyperlink ..
Posted by Paul Ramsey in Crunchy Data on 2021-12-03 at 17:07

One of the curious aspects of spatial indexes is that the nodes of the tree can overlap, because the objects being indexed themselves also overlap.

Podman is a drop-in replacement for Docker that can run containers daemonless and rootless ("ruthless"?). Containers are built on cgroups, namespaces and IPC, which exist in Linux, and therefore require a Linux system; on OSX that system is provided by the podman machine, which is based on Fedora CoreOS and runs in QEMU.

Setup

Much of the configuration depends on the existence of 'brew' on OSX. If you haven't got brew (homebrew) installed, you can do so using:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

In order to run the podman machine, the podman software needs to be installed (step 1), a virtual machine for running podman on linux needs to be created (step 2), and run (step 3).

1. Install podman

brew install podman

2. Initialize podman machine

podman machine init

3. Start podman machine

podman machine start

Verify podman machine

Because the podman machine must run before it can run containers, it is useful to understand if the podman machine is running. This is done with 'podman machine list':

Up:

podman machine list
NAME      VM TYPE     CREATED       LAST UP           CPUS    MEMORY     DISK SIZE
podman-m* qemu        17 hours ago  Currently running 1       2.147GB    10.74GB

Down:

podman machine list
NAME      VM TYPE     CREATED       LAST UP           CPUS    MEMORY      DISK SIZE
podman-m* qemu        17 hours ago  3 seconds ago     1       2.147GB     10.74GB

Containers & yugabyte

This setup is ideal for developers who want an easy way to set up YugabyteDB without all the hassle of configuration.

Any type of work with podman with containers requires the podman machine to be running. The podman machine is what actually performs the container commands.

For any type of coordinated work it's important to select a version to work on for the software you are using. Using the latest version can be a different version in time, and can cause version sprawl, so I would strongly recommend alw

[...]
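As a hedged sketch of what a pinned-version run could look like (the yugabytedb/yugabyte image and the yugabyted start command come from Yugabyte's container quick start, not from this post, and the tag is a placeholder to be replaced with the version you standardize on):

podman run -d --name yugabyte \
  -p 7000:7000 -p 9000:9000 -p 5433:5433 -p 9042:9042 \
  yugabytedb/yugabyte:<pinned-version-tag> \
  bin/yugabyted start --daemon=false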
Posted by Jimmy Angelakos in EDB on 2021-12-03 at 15:30

It's easy to get misled into overconfidence based on the performance of powerful servers, given today's monster core counts and RAM sizes.
However, the reality of high concurrency usage is often disappointing, with less throughput than one would expect.
Because of its internals and its multi-process architecture, PostgreSQL is very particular about how it likes to deal with high concurrency and in some cases it can slow down to the point where it looks like it's not performing as it should.
In this talk we'll take a look at potential pitfalls when you throw a lot of work at your database. Specifically, very high concurrency and resource contention can cause problems with lock waits in Postgres. Very high transaction rates can also cause problems of a different nature.
Finally, we will be looking at ways to mitigate these by examining our queries and connection parameters, leveraging connection pooling and replication, or adapting the workload.

Video from my talk at this year's Postgres Build 👇

Accidental wisdom: "You can't avoid Postgres" -Jimmy

You can find the slides from the talk here.

Adding, tuning and removing indexes is an essential part of maintaining an application that uses a database. Oftentimes, our applications rely on sophisticated database features and data types, such as JSONB, array types or full text search in Postgres. A simple B-tree index does not work in such situations, for example to index a JSONB column. Instead, we need to look beyond, to GIN indexes. Almost 15 years ago to the dot, GIN indexes were added in Postgres 8.2, and they have since become an…
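As a minimal illustration of what that looks like in practice (my example, table and column names are made up), a GIN index lets Postgres answer JSONB containment queries efficiently:

CREATE TABLE events (payload jsonb);
CREATE INDEX events_payload_gin ON events USING GIN (payload);

-- the @> containment operator can use the GIN index
SELECT * FROM events WHERE payload @> '{"type": "login"}';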
Posted by Amit Kapila in Fujitsu on 2021-12-01 at 23:00

I would like to talk about the key features in PostgreSQL 14, and what is being discussed in the community for PostgreSQL 15 and beyond.

Posted by Elizabeth Garrett Christensen in Crunchy Data on 2021-12-01 at 20:42

Crunchy Data hosted the third annual PostGIS Day on November 18th. This was our second year with a virtual format and another year of record attendance! We had attendees from more than 99 countries.

Posted by Miranda Auhl in Timescale on 2021-12-01 at 14:59

Introduction

During analysis, you rarely - if ever - get to go directly from evaluating data to transforming and analyzing it. Sometimes to properly evaluate your data, you may need to do some pre-cleaning before you get to the main data cleaning, and that’s a lot of cleaning! In order to accomplish all this work, you may use Excel, R, or Python, but are these the best tools for data cleaning tasks?

In this blog post, I explore some classic data cleaning scenarios and show how you can perform them directly within your database using TimescaleDB and PostgreSQL, replacing the tasks that you may have done in Excel, R, or Python. TimescaleDB and PostgreSQL cannot replace these tools entirely, but they can help your data munging/cleaning tasks be more efficient and, in turn, let Excel, R, and Python shine where they do best: in visualizations, modeling, and machine learning.  

Cleaning is a very important part of the analysis process and generally can be the most grueling from my experience! By cleaning data directly within my database, I am able to perform a lot of my cleaning tasks one time rather than repetitively within a script, saving me considerable time in the long run.

A recap of the data analysis process

I began this series of posts on data analysis by presenting the following summary of the analysis process:

[Figure: Data Analysis Lifecycle (Evaluate → Clean → Transform → Model)]

The first three steps of the analysis lifecycle (evaluate, clean, transform) comprise the “data munging” stages of analysis. Historically, I have done my data munging and modeling all within Python or R, these being excellent options for analysis. However, once I was introduced to PostgreSQL and TimescaleDB, I found how efficient and fast it was to do my data munging directly within my database. In my previous post, I focused on showing data evaluation techniques and how you can replace tasks previously done in Python with PostgreSQL and TimescaleDB code. I now want to move on to the second step, data cleaning. Cleaning may not be the mos

[...]
Posted by Ryan Lambert on 2021-12-01 at 05:01

PostGIS is a widely popular spatial database extension for Postgres. It's also one of my favorite tools! A recent discussion on the People, Postgres, Data Discord server highlighted that the permissions required for various PostGIS operations were not clearly explained in the PostGIS documentation. As it turned out, I didn't know exactly what was required either. The basic PostGIS install page provides resources for installing the binary on the server and the basic CREATE EXTENSION commands, but does not explain permissions required.

This post explores the permissions required for three types of PostGIS interactions:

  • Install/Create PostGIS
  • Use PostGIS
  • Load data from pg_dump

Database and Users

I am using Postgres installed on my laptop for these tests, Postgres 13.5 and PostGIS 3.1. I created an empty database named postgis_perms and used the \du slash command in psql to check the current roles. This instance has my ryanlambert role, a superuser, and the default postgres role. The postgres role is not used in this post outside of this example.

([local] 🐘) ryanlambert@postgis_perms=# \du
                                     List of roles
┌─────────────┬────────────────────────────────────────────────────────────┬───────────┐
│  Role name  │                         Attributes                         │ Member of │
╞═════════════╪════════════════════════════════════════════════════════════╪═══════════╡
│ postgres    │ Superuser, Create role, Create DB, Replication, Bypass RLS │ {}        │
│ ryanlambert │ Superuser, Create role, Create DB                          │ {}        │
└─────────────┴────────────────────────────────────────────────────────────┴───────────┘

Most of my readers will know about primary keys and all kinds of table constraints. However, only a few of you may have ever thought about the difference between a primary key and a UNIQUE constraint. Isn’t it all just the same? In both cases, PostgreSQL will create an index that avoids duplicate entries. So what is the difference? Let’s dig in and find out…

What primary keys and UNIQUE constraints do

The following example shows both a primary key and a unique constraint:

test=# CREATE TABLE t_sample (a int PRIMARY KEY, b int UNIQUE);

CREATE TABLE

test=# \d t_sample

            Table "public.t_sample"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 a      | integer |           | not null | 
 b      | integer |           |          | 

Indexes:
    "t_sample_pkey" PRIMARY KEY, btree (a)
    "t_sample_b_key" UNIQUE CONSTRAINT, btree (b)

The really important observation is that both features make PostgreSQL create an index. This is important because people often use additional indexes on primary keys or unique columns. These additional indexes are not only unnecessary, but actually counterproductive.

The key to success: NULL handling

What makes a primary key different from a unique index is the way NULL entries are handled. Let’s take a look at a simple example:

test=# INSERT INTO t_sample VALUES (1, NULL);

INSERT 0 1

The example above works perfectly. PostgreSQL will accept the NULL value for the second column. As long as the primary key contains a unique value, we are OK. However, if that changes, then an error will occur:

test=# INSERT INTO t_sample VALUES (NULL, 2);

ERROR:  null value in column "a" of relation "t_sample" violates not-null constraint
DETAIL:  Failing row contains (null, 2).

This is actually the single biggest difference between these two types of constraints. Keep that in mind.
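It is also worth noting (my example, not from the post) that the UNIQUE column b accepts NULL more than once, because two NULL values are never considered equal by a unique index:

test=# INSERT INTO t_sample VALUES (2, NULL);
INSERT 0 1
test=# INSERT INTO t_sample VALUES (3, NULL);
INSERT 0 1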

Using foreign keys

The next logical question which arises is: What does that mean for foreign keys? Does it make a

[...]
Posted by Robert Haas in EDB on 2021-11-29 at 20:39
When PostgreSQL needs to sort strings, it relies on either the operating system (by default) or the ICU collation library (if your PostgreSQL has been built with support for ICU and you have chosen to use an ICU-based collation) to tell it in what order the strings ought to be sorted. Unfortunately, operating system behaviors are confusing and inconsistent, and they change relatively frequently for reasons that most people can't understand. That's a problem for PostgreSQL users, especially PostgreSQL users who create indexes on text columns. The first step in building a btree index is to sort the data, and if this sort order differs from the one used for later index lookups, data that is actually present in the index may not be found, and your queries may return wrong answers.
Posted by Franck Pachot in Yugabyte on 2021-11-29 at 17:42

This post exposes some basic examples of one of the greatest features of PostgreSQL: text search. In standard SQL, we can use LIKE, SIMILAR and REGEXP. But general text search cannot be optimized with simple B-Tree indexes on the column value. Text contains words, and indexing the text as a whole is not sufficient. Fortunately, PostgreSQL provides many index types, and one of them is GIN - Generalized Inverted Index. We can index the words, automatically, with functions to extract them as "tsvector" - Text Search vectors.
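As a tiny illustration of those building blocks (my example, not from the post), to_tsvector() splits a text into lexemes and the @@ operator matches it against a tsquery:

postgres=# SELECT to_tsvector('simple', 'the quick brown fox') @@ to_tsquery('simple', 'fox');
 ?column?
----------
 t
(1 row)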

If you are an advanced user of PostgreSQL, you will probably not learn new things here, except if you want to see how I use it, with views and stored functions to encapsulate the functionality. If you are a user of Oracle or SQL Server, you know the idea but may be surprised by how easy it is to use in an Open Source database. If you are a user of ElasticSearch, you may see that for simple searches, SQL databases can provide this without an additional service.

My goal here is to show that we can use the same on the latest version of YugabyteDB (I'm using 2.11 here). YugabyteDB is a distributed SQL database that reuses the PostgreSQL query layer, which means that many features come without additional effort. However, the distributed storage is different from monolithic Postgres, using LSM Trees instead of B-Trees and heap tables. The YugabyteDB YBGIN access method is similar to the PostgreSQL GIN, but implemented on top of LSM Tree indexes.

PostgreSQL: HEAP, BTREE and GIN

In PostgreSQL, here is how you define a HEAP table and a GIN index:

postgres=# create table demo
           (id bigint primary key, description text)
           USING HEAP;
CREATE TABLE

postgres=# create index demo_index on demo
           ( length(description) );
CREATE INDEX

postgres=# create index demo_gin on demo
           USING GIN
           ( (to_tsvector('simple',description)) );
CREATE INDEX

postgres=# select relname,reltype,amname 
           from pg_class left outer join pg_am
           on pg_class.relam=pg_am.
[...]
PostgreSQL-14 Database Monitoring and Logging Enhancements

PostgreSQL-14 was released in September 2021, and it contained many performance improvements and feature enhancements, including some features from a monitoring perspective. As we know, monitoring is the key element of any database management system, and PostgreSQL keeps updating and enhancing the monitoring capabilities. Here are some key ones in PostgreSQL-14.

Query Identifier

Query identifier is used to identify the query, and can be cross-referenced between extensions. Prior to PostgreSQL-14, extensions used their own algorithms to calculate the query_id. Usually the same algorithm was used, but any extension could use its own. PostgreSQL-14 now optionally provides a query_id computed in core, and monitoring extensions and utilities like pg_stat_activity, EXPLAIN, and pg_stat_statements use this query_id instead of calculating their own. The query_id can also be seen in csvlog and, after specifying it in log_line_prefix, in the regular log. From a user perspective, there are two benefits of this feature.

  • All the utilities/extensions will use the same query_id calculated by core, which makes it easy to cross-reference the query_id. Previously, all the utilities/extensions needed to use the same algorithm in their code to achieve this.
  • The second benefit is that extensions/utilities can use the query_id calculated by core and don't need to compute it again, which is a performance benefit.

PostgreSQL-14 introduces a new GUC configuration parameter, compute_query_id, to enable/disable this feature. The default is auto; it can be turned on/off in the postgresql.conf file, or using the SET command.
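For example (a sketch of mine, not from the article), you can enable it for a session and expose the identifier in the regular log via the %Q escape added to log_line_prefix in PostgreSQL-14:

SET compute_query_id = on;
ALTER SYSTEM SET log_line_prefix = '%m [%p] query_id=%Q ';
SELECT pg_reload_conf();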

  • pg_stat_activity

SET compute_query_id = off;

SELECT datname, query, query_id FROM pg_stat_activity;
 datname  |                                 query                                 | query_id 
----------+-----------------------------------------------------------------------+----------
 postgres | select datname, query, query_id from pg_stat_activity;                | 
[...]

The case

I have been involved in a case where an important application has been behaving erratically with regard to execution time. This is the background.

The application is a customised installation of a very well-known analytics suite in use at a very large Dutch shipping and logistics company. It has been deployed in AWS with the backing storage provided by a rather large Postgres RDS instance, both in the acceptance and in the production environment, with the former being a slimmed-down installation compared to the latter. The UI is web-based.
The company I work with is in charge of the cloud infra, and that alone.

The particular issue happens when browsing a list of postal addresses. This function is used to browse and search among about 9 million postal addresses from all around the country (the Netherlands, for the record). In my personal opinion and experience this is not a really large data set. At the very start, this function displays the first 100 addresses with the usual search boxes, paging widgets and a number that represents the overall count. You can click on an address and get an expanded view. For some unclear reason the problem was handed over to the company responsible for the infrastructure.

In acceptance the function seems to work quite well 95% of the time: the first page appears almost instantaneously, but 5 times out of 100 it can take up to two minutes (yes, 120 seconds!) before displaying any data. In production, counterintuitively, it's the other way around: the startup takes a very long time to display the first page of data. Subsequent pages appear almost instantaneously in either environment.

The whole investigation was time-boxed to just 1 working day.

First ask for the logs!

As usual I tried to gather as many details as possible about the application itself. It's a name I have heard and seen in almost all my previous working experiences over the past 10 years, but this is the very first time I have looked this deep inside it.
According to the customer the specif

[...]
Posted by Andreas 'ads' Scherbaum on 2021-11-29 at 06:00
PostgreSQL Person of the Week Interview with Fabien Coelho: I’m French, born and raised in Paris over 50 years ago. I work in Fontainebleau and Paris, and live in the Centre Val de Loire region, along the Loire river.

Introduction

In a previous post, I explained how we use transition tables in our implementation of Incremental View Maintenance (IVM) on PostgreSQL. Transition tables are a feature of AFTER triggers which allows trigger functions to refer to the changes to a table that occurred in a statement. We use transition tables to extract the table changes needed to calculate the changes to be applied to views.

In this article I describe a more complicated situation, specifically how we handle transition tables when multiple tables are modified in a statement.

Single Table Modification

In a case where a single table is modified in a statement, the view maintenance process is simple. For example, suppose we have three tables R, S, and T. We also define a materialized view V = R ⨝ S ⨝ T that joins these tables as below:

SELECT x,y,z FROM R,S,T WHERE R.i=S.i AND S.j=T.j;

Then, suppose that table R was modified in a statement. This operation can be written as R ← R ∸ ∇R ⊎ ΔR, where ∇R is a bag of tuples deleted from R, and ΔR is a bag of tuples inserted into R in this statement. In this case, the changes are calculated as ∇V = ∇R ⨝ S ⨝ T and ΔV = ΔR ⨝ S ⨝ T, and we can update the view as V ← V ∸ ∇V ⊎ ΔV. The SQL representation of these calculations is as follows:

-- ∇V: tuples to be deleted from the view 
SELECT x,y,z FROM R_old,S,T WHERE R_old.i=S.i AND S.j=T.j;
-- ΔV: tuples to be inserted into the view
SELECT x,y,z FROM R_new,S,T WHERE R_new.i=S.i AND S.j=T.j;

where R_old and R_new are transition tables corresponding to ∇R and ΔR, respectively.

Multiple Tables Modification

Now, let’s see cases where multiple tables are modified in a statement. You can observe it when you use modifying CTEs (WITH clause), like:

WITH i1 AS (INSERT INTO R VALUES(1,10) RETURNING 1), 
i2 AS (INSERT INTO S VALUES(1,100) RETURNING 1)
SELECT;

In addition, multiple tables can be updated when you use triggers or foreign key constraints.

Pre-Update State of Tables

At that time, we need the state

[...]
Posted by David Z in Highgo Software on 2021-11-27 at 00:31

1. Overview

I have been working on an internal project based on PostgreSQL for a while, and from time to time, I need to run some specific test cases to verify my changes. Here, I want to share a tip for running a specific regression TAP test quickly, especially when you are focusing on a particular bug and you know which test case can help verify the fix. A detailed document about the regression tests can be found at Running the Tests.

2. Regression test

PostgreSQL provides a comprehensive set of regression tests to verify the SQL implementation embedded in PostgreSQL as well as the extended capabilities of PostgreSQL. Whenever you make some changes, you should run these existing test cases to make sure your change doesn't break any existing features. Other than these regression tests, some special features are covered by a test framework called TAP tests, for example kerberos, ssl, recovery, etc.

If you want to run these tests, you have to make sure the option --enable-tap-tests has been configured, for example:
./configure --prefix=$HOME/pgapp --enable-tap-tests --enable-debug CFLAGS="-g3 -O0 -fno-omit-frame-pointer"

You can run the TAP tests using either make check or make installcheck, but compared with the non-TAP tests, the difference is that TAP tests will always start a test server, even when you run make installcheck. Because of this difference, some tests may take longer than you expect, and even worse, if a test case fails in the middle then the entire run stops, and your test cases may never get the chance to run. For example, I changed something related to the recovery features, and those changes are supposed to be tested by the test cases 021_row_visibility.pl and 025_stuck_on_old_timeline.pl, but whenever I run make check or make installcheck, it ends up with something like below.

t/001_stream_rep.pl .................. ok     
t/002_archiving.pl ................... ok   
t/003_recovery_targets.pl ............ ok   
t/004_timeline_switch.pl ............. ok   
t/005_rep
[...]
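One way to run only the cases you care about (this is the standard PROVE_TESTS mechanism documented in "Running the Tests"; it is not necessarily the exact tip the truncated post goes on to describe) is to restrict the TAP run to the listed scripts:

make -C src/test/recovery check PROVE_TESTS='t/021_row_visibility.pl t/025_stuck_on_old_timeline.pl'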
On 3rd of August 2021, Tom Lane committed patch: Add assorted new regexp_xxx SQL functions.   This patch adds new functions regexp_count(), regexp_instr(), regexp_like(), and regexp_substr(), and extends regexp_replace() with some new optional arguments. All these functions follow the definitions used in Oracle, although there are small differences in the regexp language due to using … Continue reading "Waiting for PostgreSQL 15 – Add assorted new regexp_xxx SQL functions."

An ugly way to introspect database changes.

Monitoring Schema Changes via Last Commit Timestamp

A few days ago, a colleague of mine showed me that a commercial database keeps track of the last DDL change timestamp for database objects.
I began to mumble… is that possible in PostgreSQL? Of course it is, but what is the smartest way to achieve it?
I asked on the mailing list, because the first idea that came into my mind was to use commit timestamps.
Clearly, it is possible to implement something that can do the job using event triggers, which in short are triggers attached not to table tuples but to database events like DDL commands. Great! And in fact, a very good explanation can be found here.
In this article, I present my first idea about using commit timestamps.
The system used for the test is PostgreSQL 13.4 running on Fedora Linux, with only myself connected to it (this simplifies following transactions). The idea is, in any case, general and easy enough to be used on busy systems.

Introduction to pg_last_committed_xact()

The special function pg_last_committed_xact() allows the database administrator (or a user) to get information about which transaction committed last.
Let’s see this in action:



% psql -U luca -h miguel -c 'select pg_last_committed_xact();'   testdb
ERROR:  could not get commit timestamp data
HINT:  Make sure the configuration parameter "track_commit_timestamp" is set.



First of all, in order to get information about committed transaction timestamps, the option track_commit_timestamp must be configured.
Turning the parameter on and off will not preserve historic data; that is, even if you had the parameter on and then turned it off, you will not be able to access the previously collected data.
Let's turn on the parameter and see how it works. The track_commit_timestamp parameter has the postmaster context, and therefore requires a server restart!



% psql -U postgres -h miguel \
       -c 'ALTER SYSTEM SET track_com
[...]
Posted by Regina Obe in PostGIS on 2021-11-26 at 00:00

The PostGIS Team is pleased to release the second beta of the upcoming PostGIS 3.2.0 release.

Best served with PostgreSQL 14. This version of PostGIS utilizes the faster GiST building support API introduced in PostgreSQL 14. If compiled with recently released GEOS 3.10.1 you can take advantage of improvements in ST_MakeValid and numerous speed improvements. This release also includes many additional functions and improvements for postgis_raster and postgis_topology extensions.

Continue Reading by clicking title hyperlink ..
Posted by Luca Ferrari on 2021-11-24 at 00:00

pgenv 1.2.1 introduces a different configuration setup.

pgenv config migrate

Just a few hours after I blogged about some cool new features in pgenv, I completed the work on keeping the configuration in one place.
Now pgenv keeps all configuration files in a single directory, named config. This is useful because it allows you to back up and/or migrate all the configuration from one machine to another easily.
But that's not all: since the configuration is now under a single directory, the naming of the individual configuration files has changed. Before this release, a configuration file was named like .pgenv.PGVERSION.conf, with the .pgenv prefix both making the file hidden and stating which application the file belongs to. Since the configuration files are now in a subdirectory, the prefix has been dropped, so every configuration file is now simply named PGVERSION.conf, for example 10.4.conf.
And since we like to make things easy, there is a config migrate command that helps you move your existing configuration from the old naming scheme to the new one:



% pgenv config migrate
Migrated 3 configuration file(s) from previous versions (0 not migrated)
Your configuration file(s) are now into [~/git/misc/PostgreSQL/pgenv/config]



Let’s have fun with pgenv!

One of the many reasons "the answer is Postgres" is its extensibility.

The ability to extend Postgres has given rise to an ecosystem of Postgres extensions that change the behavior of the database to support a wide range of interesting capabilities. At Crunchy Data we are big fans of PostGIS, the geospatial extender for Postgres.

Another extension we are asked about often is TimescaleDB.

TimescaleDB is an open-source extension designed to make SQL scalable for time-series data. Timescale, Inc., the company behind TimescaleDB, provides an Apache licensed community edition of TimescaleDB that is packaged as a Postgres extension that provides automated partitioning across time and space.

We are often asked about the potential to deploy the Apache licensed community edition of TimescaleDB as an extension within our Crunchy PostgreSQL for Kubernetes using PGO, the open source Postgres Operator. We announced that we added the Apache licensed "community edition" of TimescaleDB to PGO 4.7, and we have brought TimescaleDB into PGO v5.

Let us look at how you can deploy the TimescaleDB extension as part of an HA Postgres cluster native to Kubernetes using the PGO Postgres Operator.

Deploying TimescaleDB on Kubernetes with PGO

good (?) reasons to use an entity-attribute-value design
© Laurenz Albe 2021

Customers have often asked me what I think of “Entity-attribute-value” (EAV) design. So I thought it would be a good idea to lay down my opinion in writing.

What is entity-attribute-value design?

The idea is not to create a table for each entity in the application. Rather, you store each attribute as a separate entry in an attribute table:

CREATE TABLE objects (
   objectid bigint PRIMARY KEY
   /* other object-level properties */
);

CREATE TABLE attstring (
   objectid bigint
      REFERENCES objects ON DELETE CASCADE NOT NULL,
   attname text NOT NULL,
   attval text,
   PRIMARY KEY (objectid, attname)
);

CREATE TABLE attint (
   objectid bigint
      REFERENCES objects ON DELETE CASCADE NOT NULL,
   attname text NOT NULL,
   attval integer,
   PRIMARY KEY (objectid, attname)
);

/* more tables for other data types */

The name of the model is derived from the “att...” tables, which have the three columns: “entity ID”, “attribute name” and “value”.

There are several variations of the basic theme, among them:

  • omit the objects table
  • add additional tables that define “object types”, so that each type can only have certain attributes

Why would anybody consider an entity-attribute-value design?

The principal argument I hear in support of the EAV design is flexibility. You can create new entity types without having to create a database table. Taken to the extreme, each entity can have different attributes.

I suspect that another reason for people to consider such a data model is that they are more familiar with key-value stores than with relational databases.

Performance considerations of entity-attribute-value design

In my opinion, EAV database design is the worst possible design when it comes to performance. You will never get good database performance with such a data model.

The only use cases where EAV shines are when it is used as a key-value store.
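To see why performance suffers, consider what it takes just to read back a single entity with one string attribute and one integer attribute (a sketch of mine using the tables defined above; the attribute names and the object id are made up, and the article's own examples continue with INSERT below). Every additional attribute costs another join:

SELECT o.objectid,
       s.attval AS name,
       i.attval AS age
FROM objects o
   LEFT JOIN attstring s ON s.objectid = o.objectid AND s.attname = 'name'
   LEFT JOIN attint   i ON i.objectid = o.objectid AND i.attname = 'age'
WHERE o.objectid = 42;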

INSERT

Inserting an entity will look like this:

INSERT INTO objects (o
[...]
Posted by Andreas 'ads' Scherbaum on 2021-11-22 at 14:00
PostgreSQL Person of the Week Interview with Pavel Luzanov: I live in Moscow, and work at Postgres Professional. I am responsible for educational projects.
Some time ago on Slack some person said: "varchar is better (storage efficiency), i recommend using it for less than 2048 chars, for the best: TEXT". A discussion followed, and the person that claimed this efficiency never backed their claim, saying only that "VARChar takes much less ‘place' than TEXT" … but have … Continue reading "Does varchar(n) use less disk space than varchar() or text?"
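One quick way to check that claim yourself (my example, not from the post) is pg_column_size(), which reports the stored size of a value; text and varchar(n) come out identical because they share the same varlena representation:

SELECT pg_column_size('hello'::text)          AS text_size,
       pg_column_size('hello'::varchar(2048)) AS varchar_size;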