Planet PostgreSQL
Posted by Regina Obe in PostGIS on 2021-12-04 at 00:00

The PostGIS Team is pleased to release the third beta of the upcoming PostGIS 3.2.0 release.

Best served with PostgreSQL 14. This version of PostGIS can utilize the faster GiST building support API introduced in PostgreSQL 14. If compiled with recently released GEOS 3.10.1 you can take advantage of improvements in ST_MakeValid and numerous speed improvements. This release also includes many additional functions and improvements for postgis, postgis_raster and postgis_topology extensions and a new input/export format FlatGeobuf.

Continue Reading by clicking title hyperlink ..
Posted by Paul Ramsey in Crunchy Data on 2021-12-03 at 17:07

One of the curious aspects of spatial indexes is that the nodes of the tree can overlap, because the objects being indexed themselves also overlap.

Podman is a drop-in replacement for Docker that can run containers daemonless and rootless ("ruthless"?). Containers are based on cgroups, namespaces and IPC, which exist in Linux, and therefore require a Linux system to support them; on macOS that system is provided by a virtual machine (based on Fedora CoreOS, running in QEMU).

Setup

Much of the configuration depends on the existence of 'brew' on OSX. If you haven't got brew (Homebrew) installed, you can install it using:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

In order to run the podman machine, the podman software needs to be installed (step 1), and a virtual machine for running podman on Linux needs to be created (step 2) and started (step 3).

1. Install podman

brew install podman

2. Initialize the podman machine

podman machine init

3. Start the podman machine

podman machine start

Verify podman machine

Because the podman machine must be running before it can run containers, it is useful to check whether the podman machine is running. This is done with 'podman machine list':

Up:

podman machine list
NAME      VM TYPE     CREATED       LAST UP           CPUS    MEMORY     DISK SIZE
podman-m* qemu        17 hours ago  Currently running 1       2.147GB    10.74GB

Down:

podman machine list
NAME      VM TYPE     CREATED       LAST UP           CPUS    MEMORY      DISK SIZE
podman-m* qemu        17 hours ago  3 seconds ago     1       2.147GB     10.74GB

Containers & yugabyte

This setup is ideal for developers who want an easy way to set up YugabyteDB without all the hassle of configuration.

Any type of work with podman with containers requires the podman machine to be running. The podman machine is what actually performs the container commands.

For any type of coordinated work it's important to select a version to work with for the software you are using. The latest version can resolve to a different version over time and cause version sprawl, so I would strongly recommend alw

[...]
Posted by Jimmy Angelakos in EDB on 2021-12-03 at 15:30

It's easy to get misled into overconfidence based on the performance of powerful servers, given today's monster core counts and RAM sizes.
However, the reality of high concurrency usage is often disappointing, with less throughput than one would expect.
Because of its internals and its multi-process architecture, PostgreSQL is very particular about how it likes to deal with high concurrency and in some cases it can slow down to the point where it looks like it's not performing as it should.
In this talk we'll take a look at potential pitfalls when you throw a lot of work at your database. Specifically, very high concurrency and resource contention can cause problems with lock waits in Postgres. Very high transaction rates can also cause problems of a different nature.
Finally, we will be looking at ways to mitigate these by examining our queries and connection parameters, leveraging connection pooling and replication, or adapting the workload.

Video from my talk at this year's Postgres Build 👇

Accidental wisdom: "You can't avoid Postgres" -Jimmy

You can find the slides from the talk here.

Adding, tuning and removing indexes is an essential part of maintaining an application that uses a database. Oftentimes, our applications rely on sophisticated database features and data types, such as JSONB, array types or full text search in Postgres. A simple B-tree index does not work in such situations, for example to index a JSONB column. Instead, we need to look beyond, to GIN indexes. Almost 15 years ago to the dot, GIN indexes were added in Postgres 8.2, and they have since become an…
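As a rough sketch of the idea (the table and column names here are invented, not taken from the post), a GIN index is what makes containment queries on a JSONB column indexable:

CREATE TABLE events (id bigint PRIMARY KEY, payload jsonb);
CREATE INDEX events_payload_gin ON events USING GIN (payload);

-- The @> containment operator can use the GIN index above.
SELECT id FROM events WHERE payload @> '{"type": "login"}';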
Posted by Amit Kapila in Fujitsu on 2021-12-01 at 23:00

I would like to talk about the key features in PostgreSQL 14, and what is being discussed in the community for PostgreSQL 15 and beyond.

Posted by Elizabeth Garrett Christensen in Crunchy Data on 2021-12-01 at 20:42

Crunchy Data hosted the third annual PostGIS Day on November 18th. This was our second year with a virtual format and another year of record attendance! We had attendees from more than 99 countries.

Posted by Miranda Auhl in Timescale on 2021-12-01 at 14:59

Introduction

During analysis, you rarely - if ever - get to go directly from evaluating data to transforming and analyzing it. Sometimes to properly evaluate your data, you may need to do some pre-cleaning before you get to the main data cleaning, and that’s a lot of cleaning! In order to accomplish all this work, you may use Excel, R, or Python, but are these the best tools for data cleaning tasks?

In this blog post, I explore some classic data cleaning scenarios and show how you can perform them directly within your database using TimescaleDB and PostgreSQL, replacing the tasks that you may have done in Excel, R, or Python. TimescaleDB and PostgreSQL cannot replace these tools entirely, but they can help your data munging/cleaning tasks be more efficient and, in turn, let Excel, R, and Python shine where they do best: in visualizations, modeling, and machine learning.

Cleaning is a very important part of the analysis process and generally can be the most grueling from my experience! By cleaning data directly within my database, I am able to perform a lot of my cleaning tasks one time rather than repetitively within a script, saving me considerable time in the long run.
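As a flavour of what that looks like, here is a minimal sketch (the readings table and its columns are hypothetical, not taken from the post) of a one-off, in-database cleanup that would otherwise be repeated in every analysis script:

-- Normalize device names and turn a sentinel value into NULL, once, in the database.
UPDATE readings
   SET device_name = trim(lower(device_name)),
       temperature = NULLIF(temperature, -999)
 WHERE device_name <> trim(lower(device_name))
    OR temperature = -999;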

A recap of the data analysis process

I began this series of posts on data analysis by presenting the following summary of the analysis process:

[Image: Data Analysis Lifecycle: Evaluate -> Clean -> Transform -> Model]

The first three steps of the analysis lifecycle (evaluate, clean, transform) comprise the “data munging” stages of analysis. Historically, I have done my data munging and modeling all within Python or R, these being excellent options for analysis. However, once I was introduced to PostgreSQL and TimescaleDB, I found how efficient and fast it was to do my data munging directly within my database. In my previous post, I focused on showing data evaluation techniques and how you can replace tasks previously done in Python with PostgreSQL and TimescaleDB code. I now want to move on to the second step, data cleaning. Cleaning may not be the mos

[...]
Posted by Ryan Lambert on 2021-12-01 at 05:01

PostGIS is a widely popular spatial database extension for Postgres. It's also one of my favorite tools! A recent discussion on the People, Postgres, Data Discord server highlighted that the permissions required for various PostGIS operations were not clearly explained in the PostGIS documentation. As it turned out, I didn't know exactly what was required either. The basic PostGIS install page provides resources for installing the binary on the server and the basic CREATE EXTENSION commands, but does not explain the permissions required.

This post explores the permissions required for three types of PostGIS interactions:

  • Install/Create PostGIS
  • Use PostGIS
  • Load data from pg_dump

Database and Users

I am using Postgres installed on my laptop for these tests, Postgres 13.5 and PostGIS 3.1. I created an empty database named postgis_perms and used the \du slash command in psql to see the current roles. This instance has my ryanlambert role, a superuser, and the default postgres role. The postgres role is not used in this post outside of this example.

([local] 🐘) ryanlambert@postgis_perms=# \du
                                     List of roles
┌─────────────┬────────────────────────────────────────────────────────────┬───────────┐
│  Role name  │                         Attributes                         │ Member of │
╞═════════════╪════════════════════════════════════════════════════════════╪═══════════╡
│ postgres    │ Superuser, Create role, Create DB, Replication, Bypass RLS │ {}        │
│ ryanlambert │ Superuser, Create role, Create DB                          │ {}        │
└─────────────┴────────────────────────────────────────────────────────────┴───────────┘

Most of my readers will know about primary keys and all kinds of table constraints. However, only a few of you may have ever thought about the difference between a primary key and a UNIQUE constraint. Isn’t it all just the same? In both cases, PostgreSQL will create an index that avoids duplicate entries. So what is the difference? Let’s dig in and find out…

What primary keys and UNIQUE constraints do

The following example shows both a primary key and a unique constraint:

test=# CREATE TABLE t_sample (a int PRIMARY KEY, b int UNIQUE);

CREATE TABLE

test=# \d t_sample

            Table "public.t_sample"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 a      | integer |           | not null | 
 b      | integer |           |          | 

Indexes:
    "t_sample_pkey" PRIMARY KEY, btree (a)
    "t_sample_b_key" UNIQUE CONSTRAINT, btree (b)

The really important observation is that both features make PostgreSQL create an index. This is important because people often use additional indexes on primary keys or unique columns. These additional indexes are not only unnecessary, but actually counterproductive.

The key to success: NULL handling

What makes a primary key different from a unique index is the way NULL entries are handled. Let’s take a look at a simple example:

test=# INSERT INTO t_sample VALUES (1, NULL);

INSERT 0 1

The example above works perfectly. PostgreSQL will accept the NULL value for the second column. As long as the primary key contains a unique value, we are OK. However, if that changes, then an error will occur:

test=# INSERT INTO t_sample VALUES (NULL, 2);

ERROR:  null value in column "a" of relation "t_sample" violates not-null constraint
DETAIL:  Failing row contains (null, 2).

This is actually the single biggest difference between these two types of constraints. Keep that in mind.
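For contrast, here is a quick sketch (continuing the same table) of the flip side: the UNIQUE column b accepts repeated NULLs, because NULL values never compare as equal to each other:

test=# INSERT INTO t_sample VALUES (2, NULL);

INSERT 0 1

test=# INSERT INTO t_sample VALUES (3, NULL);

INSERT 0 1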

Using foreign keys

The next logical question which arises is: What does that mean for foreign keys? Does it make a

[...]
Posted by Robert Haas in EDB on 2021-11-29 at 20:39
When PostgreSQL needs to sort strings, it relies on either the operating system (by default) or the ICU collation library (if your PostgreSQL has been built with support for ICU and you have chosen to use an ICU-based collation) to tell it in what order the strings ought to be sorted. Unfortunately, operating system behaviors are confusing and inconsistent, and they change relatively frequently for reasons that most people can't understand. That's a problem for PostgreSQL users, especially PostgreSQL users who create indexes on text columns. The first step in building a btree index is to sort the data, and if this sort order differs from the one used for later index lookups, data that is actually present in the index may not be found, and your queries may return wrong answers. Read more »
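As a hedged illustration (this assumes a PostgreSQL build with ICU support; the names and locale are made up), pinning an explicit ICU collation on a column ties the sort order of the index to the ICU library rather than to the operating system's locale data:

-- Requires an ICU-enabled build; 'de-DE' is only an example locale.
CREATE COLLATION german_icu (provider = icu, locale = 'de-DE');

CREATE TABLE people (name text COLLATE german_icu);
CREATE INDEX people_name_idx ON people (name);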
Posted by Franck Pachot in Yugabyte on 2021-11-29 at 17:42

This post exposes some basic examples of one of the greatest features of PostgreSQL: text search. In standard SQL, we can use LIKE, SIMILAR TO and regular expressions. But general text search cannot be optimized with simple B-Tree indexes on the column value. Text contains words, and indexing the text as a whole is not sufficient. Fortunately, PostgreSQL provides many index types, and one of them is GIN - Generalized Inverted Index. We can index the words, automatically, with functions to extract them as "tsvector" - Text Search vectors.
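To make that concrete, here is a minimal sketch with invented names (separate from the demo further down): an expression GIN index on to_tsvector() is what lets a word search use the index.

CREATE TABLE blog (id bigint PRIMARY KEY, body text);
CREATE INDEX blog_fts ON blog USING GIN (to_tsvector('simple', body));

-- The @@ operator matches a tsquery against the indexed tsvector expression.
SELECT id FROM blog
 WHERE to_tsvector('simple', body) @@ to_tsquery('simple', 'postgres & index');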

If you are an advanced user of PostgreSQL, you will probably not learn new things here, except if you want to see how I use it, with views and stored functions to encapsulate the functions. If you are a user of Oracle or SQL Server, you know the idea but may be surprised by how easy it is to use in an open source database. If you are a user of ElasticSearch, you may see that for simple searches, SQL databases can provide this without an additional service.

My goal here is to show that we can use the same on the latest version of YugabyteDB (I'm using 2.11 here). YugabyteDB is a distributed SQL database that reuses the PostgreSQL query layer, which means that many features come without additional effort. However, the distributed storage is different from monolithic Postgres, using LSM Trees instead of B-Trees and heap tables. The YugabyteDB YBGIN index is similar to the PostgreSQL GIN index, but implemented on top of LSM Tree indexes.

PostgreSQL: HEAP, BTREE and GIN

In PostgreSQL, here is how you define a HEAP table and a GIN index:

postgres=# create table demo
           (id bigint primary key, description text)
           USING HEAP;
CREATE TABLE

postgres=# create index demo_index on demo
           ( length(description) );
CREATE INDEX

postgres=# create index demo_gin on demo
           USING GIN
           ( (to_tsvector('simple',description)) );
CREATE INDEX

postgres=# select relname,reltype,amname 
           from pg_class left outer join pg_am
           on pg_class.relam=pg_am.
[...]
PostgreSQL-14 Database Monitoring and Logging Enhancements

PostgreSQL-14 was released in September 2021, and it contained many performance improvements and feature enhancements, including some features from a monitoring perspective. As we know, monitoring is the key element of any database management system, and PostgreSQL keeps updating and enhancing the monitoring capabilities. Here are some key ones in PostgreSQL-14.

Query Identifier

The query identifier is used to identify a query and can be cross-referenced between extensions. Prior to PostgreSQL-14, extensions used an algorithm to calculate the query_id; usually the same algorithm was used, but any extension could use its own. PostgreSQL-14 now optionally provides a query_id computed in the core, and monitoring extensions and utilities like pg_stat_activity, EXPLAIN, and pg_stat_statements use this query_id instead of calculating their own. This query_id can also be seen in the csvlog, after specifying it in log_line_prefix. From a user perspective, there are two benefits of this feature.

  • All the utilities/extensions will use the same query_id calculated by the core, which makes it easy to cross-reference this query_id. Previously, all the utilities/extensions needed to use the same algorithm in their code to achieve this capability.
  • The second benefit is that extensions/utilities can use the query_id calculated by the core and don’t need to calculate it again, which is a performance benefit.

PostgreSQL-14 introduces a new GUC configuration parameter, compute_query_id, to enable/disable this feature. The default is auto; this can be turned on/off in the postgresql.conf file, or using the SET command.
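As a small sketch (assuming superuser access), enabling it for the whole cluster rather than per session could look like this:

-- Enable in-core query ID computation for all sessions.
ALTER SYSTEM SET compute_query_id = on;
SELECT pg_reload_conf();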

  • pg_stat_activity

SET compute_query_id = off;

SELECT datname, query, query_id FROM pg_stat_activity;
 datname  |                                 query                                 | query_id 
----------+-----------------------------------------------------------------------+----------
 postgres | select datname, query, query_id from pg_stat_activity;                | 
[...]

The case

I have been involved in a case where an important application has been behaving erratically with regard to execution time. This is the background.

The application is a customised installation of a very well-known analytics suite in use at a very large Dutch shipping and logistics company. It has been deployed in AWS, with the backing storage provided by a rather large Postgres RDS instance in both the acceptance and the production environment, the former being a slimmed-down installation compared to the latter. The UI is web-based.
The company I work with is in charge of the cloud infra, and that alone.

The particular issue happens when browsing a list of postal addresses. This function is used to browse and search among about 9 million postal addresses all around the country (the Netherlands, for the record). In my personal opinion and experience this is not a really large data set. At the very start, this function displays the first 100 addresses with the usual search boxes, paging widgets and a number that represents the overall count. You can click on an address and get an expanded view. For some unclear reason the problem has been handed to the company responsible for the infrastructure.

In acceptance the function seems to work quite well 95% of the time: the first page appears almost instantaneously, but 5 times out of 100 it can take up to two minutes (yes, 120 seconds!) before displaying any data. In production, counterintuitively, it’s the other way around: the startup takes a very long time to display the first page of data. Subsequent pages appear almost instantaneously in either environment.

The whole investigation was time-boxed to just 1 working day.

First ask for the logs!

As usual I tried to gather as many details as possible about the application itself. It’s a name I have heard and seen at almost all my previous jobs over the past 10 years, but this is my very first time looking at it this deeply.
According to the customer the specif

[...]
Posted by Andreas 'ads' Scherbaum on 2021-11-29 at 06:00
PostgreSQL Person of the Week Interview with Fabien Coelho: I’m French, born and raised in Paris over 50 years ago. I work in Fontainebleau and Paris, and live in the Centre Val de Loire region, along the Loire river.

Introduction

In a previous post, I explained how we use transition tables in our implementation of Incremental View Maintenance (IVM) on PostgreSQL. Transition tables are a feature of AFTER triggers which allows trigger functions to refer to the changes to a table that occurred in a statement. We use transition tables in order to extract the table changes needed to calculate the changes to be applied to views.
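As a generic refresher (a sketch of mine, not the IVM implementation itself), a statement-level AFTER trigger declares a transition table with the REFERENCING clause and can then query every row touched by the statement:

CREATE FUNCTION log_insert_count() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- new_rows is the transition table holding all rows inserted by the statement.
    RAISE NOTICE 'inserted % row(s)', (SELECT count(*) FROM new_rows);
    RETURN NULL;
END;
$$;

CREATE TRIGGER r_insert_audit
    AFTER INSERT ON r
    REFERENCING NEW TABLE AS new_rows
    FOR EACH STATEMENT
    EXECUTE FUNCTION log_insert_count();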

In this article I describe a more complicated situation, specifically how we handle transition tables when multiple tables are modified in a single statement.

Single Table Modification

In a case where a single table is modified in a statement, the view maintenance process is simple. For example, suppose we have three tables R, S, and T. We also define a materialized view V = R ⨝ S ⨝ T that joins these tables as below:

SELECT x,y,z FROM R,S,T WHERE R.i=S.i AND S.j=T.j;

Then, suppose that table R was modified in a statement. This operation can be written as R ← R ∸ ∇R ⊎ ΔR, where ∇R is a bag of tuples deleted from R, and ΔR is a bag of tuples inserted into R in this statement. In this case, the changes are calculated as ∇V = ∇R ⨝ S ⨝ T and ΔV = ΔR ⨝ S ⨝ T, and we can update the view as V ← V ∸ ∇V ⊎ ΔV. The SQL representation of these calculations is as follows:

-- ∇V: tuples to be deleted from the view
SELECT x,y,z FROM R_old,S,T WHERE R_old.i=S.i AND S.j=T.j;
-- ΔV: tuples to be inserted into the view
SELECT x,y,z FROM R_new,S,T WHERE R_new.i=S.i AND S.j=T.j;

where R_old and R_new are transition tables corresponding to ∇R and ΔR, respectively.

Multiple Tables Modification

Now, let’s see cases where multiple tables are modified in a statement. You can observe this when you use data-modifying CTEs (WITH clauses), like:

WITH i1 AS (INSERT INTO R VALUES(1,10) RETURNING 1), 
i2 AS (INSERT INTO S VALUES(1,100) RETURNING 1)
SELECT;

In addition, multiple tables can be updated when you use triggers or foreign key constraints.

Pre-Update State of Tables

At that time, we need the state

[...]
Posted by David Z in Highgo Software on 2021-11-27 at 00:31

1. Overview

I have been working on an internal project based on PostgreSQL for a while, and from time to time, I need to run some specific test cases to verify my changes. Here, I want to share a tip for running a specific regression TAP test quickly, especially when you are focusing on a particular bug and you know which test case can help verify the fix. A detailed document about the regression tests can be found at Running the Tests.

2. Regression test

PostgreSQL provides a comprehensive set of regression tests to verify the SQL implementation embedded in PostgreSQL as well as the extended capabilities of PostgreSQL. Whenever you make some changes, you should run these existing test cases to make sure your change doesn’t break any existing features. Other than these regression tests, there are some special features tested using a framework called TAP tests, for example kerberos, ssl, recovery, etc.

If you want to run these tests, you have to make sure the option --enable-tap-tests has been configured, for example:
./configure --prefix=$HOME/pgapp --enable-tap-tests --enable-debug CFLAGS="-g3 -O0 -fno-omit-frame-pointer"

You can run the TAP tests using either make check or make installcheck, but compared with the non-TAP tests, the difference is that these TAP tests will always start a test server, even when you run make installcheck. Because of this difference, some tests may take longer than you expect, and even worse, if some test case fails in the middle then the entire test run stops, and your test cases may never get the chance to run. For example, I changed something related to the recovery features, and those changes are supposed to be tested by the test cases 021_row_visibility.pl and 025_stuck_on_old_timeline.pl, but whenever I run make check or make installcheck, it ends up with something like below.

t/001_stream_rep.pl .................. ok     
t/002_archiving.pl ................... ok   
t/003_recovery_targets.pl ............ ok   
t/004_timeline_switch.pl ............. ok   
t/005_rep
[...]
On 3rd of August 2021, Tom Lane committed patch: Add assorted new regexp_xxx SQL functions.   This patch adds new functions regexp_count(), regexp_instr(), regexp_like(), and regexp_substr(), and extends regexp_replace() with some new optional arguments. All these functions follow the definitions used in Oracle, although there are small differences in the regexp language due to using … Continue reading "Waiting for PostgreSQL 15 – Add assorted new regexp_xxx SQL functions."
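A few hedged examples of what the new functions look like (the results noted in the comments are what I would expect from the Oracle-style definitions):

SELECT regexp_count('banana', 'an');      -- 2: number of non-overlapping matches
SELECT regexp_like('banana', '^ba');      -- true: does the string match the pattern?
SELECT regexp_instr('banana', 'an');      -- 2: position of the first match
SELECT regexp_substr('banana', 'an+a');   -- 'ana': the first matching substring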

An ugly way to introspect database changes.

Monitoring Schema Changes via Last Commit Timestamp

A few days ago, a colleague of mine showed me that a commercial database keeps track of the last DDL change timestamp for database objects.
I began to mumble… is that possible in PostgreSQL? Of course it is, but what is the smartest way to achieve it?
I asked on the mailing list, because the first idea that came into my mind was to use commit timestamps.
Clearly, it is possible to implement something that can do the job using event triggers, which in short are triggers attached not to table tuples but to database events like DDL commands. Great! And in fact, a very good explanation can be found here.
In this article, I present my first idea about using commit timestamps.
The system used for the test is PostgreSQL 13.4 running on Fedora Linux, with only myself connected to it (this simplifies following transactions). The idea is, in any case, general and easy enough to be used on busy systems.

Introduction to pg_last_committed_xact()

The special function pg_last_committed_xact() allows the database administrator (or a user) to get information about which transaction has committed last.
Let’s see this in action:



% psql -U luca -h miguel -c 'select pg_last_committed_xact();'   testdb
ERROR:  could not get commit timestamp data
HINT:  Make sure the configuration parameter "track_commit_timestamp" is set.



First of all, in order to get information about committed transaction timestamps, the option track_commit_timestamp must be configured.
Turning the parameter on and off will not provide historic data; that is, even if you had the parameter on and then turned it off, you will not be able to access the previously collected data.
Let’s turn on the parameter and see how it works. track_commit_timestamp is a parameter with the postmaster context, and therefore requires a server restart!



% psql -U postgres -h miguel \
       -c 'ALTER SYSTEM SET track_com
[...]
Posted by Regina Obe in PostGIS on 2021-11-26 at 00:00

The PostGIS Team is pleased to release the second beta of the upcoming PostGIS 3.2.0 release.

Best served with PostgreSQL 14. This version of PostGIS utilizes the faster GiST building support API introduced in PostgreSQL 14. If compiled with recently released GEOS 3.10.1 you can take advantage of improvements in ST_MakeValid and numerous speed improvements. This release also includes many additional functions and improvements for postgis_raster and postgis_topology extensions.

Continue Reading by clicking title hyperlink ..
Posted by Luca Ferrari on 2021-11-24 at 00:00

pgenv 1.2.1 introduces a different configuration setup.

pgenv config migrate

Just a few hours after I blogged about some new cool features in pgenv, I completed the work on keeping the configuration in one place.
Now pgenv keeps all configuration files in a single directory, named config. This is useful because it allows you to back up and/or migrate all the configuration from one machine to another easily.
But that’s not all: since the configuration is now under a single directory, the individual configuration file names have changed. Before this release, a configuration file was named like .pgenv.PGVERSION.conf, with the .pgenv prefix both making the file hidden and stating which application the file belongs to. Since the configuration files are now in a subdirectory, the prefix has been dropped, so that every configuration file is now simply named PGVERSION.conf, like for example 10.4.conf.
And since we like to make things easy, there is a config migrate command that helps you move your existing configuration from the old naming scheme to the new one:



% pgenv config migrate
Migrated 3 configuration file(s) from previous versions (0 not migrated)
Your configuration file(s) are now into [~/git/misc/PostgreSQL/pgenv/config]



Let’s have fun with pgenv!

One of the many reasons "the answer is Postgres" is its extensibility.

The ability to extend Postgres has given rise to an ecosystem of Postgres extensions that change the behavior of the database to support a wide range of interesting capabilities. At Crunchy Data we are big fans of PostGIS, the geospatial extender for Postgres.

Another extension we are asked about often is TimescaleDB.

TimescaleDB is an open-source extension designed to make SQL scalable for time-series data. Timescale, Inc., the company behind TimescaleDB, provides an Apache licensed community edition of TimescaleDB that is packaged as a Postgres extension that provides automated partitioning across time and space.

We are often asked about the potential to deploy the Apache licensed community edition of TimescaleDB as an extension within our Crunchy PostgreSQL for Kubernetes using PGO, the open source Postgres Operator. We announced that we added the Apache licensed "community edition" of TimescaleDB to PGO 4.7, and we have brought TimescaleDB into PGO v5.

Let us look at how you can deploy the TimescaleDB extension as part of an HA Postgres cluster native to Kubernetes using the PGO Postgres Operator.

Deploying TimescaleDB on Kubernetes with PGO

good (?) reasons to use an entity-attribute-value design
© Laurenz Albe 2021

Customers have often asked me what I think of “Entity-attribute-value” (EAV) design. So I thought it would be a good idea to lay down my opinion in writing.

What is entity-attribute-value design?

The idea is not to create a table for each entity in the application. Rather, you store each attribute as a separate entry in an attribute table:

CREATE TABLE objects (
   objectid bigint PRIMARY KEY
   /* other object-level properties */
);

CREATE TABLE attstring (
   objectid bigint
      REFERENCES objects ON DELETE CASCADE NOT NULL,
   attname text NOT NULL,
   attval text,
   PRIMARY KEY (objectid, attname)
);

CREATE TABLE attint (
   objectid bigint
      REFERENCES objects ON DELETE CASCADE NOT NULL,
   attname text NOT NULL,
   attval integer,
   PRIMARY KEY (objectid, attname)
);

/* more tables for other data types */

The name of the model is derived from the “att...” tables, which have the three columns: “entity ID”, “attribute name” and “value”.

There are several variations of the basic theme, among them:

  • omit the objects table
  • add additional tables that define “object types”, so that each type can only have certain attributes

Why would anybody consider an entity-attribute-value design?

The principal argument I hear in support of the EAV design is flexibility. You can create new entity types without having to create a database table. Taken to the extreme, each entity can have different attributes.

I suspect that another reason for people to consider such a data model is that they are more familiar with key-value stores than with relational databases.

Performance considerations of entity-attribute-value design

In my opinion, EAV database design is the worst possible design when it comes to performance. You will never get good database performance with such a data model.

The only use cases where EAV shines are when it is used as a key-value store.
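To illustrate that point with the tables defined above (a sketch of mine, not a query from the article), a key-value style lookup of a single object's attributes stays cheap because it is served by the primary key:

-- Fetch all string attributes of one object; this uses the
-- (objectid, attname) primary key index of attstring.
SELECT attname, attval
  FROM attstring
 WHERE objectid = 42;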

INSERT

Inserting an entity will look like this:

INSERT INTO objects (o
[...]
Posted by Andreas 'ads' Scherbaum on 2021-11-22 at 14:00
PostgreSQL Person of the Week Interview with Pavel Luzanov: I live in Moscow, and work at Postgres Professional. I am responsible for educational projects.
Some time ago on Slack some person said: “varchar is better (storage efficiency), i recommend using it for less than 2048 chars, for the best : TEXT”. There was a discussion that followed; the person that claimed this efficiency never backed their claim, saying only that “VARChar takes much less ‘place’ than TEXT” … but have … Continue reading "Does varchar(n) use less disk space than varchar() or text?"
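A quick way to check such a claim yourself is pg_column_size(), which reports the number of bytes used to store a value; a small sketch (the sample string is arbitrary) typically shows identical sizes for all three types:

-- varchar(n), varchar and text share the same underlying storage format,
-- so all three expressions normally report the same size.
SELECT pg_column_size('some sample text'::varchar(2048)) AS vc_n,
       pg_column_size('some sample text'::varchar)       AS vc,
       pg_column_size('some sample text'::text)          AS txt;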
Posted by Shaun M. Thomas in EDB on 2021-11-19 at 09:55
In this week's PG Phriday, High Availability Architect Shaun Thomas explores some of the more advanced repmgr use cases that will bring your Postgres High Availability game to the next level. [Continue reading...]
Posted by Charly Batista in Percona on 2021-11-18 at 13:55
Index on Foreign Keys in PostgreSQL

Welcome to a weekly blog where I get to answer (like, really answer) some of the questions I’ve seen in the webinars I’ve presented lately. If you missed the latest one, PostgreSQL Performance Tuning Secrets, it might be helpful to give some of it a listen before or after you read this post. Each week, I’ll dive deep into one question. Let me know what you think in the comments.

We constantly hear that indexes improve read performance, and it’s usually true, but we also know that they will always have an impact on writes. What we don’t hear about too often is that in some cases an index may not give any performance improvement at all. This happens more than we want, and might happen more than we even notice, and foreign keys (FKs) are a great example. I’m not saying that all FK indexes are bad, but most of the ones I’ve seen are just unnecessary, only adding load to the system.

For example, take the relationship below, where we have a 1:N relationship between the table “Supplier” and the table “Product”:

[Image: foreign key relationship between the Supplier and Product tables]

If we pay close attention to the FK in this example, there won’t be a high number of lookups on the child table using the FK column (“SupplierID” in this example) compared with the number of lookups using “ProductID” and probably “ProductName”. The major usage will be to keep the relationship consistent and to search in the other direction, finding the supplier for a certain product. In this circumstance, adding an index on the FK column of the child table without making sure the access pattern requires it will just add the extra cost of updating the index every time we update the “Product” table.
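For context, here is a sketch of the kind of schema in the diagram (the names are illustrative): PostgreSQL automatically creates indexes for the primary keys, but not for the referencing column, so an index on the FK column is a deliberate, separate choice.

CREATE TABLE supplier (
    supplier_id   bigint PRIMARY KEY,
    supplier_name text NOT NULL
);

CREATE TABLE product (
    product_id   bigint PRIMARY KEY,
    product_name text NOT NULL,
    supplier_id  bigint NOT NULL REFERENCES supplier (supplier_id)
);

-- Optional, and only worth it if queries actually filter or join by supplier_id:
-- CREATE INDEX product_supplier_id_idx ON product (supplier_id);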

Another point we need to pay attention to is the index cardinality. If the index cardinality is too low, Postgres won’t use it and the index will just be ignored. One can ask why that happens, and whether it wouldn’t still be cheaper for the database to go, for example, through half of the index instead of doing a full table scan. The answer is no, especially for databases that use heap tables like Postgres. The table access in Postgres (h

[...]
Posted by Luca Ferrari on 2021-11-18 at 00:00

pgenv 1.2 introduces a few nice features.

New features in pgenv

pgenv is a great tool to simply manage different binary installations of PostgreSQL.
It is a shell script, specifically designed for the Bash shell, that provides a single command named pgenv that accepts sub-commands to fetch, configure, install, start and stop different PostgreSQL versions on the same machine.
It is not designed to be used in production or in an enterprise environment, even if it could, but rather it is designed to be used as a compact and simple way to switch between different versions in order to test applications and libraries.

In the last few weeks, there has been quite a lot of work around pgenv, most notably:

  • support for multiple configuration flags;
  • consistent behavior about configuration files.


In the following, I briefly describe each of the above.

Support for multiple configuration flags

pgenv does support configuration files, where you can store shell variables that drive the PostgreSQL build and configuration. One problem pgenv had was due to a limitation of shell environment variables: since they represent a single value, passing multiple values separated by spaces was not possible. This made build flags, e.g. CFLAGS, hard to write, if not impossible.
Since this commit, David (the original author) introduced the capability to configure options containing spaces. The trick was to switch from simple environment variables to Bash arrays, so that the configuration can be written as



PGENV_CONFIGURE_OPTIONS=(
    --with-perl
    --with-openssl
    'CFLAGS=-I/opt/local/opt/openssl/include -I/opt/local/opt/libxml2/include'
    'LDFLAGS=-L/opt/local/opt/openssl/lib -L/opt/local/opt/libxml2/lib'
)



where the CFLAGS and LDFLAGS both contain spaces.
To be consistent, this change also renamed a lot of _OPT_ parameters to _OPTIONS_ to reflect the fact that they can now contain multiple values.

Consistent behavior about configuration files

pgenv exploits a default co

[...]
Posted by Bruce Momjian in EDB on 2021-11-17 at 20:45

I presented a new slide deck, Enterprise Postgres Growth in Japan, at last week's Japan PostgreSQL User Group (JPUG) conference. I have been closely involved with the Japanese Postgres community for 20 years, and distilling lessons from my involvement was challenging. However, I was very happy with the result, and I think the audience benefited. I broke down the time into three periods, and concluded that the Japanese are now heavily involved in Postgres server development, and the community greatly relies on them.

Posted by Christoph Berg in credativ on 2021-11-17 at 15:46

pg_dirtyread

Earlier this week, I updated pg_dirtyread to work with PostgreSQL 14. pg_dirtyread is a PostgreSQL extension that allows reading "dead" rows from tables, i.e. rows that have already been deleted, or updated. Of course that works only if the table has not been cleaned-up yet by a VACUUM command or autovacuum, which is PostgreSQL's garbage collection machinery.

Here's an example of pg_dirtyread in action:

# create table foo (id int, t text);
CREATE TABLE
# insert into foo values (1, 'Doc1');
INSERT 0 1
# insert into foo values (2, 'Doc2');
INSERT 0 1
# insert into foo values (3, 'Doc3');
INSERT 0 1

# select * from foo;
 id │  t
────┼──────
  1 │ Doc1
  2 │ Doc2
  3 │ Doc3
(3 rows)

# delete from foo where id < 3;
DELETE 2

# select * from foo;
 id │  t
────┼──────
  3 │ Doc3
(1 row)

Oops! The first two documents have disappeared.

Now let's use pg_dirtyread to look at the table:

# create extension pg_dirtyread;
CREATE EXTENSION

# select * from pg_dirtyread('foo') t(id int, t text);
 id │  t
────┼──────
  1 │ Doc1
  2 │ Doc2
  3 │ Doc3

All three documents are still there, and now all of them are visible.

pg_dirtyread can also show PostgreSQL's system columns with the row location and visibility information. For the first two documents, xmax is set, which means the rows have been deleted:

# select * from pg_dirtyread('foo') t(ctid tid, xmin xid, xmax xid, id int, t text);
 ctid  │ xmin │ xmax │ id │  t
───────┼──────┼──────┼────┼──────
 (0,1) │ 1577 │ 1580 │  1 │ Doc1
 (0,2) │ 1578 │ 1580 │  2 │ Doc2
 (0,3) │ 1579 │    0 │  3 │ Doc3
(3 rows)

I always had plans to extend pg_dirtyread to include some "undelete" command to make deleted rows reappear, but never got around to trying that. But rows can already be restored by using the output of pg_dirtyread itself:

# insert into foo select * from pg_dirtyread('foo') t(id int, t text) where id = 1;

This is not a true "undelete", though - it just inserts new rows from the data read from the table.

pg_surgery

Enter pg

[...]