PostgreSQL
The world's most advanced open source database
Posted by David Z in Highgo Software on 2021-11-27 at 00:31

1. Overview

I have been working on an internal project based on PostgreSQL for a while, and from time to time I need to run some specific test cases to verify my changes. Here, I want to share a tip for running a specific regression TAP test quickly, especially when you are focusing on a particular bug and you know which test case can verify the fix. A detailed document about the regression tests can be found at Running the Tests.

2. Regression test

PostgreSQL provides a comprehensive set of regression tests to verify the SQL implementation embedded in PostgreSQL as well as its extended capabilities. Whenever you make a change, you should run these existing test cases to make sure your change doesn’t break any existing features. Besides these regression tests, some special features use a test framework called TAP tests, for example kerberos, ssl, and recovery.

If you want to run these tests, you have to make sure the option --enable-tap-tests has been configured, for example:
./configure --prefix=$HOME/pgapp --enable-tap-tests --enable-debug CFLAGS="-g3 -O0 -fno-omit-frame-pointer"

You can run the TAP tests using either make check or make installcheck, but compared with the non-TAP tests, the difference is that TAP tests always start a test server, even when you run make installcheck. Because of this difference, some tests may take longer than you expect, and even worse, if a test case fails in the middle, the entire run stops and your test cases may never get the chance to run. For example, I changed something related to the recovery features, and those changes are supposed to be covered by the test cases 021_row_visibility.pl and 025_stuck_on_old_timeline.pl, but whenever I run make check or make installcheck, it ends up with something like below.

t/001_stream_rep.pl .................. ok     
t/002_archiving.pl ................... ok   
t/003_recovery_targets.pl ............ ok   
t/004_timeline_switch.pl ............. ok   
t/005_rep
[...]
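For reference, the tip this post builds toward can be sketched with the PROVE_TESTS variable documented for PostgreSQL's TAP framework; assuming the two tests of interest live under src/test/recovery, something like the following runs only those files (a minimal sketch, not necessarily the post's exact commands):

cd src/test/recovery
# run only the two recovery TAP tests of interest
make check PROVE_TESTS='t/021_row_visibility.pl t/025_stuck_on_old_timeline.pl'
# installcheck accepts the same variable, though as noted above it still starts its own test server
make installcheck PROVE_TESTS='t/021_row_visibility.pl'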
On 3rd of August 2021, Tom Lane committed patch: Add assorted new regexp_xxx SQL functions.   This patch adds new functions regexp_count(), regexp_instr(), regexp_like(), and regexp_substr(), and extends regexp_replace() with some new optional arguments. All these functions follow the definitions used in Oracle, although there are small differences in the regexp language due to using … Continue reading "Waiting for PostgreSQL 15 – Add assorted new regexp_xxx SQL functions."

An ugly way to introspect database changes.

Monitoring Schema Changes via Last Commit Timestamp

A few days ago, a colleague of mine showed me that a commercial database keeps track of the last DDL change timestamp for database objects.
I began to mumble… is that possible in PostgreSQL? Of course it is, but what is the smartest way to achieve it?
I asked on the mailing list, because the first idea that came into my mind was to use commit timestamps.
Clearly, it is possible to implement something that can do the job using event triggers, which in short are triggers attached not to table tuples but to database events like DDL commands. Great! And in fact, a very good explanation can be found here.
In this article, I present my first idea about using commit timestamps.
The system used for the test is PostgreSQL 13.4 running on Fedora Linux, with only myself connected to it (this simplifies following the transactions). The idea is, in any case, general and simple enough to be used on busy systems.

Introduction to pg_last_committed_xact()

The special function pg_last_committed_xact() allows the database administrator (or a user) to get information about the transaction that committed last.
Let’s see this in action:



% psql -U luca -h miguel -c 'select pg_last_committed_xact();'   testdb
ERROR:  could not get commit timestamp data
HINT:  Make sure the configuration parameter "track_commit_timestamp" is set.



First of all, in order to get information about committed transaction timestamps, the option track_commit_timestamp must be configured.
Turning the parameter on and off will not provide historical data: even if you had the parameter on and then turned it off, you will not be able to access the previously collected data.
Let’s turn the parameter on and see how it works. track_commit_timestamp is a parameter with the postmaster context, and therefore requires a server restart!



% psql -U postgres -h miguel \
       -c 'ALTER SYSTEM SET track_com
[...]
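The post is cut off here by the feed; as a hedged sketch of where it is heading, the parameter can be enabled and, after the required restart, the function starts returning data (output shown only as a comment, actual values will differ):

-- enable commit timestamp tracking; postmaster context, so a restart is needed
ALTER SYSTEM SET track_commit_timestamp = 'on';
-- after restarting the server, commit any transaction (e.g. a DDL) and then:
SELECT * FROM pg_last_committed_xact();
-- returns the xid and the commit timestamp of the last committed transaction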
Posted by Regina Obe in PostGIS on 2021-11-26 at 00:00

The PostGIS Team is pleased to release the second beta of the upcoming PostGIS 3.2.0 release.

Best served with PostgreSQL 14. This version of PostGIS utilizes the faster GiST index-building support API introduced in PostgreSQL 14. If compiled with the recently released GEOS 3.10.1, you can take advantage of improvements in ST_MakeValid and numerous speed improvements. This release also includes many additional functions and improvements for the postgis_raster and postgis_topology extensions.

Continue Reading by clicking title hyperlink ..
Posted by Luca Ferrari on 2021-11-24 at 00:00

pgenv 1.2.1 introduces a different configuration setup.

pgenv config migrate

Just a few hours after I blogged about some new cool features in pgenv, I completed the work on keeping the configuration in one place.
Now pgenv keeps all configuration files in a single directory, named config. This is useful because it allows you to back up and/or migrate all the configuration from one machine to another easily.
But that is not all: since the configuration now lives under a single directory, the individual configuration file names have changed. Before this release, a configuration file was named like .pgenv.PGVERSION.conf, with the .pgenv prefix both making the file hidden and stating which application the file belongs to. Since the configuration files are now in a subdirectory, the prefix has been dropped, so every configuration file is now simply named PGVERSION.conf, for example 10.4.conf.
And since we like to make things easy, there is a config migrate command that helps you move your existing configuration from the old naming scheme to the new one:



% pgenv config migrate
Migrated 3 configuration file(s) from previous versions (0 not migrated)
Your configuration file(s) are now into [~/git/misc/PostgreSQL/pgenv/config]



Let’s have fun with pgenv!

One of the many reasons "the answer is Postgres" is its extensibility.

The ability to extend Postgres has given rise to an ecosystem of Postgres extensions that change the behavior of the database to support a wide range of interesting capabilities. At Crunchy Data we are big fans of PostGIS, the geospatial extender for Postgres.

Another extension we are asked about often is TimescaleDB.

TimescaleDB is an open-source extension designed to make SQL scalable for time-series data. Timescale, Inc., the company behind TimescaleDB, provides an Apache licensed community edition of TimescaleDB that is packaged as a Postgres extension that provides automated partitioning across time and space.

We are often asked about the potential to deploy the Apache licensed community edition of TimescaleDB as an extension within our Crunchy PostgreSQL for Kubernetes using PGO, the open source Postgres Operator. We announced that we added the Apache licensed "community edition" of TimescaleDB to PGO 4.7, and we have brought TimescaleDB into PGO v5.

Let us look at how you can deploy the TimescaleDB extension as part of an HA Postgres cluster native to Kubernetes using the PGO Postgres Operator.

Deploying TimescaleDB on Kubernetes with PGO
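The walkthrough itself is truncated in this feed; as a minimal hedged sketch independent of the operator details, once the cluster image preloads the timescaledb library (via shared_preload_libraries in the Postgres configuration), enabling it in a database comes down to the following (the metrics table name is purely illustrative):

-- assumes shared_preload_libraries already includes 'timescaledb' and the server was restarted
CREATE EXTENSION IF NOT EXISTS timescaledb;
-- turn an ordinary table into a hypertable partitioned on its time column
SELECT create_hypertable('metrics', 'time');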

good (?) reasons to use an entity-attribute-value design
© Laurenz Albe 2021

Customers have often asked me what I think of “Entity-attribute-value” (EAV) design. So I thought it would be a good idea to lay down my opinion in writing.

What is entity-attribute-value design?

The idea is not to create a table for each entity in the application. Rather, you store each attribute as a separate entry in an attribute table:

CREATE TABLE objects (
   objectid bigint PRIMARY KEY
   /* other object-level properties */
);

CREATE TABLE attstring (
   objectid bigint
      REFERENCES objects ON DELETE CASCADE NOT NULL,
   attname text NOT NULL,
   attval text,
   PRIMARY KEY (objectid, attname)
);

CREATE TABLE attint (
   objectid bigint
      REFERENCES objects ON DELETE CASCADE NOT NULL,
   attname text NOT NULL,
   attval integer,
   PRIMARY KEY (objectid, attname)
);

/* more tables for other data types */

The name of the model is derived from the “att...” tables, which have three columns: “entity ID”, “attribute name” and “value”.

There are several variations of the basic theme, among them:

  • omit the objects table
  • add additional tables that define “object types”, so that each type can only have certain attributes

Why would anybody consider an entity-attribute-value design?

The principal argument I hear in support of the EAV design is flexibility. You can create new entity types without having to create a database table. Taken to the extreme, each entity can have different attributes.

I suspect that another reason for people to consider such a data model is that they are more familiar with key-value stores than with relational databases.

Performance considerations of entity-attribute-value design

In my opinion, EAV database design is the worst possible design when it comes to performance. You will never get good database performance with such a data model.

The only use cases where EAV shines are when it is used as a key-value store.

INSERT

Inserting an entity will look like this:

INSERT INTO objects (o
[...]
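The statement above is truncated by the feed; a sketch of what inserting a single entity with one string attribute and one integer attribute looks like against the tables defined above (the values are illustrative):

INSERT INTO objects (objectid) VALUES (42);

INSERT INTO attstring (objectid, attname, attval)
VALUES (42, 'name', 'widget');

INSERT INTO attint (objectid, attname, attval)
VALUES (42, 'weight', 13);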
Posted by Andreas 'ads' Scherbaum on 2021-11-22 at 14:00
PostgreSQL Person of the Week Interview with Pavel Luzanov: I live in Moscow, and work at Postgres Professional. I am responsible for educational projects.
Some time ago on Slack, someone said: varchar is better (storage efficiency), i recommend using it for less than 2048 chars, for the best : TEXT. In the discussion that followed, the person who claimed this efficiency never backed it up, saying only that: VARChar takes much less ‘place' than TEXT … but have … Continue reading "Does varchar(n) use less disk space than varchar() or text?"
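The teaser does not show a measurement, but the claim is easy to check yourself; a hedged sketch using pg_column_size(), which is expected to report the same size for identical content regardless of the declared type:

SELECT pg_column_size('hello world'::text)        AS text_size,
       pg_column_size('hello world'::varchar)     AS varchar_size,
       pg_column_size('hello world'::varchar(50)) AS varchar_n_size;
-- the three values should be identical: varchar(n), varchar and text
-- share the same underlying varlena storage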
Posted by Shaun M. Thomas in EDB on 2021-11-19 at 09:55
In this week's PG Phriday, High Availability Architect Shaun Thomas explores some of the more advanced repmgr use cases that will bring your Postgres High Availability game to the next level. [Continue reading...]
Posted by Charly Batista in Percona on 2021-11-18 at 13:55
Index on Foreign Keys in PostgreSQL

Welcome to a weekly blog where I get to answer (like, really answer) some of the questions I’ve seen in the webinars I’ve presented lately. If you missed the latest one, PostgreSQL Performance Tuning Secrets, it might be helpful to give some of it a listen before or after you read this post. Each week, I’ll dive deep into one question. Let me know what you think in the comments. 

We constantly hear that indexes improve read performance, and that's usually true, but we also know that they always have an impact on writes. What we don't hear about too often is that in some cases an index may not give any performance improvement at all. This happens more often than we want, and probably more often than we notice, and foreign keys (FKs) are a great example. I'm not saying that all FK indexes are bad, but most of the ones I've seen are simply unnecessary, only adding load to the system.

For example, consider the relationship below, where we have a 1:N relationship between the table “Supplier” and the table “Product”:

[Figure: the 1:N foreign key relationship between Supplier and Product]

If we pay close attention to the FK in this example, there won't be a high number of lookups on the child table using the FK column (“SupplierID” here) compared with the number of lookups using “ProductID” and probably “ProductName”. Its major purpose is to keep the relationship consistent and to search in the other direction, finding the supplier of a certain product. In this circumstance, adding an index on the FK child column without confirming that the access pattern requires it only adds the extra cost of updating the index every time we update the “Product” table.

Another point we need to pay attention to is index cardinality. If the index cardinality is too low, Postgres won't use it, and the index will simply be ignored. One might ask why that happens: wouldn't it still be cheaper for the database to go through, say, half of the index instead of doing a full table scan? The answer is no, especially for databases that use heap tables, as Postgres does. The table access in Postgres (h

[...]
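To make the example concrete, here is a hedged sketch of the schema being discussed (names are illustrative); note that PostgreSQL automatically indexes the primary keys but not the referencing SupplierID column, so an index there is always an explicit choice:

CREATE TABLE supplier (
    supplierid   bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    suppliername text NOT NULL
);

CREATE TABLE product (
    productid   bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    productname text NOT NULL,
    supplierid  bigint NOT NULL REFERENCES supplier (supplierid)
);

-- only add this if the access pattern really looks up products by supplier:
-- CREATE INDEX product_supplierid_idx ON product (supplierid);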
Posted by Luca Ferrari on 2021-11-18 at 00:00

pgenv 1.2 introduces a few nice features.

New features in pgenv

pgenv is a great tool to simply manage different binary installations of PostgreSQL.
It is a shell script, specifically designed for the Bash shell, that provides a single command named pgenv that accepts sub-commands to fetch, configure, install, start and stop different PostgreSQL versions on the same machine.
It is not designed to be used in production or in an enterprise environment, even though it could be; rather, it is designed as a compact and simple way to switch between different versions in order to test applications and libraries.

In the last few weeks, there has been quite some work around pgenv, most notably:

  • support for multiple configuration flags;
  • consistent behavior about configuration files.


In the following, I briefly describe each of the above.

Support for multiple configuration flags

pgenv does support configuration files, where you can store shell variables that drive the PostgreSQL build and configuration. One problem pgenv had was due to a limitation of shell environment variables: since they hold a single value, passing multiple values separated by spaces was not possible. This made build flags, e.g., CFLAGS, hard if not impossible to write.
Since this commit, David (the original author) introduced the capability to configure options containing spaces. The trick was to switch from simple environment variables to Bash arrays, so that the configuration can be written as



PGENV_CONFIGURE_OPTIONS=(
    --with-perl
    --with-openssl
    'CFLAGS=-I/opt/local/opt/openssl/include -I/opt/local/opt/libxml2/include'
    'LDFLAGS=-L/opt/local/opt/openssl/lib -L/opt/local/opt/libxml2/lib'
)



where the CFLAGS and LDFLAGS both contain spaces.
To be consistent, this change also renamed a lot of _OPT_ parameters to _OPTIONS_ to reflect the fact that they can now contain multiple values.

Consistent behavior about configuration files

pgenv exploits a default co

[...]
Posted by Bruce Momjian in EDB on 2021-11-17 at 20:45

I presented a new slide deck, Enterprise Postgres Growth in Japan, at last week's Japan PostgreSQL User Group (JPUG) conference. I have been closely involved with the Japanese Postgres community for 20 years, and distilling lessons from my involvement was challenging. However, I was very happy with the result, and I think the audience benefited. I broke down the time into three periods, and concluded that the Japanese are now heavily involved in Postgres server development, and the community greatly relies on them.

Posted by Christoph Berg in credativ on 2021-11-17 at 15:46

pg_dirtyread

Earlier this week, I updated pg_dirtyread to work with PostgreSQL 14. pg_dirtyread is a PostgreSQL extension that allows reading "dead" rows from tables, i.e. rows that have already been deleted or updated. Of course, that works only if the table has not been cleaned up yet by a VACUUM command or autovacuum, which is PostgreSQL's garbage collection machinery.

Here's an example of pg_dirtyread in action:

# create table foo (id int, t text);
CREATE TABLE
# insert into foo values (1, 'Doc1');
INSERT 0 1
# insert into foo values (2, 'Doc2');
INSERT 0 1
# insert into foo values (3, 'Doc3');
INSERT 0 1

# select * from foo;
 id │  t
────┼──────
  1 │ Doc1
  2 │ Doc2
  3 │ Doc3
(3 rows)

# delete from foo where id < 3;
DELETE 2

# select * from foo;
 id │  t
────┼──────
  3 │ Doc3
(1 row)

Oops! The first two documents have disappeared.

Now let's use pg_dirtyread to look at the table:

# create extension pg_dirtyread;
CREATE EXTENSION

# select * from pg_dirtyread('foo') t(id int, t text);
 id │  t
────┼──────
  1 │ Doc1
  2 │ Doc2
  3 │ Doc3

All three documents are still there, and now all of them are visible.

pg_dirtyread can also show PostgreSQL's system columns with the row location and visibility information. For the first two documents, xmax is set, which means the row has been deleted:

# select * from pg_dirtyread('foo') t(ctid tid, xmin xid, xmax xid, id int, t text);
 ctid  │ xmin │ xmax │ id │  t
───────┼──────┼──────┼────┼──────
 (0,1) │ 1577 │ 1580 │  1 │ Doc1
 (0,2) │ 1578 │ 1580 │  2 │ Doc2
 (0,3) │ 1579 │    0 │  3 │ Doc3
(3 rows)

I always had plans to extend pg_dirtyread to include some "undelete" command to make deleted rows reappear, but never got around to trying that. But rows can already be restored by using the output of pg_dirtyread itself:

# insert into foo select * from pg_dirtyread('foo') t(id int, t text) where id = 1;

This is not a true "undelete", though - it just inserts new rows from the data read from the table.

pg_surgery

Enter pg

[...]
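The pg_surgery section is cut off by the feed; as a hedged sketch of what that PostgreSQL 14 contrib module provides (the ctid values below are illustrative, in the style of the pg_dirtyread output above):

CREATE EXTENSION pg_surgery;

-- forcibly mark specific heap tuples as dead
SELECT heap_force_kill('foo'::regclass, ARRAY['(0,1)']::tid[]);

-- or forcibly freeze specific heap tuples
SELECT heap_force_freeze('foo'::regclass, ARRAY['(0,3)']::tid[]);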
On 27th of October 2021, Amit Kapila committed patch: Allow publishing the tables of schema.   A new option "FOR ALL TABLES IN SCHEMA" in Create/Alter Publication allows one or more schemas to be specified, whose tables are selected by the publisher for sending the data to the subscriber.   The new syntax allows specifying … Continue reading "Waiting for PostgreSQL 15 – Allow publishing the tables of schema."

This post explains how to install PostgreSQL on WSL2 for Windows, apply the necessary changes to PostgreSQL settings, and access the database from the Windows host. Even though this knowledge can be found in different bits and pieces spread out all over the internet, I want to compile a short and straightforward how-to article. I want you to be able to complete all the steps from scratch, without having to skip all over the place.

Why do I need PostgreSQL on WSL2?

Even though there is a strong feeling that a true programmer uses Linux in their work, this statement is not really close to the truth. At least, according to this Stack Overflow survey 2021:

[Chart: "What is the primary operating system in which you work?" (Stack Overflow survey 2021)]

There are a ton of reasons why a developer might want to use WSL2 with PostgreSQL onboard, but let’s name a few:

  • psql is the standard tool for learning and working with PostgreSQL. However, there are some limiting issues under Windows, e.g., the lack of tab completion, issues with encoding, etc. Running psql under WSL2 will provide you with a smoother experience.
  • It’s a good idea to test and debug your application in a remote environment rather than on a local host. That way, you can immediately find issues with client authentication, or with connection settings. Since WSL2 is a standalone virtual machine under the hood, using it might be the easiest way to achieve this.
  • WSL2 will provide the environment for advanced developers to build and test different PostgreSQL extensions not available in binary form or created exclusively for Linux, e.g., pg_squeeze, pg_show_plans, pg_crash, pg_partman, etc.

Install WSL2

To install WSL2 from PowerShell or the Windows Command Prompt, just run:

PS> wsl --install

From the manual:

  • This command will enable the required optional components, download the latest Linux kernel, set WSL2 as your default, and install a Ubuntu distribution for you by default.
  • The first time you launch a newly installed Linux distribution, a console window will open and you’ll be asked to wai
[...]
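The article is cut off here; as a hedged sketch of the usual next steps (the paths assume Ubuntu's default packaging inside WSL2 and may differ from the article's exact instructions):

# inside the WSL2 Ubuntu shell: install PostgreSQL from the distribution packages
sudo apt update && sudo apt install -y postgresql

# listen on all interfaces so the Windows host can connect
sudo sed -i "s/^#listen_addresses.*/listen_addresses = '*'/" /etc/postgresql/*/main/postgresql.conf
# allow password (SCRAM) logins from outside localhost
echo "host all all 0.0.0.0/0 scram-sha-256" | sudo tee -a /etc/postgresql/*/main/pg_hba.conf

sudo service postgresql restart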
Posted by Álvaro Hernández in OnGres on 2021-11-15 at 19:05

Easily Running Babelfish for PostgreSQL on Kubernetes

TL;DR

Babelfish for PostgreSQL (“Babelfish” in short) is an open source project created by Amazon AWS that adds SQL Server compatibility on top of Postgres. It was made public a couple of weeks ago, both as a managed service and as an open source project. Using the latter involves, as of today, compiling it from source, which requires some effort and expertise. To contribute to a better user experience, the upcoming StackGres 1.1.0 release has added Babelfish, making it trivially easy to run Babelfish on Kubernetes.

If, for any reason, you don’t have a Kubernetes cluster handy, and/or you are not familiar with Kubernetes, please jump first to the Appendix in the last section, where I show how to get a lightweight Kubernetes cluster up & running in 1 minute.

Babelfish for PostgreSQL

Around one year ago, Amazon surprised us all by announcing Babelfish for PostgreSQL, a project that would bring a SQL Server compatibility layer on top of Postgres. Babelfish would become both an AWS managed service (on Aurora); as well as an open source project! I then blogged about it, knowing that this was a disruption point for Postgres. Babelfish enables Postgres to reach out to many other use cases, users and Communities: the SQL Server ecosystem.

Adding yet another capability to Postgres reflects on the thoughts shared by Stephen O’Grady on a recent post, A Return to the General Purpose Database. Postgres is not only a feature-full relational database; but with its extensions, it’s also a time-series database; a sharded database; a graph database; and now, also a SQL Server-compatible database. Postgres is, and will be, the unifying database for almost every imaginable database workload.

At StackGres we were almost literally biting our nails, waiting for Babelfish to finally be published as open source. It finally happened a couple of weeks ago. Since then, our team has been working tirelessly to give you an easy way to run Babelfish on Kub

[...]
Posted by Franck Pachot in Yugabyte on 2021-11-15 at 18:24

In IT, like in math, a negative test can prove that an assertion is wrong, but a positive test doesn't prove that the assertion is right. It is worse in IT, because an algorithm may show an apparently correct result with a probability that is higher than just being lucky:

postgres=# create table demo (n int primary key);
CREATE TABLE

postgres=# insert into demo  
           select n from generate_series(1,1e6) n;
INSERT 0 1000000

postgres=# select n from demo limit 10;

 n
----
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
(10 rows)

With this example, you may think that a SELECT returns the rows in order. This is wrong. The SQL language is declarative: without an ORDER BY, the row order is not guaranteed. The apparently sorted result here is just a side effect of inserting into heap tables by appending at the end of the file, and of reading the file from beginning to end with one thread. The rows are displayed as they are fetched.

When I insert the rows in another order:

postgres=# drop table if exists demo;
DROP TABLE
postgres=# create table demo (n int primary key);
CREATE TABLE
postgres=# insert into demo  select 1e6-n from generate_series(1,1e6) n;
INSERT 0 1000000
postgres=# select n from demo limit 10;
   n
--------
 999999
 999998
 999997
 999996
 999995
 999994
 999993
 999992
 999991
 999990
(10 rows)

they come as they were stored. This is typical of a serial SeqScan:

postgres=# explain select n from demo limit 10;
                             QUERY PLAN
-------------------------------------------------------------
 Limit  (cost=0.00..0.14 rows=10 width=4)
   ->  Seq Scan on demo  (cost=0.00..14425.00 rows=1000000 width=4)
(2 rows)

You cannot rely on any of this behavior. With another execution plan, this order may change:

postgres=# set enable_seqscan=false;
SET
postgres=# select n from demo limit 10;
 n
--------
 0
 1
 2
 3
 4
 5
 6
 7
 8
 9
(10 rows)

postgres=# explain select n from demo limit 10;

                                        QUERY PLAN
---------------------------------
[...]
Posted by Andreas 'ads' Scherbaum on 2021-11-15 at 14:00
PostgreSQL Person of the Week Interview with Louise Grandjonc: I’m Louise, I come from France and recently moved to Vancouver (Canada, not Washington, I was not aware there was another Vancouver before announcing I was moving, and several US people asked) to work at Crunchy Data as a senior software engineer.
Posted by Gilles Darold in MigOps on 2021-11-15 at 13:56

While migrating from other databases, like Oracle, to PostgreSQL, we may come across some functionality that is only available with a complex workaround. At MigOps, we have migrated many customers from Oracle to PostgreSQL, and I have personally seen many scenarios where a direct equivalent is not possible. One of them was while working with regular expressions. There are many regexp functions that are currently supported by PostgreSQL; however, some regexp functions are still missing. For this reason, I have contributed a patch, and with it, PostgreSQL 15 will include some more regexp functions. In this article, I am going to talk about the new regexp functions that will appear in PostgreSQL 15.

Currently existing regexp functions until PostgreSQL 14

Historically, PostgreSQL has had a handful of functions supporting POSIX regular expressions. The total count was six, as seen in the following list:

  • substring ( string text FROM pattern text ) => text
    Extracts the first substring matching POSIX regular expression
  • regexp_match ( string text, pattern text [, flags text ] ) => text[]
    Returns captured substrings resulting from the first match of a POSIX regular expression to the string
  • regexp_matches ( string text, pattern text [, flags text ] ) => setof text[]
    Returns captured substrings resulting from the first match of a POSIX regular expression to the string, or multiple matches if the g flag is used
  • regexp_replace ( string text, pattern text, replacement text [, flags text ] ) => text
    Replaces substrings resulting from the first match of a POSIX regular expression, or multiple substring matches if the g flag is used
  • regexp_split_to_array ( string text, pattern text [, flags text ] ) => text[]
    Splits string using a POSIX regular expression as the delimiter, producing an array of results
  • regexp_split_to_table ( string text, pattern text [, flags text ] ) => setof text
    Splits string usin
[...]
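The list is truncated by the feed; to complement it, here is a hedged sketch of a few of the new functions the article is about, as they are expected to work in PostgreSQL 15 (illustrative calls):

SELECT regexp_count('the quick brown fox', 'o');        -- number of matches
SELECT regexp_like('the quick brown fox', '^the');      -- true if the pattern matches
SELECT regexp_instr('the quick brown fox', 'brown');    -- 1-based position of the match
SELECT regexp_substr('the quick brown fox', 'q[a-z]+'); -- the matching substring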
Posted by Frits Hoogland in Yugabyte on 2021-11-15 at 13:39

This blogpost is about linux memory management, and specifically about a question that has been asked about probably every operating system throughout history: how much free memory do I need to consider the system healthy?

To start off, a reference to a Star Wars quote: 'This is not the free memory you're looking for'.

What this quote means to say is that whilst the free memory statistic obviously shows free memory, what you are actually looking for is the amount of memory that can be used for memory allocations on the system. On linux, this is not the same as free memory.

Of course, free memory is directly available: actual free memory that can be used directly by a process that needs a free page. A free page is produced by the kernel page daemon (commonly named 'swapper'), or by a process that explicitly frees memory.

The linux operating system keeps only a small amount of memory free, in order to make use of memory as optimally as it can. One of the many optimizations in linux is to use memory for a purpose, such as storing a page previously read from disk. It doesn't make sense to free all used memory right after usage, such as a page read from disk; in fact, radically cleaning all used pages after use would eliminate the (disk) page cache in linux.

In linux, there are no settings for dedicating a memory area as disk cache, instead it essentially takes all non-used memory and keeps the data in it available as long as there isn't another, better purpose for the page.

However, there must be some sort of indicator that tells you how much memory the operating system can use. I just said that free memory is not that indicator. So what is that indicator? Since this commit there is the notion of 'available memory' (unsurprisingly, 'MemAvailable' in /proc/meminfo). This is the amount of memory that could be used if memory is needed by any process. So, if you want to know how much memory can be used on a linux system, available memory is the statistic to look for, and not free memory.

[...]
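The post is truncated here; checking the distinction it describes takes one command (a minimal sketch):

# free memory vs. available memory, straight from the kernel
grep -E 'MemFree|MemAvailable' /proc/meminfo
# the same figures summarized by free(1); the 'available' column is the one to watch
free -m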

In this three-part series on generating sample time-series data, we demonstrate how to use the built-in PostgreSQL function, generate_series(), to more easily create large sets of data to help test various workloads, database features, or just to create fun samples.

In part 1 of the series, we reviewed how generate_series() works, including the ability to join multiple series into a larger table of time-series data - through a feature known as a CROSS (or Cartesian) JOIN. We ended the first post by showing you how to quickly calculate the number of rows a query will produce and modify the parameters for generate_series() to fine-tune the size and shape of the data.

However, there was one problem with the data we could produce at the end of the first post. The data that we were able to generate was very basic and not very realistic. Without more effort, using functions like random() to generate values doesn't provide much control over precisely what numbers are produced, so the data still feels more fake than we might want.

This second post will demonstrate a few ways to create more realistic-looking data beyond a column or two of random decimal values. Read on for more.

In the coming weeks, part 3 of this blog series will add one final tool to the mix - combining the data formatting techniques below with additional equations and relational data to shape your sample time-series output into something that more closely resembles real-life applications.

By the end of this series, you'll be ready to test almost any feature that TimescaleDB offers and create quick datasets for your testing and demos!

A brief review of generate_series()

In the first post, we demonstrated how generate_series() (a Set Returning Function) could quickly create a data set based on a range of numeric values or dates. The generated data is essentially an in-memory table that can quickly create large sets of sample data.

-- create a series of values, 1 through 5, incrementing by 1
SELECT * FROM generate_series(1,
[...]
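The example above is cut short by the feed; in the same spirit, a hedged sketch that produces a small time-series with one random value per row (the column names are illustrative):

SELECT ts AS time,
       round((random() * 100)::numeric, 2) AS cpu_usage
FROM generate_series(
        '2021-11-01 00:00:00'::timestamptz,
        '2021-11-01 01:00:00'::timestamptz,
        '5 minutes'::interval
     ) AS ts;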

We just released pg_auto_failover version 1.6.3 on GitHub, and the binary packages should be already available at the usual PGDG and CitusData places, both for debian based distributions and RPM based distributions too.

This article is an introduction to the pg_auto_failover project: we answer the Five W questions, starting with why does the project exist in the first place?

TL;DR pg_auto_failover is an awesome project. It fills the gap between “Postgres is awesome, makes developing my application so much easier, it solves so many problems for me!” and the next step “so, how do I run Postgres in Production?”. If you’re not sure how to bridge that gap yourself, how to deploy your first production system with automated failover, then pg_auto_failover is for you. It is simple to use, user friendly, and well documented. Star it on the pg_auto_failover GitHub repository and get started today. Consider contributing to the project; it is fully Open Source, and you are welcome to join us.

Buckle up, our guide tour is starting now!

Posted by Frits Hoogland in Yugabyte on 2021-11-10 at 15:21

This blogpost is about a connection pool that is lesser known than pgbouncer: pgagroal. Both are so-called 'external connection pools', which means they can serve application/user connections but are not part of an application.

They also serve the same function, which is to act as a proxy between clients and applications on one side and a postgres instance on the other. In that position, the first obvious advantage is that the pool can act as an edge service, concentrating connections from one network and proxying the requests to the database in a non-exposed network.

Another advantage is that the client/application-side connections are decoupled from the database-side connections. The pool can therefore serve badly behaving applications (which create and destroy connections to a database repeatedly) by linking each connection request to an already established database connection, instead of initializing and destroying a connection every time.

CentOS/RHEL/Alma/Rocky/etc. 8 only

pgagroal is EL version 8 only, because its build scripts check the minimal required versions. When you try to build pgagroal on CentOS 7, it will fail with the message:

CMake Error at CMakeLists.txt:1 (cmake_minimum_required):
  CMake 3.14.0 or higher is required.  You are running version 2.8.12.2

'EL' is a general name for all Linux distributions that use RedHat's Enterprise distribution as a basis.

Installation

However, when you are on EL version 8, you can use the postgres yum repository to install pgagroal in a very simple way. There is no need to download the source and compile it yourself.

  1. Add the EL 8 postgres yum repositories:

    sudo dnf install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-8-x86_64/pgdg-redhat-repo-latest.noarch.rpm
    
  2. Install pgagroal:

    sudo dnf -y install pgagroal
    
  3. Add the pgagroal user:

    sudo useradd pgagroal
    

Now you're set, and can start pgagroal via systemd:

[...]
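The exact commands are truncated above; assuming the PGDG package ships a pgagroal systemd unit (an assumption worth verifying against the installed package), starting it would look like:

# assumption: the package installs a unit named pgagroal.service
sudo systemctl enable --now pgagroal
sudo systemctl status pgagroal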

Did you know that PostgreSQL 12 introduced a way for you to provide multifactor (aka "two-factor") authentication to your database?

This comes from the ability to set clientcert=verify-full as part of your pg_hba.conf file, which manages how clients can authenticate to PostgreSQL. When you specify clientcert=verify-full, PostgreSQL requires a connecting client to provide a certificate that is valid against its certificate authority (CA) and the certificate's common name (CN) matches the username the client is authenticating as. This is similar to using the cert method of authentication.

Where does the second factor come in? You can add clientcert=verify-full to another authentication method, such as the password-based scram-sha-256. When you do this, your client has to provide both a valid certificate AND password. Cool!

If you have a public key infrastructure (PKI) set up, you effectively have a single-sign on system for your PostgreSQL databases. You can then treat the password for the user in a local database as a "second factor" for logging in. Again, cool!
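As a concrete hedged sketch of what this looks like in pg_hba.conf (the database, user and network fields are placeholders to adapt):

# TLS connection, SCRAM password AND a client certificate whose CN must match the user
hostssl    all    all    0.0.0.0/0    scram-sha-256    clientcert=verify-full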

Let's put this all together, and see how we can deploy a multifactor single sign-on (SSO) authentication system for Postgres on Kubernetes using cert-manager and PGO, the open source Postgres Operator from Crunchy Data!

Pushing PostgreSQL solutions to my own repositories.

My Perl Weekly Challenge Solutions in PostgreSQL

Starting back at Perl Weekly Challenge 136, I decided to try to implement, whenever possible (to me), the challenges not only in Raku (i.e., Perl 6), but also in PostgreSQL (either pure SQL or plpgsql).

Recently, I modified my sync script that pulls solutions from the official Perl Weekly Challenge repository into my own repositories, and of course, I added a way to synchronize the PostgreSQL solutions.

The solutions are now available on GitHub under the PWC directory of my PostgreSQL examples repository.

Posted by Andreas 'ads' Scherbaum on 2021-11-08 at 14:00
PostgreSQL Person of the Week Interview with John Naylor: I was born in Oklahoma, USA, but have been nomadic since 2017. During that time I’ve mostly lived in Southeast Asia, but in 2020-21 I lived in Barbados and now the Dominican Republic. The “thing that need not be named” slowed me down but didn’t stop me. I haven’t commuted to an office since 2011, and I’ve always had a fascination for foreign cultures, so it was natural that I’d end up doing this.
Posted by Luca Ferrari on 2021-11-08 at 00:00

USB sticks I found in the attic…

PostgreSQL USB Sticks in the Attic!

TLDR: this is not a technical post!

Cleaning the attic, I found a couple of old PostgreSQL USB Sticks.
It happened that, back at the Italian PostgreSQL Day (PGDay.IT) 2012, we (at the time I was a happy member of ITPUG) created PostgreSQL-branded USB sticks to give away as gadgets to participants.
The USB stick was cool, with a soft rubber shell, a clear white and blue elephant logo on its sides, a capacity of 4 GB (which back then was quite common) and a lanyard.
However, it had something that I didn’t like.
So, when I was the ITPUG president back in 2013, I decided to change the design of the USB stick (as well as doubling its size).
Let's inspect the differences, and please forgive me if the printing on the sticks is no longer clear, but well, some years have gone by:

[Photos of the two USB sticks]

The upper stick is the 2012 edition, the lower one is the 2013 edition.
Do you spot the difference?
Yes, the 2013 edition USB stick had the PostgreSQL logo on one side and the ITPUG logo on the other, while the 2012 edition did not have any reference to ITPUG, the organizing local user group!

When I decided to give a new spark to ITPUG, I also decided to improve its visibility via such gadgets, which had been too generic and, for this reason, reusable at other events as plain PostgreSQL-related gadgets.

Therefore, the new gadget presented both PostgreSQL and the Italian users' group, no shame at all!

Posted by Franck Pachot in Yugabyte on 2021-11-07 at 22:30

A recent tweet by Nikolay Samokhvalov draws attention to the importance of understanding database structures:


Andy Pavlo also raised the point that people are suggesting many radical changes, without understanding proper indexing:

In this post, I'll take this example to explain that thinking about indexes should not be too difficult. And, anyway, this work must be done before scaling to a distributed database. The problem was encountered in PostgreSQL. I'll run my demo on YugabyteDB to show that the concepts are the same on a distributed database. The SQL is the same.

James Long's approach was go

[...]
Posted by Haki Benita on 2021-11-07 at 22:00

In 2006, Microsoft conducted a customer survey to find out what new features users wanted in new versions of Microsoft Office. To their surprise, more than 90% of what users asked for already existed; they just didn't know about it. To address the "discoverability" issue, they came up with the "Ribbon UI" that we know from Microsoft Office products today.

Office is not unique in this sense. Most of us are not aware of all the features in the tools we use on a daily basis, especially when a tool is as big and extensive as PostgreSQL. With PostgreSQL 14 released just a few weeks ago, what better opportunity to shed light on some lesser-known features that already exist in PostgreSQL but that you may not know about.

In this article I present lesser known features of PostgreSQL.

Illustration by Eleanor Wright
Table of Contents

Get the Number of Updated and Inserted Rows in an Upsert

INSERT ON CONFLICT, also known as "merge" (in O

[...]
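The article is truncated here; one commonly used trick for this particular question (not necessarily the exact one the article describes, and the users table is hypothetical) relies on the system column xmax being zero for freshly inserted rows:

-- rows created by the INSERT come back with xmax = 0,
-- rows taken over by ON CONFLICT ... DO UPDATE carry a non-zero xmax
INSERT INTO users (id, name)
VALUES (1, 'alice'), (2, 'bob')
ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name
RETURNING id, (xmax = 0) AS inserted;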