PostgreSQL
The world's most advanced open source database
Posted by Shaun M. Thomas in EDB on 2022-01-28 at 14:00
Given Postgres’ growing reputation as a leading and preferred relational database system among enterprises and hobbyists alike, it has garnered attention within the database tooling space as well. Some of these tools aim to fill a niche for what they perceive as missing Postgres functionality, while others proclaim Postgres compatibility with their pre-existing software. There are essentially dozens of these in the realm of logical replication technology alone, so why not explore a few of them, and see how they compare to native Postgres capabilities? [Continue reading...]
On 28th of January 2022, Peter Eisentraut committed patch: Add HEADER support to COPY text format   The COPY CSV format supports the HEADER option to output a header line. This patch adds the same option to the default text format. On input, the HEADER option causes the first line to be skipped, same as … Continue reading "Waiting for PostgreSQL 15 – Add HEADER support to COPY text format"
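As a quick, hedged illustration of what the new option looks like on PostgreSQL 15+ (the table and file names here are hypothetical):

-- write a header line even in the default text format
COPY my_table TO '/tmp/my_table.txt' WITH (FORMAT text, HEADER);
-- on input, HEADER makes COPY skip the first line
COPY my_table FROM '/tmp/my_table.txt' WITH (FORMAT text, HEADER);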

Christophe Pettus posted an interesting challenge here: two strings which on the surface look equal, but if you ask PostgreSQL to compare them, they are not equal.

Now let me start with a note: Twitter totally screws this challenge up.

How so? Although the two strings are different in the original, when posting this to Twitter the strings are made equal. Where is the fun in that?

I asked Christophe for the original query:

INSERT INTO t VALUES (E'Zo\u0065\u0301', E'Zo\u00e9');

And you end up with the following texts in the table:

SELECT * FROM t;
  a  |  b  
-----+-----
 Zoé | Zoé
(1 row)

If you translate the UTF-8 strings into hex, you get "0x5a 0x6f 0x65 0xcc 0x81" and "0x5a 0x6f 0xc3 0xa9". Clearly they are different.
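As a small aside (not part of the original challenge), you can check the bytes directly in SQL with convert_to() and encode():

SELECT encode(convert_to(a, 'UTF8'), 'hex') AS a_hex,   -- 5a6f65cc81
       encode(convert_to(b, 'UTF8'), 'hex') AS b_hex    -- 5a6fc3a9
FROM t;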

However if you convert the two strings from the Tweet, you get "0x5a 0x6f 0xc3 0xa9" and "0x5a 0x6f 0xc3 0xa9". Same string. Poor Twitter.

Checking the hex values was actually one of my first ideas when I spotted this challenge. Nevertheless, based on my experience from my "Data Types in PostgreSQL" and "Advanced Data Types in PostgreSQL" talks, I figured it should be possible to "solve" this puzzle even if the strings are in fact equal.

Buckle up! We are about to dive deep into how extensible PostgreSQL really is!

 

Continue reading "Not so equal texts in PostgreSQL - and how to compare texts in a more elaborate way"
Posted by Franck Pachot in Yugabyte on 2022-01-26 at 16:12

Anything I do, especially when it can change execution plans, should have its counterpart to revert back. In PostgreSQL we have ANALYZE to gather the statistics used by the query planner, but nothing to remove them. The idea is probably that the automatic statistics gathering will put them back anyway. But I still want to be able to control this, at least for testing.

In YugabyteDB we use the PostgreSQL query planner but with additions to be cluster-aware, because it is a distributed SQL database. This is still in development. In the current version (2.11) ANALYZE is still a beta feature:

yugabyte=# select version();
                                                   version
-------------------------------------------------------------------------------------------------------------
 PostgreSQL 11.2-YB-2.11.2.0-b0 on x86_64-pc-linux-gnu, compiled by gcc (Homebrew gcc 5.5.0_4) 5.5.0, 64-bit
(1 row)

yugabyte=# analyze my_table;
WARNING:  'analyze' is a beta feature!
HINT:  Set 'ysql_beta_features' yb-tserver gflag to true to suppress the warning for all beta features.
ANALYZE

I can easily check the column statistics (histograms) from pg_stats. For example here is how I check the columns that have a lot of values in the histograms:

select cardinality(most_common_vals),* 
from pg_stats 
where tablename='my_table'
order by cardinality(most_common_vals) desc nulls last;

The maximum of 100 (the default) can be changed with set default_statistics_target, but that's not the point. pg_stats is a view. The table behind it is pg_statistic. To delete the gathered column statistics, here is an example:

delete from pg_statistic  
where starelid in (
select c.oid from pg_class c 
join pg_namespace n on n.oid = c.relnamespace
and relname in ('my_table' )
);

Note that on YugabyteDB, modifying the catalog is protected and you will get:

ERROR:  Illegal state: Transaction for catalog table write operation 'pg_statistic' not found

You need to explicitly allow it with:

set yb_non_ddl_t
[...]
PostgreSQL Migrate to SCRAM

Recently, a few PostgreSQL users reported that they got connection failures after switching to PostgreSQL 14.

"Why do I get the error FATAL:  password authentication failed for a user in the new server?" has become one of the most intriguing questions.

At least in one case, it was a bit of a surprise that the application message was as follows:

FATAL: Connection to database failed: connection to server at “localhost” (::1), port 5432 failed: fe_sendauth: no password supplied

The reason for these errors is that the default password encryption has changed to SCRAM authentication in new versions of PostgreSQL. Even though the last message appears to have nothing directly to do with SCRAM, it did: a post-installation script that was looking for “md5” had failed.

SCRAM authentication is not something new in PostgreSQL. It has been available since PostgreSQL 10, but it never affected DBA life in general because it was never the default. It was an opt-in feature, enabled by explicitly changing the default settings. Those who opt in generally understand what they are doing and do it intentionally, and it has never been known to cause any problems. The PostgreSQL community was reluctant to make it the primary method for years because many of the client/application libraries were not ready for SCRAM authentication.

But that changes in PostgreSQL 14. With PostgreSQL 9.6 going out of support, the landscape is shifting: we now expect all old client libraries to be upgraded, and SCRAM authentication is becoming the primary password authentication method. Those who are completely unaware are going to be greeted with a surprise one day or another. The purpose of this post is to create quick awareness for those who are not yet, and to address some of the commonly asked questions.
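As a quick check (not from the original post, and assuming superuser access), you can see which login roles still carry MD5 hashes and would need their passwords reset after switching password_encryption to scram-sha-256:

SELECT rolname,
       CASE WHEN rolpassword LIKE 'SCRAM-SHA-256$%' THEN 'scram-sha-256'
            WHEN rolpassword LIKE 'md5%'            THEN 'md5'
       END AS hash_type
FROM pg_authid
WHERE rolcanlogin AND rolpassword IS NOT NULL;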

What is SCRAM Authentication?

In simple words, the database client and the server prove and convince each other that they know the password without exchanging the password or the password hash. Yes, it is possible by doing a Salted Challenge and Respons

[...]
Posted by Aleksei Kozlov in Data Egret on 2022-01-25 at 12:55

A couple of days ago we received a question on our blog about whether it is possible to speed up VACUUM for large tables once the process has been launched.

In short, once the VACUUM starts working it is not possible to influence its speed.

The main reason for this is that the majority of resources in this process are directed towards disk operations, and these are not very speedy. Besides, there is still a need to leave enough resources for routine database processes.

With that in mind, it is possible to change system settings that will allow individual VACUUM workers to speed up their work.

To do this you need to deploy quite “aggressive” configurations. This way VACUUM processes (AUTOVACUUM, to be exact) will be launched more frequently, but each launch will require fewer actions, which in turn will reduce the overall working time.

Here is the setup process for AUTOVACUUM parameters:

  1. Allocating an appropriate number of autovacuum_max_workers (the default is 3, which is low; something like 10 would be more appropriate).
  2. Increasing the frequency of AUTOVACUUM checks (autovacuum_naptime = 1s; the default is 1min).
  3. Lowering the threshold that triggers an AUTOVACUUM launch (autovacuum_vacuum_threshold = 50, autovacuum_analyze_threshold = 10, autovacuum_vacuum_scale_factor = 0.05, autovacuum_analyze_scale_factor = 0.05).
  4. Lowering the threshold that triggers FREEZE (vacuum_freeze_min_age = 5000000, vacuum_freeze_table_age = 15000000).

The autovacuum_max_workers parameter requires a service restart, while the others can be amended without service disruption.
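Expressed as ALTER SYSTEM commands, the setup above would look roughly like this (a sketch; tune the values to your own workload):

ALTER SYSTEM SET autovacuum_max_workers = 10;            -- needs a service restart
ALTER SYSTEM SET autovacuum_naptime = '1s';
ALTER SYSTEM SET autovacuum_vacuum_threshold = 50;
ALTER SYSTEM SET autovacuum_analyze_threshold = 10;
ALTER SYSTEM SET autovacuum_vacuum_scale_factor = 0.05;
ALTER SYSTEM SET autovacuum_analyze_scale_factor = 0.05;
ALTER SYSTEM SET vacuum_freeze_min_age = 5000000;
ALTER SYSTEM SET vacuum_freeze_table_age = 15000000;
SELECT pg_reload_conf();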

Using the above parameters will increase the frequency with which AUTOVACUUM visits tables while breaking its work down into smaller parts. Following these changes, each VACUUM run will take on a smaller share of the workload, which will, in turn, speed up the overall command execution.

In addition, more “aggressive” parameter values can be set for the VACUUM command (va

[...]

A community member on the postgis-users mailing list had a question recently:

Posted by Andreas 'ads' Scherbaum on 2022-01-24 at 14:00
PostgreSQL Person of the Week Interview with Flavio Gurgel: Hello, I’m Flavio Gurgel. I was born in Brazil, since 2014 I live in France, in the greater Paris area known as Île de France. I’m a Lead Infrastructure Engineer for leboncoin, a classified ads company group in France.
Posted by Jonathan Katz on 2022-01-23 at 00:00

Typically, when we discuss “read-only” connections to a PostgreSQL database, it is in the context of connecting to a replica.

There are a variety of methods available to route connections with known read-only queries (i.e. queries with SELECT statements…that are not calling VOLATILE functions that modify data). This includes connection proxy software like Pgpool-II or framework mechanisms such as Django’s database router.

However, there are situations where you might need to force read-only connections to your primary (read-write) Postgres instance. Some examples include putting your application into a degraded state to perform a database move or upgrade, or allowing an administrator to inspect a system that may be accumulating write-ahead logs that track all changes to the system.

PostgreSQL has a configuration parameter called default_transaction_read_only. Setting default_transaction_read_only globally to on forces all connections to disallow writes to the database. default_transaction_read_only is a reloadable parameter, so you do not need to restart your Postgres instance to use it.

Here is a quick example of how default_transaction_read_only works. First, ensure your system does not have default_transaction_read_only set:

postgres=# SHOW default_transaction_read_only ;

 default_transaction_read_only
-------------------------------
 off
(1 row)

postgres=# CREATE TABLE IF NOT EXISTS abc (id int); INSERT INTO abc VALUES (1) RETURNING id;
CREATE TABLE

 id
----
 1
(1 row)

INSERT 0 1

This works as expected: we’re able to create a table and insert data. Now let’s put the system into default_transaction_read_only mode (note that I am running this on PostgreSQL 14):

ALTER SYSTEM SET default_transaction_read_only TO on;
SELECT pg_reload_conf();
SHOW default_transaction_read_only;

Ensure that default_transaction_read_only is enabled:

postgres=# SHOW default_transaction_read_only;

 default_transaction_read_only
-------------------------------
 on
(1 row)

Now verify

[...]
Posted by Gürkan İndibay in CitusData on 2022-01-22 at 17:19

Citus is a great extension for scaling out Postgres databases horizontally. You can use Citus either on the cloud on Azure or you can download Citus open source and install it wherever. In this blog post, we will focus on Citus open source packaging and installation.

When you go to the Citus download page to download the Citus packages—or you visit the Citus open source docs—many of you jump straight to the install instructions and the particular OS you’re looking for. That way, you can get straight to sharding Postgres with Citus.

But what if you want to see which operating systems the Citus packages support? Or what if you want to install Citus with an older version of Postgres?

This post will answer these types of nitty-gritty questions about Citus packages and their usages. Specifically, this post will cover these questions:

Do you have to install Postgres separately before installing Citus?

Citus is a Postgres extension which executes side by side with Postgres. You can get detailed information about Postgres extensions on the PostgreSQL wiki and in this Citus blog post about what it means to be a Postgres extension. Additionally, you can get lots more information about Citus in the Citus docs.
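Once the packages discussed below are installed and citus has been added to shared_preload_libraries, the extension is enabled per database like any other Postgres extension (a minimal sketch):

CREATE EXTENSION citus;
SELECT citus_version();   -- confirms the extension is loaded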

When you install Citus, you execute two commands as shown on the Citus Downloads page. For example, the commands below are the current installation commands for Debian-based systems. At the time of publication of this blog post, these commands will install Postgres 14 and Citus 10.2. We’re using the Debian-based install

[...]

In the lifecycle of any application, developers are often asked to create proof-of-concept features, test newly released functionality, and visualize data analysis. In many cases, the available test data is stale, not representative of normal usage, or simply doesn't exist for the feature being implemented. In situations like this, knowing how to quickly create sample time-series data with native PostgreSQL and SQL functions is a valuable skill to draw upon!

In this three-part series on generating sample time-series data, we demonstrate how to use the built-in PostgreSQL function, generate_series(), to more easily create large sets of data to help test various workloads, database features, or just to create fun samples.

In part 1 and part 2 of the series, we reviewed how generate_series() works, how to join multiple series using a CROSS JOIN to create large datasets quickly, and finally how to create and use custom PostgreSQL functions as part of the query to generate more realistic values for your dataset. If you haven't used generate_series() much before, we recommend first reading the other two posts. The first one is an intro to the generate_series() function, and the second one shows how to generate more realistic data.
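As a tiny refresher (a sketch of ours, not an excerpt from the series), one day of five-minute samples with random values looks like this:

SELECT ts, round((random() * 100)::numeric, 2) AS cpu_pct
FROM generate_series('2022-01-01 00:00:00+00'::timestamptz,
                     '2022-01-01 23:55:00+00'::timestamptz,
                     INTERVAL '5 minutes') AS ts;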

With those skills in hand, you can quickly and easily generate tens of millions of rows of realistic-looking data. Even still, there's one more problem that we hinted at in part 2 - all of our data, regardless of how it's formatted or constrained, is still based on the random() function. This means that over thousands or millions of samples, every device we create data for will likely have the same MAX() and MIN() value, and the distribution of random values over millions of rows for each device generally means that all devices will have similar average values.

This third post demonstrates a few methods for influencing how to create data that mimics a desired shape or trend. Do you need to simulate time-series values that cycle over time? What about demonstrating a counter

[...]
Posted by Elizabeth Garrett Christensen in Crunchy Data on 2022-01-19 at 16:52

 If you’ve read Crunchy blogs recently you probably noticed by now that we’re all big fans of indexing. Indexing is key to optimizing your database workloads and reducing query times. Postgres now supports quite a few types of indexes and knowing the basics is a key part of working with Postgres. 

The role of database indexes is similar to the index section at the back of a book. A database index stores information on where a data row is located in a table, so the database doesn't have to scan the entire table for information. When the database needs to retrieve data for a query, it goes to the index first and then uses that information to fetch the requested rows.

Indexes are their own data structures and they’re part of the Postgres data definition language (the DDL). They're stored on disk along with data tables and other objects. 
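For example, a minimal sketch (with hypothetical table and column names):

CREATE TABLE IF NOT EXISTS orders (id bigint, customer_id bigint, created_at timestamptz);
CREATE INDEX IF NOT EXISTS orders_customer_id_idx ON orders (customer_id);
-- lookups like this can now use the index instead of scanning the whole table
SELECT * FROM orders WHERE customer_id = 42;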

Posted by Martin Davis in Crunchy Data on 2022-01-18 at 14:53

A common spatial need is to find a polygon that accurately represents a set of points.  The convex hull of the points often does not provide this, since it can enclose large areas which contain no points.  What is required is a non-convex hull, often termed the concave hull.  

The Convex Hull and a Concave Hull of a point set

A concave hull is generally considered to have some or all of the following properties:

  • The hull is a simply connected polygon
  • It contains all the input points
  • The vertices in the hull polygon boundary are all input points
  • The hull may or may not contain holes
For a typical point set there are many polygons which meet these criteria, with varying degrees of concaveness.  Concave Hull algorithms provide a numeric parameter which controls the amount of concaveness in the result.  The nature of this parameter is particularly important, since it affects the ease-of-use in practical scenarios.  Ideally it has the following characteristics:
  • Simple geometric basis: this allows the user to understand the effect of the parameter and aids in determining an effective value
  • Scale-free (dimensionless): this allows a single parameter value to be effective on varying sizes of geometry, which is essential for batch or automated processing
  • Local (as opposed to global):  A local property (such as edge length) gives the algorithm latitude to determine the concave shape of the points. A global property (such as area) over-constrains the possible result. 
  • Monotonic area:  larger (or smaller) values produce a sequence of more concave areas
  • Monotonic containment: the sequence of hulls produced is topologically nested
  • Convex-bounded: an extremal value produces the convex hull

This is a well-studied problem, and many different approaches have been proposed.  Some notable ones are:
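For reference, PostGIS already exposes a concave hull with a scale-free target-percent parameter; a hedged example (the pts table is hypothetical):

SELECT ST_AsText(ST_ConcaveHull(ST_Collect(geom), 0.5, false)) AS hull
FROM pts;   -- 0.5 = target percent of the convex hull area; false = no holes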

[...]

The pgDay Paris 2022 conference schedule has now been posted. Eight great sessions, eight great speakers with diverse perspectives, all in a single-track event so no one has to miss out on any of the presentations.

Space is limited at the event due to COVID-19 restrictions. Tickets can be purchased on the website.

Posted by Robert Haas in EDB on 2022-01-18 at 13:00

If PostgreSQL had the ability to give to a privileged non-superuser the right to administer objects belonging to some designated group of non-superusers just as if the privileged account were superuser, it would get us much closer to a world in which the database can be effectively administered by a non-superuser. A highly privileged user - let's call him sauron - could be given the right to administer tables, schemas, functions, procedures, and a variety of other objects owned by dependent users witchking and khamul just as if sauron were superuser. sauron might indeed feel himself to be virtually a superuser, at least within his own domain, as long as he didn't spend too much time thinking about the users over which he had not been given administrative rights. However, sauron might notice a few irksome limitations.

Read more »
Posted by Andreas 'ads' Scherbaum on 2022-01-17 at 14:00
PostgreSQL Person of the Week Interview with Kuntal Ghosh: Hi, I am Kuntal, and I live in Bangalore, India. I’ve started exploring PostgreSQL during my Master Thesis, and continued it by joining EDB in India. Currently, I’m working at Amazon on Aurora Postgres.
On 17th of January 2022, Michael Paquier committed patch: Introduce log_destination=jsonlog   "jsonlog" is a new value that can be added to log_destination to provide logs in the JSON format, with its output written to a file, making it the third type of destination of this kind, after "stderr" and "csvlog". The format is convenient … Continue reading "Waiting for PostgreSQL 15 – Introduce log_destination=jsonlog"
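A hedged sketch of enabling it on PostgreSQL 15+ (jsonlog requires the logging collector):

ALTER SYSTEM SET log_destination = 'jsonlog';
ALTER SYSTEM SET logging_collector = on;   -- changing this one requires a server restart
SELECT pg_reload_conf();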
Back in 2018 I wrote why-upgrade.depesz.com – aggregator of changelogs between various versions of Pg. Want to know what you will get when upgrading from 12.1 to 12.9? Here you go. Longer time changes? Like, from 9.5.20 to 14.1? I've got your back. It even has a way to list every change that relates to … Continue reading "Configuration changes across Pg versions"
Posted by Franck Pachot in Yugabyte on 2022-01-14 at 15:08

pg_stat_statements can identify long queries, but it is limited when we need to go further: the queries are normalized, with values replaced by parameters. And the metrics gathered are cumulative time figures (min, max, average, stddev), usually over a long time window. It is hard to catch the statements responsible for the activity at peak load, and once caught, it is not easy to reproduce the execution plan or to know the user or the application that ran it.

pg_stat_activity holds this information, and here is a small script to sample it. The goal of sampling the running statements is to start the investigation by looking at the top queries. However, they are all different, with different parameters. What I'm doing here is aggregating the ones sharing the same execution plan. For each active query in pg_stat_activity I run an explain and remove the parts with values (Filter and Index Conditions). Then I take an md5 of it and this becomes my "Plan Hash Value".

Sampling function

This installs my sampling function and the table to store the plans:

drop table if exists yb_ash_plans;
drop function if exists yb_ash_sample;

create table if not exists yb_ash_plans(
 plan_hash_value text primary key, 
 samples int, running_for interval,last_plan text,last_pg_stat_activity jsonb
 );
create or replace function yb_ash_sample() returns int as
 $$
declare
 psa record;
 plan_xml text;
 plan_nofilter text;
 plan_md5 text;
 last_running_for interval;
 count_updated int:=0;
begin
  for psa in (select * from pg_stat_activity
   where state='active' and pid!=pg_backend_pid()) loop
   begin 
    --raise notice 'SQL %',psa.query;
    execute format('explain (format xml, costs false, buffers false) %s',psa.query) into plan_xml;
    exception when others then
     if sqlstate not like '42%' then raise warning '% % %',sqlstate,sqlerrm,psa.query; end if;
    end;
   end loop;
   if plan_xml is not null then
    --raise notice 'XML %',plan_xml;
    plan_nofilter:=regexp_replace(plan_xml,'(|||
[...]
Posted by Nikolay Samokhvalov in Postgres.ai on 2022-01-14 at 07:11
Progress bar for Postgres queries – let's dive deeper

Recently, I have read a nice post titled "Query Progress Bar", by Brian Davis. It describes an interesting approach to observing the progress of slow query execution.

At some point, the author mentions:

> Don't use this in prod.

And I agree. The article discusses long-running queries such as SELECTs, UPDATEs, DELETEs, and quite "invasive" methods of progress monitoring. In an OLTP production scenario, in most cases, we should try to limit the duration of such queries, setting statement_timeout to a very low value – such as 30 or even 15 seconds.
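For instance, the limit can be set per session, per role, or instance-wide (the role name below is hypothetical):

SET statement_timeout = '15s';
ALTER ROLE app_user SET statement_timeout = '30s';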

Let's dive deeper into the topic of query progress monitoring, and discuss various types of queries, how to monitor their progress, considering production and non-production environments separately.

Continue reading...

Posted by Regina Obe in PostGIS on 2022-01-14 at 00:00

As of PostGIS 3.0, the PostGIS raster support is no longer part of the postgis extension, but instead spun off into a new PostGIS extension called postgis_raster.

The two main reasons for this break were:

  • Raster functionality in PostGIS is fat with over 150 functions and several types. Wading through these extra functions frustrated many who had no use for rasters.

  • The raster GDAL dependency is very big, and many dreamed of having the postgis extension without it.

While repackaging raster as its own extension resolved many complaints, it meant a slightly more complicated upgrade process when going from PostGIS 2.something to 3.something, one that even experienced PostGIS users managed to screw up.

I will detail the proper way of upgrading PostGIS raster, from a PostGIS 2.* install to 3.* install.

You can run these steps using psql or pgAdmin or any other PostgreSQL tool you want.

Regardless of which version of PostgreSQL you are coming from, you should install the PostGIS 3.* binaries first.
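One commonly cited path for the extension-level part of the upgrade is sketched below; see the full post for the authoritative steps:

ALTER EXTENSION postgis UPDATE;
SELECT postgis_extensions_upgrade();   -- unpackages raster and installs postgis_raster where needed;
                                       -- may need to be run more than once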

Continue Reading by clicking title hyperlink ..
Posted by Paul Ramsey in Crunchy Data on 2022-01-13 at 18:23

Last month, just under the wire for a 2021 release, the 3.2 version of PostGIS hit the streets! This new PostGIS also supports the latest 3.10 release of GEOS, which underpins a few of the new features.

PostgreSQL Patroni Logical Replication Slot Failover

Failover of the logical replication slot has always been a pain point when using logical replication in PostgreSQL. This missing feature undermined the use of logical replication and acted as one of its biggest deterrents. The stakes and impact were so high that many organizations had to discard their plans around logical replication, and it affected many plans for migrations to PostgreSQL. It was painful to see that many had to opt for proprietary/vendor-specific solutions instead.

At Percona, we have written about this in the past: Missing Piece: Failover of the Logical Replication Slot.  In that post, we discussed one of the possible approaches to solve this problem, but there was no reliable mechanism to copy the slot information to a Physical standby and maintain it.

The problem, in a nutshell, is this: the replication slot is always maintained on the primary node. If there is a switchover/failover to promote one of the standbys, the new primary won’t have any idea about the replication slot maintained by the previous primary node. This breaks the logical replication from the downstream systems, or, if a new slot is created, it becomes unsafe to use.

The good news is that Patroni developers and maintainers addressed this problem in version 2.1.0 and provided a working solution without any invasive methods/extensions. For me, this is work that deserves a big round of applause from the Patroni community, and the intention of this blog post is to make sure that a bigger crowd is aware of it.

How to Set it Up

A ready-to-use Patroni package is available from the Percona repository. But you are free to use Patroni from any source.

Basic Configuration

In case you are excited about this and want to try it, the following steps might be helpful.

The entire discussion is about logical replication, so the minimum requirement is to have wal_level set to “logical”. If the existing Patroni configuration has wal_level set to “replica” and you want to use this featu

[...]
Posted by Ronan Dunklau in Aiven on 2022-01-13 at 13:29

Today's post is about a new PostgreSQL feature (coming in PostgreSQL 15) developed by Aiven to solve a reliability problem we sometimes run into with replication. The READ_REPLICATION_SLOT command makes a physical replication client, particularly pg_receivewal, more reliable.

At Aiven's OSPO office, we dedicate our time to the open source products Aiven offers or develops, and we were very happy to collaborate on this feature.

PostgreSQL replication surprises

This story starts with our own PgHoard, a PITR backup tool for PostgreSQL. PgHoard offers several methods to archive the WAL (Write Ahead Log), including pg_receivewal, a small application shipping with PostgreSQL which connects to a PostgreSQL cluster using the physical replication protocol to stream WAL as they are produced, optionally keeping track of the position on the server using a replication slot.

We noticed that we could occasionally lose some WAL when PgHoard is restarted on another machine. This happens because pg_receivewal determines its start position like this:

  • Look at what segments are present in the archive folder, and resume from the latest archived one.
  • If the archive folder is empty, start from the current server LSN position.

The problem is that, in our case, since pg_receivewal was started on another server, it didn't have direct access to the archive folder, so it resumed from the current LSN (Log Sequence Number) position.

You can observe this in action by launching pg_receivewal, stopping it after some WAL have been archived, then relaunching it with its directory empty:

❯ sudo -u postgres pg_receivewal -D /tmp/wals  -v -s pg_receivewal_slot
pg_receivewal: starting log streaming at 47/E5000000 (timeline 1)
pg_receivewal: finished segment at 47/E6000000 (timeline 1)
pg_receivewal: received interrupt signal, exiting
pg_receivewal: not renaming "0000000100000047000000E6.partial", segment is not complete
pg_receivewal: received interrupt signal, exiting
❯ sudo rm /tmp/wals/* -rf
# Wait for some WA
[...]

I have done a few previous blog posts on who has contributed to PostgreSQL, but I did not do one last year. A couple people mentioned to me that they missed it, so I decided to do one this year, and I decided to gather statistics, using basically the same methodology that I have in the past, for both 2020 and 2021. As always, it is important to remember, first, that many people contribute to the project in ways that these statistics don't capture, and second, that the statistics themselves are prone to mistakes (since there is a bunch of manual work involved) and bias (since each commit is attributed to the first author for lack of knowledge of the relative contributions of the various authors). As usual, I have posted a dump of the database I used to generate this in case anyone wants to check it over for goofs or use it for any other purpose.

I calculate that there were 176 people who were the principal author of at least one PostgreSQL commit in 2020 and 182 such people in 2021. In 2020, 13 people contributed 66% of the lines of new code, and 35 people contributed 90% of the lines of new code. In 2021, these numbers were 14 and 41 respectively. In 2020, there were a total of 2181 commits from 26 committers, and in 2021, there were 2269 commits from 28 committers. In each year, about 5 committers committed about two thirds of the non-self-authored patches, with Tom Lane leading the pack in both years.

Here are the top 35 contributors by lines of new code contributed in 2020. Asterisks indicate non-committers. Note that some of these people are committers now but were not committers at the time.

 #  |          author           | lines | pct_lines | commits 
----+---------------------------+-------+-----------+---------
  1 | Tom Lane                  | 65203 |     25.95 |     436
  2 | Peter Eisentraut          | 28771 |     11.45 |     229
  3 | Paul Jungwirth [*]        | 10723 |      4.27 |       2
  4 | Heikki Linnakangas        |  8293 |      3.30 |      31
  5 | Rober

[...]

If you’ve never done it before, you might be daunted by the idea of giving a conference talk. You know: the work involved, the butterflies, how to make it a good talk and not a boring one, the people who might judge you… And perhaps the hardest bit: choosing a topic others will find interesting.

The last few months I’ve been working on a new event. It’s a free and virtual developer event happening on 12-13 Apr 2022 called Citus Con: An Event for Postgres. Organized by the Postgres and Citus team here at Microsoft Azure, Citus Con is geared toward Postgres users, Azure Database for PostgreSQL customers, and those who use the Citus extension to Postgres (or other PG extensions.)

Like the Postgres devroom at FOSDEM in 2022, Citus Con will be virtual. And while many of us miss the human interaction of in-person events these past years, virtual events do have a silver lining: no jetlag. And the impact on your schedule is not nearly as big. At Citus Con, there will be short 3-hour livestreams in 3 different geographies—Americas, EMEA, and APAC—to make it easy to join from any time zone. Plus on-demand videos you can watch anytime. Talks will be 25 minutes long hence less material to prepare as a speaker. And we’re working to set up a place where y’all can engage with each other during the event, too.

Wearing my talk selection team hat, I’ve been reaching out to people to spread the word about the CFP for Citus Con which closes on Feb 6th. Along the way, this question has come up:

Why give a talk at a Postgres conference?

This post will walk you through the ways you, your team, your project—and especially the Postgres community—can benefit from a talk you give.

First-time speakers are welcome in so many conferences

But what if this is your first conference talk? Or your first talk at a Postgres event? Or your first talk at a Microsoft-organized event?

Many talk selection committees—this is true for PostgreSQL CFPs (including Citus Con) and also loads of Python events—welcome, I

[...]
Standby Cluster With the Percona Distribution for PostgreSQL Operator

A customer recently asked if our Percona Distribution for PostgreSQL Operator supports the deployment of a standby cluster, which they need as part of their Disaster Recovery (DR) strategy. The answer is yes – as long as you are making use of an object storage system for backups, such as AWS S3 or GCP Cloud Storage buckets, that can be accessed by the standby cluster. In a nutshell, it works like this:

  • The primary cluster is configured with pgBackRest to take backups and store them alongside archived WAL files in a remote repository;
  • The standby cluster is built from one of these backups and it is kept in sync with the primary cluster by consuming the WAL files that are copied from the remote repository.

Note that the primary node in the standby cluster is not a streaming replica of any of the nodes in the primary cluster and that it relies on archived WAL files to replicate events. For this reason, this approach cannot be used as a High Availability (HA) solution. Even though the primary use of a standby cluster in this context is DR, it can also be employed for migrations.

So, how can we create a standby cluster using the Percona operator? We will show you next. But first, let’s create a primary cluster for our example.

Creating a Primary PostgreSQL Cluster Using the Percona Operator

You will find a detailed procedure on how to deploy a PostgreSQL cluster using the Percona operator in our online documentation. Here we want to highlight the main steps involved, particularly regarding the configuration of object storage, which is a crucial requirement and is best done during the initial deployment of the cluster. In the following example, we will deploy our clusters using the Google Kubernetes Engine (GKE), but you can find similar instructions for other environments in the previous link.

Considering you have a Google account configured as well as the gcloud (from the Google Cloud SDK suite) and kubectl command-line tools installed, authenticate your

[...]
Posted by Luca Ferrari on 2022-01-11 at 00:00

A couple of different solutions to an interesting problem.

Pentagon numbers

For a few weeks now, I have been implementing some of the tasks from the Perl Weekly Challenge in PostgreSQL-specific code. One interesting problem has been this week's task 2 of Challenge 147: finding a pair of pentagon numbers whose sum and difference are both pentagon numbers.
In this post, I discuss two possible solutions to the task.

What is a Pentagon Number?

A pentagon number is defined as the value of the expression n * ( 3 * n - 1 ) / 2; therefore the pentagon number corresponding to 3 is 12.
The task requires finding a pair of pentagon numbers such that:



P(n1) + P(n2) = P(x)
P(n1) - P(n2) = P(y)



It does not matter what x and y are, but P(n1) and P(n2) must be pentagon numbers, and both their sum and their difference must be pentagon numbers too.

The first approach: a record based function

The first solution I came up with was inspired by the solution I provided in Raku, and is quite frankly a kind of record-based approach.
First of all, I define an IMMUTABLE function named f_pentagon that computes the pentagon number value starting from a given number, so that f_pentagon( 3 ) returns 12. Why do I need a function? Because I want to implement a table with a stored generated column to keep track of numbers and their pentagon values.
For that reason, I created a pentagons table with a column n that represents the starting value and a column p that represents the computed pentagon value.



CREATE OR REPLACE FUNCTION
f_pentagon( n bigint )
RETURNS bigint
AS
$CODE$
        SELECT ( n * ( 3 * n - 1 ) / 2 );
$CODE$
LANGUAGE sql
IMMUTABLE;


DROP TABLE IF EXISTS pentagons;
CREATE TABLE pentagons
(
        n bigint
        , p bigint GENERATED ALWAYS AS ( f_pentagon( n ) ) STORED
);



INSERT INTO pentagons( n )
SELECT generate_series( 1, 5000 );




I inserted into the table 5000 records because I know, from the Raku solution, that what I’m look

[...]
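Once the pentagons table is populated, one way to search for a qualifying pair is a self-join with EXISTS probes (a sketch of ours, not necessarily the approach the post takes):

CREATE INDEX IF NOT EXISTS pentagons_p_idx ON pentagons( p );   -- helps the EXISTS probes
SELECT p1.n AS n1, p2.n AS n2, p1.p + p2.p AS sum_p, p1.p - p2.p AS diff_p
FROM pentagons p1
JOIN pentagons p2 ON p1.p > p2.p
WHERE EXISTS ( SELECT 1 FROM pentagons s WHERE s.p = p1.p + p2.p )
  AND EXISTS ( SELECT 1 FROM pentagons d WHERE d.p = p1.p - p2.p )
ORDER BY p1.n, p2.n
LIMIT 1;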
Posted by Andreas 'ads' Scherbaum on 2022-01-10 at 14:00
PostgreSQL Person of the Week Interview with Emil Shkolnik: I was born in Riga, Latvia, and our family moved to Israel in 1990; I have been here ever since.
Posted by Avinash Vallarapu in MigOps on 2022-01-07 at 21:37

On behalf of the PostgreSQL community and the team at MigOps, we wish all the readers a very Happy New Year - 2022. It is no wonder that PostgreSQL is the only open source database with a consistent increase in its adoption every year. The year 2021 has been another milestone in the PostgreSQL world. Every user of PostgreSQL has either directly or indirectly made a positive impact on PostgreSQL adoption through patch submissions or reviews, feature requests, blogs and articles, speaking at PostgreSQL conferences, discussions in postgres mailing lists about proposed features, etc. In this article we are going to see a summary of the year 2021 in the PostgreSQL ecosystem.

 

What’s being discussed in this article? A summary of the great things in PostgreSQL in 2021:
  1. DB-Engines Rankings - PostgreSQL is again the DBMS of the Year 2021.
  2. PostgreSQL 14 released - Let us see some of its features.
  3. Minor versions released in 2021.
  4. PostgreSQL versions that got EOL in 2021.
  5. Extensions that took birth in 2021.
  6. Security vulnerabilities that were detected and patched.
  7. Ora2Pg includes some more features simplifying Oracle to PostgreSQL migrations.
  8. Core Team report for 2021.
DB-Engines Rankings - PostgreSQL is again the DBMS of the Year 2021

DB-Engines determines the DBMS of the Year by subtracting the popularity score of the previous year from the popularity score of the current year for all the databases. Similarly, MigOps subtracted the popularity score of the databases in Dec 2020 from their score in Dec 2021 to find the DBMS of the Year - 2021.

We are proud to announce that PostgreSQL is again the DBMS of the Year - 2021. As per DB-Engines Rankings, PostgreSQL has already been the DBMS of the Year in 2017, 2018 and 2020. It now becomes the only database to win this title 4 times in the past 5 years.

 

PostgreSQL 14 released - Some of its features

The PostgreSQL community has announced the release of PostgreSQL 14 in the year 2021. This release included

[...]