
1. Overview

PostgreSQL is a great open source project for many reasons. One of the reasons I like it is the design of its buffer block addressing. In this blog, I am going to explain a possible way to share a Primary’s buffer blocks with a Standby. If you want to know more about how the buffer tag works, you can refer to my previous blog.

2. Primary and Standby

In Postgres, a Primary is an active database which accepts connections and performs read-write SQL operations; a Standby is a copy of the active database, which also accepts connections but only for read-only SQL operations. Because the Standby needs to perform the ‘copy’ action in order to synchronize with the Primary database, the two are technically not exactly equal if a SQL query is time sensitive. The question, then, is: can we make a Standby equal to a Primary?

The answer is: probably! I recently did some experimental tests to see if a Standby can access the Primary’s buffer blocks in order to see data which has not been replicated yet. The results are very positive.

3. How to share it

To achieve this experimental remote buffer block access, here are a few things I did:
On the Primary side,

  1. start a simple dedicated TCP/IP server to listen for buffer tag requests from the Standby
  2. check whether the buffer block for the received buffer tag is in memory
  3. if found and dirty, send the 8KB buffer block to the Standby
  4. if not found, simply answer “not found”

On the Standby side,

  1. add a simple TCP/IP client to the buffer manager
  2. send a buffer tag request to the Primary when the buffer manager needs to load a data block from disk
  3. add the 8KB data block to shared buffers if it is received from the Primary
  4. load the data block from disk if the buffer block is not found on the Primary
  5. skip the buffer tag check to avoid using cached data blocks

With these basic changes, a Standby can access the Primary’s buffer blocks and get the same query results even if the data has not been replicated yet (in my experimental test, I actually also skipped the head redo

[...]

I recently started using the H3 hex grid extension in Postgres with the goal of making some not-so-fast queries faster. My previous post, Using Uber's H3 hex grid in PostGIS, has an introduction to the H3 extension. That post, admittedly, takes a PostGIS-focused view rather than an H3-focused view. This post takes a closer look at using the H3 extension to enhance the performance of spatial searches.

The two common spatial query patterns considered in this post are:

  • Nearest neighbor style searches
  • Regional analysis

Setup and Point of Focus

This post uses two tables to examine performance. The following queries add an h3_ix column to the osm.natural_point and osm.building_polygon tables. This approach uses GENERATED columns and adds an index to the column. Going through these steps allows us to remove the need for PostGIS joins at query time for rough distance searches. See my previous post for details about installing the H3 extension and the basics of how it works.
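As a sketch of what that can look like, assuming the h3-pg v4 function h3_lat_lng_to_cell(), a point geometry column named geom stored in SRID 4326, and an arbitrary resolution of 8 (none of these details are taken from the post itself):

-- sketch only: assumes geom is a point geometry in SRID 4326 and h3-pg v4 naming
ALTER TABLE osm.natural_point
    ADD COLUMN h3_ix h3index
    GENERATED ALWAYS AS (h3_lat_lng_to_cell(geom::point, 8)) STORED;

-- index the generated column so rough distance searches can skip PostGIS joins
CREATE INDEX natural_point_h3_ix ON osm.natural_point (h3_ix);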

Posted by Jan Karremans in EDB on 2022-06-23 at 18:36

While this post is not going to be about pulsars, black holes or any other form of astronomical phenomenon… It still is going to cover one of the most exciting and fundamental shifts in the IT industry today.

A tale of two convergences

This tale discusses convergence.
To understand the significance of any convergence, it is necessary to understand the lines that are coming together. Please indulge me while I review these.
The challenging bit is that inside both of these lines there are only the very earliest glimpses of this new era. They are still very much busy with day-to-day operations in open source, or with conquering more of their base realm.

Why Postgres is the answer

I migrated myself from Oracle to Postgres! Moving from a steady path as an Oracle ACE to this—at least for me at that time—brand new world of open source data management. If you want to know how and what, I wrote a trilogy about that here, here and here.
After working in the Oracle realm for almost a quarter of a century, moving from Oracle to Postgres truly felt like following a new North Star. It has proven to provide good guidance.
With the phrase still in my head—”horses for courses”—the PostgreSQL Global Development Group does focus exclusively on Postgres. Coming from Oracle, though, it made me wonder who focuses on all the other aspects that vendors create to make their systems work. This “emergence from the red bubble” made me realize that there are much broader challenges, which leads to the “second thread” for convergence, in this story.

I like to think this puts me in a position that allows me to oversee this part of the spectrum (databases and data management) to a certain degree.

Note: if you realize where the data management industry is moving, Postgres is the answer.
Hold that thought!

Brain-breaking challenges

Over the years, much focus has been on infrastructure, simply because it is expensive, tedious, error-prone, and has lots and lots of room for improvement.

  • Infrastructure
[...]

The sheer cleverness of relational databases is often discounted because we so frequently use them for very simple data management tasks.

Serialize an object into a row, store with unique key. yawwwn

Search for unique key, deserialize row into an object. yawwwwwwn

The real power of relational databases is juggling "relations" (aka tables) in large numbers and figuring out on-the-fly the most effective way to filter out rows and find an answer.

PostgreSQL has an undeniably clever query planning system that auto-tunes based on the data in the system. It samples tables to gain statistics about the distribution of data, and uses those statistics to choose the order of joins and filters applied to the data for the most efficient query execution.

Even more amazing, the query planning system is modular enough to integrate user-defined data types, like the geometry and geography types in PostGIS. So complex queries involving spatial filters are also correctly planned on-the-fly.

Statistics Targets

Mostly, this just works magically. The system has a rough idea of the selectivity of any given filter or join condition, and can use that estimate to choose the most efficient order of operations to execute multi-table and multi-filter queries over complex collections of relational tables.

But what about when things go wrong?

By default, a PostgreSQL database ships with a default_statistics_target of 100. This means that to populate the statistics for a column, the database will draw a sample of 300 * default_statistics_target = 30000 rows.

The system will then use that data to populate the "common[...]
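If the estimates for a particular column turn out to be the problem, the per-column statistics target can be raised and the table re-analyzed. A minimal sketch with hypothetical table and column names:

-- sample this column more aggressively than default_statistics_target
ALTER TABLE roads ALTER COLUMN surface SET STATISTICS 1000;

-- re-collect statistics so the new target takes effect
ANALYZE roads;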

Posted by Laurenz Albe in Cybertec on 2022-06-23 at 08:00

debugging deadlocks made unnecessary
© Laurenz Albe 2022

Even if you understand what a deadlock is, debugging deadlocks can be tricky. This article shows some techniques on how to figure out the cause of a deadlock.

A simple deadlock example

Setting the stage

We will test our techniques for debugging deadlocks with the following example:

CREATE TABLE parent (
   pid bigint PRIMARY KEY,
   pdata text NOT NULL
);

CREATE TABLE child (
   cid bigint PRIMARY KEY,
   pid bigint REFERENCES parent NOT NULL,
   cdata text NOT NULL
);

INSERT INTO parent VALUES (1, 'no children yet');

The deadlock

To provoke the deadlock, we run the following transactions in parallel. Each transaction adds a row to the child table and then tries to modify the corresponding row in parent.

-- session 1
BEGIN;

INSERT INTO child VALUES (100, 1, 'new child');

   -- session 2
   BEGIN;

   INSERT INTO child VALUES (101, 1, 'another child');

   SELECT pdata FROM parent WHERE pid = 1 FOR UPDATE;
   -- session 2 blocks

-- session 1
SELECT pdata FROM parent WHERE pid = 1 FOR UPDATE;
-- session 1 blocks, but after one second, it receives an error
ERROR:  deadlock detected
DETAIL:  Process 19674 waits for ShareLock on transaction 1109; blocked by process 19718.
Process 19718 waits for ShareLock on transaction 1108; blocked by process 19674.
HINT:  See server log for query details.
CONTEXT:  while locking tuple (0,1) in relation "parent"

   -- session 2 is unblocked and receives a result
       pdata    
   ═════════════
    no children
   (1 row)

How a deadlock appears in the log

The PostgreSQL log reveals more details:

ERROR:  deadlock detected
DETAIL:  Process 19674 waits for ShareLock on transaction 1109; blocked by process 19718.
        Process 19718 waits for ShareLock on transaction 1108; blocked by process 19674.
        Process 19674: SELECT pdata FROM parent WHERE pid = 1 FOR UPDATE;
        Process 19718: SELECT pdata FROM parent WHERE pid = 1 FOR UPDATE;
HINT:  See server log for query details.
CONTEXT:  while locking tuple (0
[...]
Posted by Paul Ramsey in PostGIS on 2022-06-23 at 08:00

I have a blog post up today at Crunchy Data on some of the mechanisms that underlie the PostgreSQL query planner; it’s pretty good if I do say so myself.

I was motivated to write it by a conversation over coffee with my colleague Martin Davis. We were talking about a customer with an odd query plan case and I was explaining how the spatial statistics system worked and he said “you should do that up as a blog post”. And, yeah, I should.

One of the things that is striking as you follow the PostgreSQL development community is the extent to which a fairly mature piece of technology like PostgreSQL is stacks of optimizations on top of optimizations on top of optimizations. Building and executing query plans involves so many different paths of execution, that there’s always a new, niche use case to address and improve.

I worked a political campaign a few years ago as a “data science” staffer, and our main problem was stitching together data from multiple systems to get a holistic view of our data.

That meant doing cross-system joins.

The first cut is always easy: pull a few records out of System A with a filter condition and then go to System B and pull the associated records. But then inevitably a new filter condition shows up and, applied to A, it generates so many records that the association step on B gets overloaded. But it turns out that if I start from B and then associate in A, it’s fast again.

And thus suddenly I found myself writing a query planner and executor.

It’s only when dumped into the soup of having to solve these problems yourself that you really appreciate the magic that is a mature relational database system. The idea that PostgreSQL can take a query that involves multiple tables of different sizes, with different join cardinalities, and different indexes and figure out an optimal plan in a few milliseconds, and then execute that plan in a streaming, memory efficient way…?

Magic is really the best word I’ve found.

Posted by Luca Ferrari on 2022-06-23 at 00:00

A trick about queries that involve functions.

Ordinality in function queries

The PostgreSQL SELECT statement allows you to query functions that return a result set (either a SETOF or a TABLE), which are then used as a source of tuples for the query itself.
There is nothing surprising about that!
However, when the SELECT statement is invoked against a function that provides a result set, it allows an extra clause to appear: WITH ORDINALITY (see the SELECT documentation at https://www.postgresql.org/docs/14/sql-select.html). This clause adds a column to the result set with a counter (of type bigint) representing the position of the tuple as returned by the function.

Why is this important? Because you don’t need your function to provide a tuple counter of its own.
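A quick taste of the clause with a built-in set-returning function (the longer, function-based example follows below):

SELECT * FROM unnest( ARRAY[ 'cat', 'dog', 'fish' ] ) WITH ORDINALITY AS t( animal, n );

 animal | n
--------+---
 cat    | 1
 dog    | 2
 fish   | 3
(3 rows)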

WITH ORDINALITY in action

Let’s take a simple example to understand how it works. Let’s create a function that returns a table:



CREATE OR REPLACE FUNCTION animals( l int DEFAULT 5,
                                    animal text DEFAULT 'cat',
                                    owner text DEFAULT 'nobody' )
RETURNS TABLE( pk int, description text, mood text )
AS $CODE$
DECLARE
        i int := 0;
        j int := 0;
BEGIN
        FOR i IN 1 .. l LOOP
            pk          := i;
            description := format( '%s #%s owned by %s', animal, i, owner );
            j           := random() * 100;
            IF j % 2 = 0 THEN
               mood     := 'good';
            ELSE
              mood      := 'bad';
            END IF;

            RAISE DEBUG 'Generating % # % with mood %', animal, i, mood;
            RETURN NEXT;
        END LOOP;
RETURN;

END
$CODE$
LANGUAGE plpgsql;



The above function animals() produces an output with a simple name for the animal (numbered), the index of the generated tuple (i.e., a counter) and a randomly selected mood.
It is clearly easy to test it out:



testdb=> SELECT * FROM animals();
 pk |      description       | mood
----+------------------------+------
  1 | cat #1 owned b
[...]

For folks using psql to connect to PG databases, it can be a headache to manage a lot of different DB profile connections. PG makes it a bit easier by organizing DB profiles in a file called .pgpass. It contains one line for each DB profile, like this:

localhost:5432:mydb:myuser:mypassword

This file must reside in the user’s home directory and not have global permissions.

cd ~
touch .pgpass
chmod 600 .pgpass

But it only simplifies having to remember passwords. You still have to use a tedious psql command like this:

psql -h localhost -p 5432 -d mydb -U myuser

If the command matches a line in the .pgpass file, it will connect without you being prompted for a password.

Now comes the Connection Service File to the rescue. Named .pg_service.conf, it is like the .pgpass file in that it is located in your home directory. Also, like .pgpass, do the usual:

cd ~
touch .pg_service.conf
chmod 600 .pg_service.conf

The contents of this file are shown below for one profile example:

# Staging database for sales sector
[StagingDatabaseSales]
host=my-host-stage.cluster-whatever
dbname=mydb
port=5432
user=postgres
password=
application_name=psql

Then to connect simply do this:

export PGSERVICE=StagingDatabaseSales
psql

Voila! Life just got a bit simpler. With a lot of profiles, it will be a wee bit difficult to remember them all, so grep to the rescue!

grep '\[' .pg_service.conf
[StagingDatabaseSales]
[another profile]

Obviously, you have to fill in the passwords for all the connection profiles, because an empty password= might not work [smile].

Now that we have got connecting using multiple DB Profiles working smoothly, let’s focus on the actual psql session. Sometimes there are things you do repeatedly once connected via psql. Wouldn’t it be nice to simplify that process as well? .psqlrc to the rescue again! Configure it just like the other 2 files:

cd ~
touch .psqlrc
chmod 600 .psqlrc

The example here
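Purely as an illustration (these particular settings are my own picks, not necessarily the author's), a small .psqlrc could look like this:

-- suppress output while the rest of the file runs
\set QUIET 1
-- report how long each statement takes
\timing on
-- switch to expanded output automatically for wide result sets
\x auto
-- make NULL values visible
\pset null '(null)'
\set QUIET 0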

[...]
Posted by Andreas Scherbaum on 2022-06-22 at 12:38

Together with Lætitia Avrot and Nikolay Samokhvalov I was invited to participate in a Community Panel (YouTube video) about PostgreSQL Upgradability at Postgres Vision 2022. The panel was moderated by Bruce Momjian and initiated and organized by Jimmy Angelakos. Bruce did talk with each of us before, which helped a lot to guide the discussion in the right direction. The recording of the panel discussion is available on the Postgres Vision website.

During this panel each of us provided examples for how easy or complicated PostgreSQL upgrades still are.
 

 

Continue reading "PostgreSQL Upgrades are hard!"
Posted by Jimmy Angelakos in EDB on 2022-06-21 at 13:25

Last week at the Postgres Vision online conference, I organised a live PostgreSQL community panel talk, in order to discuss the issue of #PostgreSQL upgradability. Many thanks to legendary Postgres figure Bruce Momjian for moderating the conversation, and to the participating panellists:

  • Nikolay Samokhvalov (Founder, Postgres.ai - Database Lab)
  • Lætitia Avrot (Field CTO, EDB)
  • Andreas Scherbaum (Head of Databases, Adjust GmbH)

Video recording of the panel talk at this year's Postgres Vision 👇

It’s time for another #AlwaysBeLaunching week! 🥳🚀✨ In our #AlwaysBeLaunching initiatives, we challenge ourselves to bring you an array of new features and content. Today, we are introducing TimescaleDB 2.7 and the performance boost it brings for aggregate queries. 🔥 Expect more news this week about further performance improvements, developer productivity, SQL, and more. Make sure you follow us on Twitter (@TimescaleDB), so you don’t miss any of it!

Time-series data is the lifeblood of the analytics revolution in nearly every industry today. One of the most difficult challenges for application developers and data scientists is aggregating data efficiently without always having to query billions (or trillions) of raw data rows. Over the years, developers and databases have created numerous ways to solve this problem, usually similar to one of the following options:

  • DIY processes to pre-aggregate data and store it in regular tables. Although this provides a lot of flexibility, particularly with indexing and data retention, it's cumbersome to develop and maintain, especially when deciding how to track and update aggregates with data that arrives late or has been updated in the past.
  • Extract Transform and Load (ETL) process for longer-term analytics. Even today, development teams employ entire groups that specifically manage ETL processes for databases and applications because of the constant overhead of creating and maintaining the perfect process.
  • MATERIALIZED VIEWS. While these VIEWS are flexible and easy to create, they are static snapshots of the aggregated data. Unfortunately, developers need to manage updates using TRIGGERs or CRON-like applications in all current implementations. And in all but a very few databases, all historical data is replaced each time, preventing developers from dropping older raw data to save space and computation resources every time the data is refreshed. (A bare-bones sketch of this pattern follows the list.)
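Here is that bare-bones sketch in plain PostgreSQL, with hypothetical table and column names; it shows both the convenience and the limitation, since the refresh has to be scheduled externally and recomputes everything from the raw rows:

CREATE MATERIALIZED VIEW daily_temperature AS
SELECT date_trunc('day', observed_at) AS bucket,
       device_id,
       avg(temperature) AS avg_temp
FROM readings
GROUP BY 1, 2;

-- must be re-run by cron, a trigger, etc.; rebuilds the whole view each time
REFRESH MATERIALIZED VIEW daily_temperature;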

Most developers head down one of these paths because we learn, often the hard way,

[...]
Posted by Andreas Scherbaum on 2022-06-21 at 12:15

As is a good tradition, the PostgreSQL Project participates in Google Summer of Code (GSoC). Last year we submitted 7 projects for 7 students - and got all 7 projects accepted. This year we got quite a few more good proposals from students, and more mentors are helping. Guess what? Google accepted all 12 proposals!

Google modified the program again. For 2021 they cut the time for every project in half, to accommodate the at-home work during the Covid-19 pandemic. This turned out to be suboptimal, and many larger projects need more time. This year students can choose between “medium” (175 hours) and “large” (350 hours) projects. This gives everyone a chance to scope the project accordingly.

 

 

Continue reading "PostgreSQL Project @ GSoC 2022"
Posted by Paul Ramsey in PostGIS on 2022-06-21 at 08:00

The question of why organizations are shy about their use of open source is an interesting one, and not completely obvious.

Open source luminary Even Rouault asks:

is there some explanation why most institutions can’t communicate about their PostGIS use ? just because it is a major hurdle for technical people to get their public relationship department approve a communication ? people afraid about being billed about unpaid license fees 🤣 ?

There’s really very little upside to publicizing open source use. There’s no open source marketing department to trumpet the brilliance of your decision, or invite you to a conference to give you an award. On the other hand, if you have made the mistake of choosing an open source solution over a well-known proprietary alternative, there is surely a local sales rep who will call your boss to tell them that you have made a big mistake. (You do have a good relationship with your boss, I hope.)

These reverse incentives can get pretty strong. Evendiagram reports:

Our small group inside a large agency uses postgis. We don’t talk about it, even internally, to avoid the C-suite forcing everyone back to oracle. RHEL repos allow us a lot of software that would otherwise be denied.

This reminds me of my years consulting for the British Columbia government, when technical staff would run data processing or even full-on public web sites from PostgreSQL/PostGIS machines under their desktops.

They would tell their management it was “just a test system” or “a caching layer”, really anything other than “it’s a database”, because if they uttered the magic word “database”, the system would be slated for migration into the blessed realm of enterprise Oracle systems, never to be heard from again.

Logos

Meanwhile, Daryl Herzmann reminds us that the Iowa Mesonet has been on Team PostGIS since 2003.

Iowa Environmental Mesonet, Iowa State University

  • Data being managed in the database
    Meteorological Data, “Common” GIS dataset
[...]
Posted by Adam Johnson on 2022-06-20 at 23:00

Slow queries happen, and when they do, it can be tough to dissect why they’re slow. This difficulty is compounded by using Django’s ORM, since it generates the SQL for you, so you may have little idea of the actual queries “under the hood”.

In this post we’ll look at what pgMustard does and how to use it with the Django ORM.

(Disclosure: I know pgMustard co-founder Michael Christofides, but I have not been paid for this post. I just like the product.)

pgMustard… Is This a Mustard?

PostgreSQL has an EXPLAIN statement to obtain a query plan for any given SELECT query. This plan contains structured performance data, and can contain both planned and actual metrics. Query plans are great, but interpreting them requires a broad understanding of PostgreSQL, and knowledge of what to focus on first. I often find understanding query plans overwhelming.
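The raw material is nothing exotic: a plan with actual metrics comes straight out of EXPLAIN. A sketch with a hypothetical table and filter; whether you want text or JSON output depends on the tool you feed it to:

-- ANALYZE actually executes the query, so be careful with statements that write
EXPLAIN (ANALYZE, BUFFERS, VERBOSE, FORMAT JSON)
SELECT *
FROM app_order
WHERE customer_id = 42;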

This is where pgMustard enters the picture. It takes a query plan, visualizes it, and provides interpretation of the results. It ranks opportunities for optimization out of five stars (five = most optimizable) and provides detailed information about what you can do next.

For example, here’s one of their demo visualizations:

pgMustard example query visualization, showing a tree diagram with explanation on the left. The main hint is that the query planner estimated nearly 200 times too few rows, with a recommendation to try using ANALYZE on the table to correct the statistics.

This query took 55 seconds to execute. pgMustard’s visualization shows the query plan’s steps in a hierarchical display. The sidebar reveals why it was slow, with a short explanation showing action points that are expanded on in a blog post.

I like how pgMustard gives you informed optimizations for your queries and tables. It’s like having a knowledgeable DBA on hand.

It Costs… Money!

pgMustard is a paid product, with a free trial that allows you to explain five query plans (no credit card required). This doesn’t include queries with no optimization suggestions.

At current it costs €95 (~$105) a year for a single developer, or €500 (~$525) a year for a team (no limit on members). Both plans allow you to explain an unlimited amount of

[...]
Posted by Frits Hoogland in Yugabyte on 2022-06-20 at 16:09

This blog post is an introduction to the concept of postgres' TOAST (The Oversized-Attribute Storage Technique), and the way we deal with it in YugabyteDB.

The postgres database requires a row to be stored in a single block. Therefore, the total size of a row must be bounded by the size of a postgres block, which by default is 8kB. However, there are situations where 8kB is too little for an entire row, or even a single field in a row. This is what TOAST tries to overcome.

First of all, a field (also known as an attribute, which is what the postgres catalog as well as the documentation call it) must be eligible for TOAST. Attribute types that are eligible for TOAST are types such as varchar, bytea, text, json and jsonb: types that are capable of handling large amounts of data, which can be larger than a postgres block, and are variable in size.

Postgres implements TOAST by evaluating the attributes (fields) of a table and determining whether an attribute can potentially store "oversized" data. If that is the case, a toast table is created alongside the original table. This is indicated by the reltoastrelid field in pg_class, which is the OID of a table called pg_toast_OID in the pg_toast schema.
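A quick way to see that mapping from the catalog side; the table name t1 matches the example later in this post, but any table name works:

-- reltoastrelid is 0 when the table has no TOAST table
SELECT c.relname       AS table_name,
       c.reltoastrelid AS toast_oid,
       t.relname       AS toast_table
FROM pg_class c
LEFT JOIN pg_class t ON t.oid = c.reltoastrelid
WHERE c.relname = 't1';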

So far, so good: this is all well known, and documented in the postgres documentation and in lots of blogposts.

Fast forward to YugabyteDB. As probably most of the readers are aware, we (Yugabyte) use the postgres source to build a postgres layer we call YSQL (Yugabyte SQL), which uses most of the postgres source functionality, but uses our distributed storage layer called DocDB for persistence and the foundation of the ACID implementation.

Recently, we came across a case in YugabyteDB where retrieving an attribute/column turned out to be slower than expected, which was proven by changing the SQL to retrieve the same column in a way that was only slightly different.

This is how the slow execution looked with explain select:

yugabyte=# explain analyze select data from t1 wher
[...]
Posted by Andreas 'ads' Scherbaum on 2022-06-20 at 14:00
PostgreSQL Person of the Week Interview with Hervé Schweitzer: My name is Hervé Schweitzer and I live in Courgenay (Canton of Jura, Switzerland), but I’m originally from Alsace (France), where I was born and lived until 2019. I worked as an Oracle DBA (database administrator) from 1997 to 2010. Then, with 4 friends, we decided to start our own company - dbi-services - which is an expertise company in the database/middleware field.
Posted by Paul Ramsey in PostGIS on 2022-06-20 at 08:00

Last week, I wrote that getting large organizations to cop to using PostGIS was a hard lift, despite the fact that, anecdotally, I know that there is massive use of PostGIS in every sector, at every scale of institution.

Simple Clues

Here’s a huge tell that PostGIS is highly in demand: despite the fact that PostGIS is a relatively complex extension to build (it has numerous dependencies) and deploy (the upgrade path between versions can be complex), every single cloud offering of PostgreSQL includes PostGIS.

AWS, Google Cloud, Azure, Crunchy Bridge, Heroku, etc, etc. Also forked not-quite-Postgres things like Aurora and AlloyDB. Also not-Postgres-but-trying things like Cockroach and Yugabyte.

If PostGIS was a niche hobbyist project…? Complete the sentence any way you like.

Logos

True to form, I received a number of private messages from people working in or with major institutions you have heard of, confirming their PostGIS use, and the fact that the institution would not publicly validate it.

However, I also heard from a couple of medium-sized companies, which seem to be the only institutions willing to talk about how useful they find open source in growing their businesses.

Hailey Eckstrand of Foundry Spatial writes to say:

Foundry Spatial uses PostGIS in development and production. In development we use it as our GIS processing engine and warehouse. We integrate spatial data (often including rasters that have been loaded into PostGIS) into a watershed fabric and process summaries for millions of watersheds across North America. We often use it in production with open source web tooling to return results through an API based on user input. One of our more complex usages is to return raster results within polygons and along networks within a user supplied distance from a click location. We find the ease and power of summarizing and analyzing many spatial datasets with a single SQL query to be flexible, performant, efficient, and… FUN!

Dian Fay of Understory w

[...]
Posted by Adam Johnson on 2022-06-19 at 23:00

Your PostgreSQL server is seizing up, with some naughty queries consuming too many resources or blocking others. Don’t panic! You can stop those problem queries and stabilize your system.

In this post we’ll look at stopping queries via SQL, techniques for finding problematic queries, and the occasionally useful ability to cancel via operating system tools.

Stopping Queries via SQL in Two Steps

Here’s the basic process to find and stop a query. Note you’ll need to connect as a user with adequate permissions to do so, such as an admin account.

1. Find the pid

PostgreSQL creates one process per connection, and it identifies each process with its operating system process ID, or pid. In order to cancel a query, you need to know the pid for the connection it’s running on.

One way to find this out is with the pg_stat_activity view, which provides information about the live queries. For example, try this query:

SELECT pid, state, backend_start, substr(query, 0, 100) q
FROM pg_stat_activity
WHERE backend_type = 'client backend'
ORDER BY backend_start;

The substr call limits the displayed SQL to 100 characters, to avoid long queries taking up a lot of screen space. The backend_type filter avoids showing background server processes, such as the autovacuum launcher. Sorting by backend_start shows the longest running connections first, which often shows problematic long-running transactions.

Here’s an example of running this on my development server:

stagingi_inventev=# SELECT pid, state, backend_start, substr(query, 0, 100) q
FROM pg_stat_activity
WHERE backend_type = 'client backend'
ORDER BY backend_start;
 pid | state  |         backend_start         |                                                  q
-----+--------+-------------------------------+-----------------------------------------------------------------------------------------------------
  73 | active | 2022-06-17 18:57:15.
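With the pid in hand, the stopping itself is a single function call. A minimal sketch using the pid from the output above; pg_cancel_backend() and pg_terminate_backend() are the standard server functions for this:

-- politely ask the backend to cancel its current query (the connection stays open)
SELECT pg_cancel_backend(73);

-- if that is not enough, terminate the whole connection
SELECT pg_terminate_backend(73);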
[...]

Citus 11.0 is here! Citus is a PostgreSQL extension that adds distributed database superpowers to PostgreSQL. With Citus, you can create tables that are transparently distributed or replicated across a cluster of PostgreSQL nodes. Citus 11.0 is a new major release, which means that it comes with some very exciting new features that enable new levels of scalability.

The biggest enhancement in Citus 11.0 is that you can now always run distributed queries from any node in the cluster because the schema & metadata are automatically synchronized. We already shared some of the details in the Citus 11.0 beta blog post, but we also have a big surprise for those of you who use Citus open source that was not part of the initial beta.

When we do a new Citus release, we usually release 2 versions: the open source version and the enterprise release, which includes a few extra features. However, there will be only one version of Citus 11.0, because everything in the Citus extension is now fully open source!

That means that you can now rebalance shards without blocking writes, manage roles across the cluster, isolate tenants to their own shards, and more. All this comes on top of the already massive enhancement in Citus 11.0: You can query your Citus cluster from any node, creating a truly distributed PostgreSQL experience.
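For readers who have not touched Citus before, these features are driven through plain SQL functions. A rough sketch with a hypothetical table, based on my understanding of the Citus API, so treat the details as assumptions:

-- shard an ordinary table across the cluster by tenant_id
SELECT create_distributed_table('events', 'tenant_id');

-- spread shards over the nodes; per this release, rebalancing no longer blocks writes
SELECT rebalance_table_shards('events');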

In this blog post we will cover the highlights of:

If you want to know everything that’s new, you can check out the Updates page for Citus 11.0, which contains a detailed breakdown of all the new features and other improvements.

Remaining Citus Enterprise features are now open source

Long ago, Citus Data was an enterprise software company. Over time our team’s focus shifted towards open-source, becoming a cloud vendor, and then becoming an integral part of Azure. With the new focus, our team has developed all n

[...]

When Disco was the hot new music, the people who built database management systems based their praxis around the then-correct assumption that hardware was extremely expensive, and hence that procuring things like more storage, let alone more machines on which they would run, could be counted on to require significant capital resources. This could only happen on the time scale of quarters, even if it was deemed worthwhile to fight the fights needed to procure them.

The above conditions informed the practice of hardening single nodes against failure without adding significant hardware resources in the process. This practice continues, and it's a good thing to have.

However.

Over the past few decades, hardware resources have been dropping precipitously in cost even when acquired via capital expenditure, and tax (read subsidy here) regimes have been altered to put a massive thumb on the scale to favor operational expenditures. This in turn has driven technologies that made it easy to make those operational expenditures, and to make it possible to make such expenditures in small increments.

Now, doubling a node's storage or lighting a powerful new node and having it working in production can and does happen on the scale of minutes, not least because the first chunk of money spent in order to do so is well within the discretion of an individual developer, and a junior one at that.

Topic Transition!

That's a lot of paragraphs on history and political economy, and you may be wondering how they're pertinent to the matter at hand. Wonder no more, as we've gotten there.

Now that it's cheap and easy to light even a database node, generally a pretty expensive type of node to light, we can use techniques to approach problems that would have been untenably profligate in the Disco era. Among these is replication.

It's not all sweetness and light, though. The proliferation of new nodes has created new problems. When you switch the database to that hot new hardware, lots of new systems c

[...]
Posted by Takamichi Osumi in Fujitsu on 2022-06-16 at 01:11

In this blog post I conclude my discussion on the new mechanisms available in PostgreSQL to handle logical replication conflicts by explaining how to use the ALTER SUBSCRIPTION SKIP command.

 

comic about pattern matching

© Renée Albe 2022

Case-insensitive search is a much-requested feature, partly (I suspect) to maintain compatibility with Microsoft SQL Server. There are several solutions to the problem, one of which is to use case-insensitive ICU collations. This works like a charm, except if you want to perform pattern matching. So let’s have a closer look at the problem and at possible solutions.

Alternatives for case-insensitive search

There are three known solutions to case-insensitive search in PostgreSQL:

Explicit conversion with lower() or upper()

A query that uses this method would look as follows:

SELECT id, col FROM tab
WHERE lower(col) = lower('search string');

This can be made fast with a B-tree index on lower(col), but has two disadvantages:

  • the solution is implemented at the application level, that is, you have to custom-tailor the query for case-insensitive search
  • if the database column contains long values, the whole value has to be converted to lower case, even if only a few characters have to be compared, which leads to bad performance
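The expression index mentioned above is a one-liner, using the post's tab/col names:

-- expression index that matches the lower(col) = lower('...') predicate
CREATE INDEX tab_col_lower_idx ON tab (lower(col));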

Using the citext extension

The extension citext provides a data type citext, which stands for “case-insensitive text”. The table is defined as

CREATE TABLE tab (
   col citext,
   ...
);

and the query is as simple as

SELECT id, col FROM tab
WHERE col = 'search string';

That is simple and convenient, but has some disadvantages as well:

  • there is no data type civarchar, so you can only implement that with a check constraint
  • performance for longer values can also be bad, because citext internally calls lower(col COLLATE "default") before comparing the values
  • regular expression matching is not case insensitive, and you have to use the case insensitive operator ~* explicitly

Using case-insensitive ICU collations

If you are using PostgreSQL v12 or better, and PostgreSQL was configured --with-icu, you can define a case-insensitive collation like this:

CREATE COLLATION english_ci (
   PROVID
[...]
Posted by Joshua Drake in CommandPrompt on 2022-06-14 at 17:43

In 2017 I wrote an article titled, “Where is the Postgres community?” The article was a summary of many of the external communities where Postgres people interact. I wrote it for two reasons:

  1. To show that the Postgres community is far broader than Postgresql.org mailing lists
  2. To bring visibility to the external communities in hopes that people will join them

Now that we are post-pandemic and it is five years later, I am curious as to where the communities are now. I ask because I recently started paying attention to pgsql-general@postgresql.org again. The pgsql-general mailing list used to be a vibrant collaboration and support channel. I am sad to report that the mailing list is all but a ghost town, with only 15 messages sent in 4 days. In its heyday the list would easily pull thousands of messages a month. The sparse nature of collaboration on this list isn’t surprising and has been a topic of discussion for many in the community. People just don’t use email as a canonical source for collaboration anymore.

Where are we now?

In the 2017 article we referenced /r/postgresql, which at the time had 5,100 members. It now has 25.5k. In the same time period as pgsql-general, it generated 175 messages among community members (20 posts, 155 responses). Similarly, we mentioned the Slack channel, which at the time had 1100+ members. It now hosts 18.3k subscribers, with activity similar to the subreddit. The People, Postgres, Data Discord, which did not exist in 2017, has 3,579 members and is quite active over its 28 channels. The listed collaboration venues don’t take into account the thousands of members among the international or associated (Brazil, Russia, TimescaleDB, Yugabyte, NeonDB, etc…) Postgres communities.

Postgres community, where art thou?

Despite these numbers, the size of the communities and the activity mentioned above is barren in comparison to the mailing list’s heyday. I have to wonder: where did everyone go? Is this caused by consolidation of many of the Postgres companies? Is it

[...]

Why migrate to pg_timetable from pg_cron?

There are a lot of reasons you may want to migrate from pg_cron:

  • maybe you need to run a scheduler on a platform pg_cron doesn’t support;
  • or you want to run several parallel schedulers simultaneously;
  • it’s possible you hit the bugs in pg_cron that haven’t been fixed for ages (#64, #96);
  • or you are limited by the debugging facilities pg_cron provides;
  • it might be that you think the pg_cron project is obsolete and deprecated after its acquisition by Microsoft;
  • or maybe you need to implement a complex chain of tasks instead of a simple SQL statement.

There are many reasons, actually. Most of them may be spotted by taking a look at the PostgreSQL schedulers comparison table I introduced in my previous post and the supported platforms table from the official readme.

Prerequisites

  • You have installed the pg_cron extension in your database.
  • You’ve scheduled pg_cron jobs to export.
  • You have created the pg_timetable schema by running pg_timetable against this database at least once. The schema will be created automatically. It doesn’t matter whether you have already added some chains or not.

Straightforward solution

If you want to quickly export jobs scheduled from pg_cron to pg_timetable, you can use this SQL snippet:

SELECT timetable.add_job(
    job_name            => COALESCE(jobname, 'job: ' || command),
    job_schedule        => schedule,
    job_command         => command,
    job_kind            => 'SQL',
    job_live            => active
) FROM cron.job;

timetable.add_job(), however, has some limitations. First, the function will mark the created task as autonomous, meaning the scheduler should execute the task outside of the chain transaction. That’s not an error, but many autonomous chains may cause extra connections to be used.

Secondly, database connection parameters are lost for source pg_cron jobs, making all jobs local.

The proper pg_cron to pg_timetable migration script

[...]
Posted by Bruce Momjian in EDB on 2022-06-13 at 20:00

Postgres issues major and minor releases regularly, as outlined on the Postgres website. Figuring out what changes are part of each upgrade can be confusing. Let's look at how to find the relevant Postgres changes for simple and complex upgrades:

  1. Upgrade from one recent minor release to a later minor release with the same major version number: Suppose you are running Postgres 13.4 and want to upgrade to 13.7 — well, that's easy — just read the minor release notes for each of the minor versions that span your old and new versions.
  2. Upgrade from the most recent minor release to a newly released major version: Suppose you are running the most recent release of Postgres, 13.7, and Postgres 14 is released — well, that's pretty easy. The Postgres 14 major release notes contain all the changes between Postgres 13.7 and 14.
  3. Upgrade from a non-newest minor release to a newly released major version: Suppose you are running a non-newest version of Postgres, 13.5, and Postgres 14 is released. You should read the release notes of all 13.x minor releases after 13.5, and the Postgres 14 major release notes. Reading the 13.x minor release notes is necessary because changes made after 13.5 will be part of Postgres 14 but not listed in the Postgres 14 release notes.
  4. Upgrade from a non-newest minor release to a minor release of the next released major version: Suppose you are running Postgres 13.5 and want to upgrade to 14.3. You should read the release notes of all 13.x minor releases after 13.5, the Postgres 14 major release notes, and the minor release notes from 14.1 to 14.3.
  5. Upgrade from a non-newest minor release to a minor release of a much later major version: Suppose you are running Postgres 11.7 and want to upgrade to 14.3. You should read the release notes of all 11.x minor releases after 11.7, the Postgres 12-14 major release notes, and the minor release notes from 14.1 to 14.3. However, you should also scan the 12.x and 13.x minor release notes because they might contain fixes to features adde
[...]
Posted by Akhil Reddy Banappagari in MigOps on 2022-06-13 at 16:58

While migrating from Oracle to PostgreSQL, the important fact to consider is that they are two different database engines. The major mistake organizations make during the course of a migration is assuming that Oracle and PostgreSQL behave the same. One such area of difference between Oracle and PostgreSQL is transaction control statements. While migrating PL/SQL procedures and functions from Oracle to PL/pgSQL in PostgreSQL, we need to understand in detail how transaction control statements work. By the end of this article, you will have clarity about Oracle vs PostgreSQL transaction control statements.

Oracle vs PostgreSQL Transaction Control Statements
BEGIN and END

Let us start with an important fact: BEGIN and END in PL/SQL and PL/pgSQL are merely syntactic elements and have nothing to do with transactions. Most of the time, we get confused by the similarly-named SQL commands for transaction control. The BEGIN that starts a block in PL/SQL and PL/pgSQL is different from the SQL statement BEGIN that starts a transaction. BEGIN/END here are used for grouping purposes only; they do not start or end a transaction. Functions and trigger procedures always run as part of a transaction started by an outer query. They can't start or commit that transaction, because there is no context in which they have to run.

Consider the following PL/pgSQL code in PostgreSQL as an example.

CREATE OR REPLACE PROCEDURE test_proc ()
AS $$
DECLARE
BEGIN
RAISE NOTICE 'current transaction id: %', txid_current();
END;
$$
LANGUAGE PLPGSQL;

DO $$
BEGIN
RAISE NOTICE 'current transaction id: %', txid_current();
CALL test_proc ();
RAISE NOTICE 'current transaction id: %', txid_current();
END;
$$;

Following is the output from the above anonymous block.


NOTICE: current transaction id: 17788
NOTICE: current transaction id: 17788
NOTICE: current transaction id: 17788

We can conclude from the above output that BEGIN and END do not start or end transactions. The same is true of PL/SQL in Oracle.

N
[...]
Posted by Christopher Winslett in Crunchy Data on 2022-06-13 at 15:00

TL;DR: keep your data-layer simple, you'll need the agility it offers when budgets get stretched.

Over the past 12 years of helping people run databases for their businesses, I have discovered that there are only 2 financial phases in the lifecycle of a database:

  • Phase 1: Absolute costs are low, thus marginal costs do not matter.
  • Phase 2: Absolute costs are high, thus marginal costs are all that matter.

I'm calling this difference between Phase 1 and Phase 2 the "database financial gap."

The first hint there is a problem comes from questions about optimizations. Developers & DBAs have questions about indexes, RAM, connection management, schema design, sharding, and many more technical ideas. Many think they are solving technical problems, but are solving a money problem.

Why is it a money problem? If money is not the problem, then throw hardware at the problem. Yet, because they seek to optimize, it indicates they are choosing not to throw hardware at the problem.

Phase 1: Absolute costs are low

This is the honeymoon phase of running a database. Costs of running a database are insignificant1 for the business.

The situation in this phase is:

  • Small number of application & background processes
  • Data sizes are small
  • Peak per-second queries & writes are in the low double digits
  • Hardware overpowers the use-case
  • A single-node can handle the usage

Mistakes during this phase are not terminal. Common mistakes include:

  • Queries can run without indexes.
  • Misoptimization of data structures is insignificant2.a
  • Table-scans are not hammering the iOPS because the number of rows is relatively constrained.
  • Connection pooling does not matter because there are limited application processes.
  • ORMs have not led you astray with sub-optimal queries2.b

During this phase, [...]

Posted by Andreas 'ads' Scherbaum on 2022-06-13 at 14:00
PostgreSQL Person of the Week Interview with Vigneshwaran C: My name is Vigneshwaran C, and I live in Bangalore, Karnataka, India. I work as part of the Postgres open-source team in Fujitsu.
Posted by Paul Ramsey in PostGIS on 2022-06-13 at 08:00

The question of “who uses PostGIS” or “how big is PostGIS” or “how real is PostGIS” is one that we have been wrestling with literally since the first public release back in 2001.

There is no doubt that institutional acceptance is the currency of … more institutional acceptance.

Oroboros

So naturally, we would love to have a page of logos of our major users, but unfortunately those users do not self-identify.

As an open source project PostGIS has a very tenuous grasp at best on who the institutional users are, and things have actually gotten worse over time.

Originally, we were a source-only project and the source was hosted on one web server we controlled, so we could literally read the logs and see institutional users. At the time mailing lists were the only source of project communication, so we could look at the list participants, and get a feel from that.

All that’s gone now. Most users get their PostGIS pre-installed by their cloud provider, or pre-built from a package repository.

So what do we know?

IGN

In the early days, I collected use cases from users I identified on the mailing list. My favourite was our first major institutional adopter, the Institut Géographique National, the national mapping agency of France.


In 2005, they decided to move from a desktop GIS paradigm for their nation-wide basemap (of 150M features), to a database-centric architecture. They ran a bake-off of Oracle, DB2 and PostgreSQL (I wonder who got PostgreSQL into the list) and determined that all the options were similar in performance and functionality for their uses. So they chose the open source one. To my knowledge IGN is to this day a major user of PostgreSQL / PostGIS.

GlobeXplorer

Though long-gone as a brand, it’s possible the image management system that was built by GlobeXplorer in the early 2000’s is still spinning away in the bowels of Maxar.


GlobeXplorer was both one of the first major throughput use cases we learned about, and also the first one where we kn

[...]
Posted by Andreas Scherbaum on 2022-06-11 at 18:08

The one thing I like to do is challenge PostgreSQL - and readers - to the extreme. A while ago I posted about the Beer o'clock. Building Zoé was another interesting example. And at some point I will blog about Dancing with the Stars as well. Today it's all about nothing. Empty. Where has all the data gone?

A while ago I posted this screenshot on Twitter, and as far as I can tell no one found the correct answer.

The screenshot looks like there's nothing. Empty names, schema name, role name, database name, even no data. And yet this all comes together and somehow works. Let's dig in.

 

Continue reading "My PostgreSQL database is empty!"