Latest Blog Posts

Helping PostgreSQL® professionals with AI-assisted performance recommendations
Posted by Francesco Tisiot in Aiven on 2024-05-28 at 15:00

Since the beginning of my journey into the data world I've been keen on making professionals better at their data jobs. In previous years that took the shape of creating materials in several different forms to help people understand, use, and avoid mistakes with their data tool of choice. But now there's much more to it: a trusted AI solution to help data professionals in their day-to-day optimization work.

From content to tooling

The big advantage of the content creation approach is the 1-N effect: the material, once created and kept up to date, can serve a multitude of people interested in the same technology or facing the same problem. You write an article once, and it gets found and adopted by a vast number of professionals.

The limit of content, though, is that it is an extra resource that people need to find and read elsewhere. While this is useful, it forces a context switch, moving people away from the problem they are facing. This is where tooling helps, providing assistance in the same IDE that professionals are already using for their day-to-day work.

Tooling for database professionals

I have the luxury of working for Aiven, which provides professionals with an integrated platform for all their data needs. In the last three years I have witnessed the growth of the platform and its evolution, with the clear objective of making it more usable at scale. Tooling like integrations, Terraform providers and the Console facilitates the work that platform administrators have to perform on a daily basis.

But what about the day-to-day work of developers and CloudOps teams? Administrative tasks like backups, creation of read-only replicas, or upgrades were already facilitated, but the day-to-day work of optimizing workloads was still entirely in their hands.

Using trusted AI to optimize database workloads

This, however, is now changing. With the recent launch of the Aiven AI Database Optimizer we are able to help both developers and CloudOps teams in their day-to-day optimization work!

Aive

[...]

Why do I have a slow COMMIT in PostgreSQL?
Posted by Laurenz Albe in Cybertec on 2024-05-28 at 05:42
© Laurenz Albe 2024

Sometimes one of our customers looks at the most time consuming statements in a database (either with pg_stat_statements or with pgBadger) and finds COMMIT in the high ranks. Normally, COMMIT is a very fast statement in PostgreSQL, so that is worth investigating. In this article, I will explore the possible reasons for a slow COMMIT and discuss what you can do about it.

The basic COMMIT activity in PostgreSQL

A slow COMMIT is a surprising observation, because committing a transaction is a very simple activity in PostgreSQL. In most cases, all a COMMIT has to do is

  • set the two bits for the transaction in the commit log to TRANSACTION_STATUS_COMMITTED (0b01) (persisted in pg_xact)
  • if track_commit_timestamp is set to on, record the commit timestamp (persisted in pg_commit_ts)
  • flush the write-ahead log (WAL) (persisted in pg_wal) to disk, unless synchronous_commit is set to off

Note that because of the multi-versioning architecture of PostgreSQL, both COMMIT and ROLLBACK are normally very fast operations: neither has to touch the tables; they only register the status of the transaction in the commit log.
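
A quick way to see whether any of these steps is in play is to check the relevant settings and look at how COMMIT ranks in pg_stat_statements (a minimal sketch, assuming the pg_stat_statements extension mentioned above is installed):

SHOW synchronous_commit;
SHOW track_commit_timestamp;

-- How expensive is COMMIT overall?
SELECT calls, total_exec_time, mean_exec_time
FROM pg_stat_statements
WHERE query = 'COMMIT';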

The most frequent reason for slow COMMIT: disk problems

From the above it is clear that a potential cause of slowness is disk I/O. After all, flushing the WAL to disk causes I/O requests. So the first thing you should check is if the disk has a problem or is under load:

  • On Linux, you can use commands like “vmstat 1” or “sar -p 1” to measure the percentage of CPU time spent waiting for I/O (“wa” in vmstat and “%iowait” in sar). If that value is consistently above 10, you can be pretty certain that the I/O system is under stress.
  • With NAS, you should check if the TCP network is overloaded.
  • If the storage is a shared SAN or NAS, the disks may be shared with other machines, and you should check if there is contention on the storage system.
  • Failing disks, other hardware problems or operating system problems can lead to in
[...]

Code Conversion Chronicles – Trigger Order of processing in Oracle to PostgreSQL Migration.
Posted by Deepak Mahto on 2024-05-27 at 17:02

Database triggers allow encapsulation of multiple functionalities that are automatically invoked on specific events or DML operations like INSERT, UPDATE, or DELETE. The invocation of triggers can be controlled as BEFORE or AFTER the triggering events, either for each row or once per statement. When migrating from Oracle to PostgreSQL, it is important to be aware of trigger conversion gotchas.

In this blog, we discuss the default order of processing when multiple triggers are defined on the same table for the same events with the same characteristics. We will explore how this works in Oracle, how to alter the order, and how it operates in PostgreSQL.

Oracle – Trigger Order of Execution

We define a sample Oracle table with triggers that print a message, to understand the default order of processing.

--Tested on Oracle 19c
create table trigger_sample 
(
col1 integer
);

CREATE OR REPLACE TRIGGER trigger0
BEFORE INSERT OR UPDATE OR DELETE ON trigger_sample
FOR EACH ROW
BEGIN
    DBMS_OUTPUT.PUT_LINE('RUNNING - trigger0 ');
END;
/

CREATE OR REPLACE TRIGGER trigger1 
BEFORE INSERT OR UPDATE OR DELETE ON trigger_sample
FOR EACH ROW
BEGIN
    DBMS_OUTPUT.PUT_LINE('RUNNING - trigger1 ');
END;
/

By default, Oracle invokes triggers defined on the same table for the same event in the order in which they were created.

In Oracle, we can also influence the order by using the FOLLOWS clause to define which existing trigger to follow as part of the execution order. In our sample, we will alter trigger0 to follow trigger1.

CREATE OR REPLACE TRIGGER trigger0
BEFORE INSERT OR UPDATE OR DELETE ON trigger_sample
FOR EACH ROW
follows trigger1
BEGIN
    DBMS_OUTPUT.PUT_LINE('RUNNING - trigger0 ');
END;
/

PostgreSQL – Trigger Order of Execution

Migration offers ample opportunities to gain a deeper understanding of both the source and target databases. It’s not uncommon to encounter similar triggering events on the same tables, as functionalities often span acro

[...]

Custom PostgreSQL extensions with Rust
Posted by Radim Marek on 2024-05-24 at 00:00

This article explores the pgrx framework, which simplifies the creation of custom PostgreSQL extensions to bring more logic closer to your database. Traditionally, writing such extensions required familiarity with C and a deep understanding of PostgreSQL internals, which could be quite challenging. pgrx lowers the barrier and allows developers to use Rust, known for its safety and performance, making the process of creating efficient and safe database extensions much more accessible.

pg_sysload

When working with large datasets and migrations (as discussed in How Not to Change PostgreSQL Column Type), or during resource-intensive maintenance tasks, you'll want to optimise speed and minimise disruption to other processes. One way to control the pace of batch operations is to consider the load on the underlying system.

Many Unix-based systems (we will focus on Linux) provide a valuable metric called the system load average. This average consists of three values: the 1-minute, 5-minute, and 15-minute load averages. The load average is not normalised for the number of CPU cores, so a load average of 1 on a single-core system means full utilisation, while on a quad-core system, it indicates 25% utilisation.

In many cases, the system load average is also an excellent indicator of how ongoing operations are impacting a busy database cluster. In this article, we will create a PostgreSQL extension with a function called sys_loadavg() that retrieves this load information. We will use the /proc/loadavg file (part of the proc filesystem), which exposes underlying system details.
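
Once built, such a function can be used from SQL to pace batch work. A minimal sketch, assuming sys_loadavg() returns the 1-minute load average as a double precision value (the actual signature is up to the extension we are about to build):

DO $$
BEGIN
  -- Back off while the 1-minute load average is above an arbitrary threshold.
  WHILE sys_loadavg() > 4.0 LOOP
    PERFORM pg_sleep(10);
  END LOOP;
  -- ... run the next batch of work here ...
END $$;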

Getting Started with pgrx

Before we start, ensure you have the necessary prerequisites in place: at a minimum, a working Rust toolchain with cargo.

With these prerequisites in place, you can install pgrx itself and create a new extension skeleton:

cargo install --locked cargo-pgrx
cargo pgrx new pg_sysload
cd pg_sysload

This gives you a complete environment for developing your own PostgreSQL extensions in R

[...]

Converting DMS to PostGIS Point Geometry
Posted by Elizabeth Garrett Christensen in Crunchy Data on 2024-05-23 at 17:00

I love taking random spatial data and turning it into maps. Any location data can be put into PostGIS in a matter of minutes. Often when I'm working with data that humans collected, like historic locations or other things that have not traditionally been handled as computational data, I'll find traditional Degrees, Minutes, Seconds (DMS) data. To get this into PostGIS and QGIS, you'll need to convert it to decimal degrees. There are probably proprietary tools that will do this for you, but we can easily write our own code to do it. Let's walk through a quick example today.

Let's say I found myself with a list of coordinates that look like this:

38°58′17″N 95°14′05″W

(this is the location of my town’s haunted hotel 👻)

This format of writing geographic coordinates is called Degrees, Minutes, Seconds (DMS). If you remember your 4th grade geography lessons, the latitude is on the left, representing N or S of the equator, and the longitude is on the right, representing East or West of the Prime Meridian.

WKT & XY coordinates

PostGIS, and most computational spatial systems, work with a geographic system that is akin to an XY grid of the entire planet. Because it is XY, it is a longitude, latitude (X first) system.


PostGIS utilizes two kinds of geometry values:

  • WKT (Well-known text) where a point would look like this POINT(-126.4 45.32)
  • WKB (Well-known binary) where a point would look like this 0101000000000000000000F03F000000000000F03

Most often you'll see the binary form used to represent stored data, and you can use a function, st_astext, to view or query it as text.
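
For example, with the PostGIS extension installed you can round-trip a point between the two representations:

SELECT ST_MakePoint(-126.4, 45.32);
-- returns the hex WKB form that is used for storage

SELECT ST_AsText(ST_MakePoint(-126.4, 45.32));
-- POINT(-126.4 45.32)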

Converting coordinates to decimal degrees

To convert our traditional coordinates into decimals or WKT, we can use decimal math like this:

{long_degree} + ({long_minutes}/60) + ({long_seconds}/3600)

So for our location:

-- starting location
38°58′17″N 95°14′05″W

-- formula
38+(58/60)+(17/3600), 95+(14/60)+(05/3600)

-- switch the order since this is X first
-- make the Western quad negativ
[...]

Optimizing Performance in PostgreSQL: Join Column and ANY Filters
Posted by Deepak Mahto on 2024-05-23 at 07:54

Recently on the Postgres Slack, I encountered an interesting performance issue involving a SQL query that joins two tables with an ANY filter applied to one of the tables.

The problematic SQL was similar to the following:

SELECT
    tbl1.col1
FROM
    tbl1
    INNER JOIN tbl2 ON tbl1.col1 = tbl2.col1
WHERE
    tbl2.col1 IN (1,2,3);

Table tbl1 is joined with tbl2 on the column col1 from their respective tables.

A filter condition is applied to tbl2 on the same column col1 that is used in the join with tbl1. Let's check the underlying problematic execution plan with mock tables.

--Tested on PostgreSQL 16.3
create table tbl1 as
select col1, col1::text as col2 , col1*0.999 as col3 
from generate_series(1,100) as col1;

create table tbl2 as
select col1, col1::text as col2 , col1*0.999 as col3 
from generate_series(1,10) as col1;

explain (analyze, buffers)
SELECT
    tbl1.col1
FROM
    tbl1
    INNER JOIN tbl2 ON tbl1.col1 = tbl2.col1
WHERE
    tbl2.col1 = ANY (ARRAY[1,2,3]);

Execution plan:

QUERY PLAN
----------------------------------------------
 Hash Join  (cost=1.18..3.58 rows=3 width=4) (actual time=1.354..1.404 rows=3 loops=1)
   Hash Cond: (tbl1.col1 = tbl2.col1)
   Buffers: shared hit=2
   ->  Seq Scan on tbl1  (cost=0.00..2.00 rows=100 width=4) (actual time=0.687..0.705 rows=100 loops=1)
         Buffers: shared hit=1
   ->  Hash  (cost=1.14..1.14 rows=3 width=4) (actual time=0.586..0.586 rows=3 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         Buffers: shared hit=1
         ->  Seq Scan on tbl2  (cost=0.00..1.14 rows=3 width=4) (actual time=0.034..0.041 rows=3 loops=1)
               Filter: (col1 = ANY ('{1,2,3}'::integer[]))
               Rows Removed by Filter: 7
               Buffers: shared hit=1
 Planning Time: 3.131 ms
 Execution Time: 2.049 ms
(14 rows)
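
Before the author's observations below, it is worth noting the common workaround for this pattern (a sketch, not taken from the original post): PostgreSQL does not infer the ANY filter across the join equality, so repeating the predicate on tbl1 lets the planner restrict both sides of the join:

explain (analyze, buffers)
SELECT
    tbl1.col1
FROM
    tbl1
    INNER JOIN tbl2 ON tbl1.col1 = tbl2.col1
WHERE
    tbl2.col1 = ANY (ARRAY[1,2,3])
    AND tbl1.col1 = ANY (ARRAY[1,2,3]);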

Key Observations from Problematic Execution Plan

  • The ANY ('{1,2,3}'::integer[]) filter is not pushed down to the access path for tbl1 as a filter, though it is joined with the same column on which the filter
[...]

Carbon Footprint and Query Optimization
Posted by Henrietta Dombrovskaya on 2024-05-23 at 03:20

I recently learned about this project, and I can’t stop thinking about it, and how it is strangely connected to what I was doing for a big portion of my professional life.

First of all, what is this project about? Since people rarely click the links, let me copy a project description here:

Cloud Carbon Footprint is an open source tool that provides visibility and tooling to measure, monitor and reduce your cloud carbon emissions. We use best practice methodologies to convert cloud utilization into estimated energy usage and carbon emissions, producing metrics and carbon savings estimates that can be shared with employees, investors, and other stakeholders.

What exactly contributes to the carbon footprint? The operational costs consist of Cloud provider usage, Cloud energy usage, power usage effectiveness, and grid emission factors. In other words, if you use more CPU to make the same calculations, your CCF will be higher. If you perform more IO operations to obtain the same data, your CCF will be higher. At this point, it should be clear how this is related to what I am doing. When I optimize queries or improve application data access patterns, my goal is to avoid “killing it with iron,” in other words, to avoid solving problems by adding more hardware. The KIWI approach is not scalable :).

Quite often, developers (and especially their managers) claim that it is faster and cheaper to throw more hardware into a problem rather than rewrite the application code. Now, we have one more argument against that approach: inefficient queries increase carbon footprint!

Right now, I am unsure of how I could possibly contribute to that project, but it looks like in addition to the two existing optimization goals – decreasing query execution time and increasing system throughput – we should consider a third one: minimizing the carbon footprint.

Mini Summit Six
Posted by David Wheeler in Tembo on 2024-05-22 at 21:56

Last week, a few members of the community got together for the sixth and final Postgres Extension Ecosystem Mini-Summit. Follow these links for the video and slides:

Or suffer through my interpolation of YouTube’s auto-generated transcript, interspersed with chat activity, if you are so inclined.

Introduction

  • I opened the meeting, welcomed everyone, and introduced myself as host. I explained that today I'd give a brief presentation on the list of issues I've dreamed up and jotted down over the last couple of mini-summits as potential topics to take on at the Summit in Vancouver on May 28th.

Presentation

  • These are things that I've written down as I've been thinking through the whole architecture myself, but also that have come up in these Summits. I'm thinking that we could get some sense of the topics that we want to actually cover at the summit. There is room for about 45 people, and I assume we'll break up "unconference style" into four or five working groups. People can move to corners, hallways, or outdoors to discuss specific topics.

  • Recall that in the first mini-summit I showed a list of potential topics that might come up as we think through the issues in the ecosystem. I left off with the prompt "What's important to you?" We hope to surface the most important issues to address at the summit and create a hierarchy. To that end, I've created this Canva board1 following Open Space Technology2 to set things up, with the rules and an explanation of how it works.

  • I expect one of us (organizers) to give a brief introduction at the start of the summit to outline the principles of Open Space Technology, which are similar to unconferences.

  • Open Space Tech

[...]

Default Sort Order in Db2, SQL Server, Oracle & Postgres 17
Posted by Jeremy Schneider in Amazon RDS on 2024-05-22 at 17:28

TLDR: I was starting to think that the best choice of default DB collation (for sort order, comparison, etc) in Postgres might be ICU. But after spending some time reviewing the landscape, I now think that code-point order is the best default DB collation – mirroring Db2 and Oracle – and linguistic sorting can be used via SQL when it’s actually needed for the application logic. In existing versions of Postgres, this would be something like C or C.UTF-8 and Postgres 17 will add the builtin collation provider (more details at the bottom of this article). This ensures that the system catalogs always use code-point collation, and it is a similar conclusion to what Daniel Vérité seems to propose in his March 13 blog, “Using binary-sorted indexes”. I like the suggestion he closed his blog with: SELECT ... FROM ... ORDER BY colname COLLATE "unicode" – when you need natural language sort order.
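
In practice that pattern looks something like this (a sketch with a hypothetical table; "unicode" is the predefined ICU root collation available in recent Postgres releases):

-- Database default stays at code-point order (C, C.UTF-8, or the builtin provider),
-- and linguistic ordering is requested only where the application needs it:
SELECT name
FROM customers          -- hypothetical table
ORDER BY name COLLATE "unicode";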

I spent some time reading documentation, experimenting, and talking to others in order to learn more about the general landscape of collation and SQL databases. It’s safe to say that every SQL database that’s been around for more than a hot minute has 🍿 fun collation quirks. (Another reason you shouldn’t write your own database… rediscovering all of this for yourself.)


Next week at PGConf.dev in Vancouver, Jeff Davis (and I) will be talking about collation and Postgres. If you’ll be at the conference then be sure to stop by and listen!

Wednesday May 29 at 2:30pm in the Canfor room (1600) – “Collations from A to Z” – https://www.pgevents.ca/events/pgconfdev2024/schedule/session/95-collations-from-a-to-z/


Db2

I asked Josh Tiefenbach – a friend who previously worked in Db2 development – and he’s helped me better understand the picture here. First off: Db2 will format your dates and numbers according to the client’s localization environment. I heard a funny story about an IBM engineer whose programs were randomly breaking because of comparison mismatches on dates. It wasn’t critical enough to merit

[...]

How Tembo Cloud stores Prometheus metrics data in PostgreSQL
Posted by Steven Miller in Tembo on 2024-05-22 at 16:45
This post contained content that could not be rendered in the Atom feed. Please use the official post link: https://tembo.io/blog/prometheus-data-in-postgresql

Waiting for PostgreSQL 17 – MERGE / SPLIT partitions
Posted by Hubert 'depesz' Lubaczewski on 2024-05-22 at 11:41
I thought about it for quite some time, whether I should write about it, and how. That's why there is delay since: On 6th of April 2024, Alexander Korotkov committed patch: Implement ALTER TABLE ... MERGE PARTITIONS ... command   This new DDL command merges several partitions into the one partition of the target table. … Continue reading "Waiting for PostgreSQL 17 – MERGE / SPLIT partitions"
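
For a taste of the syntax described in that commit message, here is a sketch with a hypothetical partitioned table and partition names (see the full post for worked examples):

-- Merge two partitions of a partitioned table into a single one:
ALTER TABLE sales
    MERGE PARTITIONS (sales_2024_q1, sales_2024_q2) INTO sales_2024_h1;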

16 Years After The Launch of 2ndQuadrant Italy: Remembering Simon Riggs
Posted by Gabriele Bartolini in EDB on 2024-05-21 at 13:18

Sixteen years ago, on May 21, 2008, we proudly announced the opening of the Italian branch of 2ndQuadrant, marking a milestone in PostgreSQL consulting and training in Europe. Simon Riggs, the visionary founder of 2ndQuadrant, placed immense trust in our small team from Prato, Tuscany. Tragically, Simon passed away on March 26, 2024, leaving a profound void. This article is my small tribute to Simon, his extraordinary leadership, and his invaluable contributions to the PostgreSQL community. It reflects on our journey, the challenges we faced, and the innovative spirit Simon instilled in us. Despite his passing, Simon’s legacy lives on through the values, principles, and examples he set, continuing to inspire and guide us.

New compilers, new timings
Posted by Peter Eisentraut in EDB on 2024-05-21 at 04:00

I had been eagerly awaiting the releases of Clang 18 and GCC 14 for the last few weeks. Now that they are both out, I figured I would check how fast they are for compiling PostgreSQL.

(The actual reasons why I was awaiting these compiler releases are related to the warning option -Wmissing-variable-declarations. GCC 14 now supports this option and PostgreSQL might want to use it. On the Clang side, the new release of LLVM enables a new release of IWYU that handles global variable declarations correctly.)

For the speed tests, I’m reusing the methodology from my previous articles about this last year (here and here). The host machine has been upgraded over time in various ways, so the timings are not necessarily comparable across these articles. The tests are done on tag REL_16_3 this time.

OS        Compiler           time meson compile -j1
macOS 14  gcc-14             4:57.73
          gcc-13             4:45.41
          gcc-12             4:30.17
          clang-18           2:18.10
          clang-17           4:22.19
          clang-16           4:00.87
          clang-15           2:56.41
          Apple clang (15)   2:18.48

Observations:

  • GCC is getting progressively slower with every release, continuing the trend already noticeable last year.

  • Clang had some trouble with versions 16 and 17, but it turned out that this was a compiler bug triggered by a specific file in PostgreSQL. This was apparently fixed in version 18,

[...]

🎙️ Ship It Podcast — PostgreSQL with Andrew Atkinson
Posted by Andrew Atkinson on 2024-05-21 at 00:00

Recently I joined Justin Garrison and Autumn Nash for episode “FROM guests SELECT Andrew” of Ship It, a Changelog podcast.

We had a great conversation! I made bullet point notes from the episode, and added extra details.

Let’s get into it!

PostgreSQL Community

  • Autumn shared that she met Henrietta Dombrovskaya, who is an author, DBA, and organizer of the Chicago PostgreSQL meetup and the PgDay Chicago conference. This was fun to hear since Henrietta has become a friend. Check out my 2023 coverage of PgDay Chicago.
  • Justin wasn't familiar with the Postgres community. I was glad to share that the community events I've attended, and the people I've met at them, have been great!
  • Justin talked about career goals of trading off less money for greater happiness. Autumn talked about how remote work provides the opportunity to get to know neighbors and your local community.
  • We talked about the longevity of PostgreSQL as an open source project, and the benefits of not being led by a single big entity that might be quarterly-profit oriented. Hopefully there's no license "rug pull" in the future. Core team member Jonathan Katz wrote about this topic in Will PostgreSQL ever change its license?
  • I shared that PostgreSQL leaders, contributors, and committers attend community events, and it’s been fun to meet some of them.
  • Autumn mentioned Henrietta helped give away tickets to Milspouse Coders (Military Spouse Coders) for PgDay Chicago and that was greatly appreciated.
  • Autumn appreciated the explicit goal to bring more women to the Postgres community and PgDay Chicago event.
  • Autumn shared how seeing women at Postgres events (Check out a list of PostgreSQL community events) is important. Representation matters.
  • I shared some prominent women in the Postgres community I've met: Melanie Plageman, who recently became a committer to PostgreSQL, Lætitia Avrot, Karen Jex, Elizabeth Garret Christensen, Stacey Haysler, Chelsea Dole, Selena Flannery, Ifat Ribon, Gabrielle Roth, are a f
[...]

PostgreSQL 17: part 4 or CommitFest 2024-01
Posted by Pavel Luzanov in Postgres Professional on 2024-05-21 at 00:00

Spring is in full swing as we bring you the hottest winter news of the January Commitfest. Let’s get to the good stuff right away!

Previous articles about PostgreSQL 17: 2023-07, 2023-09, 2023-11.

  • Incremental backup
  • Logical replication: maintaining the subscription status when upgrading the subscriber server
  • Dynamic shared memory registry
  • EXPLAIN (memory): report memory usage for planning
  • pg_stat_checkpointer: restartpoint monitoring on replicas
  • Building BRIN indexes in parallel mode
  • Queries with the IS [NOT] NULL condition for NOT NULL columns
  • Optimization of SET search_path
  • GROUP BY optimization
  • Support planner functions for range types
  • PL/pgSQL: %TYPE and %ROWTYPE arrays
  • Jsonpath: new data conversion methods
  • COPY ... FROM: ignoring format conversion errors
  • to_timestamp: format codes TZ and OF
  • GENERATED AS IDENTITY in partitioned tables
  • ALTER COLUMN ... SET EXPRESSION

...
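
To make one of those items concrete: EXPLAIN gains a MEMORY option in PostgreSQL 17 that reports how much memory was used during planning (a sketch; the exact output format may differ):

EXPLAIN (MEMORY) SELECT * FROM pg_class;
-- In addition to the plan, the output includes a "Planning:" section
-- reporting memory used and allocated while planning the query.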

Introducing pg_timeseries: Open-source time-series extension for PostgreSQL
Posted by Samay Sharma in Tembo on 2024-05-20 at 09:00

We are excited to launch pg_timeseries: a PostgreSQL extension focused on creating a cohesive user experience around the creation, maintenance, and use of time-series tables. You can now use pg_timeseries to create time-series tables, configure the compression and retention of older data, monitor time-series partitions, and run complex time-series analytics functions with a user-friendly syntax. pg_timeseries is open-sourced under the PostgreSQL license and can be added to your existing PostgreSQL installation or tried as a part of the Timeseries Stack on Tembo Cloud.

What is time-series data?

Put simply, time-series data is a collection of data points, each with a timestamp attached. These could be stock prices recorded throughout a trading day, temperature and availability data returned from devices and sensors, or web traffic on a website. Time-series workloads typically include queries filtering by time (generally for some degree of recency) and aggregation queries to summarize the data for analytics.
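
As a plain-SQL illustration of the shape of such workloads (hypothetical table and column names), a typical query filters on a recent time range and aggregates per time bucket:

-- sensor_readings(recorded_at timestamptz, temperature numeric) is hypothetical
SELECT date_trunc('hour', recorded_at) AS bucket,
       avg(temperature) AS avg_temp
FROM sensor_readings
WHERE recorded_at >= now() - interval '24 hours'
GROUP BY bucket
ORDER BY bucket;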


Using PostgreSQL for Time-series workloads

We believe that PostgreSQL can be used to power any data workload thanks to its extensions and ecosystem tools. Therefore, at Tembo, we aim to make it easy for users to use the entire PostgreSQL ecosystem to reduce the complexity of the modern data stack.

In the last year, we’ve built several stacks and extensions that help you run analytics, AI, and operational workloads on PostgreSQL. However, our customers’ most requested stack has been one able to store and act upon their time-series data. That way, they can store all of their data with a single PostgreSQL provider who can meet all their needs.

You may already be asking: “why not just power the stack using TimescaleDB?” The Timescale License would restrict our use of features such as compression, incremental materialized views, and bottomless storage. With these missing, we felt that what remained would not provide an adequate basis for our customers’ time-series needs. Therefore, we

[...]

In-memory disk for PostgreSQL temporary files
Posted by Stefan Fercot in Data Egret on 2024-05-20 at 07:30

Recently, while debugging a performance issue of a CREATE INDEX operation, I was reminded that PostgreSQL might produce temporary files when executing a parallel query, including parallel index creation, because each worker process has its own memory and might need to use disk space for sorting or hash tables.

Thanks to Peter Geoghegan for answering this pgsql-admin email thread.

So, in order to try to speed up that index creation, I thought it would be beneficial to move those temporary files directly into memory using a tmpfs, and wanted to test that theory by writing this blog post :-)
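
The usual way to point PostgreSQL's temporary files at such a tmpfs is a temporary tablespace created on the mount. A sketch, assuming a tmpfs is already mounted at a hypothetical /mnt/pgtemp and owned by the postgres OS user:

CREATE TABLESPACE temp_mem LOCATION '/mnt/pgtemp';  -- path is hypothetical
SET temp_tablespaces = 'temp_mem';  -- this session's temporary files now land on the tmpfs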


Example

Let’s first enable logging of the temporary files to evaluate the change we’re planning to make:

ALTER SYSTEM SET log_temp_files TO 0;
ALTER SYSTEM SET log_min_duration_statement TO 0;
SELECT pg_reload_conf();

Create a test database using pgbench and an index on the pgbench_accounts table:

$ createdb bench
$ /usr/pgsql-16/bin/pgbench -i -s 100 bench
$ psql bench -c "CREATE INDEX ON pgbench_accounts (aid, filler);"

We can see from the logs the time it took to build the index (~5.9s) with temporary files involved:

LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp28501.0.fileset/0.0", size 541376512
STATEMENT:  CREATE INDEX ON pgbench_accounts (aid, filler);
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp28501.0.fileset/1.0", size 541024256
STATEMENT:  CREATE INDEX ON pgbench_accounts (aid, filler);
LOG:  duration: 5936.468 ms  statement: CREATE INDEX ON pgbench_accounts (aid, filler);

Let’s try to re-create the index with a higher maintenance_work_mem to get rid of the temporary files:

DROP INDEX pgbench_accounts_aid_filler_idx;
SET maintenance_work_mem TO '2GB';
CREATE INDEX ON pgbench_accounts (aid, filler);

But the temporary files are not gone, as we expected given the comment on the pgsql-admin thread mentioned above.

LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp28501.10.fileset/0.0", size 365936640
STATEMENT:  CREATE INDEX ON pgbench_accounts (aid, fi
[...]

Upgrade PostgreSQL from 15 to 16 on Ubuntu 24.04
Posted by Paolo Melchiorre in ITPUG on 2024-05-19 at 22:00

How-to guide for upgrading PostgreSQL from version 15 to 16 on Ubuntu, after upgrading Ubuntu itself from version 23.10 to 24.04 (Noble Numbat).

Deep Dive into PostgREST - Time Off Manager (Part 2)
Posted by Radim Marek on 2024-05-18 at 00:00

Let's recap the first part of "Deep Dive into PostgREST," where we explored the basic functionality to expose and query any table using an API, demonstrated using cURL. All it took was to set up a db-schema and give the db-anon-role some permissions. But unless you are creating the simplest of CRUD applications, this only scratches the surface.

In Part 2, we will expand the APIs, provide better abstraction, and implement the foundation of what can be considered business logic, all while extending the sample "Time Off Manager" application. While the previous instalment was introductory only, make sure you don't miss the important details in this one.

Before we move on, let's do a bit of housekeeping and clean up the permissions setup from the first part. This way, we can start with a clean slate (when it comes to the permissions) and avoid any lingering rights that could confuse us later.

REVOKE ALL PRIVILEGES ON ALL TABLES IN SCHEMA public FROM time_off_anonymous;

Dedicated schema for the API

Using the public schema (or any schema(s) where your core data model resides) is a fast way to get started. However, using a dedicated schema for the API is beneficial for several reasons:

  • It provides options for better abstraction, which you might appreciate later when refactoring the original data model.
  • Data customisation is also a requirement unless you prefer building a "fat" client and thus transferring the majority of the business logic there. Combining data from multiple tables helps shield the consumer from complex queries and relations.
  • While we won't explore it as a security feature, it can also provide relevant boundaries.

Let's get started with the creation of the schema itself, and expose the users using a view and setting basic permissions. We will do this by setting default permissions, so we don't have to repeat the same for all objects as we create them. Please note that default privileges are applied only to new objects created.

CREATE SCHEMA api;

GRANT USAGE ON S
[...]

Understand PostgreSQL's Portal – Executor vs Process Utility
Posted by cary huang in Highgo Software on 2024-05-17 at 22:22

Introduction

When you send a query to PostgreSQL, it normally would go through stages of query processing and return you the results at the end. These stages are known as:

  • Parse
  • Analyze
  • Rewrite
  • Plan
  • Execute

I wrote another blog post to briefly explain the responsibility of each query processing stage; you can find it here. There is also one that focuses primarily on the planner module, which you can find here. In this blog, we will focus on the executor part of query processing, which is the entry point to "communicate" with other PostgreSQL internal modules to collect the correct data for the query.

In fact, executor may not be the best term to describe this stage. In the PostgreSQL source code, it is actually "packaged" in another object called a "portal", which has two potential paths to take: the "executor" path or the "process utility" path. Very often, we simply call this stage the "executor" rather than the "portal", because the executor handles most of the DML query types (SELECT, INSERT, UPDATE, DELETE, etc.) and it has to process them according to the query plan created by the "planner" module.

Process Utility, on the other hand, does not have a plan from the "planner" to process; it handles DDL and other non-DML queries such as VACUUM, CREATE, PREPARE, etc.

The portal (or executor) is not as complicated as the planner in terms of the logic involved, but it does involve a lot of other PostgreSQL modules that cooperate to produce the right output. Let's take a look.

Where it all starts

The function exec_simple_query() in postgres.c is where the query processing stages take place. We will focus on what happens after PortalStart().

Relationship Between Portal, Executor and ProcessUtility

As mentioned above, the portal is an object that encapsulates "executor" and "process utility", where:

  • executor is responsible for executing the “plan” created by planner – SELECT, UPDATE, INSERT…etc
  • process utility is responsible fo
[...]

Understanding Synchronous and Asynchronous Replication in PostgreSQL – What is Best for You?
Posted by semab tariq in Stormatics on 2024-05-17 at 05:17

Learn about synchronous and asynchronous replication in PostgreSQL, including how they work and when to use each.

The post Understanding Synchronous and Asynchronous Replication in PostgreSQL – What is Best for You? appeared first on Stormatics.

Geographically distributed Postgres for multi-tenant applications
Posted by Tudor Golubenco on 2024-05-16 at 00:00
Documenting a pattern for making multi-tenant applications global by distributing the data, using only standard PostgreSQL functionality.

PGConf.DE 2024 - Review
Posted by Andreas Scherbaum on 2024-05-15 at 22:55

PGConf.DE 2024 was last month and the feedback is now closed - time for a review.

Removing Bloat with pg_repack Extension
Posted by muhammad ali in Stormatics on 2024-05-15 at 06:55

PostgreSQL's pg_repack extension offers a solution to database maintenance challenges by optimizing table storage without disrupting operations. It tackles issues like bloat and fragmentation efficiently. Learn about its operation, options, and restrictions for effective database management.

The post Removing Bloat with pg_repack Extension appeared first on Stormatics.

Git Filters
Posted by Christoph Berg in Cybertec on 2024-05-14 at 07:13

Nowadays, most Debian packages are maintained in Git repositories, with the majority hosted on Debian's GitLab instance at salsa.debian.org. Debian is running a "vcswatch" service that keeps track of the status of all packaging repositories that have a Vcs-Git (and other VCSes) header set. This status is stored in a PostgreSQL database, which is then utilized by Debian's package maintenance infrastructure for various purposes. One such application is to identify repositories that may require a package upload to release pending changes.

Naturally, this amount of Git data required several expansions of the scratch partition on qa.debian.org in the past, up to 300 GB in the last iteration. Attempts to reduce that size using shallow clones (git clone --depth=50) resulted in only marginal space savings of a few percent. Running git gc on all repos helps somewhat, but is a tedious task. As Debian is growing, so are the repos both in size and number. I ended up blocking all repos with checkouts larger than a gigabyte, and even then, the only solutions were again either expanding the disk space or lowering the blocking threshold.

Since we only require minimal information from the repositories, specifically the content of debian/changelog and a few other files from debian/, along with the number of commits since the last tag on the packaging branch - it made sense to try obtaining this information without fetching a full repo clone. The question of whether we could retrieve this solely using the GitLab API at salsa.debian.org was pondered but never answered. But then, in #1032623, Gábor Németh suggested the use of git clone --filter blob:none. This suggestion remained unattended in the bug report for almost a year until the next "disk full" event prompted me to give it a try.

The blob:none filter makes git clone omit all files, fetching only commit and tree information. Any blob (file content) needed at git run time is transparently fetched from the upstream repository and stored locally. It turned out to be

[...]

When to split patches for PostgreSQL
Posted by Peter Eisentraut in EDB on 2024-05-14 at 04:00

In my previous article on how to submit patches by email for PostgreSQL, I skipped over whether patches should be split up. Let’s discuss that now.

(See the previous article, as well as general Git documentation, for the technical details of how to split up a patch. Here, I’m only talking about why.)

What I’m talking about here specifically is where instead of attaching one patch to an email to pgsql-hackers, you attach a sequence of patch files like 0001-first.patch, 0002-second.patch, etc. (see previous article for the correct naming).

What follows is my opinion, based on what I like to see during patch reviews, and how I tend to prepare my patch submissions. Maybe it’s all wrong and others hate it and wish I’d stop it. Feedback welcome. But anyway.

The foremost principle is, each patch in a series should make sense on its own, and it should move the source tree from one working state to another working state. (Think of the “C” in ACID.) When I review a patch series, I review each patch separately, and I also run each patch (incrementally on top of its predecessors) through the test suites. If that fails, meh.

This principle can be applied and interpreted in different ways, and that’s fine. Let’s go through a few different cases:

[I will show some examples. In order not to pick on anyone else, I’m only picking examples that I authored or I’m otherwise closely involved in.]

  1. Simplest case: You send the whole thing as one patch. Nothing wrong with that, let’s be clear. Start here if you don’t know better or you’re not a Git wizard or none of the other cases apply. There appears to be a natural limit on how big a patch can be processed successfully, but that’s not a hard rule, just indirect advice. Keep your patch free of unrelated cleanup. If you have cleanup, see below.

    [Example: Here is the first patch submission for a potential new feature. It’s just a big patch for now, early in development, maybe we’ll find ways to organize it differently later. But f

[...]

Extension Summit Topic Review
Posted by David Wheeler in Tembo on 2024-05-13 at 19:12

Boy howdy that went fast.

This Wednesday, May 15, the final Postgres extension ecosystem mini-summit will review topics covered in previous Mini-Summits, various Planet PostgreSQL posts, the #extensions channel on the Postgres Slack and the Postgres Discord. Following a brief description of each, we’ll determine how to reduce the list to the most important topics to take on at the Extension Ecosystem Summit at PGConf.dev in Vancouver on May 28. I’ll post a summary later this week along with details for how to participate in the selection process.

In the meantime, here’s the list as of today:

  • Metadata:
    • Third-party dependencies
    • Types of extensions
    • Taxonomies
    • System requirements (OS, version, CPU, etc.)
    • Categorization
    • Versioning
  • Registry:
    • Identity, namespacing, and uniqueness
    • Distributed vs. centralized publishing
    • Binary packaging and distribution patterns
    • Federated distribution
    • Services and tools to improve or build
    • Stats, Reports, Badging: (stars, reviews, comments, build & test matrices, etc.)
  • Packaging:
    • Formats (e.g., tarball, OCI, RPM, wheel, etc.)
    • Include dynamic libs in binary packaging format? (precedent: Python wheel)
    • Build farming
    • Platforms, architectures, and OSes
    • Security, trust, and verification
  • Developer:
    • Extension developer tools
    • Improving the release process
    • Build pipelines: Supporting PGXS, pgrx, Rust, Go, Python, Ruby, Perl, and more
  • Community:
    • Community integration: identity, infrast
[...]

Use of “pgtt” Extension in Self Managed vs. Cloud Products
Posted by Shane Borden on 2024-05-13 at 18:40

There are several benefits to using temp tables in SQL DML operations. Given my long relationship and work on Oracle databases, I am highly familiar with the Oracle style “Global Temporary Tables” and have used them to solve many performance issues when complex joins are involved. That’s why I was excited when Gilles Darold released the extension “pgtt” for PostgreSQL. I was even more excited when Google released the extension for use in the Google CloudSQL for Postgres and AlloyDB for Postgres products.

In my opinion, native Postgres temporary tables leave a lot to be desired. For example, each table has to be created upon use. Nothing is persistent. This takes time and leads to Postgres catalog bloat (looking to do a post on this soon). That being said, in self managed instances, ‘pgtt’ is very easy to use. You can load the extension either at connection time by user, or for anyone that connects to a database. For example (make sure the extension is created in the database before setting these):

alter role test_user SET session_preload_libraries to 'pgtt';

alter database test_pgtt SET session_preload_libraries = 'pgtt';

This will, by default, use the ‘pgtt’ version of temporary tables instead of the default temporary tables Postgres offers. You can then create ‘pgtt’ style tables and refer to them via the pgtt_schema with no further modifications.

To revert back in the session, you must "disconnect" the session and alter the user / database to revert the behavior to "old" style temporary tables. To my knowledge, if the library is enabled this way there is no way to use both styles of temporary tables at the same time.

alter role test_user RESET session_preload_libraries;

alter database pgtt_test RESET session_preload_libraries;

The Set Up…..

/* set flag "pgtt.enabled" on alloydb / cloudsql postgres console  to on */

/* create the extension in every database you need to use global temp tables */
CREATE EXTENSION pgtt;
LOAD 'pgtt';

/* Additional e
[...]

Conversion Chronicles – Invisible Indexes in Oracle to PostgreSQL Migration.
Posted by Deepak Mahto on 2024-05-13 at 15:22

Introduction – Invisible or Hidden Index.

Any database conversion offers valuable insights into how things work across platforms or how to achieve similar functionality in the target database. Invisible indexes are one such feature that is often requested in PostgreSQL when migrating away from Oracle, due to their use case of creating new indexes as invisible or marking existing ones as invisible to the optimizer.

Making indexes invisible can be beneficial for several use cases, such as:

  1. Creating invisible indexes allows for simulating and measuring impacts before actually dropping duplicate or unused indexes.
  2. Temporarily marking an index as invisible can influence the execution plan to avoid using it as an index access path option.

Oracle – Invisible Index

Before delving into the options available to achieve similar features in PostgreSQL, let’s first understand how it would work in Oracle with a sample table.

Sample Oracle examples

create table test_invisible as
select level as col1, level as col2, 
rpad('a',level,'x') as col3
from dual
connect by rownum < 501;

create index test_invisible_idx1 on 
test_invisible(col1);

create index test_invisible_idx2 on 
test_invisible(col1,col2);

explain plan for 
select /*+ NO_PARALLEL */ * from test_invisible 
where col1 = 1;

select * from table(dbms_xplan.display);

In our scenario, let’s assume we want to eliminate a single column index (test_invisible_idx1). We’ll alter the index to be invisible and then verify the execution plan to assess the impact on performance.

alter index test_invisible_idx1 invisible;

explain plan for 
select /*+ NO_PARALLEL */ * from test_invisible where col1 = 1;

select * from table(dbms_xplan.display);

In Oracle, making any index invisible has a global effect—it’s not limited to the current session or specific users.
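
PostgreSQL itself has no ALTER INDEX ... INVISIBLE; as the next section's title suggests, the hypopg extension can hide an existing index from the planner for the current session. A hedged sketch (function names as documented for hypopg 1.4+; verify against your version):

CREATE EXTENSION IF NOT EXISTS hypopg;

-- Hide the index from the planner in this session only:
SELECT hypopg_hide_index('test_invisible_idx1'::regclass);

EXPLAIN SELECT * FROM test_invisible WHERE col1 = 1;

-- Make it visible again:
SELECT hypopg_unhide_index('test_invisible_idx1'::regclass);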

PostgreSQL – Invisible Index (hypopg Extension)

Making indexes invisible isn’t officially supported as part of the ALTER INDEX command in vanilla PostgreSQL. However, leveraging the Ext

[...]

Advanced psql scripting with coproc
Posted by Daniel Vérité on 2024-05-13 at 15:10
In this post, let’s see how to control psql from outside with the coproc functionality in bash.
