Latest Blog Posts

11 Lessons to learn when using NULLs in PostgreSQL®
Posted by Francesco Tisiot in Aiven on 2024-02-28 at 10:00

A boolean value should only contain two values, True or False, but is that correct? People usually assume so, but sometimes miss the fact that the value could be absent altogether. In databases this absence is usually stored as NULL, and this blog showcases how to find NULLs, how to use them properly, and 11 lessons to learn to become a NULL Pro!

Keep in mind, it's not only booleans that can contain NULL values, it's all the columns where you don't define a NOT NULL constraint!

Do you need a FREE PostgreSQL database?
🦀 Check Aiven's FREE plans! 🦀
Do you need to optimize your SQL queries?
🐧 Check EverSQL! 🐧

It all starts with some columns and rows

Let's start from the basics: you have a PostgreSQL® database and a table called users, like:

CREATE TABLE users (
    id SERIAL,
    username TEXT PRIMARY KEY,
    name TEXT,
    surname TEXT,
    age INT
);

Hey, do you want to test the above code but don't have a PostgreSQL database handy? Paste your code into the PostgreSQL Playground and quickly check the results!

Let's insert some data:

INSERT INTO users (username, name, surname, age) VALUES
    ('jdoe', 'Jon', 'Doe', 25),
    ('lspencer','Liz', 'Spencer', 35),
    ('hlondon','Hanna', 'London', 45);

Querying the data showcases the table with all the columns filled.

 id | username | name  | surname | age 
----+----------+-------+---------+-----
  1 | jdoe     | Jon   | Doe     |  25
  2 | lspencer | Liz   | Spencer |  35
  3 | hlondon  | Hanna | London  |  45
(3 rows)

Insert NULLs

Now, let's check whether we can insert some NULLs by trying to insert them into the name, surname and age columns:

INSERT INTO users (username, name, surname, age) VALUES ('test',NULL, NULL, NULL);

This works since we don't have any NOT NULL constraints on those columns:

 id | username | name  | surname | age 
----+----------+-------+---------+-----
  1 | jdoe     | Jon   | Doe     |  25
  2 | lspencer | Liz   | Spencer |  35
  3 | hlondon  | Hanna | London  |  45
  4 | test     |       |         |    
[...]
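
The rest of the post is truncated here, but as a taste of what finding NULLs looks like, here is a minimal sketch against the users table above (these queries are illustrative, not taken from the post):

-- IS NULL (never = NULL) finds the rows with missing values
SELECT username
FROM users
WHERE name IS NULL OR surname IS NULL OR age IS NULL;

-- COALESCE substitutes a fallback value when displaying NULLs
SELECT username, COALESCE(name, 'unknown') AS name
FROM users;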

Maximizing Microservice Databases with Kubernetes, Postgres, and CloudNativePG
Posted by Gabriele Bartolini in EDB on 2024-02-28 at 00:00

Explore the synergy between PostgreSQL and Kubernetes through CloudNativePG — a transformative operator discussed in this article. Discover how this powerful open source stack empowers organizations to free themselves from vendor lock-in and to create a seamless microservice database environment, enhancing innovation, operational efficiency and velocity. This article provides a refreshed perspective on “ Why Run Postgres in Kubernetes?” from 2022.

Extension Ecosystem Summit 2024
Posted by David Wheeler in Tembo on 2024-02-27 at 17:46

I’m pleased to announce that some pals and I have organized and will host the (first annual?) Extension Ecosystem Summit at PGConf.dev in Vancouver (and more, see below) on May 28:

Enabling comprehensive indexing, discovery, and binary distribution.

Participants will collaborate to examine the ongoing work on PostgreSQL extension distribution, examine its challenges, identify questions, propose solutions, and agree on directions for execution.

Going to PGConf? Select it as an “Additional Option” when you register, or update your registration if you’ve already registered. Hope to see you there!



Extension Ecosystem Mini-Summit

But if you can't make it, that's okay, because in the lead-up to the Summit we're hosting a series of six virtual gatherings, the Postgres Extension Ecosystem Mini-Summit.

Join us for an hour or so every other Wednesday starting March 6 to hear contributors to a variety of community and commercial extension initiatives outline the problems they want to solve, their attempts to do so, challenges discovered along the way, and dreams for an ideal extension ecosystem in the future. Tentative speaker lineup:

  • March 6: David Wheeler, PGXN: “History and Context of Extension Distribution”
  • March 20: Ian Stanton, Tembo: “Trunk”
  • April 3: Devrim Gündüz: “Overview of the yum.postgresql.org architecture, how new RPMs are added, and issues and challenges with distributing RPMed extensions”
  • April 17: TBD
  • May 1: Yurii Rashkovskii, Omnigres: “Universally buildable extensions: dev to prod”
  • May 15: David Wheeler, PGXN: “Metadata for All: Enabling discovery, packaging, and community”

Hit the event page for details. Many thanks to my co-organizers Jeremy Schneider, David Christensen, Keith Fiske, and Devrim Gündüz, as well as the PGConf.dev organize

[...]

AUTOCOMMIT – Oracle vs PostgreSQL
Posted by Akhil Reddy Banappagari in HexaCluster on 2024-02-27 at 16:30

Oracle and PostgreSQL differ significantly in their transaction models, and AUTOCOMMIT is one of those differences. We see many of our customers successfully migrate code, but still encounter variations in behavior and even runtime errors related to transaction control. Many issues happen because AUTOCOMMIT settings differ between Oracle and PostgreSQL. It is definitely necessary to […]

The post AUTOCOMMIT – Oracle vs PostgreSQL appeared first on HexaCluster.
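
For illustration only (not from the post), the behavioral difference boils down to this: PostgreSQL commits every statement issued outside an explicit transaction block, while Oracle keeps the transaction open until an explicit COMMIT. The accounts table below is hypothetical:

-- Autocommit: committed as soon as the statement finishes
UPDATE accounts SET balance = balance - 100 WHERE id = 1;

-- Oracle-like deferred commit requires an explicit block
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;  -- not yet visible to other sessions
COMMIT;                                                    -- now it is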

Uncovering and Exploring ROW Constructors in PostgreSQL.
Posted by Deepak Mahto on 2024-02-27 at 13:55

A ROW expression allows you to construct ROW values, which can represent anonymous records, specific table row types, or custom composite types. Its uses include processing records within multiple expressions (using operators like =, <>, <, <=, >, or >=), evaluation with subqueries, and combining ROW values to build composite types. You can even export these to JSON format (using functions like row_to_json). Some key operations you can perform with a ROW constructor in conversion or migration are:

  • Building custom ROW types
  • Expanding ROW types for internal processing
  • Using them with functions that return multiple OUT parameters

With a ROW constructor, we can build record types with different data types and later expand them using internal aliases, as shown below.

While using ROW constructors in recent migrations, I’ve encountered multiple usage patterns and scenarios. Let’s explore some of them in a blog post about uncovering the uses of ROW constructors in PostgreSQL.

We will include the following table as a sample in the blog.

postgres=# create table testrow (col1 integer, col2 integer generated by default as identity);
CREATE TABLE
postgres=# \d testrow
                           Table "public.testrow"
 Column |  Type   | Collation | Nullable |             Default              
--------+---------+-----------+----------+----------------------------------
 col1   | integer |           |          | 
 col2   | integer |           | not null | generated by default as identity



1. Auto aliasing on anonymous row type expansion.

Whenever we build a ROW type using a constructor, PostgreSQL maps it to a pseudo-RECORD type. If we later expand this ROW type, the internal field names will be assigned aliases like f1, f2, f3…fn. This is because ROW constructors don’t preserve the original table column names or any custom aliases.
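
A minimal sketch of that aliasing, using the testrow table defined above (the INSERT is added here so the last query returns a row):

INSERT INTO testrow (col1) VALUES (10);

-- Expanding an anonymous ROW: fields come back as f1, f2, f3
SELECT (ROW(1, 'a', true)).*;

-- The same aliasing shows up when converting to JSON
SELECT row_to_json(ROW(1, 'a'));       -- {"f1":1,"f2":"a"}

-- A real table row keeps its column names
SELECT row_to_json(t) FROM testrow t;  -- {"col1":10,"col2":1}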

When expanding a ROW constructor, regardless of whether it’s initialized using table columns or custom types, you’ll encounter the

[...]

Installing PostgreSQL on SLES 15 just got easier – and better!
Posted by Devrim GÜNDÜZ in EDB on 2024-02-27 at 13:30

The PostgreSQL RPM repository for SuSE Enterprise Linux
has supported SLES for quite some time. We followed the usual conventions and used zypper addrepo … to add the repositories.

Continue reading "Installing PostgreSQL on SLES 15 just got easier – and better!"

The default value of fdw_tuple_cost was updated to 0.2. What does that mean?
Posted by Umair Shahid in Stormatics on 2024-02-27 at 13:14

This blog post explores the recent change to the fdw_tuple_cost parameter in PostgreSQL, examining the problem it addresses and the reasoning behind the new default value (0.2).

The post The default value of fdw_tuple_cost was updated to 0.2. What does that mean? appeared first on Stormatics.
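
For context (not from the post): fdw_tuple_cost is a postgres_fdw option, so besides relying on the new default you can also override it per foreign server. The server name below is hypothetical:

-- Raise the planner's estimated cost of transferring one row from this server
-- (use SET instead of ADD if the option has been set before)
ALTER SERVER remote_pg OPTIONS (ADD fdw_tuple_cost '0.2');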

Understanding Indexes in pgvector
Posted by semab tariq in Stormatics on 2024-02-26 at 13:26

Explore how pgvector's indexes work, choose the right one for your needs, and find the best option for your critical data.

The post Understanding Indexes in pgvector appeared first on Stormatics.
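
As a quick orientation (not from the post), pgvector currently offers two index types, IVFFlat and HNSW; table and column names below are hypothetical:

-- HNSW: typically better recall/latency, slower to build
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);

-- IVFFlat: faster to build, requires choosing a number of lists
CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100);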

TOAST and its influences on parallelism in PostgreSQL
Posted by Anthony Sotolongo León in OnGres on 2024-02-26 at 10:00

Introduction

Since PostgreSQL 9.6 was released, query parallelism has been available and has become a good option to improve query performance. Since then, parallelism has kept evolving, resulting in better database performance. To manage the behavior of parallelism, there are some parameters that you can tune, for example:

  • max_parallel_workers_per_gather: the maximum number of parallel workers used to execute a query activity in parallel (default 2). These workers are taken from the pool of processes defined by max_worker_processes and limited by max_parallel_workers. Worth mentioning is one special worker, the leader, which coordinates and gathers the scan results from each of the parallel workers. The leader may or may not participate in the scanning itself, depending on the load of its coordination activities; its involvement can be controlled with the parameter parallel_leader_participation.
  • min_parallel_table_scan_size: The minimum amount of table data(size) for a parallel scan to be considered (default 8MB)
  • min_parallel_index_scan_size: the minimum amount of index data(size) for a parallel scan to be considered (default 512kB)
  • parallel_setup_cost : the cost of starting up worker processes for parallel query (default 1000)
  • parallel_tuple_cost: the cost of passing each tuple (row) from worker to leader backend (default 0.1)
  • parallel_workers: a storage parameter for tables that changes the number of workers used to execute a query activity in parallel, similar to max_parallel_workers_per_gather but only for a specific table; ALTER TABLE tabname SET (parallel_workers = N);

Depending on the resources of your server and the workload of your database, you can tune these parameters to take advantage of parallelism. The min_parallel_table_scan_size affects the decision of Postgres code about how many workers

[...]
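
To see these settings in action, a minimal sketch (not part of the excerpt; big_table stands in for any sufficiently large table):

-- Allow up to four workers per Gather node for this session
SET max_parallel_workers_per_gather = 4;

-- Parallel Seq Scan / Gather nodes appear in the plan if the table exceeds
-- min_parallel_table_scan_size and the cost model finds parallelism worthwhile
EXPLAIN (ANALYZE) SELECT count(*) FROM big_table;

-- Pin the planned worker count for a single table
ALTER TABLE big_table SET (parallel_workers = 4);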

100x Faster Query in Aurora Postgres with a lower random_page_cost
Posted by Shayon Mukherjee on 2024-02-24 at 06:43
Recently I have been working with some queries in Postgres where I noticed that the planner either decided not to use an index and performed a sequential scan, or chose an alternative index over a composite partial index. This was quite puzzling, especially when you know there are indexes in the system that can perform these queries faster. So what gives? After some research, I stumbled upon random_page_cost (ref).
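
A sketch of the kind of experiment the post describes (the table, index and query are hypothetical; 1.1 is a value commonly suggested for SSD-backed storage):

SHOW random_page_cost;                     -- 4 by default

-- Try a lower value for the current session only, then re-check the plan
SET random_page_cost = 1.1;
EXPLAIN ANALYZE SELECT * FROM events WHERE account_id = 42 AND status = 'open';

-- If the cheaper index plan wins, persist the setting and reload
ALTER SYSTEM SET random_page_cost = 1.1;
SELECT pg_reload_conf();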

pgagroal 1.6.0 has been released
Posted by Luca Ferrari on 2024-02-24 at 00:00

pgagroal, the fast connection pooler for PostgreSQL, has reached a new stable release!

pgagroal 1.6.0 has been released

A couple of days ago, pgagroal version 1.6.0 has been released.

This new version includes a lot of new features and small improvements that make pgagroal much more user-friendly and easier to adopt as a connection pooler. The main contribution, from yours truly, has been command line refactoring and JSON support. The command line now supports commands and subcommands, for example conf get and conf set, and a more consistent set of commands. The JSON command output allows for easy automation and a stable command output, easing adoption in different scenarios.

But there's more: a lot of other tickets have been solved during this release, and there is now support for Mac OS X. Moreover, it is now possible to retrieve and set configuration values at run-time, without the need to manually edit the configuration file and reload the daemon.

There is initial experimental support for client certificates, and it is now possible to determine how long a connection may live.

Better handling of the configuration files, hence better detection and reporting of misconfigurations, as well as a better error messaging system, completes the release.

The list of contributors is also expanding, and this is good and exciting!

Give pgagroal a try, you will be amazed by the capabilities of this connection pooler!

Multiple Client Certificate Selection – a Simple POC
Posted by cary huang in Highgo Software on 2024-02-23 at 23:13

Introduction

I recently came across this email thread discussion from several years ago, which discussed ways to enable a client to choose from a list of client certificates to send to the server. The benefit is obvious; when a client has to communicate with different PostgreSQL servers with different TLS settings and trust structure, the feature can help reduce the application’s effort to figure out what certificate to send to what server. In other words, the application does not need to have this selection logic because the libpq library is able to choose the right one to send.

OpenSSL provides PostgreSQL with TLS and other cryptographic capabilities, and it has also evolved significantly over the past years. It now has much more API support that enables applications to make more informed TLS decisions, for example choosing the most appropriate client certificate to send to the server during the handshake. Today, in this post, I will briefly explain how the certificate chain of trust works and share the “multiple client certificate selection” patch if one is interested.

TLS ? Certificate ? Trust Chain? What?

TLS (Transport Layer Security)

formerly called SSL (Secure Sockets Layer). Now the two terms are used interchangeably. This is a protocol for encrypting and protecting network communications. It has 2 major responsibilities:

  • Communication encryption / decryption + integrity check
  • Use certificates to verify the identities of both parties.

Certificate

Also called X509 v3 Certificate, or simply X509 Certificate. It is a data structure that contains validity, issuer, extension, purpose and other custom information. Primarily used to represent an entity’s identity and ensure trust. It has three basic types:

  • Certificate Authority – Also called CA Certificate or root CA. (IdenTrust, DigiCert, GlobalSign, Let’s Encrypt, etc are common CAs. It can also be self-signed for test purposes).
  • Intermediate CA certificate.
  • Entity Certificate – The actual certificate
[...]

Changes to PGSQL Phriday Blogging Events
Posted by Ryan Booz on 2024-02-23 at 21:54
It’s been great to see the interest and participation of the monthly PGSQL Phriday blogging event grow over the last 15 months. And like any new venture, it takes time to see and understand the best way to organize it and what the best parameters are to encourage more participation from month to month. With ... Read more

The Rest is History: Investigations of WAL History Files
Posted by Brian Pace in Crunchy Data on 2024-02-23 at 13:00

PostgreSQL uses the concept of a timeline to identify a series of WAL records in space and time. Each timeline is identified by a number, a decimal in some places, hexadecimal in others. Each time a database is recovered using point in time recovery and sometimes during standby/replica promotion, a new timeline is generated.

A common mistake is to assume that a higher timeline number is synonymous with the most recent data. While the highest timeline points to the latest incarnation of the database, it doesn't guarantee that the database indeed holds the most useful data from an application standpoint. To discern the validity of this statement, a closer examination of the Write-Ahead Logging (WAL) history files is essential, unraveling the messages they convey.

In this discussion, we will explore a recovered database and trace the narrative embedded in the history files. By the conclusion, you will have gained a deeper insight into the functionality of these history files within Postgres, empowering you to address queries related to recovery processes and the database's historical journey (or may I call it the 'family tree').

Assessing current state

Let's begin by gaining insights into the current status of the database. The information obtained from the pg_controldata output indicates that the database is currently on timeline 11. Take note of the latest checkpoint Write-Ahead Logging (WAL) file, identified as '0000000B0000000100000039'. See my previous post on WAL file naming and numbering.

It's worth noting that timelines are sometimes expressed in decimal form, as in the case of 11, and at other times in hexadecimal form, such as '0000000B'. While this dual representation might be perplexing initially, familiarity with when and where these different forms are employed will contribute to a clearer understanding.

$ pg_controldata
...
pg_control last modified:             Tue 06 Feb 2024 03:10:53 PM EST
Latest checkpoint location:           1/39000060
Latest checkpoint's REDO location:   
[...]
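
If you just need the current timeline from a live server, the same pg_controldata fields are exposed through SQL; to_hex() also helps map the decimal timeline to the hexadecimal form used in WAL file names (output will differ on your system):

SELECT timeline_id, checkpoint_lsn, redo_lsn
FROM pg_control_checkpoint();

SELECT to_hex(11);   -- 'b', i.e. the 0000000B prefix of the WAL file name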

Autoscaling in Action: Postgres Load Testing with pgbench
Posted by Raouf Chebri in Neon on 2024-02-23 at 09:25


In this article, I'll show Neon autoscaling in action by running a load test using one of Postgres' most popular benchmarking tools, pgbench. The test simulates 30 clients running a heavy query.

While 30 doesn’t sound like a lot, the query involves a mathematical function with high computational overhead, which signals to the autoscaler-agent that it needs to allocate more resources to the VM.

We will not cover how autoscaling works, but for those interested in knowing the details, you can read more about how we implemented autoscaling in Neon.

For this load test, you will need:

  1. A Neon account
  2. pgbench

The load test

Ensuring your production database can perform under varying loads is crucial. That’s why we implemented autoscaling to Neon, a feature that dynamically adjusts resources allocated to a database in real-time, based on its current workload.

However, the effectiveness and efficiency of autoscaling are often taken for granted without thorough testing. To showcase autoscaling in action, we turn to Postgres and pgbench.

pgbench is a benchmarking tool included with Postgres, designed to evaluate the performance of a Postgres server. The tool simulates client load on the server and runs tests to measure how the server handles concurrent data requests.

pgbench is executed from the command line, and its usage can vary widely depending on the specific tests or benchmarks being run. Here is the command we will use in our test:

pgbench -f test.sql -c 30 -T 120 -P 1 <CONNECTION_STRING>

In this example, pgbench executes the query in test.sql. The parameter -c 30 specifies 30 client connections, and -T 120 runs the test for 120 seconds against your database. -P 1 specifies that pgbench should report the progress of the test every 1 second. The progress report typically includes the number of transactions completed so far and the number of transactions per second.

30 clients don't seem like enough to stress a database. Well, it depends on t

[...]
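
The excerpt doesn't include test.sql itself; a hypothetical CPU-heavy query in the same spirit could look like this, to be run through the pgbench command shown above:

-- test.sql: burn CPU with a math-heavy aggregate over generated rows
SELECT sum(sqrt(abs(sin(i)) + 1))
FROM generate_series(1, 500000) AS i;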

Recent PGXN Improvements
Posted by David E. Wheeler on 2024-02-22 at 21:19

One of the perks of my new gig at Tembo is that I have more time to work on PGXN. In the last ten years I’ve had very little time to give, so things have stagnated. The API, for example, hasn’t seen a meaningful update since 2016!

But that’s all changed now, and every bit of the PGXN architecture has experienced a fair bit of TLC in the last few weeks. A quick review.

PGXN Manager

PGXN Manager provides user registration and extension release services. It maintains the root registry of the project, ensuring the consistency of download formatting and naming ($extension-$version.zip if you’re wondering). Last year I added a LISTEN/NOTIFY queue for publishing new releases to Twitter and Mastodon, replacing the old Twitter posting on upload. It has worked quite well (although Twitter killed the API so we no longer post there), but has sometimes disappeared for days or weeks at a time. I’ve futzed with it over the past year, but last weekend I think I finally got to the bottom of the problem.

As a result, the PGXN Mastodon account should stay much more up-to-date than it sometimes has. I've also added additional logging capability, because sometimes the posts fail and I need better clarity into what failed and how so it, too, can be fixed. So messaging should continue to become more reliable. Oh, and testing and unstable releases are now prominently marked as such in the posts (or will be, next time someone uploads a non-stable release).

I also did a little work on the formatting of the How To doc, upgrading to MultiMarkdown 6 and removing some post-processing that appended unwanted backslashes to the ends of code lines.

PGXN API

PGXN API (docs) has been the least loved part of the architecture, but all that changed in the last couple weeks. I finally got over my trepidation and got it working where I could hack on it again. Turns out, the pending issues and to-dos — some dating back to 2011! — weren’t so difficult to take on as I had feared. In the last few weeks, API has finally come uns

[...]

The History and Future of Extension Versioning
Posted by David Wheeler in Tembo on 2024-02-22 at 19:33

Every software distribution system deals with versioning. Early in the design of PGXN, I decided to require semantic versions (SemVer), a clearly-defined and widely-adopted version standard, even in its pre-1.0 specification. I implemented the semver data type that would properly sort semantic versions, later ported to C by Sam Vilain and eventually updated to semver 2.0.0.

As I’ve been thinking through the jobs and tools for the Postgres extension ecosystem, I wanted to revisit this decision, the context in which it was made, and survey the field for other options. Maybe a “PGXN v2” should do something different?

But first that context, starting with Postgres itself.

PostgreSQL Extension Version Standard

From the introduction of extensions in PostgreSQL 9.1, the project side-stepped the need for version standardization and enforcement by requiring extension authors to adopt a file naming convention instead. For example, an extension named “pair” must have a file with its name, two dashes, then the version as listed in its control file, like so:

pair--1.1.sql

As long as the file name is correct and the version part byte-compatible with the control file entry, CREATE EXTENSION will find it. To upgrade an extension the author must provide a second file with the extension name, the old version, and the new version, all delimited by double dashes. For example, to upgrade our “pair” extension to version 1.2, the author supplies all the SQL commands necessary to upgrade it in this file:

pair--1.1--1.2.sql

This pattern avoids the whole question of version standards, ordering for upgrades or downgrades, and all the rest: extension authors have full responsibility to name their files correctly.
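
On the consuming side, those file names are exactly what CREATE EXTENSION and ALTER EXTENSION resolve; a minimal sketch with the “pair” example:

-- Installs pair--1.1.sql
CREATE EXTENSION pair VERSION '1.1';

-- Applies pair--1.1--1.2.sql to reach the new version
ALTER EXTENSION pair UPDATE TO '1.2';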

PGXN Versions

SemVer simplified a number of issues for PGXN in ways that the PostgreSQL extension versioning did not (without having to re-implement the core’s file naming code). PGXN

[...]

PostgreSQL Backup and Recovery Management using Barman
Posted by muhammad ali in Stormatics on 2024-02-22 at 17:47

Barman stands as a widely used open-source tool dedicated to managing backup and disaster recovery operations for PostgreSQL databases.

The post PostgreSQL Backup and Recovery Management using Barman appeared first on Stormatics.

Quick Benchmark: PostgreSQL 2024Q1 Release Performance Improvements
Posted by Michael Banck in credativ on 2024-02-22 at 15:55

The PostgreSQL 2024Q1 back-branch releases 16.2, 15.6, 14.11, 13.14 and 12.18 were released on February 8th 2024. Besides fixing a security issue (CVE-2024-0985) and the usual bugs, they are somewhat unique in that they address two performance problems by backporting fixes already introduced into the master branch before. In this blog post, we describe two quick benchmarks that show how the new point releases have improved. The benchmarks were done on a ThinkPad T14s Gen 3, which has an Intel i7-1280P CPU with 20 cores and 32 GB of RAM.

Scalability Improvements During Heavy Contention

The performance improvements in the 2024Q1 point releases concern locking scalability at high client counts, i.e., when the system is under heavy contention. Benchmarks had shown that performance got dramatically worse for a pgbench run with more than 128 clients. The original commit to master (which was subsequently released with version 16) is from November 2022. It has now been introduced into the back-branches, as version 16 has seen some testing and the results were promising.

The benchmark we used is adapted from this post by the patch author and consists of a tight pgbench run simply executing SELECT txid_current() for five seconds each at increasing client count and measuring the transactions per second:

$ cat /tmp/txid.sql
SELECT txid_current();
$ for c in 1 2 4 8 16 32 64 96 128 192 256 384 512 768 1024 1536;
> do echo -n "$c ";pgbench -n -M prepared -f /tmp/txid.sql -c$c -j$c -T5 2>&1|grep '^tps'|awk '{print $3}';
> done

The following graph shows the average transactions per second (tps) over 3 runs with increasing client count (1-1536 clients), using the Debian 12 packages for version 15, comparing the 2023Q4 release (15.5, package postgresql-15_15.5-0+deb12u1) with the 2024Q1 release (15.6, package postgresql-15_15.6-0+deb12u1):

The tps numbers are basically the same up to 128 clients, whereas afterwards the 15.5 transaction count drops 10-fold from its peak of 650k to 65k. The new 15.6

[...]

Point In Time Recovery Under the Hood in Serverless Postgres
Posted by Raouf Chebri in Neon on 2024-02-22 at 12:44

Imagine working on a crucial project when suddenly, due to an unexpected event, you lose significant chunks of your database. Whether it’s a human error, a malicious attack, or a software bug, data loss is a nightmare scenario. But fear not! We recently added support for Point-In-Time Restore (PITR) to Neon, so you can turn back the clock to a happier moment before things went south.

In the video below and in the PITR announcement article, my friend Evan shows how you can recover your data in a few clicks. He also uses Time Travel Assist to observe the state of the database at a given timestamp to confidently and safely run the restore process.

How is this possible? This article is for those interested in understanding how PITR works under the hood in Neon. To better explain this, we will:

  1. Cover the basics of PITR in Postgres
  2. Explore the underlying infrastructure that allows for PITR in Neon.

We'll make sure that, by the end of this post, you're prepared for when disaster strikes.

Understanding the basics of Point In-Time Recovery in Postgres

PITR in Postgres is made possible using two key components:

  1. Write-Ahead Logging : Postgres uses Write-Ahead Logging (WAL) to record all changes made to the database. Think of WAL as the database’s diary, keeping track of every detail of its day-to-day activities.
  2. Base backups : Base backups are snapshots of your database at a particular moment in time.

With these two elements combined, you can define a strategy to restore your database to any point after the base backup was taken, effectively traveling through your database's timeline. However, you'd need to do some groundwork, which consists of the following:

  1. Setting up WAL archiving: By defining an archive_command and setting archive_mode to on in your postgresql.conf.
  2. Creating base backups: You can use pg_basebackup to create daily backups (both steps are sketched right after this list).
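
A minimal sketch of that groundwork, assuming a locally mounted archive directory (the paths and the backup schedule are placeholders):

-- 1. WAL archiving (archive_mode requires a server restart)
ALTER SYSTEM SET archive_mode = 'on';
ALTER SYSTEM SET archive_command = 'cp %p /mnt/wal_archive/%f';

-- 2. Base backups are taken outside SQL, e.g. nightly:
--    pg_basebackup -D /mnt/base_backups/$(date +%F) -Ft -z -P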

If, for any reason, you need to restore your database, you need to recover the latest backup a

[...]

The Builders Podcast Recap: Unlocking Postgres Power: A Deep Dive on Simplifying High Availability with Gianni Ciolli
Posted by Gülçin Yıldırım Jelínek in EDB on 2024-02-21 at 01:10
If you’ve been following along with our Builders podcasts, you know that in every episode I interview one distinguished guest. This episode #3 is particularly interesting, because our guest is Gianni Ciolli, EDB VP and Field CTO, and we touch on everything from EDB's Trusted Postgres Architect solution to Galileo’s scientific method.

RFC: Extension Metadata Typology
Posted by David Wheeler in Tembo on 2024-02-20 at 22:26

Lately I’ve been thinking a lot about metadata for Postgres extensions. Traditional use cases include control file metadata, which lives in .control files used by CREATE EXTENSION and friends, and PGXN metadata, which lives in META.json files used by PGXN to index and publish extensions. But these two narrow use cases for SQL behavior and source code distribution don’t provide the information necessary to enable other use cases, including building, installing, configuration, and more.

So I have also been exploring other metadata formats, including:

These standards from neighboring communities reveal a great deal of overlap, as one might expect (everything has a name, a version, an author, a license, and so on), but also types of metadata that had not occurred to me. As I took notes and gathered suggestions from colleagues and coworkers, I began to recognize natural groupings of metadata. This led to the realization that it might be easier — and more productive — to think about these groupings rather than individual fields.

I therefore propose a typology for Postgres extension metadata.

Extension Metadata Typology

Essentials

Essential information about the extension itself, including its name (or unique package name), version, list of authors, license, etc. Pretty much every metadata format encompasses this data. Ecosystem applications use it for indexing, installation locations, naming conventions, and display information.

Artifacts

A list of links and checksums for downloading the extension in one or more formats, including source code, binaries, system packages, and more. Apps use this information to determine the best option for installing an extension on a particular system.

[...]

FOSSCOMM 2023 Heraklion — How PostgreSQL helps you enforce best practices
Posted by Jimmy Angelakos on 2024-02-20 at 14:00

So here's a very delayed blog post! Back in late October 2023, I visited the FOSSCOMM conference in Heraklion, Crete, Greece, organised for the 15th year running (!) by the Greek open source community. As a community conference, there is a very heavy student presence, a really broad range of topics, and geeky stuff, and that is just awesome. The DIY spirit, volunteering and camaraderie brings out the best in people and makes it all enjoyable for attendees. This conference accepts talks in either Greek or English and the level of talks was quite high. There were talks focused on beginners, community issues, but also very technical in-depth dives into stuff you've never thought about.

After giving this talk, titled "How PostgreSQL helps you enforce best practices" at PGConf.NYC I thought it might be interesting to the FOSSCOMM crowd, and it was indeed accepted for the conference. I was happy to find out it would be recorded — but unfortunately, for a couple of months, it looked like the footage had been lost.

One day last week, I was pleasantly surprised when one of the organisers got in touch and said they could give me access to some of the raw video files they had recovered from the talk recording! So I spent a couple of hours editing those, and what follows below is the best that I could do with them (so my apologies if it isn't TED talk quality).

The basic idea is that many systems have commonalities that lend themselves to similar best practices for administration and maintenance. Database systems in particular are complex and have many correctness, performance, and security considerations. This talk discusses PostgreSQL best practices and how these translate to best practices in general. It is very dense, and that is why I've decided, instead of rushing through the content, to expand on it and write a BOOK (but that's another blog post!)

Topics discussed:

  • Using the proper data types
  • Controlling parallelization & transaction rate
  • How locking affects performance
  • Home-brewi
[...]

SQL Optimization: a comprehensive developer’s guide
Posted by oded valin on 2024-02-19 at 10:23

OK, you've got a database: how do you optimize SQL performance? Answering this question takes a lot of time and effort to understand workloads and performance patterns, evaluate degradation and apply corrective measures. However, there are standard practices that you can implement to improve performance. This SQL optimization guide will showcase some best practices that apply across almost every database and can be a good starting point for optimizing your database workloads.

For a better understanding of your performance patterns and SQL optimization opportunities, you can always rely on EverSQL by Aiven. With EverSQL’s sensor you can monitor your database and receive performance insights. With the one-click SQL optimization engine you can retrieve index and SQL rewrite suggestions that can speed up your performance.

How to optimize SELECT SQL queries

Optimize SELECT SQL queries by understanding the query execution plan

How

All modern databases, like MySQL and PostgreSQL, define an optimal query execution plan based on the cardinality of the various tables involved and the auxiliary data structures available like indexes or partitions. Both MySQL and PostgreSQL provide a command called EXPLAIN to show the execution plan of a statement. From the execution plan, you can understand how tables are joined, if an index is used, if a partition is pruned and many other aspects of the query execution that could alter the performance. The query plan gives hints about the cost of each operation and can flag if an index is not being used.

To get the execution plan of a query, prefix the query with EXPLAIN like the following:

explain SELECT id
FROM orders
WHERE
order_timestamp between '2024-02-01 00:00:00' and '2024-02-03 00:00:00' 
OR status = 'NEW';

The database returns the plan showcasing, in this example, the usage of two indexes idx_order_date and idx_order_status and a BitmapOr between the two results.

                                                                       
[...]
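
When validating an optimization, it is often worth going one step further than plain EXPLAIN: EXPLAIN ANALYZE actually runs the statement and reports real row counts and timing (same orders query as above; this example is not part of the original excerpt):

EXPLAIN (ANALYZE, BUFFERS)
SELECT id
FROM orders
WHERE
order_timestamp between '2024-02-01 00:00:00' and '2024-02-03 00:00:00'
OR status = 'NEW';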

Logging: What, Why and When
Posted by Henrietta Dombrovskaya on 2024-02-19 at 02:06

There are multiple PostgreSQL configuration parameters that determine what exactly to log. In Postgres 16, there are 30 of them if you exclude the ones related to file naming and location. We rarely change them when we create a new instance. In a best-case scenario, your organization has a default set of configuration parameters which are applied when a new instance is created.

Why are we usually ignorant of these parameters? My hypothesis is that it's because we rarely use Postgres logs for anything. There are many extensions and third-party tools to monitor different Postgres activities, to alert on high CPU or IO, too many connections, disk usage increases, table bloat, and so on. None of them use information from Postgres logs. Even when we want to monitor queries' performance changes, we use pg_stat_statements, not logs.

If anybody who is reading this blog uses Postgres logs “as is” on a regular basis, please let me know! PgBadger fans – please hold off; we will get there!

During all my Postgres years, every case in which I or anybody I know used Postgres logs was an investigation: trying to find something specific that happened at a particular moment or period of time, never any regular monitoring. That's why, in the majority of cases, we do not care. But if we truly “do not care,” why do any logging at all? One of the default log-related settings is log_min_duration_statement=10, which means that Postgres will record all statements that take 10 or more milliseconds.

On the surface, it makes sense. We want to record only those SQL statements that exceed the response time threshold so that we can investigate what exactly causes the problem when an application experiences slowness. However, this logging does not help us determine whether this is a sudden spike, or the execution time was always just below 10 ms and finally reached the magic number, or even what portion of similar queries is slow.

Anybody who ever was on support for PostgreSQL datab

[...]
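
For reference (not from the post), the threshold can be inspected and changed without a restart; the 250ms value below is only an example:

SHOW log_min_duration_statement;

-- Log every statement slower than 250 ms, cluster-wide
ALTER SYSTEM SET log_min_duration_statement = '250ms';
SELECT pg_reload_conf();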

Maintainable Podcast — Maintainable…Databases? 🎙️
Posted by Andrew Atkinson on 2024-02-19 at 00:00

Recently I appeared as a guest on the Maintainable Podcast with Robby Russell. I’ve admired this podcast for a long time based on the guests, conversations, and focus on software maintenance.

Much of what a software engineer does is evolve and maintain existing systems. I’m glad Robby created this podcast to explore maintenance topics, and has gathered the perspectives of many practitioners.

What is Maintainable Software?

Robby starts each episode by asking the guest what maintainable software is in their perspective.

I wanted to give an authentic response and avoid cliches, and hopefully provide something original and tangible.

My answer was that “the level of effort required should be proportional to the impact.” Despite my attempt at an original answer, I probably read it in some software book earlier. But this is definitely something I've “felt” when it's not going well.

By making a change, I’m referring to the whole process of writing the code, writing a test or otherwise verifying the correctness, then getting the change released to production.

While this might sound simplistic, in my experience this can be a complex challenge when processes and systems make this seemingly simple set of steps exceedingly tedious.

What are some of the ways that happens? Having a development environment that’s difficult to initially set up and maintain, having a test suite that’s unreliable and slow, having an onerous code review process, or having a slow or unreliable release process, are all some of the ways.

Check out the podcast for more on this!

Well Maintainable Databases

Although guests on the podcast normally talk about software maintenance related to the software and code they maintain, given my background with databases, why not consider what a well-maintained database that we operate could look like? We discussed this in part by discussing some undesirable things we might find in a production system.

  • Is the database software a recent major version? If n
[...]

Moving local settings for pg_hba.conf and postgresql.conf out of PGDATA
Posted by Dan Langille on 2024-02-18 at 12:30

One of the configuration aspects of FreeBSD I have long liked is the concept of default values which are overridden by the user. For example, /etc/defaults/rc.conf (see The /etc directory). The default values in this file can be overridden by the user with their preferred values in /etc/rc.conf (or /etc/rc.conf.local, and other locations if you so choose (search for rc_conf_files)).

With that approach in mind, I wanted to do the same thing with my PostgreSQL installations.

I also wanted to configure pg_hba.conf and postgresql.conf via automated tools (e.g. Ansible). It is easier to drop one file with all your preferred values instead of parsing an existing file. You also don’t have to refresh your copy of the file each time a new release comes out with slight changes.

With that in mind, I've added this entry to the end of postgresql.conf:

include_dir '/usr/local/etc/postgresql.conf.d'

In that directory, I have:

[12:26 r720-02-pg01 dan /var/db/postgres] % cat /usr/local/etc/postgresql.conf.d/postgresql.local.conf
cat: /usr/local/etc/postgresql.conf.d/postgresql.local.conf: Permission denied
[12:26 r720-02-pg01 dan /var/db/postgres] % sudo cat /usr/local/etc/postgresql.conf.d/postgresql.local.conf
listen_addresses = '127.163.54.32'

ssl = on

ssl_cert_file = '/usr/local/etc/ssl/r720-02-pg01.int.unixathome.org.cer'                # (change requires restart)
ssl_key_file  = '/usr/local/etc/ssl/r720-02-pg01.int.unixathome.org.key'                # (change requires restart)
ssl_ca_file   = '/usr/local/etc/ssl/ca.cer'                     # (change requires restart)

work_mem = 1MB                         # min 64kB
maintenance_work_mem = 1GB

max_wal_size = 1GB
min_wal_size = 80MB

max_wal_size = 1536

checkpoint_completion_target = 0.7

client_min_messages = notice
log_min_messages = notice
log_min_error_statement = notice

log_checkpoints = on
log_connections = on
log_disconnections = on
log_duration = on

log_lock_waits = on
log_statement = 'all'

log_timezone = 'UTC'
#datestyle = 'iso, mdy'

[...]
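
One way to confirm which settings really came from the drop-in directory is to ask pg_settings for their source file (a sketch, not from the post):

SELECT name, setting, sourcefile
FROM pg_settings
WHERE sourcefile LIKE '%postgresql.conf.d%';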

Selective Column Replication in PostgreSQL
Posted by semab tariq in Stormatics on 2024-02-16 at 10:06

Discover selective column replication in PostgreSQL - a powerful feature for replicating specific columns across databases.

The post Selective Column Replication in PostgreSQL appeared first on Stormatics.
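
The building block for this is the column list support added to logical replication publications in PostgreSQL 15; a minimal sketch with hypothetical names:

-- Publish only the columns that should leave this database
CREATE PUBLICATION users_pub FOR TABLE users (id, username);

-- On the subscriber (connection string omitted):
-- CREATE SUBSCRIPTION users_sub CONNECTION '...' PUBLICATION users_pub;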

FOSDEM and FOSDEM PGDay 2024 Review
Posted by Andreas Scherbaum on 2024-02-15 at 23:27

FOSDEM 2024 in Brussels is history, and as always it was a good event and nice trip to Belgium.

PostgreSQL Europe did organize a PGDay on Friday before FOSDEM. In addition we had a stand at the main FOSDEM event, and a Devroom on Sunday.

Functions and Procedures: Learning PostgreSQL with Grant
Posted by Grant Fritchey in Redgate on 2024-02-15 at 22:09

One of the most useful constructs in SQL Server is the stored procedure. It gives you a way to do several things. First up, you can store code within the database. Next, you can parameterize queries so that you’re not hard coding or generating ad hoc queries every time you want to call them. You can put in transaction handling, multiple result sets, security and more. They truly are a useful tool. So, of course, I went straight to CREATE PROCEDURE in PostgreSQL.

I begin reading about procedures in PostgreSQL and I’m suddenly hit with a revelation. Procedures can’t return result sets. They can’t return anything except INOUT parameter values (more on that in a bit). What the heck do we do to return results? Thankfully, Ryan Booz pointed me in the right direction, CREATE FUNCTION.

We have functions in SQL Server. They are of course scalar, table valued, or multi-statement table valued (AKA, sort of evil). However, they're not that similar in form or function to SQL Server's procedures, so what's going on in PostgreSQL?

The Differences Between Functions and Procedures

In PostgreSQL, CREATE FUNCTION predates CREATE PROCEDURE, by a considerable margin. Procedures were added in PostgreSQL 11 (we’re on 16 as of this writing, and 17 is on the roadmap for later this year!). If you look at the syntax in the links already provided, you’re going to notice quite a few similarities. This is because these two constructs are similar. However, there are a few differences. Let’s talk about them.

First up, as was already mentioned, a procedure in PostgreSQL cannot return a result set. A function can return a result set. In fact, functions can return multiple result sets. Further, this being PostgreSQL, those result sets are objects, which means you can return, not simply a set of columns and rows (tuples), but a table, a cursor (the good kind, and yes, there’s a good kind of cursor, at least in PostgreSQL, it’ll take another article to explain it), other things. Both procedures and functions do allow for output

[...]
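
A minimal sketch of the contrast described above (object names are illustrative, not from the article):

-- A function can return a result set...
CREATE FUNCTION user_names()
RETURNS SETOF text
LANGUAGE sql
AS $$ SELECT username FROM users $$;

SELECT * FROM user_names();

-- ...while a procedure hands values back only through INOUT parameters
CREATE PROCEDURE user_count(INOUT total bigint)
LANGUAGE plpgsql
AS $$
BEGIN
    SELECT count(*) INTO total FROM users;
END;
$$;

CALL user_count(NULL);   -- the CALL result contains the total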
