Latest Blog Posts

Announcing PGDSAT to satisfy CIS benchmarks for PostgreSQL
Posted by Avi Vallarapu on 2024-04-23 at 20:40

In an era where data breaches are both costly and damaging to reputations, implementing best practices and robust security measures is crucial. PostgreSQL has emerged as one of the most successful and highly adopted open-source databases, competing with the enterprise standards of Oracle and SQL Server. Its increasing popularity is evident from DB-Engines rankings, Stack Overflow […]

The post Announcing PGDSAT to satisfy CIS benchmarks for PostgreSQL appeared first on HexaCluster.

PGXN V2 Architecture
Posted by David Wheeler in Tembo on 2024-04-23 at 20:08

PGXN Future Architecture

High-level diagram of the six logical services making up the proposed future extension distribution architecture. The Root Registry sits at the center, providing APIs for the other services to consume for their own use cases. Trusted instances of those services submit additional data about extensions via the Interactions service to enhance and enrich the service to better inform and delight users.

Over on the Postgres Wiki I’ve published a new document for the PGXN v2 project: PGXN v2 Architecture. It has diagrams, such as the one above! From the introduction:

This document outlines the project to build extension distribution, discovery, and packaging tools and services to power the growth, accessibility, and utility of the Postgres extension ecosystem. Taking the overall Postgres community as its audience, it defines the services to be provided and the architecture to run them, as well as the strategic vision to guide project planning and decision-making.

With the goal to think strategically and plan pragmatically, this document describes the former to enable the latter. As such, it is necessarily high-level; details, scoping, and planning will be surfaced in more project-focused documents.

Bear in mind that this document outlines an ambitious, long-term strategy. If you’re thinking that there’s too much here, that we’re over-thinking and over-designing the system, rest assured that project execution will be fundamentally incremental and pragmatic. This document is the guiding light for the project, and subject to change as development proceeds and new wrinkles arise.

For those of you interested in the future of Postgres extension distribution, please give it a read! I expect it to guide the planning and implementation of the new services and tools in the coming year. Please do consider it a living document, however

[...]

Installing pg_tle on Ubuntu: A Quick Guide
Posted by Robins Tharakan on 2024-04-23 at 13:25
Compile & Install pg_tle on PostgresPostgreSQL is a powerful database, but sometimes you want to extend its functionality with custom features, and that's where extensions like pg_tle (Trusted Language Extensions) come into play.If you're new to pg_tle, here's a quick round-up of why it makes sense for you - See Unlock PostgreSQL Superpowers with pg_tle.Given the power of&

PL/pgSQL Conversion Gotchas: Functions with OUT Parameters and Return Types
Posted by Deepak Mahto on 2024-04-23 at 06:11

If you’re fascinated by the internals of PL/pgSQL processing and want to handle it efficiently, especially if you’re migrating procedural logic from another database, be sure to check out the previous posts in the series on PL/pgSQL Conversion Gotchas – Part 1 and Part 2.

In Oracle, we often build multiple pieces of functionality within procedural blocks, either in functions or procedures. In this blog, we will explore some of the gotchas that we need to take care of when migrating them.

Migrating Functions with Out Parameters and Return Types

Oracle allows functions to have arguments defined as OUT parameters while also returning an additional value, such as an execution status, as part of function execution. Though this is not best practice, we encounter all kinds of such scenarios when migrating legacy code bases.

Below is one such sample function with OUT parameters and a return type.

create or replace function func_test1
(var1 in integer,var2 out integer )
return integer
as
begin
var2 := var1 + 1;
return var2*2;
end;

If we run it in Oracle, it will produce two values: one as an OUT parameter and another as a return value.

set serveroutput on;
declare
  l_out_var2 integer; 
  l_out_ret integer;  
begin
  l_out_ret := func_test1(1,l_out_var2 );
  dbms_output.put_line('Out Value - ' || l_out_var2);
  dbms_output.put_line('Return Value - ' || l_out_ret);
end;
/

Let’s attempt to convert the sample function to PostgreSQL using the PL/pgSQL language. As a reference, we’ll aim to build similar code logic as in the Oracle sample, creating a function in PostgreSQL with PL/pgSQL that includes both an out parameter and a return type.

create or replace function func_test1
(var1 in integer,var2 out integer )
returns integer
language plpgsql
as
$$
begin
var2 := var1 + 1;
return var2*2;
end;
$$;

Let’s proceed with deploying it within the PostgreSQL database.

In PostgreSQL, when there’s a requirement to return multiple values, we typically transform them into record types with attributes corresponding to

[...]
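To make the gotcha concrete: PL/pgSQL rejects RETURN with an expression in a function that declares OUT parameters, so the function above fails to deploy. A minimal sketch of one common conversion (my illustration, not necessarily the post's exact solution) moves the return value into a second OUT parameter, so PostgreSQL returns both values as a record:

create or replace function func_test1
(var1 in integer, var2 out integer, retval out integer)
language plpgsql
as
$$
begin
    var2 := var1 + 1;
    -- the former RETURN value becomes another OUT parameter
    retval := var2 * 2;
end;
$$;

select * from func_test1(1);  -- var2 = 2, retval = 4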

PgTraining Free Online Event: Material Available
Posted by Luca Ferrari on 2024-04-23 at 00:00

The material and the videos are now online!

PgTraining Free Online Event: Material Available

Last Friday, April 19th, we held the fourth edition of the webinar dedicated entirely to PostgreSQL, provided by PgTraining.


As in the previous editions, we had three talks and an open discussion at the end. The talks (all in Italian) were:

  • Introduzione al linguaggio PL/Java (“An introduction to the PL/Java language”), from yours truly;
  • PgVector - in R768 nessuno può sentirti urlare (“PgVector - in R768 nobody can hear you screaming”), by Chris Mair;
  • Repliche logiche e migrazione di versione a caldo da PostgreSQL 12 a PostgreSQL 16 (“Logical replication and hot upgrade from PostgreSQL 12 to PostgreSQL 16”), by Enrico Pirozzi


The material is available in our GitLab repository, which also contains links and material from the previous editions!

Some material is still being uploaded, so if it is not already there, it will appear soon.

About Talk Selection for POSETTE: An Event for Postgres 2024
Posted by Claire Giordano in CitusData on 2024-04-22 at 17:42

As promised in the CFP for POSETTE: An Event for Postgres 2024, all of the talk selection decisions were emailed out on April 17th. Our talk selection work has now concluded, with the possible exception of accepting proposals from the Reserve list.

So what’s next? First I want to thank all of you Postgres people who submitted such amazing talk proposals into the CFP for POSETTE, now in its 3rd year. I was so impressed by the submissions and wish we could have accepted more of them.

And I also want to thank Alicja Kucharczyk, Daniel Gustafsson, and Melanie Plageman from POSETTE’s Talk Selection Team for contributing their time and expertise to collaborate with me to select the talks for this year’s virtual POSETTE event. It’s not easy to carefully read through and review 184 talk proposals—in just 8 days—to come up with the program for an event like #PosetteConf.

That’s right, 184 talk proposals—from 120 unique speakers. (The CFP had a maximum of 4 submissions per speaker.) With just 38 talks to accept this year, that means POSETTE 2024 has a ~20% talk acceptance rate. Bottom line, we had some difficult decisions to make.

So many great talk proposals, we had to lengthen the POSETTE schedule to make space

The original POSETTE plan for 2024 was to have 4 livestreams with 9 talks each. The math looked like this:

  • Each livestream would have:
    • 1 invited keynote—not selected through the CFP talk selection process, but rather an invited keynote speaker
    • 8 unique talks selected via the CFP process
  • Hence, 36 talks total:
    • 32 talks selected via the CFP process + 4 unique keynotes

However, as with the best-laid plans of mice and men, we had to throw that math out the window. There were too many good talk proposals.

Luckily the talk production team led by Teresa Giacomini was able to rejigger their recording schedules to make room for 6 more talks.

So the final POSETTE 2024 schedule will have:

  • 42 talks total: 38
[...]

🎙️ Hacking Postgres s02e03
Posted by David Wheeler in Tembo on 2024-04-22 at 16:28
Screenshot from Hacking Postgres: Ry Walker, CEO, Tembo (left) and David Wheeler, Principal Architect, Tembo (right).

Last week I appeared on s02e03 of the Hacking Postgres podcast.

The experience I had after my independent Consulting gig for 10 years working in companies was, like, bringing up other people and being supportive of other people and hearing from a diversity of voices and perspectives makes everything better.

That’s part of why I want to get so much input on and feedback on the stuff that I’m hoping to do with PGXN v2 — or whatever we ultimately call it. But people matter, more than the technology, more than any of the rest of it.

I quite enjoyed this wide-ranging discussion. We covered my history with the Postgres community, a bunch of the projects I’ve worked on over the years, plans and hopes for the PGXN v2 project, perspectives on people and technology, and exciting new and anticipated features of Postgres. Find it wherever fine podcasts are streamed, including:

PGDay UK 2024 - It's back on September 11th in London, England!!
Posted by Dave Page in PGDay UK on 2024-04-22 at 12:47

PGDay UK 2024 will take place in London on September 11th at the Cavendish Conference Centre, London, UK.

Our Call for Papers and Call for Sponsors are now open.

We are accepting proposals for talks in English. Each session will last 50 minutes, and may be on any topic related to PostgreSQL. The submission deadline is 9th June 2024 23:59:59 BST. Selected speakers will be notified on or before July 1st, 2024. Please submit your proposals here:

https://2024.pgday.uk/call-for-papers/

The proposals will be considered by a committee who will produce a schedule to be published nearer the conference date. The members of the committee are listed on the website linked above. All selected speakers will get free entry to the conference.

Our Call for Sponsors is also open. Take your chance to present your services or products to the PostgreSQL community - or see it as a give-back opportunity. The benefactor sponsorship level also includes a free entrance ticket. Please head to:

https://2024.pgday.uk/become-sponsor/

for more details.

As usual, if you have any questions, don't hesitate to contact us at contact@pgday.uk.

We look forward to seeing you in London on September the 11th!

PGDay UK 2024 is a PostgreSQL Europe event run according to the PostgreSQL Community Conference Recognition programme. Local logistics and other services are provided by Slonik Enterprises Ltd. on a not-for-profit basis.

Recursive CTEs: Transforming and Analyzing Data in PostgreSQL, Part 3
Posted by Ryan Booz in Redgate on 2024-04-22 at 01:12

The first two articles in this series demonstrated how PostgreSQL is a capable tool for ELT – taking raw input and transforming it into usable data for querying and analyzing. We used sample data from the Advent of Code 2023 to demonstrate some of the ELT techniques in PostgreSQL.

In the first article, we discussed functions and features of SQL in PostgreSQL that can be used together to transform and analyze data. In the second article, we introduced Common Table Expressions (CTE) as a method to build a transformation query one step at a time to improve readability and debugging.

In this final article, I’ll tackle one last feature of SQL that allows us to process data in a loop, referencing previous iterations to perform complex calculations: recursive CTEs.

SQL is Set-based

SQL is primarily a set-based, declarative language. Using standard ANSI SQL and platform-specific functions, a SQL developer declares the desired outcome of a query, not the process by which the database should retrieve and process the data. The query planner typically uses statistics about the distribution of data to determine the best plan to get the desired result and return the full set of rows. While CTEs and LATERAL joins make it feel like we can use the output of one query to impact another, those are always a one-shot opportunity. As a set-based language, it lacks the ability to do algorithmic calculations: using the output of a query as input and control for another in a loop.

Stated another way, early versions of the SQL standard did not have procedural capabilities. To provide them, most database platforms use their own superset of SQL that adds procedural capabilities. By default, this is T-SQL in SQL Server and PL/pgSQL in PostgreSQL.

That changed with the SQL:1999 standard. With the addition of recursive queries, implemented by all major databases, SQL became a Turing-complete language that can solve complex calculations in a single query.
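To make that concrete, here is a minimal recursive CTE of my own (not taken from the article): the anchor member seeds the working set, and each iteration feeds the previous rows back in until the termination condition produces no new rows.

WITH RECURSIVE countdown AS (
    SELECT 1 AS n                     -- anchor member: seeds the working set
    UNION ALL
    SELECT n + 1 FROM countdown       -- recursive member: consumes prior rows
    WHERE n < 5                       -- termination condition
)
SELECT n FROM countdown;  -- returns 1 through 5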

Recursive Common Table Expressions (aka. Hierarchical Queries)

[...]

Using PL/Java: need for clarifications
Posted by Luca Ferrari on 2024-04-22 at 00:00

Sometimes it happens: I write something in a rush, and present it in a not-optimal way. And then I get advice!

Using PL/Java: need for clarifications

In January, I wrote an article about installing PL/Java on Rocky Linux, and about some of the difficulties I had in achieving a fully operational installation, even if I did not dig enough into the problems that I encountered.


Chapman Flack, the most active developer in the project at the moment, took the time to write me a very detailed email with a lot of suggestions for improvements, providing corrections to some of the misconceptions I presented in that article.


I’m really glad to have received all those insights, and in order to spread the word, I’m writing here another article that, hopefully, fixes my mistakes. I’m not following the same order in which Chapman presented them to me, since in my opinion some issues are much more important than others, so I present them from the most important to the least, in my view.

Editing the java.policy file

In my previous article, I advised readers to edit java.policy in case there was a problem with Java permissions when executing PL/Java code. Despite the fact that I clearly stated that relaxing the permissions to all permissions was not a good idea, Chapman emphasized two main problems in my example: 1) I was editing the main policy file, therefore changing the policy rules for all Java code, not only for PL/Java; 2) adding java.security.AllPermission made no distinction between trusted and untrusted languages.

Chapman pointed out that PL/Java uses a customized policy file, which can be found in the PostgreSQL configuration directory, hence in $(pg_config --sysconfdir). This customizable configuration has been available since PL/Java version 1.6, and is documented here in the section “Permissions available in sandboxed/unsandboxed PL/Java”. This file defines two main principals:



grant principal org.postgresql.pljava.PLPrincipal$Sandboxed * {
};


grant princ
[...]

Unlock PostgreSQL Superpowers with pg_tle
Posted by Robins Tharakan on 2024-04-20 at 08:51
pg_tle - A Must-Know for Developers

PostgreSQL is a fantastic database, packed with features. But sometimes, you need to add a little something extra – a custom function, a specialized data type, or maybe a procedure written in your favorite programming language. That's traditionally where PostgreSQL extensions come in. But for developers working with managed databases (like Amazon RDS), [...]

A Deeper Look Inside PostgreSQL Visibility Check Mechanism
Posted by cary huang in Highgo Software on 2024-04-19 at 22:40

What is Visibility?

Simply put, visibility refers to whether a row of data (a heap tuple by default) should be displayed to the user in certain states, backend processes, or transactions.

For example,

  • A user changes a data record from ‘A’ to ‘B’ via UPDATE
  • PostgreSQL handles this by INSERTing a new record called ‘B’ while marking record ‘A’ as ‘invisible’
  • Both records exist in the system, but ‘B’ is visible, while ‘A’ is not
  • A deleted or invisible record is also referred to as a ‘dead’ tuple
  • One of the responsibilities of the VACUUM process is to clear these ‘dead’ tuples to free up space

MVCC (Multi-Version Concurrency Control) is a method in which each write operation creates a “new version” of the data while retaining the “old version”. This allows concurrent read and write operations without blocking each other. PostgreSQL uses a variant of MVCC, also called Snapshot Isolation to isolate concurrent transactions.

So, it is possible that a single piece of data could have multiple “versions” of it, and it is PostgreSQL’s responsibility to determine which version shall be presented to the user based on multiple factors. This act is also known as the “visibility check” or “visibility control”.

In this blog, we will dive into PostgreSQL’s visibility check mechanism to understand how it works.

Visibility Indicators

To my knowledge today, PostgreSQL determines a row’s visibility based on the following indicators:

  • Transaction ID
  • xmin
  • xmax
  • cid
  • transaction snapshot
  • CLOG (Commit Log)
  • hintbit

What exactly are these? Let’s find out.

Transaction ID

This should be self-explanatory. All transactions in PostgreSQL are associated with an ID number called the transaction ID. This is mainly used to check whether the data in question was inserted or deleted in the current transaction. We will explore this more in the later part of the blog.
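As a quick illustration of my own (not from the post), you can observe some of these indicators directly, since every heap tuple exposes the system columns xmin and xmax:

CREATE TABLE t (val text);
INSERT INTO t VALUES ('A');

-- xmin is the ID of the inserting transaction;
-- xmax is 0 while no transaction has deleted or locked the row
SELECT xmin, xmax, val FROM t;

SELECT txid_current();  -- the current transaction ID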

xmin, xmax, cid and hintbit

These indicators are categorized together because they are all

[...]

How to run TLS regression test in PostgreSQL
Posted by David Z in Highgo Software on 2024-04-19 at 21:26

1. Overview

In my previous blogs, I discussed Setting up a debugging environment in PostgreSQL to better understand OpenSSL APIs, Enhance PostgreSQL TLS Security with OCSP Stapling, and How to setup TLS connection for PostgreSQL. In this blog, I will share a simple procedure about How to run SSL/TLS regression tests in PostgreSQL.

2. Postgres regression tests

Whenever we want to add a new feature or make some fix, we should run the tests provided by PostgreSQL to make sure nothing is broken. If the new feature or the fix does not have corresponding tests, we should consider adding some to ensure it will not break in the future when other changes are made. PostgreSQL provides a comprehensive document explaining all available tests. For example, if you want to run a quick regression test to check if any “core” feature may be broken, you can run make check for a temporary installation within the build tree or make installcheck against a running PostgreSQL server. make check and make installcheck will only test the built-in functionality of the PostgreSQL server. To run all tests for the modules that have been selected to be built, including the core tests, you need to run either make check-world or make installcheck-world, corresponding to a temporary installation within the build tree or a running PostgreSQL server. However, some features still won’t be tested by running make check-world or make installcheck-world, such as the security-related features: ssl, kerberos, etc. One of the reasons is that these test cases need some special settings. To run a regression test for these features, as the document mentions, you need to run the tests like make check-world PG_TEST_EXTRA='kerberos ldap ssl load_balance'.

3. Run SSL/TLS Regression Tests

To run the regression tests for SSL/TLS related features, first, you need to compile PostgreSQL with the SSL library enabled. For example, running the commands below will compile PostgreSQL with OpenSSL libraries:

git clone https://github.com/postgres/post
[...]

Auto-archiving and Data Retention Management in Postgres with pg_partman
Posted by Keith Fiske in Crunchy Data on 2024-04-19 at 13:00

You could be saving money every month on database costs with a smarter data retention policy. One of the primary reasons for partitioning, and a huge benefit of it, is using it to automatically archive your data. For example, you might have a huge log table. For business purposes, you need to keep this data for 30 days. This table grows continually over time and keeping all the data makes database maintenance challenging. With time-based partitioning, you can simply archive off data older than 30 days.

The nature of most relational databases means that deleting large volumes of data can be very inefficient and that space is not immediately, if ever, returned to the file system. PostgreSQL does not return the space it reserves to the file system when normal deletion operations are run except under very specific conditions:

  1. the page(s) at the end of the relation are completely emptied
  2. a VACUUM FULL/CLUSTER is run against the relation (exclusively locking it until complete)

If you find yourself needing that space back more immediately, or without intrusive locking, then partitioning can provide a much simpler means of removing old data: drop the table. The removal is nearly instantaneous (barring any transactions locking the table) and immediately returns the space to the file system. pg_partman, the Postgres extension for partitioning, provides a very easy way to manage this for time and integer based partitioning.

pg_partman daily partition example

pg_partman 5.1 was recently released, including new features such as list partitioning for single value integers, controlled maintenance run ordering, and experimental support for numeric partitioning. This new version also includes several bug fixes, so please update to the latest release when possible! All examples were done using this latest version.

https://github.com/pgpartman/pg_partman

First let’s get a simple, time-based daily partition set going

CREATE TABLE public.time_stuff
    (col1 int
        , col2 text default 'stuff'
 
[...]
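For reference, here is a minimal sketch of what such a daily partition set might look like (my reconstruction, assuming pg_partman 5.x parameter names, since the post's own example is truncated above):

CREATE TABLE public.time_stuff
    (col1 int
        , col2 text default 'stuff'
        , col3 timestamptz NOT NULL DEFAULT now())
    PARTITION BY RANGE (col3);

-- register the table with pg_partman for daily partition maintenance
SELECT partman.create_parent(
    p_parent_table => 'public.time_stuff'
    , p_control => 'col3'
    , p_interval => '1 day'
);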

PostgreSQL Internals Part 3: Understanding Processes in PostgreSQL
Posted by semab tariq in Stormatics on 2024-04-18 at 10:01

We explore PostgreSQL Internals in detail - its processes, architecture, the different types of processes available, and their respective responsibilities.

The post PostgreSQL Internals Part 3: Understanding Processes in PostgreSQL appeared first on Stormatics.

Introducing Snowflake Sequences in a Postgres Extension
Posted by Cady Motyka in pgEdge on 2024-04-18 at 06:07

In a PostgreSQL database, sequences provide a convenient way to generate a unique identifier, and are often used for key generation. From the community, PostgreSQL provides functions and SQL language to help manage sequence generation, but the sequences themselves are not without limitations in a multi-master environment. Snowflake sequences from pgEdge work seamlessly in a multi-master PostgreSQL cluster to remove those limitations so your data can thrive at the network edge.

Why are Sequences an Issue?

In a distributed multi-master database system, sequences can get complicated. Ensuring consistency and uniqueness across the nodes in your cluster is a problem if you use PostgreSQL sequences; the Snowflake extension steps up to automatically mitigate this issue. PostgreSQL sequence values are prepared for assignment in a table in your PostgreSQL database; as each sequence value is used, the next sequence value is incremented. Changes to the next available sequence value are not replicated to the other nodes in your replication cluster. In a simple example, you might have a table on node A with 10 rows, each with a primary key that is assigned a sequence value from 1 to 10; the next prepared sequence value on node A will be 11. Rows are replicated from node A to node B without issue until you add a row on node B. The sequence value table on node B has not been incrementing sequence values in step with the sequence value table on node A. When you add a row on node B, it will try to use the next available sequence value (which will be 1 if you haven't added a row on node B before), and the insert will fail because a row with the primary key value of 1 already exists. This disorder can be monitored and corrected by manually coordinating the PostgreSQL sequences between nodes in the cluster, but that quickly becomes complicated and potentially impacts the user experience as you add more nodes to the cluster.
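A plain-PostgreSQL illustration of the underlying issue (my own sketch, not from the post): replication copies rows, but not the advancement of the sequence that generated them.

-- on node A
CREATE TABLE orders
    (id bigint GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY
        , item text);

INSERT INTO orders (item) VALUES ('widget');   -- consumes id = 1 on node A

-- after the row replicates to node B, node B's own identity sequence is
-- still at its starting point, so a local insert there also generates
-- id = 1 and fails with a duplicate key error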

Introducing Snowflake Sequences

An alternative to using PostgreSQL sequences is to use a guaranteed unique Snowflake sequence. Snowflake sequences are repre[...]

Back from PGConf.DE 2024
Posted by Stefan Fercot in Data Egret on 2024-04-17 at 16:25

Last Friday (April 12, 2024), the 8th annual PGConf.DE took place in Munich, Germany, with a record 270 registrations.

There definitely was a great lineup: 1 German track, 2 English tracks and 1 sponsors track.

While my awesome colleague Valeria was speaking about Navigating the Postgres community, I attended Martín Marqués’ talk about Postgres native incremental backups.

If you’re following the -hackers mailing list, you may have noticed that it is a hot topic at the moment. PostgreSQL 17 should receive the possibility to take incremental backups with pg_basebackup. Even though I think this is a first step in adding more backup capabilities in core, the need for orchestration (pg_basebackup, pg_verifybackup, pg_combinebackup, WAL archives and retention management,…) just grew even more. I’m still convinced that users should use external tools driven by the community (yes, pgBackRest, I’m looking at you) rather than what is currently available in core. However, I’m really curious (and excited) to see the new WAL summarizer process in action and how other tools could benefit from it! Anyway, kudos to all the contributors who worked on that feature!

To be honest, I haven’t attended a lot of the other talks, because it was so great to have so many conversations with other attendees that time flew by!

Not speaking German, I had to miss the fantastic Sichere dein pgBackRest gegen Ransomware! (“Secure your pgBackRest against ransomware!”) talk given by Gunnar “Nick” Bluth. You know how much I care about that tool, and I was sad to miss it.

To refuel some energy, I took a cruise in the PostgreSQL Seas with Lætitia AVROT (dressed as Nami from One Piece), looking for some Hidden Treasures (pattern matching, complex datatypes handling,…).

It really was a great conference, kudos to all the volunteers who organized it and all the #PostgresFriends there!


And now, do you want more Postgres content in Germany? Then you should look for the local user groups!

As Andreas pointed out during Valeria’s talk, there are some meetups always looking

[...]

PGconf.de 2024 in Munich
Posted by Christoph Berg in Cybertec on 2024-04-17 at 12:17

As the two talks I had submitted did not get picked for this year’s PGconf.de, my prep for the conference was all about booking train tickets from Düsseldorf to Munich. Being an early bird definitely paid off – Bernd Helmle and I enjoyed first-class luxury, while our procrastinating colleagues ended up paying more for second-class seats.

The Thursday journey was pretty smooth, with the conductor proudly announcing our punctual arrival – surprised it didn’t make it to the news with Deutsche Bahn’s track record. I even managed to get a good amount of work done, working on CYBERTEC’s PostgreSQL Enterprise Edition Docker images that we want to publish soon.

The day before the event

Some colleagues arrived throughout the afternoon and we went for a “relaxing” 7 km walk to the city center before heading to the Wirtshaus Eder where we had our company dinner. It turned out to be the same restaurant the conference organizers had chosen for the speakers’ dinner, which made it simple for our colleagues giving talks—they just had to decide whether to sit to the left or the right of the main entrance. The food and service were great, and of course, nothing compares to Bavarian beer.

PGConf.de 2024

Conference day on Friday, 12th of April

The opening session on Friday started with a tribute to Simon Riggs, a cherished visitor and speaker at many past PGconf.de conferences, who had tragically passed away recently in a plane accident.

The first talk I attended was by Michael Banck, comparing the three major PostgreSQL cloud hosters – Amazon RDS, Google Cloud SQL and Microsoft Azure Postgres. An important highlight from the talk was that all three offer a (mostly) solid service, with Amazon being the most advanced one, and Azure having made significant progress over the past three years. This is also visible in the increasing number of community members working at these companies contributing to PostgreSQL.

I skipped the next session, chatting with Peter Eisentraut about community matters and PostgreSQL

[...]

GQL:2024 is out
Posted by Peter Eisentraut in EDB on 2024-04-17 at 04:00

The news today is that GQL:2024, the first version of the GQL standard, has been published by ISO. GQL is a new language for graph databases, like SQL is for relational databases.

Here is the link to it on the ISO web site:

This is the culmination of work that has been going on formally within ISO since 2019 and for several years more before that in preparation. (There was even a manifesto! How can you go wrong?)

I suppose this doesn’t affect my usual work with PostgreSQL and SQL much. But having a new database language to work on in the same ISO working group that deals with SQL has definitely created new energy and injected new ideas into the process. Some of the new functionality in SQL:2023 was influenced by or intentionally shared with GQL, such as the SQL/PGQ sublanguage, and also less obvious things like this. I expect more synergies like that in the future.

Now, I wish graph database vendors good fortune in implementing all this. 😉

PostgreSQL 17 features I am most waiting for
Posted by Henrietta Dombrovskaya on 2024-04-17 at 00:58

I won’t be able to attend Magnus’ talk at PG Day Chicago because it is scheduled at the same time as my talk, but fortunately, I attended his talk at SCaLe. There are several features I am really excited about, and I can’t wait to start using them!

  1. Event trigger on login. Yes, yes, I know, Magnus told us it’s a footgun, but I promise, I know how to use it safely! I have so many potential usages (see the sketch after this list)!!!
  2. Error handling in COPY. Since I started my new development with pgBadger, my biggest fear has been that if just one record is somehow wrong, the whole file won’t load, and I can’t do anything about it – except that now I can!!!
  3. PgChangePassword!!! I have security definer functions to handle that because what could be more natural than giving the user an option to change their own password?! Well… except that, way too often, there is one user for the whole department… but we will address it later!
  4. Redundant NOT NULL removal – that’s great!!! Such a time-saver! And again, what can be more natural?!
  5. Self-join removal!!! I can’t even tell how many times I fought with this problem starting from my Sybase days and all the way through my Oracle days! How many times I had to explain this “magic” to people! And now, finally, I will be able to remove this explanation from our performance book!
  6. You might think that my favorite feature is the temporal primary key – well, it is; however, I have already started to worry about what’s next. The temporal PK is implemented just as it should be, but what I see in the future code for temporal foreign keys makes me worried. And nobody answers my concerns, so I do not know how loud I should scream!
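For the curious, here are minimal sketches of the first two features, with syntax as I understand it from the PostgreSQL 17 development documentation (table and file names are hypothetical):

-- 1. Event trigger on login
CREATE FUNCTION notify_login() RETURNS event_trigger
LANGUAGE plpgsql AS $$
BEGIN
    RAISE NOTICE 'user % connected to %', session_user, current_database();
END;
$$;

CREATE EVENT TRIGGER on_login ON login
    EXECUTE FUNCTION notify_login();

-- 2. Error handling in COPY: skip malformed rows instead of aborting the load
COPY my_table FROM '/tmp/data.csv' WITH (FORMAT csv, ON_ERROR ignore);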

Summary: if you haven’t heard this talk yet, please go to it at PGDay Chicago – I will be happy to lose this competition :))

CloudNativePG Recipe 7: Postgres Vertical Scaling with Storage in Kubernetes - part 2
Posted by Gabriele Bartolini in EDB on 2024-04-16 at 08:24

This is the second article in a series that explores advanced strategies for scaling PostgreSQL databases in Kubernetes with the help of CloudNativePG. This article focuses on horizontal table partitioning and tablespaces and how they can be used to manage large datasets. By partitioning tables based on specific criteria and optimising storage with tablespaces, PostgreSQL users can achieve better scalability and performance in cloud-native environments, just like they could in traditional VMs or bare metal deployments.

Use Cases for Merging and Splitting Partitions With Minimal Locking in PostgreSQL 17
Posted by Andrew Atkinson on 2024-04-16 at 00:00

This post looks at some interesting new capabilities managing Partitioned Tables coming in PostgreSQL 17, expected for release Fall 2024. The current major version is 16.

Current Table Partition Commands

Prior to version 17, workflow options for partition management were limited to creating, attaching, and detaching partitions.

Once we’d designed our partition structure, we couldn’t redesign it in place.

This applies to all partition types, whether we’re using RANGE, LIST, or HASH.

To combine multiple partitions into a single one, or to “subdivide” a single partition into multiples, we’d need to design a new structure then migrate all data rows to it. That’s a lot of steps!

What’s New?

From version 17, we have more options. We can now perform a SPLIT PARTITION operation on an existing singular partition, into two or more new ones.

If we wish to do the reverse, we’ve got that option as well. Starting from two or more partitions, we can perform a MERGE PARTITIONS (plural) operation to combine them into one.

An aside: don’t confuse this with the upsert-like SQL MERGE command which uses the same “merge” verb (oof)!

The new DDL commands are ALTER TABLE … SPLIT PARTITION and ALTER TABLE … MERGE PARTITIONS.

Nori Shinoda has shared tweets that link to the PostgreSQL git commits for these commands.

Let’s test the new commands out and think about use cases for them.
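Before that, here is a rough sketch of the shape of these commands, based on my reading of the pre-release PostgreSQL 17 commits (the table, partition names, and bounds are hypothetical):

-- split one monthly partition into two smaller ones
ALTER TABLE measurements
    SPLIT PARTITION measurements_2024_01 INTO
        (PARTITION measurements_2024_01a FOR VALUES FROM ('2024-01-01') TO ('2024-01-16'),
         PARTITION measurements_2024_01b FOR VALUES FROM ('2024-01-16') TO ('2024-02-01'));

-- merge them back into a single partition
ALTER TABLE measurements
    MERGE PARTITIONS (measurements_2024_01a, measurements_2024_01b)
        INTO measurements_2024_01;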

We’ll need a way to run pre-release PostgreSQL 17. Fortunately, I’ve recently compiled PostgreSQL from source code on my macOS laptop, and will use that instance and some test tables within the default postgres DB.

I use Postgres.app to run PostgreSQL 16 for most of my local development, with instances on different ports. I stop the instance on port 5432 so I can start up the one based on the compiled source code like

[...]

Operationalizing Vector Databases on Postgres
Posted by Adam Hendel in Tembo on 2024-04-15 at 18:00

Why do we need vector databases? The proliferation of embeddings immediately brought forth the need to efficiently store, index, and search these arrays of floats. However, these steps are just a small piece of the overall technology stack required to make use of embeddings. The task of transforming source data to embeddings and the serving of the transformer models that make this happen is often left as a task to the application developer. If that developer is part of a large organization, they might have a machine learning or data engineering team to help them. But in any case, the generation of embeddings is not a one-time task, but a lifecycle that needs to be maintained. Embeddings need to be transformed on every search request, and inevitably the new source data is generated or updated, requiring a re-compute of embeddings.

Consistency between model training and inference

Traditionally, machine learning projects have two distinct phases: training and inference. In training, a model is generated from a historical dataset. The data that go into the model training are called features, and typically undergo transformations.

At inference, the model is used to make predictions on new data. The data coming into the model for inference requires precisely the same transformations that were conducted at training. For example, in classical ML, imagine you have a text classification model trained on TF-IDF vectors. At inference, any new text must undergo the same preprocessing (tokenization, stop word removal) and then be transformed into a TF-IDF vector using the same vocabulary as during training. If there’s a discrepancy in this transformation, the model’s output will be unreliable.

Similarly, in a vector database used for embedding search, if you’re dealing with text embeddings, a new text query must be converted into an embedding using the same model and preprocessing steps that were used to create the embeddings in the database. Embeddings stored in the database using OpenAI’s text-embedding-a

[...]
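A small sketch of that consistency requirement in a Postgres vector store (my illustration using the pgvector extension; the table, the 1536 dimension, and :query_embedding are assumptions, the latter standing in for a driver-bound parameter):

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents
    (id bigserial PRIMARY KEY
        , body text
        , embedding vector(1536));

-- at query time, :query_embedding must be produced by the SAME model and
-- preprocessing steps that generated the stored embeddings
SELECT id, body
FROM documents
ORDER BY embedding <=> :query_embedding   -- pgvector cosine distance operator
LIMIT 5;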

Mini Summit: Jonathan Katz on TLEs
Posted by David Wheeler in Tembo on 2024-04-15 at 15:22

Coming up this week: the fourth installment of the Postgres extension ecosystem mini-summit. The topic: Trusted Language Extensions, a.k.a. TLEs. Jonathan Katz of AWS will give a brief presentation on “TLE Vision and Specifics” followed by community discussion of the issues TLEs aim to address, what works and what doesn’t, and the role of TLEs in the future of the extension ecosystem.

Join us! Note! If you reserved a spot at a prior mini-summit, sadly you will need to do so again for each subsequent summit or miss out on reminders from Eventbrite. And if Eventbrite says the event is sold out, rest assured we have plenty more virtual seats! Just send a note to david@ this domain, ping me on Mastodon, or via the #extensions channel on the Postgres Slack or the Postgres Discord.

Rebuilding Tables online using pg_repack in PostgreSQL
Posted by Manisankar Kanagasabapathy in HexaCluster on 2024-04-15 at 08:01

Any database engine, including PostgreSQL, requires some routine maintenance to ensure optimal performance. Especially when there are massive delete or purge operations to remove rows from a table, fragmentation builds up within it. Reclaiming this fragmented space from disk requires the table to be rebuilt. At HexaCluster, we had many customers asking us […]

The post Rebuilding Tables online using pg_repack in PostgreSQL appeared first on HexaCluster.

Improved query cancellation in PostgreSQL 17 and Psycopg 3.2
Posted by Denis Laxalde in Dalibo on 2024-04-15 at 05:00

Toulouse, 15 April 2024

The upcoming PostgreSQL 17 will ship with improved query cancellation capabilities, as part of the libpq, and so will the upcoming release of Psycopg version 3.2.

About a month ago, Alvaro Herrera committed the following patch to PostgreSQL:

libpq: Add encrypted and non-blocking query cancellation routines

The existing PQcancel API uses blocking IO, which makes PQcancel
impossible to use in an event loop based codebase without blocking the
event loop until the call returns.  It also doesn't encrypt the
connection over which the cancel request is sent, even when the original
connection required encryption.

This commit adds a PQcancelConn struct and assorted functions, which
provide a better mechanism of sending cancel requests; in particular all
the encryption used in the original connection are also used in the
cancel connection.  The main entry points are:

- PQcancelCreate creates the PQcancelConn based on the original
  connection (but does not establish an actual connection).
- PQcancelStart can be used to initiate non-blocking cancel requests,
  using encryption if the original connection did so, which must be
  pumped using
- PQcancelPoll.
- PQcancelReset puts a PQcancelConn back in state so that it can be
  reused to send a new cancel request to the same connection.
- PQcancelBlocking is a simpler-to-use blocking API that still uses
  encryption.

Additional functions are
 - PQcancelStatus, mimicks PQstatus;
 - PQcancelSocket, mimicks PQsocket;
 - PQcancelErrorMessage, mimicks PQerrorMessage;
 - PQcancelFinish, mimicks PQfinish.

Author: Jelte Fennema-Nio 
Reviewed-by: Denis Laxalde 
Discussion: https://postgr.es/m/AM5PR83MB0178D3B31CA1B6EC4A8ECC42F7529@AM5PR83MB0178.EURPRD83.prod.outlook.com

This should be shipped with the upcoming PostgreSQL 17 release.

Being involved in the development of Psycopg, a Python driver for PostgreSQL, I pay special attention to how the libpq evolves in PostgreSQL core and this c

[...]

pgenv: run once scripts
Posted by Luca Ferrari on 2024-04-15 at 00:00

A new feature to run a single script at the very beginning of the cluster lifecycle.

pgenv: run once scripts

Today pgenv got a new release that provides a simple, but quite useful, feature: the capability to run a custom script the first time the instance is started.

The idea is simple: after the initdb phase, if the user has configured a PGENV_SCRIPT_FIRSTSTART executable, the system will run such a script against the (just) started instance. This is different from the PGENV_SCRIPT_POSTSTART script, since the latter is executed every time the cluster has started, while PGENV_SCRIPT_FIRSTSTART is run only the first time the database cluster is started.

The aim of this script is, hence, to install users and databases, or to populate some initial data.
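For instance (a hypothetical example of my own, not from the pgenv documentation), a first-start script might feed SQL like this to the freshly initialized instance via psql:

-- create an application role and its database on first start
CREATE ROLE app_user LOGIN PASSWORD 'app_password';
CREATE DATABASE app_db OWNER app_user;
GRANT ALL ON DATABASE app_db TO app_user;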

🎙️ Hacking Postgres 🐘 Podcast - Season 2, Ep. 1 - Andrew Atkinson
Posted by Andrew Atkinson on 2024-04-15 at 00:00

Recently I joined Ry Walker, CEO of Tembo, as a guest on the Hacking Postgres podcast.

Hacking Postgres has had a lot of great Postgres contributors as guests on the show, so I was honored to be a part of it being that my contributions are more in the form of developer education and advocacy.

Ry asked me about when I got started with PostgreSQL and what my role looks like today.

PostgreSQL Origin

Ry has also been a Ruby on Rails programmer, so that was a fun background we shared. We both started on early versions of Ruby on Rails in the 2000s, and were also early users of Heroku in the late 2000s.

Since PostgreSQL was the default DB for Rails apps deployed on Heroku, for many Rails programmers it was the first time they used PostgreSQL. Heroku valued the fit and finish of their hosted platform offering, and provided best in class documentation and developer experience as a cutting edge platform as a service (PaaS). The popularity of that platform helped grow the use of PostgreSQL amongst Rails programmers even beyond Heroku.

For me, Heroku was where I really started using PostgreSQL and learning about some of the performance optimization tactics “basics” as a web app developer.

Meeting The Tembo Team

Besides Ry, I’ve also had the chance to meet more folks from Tembo. Adam Hendel is a founding engineer and also based here in Minnesota. I also met Samay Sharma, PostgreSQL contributor and now CTO of Tembo, at PGConf NYC 2023 last Fall. While not an employee or affiliated with the company at all, it’s been interesting to track what they’re up to, and get little glimpses into starting up a whole company that’s focused on leveraging the power and extensibility of PostgreSQL.

If you’d like to learn more about Adam’s background, Adam was the guest for Season 1, Episode 2 of Hacking Postgres, which you can find here: https://tembo.io/blog/hacking-postgres-ep2

Using PostgreSQL with Ruby on Rails Apps

Ruby on Rails as a web development framework has great support via the ORM -

[...]

PostgreSQL 16 Coin
Posted by Luca Ferrari on 2024-04-14 at 00:00

I just got the coin in the mail!

PostgreSQL 16 Coin

I just received in the mail the PostgreSQL 16 Coin, with some great artwork!


I’m really happy to be part of this great community!




PG Phriday: Under Observation
Posted by Shaun M. Thomas on 2024-04-12 at 16:30

Have you ever wanted to use a non-superuser role in a Postgres database to perform actions that are normally restricted? Even something as simple as reading from the pg_stat_activity view requires special permissions to view the query column because it could contain sensitive information.
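One common approach (a sketch of my own under assumed names, not necessarily the one the post explores) is to use the built-in roles available since PostgreSQL 10:

-- grant a monitoring role the ability to read other sessions' queries
CREATE ROLE observer LOGIN PASSWORD 'observer_password';
GRANT pg_read_all_stats TO observer;

-- connected as observer, the query column is now visible for all backends
SELECT pid, usename, state, query FROM pg_stat_activity;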

