Latest Blog Posts

CipherDoc: A Searchable, Encrypted JSON Document Service on Postgres
Posted by David Wheeler on 2023-10-01 at 21:36

Over the last year, I designed and implemented a simple web service, code-named “CipherDoc”, that provides a CRUD API for creating, updating, searching, and deleting JSON documents. The app enforces document structure via JSON schema, while JSON/SQL Path powers the search API by querying a hashed subset of the schema stored in a GIN-indexed JSONB column in Postgres.
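As a rough sketch of that storage pattern (table and column names here are hypothetical, not CipherDoc's actual schema), the searchable subset lives beside the ciphertext and is queried with a jsonpath operator the GIN index can serve:

-- Hypothetical illustration: encrypted document plus a GIN-indexed
-- JSONB column holding the hashed, searchable subset of fields.
CREATE TABLE documents (
    id         uuid PRIMARY KEY,
    ciphertext bytea NOT NULL,   -- the encrypted JSON document
    search     jsonb NOT NULL    -- hashed subset of schema fields
);
CREATE INDEX documents_search_idx ON documents USING GIN (search);

-- A JSON/SQL Path search against the hashed subset:
SELECT id FROM documents
WHERE search @? '$.author ? (@ == "a1b2c3")';  -- "a1b2c3" stands in for a hashed value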

In May I gave a public presentation on the design and implementation of the service at PGCon: CipherDoc: A Searchable, Encrypted JSON Document Service on Postgres. Links:

I enjoyed designing this service. The ability to dynamically change the JSON schema at runtime, without database changes, enables more agile development cycles for busy teams. Its data privacy features required a level of intellectual challenge and raw problem-solving (a.k.a., engineering) that challenges and invigorates me.

Two minor updates since May:

  1. I re-implemented the JSON/SQL Path parser using the original Postgres path grammar, replacing the hand-written parser roundly castigated in the presentation.
  2. The service has yet to be open-sourced, but I remain optimistic, and continue to work with leadership at The Times towards an open-source policy to enable its release.

PGSQL Phriday #013
Posted by Ryan Booz on 2023-09-29 at 14:01

Invitation from Chris Ellis

I first met Chris in Brussels for the FOSDEM PGDay, even though we had known each other virtually for a bit. He seems to be everywhere all at once with PostgreSQL stuff right now, including the design of some amazing little LED pins for PGDay.UK a few weeks ago.

Chris has enthusiastically stepped up this month at short notice to provide the invitation below. I think it’s a perfect topic! I know I always learn best from the real-world experiences of others!

Usecases and Why PostgreSQL

On Friday, October 6th 2023, publish a post on your blog telling a story about what you (or your team, client) built with PostgreSQL and how PostgreSQL helped you deliver.

I’d love to read about the weird and varied things that people are using PostgreSQL for. If you think your usecase is boring, I’m sure it will be of use to someone. Plus you can always focus more on the story and how PostgreSQL helped to deliver a project, or could have, or didn’t!

  • Did things like Full Text Search, JSONB, PostGIS, etc. enable you to build a better application?
  • Did using PostgreSQL remove the need for other dependencies, or change your approach?
  • Did you learn some SQL which made you realise stuff has moved along a lot since SQL92?

Maybe it was that disaster of a project where in hindsight using PostgreSQL would have been a winner. Or maybe PostgreSQL hindered you, so let’s hear about what went wrong.

I’d like to see people focus on telling some real world, practical examples of where PostgreSQL shined like a crazy diamond.

Read the invitation blog and get writing!!

Oracle Supports Postgres
Posted by Bruce Momjian in EDB on 2023-09-29 at 14:00

At CloudWorld 2022, an Oracle executive announced support for a managed Postgres cloud service and an optimized version called Postgres Aries. Eleven months later, Oracle has announced limited availability of their Postgres cloud service, with full availability in December. (FYI, OCI in this article stands for Oracle Cloud Infrastructure.) They even said some nice things about Postgres:

PostgreSQL has long been a beacon in the open source database world. With over 35 years of rigorous development, it boasts an impressive track record for reliability, robust features, and performance. Highlighted by DB-engines, its rise in market share is a testament to its adaptable data model and diverse extensions catering to a variety of use cases.

In general, companies only add products that have features that their existing products lack, and that seems to be the case here.

Continue Reading »

Five Great Features of the PostgreSQL Partition Manager
Posted by Keith Fiske in Crunchy Data on 2023-09-29 at 13:00

After much testing and work, version 5 of the PostgreSQL Partition Manager, pg_partman, is now available for public release. Thanks to everyone involved for helping me get here!

My recent post discusses many of the big changes, so please see that post or the CHANGELOG for a full summary of version 5.

What I'd like to do today is take a step back and review five notable features that make pg_partman an important tool for managing large tables in PostgreSQL:

  • Retention
  • Background Worker
  • Additional Constraint Exclusion
  • Epoch Partitioning
  • Template Table

Retention

One of the primary reasons to partition tables in PostgreSQL is to avoid the potentially very heavy overhead associated with deleting many rows in large batches. PostgreSQL's MVCC system means that when you delete rows, they're not actually gone until the VACUUM system comes through to mark them as reusable space for new rows. Even then, the space is not immediately (and may never be) returned to the file system. Vacuuming such large volumes of old rows can put a heavy strain on an already stressed system, which is what you may have been trying to alleviate by removing those rows.

Instead of running delete statements to remove your old data, partitioning lets you slice the table up and then simply drop the old partitions that hold data you don't need in the database anymore. This avoids the need to vacuum to clean up old rows and, more importantly, immediately returns that space to the file system with very little overhead. Another thing to note here is that if retention is your reason for partitioning, you don't have to make your partition interval very small. It just has to be enough to cover your retention policy for how much data you need to keep. If your policy is 30 days, you can try setting the partition interval to monthly. That doesn't necessarily mean you'll only have one month of data around (e.g., February). But it does meet your retention policy of at least 30 days, so you might just be keeping

[...]
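To make the retention mechanics concrete, here is a minimal sketch using pg_partman's documented part_config interface; the events table and column names are hypothetical:

-- Partition a hypothetical events table by month.
SELECT partman.create_parent(
    p_parent_table := 'public.events',
    p_control      := 'created_at',
    p_interval     := '1 month'
);

-- Keep at least 30 days: maintenance detaches and drops older partitions.
UPDATE partman.part_config
SET retention = '30 days',
    retention_keep_table = false
WHERE parent_table = 'public.events';

-- Normally run by the background worker or cron:
CALL partman.run_maintenance_proc();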

Version History and Lifecycle Policies for Postgres Tables
Posted by Steven Miller in Tembo on 2023-09-29 at 00:00


A nice feature of AWS S3 is version history and lifecycle policies. When objects are updated or deleted, the old object version remains in the bucket, but it’s hidden. Old versions are deleted eventually by the lifecycle policy.

I would like something like that for my Postgres table data. We can use the temporal_tables extension for version history, and combine it with pg_partman to partition by time, automatically expiring old versions.

Data model

Let's say we have a table employees, and it looks like this:

       name       |  salary
------------------+----------
 Bernard Marx     | 10000.00
 Lenina Crowne    |  7000.00
 Helmholtz Watson | 18500.00

We will add one more column to this table, sys_period, which is a time range. This range represents the time since which the row has been the current version. It is unbounded on the right side, because all the rows in the employees table are the present version.

       name       |  salary  |             sys_period
------------------+----------+------------------------------------
 Helmholtz Watson | 18500.00 | ["2023-09-28 13:30:19.24318+00",)
 Bernard Marx     | 11600.00 | ["2023-09-28 13:33:58.735932+00",)
 Lenina Crowne    | 11601.00 | ["2023-09-28 13:33:58.738827+00",)

We will make a new table employees_history to store previous versions. This will have the same columns as the employees table, but all the rows in sys_period are bounded on both the right and the left sides. These ranges represent when this row was the current version. We will configure temporal_tables to automatically create these rows when anything changes in the employees table.

     name     |  salary  |                            sys_period
--------------+----------+-------------------------------------------------------------------
 Bernard Marx | 10000.00 | ["2023-09-28 13:30:19.18544+00","2023-09-28 13:33:58.683279+00")
 Bernard Marx | 11200.00 | ["2023-09-28 13:33:58.683279+00","2023-09-28 13:33:58.731332+00")
Bern
[...]
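A minimal sketch of the setup just described, using temporal_tables' documented versioning() trigger (details may vary with your schema):

CREATE EXTENSION temporal_tables;

ALTER TABLE employees
    ADD COLUMN sys_period tstzrange NOT NULL
    DEFAULT tstzrange(current_timestamp, null);

-- History table with the same columns as employees.
CREATE TABLE employees_history (LIKE employees);

-- On every INSERT/UPDATE/DELETE, close out the old row's range
-- and copy the old version into employees_history.
CREATE TRIGGER versioning_trigger
BEFORE INSERT OR UPDATE OR DELETE ON employees
FOR EACH ROW EXECUTE PROCEDURE
    versioning('sys_period', 'employees_history', true);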

Setting up PostgreSQL cluster using pg_cirrus on AWS EC2 Instances
Posted by Syed Salman Ahmed Bokhari in Stormatics on 2023-09-28 at 15:20

This blog outlines how to set up a 3-node HA PostgreSQL cluster on AWS EC2 Instances using pg_cirrus, an automated tool for this purpose.


PGSQL Phriday 013 - Usecases and Why PostgreSQL
Posted by Chris Ellis on 2023-09-28 at 00:00
I've always been much more into the practical side of engineering. Caring far more about what I can build with tools, rather than which tool I'm using. On Friday, October 6th 2023, publish a post on your blog telling a story about what you (or your team, client) built with PostgreSQL and how PostgreSQL helped you deliver. I'd love to read about the weird and varied things that people are using PostgreSQL for. If you think your usecase is boring, I'm sure it will be of use to someone. Plus you can always focus more on the story and how PostgreSQL helped to deliver a project, or could have, or didn't!

Anatomy of a Postgres extension written in Rust: pgmq
Posted by Ricardo Zavaleta in Tembo on 2023-09-28 at 00:00

In my previous submission to this space, I described my experience with pgmq while using the Python library. In this post, I'll share what I found after inspecting the code.

So, first, I'll describe the general structure of the project. Then, I'll explain what happens when we install the pgmq extension. Finally, I'll describe how some of its functions work.

In this post, I'll be using version v0.25.0, which you can find here.

Project structure

After cloning the appropriate tag, we can see that the repository contains the following files:

$ ls -1
Cargo.lock
Cargo.toml
CONTRIBUTING.md
core
Dockerfile.build
examples
images
LICENSE
Makefile
pgmq.control
pgmq-rs
README.md
sql
src
tembo-pgmq-python
tests

The project uses pgrx. From pgrx's README, we know that the relevant files for the extension are Cargo.toml, pgmq.control and the src and sql directories:

$ tree sql src
sql
├── pgmq--0.10.2--0.11.1.sql
├── pgmq--0.11.1--0.11.2.sql
...
├── pgmq--0.8.0--0.8.1.sql
├── pgmq--0.8.1--0.9.0.sql
└── pgmq--0.9.0--0.10.2.sql
src
├── api.rs
├── errors.rs
├── lib.rs
├── metrics.rs
├── partition.rs
├── sql_src.sql
└── util.rs

0 directories, 7 files

Installing the pgmq extension

Note: this section assumes that you have successfully installed the prerequisites as described in CONTRIBUTING.md.

To build the pgmq extension, we can do the following:

cargo build

Alternatively, to build and install the pgmq extension, we can do:

cargo pgrx install

In either case, we can see a shared library pgmq.so being created. The installation process also places the shared library in the lib directory of the Postgres installation, and the sql files and the control file in the extension directory. In my case:

$ ls -1 /opt/postgres/share/extension/pgmq*
/opt/postgres/share/extension/pgmq--0.10.2--0.11.1.sql
...
/opt/postgres/share/extension/pgmq--0.9.0--0.10.2.sql
/opt/postgres/share/extension/pgmq.control

$
[...]
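With the files in place, enabling the extension in a database is the usual step (the extension name comes from pgmq.control):

postgres=# CREATE EXTENSION pgmq;
CREATE EXTENSION
postgres=# \dx pgmq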

pgenv version 1.3.3 released
Posted by Luca Ferrari on 2023-09-28 at 00:00

A new release for the PostgreSQL binary manager.

pgenv version 1.3.3 released

pgenv release 1.3.3 is now available.

This release introduces two new environment variables that control how pgenv handles configuration files.

The first variable is PGENV_CONFIGURATION_FILE: it can be set to force pgenv to use a custom configuration file, instead of letting pgenv guess which file to use based on the specific PostgreSQL version in use. By default, pgenv looks for a configuration file named after the PostgreSQL version or, if not found, a default.conf configuration file. With this variable it is now possible to tell pgenv where a configuration file is, which allows the same configuration file to be reused regardless of the PostgreSQL version.



% export PGENV_CONFIGURATION_FILE=~/git/dot-files/pgenv/luca.conf

% pgenv rebuild 16.0
Using PGENV_ROOT /home/luca/git/pgenv
[DEBUG] Configuration file forced by environment variable PGENV_CONFIGURATION_FILE = /home/luca/git/dot-files/pgenv/luca.conf
[DEBUG] Configuration file forced by environment variable PGENV_CONFIGURATION_FILE = /home/luca/git/dot-files/pgenv/luca.conf
[DEBUG] Looking for configuration in /home/luca/git/dot-files/pgenv/luca.conf

...



As you can see, pgenv will now use the specified configuration file.

The other variable added is PGENV_WRITE_CONFIGURATION_FILE_AUTOMATICALLY, which, if set to a false value (e.g., 0, no), prevents pgenv from writing or overwriting a configuration file once a build or rebuild is completed. The normal behavior is to let pgenv write/overwrite the configuration file when this variable is unset or set to a true value (e.g., 1, yes), matching the behavior of previous releases. As of this release, if you set this variable to a false value, pgenv will not create (nor overwrite) a configuration file at the end of a build phase.



% export PGENV_WRITE_CONFIGURATION_FILE_AUTOMATICALLY=no
% pgenv rebuild 16.0
...
[DEBU
[...]

TIL - When was my Postgres cluster initialized ?
Posted by Kaarel Moppel on 2023-09-27 at 21:00
Today again a nice finding after 12 years with Postgres - it seems one can actually retrieve the time when a cluster / instance was once initialized! Given of course you haven’t dump-restored or in-place upgraded the instance in the meantime… So far on those rare occasions when I’ve actually...
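The finding is presumably the trick based on the system identifier, whose upper 32 bits encode the initdb timestamp (a hedged reconstruction, since the excerpt is cut off):

SELECT to_timestamp(system_identifier >> 32) AS init_time
FROM pg_control_system();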

Using traditional calendars with ICU
Posted by Daniel Vérité on 2023-09-27 at 18:14
How to use alternative date and time types to work with non-Gregorian calendars in ICU.

Transaction Processing Chapter
Posted by Bruce Momjian in EDB on 2023-09-27 at 17:00

Postgres 16 has a new chapter about transaction processing internals, which includes details about transaction identifiers, locking, subtransactions, and two-phase transactions. While this chapter is new in Postgres 16, its content also applies to all supported previous releases.

Chaos testing Kubernetes Operators for Postgres: StackGres
Posted by Nikolay Sivko on 2023-09-26 at 14:10
Introducing failures into a Postgres cluster managed by StackGres

Monitoring Performance for PostgreSQL with Citus
Posted by Hans-Juergen Schoenig in Cybertec on 2023-09-26 at 08:00

In the fast-paced world of data management, scalability reigns supreme, and Citus plays an ever greater role. The question now is: How can we see what is going on inside Citus? How can businesses leverage monitoring technology to optimize their PostgreSQL database performance?

Monitoring PostgreSQL and Citus

Database monitoring is important regardless of the extension you are using. The best way to find performance bottlenecks in PostgreSQL is definitely to use pg_stat_statements. I’ve written about detecting PostgreSQL performance problems in the past.

pg_stat_statements is the single most powerful tool to spot slow queries and to ensure that you have all the insights you need to actually react to bad performance.

However, if Citus comes into the picture, there is a bit more you have to know in order to inspect performance problems.

Configuring Citus performance monitoring

Enabling pg_stat_statements should be done on any PostgreSQL database deployment. The performance overhead is basically non-existent and therefore pg_stat_statements is truly a must.

When Citus enters the picture, we have to change some additional variables. The most important one is citus.stat_statements_track, which should be set to all. Here’s how it works:

postgres=# ALTER SYSTEM 
SET citus.stat_statements_track TO 'all';
ALTER SYSTEM
postgres=# SELECT pg_reload_conf();
 pg_reload_conf 
----------------
 t
(1 row)

After reconnecting to the database, we’ll be able to see information. In general, we also recommend turning on track_io_timing in PostgreSQL to gain some more insights about the time needed to perform I/O operations:

postgres=# ALTER SYSTEM 
SET track_io_timing TO on;
ALTER SYSTEM
postgres=# SELECT pg_reload_conf();
 pg_reload_conf 
----------------
 t
(1 row)

A restart is not needed in this context. Sending a signal (“reload”) is enough for all new connections coming in.

Checking Citus performance data

When Citus is enabled in your platform, you can call a set returning function w

[...]
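The set-returning function in question is presumably citus_stat_statements; a quick per-tenant look might be (column list abbreviated):

postgres=# SELECT queryid, calls, partition_key, query
           FROM citus_stat_statements
           ORDER BY calls DESC
           LIMIT 5;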

Setting Per-User Server Variables
Posted by Bruce Momjian in EDB on 2023-09-25 at 16:30

Postgres server variables offer powerful control over how the database behaves. Privileged users often want to set superuser-restricted server variables for non-superusers. The simplest approach is for the superuser to issue ALTER ROLE ... SET on the non-superuser role. If the superuser wants the non-superuser role to have more control over a server variable, a SECURITY DEFINER function can be used, e.g.:

-- This function allows log_statement to be set to only ddl or mod.
CREATE OR REPLACE FUNCTION set_log_statement (value TEXT)
RETURNS VOID AS
$$
BEGIN
        IF value = 'ddl' OR value = 'mod'
        THEN    EXECUTE format('SET log_statement TO %1$s', value);
        ELSE    RAISE EXCEPTION 'log_statement cannot be set to "%"', value;
        END IF;
END
$$
LANGUAGE plpgsql
SECURITY DEFINER;
 
-- Grant non-superusers permission to execute this function.
GRANT EXECUTE ON FUNCTION set_log_statement TO non_superuser;
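Usage from the non-superuser side would then look roughly like this (role name illustrative):

SET ROLE non_superuser;
SELECT set_log_statement('mod');  -- allowed; session now logs DML
SHOW log_statement;               -- 'mod'
SELECT set_log_statement('all');  -- raises the exception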

Continue Reading »

Evolution of Logical Replication
Posted by Amit Kapila in Fujitsu on 2023-09-25 at 12:47

This blog post is about how Logical Replication has evolved over the years in PostgreSQL, what's in the latest release (16), and what's being worked on for future releases. Logical replication is a method of replicating data objects and their changes, based upon their replication identity (usually a primary key). We use the term logical in contrast to physical replication, which uses exact block addresses and byte-by-byte replication. To know more about Logical Replication in PostgreSQL, read pgsql-docs.

The foundation of Logical Replication was laid in PostgreSQL-9.4, which introduced Logical Decoding (allowing database changes to be streamed out in a customizable format), Replication Slots (allowing preservation of resources like WAL files on the primary until they are no longer needed by standby servers), and Replica Identity (a table-level option to control operations for logical replication).

With PostgreSQL-9.5, we introduced features like tracking replication progress via origins, and commit timestamps that will allow conflict resolution in the future. Replication origins allow replication to restart from the point it had reached before the restart. Replication origins also help us avoid bi-directional loops in the replication setup, a feature introduced in the latest PostgreSQL 16 release.

With PostgreSQL-9.6, we added support for generic WAL messages for logical decoding. This feature allows extensions to insert data into the WAL stream that can be read by logical-decoding plugins.

With PostgreSQL-10.0, we introduced native Logical Replication using the publish/subscribe model. This allows more flexibility than physical replication does, including replication between different major versions of PostgreSQL and selective replication. In this release, we allowed Insert/Update/Delete operations on tables to be replicated. Going forward from here, we have enhanced this feature with each release.

With PostgreSQL-11.0, we allowed replicati

[...]

Functions to Validate User's Input
Posted by Luca Ferrari on 2023-09-25 at 00:00

PostgreSQL 16 introduces a couple of functions to validate user’s input.

Functions to Validate User’s Input

PostgreSQL 16 introduces a couple of new built-in functions: pg_input_is_valid and pg_input_error_info, both documented at https://www.postgresql.org/docs/16/functions-info.html.

Both functions accept two strings: the first is the value to be validated, and the second is the name of the type to which you want to cast the value. This is useful because you can check ahead of time whether a given value (expressed as a string) can be converted into a specific data type without raising an exception.

The first use case that comes to my mind is the conversion of a stringified date into an actual date, for example when importing data from an external source like a text file. Let’s see this in action:



testdb=> select * from pg_input_is_valid( '1978-07-19', 'timestamp' );
 pg_input_is_valid
-------------------
 t
(1 row)

testdb=> select * from pg_input_error_info( '1978-07-19', 'timestamp' );
 message | detail | hint | sql_error_code
---------+--------+------+----------------
         |        |      |
(1 row)




With a valid date, the pg_input_is_valid function returns true and pg_input_error_info returns a row of NULLs. But what happens if the date is in a wrong format?



testdb=> \x
Expanded display is on.
testdb=> select * from pg_input_is_valid( '1978-19-07', 'timestamp' );
-[ RECORD 1 ]-----+--
pg_input_is_valid | f

testdb=> select * from pg_input_error_info( '1978-19-07', 'timestamp' );
-[ RECORD 1 ]--+--------------------------------------------------
message        | date/time field value out of range: "1978-19-07"
detail         |
hint           | Perhaps you need a different "datestyle" setting.
sql_error_code | 22008



As you can see from the above example, passing a wrong d

[...]

My Postgres Story: Internationalization
Posted by Oleg Bartunov in Postgres Professional on 2023-09-24 at 06:45

I have used Postgres since 1995 (the [PG95] mailing list had fewer than 400 subscribers!) for my scientific projects, and operating with numbers and English strings was smooth. But in 1996 I signed up to build a searchable database of the electronic archive of "Uchitelskaya gazeta" (a popular newspaper for the teachers' community). It was my moonlight work to help me raise my kids. The whole project was sponsored by the West foundation, if I recall properly; our office was located in Vetoshny Lane, near Red Square, and I had a lot of fun working there with smart people. I claimed to be an expert in Postgres and didn't see any problem with loading the archive and writing several SQL queries :)   First, I quickly found that Cyrillic letters were transformed into something unreadable. The problem was that Postgres operated with ASCII only, which isn't surprising given its roots in the English-speaking world, so it was practically impossible to use any national encodings like the Russian koi8-r, cp-1251, cp-866, etc. At that time I had already participated in the internationalization of perl and of the search engine glimpse, so I knew that basically I needed to replace the 'char' definition with 'unsigned char', add a call to setlocale(LC_ALL,""), and use functions from the locale API. The reality got a bit more complicated than I anticipated, so it took a month to get a working Postgres on Slackware. The second problem I found was that Postgres worked very slowly with text — not only because of the locale stuff (strcoll is 10 times slower than strcmp), but because Postgres used a full scan — reading every row of a table and performing slow text operations. This problem was addressed by me and Teodor in 2000, when we developed OpenFTS and contrib/intarray (see this historical document for details), which eventually transformed into built-in full-text search (FTS), which we presented at the PostgreSQL Anniversary Meeting in Toronto, 2006.

My first commit to Postgres, thanks Marc!

For years I didn’t realize the importance of this first patch, only recently I

[...]

My Postgres Story: GIN and FTS
Posted by Oleg Bartunov in Postgres Professional on 2023-09-24 at 05:10


After ten years of working with postgres (actually, I started with Ingres, then postgres95), Teodor and I got a chance to meet other developers in person. We presented GIN and FTS there. Pictures from the Anniversary Summit at Ryerson University in Toronto can be viewed on Flickr under the name PGCon-2006; I hope many of us will enjoy looking back 17 years at how young we were!

Our slides.
Participants of the PostgreSQL Anniversary Meeting in Toronto, July 8-9, 2006. Someone took this picture on Álvaro's Canon, which he had bought a day earlier.

We were really worried about our spoken English, especially Teodor, but I remembered Yakov B. Zeldovich, who once told me that many foreigners came to him to talk, and it was their duty to understand his "Russian-style English". I told Teodor that what we did was very important for Postgres and that we shouldn't be embarrassed; see the titles of the other talks to compare (unfortunately, the conference site is down). Time has proven my words: the GIN index and FTS (full-text search) are used everywhere, and I am very proud of my contribution to open source, to Postgres.


Postgres 16 - a quick perf check
Posted by Kaarel Moppel on 2023-09-23 at 21:00
“That time of the year”™ again 🎉 As per usual for that “season” - not that I don’t trust the PostgreSQL Global Development Group and its performance farm (they did catch some performance regressions in beta stages!) - but somehow as soon as a new major release arrives, I get...

Exploring Various Ways to Manage Configuration Parameters in PostgreSQL
Posted by David Z in Highgo Software on 2023-09-22 at 23:42

1. Overview

PostgreSQL provides a configuration file postgresql.conf for end users to customize parameters. You may need to change some parameters to tune performance or deploy a PostgreSQL server in your working environment. In this blog post, we’ll explore different ways to manage these parameters.

2. Managing Parameters in Different Ways

PostgreSQL supports various parameters that allow users to customize behavior either globally (for the entire cluster) or locally (for one particular session, database, etc.). These parameters are mainly divided into five types: boolean, string, integer, floating point, and enum. Additionally, all parameter names are case-insensitive. Now, let’s explore how to manage parameters in four different ways with examples.

2.1. Changing Parameters in the Configuration File

This is a simple way to customize the server and is commonly used by most users when first working with PostgreSQL. For example, if you want to start PostgreSQL on port 5433 instead of the default port 5432, you can either uncomment the line #port = 5432 in the postgresql.conf file, changing 5432 to 5433, or simply append a new line port = 5433 to the end of the file. Keep in mind, PostgreSQL will always use the last value found in the configuration file if there are duplicated configurations for the same parameter.

2.2. Changing Parameters via SQL

PostgreSQL provides three SQL commands to change parameters in different scopes: ALTER SYSTEM, ALTER DATABASE, and ALTER ROLE.

  • ALTER SYSTEM changes the global default settings and persists them into postgresql.auto.conf. Depending on the parameter, these changes take effect on the next configuration reload or server restart.

Here is an example of changing the log message output level using ALTER SYSTEM:

postgres=# ALTER SYSTEM SET log_min_messages='debug2';
ALTER SYSTEM

postgres=# show log_min_messages;
 log_min_messages 
------------------
 warning
(1 row)

After restarting the PostgreSQL server, if you tail the log messages and manually issue a CHECKPOINT, you should see the me

[...]
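The excerpt is cut off above, but the other two commands scope settings to a database or a role; hypothetical examples of each:

-- Applies to new connections to this database:
ALTER DATABASE mydb SET work_mem = '64MB';

-- Applies to new sessions of this role:
ALTER ROLE reporting SET statement_timeout = '5min';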

Adding Postgres 16 support to Citus 12.1, plus schema-based sharding improvements
Posted by Naisila Puka in CitusData on 2023-09-22 at 15:07

The new PostgreSQL 16 release is out, packed with exciting improvements and features—and Citus 12.1 brings them to you at scale, within just one week of the PG16 release.

As many of you likely know, Citus is an open source PostgreSQL extension that turns Postgres into a distributed database. Our team started integrating Citus with the PG16 beta and release candidates early-on, so that you could have a new Citus 12.1 release that is compatible with Postgres 16 as quickly as possible after PG16 came out.

There are a lot of good reasons to upgrade to Postgres 16—huge thanks to everyone who contributed to this Postgres release! PG16 highlights include a query performance boost with more parallelism; load balancing with multiple hosts in libpq (contributed by my Citus teammate, Jelte Fennema-Nio); I/O monitoring with pg_stat_io; developer experience enhancements; finer-grained options for access control; and logical replication from standby servers and other replication improvements, like using btree indexes in the absence of a primary key (contributed by one of my teammates, Onder Kalaci).

The good news for those of you who care about distributed Postgres: Citus 12.1 is now available and adds support for Postgres 16.

In addition to Postgres 16 support, Citus 12.1 includes enhancements to schema-based sharding, which was recently added to Citus 12.0—and is super useful for multi-tenant SaaS applications. In 12.1, you can now move tenant tables within a distributed schema to another node using the simple and powerful citus_schema_move() function. And you can manage permissions on creating distributed schemas more easily, too.
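The move itself is a single call; the argument list below follows the Citus documentation as I understand it, so it is worth double-checking against your version (schema and node names hypothetical):

SELECT citus_schema_move('tenant_42', 'worker-node-2', 5432);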

So if you’re a Citus user and you’re ready to upgrade to Postgres 16, you will also need to upgrade to Citus 12.1. (And if you use the Citus on Azure managed service, the answer is yes, Postgres 16 support is also coming soon to Azure Cosmos DB for PostgreSQL. Stay tuned for the next blog post.)

Let’s dive in to explore what’s new in 12.1.

[...]

Rolling the Dice with the PostgreSQL Random Functions
Posted by Paul Ramsey in Crunchy Data on 2023-09-22 at 13:00

Generating random numbers is a surprisingly common task in programs, whether it's to create test data or to provide a user with a random entry from a list of items.

PostgreSQL comes with just a few simple foundational functions that can be used to fulfill most needs for randomness.

Almost all your randomness needs will be met with the random() function.

Uniformity

The random() function returns a double precision float in a continuous uniform distribution between 0.0 and 1.0.

What does that mean? It means that you could get any value between 0.0 and 1.0, with equal probability, for each call of random().

Here's five uniform random numbers between 0.0 and 1.0.

SELECT random() FROM generate_series(1, 5)
0.3978842227698167
0.7438732417540841
0.3875091442400458
0.4108009373061563
0.5524543763568912

Yep, those look pretty random! But, maybe not so useful?

Random Numbers

Most times when people are trying to generate random numbers, they are looking for random integers in a range, not random floats between 0.0 and 1.0.

Say you wanted random integers between 1 and 10, inclusive. How do you get that, starting from random()?

Start by scaling an ordinary random() number up by a factor of 10. Now you have a continuous distribution between 0 and 10.

SELECT 10 * random() FROM generate_series(1, 5)
3.978842227698167
7.438732417540841
3.875091442400458
4.108009373061563
5.5245437635689125

Then, if you push every one of those numbers up to the next highest integer using ceil(), you'll end up with a random integer between 1 and 10.

SELECT ceil(10 * random()) FROM generate_series(1, 5)
4
8
4
5
6

If you wanted a random integer between 0 and 9, you could do the same thing, but pushing the floating numbers down to the next lowest integer using floor().

SELECT floor(10 * random()) FROM generate_series(1, 5)
3
7
3
4
5
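The same trick generalizes to any inclusive integer range [low, high]: scale by the size of the range and shift. For example, 5 through 15:

SELECT floor(random() * (15 - 5 + 1) + 5)::int
FROM generate_series(1, 5);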

Random Rows and Values

Sometimes the things you are trying to pick randomly aren't numbers at all. How do you get a random entry out of a list of strings?

[...]

psql \watch improvements
Posted by Luca Ferrari on 2023-09-22 at 00:00

A nice addition to the command \watch in the PostgreSQL command line client.

psql \watch improvements

psql is the best command line SQL client ever, and it gets improved constantly. With the new release of PostgreSQL 16, psql also gets a nice new addition: the capability to stop a \watch command loop after a specific number of iterations.


In this article I briefly show how the new feature works.

What is \watch?

The special command \watch is similar to the Unix command line utility watch(1): it repeats a specific command (in this case, an SQL statement) at regular time intervals.
I tend to use this, for example, when I want to monitor some progress or some catalogs: I write the query that produces the result I want to observe, and then use \watch to schedule regular repetitions of the query. For instance:



testdb=# SELECT * FROM pg_stat_progress_cluster;
...

testdb=# \watch 5



The above example will show me what is going on with CLUSTER or VACUUM FULL, with a refresh rate of 5 seconds.


One problem with the \watch command is that it loops forever, meaning you need to stop it manually (e.g., CTRL-c). Another approach is to raise an exception when the query has to stop. As a nasty example:



testdb=# WITH exit(x) AS ( SELECT count(*) FROM pg_stat_progress_cluster )
           SELECT p.*
           FROM pg_stat_progress_cluster p, exit e
           WHERE 1 / e.x > 0
           ;



The above query will raise a division by zero as soon as there are no more entries in the pg_stat_progress_cluster view, and this will in turn stop the \watch command:



testdb=# \watch 1
                                                                          Wed 20 Sep 2023 09:08:18 PM CEST (every 1s)

  pid  | datid | datname | relid |   command   |      phase       | cluster_index_relid | heap_tuples_scanned | heap_tuples_written | heap_blks_total | heap_blks_scanned | index_rebuild_count
-------+-------+---------+-------+-------------+----------
[...]
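The excerpt stops before showing the new syntax itself; in psql 16 the iteration limit is passed as a count option, so no exception trick is needed:

testdb=# SELECT * FROM pg_stat_progress_cluster;
testdb=# \watch interval=5 count=10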

PostgresUS Diversity Scholarship to attend PGConf NYC
Posted by Pavlo Golub in Cybertec on 2023-09-21 at 10:10

I want to share, for a wider audience, the information published by Chelsea Dole in the community #pgsql-women Slack channel:

Heads up PG women – the PostgresUS Diversity Scholarship still has funding to support flights, hotels, and conference registration for 2-3 women to attend PGConf NYC on October 3-5th!

Apply here by Sept 29th with details about your involvement in Postgres and how you could give back to your local community to be considered! I’d love to see some of y’all there! 🐘

Although this particular message is directly relevant to women, applicants may be from a traditionally underrepresented and/or marginalized group in the technology and/or open source communities including, but not limited to: persons identifying as LGBT, women, persons of color, persons with disabilities, veterans, and/or students. Individuals who apply should be active members of the open source community who are unable to attend for financial reasons and are unable to get funding from their companies.

If you have any questions, please email diversity@postgresql.us.

Take your chance to attend this great conference. And see you there!

Sincerely yours, Pavlo Golub
Co-founder of PostgreSQL Ukraine 💙💛

Citus: 7 commonly used advanced SQL tools
Posted by Hans-Juergen Schoenig in Cybertec on 2023-09-21 at 08:00

When you run advanced SQL in Citus, what’s possible? Which SQL statements work, and which ones don’t? Citus is a PostgreSQL extension that adds powerful sharding capabilities to PostgreSQL. However, every solution does have limitations. Therefore, it makes sense to take a look at the latest version of Citus and learn how to properly use the most frequently-needed SQL features. Note that this is not a comprehensive overview; rather, it is a guide through 7 of the most commonly-needed SQL tools:

  1. Naming Citus databases

    The first thing people typically notice is that with Citus, managing databases is not as straightforward as in a “normal PostgreSQL deployment” anymore. You might notice that most Citus examples use the “postgres” database for demonstration purposes. The reason is simple:

    testbox:citus hs$ createdb somename
    NOTICE:  Citus partially supports CREATE DATABASE for distributed databases
    DETAIL:  Citus does not propagate CREATE DATABASE command to workers
    HINT:  You can manually create a database and its extensions on workers.
    

    The Citus workers are bound to a certain database because in PostgreSQL, a database is a hard separation boundary. Therefore, people are supposed to use schemas rather than multiple databases. However, if you really run analytics at scale, you’ll most likely not need multiple databases on the same infrastructure anyway. You can build multiple clusters in Kubernetes using Patroni to ensure high availability.

    The first takeaway is therefore:

    Just use the postgres database and you’ll be OK.

  2. Loading data using COPY

    Bulk loading data using COPY is the key to loading data quickly. Yes, you can use the PostgreSQL COPY command with Citus. Here’s how it works:

    postgres=# CREATE TABLE t_oil (
    region  text, 
    country         text, 
    year            int, 
    production int, 
    consumption int
    );
    CREATE TABLE
    postgres=# SELECT create_distributed_table('t_oil', 'country');
     create_distributed_tabl
[...]
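The COPY example above is truncated; the gist is that once a table is distributed, plain COPY / \copy works unchanged (file name hypothetical):

postgres=# \copy t_oil FROM 'oil_data.csv' WITH (FORMAT csv, HEADER true)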

All PostgreSQL Replication Explained – How to Do Them Right.
Posted by cary huang on 2023-09-20 at 19:05

PostgreSQL Streaming Replication is a crucial feature that enhances availability, performance and data integrity of your database. It is a fundamental component in modern database deployment. Replication refers to a technique that copies data and state from one database instance to another. PostgreSQL supports 2 major replication techniques:

  • Streaming replication (also known as physical replication)
  • Logical replication

In this blog, I will explain the purposes of both replication techniques based on PostgreSQL 16 and how to properly set them up with the right examples. So, let’s begin.

PostgreSQL Streaming Replication

Streaming replication replicates the entire database in real time. A streaming replication setup consists of:

  • One primary node – can read from and write to database
  • One or more standby nodes (also called replica nodes. Please do not call them slave nodes) – can only read from database

The replication works by having the primary node send new WAL segments (when a change is made) to all standby nodes. The standby nodes replicate the primary’s database state and data simply by replaying the WAL segments (also called REDO).

[Figure: PostgreSQL streaming replication]

Why PostgreSQL Streaming Replication?

  • High availability – if the primary node goes down, a standby node can be promoted to be the new primary and continue database operation. Patroni is a popular orchestration tool for this purpose.
  • Data redundancy – all nodes store copies of the same data.
  • Load balancing – application can distribute read requests among all standby nodes and send write requests only to primary node. Pgpool is a popular orchestration tool for this purpose.
  • Scalability – more standby nodes can be added to distribute higher loads of read requests.

PostgreSQL Streaming Replication Setup Example

The example below uses the default database postgres and default username $USER to set up the replication. If you would like to use it with another user and database, you will have to update pg_hba.conf as well.

[...]
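The setup itself is truncated here, but the standard recipe on a modern Postgres looks roughly like this (hosts, role, and paths hypothetical):

# pg_hba.conf on the primary: allow the standby to connect for replication
host  replication  replicator  10.0.0.0/24  scram-sha-256

# Build the standby from a base backup; -R writes standby.signal
# and primary_conninfo automatically:
pg_basebackup -h primary.example.com -U replicator \
    -D /var/lib/postgresql/16/main -R -X stream -P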

Postgres 16 Features Presentation
Posted by Bruce Momjian in EDB on 2023-09-20 at 16:00

Now that I have given a presentation about Postgres 16 features in Jakarta and Chicago, I have made my slides available online.

Incremental Sort in PostgreSQL: A Developer’s Guide
Posted by Umair Shahid in Stormatics on 2023-09-20 at 13:31

By following best practices and query optimization techniques, you can leverage incremental sort in PostgreSQL and enhance query performance.


JSON Updates Postgres 16
Posted by Christopher Winslett in Crunchy Data on 2023-09-20 at 13:00

Postgres has been steadily building on the JSON functionality initially released more than 10 years ago. With Postgres 16, working with JSON has gotten a couple of nice improvements. Primarily, this release added features that ease the manipulation of data into JSON and improve the standard SQL functionality using JSON.

TL;DR:

  • A SQL/JSON data-type check. For instance, this lets you ask with SQL whether some value IS JSON OBJECT
  • Addition of SQL-standard JSON functions: JSON_ARRAY(), JSON_ARRAYAGG(), JSON_OBJECT(), and JSON_OBJECTAGG()

Data for this post

For a quick tutorial, use the following dataset:

CREATE TABLE user_profiles ( id serial PRIMARY KEY, name text NOT NULL, details jsonb );
INSERT INTO
   user_profiles (name, details)
VALUES
   (
      'John Doe', '{"age": 30, "address": {"city": "New York", "state": "NY"}, "hobbies": ["reading", "hiking"]}'::jsonb
   )
,
   (
      'Jane Smith', '{"age": 25, "address": {"city": "Los Angeles", "state": "CA"}, "hobbies": ["painting", "cooking"]}'::jsonb
   )
,
   (
      'Emily Davis', '{"age": 29, "address": {"city": "Chicago", "state": "IL"}, "hobbies": ["dancing", "writing"]}'::jsonb
   )
;

IS JSON

Previously, to test if a field was JSON, your options were to run pg_typeof:

SELECT
   details,
   pg_typeof(details),
   pg_typeof(details ->> 'address')
FROM
   user_profiles LIMIT 1;

Which would return jsonb for that second column, and text for the 3rd column. The problem with this is that it didn’t give you any inspection into the nested values within JSON, so running pg_typeof(details->>'address') would just tell you it is text. Now, we can do something like:

SELECT
   details,
   details IS JSON,
   details ->> 'address' IS JSON OBJECT
FROM
   user_profiles LIMIT 1;

Using this new functionality, the IS JSON returns true and the IS JSON OBJECT returns true as well. Previously, when building queries to inspect JSON, you were at risk of failed queries if the data’s JSON format did not match expected values. Imagine you are m

[...]
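The excerpt ends before the constructor functions; against the same user_profiles table, the new SQL-standard constructors look like this (a brief sketch):

-- Build a JSON object per row:
SELECT JSON_OBJECT('name': name, 'age': details -> 'age')
FROM user_profiles;

-- Aggregate all names into a single JSON array:
SELECT JSON_ARRAYAGG(name) FROM user_profiles;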
