Latest Blog Posts

Waiting for PostgreSQL 16 – Allow underscores in integer and numeric constants.
Posted by Hubert 'depesz' Lubaczewski on 2023-02-06 at 11:58
On 4th of February 2023, Dean Rasheed committed patch: Allow underscores in integer and numeric constants.   This allows underscores to be used in integer and numeric literals, and their corresponding type input functions, for visual grouping. For example:   1_500_000_000 3.14159_26535_89793 0xffff_ffff 0b_1001_0001   A single underscore is allowed between any 2 digits, or … Continue reading "Waiting for PostgreSQL 16 – Allow underscores in integer and numeric constants."
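For illustration (my own quick example, requiring PostgreSQL 16 or later), such literals can be used anywhere a number is accepted:

-- works on PostgreSQL 16+, where underscores are allowed in numeric literals
SELECT 1_500_000_000 AS big_int,
       3.14159_26535 AS pi_approx,
       0xffff_ffff   AS hex_mask,
       0b_1001_0001  AS binary_value;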

I have a question…
Posted by Henrietta Dombrovskaya on 2023-02-06 at 00:08

Below is an excerpt from the PostgreSQL documentation. Please don't get me wrong – I take permissions and security very seriously, probably more seriously than many others. I believe I have a very decent understanding of how PostgreSQL permissions work, better than many others do. Still, my favorite database never fails to surprise me!

If a superuser chooses to issue a GRANT or REVOKE command, the command is performed as though it were issued by the owner of the affected object. Since all privileges ultimately come from the object owner (possibly indirectly via chains of grant options), it is possible for a superuser to revoke all privileges, but this might require use of CASCADE as stated above.

REVOKE can also be done by a role that is not the owner of the affected object, but is a member of the role that owns the object, or is a member of a role that holds privileges WITH GRANT OPTION on the object. In this case the command is performed as though it were issued by the containing role that actually owns the object or holds the privileges WITH GRANT OPTION. For example, if table t1 is owned by role g1, of which role u1 is a member, then u1 can revoke privileges on t1 that are recorded as being granted by g1. This would include grants made by u1 as well as by other members of role g1.

Please read these two paragraphs, preferably out loud. I have two questions. First, why in the world does it have to be so complicated?! And second – why does the PostgreSQL documentation have to be written like that?!
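To make the second quoted paragraph concrete, here is a small sketch of that scenario (role and table names follow the documentation example; some_other_role is my own placeholder):

CREATE ROLE g1;
CREATE ROLE u1 LOGIN;
CREATE ROLE some_other_role;
GRANT g1 TO u1;

CREATE TABLE t1 (id int);
ALTER TABLE t1 OWNER TO g1;

-- a grant recorded as coming from the owner g1...
GRANT SELECT ON t1 TO some_other_role;

-- ...can later be revoked by u1, because u1 is a member of g1
SET ROLE u1;
REVOKE SELECT ON t1 FROM some_other_role;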

Invoking (your own) Perl from PL/Perl
Posted by Luca Ferrari on 2023-02-06 at 00:00

A glance at how to invoke Perl code within PL/Perl code.

Invoking (your own) Perl from PL/Perl

Invoking your own Perl code from PL/Perl, how hard can it be?
Well, it turns out that it can be harder than you think. PL/Perl is made to let Perl interact with the SQL world. Assume a function named get_prime needs to invoke another Perl function, is_prime, to test whether a number is prime or not.
How is it possible to chain the function invocation?

Invoking PL/Perl from PL/Perl via a query

One obvious possibility is to wrap is_prime into a PL/Perl function. Since a PL/Perl function is, well, an ordinary function, it is always possible to call it the SQL way, as another ordinary function.



CREATE OR REPLACE FUNCTION is_prime( int )
RETURNS bool AS $CODE$
  my ( $n ) = @_;
  for ( 2 .. $n ) {
    last if $_ >= ( $n / 2 ) + 1;
    return 0 if $n % $_ == 0;
  }
  return 1;
$CODE$ LANGUAGE plperl;

CREATE OR REPLACE FUNCTION get_primes( int )
RETURNS SETOF int AS $CODE$
  for ( 1 .. $_[ 0 ] ) {
    my $result_set = spi_exec_query( "SELECT is_prime( $_ )" );
    return_next( $_ ) if ( $result_set->{ rows }[0]->{ is_prime } eq 't' );
  }
  return undef;
$CODE$ LANGUAGE plperl;



The function get_primes builds a query (SELECT is_prime( $_ );) that is executed several times in order to get the result.
The advantage of this approach is that it is the natural way to call PostgreSQL functions, so it is possible to mix and match PL/Perl functions with functions written in other PL languages. The main drawback is that it is tedious and error prone, since the SQL queries have to be built by hand. Moreover, the overhead of query invocation and argument passing slows down the execution of the main function.
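Once both functions are in place, getting the primes is a single ordinary call (a quick usage example of mine, not part of the original post):

SELECT * FROM get_primes( 20 );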

Using anonymous subroutines

Luckily, Perl allows defining a subroutine within another subroutine and calling it when required. One way to achieve this is with code references.



CREATE OR REPLACE FUNCTION get_prime( int )
RETURNS SETOF int AS $CODE$
  my $is_p
[...]
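The excerpt is cut off here; a minimal sketch of how the code-reference approach could continue (my guess at the shape of the code, not the author's exact version):

CREATE OR REPLACE FUNCTION get_prime( int )
RETURNS SETOF int AS $CODE$
  # keep the helper as a Perl code reference, so no SQL round trip is needed
  my $is_prime = sub {
    my ( $n ) = @_;
    for ( 2 .. $n ) {
      last if $_ >= ( $n / 2 ) + 1;
      return 0 if $n % $_ == 0;
    }
    return 1;
  };

  return_next( $_ ) for grep { $is_prime->( $_ ) } ( 1 .. $_[ 0 ] );
  return undef;
$CODE$ LANGUAGE plperl;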

FOSDEM 2023
Posted by Stefanie Janine on 2023-02-05 at 23:00

FOSDEM 2023

This past weekend the first in-person FOSDEM since 2020 took place. There was also a FOSDEM PGDay on Friday with some very interesting talks.

Slonik and me

At The PostgreSQL Booth

There was a booth at FOSDEM organized by PostgreSQL Europe, where I volunteered.

Questions Being Asked

Multiple Writable PostgreSQL instances

What really surprised me was that several people asked about “multi-master”.
First of all, could people please adopt a better name, for example multi-primary?

None of these people could really explain why they might need a multi-primary solution at all. After a short talk with one of them, I convinced him that he wouldn’t need it at all. All that was really needed was high availability, something that can easily be achieved with Patroni, which I recommended.

There are some multi-primary solutions based on PostgreSQL available; the only one I would recommend, if at all, would be BDR, the solution by EDB. It is not available as open source, but it does come with good support, which is something you will need.

How To Handle Data in different Physical Locations

Another interesting conversation was with someone who wanted to store data in different countries but access it all as if it were in one location. The obvious solution here is the PostgreSQL Foreign Data Wrapper, postgres_fdw.
With that FDW you can connect to every PostgreSQL instance you can reach. Using views, or even materialized views, you can combine data that is located elsewhere.
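A minimal postgres_fdw sketch of that setup (the server, credentials, schema, and table names below are placeholders of mine, not from the conversation):

CREATE EXTENSION postgres_fdw;
CREATE SCHEMA paris;

CREATE SERVER paris_db FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'paris.example.com', dbname 'sales');
CREATE USER MAPPING FOR CURRENT_USER SERVER paris_db
    OPTIONS (user 'report', password 'secret');

-- pull the remote table definition into the local schema
IMPORT FOREIGN SCHEMA public LIMIT TO (orders) FROM SERVER paris_db INTO paris;

-- combine local and remote data behind a single view
CREATE VIEW all_orders AS
    SELECT 'local' AS origin, * FROM public.orders
    UNION ALL
    SELECT 'paris' AS origin, * FROM paris.orders;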

Merchandise Stuff

We ran out of our beloved small Sloniks very fast; a big box of them was gone before lunch on the first day.


The small umbrellas also went quickly, especially as it was raining on Saturday morning.

PGSQL Phriday #005: Relational and Non-relational Data
Posted by Henrietta Dombrovskaya on 2023-02-04 at 04:45

The topic for the February edition of PGSQLPhriday is Relational and Non-Relational Data.

I was a little puzzled that the question “How do you define non-relational data?” is the last one. It only makes sense to answer the first three questions once you clearly define what you are talking about, so that is the first question I will address. 

Do we have a formal definition of non-relational data (or relational, for that matter)? If you can’t think of one, there is a good reason for that: “relational” is a characteristic of the model, not the data. Therefore, the only definition we can give would be “the data you can’t represent using the relational model.” But is there anything in the definition of the relational model that limits it to certain types of data? The answer is no, so the best way to interpret the term “non-relational data” would be “the data which can’t benefit from being stored in a relational database.” Most often, that would be documents, images, and other blobs.

If we never need to search inside the document (in other words, we never expect to use any full-text search), then, in my opinion, there is no reason to store these documents in a database. Like many others, I can recall several cases like the one mentioned by Pat Wright in his blog post. The only thing we need to store in the database is a pointer to the place where the actual document is kept; there is no benefit in storing the document itself in the database.

However, it’s a different story when we need to perform a full-text search. Knowing that PostgreSQL’s full-text search capabilities are not the best tools available on the market, I would always try to justify the necessity of that feature. In many cases, after talking to the end users, I would find out that, in reality, the required search touches a limited number of fields/tables and can be transformed into a dynamically constructed query supported by b-tree indexes. (And we know that nothing can perform better than b-tree indexes!)

Finally – what if we truly need to implement th

[...]

Data Migration Tips
Posted by Joshua Tolley in End Point on 2023-02-04 at 00:00


When you’re in the business of selling software to people, you tend to get a few chances to migrate data from their legacy software to your shiny new system. Most recently for me that has involved public health data exported from legacy disease surveillance systems into PostgreSQL databases for use by the open source EpiTrax system and its companion EMSA.

We have collected a few tips that may help you learn from our successes, as well as from our mistakes (er, particularly educational experiences).

Customer Management

Your job is to satisfy your customers, and your customers want to know how the migration is progressing. Give them an answer, even if it’s just a generalization. This may be a burndown chart, a calculated percentage, a nifty graphic, or whatever, but something your project managers can show to their managers, to know more or less how far along things are.

Your job is also to know your system; that’s not the customer’s job. They shouldn’t have to get their data into a specific format for you to make use of it. Be as flexible as possible in the data format and structure you’ll accept. In theory, so long as your customer can provide the legacy data in a machine-readable format, you should be able to use it. Let them focus on getting the data out of their legacy system — which is sometimes quite an effort in itself! Real data is almost always messy data, and your customer will probably want to take the opportunity to clean things up; make it as easy as possible for them to do that, while still ensuring the migration proceeds quickly and smoothly.

Be careful about the vocabulary you use with your customer. Your system and the legacy system probably deal with the same kinds of data, and do generally the same kinds of things. Of course, your software does it better than the old ‘n’ busted mess you’re replacing, but in order to be better, your software has to be different from what it’s replacing. Different software means different methods, different processes, and different concepts. You an

[...]

Storing blobs in Postgres for our saved plans feature
Posted by Michael Christofides on 2023-02-03 at 18:44

The PGSQL Phriday topic this month is non-relational data, proposed by the excellent Ryan Lambert of Rustproof Labs. At first, I thought this might be yet another topic I can’t contribute much to… but then I remembered we do now store some blobs in Postgres!

The challenge

When we added the ability to publish query plans, back in May 2021, we needed a way to store them server-side. We considered our options, but went with the most logical place given our infrastructure — a bucket in Google Cloud.

More recently, when we added the ability to save plans, we had a similar, but not identical, set of requirements — and this time we opted to save the plans in the database. At first, this may seem to go against standard advice, which I thought might make it interesting!

By default, plans submitted to pgMustard are only stored in local browser storage, and are replaced over time. With saved plans, the aim was to allow people to select certain plans to keep longer term, or access them from multiple browsers, without having to publish them openly.

Why store blobs in the database?

We could easily have used a bucket for these plans, and retrieved them by id (like we do for published plans). However, a few factors led us to decide to buck the trend:

  1. Fewer moving parts — lots of folks were requesting to self-host pgMustard. While we don’t yet offer this, having fewer services needed for a self-hosted service felt wise. And naturally, our customers will be familiar with Postgres. 😎

  2. Our database is over-provisioned — we deliberately store as little data as possible, for both privacy and security reasons. Despite being on one of the smallest instances Google Cloud SQL offers, our largest table has fewer than 5,000 rows, so we have a LOT of room to use the database for more things.

  3. We didn’t anticipate much usage — while saving plans is very important some of the time, we didn’t want to save them by default (for the same privacy and security

[...]

How to Solve Advent of Code 2022 Using Postgres - Day 15
Posted by Greg Sabino Mullane in Crunchy Data on 2023-02-03 at 16:00

This article will contain spoilers both on how I solved 2022 Day 15's challenge "Beacon Exclusion Zone" using SQL, as well as general ideas on how to approach the problem. I recommend trying to solve it yourself first, using your favorite language. This article is delayed from the actual puzzle's release. Also note that my solutions are seldom going to be the "best" solutions - they were solved as quickly as possible, and these articles will show my first solutions, with some minor reformatting and cleaning up.

Hands on Tutorial

We've also loaded a tutorial for Day 15's challenge if you want to try it with a pre-loaded data set.

AOC Day 15

For this day's challenge, we are presented with an input file that contains the locations of many sensors, along with the closest beacon to each sensor. The first goal is to find out how many beacons CANNOT be in a certain row. The distance from the sensor to the beacon is measured in taxicab distance, also known as snake distance or Manhattan distance. See the Wikipedia entry if you are unfamiliar with it, as it will help to understand this problem. The file consists of lines like these:

 Sensor at x=1555825, y=18926: closest beacon is at x=1498426, y=-85030
 Sensor at x=697941, y=3552290: closest beacon is at x=595451, y=3788543
 Sensor at x=3997971, y=2461001: closest beacon is at x=3951198, y=2418718
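For reference, the Manhattan distance is just the sum of absolute coordinate differences, which translates directly into SQL (the table and column names below are my own placeholders):

SELECT abs(sensor_x - beacon_x) + abs(sensor_y - beacon_y) AS manhattan_distance
FROM sensors;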

As always, we are going to employ a handful of Postgres / SQL tools to help us solve this, including:

[...]

Using Postgres FILTER
Posted by Craig Kerstiens in Crunchy Data on 2023-02-03 at 15:00

Developers who have been around SQL long enough know that there is often more than one way to get the same result. Today I wanted to look at a few different ways to aggregate and show a total with paid and unpaid status. First, we will use a common table expression (CTE), which is a nice method for organizing subqueries. Second, we will use CASE statements within an aggregation context to filter out the values we want. Lastly, we will use FILTER to clean up the syntax while effectively doing the same as the CASE statement.

Hands On Tutorial

We've loaded a sample data set and a hands on tutorial in our browser based Postgres playground.

Target Output

The report we're going to generate is a monthly revenue report from an invoices table. We'll want our end report to look something like:

    mnth    | billed  | uncollected | collected
------------+---------+-------------+----------
 2023-02-01 | 1498.06 | 1498.06     |       0
 2023-01-01 | 2993.95 | 1483.04     | 1510.91
 2022-12-01 | 1413.17 |  382.84     | 1030.33
 2022-11-01 | 1378.18 |  197.52     | 1180.66
 2022-10-01 | 1342.91 |  185.03     | 1157.88
 2022-09-01 | 1299.90 |   88.01     | 1211.89
 2022-08-01 | 1261.97 |   85.29     | 1176.68

Invoices Table

First, let’s look at the underlying data. Show the details of the invoices table:


                                              Table "public.invoices"
       Column       |              Type              | Collation | Nullable |               Default
--------------------+--------------------------------+-----------+----------+--------------------------------------
 id                 | bigint                         |           | not null | nextval('invoices_id_seq'::regclass)
 account_id         | integer                        |           |          |
 net_total_in_cents | integer                        |           |          |
 invoice_period     | daterange                      |           |          |
 status             | text                           |           |      
[...]
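The excerpt is truncated here, but a FILTER-based version of the report could look roughly like this (my sketch, which assumes invoices are marked with a status of 'paid'; the article's actual query may differ):

SELECT date_trunc('month', lower(invoice_period))::date AS mnth,
       round(sum(net_total_in_cents) / 100.0, 2) AS billed,
       round(sum(net_total_in_cents) FILTER (WHERE status <> 'paid') / 100.0, 2) AS uncollected,
       round(sum(net_total_in_cents) FILTER (WHERE status = 'paid') / 100.0, 2) AS collected
FROM invoices
GROUP BY 1
ORDER BY 1 DESC;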

Leverage Google Cloud Logging + Monitoring for Custom Cloud SQL for Postgres or AlloyDB Alerts
Posted by Shane Borden on 2023-02-03 at 14:52

As migrations to Cloud SQL and AlloyDB pick up speed, inevitably you will run into a condition where the cloud tooling has not quite caught up with exposing custom alerts and incidents that you may have been exposing on-premises with tools such as Nagios or Oracle Enterprise Manager. One such example is monitoring of replication tools such as the GoldenGate heartbeat table. While there are many ways you might implement this, I wanted to demonstrate a way to leverage Google Cloud Logging + Google Cloud Monitoring. Using this method will allow us to keep a long-term log of certain parameters like lag, or anything else you have built into the heartbeat mechanism. To demonstrate, let's use Python to query the database and create a Cloud Logging entry:

import argparse
from datetime import datetime, timedelta
from sqlalchemy import create_engine, text
from google.cloud import logging


def retrievePgAlert(
    username: str,
    password: str,
    hostname: str,
    portNumber: int,
    databaseName: str,
    alertType: str,
) -> None:

    alertList: list = []

    conn_string = f"postgresql+psycopg2://{username}:{password}@{hostname}:{portNumber}/{databaseName}?client_encoding=utf8"
    engine = create_engine(conn_string)
    with engine.connect() as con:

        if alertType == "ogg-lag":
            sqlQuery = text(
                f"select replicat, effective_date, lag from ogg.heartbeat where lag >=:lagAmt and effective_date >= now() - interval ':intervalAmt min'"
            )

        result = con.execute(
            sqlQuery, {"lagAmt": oggLagAmt, "intervalAmt": checkIntervalMinutes}
        ).fetchall()
        for row in result:
            alertList.append(row)

        if not alertList:
            print(f"No alerts as of {datetime.now().strftime('%m/%d/%Y %H:%M:%S')}")
        else:
            for alertText in alertList:
                print(
                    f"Replicat: {alertText[0]} at date {alertText[1]} has a total lag of: {alertText[2]}
[...]

Postgres is Relational Plus
Posted by Ryan Lambert on 2023-02-03 at 05:01

I was the host for this month's #PGSQLPhriday topic (#005), and decided on the topic question: Is your data relational? This is my submission on the topic, and how I use Postgres for Relational Plus usages.

Non-relational data

Q: What non-relational data do you store in Postgres and how do you use it?

PostGIS is the most prominent non-relational data I am involved with. Pretty much all of the PostGIS data I am involved with is rooted alongside solidly relational data. Want to know demographics of customers within 10 miles of a specific location? The location portion is spatial, the demographic data is relational.
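For instance, that "customers within 10 miles" question mixes a spatial predicate with an ordinary relational aggregate; a small sketch with made-up table and column names:

SELECT c.age_group, count(*) AS customers_nearby
FROM customers c
WHERE ST_DWithin(c.home_geog,
                 ST_SetSRID(ST_MakePoint(-104.99, 39.74), 4326)::geography,  -- the location of interest
                 16093)                                                      -- roughly 10 miles, in meters
GROUP BY c.age_group;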

Fixing a broken postgis raster install
Posted by Regina Obe in PostGIS on 2023-02-03 at 00:00

As of PostGIS 3.0, the PostGIS raster support is no longer part of the postgis extension, but instead spun off into a PostGIS extension called postgis_raster.

Upgrading raster from 2.* to 3.* covers the proper way of upgrading your install from PostGIS 2+ to PostGIS 3+.

Unfortunately many people, in a moment of panic when seeing the notice "UNPACKAGED" in their check of

SELECT postgis_full_version()

tried to do something like drop raster type with cascade. Dropping the raster type DOES NOT remove all the raster functions, but DOES destroy all your raster data if you had any. Do not do that. But if you did happen to do that, this solution is for you.

The only solution after breaking your raster install is to remove the remaining bits of the postgis raster functionality. At that point, you can reinstall using CREATE EXTENSION postgis_raster, or not bother if you don't need raster support.

PostgreSQL Constraints: Learning PostgreSQL with Grant
Posted by Grant Fritchey in Redgate on 2023-02-02 at 21:13

PostgreSQL supports constraints much like any other database management system. When you need to ensure certain behaviors of the data, you can put these constraints to work. I've already used several of these in creating my sample database (available publicly on GitHub with the articles, in the CreateDatabase.sql file). I'll explain those as we go. The constraints supported by PostgreSQL are:

  • Not-Null Constraints
  • Unique Constraints
  • Primary Key
  • Foreign Key
  • Check Constraints
  • Exclusion Constraints

Not-Null Constraints

One way we can ensure that our data is correct is by requiring it in the first place. If you create a table like this:

create table public.example
(ID int,
SomeValue varchar(50));

Because you didn’t define whether or not the columns can, or cannot, accept NULL values, then these will accept NULL values. The thing about NULL is that it’s really useful. Sometimes, people simply don’t have the information to give you, so allowing NULL is a good solution. Other times though, you absolutely need that value to be added to the rows of your table.

Let’s take the radio.radios table from the hamshackradio database as an example:

CREATE TABLE IF NOT EXISTS radio.radios
        (radio_id int CONSTRAINT pkradios 
                  PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
        radio_name varchar(100) NOT NULL,
        manufacturer_id int NOT NULL,
        picture BYTEA NULL,
        connectortype_id int NOT NULL,
        digitalmode_id int NULL
);

Here I’ve used the clause NOT NULL on multiple columns to ensure that they must have information provided. In short, a radio must have a manufacturer (through the manufacturer_id column) and a name (through the radio_name column). Defining them with NOT NULL ensures that they must be filled out. You’ll also note that I use the NULL definition on the picture column. I could simply let the default take charge. For clarity, and to show intention, I use NULL instead.

So now, if I tried to insert data into the radio.radios t

[...]
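The excerpt ends here; presumably the insert omits one of the required columns. A hedged sketch of what that failure looks like (the values are made up):

-- manufacturer_id is declared NOT NULL, so leaving it out is rejected
INSERT INTO radio.radios (radio_name, connectortype_id)
VALUES ('Example Radio', 1);
-- fails with a not-null constraint violation on manufacturer_id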

Will Postgres Use My Index? Hypothetical Indexing for Postgres
Posted by Craig Kerstiens in Crunchy Data on 2023-02-02 at 15:00

Postgres is a great database with a ton of features including really rich indexing. Postgres itself maintains all sorts of data under the covers about things like cache hits and misses, when indexes are and aren't used, and more. If you're staring at a complex explain plan you may think some well targeted indexes may help, but how can you be sure? Enter HypoPG, a Postgres extension for adding hypothetical indexes to Postgres so you can do index planning.

HypoPG supports hypothetical indexes:

  • b-tree
  • brin
  • hash
  • but not gin or gist indexes

First load up some sample data. We're going to leverage the same dataset from our Postgres tutorial on indexing with b-trees.

And take a look at our first explain plan:

EXPLAIN ANALYZE
SELECT *
FROM weather
WHERE event_type = 'Winter Storm';
                                           QUERY PLAN
-----------------------------------------------------------------------------
 Seq Scan on weather  (cost=0.00..115.00 rows=11 width=622) (actual time=0.007..0.295 rows=11 loops=1)
   Filter: ((event_type)::text = 'Winter Storm'::text)
   Rows Removed by Filter: 1189
 Planning Time: 0.332 ms
 Execution Time: 0.423 ms
(5 rows)

Here we can see it's using a sequential scan. One option is to go ahead and create an index, but how do we know it'll work? How do we know Postgres will actually use the index? HypoPG will show whether the hypothetical index would be used or not.

Create the extension for HypoPG

CREATE EXTENSION hypopg;

Run a SELECT calling hypopg_create_index with the CREATE INDEX statement inside the parentheses.

SELECT hypopg_create_index('CREATE INDEX idx_weather_type ON weather(event_type);');

We can see that our hypothetical index has been created and named <13732>btree_weather_event_type.

           hypopg_create_index
-----------------------------------------
 (13732,<13732>btree_weather_event_type)
(1 row)

Time: 91.903 ms

Now we run EXPLAIN to see if it will use the hypothetical index:

EXPLAIN SELECT * FROM weather WHERE event_type = 'Winter Storm';

🎉

[...]
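One follow-up worth noting (my addition, not from the article): hypothetical indexes exist only in the current session, and HypoPG's hypopg_reset() clears them once you are done experimenting.

SELECT hypopg_reset();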

PostgreSQL Database Security: External Server-Based Authentication
Posted by Ibrar Ahmed in Percona on 2023-02-02 at 14:06

Database security is crucial to protecting sensitive information stored in a database. A critical aspect is authentication, which refers to verifying the identity of a user trying to access the database. Authentication is typically accomplished through usernames and passwords, and can include other factors such as security tokens and one-time passwords. Database Administrators (DBAs) must implement robust authentication mechanisms to ensure that only authorized users can access the database and its data. Additionally, it is best practice to regularly monitor and audit database authentication logs to detect and prevent unauthorized access attempts.

PostgreSQL server authentication refers to verifying the identity of a user trying to connect to the PostgreSQL server. This can be done using various methods, including password-based authentication, a client certificate, and external authentication methods such as GSSAPI, LDAP, and RADIUS. The pg_hba.conf file is used to configure the authentication methods the PostgreSQL server uses and can specify different methods for different types of connections. By default, PostgreSQL uses trust-based authentication, where any connection from localhost is considered trusted and doesn’t require a password. However, it’s important to note that it’s recommended to use password-based or external authentication methods for remote connections and production environments to ensure secure access to the data.

When talking about security, authentication is the first line of defense. PostgreSQL provides various methods of authentication, which fall into three categories.

 

PostgreSQL internal authentication and OS-based authentication have already been discussed in previous blogs. Now it’s time to discuss external authentication methods.

PostgreSQL external authentication

PostgreSQL supports a variety of external au

[...]

An Argument for Logical Failover Slots
Posted by Robert Bernier in Percona on 2023-02-02 at 13:02

These days, the typical PostgreSQL cluster is not only a multi-node replication cluster, ordinarily using an asynchronous streaming replication model, but can sometimes also include a logical replication component.

 


 

Recall that logical replication works in a publish/subscribe (PUB/SUB) mode, where individual tables are published and then subscribed to by remotely connected databases/tables. DML operations, such as INSERT, UPDATE, DELETE, TRUNCATE, etc., are then replicated from one or more publishers to one or more subscribers.

For the most part it’s a pretty straightforward operation. The only real limitation is that one can only publish tables from a read-write server such as the PRIMARY.

 

There’s a problem, however, when it comes to executing failovers. While a STANDBY/REPLICA host can quickly take over with a simple select pg_promote() SQL statement, the same cannot be said regarding logical replication. Instead, failing over is, in fact, not a single action but a collection of discrete steps that must be performed in an exact and precise manner.

Consider the following steps typifying a failover:

  1. Promote the REPLICA, thus becoming the new PRIMARY.
  2. When present, redirect other REPLICAs pointing to the new PRIMARY.
  3. Failover the logical replication slot:
    1. Block all logical replication from the failed PRIMARY.
    2. Create a logical replication slot on the newly promoted PRIMARY.
    3. Alter the subscription and point to the newly promoted PRIMARY.
    4. Resume/restore logical replication between publisher and subscriber.

The challenge, of course, is reconstituting the logical replication process as quickly as possible, ideally without any data loss.
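Steps 3.2 and 3.3 above, for example, boil down to statements along these lines (a rough sketch with placeholder names, not the article's exact procedure):

-- on the newly promoted primary (publisher side): recreate the logical slot
SELECT pg_create_logical_replication_slot('my_slot', 'pgoutput');

-- on the subscriber: repoint the subscription at the new primary and resume
ALTER SUBSCRIPTION my_sub CONNECTION 'host=new-primary dbname=appdb user=repuser';
ALTER SUBSCRIPTION my_sub ENABLE;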

Two common solutions come to mind:

  1. Block all incoming connections before promoting the STANDBY while simultaneously installing the logical replication slot.
  2. Promote the STANDBY and then flus
[...]

When Proj Grid-Shifts Disappear
Posted by Paul Ramsey in PostGIS on 2023-02-02 at 08:00

Last week a user noted on the postgis-users list (paraphrase):

I upgraded from PostGIS 2.5 to 3.3 and now the results of my coordinate transforms are wrong. There is a vertical shift between the systems I’m using, but my vertical coordinates are unchanged.

Hmmm.

PostGIS gets all its coordinate reprojection smarts from the proj library. The user’s query looked like this:

SELECT ST_AsText(
    ST_Transform('SRID=7405;POINT(545068 258591 8.51)'::geometry, 
    4979
    ));

“We just use proj” is a lot less certain and stable an assertion than it appears on the surface. In fact, PostGIS “just uses proj” for proj versions from 4.9 all the way up to 9.2, and there has been a lot of change to the proj library over that sweep of releases.

  • The API radically changed around proj version 6, and that required a major rework of how PostGIS called the library.
  • The way proj dereferenced spatial reference identifiers into reprojection algorithms changed around then too (it got much slower) which required more changes in how we interacted with the library.
  • More recently the way proj handled “transformation grids” changed, which was the source of the complaint.

Exploring Proj

The first thing to do in debugging this “PostGIS problem” was to establish if it was in fact a PostGIS problem, or a problem in proj. There are commandline tools in proj to query what pipelines the system will use for a transform, and what the effect on coordinates will be, so I can take PostGIS right out of the picture.

We can run the query on the commandline:

echo 545068 258591 8.51 | cs2cs 'EPSG:7405' 'EPSG:4979'

Which returns:

52d12'23.241"N  0d7'17.603"E 8.510

So directly using proj we are seeing the same problem as in PostGIS SQL: no change in the vertical dimension; it goes in at 8.51 and comes out at 8.51. So the problem is not PostGIS, it is proj.

Transformation Grids

Cartographic transformations are nice deterministic functions: they take in a longitude and latitude

[...]

PostgreSQL 15 and beyond
Posted by Amit Kapila in Fujitsu on 2023-02-02 at 00:01

PostgreSQL 15 builds upon the enhancements of earlier versions and takes them to the next level, with several improvements to logical replication, increased sorting  performance, and the introduction of the MERGE command. I would like to talk about these features as well as what is under discussion for PostgreSQL 16.

Intro to Postgres Custom Data Types
Posted by Elizabeth Garrett Christensen in Crunchy Data on 2023-02-01 at 15:00

Custom data types are one of the many features that make PostgreSQL flexible for development of a huge variety of business and application use cases. Data types help you primarily with data integrity, ensuring your data is stored in the database as you want it to be. A common surprise gift of using strict data types is that you can end up reducing your long-term data maintenance.

There’s two main ways to customize data types in Postgres:

  • Create DOMAINs, which are value constraints added to built-in data types
  • Creating user-defined custom data types

Before you go crazy with custom data types, make sure that one of the 43 existing data types won't work for you ;). There are so many data types and existing extensions that most people will never even need to touch custom data types. That doesn't mean it's not fun to learn about!

Hands on tutorials

My wonderful colleague, Jean-Paul Agudo wrote two hands-on tutorials for these topics, one for DOMAINS and the other for User Defined Custom Data Types.

Using CREATE DOMAIN

DOMAINs let you create a specific value check. For example, if I want to make sure all birth dates are after Jan 1st, 1930 and all emails are valid, I could create these:

CREATE DOMAIN date_of_birth AS date
        CHECK (value > '1930-01-01'::date)
;
CREATE DOMAIN valid_email AS text
        NOT NULL
        CHECK (value ~* '^[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+[.][A-Za-z]+$')
;

When creating the table, assign the date_of_birth and valid_email data types to their respective columns. Now, if a database has multiple fields representing a date of birth or a valid email, the logic for those fields is portable to those additional columns.

CREATE TABLE person_using_domains
  (
     id         INTEGER GENERATED always AS IDENTITY PRIMARY KEY,
     firstname  TEXT NOT NULL,
     lastname   TEXT NOT NULL,
     birth_date DATE_OF_BIRTH,
     email      VALID_EMAIL
  );

When using psql, all domains will be listed by running \dD.

DOMAIN vs CHECK CONSTRAINT

So you

[...]

Unlocking the Secrets of TOAST: How To Optimize Large Column Storage in PostgreSQL for Top Performance and Scalability
Posted by Ibrar Ahmed in Percona on 2023-02-01 at 13:03

PostgreSQL is a fantastic database, but if you’re storing images, video, audio files, or other large data objects, you need to “toast” them to get optimal performance. This post will look at using The Oversized-Attribute Storage Technique (TOAST) to improve performance and scalability. 

PostgreSQL uses fixed-sized pages, which makes it challenging to store huge data values directly. To solve this problem, large data values are compressed and broken into multiple smaller chunks. This process is done automatically and does not significantly impact how the database is used. This technique, called TOAST, improves how large data values are stored and used in the database.

Therefore, TOAST is a storage technique used in PostgreSQL to handle large data objects such as images, videos, and audio files. The TOAST technique allows for the efficient storage of large data objects by breaking them into smaller chunks and storing them separately from the main table. This can improve the performance of queries and indexing and reduce the amount of disk space required to store the data.

TOAST tables are created automatically by PostgreSQL when a table contains a column of type OID, bytea, or any other data type with the TOASTable storage class. The TOAST table is then used to store the large data objects, while the main table stores a reference to the TOAST table.

Here’s an example of using the TOAST technique in PostgreSQL:

  • Create a table with a large data column:
CREATE TABLE images ( id SERIAL PRIMARY KEY, data BYTEA );
  • Insert a large image into the table:
INSERT INTO images (data) VALUES (E'\\x...');
  • Query the table to see that the large data object is stored in a TOAST table:
SELECT relname, relkind FROM pg_class WHERE relname LIKE 'pg_toast%';
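Another way to see the link (my addition, not from the post) is to ask pg_class which TOAST table backs a given relation:

SELECT reltoastrelid::regclass AS toast_table
FROM pg_class
WHERE relname = 'images';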

In the example, the images table contains a column named data of type bytea, which can hold extensive binary data. When a large image is inserted into the table, PostgreSQL automatically creates a TOAST table to store the ima

[...]

Null and Empty string in Oracle vs PostgreSQL vs SQL Server
Posted by Akhil Reddy Banappagari in MigOps on 2023-02-01 at 03:13

Database migrations from Oracle to PostgreSQL and from SQL Server to PostgreSQL can get tricky when certain differences are not known. Unknown differences can lead to migration failures. We have previously blogged about the behavior of the DATE data type and transaction control statements in Oracle, SQL Server, and PostgreSQL. In this article, we are going to discuss the behavior of NULL and empty strings in Oracle vs PostgreSQL vs SQL Server with a unique constraint. This is the first article in our series about handling NULLs during migration from Oracle to PostgreSQL and from SQL Server to PostgreSQL.

NULL and Empty String in Oracle vs PostgreSQL vs SQL Server

What is a NULL ?

According to ANSI SQL-92 specifications, a null is neither an empty string (for a character or a datetime data type) nor a value of zero (for numeric data types). To ensure that all nulls are handled uniformly, the ANSI SQL-92 specification stipulates that a null must be the same for all data types. When attributes have no data to be stored for a given record, it is represented by the SQL value, NULL.

NULL and Empty String in Oracle vs PostgreSQL vs SQL Server

Oracle treats both NULL as well as an Empty String as a NULL. PostgreSQL treats a NULL as a NULL and an Empty String as an Empty String. Similar to PostgreSQL, SQL Server also treats a NULL as a NULL and an Empty String as an Empty String.

Oracle needs one byte to store a NULL, whereas both PostgreSQL and SQL Server don't require any space to store a NULL.

UNIQUE constraint on NULL and Empty String

In Oracle, a column on which there exists a unique constraint can store any number of NULL entries and empty strings.

However, SQL Server only allows one NULL and one empty string to be inserted into a column when there exists a unique constraint on it.

In PostgreSQL, similar to Oracle, multiple NULL entries are allowed. Similar to SQL Server, PostgreSQL allows only one empty string when there exists a unique constraint on that column.
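That behavior is easy to verify in PostgreSQL with a quick throwaway table (my own sketch, using the default NULLS DISTINCT semantics):

CREATE TABLE uniq_test (val text UNIQUE);

INSERT INTO uniq_test VALUES (NULL);
INSERT INTO uniq_test VALUES (NULL);   -- succeeds: NULLs do not conflict with each other
INSERT INTO uniq_test VALUES ('');
INSERT INTO uniq_test VALUES ('');     -- fails: the empty string is a real value, so this is a duplicate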


[...]

is it a bug?
Posted by Lætitia AVROT in EDB on 2023-02-01 at 00:00
Last weekend, I found this tweet:

"PostgreSQL, why do you fail me 😭 (on this syntax that not a single jOOQ user ever requested in > 10 years) pic.twitter.com/Wxr2JkWQBF" — Lukas Eder (@lukaseder), January 26, 2023

My take was that Postgres was right and conforming to the SQL Standard whereas the other RDBMS were wrong, but that's faith, not proof. So, I gathered my courage and dived deep into the SQL Standard to find formal proof.

PG Day Chicago Registration Is Open!
Posted by Henrietta Dombrovskaya on 2023-01-31 at 15:07

Register at the conference website, and take advantage of the Early Bird discount!

Transaction ID wraparound: a walk on the wild side
Posted by Laurenz Albe in Cybertec on 2023-01-31 at 09:00

transaction ID wraparound protection by freezing the server room?
© Laurenz Albe 2022

Most people are aware of transaction ID wraparound. The concept has been well explained in Hans’ article, so I won’t repeat all that here. But for most people it is an abstract concept, a bogeyman lurking in the dark around the corner. Many people know horror stories about anti-wraparound autovacuum tanking performance or databases that stop working, but who has ever seen actual data loss? I decided to face the beast (having developed a fondness for breaking things), and I learned something along the way. If you are curious, come along on this trip! We’ll do some scary things that you shouldn’t do to your databases at home.

Is data loss from transaction ID wraparound a real danger?

No, it isn’t. There are strong protections in PostgreSQL to prevent that. As you will see, I have to use evil tricks to overcome those protections. Still, these protections can disrupt operation, so it is good to be aware of the problem. But don’t panic: usually, you won’t even notice when your transaction IDs wrap around.

Preparing a database cluster

Since I will cause data corruption, I’ll create a new cluster that I can discard afterwards:

initdb testdb

Let’s start the cluster. I will use a free, non-standard port and allow a prepared transaction. That prepared transaction will later be used to stir trouble.

pg_ctl start -o '-c port=5555' \
             -o '-c unix_socket_directories=/tmp' \
             -o '-c max_prepared_transactions=1' \
             -D testdb \
             -l logfile

Now let’s connect with

psql -h /tmp -p 5555 -d postgres

Now we can create some data (that we will destroy later). I’ll return xmin for each row I create, which stores the transaction ID of the creating transaction. Similarly, I’ll return xmax for all deleting statements. For detailed information about xmax, you can read my article on that topic.

It is very important that we don’t SELECT from the table at this stage, since that would set hint bits, which would spoil the effect.

C
[...]
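The rest of the walk-through is cut off here; as a tiny illustration of returning xmin on insert (my own throwaway table, not the author's setup):

CREATE TABLE wraparound_demo (x integer);
INSERT INTO wraparound_demo VALUES (1) RETURNING xmin;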

Luca Ferrari
Posted by Andreas 'ads' Scherbaum on 2023-01-30 at 14:00
PostgreSQL Person of the Week Interview with Luca Ferrari: I live in Italy, with my beautiful wife Emanuela, my great son Diego, our female cats, Sofia and Olivia, and our female dog Franzina. We live near the city of Modena, the land of sports cars like Ferrari and Maserati, and also the center of very good food and wine.

Nordic PgDay 2023
Posted by Christophe Pettus in pgExperts on 2023-01-30 at 07:00

I’m very pleased to be talking about real-life logical replication at Nordic PgDay 2023, in beautiful Stockholm.

PostgreSQL supply chain
Posted by Peter Eisentraut in EDB on 2023-01-30 at 05:00

If you are an application developer and you build on PostgreSQL, then maybe you have looked into where PostgreSQL comes from, who develops it, and where you can get professional help if needed.

Now, if you are a PostgreSQL developer (hi!), do you know what you are building on, where those things come from, who develops them, and where you can get professional help if needed?

Consider the dependency diagram of the internet:

PostgreSQL is perhaps one of the bigger boxes in the middle. But what are the shapes of the boxes below it?

Some are okay: Well-known projects with robust communities and ecosystems, such as Perl, Python, ICU, LLVM, systemd; those are big boxes. There is also OpenSSL, which in a well-publicized case was at one point one of those thin boxes, but that's been rectified. There are lz4 and zstd, somewhat new arrivals among the dependencies of PostgreSQL, which appear to have active communities in their respective GitHub projects. There is also zlib, which is in maintenance mode but still putting out regular releases. This is all fine.

There is some stuff in the murky middle that I’m too lazy to look into. Who maintains libkrb5 and libldap and libpam these days? I guess these libraries are probably used widely enough that someone will care, and maybe at the end of the day with a support contract someone like Red Hat will be on the hook.

But there are also the rather thin sticks propping up PostgreSQL (and many other things):

  • The entire documentation toolchain (docbook-xml, docbook-xsl, libxml, libxslt, xsltproc) is in low to no maintenance mode. I discussed this in more detail in my presentation at FOSDEM 2021.

  • GNU Gettext is being maintained, but just barely. I would love some action there. Here is a feature request, for example. How can we help? I will talk about this in my presentation at FOSDEM 2023.

  • The TAP tests in PostgreSQL rely on the Perl module IPC::Run, which by the way is explicitly looking for a new

[...]

Relational and Non-relational Data: PGSQL Phriday #005
Posted by Andreas Scherbaum on 2023-01-29 at 17:33

Ryan Lambert asks in this month's PGSQL Phriday:

  • What non-relational data do you store in Postgres and how do you use it?
  • Have you attempted non-relational uses of Postgres that did not work well? What was the problem?
  • What are the biggest challenges with your data, whatever its structure?
  • Bonus: How do you define non-relational data?

Looking over the many databases we operate at work, I can say that we are mostly a "relational shop". Almost every database we have uses relational structures, and does not attempt to go all in on "full NoSQL", or "store non-relational data". Many of the databases are on the larger side, often in the hundreds of GB or even several TB. We have good people in our dev teams who understand both PostgreSQL and SQL, and can develop a proper schema for the data and write performant queries.

There are a few exceptions, though, but at least for us it's only a small number. Whenever we retrieve or receive data that comes in from web requests, we may store it as JSON, and then we use JSONB. Quite often, however, it's rather "storage as JSON", not "querying the JSON in the database". It's just more convenient to keep the data in JSON format all the way, instead of transforming it several times.

 

Continue reading "Relational and Non-relational Data: PGSQL Phriday #005"

PGDay/MED 2023 - Call for Papers
Posted by Magnus Hagander in PostgreSQL Europe on 2023-01-28 at 11:09

PGDay/MED is pleased to announce that the call for papers is open for our inaugural conference.

We are looking for talks on any PostgreSQL-related subject for the one-day conference. Got a big idea you want to share? A problem you have an innovative solution to? An advancement you want to put forward? We want to hear from you!

All talks should last about 40-45 minutes, with an additional 5-10 minutes for questions. Due to the international nature of our audience, we are requesting that talks be made in English for this conference.

Key dates:

  • The call for papers will close on Sunday February 12. We know it’s not a lot of time but we know the PostgreSQL folks are the best, you can get it done!
  • We will try to inform all speakers by Wednesday February 15 if their talk has been selected.
  • Conference is scheduled for April 23 in Malta.

Submit your talk for consideration here. We look forward to hearing from you!

PGDay/MED is a conference organized by PostgreSQL Europe. It follows the PostgreSQL Community Conference Recognition guidelines.

Large Object in PostgreSQL
Posted by David Z in Highgo Software on 2023-01-27 at 21:21

1. Overview

Sometimes, you may need to manage large objects, i.e. CLOB, BLOB, and BFILE, using PostgreSQL. There are two ways to deal with large objects in PostgreSQL: one is to use existing data types, i.e. bytea for binary large objects and text for character-based large objects; the other is to use pg_largeobject. This blog will explain how to use pg_largeobject.

2. pg_largeobject

pg_largeobject is one of the large object solutions provided by PostgreSQL, which allows stream-style access to user data that is stored in a special large-object structure. Streaming access is useful when working with data values that are too large to manipulate conveniently as a whole.

pg_largeobject is a system table used to store the actual large objects. Each large object has an entry in the system table pg_largeobject_metadata. The large objects managed using pg_largeobject can be created, modified, and deleted through a read/write API. pg_largeobject allows storing large objects of up to 4TB.
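As a quick illustration of that read/write API (my sketch, not from the post), the server-side large object functions can create, read back, and remove an object:

SELECT lo_from_bytea(0, 'hello, large object'::bytea) AS new_oid;  -- creates a large object and returns its OID

-- read it back and delete it, substituting the OID returned above for 16453
SELECT convert_from(lo_get(16453), 'UTF8');
SELECT lo_unlink(16453);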

The other large object solution, using the existing data types bytea and text, is built on top of TOAST tables, which limits a large object's size to 1GB.

Here are the schemas of these two system tables designed for pg_largeobject.

postgres=# \d+ pg_largeobject;
                                    Table "pg_catalog.pg_largeobject"
 Column |  Type   | Collation | Nullable | Default | Storage  | Compression | Stats target | Description 
--------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
 loid   | oid     |           | not null |         | plain    |             |              | 
 pageno | integer |           | not null |         | plain    |             |              | 
 data   | bytea   |           | not null |         | extended |             |              | 
Indexes:
    "pg_largeobject_loid_pn_index" PRIMARY KEY, btree (loid, pageno)
Access method: heap

postgres=# \d+ pg_largeobject_metadata;
                                 Table "pg_catalog.pg_largeobject_metadata"
  Column  |
[...]
