Below is an excerpt from PostgreSQL documentation. Please, do not take me wrong – I take permissions and security very seriously, probably more than many others. I believe I have a very decent understanding of how PostgreSQL permissions work, better than many others. Still, my favorite database never fails to surprise me!
If a superuser chooses to issue a GRANT or REVOKE command, the command is performed as though it were issued by the owner of the affected object. Since all privileges ultimately come from the object owner (possibly indirectly via chains of grant options), it is possible for a superuser to revoke all privileges, but this might require use of CASCADE as stated above. REVOKE can also be done by a role that is not the owner of the affected object, but is a member of the role that owns the object, or is a member of a role that holds privileges WITH GRANT OPTION on the object. In this case the command is performed as though it were issued by the containing role that actually owns the object or holds the privileges WITH GRANT OPTION. For example, if table t1 is owned by role g1, of which role u1 is a member, then u1 can revoke privileges on t1 that are recorded as being granted by g1. This would include grants made by u1 as well as by other members of role g1.
Please read these two paragraphs, preferably read them out loud. I have two questions. First, why in the world it has to be so complicated?! And second – why PostgreSQL documentation has to be written like that?!
A glance at how to invoke Perl code within PL/Perl code.
Invoking your own Perl code from PL/Perl, how hard can it be?
Well, it turns out that it can be harder than you think. PL/Perl is made to allow Perl interacting with the SQL world. Assume a function, named get_prime
requires to invoke another Perl function is_prime
to test if a number is prime or not.
How is it possible to chain the function invocation?
One obvious possibility is to wrap is_prime
into a PL/Perl function. Since a PL/Perl function is, well, an ordinary function, it is always possible to call it the SQL way, as another ordinary function.
CREATE OR REPLACE FUNCTION is_prime( int )
RETURNS bool AS $CODE$
my ( $n ) = @_;
for ( 2 .. $n ) {
last if $_ >= ( $n / 2 ) + 1;
return 0 if $n % $_ == 0;
}
return 1;
$CODE$ LANGUAGE plperl;
CREATE OR REPLACE FUNCTION get_primes( int )
RETURNS SETOF int AS $CODE$
for ( 1 .. $_[ 0 ] ) {
my $result_set = spi_exec_query( "SELECT is_prime( $_ )" );
return_next( $_ ) if ( $result_set->{ rows }[0]->{ is_prime } eq 't' );
}
return undef;
$CODE$ LANGUAGE plperl;
The function get_primes
builds a query (SELECT is_prime( $_ );
) that is executed several times in order to get the result.
Advantages of this approach are that this is the natural way to query PostgreSQL functions, and therefore it would be possible to mix and match PL/Perl functions with other PL-functions. The main drawback is that this approach is tedious and error prone, since there is the need to build SQL queries. Moreover, handling invocation and argument passing will slow down the execution of the main function.
Luckily, Perl allows the definition of a subroutine within another subroutine, and to call it when required. One way to achieve this is by code references.
CREATE OR REPLACE FUNCTION get_prime( int )
RETURNS SETOF int AS $CODE$
my $is_p
Recent weekend the first in person FOSDEM since 2020 took place. There has also been a FOSDEM PGDay on Friday with some very interesting talks.
There has been a booth at FOSDEM organized by PostgreSQL Europe, where I volunteered.
What me really surprised has been several people asking about “multi-master”.
First of all, could people please adopt a better name like, for example, multi-primary?
None of these people really came up with an explanation, why they might need a multi-primary solution at all. After a short talk with one of them, I convinced him, that he wouldn’t need it all. All that was really needed here would be high availability. Something that can easily be achieved with Patroni, which I recommended.
There are some multi-primary solutions based on PostgreSQL available, the only one I would recommend, if at all, would be the solution by EDB, BDR. It is not available as open source, but it does come with good support, which is something you will need.
Another interesting talk has been with someone, who wanted to have stored data in different countries, but would like to access them as being in one location. The obvious solution here is the PostgreSQL Foreign Data Wrapper, short fdw.
With that fdw, you can connect to every PostgreSQL instance, that you could reach. Using views, or even materialized views you can combine data, that is located elsewhere.
We ran out very fast of our beloved small Sloniks, a big box of them have been gone on the first day before lunch.
Also the small umbrellas went away fast, especially as it has been raining on Saturday morning.
The topic for the February edition of PGSQLPhriday is Relational and Non-Relational Data.
I was a little puzzled that the question “How do you define non-relational data?” is the last one. It only makes sense to answer the first three questions once you clearly define what you are talking about, so that is the first question I will address.
Do we have a formal definition of non-relational data (or relational, for that matter)? If you can’t think of one, there is a good reason for that: “relational” is a characteristic of the model, not the data. Thereby, the only definition we can give would be “The data you can’t represent using the relational model.” But is there anything in the definition of the relational model that limits it to certain types of data? The answer is no, so the best way to interpret the term “non-relational data” would be “the data which can’t benefit from being stored in a relational database.” Most often, it would be documents, images, and other blobs.
If we never need to search inside the document, in other words, we never expect to use any full-text search; in my opinion, there is no reason to store these documents in a database. Like many others, I can recall several cases like the one mentioned by Pat Wright in his blog post. The only thing we need to store in the database is the pointer to the place where the actual document is stored. There is no benefit in storing it in the database.
However, it’s a different story when we need to perform a full-text search. Knowing that PostgreSQL’s full-text search capabilities are not the best tools available on the market, I would always try to justify the necessity of that feature. In many cases, after talking to the end users, I would find out that, in reality, the required search touches a limited number of fields/tables and can be transformed into a dynamically constructed query supported by b-tree indexes. (And we know that nothing can perform better than b-tree indexes!)
Finally – what if we truly need to implement th
[...]
When you’re in the business of selling software to people, you tend to get a few chances to migrate data from their legacy software to your shiny new system. Most recently for me that has involved public health data exported from legacy disease surveillance systems into PostgreSQL databases for use by the open source EpiTrax system and its companion EMSA.
We have collected a few tips that may help you learn from our successes, as well as our mistakesparticularly educational experiences.
Your job is to satisfy your customers, and your customers want to know how the migration is progressing. Give them an answer, even if it’s just a generalization. This may be a burndown chart, a calculated percentage, a nifty graphic, or whatever, but something your project managers can show to their managers, to know more or less how far along things are.
Your job is also to know your system; that’s not the customer’s job. They shouldn’t have to get their data into a specific format for you to make use of it. Be as flexible as possible in the data format and structure you’ll accept. In theory, so long as your customer can provide the legacy data in a machine-readable format, you should be able to use it. Let them focus on getting the data out of their legacy system — which is sometimes quite an effort in itself! Real data is almost always messy data, and your customer will probably want to take the opportunity to clean things up; make it as easy as possible for them to do that, while still ensuring the migration proceeds quickly and smoothly.
Be careful about the vocabulary you use with your customer. Your system and the legacy system probably deal with the same kinds of data, and do generally the same kinds of things. Of course, your software does it better than the old ‘n’ busted mess you’re replacing, but in order to be better, your software has to be different from what it’s replacing. Different software means different methods, different processes, and different concepts. You an
[...]The PGSQL Phriday topic this month is non-relational data, proposed by the excellent Ryan Lambert of Rustproof Labs. At first, I thought this might be yet another topic I can’t contribute much to… but then I remembered we do now store some some blobs in Postgres!
When we added the ability to publish query plans, back in May 2021, we needed a way to store them server-side. We considered our options, but went with the most logical place given our infrastructure — a bucket in Google Cloud.
More recently, when we added the ability to save plans, we had a similar, but not identical, set of requirements — and this time we opted to save the plans in the database. At first, this may seem to go against standard advice, which I thought might make it interesting!
By default, plans submitted to pgMustard are only stored in local browser storage, and are replaced over time. With saved plans, the aim was to allow people to select certain plans to keep longer term, or access them from multiple browsers, without having to publish them openly.
We could easily have used a bucket for these plans, and retrieved them by id (like we do for published plans). However, a few factors led us to decide to buck the trend:
Fewer moving parts — lots of folks were requesting to self-host pgMustard. While we don’t yet offer this, having fewer services needed for a self-hosted service felt wise. And naturally, our customers will be familiar with Postgres. 😎
Our database is over-provisioned — we deliberately store as little data as possible, for both privacy and security reasons. Despite being on one of the smallest instances Google Cloud SQL offer, our largest table has fewer than 5,000 rows so we have a LOT of room to use the database for more things.
We didn’t anticipate much usage — while saving plans is very important some of the time, we didn’t want to save them by default (for the same privacy and security
This article will contain spoilers both on how I solved 2022 Day 15's challenge "Beacon Exclusion Zone" using SQL, as well as general ideas on how to approach the problem. I recommend trying to solve it yourself first, using your favorite language. This article is delayed from the actual puzzle's release. Also note that my solutions are seldom going to be the "best" solutions - they were solved as quickly as possible, and these articles will show my first solutions, with some minor reformatting and cleaning up.
We've also loaded a tutorial for Day 15's challenge if you want to try it with a pre-loaded data set.
For this day's challenge, we are presented with an input file that contains the locations of many sensors, along with the closest beacon to that sensor. The first goal is to find out how many beacons CANNOT be in a certain row. The distance from the sensor to the beacon is measured in taxi distance also known as the snake distance or the Manhattan distance. See the Wikipedia entry if you are unfamiliar with it, as it will help to understand this problem. The file consists of lines like these:
Sensor at x=1555825, y=18926: closest beacon is at x=1498426, y=-85030
Sensor at x=697941, y=3552290: closest beacon is at x=595451, y=3788543
Sensor at x=3997971, y=2461001: closest beacon is at x=3951198, y=2418718
As always, we are going to employ a handful of Postgres / SQL tools to help us solve this, including:
For developers who have been around SQL long enough, they know that there is often more than one way to get the same result. Today I wanted to look at a few different ways to aggregate and show a total with paid and unpaid status. First, we will use a common table express (CTE), which is a nice method for organizing subqueries. Second, we use CASE
statements within aggregation context to filter out the values we want. Lastly, we use FILTER
to clean up the syntax, but effectively do the same as the CASE
statement.
We've loaded a sample data set and a hands on tutorial in our browser based Postgres playground.
The report we're going to work to generate is a monthly report of revenue from an invoices table. We'll want our end report to look something like:
mnth | billed | uncollected | collected
------------+---------+-------------+----------
2023-02-01 | 1498.06 | 1498.06 | 0
2023-01-01 | 2993.95 | 1483.04 | 1510.91
2022-12-01 | 1413.17 | 382.84 | 1030.33
2022-11-01 | 1378.18 | 197.52 | 1180.66
2022-10-01 | 1342.91 | 185.03 | 1157.88
2022-09-01 | 1299.90 | 88.01 | 1211.89
2022-08-01 | 1261.97 | 85.29 | 1176.68
First, let’s look at the underlying data. Show the details of the invoices table:
Table "public.invoices"
Column | Type | Collation | Nullable | Default
--------------------+--------------------------------+-----------+----------+--------------------------------------
id | bigint | | not null | nextval('invoices_id_seq'::regclass)
account_id | integer | | |
net_total_in_cents | integer | | |
invoice_period | daterange | | |
status | text | |
[...]
As migrations to CloudSQL and AlloyDB pick up speed, inevitably you will run into a condition where the cloud tooling has not quite caught up with exposing custom alerts and incidents that you may be exposing on-premises with tools such as Nagios or Oracle Enterprise Manager. One such example is monitoring of replication tools such as the GoldenGate Heartbeat table. While there are many ways that you may be able to implement this, I wanted to demonstrate a way to leverage Google Cloud Logging + Google Cloud Monitoring. Using this method will allow us to keep a long term log of certain parameters like lag or anything else you have built into the heartbeat mechanism. To demonstrate, lets use Python to query the database and create a Cloud Logging Entry:
import argparse from datetime import datetime, timedelta from sqlalchemy import create_engine, text from google.cloud import logging def retrievePgAlert( username: str, password: str, hostname: str, portNumber: int, databaseName: str, alertType: str, ) -> None: alertList: list = [] conn_string = f"postgresql+psycopg2://{username}:{password}@{hostname}:{portNumber}/{databaseName}?client_encoding=utf8" engine = create_engine(conn_string) with engine.connect() as con: if alertType == "ogg-lag": sqlQuery = text( f"select replicat, effective_date, lag from ogg.heartbeat where lag >=:lagAmt and effective_date >= now() - interval ':intervalAmt min'" ) result = con.execute( sqlQuery, {"lagAmt": oggLagAmt, "intervalAmt": checkIntervalMinutes} ).fetchall() for row in result: alertList.append(row) if not alertList: print(f"No alerts as of {datetime.now().strftime('%m/%d/%Y %H:%M:%S')}") else: for alertText in alertList: print( f"Replicat: {alertText[0]} at date {alertText[1]} has a total lag of: {alertText[2]}[...]
I was the host for this month's #PGSQLPhriday topic (#005), and decided on the topic question: Is your data relational? This is my submission on the topic, and how I use Postgres for Relational Plus usages.
Q: What non-relational data do you store in Postgres and how do you use it?
PostGIS is the most prominent non-relational data I am involved with. Pretty much all of the PostGIS data I am involved with is rooted alongside solidly relational data. Want to know demographics of customers within 10 miles of a specific location? The location portion is spatial, the demographic data is relational.
As of PostGIS 3.0, the PostGIS raster support is no longer part of the postgis
extension, but instead spun off into a PostGIS extension called postgis_raster
.
Upgrading raster from 2.* to 3.* covers the proper way of upgrading your install from PostGIS 2+ to PostGIS 3+.
Unfortunately many people in a heat of panic when seeing the notice "UNPACKAGED" in their check of
SELECT postgis_full_extension()
tried to do something like drop raster type with cascade. Dropping the raster type DOES NOT remove all the raster functions, but DOES destroy all your raster data if you had any. Do not do that. But if you did happen to do that, this solution is for you.
The only solution after breaking your raster install is to remove the remaining bits of the postgis raster functionality. At that point, you can reinstall using CREATE EXENSION postgis_raster
, or not bother if you don't need raster support.
PostgreSQL supports constraints much like any other database management system. When you need to ensure certain behaviors of the data, you can put these constraints to work. I’ve already used several of these in creating my sample database (available articles publicly on GitHub, in the CreateDatabase.sql
file). I’ll explain those as we go. The constraints supported by PostgreSQL are:
One way we can ensure that our data is correct is by requiring it in the first place. If you create a table like this:
create table public.example (ID int, SomeValue varchar(50));
Because you didn’t define whether or not the columns can, or cannot, accept NULL
values, then these will accept NULL
values. The thing about NULL
is that it’s really useful. Sometimes, people simply don’t have the information to give you, so allowing NULL
is a good solution. Other times though, you absolutely need that value to be added to the rows of your table.
Let’s take the radio.radios
table from the hamshackradio
database as an example:
CREATE TABLE IF NOT EXISTS radio.radios (radio_id int CONSTRAINT pkradios PRIMARY KEY GENERATED ALWAYS AS IDENTITY, radio_name varchar(100) NOT NULL, manufacturer_id int NOT NULL, picture BYTEA NULL, connectortype_id int NOT NULL, digitalmode_id int NULL );
Here I’ve used the clause NOT NULL
on multiple columns to ensure that they must have information provided. In short, a radio must have a manufacturer (through the manufacturer_id
column) and a name (through the radio_name
column). Defining them with NOT NULL
ensures that they must be filled out. You’ll also note that I use the NULL
definition on the picture column. I could simply let the default take charge. For clarity, and to show intention, I use NULL
instead.
So now, if I tried to insert data into the radio.radios
t
Postgres is a great database with a ton of features including really rich indexing. Postgres itself maintains all sorts of data under the covers about things like cache hits and misses, when indexes are and aren't used, and more. If you're staring at a complex explain plan you may think some well targeted indexes may help, but how can you be sure? Enter HypoPG, a Postgres extension for adding hypothetical indexes to Postgres so you can do index planning.
HypoPG supports hypothetical indexes:
First load up some sample data. We're going to leverage the same dataset from our Postgres tutorial on indexing with b-trees.
And take a look at our first explain plan:
EXPLAIN ANALYZE
SELECT *
FROM weather
WHERE event_type = 'Winter Storm';
QUERY PLAN
-----------------------------------------------------------------------------
Seq Scan on weather (cost=0.00..115.00 rows=11 width=622) (actual time=0.007..0.295 rows=11 loops=1)
Filter: ((event_type)::text = 'Winter Storm'::text)
Rows Removed by Filter: 1189
Planning Time: 0.332 ms
Execution Time: 0.423 ms
(5 rows)
Here we can see it's using a sequential scan. One option is to go ahead and create an index, but how do we know it'll work? How do we know Postgres will actually use the index? HypoPG will show whether the hypothetical index would be used or not.
Create the extension for HypoPG
CREATE EXTENSION hypopg;
Create a select with the create index statement inside the parenthesis.
SELECT hypopg_create_index('CREATE INDEX idx_weather_type ON weather(event_type);');
We can see that our hypothetical index is now used <13731>btree_weather_event_type.
hypopg_create_index
-----------------------------------------
(13732,<13732>btree_weather_event_type)
(1 row)
Time: 91.903 ms
Now we run EXPLAIN
to see if it will use the hypothetical index
EXPLAIN SELECT * FROM weather WHERE event_type = 'Winter Storm';
🎉
[...]Database security is crucial to protecting sensitive information stored in a database. A critical aspect is authentication, which refers to verifying the identity of a user trying to access the database. The authentication process is typically accomplished through usernames and passwords and includes other factors such as security tokens and one-time passwords. Database Administrators (DBAs) must implement robust authentication mechanisms to ensure that only authorized users can access the database and its data. Additionally, it is best practice to monitor regularly and audit database authentication logs to detect and prevent unauthorized access attempts.
PostgreSQL server authentication refers to verifying the identity of a user trying to connect to the PostgreSQL server. This can be done using various methods, including password-based authentication, a client certificate, and external authentication methods such as GSSAPI, LDAP, and RADIUS. The pg_hba.conf file is used to configure the authentication methods the PostgreSQL server uses and can specify different methods for different types of connections. By default, PostgreSQL uses trust-based authentication, where any connection from localhost is considered trusted and doesn’t require a password. However, it’s important to note that it’s recommended to use password-based or external authentication methods for remote connections and production environments to ensure secure access to the data.
When talking about security, authentication is the first line of defense. PostgreSQL provides various methods of authentication, which are categorized into three categories.
PostgreSQL internal authentication and OS-based authentication have already been discussed in previous blogs. Now it’s time to discuss external authentication methods.
PostgreSQL supports a variety of external au
[...]These days, the typical PostgreSQL cluster consists not only of a multi-node replication cluster, which is ordinarily an asynchronous streaming replication model but can sometimes include a logical replication component.
Recall that logical replication works by using the PUB/SUB mode, where individual tables are published and are then subscribed by remotely connected databases/tables. DML operations, such as INSERT, UPDATE, DELETE, TRUNCATE, etc., are then replicated from one or more publishers to one or more subscribers.
For the most part it’s a pretty straightforward operation. The only real limitation is that one can only publish tables from a read-write server such as the PRIMARY.
There’s a problem, however, when it comes to executing failovers. While a STANDBY/REPLICA host can quickly take over with a simple select pg_promote() SQL statement, the same cannot be said regarding logical replication. Instead, failing over is, in fact, not a single action but a collection of discrete steps that must be performed in an exact and precise manner.
Consider the following steps typifying a failover:
The challenge, of course, is reconstituting the logical replication process as quickly as possible, ideally without any data loss.
Two common solutions come to mind:
Last week a user noted on the postgis-users list (paraphrase):
I upgraded from PostGIS 2.5 to 3.3 and now the results of my coordinate transforms are wrong. There is a vertical shift between the systems I’m using, but my vertical coordinates are unchanged.
Hmmm.
PostGIS gets all its coordinate reprojection smarts from the proj library. The user’s query looked like this:
SELECT ST_AsText(
ST_Transform('SRID=7405;POINT(545068 258591 8.51)'::geometry,
4979
));
“We just use proj” is a lot less certain and stable an assertion than it appears on the surface. In fact, PostGIS “just uses proj” for proj versions from 4.9 all the way up to 9.2, and there has been a lot of change to the proj library over that sweep of releases.
The first thing to do in debugging this “PostGIS problem” was to establish if it was in fact a PostGIS problem, or a problem in proj. There are commandline tools in proj to query what pipelines the system will use for a transform, and what the effect on coordinates will be, so I can take PostGIS right out of the picture.
We can run the query on the commandline:
echo 545068 258591 8.51 | cs2cs 'EPSG:7405' 'EPSG:4979'
Which returns:
52d12'23.241"N 0d7'17.603"E 8.510
So directly using proj we are seeing the same problem as in PostGIS SQL: no change in the vertical dimension, it goes in at 8.51 and comes out at 8.51. So the problem is not PostGIS, is is proj.
Cartographic transformations are nice deterministic functions, they take in a longitude and latitude
[...]PostgreSQL 15 builds upon the enhancements of earlier versions and takes them to the next level, with several improvements to logical replication, increased sorting performance, and the introduction of the MERGE command. I would like to talk about these features as well as what is under discussion for PostgreSQL 16.
Custom data types is one of those many features that makes PostgreSQL flexible for development of a huge variety of business and application use cases. Data types will help you primarily for data integrity, ensuring your data is stored in the database as you want it to be. A common surprise gift of using strict data types is that you can end up reducing your long term data maintenance.
There’s two main ways to customize data types in Postgres:
Before you go crazy with custom data types, make sure that one of the 43 existing data types won’t work for you ;). There’s so many data types and existing extensions, most people will never even need to touch custom data types. That doesn't mean it's not fun to learn about!
My wonderful colleague, Jean-Paul Agudo wrote two hands-on tutorials for these topics, one for DOMAINS and the other for User Defined Custom Data Types.
DOMAINs let you create a specific value check. For example if I want to make sure my birthdays are all greater than Jan 1st, 1930 and my emails are valid, I could create this:
CREATE DOMAIN date_of_birth AS date
CHECK (value > '1930-01-01'::date)
;
CREATE DOMAIN valid_email AS text
NOT NULL
CHECK (value ~* '^[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+[.][A-Za-z]+$')
;
When creating the table, assign the date_of_birth
and valid_email
data types to their respective columns. Now, if a database has multiple fields representing data of birth or valid email, the logic for those fields is portable to those additional columns.
CREATE TABLE person_using_domains
(
id INTEGER GENERATED always AS IDENTITY PRIMARY KEY,
firstname TEXT NOT NULL,
lastname TEXT NOT NULL,
birth_date DATE_OF_BIRTH,
email VALID_EMAIL
);
When using psql
, all domains will be listed by running \dD
.
So you
[...]PostgreSQL is a fantastic database, but if you’re storing images, video, audio files, or other large data objects, you need to “toast” them to get optimal performance. This post will look at using The Oversized-Attribute Storage Technique (TOAST) to improve performance and scalability.
PostgreSQL uses fixed-sized pages, which makes it challenging to store huge data values directly. To solve this problem, large data values are compressed and broken into multiple smaller chunks. This process is done automatically and does not significantly impact how the database is used. This technique, called TOAST, improves how large data values are stored and used in the database.
Therefore, TOAST is a storage technique used in PostgreSQL to handle large data objects such as images, videos, and audio files. The TOAST technique allows for the efficient storage of large data objects by breaking them into smaller chunks and storing them separately from the main table. This can improve the performance of queries and indexing and reduce the amount of disk space required to store the data.
TOAST tables are created automatically by PostgreSQL when a table contains a column of type OID, bytea, or any other data type with the TOASTable storage class. The TOAST table is then used to store the large data objects, while the main table stores a reference to the TOAST table.
Here’s an example of using the TOAST technique in PostgreSQL:
CREATE TABLE images ( id SERIAL PRIMARY KEY, data BYTEA );
INSERT INTO images (data) VALUES (E'\\x...');
SELECT relname, relkind FROM pg_class WHERE relname LIKE 'pg_toast%';
In the example, the images table contains a column named data of type bytea, which can hold extensive binary data. When a large image is inserted into the table, PostgreSQL automatically creates a TOAST table to store the ima
[...]Database migrations from Oracle to PostgreSQL and SQL Server to PostgreSQL can get tricky when certain differences are not known. Unknown differences can lead to migration failures. We have earlier blogged about the behavior of DATE data type and Transaction Control statements between Oracle and SQL Server and PostgreSQL. In this article, we are going to discuss about the behavior of NULL and empty string in Oracle vs PostgreSQL vs SQL Server with Unique Constraint. This is the first article in our series about handling NULLS during migration from Oracle to PostgreSQL and SQL Server to PostgreSQL.
According to ANSI SQL-92 specifications, a null is neither an empty string (for a character or a datetime data type) nor a value of zero (for numeric data types). To ensure that all nulls are handled uniformly, the ANSI SQL-92 specification stipulates that a null must be the same for all data types. When attributes have no data to be stored for a given record, it is represented by the SQL value, NULL.
Oracle treats both NULL as well as an Empty String as a NULL. PostgreSQL treats a NULL as a NULL and an Empty String as an Empty String. Similar to PostgreSQL, SQL Server also treats a NULL as a NULL and an Empty String as an Empty String.
Oracle needs one byte to store a NULL. Whereas, both PostgreSQL and SQL Server won’t require any space to store a NULL.
In Oracle, a column on which there exists a unique constraint, can store any number of NULL entries and empty strings.
However, SQL Server only allows one NULL and one Empty String to be inserted to a column when there exists a Unique Constraint on it.
In PostgreSQL, multiple NULL entities similar to Oracle are allowed to be inserted. Similar to SQL Server, PostgreSQL only allows one Empty String when there exists a unique constraint on that column.
Subscribe to our Newsletters today, to get notified abo
[...]Register at the conference website, and take advantage of the Early Bird discount!
© Laurenz Albe 2022
Most people are aware of transaction ID wraparound. The concept has been well explained in Hans’ article, so I won’t repeat all that here. But for most people it is an abstract concept, a bogeyman lurking in the dark around the corner. Many people know horror stories about anti-wraparound autovacuum tanking performance or databases that stop working, but who has ever seen actual data loss? I decided to face the beast (having developed a fondness for breaking things), and I learned something along the way. If you are curious, come along on this trip! We’ll do some scary things that you shouldn’t do to your databases at home.
No, it isn’t. There are strong protections in PostgreSQL to prevent that. As you will see, I have to use evil tricks to overcome those protections. Still, these protections can disrupt operation, so it is good to be aware of the problem. But don’t panic: usually, you won’t even notice when your transaction IDs wrap around.
Since I will cause data corruption, I’ll create a new cluster that I can discard afterwards:
initdb testdb
Let’s start the cluster. I will use a free, non-standard port and allow a prepared transaction. That prepared transaction will later be used to stir trouble.
pg_ctl start -o '-c port=5555' \ -o '-c unix_socket_directories=/tmp' \ -o '-c max_prepared_transactions=1' \ -D testdb \ -l logfile
Now let’s connect with
psql -h /tmp -p 5555 -d postgres
Now we can create some data (that we will destroy later). I’ll return xmin
for each row I create, which stores the transaction ID of the creating transaction. Similarly, I’ll return xmax
for all deleting statements. For detailed information about xmax
, you can read my article on that topic.
It is very important that we don’t SELECT
from the table at this stage, since that would set hint bits, which would spoil the effect.
C[...]
I’m very pleased to be talking about real-life logical replication at Nordic PgDay 2023, in beautiful Stockholm.
If you are an application developer and you build on PostgreSQL, then maybe you have looked into where PostgreSQL comes from, who develops it, and where you can get professional help if needed.
Now, if you are a PostgreSQL developer (hi!), do you know what you are building on, where those things come from, who develops them, and where you get get professional help if needed?
Consider the dependency diagram of the internet:
PostgreSQL is perhaps one of the bigger boxes in the middle. But what are the shapes of the boxes below it?
Some are okay: Well-known projects with robust communities and ecosystems, such as Perl, Python, ICU, LLVM, systemd; those are big boxes. There is also OpenSSL, which in a well-publicized case was at point was one of those thin boxes, but that’s been rectified. There are lz4 and zstd, somewhat new arrivals in the dependencies of PostgreSQL, which appear to have active communities in their respective GitHub projects. There is also zlib, which is in maintaince mode but still putting out regular releases. This is all fine.
There is some stuff in the murky middle that I’m too lazy to look into. Who maintains libkrb5 and libldap and libpam these days? I guess these libraries are probably used widely enough that someone will care, and maybe at the end of the day with a support contract someone like Red Hat will be on the hook.
But there are also the rather thin sticks propping up PostgreSQL (and many other things):
The entire documentation toolchain (docbook-xml, docbook-xsl, libxml, libxslt, xsltproc) is in low to no maintenance mode. I discussed this in more detail in my presentation at FOSDEM 2021.
GNU Gettext is being maintained, but just barely. I would love some action there. Here is a feature request, for example. How can we help? I will talk about this in my presentation at FOSDEM 2023.
The TAP tests in PostgreSQL rely on the Perl module IPC::Run, which by the way is explicitly looking for a new
Ryan Lambert asks in this month's PGSQL Phriday:
Looking over the many databases we operate at work, I can say that we are mostly a "relational shop". Almost every database we have uses relational structures, and does not attempt to go in on "full NoSQL", or "store non-relational data". Many of the databases are on the larger size, often in the hundreds of GB or even several TBs. We have good people in our dev teams who understand both PostgreSQL and SQL, and can develop a proper schema for the data, and write performant queries.
There are a few exceptions though, but at least for us it's only a small number. Whenever we retrieve or get data which comes in from web requests, we may store this as JSON, and then we use JSONb. Quite often however it's rather "storage as JSON", not "querying the JSON in the database". It's just more convenient to keep the data in JSON format all the way, instead of transforming it several times.
Continue reading "Relational and Non-relational Data: PGSQL Phriday #005"
PGDay/MED is pleased to announce that the call for papers is open for our inaugural conference.
We are looking for talks on any PostgreSQL-related subject for the one-day conference. Got a big idea you want to share? A problem you have an innovative solution to? An advancement you want to put forward? We want to hear from you!
All talks should last about 40-45 minutes, with an additional 5-10 minutes for questions. Due to the international nature of our audience, we are requesting that talks be made in English for this conference.
Submit your talk for consideration here. We look forward to hearing from you!
PGDay/MED is a conference organized by PostgreSQL Europe. It follows the PostgreSQL Community Conference Recognition guidelines
Sometimes, you may need to manage large objects, i.e. CLOB, BLOB and BFILE, using PostgreSQL. There are two ways to deal with large objects in PostgreSQL: one is to use existing data type, i.e. bytea for binary large object, and text for character-based large object; another is to use pg_largeobject; This blog will explain how to use pg_largeobject.
pg_largeobject is one of the large object solutions provided by PostgreSQL, which allows stream-style access to user data that is stored in a special large-object structure. Streaming access is useful when working with data values that are too large to manipulate conveniently as a whole.
pg_largeobject is a system table used store the actual large objects. Each large object has an entry in system table pg_largeobject_metadata. The large objects managed using pg_largeobject can be created, modified, and deleted using a read/write API. pg_largeobject allows to store large object up to 4TB.
The other large object solution using existing data type bytea and text built on top of TOAST table, which limits the large object’s size to 1GB.
Here are the schema of these two system tables designed for pg_largeobject.
postgres=# \d+ pg_largeobject;
Table "pg_catalog.pg_largeobject"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
loid | oid | | not null | | plain | | |
pageno | integer | | not null | | plain | | |
data | bytea | | not null | | extended | | |
Indexes:
"pg_largeobject_loid_pn_index" PRIMARY KEY, btree (loid, pageno)
Access method: heap
postgres=# \d+ pg_largeobject_metadata;
Table "pg_catalog.pg_largeobject_metadata"
Column |
[...]
Number of posts in the past two months
Number of posts in the past two months
Get in touch with the Planet PostgreSQL administrators at planet at postgresql.org.