PostgreSQL
The world's most advanced open source database
This idea came to me today from swills: stupid question, would it be possible to make the urls on freshports case insensitive? so that like https://www.freshports.org/x11/libx11 would go to https://www.freshports.org/x11/libX11 After confirming this was for manually typed URLs (as opposed to generated links), I started looking into it. This is what I found. In this [...]

There you are writing some SQL, having a great time. Uh oh, you need to iterate over each item in a result set and apply a function. You think, "Now I am going to have to write a stored procedure." Well, today's post will give you an alternative by using lateral joins in Postgres.

 Lateral pass.... See what I did there?

What is a Lateral Join?

You are probably familiar with normal database joins, which are usually used to match up a column in one table with a column in another table to bring the data from both tables together. When the keyword LATERAL is added to your join, the right-hand side of the join is applied to every record produced by the left-hand side, and the right-hand side can reference columns from that record.
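A minimal sketch of both uses (the customers/orders tables and the set-returning function are hypothetical, purely for illustration):

-- For each customer, fetch the three most recent orders;
-- the subquery can reference c.id because of LATERAL.
SELECT c.name, o.order_date, o.total
FROM customers c
CROSS JOIN LATERAL (
    SELECT order_date, total
    FROM orders
    WHERE orders.customer_id = c.id
    ORDER BY order_date DESC
    LIMIT 3
) o;

-- The same idea, applying a (hypothetical) set-returning function per row:
SELECT t.id, f.*
FROM some_table t
CROSS JOIN LATERAL expand_item(t.id) f;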

Posted by Pavel Stehule on 2020-08-11 at 11:47

I like terminal applications - usually TUI applications are fast, clean, and very readable. The interface is simple, so developers have to think much more about the UI and have fewer possibilities. On the other hand, the output of terminal applications is sometimes too limited. We usually run our terminals from a GUI, so some graphics capabilities can be practical - graphs are better in pixel graphics than in ASCII art.

There is a solution - from the terminal dark ages there is the SIXEL protocol. Unfortunately, only a few terminals support this protocol. But there is great hope - the development version of GNOME Terminal now supports it. I can call gnuplot from psql and have good-looking graphs in psql too.

For more comfortable work I define a psql macro (variable):

\set gnuplot '\\g (format=csv) |gnuplot -p -e "set datafile separator \',\'; set key autotitle columnhead; set terminal sixelgd enhanced font \'verdana,9\';" -e'

and a statement

SELECT i, sin(i) FROM generate_series(0, 6.3, 0.05) g(i) :gnuplot "plot '-'with lines ls 3"

generates this image:

[sine curve plotted by gnuplot and rendered inline in the terminal via SIXEL]

Pgpool-II is probably the most comprehensive clustering solution existing today for PostgreSQL. It provides a wide range of features like connection pooling, load balancing, automatic failover and high availability. Using Pgpool-II for load balancing and building a highly available PostgreSQL cluster is one of its most common use cases.

Since Pgpool-II is a proxy server that sits between PostgreSQL and client applications, building an HA setup with Pgpool-II requires not only redundant PostgreSQL servers (primary and standby) but also multiple Pgpool-II servers, so that if one Pgpool-II fails another can take over and the database service remains unaffected.

To solve the SPOF of the Pgpool-II service itself, Pgpool-II has a built-in watchdog module. Although the core function of this module is to resolve the single point of failure of the Pgpool-II node by coordinating multiple Pgpool-II nodes, it can also complement Pgpool-II's automatic failover and backend health checking.

This post is about the quorum-aware and consensus-based backend failover functionality of the Pgpool-II watchdog.
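As a rough sketch, these are the pgpool.conf settings involved (values are illustrative; check the parameter names against the Pgpool-II documentation for your version):

use_watchdog = on
failover_when_quorum_exists = on              # only fail over when the watchdog cluster has quorum
failover_require_consensus = on               # require a majority of Pgpool-II nodes to agree that a backend is down
allow_multiple_failover_requests_from_node = off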

Failover is an expensive operation

Failover is a mechanism in Pgpool-II to remove problematic PostgreSQL servers from the cluster and, in case of a primary database failure, automatically promote the standby PostgreSQL to the new primary, hence ensuring service continuation and avoiding any disruption.

Although failover is a basic functionality required to implement high availability, it must be avoided as much as possible. The reason is that no matter how small the replication delay between primary and standby, there is always a chance of losing some recent data when the primary PostgreSQL fails over. We also always lose the current sessions and the transactions running at the time of failover. On top of that, after the failover the cluster is left with one less database node and in most cases requires manual intervention

[...]

Almost every organisation that I interact with wants a high availability system for PostgreSQL. This clearly reflects a growing trend toward using PostgreSQL for critical business applications. In some cases it is a move away from other major database systems like Oracle or even Teradata.

It’s not possible to cover this topic in a single blog. So expect this to be a series of many blogs, perhaps five or six. Specifically in this blog we are going to define the basic criteria for a high availability system, and have a quick glance at the current open source solutions available in the market.

Before we jump on the high availability bandwagon, let’s establish the definitions that are essential in creating one.

For the sake of clarity, we'll use uptime and service availability interchangeably, with both meaning that the PostgreSQL service is up and running, connectable, and can respond to queries as expected.

Why High Availability?

The answer to this question is very straightforward. A "High Availability" system ensures service continuity. In an ideal world, we'd want 100% service uptime. In a slightly less ideal and more realistic one, we aim for the nines: three, four or five 9s.

High availability is not just about service continuity, it is also about the system's ability to scale and manage workload. The system must be able to handle average and peak workloads.

Then it is also about recovery from a failure. A system must be designed so that it specifically meets Recovery Point Objective (RPO) and Recovery Time Objective (RTO) criteria. The smaller the value of these two objectives, the smaller the data loss and quicker the recovery from a failure.

Defining High Availability

There are key attributes that define a high availability system. It is important that we define these attributes before assessing these solutions.

  • There should be no single points of failure in the system.
  • Continuous health monitoring of backend servers/systems.
  • Reliable failover
[...]
Posted by Bruce Momjian in EnterpriseDB on 2020-08-10 at 14:00

We have all used applications that allow searches which return results in sections or "pages", like for products or flights. For example, you might search for a product, and there might be 1,500 matches, but you don't see 1,500 results — you only see ten results (1-10), and you have the option of seeing the next ten (11-20), and the next ten (21-30), etc. Pagination is done for several reasons:

  1. The user probably doesn't want to see all the results at once
  2. The user probably doesn't want to ever see all of the results
  3. It might be inefficient to transfer and display all results

There are several ways to enable pagination using a client/server architecture like databases. One approach is to transfer all results to the client, and let the client do the pagination. That handles the first issue, but it doesn't handle cases where the transfer and storage of many results is inefficient. While this might work for 1,500 results, it is going to perform badly for 150 million results.

Therefore, some kind of pagination support on the server side is required if large result sets are possible. There are four common approaches:

  1. Cursors: Open a non-WITH HOLD cursor in a transaction block and fetch rows in pages. The downside of this is that the entire result set often has to be computed before returning any result. Also, the transaction must be kept open while users are paging through the results, leading to potentially long-running transactions, and therefore decreased cleanup efficiency. Using idle_in_transaction_session_timeout prevents sessions from keeping transactions open too long. (A short sketch of this approach and of LIMIT/OFFSET appears after this list.)
  2. Using WITH HOLD cursors avoids keeping a transaction open during page fetches, but it does require the entire result set to be stored in server memory.
  3. LIMIT/OFFSET: These keywords allow SELECT to return partial results, ideally suited to pagination. LIMIT/OFFSET also allow the optimizer (slide 52) to efficiently access limited result sets by often using indexes which are only efficient for small r
[...]
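A minimal, hypothetical sketch of approaches 1 and 3 above (table name and page size are invented for the example):

-- Approach 3: LIMIT/OFFSET, fetching the third page of 10 results
SELECT * FROM products ORDER BY id LIMIT 10 OFFSET 20;

-- Approach 1: a non-WITH HOLD cursor, fetched one page at a time inside a transaction
BEGIN;
DECLARE page_cur CURSOR FOR SELECT * FROM products ORDER BY id;
FETCH 10 FROM page_cur;   -- page 1
FETCH 10 FROM page_cur;   -- page 2
CLOSE page_cur;
COMMIT;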
Posted by Andreas 'ads' Scherbaum on 2020-08-10 at 14:00
PostgreSQL Person of the Week Interview with Michael Brewer: I live in Athens, Georgia (USA), the city where I was born (and have lived in or near most of my life). I’m a Web Developer Principal for the Franklin College of Arts and Sciences at The University of Georgia, my alma mater. I also play bass trombone in the Athens Symphony, am principal conductor of the Classic City Band (Georgia’s oldest continuously-operating community band), Director of Music at Emmanuel Episcopal Church, music director of the Athens Brass Choir, and also conduct a new local opera company, RespirOpera.

The table partitioning feature in PostgreSQL has come a long way since the declarative partitioning syntax was added to PostgreSQL 10 by Amit Langote. Partitioning was first added to PG 8.1 by Simon Riggs; it was based on the concept of table inheritance, using constraint exclusion to exclude inherited tables (not needed) from a query scan. The exclusion constraint basically prunes tables based on the query filter. Before declarative partitioning https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=f0e44751d7175f was added to PG, the child partitions, constraints, triggers etc. needed to be created manually, which could be cumbersome and lead to errors. Thanks to the great work of Amit Langote and others, the declarative partitioning feature in PG 10 means the user doesn't need to create the partitions manually or create the constraints and triggers for routing rows to the correct partition. All the artefacts required for setting up partitioning are created by simply creating the partitioned table and specifying the partitions using a standard syntax; this was a great step forward and is what makes the feature very user friendly.
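As a minimal illustration of the declarative syntax (table name and bounds are made up for the example):

CREATE TABLE measurement (
    city_id  int  NOT NULL,
    logdate  date NOT NULL,
    peaktemp int
) PARTITION BY RANGE (logdate);

CREATE TABLE measurement_2020 PARTITION OF measurement
    FOR VALUES FROM ('2020-01-01') TO ('2021-01-01');

-- Rows inserted through the parent are routed to the matching partition automatically.
INSERT INTO measurement VALUES (1, '2020-08-15', 31);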

This blog is about a number of enhancements for partitioning added to PG 13. I am going to list all the partitioning enhancements in the blog and will demonstrate some of them with examples.

Overview

A quick introduction to partitioning and the timeline of partitioning features added to PG before we get into the enhancements done in PG 13. Partitioning is a way of splitting or dividing a large table into smaller pieces; the table can be divided using the List, Range or Hash partitioning techniques offered by PG. The parent partitioned table doesn't store any data; the data is stored in the partitions defined when creating the partitioned table. While the partitions can be accessed directly, queries are typically directed at the logical parent relation and the tuples are routed to the correct partition

[...]

jsonb supports, unsurprisingly, JSON

jsonb is, undeniably, king. It is a very flexible data type, that allows for unstructured/schema-less storage. It has very powerful indexing mechanisms, and its internal representation is reasonably compact and efficient. It comes with advanced operators and expressions to query/extract parts of it, and has recently seen the addition of SQL/JSON path functionality to extend that to comply with the SQL Standard.
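For instance, the SQL/JSON path functions let you filter inside a document (a trivial, made-up document here):

SELECT jsonb_path_query(
    '{"readings": [1, 5, 9]}'::jsonb,
    '$.readings[*] ? (@ > 3)'
);
-- returns the matching elements 5 and 9, one per row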

There are countless posts and resources explaining the virtues of this data type and what you can accomplish with it. So nothing else to add, except to reiterate my appreciation for all those who have worked on creating and contributing to it.

But there’s a catch. jsonb supports… just JSON! That’s great, but is it enough?

JSON data types

JSON is a container data type, which means it can store other data types contained within. And which data types does it support? From the JSON spec, it supports the following:

  • Number, which is a quite flexible “numeric” data type.
  • String.
  • Boolean.
  • Null.
  • Array, or probably better called a “bag” or “container”, a sequence of elements of, possibly, mixed types.
  • Object, a collection of key-value pairs, where the value may be any other JSON data type.

jsonb maps these JSON data types, internally, to Postgres types. It is easy to see the resolved (JSON) data types:

select key, value, jsonb_typeof(value) from jsonb_each(
'{"a": 42, "b": true, "c": "hi", "d": null, "e": [42, "hi"], "f": {"a": 1}}'
);
┌─────┬────────────┬──────────────┐
│ key │   value    │ jsonb_typeof │
├─────┼────────────┼──────────────┤
│ a   │ 42         │ number       │
│ b   │ true       │ boolean      │
│ c   │ "hi"       │ string       │
│ d   │ null       │ null         │
│ e   │ [42, "hi"] │ array        │
│ f   │ {"a": 1}   │ object       │
└─────┴────────────┴──────────────┘

Scalar types are mapped to boolean, text and numeric. Essentially, three different data types. Now check all the data types a

[...]
Posted by John Porvaznik in Crunchy Data on 2020-08-06 at 17:01

Postgres has a number of different index types. You’ve got B-Tree, GIN, GiST, Sp-GiST, and BRIN. BRIN is a lightweight index that often goes misunderstood. When implemented correctly, it offers significant benefits such as space savings and speed. However, when implemented incorrectly, it loses some of its benefit, so it's important to look at some of the reasons a BRIN index might not be right for you.
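As a minimal, hypothetical example of the sweet spot - a large, append-only table whose rows arrive roughly in timestamp order - a BRIN index on the ordering column is tiny compared to a B-Tree on the same column:

CREATE INDEX readings_recorded_at_brin
    ON sensor_readings USING brin (recorded_at);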

Daniel Gustafsson has done some terrific work on using NSS as an alternative TLS library to OpenSSL for PostgreSQL. I’ve done some work making that build and run on Windows. Daniel recently asked how to get a working NSS on Windows to use for development, and this blog is about that process. First you need […]
Posted by Bruce Momjian in EnterpriseDB on 2020-08-05 at 20:30

Have you ever wondered how invalid or ambiguous times are handled by Postgres? For example, during a daylight saving time transition in the USA, time switches either from 2AM to 3AM, or from 2AM back to 1AM. On a spring-forward day (the former), how would 2:30AM be represented? Is 1:30AM during a fall-back day represented as the first or second 1:30AM of that day? This email thread explains the problem, and this patch documents the behavior. (November 4, 2018 was a "fall back" date in the USA.)

The original poster, Michael Davidson, was showing the first query, and Tom Lane was saying you would need to use one of the two later queries to qualify the 1AM time:

SHOW TIME ZONE;
     TimeZone
------------------
 America/New_York
 
SELECT '2018-11-04 01:00:00'::timestamp WITH TIME ZONE;
      timestamptz
------------------------
 2018-11-04 01:00:00-05
 
SELECT '2018-11-04 01:00:00 EST'::timestamp WITH TIME ZONE;
      timestamptz
------------------------
 2018-11-04 01:00:00-05
 
SELECT '2018-11-04 01:00:00 EDT'::timestamp WITH TIME ZONE;
      timestamptz
------------------------
 2018-11-04 01:00:00-04

Continue Reading »

Posted by Pavel Stehule on 2020-08-05 at 09:39
The extension plpgsql_check can be used as a linter and validator for the PL/pgSQL language. It can be used as a profiler, with the possibility to calculate some basic coverage metrics. And now it can be used for code tracing. In this mode, plpgsql_check raises a notice when a function or statement starts or finishes. Example of output at the default verbosity level:

postgres=# do $$ begin perform fx(10,null, 'now', e'stěhule'); end; $$;
NOTICE: #0 ->> start of block inline_code_block (oid=0)
NOTICE: #2 ->> start of function fx(integer,integer,date,text) (oid=16405)
NOTICE: #2 call by inline_code_block line 1 at PERFORM
NOTICE: #2 "a" => '10', "b" => null, "c" => '2020-08-05', "d" => 'stěhule'
NOTICE: #4 ->> start of function fx(integer) (oid=16404)
NOTICE: #4 call by fx(integer,integer,date,text) line 1 at PERFORM
NOTICE: #4 "a" => '10'
NOTICE: #4 <<- end of function fx (elapsed time=0.336 ms)
NOTICE: #2 <<- end of function fx (elapsed time=0.631 ms)
NOTICE: #0 <<- end of block (elapsed time=0.978 ms)
DO
Example of output in verbose verbosity level:

postgres=# set plpgsql_check.tracer_verbosity TO verbose;
SET
postgres=# do $$ begin perform fx(10,null, 'now', e'stěhule'); end; $$;
NOTICE: #0 ->> start of block inline_code_block (oid=0)
NOTICE: #0.1 1 --> start of PERFORM (expr='fx(10,null, 'now', e'stěhule' ..')
NOTICE: #2 ->> start of function fx(integer,integer,date,text) (oid=16405)
NOTICE: #2 call by inline_code_block line 1 at PERFORM
NOTICE: #2 "a" =>; '10', "b" => null, "c" => '2020-08-05', "d" => 'stěhule'
NOTICE: #2.1 1 --> start of PERFORM (expr='fx(a)')
NOTICE: #2.1 "a" => '10'
NOTICE: #4 ->> start of function fx(integer) (oid=16404)
NOTICE: #4 call by fx(integer,integer,date,text) line 1 at PERFORM
NOTICE: #4 "a" => '10'
NOTICE: #4.3 6 --> start of IF (cond='a > 10')
NOTICE: #4.3 "a" =>
[...]

An introductory story (with some slight “drama” added for fun) from those good old days of on-site consulting 🙂 So…I’m at a client where the database is not behaving nicely among other things…what a crappy DB product indeed I hear, it gets cranky every morning although there should be a constant workload and is saturating the CPU and making some important queries time out! Hmm very interesting I think, Postgres usually doesn’t tend to care too much about mornings or time in general…Okay, well let’s sit down and look at the metrics I say…oh, surprise-surprise – we discover that there’s no monitoring framework in place! Sounds very bad and rare, right? Well, quite common actually as many people rely only on the server log files…which I wouldn’t recommend doing personally. But not too tragic in this case actually – as luckily their DBA at least had read from somewhere that it’s good to install the pg_stat_statements extension when taking an instance into use – which is of course absolutely true and a highly recommended thing to do!

So I say – well, let’s just quickly install some monitoring tool and have a coffee and talk about other issues while the metrics, that will help us to get to the bottom of this, are being gathered. And then I hear that distinct audible “sigh” and they look at me with despair in their eyes – you know, we just can’t install some software like that…firstly there’s no Internet…and 2nd, it needs to go through managerial and security acceptance processes X,Y and Z and it would take a week at best. OK, doesn’t make life easier for sure…the poor DBA is also already feeling bad that their organisation is not exactly the candidate for the “Agile company of the year” award…But, then I ask…you know, we do have direct console access to the database still, right? Yes, we do! And then I can finally use my wise Yoda quote: Well great, this is pretty much the same as a monitoring tool – it’s all about the data anyways! The DBA starts to look at me in a very suspicious and puzzled way…what the hell a

[...]
Posted by Paul Ramsey in Crunchy Data on 2020-08-04 at 13:07

PostGIS is a "geospatial database" and the "geo" in "geospatial" is an important qualifier: it means that all the coordinates in PostGIS point, lines, and polygons can be located somewhere on the earth.

Posted by Bruce Momjian in EnterpriseDB on 2020-08-03 at 14:45

What is the difference between two dates? You would think there was one answer, but there isn't. You can give an answer in calendar terms (years/months/days), the number of days, or the number of seconds. Postgres offers all of these options:

SELECT age('2019-12-25', '2018-06-01');
          age
-----------------------
 1 year 6 mons 24 days
 
SELECT '2019-12-25'::timestamp - '2018-06-01'::timestamp;
 ?column?
----------
 572 days
 
SELECT '2019-12-25'::timestamptz - '2018-06-01'::timestamptz;
     ?column?
-------------------
 572 days 01:00:00
 
SELECT '2019-12-25'::date - '2018-06-01'::date;
 ?column?
----------
      572
 
SELECT EXTRACT(EPOCH FROM '2019-12-25'::timestamptz) - EXTRACT(EPOCH FROM '2018-06-01'::timestamptz);
 ?column?
----------
 49424400

Continue Reading »

Posted by Andreas 'ads' Scherbaum on 2020-08-03 at 14:00
PostgreSQL Person of the Week Interview with Tatsuo Ishii: I live in Kanagawa prefecture of Japan, which is adjacent to Tokyo, with my wife. I love to walk beaches, listen to music (classical, rock and Jazz) from my favorite audio set (mainly consisting of TANNOY speakers and Luxman amplifiers), and read Sci-Fi books. I am working for SRA OSS, Inc. Japan as a branch manager and an engineer since 2005.
Posted by Bruce Momjian in EnterpriseDB on 2020-07-31 at 16:30

The INTERVAL data type stores time duration as months, days, and seconds. Years are represented as a fixed number of months, and hours and minutes as a fixed number of seconds. Using INTERVAL values makes time computation very simple:

-- Daylight saving time started in America/New_York on March 8, 2020 02:00:00
SHOW timezone;
     TimeZone
------------------
 America/New_York
 
SELECT '2020-03-07 00:00:00'::timestamptz + '2 days';
        ?column?
------------------------
 2020-03-09 00:00:00-04
 
SELECT '2020-03-07 00:00:00'::timestamptz + '48 hours';
        ?column?
------------------------
 2020-03-09 01:00:00-04

Continue Reading »

Posted by Bo Peng in SRA OSS, Inc. on 2020-07-31 at 06:32

Pgpool-II is a cluster management tool for PostgreSQL that can cache connections to PostgreSQL servers. This blog introduces the Pgpool-II connection pooling feature and shows how to configure connection pooling in Pgpool-II.

What is connection pooling?

Establishing and maintaining database connections is expensive. The reason is that you have to establish a network connection, perform authentication and so on. Database connection pooling is a cache of database connections, kept open so that a connection can be reused when a future request requires one.

Reusing an active connection rather than establishing a new connection each time a connection is requested can improve performance and save system resources.

How does connection pooling work in Pgpool-II?

Pgpool-II caches established connections to the PostgreSQL servers and reuses them whenever a new request with the same properties (i.e. user name, database, protocol version, and other connection parameters if any) comes in.

First, let me explain how connection pooling works in Pgpool-II.
  1. At startup, the Pgpool-II parent process preforks num_init_children child processes, and each child process can cache up to max_pool connections (see the configuration sketch after this list).
  2. Pgpool-II waits for connection requests from clients.
  3. One Pgpool-II child process receives a connection request from a client.
  4. This child process looks in its pool (up to max_pool entries) for an existing connection with the requested database/user pair.
  5. If one is found, it is reused.
  6. If not found, the child process opens new connections to the PostgreSQL servers and registers them in the pool. If the pool has no empty slot, Pgpool-II closes the oldest connection to PostgreSQL and reuses that slot.
  7. Queries are processed until the client sends a session close request.
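A minimal pgpool.conf sketch of the parameters mentioned above (values are only illustrative):

connection_cache = on      # enable connection pooling
num_init_children = 32     # number of preforked Pgpool-II child processes
max_pool = 4               # connection pool slots per child process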
[...]
One of the interesting new features in PostgreSQL for some time now is the ability to control removal of WAL files using replication slots. The dark side is that replication slots can cause disks to fill up with old WAL, killing the main production server. In this article I explain PostgreSQL replication slots, and how a new feature in PostgreSQL 13 helps prevent this problem.
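PostgreSQL 13 also exposes the slot/WAL relationship in the catalog, so a quick sketch of how you might keep an eye on it (column names are as of PostgreSQL 13):

-- How much WAL each slot is holding back, and whether it is still safe
SELECT slot_name, active, wal_status, safe_wal_size
FROM pg_replication_slots;

-- The new safety limit itself; -1 means no limit
SHOW max_slot_wal_keep_size;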
Posted by Amit Khandekar on 2020-07-30 at 10:01
PostgreSQL 13 has introduced a simple but extremely useful capability to log a stack trace into the server logs when an error is reported. Let's see the details.

There is a GUC to enable stack trace generation: backtrace_functions. Set it to a comma-separated list of function names.

SET backtrace_functions TO 'func1,func2';

If the error is thrown from one of these functions, a backtrace will be generated and logged into the server log.

Note that only superusers can set the backtrace_functions GUC. It can be set locally in a session, or it can be included in the postgresql.conf file to set it globally.

It's easy to see how it would help in a situation where a customer reports an error message. We can find where it came from by grep'ing for it in the source code. But beyond that, it was all guesswork. Not anymore. Now you can ask the customer to set backtrace_functions to all the functions which emit this error message, and get the stack trace. In most cases, the root cause of the error is not in the function which emits the error; it's located somewhere in the middle of the stack, hence the stack trace is critical.

This capability is already available in many other databases like MySQL, Greenplum, Oracle.

What's still missing in PostgreSQL - and is present in most of these other databases - is the ability to generate a stack trace when a server backend crashes with a segmentation fault or another unexpected signal, or when the server PANICs for some reason. This capability would make a much bigger difference. We would get rid of having to explain the steps to generate a core file. More importantly, it helps in situations where the crash happens only randomly. Even with a single unexpected crash, the customer would always be ready with a backtrace. I am hopeful this will be implemented in the next major release of PostgreSQL.

Let's see what a PostgreSQL stack trace log looks like. We will try to use a non-existent type to create a table. Supposing we know that the "type does not exist" error comes from typen[...]

A lot has been written about effective_cache_size in postgresql.conf and about PostgreSQL performance in general. However, few people know what this famous parameter really does. Let me share some more insights.

What the PostgreSQL optimizer does

The idea behind SQL is actually quite simple: The end user sends a query and the optimizer is supposed to find the best strategy to execute this query. The output of the optimizer is what people call an “execution plan”. The question now is: What makes one execution plan better than some other plan? What is it that makes a strategy greater than some other strategy? In PostgreSQL everything boils down to the concept of “costs”. The planner will assign costs to every operation. At the end of the day the cheapest plan is selected and executed.

The magic is therefore in the way the optimizer handles costs and this is exactly what effective_cache_size is all about.

Understanding effective_cache_size

To achieve good performance it is important to figure out whether to use an index or not. A question often asked is: why not always use an index? Traversing an index is not necessarily cheap, and using an index does not mean that there is no need to touch the table as well. The optimizer therefore has to decide whether to go for an index or not.
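One way to watch that decision change is to compare plans under different effective_cache_size settings (the table is hypothetical, and the exact tipping point depends entirely on your data and hardware):

SET effective_cache_size = '128MB';
EXPLAIN SELECT * FROM t WHERE id < 10000;   -- may favor a sequential scan

SET effective_cache_size = '16GB';
EXPLAIN SELECT * FROM t WHERE id < 10000;   -- index scans now look cheaper to the planner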

The way costs are estimated depends on various factors: the amount of I/O needed, the number of operators called, the number of tuples processed, selectivity, and a lot more. However, what is the cost of I/O? Obviously it makes a difference whether data is already in cache or has to be read from disk. That brings us to the idea behind effective_cache_size, which tells the optimizer how much cache to expect in the system. The important part is that "cache" is not only the memory PostgreSQL itself knows about (this part is pretty clear). The system also has to consider the size of the filesystem cache, CPU caches, and so on. effective_cache_size is the sum of all those caching components. What you will learn in this post is how the opti

[...]
Posted by Bruce Momjian in EnterpriseDB on 2020-07-29 at 13:15

In my years with Postgres, I have seen some amazingly complex queries posted to the email lists. I have never understood how people can read complex queries with no formatting, e.g., no keyword capitalization, no indenting, no line breaks for new clauses:

select n.nspname as "Schema", p.proname as "Name",
pg_catalog.format_type(p.prorettype, null) as "Result data type", case
when p.pronargs = 0 then cast('*' as pg_catalog.text) else
pg_catalog.pg_get_function_arguments(p.oid) end as "Argument data
types", pg_catalog.obj_description(p.oid, 'pg_proc') as "Description"
from pg_catalog.pg_proc p left join pg_catalog.pg_namespace n on n.oid =
p.pronamespace where p.prokind = 'a' and n.nspname <> 'pg_catalog' and
n.nspname <> 'information_schema' and
pg_catalog.pg_function_is_visible(p.oid) order by 1, 2, 4; 
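For contrast, here is the same query with capitalized keywords, indentation, and line breaks for each clause (one of many possible styles):

SELECT n.nspname AS "Schema",
       p.proname AS "Name",
       pg_catalog.format_type(p.prorettype, NULL) AS "Result data type",
       CASE WHEN p.pronargs = 0
            THEN CAST('*' AS pg_catalog.text)
            ELSE pg_catalog.pg_get_function_arguments(p.oid)
       END AS "Argument data types",
       pg_catalog.obj_description(p.oid, 'pg_proc') AS "Description"
FROM pg_catalog.pg_proc p
     LEFT JOIN pg_catalog.pg_namespace n ON n.oid = p.pronamespace
WHERE p.prokind = 'a'
  AND n.nspname <> 'pg_catalog'
  AND n.nspname <> 'information_schema'
  AND pg_catalog.pg_function_is_visible(p.oid)
ORDER BY 1, 2, 4;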

Continue Reading »

Backups are a key staple of running any database. Way back in the day, a good friend and colleague wrote one of the most used Postgres backup tools, called wal-e. Wal-e was initially written in just a few days and rolled out to the fleet of databases we managed in the early days at Heroku. We got pretty lucky with rolling that out, because shortly after, there was the great AWS apocalypse of 2011. This was a full-day outage of AWS with lingering effects for nearly a week... Reddit was down, Netflix was down, so you couldn't even kill time waiting for things to come back up. At the time, AWS came back to us saying they couldn't recover a number of disks. Had it not been for wal-e and our disaster recovery setup, customers would have lost data. Luckily no bytes of data were lost, and customers were back up and running much faster than if they had been on RDS.

Migrating to PostgreSQL Version 13

The PostgreSQL 13 Beta is out and in the testing phase, with a rich feature set. Participating in the testing of one of the finest databases in the world is a very good learning effort. It does not matter how much development, coding, and administration experience you have; you can participate by reviewing the documentation, validating features, and taking on some small tasks. The complete testing guide is also available on the wiki page.

Soon, the PostgreSQL 13 GA will be available, and the people who require the new features of PostgreSQL will want to migrate to that version. This is a major release, so it requires some effort to upgrade.

Dump/Restore (pg_dump)

One of the safest and oldest methods to upgrade is a dump and restore. When we say safest, we mean that the chance of database breakage after restoring to the new version is almost none, but it has its own limitations, as it requires a lot of time and extra space for the backup. PostgreSQL has the tools pg_dump and pg_restore, and the complete documentation can be found at the PostgreSQL official documentation site.
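A bare-bones sketch of that workflow (database name and file path are placeholders):

pg_dump -Fc -f /backups/mydb.dump mydb     # custom-format dump taken from the old cluster
# create the empty target database on the new PostgreSQL 13 cluster, e.g. with createdb
pg_restore -d mydb /backups/mydb.dump      # restore into the new cluster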

pg_upgrade

PostgreSQL has an in-place upgrade tool, pg_upgrade. It can upgrade across PostgreSQL major versions without taking extra space and requires a lot less time than a dump/restore.
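A rough sketch of a pg_upgrade invocation (all paths are placeholders for your installation layout):

pg_upgrade \
    -b /usr/lib/postgresql/12/bin  -B /usr/lib/postgresql/13/bin \
    -d /var/lib/postgresql/12/data -D /var/lib/postgresql/13/data \
    --check     # dry run first; drop --check to perform the actual upgrade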

The following is the list of observed incompatibilities:

1 – SIMILAR TO … ESCAPE NULL and substring(text FROM pattern ESCAPE text) now return NULL.

In case of ESCAPE NULL, the application will get NULL instead of any value. Previously the expression returned true if ESCAPE NULL was specified, but in PostgreSQL 13 it returns NULL, which is the correct behavior; you need to modify your application if it expects true in that case.

PostgreSQL Version < 13

# SELECT 'abc' SIMILAR TO 'ab_' ESCAPE NULL AS text;
 text 
----------
 t
(1 row)

PostgreSQL Version >= 13

# SELECT 'abc' SIMILAR TO 'ab_' ESCAPE NULL AS text;  
 text 
------
 
(1 row)

2. Have jsonb_to_tsvector() properly che

[...]
Posted by Jeff Davis in CitusData on 2020-07-28 at 15:01

Making security easy to use is crucial because hard-to-use security is likely to be neglected entirely. SCRAM with channel binding is a variation of password authentication that is almost as easy to use, but much more secure.

In basic password authentication, the connecting client simply sends the server the password. Then the server checks that it’s the right one, and allows the client to connect. Basic password authentication has several weaknesses which are addressed with SCRAM and channel binding.

In this article, you’ll learn how to set up authentication using SCRAM with channel binding in Postgres. I implemented the client connection parameter channel_binding in PostgreSQL 13, due to be released in late 2020 (PostgreSQL 13 is in beta now). SCRAM and Channel Binding have already been supported in several releases, but this new connection parameter is necessary to realize the security benefits of SCRAM and Channel Binding.

First, before diving in to the tutorial, some background on SCRAM and Channel Binding.

Disclaimer: This article is just a how-to. As with any security decision, you should perform your own analysis to determine if it’s right for your environment. No security feature is right in all cases.

The SCRAM authentication method in Postgres

SCRAM is a secure password authentication protocol that can authenticate the client. It has several advantages over basic password authentication:

  • does not reveal the user’s cleartext password to the server
  • is designed to prevent replay attacks
  • enables the use of Channel Binding
  • can support multiple cryptographic hash functions
    • currently, PostgreSQL only supports SCRAM using SHA-256

For these reasons, in PostgreSQL, the scram-sha-256 password auth method is strongly recommended over md5 or password.

The first part of this tutorial can be used to set up SCRAM even if you don’t use channel binding.
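A minimal sketch of that first part (the role name, host, and pg_hba.conf line are purely illustrative):

-- On the server: make newly set passwords use SCRAM, then (re)set the password
SET password_encryption = 'scram-sha-256';
\password app_user

-- In pg_hba.conf, require SCRAM (and TLS) for that role:
-- hostssl  mydb  app_user  0.0.0.0/0  scram-sha-256

-- From a PostgreSQL 13 client, the new parameter can then enforce channel binding:
-- psql "host=db.example.com dbname=mydb user=app_user channel_binding=require"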

Channel Binding with SCRAM

In many cases, it’s just as important for the client t

[...]
Posted by Bruce Momjian in EnterpriseDB on 2020-07-27 at 14:30

There seem to be as many methods of writing SQL queries as ways of writing essays. While spacing, capitalization, and naming are all personal preferences, there are also logical arguments for why certain styles are better than others. This web page outlines one set of styles, based on Joe Celko's SQL Programming Style. While I don't agree with all the style decisions, I feel it is a good way to think about your own style decisions and increase style consistency.

Posted by Andreas 'ads' Scherbaum on 2020-07-27 at 14:00
PostgreSQL Person of the Week Interview with Gilberto Castillo: My name is Gilberto, I live in Havana, Cuba. I’m married, with three sons. I love sports and I exercise together with my sons. My favorite sports are handball, baseball and basketball. I also enjoy dancing to all kind of music. Havana is the Capital of all Cubans, my recommendation to everyone is: do not miss the opportunity to visit the history places Havana has to offer!
Quite a lot of people have been bugging me about it, and I finally got some time and worked on it. Long story short – it works. You can add your query below the plan (in a separate text area), and then it will be displayed as one of the tabs. For example, check this plan. It also works … Continue reading "You can now add query to your plan on explain.depesz.com"
On 20th of July 2020, Fujii Masao committed patch: Rename wal_keep_segments to wal_keep_size.   max_slot_wal_keep_size that was added in v13 and wal_keep_segments are the GUC parameters to specify how much WAL files to retain for the standby servers. While max_slot_wal_keep_size accepts the number of bytes of WAL files, wal_keep_segments accepts the number of WAL files. … Continue reading "Waiting for PostgreSQL 14 – Rename wal_keep_segments to wal_keep_size."