PostgreSQL
The world's most advanced open source database

The PostgreSQL community is eagerly awaiting an exciting online PostgreSQL conference taking place in November 2020. It will be a unique opportunity, as PostgresConf.CN and PGCONF.Asia are being merged to provide a wide range of topics covering various aspects of PostgreSQL, ranging from administration, performance tuning, hacking PostgreSQL, security, scalability, foreign data wrappers, and much more.

PostgresConf.CN is an annual event held by the China PostgreSQL Association for PostgreSQL users and developers. PostgreSQL is a leading open-source relational database with an active and vibrant community. PostgresConf.CN 2019 took place in Beijing and was very well attended by PostgreSQL users and community members from across the globe. It is also one of the conference series held by the PostgresConf Organization.

Website: https://2019.postgresconf.cn/en

PGCONF.Asia is shorthand for the Asian PostgreSQL Conference, which was hosted in Tokyo from 2016 until 2018. The conference was held in Bali for the first time in September 2019, and it is intended as the Asian-level PostgreSQL international conference, serving as the hub and summit of the PostgreSQL community throughout Asian countries.

Website: https://2019.pgconf.asia/

PostgresConf.CN and PGCONF.Asia 2020 will be combined to offer the best in-depth technical discussion of the PostgreSQL world in 2020. Please note that PostgresConf.CN and PGCONF.Asia will be co-hosted only this year, since PGCONF.Asia will move on to other Asian countries, while PostgresConf.CN will continue to take place in China.

Official Websites: https://2020.postgresconf.cn/en
https://2020.pgconf.asia

Call for Papers

We are expecting a big turnout for this conference, as PostgresConf.CN 2019 and PGCONF.Asia 2019 were very well attended. Since both conferences are combined this year, it will attract many PostgreSQL enthusiasts from around the globe, especially from the Asian region.

We have received an overwhelming response from PostgreSQL speakers from

[...]

1. Introduction

This blog is a follow-up to the post I published back in July 2020 about achieving in-memory table storage using PostgreSQL's pluggable storage API. In the past few months, my team and I have made some progress and produced a few POC patches to prove some of the unknowns and hypotheses, and today I would like to share our progress.

2. The PostgreSQL Buffer Manager

In the previous post, I mentioned that we would like to build a new in-memory storage that is based on the existing buffer manager and its related components, and hook it up with the pluggable storage API. To achieve this, my team and I undertook an in-depth study of how the current buffer manager works in PostgreSQL, and this chapter at interdb.jp was a good starting point for gaining a general understanding of the buffer manager design in good detail.

The current PostgreSQL buffer manager follows a three-layer design to manage data pages. It consists of:

  • Buffer Table (hash table)
  • Buffer Descriptors (Array)
  • Buffer Pool (Array)

2.1 Buffer Table

The Buffer Table is used like a routing table between the PostgreSQL core and the buffer manager. It is managed using the existing hash table utilities and uses a buffer_tag to look up the corresponding buffer descriptor and buffer ID. buffer_tag is a structure that identifies a page by its tablespace, database, and relation, along with the fork and block number.
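
To see these tag fields in action, the contrib extension pg_buffercache exposes the buffer_tag components of every shared buffer. Here is a rough sketch (the table name mytable is just a placeholder):

CREATE EXTENSION IF NOT EXISTS pg_buffercache;

-- Which shared buffers currently hold pages of a table, with the
-- tag fields (fork and block number) and per-buffer state.
SELECT b.bufferid, b.relforknumber, b.relblocknumber, b.isdirty, b.usagecount
FROM pg_buffercache b
JOIN pg_class c ON c.relfilenode = b.relfilenode
WHERE c.relname = 'mytable';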

2.2 Buffer Descriptor

The Buffer Descriptor is used to store the status of a buffer block as well as the content lock. refcount is part of the buffer state and is used to track insert and delete operations: it is increased by one when there is an insertion, and decreased by one when there is a deletion. The vacuum process will reclaim the page once refcount reaches 0.

2.3 Buffer Pool

The Buffer Pool has a one-to-one relationship with the buffer descriptors. It can be treated as a simple pointer pointing to the beginning of the buffer pool memory; each buffer pool slot is defined as 8KB for

[...]
Posted by Bruce Momjian in EDB on 2020-09-25 at 15:00

Cloud vendors are barriers like department stores and supermarkets are barriers. Huh? People associate these entities with providing a huge variety of goods and services, all in one place. How can that be a barrier?

Well, you are looking at it from the consumer perspective. For the producer, they are a mixed benefit. These "super sellers" allow access to much larger markets for most single-product producers, but they can have negatives for producers:

  • They become the place consumers associate with your product
  • They have the relationship with the consumer
  • You can easily be replaced if a better product appears
  • They take a markup

As the producer of a physical product, it is up to you to decide if working with department stores and supermarkets is a positive or negative. However, with open source software, there is no calculus. Unless your open source license prohibits modified or unmodified hosting of your software on cloud servers, you have no control over whether a cloud vendor is the way consumers interact with your open source software. The cloud vendor is the one who downloads the software, configures it, perhaps supports it, and guarantees uptime. The cloud vendor can leverage software revenue opportunities. To avoid cloud usage, some software producers have chosen or created licenses that restrict such usage:

[...]

pgRouting is a powerful routing tool, usually used for pathfinding/mapping/direction applications. (See Paul Ramsey's introduction to pgRouting here). It is, however, also a robust graph db implementation, and can be used for much more than just finding the directions to your great aunt Tildy’s. 
Yacht Rock (as if you didn’t know) is a music genre created well after its active era. It’s characterized by smooth dulcet sounds that bring to mind wavy blond-haired waspy men in boat shoes, and ultimately provides a sound that rocks, but won’t rock the boat.  
Posted by Bruce Momjian in EDB on 2020-09-23 at 13:30

Most companies have marketing and sales people as the visible part of their company. Technical people, even in technology companies, are often kept in the back, and only brought out for brief periods when needed. Open source is different — there are no marketing or sales teams, so software developers are the faces of projects. This gives technical people an opportunity to attain world-wide recognition for their efforts. There are not many places technical people can truly shine, but open source is one such opportunity.

PL/pgSQL is the preferred way to write stored procedures in PostgreSQL. Of course, there are other languages available for writing code, but most people still use PL/pgSQL to get the job done. However, debugging PL/pgSQL code can be a bit tricky. Tools are around, but it is still not a fun experience.

One thing that makes debugging easier is GET STACKED DIAGNOSTICS, which is unfortunately not widely known. This post will show what it does and how you can make use of it.

Debugging PostgreSQL stored procedures

To show you how GET STACKED DIAGNOSTICS works, I have written some broken code which executes a division by zero, which is forbidden in any sane database:

CREATE OR REPLACE FUNCTION broken_function()
        RETURNS void AS $$
BEGIN
        SELECT 1 / 0;
END;
$$ LANGUAGE 'plpgsql';
CREATE OR REPLACE FUNCTION simple_function()
        RETURNS numeric AS $$
DECLARE
BEGIN
        RAISE NOTICE 'crazy function called ...';
        PERFORM broken_function();
        RETURN 0;
END;
$$ LANGUAGE 'plpgsql';

The question now is: How can we get a backtrace and debug the code? One way is to wrap the code into one more function call and see where things fail:

CREATE OR REPLACE FUNCTION get_stack() 
        RETURNS void AS $$
DECLARE
        v_sqlstate text;
        v_message text;
        v_context text;
BEGIN
        PERFORM simple_function();

EXCEPTION WHEN OTHERS THEN 
        GET STACKED DIAGNOSTICS
                v_sqlstate = returned_sqlstate,
                v_message = message_text,
                v_context = pg_exception_context;
        RAISE NOTICE 'sqlstate: %', v_sqlstate;
        RAISE NOTICE 'message: %', v_message;
        RAISE NOTICE 'context: %', v_context;
END;
$$ LANGUAGE 'plpgsql';

My function catches the error caused by simple_function() and calls GET STACKED DIAGNOSTICS to display all the information we can possibly extract from the system. The output looks as follows:

test=# SELECT get_stack();
NOTICE: crazy function called ...
NOTICE: sqlstate: 22012
NOTICE: message: division
[...]
Posted by Pavel Stehule on 2020-09-23 at 04:38

The compiled DLL of the Orafce extension 3.13 can be downloaded from https://github.com/orafce/orafce/files/5264898/orafce_win_binary.zip

I have no idea how I missed that, but if it wasn't for Alvaro's blog post I wouldn't know that on 7th of April 2020, Alvaro Herrera committed a patch: Support FETCH FIRST WITH TIES. WITH TIES is an option to the FETCH FIRST N ROWS clause (the SQL standard's spelling of LIMIT), where you … Continue reading "Waiting for PostgreSQL 13 – Support FETCH FIRST WITH TIES"
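
A minimal sketch of the feature, assuming a hypothetical scores table: WITH TIES returns the requested number of rows plus any further rows that tie with the last one.

SELECT name, score
FROM scores
ORDER BY score DESC
FETCH FIRST 3 ROWS WITH TIES;
-- returns the top 3 rows, plus any additional rows whose score
-- equals the 3rd-highest value
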
Posted by Kat Batuigas in Crunchy Data on 2020-09-22 at 17:04

I've been digging a little bit into PL/Python recently, inspired by our recent launch of Crunchy Bridge: a multi-cloud Postgres service with PL/Python and PL/R built right in. One thing that has tripped me up a few times is how PostgreSQL data types work with Python, especially when using external packages like NumPy or SciPy.

Data type mapping is something you'll deal with when working with different languages or trying to get separate systems to talk to one another. Mapping types isn't just related to PL/Python; in fact, one colleague once wrote a program in C to interoperate between Ruby and Python to make some of these things easier. For the available procedural languages in Postgres, there's going to be at least some documentation on which data types in that language correspond to which types in Postgres (and vice versa). For PL/Python, there are a few differences to note in the mappings for Python 2 and Python 3; I'd recommend studying the docs as they're pretty comprehensive.
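
As a rough illustration of those mappings (assuming plpython3u is installed; the function below is made up for this example), Postgres values arrive in Python 3 as int, str, Decimal, and list:

CREATE OR REPLACE FUNCTION py_type_demo(n integer, t text, x numeric, xs integer[])
RETURNS text AS $$
    # integer -> int, text -> str, numeric -> Decimal, array -> list
    return ', '.join(type(v).__name__ for v in (n, t, x, xs))
$$ LANGUAGE plpython3u;

SELECT py_type_demo(42, 'hello', 1.5, ARRAY[1, 2, 3]);
-- int, str, Decimal, list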

I have previously written a blog post on the detailed steps involved in enabling PostgreSQL monitoring using PMM. In that post, you could see me talking about the role pg_monitor that can be granted to monitoring users. The pg_monitor role restricts a monitoring user from accessing user data and only grants access to the statistics views needed for monitoring. The following are the simple steps to create a monitoring user for Percona Monitoring and Management (PMM):
CREATE USER pmm_user WITH ENCRYPTED PASSWORD 'secret'; 
GRANT pg_monitor to pmm_user;

However, this feature is only available from PostgreSQL 10. So, let’s discuss how to deal with the previous releases such as PostgreSQL 9.6 or earlier.

One of the easiest methods is to grant the SUPERUSER role to the monitoring user. But granting SUPERUSER access may not work in all environments, as it carries the privileges to access, modify, and alter database objects. For that reason, we could use SECURITY DEFINER to safely grant access to selected statistics views. Let us first understand the difference between a security invoker and a security definer in PostgreSQL functions.

Security Invoker vs Security Definer in PostgreSQL

Security Invoker

When you execute a function in PostgreSQL, it is executed using the privileges of the user calling it. So, if the calling user does not have access to select from a specific table, the SQL statements on that table will fail, and so the execution of the function fails.

Security Definer

When you execute a function created with SECURITY DEFINER, it is executed with the privileges of the user who created it. Even if the calling user does not have access to the database objects queried in the function, the function execution succeeds as long as the user who created the function has the required privileges on those objects.
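
For instance, a superuser could wrap a statistics view in a SECURITY DEFINER function and grant only EXECUTE on it to the monitoring user. This is a hedged sketch, not the exact set of functions PMM requires:

-- Created by a superuser, so it runs with the owner's privileges.
CREATE OR REPLACE FUNCTION monitor_stat_activity()
RETURNS SETOF pg_stat_activity AS $$
    SELECT * FROM pg_stat_activity;
$$ LANGUAGE sql SECURITY DEFINER;

GRANT EXECUTE ON FUNCTION monitor_stat_activity() TO pmm_user;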

Statistics views accessed by PMM that need access through a security definer:

To enable PostgreSQL monitoring using PMM, you should be granting access to some o

[...]
Posted by Bruce Momjian in EDB on 2020-09-21 at 16:00

Postgres turns 34 this year. Michael Stonebraker, during his 2015 Turing Award speech (which I blogged about previously), included the names of the 39 Berkeley students (plus co-leader Larry Rowe) who helped write the original version of Postgres. I was hoping to add this list to the Postgres release notes, but we recently stopped including back branch release notes in current releases, so now there is no good place to put them. As a thanks to them, and with the help of the community, I am listing them below:

  • Jeff Alton
  • Paul Aoki
  • James Bell
  • Jennifer Caetta
  • Philip Chang
  • Jolly Chen
  • Ron Choi
  • Matt Dillon
  • Zelaine Fong
  • Adam Glass
  • Jeffrey Goh
  • Steven Grady
  • Serge Granik
  • Marti Hearst
  • Joey Hellerstein
  • Michael Hirohama
  • Chin-Heng Hong
  • Wei Hong
  • Anant Jhingran
  • Greg Kemnitz
  • Marcel Kornacker
  • Case Larsen
  • Boris Livshitz
  • Jeff Meredith
  • Ginger Ogle
  • Mike Olson
  • Nels Olson
  • Lay-Peng Ong
  • Carol Paxson
  • Avi Pfeffer
  • Spyros Potamianos
  • Sunita Surawagi
  • David Muir Sharnoff
  • Mark Sullivan
  • Cimarron Taylor
  • Marc Teitelbaum
  • Yongdong Wang
  • Kristen Wright
  • Andrew Yu

I knew some of the names and initials from the C comments. Just like those of us who have worked on Postgres for decades, I doubt they suspected that their code would be used so many years later.

Posted by Andreas 'ads' Scherbaum on 2020-09-21 at 14:00
PostgreSQL Person of the Week Interview with Dan Langille: I am Canadian. Born in Nova Scotia and attended high school and university in Ottawa. I moved to New Zealand shortly after graduation. After 16 years in Wellington, NZ, where I got into Open Source via FreeBSD, I moved back to Ottawa. I lived there for 6 years, then moved to the USA. I now live in a small town outside Philadelphia.

When working on the internals of Citus, an open source extension to Postgres that transforms Postgres into a distributed database, we often get to talk with customers that have interesting challenges you won’t find everywhere. Just a few months back, I encountered an analytics workload that was a really good fit for Citus.

But we had one problem: the percentile calculations on their data (over 300 TB of data) could not meet their SLA of 30 seconds.

To make things worse, the query performance was not even close to the target: the percentile calculations were taking about 6 minutes instead of the required 30 second SLA.

Figuring out how to meet the 30 second Postgres query SLA was a challenge because we didn’t have access to our customer’s data—and also because my customer didn’t have the cycles to compare the performance for different approaches I was considering. So we had to find ways to estimate which types of percentile calculations would meet their SLA, without having to spend the engineering cycles to implement different approaches.

This post explores how—with the help of the Postgres open source community—I was able to reduce the time to calculate percentiles by 45x by using the t-digest extension to Postgres.
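
To give a flavour of the approach (the table and column names below are hypothetical), the tdigest extension provides an approximate percentile aggregate that can stand in for the exact percentile_cont calculation:

CREATE EXTENSION IF NOT EXISTS tdigest;

-- Exact, but expensive: percentile_cont must sort all values.
SELECT percentile_cont(0.99) WITHIN GROUP (ORDER BY latency_ms) FROM responses;

-- Approximate, and much cheaper: t-digest with a compression of 100.
SELECT tdigest_percentile(latency_ms, 100, 0.99) FROM responses;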

Importance of calculating percentiles in analytics workloads

My customer operates a multi-datacenter web application with a real-time analytics dashboard that displays statistics about a variety of signals—and they store the analytics data in Hyperscale (Citus) on our Azure Database for PostgreSQL managed service. They ingest over 2 TB of data per hour and needed to get < 30 second performance for their queries over a 7-day period. This analytics dashboard is used by their engineers to debug and root-cause customer-reported issues. So they query metrics like latency, status codes, and error codes based on dimensions such as region, browser, data center, and the like.

Latency is of course an important metric for understanding these types of issues. However, average late

[...]
Posted by Kat Batuigas in Crunchy Data on 2020-09-18 at 16:30

PL/Python can be a convenient and powerful way to take advantage of your PostgreSQL database. In an earlier post, I talked a little bit about how you can use modules included in the Python standard library with PL/Python functions. In this post, we'll try running NumPy in a simple user-defined function which also takes advantage of PL/Python database access functions. The function will show a working example of how to easily convert a data table in Postgres to a NumPy array.
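
Here is a small sketch of that idea, assuming plpython3u and NumPy are available and a hypothetical measurements table with a float8 value column; plpy.execute() returns rows as dictionaries, which convert easily into a NumPy array:

CREATE OR REPLACE FUNCTION mean_measurement()
RETURNS float8 AS $$
    import numpy as np
    # plpy.execute returns a list of dict-like result rows
    rows = plpy.execute("SELECT value FROM measurements")
    arr = np.array([r["value"] for r in rows], dtype=float)
    return float(arr.mean()) if arr.size else None
$$ LANGUAGE plpython3u;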

I recently wrote a presentation, Postgres and the Artificial Intelligence Landscape, which covers the basics of artificial intelligence and shows how Postgres can be used for this purpose. This week I presented it at the Chicago PostgreSQL Meetup Group so I am now publishing the slides.

I released a new, almost bug-fix-only release 3.1.4 of pspg. There is a new light theme, Flatwhite, based on Dmitry Biletskyy's flatwhite theme.



I recently gave a talk about the Citus extension to Postgres at the Warsaw PostgreSQL Users Group. Unfortunately, I did not get to go in person to beautiful Warsaw, but it was still a nice way to interact with the global Postgres community and talk about what Citus is, how it works, and what it can do for you.

If you are already familiar with Postgres then this talk should be a good introduction to all the powerful capabilities that Citus gives you. The tl;dr is this: Citus is an open source extension to Postgres that transforms Postgres into a distributed database. Citus uses sharding and replication to distribute your data and your Postgres queries across a distributed database cluster.
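
The basic workflow looks roughly like this (a sketch assuming the citus extension is installed; the events table and its distribution column are hypothetical):

CREATE EXTENSION IF NOT EXISTS citus;

CREATE TABLE events (
    company_id  bigint NOT NULL,
    event_id    bigserial,
    payload     jsonb,
    created_at  timestamptz DEFAULT now()
);

-- Shard the table across the worker nodes by company_id.
SELECT create_distributed_table('events', 'company_id');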

Shining a light on the performance speedups of Citus (via demo)

Every so often, I try to rethink how I talk about Citus, especially as Postgres evolves and the needs of applications change, too. One thing we have not done much is talk directly about the performance improvements in Citus. Sometimes it’s actually slower, but at scale Citus can be *a lot* faster. Therefore, I introduced every Citus feature with some benchmarks that show the performance compared to a (large) Postgres server.

The talk is also worth watching for the demo (the demo starts at 46:52) where I compare the performance of Hyperscale (Citus) on Azure Database for PostgreSQL against a single Postgres server. For the demo, I use GitHub archive data in an analytics use case, and the demo shows >250x speedups for analytical queries with Citus!

Video of my talk at Warsaw PostgreSQL Users Group, on Citus: PostgreSQL at any Scale. Demo starts at 46:52, but the introductory discussion should be useful, too.

Props to the organizers of the Warsaw PostgreSQL Users Group—especially Alicja Kucharczyk—for the time they spend organizing Postgres talks for their community. And for inviting me to give a talk to their Postgres users group. I really appreciated all the good questions, too.

If this demo is your first intro to Citus & you

[...]
Posted by Bruce Momjian in EDB on 2020-09-16 at 14:00

Postgres has made great strides in adding features to match proprietary databases, and it has many complex features that other databases don't have. However, that doesn't mean it is the best fit for every organization. There are still reasons not to use Postgres:

  • Skills in another relational database, and no desire or value to learn Postgres
  • Custom applications written for another database that you don't want to modify to work with Postgres
  • Using externally-developed applications, tools, or frameworks that don't support Postgres
  • Storage of non-transactional or cache data where Postgres's overhead is significant
  • Multi-host workloads with simple queries where NoSQL is a win
  • Small, single-user systems, where SQLite is best

This email thread had a lot of discussion on the topic. What is interesting is that decades-old complaints about missing features, reliability, and performance are no longer mentioned.

Posted by Álvaro Herrera in 2ndQuadrant on 2020-09-16 at 09:00
A simple description of the new WITH TIES standard feature in PostgreSQL 13.

HOT updates are for free!
© Laurenz Albe 2020

HOT updates are not a new feature. They were introduced by commit 282d2a03dd in 2007 and first appeared in PostgreSQL 8.3.

But since HOT is not covered by the PostgreSQL documentation (although there is a README.HOT in the source tree), it is not as widely known as it should be. Hence this article, which explains the concept, shows HOT in action, and gives tuning advice.

What is HOT?

HOT is an acronym for “Heap Only Tuple” (and that made a better acronym than Overflow Update CHaining). It is a feature that overcomes some of the inefficiencies of how PostgreSQL handles UPDATEs.

Problems with PostgreSQL’s implementation of UPDATE

PostgreSQL implements multiversioning by keeping the old version of the table row in the table – an UPDATE adds a new row version (“tuple”) of the row and marks the old version as invalid.
In many respects, an UPDATE in PostgreSQL is not much different from a DELETE followed by an INSERT.

 

(figure: a normal update)

 

This has a lot of advantages:

  • no need for an extra storage area where old row versions are kept
  • ROLLBACK does not have to undo anything and is very fast
  • no overflow problem with transactions that modify many rows

But it also has some disadvantages:

  • old, obsolete (“dead”) tuples have to be removed from the table eventually (VACUUM)
  • heavily updated tables can become “bloated” with dead tuples
  • every update requires new index entries to be added, even if no indexed attribute is modified, and modifying an index is much more expensive than modifying the table (order has to be maintained)

Essentially, UPDATE-heavy workloads are challenging for PostgreSQL. This is the area where HOT updates help.
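
One way to see whether a workload benefits from HOT is to compare total and HOT update counts in pg_stat_user_tables; this is a generic check, not part of the example that follows:

SELECT relname, n_tup_upd, n_tup_hot_upd,
       round(100.0 * n_tup_hot_upd / nullif(n_tup_upd, 0), 2) AS hot_update_pct
FROM pg_stat_user_tables
ORDER BY n_tup_upd DESC;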

An UPDATE example

Let’s create a simple table with 235 rows:

CREATE TABLE mytable (
   id  integer PRIMARY KEY,
   val integer NOT NULL
) WITH (autovacuum_enabled = off);

INSERT INTO mytable
SELECT *, 0
FROM generate_series(1, 235) AS n;

This table is slightly more than one 8KB block lon

[...]
Sometimes, PostgreSQL databases need to import large quantities of data in a single step or a minimal number of steps. This process can sometimes be unacceptably slow. In this article, we will cover some best practice tips for bulk importing data into PostgreSQL databases.
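
The usual first recommendation is COPY instead of many single-row INSERTs; here is a minimal sketch, assuming a hypothetical staging table and a CSV file readable by the server:

CREATE TABLE staging_events (
    event_time  timestamptz,
    user_id     bigint,
    payload     text
);

-- COPY loads rows in bulk, far faster than row-by-row INSERTs.
-- (From psql, \copy does the same without server-side file access.)
COPY staging_events (event_time, user_id, payload)
FROM '/tmp/events.csv' WITH (FORMAT csv, HEADER true);
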
Posted by Jonathan Katz in Crunchy Data on 2020-09-15 at 08:40

PostgreSQL 13 is a lucky release! There are some very impactful new features and the release is coming out at its regularly scheduled time in the fall.

One of my favorite parts around the PostgreSQL release process is when I can take a step back, review and discuss the features going into a new release, and learn how they can make a positive impact on our users. For example, in a past hallway track at PGCon, I learned how PostgreSQL 12 introduced the ability to do a form of multi-factor authentication!

Each release has a lot of these "hidden gems" -- features that may not jump off the page, but can have a big impact when you actually need them. Postgres 13 is no exception: some of these features make it easier to write queries, add additional layers of security, or help you to avoid downtime.

So what are the hidden gems of PostgreSQL 13?

PostgreSQL creates catalog and other caches for each backend connection to store some queried data locally, which can speed up queries. If a process performs a DDL operation that causes the catalog to change, other running processes need to be notified that some of their cached data is invalid. We call this notification mechanism the SI (Shared Invalidation) mechanism.

I have made use of this feature in projects I have worked on before, but never studied it in depth enough to understand it completely or appreciate its power. However, I recently got a chance to experiment with it on a project and was able to learn more about it. This prompted me to write this blog so I can share my knowledge and explain the effectiveness of this feature.

1. Brief description of PostgreSQL cache

PostgreSQL’s cache is mainly the catalog cache. You can see the enum structure SysCacheIdentifier for the catalog caches in syscache.h. The catalog caches are mainly used to cache the contents of the system catalogs, and there are some non-catalog caches too; these can be seen in the src/backend/utils/cache directory of the source code.

(figure: describe_cache_in_postgres)

As shown in the figure above, there is a simple query on the t1 table. Before the query, the backend first needs to find out what t1 is: whether a relation named t1 exists, what attributes the relation has, and so on. Of course, this information is stored in the system catalogs. When the backend obtains catalog information, it generally does not scan the catalog's on-disk data directly, but tries to obtain the data from the relevant cache. If the data the backend wants does not exist in the cache, it will be read from disk and stored in the cache.

2. Implementation of cache consistency

Each PostgreSQL backend caches some catalog information, which creates the problem of cache consistency. Suppose there are two backends A and B that have cached the ‘t1’ tuple in RELNAMENSP, and then A makes some changes to the structure of t1; B then needs to be notified to invalidate this

[...]
Posted by Bruce Momjian in EDB on 2020-09-14 at 14:45

Database applications are initially written using the simplest queries possible. During testing and in production, some application tasks might have unacceptable performance. This is where re-architecturing happens, and where simple queries and data schema layouts can get complex. They might get complex because it is required to accomplish the task, or it might be because of limitations in how the database software handles certain queries. Database and tooling upgrades can require further complex additions.

When switching to a new database like Postgres, all that complexity comes along for the ride. Sometimes complexity added to work around deficiencies in other databases works fine in Postgres, but often that complexity has to be removed to get good performance in Postgres. There also might be cases where complexity has to be added to get good performance in Postgres.

The bottom line is that complexity is bad for applications, so add complexity only when necessary. Wise application developers remove unnecessary complexity periodically, but it can be hard to know if database upgrades have made some complexity unnecessary. Porting to a new database is an ideal time to reevaluate if application simplification is possible.

Posted by Andreas 'ads' Scherbaum on 2020-09-14 at 14:00
PostgreSQL Person of the Week Interview with Anthony Nowocien: Hi, I’m Anthony Nowocien. My name comes from my Polish side and some of my looks from my Trinidadian side. I live and work in Paris as a PostgreSQL DBA.
Posted by Pavel Stehule on 2020-09-13 at 05:16

I tested Fedora 33 on my old Lenovo T510 (upgraded to 16GB RAM). It looks pretty good. Interactive processes are quick under high load (which was sometimes a problem in older releases). It is very nice and comfortable.

Posted by Jędrzej Biedrzycki on 2020-09-12 at 10:47

0. The problem

Some time ago I stumbled on this interesting question on Stack Overflow (as stated in the title). I could not find any satisfactory answer at that time, so I came up with this solution. I think the problem is interesting enough to make it into a blog post.

The following table shows the expected results. The x column contains the original data, whereas mdn_x contains the median computed over the current row and up to 3 preceding rows.

 x   mdn_x
 1   1
 2   1.5
 3   2
 5   2.5
 8   4
13   6.5
21   10.5

1. Quick solution

Unfortunately, ordered-set aggregate functions do not support windows, which would be the most intuitive approach. However, the window size in this example is fixed at 4, so the median can easily be calculated using functions like lead and lag. A possible solution looks like this:

select x,
       (lag(x, 2) over w + lag(x) over w) / 2. as mdn_x
from tmp t
    window w as (rows between 3 preceding and current row)
order by 1;

It reads pretty easily: calculate the average of the two previous values. It works for all rows except the first three, because lag returns null there. We can easily fix this by introducing a null check:

select x,
       case
           when lag(x) over w is null then x
           when lag(x, 2) over w is null then (x + lag(x) over w) / 2.
           when lag(x, 3) over w is null then lag(x) over w
           else (lag(x, 2) over w + lag(x) over w) / 2.
           end
from tmp t
    window w as (rows between 3 preceding and current row)
[...]
At the core of the “10 Things I Hate About PostgreSQL” blog post sits one particular observation that seems like a simple annoyance at first glance. MVCC is presented merely as item 4 in the list, but like XIDs, it serves as a star upon which an entire constellation of related stellar objects converges. While […]
Posted by Bruce Momjian in EDB on 2020-09-11 at 15:15

I recently wrote a presentation, Postgres in the Cloud: The Hard Way, which shows how to create a cloud instance, and install and run Postgres, purely from the command line. This helps show how all the pieces fit together. I recently presented this at pgDay Israel so I am now publishing the slides.