Latest Blog Posts

Introducing Snowflake Sequences in a Postgres Extension-2
Posted by Ahsan Hadi in pgEdge on 2025-12-03 at 06:36

In a PostgreSQL database, sequences provide a convenient way to generate a unique identifier, and are often used for key generation. From the community, PostgreSQL provides functions and SQL language to help manage sequence generation, but the sequences themselves are not without limitations in a multi-master environment. As a result, we introduced 100% open-source Snowflake sequences that work seamlessly in a multi-master PostgreSQL cluster to remove those limitations and ensure your data can thrive at the network edge. (Check out the extension on GitHub!) The easiest way to get started using Snowflake sequences is to try out pgEdge Postgres, either the Enterprise or the Distributed edition, available for self-hosting or fully managed hosting on virtual machines, containers, or the cloud. The pgEdge Control Plane (also 100% open source and available on GitHub) makes it even easier to deploy and orchestrate Postgres clusters, and comes with the snowflake extension by default (amongst others).

Why are Sequences an Issue?

In a distributed multi-master database system, sequences can get complicated. Ensuring consistency and uniqueness across the nodes in your cluster is a problem if you use PostgreSQL sequences. The Snowflake extension is designed to step up to automatically mitigate this issue. PostgreSQL sequence values are prepared for assignment in a table in your PostgreSQL database; as each sequence value is used, the next sequence value is incremented. Changes to the next available sequence value are not replicated to the other nodes in your replication cluster. In a simple example, you might have a table on node , with 10 rows, each with a primary key that is assigned a sequence value from 1 to 10; the next prepared sequence value on will be 11. Rows are replicated from to without issue until you add a row on . The PostgreSQL sequence value table on has not been incrementing sequence values in step with the sequence value table on n1. When you add a row on , it will try to use the next available seque[...]

Speed up JOIN Planning - upto 16x Faster!
Posted by Robins Tharakan on 2025-12-02 at 19:30
The hidden cost of knowing too much. That's one way to describe what happens when your data is skewed, Postgres statistics targets are set high, and the planner tries to estimate a join. For over 20 years, Postgres used a simple O(N^2) loop to compare (equi-join) Most Common Values (MCVs) during join estimation. It worked fine when statistics targets are small (default_statistics_target defaults

Contributions for week 49, 2025
Posted by Boriss Mejias in postgres-contrib.org on 2025-12-02 at 12:10

PostgreSQL Belgium User Group had a new meetup on November 25, in Haasrode, Leuven, organized by Kim Jansen, An Vercammen, Boriss Mejías and Stefan Fercot.

Speakers:

  • An Vercammen
  • Marco Huygen
  • Priyanka Mittal
  • Boriss Mejías

Related blog post by Kim Jansen

Prague PostgreSQL Meetup hosted a new meetup on November 24, organized by Gulcin Yildirim Jelinek

Speakers:

  • Radim Marek
  • Dalibor Pavlovic
  • Petr Castulik
  • Mayuresh B. (lightning talk)

What is better: a lookup table or an enum type?
Posted by Laurenz Albe in Cybertec on 2025-12-02 at 11:01

If people used lookup tables: a man is standing in front of a wall calendar and musing, "'Child 1 birthday' - Now who was child 1?"
© Laurenz Albe 2025

Sometimes a string column should only contain one of a limited number of distinct values. Examples would be the current state of a service request or the US state in an address. There are several ways to implement such a column in PostgreSQL, the most interesting being an enum type or a lookup table. In this article, I will explore the benefits and disadvantages of these methods.

Usually not the best solution: a string with a check constraint

To show this implementation, let's define a simplified table of Austrian citizens with their residential state:

CREATE UNLOGGED TABLE person_s (
   person_id bigint NOT NULL,
   state text NOT NULL,
   CHECK (state IN ('Wien', 'Niederösterreich', 'Burgenland',
                    'Steiermark', 'Oberösterreich', 'Kärnten',
                    'Tirol', 'Salzburg', 'Vorarlberg'))
);

There are around 8 million Austrians, so let's insert 8 million rows. Not all states are equally populous, so I use a CASE expression to assign states in an approximately realistic distribution:

INSERT INTO person_s (person_id, state)
SELECT i,
       CASE WHEN hashint4(i) < -1198295875 THEN 'Wien'
            WHEN hashint4(i) < -390842024 THEN 'Niederösterreich'
            WHEN hashint4(i) < -249108103 THEN 'Burgenland'
            WHEN hashint4(i) < 343597384 THEN 'Steiermark'
            WHEN hashint4(i) < 1060856922 THEN 'Oberösterreich'
            WHEN hashint4(i) < 1327144894 THEN 'Kärnten'
            WHEN hashint4(i) < 1692217115 THEN 'Tirol'
            WHEN hashint4(i) < 1958505087 THEN 'Salzburg'
            ELSE 'Vorarlberg'
       END
FROM generate_series(1, 8000000) AS i;

I'll add the primary key constraint and an index on the states to the table after loading the data because that is faster. Also, I'll run VACUUM and ANALYZE to improve the query performance:

ALTER TABLE person_s
   ADD CONSTRAINT person_s_pkey PRIMARY KEY (person_id);

CREATE INDEX person_s_state_idx ON person_s (state);

VACUUM (ANALYZE) person_s;

Disadvantages of t

[...]

CNPG Recipe 23 - Managing extensions with ImageVolume in CloudNativePG
Posted by Gabriele Bartolini in EDB on 2025-12-01 at 20:35

Say goodbye to the old way of distributing Postgres extensions as part of the main pre-built operand image. Leveraging the Kubernetes ImageVolume feature, CloudNativePG now allows you to mount extensions like pgvector and PostGIS from separate, dedicated images. This new declarative method completely decouples the PostgreSQL core from the extension binaries, enabling dynamic addition, easier evaluation, and simplified updates without ever having to build or manage monolithic custom container images.

Part 3: Postgres Journey to the top with developers
Posted by Tom Kincaid in EDB on 2025-12-01 at 18:33

 

This blog provides my opinion on how Postgres, according to the annual Stack Overflow developer survey, became the most admired, desired and used database by developers. This is part three in my series about the Postgres journey to the top with developers. Here are the first parts of this story:

 

  1. Postgres incredible rise to the top with developers. This presented the fact as a result of the changing survey results over the years.
  2. Part 2: PostgreSQL’s incredible trip to the top with developers. This presented my opinion on why Postgres didn’t simply go away when it was a distant second, third or

November Meetup Recording
Posted by Henrietta Dombrovskaya on 2025-12-01 at 13:28

And our last 2025 recording is here! Check out Jay Miller’s talk!

PgPedia Week, 2025-11-30
Posted by Ian Barwick on 2025-12-01 at 09:45
PostgreSQL 19 changes this week debug_exec_backend GUC to show EXEC_BACKEND state pg_buffercache view pg_buffercache_os_pages added following functions added: pg_buffercache_mark_dirty() pg_buffercache_mark_dirty_relation() pg_buffercache_mark_dirty_all() pg_replication_slots column slotsync_skip_reason added pg_stat_replication_slots columns slotsync_skip_count and slotsync_skip_at added Planner will now replace COUNT(ANY) with COUNT(*) , when possible PostgreSQL 19 articles Teaching Query Planner to See Inside C Functions (2025-11-27) - Robins Tharakan Settling COUNT(*) vs COUNT(1) debate in Postgres 19 (2025-11-27) - Robins Tharakan Super fast aggregations in PostgreSQL 19 (2025-11-25) - Hans-Jürgen Schönig / CYBERTEC PostgreSQL 18 articles

more...

Contributions for week 48, 2025
Posted by Floor Drees in postgres-contrib.org on 2025-12-01 at 08:10

Seattle Postgres Users Group (SEAPUG) maintained the PostgreSQL booth at PASS Data Community Summit 2025 from November 17-21, 2025:

  • Lloyd Albin
  • Jeremy Schneider
  • Harry Pierson
  • Ben Chobot
  • Deon Gill
  • Rick Lowe
  • Pavlo Golub

Speakers:

  • John Martin
  • Grant Fritchey
  • Lukas Fittl
  • Ryan Booz
  • Taiob Ali
  • John Martin
  • Pavlo Golub
  • Ramesh Pathuri
  • Abhishek Shukla
  • Ezat Karimi
  • Adam Machanic
  • Shane Borde
  • William Mentaze
  • Alvaro Costa-Neto
  • James Phillips
  • Janis Griffin
  • Jeremy Schneider
  • Shivam Gulati
  • Tim Steward
  • Sudhir Amin
  • Minesh Chande

The Mumbai PostgreSQL User Group meetup took place on 20th Nov, 2025, organized by Sovenath Shaw, Ajit Gadge and Deepak Mahto.

Speakers:

  • Jignesh Shah (AWS)
  • Suraj Kharage(EDB)
  • Jessica Shelar (DataCloudGaze Consulting)
  • Aniket Jadhav, Rushikesh Ghantewar (CDAC)
  • Trushar Borse , Sagar Sidhpura(Google Cloud)

PostgreSQL Conference Japan 2025 took place on November 21st. It was organized by Haruka Takatsuka, Kazushiro Takeda, Rintaro Ikeda, Tetsuo Sakata, Kousuke Kida, and Junichi Tado.

During the conference, the following volunteers helped running the conference: Tamotsu Odaka, Taiki Koshino, Nozomi Anzai, Jun Kuwamura, Hiroki Kataoka, Makoto Kaga, Tetsuji Koyama, Seiji Goto, Shinobu Honma, Kenji Sato, Katsuaki Fukushima, Thiduong Dang, Yung-Chung Ku, Wataru Takahashi, and Kouhei Ito.

Postgres Booth at PASS Data Community Summit
Posted by Jeremy Schneider on 2025-11-30 at 23:40

PASS Data Community Summit 2025 wrapped up last week. This conference originated 25 years ago with the independent, user-led, not-for-profit “Professional Association for SQL Server (PASS)” and the annual summit in Seattle continues to attract thousands of database professionals each year. After the pandemic it was reorganized and broadened as a “Data Community” event, including a Postgres track.

Starting in 2023, volunteers from the Seattle Postgres User Group have staffed a postgres community booth on the exhibition floor. We provide information about Postgres User Groups around the world and do our best to answer all kinds of questions people have about Postgres. The booth consistently gets lots of traffic and questions.

The United States PostgreSQL Association has generously supplied one of their booth kits each year, which has a banner/background and some booth materials like stickers and a map with many user groups and a “welcome to postgres” handout and postgres major version handouts. We supplement with extra pins and stickers and printouts like the happiness hints I’ve put together, a list of common extensions that Rox made, and a list of Postgres events that Lloyd made. Every year, we also bring leftover Halloween candy that we want to get rid of and we put it in a big bowl on the table.

One of the top questions people ask is how and where they can learn more about Postgres. Next year I might just print out the Links section from my blog, which has a bunch of useful free resources. Another idea I have is for RedGate and EnterpriseDB – I think both of these companies have paid training but also give free access to a few introductory classes – it would be nice if they made a small card with a link to their free training. I think we could have a stack of these cards at our user groups and at the PASS booth. The company can promote paid training, but the free content can benefit anyone even if they aren’t interested in the paid training. I might also reach out to other companies who have paid

[...]

Using JSON: json vs. jsonb, pglz vs. lz4, key optimization, parsing speed?
Posted by Hubert 'depesz' Lubaczewski on 2025-11-29 at 19:16
Recently(ish) I had a conversation on one of PostgreSQL support chats (IRC, Slack, or Discord) about efficient storage of JSON data, which compression to use, which datatype. Unrelated to this, some people (at least two over the last year or so) said that they aren't sure if PostgreSQL doesn't optimize storage between columns, for example, … Continue reading "Using JSON: json vs. jsonb, pglz vs. lz4, key optimization, parsing speed?"

Teaching Query Planner to See Inside C Functions
Posted by Robins Tharakan on 2025-11-28 at 19:30
The Postgres planner just got a new superpower: X-ray vision for your black-box functions. For years, Postgres has been able to "inline" simple SQL functions. If you wrote CREATE FUNCTION ... LANGUAGE SQL AS 'SELECT ...', the planner would often rip out the function call and paste the raw SQL directly into the main query. This was magic. It meant that WHERE clauses could be pushed down, indexes

TDE is now available for PostgreSQL 18
Posted by Jan Wieremjewicz in Percona on 2025-11-28 at 11:00

Back in October, before PGConf.EU, I explained the issues impacting the prolonged wait for TDE in PostgreSQL 18. Explanations were needed as users were buzzing with anticipation, and they deserved to understand what caused the delays and what the roadmap looked like. In that blog post I have shared that due to one of the features newly added in 18.0, the Asynchronous IO (AIO), we have decided to give ourselves time until 18.1 has been released to provide a build with TDE. We wanted to ensure best quality of the solution and that takes time.

Settling COUNT(*) vs COUNT(1) debate in Postgres 19
Posted by Robins Tharakan on 2025-11-27 at 13:29
A recent commit to the PostgreSQL master branch brings a nice quality-of-life optimization for a very common SQL pattern - improving performance by up to 64% for SELECT COUNT(h) where h is a NOT NULL column. If you've ever wondered whether you should use COUNT(*) or COUNT(1), or if you've been dutifully using COUNT(id) on a non-null column, this change is for you. Note: This feature is currently

DBeaver's Visual Query Builder
Posted by Dave Stokes on 2025-11-26 at 20:15

 Quick! You need to create a complex query, but you don't remember how to join the tables. And you need to get it done ASAP.

You are in luck if you have an enterprise edition of DBeaver handy.  You can drag and drop the tables and then adjust the clauses as needed without having to dig into the table definitions.

1. Open the database 

The PostgreSQL DVD Rental database. 

I am using the PostgreSQL DVDRental dataset for these examples. I have opened the connection and listed the tables.

As you can see, the database contains numerous tables. 

We are going to build a query visually to get information about the customers.










2. Open a SQL Console 

Open SQL Console

The second step requires an SQL Console. You will find it under the SQL Editor tab.





3. Select Builder 

On the right side, Select Builder

On the right side of the console pane, tilted 90 degrees, is a tab labeled Builder. 


4. You are ready to drag and drop tables

 

The Query Builder Is Ready to use. 
You have three new panes. The top one is where you drop the tables.  The bottom left is where we tailor the query. The query will then appear in the bottom right pane.

Now drag the payment, rental, and customer tables from step 1 separately to the top pane, the one that says 'no active query',


You can rearrange the tables to create a harmonious relationship map that aligns with your artistic aesthetic.


You should have something like thi

[...]

FOSDEM PGDay 2026 - Call for papers and registration open
Posted by Magnus Hagander in PostgreSQL Europe on 2025-11-26 at 11:52

The call for papers is now open for FOSDEM PGDay 2026. The CfP will run until December 15, 2025 at 11:59 CET. We will aim to notify speakers by December 17th, 2025 and publish the schedule before Christmas.

FOSDEM PGDay 2026 is a one-day conference that will be held ahead of the main FOSDEM event in Brussels, Belgium, on Friday, January 30st, 2025. This will be a PostgreSQL-focused event. This year, FOSDEM PGDay will feature two tracks of talks! This conference day will be for-charge with a registration of EUR 75, and will be held at the Brussels Marriott Hotel. Registration is required to attend and since we have a limited number of seats available for this event, we urge everybody to register as soon as possible once open.

PostgreSQL Europe will not have its usual devroom, but we invite all community members to visit the PostgreSQL booth and the Databases devroom at the main FOSDEM conference. No main FOSDEM events require registration.

For full details about the conference, venue and hotel, see the event website.

We also have a special negotiated room rate with the Brussels Marriott Hotel. For details, see the venue page.

We hope to see you in Brussels!

pgstream v0.9.0: Better schema replication, snapshots and cloud support
Posted by Esther Minano in Xata on 2025-11-26 at 09:00
Bringing connection retries, anonymizer features, memory improvements and solid community input.

KubeCon NA Atlanta 2025: a recap and CloudNativePG’s path to CNCF Incubation
Posted by Gabriele Bartolini in EDB on 2025-11-26 at 06:26

This blog post recaps my eventful participation in KubeCon + CloudNativeCon North America 2025 in Atlanta, highlighting the key decision by maintainers to formally apply for CNCF Incubation for the CloudNativePG operator. I had the pleasure of delivering three presentations: a CNPG Lightning Talk focused on community contribution; a deep-dive with Yoshiyuki Tabata on implementing modern PostgreSQL authorisation using Keycloak and OAuth for robust database security; and a session with Jeremy Schneider introducing the new quorum-based consistency feature in CNPG 1.28 for safer cluster reconfigurations. Links to the videos of all three talks are shared within the full article.

Announcing Release 20 of the PostgreSQL Build Farm Client
Posted by Andrew Dunstan in EDB on 2025-11-25 at 13:46

 Features:

  • New branches_to_build keyword UP_TO_ to allow building only certain older branches. e.g. UP_TO_REL_16_STABLE
  • Collect pg_config.h for meson builds
  • Handle new --restrict-key requirement for dump testing
  • Trim collection of log files
    Some files were collected multiple times, and some large log files were collected even when of little value, as the test succeeded. Reducing this will alleviate disk space pressure on the server.
  • Minor bug fixes.

Because of the reduced log collection, buildfarm owners are highly encouraged to  upgrade to the latest release.

The release can be obtained from Github or Buildfarm Server

Super fast aggregations in PostgreSQL 19
Posted by Hans-Juergen Schoenig in Cybertec on 2025-11-25 at 05:40

PostgreSQL 18 has just been born and we are already talking about the blessings of PostgreSQL 19 and beyond? Well, yes, and there is a good reason for it.

Recently, an important series of changes have been committed, which some of our folks (and many others around the world) have been working on for many many years and which have the power to seriously speed up aggregations for basically everyone out there. What is even better is that you don't have to change your code, there is no need to adjust parameters or anything of that sort. You can simply run your code as it is and enjoy the benefits of this important improvement.

Aggregating data in PostgreSQL

To understand why this improvement in the query optimizer is so important, we first have to understand how PostgreSQL has handled grouping up to now. The most simple rule was: "Join first, aggregate later". What does that mean? Consider the following example:

SELECT        j.gender_name, count(*)
FROM    person AS p, gender AS j
WHERE   p.gender_id = j.gender_id
GROUP BY j.gender_name

Let us assume that we only store a handful of genders but millions of people. The way everything before PostgreSQL 19 is handling this type of query is the following. Keep in mind that the example is of course highly simplified to make understanding easy:

  • Read every entry in the person table
  • For every entry, look up the gender_name and add to the count in the desired group
  • Display the result

What is fundamentally wrong with this approach? Actually nothing. This is how most systems would handle this type of operation. However, there is an inefficiency here: In a large table, millions of people might be female - what the system does is to look up the name for each gender_id again and again. In case almost every id is different, this is fine. However, if there are only a handful of different ids, the operation takes way too long and becomes pretty repetitive.

The breakthrough: Aggregate first - join later

The big breakthrough in Postgr

[...]

PostgreSQL Conference Japan 2025
Posted by Andreas Scherbaum on 2025-11-24 at 22:00
Back in July I submitted talks to the annual PostgreSQL Conference Japan 2025 (PostgreSQL カンファレンス 2025). This year is the 20th year of the conference, I was told. That is even longer running than our “own” PostgreSQL Conference Europe. To my surprise, one of my submitted talks was accepted. Which made me plan a trip to Japan and South Korea, to combine the conference

PostgreSQL Contributor Story: Vaibhav Dalvi
Posted by Floor Drees in EDB on 2025-11-24 at 15:25
Earlier this year we started a program (“Developer U”) to help colleagues who show promise for PostgreSQL Development to become contributors. Meet Vaibhav Dalvi, Staff SDE, who, prior to joining EDB had no experience with databases! And now look at him!

PostgreSQL Contributor Story: Akshay Joshi
Posted by Floor Drees in EDB on 2025-11-24 at 14:08
Earlier this year we started a program ("Developer U") to help colleagues who show promise for PostgreSQL Development to become contributors. Meet one of the participants: Akshay Joshi, a Principal Engineer and Engineering Manager at EDB.

PgPedia Week, 2025-11-23
Posted by Ian Barwick on 2025-11-24 at 13:30
PostgreSQL 19 changes this week COPY ... TO is now used for the initial synchronization of partitioned tables pg_dependencies type output format converted to a more user-readable JSON array of objects pg_ndistinct type output format converted to a more user-readable JSON array of objects pg_stat_subscription_stats new column seq_sync_error_count renamed to  sync_seq_error_count column sync_error_count renamed to sync_table_error_count PostgreSQL 18 articles Postgres 18: Skip Scan - Breaking Free from the Left-Most Index Limitation (2025-11-18) - Ahsan Hadi / pgEdge Highlights of PostgreSQL 18 (2025-08-28) - Ahsan Hadi / pgEdge

more...

PostgreSQL Berlin November 2025 Meetup
Posted by Andreas Scherbaum on 2025-11-23 at 22:00
On November 13th, 2025, we had the PostgreSQL November Meetup in Berlin. Redgate Software hosted us again, this time in a larger meeting room at The Social Hub Berlin. We had two speakers, and very tasty food. The event space is in Alexanderstraße 40, in Berlin Mitte. just five minutes away from Alexanderplatz. It is a event and gym venue, with several meeting rooms in various sizes, office area and a gym.

Contributions for week 47, 2025
Posted by Cornelia Biacsics in postgres-contrib.org on 2025-11-23 at 20:22

Seven PostgreSQL contributors completed this year of Google Summer Of Code:

Contributors:

  • Ahmad Gouda
  • Ashutosh Sh
  • Bassam Adnan
  • Gaurav Patidar
  • Mankirat Singh
  • Mohab Yasser
  • Tejas Tyagi

Mentors:

  • Akshat Jaimini
  • Pavlo Golub
  • Haoran Zhang
  • Saurav Pal
  • Jesper Pedersen
  • Rajiv Harlalka
  • David Wheeler
  • Shahryar Soltanpour
  • Luca Ferrari

Community Blog Posts:

Community Meetups:

Postgres Bangalore (PGBLR) happened on Meetup November 15th 2025

Speakers:

  • Srinath Reddy Sadipiralla
  • Nitin Jadhav
  • Narayana Jayaram and Sushmitha Mendon

Organisers:

  • Amit Kapila
  • Kuntal Ghosh
  • Sivji Kumar Jha
  • Vigneshwaran

Volunteers:

  • Jitendra Kumar
  • Nisha Moond
  • Shlok Kumar Kyal
  • Vasudev Sharma

Mumbai PostgreSQL UserGroup on November 20 2025 organised by:

  • Aarti Nadekar
  • Deepak Mahto
  • Sovenath Shaw

Speakers:

  • Trushar Borse
  • Jignesh Shah
  • Suraj Kharage
  • Rushikesh Ghantewar
  • Aniket Jadhav
  • Ajit Gadge
  • Jessica Shelar

On November 21 2025 Laura Minen & Martín Marqués organised Postgres Meetup en Valencia Noviembre 2025

PostgreSQL Buildfarm Members: A status update
Posted by Robins Tharakan on 2025-11-22 at 22:29
The PostgreSQL Buildfarm is a global network of machines that continuously test PostgreSQL across a wide range of operating systems, architectures, compilers, and branches. Over the past few years, I have created and maintained several buildfarm members, each with its own quirks and strengths. In this post, I’ll share a status-update working on the following animals: alligator, dodo,

Two Database Videos
Posted by Bruce Momjian in EDB on 2025-11-22 at 20:15

In the past few months, two new database videos were released on YouTube. From Hollerith to Stonebraker, from IMS to Postgres, the first video does a great job of walking through the history of databases in a way that I have never seen before. It focuses on the forces that drove database technology forward, like the U.S Census, airline reservation requirements, and planet-scale data storage. I have been in the database industry for so long, I sometimes forget how and why we got to here. This video will be helpful to show to your friends who don't understand the purpose of database technology.

The second video focuses on SQL and its history as breaking away from data models that required programmers to traverse fixed paths to access data, and SQL's ubiquity as the universal data-access language.

Cleanup Time
Posted by Bruce Momjian in EDB on 2025-11-22 at 16:45

With 3k slides and 64 presentations (and more two pending), the Presentations section of my website was getting cluttered. (Okay, I guess it has been cluttered for a while.) I have restructured the sections, adding two, removing one, and moving superseded presentations to Old. Hopefully this will help website visitors more easily find useful information.

PostGIS Performance: Data Sampling
Posted by Paul Ramsey in Crunchy Data on 2025-11-21 at 13:00

One of the temptations database users face, when presented with a huge table of interesting data, is to run queries that interrogate every record. Got a billion measurements? What’s the average of that?!

One way to find out is to just calculate the average.

SELECT avg(value) FROM mytable;

For a billion records, that could take a while!

Fortunately, the “Law of Large Numbers” is here to bail us out, stating that the average of a sample approaches the average of the population, as the sample size grows. And amazingly, the sample does not even have to be particularly large to be quite close.

Here’s a table of 10M values, randomly generated from a normal distribution. We know the average is zero. What will a sample of 10K values tell us it is?

CREATE TABLE normal AS
  SELECT random_normal(0,1) AS values
    FROM generate_series(1,10000000);

We can take a sample using a sort, or using the random() function, but both of those techniques first scan the whole table, which is exactly what we want to avoid.

Instead, we can use the PostgreSQL TABLESAMPLE feature, to get a quick sample of the pages in the table and an estimate of the average.

SELECT avg(values)
  FROM normal TABLESAMPLE SYSTEM (1);

I get an answer – 0.0031, very close to the population average – and it takes just 43 milliseconds.

Can this work with spatial? For the right data, it can. Imagine you had a table that had one point in it for every person in Canada (36 million of them) and you wanted to find out how many people lived in Toronto (or this red circle around Toronto).

alt

SELECT count(*)
  FROM census_people
  JOIN yyz
    ON ST_Intersects(yyz.geom, census_people.geom);

The answer is 5,010,266, and it takes 7.2 seconds to return. What if we take a 10% sample?

SELECT count(*)
  FROM census_people TABLESAMPLE SYSTEM (10)
  JOIN yyz
    ON ST_Intersects(yyz.geom, census_people.geom);

The sample is 10%, and the answer comes back as 508,292 (near one tenth of our actual measurement) in 2.2 seconds. What abou

[...]

Top posters

Number of posts in the past two months

Top teams

Number of posts in the past two months

Feeds

Planet

  • Policy for being listed on Planet PostgreSQL.
  • Add your blog to Planet PostgreSQL.
  • List of all subscribed blogs.
  • Manage your registration.

Contact

Get in touch with the Planet PostgreSQL administrators at planet at postgresql.org.