PostgreSQL
The world's most advanced open source database
At PGConf US 2016, Sean Chittenden gave an excellent High Availability presentation.

The presentation includes an introduction and setup for consul as the means of providing highly available PostgreSQL in local and geographically disparate data centers or cloud providers. It covers:

*) Introduction to consul and its architecture
*) Setup of a single consul cluster
*) Setup for a few sample database instances (OLAP and OLTP)
*) Firewall requirements
*) Integration with bind, djbdns, and dnsmasq
*) Setup geographic failover to two different data centers and cloud providers
*) Various Best Practices tips and suggestions
*) Q&A

Posted by Joe Conway in Crunchy Data on 2017-04-25 at 13:58

This is the third and final post of the series intended to introduce PostgreSQL users to PL/R, a loadable procedural language that enables a user to write user-defined SQL functions in the R programming language. The information below provides sample use of R Functions against the NDVI dataset.

For many years MySQL and PostgreSQL were somewhat competing databases, which still addressed slightly different audiences. In my judgement (which is of course a bit biased) PostgreSQL always served my professional users, while MySQL had a strong standing among web developers and so on. But, after Oracle took over MySQL I had the feeling that […]

The post Why favor PostgreSQL over MariaDB / MySQL appeared first on Cybertec - The PostgreSQL Database Company.

Posted by Bruce Momjian in EnterpriseDB on 2017-04-24 at 16:30

Modern systems offer several storage options and databases are very sensitive to the I/O characteristics of these options. The simplest is direct-attached storage (DAS), e.g. SATA, SAS. It is the simplest because it uses a dedicated connection between the server and storage.

Another storage option is a storage area network (SAN), e.g. fibre channel, iSCSI. Like DAS, it offers a sequential set of blocks upon which a local file system can be built.

A third option is network-attached storage (NAS), e.g. NFS. While NAS is networked storage like SAN, it offers a remote file system to the server instead of remote storage blocks. (The fact that the SAN/NAS acronym letters are just reversed only adds to the confusion. I remember it as (block) "storage" being the first letter of SAN.)

Continue Reading »

You should always be careful of what you ask for. A couple of months ago while I was feeling particularly brave, I submitted to present at the DataLayer Conference. The next thing I knew, I was speaking at the DataLayer Conference. The conference takes place in Austin on May 17th. Conferences like this one are an awesome channel for PostgreSQL advocacy. Of course I am partial to PgConf US but a conference such as DataLayer allows us to reach people who may be using one of those "other" databases. 

Here is the synopsis of the presentation I will be giving:

PostgreSQL: The Center of Your Data Universe

Although there are still battles to be fought, the war has already been won. Find out how PostgreSQL answers all of your data layer needs. PostgreSQL is one of the longest-standing Open Source database systems, with legions of users leading the way to a sane, productive and performance-driven data layer. This presentation will cover an overview of PostgreSQL technologies including:

  • NoSQL capabilities
  • Relational capabilities
  • Replication & High Availability
  • Features you can’t believe you lived without
  • Community

If you feel like going, you can use the code JDROCKS (I promise, I didn't pick it) for a 15% discount. Let's have a large PostgreSQL contingent at the conference and show those "other" technologies what real community feels like!

…and why I’m glad I did.

It’s not all technical… Who knew?!

Last year, Pycon7 was held right about the time I joined 2ndQuadrant. Seeing as I was new to the technology AND the Italian language (note: there was an English track), I opted out of attending. Well, after attending Pycon8, I can say that I won’t make that mistake again!

Over the past year working in the Open Source community, I’ve learned more technical information than I could have ever imagined. Even then, attending a technical conference and understanding (completely) technical talks seemed a little far-fetched. Or so I thought!

Since being introduced to the wonderful world of Open Source, PostgreSQL, and numerous other technologies – I’m continuously fascinated by the way that the communities intertwine. These technical communities are more than just the technology they use – they’re full of collaborative community advocates! There was even a ‘Community’ track at Pycon8. There were interesting talks that were outside the technical box such as: Be(come) a Mentor! Help Others Succeed!, Don’t try to look smart. Be smart, and several more!

It’s also always fun to see your colleagues and friends give talks at conferences. Not to play favorites… but the talk presented by my colleagues Giulio Calacoci and Francesco Canovai – Disaster Recovery: A Series of Unfortunate Events – had to be my favorite.

Joined by our own maestro Leonardo Cecchi, their talk was informational, interesting AND fun! Giulio and Francesco expressed the importance of Disaster Recovery tools (such as Barman) by relating them to fairy tales we all know and love, while Leonardo played his guitar in the background.

Other interesting talks I attended were: PostgreSQL su NFS: miti e verità [PostgreSQL with NFS: Myths and Truths] given by Jonathan Battiato, Taking care of PostgreSQL with Ansible by Rubens Souza, and Python e PostgreSQL, un connubio perfetto [Python and PostgreSQL, a perfect blend] by Marco Nenciarini.

Oh, and who could forget.. Me.. </ embarrassed_emoji >  Just kidding! I w

[...]
Posted by Bruce Momjian in EnterpriseDB on 2017-04-21 at 16:00

Connection poolers have two advantages:

  • They reduce the number of active connections, allowing the server to operate more efficiently, without the overhead of managing many idle connections.
  • They speed startup by allowing previously started sessions to be used as soon as a connection request arrives.

Postgres has two popular external connection poolers, PgBouncer and pgpool. External poolers have several advantages compared to poolers built into the database:

  • They can be placed closer to applications, perhaps on application servers.
  • They allow the pooling load to be handled by a dedicated server.
  • If the database server goes down, poolers can be easily re-pointed to new database servers.

But external connection poolers have disadvantages over internal ones:

  • Authentication not integrated with the database server
  • Additional network delay going through a pooler
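
To get a rough idea of how much idle-connection overhead a pooler could absorb, a query like the following against pg_stat_activity helps (a quick sketch, not part of the original post):

SELECT state, count(*) AS connections
FROM pg_stat_activity
GROUP BY state
ORDER BY connections DESC;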

Continue Reading »

Some weeks ago at pgDay Paris, during the evening social event, we got into a small "guesstimation" with another Postgres enthusiast about the perceived speed of using integer-based IDs vs UUID-based ones for bigger amounts of data. In short, he reasoned that one should generally avoid UUIDs for performance reasons as they […]

The post int4 vs int8 vs uuid vs numeric performance on bigger joins appeared first on Cybertec - The PostgreSQL Database Company.

Everything you always wanted to know about stats

Today's post is about pg_stat_all_tables. This view contains various statistics about table usage and may be useful in different scenarios. I'm going to discuss the following ones:
  1. Sequential scans.
  2. Table's write activity.
  3. Autovacuum queue.
Sequential scans. One of the most useful types of information that you can get from checking pg_stat_all_tables is the number of scans. It shows you how many times tables were accessed directly or through indexes and how many rows were returned by these scans - this information is located in the seq_scan/seq_tup_read and idx_scan/idx_tup_fetch columns.

We need the first two columns, which show the number of times the tables were accessed through sequential scans and the number of tuples returned as a result.

Why does Postgres use sequential scans? A seqscan is not a problem when it handles small tables, but in the case of larger tables a sequential scan might read the whole table, and this might take a while. It also becomes an issue when Postgres handles many sequential scans concurrently and the performance of the storage drops significantly. As a rule, the cause is an absence of an index for a new kind of query, inaccurate statistics used by the query planner, or simply a forgotten LIMIT clause in the query. Anyway, pg_stat_all_tables allows you to quickly check whether there are any sequential scans in the system. Using the mentioned columns we can write a quite simple query and get the right information:

# SELECT
    schemaname, relname,
    seq_scan, seq_tup_read,
    seq_tup_read / seq_scan as avg_seq_tup_read
FROM pg_stat_all_tables
WHERE seq_scan > 0
ORDER BY 5 DESC LIMIT 5;

schemaname |         relname        | seq_scan | seq_tup_read | avg_seq_tup_read
------------+------------------------+----------+--------------+------------------
public     | deals                  |      621 |  81712449358 |        131582044
public     | client_balance         |       26 |    574164012 |         22083231
public     | events_by_date_summary |     2698 |  57342287963 |         2125362
[...]

Why did you attend PgConf US?

We have been looking for high quality, experienced, professional support for our application’s database for some time. We have found it difficult to find help online… seemingly every phone number we called was just an answering service. When we heard that the companies offering the level of service we were looking for were all available in the same place, we couldn’t resist.

Tell us a little bit (one or two paragraphs) about your project:

Our project (VX and VO collectively named Victor) is a SaaS solution for 911 emergency response systems. Victor provides analytics and quality assurance tools and services, enabling agencies to assess performance, measure resource & deployment activity, model demand, optimize workload, and even generate financial estimates. For more information see our website.

Why did you choose PostgreSQL for your project?

Victor requires both time and spatial awareness, along with all the other things that are expected from an RDBMS. PostgreSQL is stable, secure, mature, well documented, open source, actively developed, community supported, and generally bad ass. The choice was easy.

As I understand it, attending PgConf US was a last minute decision. Do you think it was worth it? If so, why?

Absolutely! Our objective was to meet with vendors who were able to provide high quality support and we were successful. Additionally, we met a ton of super geniuses (Paragon, TimescaleDB, I'm talking about you) and learned more than expected. For example, we had no idea we could put raster images in PostgreSQL and process them with PostGIS… amazing!

Would you attend PgConf US again?

Yes… see above. Unrelated, but unbelievably cool, we met a guy named Solar… a PhD (from MIT!) passionate about carbon nanotubes.

Are you interested in contributing to the community and if so, in what fashion?

Yes! …and here is our biggest piece of feedback.

It would be great if there was a clear point of entry for people like us… we have been using PostgreSQL for several years, have been[...]
Posted by Andrew Dunstan in 2ndQuadrant on 2017-04-19 at 14:28

I have released version 4.19 of the PostgreSQL Buildfarm client. It can be downloaded from https://buildfarm.postgresql.org/downloads/releases/build-farm-4_19.tgz
Apart from some minor bug fixes, the following changes are made:

  • Include the script’s path in @INC. That means you can usually run the script from anywhere rather than just its own directory.
  • Set TZ after “make check” is run. This makes for slightly faster initdb runs in later stages.
  • make TAP tests run with --timer
  • change default log_line_prefix in config file to use %p instead of %c. That makes it easier to tie log messages to other messages that mention a pid
  • Add a module to log running commands to a file as it runs and replace critical uses of backticks with the new procedure. That means we have better traces if the buildfarm process crashes.
  • Improvements to TAP tests logging and coverage. Each test set (i.e each /t directory) is now logged separately. Coverage is extended to the remaining test sets in src/test except SSL tests.
  • Add a module for testing ICU
  • change target URL in config file sample to use canonical name matching SSL certificate
  • default build_root to a directory in the same place that the script is found, and create it if it doesn’t exist.
  • add a command-line flag to allow setting config settings
  • from-source path can now be relative
  • from-source doesn’t need to specify a branch, if not clear it defaults to HEAD
  • from-source can now do vpath builds

 

These changes mean that the client is more useful for testing development code, and also that testing config settings is much simpler. An initial test run on a fresh installation is now as simple as:

cp buildfarm.conf.sample build-farm.conf
./run_build.pl --test --verbose

To test development code, the from-source option is now much more flexible and friendly. For example, one might do something like:

./run_build.pl --from-source=/path/to/postgresql \
  --config-set use_vpath=1 \
  --config-set config_opts+=--enable-tap-tests \
  --config-set locales+=en_US.utf8

If you run something like this

[...]
Posted by Bruce Momjian in EnterpriseDB on 2017-04-19 at 14:15

With streaming replication, Postgres allows sophisticated setups of primary and standby servers. There are two ways to promote a standby to be the new primary. A switchover is when the change happens in a planned way:

  1. All clients are disconnected from the master to prevent writes
  2. A sufficient delay allows the final write-ahead log (WAL) records to be transferred to all standbys
  3. The primary is shut down
  4. The standby is promoted to be the primary
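
One way to verify step 2 is to compare the primary's current WAL position with each standby's replay position before shutting down. A minimal sketch, using the 9.6-era function and column names (these were renamed from xlog/location to wal/lsn in PostgreSQL 10):

SELECT application_name, state,
       pg_current_xlog_location() AS primary_lsn,
       replay_location,
       pg_current_xlog_location() = replay_location AS caught_up
FROM pg_stat_replication;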

A failover happens when the steps above can't be performed, usually because the primary has failed in some catastrophic way. The major difficulty with failover is the possibility that some of the final database changes contained in the WAL are not transferred to standbys, unless synchronous_standby_names was used. When a standby is promoted to primary after a failover, the final missing WAL records can cause problems:

  1. Some transactions on the old primary that were acknowledged to clients might be lost
  2. If the old master needs to be reconnected as a standby without reimaging, it might be necessary to use pg_rewind
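
If losing acknowledged transactions on failover is unacceptable, synchronous replication can be enabled ahead of time. A minimal sketch, assuming a standby that connects with application_name 'standby1':

ALTER SYSTEM SET synchronous_standby_names = 'standby1';
SELECT pg_reload_conf();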

Make sure you practice both methods of promoting a standby so, when you have to do the promotion in production, you are ready.

Continue Reading »

Posted by Tomas Vondra in 2ndQuadrant on 2017-04-19 at 14:00

A few weeks ago I explained basics of autovacuum tuning. At the end of that post I promised to look into problems with vacuuming soon. Well, it took a bit longer than I planned, but here we go.

To quickly recap, autovacuum is a background process cleaning up dead rows, e.g. old deleted row versions. You can also perform the cleanup manually by running VACUUM, but autovacuum does that automatically depending on the amount of dead rows in the table, at the right moment – not too often but frequently enough to keep the amount of “garbage” under control.

Generally speaking, autovacuum can't be running too often – the cleanup is only performed after some number of dead rows accumulates in the table. But it may be delayed for various reasons, resulting in tables and indexes getting larger than desirable. And that's exactly the topic of this post. So what are the common culprits and how do we identify them?

Throttling

As explained in tuning basics, autovacuum workers are throttled to only perform a certain amount of work per time interval. The default limits are fairly low – about 4MB/s of writes, 8MB/s of reads. That is suitable for tiny machines like a Raspberry Pi or small servers from 10 years ago, but current machines are way more powerful (both in terms of CPU and I/O) and handle much more data.

Imagine you have a few large tables and some small ones. If all three autovacuum workers start cleaning up the large tables, none of the small tables will get vacuumed regardless of the amount of dead rows they accumulate. Identifying this is not particularly difficult, assuming you have sufficient monitoring. Look for periods when all autovacuum workers are busy while tables are not vacuumed despite accumulating many dead rows.

All the necessary information is in pg_stat_activity (number of autovacuum worker processes) and pg_stat_all_tables (last_autovacuum and n_dead_tup).
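
A rough sketch of pulling both pieces of information (not taken from the post itself):

-- autovacuum workers currently running
SELECT pid, query
FROM pg_stat_activity
WHERE query LIKE 'autovacuum:%';

-- tables with the most dead rows and when autovacuum last processed them
SELECT schemaname, relname, n_dead_tup, last_autovacuum
FROM pg_stat_all_tables
ORDER BY n_dead_tup DESC
LIMIT 10;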

Increasing the number of autovacuum workers is not a solution, as the total amount of work remains the same. You can specify per-table throttling lim

[...]
On 7th of March 2017, Heikki Linnakangas committed patch: Support SCRAM-SHA-256 authentication (RFC 5802 and 7677). This introduces a new generic SASL authentication method, similar to the GSS and SSPI methods. The server first tells the client which SASL authentication mechanism to use, and then the mechanism-specific SASL messages are exchanged in AuthenticationSASLcontinue and PasswordMessage […]

This is the second in a series of posts intended to introduce PostgreSQL users to PL/R, a loadable procedural language that enables a user to write user-defined SQL functions in the R programming language.  This post builds on the example introduced in the initial post by demonstrating the steps associated with preprocessing the Normalized Difference Vegetation Index (NDVI) satellite raster data in preparation for spatial analytics. 

PostgreSQL support cases are coming in on a regular basis. This week an especially noteworthy one reached our desks here at Cybertec, which is so interesting that I decided to sit down and share some information. I guess many people out there have similar issues and therefore this post might be helpful to developers and […]

The post (func()).* – Hidden performance issues appeared first on Cybertec - The PostgreSQL Database Company.

An important step in SCRAM authentication is called SASLprep, a mandatory step that ensures two UTF-8-encoded strings can be compared for equivalence. A first commit added support for the SCRAM-SHA-256 protocol with the full SASL exchange plugged on top of it, and SASLprep has been implemented by the following commit:

commit: 60f11b87a2349985230c08616fa8a34ffde934c8
author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
date: Fri, 7 Apr 2017 14:56:05 +0300
Use SASLprep to normalize passwords for SCRAM authentication.

An important step of SASLprep normalization, is to convert the string to
Unicode normalization form NFKC. Unicode normalization requires a fairly
large table of character decompositions, which is generated from data
published by the Unicode consortium. The script to generate the table is
put in src/common/unicode, as well test code for the normalization.
A pre-generated version of the tables is included in src/include/common,
so you don't need the code in src/common/unicode to build PostgreSQL, only
if you wish to modify the normalization tables.

The SASLprep implementation depends on the UTF-8 functions from
src/backend/utils/mb/wchar.c. So to use it, you must also compile and link
that. That doesn't change anything for the current users of these
functions, the backend and libpq, as they both already link with wchar.o.
It would be good to move those functions into a separate file in
src/commmon, but I'll leave that for another day.

No documentation changes included, because there is no details on the
SCRAM mechanism in the docs anyway. An overview on that in the protocol
specification would probably be good, even though SCRAM is documented in
detail in RFC5802. I'll write that as a separate patch. An important thing
to mention there is that we apply SASLprep even on invalid UTF-8 strings,
to support other encodings.

Patch by Michael Paquier and me.

Discussion: https://www.postgresql.org/message-id/CAB7nPqSByyEmAVLtEf1KxTRh=PWNKiWKEKQR=e1yGehz=wbymQ@mail.gmail.com

As referenced in RFC 4013,

[...]
Posted by Bruce Momjian in EnterpriseDB on 2017-04-17 at 17:00

When using continuous archiving, you must restore a file system backup before replaying the WAL. If the file system backup was taken long ago, WAL replay might take a long time. One way to avoid this is to take file system backups frequently.
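
For reference, the full file system backup that WAL is later replayed over can be taken with pg_basebackup or with the low-level API. A minimal sketch of the latter, as it worked in 9.6-era releases (exclusive mode):

SELECT pg_start_backup('weekly full backup');
-- copy the data directory with an external tool (tar, rsync, ...) while the backup is in progress
SELECT pg_stop_backup();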

Another option is to perform an incremental file system backup that can be laid over the original file system backup, then replay WAL over that. This reduces restore time because you only need to replay WAL from the start of the incremental backup, not the start of the full backup. This also reduces the amount of WAL that must be retained.

However, Postgres doesn't natively support incremental backup. The best you can do is to use a tool like pgBackRest or Barman that supports incremental backup at the file level. The only problem is that the database files are potentially one gigabyte in size, so the granularity of the incremental backup isn't great. Ideally solutions will be developed that do page-level (8k) incremental backups, which would be much smaller. The trick is finding an efficient way to record which 8k pages have been changed since the last file system backup.

Continue Reading »

The April meeting will be held at 18:00 EST on Tues, the 25th. Once again, we will be holding the meeting in the community space at CoverMyMeds. Please RSVP on MeetUp so we have an idea on the amount of food needed.

Topic

CoverMyMeds’ very own CJ will be presenting this month. He’s going to tell us all about how CoverMyMeds uses Consul to assist with scaling PostgreSQL and maintaining the high availability of the database.

Parking

Please park at a meter on the street or in the parking garage (see below). You can safely ignore any sign saying to not park in the garage as long as it’s after 17:30 when you arrive. Park on the first level in any space that is not marked ‘24 hour reserved’. Once parked, take the elevator to the 3rd floor to reach the Miranova lobby.

Finding us

The elevator bank is in the back of the building. Take a left and walk down the hall until you see the elevator bank on your right. Grab an elevator up to the 11th floor. Once you exit the elevator, look to your left and right. One side will have visible cubicles, the other won’t. Head to the side without cubicles. You’re now in the community space. The kitchen is to your right (grab yourself a drink) and the meeting will be held to your left. Walk down the room towards the stage.

Posted by Federico Campoli on 2017-04-16 at 20:19
In 2013 I started my pet project pg_chameleon to discharge my frustration only to find the entire thing even more frustrating.

It's fascinating how time and circumstances can change one's perspective, revamping the passion for development that never left me.

This philosophical introduction is to say that I still can't believe I've released the RC1 of my pet project pg_chameleon.

The release is already available on pypi and includes several bug fixes and performance improvements. In particular, pulling the row changes from MySQL is now 10 times faster because I'm skipping the implicit BEGIN issued by MySQL when generating a row image.

The display of sources and status is now nicer with the tabulate library.

I've also changed the logging file handler to use timed rotation, which automatically rotates the replica logs on a daily basis. The log retention is configurable using the parameter log_days_keep.

There is also a new command, detach_replica, which creates a standalone PostgreSQL database with the sequences correctly set and the foreign keys created as invalid. The detach process attempts to validate the foreign keys afterwards.

Pypi: https://pypi.python.org/pypi/pg_chameleon/
Documentation: http://pythonhosted.org/pg_chameleon/
Github: https://github.com/the4thdoctor/pg_chameleon

From this release on I won't add new features but will focus only on bug fixes.
 

Changelog from v1.0 beta 2.

  • add support for primary key defined as column constraint
  • fix regression if null constraint is omitted in a primary key column
  • add foreign key generation to detach replica. keys are added invalid and a validation attempt is performed.
  • add support for truncate table
  • add parameter out_dir to set the output destination for the csv files during init_replica
  • add set tables_only to table_limit when streaming the mysql replica
  • force a close batch on rotate event if binlog changes without any row events collected
  • fixed replica regression with python 3.x and empty binary data
  • added event_update in hexli
[...]
Posted by Oleg Bartunov in Postgres Professional on 2017-04-16 at 13:49
Slides (full version) of the talk "NoSQL Postgres" I presented at the Stachka conference are available - http://www.sai.msu.su/~megera/postgres/talks/jsonb-stachka-2017-full.pdf

The slides cover the following topics:
1. SQL/JSON
2. Jsonb compression
3. Full text search for json[b] data
4. YCSB benchmark (one node) for PostgreSQL, MongoDB and MySQL
Posted by Yann Larrivee on 2017-04-15 at 20:57

Want to get your web development ideas in front of a live audience? The
call for papers for the ConFoo Vancouver 2017 web developer conference is open! If you have
a burning desire to hold forth about PHP, databases, JavaScript, or any
other web development topics, we want to see your proposals. The window is
open only from April 10 to May 8, 2017, so hurry. An added benefit: If your
proposal is selected and you live outside of the Vancouver area, we will
cover your travel and hotel.

You’ll have 45 minutes for the talk, with 35 minutes for your topic and
10 minutes for Q&A. We can’t wait to see your proposals!

Until the talks are picked, the price for the tickets will be at its
lowest. Once the talks are announced, prices will go up. Check out the last conference to get an
idea of what to expect.

Posted by Bruce Momjian in EnterpriseDB on 2017-04-14 at 15:45

You probably have heard the term "checkpoint" before, or seen it mentioned in the postgresql.conf file. A checkpoint is a usually-invisible cleanup feature present in most database systems, but it is useful to know what it does.

This diagram illustrates checkpoints. At the top are three Postgres database sessions. Each session reads and writes to the shared buffer cache. Every modification to shared buffers also causes a change record to be written to the write-ahead log (WAL, described in an earlier blog entry). Over time the WAL would grow unbounded in size if it were not trimmed occasionally — that is what checkpoints do.

A checkpoint writes previously-dirtied shared buffers to durable storage over a period of several minutes, at which point the WAL representing those writes is no longer needed for crash recovery. (Hopefully continuous archiving and streaming replication have also processed those WAL files.) Therefore, the old WAL can then be removed or recycled.
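
To see how often checkpoints happen and how much work they do, pg_stat_bgwriter and a couple of settings are a reasonable starting point (a sketch, not from the original post):

SELECT checkpoints_timed, checkpoints_req,
       buffers_checkpoint, checkpoint_write_time
FROM pg_stat_bgwriter;

SHOW checkpoint_timeout;
SHOW max_wal_size;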

Continue Reading »

Posted by Dinesh Kumar in OpenSCG on 2017-04-14 at 10:24
Sharing the knowledge I have gained over the last 6 years.

So glad to be part of the PostgreSQL High Performance Cookbook, where I have discussed all the knowledge I have gained from the PostgreSQL database.

PostgreSQL High Performance Cookbook

Working with PostgreSQL for the last 6 years, I have gained so much knowledge about database management systems. Being a DBA for several years, I explored many tools which work great with the PostgreSQL database. During this 6-year journey, I got a chance to meet many wonderful people who guided me very well. I would like to say thanks to everyone who taught me the PostgreSQL database in soft/hard ways :-). Also, I would like to say thanks to every PostgreSQL developer, author, and blogger from whom I have learned many more things.

Finally, thanks to the OpenSCG team, who always treated me as a brother rather than an employee. :-)
Thanks to my wife Manoja for her wonderful support, and my friend Baji Shaik for his help in writing the content.

After some time of absence due to a high workload I finally got around to writing more on an issue which has been on my mind for quite a while: removing duplicate rows. It happens from time to time that PostgreSQL support clients accidentally load data twice or somehow manage to duplicate data (human […]

The post Removing duplicate rows in PostgreSQL appeared first on Cybertec - The PostgreSQL Database Company.

Posted by Jason Owen on 2017-04-14 at 02:20

Correction

Shortly after I published Benchmarking UUIDs, someone emailed me with a correction. It turns out the approach Jonathan and I used to time how long PostgreSQL takes to generate a million UUIDs is mostly timing how long it takes to generate a million queries:

DO $$
BEGIN
  FOR i IN 0..1000000 LOOP
    PERFORM 1;
  END LOOP;
RETURN;
END;
$$;

They pointed out a better way to test:

SELECT COUNT(*)
FROM (
  SELECT 1 FROM generate_series(1, 1000000)
) AS x;

This results in a roughly order-of-magnitude difference in test times, just in overhead.

When we take this insight and apply it to the two UUID generator functions, we find that PostgreSQL is faster at this task than nodejs:

SELECT COUNT(*)
FROM (
  SELECT uuid_generate_v4() FROM generate_series(1, 1000000)
) AS x;
SELECT COUNT(*)
FROM (
  SELECT gen_random_uuid() FROM generate_series(1, 1000000)
) AS x;

On my machine, I see a big difference, more than 5x:

uuid_generate_v4    gen_random_uuid
6484.110 ms         1166.969 ms
6451.433 ms         1169.010 ms
6285.573 ms         1161.001 ms

Interestingly, on another machine, the two functions were approximately equally fast, with uuid_generate_v4 slightly edging out gen_random_uuid.

Conclusion

  • Writing benchmarks is tricky!
  • Using this updated methodology, on my machine, PostgreSQL with pgcrypto is faster at generating UUIDs than nodejs, which in turn is faster than PostgreSQL with uuid-ossp.

Thankfully, I don't think the flaw in my original measurements undermines the conclusion I drew: the difference between these methods is vanishingly small, and the likelihood that generating UUIDs is the bottleneck in your system is low. Better to focus your optimization efforts elsewhere!


Many thanks to my anonymous contributor! I'll update this article to credit them if they like.

Posted by Jason Owen on 2017-04-13 at 07:00

Jonathan New wrote an interesting article on UUID creation in Postgres vs Node. In it, he described the performance tradeoff of generating a UUID in the database vs in the application. It's not very long, go read it!

I've used PostgreSQL to generate UUIDs before, but I hadn't seen the function uuid_generate_v4(). It turns out to come from the uuid-ossp extension, which also supports other UUID generation methods. Previously, I've used the pgcrypto extension, which provides the gen_random_uuid() function.

How do they compare? On my machine, using the PostgreSQL package for Ubuntu (as opposed to the Ubuntu package for PostgreSQL...), the pgcrypto version is more than twice as fast as the uuid-ossp version.
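
For anyone wanting to reproduce the comparison, both functions come from extensions that must be installed first. Roughly:

CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
CREATE EXTENSION IF NOT EXISTS pgcrypto;

SELECT uuid_generate_v4();  -- provided by uuid-ossp
SELECT gen_random_uuid();   -- provided by pgcrypto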

How does this compare with nodejs? Using Jonathan's approach, nodejs is about 1.5 times as fast as PostgreSQL with pgcrypto!

uuid-ossp       pgcrypto        nodejs
10942.376 ms    4173.924 ms     2886.117 ms
11235.807 ms    4341.270 ms     2822.078 ms
10764.468 ms    4265.632 ms     2829.395 ms

What does this mean? I argue: very little! The slowest method takes ~11 seconds to generate one million UUIDs, and the fastest takes ~3 seconds. That's 3 - 11 microseconds per UUID! If this is the bottleneck in your application, I think you've done a very good job of optimizing - and you might have a pretty unusual use case.

PS: the RETURNING clause, not mentioned in Jonathan's post, is really cool:

> CREATE TABLE example (
    example_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    number INTEGER NOT NULL
  );
CREATE TABLE
> INSERT INTO example (number)
    VALUES (1)
    RETURNING example_id;
              example_id
--------------------------------------
 045857b4-6125-4746-94b8-a2e58f342b86
(1 row)

INSERT 0 1

Methodology

This was a very unscientific benchmark! I'm not controlling for other programs running on my machine, and this is not a server, it's just a laptop.

In the interest of writing things down, here's how I came up with the numbers above.

Environment

According to /proc/cpuinfo, I am running on a Intel(R) Core(TM) i7-35

[...]
Posted by Bruce Momjian in EnterpriseDB on 2017-04-12 at 15:30

The write-ahead log (WAL) file format changes with every major release. Also, initdb, which is required to install the new major version, starts WAL file numbers at 000000010000000000000001.

For these reasons, if you are archiving WAL, it is wise to use the major version number in the name of the WAL archive directory, e.g. /archive/pgsql/9.6. This avoids the problem of WAL from an old Postgres major version conflicting with WAL files from a new major version.
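
A minimal sketch of wiring the major version into the archive location (the cp-based command is the stock documentation example; production setups usually use a more robust archiver):

ALTER SYSTEM SET archive_command =
  'test ! -f /archive/pgsql/9.6/%f && cp %p /archive/pgsql/9.6/%f';
SELECT pg_reload_conf();  -- archive_command changes take effect on reload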

Although normally one should try to avoid using non-core extensions/forks of Postgres, in some rare cases it could be even worse not to use them. But as the BDR project is also here to stay (as I understand it, the main functionality will be gradually integrated into the Postgres core), the following info might still be useful in […]

The post Custom replication handlers for Postgres-BDR appeared first on Cybertec - The PostgreSQL Database Company.