Some good news for packagers of PostgreSQL on Windows: many improvements in versioning are showing up in 9.5. This is the result of a couple of months of work, concluded with the two following commits. The first commit covers a major portion of the executables and libraries:
commit: 0ffc201a51395ca71fe429ef86c872850a5850ee
author: Noah Misch <email@example.com>
date: Mon, 14 Jul 2014 14:07:52 -0400
Add file version information to most installed Windows binaries.

Prominent binaries already had this metadata. A handful of minor binaries, such as pg_regress.exe, still lack it; efforts to eliminate such exceptions are welcome.
This added versioning for contrib modules, conversion_procs, most of the ecpg binaries, the WAL receiver and the PL languages. The final shot came with this commit, covering utilities like the regression tools and even zic.exe, part of the timezone code:
commit: ee9569e4df1c3bdb6858f4f65d0770783c32a84d
author: Noah Misch <firstname.lastname@example.org>
date: Mon, 18 Aug 2014 22:59:53 -0400
Finish adding file version information to installed Windows binaries.

In support of this, have the MSVC build follow GNU make in preferring GNUmakefile over Makefile when a directory contains both.

Michael Paquier, reviewed by MauMau.
This work basically needed the following things to get correct versioning coverage when building with either MinGW or MSVC:
Now, the result of this work is directly visible on the files themselves: right-click a given file, open "Properties", then the "Details" tab. With that, it is possible to see fields for:
Almost a year ago, I wrote a custom experimental aggregate as an alternative to COUNT(DISTINCT). The problem with the native COUNT(DISTINCT) is that it forces a sort on the input relation, and when the amount of data is significant (say, tens of millions of rows), that may be a significant performance drag. And sometimes we really need to do
Extensibility is one of the great strengths of PostgreSQL - users of most other databases can only dream about things like defining custom aggregates. And this extension point was exactly the foundation for my idea - implementing a custom aggregate that counts distinct items in a more efficient way.
That's how count_distinct was conceived, and the principle was really simple. For each group, a small hash table is maintained, making it trivial to keep track of distinct values (and counting them). And it worked quite well - instead of the
SELECT COUNT(DISTINCT column) FROM table;
you can call the custom aggregate
SELECT COUNT_DISTINCT(column) FROM table;
and in most cases it was much faster (usually ~3x), without doing the sort, etc. The main disadvantage was memory consumption - the overhead of the additional hash table structure and of palloc was bad (and in some cases quite terrible - consuming an order of magnitude more memory than the amount of data being processed). I experimented with various hash table variants and allocation schemes, but either the impact on performance was unacceptable, or the memory consumption was not much lower. Until about a month ago ...
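The per-group hash table principle can be illustrated with a short Python sketch (purely illustrative - the real aggregate is implemented in C, and the names here are made up):

```python
from collections import defaultdict

def count_distinct_per_group(rows):
    """Keep a small hash-based set per group, as the aggregate does with
    its per-group hash table (illustrative sketch only)."""
    seen = defaultdict(set)
    for group, value in rows:
        seen[group].add(value)   # duplicate values are absorbed by the set
    return {group: len(values) for group, values in seen.items()}

rows = [("a", 1), ("a", 2), ("a", 1), ("b", 5)]
print(count_distinct_per_group(rows))   # {'a': 2, 'b': 1}
```

The memory overhead the post describes comes precisely from maintaining one such structure per group.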
Luckily, I've been messing with the hash join implementation recently, and it occurred to me that a hash table may not be the right tool for this task. Hash tables are great for lookups, but that's not really the point of count_distinct - it only needs to detect and remove duplicates, and lookups are only one of the ways to achieve that goal. Also, as the hash table grows, the L2/L3 cache hit ratio gets worse and worse. And after a bit of experimenting, I managed to come up
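The text above hints that lookups are not the only way to remove duplicates. One cache-friendlier alternative is to buffer values and periodically sort-and-deduplicate them; sequential passes over a compact buffer behave far better in L2/L3 caches than random hash probes. A hypothetical sketch of that compaction step (not the actual C implementation):

```python
def compact(buf):
    """Sort a buffer of values and drop duplicates in one sequential pass.
    Illustrative sketch only: the sort plus the linear scan replace the
    hash-table lookups with cache-friendly sequential memory access."""
    buf.sort()
    deduped = [buf[0]]
    for value in buf[1:]:
        if value != deduped[-1]:   # equal neighbors are duplicates after sorting
            deduped.append(value)
    return deduped

print(compact([3, 1, 2, 1, 3]))   # [1, 2, 3]
```

In such a scheme, the distinct count of the group is simply the length of the compacted buffer.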
create or replace function fx()
returns t2 as $$
begin
  return (1, 2); -- hypothetical body: t2's second column is assumed numeric, but an integer is returned
end;
$$ language plpgsql;
select * from plpgsql_check_function('fx()', performance_warnings := true);
error:42804:3:RETURN:returned record type does not match expected record type
Detail: Returned type integer does not match expected type numeric in column 2.
create or replace function f1()
returns setof int as $$
begin
  return next 1::numeric; -- tolerant, doesn't use tupmap
end;
$$ language plpgsql;
select * from plpgsql_check_function('f1()', performance_warnings := true);
performance:42804:3:RETURN NEXT:target variable has different type then expression result
Detail: assign "numeric" value to "integer" variable
Hint: Hidden casting can be a performance issue.
create or replace function f1()
returns int as $$
begin
  return current_date; -- hypothetical body: returns a date where an integer is expected
end;
$$ language plpgsql;
select * from plpgsql_check_function('f1()', performance_warnings := true);
warning:42804:3:RETURN:target variable has different type then expression result
Detail: assign "date" value to "integer" variable
Hint: There are no possible explicit coercion between those types, possibly bug!
DO LANGUAGE plpgsql $$
DECLARE
  n real;
  f integer;
BEGIN
  FOR f IN 1..10000000 LOOP
    IF 0 = 0 THEN
      n = SQRT(f);
      RAISE NOTICE 'Result => %', n;
    END IF;
  END LOOP;
END;
$$;
I recently posted performance figures with varying page sizes for pgbench on SSD, which showed a +10% improvement with the smaller 4 kB page size over the default 8 kB page size.
Josh Berkus pointed out that pgbench test uses rather small 100-bytes rows, and that changing the tuple size might induce a different conclusion. To assess this point, I ran some tests with different row sizes: 1 kB, 3 kB, 7 kB and 15 kB.
I ran prepared transactions (-M prepared) to avoid spending CPU cycles on parsing and planning, and to focus on write performance. The scale is 10 for 1 kB rows (about a 1 GB base), 10 for 3 kB rows, 5 for 7 kB rows and 2 for 15 kB rows (about 4 GB bases): the bases are small enough to fit in memory, so reads should not require disk accesses. I ran 200-second tests with 4 clients (-c 4) on PostgreSQL 9.4b2. Although this is short, I am confident that it is enough, because the rather good performance of SSDs implies dozens of checkpoints during that time, and prior tests I did with 1000 vs 200 seconds on SSDs showed that the latter were consistent with the former, although slightly on the optimistic side. I lowered autovacuum_naptime to 20 seconds and checkpoint_timeout to 1min, and raised checkpoint_completion_target to 0.8 to shake things up a bit, but I do not think these parameters have a great influence given the overall performance. I used the default configuration for everything else. The hardware is the same as the
The row size is changed by updating the attributes of all tables with commands like:

ALTER TABLE pgbench_accounts
  DROP COLUMN filler,
  ADD COLUMN filler CHAR(1000) NOT NULL DEFAULT '',
  ALTER COLUMN filler SET STORAGE EXTERNAL;
VACUUM FULL pgbench_accounts;
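As a back-of-the-envelope check of the base sizes quoted above (pgbench_accounts holds scale × 100,000 rows; this is a rough approximation that ignores the other tables and per-tuple overhead):

```python
def base_size_gb(scale, row_kb):
    """Approximate pgbench_accounts size: scale * 100,000 rows of row_kb each,
    converted from kB to GB. A rough sanity check, not an exact figure."""
    return scale * 100_000 * row_kb / 1024 ** 2

print(round(base_size_gb(10, 1), 2))   # ~0.95, i.e. "about 1 GB" for scale 10, 1 kB rows
print(round(base_size_gb(2, 15), 2))   # ~2.86 for scale 2, 15 kB rows
```

All combinations land in the few-GB range, small enough to stay memory-resident as stated.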
The key point is to change the storage of the filler attribute to EXTERNAL: otherwise, with the default EXTENDED value, TOAST compression kicks in after 2 kB, and with the default empty string the rows end up smaller than with the standard test…
Here are the performance results for two series of tests. The first series uses simple updates (
As I followed along with the 9.4 release of Postgres I had a few posts of things that I was excited about, some things that missed, and a bit of a wrap-up. I thought this year (year in the sense of PG releases) I’d jump the gun and lay out areas I’d love to see addressed in PostgreSQL 9.5. And here it goes:
Merge/upsert/insert-or-update - whatever you want to call it, it is still a huge wart that it doesn't exist. A few implementations have shown up on the mailing lists, and to the best of my understanding there's been debate over whether they're performant enough, or some people would prefer another implementation, or I don't know what other excuse. The short of it is that this really needs to happen; until then, you can always implement it with a CTE, which can have a race condition.
There’s so much opportunity here, and this has easily been my favorite feature of the past 2-3 years in Postgres. Really any improvement is good here, but a hit list of a few valuable things:
Today there’s madlib for machine learning, and 9.4 got support for ordered set aggregates, but even still Postgres needs to keep moving forward here. PL-R and PL-Python can help a good bit as well, but having more out of the box functions for stats can continue to keep it at the front of the pack for a database that’s not only safe for your data, but powerful to do analysis with.
This is definitely more of a dream than not. Full multi-master replication would be amazing, and it's getting closer to possible. The sad truth is that even once it lands it will probably require a year of maturing - all the more reason for it to hopefully hit in 9.5.
The foundation made it in for 9.4 which is huge. This means we’ll probably see a good working out of the box logical replication in 9.5. For those less familiar this m
I thought I'd share details of the conferences I'm going to attend next month, in case you aren't aware of them but are interested in attending.
The speaker and talk lineups are looking interesting for both conferences. If you haven't registered for them, it's not too late yet!! If you are planning to attend either of them, I look forward to meeting you in person.
See you soon!
In October this year I'll be visiting the US and Canada for some conferences and a wedding. The first event will be xTupleCon 2014 in Norfolk, Virginia. xTuple make the popular open source accounting and CRM suite PostBooks. The event kicks off with a keynote from Apple co-founder Steve Wozniak on the evening of October 14. On October 16 I'll be making a presentation about how JSCommunicator makes it easy to add click-to-call real-time communications (RTC) to any other web-based product without requiring any browser plugins or third party softphones.
Juliana Louback has been busy extending JSCommunicator as part of her Google Summer of Code project. When finished, we hope to quickly roll out the latest version of JSCommunicator to other sites including rtc.debian.org, the WebRTC portal for the Debian Developer community. Juliana has also started working on wrapping JSCommunicator into a module for the new xTuple / PostBooks web-based CRM. Versatility is one of the main goals of the JSCommunicator project and it will be exciting to demonstrate this in action at xTupleCon.
xTuple has advised that they will offer a discount to other open source developers and contributors who wish to attend any part of their event. For details, please contact xTuple directly through this form. Please note it is getting close to their deadline for registration and discounted hotel bookings.
For those who don't or can't attend xTupleCon there has been some informal discussion about a small WebRTC-hacking event at some time on 15 or 16 October. Please email me privately if you may be interested.
After a decade of experience managing database systems of all scales and varieties, here is my first attempt at the ten commandments of database management!
If you have your own 10 commandments list, don’t hesitate to post in comment. Thanks!
Every now and then, someone gets confused about resetting stats in PostgreSQL, and the impact it may have on the planner and other parts of the database. Perhaps the documentation could be a tad clearer on this, because it certainly can be confusing for those who have never had to deal with stats in PostgreSQL before. But it's not just about newbies - I wrote a patch in this area for 9.3, and I still get confused occasionally.
The most surprising fact for most users is that 'stats' may actually mean one of two things - statistics describing the distribution of the data, and monitoring statistics, tracking interesting counters about the operation of the database system itself. Each kind serves a different purpose, is stored differently, and the impact when the data is missing is very different.
So let's see what the purpose of the two kinds of stats is, what the common issues are, and what happens when the data goes missing for some reason.
The first kind of stats tracks the distribution of the data - the number of distinct values, the most common values in a column, a histogram of the data, etc. This is the information used when planning queries - it allows answering questions like:
Essentially those are some of the questions asked by the planner/optimizer when deciding what is the best way to execute a query.
This kind of statistics is collected by ANALYZE (or autovacuum) and stored in pretty much "regular" tables, protected by the transaction log just like regular data. Check the pg_statistic system catalog, or rather pg_stats, which is a view on top of pg_statistic that makes the stats easier for humans to read.
A lot of things can go wrong with the stats, mostly resulting in the choice of a bad plan and bad query performance. There are various reasons why that may happen (inaccurate stats, complex conditions, correlated columns, ...) - I
We are excited to announce the release of cstore_fdw 1.1, Citus Data's open source columnar store extension for PostgreSQL. The changes in this release include:
For installation and update instructions, please see cstore_fdw’s page in GitHub.
We continue to improve cstore_fdw and there are several tasks in progress, among which are:
pg_receivexlog is an in-core utility of Postgres able to retrieve WAL files through a stream using the replication protocol. It is particularly useful, for example, to transfer WAL files to a proxy node when a standby node cannot connect directly to the root node for whatever reason. The standby can then replay the WAL files obtained. The reliability of this utility has been improved in Postgres 9.5 with the following commit:
commit: 3dad73e71f08abd86564d5090a58ca71740e07e0
author: Fujii Masao <email@example.com>
date: Fri, 8 Aug 2014 16:50:54 +0900
Add -F option to pg_receivexlog, for specifying fsync interval.

This allows us to specify the maximum time to issue fsync to ensure the received WAL file is safely flushed to disk. Without this, pg_receivexlog always flushes WAL file only when it's closed and which can cause WAL data to be lost at the event of a crash.

Furuya Osamu, heavily modified by me.
Thanks to the addition of a new option called -F/--fsync-interval, users can now control the interval between the fsync calls that flush WAL data to disk.
The default value, 0, makes the flush occur only when a WAL file is closed. This is the same flush strategy as in previous versions of this utility (which has existed since 9.2, precisely).
Conversely, specifying -1 makes sure that WAL data is flushed as soon as possible, in this case at the moment when WAL data becomes available.
Now, using this option is rather simple:
pg_receivexlog -v -D /path/to/raw_wal/ -F -1  # for maximum flush
pg_receivexlog -v -D /path/to/raw_wal/ -F 0   # for the default behavior
pg_receivexlog -v -D raw_wal/ -F 10           # for an interval of 10s
The level of information printed in verbose mode has not changed either, so you can continue to rely on it as before:
$ pg_receivexlog -D raw_wal/ -v
pg_receivexlog: starting log streaming at 0/4D000000 (timeline 1)
pg_receivexlog: finished segment at 0/4E000000 (timeline 1)
pg_receivexlog: finished segment at 0/4F000000 (timeline 1)
pg_receivexlog: finished segment at 0/50000
In this post, I want to look at write performance on SSDs, focusing on the impact of the PostgreSQL page size, and on checking whether the current 8 kB default is relevant.
I got interested after reading a post by Hans-Jürgen Schönig, who tested a very large 1 MB block size as well as standard page sizes. What struck me is that there is a significant impact of smaller page sizes on OLTP performance (6% better for 4 kB, 10% better for 2 kB, with 8 kB as the reference), suggesting that the 8 kB default is not necessarily the best choice, especially when using SSDs. However, the test is on a relatively small database (pgbench scale factor 10 gives about 150 MB), over a short time (typically 1 minute), and with poor performance (less than 150 tps) given that an SSD was used.
I also found an older but excellent post by Tomas Vondra, who played with file system options as well as the PostgreSQL page and WAL block sizes. The results show a +40% performance boost for 4 kB pages with respect to 8 kB pages for read-write loads on an SSD. The 300 pgbench scale is significant, the reported runs seem to be 5 minutes long, and the +1000 tps performances are consistent with the SSD hardware. The downside is that both page and WAL block sizes are moved together, and only one-shot performances seem to be reported, although my past experience with pgbench makes me wary of performance reproducibility.
I ran pgbench simple updates with prepared transactions (-N -M prepared) to focus on write performance without contention interference, scale 100 so that the database is reasonably large but can be kept in memory, and 4 clients (-c 4) because my laptop has 2 HT cores. I ran a number of 1000-second tests so as to derive hopefully meaningful statistics. Given the SSD performance, the 1000-second tests basically overwrite a majority of rows in the
When: 7-9pm Thu Aug 21, 2014
Who: Jim Newman, iovation
What: Elasticsearch & Postgres, a working relationship
Topics to be covered:
– A brief intro to Elasticsearch
– Indexing Postgres with Elasticsearch using the JDBC river plugin
– SQL versus the Elasticsearch Query DSL, similarities (there are some) and differences
– Using Postgres Perl functions for some SQLish access to Elasticsearch
– Feasibility of a foreign data wrapper for Elasticsearch? (pg2elasticsearch is still just a GitHub readme)
Jim Newman has 30 years of assorted database-related activity and shooting pool. He’s currently employed as Principal Data Architect at iovation.
This month we get to try yet another meeting location while the iovation offices are being renovated. We’re still in the US Bancorp Tower at 111 SW 5th (5th & Oak), but we’ll be in the iovation temporary office on the 36th floor, in the “West Hills” conference room. Once off the elevator, enter through the door with the small “iovation” sign, then just go right and keep right until you hit a dead end, which is where “West Hills” is. Signs will be posted.
The building is on the Green & Yellow Max lines. Underground bike parking is available in the parking garage; outdoors all around the block in the usual spots.
See you there!
Let's clarify some basic terms first. First, what is a distribution? Most of the time, it's a PostgreSQL extension in a fancy package, with a META.json specification containing additional information that is not available for plain extensions - a description, links to git repositories, etc. It may also list prerequisites - e.g. which PostgreSQL versions it's compatible with (more on this later).
It's a bit more complicated though - the distribution may pack multiple extensions (e.g. the "pgTAP" distribution used in META.json example packs "pgtap" and "schematap" extensions). Also, there are distributions that pack other kinds of software, not PostgreSQL extensions - for example omnipitr provides command-line utilities for PITR, and so on. Most distributions however pack a single PostgreSQL extension, and these two terms are frequently used as synonyms (after all, the X in PGXN stands for "eXtension").
The distributions are versioned (just like rpm, deb or any other packages), and each version has a "release status" with three possible values - unstable (alpha version), testing (beta version) and stable (ready for production). This is important when doing stats, because the unstable/testing versions are somewhat expected to have bugs, and what really matters are the stable versions (because that's what people are supposed to install in production). More specifically, what matters is the last stable version of each distribution.
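The "last stable version" rule can be sketched as follows (hypothetical data shapes and a simplified dotted-integer version comparison, for illustration only - real PGXN versions follow semantic versioning):

```python
def last_stable(versions):
    """Pick the newest stable version from (version, status) pairs.
    Versions are compared as tuples of integers, e.g. "1.0.1" -> (1, 0, 1).
    Returns None when a distribution has no stable release at all."""
    stable = [v for v, status in versions if status == "stable"]
    if not stable:
        return None
    return max(stable, key=lambda v: tuple(int(part) for part in v.split(".")))

print(last_stable([("0.9.0", "unstable"), ("1.0.0", "stable"), ("1.0.1", "stable")]))  # 1.0.1
```

This is the version a production user would actually end up installing, hence its weight in the stats below.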
So let's see some stats on why the install step (make install) fails ...
Currently, there are 126 distributions and 470 versions. Considering only the last version for each release status (because that's what gets installed by pgxnclient install by default), and running the tests on a range of PostgreSQL versions (8.2 - 9.4), this corresponds to 2727 tests, of which 1177 fail at the install stage. Per release
Recently I stumbled across a question on Reddit regarding the performance impact of using pgBadger on an active database server. The only real answer to this question is: do not use pgBadger. Before anyone asks—no, you shouldn’t use pgFouine either. This is not an indictment of the quality of either project, but a statement of their obsolescence in the face of recent PostgreSQL features.
One of the recommended postgresql.conf changes for both of these tools is to set log_min_duration_statement to 0. There are a few other changes they require to put the log entries in the correct format, but we can worry about those later. For now, let’s focus on a little innocent math.
One of the PostgreSQL servers I work with processes almost two billion queries per day. Let’s imagine every such query is very simple, even though this is definitely not the case. Consider this example query:
SELECT col1, col2 FROM my_table WHERE id=?
Assuming the query is parameterized and the id values range from one to a million, our average query length is 47 characters. Let’s just say it’s 50 to keep things easy. If we multiply that by two billion, that’s 100 billion bytes of logged SQL. Seen another way, that’s 93 GB of logs per day, or about 1 MB of log data per second.
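That arithmetic is easy to verify (assuming the id values uniformly cover one to a million):

```python
query = "SELECT col1, col2 FROM my_table WHERE id="   # 41 chars before the id

# Average number of digits of ids drawn from 1..1,000,000
avg_digits = sum(len(str(n)) for n in range(1, 1_000_001)) / 1_000_000

avg_len = len(query) + avg_digits          # ~46.9, i.e. the 47 characters quoted above
bytes_per_day = 2_000_000_000 * 50         # 2 billion queries at ~50 bytes each

print(round(avg_len))                      # 47
print(round(bytes_per_day / 1024 ** 3))    # 93 (GB of logs per day)
print(round(bytes_per_day / 86_400 / 1024 ** 2, 1))  # ~1.1 (MB of log data per second)
```

Note this counts only the SQL text itself, not the log line prefix PostgreSQL adds to each entry, so the real volume would be even higher.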
In practice, such a short query will not constitute the bulk of a PostgreSQL server’s workload. In fact, if even a simple ORM is involved, all queries are likely to be far more verbose. Java’s Hibernate in particular is especially prone to gratuitous aliases prepended to all result columns. This is what our query would look like after Hibernate was done with it:
SELECT opsmytable1_.col1, opsmytable1_.col2 FROM my_table opsmytable1_ WHERE opsmytable1_.id=?
If we ignore the whitespace I added for readability, and use values from one to a million, the average query length becomes 99. Remember, this is ignoring all useful data PostgreSQL would also be logging! There are also a number of other problems with many of my operating assumptions. It’s very unlikely that que