After discussing the
TPC-DS results, it's time to look at the last benchmark,
testing performance of built-in fulltext (and GIN/GiST index
implementation in general).
The one chart you should remember from this post is this one,
GIN speedup between 9.3 and 9.4:
Interpreting this chart is a bit tricky - x-axis tracks duration
on PostgreSQL 9.3 (log scale), while y-axis (linear scale) tracks
relative speedup 9.4 vs. 9.3, so 1.0 means 'equal performance', and
0.5 means that 9.4 is 2x faster than 9.3.
The chart pretty much shows exponential speedup
for vast majority of queries - the longer the duration on 9.3, the
higher the speedup on 9.4. That's pretty awesome, IMNSHO. What
exactly caused that will be discussed later (spoiler: it's thanks
to GIN fastscan). Also notice that almost no queries are slower on
9.4, and those few examples are not significantly slower.
While both pgbench
and TPC-DS are well
established benchmarks, there's no such benchmark for testing
fulltext performance (as far as I know). Luckily, I've had played
with the fulltext features a while ago, implementing archie - an in-database
mailing list archive.
It's still quite experimental and I use it for testing GIN/GiST
related patches, but it's suitable for this benchmark too.
So I've taken the current archives of PostgreSQL mailing lists,
containing about 1 million messages, loaded them into the database
and then executed 33k real-world queries collected from postgresql.org. I can't publish
those queries because of privacy concerns (there's no info on
users, but still ...), but the queries look like this:
SELECT id FROM messages
WHERE body_tsvector @@ ('optimizing & bulk & update')::tsquery
ORDER BY ts_rank(body_tsvector, ('optimizing & bulk & update')::tsquery)
DESC LIMIT 100;
The number of search terms varies quite a bit - the simplest
queries have a single letter, the most complex ones often tens of
The PostgreSQL configuration was mostly default, with only minor