The reason I do benchmarks is that, in my experience, many recommended settings are effectively myths. Many are perfectly fine, but a surprisingly large fraction either was never quite true or is based on data that has since become obsolete for various reasons (e.g. changes in hardware or software). So verifying the facts is rather important for making good decisions, which is why I've recently been benchmarking file systems available on Linux, briefly discussed in the posts on EXT4 vs. XFS, BTRFS and ReiserFS.
In this post, I'd like to briefly look at a potential myth when it comes to databases and I/O schedulers. The purpose of the scheduler is to decide in what order to submit I/O requests (received from the user) to the hardware - it may decide to reorder them, coalesce requests into larger contiguous chunks, prioritize requests from some processes etc.
Linux currently (kernel 4.0.4) offers three schedulers (there also used to be an anticipatory scheduler, but it was removed in kernel 2.6.33): cfq, the default, which supports priorities via ionice and also applies a fair amount of magic (reordering, coalescing etc.), plus the simpler noop and deadline. I'm not going to discuss cfq here - it's a fairly complex scheduler that attempts to reorder requests in various ways, supports priorities, handles synchronous and asynchronous requests differently, and such. Instead, let's look at the simpler schedulers and why they might work better in some cases.
The usual wisdom used to be that for databases, the deadline scheduler works best. The reasoning behind this recommendation is quite nicely explained for example here, but let me quote the main part:
This tends to reduce the merging opportunity for reads as they are dispatched q
One of the upcoming features in PostgreSQL 9.5 is foreign table inheritance, which offers a suite of new possibilities for users of foreign data wrappers. An exciting possibility is that you will be able to create a single table in which recent data is stored in a row-based format for fast appending and look-ups, while older data is stored in a columnar format for compression and fast analytics using cstore_fdw.
An example use-case might be a news organization which has frequent look-ups of recent data (e.g. to generate news feeds), while older data is used for analytics (e.g., find correlations in events). We use the gdelt dataset containing hundreds of millions of news events as a reference. This use-case will benefit greatly from using row-based storage for recent data, and columnar storage for older data, so let's see how we will be able to do this in PostgreSQL 9.5. The current release of cstore_fdw is not yet 9.5 compatible, but a preview is available on the 9.5-compatible branch.
To get started, we can create the events table and the row-based events_current table that inherits from it (see full source). When querying the events table, Postgres also includes the rows of inheriting tables in the results.
CREATE TABLE events (
    event_date int,
    ...
);
CREATE TABLE events_current () INHERITS (events);
We also set up a trigger that routes INSERTs on the events table to the events_current table.
CREATE OR REPLACE FUNCTION events_insert_trigger() RETURNS TRIGGER AS $$
BEGIN
    INSERT INTO events_current VALUES (NEW.*);
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER insert_events
BEFORE INSERT ON events
FOR EACH ROW EXECUTE PROCEDURE events_insert_trigger();
We can now start adding new data to our event table, which will use row-based storage. The gdelt dataset comprises over 100GB of news events in easy-to-digest CSV format. It can be copied into the events table using:
\COPY events FROM '20150101.export.CSV' WITH (NULL '')
After a while we can migrate older data to a cstore_fdw partition. We first cr
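A hedged sketch of how such a migration could look, assuming the cstore_fdw extension is installed (the server name, partition name, and cut-off date are made up):

BEGIN;

-- columnar partition that, like events_current, inherits from events
CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;
CREATE FOREIGN TABLE events_old () INHERITS (events)
SERVER cstore_server OPTIONS (compression 'pglz');

-- move rows older than the cut-off into columnar storage
INSERT INTO events_old SELECT * FROM events_current WHERE event_date < 20150101;
DELETE FROM events_current WHERE event_date < 20150101;

COMMIT;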
I’ve been talking about partitions a lot recently, and I’ve painted them in a very positive light. Postgres partitions are a great way to distribute data along a logical grouping and work best when data is addressed in a fairly isolated manner. But what happens if we direct a basic query at a partitioned table in such a way that we ignore the allocation scheme? Well, what happens isn’t pretty. Let’s explore in more detail.
Let’s use the fancy partitioning I introduced a while back. Using this structure, I ran a bunch of tests with a slight modification to my python script: I changed the reading_date granularity to daily, and had the script itself create and destroy the partitions. This allowed me to run several iterations and then meter the performance of queries on the resulting output.
The relevant python chunk looks like this:
end_stamp = stamp + timedelta(days = 1)

part = ''
if j > 0:
    part = '_part_%d%02d%02d' % (stamp.year, stamp.month, stamp.day)

cur.execute("DROP TABLE IF EXISTS sensor_log%s" % part)
cur.execute(
    "CREATE TABLE sensor_log%s (" % part +
    "LIKE sensor_log INCLUDING ALL," +
    " CHECK (reading_date >= '%s' AND" % stamp.date() +
    " reading_date < '%s')" % end_stamp.date() +
    ") INHERITS (sensor_log)"
)
db_conn.commit()
With that, the script will assume daily granularity and build the check constraints properly so constraint exclusion works as expected. To get an idea of how partitions scale with basic queries, I ran the script with 10, 20, 50, 100, 150, and 250 days. This gave a wide distribution of partition counts and the numbers I got made it fairly clear what kind of drawbacks exist.
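As a quick sanity check that the generated CHECK constraints really enable pruning, an EXPLAIN like the following should show only the matching partition (plus the empty parent) being scanned; a hedged sketch, with the partition name following the script's naming scheme:

-- the default setting already considers CHECK constraints on child tables
SET constraint_exclusion = partition;

-- with daily granularity, this should only touch sensor_log itself
-- and sensor_log_part_20150901
EXPLAIN SELECT * FROM sensor_log
 WHERE reading_date >= '2015-09-01'
   AND reading_date < '2015-09-02';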
All tests on the partition sets used these two queries:
-- Check 100 distributed values in the partitions.
EXPLAIN ANALYZE
SELECT * FROM sensor_log
 WHERE sensor_log_id IN (
        SELECT generate_series(1, 10000000, 100000)
       );

-- Check 100 random values in the partitions.
EXPLAIN ANALYZE
SELECT * FROM sensor_log
 WHERE sensor_
When: 6-8pm Thursday October 15, 2015
Who: Gavin McQuillan
What: Sharded Postgres with PLProxy
Data of Future Past: Postgres as Distributed Online Processing Analytics Engine
Postgres has long been a great analysis tool, but as organizations seek to decentralize their data, it’s overlooked for other solutions. Maybe this doesn’t need to be the case. Using a novel combination of tools, old and new, Postgres can also decentralize its data. We can maintain the immensely powerful interface that we love, while meeting (or exceeding) the throughput that decentralized datastores provide. In this talk we see how this is possible using the Foreign Data Wrapper (PLProxy) and HyperLogLog (Postgres-HLL) extensions, detailing the benefits and constraints of this method. We explore a reimplementation in Postgres of a complex Online Processing Analytics hypercube from HBase and measure its effectiveness, throughput, latency, and cost savings.
Gavin is a longtime PostgreSQL enthusiast, specializing in building automation and analytics solutions, distributed systems, scaling and securing web applications. His experience includes Google SRE, software development and leading teams in young technology startups. He lives in Portland with his family, bicycles, and hand planes.
If you have a job posting or event you would like me to announce at the meeting, please send it along. The deadline for inclusion is 5pm the day before the meeting.
Our meeting will be held at Iovation, on the 32nd floor of the US Bancorp Tower at 111 SW 5th (5th & Oak). It’s right on the Green & Yellow Max lines. Underground bike parking is available in the parking garage; outdoors all around the block in the usual spots. No bikes in the office, sorry!
Elevators open at 5:45 and building security closes access to the floor at 6:30.
See you there!
Matthew Kelly, TripAdvisor's in-house PostgreSQL guru, spoke to us recently about PGConf Silicon Valley which is November 17-18 at the South San Francisco Conference Center. Early Bird pricing ends October 4th.
Matthew's talk at PGConf Silicon Valley 2015 is At the Heart of a Giant: Postgres at TripAdvisor. Matthew said the PostgreSQL database server has been at the core of the company's infrastructure since the "early days as a scrappy start up" 15 years ago.
PostgreSQL is the database powering TripAdvisor’s 250 million reviews from inquisitive travellers around the world. With more than 375 million unique monthly visitors, TripAdvisor is the world's largest travel site and has been helping travelers plan and book the perfect trip since 2000.
Matthew joined TripAdvisor first as an intern during his computer science studies at the University of Massachusetts Amherst and was hired full-time following his graduation in 2012. I recently sat down with him virtually, and here's what he had to say…
Terry: How is Postgres used at TripAdvisor today?
Matthew: Postgres is the backbone of our entire core production infrastructure here at TripAdvisor. On any given day, our Postgres infrastructure throughout the company might service more than 1 billion queries. These range from the OLTP sort of operations you would expect from a high-end website, to analytics, to real-time anomaly processing, to fraud detection. We even use it to hold the photos that are in our upload queue. We run numerous different hardware configurations and workloads. Sure, we have Hadoop for some back-office processing, but Postgres really is the primary workhorse of our data infrastructure.
Terry: TripAdvisor gets a massive amount of traffic. Similar companies operating at that scale are using or considering WebscaleSQL – which as you know is a collaboration among engineers from several companies facing similar challenges in running MySQL at scale. How does PostgreSQL stack up against it in terms of forming the backbone of a popular site like yours?
Since its introduction in 9.1, synchronous replication, or the ability to wait for a WAL flush confirmation from a standby before committing a transaction on the master node (depending on synchronous_commit, whose default value is on), has seen wide adoption in many production environments for applications that need a no-data-loss scenario, as it ensures that a committed transaction has not been lost on the standby should a failover be needed. Still, such applications or clients may have seen something like this:
=# COMMIT;
Cancel request sent
WARNING:  01000: canceling wait for synchronous replication due to user request
DETAIL:  The transaction has already committed locally, but might not have been replicated to the standby.
LOCATION:  SyncRepWaitForLSN, syncrep.c:217
COMMIT
What happens here is that the commit query remained stuck for a couple of seconds because it kept waiting for a flush confirmation that was not coming. Depending on the outage of the standby, this could take a while, so the query was manually cancelled, and the transaction was then committed on the master without the confirmation from the standby. One way to check if a backend is stuck like this is to look at the output of ps for something like this:
$ ps x | grep postgres | grep waiting
15323 ??  Ss  0:00.06 postgres: easteregg easteregg [local] COMMIT waiting for 0/1797488
But that's not particularly useful when a user is not able to connect directly to the node involved (usually only a superuser of this database instance can), and the catalog view pg_stat_activity does not offer a way to know if the backend is really stuck at commit because of synchronous replication or because of another lock.
=# SELECT pid, state, waiting FROM pg_stat_activity WHERE lower(query) = 'commit;';
  pid  | state  | waiting
-------+--------+---------
 15323 | active | f
(1 row)
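One indirect check from the master is to compare the LSN the backend is waiting for (0/1797488 in the ps output above) against the standby's progress in pg_stat_replication; a hedged sketch, with column names as of 9.4:

-- if flush_location stays behind the LSN the backend is waiting for,
-- the COMMIT is most likely blocked on synchronous replication
SELECT application_name, state, sent_location, flush_location, sync_state
  FROM pg_stat_replication;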
Note that there is a patch submitted for integration t
On October 8th at 17:30 the second Dutch PostgreSQL meetup will be held in Amsterdam. Please join us for some very interesting talks.
Andres Freund, a PostgreSQL committer and developer working at Citus Data, will speak about major scalability improvements in recent PostgreSQL versions and what the biggest remaining scalability concerns are. Some of the remaining issues can mainly be addressed by changing Postgres, others can be worked around on the application side.
Marco Slot from Citus Data will give a talk about the open source pg_shard extension, which lets you seamlessly shard and scale out PostgreSQL tables across many machines. We will look at how to use it for building a scalable document store and demonstrate some upcoming features.
Finally, Reiner Peterke from Splendid Data will show pg_insight, a flexible tool for collecting historical statistics information using the Postgres statistics infrastructure. It integrates with Elasticsearch for graphical representation of the data for performance monitoring and trending.
You can RSVP on the PostgreSQL Meetup. Hope to see you there!
Talks by Andres Freund, Marco Slot, and Reiner Peterke
In the last two posts, I briefly discussed the results of a benchmark measuring PostgreSQL performance with different filesystems. The first post compared EXT4 and XFS, and the second one mostly dealt with the problems I ran into when benchmarking BTRFS.
In the discussion, I was asked to benchmark ReiserFS. I hadn't really planned to benchmark this file system because ... Let's say ReiserFS does not have the best reputation when it comes to reliability, for historical reasons. According to some sources the motto of the development was essentially "performance over correctness", which is an approach database folks don't really like. I don't know how much of that is true for the current state of ReiserFS, or how it applies to the new Reiser4 (which is not merged, and I don't think it ever will be).
But database people are rather conservative when it comes to storage reliability, so even if all the issues got fixed and the current version is reliable, it's unlikely to become very popular in this group soon. Factor in that development of ReiserFS has pretty much stopped (in favor of Reiser4, which did not get merged for years), so the code is essentially in maintenance-only mode. That's not necessarily bad, but it also means none of the significant reworks needed for new features and performance improvements (which were the purpose of Reiser4) are happening.
So let's see if ReiserFS 3 actually performs so much better than the common file systems (EXT4 and XFS), as the users often believe ...
The short answer is "no". At least not on the workload measured by pgbench (a lot of random I/O) and on this particular hardware (good SSD), ReiserFS 3 actually performs worse than both XFS and EXT4.
On read-only benchmarks, all the filesystems perform pretty much exactly the same, irrespective of the amount of data. For example on the large data set, you'll get something like this:
The chart shows throughput (transactions per second) with different number of clients, for those three filesystems. And there's pret
To enable it, set session_preload_libraries to session_exec and session_exec.login_name to login:
# first login
[pavel@dhcppc4 plpgsql_check]$ psql
WARNING: function "login()" does not exist
postgres=# CREATE OR REPLACE FUNCTION login()
postgres-# RETURNS void AS $$
postgres$# BEGIN
postgres$#   IF current_database() <> 'admindb' AND session_user = 'postgres' THEN
postgres$#     RAISE EXCEPTION 'user postgres is blocked';
postgres$#   END IF;
postgres$#   RAISE NOTICE 'Hello, %', session_user;
postgres$# END;
postgres$# $$ LANGUAGE plpgsql;
# new login
[pavel@dhcppc4 plpgsql_check]$ psql
NOTICE: Hello, pavel
-- try to login as Postgres
postgres=# \c postgres postgres
FATAL: unhandled exception in login function "login"
DETAIL: user postgres is blocked
CONTEXT: PL/pgSQL function login() line 4 at RAISE
session_exec: perform login function "login"
Previous connection kept
For a simple case like this, pg_hba.conf is better. But you can implement more complex logic in plpgsql. The login function can be made specific to a database or to a user via ALTER DATABASE xx SET or ALTER ROLE xx SET.
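For example, a hedged sketch using the session_exec.login_name setting from above (the database, role, and function names are made up):

-- run a different login function depending on the database or role
ALTER DATABASE admindb SET session_exec.login_name = 'login_admin';
ALTER ROLE reporting SET session_exec.login_name = 'login_reporting';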
There are two currently maintained PostgreSQL FDWs I know of for connecting to SQL Server from a Linux/Unix PostgreSQL box. There is the TDS foreign data wrapper (tds_fdw), which relies on the FreeTDS driver. This is a fairly light-weight FDW since it just relies on TDS, which is commonly already available on Linux installs or an easy install away. Unfortunately, when I tried to use it on windows (compiling my usual mingw64 way), while it compiled and installed, it crashed when I attempted to connect to a table on my SQL Server 2008 R2 box, so I gave up on it for the time being as a cross-platform solution. One thing I will say for it is that, from what I can see, it accepts ad-hoc queries as a data source, which is pretty nice, so we may revisit it in the future to see if we can get it to work on windows. I'm not sure if tds_fdw supports SQL Server spatial geometry columns, though it would be interesting to try.
The second option, which as you may have noticed we spent much time talking about, is the ogr_fdw foreign data driver. ogr_fdw utilizes UnixODBC on Linux, iODBC on MacOSX and Windows ODBC on windows for connecting to SQL Server. The big downside of ogr_fdw is that it has a dependency on GDAL, which is a hefty FOSS swiss-army-knife ETL tool that is a staple of all sorts of spatial folks doing both open source and proprietary development. The good thing about ogr_fdw is that since it is a spatial driver, it knows how to translate SQL Server geometry to its equivalent PostGIS form, in addition to being able to handle most of the other not-so-spatial columns.
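To give an idea, a hedged sketch of wiring ogr_fdw to a SQL Server database over ODBC (assumes PostGIS for the geometry column; the DSN, credentials, and layer/table names are made up):

CREATE EXTENSION ogr_fdw;

-- GDAL's ODBC driver takes an ODBC:user/password@DSN style datasource
CREATE SERVER mssql_svr
  FOREIGN DATA WRAPPER ogr_fdw
  OPTIONS (datasource 'ODBC:pguser/pgpass@mssql_dsn', format 'ODBC');

CREATE FOREIGN TABLE parcels (
  fid bigint,
  geom geometry,
  parcel_id varchar
) SERVER mssql_svr OPTIONS (layer 'dbo.parcels');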
On October 15th Josh Berkus will be reprising EXPLAIN Explained, his presentation from Postgres Open, at the Pandora offices in downtown Oakland. This talk will cover how to read EXPLAIN and EXPLAIN ANALYZE, and is aimed at developers and new DBAs. Pandora staff will also do a lightning talk.
On November 19th, the day after pgConfSV, Tableau is hosting an extended meetup which will cover using Tableau with PostgreSQL, Joe Conway on PL/R, and Álvaro Hernández Tortosa on Logical Decoding. This meetup will be in Palo Alto, and is expected to fill up very soon.
We are still looking to schedule a meetup in the city of San Francisco for late October or early November, but need a speaker for it. If you wanted to present and didn't submit in time for pgConfSV, now's your chance. Contact the organizers through the meetup.
This PG Phriday is going to be a bit different. During my trip to Postgres Open this year, I attended a talk I had originally written off as “some Red Hat stuff.” But I saw the word “containers” in the PostgreSQL in Containers at Scale talk and became intrigued. A few days later, I had something of an epiphany: I’ve been wrong about servers for years; we all have.
That’s a pretty bold claim, so it needs some background.
Fairly often in infrastructure and development, we see servers like this diagram:
But what’s the most important thing in the diagram? Can you even tell? Well, everything except the AppData component can be replaced. The primary goal of highly available design is building an architecture to ensure this is the case. I don’t care where the data lives, or what server is hosting it, so long as certain performance metrics are supplied by the underlying hardware. But from that diagram, the most important element would appear to be the servers themselves.
This kind of thinking is a trap, and one pretty much everyone I’ve met in the industry is guilty of falling into. In a two-server Postgres stack, where one server is the primary database and the other is a replica, we tend to gravitate toward preserving the original 1->2 orientation. This may be the case even if retaining this arrangement requires a second failover after an outage or some kind of maintenance. For some reason, the physical server gets the focus. How many times have you seen this? I can’t even count the number of times I’ve encountered servers named after their primary role as a database host, yet I have to physically log in to find out what the databases are. Sure, sometimes there’s a Wiki page of some kind outlining the data to host relationship, but is that really putting the data first?
That’s what I mean when I say we’ve all had it wrong. This obviously isn’t absolutely true, but the prevailing momentum still treats servers as important. They’re not. They’re a medium for whatever is running on them. Businesses with dozens or hundre
I've just launched my new website modern-sql.com. It's very small right now—just six content pages—but it will grow over the next months and years and might eventually become a book like Use-The-Index-Luke.com did. So I thought I'd better get you on board right now so you can grow as the site grows.
So what is modern SQL about? Yet another SQL reference? Absolutely not. There are definitely enough references out there. And they all suffer from the same issues:
The examples just show the syntax but not how to solve real-world problems.
They don't document the availability of the features amongst different database products.
No question, the latter is a consequence of the former. However, modern SQL is different:
modern SQL shows how to use the most recent SQL features to solve real-world problems. It also provides “conforming alternatives” as a way to solve the same problem using other (often older) standard methods. If necessary and possible, it will also show “proprietary alternatives.”
Note that there is a hierarchy: the recommended (most idiomatic) approach is shown first, standard-conforming alternatives next, and proprietary alternatives last, and only if inevitable.
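To give a flavor of that hierarchy, a hedged sketch (not taken from the site; the sales table is made up) for the classic "first ten rows" problem:

-- recommended, standard SQL (the SQL:2008 fetch clause)
SELECT *
  FROM sales
 ORDER BY sale_date DESC
 FETCH FIRST 10 ROWS ONLY;

-- proprietary alternative (PostgreSQL, MySQL, SQLite, ...)
SELECT *
  FROM sales
 ORDER BY sale_date DESC
 LIMIT 10;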
modern SQL documents the availability of the features amongst six SQL databases (the example is about the values clause). Click on a feature (e.g., “valid where select is valid”) to see how the availability of that feature has evolved. There you can see that SQLite just recently started to accept values where select is valid. The timeline view also documents the last checked version for those databases that don't support a feature.
Inevitably, modern SQL will become both: an homage
The recent release of PgBouncer 1.6, a connection pooler for Postgres, brought a number of new features. The two I want to demonstrate today are the per-database and per-user pool_modes. To get this effect previously, one had to run separate instances of PgBouncer. As we shall see, a single instance can now run different pool_modes seamlessly.
There are three pool modes available in PgBouncer, representing how aggressive the pooling becomes: session mode, transaction mode, and statement mode.
Session pool mode is the default, and simply allows you to avoid the startup costs for new connections. PgBouncer connects to Postgres, keeps the connection open, and hands it off to clients connecting to PgBouncer. This handoff is faster than connecting to Postgres itself, as no new backends need to be spawned. However, it offers no other benefits, and many clients/applications already do their own connection pooling, making this the least useful pool mode.
Transaction pool mode is probably the most useful one. It works by keeping a client attached to the same Postgres backend for the duration of a transaction only. In this way, many clients can share the same backend connection, making it possible for Postgres to handle a large number of clients with a small max_connections setting (each backend consumes resources).
Statement pool mode is the most interesting one, as it makes no promises at all to the client about maintaining the same Postgres backend. In other words, every time a client runs a SQL statement, PgBouncer releases that connection back to the pool. This can make for some great performance gains, although it has drawbacks, the primary one being no multi-statement transactions.
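Putting the three modes together, a hedged sketch of what a single mixed-mode pgbouncer.ini might look like in 1.6 (database and user names are made up):

[databases]
; per-database override: the web application pools per transaction
webdb = host=127.0.0.1 dbname=webdb pool_mode=transaction
; reporting keeps the global default below (session)
reportdb = host=127.0.0.1 dbname=reportdb

[users]
; per-user override, new in 1.6
stats = pool_mode=statement

[pgbouncer]
pool_mode = session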
To demonstrate the new pool_mode features, I decided to try out a new service mentioned by a coworker, called Scaleway. Like Amazon Web Services (AWS), it offers quick-to-create cloud servers, ideal for testing and demonstrating. The unique thing about Scaleway is that the servers are all ARM-based, with SSD storage. Mini-
This is it, PostGIS 2.2 is almost out the door, so we’re looking for testing and feedback! Please give the release candidate a try and report back any issues you encounter.
View all closed tickets for 2.2.0.
I am pleased to announce that the full schedule of speakers and events for the first annual PGConf Silicon Valley conference is now available on the new conference website. Breakout session tracks include DevOps, New Features, PostGIS, Tales from the Trenches, Hacking Postgres, and Data at Scale.
In addition, we welcome two major new organizations to the list of PGConf Silicon Valley sponsors, Pivotal at the Platinum level and Amazon Web Services at the Silver level. Organized in cooperation with the San Francisco PostgreSQL Users Group, PGConf Silicon Valley is a technical conference aimed at the Silicon Valley PostgreSQL community and beyond.
The conference is November 17-18, 2015 at the South San Francisco Conference Center. Early Bird registration for PGConf Silicon Valley is available through Oct. 4, 2015. Register now to attend and receive an additional 20% off your registration fees when you use the discount code CitusData20 at checkout!
The full breakout session schedule includes:
PostgreSQL’s TABLESAMPLE brings a few more advantages compared to the traditional ways of getting random tuples.
TABLESAMPLE is a SQL SELECT clause and it provides two sampling methods: SYSTEM and BERNOULLI. With the help of TABLESAMPLE we can easily retrieve random rows from a table. For further reading about TABLESAMPLE you can check the previous blog post.
In this blog post we’ll talk about alternative ways of getting random rows: how people selected random rows before TABLESAMPLE, what the pros and cons of the other methods are, and what we gained with TABLESAMPLE.
There are awesome blog posts about selecting random rows; you can start with the following posts to gain a deeper understanding of this topic.
Let’s compare the traditional ways of getting random rows from a table with the new ways provided by TABLESAMPLE.
Before the TABLESAMPLE clause, there were 3 commonly used methods for randomly selecting rows from a table.
For testing purposes we need to create a table and put some data in it. Let’s create a ts_test table and insert 1M rows into it:
CREATE TABLE ts_test (
  id SERIAL PRIMARY KEY,
  title TEXT
);

INSERT INTO ts_test (title)
SELECT 'Record #' || i
FROM generate_series(1, 1000000) i;
Consider the following SQL statement for selecting 10 random rows:
SELECT * FROM ts_test ORDER BY random() LIMIT 10;
This causes PostgreSQL to perform a full table scan plus a sort. Therefore this method is not preferred for tables with a large number of rows, for performance reasons.
Let’s look at the EXPLAIN ANALYZE output of this query:
random=# EXPLAIN ANALYZE SELECT * FROM ts_test ORDER BY random() LIMIT 10;
                                                            QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=33959.03..33959.05 rows=10 width=36) (actual time=1956.786..1956.807 rows=10 loops=1)
   ->
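For contrast, a hedged sketch of the TABLESAMPLE equivalents on the same table (the percentages are illustrative; 0.01% of 1M rows is roughly 100 rows, from which we keep 10):

-- page-level sampling: fastest, but rows come clustered by page
SELECT * FROM ts_test TABLESAMPLE SYSTEM (0.01) LIMIT 10;

-- row-level Bernoulli sampling: more uniform, but scans every page
SELECT * FROM ts_test TABLESAMPLE BERNOULLI (0.01) LIMIT 10;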
After installing PostgreSQL 9.4 and PostGIS following An Almost Idiot's guide to installing PostgreSQL, PostGIS, and pgRouting on my CentOS 6.7 64-bit (except replacing 9.3 references with the equivalent 9.4 references), I proceeded to install ogr_fdw. To my disappointment, there are no binaries yet for it, which is not surprising considering there aren't generally any binaries for any OS, except the windows ones I built, which I will be packaging with the PostGIS 2.2 windows bundle. Getting out of my windows comfort zone, I proceeded to build them on CentOS, mainly because I have a client on CentOS where I think ogr_fdw is a perfect fit for their workflow, and I wanted to see how difficult a feat this would be. I'll go over the steps I used for building, and the stumbling blocks I ran into, in this article, with the hope it will be of benefit to those who find themselves in a similar situation.
Docker is quickly becoming one of the most popular ways of deploying distributed applications. By bundling all dependencies of an application in an easily shippable container, software deployment becomes a process that can be performed quickly and often. The open source sharding extension for PostgreSQL, pg_shard, and scalable real-time analytics solution for PostgreSQL, CitusDB, are both meant to run on a cluster of PostgreSQL servers. Deploying such a cluster can become a lot easier with Docker.
While there is no officially supported Docker container for Citus Data extensions at the moment, we were very excited to learn that Heap has published citus-docker. Heap uses CitusDB to analyze their click stream data in real-time with some advanced funnel queries. Their Docker image comes with both pg_shard and CitusDB pre-installed. If you haven't used docker before, you can follow the installation guide to set it up.
One of the benefits of Docker is that it lets you set up a whole cluster on your machine for testing very easily. pg_shard users often set up a cluster on their desktop, running multiple postgres servers on different ports. This approach is not very practical since it requires you to go through all the configuration steps multiple times. With docker-compose, setting up a local cluster becomes a breeze.
If you haven't done so already, you can install docker-compose with the following command:
sudo su
curl -L https://github.com/docker/compose/releases/download/1.3.3/docker-compose-`uname -s`-`uname -m` > /usr/bin/docker-compose
chmod +x /usr/bin/docker-compose
Now run the following:
git clone https://github.com/heap/citus-docker
cd citus-docker
docker-compose up -d
That's it! No additional configuration required. You now have a local pg_shard cluster with 2 worker nodes and a master node to which you can connect using: psql -h localhost -U postgres. When you are done, you can remove it using:
docker-compose kill
docker-compose rm
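Once connected with psql, a hedged sketch of sharding a table across the two workers with pg_shard (the table and column names are made up):

-- a table to be distributed by hash of event_id
CREATE TABLE events (
  event_id bigint,
  payload jsonb
);

-- register the table as distributed, then create 16 shards
-- with each shard replicated to 2 workers
SELECT master_create_distributed_table('events', 'event_id');
SELECT master_create_worker_shards('events', 16, 2);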
You could also start individual nodes by running the docker comm