After almost two years of lobbying at work to promote PostgreSQL over MySQL, two years of tears over poor DB typing, missing advanced SQL features, and traps in what seemed like basic needs (select * from myview where a=5 triggers a full scan of the view result if myview is not «simple»)… we are finally starting to deploy some PostgreSQL databases.
I wrote documentation and argumentation for my dear sysadmin friends. Among my «it’s simpler» arguments, I promoted application_name as a way to quickly know who is hurting your SQL server that badly, along with the oh-so-simple ps faux to see who is doing what on your server… That’s the simplest monitoring possible, but when you’re in a hurry at 3AM, it can help. Sadly, application_name is not displayed in the ps output. I was looking for a tiny patch to make to PostgreSQL in order to discover its code a bit, and this seemed like a good opportunity.
Let me also share how I wrote the patch, to highlight how easy patching PostgreSQL is (the patch was written together with this blog entry).
First, you need a minimal development environment for PostgreSQL. Your tools: your favorite compiler (GCC), make, git, $EDITOR, ack-grep, and a bit of patience if your CPU is slow (PostgreSQL compiles quite quickly).
~$ cd ~/projects
~/projects$ git clone git://git.postgresql.org/git/postgresql.git
~/projects$ cd postgresql
~/projects/postgresql$ git checkout -b application-name-in-proctitle
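Before changing anything, it’s worth checking that the tree builds as-is; a typical throwaway build (the prefix path here is just an example of mine) looks like:

~/projects/postgresql$ ./configure --prefix=$HOME/pgdev --enable-debug
~/projects/postgresql$ make -j4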
We want to modify how the proctitle is constructed. On BSD systems, it’s done through the setproctitle call, easier to find than the games with ARGV under Linux. So… let’s search for it.
~/projects/postgresql$ ack-grep --type cc setproctitle
42: * use the function setproctitle(const char *, ...)
265: * apparently setproctitle() already adds a `progname:' prefix to the ps
329: setproctitle("%s", ps_buffer);
422:/* Define to 1 if you have the `setproctitle' function. */
Well, only one real call; the other matches are just comments.
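To give an idea of where this is heading, here is a rough sketch of the kind of change involved. This is not the final patch, and the exact buffer bookkeeping in src/backend/utils/misc/ps_status.c differs, but application_name really is a GUC exported by utils/guc.h, and ps_buffer really is what ends up in setproctitle():

#include "utils/guc.h"		/* exports char *application_name */

	/*
	 * Hypothetical sketch: append application_name to ps_buffer
	 * just before it is pushed to the process title.
	 */
	if (application_name && application_name[0] != '\0')
	{
		size_t		len = strlen(ps_buffer);

		snprintf(ps_buffer + len, ps_buffer_size - len,
				 " app=%s", application_name);
	}
	setproctitle("%s", ps_buffer);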
I've been having issues with PostgreSQL error handling, mostly on Windows 64-bit, using binaries I build with mingw64 and run under VC++-built PostgreSQL. Most of these I don't understand. As many have pointed out, these issues MIGHT go away or be easier to debug if PostGIS were built with the same tool chain as the EDB VC++ distributions. I can't argue with that, especially since it's never been done.
The good news is that changing my build chain from mingw64 GCC 4.5.4 to mingw64 GCC 4.8.0 has eradicated all of these issues, and even ones I had with PL/V8 that happened both in a pure mingw64 (w32/w64) environment and in a PostgreSQL VC++ (32/64 bit) one. It did introduce one minor annoying nit, presumably because my libpq.dll now has a dependency on user32.dll (which just seems wrong). The other nit is that all the binaries I built that have C++ in them (depend on libstdc++) now need to be recompiled, which means my prior builds of PostGIS will not be compatible with my upcoming 4.8.0 ones. A real pain, since for PostGIS/pgRouting I've got at least 15 of those dependencies and growing. So I've decided to perform this exercise just for PostgreSQL 9.3 -- a new beginning with PostGIS 2.1 -- and see how it goes before I bother with 9.2 and 9.1. These issues really only affect PostGIS users on 64-bit PostgreSQL, and even as a 64-bit user you may never have come across them.
PostgreSQL can be installed using installers prepared for your operating system. However, that way you depend on the installation settings chosen by the package maintainers. Installation also requires root privileges, which programmers are not allowed on some machines. What's more, you will usually not find a PostgreSQL beta version packaged this way.
The only way to install Postgres without root privileges, in your home directory, is to compile it from source. That's not very difficult.
Download the source tarball and unpack it somewhere, and you have the Postgres sources you need.
To compile Postgres you will need some libraries and programs. The complete list can be found in the Postgres documentation.
I’m using Ubuntu; the packages I use for compiling Postgres are:
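If you need a starting point, on a recent Ubuntu a build configured with --with-python typically needs something like the following (these package names are an assumption and vary by release; they are not necessarily the author's exact list):

sudo apt-get install build-essential libreadline-dev zlib1g-dev flex bison python-dev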
If you are using different system, or different system/Postgres version, then your packages/libraries can be named differently.
Now enter the directory where your sources are and run the command below to configure them:
./configure --prefix=$HOME/postgres/ --with-python PYTHON=/usr/bin/python2.7
The --prefix parameter sets the path where Postgres will be installed.
The --with-python parameter enables compiling with PL/Python support.
The PYTHON parameter points to the python binary to use.
The configure command should finish without any errors. If it fails, most probably some needed library is not installed.
If configure succeeded, you can compile the sources. It is simple:
make -j 4
The -j parameter sets the maximum number of jobs make may run in parallel. My computer has 4 cores and I want to use all of them, so I run 4 jobs.
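Once make finishes cleanly, the remaining steps of a home-directory install follow the standard routine (my sketch, matching the --prefix used above):

make install
$HOME/postgres/bin/initdb -D $HOME/postgres/data
$HOME/postgres/bin/pg_ctl -D $HOME/postgres/data -l $HOME/postgres/logfile start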
PostgreSQL has a great feature called “functional indexes” (indexes on expressions). A normal index just stores sorted values of some field, which is great for searching, as the values are already sorted.
You can create an index with a simple query like:
CREATE INDEX i_test ON test (i);
It will store all values of column i from table test. This index can be used with a query like:
SELECT * FROM test WHERE i < 100 ORDER BY i;
There is also something I like even more: an index can store any values you want; they don’t need to be raw values from the table. You can use values calculated from the table columns. They will be sorted too, so searching with such an index will be pretty fast.
Creating such an index is simple (assume i is a text column this time):
CREATE INDEX i_test_lower_i ON test (lower(i));
The main rule is: this index can be used if you have the same function call in your query, something like:
SELECT * FROM test WHERE lower(i) = 'aaa';
Let’s check something more complicated. My test table looks like:
CREATE TABLE test(t timestamp);
I filled this table with sample data; we need a bigger number of rows:
INSERT INTO test(t) SELECT generate_series(now() - '1 year'::interval, now(), '1 minute');
This way there are roughly 500k rows (one per minute over a year).
I need to get two row sets from database. First I will get the rows with dates from the last 10 days. Later I will get all rows with dates from current year.
I can get the rows with dates from the last 10 days like:
postgres=# explain analyze select t from test where t::date > (now() - '10 days'::interval)::date;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..14152.02 rows=175200 width=8) (actual time=265.640..272.701 rows=13558 loops=1)
   Filter: ((t)::date > ((now() - '10 days'::interval))::date)
   Rows Removed by Filter: 512043
 Total runtime: 273.152 ms
(4 rows)
To speed this up, I will create a functional index.
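A sketch of what that index would look like (my guess at the next step; since t is a timestamp without time zone, the cast to date is immutable and can be indexed directly):

CREATE INDEX i_test_t_date ON test ((t::date));

After an ANALYZE, re-running the query above should show an index scan instead of the sequential scan.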
Last week, one of my customers asked me during a training session why there is no foreign data wrapper for sqlite (actually, you can have one with multicorn, but there's no native one). I have no idea why, but I was happy to learn that no sqlite FDW existed yet. I had wanted to write one for quite some time, and this appeared to be the perfect opportunity.
So, in the evening, I started working on one. As a foundation I took the blackhole foreign data wrapper, written by Andrew Dunstan. It helped a lot to get started quickly. I found it a bit surprising that it didn't include #ifdef guards for some functions and hooks, so that it could be compatible with both 9.1 and 9.2. I added them in a patch that you can find here if you need it. Otherwise, you can simply delete some parts of the blackhole code.
After that, I tried to find how to open and close a sqlite database, and how to read a table in it. They have a really nice example in their "Quick start document" and "Introduction To The SQLite C/C++ Interface".
I wanted something really simple to start with. Almost two years ago, at pgconf.eu, I went to see Dave Page's talk on FDWs (PostgreSQL at the center of your dataverse). So I already knew that you mainly need three functions to read a table: BeginForeignScan, IterateForeignScan, and EndForeignScan. The first one has to open the connection (if it isn't open yet). The second one is executed as many times as there are rows to grab: on its first execution, it must launch the query and fetch the first row; on every other iteration, it grabs a new row. The third function cleans up. So I started to include the sqlite tutorial code in the blackhole FDW. And it worked great. I had to write some other functions, specifically the handler and the validator, but, in an hour, I had something working. Of course, it wasn't pretty. The database filename was written in the code, with no option to change it. The query executed remotely was also hard-coded, which means you couldn't change the table name without recompiling.
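To make the shape of this concrete, here is a minimal sketch of those three callbacks wired to the SQLite C API. The struct and function names are mine, and the hard-coded filename and query mimic that first ugly version; the sqlite3_* calls are the standard SQLite interface. The tuple-building details are elided:

#include "postgres.h"
#include "foreign/fdwapi.h"
#include "executor/tuptable.h"
#include <sqlite3.h>

typedef struct SqliteScanState
{
	sqlite3	   *db;			/* connection, opened in Begin */
	sqlite3_stmt *stmt;		/* statement, prepared on first Iterate */
} SqliteScanState;

static void
sqliteBeginForeignScan(ForeignScanState *node, int eflags)
{
	SqliteScanState *state = palloc0(sizeof(SqliteScanState));

	/* hard-coded path: exactly the kind of ugliness described above */
	if (sqlite3_open("/tmp/test.db", &state->db) != SQLITE_OK)
		elog(ERROR, "could not open sqlite database");
	node->fdw_state = state;
}

static TupleTableSlot *
sqliteIterateForeignScan(ForeignScanState *node)
{
	SqliteScanState *state = (SqliteScanState *) node->fdw_state;
	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;

	ExecClearTuple(slot);

	if (state->stmt == NULL)	/* first call: launch the query */
		sqlite3_prepare_v2(state->db, "SELECT * FROM t", -1,
						   &state->stmt, NULL);

	if (sqlite3_step(state->stmt) == SQLITE_ROW)
	{
		/* ... fill the slot from the sqlite3_column_*() values ... */
	}
	return slot;				/* an empty slot means end of scan */
}

static void
sqliteEndForeignScan(ForeignScanState *node)
{
	SqliteScanState *state = (SqliteScanState *) node->fdw_state;

	sqlite3_finalize(state->stmt);
	sqlite3_close(state->db);
}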
So I started adding options, to make the database filename and the table name configurable.
First today is a PostgreSQL community blogger note. Those of you who publish to the Planet PostgreSQL blog feed should take a look at the updated Planet PostgreSQL Policy. There’s a new clause there clarifying when it’s appropriate to mention promotions of commercial products like books. We’re trying to keep every blog post to the Planet feed focused on providing useful information, and just informing people of things like product giveaways doesn’t meet that standard. Several of these have gone by recently, but moving forward that will be considered a violation of the rules.
Speaking of what it’s safe to write about, I’ve published the next installment in my growing documentation set around disks that work well with PostgreSQL. Disk Plug Pull Testing covers how to use the diskchecker.pl program to validate that your disks are working as expected for database use. I’m going to include that information in every disk review I do moving forward, even though I normally only talk about ones that pass this test. I think it’s that important to emphasize how vital this is. If someone recommends a drive and doesn’t explicitly tell you they run a plug pull test, be suspicious.
I’m also working on a series of Postgres Guides for how to set up related hardware like RAID controllers. So far there’s a long discussion of the LSI MegaRAID controllers there. I’m almost done with an Adaptec controller one too. It’s hard to figure all this out, and the information needed changes regularly. I’m going to make this spot the primary place I maintain this data, mainly so I can keep everything consistent.
When: 7-9pm Thu June 20, 2013
What: Lightning Talks
Last month we had a surprise special guest – Emily Strickland spoke to us about the way Rentrak uses Postgres. It turned into a big group discussion, my favorite kind of meeting.
Packt has promised us some books to give away as well: a couple copies each of PostgreSQL 9.0 High Performance and the PostgreSQL 9 Admin Cookbook, as well as some of their newer eBooks: Instant PostgreSQL Backup and Restore and Instant PostgreSQL Starter. You must be present to win, of course.
Our meeting will be held at Iovation, on the 32nd floor of the US Bancorp Tower at 111 SW 5th (5th & Oak). It’s right on the Green & Yellow Max lines. Underground bike parking is available in the parking garage; outdoors all around the block in the usual spots. No bikes in the office, sorry!
Building security will close access to the floor at 7:30.
After-meeting beer location: Huber’s. See you there!
Before the upgrade to kernel 3.9:
08:35:01 AM  CPU  %user  %nice  %system  %iowait  %steal  %idle
08:45:01 AM  all  30.91  0.00   5.66     40.05    0.00    23.38
08:55:02 AM  all  29.32  0.00   5.10     39.66    0.00    25.92
09:05:02 AM  all  31.71  0.00   6.24     40.99    0.00    21.06
09:15:01 AM  all  32.45  0.00   6.59     46.74    0.00    14.21
09:25:01 AM  all  20.62  0.00   5.39     60.00    0.00    14.00
09:35:01 AM  all  31.03  0.00   3.61     33.95    0.00    31.41
09:45:01 AM  all  36.54  0.00   3.22     34.13    0.00    26.11
09:55:02 AM  all  40.17  0.00   3.66     30.98    0.00    25.19
10:05:01 AM  all  33.49  0.00   3.04     32.28    0.00    31.19
10:15:01 AM  all  48.63  0.00   2.87     25.50    0.00    23.00
10:25:01 AM  all  51.34  0.00   3.56     26.06    0.00    19.04
10:35:01 AM  all  39.41  0.00   3.44     29.86    0.00    27.29
10:45:02 AM  all  36.07  0.00   8.79     30.94    0.00    24.20
10:55:03 AM  all  38.04  0.00   7.98     32.98    0.00    21.01
11:05:11 AM  all  39.25  0.00   8.81     36.75    0.00    15.19
11:15:02 AM  all  35.19  0.00   8.76     41.98    0.00    14.07
11:25:03 AM  all  38.21  0.00   9.65     38.86    0.00    13.28
11:35:02 AM  all  42.92  0.00   11.66    34.28    0.00    11.14
11:45:02 AM  all  39.40  0.00   9.96     39.03    0.00    11.61
11:55:01 AM  all  28.72  0.00   3.27     36.32    0.00    31.69
After the upgrade to kernel 3.9:
08:35:02 AM all 40
I gave a tutorial at PGCon a couple of weeks back, entitled “Agile Database Development with Git, Sqitch, and pgTAP.” It went well, I think. The Keynote document and an exported PDF have been posted on PGCon.org, also uploaded to Speaker Deck, and embedded below, too. Want to follow along? Clone the tutorial Git repository. Here’s the teaser:
Image by Wikimedia user Ardfern
Yesterday I ran into a nice practical application of a number of slightly unusual SQL features, in particular window functions. PostgreSQL has had window functions for quite a while now (since version 8.4, in fact, the oldest version still officially supported), but even though they're part of the SQL standard, window functions aren't necessarily a feature people use every day. As a bonus, I also threw in some common table expressions (also known as CTEs, also a SQL standard feature) to help break up what could have been a more confusing, complex query.
A client of ours noticed a problem in some new code they were working on. It was possible for users to submit duplicate orders to the system in quick succession, by double-clicking or something similar. This was fixed in the code easily enough, but we needed to clean up the duplicate orders in the database. Which meant we had to find them. We defined a group of duplicates as all orders involving the same line items, with one of a set of possible status codes, created in an interval of less than five minutes by the same user.
This discussion of the time interval between two different records should immediately signal "window functions" (or possibly a self-join, but window functions are much easier in this case). A window function takes a set of rows and lets you chop them up into subsets, processing the subsets in various ways. In this case, we want to take all the rows in the orders table with a particular status value, group them by the customer who placed the order as well as by the items in the order, and then evaluate each of those groups.
As might be expected, the items associated with an order are in a different table from the orders themselves. There are probably several different ways I could have compared the items; I chose to accumulate all the items in an order into an array, and compare the resulting arrays (I imagine this would be awfully slow if orders had many different items attached to them, but it wasn't a problem in this case).
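The client's actual query isn't shown here, but a sketch of the technique reads like this (the table and column names — orders, order_items, customer_id, created_at, status — are my assumptions, not the real schema):

-- CTE: reduce each order to a signature (customer + sorted item array)
WITH order_signatures AS (
    SELECT o.id,
           o.customer_id,
           o.created_at,
           array_agg(i.item_id ORDER BY i.item_id) AS items
    FROM orders o
    JOIN order_items i ON i.order_id = o.id
    WHERE o.status IN ('new', 'pending')     -- the "set of possible status codes"
    GROUP BY o.id, o.customer_id, o.created_at
)
-- window function: time gap to the previous identical order by the same user
SELECT id
FROM (
    SELECT id,
           created_at - lag(created_at) OVER (
               PARTITION BY customer_id, items
               ORDER BY created_at) AS gap
    FROM order_signatures
) s
WHERE gap < interval '5 minutes';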
Starting set up in streaming replication mode
creating startall and shutdownall
creating failover script
creating database cluster /home/t-ishii/work/git.postgresql.org/tmp2/data0...done.
creating database cluster /home/t-ishii/work/git.postgresql.org/tmp2/data1...done.
temporarily start data0 cluster to create extensions
temporarily start pgpool-II to create standby nodes
waiting for failover happens...done.
On my laptop I've managed to create 1,001 local chained (one-to-one) streaming-only (meaning no archive directory) asynchronous replication instances. The status output for the full list is here: https://gist.github.com/darkixion/5694200
I also tested promoting the 1st standby, to see if the promotion would propagate all the way to the final 1,000th standby, and it worked flawlessly. This didn't work on my copy of Linux Mint without some adjustments to the kernel semaphore values, and it does take a while for all the standbys in the chain to reach full recovery. However, promotion propagation is very fast.
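For reference, the adjustment is of this kind (these particular numbers are an illustration, not the values I used; each PostgreSQL instance consumes several semaphore sets, so SEMMNS and SEMMNI need raising far beyond the defaults):

# kernel.sem = SEMMSL SEMMNS SEMOPM SEMMNI
sudo sysctl -w kernel.sem="250 512000 32 2048"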
Try it for yourself (if you have enough RAM that is). You may find it quicker to use my pg_rep_test tool. Just don't do this manually... it'll take far too long.
Thanks to Heikki for putting in the changes that made this archiveless cascading replication possible. :)