I explored two interesting topics today while learning more about Postgres.
PostgreSQL’s partial page write protection is configured by the following setting, which defaults to “on”:
full_page_writes (boolean)
When this parameter is on, the PostgreSQL server writes the entire content of each disk page to WAL during the first modification of that page after a checkpoint… Storing the full page image guarantees that the page can be correctly restored, but at a price in increasing the amount of data that must be written to WAL. (Because WAL replay always starts from a checkpoint, it is sufficient to do this during the first change of each page after a checkpoint. Therefore, one way to reduce the cost of full-page writes is to increase the checkpoint interval parameters.)
Trying to reduce the cost of full-page writes by increasing the checkpoint interval highlights a compromise. If you decrease the interval, then you’ll be writing full pages to the WAL quite often. This should in theory lead to surges in the number of bytes written to the WAL, immediately following each checkpoint. As pages are revisited over time for further changes, the number of bytes written should taper off gradually until the next checkpoint. Hopefully someone who knows more can confirm this. Does anyone graph the number of bytes written to their WAL? That would be a nice illustration to see how dramatic this surging is.
Decreasing the checkpoint interval seems a bit scary, and is bound to have its own costs, for all the usual reasons. A massive checkpoint once in a while should be really expensive, and would lead to a bad worst-case time for recovery. Does the new bgwriter implementation in 8.3 fix any of this? In theory it could, but I don’t know enough yet to say. I have heard conflicting opinions on this point. I have a lot more to read about it before I form my own opinion.
Storing full pages might not really be that expensive. It could bloat the WAL, but is the cost (in terms of time) really that high? InnoDB (in MySQ
A while back I submitted a couple of talks for PG-East 2010 in Philly, and over the past couple of weeks I've been nagging the organizers semi-frequently to get some pre-info on whether I've been accepted or not, since flight prices started to climb fairly rapidly. The site clearly says information that the information will be available on Feb 15th, so I can't really complain that the answer kept being "don't know yet".
A couple of days ago, I got a note from Dave pinged me with a message asking if I was approved. Turns out this press-release had been posted (by his company, no less). Which explicitly names me as a speaker at the conference.
Took me two more days of chasing down JD, but I now have confirmation I'll be there. I don't actually know what I'll be speaking about, but it's a pretty safe bet it will be PostgreSQL related.
I call this Time management by press releases. If I could only get it to apply to all meetings, I would no longer need to keep my own calendar up to date.
So, I'll see you in Philly!



I'm happy to report that after a long haul, we have finally released PostGIS 1.5.0. Two months late, but there it is, and its a really great release I think; Perhaps the best release ever.
Summary excerpted from Paul's slightly premature announcement
February 4, 2010 The PostGIS development team has, after a long course of reflection and a detailed self-examination of our various personal failings, decided to release PostGIS 1.5.0 to the public. http://postgis.org/download/postgis-1.5.0.tar.gz This new version of PostGIS includes a new "geography" type for managing geodetic (lat/lon) data, performance-enhanced distance calculations, GML and KML format readers, an improved shape loading GUI, and other new features as well. Especial thanks to: * Dave Skea for algorithms and mathematics necessary to support spherical geometry * Nicklas Avén for the new performance enhanced distance calculations and other distance-related functions * Sandro Santilli for new buffering features (end caps and style options) * Olivier Courtin for GML/KML input functions * Guillaume Lelarge for support for the upcoming PgSQL 9.0 * George Silva for an example implementation of history tables * Vincent Picavet for Hausdorff distance calculations * The maintainers of GEOS, Proj4, and LibXML, without whom we would have less of a spatial database Love, the PostGIS project steering committee, Mark Cave-Ayland Kevin Neufeld Regina Obe Paul Ramsey
So when will PostgreSQL version 9.0 come out? I decided to "run the numbers" and take a look at how the Postgres project has done historically. Here's a quick graph showing the approximate number of days each major release since version 6.0 took:
Some interesting things can be seen here: there is a rough correlation between the complexity of a new release and the time it takes, major releases take longer, and the trend is gradually towards more days per release. Overall the project is doing great, releasing on average every 288 days since version 6. If we only look at version 7 and onwards, the releases are on average 367 days apart. If we look at *just* version 7, the average is 324 days. If we look at *just* version 8, the average is 410. Since the last major version that came out was on July 1, 2009, the numbers predict 9.0 will be released on July 3, 2010, based on the version 7 and 8 averages, and on August 15, 2010, based on just the version 8 averages. However, this upcoming version has two very major features, streaming replication (SR) and host standby (HS). How those will affect the release schedule remains to be seen, but I suspect the 9.0 to 9.1 window will be short indeed.
As a recap, the Postgres project only bumps the first part of the version number for major changes (Although many, myself included, would argue that 7.4 was such a major jump it should have been called 8.0). The second number occurs anytime a "new release" happens, and means new features and enhancements. The final number, the revision, is only incremented for security and bug fixes, and is almost always a 100% binary compatible drop in for the previous revision in the branch. (What's the average (mean) days between revisions? 84 days since version 6, and 88 days since version 7. The medians are 84 and 87 respectively)
How busy were those periods? Here's the number of commits per release period. Note that I said release period, not release, as commits are still being made to old branches, although this is a very small minority of
I had the pleasure of attending and presenting at LinuxConf.AU this year in Wellington, NZ. Linux Conf.AU is an institution whose friendliness and focus on the practical business of creating and sustaining open source projects was truly inspirational.
My talk this year was "A Survey of Open Source Databases", where I actually created a survey and asked over 35 open source database projects to respond. I have received about 15 responses so far, and also did my own research on the over 50 projects I identified. I created a place-holder site for my research at: ossdbsurvey.org. I'm hoping to revise the survey (make it shorter!!) and get more projects to provide information.
Ultimately, I'd like the site to be a central location for finding information and comparing different projects. Performance of each is a huge issue, and there are a lot of individuals constructing good (and bad) systems for comparing. I don't think I want to dive into that pool, yet. But I would like to start collecting the work others have done in a central place. Right now it is really far too difficult to find all of this information.
Part of the talk was also a foray into the dangerous world of classification. I tried to put together basic categories, based on conversations with individual developers and some fine-tuning with Josh Berkus. Josh gave a short overview of database models during "Relational vs Non-relational" in the Data Storage mini-conf, and we collaborated some on category definition. I also saw Devdas Bhagat give a use case talk on using Postgres, yet again confirming how wonderful transactional DDL is for developers. I also gave a lightning talk (WITHOUT SLIDES!) on Bucardo at the tail end of the Data Storage mini-conf.
Josh Berkus, during "PostgreSQL Development Today", announced to the world that the new version of Postgres would be version 9.0! And he did a live demonstration of streaming replication and hot standby. The audience seemed pleased.
I was delighted to see representatives from the Postgres community on the m
Many computer programmers are introverts because they have chosen the solitary job of creating programs that accomplish specific tasks. Open source developers are probably even more prone to be introverts because the (boring?) meetings, office chit-chat, and lunch outings are mostly gone, with the only communication being via email, instant messaging, twitter, and blogs. Open source social interaction is certainly more controlled than traditional communication.
This is a beta for the upcoming 1.2.10 release.
WineHQ compatibility improved. SQL Reverse Engineering improved. Support for 8.4.x SQL dialect added.
You’re welcome to download the latest release from our website
at:
http://microolap.com/products/database/postgresql-designer/download/
Please don’t hesitate to ask any questions or report bugs with our Support Ticketing system.
[branchname]/pgsql, just as it is with CVS) exists,
the buildfarm client will do an update by simply calling git
pull. Where it does not exist, the client will create it by
calling git clone followed by git checkout -t
origin/[branchname] -b bf_[branchname].git_references config parameter, which can be used as
part of the clone command. By default it will not be there, in
which case the user would get a plain clone of the configured
master repo. Setting up the references repo would be something
documented but not done by the script.git fetch on your local master followed by git
pull on your working copy). I guess that can be aliased, and
it's something I will probably do pretty soon.
JD Wrote:
Last West, a sign up sheet was sent out for people who were interested in starting a Seattle PostgreSQL group. I am please to say that this group has started. Their first meeting is February 9th, 2010. I have the honor of being their first speaker. I will be speaking on Dumb Simple PostgreSQL Performance, which is the same talk I have been giving at various user groups around Seattle (Python, Django, Perl). You can check out there website here.
About a month ago, Bucardo added an interesting set of features in the form of a new script called slony_migrator.pl. In this post I'll describe slony_migrator.pl and its three major functions.
For these examples, I'm using the pagila sample database along with a set of scripts I wrote and made available here. These scripts build two different Slony clusters. The first is a simple one, which replicates this database from a database called "pagila1" on one host to a database "pagila2" on another host. The second is more complex. Its one master node replicates the pagila database to two slave nodes, one of which replicates it again to a fourth slave using Slony's FORWARD function as described here. I implemented this setup on two FreeBSD virtual machines, known as myfreebsd and myfreebsd2. The reset-simple.sh and reset-complex.sh scripts in the script package I've linked to will build all the necessary databases from one pagila database and do all the Slony configuration.
The slony_migrator.pl script has three possible actions, the first of which is to connect to a running Slony cluster and print a synopsis of the Slony setup it discovers. You can do this safely against a running, production Slony cluster; it gathers all its necessary information from a few simple Slony queries. Here's the synopsis the script writes for the simple configuration I described above:
josh@eddie:~/devel/bucardo/scripts$ ./slony_migrator.pl -db pagila1 -H myfreebsd
Slony version: 1.2.16
psql version: 8.3
Postgres version: 8.3.7
Slony schema: _pagila
Local node: 1
SET 1: All pagila tables
* Master node: 1 Active: Yes PID: 3309 Comment: "Cluster node 1"
(dbname=pagila1 host=myfreebsd user=postgres)
** Slave node: 2 Active: Yes Forward: Yes Provider: 1 Comment: "Node 2"
(dbname=pagila2 host=myfreebsd2 user=postgres)
The script has reported the Slony, PostgreSQL, and psql versions, the Slony schema name, and shows that there's only one set, replicated from the master node to one slave node, inc