<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"><channel><title>Planet PostgreSQL</title><link>http://planet.postgresql.org</link><description>Planet PostgreSQL</description><lastBuildDate>Tue, 21 May 2013 08:31:54 GMT</lastBuildDate><generator>Planet PostgreSQL</generator><docs>http://blogs.law.harvard.edu/tech/rss</docs><item><title>Bruce Momjian: Video Interview</title><link>http://momjian.us/main/blogs/pgblog/2013.html#May_20_2013</link><description>&lt;p&gt;I did a video interview while I was at &lt;a class="txt2html" href="http://momjian.us/main/events/2013.html#February_25-March_1_2013"&gt;ConFoo&lt;/a&gt;, and it is now
&lt;a class="major" href="http://www.youtube.com/watch?v=kw1XXPtCk-o"&gt;online&lt;/a&gt;.
&lt;/p&gt;</description><guid isPermaLink="false">http://momjian.us/main/blogs/pgblog/2013.html#May_20_2013</guid><pubDate>Tue, 21 May 2013 02:00:01 GMT</pubDate></item><item><title>Josh Berkus: PostgreSQL New Development Priorities 4: Parallel Query</title><link>http://www.databasesoup.com/2013/05/postgresql-new-development-priorities-4_20.html</link><description>Parallel query is the first priority from those suggested in the comments that I agree should be a major PostgreSQL development priority.&amp;nbsp; I think that Joel Jacobson summarized it neatly: &lt;a href="http://joelonsql.com/2013/04/20/will-postgresql-9-5-bring-back-moores-law/"&gt;Bring Back Moore's Law&lt;/a&gt;.&amp;nbsp; Vertical scaling has always been one of PostgreSQL's strengths, but we're running into hard limits as servers are getting more cores but not faster cores.&amp;nbsp; We need to be able to use a server's full CPU capacity.&lt;br /&gt;&lt;br /&gt;&lt;i&gt;(note: this series of articles is my personal opinion as a PostgreSQL core team member)&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;The benefits of having some kind of parallel query are obvious to most users and developers today.&amp;nbsp; Mostly, people tend to think of analytics and parallel query across terabyte-sized tables, and that's definitely one of the reasons we need parallel query.&amp;nbsp; But possibly a stronger reason, which isn't much talked about, is CPU-heavy extensions -- chief among them, PostGIS.&amp;nbsp; All of those spatial queries are very processor-heavy; a location search takes a lot of math, a spatial JOIN more so.&amp;nbsp; While most users of large databases would like parallel query in order to do things a bit faster, PostGIS users need parallelism yesterday.&lt;br /&gt;&lt;br /&gt;Fortunately, work on parallelism has &lt;a href="http://www.postgresql.org/message-id/20130513142859.GC171500@tornado.leadboat.com"&gt;already started&lt;/a&gt;.&amp;nbsp; Even more fortunately, parallel query isn't a single monumental 
thing which has to be done as one big chunk; we can add parallelism piecemeal over the next few versions of Postgres.&amp;nbsp; Roughly, parallel query breaks down into parallelizing all of the following operations:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Table scan&lt;/li&gt;&lt;li&gt;Index scan&lt;/li&gt;&lt;li&gt;Bitmap scan&lt;/li&gt;&lt;li&gt;In-memory sort&lt;/li&gt;&lt;li&gt;On-disk sort&lt;/li&gt;&lt;li&gt;Hashing&lt;/li&gt;&lt;li&gt;Merge join&lt;/li&gt;&lt;li&gt;Nested loop join&lt;/li&gt;&lt;li&gt;Aggregation&lt;/li&gt;&lt;li&gt;Framework for parallel functions&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;Most of these features can be worked on independently, in any order -- dare I say, developed in parallel?&amp;nbsp; Joins probably need to be done after sorts and scans, but that's pretty much it.&amp;nbsp; Noah Misch has &lt;a href="http://www.postgresql.org/message-id/20130513142859.GC171500@tornado.leadboat.com"&gt;chosen to start with parallel in-memory sort&lt;/a&gt;, so you can probably expect that for version 9.4.</description><guid isPermaLink="true">tag:blogger.com,1999:blog-7476449567742726187.post-8189199441377450511</guid><pubDate>Tue, 21 May 2013 01:48:49 GMT</pubDate></item><item><title>Selena Deckelmann: Distributed databases: a series of posts including 2-phase commit in Postgres</title><link>http://www.chesnok.com/daily/2013/05/20/distributed-databases-a-series-of-posts-including-2-phase-commit-in-postgres/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=distributed-databases-a-series-of-posts-including-2-phase-commit-in-postgres</link><description>&lt;p&gt;There&amp;#8217;s a fantastic set of blog posts about distributed databases and network partitioning, starting with this post explaining the perils of trying to &amp;#8220;&lt;a href="http://aphyr.com/posts/281-call-me-maybe"&gt;communicate with someone who doesn&amp;#8217;t know you&amp;#8217;re alive&lt;/a&gt;.&amp;#8221;&lt;/p&gt;
&lt;p&gt;The next post is about &lt;a href="http://aphyr.com/posts/282-call-me-maybe-postgres"&gt;Postgres and 2-phase commit&lt;/a&gt;.  And there are four &lt;a href="http://aphyr.com/posts/283-call-me-maybe-redis"&gt;additional&lt;/a&gt; &lt;a href="http://aphyr.com/posts/284-call-me-maybe-mongodb"&gt;posts&lt;/a&gt; in &lt;a href="http://aphyr.com/posts/285-call-me-maybe-riak"&gt;the&lt;/a&gt; &lt;a href="http://aphyr.com/posts/286-call-me-maybe-final-thoughts"&gt;series&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The whole series is worth reading for anyone interested in data stores, consistency and Postgres! &lt;img src="http://www.chesnok.com/daily/wp-includes/images/smilies/icon_smile.gif" alt=":)" class="wp-smiley" /&gt; &lt;/p&gt;</description><guid isPermaLink="false">http://www.chesnok.com/daily/?p=4808</guid><pubDate>Tue, 21 May 2013 00:19:33 GMT</pubDate></item><item><title>Hubert 'depesz' Lubaczewski: Explaining the unexplainable – part 4</title><link>http://www.depesz.com/2013/05/19/explaining-the-unexplainable-part-4/</link><description>In this, hopefully 2nd to last, post in the series, I will cover the rest of the commonly occurring operations that you can see in your explain outputs. Unique Name seems to be clear about what's going on here &amp;#8211; it removes duplicate data. This might happen, for example, when you're doing: SELECT DISTINCT FIELD FROM [...]</description><guid isPermaLink="false">http://www.depesz.com/?p=2709</guid><pubDate>Sun, 19 May 2013 14:56:36 GMT</pubDate></item><item><title>Gabriele Bartolini: Configuring retention policies in Barman</title><link>http://blog.2ndquadrant.com/configuring-retention-policies-in-barman/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=configuring-retention-policies-in-barman</link><description>&lt;p&gt;&lt;a title="Retention of backups with Barman" href="http://blog.2ndquadrant.com/retention-of-backups-with-barman/"&gt;In our previous article&lt;/a&gt; we described what retention policies are and how they can be enforced on your PostgreSQL server backups with &lt;a href="http://www.pgbarman.org/"&gt;Barman&lt;/a&gt; 1.2. In this post, we will go through the configuration aspects.&lt;/p&gt;
&lt;p&gt;&lt;span id="more-581"&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;For the sake of simplicity, we assume a typical scenario which involves taking full backups once a week through the &amp;#8220;barman backup&amp;#8221; command. Suppose you want to keep the latest 4 backups and let Barman automatically delete the older, obsolete ones.&lt;/p&gt;
&lt;p&gt;The main configuration option for retention policies in Barman is &amp;#8220;retention_policy&amp;#8221;, which can be defined at either the global or the server level. If you want all your servers to keep the last 4 periodic backups by default, add the following line to the general section of Barman&amp;#8217;s configuration file:&lt;/p&gt;
&lt;pre&gt;[barman]
; ... general settings ...
retention_policy: REDUNDANCY 4&lt;/pre&gt;
&lt;p&gt;When the next &amp;#8220;barman cron&amp;#8221; command is executed (every minute if you installed Barman using RPMs or Debian/Ubuntu packages), Barman checks the number of available full periodic backups for every server, orders them in descending chronological order (from the most recent to the oldest) and deletes backups from the 5th position onwards.&lt;/p&gt;
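&lt;p&gt;The redundancy rule described above can be sketched in a few lines of Python. This is only an illustration, not Barman&amp;#8217;s actual code: the function name &lt;code&gt;split_by_redundancy&lt;/code&gt; and the backup IDs are hypothetical, chosen for this example.&lt;/p&gt;

```python
def split_by_redundancy(backup_ids, redundancy):
    """Return (kept, obsolete) lists for a REDUNDANCY-style policy.

    backup_ids: sortable backup identifiers, e.g. date strings (hypothetical).
    redundancy: number of most recent backups to keep.
    """
    # Order from the most recent to the oldest, as the text describes.
    ordered = sorted(backup_ids, reverse=True)
    # Everything past position `redundancy` is treated as obsolete.
    return ordered[:redundancy], ordered[redundancy:]


if __name__ == "__main__":
    # Five hypothetical weekly backups, identified by date.
    weekly = ["20130421", "20130428", "20130505", "20130512", "20130519"]
    kept, obsolete = split_by_redundancy(weekly, 4)
    print("keep:  ", kept)
    print("delete:", obsolete)
```

&lt;p&gt;With five weekly backups and a redundancy of 4, only the oldest backup ends up in the list of obsolete backups.&lt;/p&gt;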
&lt;p&gt;If you have several servers backed up on the same Barman host and you want a different retention policy for a specific server, you can simply edit that server&amp;#8217;s configuration section (or file, see &amp;#8220;&lt;a title="Managing the backup of several PostgreSQL servers with Barman" href="http://blog.2ndquadrant.com/managing-backup-several-postgresql-servers-barman/"&gt;Managing the backup of several PostgreSQL servers with Barman&lt;/a&gt;&amp;#8221;) and define a different setting:&lt;/p&gt;
&lt;pre&gt;[malcolm]
description = Malcolm Rocks
ssh_command = ssh malcolm
conninfo = host=malcolm port=5432 user=postgres dbname=postgres
retention_policy: REDUNDANCY 8&lt;/pre&gt;
&lt;p&gt;Barman also allows system administrators to manage retention policies based on time, in terms of recovery window and point of recoverability. For example, you can configure another server so that it can be recovered to any point in time in the last 3 months:&lt;/p&gt;
&lt;pre&gt;[angus]
description = Angus Rocks
ssh_command = ssh angus
conninfo = host=angus port=5432 user=postgres dbname=postgres
retention_policy: RECOVERY WINDOW OF 3 MONTHS&lt;/pre&gt;
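&lt;p&gt;The idea of a recovery window can be sketched as follows. This Python fragment is an illustration under stated assumptions, not Barman&amp;#8217;s implementation: it assumes that every backup taken inside the window is kept, plus the most recent backup taken before the window starts (so that WAL replay can reach the beginning of the window), and it approximates 3 months as 90 days. The function name &lt;code&gt;split_by_recovery_window&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
from datetime import datetime, timedelta


def split_by_recovery_window(backup_times, now, window):
    """Return (kept, obsolete) lists for a recovery-window style policy.

    Assumption: recovering to any point inside the window needs every
    backup taken inside it plus the newest backup taken before it.
    """
    point_of_recoverability = now - window
    kept, obsolete = [], []
    base_found = False
    # Walk the backups from the most recent to the oldest.
    for taken_at in sorted(backup_times, reverse=True):
        if taken_at >= point_of_recoverability:
            kept.append(taken_at)      # taken inside the window
        elif not base_found:
            kept.append(taken_at)      # newest backup before the window
            base_found = True
        else:
            obsolete.append(taken_at)  # older than anything still needed
    return kept, obsolete


if __name__ == "__main__":
    now = datetime(2013, 5, 19)
    # Hypothetical backups taken on the first day of January to May 2013.
    backups = [datetime(2013, month, 1) for month in range(1, 6)]
    kept, obsolete = split_by_recovery_window(backups, now, timedelta(days=90))
    print("keep:  ", [b.date().isoformat() for b in kept])
    print("delete:", [b.date().isoformat() for b in obsolete])
```

&lt;p&gt;In this hypothetical run the window starts on 18 February 2013, so the 1 February backup is kept as the base for recovery and only the January backup is considered obsolete.&lt;/p&gt;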
&lt;p&gt;Make sure you have enough space on the disk to store all the WAL files for every server you back up, and always monitor &amp;#8220;barman check&amp;#8221; through your alerting tools (such as Nagios/Icinga/Zabbix/etc.).&lt;/p&gt;
&lt;p&gt;The current implementation of retention policies in Barman has some limitations: retention policies are enforced only automatically (not manually &amp;#8211; that would require creating a &amp;#8220;barman delete --obsolete&amp;#8221; command, for example) and there is not yet any decoupling between full backups and the archive of WAL (transaction log) files (we have already thought of the &amp;#8220;wal_retention_policy&amp;#8221; option, but at the moment it is not handled).&lt;/p&gt;
&lt;p&gt;More detailed information on retention policies can be found on &lt;a href="http://docs.pgbarman.org/#retention_policies"&gt;Barman&amp;#8217;s documentation website&lt;/a&gt;.&lt;/p&gt;</description><guid isPermaLink="false">http://blog.2ndquadrant.com/?p=581</guid><pubDate>Sun, 19 May 2013 08:59:32 GMT</pubDate></item><item><title>Josh Berkus: PostgreSQL New Development Priorities 3: Pluggable Parser</title><link>http://www.databasesoup.com/2013/05/postgresql-new-development-priorities-3.html</link><description>Really, when you look at the long-term viability of any platform, pluggability is where it's at.&amp;nbsp; A lot of the success of PostgreSQL to date has been built on extensions and portability, just as the success of AWS has been built on their comprehensive API. &amp;nbsp; Our future success will be built on taking pluggability even further.&amp;nbsp;&lt;br /&gt;&lt;br /&gt;In addition to &lt;a href="http://www.databasesoup.com/2013/05/postgresql-new-development-priorities-2.html"&gt;pluggable storage&lt;/a&gt;, a second thing we really need is a pluggable parser interface.&amp;nbsp; That is, it should be possible to generate a statement structure, in binary form, and hand that off to libpq for execution.&amp;nbsp; There was recently &lt;a href="http://www.postgresql.org/message-id/CA+TgmoaaELFkE-QBoktUNNgWhx1qnZ06VwCqxeLH7Hb+G8UqJw@mail.gmail.com"&gt;some discussion about this on -hackers&lt;/a&gt;.&amp;nbsp;&lt;br /&gt;&lt;br /&gt;If there were a way to hand off expression trees directly to the planner, then this would allow creating extensions which actually had additional syntax, without having to fork PostgreSQL.&amp;nbsp; This would support most of those "compatibility" extensions, as well as potentially allowing extensions like SKYLINE OF which change SQL behavior.&lt;br /&gt;&lt;br /&gt;It would also help support PostgreSQL-based clustered databases, by allowing all of the parsing for a particular client to happen on a remote node and 
get passed to the clustered backends.&amp;nbsp; The pgPool2 project has asked for this for several years for that reason. &lt;br /&gt;&lt;br /&gt;More intriguingly, it would allow for potentially creating an "ORM" which doesn't have to serialize everything to SQL, but can instead build expression trees directly based on client code.&amp;nbsp; This would both improve response times, and encourage developers to use a lot of PostgreSQL's more sophisticated features since they could access them directly in their code.&lt;br /&gt;&lt;br /&gt;Taking things a step further, we could extend this to allow users to hand a plan tree directly to the executor.&amp;nbsp; This would fix things for all of the users who actually need query hints (as opposed to those who think they need them), as well as taking efficiency a step beyond cached plans.&lt;br /&gt;&lt;br /&gt;There are a lot of reasons this would be just as difficult to do as pluggable storage.&amp;nbsp; Currently parsing depends on context-dependent knowledge of system catalogs, including things like search_path.&amp;nbsp; So I have no idea what it would even look like.&amp;nbsp; But a parser API is something that people who hack on Postgres and fork it will continue to ask for.</description><guid isPermaLink="true">tag:blogger.com,1999:blog-7476449567742726187.post-3981868842242235456</guid><pubDate>Sat, 18 May 2013 22:17:32 GMT</pubDate></item><item><title>Selena Deckelmann: Migrations with Alembic: a lightspeed tour</title><link>http://www.chesnok.com/daily/2013/05/17/migrations-with-alembic-a-lightspeed-tour/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=migrations-with-alembic-a-lightspeed-tour</link><description>&lt;p&gt;I&amp;#8217;ve got a &lt;a href="https://wiki.mozilla.org/Webdev/Beer_And_Tell/May2013"&gt;Beer &amp;amp; Tell&lt;/a&gt; to give about &lt;a href="https://alembic.readthedocs.org/en/latest/"&gt;alembic&lt;/a&gt;. Alembic is a migration tool that works with SQLAlchemy. 
I&amp;#8217;m using it for database migrations with PostgreSQL.&lt;/p&gt;
&lt;p&gt;So, here&amp;#8217;s what I want to say today:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Written by SQLAlchemy wiz &lt;a href="https://twitter.com/zzzeek"&gt;Mike Bayer&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Here&amp;#8217;s the &lt;a href="https://alembic.readthedocs.org/en/latest/tutorial.html"&gt;tutorial&lt;/a&gt;. Socorro is now using alembic in production with SQLAlchemy 0.6.x. I&amp;#8217;m hoping to get us upgraded to 0.8.x soon.&lt;/li&gt;
&lt;li&gt;Here&amp;#8217;s what &lt;a href="https://gist.github.com/selenamarie/4dcf5d05bbe8419e4b42/raw/62de2c32f17c0153dc69afa97f145f25a5fab12b/alembic+output+v46.txt"&gt;running an upgrade in production for Socorro looks like&lt;/a&gt;. Awesome, right?&lt;/li&gt;
&lt;li&gt;Here&amp;#8217;s what a &lt;a href="https://github.com/mozilla/socorro/blob/master/alembic/versions/37004fc6e41e_bug_867606_add_data_.py"&gt;migration looks like&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Here&amp;#8217;s &lt;a href="https://github.com/mozilla/socorro/blob/master/config/alembic.ini-dist"&gt;a configuration file&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Generating a migration from the command line might look something like:&lt;br /&gt;
&lt;code&gt;alembic revision -m "bug XXXXXX Add a new table" --autogenerate&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The most difficult thing to deal with so far has been the many &lt;a href="https://github.com/mozilla/socorro/tree/master/socorro/external/postgresql/raw_sql/procs"&gt;User Defined Functions that we use in Socorro&lt;/a&gt;. None of the migration tools I tested deals well with them.&lt;/p&gt;
&lt;p&gt;Happy to answer questions! And I&amp;#8217;ll see about making a longer talk about this transition soon.&lt;/p&gt;</description><guid isPermaLink="false">http://www.chesnok.com/daily/?p=4761</guid><pubDate>Fri, 17 May 2013 20:44:46 GMT</pubDate></item><item><title>Joe Abbate: Pyrseas contributions solicited</title><link>http://pyrseas.wordpress.com/2013/05/17/pyrseas-contributions-solicited/</link><description>&lt;p&gt;Do you use &lt;a href="http://www.postgresql.org/"&gt;PostgreSQL&lt;/a&gt; and truly believe it&amp;#8217;s &amp;#8220;the world&amp;#8217;s most advanced open source database&amp;#8221; and that its upcoming 9.3 release will make it even more awesome?&lt;/p&gt;
&lt;p&gt;Do you also use &lt;a href="http://www.python.org/"&gt;Python&lt;/a&gt; and believe it&amp;#8217;s &amp;#8220;an easy to learn, powerful programming language&amp;#8221; with &amp;#8220;elegant syntax&amp;#8221; that makes it an ideal language for developing applications and tools around PostgreSQL, such as Pyrseas?&lt;/p&gt;
&lt;p&gt;Then we could use your help. For starters, we want to add support for the &lt;a href="https://github.com/jmafc/Pyrseas/issues/55"&gt;MATERIALIZED VIEWs&lt;/a&gt; and &lt;a href="https://github.com/jmafc/Pyrseas/issues/56"&gt;EVENT TRIGGERs&lt;/a&gt; coming up in PG 9.3.&lt;/p&gt;
&lt;p&gt;We have also been asked to add the capability to load and maintain &amp;#8220;static data&amp;#8221; (relatively small, unchanging tables) as part of &lt;a href="http://pyrseas.readthedocs.org/en/latest/yamltodb.html"&gt;yamltodb&lt;/a&gt;, so that it can be integrated more easily into database version control workflows.&lt;/p&gt;
&lt;p&gt;And for the next release, Pyrseas 0.7, we&amp;#8217;d like to include the first version of the &lt;a href="https://github.com/jmafc/Pyrseas/issues/17"&gt;database augmentation tool&lt;/a&gt; which will support declarative implementation of business logic in the database&amp;#8211;starting off with audit trail columns. Some work has been done on this already, but it needs integration with the current code and tests.&lt;/p&gt;
&lt;p&gt;Or perhaps coding is not your forte, but you&amp;#8217;re really good at explaining and documenting technical &amp;#8220;stuff&amp;#8221;. Then you could give us a hand with revamping the &lt;a href="http://pyrseas.readthedocs.org/en/latest/"&gt;docs&lt;/a&gt;, maybe writing a tutorial so that users have a smooth ride using our tools.&lt;/p&gt;
&lt;p&gt;Or maybe you have your own ideas about how to improve the PostgreSQL version control experience. We&amp;#8217;d love to hear those too.&lt;/p&gt;
&lt;p&gt;If you&amp;#8217;d like to help, you can &lt;a href="https://github.com/jmafc/Pyrseas"&gt;fork the code on GitHub&lt;/a&gt;, join the &lt;a href="http://lists.pgfoundry.org/mailman/listinfo/pyrseas-general"&gt;mailing list&lt;/a&gt; and introduce yourself, or leave a comment below.&lt;/p&gt;
</description><guid isPermaLink="true">http://pyrseas.wordpress.com/?p=2019</guid><pubDate>Fri, 17 May 2013 18:48:31 GMT</pubDate></item><item><title>Mason Sharp: Backup and Recovery in Postgres-XC compared to PostgreSQL</title><link>http://www.stormdb.com/content/backup-and-recovery-postgres-xc-compared-postgresql</link><description>&lt;p&gt;Backup and recovery in &lt;a href="http://www.stormdb.com/community/postgresxc"&gt;Postgres-XC&lt;/a&gt; has some parallels to PostgreSQL, but with its own wrinkles.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.stormdb.com/content/backup-and-recovery-postgres-xc-compared-postgresql" target="_blank"&gt;read more&lt;/a&gt;&lt;/p&gt;</description><guid isPermaLink="false">http://www.stormdb.com/blog/mason_sharp/953 at http://www.stormdb.com</guid><pubDate>Thu, 16 May 2013 18:53:02 GMT</pubDate></item><item><title>Josh Berkus: PostgreSQL New Development Priorities 2: Pluggable Storage</title><link>http://www.databasesoup.com/2013/05/postgresql-new-development-priorities-2.html</link><description>Over the last decade, Greenplum, Vertica, Everest, Paraccel, and a number of non-public projects all forked off of PostgreSQL.&amp;nbsp; In each case, one of the major changes to the forks was to radically change data storage structures in order to enable new functionality or much better performance on large data.&amp;nbsp; In general, once a Postgres fork goes through the storage change, they stop contributing back to the main project because their codebase is then different enough to make merging very difficult.&lt;br /&gt;&lt;br /&gt;Considering the amount of venture capital money poured into these forks, that's a big loss of feature contributions from the community.&amp;nbsp; Especially when the startup in question gets bought out by a company who buries it or loots it for IP and then kills the product.&lt;br /&gt;&lt;br /&gt;More importantly, we have a number of people who would like to do something interesting and substantially different with PostgreSQL storage, and will likely be forced to fork PostgreSQL to get their ideas to work.&amp;nbsp; Index-organized tables, fractal trees, JSON trees, EAV-optimized storage, non-MVCC tables, column stores, hash-distributed tables and graphs all require changes to storage which can't currently be fit into the model of index classes and blobs we offer for extensibility of data storage.&amp;nbsp; Transactional RAM and Persistent RAM in the future may urge other incompatible storage changes.&lt;br /&gt;&lt;br 
/&gt;As a community, we want to capture these innovations and make them part of mainstream Postgres, and their users part of the PostgreSQL community.&amp;nbsp; The only way to do this is to have some form of pluggable storage, just like we have pluggable function languages&amp;nbsp; and pluggable index types.&lt;br /&gt;&lt;br /&gt;The direct way to do this would be to refactor our code to replace all direct manipulation of storage and data pages with a well-defined API.&amp;nbsp; This would be extremely difficult, and would produce large performance issues in the first few versions.&amp;nbsp; It would, however, also have the advantage of allowing us to completely solve the binary upgrade of page format changes issue.&lt;br /&gt;&lt;br /&gt;A second approach would be to do a MySQL, and build up Foreign Data Wrappers (FDWs) to the point where they could perform and behave like local tables.&amp;nbsp; This may be the more feasible route because the work could be done incrementally, and FDWs are already a well-defined API.&amp;nbsp; However, having Postgres run administration and maintenance of foreign tables would be a big step and is conceptually difficult to imagine.&lt;br /&gt;&lt;br /&gt;Either way, this is a problem we need to solve long-term in order to continue expanding the places people can use PostgreSQL.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;</description><guid isPermaLink="true">tag:blogger.com,1999:blog-7476449567742726187.post-295599504198565077</guid><pubDate>Thu, 16 May 2013 17:18:16 GMT</pubDate></item><item><title>Andrew Dunstan: Our commitment to quality is second to none.</title><link>http://adpgtech.blogspot.com/2013/05/our-commitment-to-quality-is-second-to.html</link><description>If anyone doubts the total commitment of the Postgres project to quality and correctness, let them be reassured by this completely correct but decidedly pedantic &lt;a 
href="http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=e7bfc7e42cebf80507f9c9965dc4a572e9fb76a4" target="_blank"&gt;commit&lt;/a&gt;. I dips me lid to Tom Lane and Thom Brown.</description><guid isPermaLink="false">tag:blogger.com,1999:blog-2356137376934964551.post-385701571966524516</guid><pubDate>Thu, 16 May 2013 16:49:19 GMT</pubDate></item><item><title>Raghavendra Rao: Disk page checksums to detect filesystem failures in PostgreSQL 9.3Beta 1</title><link>http://raghavt.blogspot.com/2013/05/disk-page-checksums-to-detect.html</link><description>&lt;div dir="ltr"&gt;PostgreSQL 9.3 Beta 1 introduces a new feature: "Disk page checksums". Thanks to authors &lt;i&gt;Simon Riggs, Jeff Davis &amp;amp; Greg Smith&lt;/i&gt;.&lt;br /&gt;&lt;br /&gt;In earlier releases, a corrupted block on disk was silently ignored until something read it or a query returned wrong results. Now, corruption is detected when the page is read, and a WARNING message is raised instead of the corrupted block being used silently.&lt;br /&gt;&lt;br /&gt;The disk page checksums feature is not pluggable like an EXTENSION; it is an option selected for the whole cluster. If you want your database to be monitored for data corruption, checksums must be enabled when the cluster is initialized; they cannot be enabled on an existing or running cluster. The example below shows how it works.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;u&gt;&lt;i&gt;&lt;span&gt;Initialize the cluster with checksums:&lt;/span&gt;&lt;/i&gt;&lt;/u&gt;&lt;/b&gt;&lt;br /&gt;&lt;pre class="cpp" name="code"&gt;initdb [OPTION]... 
[DATADIR]&lt;br /&gt;   ........&lt;br /&gt;   -k, --data-checksums      use data page checksums&lt;br /&gt;   &lt;br /&gt;initdb -D data_directory -k &lt;br /&gt;&lt;/pre&gt;Now, any data corruption found is reported as below:&lt;br /&gt;&lt;pre class="cpp" name="code"&gt;postgres=# select * from corruption_test;&lt;br /&gt;WARNING:  page verification failed, calculated checksum 63023 but expected 48009&lt;br /&gt;ERROR:  invalid page in block 0 of relation base/12896/16409&lt;br /&gt;&lt;/pre&gt;In earlier versions, there was just an error message:&lt;br /&gt;&lt;pre class="cpp" name="code"&gt;postgres=# select * from corruption_test where id=1;&lt;br /&gt;ERROR:  invalid page header in block 0 of relation base/12870/18192&lt;br /&gt;&lt;/pre&gt;That's cool, right?&lt;br /&gt;&lt;br /&gt;So, how do you know whether disk page checksums are enabled on a cluster? &lt;br /&gt;As of now, no system catalog stores this information and no extra file is created in the $PGDATA directory; only the pg_control file holds it. You can read it with the pg_controldata utility.&lt;br /&gt;&lt;pre class="cpp" name="code"&gt;$ export PGDATA=/usr/local/pg93beta/data&lt;br /&gt;$ pg_controldata &lt;br /&gt;....&lt;br /&gt;....&lt;br /&gt;....&lt;br /&gt;Data page checksum version:           1&lt;br /&gt;&lt;/pre&gt;&lt;b&gt;&lt;i&gt;&lt;u&gt;&lt;span&gt;Some points on Disk page checksums:&lt;/span&gt;&lt;/u&gt;&lt;/i&gt;&lt;/b&gt;&lt;br /&gt;1. Temp tables are excluded from checksum checks.&lt;br /&gt;2. There's a performance overhead when checksums are enabled, as per the &lt;a href="http://www.postgresql.org/docs/devel/static/app-initdb.html"&gt;PG documentation&lt;/a&gt;.&lt;br /&gt;3. 
Once enabled, checksums on a cluster cannot be turned off.&lt;br /&gt;&lt;br /&gt;Thanks&lt;br /&gt;Raghav&lt;/div&gt;</description><guid isPermaLink="true">tag:blogger.com,1999:blog-716982283843360345.post-2264109636412508426</guid><pubDate>Thu, 16 May 2013 08:16:30 GMT</pubDate></item><item><title>Christophe Pettus: DjangoCon US 2013 CFP is open</title><link>http://thebuild.com/blog/2013/05/15/djangocon-us-2013-cfp-is-open/</link><description>&lt;p&gt;The &lt;a href="http://www.djangocon.us/cfp/"&gt;Call for Papers&lt;/a&gt; for DjangoCon US 2013 is now open.&lt;/p&gt;</description><guid isPermaLink="false">http://thebuild.com/blog/?p=452</guid><pubDate>Thu, 16 May 2013 06:59:31 GMT</pubDate></item><item><title>Hubert 'depesz' Lubaczewski: A tale of automating tests of Pg with Bash</title><link>http://www.depesz.com/2013/05/16/a-tale-of-automating-tests-of-pg-with-bash/</link><description>Word of warning: this blogpost is about something related to Bash (well, maybe other shells too, didn't really test), but since I found it while doing Pg work, and it might bite someone else doing Pg related work, I decided to add it to the &amp;#8220;postgresql&amp;#8221; tag. So, due to some work I had to do, [...]</description><guid isPermaLink="false">http://www.depesz.com/?p=2707</guid><pubDate>Wed, 15 May 2013 23:03:25 GMT</pubDate></item><item><title>Josh Berkus: Unconference, 9.3 Beta</title><link>http://www.databasesoup.com/2013/05/unconference-93-beta.html</link><description>The first &lt;a href="http://wiki.postgresql.org/wiki/Pgcon2013unconference"&gt;pgCon Unconference&lt;/a&gt; is only 10 days away, and we now have room numbers.&amp;nbsp; If you want to lead a topic ... or, more importantly, if you want &lt;i&gt;someone else&lt;/i&gt; to lead a topic ... 
please add your topic ideas to &lt;a href="http://wiki.postgresql.org/wiki/Pgcon2013unconference"&gt;the wiki page&lt;/a&gt;!&amp;nbsp; We're expecting over 100 people at the unconference, given that pgCon registration is up above 225, more than 10% higher than last year at this point.&lt;br /&gt;&lt;br /&gt;Second, my &lt;a href="https://lwn.net/Articles/550418/"&gt;article on the PostgreSQL 9.3 Beta&lt;/a&gt; is up at LWN.net (subscription required, or wait 2 weeks).</description><guid isPermaLink="true">tag:blogger.com,1999:blog-7476449567742726187.post-6170655324585428919</guid><pubDate>Wed, 15 May 2013 18:28:26 GMT</pubDate></item><item><title>Pavel Stehule: PostgreSQL history</title><link>http://okbob.blogspot.com/2013/05/postgresql-history.html</link><description>A year ago I wrote an article about PostgreSQL history. The original &lt;a href="http://postgres.cz/wiki/Historie_projektu_PostgreSQL"&gt;article&lt;/a&gt; is in Czech, but there is also a Google-translated &lt;a href="http://translate.google.cz/translate?hl=cs&amp;sl=cs&amp;tl=en&amp;u=http%3A%2F%2Fpostgres.cz%2Fwiki%2FHistorie_projektu_PostgreSQL"&gt;article&lt;/a&gt;.</description><guid isPermaLink="true">tag:blogger.com,1999:blog-8839574367290288724.post-8314537755554806035</guid><pubDate>Wed, 15 May 2013 17:25:53 GMT</pubDate></item><item><title>Andrew Dunstan: PostgreSQL tables for single Redis objects</title><link>http://adpgtech.blogspot.com/2013/05/postgresql-tables-for-single-redis.html</link><description>After playing a bit with the &lt;a href="https://github.com/pg-redis-fdw/redis_fdw" target="_blank"&gt;Redis FDW&lt;/a&gt; and the &lt;a href="https://bitbucket.org/qooleot/redis_wrapper" target="_blank"&gt;Redis command extension&lt;/a&gt;, Josh Berkus was slightly dissatisfied. 
He was looking for a way to map the possibly huge set of values in a single Redis object to a set of rows in a PostgreSQL table, and the Redis FDW currently maps each object to a single row - as an array if it's appropriate. Of course, we could call unnest() on the array, but it seems roundabout to construct an array only to have to unpack it immediately. These aren't terribly cheap operations. &lt;br /&gt;&lt;br /&gt;I've been thinking a bit about his complaint, and I think this will be doable. What we'll need is a new table option that specifies the key and designates the table as one sourced from a single key rather than a range of keys. Say we call this option &lt;i&gt;&lt;span&gt;singleton_key&lt;/span&gt;&lt;/i&gt;. Then we might do something like:&lt;br /&gt;&lt;blockquote&gt;&lt;pre&gt;CREATE FOREIGN TABLE hugeset (value text)&lt;br /&gt;SERVER localredis&lt;br /&gt;OPTIONS (tabletype 'list', singleton_key 'myhugelist');&lt;br /&gt;&lt;/pre&gt;&lt;/blockquote&gt;This option would be incompatible with the &lt;i&gt;&lt;span&gt;tablekeyset&lt;/span&gt;&lt;/i&gt; and &lt;i&gt;&lt;span&gt;tablekeyprefix&lt;/span&gt;&lt;/i&gt; options. If given, the key won't be looked up at all. We would simply use the given key and return the corresponding list of values. That would make selecting from such a table faster - possibly lots faster. For scalars, sets and lists, the table would have one column. In the case of a scalar there would only be one row. For zsets, it would have two columns - one for the value and one for the score. 
Finally, for hashes it would have two, one for the property and one for the value.&lt;br /&gt;&lt;br /&gt;This looks like a useful possible enhancement.</description><guid isPermaLink="false">tag:blogger.com,1999:blog-2356137376934964551.post-283753381675920054</guid><pubDate>Wed, 15 May 2013 16:44:39 GMT</pubDate></item><item><title>Michael Paquier: Postgres 9.3 feature highlight: new flavors of IF EXISTS and IF NOT EXISTS</title><link>http://michael.otacoo.com/postgresql-2/postgres-9-3-feature-highlight-new-flavors-of-if-exists-and-if-not-exists/</link><description>IF EXISTS and IF NOT EXISTS are clauses allowing to return a notice message instead of an error if a DDL query running on a given object already exists or not depending on the DDL action done. If a given query tries to create an object when IF NOT EXISTS is specified, a notice message &lt;a href="http://michael.otacoo.com/postgresql-2/postgres-9-3-feature-highlight-new-flavors-of-if-exists-and-if-not-exists/"&gt;[...]&lt;/a&gt;</description><guid isPermaLink="false">http://michael.otacoo.com/?p=1889</guid><pubDate>Wed, 15 May 2013 11:51:07 GMT</pubDate></item><item><title>Hans-Juergen Schoenig: Finding similar texts in PostgreSQL</title><link>http://www.cybertec.at/finding-similar-texts-in-postgresql/</link><description>20 years ago it was enough for a database to simply check if one string was identical to some other string. Those times are long gone and thus several algorithms to do fuzzy string matches have been developed over time. 
Many of these mechanisms, such as trigrams, regular expressions, soundex and so on are already [...]</description><guid isPermaLink="false">http://www.cybertec.at/?p=2299</guid><pubDate>Wed, 15 May 2013 10:30:10 GMT</pubDate></item><item><title>Josh Berkus: PostgreSQL New Development Priorities: Scale It Now</title><link>http://www.databasesoup.com/2013/05/postgresql-new-development-priorities.html</link><description>The comments on &lt;a href="http://www.databasesoup.com/2013/03/postgresqls-new-development-priorities.html"&gt;my introductory post on this topic&lt;/a&gt; mentioned a lot of the major features which users would like to see&amp;nbsp;in PostgreSQL.&amp;nbsp; Among those mentioned were:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Improvements to replication&lt;/li&gt;&lt;li&gt;Parallel query&lt;/li&gt;&lt;li&gt;Index-organized tables&lt;/li&gt;&lt;li&gt;Better partitioning&lt;/li&gt;&lt;li&gt;Better full-text search&lt;/li&gt;&lt;li&gt;Logical streaming replication&lt;/li&gt;&lt;/ul&gt;However, as with other projects, our perennial temptation is to listen to current users rather than potential users.&amp;nbsp; We can focus on making PostgreSQL better for the people who already use it.&amp;nbsp; It's attractive, but that way lies obsolescence.&lt;br /&gt;&lt;br /&gt;What we need to focus on is the reasons why people don't use PostgreSQL at all.&amp;nbsp; Only by exploiting new markets -- by pushing Postgres into places which never had a database before -- do we grow the future PostgreSQL community.&amp;nbsp; And there's a bunch of ways we are failing new users. 
&lt;br /&gt;&lt;br /&gt;For example, listen to Nelson Elhage, engineer at Stripe.com:&lt;br /&gt;&lt;blockquote class="tr_bq"&gt;"I love Mongo's HA story.&amp;nbsp; Out of the box I can build a 3-node Mongo cluster with a full replica set.&amp;nbsp; I can add nodes, I can fail over, without losing data."&lt;/blockquote&gt;Wouldn't it be nice if we could say the same thing about Postgres?&amp;nbsp; But we can't.&lt;br /&gt;&lt;br /&gt;If we're looking for a #1 PostgreSQL development priority, this is it:&lt;br /&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;span&gt;&lt;b&gt;We need a "scale it now" button.&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This is where we're losing ground to the new databases.&amp;nbsp; In every other way we are superior: durability, pluggability, standards-compliance, query sophistication, everything.&amp;nbsp; But when a PostgreSQL user outstrips the throughput of a single server or a single EC2 instance, our process for scaling out sucks.&amp;nbsp; It's complicated.&amp;nbsp; It has weird limitations.&amp;nbsp; And most of all, our scale-out requires advanced database expertise, which is expensive and in short supply.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;We need some way for users to go smoothly and easily from one Postgres node to three and then to ten.&amp;nbsp; Until we do that, we will continue to lose ground to databases which do a better job at scaling, even if they suck at everything else.&lt;br /&gt;&lt;br /&gt;I don't know what our answer to scaling out will be.&amp;nbsp; Streaming logical replication, PostgresXC, Translattice, Hadapt, something else ... 
I don't know.&amp;nbsp; But if we don't answer this vital question, all our incremental improvements will amount to naught.</description><guid isPermaLink="true">tag:blogger.com,1999:blog-7476449567742726187.post-3163987061643509312</guid><pubDate>Wed, 15 May 2013 07:44:42 GMT</pubDate></item><item><title>Andrew Dunstan: Redis FDW gets a slightly new home</title><link>http://adpgtech.blogspot.com/2013/05/redis-fdw-gets-slightly-new-home.html</link><description>To avoid a few administrative difficulties imposed by github, Dave Page and I have moved the repo for the Redis FDW to &lt;a href="https://github.com/pg-redis-fdw/redis_fdw"&gt;https://github.com/pg-redis-fdw/redis_fdw&lt;/a&gt;. If you have a clone of the old repo, it should be sufficient to change the remote setting in the .git/config file, replacing "dpage" with "pg-redis-fdw". The commits are identical up to the point where we removed everything from the old repo.&lt;br /&gt;&lt;br /&gt;Sorry for any inconvenience.</description><guid isPermaLink="false">tag:blogger.com,1999:blog-2356137376934964551.post-3262161141373136719</guid><pubDate>Tue, 14 May 2013 18:50:43 GMT</pubDate></item><item><title>Raghavendra Rao: What if, import file (txt/csv) having "BOM-ed UTF-8" encoding?</title><link>http://raghavt.blogspot.com/2013/05/what-if-import-file-txtcsv-having-bom.html</link><description>&lt;div dir="ltr"&gt;So what is "UTF-8 BOM" mean ? its byte order mark for UTF-8, some bytes (0xEF,0xBB,0xBF) are added at the start of the file to indicate that the file having unicode characters in it. BOM Characters "ï»¿9".&lt;br /&gt;&lt;br /&gt;As per Unicode documentation, the presence of BOM in file are useless, because it causes problems with non-BOM-aware software's to identify or parse the leading characters having at the start. 
Same has been quoted at the bottom of the &lt;a href="http://en.wikipedia.org/wiki/Byte-order_mark#cite_note-2" target="_blank"&gt;Wikipedia page&lt;/a&gt;:&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;u&gt;&lt;i&gt;&lt;span&gt;Related errors in PostgreSQL:&lt;/span&gt;&lt;/i&gt;&lt;/u&gt;&lt;/b&gt;&lt;br /&gt;&lt;span&gt;ERROR:  invalid input syntax for integer: "ï»¿9" (in psql-client)&lt;/span&gt;&lt;br /&gt;&lt;span&gt;SQL state: 22P02 (in PgAdmin-III)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;i&gt;&lt;span&gt;&lt;u&gt;Test case &amp;amp; fix on Windows:&lt;/u&gt;&lt;/span&gt;&lt;/i&gt;&lt;/b&gt;&lt;br /&gt;Sample file "state_data.txt" created in NOTEPAD with unicode characters in it:&lt;br /&gt;&lt;pre class="cpp" name="code"&gt;9,Karnataka,कर्नाटक&lt;br /&gt;10,Kerala,केरळा&lt;br /&gt;&lt;/pre&gt;&lt;b&gt;&lt;i&gt;&lt;span&gt;Table to import data:&lt;/span&gt;&lt;/i&gt;&lt;/b&gt;&lt;br /&gt;&lt;pre class="cpp" name="code"&gt;create table states(state_code int, state_name char(30), state_in_hindi text);&lt;br /&gt;&lt;/pre&gt;&lt;span&gt;&lt;b&gt;&lt;i&gt;Error:&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;pre class="cpp" name="code"&gt;postgres=# copy test from 'c:/Pgfile/state_data.txt' with delimiter ',' CSV;&lt;br /&gt;ERROR:  invalid input syntax for integer: "ï»¿9"&lt;br /&gt;CONTEXT:  COPY test, line 1, column state_code: "ï»¿9"&lt;br /&gt;&lt;/pre&gt;To fix, I have used a tool &lt;b&gt;&lt;i&gt;&lt;span&gt;"bomremover.exe"&lt;/span&gt;&lt;/i&gt;&lt;/b&gt; to remove leading characters from a file as its on windows, if its on linux, then there are many tips &amp;amp; tricks available on net to wipe BOM from a utf-8 format file.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;i&gt;&lt;u&gt;&lt;span&gt;Tool Download link and usage:&lt;/span&gt;&lt;/u&gt;&lt;/i&gt;&lt;/b&gt;&lt;br /&gt;&lt;i&gt;&lt;span&gt;&lt;a 
href="http://www.mannaz.at/codebase/utf-byte-order-mark-bom-remover/"&gt;http://www.mannaz.at/codebase/utf-byte-order-mark-bom-remover/&lt;/a&gt;&lt;/span&gt;&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre class="cpp" name="code"&gt;Eg:-&lt;br /&gt;C:\Pgfile&amp;gt;bomremover.exe . *&lt;br /&gt;Added '.\state_data.txt' to processing list.&lt;br /&gt;Press enter to process all files in the list. (1 files in total)&lt;br /&gt;&lt;br /&gt;Processing file '.\state_data.txt'...&lt;br /&gt;Finished. Press Enter to Exit&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;After running bomremover.exe on the file, re-run the COPY command, which will now import the data successfully. &lt;br /&gt;&lt;pre class="cpp" name="code"&gt; state_code | state_name | State_name_in_hindi&lt;br /&gt;------------+------------+---------------------&lt;br /&gt;          9 | Karnataka  | αñòαñ░αÑ&amp;lt;8d&amp;gt;αñ¿αñ╛αñƒαñò&lt;br /&gt;         10 | Kerala     | αñòαÑçαñ░αñ│αñ╛&lt;br /&gt;(2 rows)&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Some editors do not add a UTF-8 BOM by default when saving text:&lt;br /&gt;- Windows - Notepad++ (Notepad adds a BOM by default)&lt;br /&gt;- Linux - VI&lt;br /&gt;- Mac - TextEdit&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;--Raghav&lt;/div&gt;</description><guid isPermaLink="true">tag:blogger.com,1999:blog-716982283843360345.post-1833069763083763755</guid><pubDate>Tue, 14 May 2013 18:11:16 GMT</pubDate></item><item><title>Andrew Dunstan: Sometimes you just get hit by a huge application load</title><link>http://adpgtech.blogspot.com/2013/05/sometimes-you-just-get-hit-by-huge.html</link><description>I love emergency calls. They give me a chance to get into something new, very quickly, and get them fixed. 
A few days ago a company called &lt;a href="http://spaceinch.com/" target="_blank"&gt;SpaceInch&lt;/a&gt; made a new app called &lt;a href="http://saythesamething.com/" target="_blank"&gt;Say The Same Thing&lt;/a&gt;, which became an instant hit, possibly due to the quite awesome &lt;a href="http://www.youtube.com/watch?v=2sP1DqyagXE" target="_blank"&gt;video by OK Go&lt;/a&gt;, which you should watch even if you don't look at anything else in this post.&lt;br /&gt;&lt;br /&gt;Since the launch they have been hit by a huge flood of users and last night things started to drag, and they asked PostgreSQL Experts for some emergency help. The app is backed by a PostgreSQL database running on &lt;a href="https://postgres.heroku.com/" target="_blank"&gt;Heroku&lt;/a&gt;.&amp;nbsp; During a phone call of two hours, I was able to get in, examine the database, and devise some steps that we hope will have the effect of making things run smoother.&lt;br /&gt;&lt;br /&gt;The solution is going to involve some partitioning of one or two tables. This isn't being done so much for the benefit of constraint exclusion as because it lets them drop large amounts of data that is no longer needed, cheaply. Previously they had been cleaning one of these tables by deleting rows, but abandoned that as it adversely affected performance.&amp;nbsp; But the consequence of that was that the table just grew and grew to several million rows in a few days. Dropping child tables that contain data that is of no further interest will work well to tame the data size without all the overhead of row deletion.&lt;br /&gt;&lt;br /&gt;Troubles sometimes don't come singly. I actually had to deal with two emergencies last night, which is quite unusual - the other was also for a client running on Amazon. 
For good or ill, we are seeing more and more use of Postgres instances running in the cloud - whether managed by the client or on a managed service such as Heroku.</description><guid isPermaLink="false">tag:blogger.com,1999:blog-2356137376934964551.post-6591408335906445446</guid><pubDate>Tue, 14 May 2013 16:17:12 GMT</pubDate></item><item><title>Jignesh Shah: How can PostgreSQL 9.3 beta1 help you?</title><link>http://jkshah.blogspot.com/2013/05/how-can-postgresql-93-beta1-help-you.html</link><description>&lt;div dir="ltr"&gt;&lt;a href="http://www.postgresql.org/about/news/1463/" target="_blank"&gt;PostgreSQL 9.3 beta1&lt;/a&gt; is now available. Giving early access to software is always a good idea to test out evolutionary, revolutionary, radical ideas because unless it is field tested, it has not gone through its trial&amp;nbsp;by fire to be proven gold.&lt;br /&gt;&lt;br /&gt;There are many new changes introduced&amp;nbsp;in PostgreSQL 9.3 beta1 and I do have few favorites in them. &lt;br /&gt;&lt;br /&gt;For example Disk page checksums to detect filesystem failures. In fact this would allow VMware to use the now standard disk page checksum instead of a custom feature. This highly debated feature is required to identify silent bit corruptions (or deter malicious ones).&amp;nbsp;I have been told in talks with database administrators (not just PostgreSQL DBAs) that typically in a year they would face one such incident atleast where one of the disk would show such a bit rot which goes unnoticed without any instrumentations to catch it.&lt;br /&gt;&lt;br /&gt;Another change that goes in the right direction is how PostgreSQL maps the shared memory. This small change now allows no kernel changes to be done to start the database with a bigger shared buffer pool. This now allows one less cookbook step to be done to get the database working. 
Considering that in this cloud world where there are 100,000 VMs running databases one less step is a huge increase in productivity&amp;nbsp;since this step actually required privileges higher than the database instance owner.&lt;br /&gt;&lt;br /&gt;Yet another favorite feature is the custom background workers. This new mechanism is certainly a popular one in our team at VMware where are using it heavily to move some of the changes that we had done into custom background workers deployed as extensions and allowed us to align with core PostgreSQL and extra features enabled as extensions as needed.&lt;br /&gt;&lt;br /&gt;Next I want to talk of three features : Writeable Foreign Tables and pgsql Foreign Data Wrapper and Automatically update&amp;nbsp;VIEWs&amp;nbsp;together. These features on its own itself are very useful and generic. However when used together it actually opens new possibilities using multiple federated PostgreSQL databases shards with single logical view of the whole database as one. Quite Powerful if you think about it. I hope to see people trying these fundamental features into new derived features now made possible.&lt;br /&gt;&lt;br /&gt;Also new JSON functions help PostgreSQL on its evolution to be the Data Platform not just for relational data but also document data. 
&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;While I have barely scratched the surface of&lt;a href="http://wiki.postgresql.org/wiki/What%27s_new_in_PostgreSQL_9.3#Updatable_Views" target="_blank"&gt; all the new features&lt;/a&gt; in PostgreSQL 9.3 beta1, I am already excited with this release and the possibilities I see going forward&amp;nbsp; in the world of data.&lt;br /&gt;&lt;/div&gt;</description><guid isPermaLink="true">tag:blogger.com,1999:blog-17085626.post-563666955225192461</guid><pubDate>Tue, 14 May 2013 07:33:41 GMT</pubDate></item><item><title>Bruce Momjian: PgLife Averages Thirty Active Users</title><link>http://momjian.us/main/blogs/pgblog/2013.html#May_13_2013</link><description>&lt;p&gt;A week after the &lt;a class="txt2html" href="http://momjian.us/main/blogs/pgblog/2013.html#May_6_2013"&gt;release&lt;/a&gt; of &lt;a class="txt2html" href="http://pglife.momjian.us"&gt;PgLife&lt;/a&gt;, the site
is averaging thirty active users.  (I define an active user as an IP address that has viewed the site for at least five minutes
during the past hour.)  I consider that a success.  Since the release of PgLife, I have increased the content update interval and added an
&lt;a class="txt2html" href="http://pglife.momjian.us/about.html"&gt;&lt;em&gt;About&lt;/em&gt;&lt;/a&gt; page explaining the site's purpose, which also includes the active user count.
&lt;/p&gt;
&lt;p&gt;The site uses &lt;a class="txt2html" href="https://en.wikipedia.org/wiki/Ajax_%28programming%29"&gt;AJAX&lt;/a&gt;, &lt;a class="txt2html" href="http://www.perl.org/"&gt;Perl&lt;/a&gt;,
&lt;a class="txt2html" href="http://www.procmail.org/"&gt;Procmail rules&lt;/a&gt;, and &lt;a class="txt2html" href="http://httpd.apache.org/"&gt;Apache&lt;/a&gt; to collect and deliver dynamic content.  Recent
improvements in the Postgres &lt;a class="txt2html" href="http://www.postgresql.org/list/"&gt;mailing list archive&lt;/a&gt;
&lt;a class="txt2html" href="http://www.postgresql.org/message-id/CABUevEwmR5+9fc1i1cFVpc1S8XShHwvKopcaKKfq-Txa+_mq_g@mail.gmail.com"&gt;feature set&lt;/a&gt; have made
linking to emails much simpler.
&lt;/p&gt;
&lt;p&gt;PgLife was my first attempt at a dynamic website, and I learned a few things.  First, I learned the value of having an
&lt;a class="txt2html" href="http://pglife.momjian.us/data/alert.html"&gt;alert&lt;/a&gt; file that can force a browser reload to push fixes and improvements to the browser. 
Second, I used the same file to allow pushing of news alerts to users, e.g. &lt;a class="txt2html" href="http://www.postgresql.org/about/news/1463/"&gt;9.3 Beta1&lt;/a&gt;. 
Third, I learned the importance of controlling browser and server caching and revalidation when using dynamic content.
&lt;/p&gt;
&lt;p&gt;&lt;a href="http://momjian.us/main/blogs/pgblog/2013.html#May_13_2013"&gt;Continue Reading &amp;raquo;&lt;/a&gt;&lt;/p&gt;</description><guid isPermaLink="false">http://momjian.us/main/blogs/pgblog/2013.html#May_13_2013</guid><pubDate>Mon, 13 May 2013 19:15:01 GMT</pubDate></item><item><title>Leo Hsu and Regina Obe: PostgreSQL 9.3 extension treats for windows users: plV8</title><link>http://www.postgresonline.com/journal/archives/305-PostgreSQL-9.3-extension-treats-for-windows-users-plV8.html</link><description>&lt;p&gt;Now that PostgreSQL 9.3 beta1 has been released we've started to jump start our experimentation by compiling our favorite extensions.  First on the list is PL/V8 js.&lt;/p&gt;

&lt;p&gt;These were compiled against PostgreSQL 9.3beta1 (64-bit and 32-bit) using plv8 version 1.4.0.  We briefly tried them with the EDB Windows builds, which we downloaded from &lt;a href="http://www.enterprisedb.com/products-services-training/pgbindownload" target="_blank"&gt;http://www.enterprisedb.com/products-services-training/pgbindownload&lt;/a&gt;, and they seem to work fine.&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;&lt;a href="http://www.postgresonline.com/downloads/pg93plv8jsbin_w32.zip" target="_blank"&gt;PostgreSQL 9.3 plv8 32-bit download&lt;/a&gt;&lt;/li&gt; 
&lt;li&gt;&lt;a href="http://www.postgresonline.com/downloads/pg93plv8jsbin_w64.zip" target="_blank"&gt;PostgreSQL 9.3 plv8 64-bit download&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We hope windows users find these useful.&lt;/p&gt;</description><guid isPermaLink="false">http://www.postgresonline.com/journal/archives/305-guid.html</guid><pubDate>Mon, 13 May 2013 19:09:00 GMT</pubDate></item><item><title>Joshua Tolley: Foreign Data Wrappers</title><link>http://blog.endpoint.com/2013/05/foreign-data-wrappers.html</link><description>&lt;div&gt;&lt;p&gt;&lt;a href="http://3.bp.blogspot.com/-WHX5jwAPSIc/UY0qtnHguNI/AAAAAAAAAlk/7Jk9lZzvbwI/s1600/g3306.png"&gt;&lt;img border="0" src="http://3.bp.blogspot.com/-WHX5jwAPSIc/UY0qtnHguNI/AAAAAAAAAlk/7Jk9lZzvbwI/s320/g3306.png" /&gt;&lt;/a&gt;&lt;/p&gt;&lt;p align="center"&gt;&lt;small&gt;Original images from Flickr user &lt;a href="http://www.flickr.com/photos/jenniferwilliams/"&gt;jenniferwilliams&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;&lt;/div&gt;&lt;p&gt;One of our clients, for various historical reasons, runs both MySQL and PostgreSQL to support their website.  Information for user login lives in one database, but their customer activity lives in the other. The eventual plan is to consolidate these databases, but thus far, other concerns have been more pressing. So when they needed a report combining user account information and customer activity, the involvement of two separate databases became a significant complicating factor.&lt;/p&gt;&lt;p&gt;In similar situations in the past, using earlier versions of PostgreSQL, we've written scripts to pull data from MySQL and dump it into PostgreSQL. This works well enough, but we've updated PostgreSQL fairly recently, and can use the SQL/MED features added in version 9.1. &lt;a href="http://wiki.postgresql.org/wiki/Foreign_data_wrappers"&gt;SQL/MED&lt;/a&gt; ("MED" stands for "Management of External Data") is a decade-old standard designed to allow databases to make external data sources, such as text files, web services, and even other databases look like normal database tables, and access them with the usual SQL commands. 
PostgreSQL has supported some of the SQL/MED standard since version 9.1, with a feature called Foreign Data Wrappers, and among other things, it means we can now access MySQL through PostgreSQL seamlessly.&lt;/p&gt;&lt;p&gt;The first step is to install the right software, called mysql_fdw. It comes to us via Dave Page, PostgreSQL core team member and contributor to many projects.  It's worth noting Dave's warning that he considers this experimental code. For our purposes it works fine, but as will be seen in this post, we didn't push it too hard. We opted to &lt;a href="https://github.com/dpage/mysql_fdw"&gt;download the source&lt;/a&gt; and build it, but installing using pgxn works as well:&lt;/p&gt;&lt;pre class="brush:plain"&gt;$ env USE_PGXS=1 pgxnclient install mysql_fdw
INFO: best version: mysql_fdw 1.0.1
INFO: saving /tmp/tmpjrznTj/mysql_fdw-1.0.1.zip
INFO: unpacking: /tmp/tmpjrznTj/mysql_fdw-1.0.1.zip
INFO: building extension
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -fpic -I/usr/include/mysql -I. -I. -I/home/josh/devel/pg91/include/postgresql/server -I/home/josh/devel/pg91/include/postgresql/internal -D_GNU_SOURCE -I/usr/include/libxml2   -c -o mysql_fdw.o mysql_fdw.c
mysql_fdw.c: In function ‘mysqlPlanForeignScan’:
mysql_fdw.c:466:8: warning: ‘rows’ may be used uninitialized in this function [-Wmaybe-uninitialized]
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -fpic -shared -o mysql_fdw.so mysql_fdw.o -L/home/josh/devel/pg91/lib -L/usr/lib  -Wl,--as-needed -Wl,-rpath,'/home/josh/devel/pg91/lib',--enable-new-dtags  -L/usr/lib/x86_64-linux-gnu -lmysqlclient -lpthread -lz -lm -lrt -ldl
INFO: installing extension
&amp;lt; ... snip ... &amp;gt;
&lt;/pre&gt;&lt;p&gt;Here I'll refer to the documentation provided in &lt;a href="https://github.com/dpage/mysql_fdw/blob/master/README"&gt;mysql_fdw's README&lt;/a&gt;. The first step in using a foreign data wrapper, once the software is installed, is to create the foreign server, and the user mapping. The foreign server tells PostgreSQL how to connect to MySQL, and the user mapping covers what credentials to use. This is an interesting detail; it means the foreign data wrapper system can authenticate with external data sources in different ways depending on the PostgreSQL user involved. You'll note the pattern in creating these objects: each simply takes a series of options that can mean whatever the FDW needs them to mean. This allows the flexibility to support all sorts of different data sources with one interface.&lt;/p&gt;&lt;p&gt;The final step in setting things up is to create a foreign table. In MySQL's case, this is sort of like a view, in that it creates a PostgreSQL table from the results of a MySQL query. For our purposes, we needed access to several thousand structurally identical MySQL tables (I mentioned the goal is to move off of this one day, right?), so I automated the creation of each table with a simple bash script, which I piped into psql:&lt;/p&gt;&lt;pre class="brush:plain"&gt;for i in `cat mysql_tables`; do
    echo "CREATE FOREIGN TABLE mysql_schema.$i ( ... table definition ...)
        SERVER mysql_server OPTIONS (
            database 'mysqldb',
            query 'SELECT ... some fields ... FROM $i'
        );"
done
&lt;/pre&gt;&lt;p&gt;In a step not shown above, this script also consolidates the data from each table into one, native PostgreSQL table, to simplify later reporting. In our case, pulling the data once and reporting on the results is perfectly acceptable; in other words, data a few seconds old wasn't a concern. We also didn't need to write back to MySQL, which presumably could complicate things somewhat. We did, however, run into the same data validation problems PostgreSQL users habitually complain about when working with MySQL. Here's an example, in my own test database:&lt;/p&gt;&lt;pre class="brush:plain"&gt;mysql&gt; create table bad_dates (mydate date);
Query OK, 0 rows affected (0.07 sec)

mysql&gt; insert into bad_dates values ('2013-02-30'), ('0000-00-00');
Query OK, 2 rows affected (0.02 sec)
Records: 2  Duplicates: 0  Warnings: 0
&lt;/pre&gt;&lt;p&gt;Note that MySQL silently transformed '2013-02-30' into '0000-00-00'. Sigh. Then, in psql we do this:&lt;/p&gt;&lt;pre class="brush:plain"&gt;josh=# create extension mysql_fdw;
CREATE EXTENSION

josh=# create server mysql_svr foreign data wrapper mysql_fdw options (address '127.0.0.1', port '3306');
CREATE SERVER

josh=# create user mapping for public server mysql_svr options (username 'josh', password '');
CREATE USER MAPPING

josh=# create foreign table bad_dates (mydate date) server mysql_svr options (query 'select * from test.bad_dates');
CREATE FOREIGN TABLE

josh=# select * from bad_dates ;
ERROR:  date/time field value out of range: "0000-00-00"
&lt;/pre&gt;&lt;p&gt;We've told PostgreSQL we'll be feeding it valid dates, but MySQL's idea of a valid date differs from PostgreSQL's, and the latter complains when the dates don't meet its stricter requirements. Several different workarounds exist, including admitting that '0000-00-00' really is wrong and cleaning up MySQL, but in this case, we modified the query underlying the foreign table to fix the dates on the fly:&lt;/p&gt;&lt;pre class="brush:sql"&gt;SELECT CASE disabled WHEN '0000-00-00' THEN NULL ELSE disabled END,
    -- various other fields
    FROM some_table
&lt;/pre&gt;&lt;p&gt;Fortunately this is the only bit of MySQL / PostgreSQL impedance mismatch that has tripped us up thus far; we'd have to deal with any others we found individually, just as we did this one.&lt;/p&gt;</description><guid isPermaLink="true">tag:blogger.com,1999:blog-7997313029981170997.post-8708861390602568574</guid><pubDate>Mon, 13 May 2013 14:58:23 GMT</pubDate></item><item><title>Joel Jacobson: ::xml madness</title><link>http://joelonsql.com/2013/05/13/xml-madness/</link><description>&lt;p&gt;Nobody likes XML, except masochists and Microsoft consultants who charge by the hour.&lt;br /&gt;
Sometimes you are forced to deal with XML anyway, such as parsing the response from external APIs.&lt;br /&gt;
Parsing XML is per se a nasty business to begin with, but in this particular example, the ugliness set new records.&lt;/p&gt;
&lt;p&gt;The only &amp;#8220;solution&amp;#8221; I came up with is too ugly for production use, but the alternatives were even uglier, so I had no option.&lt;/p&gt;
&lt;p&gt;I hope there is someone out there reading this who can present a proper solution to the problem.&lt;/p&gt;
&lt;p&gt;Let&amp;#8217;s say you have this XML:&lt;/p&gt;
&lt;pre class="brush: xml; title: ; notranslate"&gt;
&amp;lt;testxml xmlns:ns1=&amp;quot;http://www.example.com&amp;quot; xmlns:ns2=&amp;quot;http://www.example2.com&amp;quot;&amp;gt;
    &amp;lt;ns1:foo&amp;gt;
        &amp;lt;ns2:bar&amp;gt;baz&amp;lt;/ns2:bar&amp;gt;
    &amp;lt;/ns1:foo&amp;gt;
&amp;lt;/testxml&amp;gt;
&lt;/pre&gt;
&lt;p&gt;Using &lt;code&gt;xpath()&lt;/code&gt; you extract the content of &lt;code&gt;testxml&lt;/code&gt;:&lt;/p&gt;
&lt;pre class="brush: sql; title: ; notranslate"&gt;
SELECT xpath(
    '/testxml/ns1:foo',
    '&amp;lt;testxml xmlns:ns1=&amp;quot;http://www.example.com&amp;quot; xmlns:ns2=&amp;quot;http://www.example2.com&amp;quot;&amp;gt;&amp;lt;ns1:foo&amp;gt;&amp;lt;ns2:bar&amp;gt;baz&amp;lt;/ns2:bar&amp;gt;&amp;lt;/ns1:foo&amp;gt;&amp;lt;/testxml&amp;gt;'::xml,
    ARRAY[
        ['ns1','http://www.example.com'],
        ['ns2','http://www.example2.com']
    ]
);
-- Result: &amp;lt;ns1:foo&amp;gt;&amp;lt;ns2:bar&amp;gt;baz&amp;lt;/ns2:bar&amp;gt;&amp;lt;/ns1:foo&amp;gt;
&lt;/pre&gt;
&lt;p&gt;The returned XML is not valid since it&amp;#8217;s missing the xmlns definitions,&lt;br /&gt;
but the PostgreSQL XML data type doesn&amp;#8217;t complain, which is OK I guess,&lt;br /&gt;
a bit of quirks mode perhaps?&lt;/p&gt;
&lt;p&gt;Because of the missing xmlns, it&amp;#8217;s impossible to make use of the returned XML fragment.&lt;br /&gt;
You cannot extract any subsequent sub-elements in it using XPath.&lt;/p&gt;
&lt;p&gt;For instance, this won&amp;#8217;t work:&lt;/p&gt;
&lt;pre class="brush: sql; title: ; notranslate"&gt;
SELECT xpath(
    '/ns1:foo/ns2:bar/text()',
    (xpath(
        '/testxml/ns1:foo',
        '&amp;lt;testxml xmlns:ns1=&amp;quot;http://www.example.com&amp;quot; xmlns:ns2=&amp;quot;http://www.example2.com&amp;quot;&amp;gt;&amp;lt;ns1:foo&amp;gt;&amp;lt;ns2:bar&amp;gt;baz&amp;lt;/ns2:bar&amp;gt;&amp;lt;/ns1:foo&amp;gt;&amp;lt;/testxml&amp;gt;'::xml,
        ARRAY[
            ['ns1','http://www.example.com'],
            ['ns2','http://www.example2.com']
        ]
    ))[1]
);

-- Error:
-- ERROR:  could not create XPath object
-- DETAIL:  namespace error : Namespace prefix ns1 on foo is not defined
-- &amp;lt;ns1:foo&amp;gt;
--         ^
-- namespace error : Namespace prefix ns2 on bar is not defined
--   &amp;lt;ns2:bar&amp;gt;baz&amp;lt;/ns2:bar&amp;gt;
--           ^
-- Undefined namespace prefix
-- xmlXPathCompiledEval: evaluation failed
&lt;/pre&gt;
&lt;p&gt;Even if you also pass the namespace array to the outer xpath() call, you get an empty result rather than &amp;#8216;baz&amp;#8217;.&lt;/p&gt;
&lt;pre class="brush: sql; title: ; notranslate"&gt;
SELECT xpath(
    '/ns1:foo/ns2:bar/text()',
    (xpath(
        '/testxml/ns1:foo',
        '&amp;lt;testxml xmlns:ns1=&amp;quot;http://www.example.com&amp;quot; xmlns:ns2=&amp;quot;http://www.example2.com&amp;quot;&amp;gt;&amp;lt;ns1:foo&amp;gt;&amp;lt;ns2:bar&amp;gt;baz&amp;lt;/ns2:bar&amp;gt;&amp;lt;/ns1:foo&amp;gt;&amp;lt;/testxml&amp;gt;'::xml,
        ARRAY[
            ['ns1','http://www.example.com'],
            ['ns2','http://www.example2.com']
        ]
    ))[1],
    ARRAY[
        ['ns1','http://www.example.com'],
        ['ns2','http://www.example2.com']
    ]
);

-- Returns:
--  xpath 
-- -------
--  {}
-- (1 row)
-- And NOT 'baz' which is what we want.
&lt;/pre&gt;
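&lt;p&gt;A minimal sketch of one way to sidestep this, assuming the original document (with its xmlns declarations) is still at hand rather than only the extracted fragment: do the whole traversal in a single &lt;code&gt;xpath()&lt;/code&gt; call, so the prefixes are never separated from their declarations.&lt;/p&gt;
&lt;pre class="brush: sql; title: ; notranslate"&gt;
-- Sketch: one xpath() call over the full document instead of two nested calls.
-- Same sample document and namespace array as in the examples above.
SELECT xpath(
    '/testxml/ns1:foo/ns2:bar/text()',
    '&amp;lt;testxml xmlns:ns1=&amp;quot;http://www.example.com&amp;quot; xmlns:ns2=&amp;quot;http://www.example2.com&amp;quot;&amp;gt;&amp;lt;ns1:foo&amp;gt;&amp;lt;ns2:bar&amp;gt;baz&amp;lt;/ns2:bar&amp;gt;&amp;lt;/ns1:foo&amp;gt;&amp;lt;/testxml&amp;gt;'::xml,
    ARRAY[
        ['ns1','http://www.example.com'],
        ['ns2','http://www.example2.com']
    ]
);

-- Returns:
--  xpath
-- -------
--  {baz}
-- (1 row)
&lt;/pre&gt;
&lt;p&gt;This does not help when only the fragment survives, which is the harder case.&lt;/p&gt;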
&lt;p&gt;Therefore, given an XML &amp;#8220;fragment&amp;#8221; where the tags have ns:___ prefixes but the xmlns:___=&amp;quot;URI&amp;quot; declarations are missing,&lt;br /&gt;
is there any way to extract sub-elements, given you know the namespaces?&lt;/p&gt;
&lt;p&gt;For instance, by wrapping the XML &amp;#8220;fragment&amp;#8221; inside an outer XML tag,&lt;br /&gt;
and specifying bogus xmlns attributes for the namespaces,&lt;br /&gt;
I managed to hack together a work-around, probably with hundreds of flaws,&lt;br /&gt;
and this cannot possibly be the best way to approach this problem.&lt;/p&gt;
&lt;pre class="brush: sql; title: ; notranslate"&gt;
SELECT xpath_fragment('/ns1:foo/ns2:bar/text()','&amp;lt;ns1:foo&amp;gt;&amp;lt;ns2:bar&amp;gt;baz&amp;lt;/ns2:bar&amp;gt;&amp;lt;/ns1:foo&amp;gt;',ARRAY['ns1','ns2']);

 xpath_fragment 
----------------
 {baz}
(1 row)
&lt;/pre&gt;
&lt;p&gt;Source code of the insanely ugly &amp;#8220;solution&amp;#8221; to the &amp;#8220;problem&amp;#8221;:&lt;/p&gt;
&lt;pre class="brush: sql; title: ; notranslate"&gt;
CREATE OR REPLACE FUNCTION xpath_fragment(_XPath text, _XML xml, _NSNames text[]) RETURNS XML[] AS $BODY$
DECLARE
_ text;
_WrappedXML xml;
_NSArray text[][] := ARRAY[]::text[][];
BEGIN

SELECT ('&amp;lt;xml ' || array_to_string(array_agg('xmlns:' || unnest || '=&amp;quot; &amp;quot;'),' ') || '&amp;gt;' || _XML::text || '&amp;lt;/xml&amp;gt;')::text INTO _WrappedXML FROM unnest(_NSNames);

FOR _ IN SELECT unnest(_NSNames)
LOOP
    _NSArray := _NSArray || ARRAY[[_, ' ']];
END LOOP;

RETURN xpath('/xml' || _XPath, _WrappedXML, _NSArray);
END;
$BODY$ LANGUAGE plpgsql IMMUTABLE;
&lt;/pre&gt;
&lt;p&gt;Ideas, anyone?&lt;/p&gt;
&lt;br /&gt;  &lt;a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/joelonsql.wordpress.com/165/"&gt;&lt;img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/joelonsql.wordpress.com/165/" /&gt;&lt;/a&gt; &lt;img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=joelonsql.com&amp;blog=36282610&amp;post=165&amp;subd=joelonsql&amp;ref=&amp;feed=1" width="1" height="1" /&gt;</description><guid isPermaLink="false">http://joelonsql.com/?p=165</guid><pubDate>Mon, 13 May 2013 13:53:45 GMT</pubDate></item><item><title>Leo Hsu and Regina Obe: PostGIS 2.1.0 beta2 is out and windows binaries available</title><link>http://www.postgresonline.com/journal/archives/304-PostGIS-2.1.0-beta2-is-out-and-windows-binaries-available.html</link><description>&lt;p&gt;PostGIS 2.1.0 beta2 is out. Details on what's new in it are in official news release: &lt;a href="http://postgis.net/2013/05/11/postgis-2-1-0beta2" target="_blank"&gt;http://postgis.net/2013/05/11/postgis-2-1-0beta2&lt;/a&gt;.
 This is the first version of PostGIS to work with PostgreSQL 9.3, so if you are planning to experiment with the upcoming PostgreSQL 9.3, use this one. Also check out the documentation in the new ePub format if you have an e-reader, and let us know how it looks. It seems to vary a lot depending on which ePub reader is used.&lt;/p&gt;

&lt;p&gt;For Windows users, we've got binary builds available, compiled against PostgreSQL 9.3beta1, and also available for 9.2 (x32, x64) and 9.0, 9.1 (x64). Details are on the Windows PostGIS downloads page: &lt;a href="http://postgis.net/windows_downloads" target="_blank"&gt;http://postgis.net/windows_downloads&lt;/a&gt;.  The builds do not yet include the new Advanced 3D offering (provided by SFCGAL &lt;a href="https://github.com/Oslandia/SFCGAL" target="_blank"&gt;https://github.com/Oslandia/SFCGAL&lt;/a&gt;), but we hope to have that compiled and packaged with the binaries before release time.&lt;/p&gt;</description><guid isPermaLink="false">http://www.postgresonline.com/journal/archives/304-guid.html</guid><pubDate>Sun, 12 May 2013 22:38:00 GMT</pubDate></item><item><title>Andrew Dunstan: Only GROUP BY what you really need to</title><link>http://adpgtech.blogspot.com/2013/05/only-group-by-what-you-really-need-to.html</link><description>The old rule used to be that if you had a query that contained aggregated columns, you had to GROUP BY every other column selected. These days you are allowed to omit columns that are provably functionally dependent on one or more of the other grouped by columns. In practice, that means you can omit any columns that are not in the table's primary key if all the primary key columns are grouped by.&lt;br /&gt;&lt;br /&gt;Sometimes people, often including me, just add every selected column to the GROUP BY fairly mindlessly, but sometimes it bites you. Consider this simple query:&lt;br /&gt;&lt;blockquote&gt;&lt;pre&gt;SELECT a.id as a_id, a.properties, sum(b.amount) as expenses&lt;br /&gt;FROM people a&lt;br /&gt;   JOIN expenses b on a.id = b.person_id&lt;br /&gt;GROUP BY a.id, a.properties&lt;br /&gt;&lt;/pre&gt;&lt;/blockquote&gt;We don't really want the expenses grouped by the person's properties. We just put that in because the parser complains if we don't. 
And if &lt;span&gt;people&lt;/span&gt; turns out to be a view which joins a couple of tables, we probably can't leave it out either. This can increase the amount of sorting that the GROUP BY requires, which can sometimes have dramatic effects on performance. But even worse, there are cases where this can actually cause the query to be unrunnable. One such case is if &lt;span&gt;properties&lt;/span&gt; is a JSON column.&lt;br /&gt;&lt;br /&gt;That might surprise you. It has certainly surprised a couple of people I know. The reason is that there is no equality operator for JSON, and GROUP BY needs one to decide which rows belong in the same group.&lt;br /&gt;&lt;br /&gt;So, how can we write this so we only GROUP BY what we really need to? One way is to pick up the extra column later in the query, after we have done the grouping, like this: &lt;br /&gt;&lt;blockquote&gt;&lt;pre&gt;WITH exp as &lt;br /&gt;(&lt;br /&gt;  SELECT a.id as a_id, sum(b.amount) as expenses&lt;br /&gt;  FROM people a&lt;br /&gt;     JOIN expenses b on a.id = b.person_id&lt;br /&gt;  GROUP BY a.id&lt;br /&gt;)&lt;br /&gt;SELECT exp.*, p.properties&lt;br /&gt;FROM exp &lt;br /&gt;   JOIN people p ON p.id = exp.a_id&lt;br /&gt;&lt;/pre&gt;&lt;/blockquote&gt;This might look a bit silly. We're adding in an extra join to &lt;span&gt;people&lt;/span&gt; that we shouldn't need. But in my experience this often works pretty well: the cost of the extra join is often offset by the fact that you're simplifying the GROUP BY, and that it is processing smaller rows, uncluttered by the extra columns you want carried through. And, in the case of a JSON column, it has the virtue that it will work.&lt;br /&gt;&lt;br /&gt;I often get called in to look at queries that run slowly and have huge GROUP BY clauses (I have seen them with 50 or so columns). 
I almost always start by reducing the GROUP BY to the smallest set possible, and this almost always results in a performance gain.</description><guid isPermaLink="false">tag:blogger.com,1999:blog-2356137376934964551.post-3655285427325025767</guid><pubDate>Fri, 10 May 2013 17:31:08 GMT</pubDate></item></channel></rss>