PostgreSQL
The world's most advanced open source database
So, I started this looking for our five major goals for future PostgreSQL development.  The last goal is more nebulous, but I think it is as important as the other goals.  It's this: improve the "new user experience".

This is not a new goal, in some ways.  Improving installation, one of our previous 5 goals, was really about improving the experience for new users.  But the new user experience goes beyond installation now, and competition has "raised the bar".  That is, we matched MySQL, but now that's not good enough; we need to match the new databases.   It should be as easy to get started on a dev database with PostgreSQL as it is with, for example, Redis.  Let me give you a summary of the steps to get up, running, and developing an application in the two platforms:

Redis:
  1. install Redis, either from packages or multiplatform binaries.  No root access is required for the binaries.
  2. read a 1-page tutorial
  3. run redis-server
  4. run redis-cli or install drivers for your programming language
  5. start developing
  6. when your app works, deploy to production
  7. in production, tune how much RAM Redis gets.
PostgreSQL:
  1. install PostgreSQL from packages or the one-click installer.  Root/Admin access is usually required.
  2. search the documentation to figure out how to get started. 
  3. figure out whether or not your packages automatically start Postgres.  If not, figure out how to start it.  This may require root access.
  4. Install drivers for your programming language.
  5. Figure out how to connect to PostgreSQL.  This may require making changes to configuration files.
  6. Read more pages of documentation to learn the basics of PostgreSQL's variety of SQL, or how to program an ORM which works with PostgreSQL.
  7. Start developing.
  8. Deploy to production.
  9. Read 20 pages of documentation, plus numerous blogs, wiki pages and online presentations in order to figure out how to tune PostgreSQL.
  10. Tune PostgreSQL for production workload.  Be unsure if you've done it right.
The unfortunate reality is that a new user will hit a lot of points in the "getting to know

[continue reading]

Posted by Bruce Momjian in EnterpriseDB on 2013-05-21 at 02:00:01

I did a video interview while I was at ConFoo, and it is now online.

Parallel query is the first priority from those suggested in the comments that I agree should be a major PostgreSQL development priority.  I think that Joel Jacobson summarized it neatly: Bring Back Moore's Law.  Vertical scaling has always been one of PostgreSQL's strengths, but we're running into hard limits as servers are getting more cores but not faster cores.  We need to be able to use a server's full CPU capacity.

(note: this series of articles is my personal opinion as a PostgreSQL core team member)

The benefits of having some kind of parallel query are obvious to most users and developers today.  Mostly, people tend to think of analytics and parallel query across terabyte-sized tables, and that's definitely one of the reasons we need parallel query.  But possibly a stronger reason, which isn't much talked about, is CPU-heavy extensions -- chief among them, PostGIS.  All of those spatial queries are very processor-heavy; a location search takes a lot of math, a spatial JOIN more so.  While most users of large databases would like parallel query in order to do things a bit faster, PostGIS users need parallelism yesterday.

Fortunately, work on parallelism has already started.  Even more fortunately, parallel query isn't a single monumental thing which has to be done as one big chunk; we can add parallelism piecemeal over the next few versions of Postgres.  Roughly, parallel query breaks down into parallelizing all of the following operations:

  • Table scan
  • Index scan
  • Bitmap scan
  • In-memory sort
  • On-disk sort
  • Hashing
  • Merge Join
  • Nested loop join
  • Aggregation
  • Framework for parallel functions

Most of these features can be worked on independently, in any order -- dare I say, developed in parallel?  Joins probably need to be done after sorts and scans, but that's pretty much it.  Noah Misch has chosen to start with parallel in-memory sort, so you can probably expect that for version 9.4.
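To make one item from the list above concrete, here is a minimal sketch of the general shape of a parallel in-memory sort: split the input, sort the pieces in separate workers, then merge the sorted runs. This is only an illustration in Python, not how PostgreSQL does or will implement it; the function name and the default worker count are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from heapq import merge


def parallel_sort(values, workers=4):
    """Sort `values` by sorting chunks concurrently, then merging them."""
    values = list(values)
    if not values:
        return []
    # Split the input into roughly equal contiguous chunks, one per worker.
    chunk_size = -(-len(values) // workers)  # ceiling division
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    # Sort each chunk in its own worker.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sorted_chunks = list(pool.map(sorted, chunks))
    # A k-way merge combines the sorted runs into one sorted result.
    return list(merge(*sorted_chunks))
```

Standard CPython threads share one interpreter lock, so this shows the structure of the technique rather than a real speedup. It also shows why the pieces can be built independently: the per-chunk sort and the final merge do not need to know about each other.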

There’s a fantastic set of blog posts about distributed databases and network partitioning, starting with this post explaining the perils of trying to “communicate with someone who doesn’t know you’re alive.”

The next post is about Postgres and 2-phase commit. And there are four additional posts in the series.

The whole series is worth reading for anyone interested in data stores, consistency and Postgres! :)

In this, hopefully 2nd to last, post in the series, I will cover the rest of the operations that you will commonly see in your explain outputs. Unique: the name seems to be clear about what's going on here – it removes duplicate data. This might happen, for example, when you're doing: SELECT DISTINCT FIELD FROM [...]

In our previous article we went through describing what retention policies are and how they can be enforced on your PostgreSQL server backups with Barman 1.2. In this post, we will go through the configuration aspects.

For the sake of simplicity, we assume a typical scenario which involves taking full backups once a week through the “barman backup” command. Suppose you want to keep the latest 4 backups and let Barman automatically delete the old (obsolete) ones.

The main configuration option for retention policies in Barman is “retention_policy”, which can be defined at either the global or the server level. If you want all your servers to keep the last 4 periodical backups by default, you need to add the following line to the general section of Barman’s configuration file:

[barman]
... // General settings
retention_policy: REDUNDANCY 4

When the next “barman cron” command is executed (every minute if you installed Barman using RPMs or Debian/Ubuntu packages), Barman checks the number of available full periodical backups for every server, orders them in descending chronological order (from the most recent to the oldest one) and deletes backups from the 5th position onwards.
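The selection rule described above can be sketched in a few lines of Python. This is an illustration of the logic only, not Barman's code; the function name, the dictionary layout and the backup ids are assumptions made for the example.

```python
from datetime import datetime


def obsolete_backups(backups, redundancy):
    """Return the backup ids a REDUNDANCY policy would treat as obsolete.

    `backups` maps a backup id to the time the backup was taken.
    """
    # Order from the most recent to the oldest, as described above.
    ordered = sorted(backups, key=backups.get, reverse=True)
    # Everything after the first `redundancy` entries is obsolete.
    return ordered[redundancy:]


# Hypothetical weekly backups; the ids are made up for illustration.
weekly = {
    "week1": datetime(2013, 4, 7),
    "week2": datetime(2013, 4, 14),
    "week3": datetime(2013, 4, 21),
    "week4": datetime(2013, 4, 28),
    "week5": datetime(2013, 5, 5),
}
print(obsolete_backups(weekly, 4))
```

With five weekly backups and a redundancy of 4, only the oldest backup falls into the "5th position onwards" range.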

In case you have several servers backed up on the same Barman host and you want to differentiate the retention policy for a specific server, you can simply edit that server's configuration section (or file, see “Managing the backup of several PostgreSQL servers with Barman”) and define a different setting:

[malcolm]
description = Malcolm Rocks
ssh_command = ssh malcolm
conninfo = host=malcolm port=5432 user=postgres dbname=postgres
retention_policy: REDUNDANCY 8

However, Barman also allows system administrators to manage retention policies based on time, in terms of recovery window and point of recoverability. For example, you can set another server to allow recovery to any point in time in the last 3 months:

[angus]
description = Angus Rocks
ssh_command = ssh angus
conninfo = host=angus port=5432 user=postgres dbname=postgres
retention

[continue reading]

Really, when you look at the long-term viability of any platform, pluggability is where it's at.  A lot of the success of PostgreSQL to date has been built on extensions and portability, just as the success of AWS has been built on their comprehensive API.   Our future success will be built on taking pluggability even further. 

In addition to pluggable storage, a second thing we really need is a pluggable parser interface.  That is, it should be possible to generate a statement structure, in binary form, and hand that off to libpq for execution.  There was recently some discussion about this on -hackers.

If there were a way to hand off expression trees directly to the planner, then this would allow creating extensions which actually had additional syntax, without having to fork PostgreSQL.  This would support most of those "compatibility" extensions, as well as potentially allowing extensions like SKYLINE OF which change SQL behavior.

It would also help support PostgreSQL-based clustered databases, by allowing all of the parsing for a particular client to happen on a remote node and get passed to the clustered backends.  The pgPool2 project has asked for this for several years for that reason.

More intriguingly, it would allow for potentially creating an "ORM" which doesn't have to serialize everything to SQL, but can instead build expression trees directly based on client code.  This would both improve response times, and encourage developers to use a lot of PostgreSQL's more sophisticated features since they could access them directly in their code.

Taking things a step further, we could extend this to allow users to hand a plan tree directly to the executor.  This would fix things for all of the users who actually need query hints (as opposed to those who think they need them), as well as taking efficiency a step beyond cached plans.

There are a lot of reasons this would be just as difficult to do as pluggable storage.  Currently parsing depends on a context-dependent knowledge of system catalogs, inc

[continue reading]

I’ve got a Beer & Tell to give about alembic. Alembic is a migration tool that works with SQLAlchemy. I’m using it for database migrations with PostgreSQL.

So, here’s what I want to say today:

The most difficult thing to deal with so far has been the many User Defined Functions that we use in Socorro. None of the migration tools I tested deals well with them.

Happy to answer questions! And I’ll see about making a longer talk about this transition soon.

Posted by Joe Abbate on 2013-05-17 at 18:48:31

Do you use PostgreSQL and truly believe it’s “the world’s most advanced open source database” and that its upcoming 9.3 release will make it even more awesome?

Do you also use Python and believe it’s “an easy to learn, powerful programming language” with “elegant syntax” that makes it an ideal language for developing applications and tools around PostgreSQL, such as Pyrseas?

Then we could use your help. For starters, we want to add support for the MATERIALIZED VIEWs and EVENT TRIGGERs coming up in PG 9.3.

We have also been requested to add the capability to load and maintain “static data” (relatively small, unchanging tables) as part of yamltodb, so that it can be integrated more easily into database version control workflows.

And for the next release, Pyrseas 0.7, we’d like to include the first version of the database augmentation tool which will support declarative implementation of business logic in the database–starting off with audit trail columns. Some work has been done on this already, but it needs integration with the current code and tests.

Or perhaps coding is not your forte, but you’re really good at explaining and documenting technical “stuff”. Then you could give us a hand with revamping the docs, maybe writing a tutorial so that users have a smooth ride using our tools.

Or maybe you have your own ideas as to how to improve the PostgreSQL version control experience. We’d love to hear those too.

If you’d like to help, you can fork the code on GitHub, join the mailing list and introduce yourself, or leave a comment below.


Filed under: PostgreSQL, Python, Version control

Backup and recovery in Postgres-XC has some parallels to PostgreSQL, but with its own wrinkles.

read more

Over the last decade, Greenplum, Vertica, Everest, Paraccel, and a number of non-public projects all forked off of PostgreSQL.  In each case, one of the major changes to the forks was to radically change data storage structures in order to enable new functionality or much better performance on large data.  In general, once a Postgres fork goes through the storage change, they stop contributing back to the main project because their codebase is then different enough to make merging very difficult.

Considering the amount of venture capital money poured into these forks, that's a big loss of feature contributions from the community.  Especially when the startup in question gets bought out by a company who buries it or loots it for IP and then kills the product.

More importantly, we have a number of people who would like to do something interesting and substantially different with PostgreSQL storage, and will likely be forced to fork PostgreSQL to get their ideas to work.  Index-organized tables, fractal trees, JSON trees, EAV-optimized storage, non-MVCC tables, column stores, hash-distributed tables and graphs all require changes to storage which can't currently be fit into the model of index classes and blobs we offer for extensibility of data storage.  Transactional RAM and Persistent RAM in the future may urge other incompatible storage changes.

As a community, we want to capture these innovations and make them part of mainstream Postgres, and their users part of the PostgreSQL community.  The only way to do this is to have some form of pluggable storage, just like we have pluggable function languages  and pluggable index types.

The direct way to do this would be to refactor our code to replace all direct manipulation of storage and data pages with a well-defined API.  This would be extremely difficult, and would produce large performance issues in the first few versions.  It would, however, also have the advantage of allowing us to completely solve the binary upgrade of page format changes issue.

A second a

[continue reading]

If anyone doubts the total commitment of the Postgres project to quality and correctness, let them be reassured by this completely correct but decidedly pedantic commit. I dips me lid to Tom Lane and Thom Brown.
PostgreSQL 9.3 Beta 1 introduces a new feature: "Disk page checksums". Thanks to authors Simon Riggs, Jeff Davis & Greg Smith.

In earlier releases, a corrupted block on disk was silently ignored until something read it or queries returned wrong results. Now, data corruption is detected up front and raises a WARNING message, instead of the corrupted block being silently used or sitting there until something hits it.

The implementation of disk page checksums is unusual: it is not pluggable like an EXTENSION, it is a selectable feature. That is, if you want your database under the monitoring umbrella for data corruption, checksums must be enabled when the cluster is initialized, not on an existing or running cluster. The example below shows how it works.

Initialize the cluster with checksums:
initdb [OPTION]... [DATADIR]
........
-k, --data-checksums use data page checksums

initdb -D data_directory -k
Now, any data corruption found will be notified as below:
postgres=# select * from corruption_test;
WARNING: page verification failed, calculated checksum 63023 but expected 48009
ERROR: invalid page in block 0 of relation base/12896/16409
In earlier versions, there was just an error message:
postgres=# select * from corruption_test where id=1;
ERROR: invalid page header in block 0 of relation base/12870/18192
That's cool, right?
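The idea behind the WARNING above is to store a checksum with every page when it is written and verify it whenever the page is read. Here is a minimal Python sketch of that idea. It deliberately swaps in CRC32 from the standard library and a made-up page layout (a 4-byte checksum followed by the payload); PostgreSQL uses its own checksum algorithm and page format, so the function names and layout here are assumptions for illustration only.

```python
import zlib


def write_page(payload: bytes) -> bytes:
    """Return a page: a 4-byte CRC32 of the payload, then the payload."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload


def read_page(page: bytes) -> bytes:
    """Verify the stored checksum and return the payload."""
    stored = int.from_bytes(page[:4], "big")
    payload = page[4:]
    calculated = zlib.crc32(payload)
    if calculated != stored:
        # Corruption is reported on read instead of being silently used.
        raise ValueError(
            f"page verification failed, calculated checksum {calculated} "
            f"but expected {stored}"
        )
    return payload
```

Flipping a single byte of the stored payload makes `read_page` raise instead of returning damaged data, which is the behavior change the 9.3 output above demonstrates.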

So, how do you know whether disk page checksums are enabled on a cluster?
As of now, no pg_catalog table stores this information and no file is created for it in the $PGDATA directory; only the pg_control file holds it. You can read it with the pg_controldata utility.
$ export PGDATA=/usr/local/pg93beta/data
$ pg_controldata
....
....
....
Data page checksum version: 1
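If you want to check this from a script, the pg_controldata output can be parsed directly. The following Python sketch assumes the "Data page checksum version" label shown above and treats a version of 0, or a missing line, as "checksums off"; the function names are made up for the example.

```python
import subprocess


def read_controldata():
    """Run pg_controldata; relies on PGDATA being exported as shown above."""
    result = subprocess.run(
        ["pg_controldata"], capture_output=True, text=True, check=True
    )
    return result.stdout


def checksums_enabled(controldata_output):
    """Return True if pg_controldata output reports data page checksums."""
    for line in controldata_output.splitlines():
        label, _, value = line.partition(":")
        if label.strip() == "Data page checksum version":
            # Assumption: a version of 0 means checksums are disabled.
            return int(value.strip()) != 0
    # Assumption: output without this line comes from a release before 9.3.
    return False
```

Calling `checksums_enabled(read_controldata())` then gives a simple yes/no answer for the cluster that PGDATA points at.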
Some points on disk page checksums:
1. Temp tables are excluded from checksum checks.
2. According to the PG documentation, enabling checksums adds performance overhead.
3. Once checksums are enabled on a cluster, they cannot be rolled back.

Thanks
Raghav
Posted by Christophe Pettus in pgExperts on 2013-05-16 at 06:59:31

The Call for Papers for DjangoCon US 2013 is now open.

Word of warning: this blog post is about something related to Bash (well, maybe other shells too, I didn't really test), but since I found it while doing Pg work, and it might bite someone else doing Pg-related work, I decided to add it to the “postgresql” tag. So, due to some work I had to do, [...]
Posted by Josh Berkus in pgExperts on 2013-05-15 at 18:28:26
The first pgCon Unconference is only 10 days away, and we now have room numbers.  If you want to lead a topic ... or, more importantly, if you want someone else to lead a topic ... please add your topic ideas to the wiki page!  We're expecting over 100 people at the unconference, given that pgCon registration is up above 225, more than 10% higher than last year at this point.

Second, my article on the PostgreSQL 9.3 Beta is up at LWN.net (subscription required, or wait 2 weeks).
Posted by Pavel Stehule on 2013-05-15 at 17:25:53
A year ago I wrote an article about PostgreSQL history. The original article is in Czech, but there is a link to a Google-translated version.
After playing a bit with the Redis FDW and the Redis command extension, Josh Berkus was slightly dissatisfied. He was looking for a way to map the possibly huge set of values in a single Redis object to a set of rows in a PostgreSQL table, and the Redis FDW currently maps each object to a single row - as an array if it's appropriate. Of course, we could call unnest() on the array, but it seems roundabout to construct an array only to have to unpack it immediately. These aren't terribly cheap operations.

I've been thinking a bit about his complaint, and I think this will be doable. What we'll need is a new table option that specifies the key and designates the table as one sourced from a single key rather than a range of keys. Say we call this option singleton_key. Then we might do something like:
CREATE FOREIGN TABLE hugeset (value text)
SERVER localredis
OPTIONS (tabletype 'list', singleton_key 'myhugelist');
This option would be incompatible with the tablekeyset and tablekeyprefix options. If given, the key won't be looked up at all. We would simply use the given key and return the corresponding list of values. That would make selecting from such a table faster - possibly lots faster. For scalars, sets and lists, the table would have one column. In the case of a scalar there would only be one row. For zsets, it would have two columns - one for the value and one for the score. Finally, for hashes it would have two, one for the property and one for the value.

This looks like a useful possible enhancement.
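The row shapes proposed above for a singleton_key table can be sketched as a small Python function. This only illustrates the proposed mapping, not the FDW's code; the function name, the 'scalar' label and the Python representation of each Redis value (a list of member/score pairs for a zset, a dict for a hash) are assumptions made for the example.

```python
def singleton_rows(tabletype, value):
    """Map one Redis object to the rows a singleton_key table would return."""
    if tabletype == "scalar":
        # One column, exactly one row.
        return [(value,)]
    if tabletype in ("set", "list"):
        # One column, one row per member.
        return [(member,) for member in value]
    if tabletype == "zset":
        # Two columns: the value and its score.
        return [(member, score) for member, score in value]
    if tabletype == "hash":
        # Two columns: the property and its value.
        return list(value.items())
    raise ValueError(f"unknown tabletype: {tabletype!r}")
```

For a list such as the hypothetical 'myhugelist', every element becomes its own row, so no array has to be built and unnested.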
IF EXISTS and IF NOT EXISTS are clauses that make a DDL query return a notice message instead of an error, depending on whether the object it targets already exists and on the DDL action being performed. If a given query tries to create an object when IF NOT EXISTS is specified, a notice message [...]
Posted by Hans-Juergen Schoenig in Cybertec on 2013-05-15 at 10:30:10
20 years ago it was enough for a database to simply check if one string was identical to some other string. Those times are long gone and thus several algorithms to do fuzzy string matches have been developed over time. Many of these mechanisms, such as trigrams, regular expressions, soundex and so on are already [...]
The comments on my introductory post on this topic mentioned a lot of the major features which users would like to see in PostgreSQL.  Among those mentioned were:
  • Improvements to replication
  • Parallel query
  • Index-organized tables
  • Better partitioning
  • Better full-text search
  • Logical streaming replication
However, as with other projects, our perennial temptation is to listen to current users rather than potential users.  We can focus on making PostgreSQL better for the people who already use it.  It's attractive, but that way lies obsolescence.

What we need to focus on is the reasons why people don't use PostgreSQL at all.  Only by exploiting new markets -- by pushing Postgres into places which never had a database before -- do we grow the future PostgreSQL community.  And there's a bunch of ways we are failing new users.

For example, listen to Nelson Elhage, engineer at Stripe.com:
"I love Mongo's HA story.  Out of the box I can build a 3-node Mongo cluster with a full replica set.  I can add nodes, I can fail over, without losing data."
Wouldn't it be nice if we could say the same thing about Postgres?  But we can't.

If we're looking for a #1 PostgreSQL development priority, this is it:

We need a "scale it now" button.

This is where we're losing ground to the new databases.  In every other way we are superior: durability, pluggability, standards-compliance, query sophistication, everything.  But when a PostgreSQL user outstrips the throughput of a single server or a single EC2 instance, our process for scaling out sucks.  It's complicated.  It has weird limitations.  And most of all, our scale-out requires advanced database expertise, which is expensive and in short supply.


We need some way for users to go smoothly and easily from one Postgres node to three and then to ten.  Until we do that, we will continue to lose ground to databases which do a better job at scaling, even if they suck at everything else.

I don't know what our answer to scaling out will be.  Streaming logical replication, PostgresXC, Tr

[continue reading]

Posted by Andrew Dunstan in pgExperts on 2013-05-14 at 18:50:43
To avoid a few administrative difficulties imposed by github, Dave Page and I have moved the repo for the Redis FDW to https://github.com/pg-redis-fdw/redis_fdw. If you have a clone of the old repo, it should be sufficient to change the remote setting in the .git/config file, replacing "dpage" with "pg-redis-fdw". The commits are identical up to the point where we removed everything from the old repo.

Sorry for any inconvenience.
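For anyone who would rather script the change than edit the file by hand, here is a small Python sketch of the replacement described above. It assumes the old remote URL contains "github.com/dpage/" or "github.com:dpage/"; the function name is made up for the example.

```python
import re


def retarget_remote(config_text):
    """Point remotes at the pg-redis-fdw organization instead of dpage."""
    # Handles both https://github.com/dpage/... and git@github.com:dpage/...
    return re.sub(r"(github\.com[:/])dpage/", r"\1pg-redis-fdw/", config_text)
```

Read .git/config, pass its contents through `retarget_remote`, and write the result back; lines that do not mention the old location are left untouched.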
So what does "UTF-8 BOM" mean? It is the byte order mark for UTF-8: three bytes (0xEF, 0xBB, 0xBF) added at the start of a file to indicate that the file contains Unicode characters. In the errors below, the BOM characters sit in front of the "9".

According to the Unicode documentation, a BOM in a UTF-8 file serves no purpose, and it causes problems for software that is not BOM-aware when it tries to identify or parse the leading characters of the file. The same point is made at the bottom of the Wikipedia page.

Related errors in PostgreSQL:
ERROR: invalid input syntax for integer: "9" (in psql-client)
SQL state: 22P02 (in PgAdmin-III)

Test case & fix on Windows:
Sample file "state_data.txt" created in NOTEPAD with unicode characters in it:
9,Karnataka,कर्नाटक
10,Kerala,केरळा
Table to import data:
create table test(state_code int, state_name char(30), state_in_hindi text);
Error:
postgres=# copy test from 'c:/Pgfile/state_data.txt' with delimiter ',' CSV;
ERROR: invalid input syntax for integer: "9"
CONTEXT: COPY test, line 1, column state_code: "9"
To fix this, I used a tool called "bomremover.exe" to remove the leading characters from the file, since this was on Windows. On Linux, there are many tips and tricks available on the net for wiping the BOM from a UTF-8 file.

Tool Download link and usage:
http://www.mannaz.at/codebase/utf-byte-order-mark-bom-remover/

Eg:-
C:\Pgfile>bomremover.exe . *
Added '.\state_data.txt' to processing list.
Press enter to process all files in the list. (1 files in total)

Processing file '.\state_data.txt'...
Finished. Press Enter to Exit

After running bomremover.exe on the file, re-run the COPY command, which will now import the data successfully.
 state_code | state_name | State_name_in_hindi
------------+------------+---------------------
9 | Karnataka | कर्नाटक
10 | Kerala | केरळा
(2 rows)
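If no dedicated tool is at hand, the same cleanup can be done on any platform with a few lines of Python. This is a minimal sketch using only the standard library; the function names are made up for the example.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return `data` without a leading UTF-8 byte order mark, if present."""
    # codecs.BOM_UTF8 is the three bytes 0xEF, 0xBB, 0xBF.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def remove_bom_from_file(path):
    """Rewrite the file at `path` in place if it starts with a UTF-8 BOM."""
    with open(path, "rb") as handle:
        data = handle.read()
    cleaned = strip_utf8_bom(data)
    if cleaned != data:
        with open(path, "wb") as handle:
            handle.write(cleaned)
```

Working on raw bytes keeps the rest of the file, including the Hindi text, exactly as it was; only the three leading bytes are dropped.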

Some editors do not save text with a UTF-8 BOM by default:
- Windows - Notepad++ (in Notepad, the BOM is enabled by default)
- Linux - VI
- Mac - TextEdit


--Raghav
I love emergency calls. They give me a chance to get into something new, very quickly, and get it fixed. A few days ago a company called SpaceInch made a new app called Say The Same Thing, which became an instant hit, possibly due to the quite awesome video by OK Go, which you should watch even if you don't look at anything else in this post.

Since the launch they have been hit by a huge flood of users and last night things started to drag, and they asked PostgreSQL Experts for some emergency help. The app is backed by a PostgreSQL database running on Heroku.  During a phone call of two hours, I was able to get in, examine the database, and devise some steps that we hope will have the effect of making things run smoother.

The solution is going to involve some partitioning of one or two tables. This isn't being done so much for the benefit of constraint exclusion as because it lets them drop large amounts of data that is no longer needed, cheaply. Previously they had been cleaning one of these tables by deleting rows, but abandoned that as it adversely affected performance.  But the consequence of that was that the table just grew and grew to several million rows in a few days. Dropping child tables that contain data that is of no further interest will work well to tame the data size without all the overhead of row deletion.
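As a rough illustration of the cleanup step, the Python sketch below picks the child tables that are old enough to drop and builds the DROP TABLE statements for them. The daily granularity, the table names and the retention period are assumptions made for the example, not details of the client's schema.

```python
from datetime import date, timedelta


def drop_statements(partition_days, today, keep_days):
    """Build DROP TABLE statements for child tables older than `keep_days`.

    `partition_days` maps a child table name to the day of data it holds.
    """
    cutoff = today - timedelta(days=keep_days)
    oldest_first = sorted(partition_days.items(), key=lambda item: item[1])
    return [f"DROP TABLE {name};" for name, day in oldest_first if day < cutoff]


# Hypothetical child tables, one per day.
children = {
    "events_20130510": date(2013, 5, 10),
    "events_20130512": date(2013, 5, 12),
    "events_20130514": date(2013, 5, 14),
}
for statement in drop_statements(children, date(2013, 5, 14), keep_days=2):
    print(statement)
```

Each statement removes a whole child table at once, which is what avoids the per-row overhead of DELETE.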

Troubles sometimes don't come singly. I actually had to deal with two emergencies last night, which is quite unusual - the other was also for a client running on Amazon. For good or ill, we are seeing more and more use of Postgres instances running in the cloud - whether managed by the client or on a managed service such as Heroku.
Posted by Jignesh Shah in VMware on 2013-05-14 at 07:33:41
PostgreSQL 9.3 beta1 is now available. Giving early access to software is always a good idea to test out evolutionary, revolutionary, radical ideas because unless it is field tested, it has not gone through its trial by fire to be proven gold.

There are many changes introduced in PostgreSQL 9.3 beta1, and I have a few favorites among them.

For example, disk page checksums to detect filesystem failures. In fact this would allow VMware to use the now standard disk page checksum instead of a custom feature. This highly debated feature is required to identify silent bit corruptions (or deter malicious ones). I have been told in talks with database administrators (not just PostgreSQL DBAs) that they typically face at least one such incident a year, where one of the disks shows bit rot that goes unnoticed without any instrumentation to catch it.

Another change that goes in the right direction is how PostgreSQL maps the shared memory. With this small change, no kernel changes are needed to start the database with a bigger shared buffer pool, so there is one less cookbook step to get the database working. In a cloud world where 100,000 VMs are running databases, one less step is a huge increase in productivity, since this step actually required privileges higher than those of the database instance owner.

Yet another favorite feature is custom background workers. This new mechanism is certainly a popular one in our team at VMware, where we are using it heavily to move some of the changes that we had made into custom background workers deployed as extensions. That allows us to align with core PostgreSQL and enable extra features as extensions when needed.

Next I want to talk about three features together: Writeable Foreign Tables, the pgsql Foreign Data Wrapper, and automatically updatable VIEWs. Each of these features is very useful and generic on its own. However, when used together they open up new possibilities using multiple federated PostgreSQL database shards w

[continue reading]

Posted by Bruce Momjian in EnterpriseDB on 2013-05-13 at 19:15:01

A week after the release of PgLife, the site is averaging thirty active users. (I define an active user as an IP address that has viewed the site for at least five minutes during the past hour.) I consider that a success. Since the release of PgLife, I have increased the content update interval and added an About page explaining the site's purpose, which also includes the active user count.

The site uses AJAX, Perl, Procmail rules, and Apache to collect and deliver dynamic content. Recent improvements in the Postgres mailing list archive feature set have made linking to emails much simpler.

PgLife was my first attempt at a dynamic website, and I learned a few things. First, I learned the value of having an alert file that can force a browser reload to push fixes and improvements to the browser. Second, I used the same file to allow pushing of news alerts to users, e.g. 9.3 Beta1. Third, I learned the importance of controlling browser and server caching and revalidation when using dynamic content.

Continue Reading »

Now that PostgreSQL 9.3 beta1 has been released we've started to jump start our experimentation by compiling our favorite extensions. First on the list is PL/V8 js.

This was compiled against 9.3beta1 for 64-bit and 32-bit and plv8 version 1.4.0. We briefly tried it with the EDB Windows builds, which we downloaded from: http://www.enterprisedb.com/products-services-training/pgbindownload and it seems to work fine.

We hope windows users find these useful.

Posted by Joshua Tolley in EndPoint on 2013-05-13 at 14:58:23

Original images from Flickr user jenniferwilliams

One of our clients, for various historical reasons, runs both MySQL and PostgreSQL to support their website. Information for user login lives in one database, but their customer activity lives in the other. The eventual plan is to consolidate these databases, but thus far, other concerns have been more pressing. So when they needed a report combining user account information and customer activity, the involvement of two separate databases became a significant complicating factor.

In similar situations in the past, using earlier versions of PostgreSQL, we've written scripts to pull data from MySQL and dump it into PostgreSQL. This works well enough, but we've updated PostgreSQL fairly recently, and can use the SQL/MED features added in version 9.1. SQL/MED ("MED" stands for "Management of External Data") is a decade-old standard designed to allow databases to make external data sources, such as text files, web services, and even other databases look like normal database tables, and access them with the usual SQL commands. PostgreSQL has supported some of the SQL/MED standard since version 9.1, with a feature called Foreign Data Wrappers, and among other things, it means we can now access MySQL through PostgreSQL seamlessly.

The first step is to install the right software, called mysql_fdw. It comes to us via Dave Page, PostgreSQL core team member and contributor to many projects. It's worth noting Dave's warning that he considers this experimental code. For our purposes it works fine, but as will be seen in this post, we didn't push it too hard. We opted to download the source and build it, but installing using pgxn works as well:

$ env USE_PGXS=1 pgxnclient install mysql_fdw
INFO: best version: mysql_fdw 1.0.1
INFO: saving /tmp/tmpjrznTj/mysql_fdw-1.0.1.zip
INFO: unpacking: /tmp/tmpjrznTj/mysql_fdw-1.0.1.zip
INFO: building extension
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wformat-security -fno-strict-al

[continue reading]

Posted by Joel Jacobson on 2013-05-13 at 13:53:45

Nobody likes XML, except masochists and Microsoft consultants who charge by the hour.
Sometimes you are forced to deal with XML anyway, such as parsing the response from external APIs.
Parsing XML is per se a nasty business to begin with, but in this particular example, the ugliness set new records.

The only “solution” I came up with is too ugly for production use, but the alternatives were even uglier, so I had no option.

I hope there is someone out there reading this who can present a proper solution to the problem.

Let’s say you have this XML:

<testxml xmlns:ns1="http://www.example.com" xmlns:ns2="http://www.example2.com">
    <ns1:foo>
        <ns2:bar>baz</ns2:bar>
    </ns1:foo>
</testxml>

Using xpath() you extract the content of testxml:

SELECT xpath(
    '/testxml/ns1:foo',
    '<testxml xmlns:ns1="http://www.example.com" xmlns:ns2="http://www.example2.com"><ns1:foo><ns2:bar>baz</ns2:bar></ns1:foo></testxml>'::xml,
    ARRAY[
        ['ns1','http://www.example.com'],
        ['ns2','http://www.example2.com']
    ]
);
-- Result: <ns1:foo><ns2:bar>baz</ns2:bar></ns1:foo>

The returned XML is not valid since it's missing the xmlns definitions,
but the PostgreSQL XML data type doesn’t complain, which is OK I guess,
a bit of quirks mode perhaps?

Because of the missing xmlns, it’s impossible to make use of this XML fragment returned.
You cannot extract any subsequent sub-elements in it using XPath.

For instance, this won’t work:

SELECT xpath(
    '/ns1:foo/ns2:bar/text()',
    (xpath(
        '/testxml/ns1:foo',
        '<testxml xmlns:ns1="http://www.example.com" xmlns:ns2="http://www.example2.com"><ns1:foo><ns2:bar>baz</ns2:bar></ns1:foo></testxml>'::xml,
        ARRAY[
            ['ns1','http://www.example.com'],
            ['ns2','http://www.example2.com']
        ]
    ))[1]
);

-- Error:
-- ERROR:  could not create XPath object
-- DETAIL:  namespace error : Namespace prefix ns1 on foo is not defined
-- <ns1:foo>
--         ^
-- namespace error : Namespace prefix ns2 on bar is not defined
--   <ns2:ba

[continue reading]
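The namespace problem in the XML post above is not specific to PostgreSQL: a fragment whose prefixes are no longer declared is rejected by ordinary XML parsers too. The Python sketch below reproduces that with the standard library, and shows that the fragment becomes usable again once the declarations are put back. It is an illustration of the cause, not a proposed fix for xpath() in PostgreSQL; the `wrapper` element and the function names are made up for the example.

```python
import xml.etree.ElementTree as ET

NAMESPACES = {
    "ns1": "http://www.example.com",
    "ns2": "http://www.example2.com",
}
# The fragment returned by the first xpath() call, without xmlns definitions.
FRAGMENT = "<ns1:foo><ns2:bar>baz</ns2:bar></ns1:foo>"


def parses(xml_text):
    """Return True if `xml_text` is accepted by the parser."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


def redeclare(fragment, namespaces):
    """Wrap the fragment in an element that declares the prefixes again."""
    declarations = " ".join(
        f'xmlns:{prefix}="{uri}"' for prefix, uri in namespaces.items()
    )
    return f"<wrapper {declarations}>{fragment}</wrapper>"


def bar_text(fragment):
    """Extract the text of ns2:bar after restoring the declarations."""
    root = ET.fromstring(redeclare(fragment, NAMESPACES))
    return root.find("ns1:foo/ns2:bar", NAMESPACES).text
```

Here `parses(FRAGMENT)` fails because the prefixes are unbound, while `bar_text(FRAGMENT)` succeeds because the wrapper supplies them again.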

PostGIS 2.1.0 beta2 is out. Details on what's new in it are in the official news release: http://postgis.net/2013/05/11/postgis-2-1-0beta2. This is the first version of PostGIS to work with PostgreSQL 9.3, so if you are planning to experiment with PostgreSQL 9.3 coming out soon, use this one. Also check out the documentation in the new ePUB format if you have an e-reader, and let us know how it looks. It seems to vary a lot depending on which ePub reader is used.

For Windows users, we've got binary builds available compiled against PostgreSQL 9.3beta1 (and also available for 9.2 (x32, x64) and 9.0, 9.1 (x64)). Details are on the Windows PostGIS downloads page: http://postgis.net/windows_downloads. It does not yet have the new Advanced 3D offering (provided by SFCGAL https://github.com/Oslandia/SFCGAL), but we hope to have that compiled and packaged with the binaries before release time.