PostgreSQL
The world's most advanced open source database
Posted by Andrew Dunstan in pgExperts on 2014-07-21 at 19:25:00
Someone was just talking about the size of some source files in PostgreSQL. The source code (.c, .h, .y and .l files) weighs in at a bit over 1 million lines of code. The documentation source has another roughly 300,000 lines. That's a large project, but by no means enormous by today's standards. The biggest source code file is pg_dump.c, at around 15,700 lines. The biggest documentation file is funcs.sgml, at around 17,600 lines. Both of these might well do with a bit of reorganization.

Postgres 9.5 is coming up with a new ECHO mode for psql, introduced by the following commit:

commit: 5b214c5dd1de37764797b3fb9164af3c885a7b86
author: Fujii Masao <fujii@postgresql.org>
date: Thu, 10 Jul 2014 14:27:54 +0900
Add new ECHO mode 'errors' that displays only failed commands in psql.

When the psql variable ECHO is set to 'errors', only failed SQL commands
are printed to standard error output. Also this patch adds -b option
into psql. This is equivalent to setting the variable ECHO to 'errors'.

Pavel Stehule, reviewed by Fabrízio de Royes Mello, Samrat Revagade,
Kumar Rajeev Rastogi, Abhijit Menon-Sen, and me.

Up to now, there have been two ECHO modes:

  • "all", to print to the standard output all the queries before they are parsed or executed. This can be set when starting psql with option -a.
  • "queries", to have psql print all the queries sent to server. This can be set additionally with option -e of psql.

The new mode is called "errors" and can be set either with the -b option when starting psql or with the \set meta-command in a psql session, like this:

=# \set ECHO errors
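
For reference, the same mode can presumably also be enabled when launching psql, either with the new -b switch from this commit or by assigning the variable at startup with -v; a quick sketch (the database name mydb is only a placeholder):

$ psql -b -d mydb
$ psql -v ECHO=errors -d mydb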

The added feature is simple: psql prints all the failed queries to standard error output. The failed query is printed in an additional field prefixed with STATEMENT:

=# CREATE TABLES po ();
ERROR:  42601: syntax error at or near "TABLES"
LINE 1: CREATE TABLES po ();
           ^
LOCATION:  scanner_yyerror, scan.l:1053
STATEMENT:  CREATE TABLES po ();

If multiple queries are specified within a single input, only the query that failed is displayed:

=# CREATE TABLE aa (a int); CREATE FOO po; CREATE TABLE bb (a int);
CREATE TABLE
ERROR:  42601: syntax error at or near "FOO"
LINE 1: CREATE FOO po;
               ^
LOCATION:  scanner_yyerror, scan.l:1053
STATEMENT:  CREATE FOO po;
CREATE TABLE

Also, queries that span multiple lines are shown as they are, spaces included:

=# SELECT
      col1_not_here,
      col2_not_here
   FROM
      table_not_here;
ERROR:  42P01: relation "table_not_here" does not exist
LINE 

[continue reading]

Posted by Hans-Juergen Schoenig in Cybertec on 2014-07-17 at 13:15:38
In 99% of all cases the PostgreSQL planner does a perfect job of optimizing your queries and making sure that you can enjoy high performance and low response times. The infrastructure ensuring this is both sophisticated and robust. However, there are some corner cases, which can turn out to be quite […]
On 14th of July, Alvaro Herrera committed patch: psql: Show tablespace size in \db+   Fabrízio de Royes Mello As I previously mentioned – I'm a sucker for psql additions. And while today's patch is rather small, I really like it. What does it do? It simply makes \db+ show the size of all objects in a given tablespace. […]
Posted by gabrielle roth on 2014-07-15 at 15:53:00

When: 7-8pm Tuesday
Where: Oregon Convention Center, Room E147

We’re having a Birds of a Feather session at OSCON instead of our usual July meeting. Come hang out with Pg peeps who are in town for the conference! You do not need to be registered for the conference to attend the BoF.

There will be an additional social hour afterwards should we need it.

See you there!


On 10th of July, Tom Lane committed patch: Implement IMPORT FOREIGN SCHEMA.   This command provides an automated way to create foreign table definitions that match remote tables, thereby reducing tedium and chances for error. In this patch, we provide the necessary core-server infrastructure and implement the feature fully in the postgres_fdw foreign-data wrapper. Other […]

PostgreSQL can, as many other products, use SSL to secure client/server communications. It can be configured to be optional or required, to require a client certificate, or even to use the SSL client certificate to log in. However, the DBA tools are currently a bit lacking in this regard. In particular, there is no way for a DBA to see what SSL parameters are in force (or indeed if SSL is enabled at all) for an already existing connection.

There are multiple ways to see the status of a specific connection (such as the libpq PQgetssl() function, the psql startup message or the sslinfo module). Unfortunately all these methods have one thing in common - they are only available to the process making the connection to the database, not to a DBA inspecting the system from the outside.

9.4 will make it a little bit better, because log_connections now includes SSL information when the user connects, similar to:
LOG:  connection authorized: user=mha database=postgres SSL enabled (protocol=TLSv1.1, cipher=ECDHE-RSA-AES256-SHA)

But this can still be a bit of a pain to figure out for existing connections, of course.

To deal with this problem, I've written a small PostgreSQL extension called pg_sslstatus. When loaded using shared_preload_libraries it will keep track of the SSL status for each connection, and expose it through a system view named pg_sslstatus like this:

postgres=# SELECT * FROM pg_sslstatus;
  pid  | ssl | bits | compression | version |        cipher        |                         clientdn                         
-------+-----+------+-------------+---------+----------------------+----------------------------------------------------------
 27286 | t   |  256 | f           | TLSv1   | ECDHE-RSA-AES256-SHA | 
 26682 | t   |  256 | t           | TLSv1   | ECDHE-RSA-AES256-SHA | /C=AU/ST=Some-State/O=Internet Widgits Pty Ltd/CN=magnus
 26693 | f   |      |             |         |                      | 
(3 rows)

It will include one row for each connection, and can then be joined with either pg_stat_activity or

[continue reading]
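
To sketch the join mentioned just before the cut-off, assuming the view's pid column shown above matches pg_stat_activity.pid (as in 9.4; older releases use procpid), something like this should do:

postgres=# SELECT a.usename, a.client_addr, s.ssl, s.version, s.cipher
           FROM pg_stat_activity a
           JOIN pg_sslstatus s ON s.pid = a.pid;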

IMPORT FOREIGN SCHEMA is a SQL command defined in the SQL specification that allows importing, from a foreign source, a schema made of foreign tables. Its support has been added in Postgres 9.5 with the following commit:

commit 59efda3e50ca4de6a9d5aa4491464e22b6329b1e
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Thu Jul 10 15:01:31 2014 -0400

Implement IMPORT FOREIGN SCHEMA.

This command provides an automated way to create foreign table definitions
that match remote tables, thereby reducing tedium and chances for error.
In this patch, we provide the necessary core-server infrastructure and
implement the feature fully in the postgres_fdw foreign-data wrapper.
Other wrappers will throw a "feature not supported" error until/unless
they are updated.

Ronan Dunklau and Michael Paquier, additional work by me

This feature is made of two parts:

  • A new API available for foreign data wrappers to support this command
  • Support for this command in postgres_fdw, the foreign-data wrapper (FDW) for PostgreSQL available in core.

The new API available has the following shape:

List *
ImportForeignSchema (ImportForeignSchemaStmt *stmt, Oid serverOid);

ImportForeignSchemaStmt is a parsed representation of the raw IMPORT FOREIGN SCHEMA query and serverOid is the OID of the FDW server used for the import. The parsed statement contains all the information a FDW needs to rebuild a schema fetched from a remote source, mainly the following (example statements are sketched right after this list):

  • Type of import, given by stmt->list_type, together with the table list (not used for ALL)
    • FDW_IMPORT_SCHEMA_LIMIT_TO (LIMIT TO clause specified in the query) for a restricted list of table names to import
    • FDW_IMPORT_SCHEMA_EXCEPT (EXCEPT clause specified in the query) for a list of tables not to fetch during the import
    • FDW_IMPORT_SCHEMA_ALL (no LIMIT TO or EXCEPT clause in the query) to let the FDW know that all the tables from the foreign schema should be imported
  • Remote schema name
  • List of options to customize the import
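
To make these list types concrete, here is a sketch of the statements they correspond to (server, schema and table names are made up for illustration):

=# IMPORT FOREIGN SCHEMA remote_schema
     FROM SERVER remote_server INTO local_schema;            -- FDW_IMPORT_SCHEMA_ALL
=# IMPORT FOREIGN SCHEMA remote_schema LIMIT TO (tab1, tab2)
     FROM SERVER remote_server INTO local_schema;            -- FDW_IMPORT_SCHEMA_LIMIT_TO
=# IMPORT FOREIGN SCHEMA remote_schema EXCEPT (tab3)
     FROM SERVER remote_server INTO local_schema;            -- FDW_IMPORT_SCHEMA_EXCEPT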

Then this API needs to return a list of raw queries that will be applied as-is by the server after pars

[continue reading]

This is a maintenance release with a few interesting upgrades.

Version 9.3-1102 (2014-07-10)

Author: epgrubmair
    fix copyOut close hanging bug #161 from epgrubmair

Author: romank0
    backpatch exception during close of fully read stream from romank0

Author: Christophe Canovas
    Added caching for ResultSetMetaData (complete commit)

Author: Elizabeth Chatman
    NullPointerException in AbstractJdbc2DatabaseMetaData.getUDTs
    setNull, setString, setObject may fail if a specified type cannot be transferred in a binary mode #151
    backpatch fix for changing datestyle before copy

Author: TomonariKatsumata
    binary transfer fixes: new feature -1 for forceBinaryTransfer

Author: Sergey Chernov
    connectTimeout property support backpatch

Author: Naoya Anzai
    fix prepared statement ERROR due to EMPTY_QUERY defined as static.

9.4 jars can also be found on the site 

I work with a lot of open source projects, and I use the command-line for almost everything. It often happens that I need to examine a file from a project, and thanks to bash, Github, and curl, I can do so easily, without even needing to have the repo handy. One of the things I do sometimes is compare a file across versions to see what has changed. For example, I needed to see what changes were made between versions 1.22 and 1.23 to the file includes/UserMailer.php which is part of the MediaWiki project. For this trick to work, the project must be on Github, and must label their versions in a consistent manner, either via git branches or git tags.

MediaWiki exists on Github as wikimedia/mediawiki-core. The MediaWiki project tags all of their releases in the format X.Y.Z, so in this example we can use the git tags 1.22.0 and 1.23.0. Github is very nice because you can view a specific file at a certain commit (aka a tag), and even grab it over the web as a plain text file. The format is:

https://raw.githubusercontent.com/PROJECTNAME/BRANCH-OR-TAG/FILE

Note that you can use a tag OR a branch! So to compare these two files, we can use one of these pairs:

https://raw.githubusercontent.com/wikimedia/mediawiki-core/REL1_22/includes/UserMailer.php
https://raw.githubusercontent.com/wikimedia/mediawiki-core/REL1_23/includes/UserMailer.php

https://raw.githubusercontent.com/wikimedia/mediawiki-core/1.22.0/includes/UserMailer.php
https://raw.githubusercontent.com/wikimedia/mediawiki-core/1.23.0/includes/UserMailer.php

All that is left is to treat git as a web service and compare the two files at the command line ourselves. The program curl is a great tool for downloading the files, as it dumps to stdout by default. We will add a -s flag (for "silent") to prevent it from showing the progress meter as it usually does. The last bit of the puzzle is to use <(), bash's process substitution feature, to trick diff into comparing the curl outputs as if they were files. So our final command is:

diff <(curl -

[continue reading]
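
Based on the description above, the completed command presumably looks something like this, using the 1.22.0 and 1.23.0 tag URLs as an example:

diff <(curl -s https://raw.githubusercontent.com/wikimedia/mediawiki-core/1.22.0/includes/UserMailer.php) \
     <(curl -s https://raw.githubusercontent.com/wikimedia/mediawiki-core/1.23.0/includes/UserMailer.php)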

Posted by Kirk Roybal on 2014-07-08 at 18:40:04

Just a quick reminder that Dallas/Fort Worth PostgreSQL Users Group has a Meetup the first Wednesday of every month.

What: PostgreSQL-backed Websites: Middleware Considered Harmful

Who: Justin Tocci

When: Wednesday, August 6, 2014 7:00 PM

Where:
Improving Enterprises
16633 Dallas Parkway Suite 110 Addison, TX 75001

DFW PUG on Meetup

Posted by Paul Ramsey on 2014-07-07 at 21:07:00

Just a quick public service announcement for blog followers in the Pacific Northwest and environs: you've got a once in a not-quite-lifetime opportunity to attend the "Free and Open Source Software for Geospatial" (aka FOSS4G) conference this year in nearby Portland, Oregon, a city so hip they have trouble seeing over their pelvis.

Anyone in the GIS / mapping world should take the opportunity to go, to learn about what technology the open source world has available for you, to meet the folks writing the software, and the learn from other folks like you who are building cool things.

September 8th-13th, be there and be square.

One of our customers was migrating a table from SQL Server to PostgreSQL using EnterpriseDB's Migration ToolKit. This table has a boolean datatype column. During migration, MTK converts the "boolean" datatype to "bit" in PostgreSQL, and the process was taking 6 hrs to complete. The customer then wanted to change the datatype from "bit" to "integer", and the ALTER command for changing the type was taking another 6 hrs in PostgreSQL. If he migrates only the structure to PostgreSQL first and changes the type from "bit" to "integer" before loading, the load fails with the error below. If it were allowed, the migration would take only 6 hrs, as there would be no need to alter the type after the data load.

ERROR: column "hidehelm" is of type integer but expression is of type boolean
Hint: You will need to rewrite or cast the expression.

So I found a workaround: make the cast from "boolean" to "integer" implicit by updating the "pg_cast" catalog as shown below. With this, he can load the boolean data directly into the integer column, which saves the time of altering the type from bit to integer after migrating.

postgres=# insert into tarik values (1::boolean);
ERROR: column "t" is of type integer but expression is of type boolean
LINE 1: insert into tarik values (1::boolean);
HINT: You will need to rewrite or cast the expression.
postgres=# select * from pg_cast where castsource='boolean'::regtype and casttarget='int4'::regtype;
 castsource | casttarget | castfunc | castcontext | castmethod
------------+------------+----------+-------------+------------
         16 |         23 |     2558 | e           | f
(1 row)
postgres=# update pg_cast set castcontext ='i' where castsource='boolean'::regtype and casttarget='int4'::regtype;
UPDATE 1
postgres=# select * from pg_cast where castsource='boolean'::regtype and casttarget='int4'::regtype;
 castsource | casttarget | castfunc | castcontext | castmethod
------------+------------+----------+-------------+------------
         16 |         23 |     2558 | i           | f
(1 row)
postgres=# insert into tarik values (1::boolean);

[continue reading]
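
One follow-up the excerpt does not show: once the data load is finished, you would presumably want to put the cast back to its default explicit-only behaviour, along these lines:

postgres=# update pg_cast set castcontext ='e' where castsource='boolean'::regtype and casttarget='int4'::regtype;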


The way I started the title might be confusing, "I corrupted my table", so everyone starts with "Crap!! Why did you do that!?"... so, just to justify it:
I see many customers coming to us to recover corrupted tables without any backup. In such cases it is hard to recover the tables completely and it needs a lot of work; however, we can recover the salvageable data if they don't care about the corrupted rows.

Let me corrupt the table first.. :-)

 I created a million-row table called "to_be_damaged":
postgres=# select count(*) from to_be_damaged ;
count
---------
1000000
(1 row)
postgres=# select relfilenode,relname from pg_class where relname='to_be_damaged';
relfilenode | relname
-------------+---------------
461257 | to_be_damaged

(1 row)
I've used "hexedit" to damage it. Open relfilenode file from OS level using hexedit and try picking a line which is the start of an 8K boundary and typing hex DE AD BE EF across it.

postgres=# select count(*) from to_be_damaged ;
ERROR: invalid page in block 0 of relation base/12896/461257

Now create an identical table "salvaged" to receive the salvageable data from the "to_be_damaged" table:
postgres=# create table salvaged(t int);
CREATE TABLE
I prepared the function below, which copies the rows that are still salvageable:
create or replace function salvage_damaged()
returns void
language plpgsql
as $$
declare
  pageno int;
  tupno int;
  pos tid;
begin
  <<pageloop>>
  for pageno in 0..35930 loop -- pg_class.relpages for the damaged table
    for tupno in 1..1000 loop
      pos = ('(' || pageno || ',' || tupno || ')')::tid;
      begin
        insert into salvaged select * from to_be_damaged where ctid = pos;
      exception
        when sqlstate 'XX001' then
          raise warning 'skipping page %', pageno;
          continue pageloop;
        when others then
          raise warning 'skipping row %', pos;
      end;
    end loop;
  end loop;
end;
$$;

Now run the function to copy the salvageable rows:
postgres# select salvage_damaged();
WARNING: skipping page 0
salvage_damaged
-----------------

(1 row)
postgres=# select

[continue reading]

Postgres 9.5 is going to improve the performance of subqueries using window functions by allowing the pushdown of WHERE clauses within them. Here is a commit, done during commit fest 1, that is the origin of this improvement:

commit d222585a9f7a18f2d793785c82be4c877b90c461
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Fri Jun 27 23:08:08 2014 -0700

Allow pushdown of WHERE quals into subqueries with window functions.

We can allow this even without any specific knowledge of the semantics
of the window function, so long as pushed-down quals will either accept
every row in a given window partition, or reject every such row.  Because
window functions act only within a partition, such a case can't result
in changing the window functions' outputs for any surviving row.
Eliminating entire partitions in this way obviously can reduce the cost
of the window-function computations substantially.

David Rowley, reviewed by Vik Fearing; some credit is due also to
Thomas Mayer who did considerable preliminary investigation.

The pushdown of the WHERE qual is done only if two conditions are satisfied:

  • Only the partitioning columns are referenced
  • The qual contains no volatile functions

Let's have a look at how things are improved by using a simple data set: a list of individuals referenced by an ID, with information about the city where they live and their respective age (the authorities lacked imagination for the city names).

=# CREATE TABLE population (
     person_id serial,
     age int,
     city text);
CREATE TABLE
=# INSERT INTO population (age, city)
     SELECT round(random() * 100),
            'city ' || round(random() * 200)
     FROM generate_series(1, 1000000);
INSERT 0 1000000

Now, here is a query that evaluates the average, minimum and maximum age of the cities where the population is living. In 9.4, this query returns the following plan:

=# EXPLAIN SELECT * FROM
     (SELECT city,
        avg(age) OVER (PARTITION BY city) avg_age,
        min(age) OVER (PARTITION BY city) min_age,
        max(age) OVER

[continue reading]
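
To make the two pushdown conditions above concrete, here is a sketch using the same population table: a qual on the partitioning column city is eligible for pushdown, while a qual on age is not, because it could reject only some rows of a partition.

-- qual on the partitioning column: can be pushed down into the subquery
=# SELECT * FROM
     (SELECT city, avg(age) OVER (PARTITION BY city) avg_age
      FROM population) p
   WHERE city = 'city 42';

-- qual on a non-partitioning column: cannot be pushed down
=# SELECT * FROM
     (SELECT city, age, avg(age) OVER (PARTITION BY city) avg_age
      FROM population) p
   WHERE age < 30;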

PostgreSQL 9.4 will finally support huge pages. This article covers configuring huge pages on a RHEL 7 box.



Continue reading "Using huge pages on RHEL 7 and PostgreSQL 9.4"
In the 3rd post of the "PostgreSQL on RHEL 7" series, I will cover running more than one 9.4 instance on RHEL 7. This article is also valid for 9.3.
Continue reading "Running more than one 9.4 /9.3 instance in parallel on RHEL 7"
On 29th of June, Andres Freund committed patch: Add cluster_name GUC which is included in process titles if set.   When running several postgres clusters on one OS instance it's often inconveniently hard to identify which "postgres" process belongs to which postgres instance.   Add the cluster_name GUC, whose value will be included as part […]
Posted by Pavel Golub in MicroOLAP on 2014-07-02 at 09:34:10

As you probably know, PostgreSQL sources are managed by the Git version control system. Which is great, but for me as an SVN user it would be much better to get access to the sources using familiar tools, e.g. TortoiseSVN. Mainly because I don’t need write access, read-only mode is enough. There is a workaround for such guys.

First of all, there is a mirror of the PostgreSQL sources on GitHub. And second, GitHub supports the SVN protocol using a bridge that passes svn commands through to GitHub.

So in my case I need only trunk with the latest commits. I've created an empty folder and did a checkout of the https://github.com/postgres/postgres/trunk URL.
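
For command-line SVN users, the equivalent of that TortoiseSVN checkout should be roughly this (the target directory name is arbitrary):

svn checkout https://github.com/postgres/postgres/trunk postgres-trunk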


Filed under: Coding, PostgreSQL Tagged: development, git, PostgreSQL, svn, trick
Red Hat Enterprise Linux 7 was released recently. The day after the release, we announced PostgreSQL 9.4 RPMs for RHEL 7, and last week we pushed PostgreSQL 9.3 RPMs for RHEL 7.

Since Red Hat switched to systemd as of RHEL 7, there are some changes in the packaging as compared to RHEL 5 and RHEL 6. So, how do you install and configure PostgreSQL on RHEL 7?
Continue reading "Installing and configuring PostgreSQL 9.3 and 9.4 on RHEL 7"
Posted by Christophe Pettus in pgExperts on 2014-07-02 at 02:48:16

Just a reminder that the Djangocon US 2014 Call for Proposals ends July 15, 2014… we would love your talks and tutorials!

Posted by Kirk Roybal on 2014-07-01 at 19:59:44

Just a quick reminder that Dallas/Fort Worth PostgreSQL Users Group has a Meetup the first Wednesday of every month.

What: Brainstorming on Crypto-Digests and Relational Databases

Who: John Scott

When: Wednesday, July 2, 2014 7:00 PM

Where:
Improving Enterprises
16633 Dallas Parkway Suite 110 Addison, TX 75001

DFW PUG on Meetup

Posted by Kirk Roybal on 2014-07-01 at 17:10:11

Here’s a quick snippet that I wrote because I needed a media calendar that covers the time period of most living people. Hopefully this keeps somebody from having to type it up themselves.

CREATE TABLE calendar (
  id bigserial primary key,
  gregorian_date date not null,
  long_date text,
  day_of_week text,
  day_in_week smallint,
  day_number_in_month smallint,
  day_number_in_year smallint,
  last_day_in_week boolean,
  last_day_in_month boolean,
  week_ending_day date,
  month_ending_day date,
  week_in_month smallint,
  week_in_year smallint,
  month_name text,
  month_number smallint,
  year_month text,
  quarter text,
  year_quarter text,
  year_number smallint,
  roman_year text,
  roman_month text,
  roman_day text,
  load_date timestamp with time zone NOT NULL DEFAULT now()
);

COMMENT ON COLUMN calendar.gregorian_date IS 'Date in current calendar';
COMMENT ON COLUMN calendar.long_date IS 'Date in full formal notation';
COMMENT ON COLUMN calendar.day_of_week IS 'Text representation of the day of week';
COMMENT ON COLUMN calendar.day_in_week IS 'Numeric position of the day in the week';
COMMENT ON COLUMN calendar.day_number_in_month IS 'Numeric position of the day in the month';
COMMENT ON COLUMN calendar.day_number_in_year IS 'Numeric position of the day within the year';
COMMENT ON COLUMN calendar.last_day_in_week IS 'Is this the last day of the week';
COMMENT ON COLUMN calendar.last_day_in_month IS 'Is this the last day of the month';
COMMENT ON COLUMN calendar.week_ending_day IS 'What is the date of Saturday in this week';
COMMENT ON COLUMN calendar.month_ending_day IS 'What is the date of the last day of this month';
COMMENT ON COLUMN calendar.week_in_month IS 'Numeric position of the week in the month';
COMMENT ON COLUMN calendar.week_in_year IS 'Numeric position of the week in the year';
COMMENT ON COLUMN calendar.month_name IS 'Text name of the month';
COMMENT ON COLUMN calendar.month_number IS 'Numeric value of the month in year';
COMMENT ON COLUMN calendar.year_month IS 'YYYY-MM formatted';
COMMENT ON CO

[continue reading]
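
The excerpt cuts off before any population script, but as a rough sketch, a table like this could be filled for a date range with generate_series plus to_char/extract, along these lines (only a handful of the columns are shown):

INSERT INTO calendar (gregorian_date, long_date, day_of_week, day_in_week,
                      day_number_in_month, day_number_in_year, month_name,
                      month_number, year_number)
SELECT d,                                          -- gregorian_date
       to_char(d, 'FMDay, DD FMMonth YYYY'),       -- long_date
       to_char(d, 'FMDay'),                        -- day_of_week
       extract(isodow from d),                     -- day_in_week
       extract(day from d),                        -- day_number_in_month
       extract(doy from d),                        -- day_number_in_year
       to_char(d, 'FMMonth'),                      -- month_name
       extract(month from d),                      -- month_number
       extract(year from d)                        -- year_number
FROM generate_series('1900-01-01'::date, '2100-12-31'::date, interval '1 day') AS s(d);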

Here is a small feature that showed up during the first commit fest of Postgres 9.5, allowing complementary information to be added to the process names displayed by a server:

commit 51adcaa0df81da5e94b582d47de64ebb17129937
Author: Andres Freund <andres@anarazel.de>
Date:   Sun Jun 29 14:15:09 2014 +0200

Add cluster_name GUC which is included in process titles if set.

When running several postgres clusters on one OS instance it's often
inconveniently hard to identify which "postgres" process belongs to
which postgres instance.

Add the cluster_name GUC, whose value will be included as part of the
process titles if set. With that processes can more easily identified
using tools like 'ps'.

To avoid problems with encoding mismatches between postgresql.conf,
consoles, and individual databases replace non-ASCII chars in the name
with question marks. The length is limited to NAMEDATALEN to make it
less likely to truncate important information at the end of the
status.

Thomas Munro, with some adjustments by me and review by a host of people.

This is helpful to identify which server a process is attached to when running multiple instances on the same host. Here is for example the case of two nodes: a master and a standby (feel free to not believe that, by the way!).
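
For reference, here is roughly how the two node names below would have been set, in each node's postgresql.conf before starting it:

# master's postgresql.conf
cluster_name = 'master'

# standby's postgresql.conf
cluster_name = 'standby'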

$ psql -At -p 5432 -c 'show cluster_name'
master
$ psql -At -p 5433 -c 'show cluster_name'
standby
$ ps x | grep "master\|standby" | grep -v 'grep'
80624   ??  Ss     0:00.00 postgres: standby: logger process
80625   ??  Ss     0:00.02 postgres: standby: startup process   recovering 000000010000000000000004
80633   ??  Ss     0:00.01 postgres: standby: checkpointer process
80634   ??  Ss     0:00.07 postgres: standby: writer process
80635   ??  Ss     0:00.00 postgres: standby: stats collector process
80655   ??  Ss     0:00.00 postgres: master: logger process
80657   ??  Ss     0:00.01 postgres: master: checkpointer process
80658   ??  Ss     0:00.07 postgres: master: writer process
80659   ??  Ss     0:00.04 postgres: master: wal writer process
80660

[continue reading]

Posted by Pavel Stehule on 2014-06-28 at 04:19:00
Short note: I did a backport of plpgsql_check to PostgreSQL 9.2.

What is plpgsql_check? It is a plpgsql validator that checks a function and all code paths inside it without the need to actually run it.
postgres=# select oid::regprocedure, plpgsql_check_function(oid::regprocedure) from pg_proc where prolang = 12599 and prorettype <> 2279;
oid │ plpgsql_check_function
───────────────┼─────────────────────────────────────────────────────────────────────────
bubu(integer) │ error:42702:3:RETURN:column reference "a" is ambiguous
bubu(integer) │ Query: SELECT (select * from t where t.a = a)
bubu(integer) │ -- ^
bubu(integer) │ Detail: It could refer to either a PL/pgSQL variable or a table column.
(4 rows)

Time: 4.857 ms
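
For the record, once the backported module is compiled and installed, loading it into a database should be the usual extension dance (assuming the backport is packaged as an extension, like the upstream project):

postgres=# CREATE EXTENSION plpgsql_check;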

You have arrived at the final chapter of this PostGIS introduction series. Before continuing, I recommend you read chapter one and chapter two first.

In the last chapter we finished by doing some real-world distance measuring and we saw how different projections produced different results.

Today I would like to take this practical approach a bit further and continue our work with real world data by showing you around the town of Kin in Okinawa. The town where I live.

A word before we start

In this chapter I want to do a few experiments together with you on real world data. To gather this data, I would like to use OpenStreetMap because it is not only open but also gives us handy tools to export map information.

We will use a tool called osm2pgsql to load our OSM data into PostGIS-enabled tables.

However, it is more common to import and export real world GIS data by using the semi-closed ESRI standard shapefile format. OpenStreetMap does not support exporting to this shapefile format directly, but exports to a more open XML file (.osm) instead.

Therefore, near the end of this post, we will briefly cover these shapefiles as well and see how we could import them into our PostgreSQL database. But for the majority of our work today, I will focus on the OpenStreetMap approach.

The preparation

Let us commence with this adventure by first getting all the GIS data related to the whole of Okinawa. We will only be interested in the data related to Kin town, but I need you to pull in a data set that is large enough (but still tiny in PostgreSQL terms) for us to experiment with indexing.

Hop online and download the file being served at the following URL: openstreetmap.org (Okinawa island). It is a file of roughly 180 MB and covers most of the Okinawan main island. Save the presented "map" file.

Next we will need to install a third party tool which is specifically designed to import this OSM file into PostGIS. This tool is called osm2pgsql and is available in many Linux distributions.

On a Debian system:

apt-get install osm

[continue reading]
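
The excerpt stops in the middle of the install command, but to sketch where this is heading: once osm2pgsql is installed, importing the downloaded file into a PostGIS-enabled database typically looks something like the following (the database name and file name are placeholders, and flags vary between osm2pgsql versions):

createdb okinawa
psql -d okinawa -c 'CREATE EXTENSION postgis;'
osm2pgsql -d okinawa map.osm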

A few weeks ago, I published a query to estimate index bloat. Since then, I went back to this query a few times to fix some drawbacks:

  • making it compatible from PostgreSQL 7.4 to latest releases
  • restrict to B-tree index only
  • remove psql variables (sorry for code readability and documentation)
  • improve 64 vs. 32 bits detection

This last one is actually far from perfect. Very bad estimations could arise if the query guesses wrongly about the size of pointers.

New version

Here is the gist of the new version of this query if you want to comment/patch: https://gist.github.com/ioguix/c29d5790b8b93bf81c27

And the code itself:

SELECT current_database(), nspname AS schemaname, c.relname AS tablename, indexname, bs*(sub.relpages)::bigint AS real_size,
  bs*otta::bigint as estimated_size,
  bs*(sub.relpages-otta)::bigint                                     AS bloat_size,
  bs*(sub.relpages-otta)::bigint * 100 / (bs*(sub.relpages)::bigint) AS bloat_ratio
  -- , index_tuple_hdr_bm, maxalign, pagehdr, nulldatawidth, nulldatahdrwidth, datawidth, sub.reltuples, sub.relpages -- (DEBUG INFO)
FROM (
  SELECT bs, nspname, table_oid, indexname, relpages, coalesce(
      ceil((reltuples*(4+nulldatahdrwidth))/(bs-pagehdr::float)) + 1, 0 -- ItemIdData size + computed avg size of a tuple (nulldatahdrwidth)
    ) AS otta
    -- , index_tuple_hdr_bm, maxalign, pagehdr, nulldatawidth, nulldatahdrwidth, datawidth, reltuples -- (DEBUG INFO)
  FROM (
    SELECT maxalign, bs, nspname, relname AS indexname, reltuples, relpages, relam, table_oid,
      ( index_tuple_hdr_bm +
          maxalign - CASE -- Add padding to the index tuple header to align on MAXALIGN
            WHEN index_tuple_hdr_bm%maxalign = 0 THEN maxalign
            ELSE index_tuple_hdr_bm%maxalign
          END
        + nulldatawidth + maxalign - CASE -- Add padding to the data to align on MAXALIGN
            WHEN nulldatawidth = 0 THEN 0
            WHEN nulldatawidth::integer%maxalign = 0 THEN maxalign
            ELSE nulldatawidth::integer%maxalign
          END
      )::nu

[continue reading]

Achievement unlocked!


With the merge of Pull Request #213 (PR #213), Kenji Uno added support for Visual Studio Design Time, also known as DDEX.

A most wanted feature


From the statistics of pageviews I get from a previous post about design time support, I can affirm that this is one of the most wanted features of Npgsql. And I'm very glad to say that Npgsql finally has it. I'd like to thank Kenji Uno for all his hard work to make this happen.

Npgsql DDEX support is done through an extension file which can be easily installed in Visual Studio. This is a much better situation than we had before, when it was necessary to use an experimental instance of Visual Studio.


History


The code started with wwindcloud in PR #67, and Kenji started to play with it and said he was working to get it usable in VS2012 and VS2013.


In order to be able to add more fixes to the code, Kenji opened PR #213 and kept adding features and fixes until now, when we merged all the changes back to the master branch.

How to use it


Kenji Uno wrote a tutorial about how to build and use this feature. Below are some screenshots from the tutorial.

Please, give it a try and let us know if it works for you. Remember that this is a work in progress, so you may face bugs and problems. 

I'd like to thank Kenji Uno for all his hard work to get this feature working. And also thank wwindcloud for the initial work.


Sample screenshots


Here is a screenshot of double-clicking the .vsix file to install the extension:

[screenshot]

When you install it, you will be able to select it when creating a new data source connection:

[screenshot]

Goat & Kid by Flickr user Bala Sivakumar

Bucardo 5, the next generation of the async multimaster replication system, has been released. This major release removes the previous two-source-database limitation, allowing you to have as many sources (aka masters) and as many targets (aka slaves) as you wish. Bucardo can also replicate to other targets, including MySQL, MariaDB, Oracle, SQLite, MongoDB, and Redis. Bucardo has been completely rewritten and is more powerful and efficient than the previous version, known as Bucardo 4. You can always find the latest version by visiting the Bucardo wiki.

This article will show a quick demonstration of Bucardo. Future posts will explore its capabilities further: for now, we will show how easy it is to get basic multimaster replication up and running.

For this demo, I used a quick and disposable server from Amazon Web Services (AWS), specifically a basic t1.micro server running Amazon Linux. If you want to follow along, it's free and simple to create your own instance. Once it is created and you have SSH'ed in as the ec2-user account, we can start to install PostgreSQL and Bucardo.

# Always a good idea:
$ sudo yum update
# This also installs other postgresql packages:
$ sudo yum install postgresql-plperl
# Create a new Postgres cluster:
$ initdb btest

We cannot start Postgres up yet, as this distro uses both /var/run/postgresql and /tmp for its socket directory. Once we adjust the permissions of the first directory, we can start Postgres, and then create our first test database:

$ sudo chmod 777 /var/run/postgresql
$ pg_ctl -D btest -l logfile start
$ createdb shake1

Next, we need something to replicate! For a sample dataset, I like to use the open source Shakespeare project. It's a small, free, simple schema that is easy to load. There's a nice little project on GitHub that contains a ready-made Postgres schema, so we will load that into our new database:

$ sudo yum install git
$ git clone -q https://github.com/catherinedevlin/opensourceshakespeare.git
$ psql shake1 -q

[continue reading]