Big Data is a buzzword doing the rounds these days, and the tool people generally associate with Big Data is Hadoop. While it is true that the sheer volume of data Hadoop can efficiently and cost-effectively manage is unparalleled, not all businesses actually need Hadoop. What if you don’t want to adopt yet another fairly complex technology (Hadoop) that you would need to maintain and build in-house expertise in? What if you could do it all in PostgreSQL?
Thanks to its rich and diverse feature set, I have found that PostgreSQL does a good enough job for most business cases. I have already talked about handling unstructured data with JSON, JSONB, and HSTORE. Another feature you can use for scaling is table partitioning.
Let me walk you through a very simple tutorial on how to partition tables in PostgreSQL to enable massive scale-out.
Before starting off, let’s create a new database and, inside it, a table that tracks events. The table simply stores the type of each event, the time it occurred, and any comments associated with it.
postgres=# CREATE DATABASE partition_sample;
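Note that the prompt in the statements below has already switched from postgres=# to partition_sample=#, which implies the session is connected to the new database. If you are following along in psql, a minimal way to do that is the \c meta-command (using the database name created above):

postgres=# \c partition_sample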
partition_sample=# CREATE TABLE events (id SERIAL, type INTEGER, time TIMESTAMP WITH TIME ZONE, comments TEXT);
1. Tables with inheritance and CHECK CONSTRAINT
The first step is to create the partitions, which will essentially be tables that inherit from ‘events’. Each of these tables will have a CHECK constraint specifying exactly what range of values is allowed in it. Below, I am setting up 9 such tables.
Table to contain events of type ‘1’ occurring in January 2015:
partition_sample=# create table events_1_2015_01 (CONSTRAINT events_type_time_check CHECK (time >= '2015-01-01' AND time < '2015-02-01' AND type = 1)) inherits(events);
Table to contain events of type ‘2’ occurring in January 2015:
partition_sample=# create table events_2_2015_01