With just a couple of weeks to go for PGDay India, I’m quite excited about the upcoming PostgreSQL conference in Bangalore on 26th February, 2016. This is going to be the biggest ever conglomeration of users, developers and companies interested in PostgreSQL in India. This is also by far the largest PostgreSQL conference in the country in terms of the number of speakers and attendees. Until last year, we practically had to go out and find speakers for the conference, but this year we had the good problem of rejecting almost 1 in every 2 submitted proposals. It’s almost certain that in coming years we’ll have to run either a multi-track or a multi-day event. Again, a nice problem to have.
For the first time, we are looking to easily cross the 100 mark in terms of attendees. We have secured 3 Platinum Sponsors, 2 Gold Sponsors, 1 Silver Sponsor and a media partner. All of this clearly indicates a fast-growing community of users and developers in India. The conference will see representation from many companies, including government organizations. Again, a solid demonstration of the reach of this fantastic open source database technology.
From 2ndQuadrant’s perspective, we are sponsoring the event at the highest level, showing our commitment to the adoption of PostgreSQL and our interest in this fast-growing market. Abhijit Menon-Sen will speak about “VACUUM Internals” and I’ll do a talk on “What’s New in Postgres-XL 9.5”. I’ve worked very hard over the last few months to improve Postgres-XL and catch it up with the latest release of PostgreSQL, so I’m quite excited to talk about the work we have done for this release. I’m keeping my fingers crossed, but I sincerely hope that Postgres-XL 9.5 Beta will be out before the conference.
So if you are a PostgreSQL user or a developer or just an IT enthusiast wanting to know about the technology, this is a great opportunity for you to learn from other users and hear from some of the finest developers of the technology. Visit PGDay.in for more details.
select sent,
       ts_headline(body_plain, to_tsquery('bikeshedding'),
                   'StartSel="", StopSel=""')
  from messages
 where body_tsvector @@ to_tsquery('bikeshedding')
 order by sent asc;

        sent         |                               ts_headline
---------------------+--------------------------------------------------------------------------
 2007-08-10 18:04:28 | bikeshedding on this list list is a very big danger.
                     | Advocacy is something everyone feels
 2008-02-20 14:55:14 | bikeshedding can start as to how to "use" it through the SQL interface.
                     |
                     | But, until
 2008-02-20 15:03:49 | bikeshedding can start as to how to "use" it through the SQL interface.
                     | >
                     | > But, until
 2009[...]
Sure, managing a Postgres cluster is great. We’ve got the software installed, more tool wrappers than we can wave several sticks at, and a running cluster that’s ready to do our bidding. But what do we do with it? In fact, what is a Postgres cluster anyway? How are databases involved? What about schemas? There’s critically important nomenclature that new users may find somewhat non-intuitive, so let’s demystify things a bit.
First, I’ll go on record that I never really liked the “cluster” label when it comes to how it’s often used in relation to Postgres. When people in the tech industry think of a cluster, they probably imagine something like this. Yes, a cluster can simply mean a grouping of related things, but context is everything. This gets even more confusing in larger contexts where a “cluster” of Postgres servers is a legitimate architecture, for horizontal scalability or otherwise.
So let’s back away for a second and consider the components themselves. Starting with the Postgres software, we have something similar to how programmers think of objects: instances. Classes are unusable until they’re instantiated as an object (barring virtual classes, interfaces, and other special cases). In this way, the Postgres software is just there, doing nothing, until there’s an instance that’s created and running. Whether we use pg_createcluster or some other wrapper to do this, no Postgres databases can exist anywhere without an instance to hold them.
The primary job of an instance is to merely exist; by itself, a Postgres instance is not usable. Postgres instances are assigned a port number for traffic control purposes, so it’s entirely possible to run multiple simultaneous instances on the same server. But otherwise, you can’t create tables directly in an instance, and an instance is certainly not a database.
This should not blow anyone’s mind, but there is an implicit lie here. Every new Postgres instance gets created along with a database named “postgres”. Because of this, it’s very common for new users to consider an ins[...]
One of the problems with Postgres-XL 9.2 is that it assigns a global transaction identifier (GXID) for every transaction that is started in the cluster. Since it’s hard for the system to know whether a transaction is read-only or not (remember, even SELECT queries can do write activities), Postgres-XL would always assign a GXID and send that to the datanodes. This is quite bad for system performance, because every read-only transaction must now go to the GTM, start a new transaction, grab a GXID and also finish the transaction on the GTM when it’s committed or aborted. For short transactions, like a read-only pgbench workload, this adds a lot of overhead and latency. The overhead is even more noticeable when the actual work done by the transaction is very fast, say because the data is fully cached in shared buffers.
This is now set to change in the upcoming Postgres-XL 9.5 release. We have implemented lazy GXID assignment, similar to the technique PostgreSQL uses. This required us to enhance the concept of a global session. A transaction which involves more than one node runs in a global session and gets a unique global session identifier. If a node, while executing a query, decides to assign a GXID to the transaction, it sends the global session identifier to the GTM. If the global transaction, identified by the global session, has already been assigned a GXID, the GTM sends back the same GXID to the node; otherwise a new GXID is assigned to the transaction. Further, the node sends the assigned GXID back to the coordinator so that it can decide whether to run a single-phase or two-phase commit for the transaction.
While lazy GXID assignment resulted in a significant reduction in the GXID consumption rate, it’s still not enough, because of yet another problem that we must solve. As you’re aware, Postgres-XL uses Multi-Version Concurrency Control (MVCC) to provide various transaction guarantees. But to reclaim dead row versions, every node must check row versions against something called RecentGlobalXmin which, in simple terms,[...]
I’ve noticed an increasing tendency among PostgreSQL users to over-index tables, often constructing very complex partial indexes to try to speed up very particular queries.
Be careful about doing this. Not only do additional indexes increase the plan time, they greatly increase insert time.
By way of example, I created a table with a single bigint column, and populated it with:
time psql -c "insert into t select generate_series(1, 100000000)"
The run without any indexes took 1 minute, 55 seconds; the run with eight indexes on the same table took 26 minutes, 39 seconds, almost 14 times slower.
So, periodically check pg_stat_user_indexes and drop indexes that aren’t being used. Your disk space and insert times will thank you.
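As a starting point, a query along these lines (my sketch, not from the original post) lists indexes that have never been scanned since the statistics were last reset:

-- Find indexes with zero recorded scans, biggest first.
-- Caveat: primary key and unique indexes enforce constraints even
-- if they are never scanned, so don't drop those blindly.
SELECT schemaname, relname, indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
  FROM pg_stat_user_indexes
 WHERE idx_scan = 0
 ORDER BY pg_relation_size(indexrelid) DESC;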
I have just pushed code for a new version of the codebase for Planet PostgreSQL.
For those of you who are just reading Planet, hopefully nothing at all should change. There will probably be some bugs early on, but there are no general changes in functionality. If you notice something that is wrong (give it at least a couple of hours from this post, though), please send an email to planet(at)postgresql.org and we'll look into it!
For those who have your blogs aggregated at Planet PostgreSQL, there are some larger changes. In particular, you will notice the whole registration interface has been re-written. Hopefully that will make it easier to register blogs, and also to manage the ones you have (such as removing a post that needs to be hidden). The other major change is that Planet PostgreSQL will now email you whenever something has been fetched from your blog, to help you catch configuration mistakes sooner.
By far the largest changes are in the moderation and administration backend. This will hopefully lead to faster processing of blog submissions, and less work for the moderators.
Postgres supports a variety of data types that allow data architects to store their data consistently, enforce constraints through validation, maximize performance, and maximize space. Recently, I was asked to show someone how to store a UUID (Universally Unique Identifier) into Postgres by way of JPA (Java Persistence API).
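On the Postgres side, the storage half of that is simple; here is a minimal sketch (the table is hypothetical, and gen_random_uuid() assumes the pgcrypto extension is available):

-- pgcrypto provides gen_random_uuid() for server-side generation.
CREATE EXTENSION IF NOT EXISTS pgcrypto;

CREATE TABLE account (
    id   uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    name text NOT NULL
);

The JPA side then maps a java.util.UUID field onto that uuid column; the exact annotations depend on your JPA provider.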
It is always exciting to meet community members and exchange ideas about PostgreSQL and the eco-system around the database. I was lucky enough to be a part of FOSDEM PGDay in Brussels on 29th January this year. 3 of 2ndQuadrant’s very best spoke at the event. If you missed the talks, you can take a look at their slide decks shared below.
I will be sure to direct any queries you might have to the experts themselves! :-)
The slides from my talk at PG Day at FOSDEM 2016 are now available.
The 9th annual “Prague PostgreSQL Developer Day” conference will be held on 17th and 18th February in Prague.
Me with PostgreSQL elephants at Prague Zoo :)
Prague PostgreSQL Developer Day is a two-day conference, and the talks & trainings are mostly in Czech. The conference program is also ready; you can check the full schedule.
On the first day of the conference, there will be trainings and practical lectures for a smaller number of attendees. You can find the detailed training program on the conference site. All trainings will be in Czech.
One of the most interesting ones is the “Reading query execution plans” training. With this training, Tomas Vondra from 2ndQuadrant aims to help trainees understand the basic stages of processing a SQL query (parsing, planning and execution) and how the database tries to find the best plan.
If you are more into PostgreSQL database administration, you might want to join Pavel Stěhule’s “Configuration, administration and monitoring PostgreSQL” training. He will focus on monitoring PostgreSQL to prevent problems and unwanted downtime, as well as on figuring out possible bottlenecks in PostgreSQL systems.
If you are interested in the awesome PostGIS, you might consider joining Jan Michalek’s “Introduction to PostGIS” training. After the training, participants are expected to be able to perform simple spatial queries, import / export data, etc.
The last training that I would like to mention is Aleš Zelený’s “Learning PostgreSQL for Oracle DBAs”. He will show differences and similarities between Oracle and PostgreSQL from the perspective of an Oracle DBA. If you are interested in hearing about his journey of adapting to PostgreSQL, coming from a different DBMS background and a different set of experiences, you might want to join his training.
With these 4 trainings, the first day will end, and the conference will continue with talks in both Czech and English.
Databases form the cornerstone of many applications, Web sites, and platforms. A huge amount of time, money, and research has been poured into databases over the last few decades. But our thirst for data, and the quantities that we’re trying to read and analyze, continue to grow. What can and should we do? How can we ensure reliability? How can we communicate with a growing number of other systems? And where does PostgreSQL, an open-source relational database with a growing number of features, fit into this trend? In this talk, Simon Riggs answers all of these questions, and describes how PostgreSQL’s developers keep them in mind as they continue to improve their contribution to the world of databases.
Time: 22 minutes
The post [Video 435] Simon Riggs: Databases — The Long View appeared first on Daily Tech Video.
Say that three times fast! Joking aside, managing database object access is a headache for users and DBAs alike. Postgres isn’t really an exception in that regard, but it does provide a few tools to greatly reduce the pain involved. The crazy thing is that few people even know this feature exists. I’ve known about it for a while myself, but it always slips my mind because it feels so wrong not to explicitly grant permissions.
What does that mean? Well, consider a schema with some existing tables, and a role that we want to have read permission on tables in that schema. Here’s an example:
CREATE SCHEMA acl_app;

CREATE TABLE acl_app.t1 (foo INT);
CREATE TABLE acl_app.t2 (bar INT);

CREATE ROLE acl_read;
GRANT USAGE ON SCHEMA acl_app TO acl_read;
If this were a preexisting schema, normally we would grant read permission to tables like this:
GRANT SELECT ON acl_app.t1 TO acl_read;
GRANT SELECT ON acl_app.t2 TO acl_read;
And that’s also the usual suggestion for grants after tables are created. Create the table, then grant the permissions. It’s fairly straight-forward, and an expected part of database administration. But what about when we have an existing schema with dozens or hundreds of tables? Doing the grants manually as shown above would be monumentally irritating! In fact, for many database systems, the only way forward is to use the system catalogs to generate a script, and then execute the output in the database. Postgres lets you do that:
COPY (
    SELECT 'GRANT SELECT ON acl_app.' || tablename || ' TO acl_read;'
      FROM pg_tables
     WHERE schemaname = 'acl_app'
) TO '/tmp/grant_perms.sql';

\i /tmp/grant_perms.sql
But the kind Postgres devs have also provided us with some extremely useful shorthand, because while usable, the script approach is something of an ugly hack. Here’s how that looks:
GRANT SELECT ON ALL TABLES IN SCHEMA acl_app TO acl_read;

\z acl_app.*

 Schema  | Name | Type  | Access privileges
---------+------+-------+----------
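For tables that don’t exist yet, the related trick is default privileges. A sketch of how that might look for this schema (note that ALTER DEFAULT PRIVILEGES only affects objects later created by the role that runs it, unless FOR ROLE is specified):

-- Future tables in acl_app automatically grant SELECT to acl_read.
ALTER DEFAULT PRIVILEGES IN SCHEMA acl_app
    GRANT SELECT ON TABLES TO acl_read;

CREATE TABLE acl_app.t3 (baz INT);
\z acl_app.t3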
When: 6-8pm Thursday Feb 18, 2016
Who: Jason Owen
What: Incremental Schema Discovery via JSON Wrangling
Over the last few releases, Postgres has added and expanded support for storing and querying JSON documents. While the simplicity and flexibility of storing free-form data can be appealing, frequently what’s stored is actually fairly structured. Using data from the GitHub Archive, Jason will show how easy it is to load JSON into Postgres, demonstrate some fairly mechanical schema refactorings to extract structure from JSON data, and then compare the resulting normalized tables with the original, with the documentation, and with the data set loaded into Google BigQuery.
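To give a flavor of the kind of mechanical refactoring involved, here is a sketch of my own (not from the talk; the document shape is hypothetical) of pulling typed columns out of raw jsonb:

-- Raw documents land in a single jsonb column.
CREATE TABLE events_raw (doc jsonb NOT NULL);

-- Extract the stable top-level fields into typed columns,
-- keeping the nested remainder as jsonb.
CREATE TABLE events AS
SELECT (doc->>'id')::bigint              AS id,
       doc->>'type'                      AS type,
       (doc->>'created_at')::timestamptz AS created_at,
       doc->'payload'                    AS payload
  FROM events_raw;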
Jason has been using Postgres in earnest for two years now. While he’s worn the DBA hat as a full stack engineer from time to time, he is primarily a developer. He works at Tripwire and is an alumnus of Portland State University. Sometimes he says things online as @jasonaowen.
If you have a job posting or event you would like me to announce at the meeting, please send it along. The deadline for inclusion is 5pm the day before the meeting.
Our meeting will be held at Iovation, on the 32nd floor of the US Bancorp Tower at 111 SW 5th (5th & Oak). It’s right on the Green & Yellow Max lines. Underground bike parking is available in the parking garage; outdoors all around the block in the usual spots. No bikes in the office, sorry!
Elevators open at 5:45 and building security closes access to the floor at 6:30.
When you arrive at the Iovation office, please sign in on the iPad at the reception desk.
See you there!
PgBouncer is a lightweight connection pooler for PostgreSQL. PgBouncer 1.7 was announced on the 18th of December 2015. In this blog post we’ll talk about the major new improvements in PgBouncer.
I enjoy checking their funny release names at every new release, and to celebrate PgBouncer 1.7, the “Colors Vary After Resurrection” release, I used the lovely Dumbo image.
Note: TLS (Transport Layer Security) is a protocol that ensures privacy between communicating applications and their users on the Internet. When a server and client communicate, TLS ensures that no third party may eavesdrop or tamper with any message. TLS is the successor to the Secure Sockets Layer (SSL).
Note: Traditionally, TLS Client Authentication has been considered the alternative to bearer tokens (passwords and cookies) for web authentication. In TLS Client Authentication, the client (browser) uses a certificate to authenticate itself during the TLS handshake. Once the TLS connection is established (and authenticated), the client and server run HTTP on top of the TLS layer.
Let’s dig into the details of PgBouncer’s TLS settings. There are 14 config parameters related to TLS setup (client-side + server-side settings).
To assign which TLS mode to use for connections from clients, we should set the client_tls_sslmode parameter. TLS connections are disabled by default. When enabled, client_tls_key_file and client_tls_cert_file must also be configured to set up the key and cert PgBouncer uses to accept client connections.
We can assign a root certificate to validate client certificates by setting the client_tls_ca_file parameter; the default is unset.
We can specify which TLS protocol versions are allowed by setting the client_tls_protocols parameter; the default is all.
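Putting the client-side parameters together, a minimal pgbouncer.ini fragment might look like this (the paths are placeholders):

[pgbouncer]
;; Require TLS from connecting clients, using this key/cert pair.
client_tls_sslmode = require
client_tls_key_file = /etc/pgbouncer/pgbouncer.key
client_tls_cert_file = /etc/pgbouncer/pgbouncer.crt
;; Optionally validate client certificates against a CA:
;client_tls_ca_file = /etc/pgbouncer/root.crt
;; Optionally restrict the allowed protocol versions (default: all):
;client_tls_protocols = tlsv1.2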
For more detailed cl[...]
The PostgreSQL statistics collector generates a lot of very important statistics about the state of the database. If it’s not working, autovacuum doesn’t work, among other problems. But it does generate a lot of write activity, and by default, that goes back onto the database volume.
Instead, always set stats_temp_directory to point to a RAM disk (which has to be owned by the PostgreSQL user, with 0600 permissions). The statistics are written back to the database on shutdown, so in normal operations, you won’t lose anything on a reboot. (You’ll lose the most recent statistics on a crash, but you would anyway; the statistics are reset on recovery operations, including restart from a crash.)
This can substantially cut down the amount of write I/O the main database storage volume has to receive, and it’s free!
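Concretely, that can be as simple as mounting a small tmpfs and pointing Postgres at it; a sketch (the path, size, and the postgres user's numeric uid/gid are placeholders for your system):

# Mount a small RAM disk owned by the postgres user
# (mode 0700 so the directory itself is traversable).
mount -t tmpfs -o size=64M,uid=26,gid=26,mode=0700 tmpfs /var/lib/pgsql_stats_tmp

# postgresql.conf
stats_temp_directory = '/var/lib/pgsql_stats_tmp'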
I recently noted that the COPY command in Postgres doesn’t have syntax to skip columns in source data when importing it into a table. This necessitates using one or more junk columns to capture data we’ll just be throwing away. During that, I completely forgot that friendly devs had contributed alternative file handling methods as Foreign Data Wrappers. Most people think of foreign wrappers as a method for interacting with remote databases. Perusing the full list, however, reveals some surprising data sources. Twitter? Hive?
Well, let’s take a quick look at a real case. Here’s a table we want to import data into. We can’t insert into it directly, because of that extra column. So we’ll use an intermediate table as a raw import target, then insert into the target table. Let’s use COPY as a first approximation with 500,000 rows:
CREATE UNLOGGED TABLE option_quote_raw (
    ext_quote_id   BIGINT,
    quote_date     DATE NOT NULL,
    quote_time_ns  BIGINT NOT NULL,
    seq_no         BIGINT NOT NULL,
    sub_seq_no     BIGINT NOT NULL,
    exchange_id    INT NOT NULL,
    time_us        BIGINT NOT NULL,
    feed_side      VARCHAR NOT NULL,
    flag           INT NOT NULL,
    bid            NUMERIC(16, 8) NOT NULL,
    ask            NUMERIC(16, 8) NOT NULL,
    bid_size       INT NOT NULL,
    ask_size       INT NOT NULL,
    trash          TEXT
);

CREATE UNLOGGED TABLE option_quote (
    ext_quote_id   BIGINT,
    quote_date     DATE NOT NULL,
    quote_time_ns  BIGINT NOT NULL,
    seq_no         BIGINT NOT NULL,
    sub_seq_no     BIGINT NOT NULL,
    exchange_id    INT NOT NULL,
    time_us        BIGINT NOT NULL,
    feed_side      VARCHAR NOT NULL,
    flag           INT NOT NULL,
    bid            NUMERIC(16, 8) NOT NULL,
    ask            NUMERIC(16, 8) NOT NULL,
    bid_size       INT NOT NULL,
    ask_size       INT NOT NULL
);

CREATE INDEX idx_quote_id ON option_quote (ext_quote_id);

\timing on

COPY option_quote_raw FROM '/tmp/quote_source.csv' WITH CSV;
Time: 1705.967 ms

INSERT INTO option_quote
SELECT ext_quote_id, quote_date, quote_time_ns, seq_no, sub_seq_no,
       exchange_id, time_us, feed_side, flag, bid, ask,
       bid_size, ask_size
  FROM option_quote_raw;
Time: 2062.863 ms
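For comparison, here is roughly what the file_fdw route looks like (a sketch: the server and foreign table names are mine, and the foreign table simply mirrors the CSV layout, trash column included):

CREATE EXTENSION file_fdw;

CREATE SERVER file_server FOREIGN DATA WRAPPER file_fdw;

CREATE FOREIGN TABLE option_quote_csv (
    ext_quote_id   BIGINT,
    quote_date     DATE,
    quote_time_ns  BIGINT,
    seq_no         BIGINT,
    sub_seq_no     BIGINT,
    exchange_id    INT,
    time_us        BIGINT,
    feed_side      VARCHAR,
    flag           INT,
    bid            NUMERIC(16, 8),
    ask            NUMERIC(16, 8),
    bid_size       INT,
    ask_size       INT,
    trash          TEXT
) SERVER file_server
OPTIONS (filename '/tmp/quote_source.csv', format 'csv');

-- Select straight from the file, keeping only the columns we want.
INSERT INTO option_quote
SELECT ext_quote_id, quote_date, quote_time_ns, seq_no, sub_seq_no,
       exchange_id, time_us, feed_side, flag, bid, ask,
       bid_size, ask_size
  FROM option_quote_csv;

This skips the intermediate heap entirely, at the cost of re-reading the file on every query against the foreign table.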
If you already have a working PostgreSQL 9.5 install, and just want to skip to relevant sections, follow this list:
As a general note, these instructions are what I did for CentOS 7. For lower versions there are some differences in the packages you'll get. For example, currently if you are installing on CentOS 6 (and, I presume, by extension others in the 6 family), you won't get SFCGAL and might have pgRouting 2.0 (instead of 2.1).
The recent release of Postgres 9.5 has many people excited about the big new features such as UPSERT (docs) and row-level security (docs). Today I would like to celebrate three of the smaller features that I love about this release.
Before jumping into my list, I'd like to thank everyone who contributes to Postgres. I did some quick analysis and found that 85 people, from Adrien to Zeus, have helped with version 9.5 of Postgres, at least according to the git logs. Of course, that number is actually higher, as it doesn't take into account people helping out on the #postgresql channel, running buildfarm animals, doing packaging work, keeping the infrastructure running, etc. Thanks to you all!
The first feature is one I've wanted for a long time: a verbose form of the REINDEX command. Thanks to Sawada Masahiko for adding this. Like VACUUM, REINDEX gets kicked off and then gives no progress or information until it finishes. While VACUUM has long had the VERBOSE option to get around this, REINDEX gives you no clue as to which index it is working on, or how much work each index took to rebuild. Here is a normal reindex, along with another 9.5 feature, the ability to reindex an entire schema:
greg=# reindex schema public;
## What seems like five long minutes later...
REINDEX
The new syntax uses parentheses to support VERBOSE and any other future options. If you are familiar with EXPLAIN's newer options, you may see a similarity. More on the syntax in a bit. Here is the much improved version in action:
greg=# reindex (verbose) schema public;
INFO:  index "foobar_pkey" was reindexed
DETAIL:  CPU 11.00s/0.05u sec elapsed 19.38 sec.
INFO:  index "foobar_location" was reindexed
DETAIL:  CPU 5.21s/0.05u sec elapsed 18.27 sec.
INFO:  index "location_position" was reindexed
DETAIL:  CPU 9.10s/0.05u sec elapsed 19.70 sec.
INFO:  table "public.foobar" was reindexed
INFO:  index "foobaz_pkey" was reindexed
DETAIL:  CPU 7.04s/0.05u sec elapsed 19.61 sec.
INFO:  index[...]