pg_qualstats is a PostgreSQL extension keeping statistics on predicates found
in WHERE
statements and JOIN
clauses.
This is useful if you want to be able to analyze what are the most-often executed quals (predicates) on your database. The powa project makes use of this to provide index suggestions.
It also allows you to identify correlated columns, by identifying which columns are most frequently queried together.
The extension works by looking for known patterns in queries. Currently, this includes:
-
Binary OpExpr where at least one side is a column from a table. Whenever possible, the predicate will be swaped so that CONST OP VAR expressions are turned into VAR COMMUTED_OP CONST. AND and OR expression members are counted as separate entries. Ex: WHERE column1 = 2, WHERE column1 = column2, WHERE 3 = column3
-
ScalarArrayOpExpr where the left side is a VAR, and the right side is an array constant. Those will be counted one time per element in the array. Ex: WHERE column1 IN (2, 3) will be counted as 2 occurences for the (column1, '=') operator pair
This project is sponsored by Dalibo
- Compatible with PostgreSQL 9.4
- Needs postgresql header files
- sudo make install
- Add pg_qualstats to the shared preload libraries:
shared_preload_libraries = 'pg_qualstats'
The following GUCs can be configured, in postgresql.conf:
- pg_qualstats.enabled (boolean, default true): whether or not pg_qualstats should be enabled
- pg_qualstats.track_constants (bolean, default true): whether or not pg_qualstats should keep track of each constant value individually. Disabling this GUC will considerably reduce the number of entries necessary to keep track of predicates.
- pg_qualstats.max: the maximum number of statements tracked (defaults to 1000)
- pg_qualstats.resolve_oids (boolean, default false): whether or not pg_qualstats should resolve_oids at query time, or juste store the oids. Enabling this parameter makes the data analysis much more easy, since a connection to the database where the query was executed won't be necessary, but it will eat much more space (616 bytes per entry instead of 168). Additionnaly, this will require some catalog lookups, which aren't free.
- pg_qualstats.sample_rate (double, default -1): the fraction of queries that should be sampled. For example, 0.1 means that only one out of ten queries will be sampled. The default (-1) means automatic, and results in a value of 1 / max_connections, so that statiscally, concurrency issues will be rare.
- Create the extension in any database:
CREATE EXTENSION pg_qualstats;
The extension defines the following functions:
-
pg_qualstats: returns the counts for every qualifier, identified by the expression hash. This hash identifies each expression.
- userid: oid of the user who executed the query
- dbid: oid of the database in which the query has been executed
- lrelid, lattnum: oid of the relation and attribute number of the VAR on the left hand side, if any
- rrelid, rattnum: oid of the relation and attribute number of the VAR on the right hand side, if any
- parenthash: hash of the parent "AND" expression, if any. This is useful for identifying predicates which are used together
- nodehash: the predicate hash. Everything (down to constants) is used to compute this hash
- count: the total number of occurences of this predicate
- queryid: if pg_stats_statements is installed, the queryid identifying this query
- constvalue: a string representation of the right-hand side constant, if any, truncated to 80 bytes.
Example:
ro=# select * from pg_qualstats;
userid | dbid | lrelid | lattnum | opno | rrelid | rattnum | parenthash | nodehash | count | queryid | constvalue
--------+-------+--------+---------+------+--------+---------+-------------+------------+-------+---------+-------------
10 | 16546 | 1262 | 1 | 93 | | | 1167468204 | -312474735 | 1 | | 12::integer
10 | 16546 | | | 607 | 1262 | -2 | -1449854762 | 1327480291 | 1 | |
- pg_qualstats_reset: reset the internal counters and forget about every encountered qual.
In addition to that, the extension defines some views on top of the pg_qualstats function:
-
pg_qualstats: filters calls to pg_qualstats() by the current database.
-
pg_qualstats_pretty: performs the appropriate joins to display a readable form for every attribute from the pg_qualstats view
Example:
ro=# select * from pg_qualstats_pretty;
left_schema | left_table | left_column | operator | right_schema | right_table | right_column | count
-------------+------------------+-------------+--------------+--------------+-------------+--------------+-------
public | pgbench_accounts | aid | pg_catalog.= | | | | 20
public | pgbench_tellers | tid | pg_catalog.= | | | | 10
public | pgbench_branches | bid | pg_catalog.= | | | | 10
-
pg_qualstats_all: sums the counts for each attribute / operator pair, regardless of its position as an operand (LEFT or RIGHT), grouping together attributes used in AND clauses.
Example:
ro=# select * from pg_qualstats_all;
relid | attnums | opno | parenthash | count
-------+---------+------+-------------+-------
74150 | {1,3} | 96 | -1878264478 | 2
74153 | {1} | 96 | 0 | 10
74156 | {1} | 96 | 0 | 20
74159 | {1} | 96 | 0 | 10
- pg_qualstats_indexes: looks up those attributes for which an index doesn't exist with the attribute in first position.
Example:
ro=# select * from pg_qualstats_indexes;
relid | attnames | possible_types | count
------------------+-----------------------------+----------------+-------
pgbench_accounts | {filler} | {btree,hash} | 5
pgbench_accounts | {bid} | {btree,hash} | 2
pgbench_accounts | {bid,filler} | {btree,hash} | 8
(9 rows)
- pg_qualstats_by_query: returns only predicates of the form VAR OPERATOR CONSTANT, aggregated by queryid.