To make some common tasks easier for the user, there are some functions to interact with the aforementioned tables:
Name | Description |
---|---|
blame(repository, commit, file) |
Returns an array of lines changes and authorship for the specific file and commit. |
commit_file_stats(repository_id, [from_commit_hash], to_commit_hash) json array |
returns an array with the stats of each file in to_commit_hash since the given from_commit_hash . If from_commit_hash is not given, the parent commit will be used. Vendored files stats are not included in the result of this function. This function is more thoroughly explained later in this document. |
commit_stats(repository_id, [from_commit_hash], to_commit_hash) json |
returns the stats between two commits for a repository. If from_commit_hash is empty, it will compare the given to_commit_hash with its parent commit. Vendored files stats are not included in the result of this function. This function is more thoroughly explained later in this document. |
is_remote(reference_name)bool |
checks if the given reference name is from a remote one. |
is_tag(reference_name)bool |
checks if the given reference name is a tag. |
is_vendor(file_path)bool |
checks if the given file name is a vendored file. |
language(path, [blob])text |
gets the language of a file given its path and the optional content of the file. |
loc(path, blob) json |
returns a JSON map, containing the lines of code of a file, separated in three categories: Code, Blank and Comment lines. |
uast(blob, [lang, [xpath]]) blob |
returns a node array of UAST nodes in semantic mode. |
uast_children(blob) blob |
returns a flattened array of the children UAST nodes from each one of the UAST nodes in the given array. |
uast_extract(blob, key) text array |
extracts information identified by the given key from the uast nodes. |
uast_imports(blob) text array |
returns all imports given the specified UAST blob. |
uast_mode(mode, blob, lang) blob |
returns a node array of UAST nodes specifying its language and mode (semantic, annotated or native). |
uast_xpath(blob, xpath) blob |
performs an XPath query over the given UAST nodes. |
version() text |
returns the gitbase version in the following format 8.0.11-{GITBASE_VERSION} for compatibility with MySQL versioning. |
These are all functions that are available because they are implemented in go-mysql-server
, used by gitbase.
Name | Description |
---|---|
ARRAY_LENGTH(json) |
if the json representation is an array, this function returns its size. |
AVG(expr) |
returns the average value of expr in all rows. |
CEIL(number) |
returns the smallest integer value that is greater than or equal to number . |
CEILING(number) |
returns the smallest integer value that is greater than or equal to number . |
CHAR_LENGTH(str) |
returns the length of the string in characters. |
COALESCE(...) |
returns the first non-null value in a list. |
CONCAT(...) |
concatenates any group of fields into a single string. |
CONCAT_WS(sep, ...) |
concatenates any group of fields into a single string. The first argument is the separator for the rest of the arguments. The separator is added between the strings to be concatenated. The separator can be a string, as can the rest of the arguments. If the separator is NULL, the result is NULL. |
CONNECTION_ID() |
returns the current connection ID. |
COUNT(expr) |
returns a count of the number of non-NULL values of expr in the rows retrieved by a SELECT statement. |
DATE_ADD(date, interval) |
adds the interval to the given date . |
DATE_SUB(date, interval) |
subtracts the interval from the given date . |
DAY(date) |
is a synonym for DAYOFMONTH(). |
DATE(date) |
returns the date part of the given date . |
DAYOFMONTH(date) |
returns the day of the month (0-31). |
DAYOFWEEK(date) |
returns the day of the week of the given date . |
DAYOFYEAR(date) |
returns the day of the year of the given date . |
FIRST(expr) |
returns the first value in a sequence of elements of an aggregation. |
FLOOR(number) |
returns the largest integer value that is less than or equal to number . |
FROM_BASE64(str) |
decodes the base64-encoded string str . |
GREATEST(...) |
returns the greatest numeric or string value. |
HOUR(date) |
returns the hours of the given date . |
IFNULL(expr1, expr2) |
if expr1 is not NULL, it returns expr1 ; otherwise it returns expr2 . |
IS_BINARY(blob) |
returns whether a blob is a binary file or not. |
JSON_EXTRACT(json_doc, path, ...) |
extracts data from a json document using json paths. Extracting a string will result in that string being quoted. To avoid this, use JSON_UNQUOTE(JSON_EXTRACT(json_doc, path, ...)) . |
JSON_UNQUOTE(json) |
unquotes JSON value and returns the result as a utf8mb4 string. |
LAST(expr) |
returns the last value in a sequence of elements of an aggregation. |
LEAST(...) |
returns the smaller numeric or string value. |
LENGTH(str) |
returns the length of the string in bytes. |
LN(X) |
returns the natural logarithm of X . |
LOG(X), LOG(B, X) |
if called with one parameter, this function returns the natural logarithm of X . If called with two parameters, this function returns the logarithm of X to the base B . If X is less than or equal to 0, or if B is less than or equal to 1, then NULL is returned. |
LOG10(X) |
returns the base-10 logarithm of X . |
LOG2(X) |
returns the base-2 logarithm of X . |
LOWER(str) |
returns the string str with all characters in lower case. |
LPAD(str, len, padstr) |
returns the string str , left-padded with the string padstr to a length of len characters. |
LTRIM(str) |
returns the string str with leading space characters removed. |
MAX(expr) |
returns the maximum value of expr in all rows. |
MID(str, pos, [len]) |
returns a substring from the provided string starting at pos with a length of len characters. If no len is provided, all characters from pos until the end will be taken. |
MIN(expr) |
returns the minimum value of expr in all rows. |
MINUTE(date) |
returns the minutes of the given date . |
MONTH(date) |
returns the month of the given date . |
NOW() |
returns the current timestamp. |
NULLIF(expr1, expr2) |
returns NULL if expr1 = expr2 is true, otherwise returns expr1 . |
POW(X, Y) |
returns the value of X raised to the power of Y . |
REGEXP_MATCHES(text, pattern, [flags]) |
returns an array with the matches of the pattern in the given text . Flags can be given to control certain behaviours of the regular expression. Currently, only the i flag is supported, to make the comparison case insensitive. |
REPEAT(str, count) |
returns a string consisting of the string str repeated count times. |
REPLACE(str,from_str,to_str) |
returns the string str with all occurrences of the string from_str replaced by the string to_str . |
REVERSE(str) |
returns the string str with the order of the characters reversed. |
ROUND(number, decimals) |
rounds the number to decimals decimal places. |
RPAD(str, len, padstr) |
returns the string str , right-padded with the string padstr to a length of len characters. |
RTRIM(str) |
returns the string str with trailing space characters removed. |
SECOND(date) |
returns the seconds of the given date . |
SLEEP(seconds) |
waits for the specified number of seconds (can be fractional). |
SOUNDEX(str) |
returns the soundex of a string. |
SPLIT(str,sep) |
returns the parts of the string str split by the separator sep as a JSON array of strings. |
SQRT(X) |
returns the square root of a nonnegative number X . |
SUBSTR(str, pos, [len]) |
returns a substring from the string str starting at pos with a length of len characters. If no len is provided, all characters from pos until the end will be taken. |
SUBSTRING(str, pos, [len]) |
returns a substring from the string str starting at pos with a length of len characters. If no len is provided, all characters from pos until the end will be taken. |
SUM(expr) |
returns the sum of expr in all rows. |
TO_BASE64(str) |
encodes the string str in base64 format. |
TRIM(str) |
returns the string str with all spaces removed. |
UPPER(str) |
returns the string str with all characters in upper case. |
WEEKDAY(date) |
returns the weekday of the given date . |
YEAR(date) |
returns the year of the given date . |
YEARWEEK(date, mode) |
returns year and week for a date. The year in the result may be different from the year in the date argument for the first and the last week of the year. |
These functions make use of UAST version 2, so you should get familiar with the concepts explained in the bblfsh documentation.
The data returned by these functions is a serialized array node using protobuf which contains UAST object nodes.
As an example of how to manage the serialized data programatically, checkout out the Go code below:
import (
"bytes"
"fmt"
"gopkg.in/bblfsh/sdk.v2/uast/nodes"
"gopkg.in/bblfsh/sdk.v2/uast/nodes/nodesproto"
)
func marshalNodes(arr nodes.Array) (interface{}, error) {
if len(arr) == 0 {
return nil, nil
}
buf := &bytes.Buffer{}
if err := nodesproto.WriteTo(buf, arr); err != nil {
return nil, err
}
return buf.Bytes(), nil
}
func unmarshalNodes(data []byte) (nodes.Array, error) {
if len(data) == 0 {
return nil, nil
}
buf := bytes.NewReader(data)
n, err := nodesproto.ReadTree(buf)
if err != nil {
return nil, err
}
if n.Kind() != nodes.KindArray {
return nil, fmt.Errorf("unmarshal: wrong kind of node found %q, expected %q",
n.Kind(), nodes.KindArray.String())
}
return n.(nodes.Array), nil
}
Have a look at the bblfsh docs to query UASTs with XPath.
Check out the UAST v2 specification to know what an UAST node represents.
uast_extracts
accepts special selectors to match common node properties:
@type
@token
@role
@pos
Using these selectors as in,
uast_extract(nodes_column, @common_selector)
you will extract the value of that property for each node.
Nodes that have no value for the requested property will not be present in any way in the final array. That is, having a sequence of nodes [node-1, node-2, node-3]
and knowing that node-2 doesn't have a value for the requested property, the returned array will be [prop-1, prop-3]
.
Also, if you want to retrieve values from a non common property, you can pass it directly
uast_extract(nodes_column, 'some-property')
loc
will return statistics about the lines of code in a file, such as the code lines, comment lines, etc.
It requires a file path and a file content.
loc(file_path, blob_content)
The result of this function is a JSON document with the following shape:
{
"Code": code lines,
"Comment": comment lines,
"Blank": blank lines,
"Name": file name,
"Lang": language
}
commit_file_stats
will return statistics about the line changes in all files in the given range of commits classifying them in 4 categories: code, comments, blank lines and other.
It can be used in two ways:
- To get the statistics of files in a specific commit
COMMIT_FILE_STATS(repository_id, commit_hash)
- To get the statistics of files in a commit range
COMMIT_FILE_STATS(repository_id, from_commit, to_commit)
The result of this function is an array of JSON documents with the following shape:
{
"Path": file path,
"Language": file language,
"Code": {
"Additions": number of code additions in this file,
"Deletions": number of code deletions in this file,
},
"Comment": {
"Additions": number of comment line additions in this file,
"Deletions": number of comment line deletions in this file,
},
"Blank": {
"Additions": number of blank line additions in this file,
"Deletions": number of blank line deletions in this file,
},
"Other": {
"Additions": number of other additions in this file,
"Deletions": number of other deletions in this file,
},
"Total": {
"Additions": number of total additions in this file,
"Deletions": number of total deletions in this file,
},
}
NOTE: Files that are considered vendored files are ignored for the purpose of computing these statistics. Note that .gitignore
is considered a vendored file.
Because the result of this function is an array of JSON documents, we will need two functions to make use of its data effectively:
EXPLODE
which will make each element in the array have its own rowJSON_EXTRACT
to get data from inside the documents
For example, to get the stats of the HEAD commits:
SELECT
repository_id,
EXPLODE(COMMIT_FILE_STATS(repository_id, commit_hash)) AS stats
FROM refs
WHERE ref_name = 'HEAD'
EXPLODE
here will make sure a single row is returned for every single result returned by COMMIT_FILE_STATS
instead of an array with all of them combined.
Then, to extract code additions from this:
SELECT
repository_id
JSON_EXTRACT(stats, '$.Code.Additions')
FROM (
SELECT
repository_id,
EXPLODE(COMMIT_FILE_STATS(repository_id, commit_hash)) AS stats
FROM refs
WHERE ref_name = 'HEAD'
) t
NOTE: When extracting Path
or Language
using JSON_EXTRACT
, by the way that function works, the result will be quoted (e.g. "Python"
instead of Python
). For that reason, for these two string fields JSON_EXTRACT
should be combined with JSON_UNQUOTE
like JSON_UNQUOTE(JSON_EXTRACT(stats, '$.Path'))
.
commit_stats
will return statistics about the line changes in the given range of commits classifying them in 4 categories: code, comments, blank lines and other.
It can be used in two ways:
- To get the statistics of a specific commit
COMMIT_STATS(repository_id, commit_hash)
- To get the statistics of the diff of a commit range
COMMIT_STATS(repository_id, from_commit, to_commit)
commit_stats
is pretty much an aggregation of the result of commit_file_stats
. While commit_file_stats
has the stats for each file in a commit, commit_stats
has the global stats of all files in the commit. As a result, it outputs a single structure instead of an array of them.
The shape of the result returned by this function is the following:
{
"Files": number of files changed in this commit,
"Code": {
"Additions": number of code additions in this commit,
"Deletions": number of code deletions in this commit,
},
"Comment": {
"Additions": number of comment line additions in this commit,
"Deletions": number of comment line deletions in this commit,
},
"Blank": {
"Additions": number of blank line additions in this commit,
"Deletions": number of blank line deletions in this commit,
},
"Other": {
"Additions": number of other additions in this commit,
"Deletions": number of other deletions in this commit,
},
"Total": {
"Additions": number of total additions in this commit,
"Deletions": number of total deletions in this commit,
},
}
NOTE: Files that are considered vendored files are ignored for the purpose of computing these statistics. Note that .gitignore
is considered a vendored file.
The result returned by this function is a JSON, which means that to access its fields, the use of JSON_EXTRACT
is needed.
For example, code additions would be accessed like this:
JSON_EXTRACT(COMMIT_STATS(repository_id, commit_hash), '$.Code.Additions')