Functions

gitbase functions

To make some common tasks easier for the user, there are some functions to interact with the aforementioned tables:

Name	Description
`blame(repository, commit, file)`	Returns an array of lines changes and authorship for the specific file and commit.
`commit_file_stats(repository_id, [from_commit_hash], to_commit_hash) json array`	returns an array with the stats of each file in `to_commit_hash` since the given `from_commit_hash`. If `from_commit_hash` is not given, the parent commit will be used. Vendored files stats are not included in the result of this function. This function is more thoroughly explained later in this document.
`commit_stats(repository_id, [from_commit_hash], to_commit_hash) json`	returns the stats between two commits for a repository. If `from_commit_hash` is empty, it will compare the given `to_commit_hash` with its parent commit. Vendored files stats are not included in the result of this function. This function is more thoroughly explained later in this document.
`is_remote(reference_name)bool`	checks if the given reference name is from a remote one.
`is_tag(reference_name)bool`	checks if the given reference name is a tag.
`is_vendor(file_path)bool`	checks if the given file name is a vendored file.
`language(path, [blob])text`	gets the language of a file given its path and the optional content of the file.
`loc(path, blob) json`	returns a JSON map, containing the lines of code of a file, separated in three categories: Code, Blank and Comment lines.
`uast(blob, [lang, [xpath]]) blob`	returns a node array of UAST nodes in semantic mode.
`uast_children(blob) blob`	returns a flattened array of the children UAST nodes from each one of the UAST nodes in the given array.
`uast_extract(blob, key) text array`	extracts information identified by the given key from the uast nodes.
`uast_imports(blob) text array`	returns all imports given the specified UAST blob.
`uast_mode(mode, blob, lang) blob`	returns a node array of UAST nodes specifying its language and mode (semantic, annotated or native).
`uast_xpath(blob, xpath) blob`	performs an XPath query over the given UAST nodes.
`version() text`	returns the gitbase version in the following format `8.0.11-{GITBASE_VERSION}` for compatibility with MySQL versioning.

Standard functions

These are all functions that are available because they are implemented in go-mysql-server, used by gitbase.

Name	Description
`ARRAY_LENGTH(json)`	if the json representation is an array, this function returns its size.
`AVG(expr)`	returns the average value of expr in all rows.
`CEIL(number)`	returns the smallest integer value that is greater than or equal to `number`.
`CEILING(number)`	returns the smallest integer value that is greater than or equal to `number`.
`CHAR_LENGTH(str)`	returns the length of the string in characters.
`COALESCE(...)`	returns the first non-null value in a list.
`CONCAT(...)`	concatenates any group of fields into a single string.
`CONCAT_WS(sep, ...)`	concatenates any group of fields into a single string. The first argument is the separator for the rest of the arguments. The separator is added between the strings to be concatenated. The separator can be a string, as can the rest of the arguments. If the separator is NULL, the result is NULL.
`CONNECTION_ID()`	returns the current connection ID.
`COUNT(expr)`	returns a count of the number of non-NULL values of expr in the rows retrieved by a SELECT statement.
`DATE_ADD(date, interval)`	adds the interval to the given `date`.
`DATE_SUB(date, interval)`	subtracts the interval from the given `date`.
`DAY(date)`	is a synonym for DAYOFMONTH().
`DATE(date)`	returns the date part of the given `date`.
`DAYOFMONTH(date)`	returns the day of the month (0-31).
`DAYOFWEEK(date)`	returns the day of the week of the given `date`.
`DAYOFYEAR(date)`	returns the day of the year of the given `date`.
`FIRST(expr)`	returns the first value in a sequence of elements of an aggregation.
`FLOOR(number)`	returns the largest integer value that is less than or equal to `number`.
`FROM_BASE64(str)`	decodes the base64-encoded string `str`.
`GREATEST(...)`	returns the greatest numeric or string value.
`HOUR(date)`	returns the hours of the given `date`.
`IFNULL(expr1, expr2)`	if `expr1` is not NULL, it returns `expr1`; otherwise it returns `expr2`.
`IS_BINARY(blob)`	returns whether a `blob` is a binary file or not.
`JSON_EXTRACT(json_doc, path, ...)`	extracts data from a json document using json paths. Extracting a string will result in that string being quoted. To avoid this, use `JSON_UNQUOTE(JSON_EXTRACT(json_doc, path, ...))`.
`JSON_UNQUOTE(json)`	unquotes JSON value and returns the result as a utf8mb4 string.
`LAST(expr)`	returns the last value in a sequence of elements of an aggregation.
`LEAST(...)`	returns the smaller numeric or string value.
`LENGTH(str)`	returns the length of the string in bytes.
`LN(X)`	returns the natural logarithm of `X`.
`LOG(X), LOG(B, X)`	if called with one parameter, this function returns the natural logarithm of `X`. If called with two parameters, this function returns the logarithm of `X` to the base `B`. If `X` is less than or equal to 0, or if `B` is less than or equal to 1, then NULL is returned.
`LOG10(X)`	returns the base-10 logarithm of `X`.
`LOG2(X)`	returns the base-2 logarithm of `X`.
`LOWER(str)`	returns the string `str` with all characters in lower case.
`LPAD(str, len, padstr)`	returns the string `str`, left-padded with the string `padstr` to a length of `len` characters.
`LTRIM(str)`	returns the string `str` with leading space characters removed.
`MAX(expr)`	returns the maximum value of `expr` in all rows.
`MID(str, pos, [len])`	returns a substring from the provided string starting at `pos` with a length of `len` characters. If no `len` is provided, all characters from `pos` until the end will be taken.
`MIN(expr)`	returns the minimum value of `expr` in all rows.
`MINUTE(date)`	returns the minutes of the given `date`.
`MONTH(date)`	returns the month of the given `date`.
`NOW()`	returns the current timestamp.
`NULLIF(expr1, expr2)`	returns NULL if `expr1 = expr2` is true, otherwise returns `expr1`.
`POW(X, Y)`	returns the value of `X` raised to the power of `Y`.
`REGEXP_MATCHES(text, pattern, [flags])`	returns an array with the matches of the `pattern` in the given `text`. Flags can be given to control certain behaviours of the regular expression. Currently, only the `i` flag is supported, to make the comparison case insensitive.
`REPEAT(str, count)`	returns a string consisting of the string `str` repeated `count` times.
`REPLACE(str,from_str,to_str)`	returns the string `str` with all occurrences of the string `from_str` replaced by the string `to_str`.
`REVERSE(str)`	returns the string `str` with the order of the characters reversed.
`ROUND(number, decimals)`	rounds the `number` to `decimals` decimal places.
`RPAD(str, len, padstr)`	returns the string `str`, right-padded with the string `padstr` to a length of `len` characters.
`RTRIM(str)`	returns the string `str` with trailing space characters removed.
`SECOND(date)`	returns the seconds of the given `date`.
`SLEEP(seconds)`	waits for the specified number of seconds (can be fractional).
`SOUNDEX(str)`	returns the soundex of a string.
`SPLIT(str,sep)`	returns the parts of the string `str` split by the separator `sep` as a JSON array of strings.
`SQRT(X)`	returns the square root of a nonnegative number `X`.
`SUBSTR(str, pos, [len])`	returns a substring from the string `str` starting at `pos` with a length of `len` characters. If no `len` is provided, all characters from `pos` until the end will be taken.
`SUBSTRING(str, pos, [len])`	returns a substring from the string `str` starting at `pos` with a length of `len` characters. If no `len` is provided, all characters from `pos` until the end will be taken.
`SUM(expr)`	returns the sum of `expr` in all rows.
`TO_BASE64(str)`	encodes the string `str` in base64 format.
`TRIM(str)`	returns the string `str` with all spaces removed.
`UPPER(str)`	returns the string `str` with all characters in upper case.
`WEEKDAY(date)`	returns the weekday of the given `date`.
`YEAR(date)`	returns the year of the given `date`.
`YEARWEEK(date, mode)`	returns year and week for a date. The year in the result may be different from the year in the date argument for the first and the last week of the year.

Note about uast, uast_mode, uast_xpath and uast_children functions

These functions make use of UAST version 2, so you should get familiar with the concepts explained in the bblfsh documentation.

The data returned by these functions is a serialized array node using protobuf which contains UAST object nodes.

As an example of how to manage the serialized data programatically, checkout out the Go code below:

import (
	"bytes"
	"fmt"

	"gopkg.in/bblfsh/sdk.v2/uast/nodes"
	"gopkg.in/bblfsh/sdk.v2/uast/nodes/nodesproto"
)

func marshalNodes(arr nodes.Array) (interface{}, error) {
	if len(arr) == 0 {
		return nil, nil
	}

	buf := &bytes.Buffer{}
	if err := nodesproto.WriteTo(buf, arr); err != nil {
		return nil, err
	}

	return buf.Bytes(), nil
}

func unmarshalNodes(data []byte) (nodes.Array, error) {
	if len(data) == 0 {
		return nil, nil
	}

	buf := bytes.NewReader(data)
	n, err := nodesproto.ReadTree(buf)
	if err != nil {
		return nil, err
	}

	if n.Kind() != nodes.KindArray {
		return nil, fmt.Errorf("unmarshal: wrong kind of node found %q, expected %q",
			n.Kind(), nodes.KindArray.String())
	}

	return n.(nodes.Array), nil
}

How to formulate XPath queries when using uast and uast_xpath functions

Have a look at the bblfsh docs to query UASTs with XPath.

How to use uast_extract

Check out the UAST v2 specification to know what an UAST node represents.

uast_extracts accepts special selectors to match common node properties:

@type
@token
@role
@pos

Using these selectors as in,

uast_extract(nodes_column, @common_selector)

you will extract the value of that property for each node.

Nodes that have no value for the requested property will not be present in any way in the final array. That is, having a sequence of nodes [node-1, node-2, node-3] and knowing that node-2 doesn't have a value for the requested property, the returned array will be [prop-1, prop-3].

Also, if you want to retrieve values from a non common property, you can pass it directly

uast_extract(nodes_column, 'some-property')

How to use `loc`

loc will return statistics about the lines of code in a file, such as the code lines, comment lines, etc.

It requires a file path and a file content.

loc(file_path, blob_content)

The result of this function is a JSON document with the following shape:

{
	"Code": code lines,
	"Comment": comment lines,
	"Blank": blank lines,
	"Name": file name,
	"Lang": language
}

How to use `commit_file_stats`

commit_file_stats will return statistics about the line changes in all files in the given range of commits classifying them in 4 categories: code, comments, blank lines and other.

It can be used in two ways:

To get the statistics of files in a specific commit COMMIT_FILE_STATS(repository_id, commit_hash)
To get the statistics of files in a commit range COMMIT_FILE_STATS(repository_id, from_commit, to_commit)

The result of this function is an array of JSON documents with the following shape:

{
	"Path": file path,
	"Language": file language,
	"Code": {
		"Additions": number of code additions in this file,
		"Deletions": number of code deletions in this file,
	},
	"Comment": {
		"Additions": number of comment line additions in this file,
		"Deletions": number of comment line deletions in this file,
	},
	"Blank": {
		"Additions": number of blank line additions in this file,
		"Deletions": number of blank line deletions in this file,
	},
	"Other": {
		"Additions": number of other additions in this file,
		"Deletions": number of other deletions in this file,
	},
	"Total": {
		"Additions": number of total additions in this file,
		"Deletions": number of total deletions in this file,
	},
}

NOTE: Files that are considered vendored files are ignored for the purpose of computing these statistics. Note that .gitignore is considered a vendored file.

Because the result of this function is an array of JSON documents, we will need two functions to make use of its data effectively:

EXPLODE which will make each element in the array have its own row
JSON_EXTRACT to get data from inside the documents

For example, to get the stats of the HEAD commits:

SELECT
	repository_id,
	EXPLODE(COMMIT_FILE_STATS(repository_id, commit_hash)) AS stats
FROM refs
WHERE ref_name = 'HEAD'

EXPLODE here will make sure a single row is returned for every single result returned by COMMIT_FILE_STATS instead of an array with all of them combined.

Then, to extract code additions from this:

SELECT
	repository_id
	JSON_EXTRACT(stats, '$.Code.Additions')
FROM (
	SELECT
		repository_id,
		EXPLODE(COMMIT_FILE_STATS(repository_id, commit_hash)) AS stats
	FROM refs
	WHERE ref_name = 'HEAD'
) t

NOTE: When extracting Path or Language using JSON_EXTRACT, by the way that function works, the result will be quoted (e.g. "Python" instead of Python). For that reason, for these two string fields JSON_EXTRACT should be combined with JSON_UNQUOTE like JSON_UNQUOTE(JSON_EXTRACT(stats, '$.Path')).

How to use `commit_stats`

commit_stats will return statistics about the line changes in the given range of commits classifying them in 4 categories: code, comments, blank lines and other.

It can be used in two ways:

To get the statistics of a specific commit COMMIT_STATS(repository_id, commit_hash)
To get the statistics of the diff of a commit range COMMIT_STATS(repository_id, from_commit, to_commit)

commit_stats is pretty much an aggregation of the result of commit_file_stats. While commit_file_stats has the stats for each file in a commit, commit_stats has the global stats of all files in the commit. As a result, it outputs a single structure instead of an array of them.

The shape of the result returned by this function is the following:

{
	"Files": number of files changed in this commit,
	"Code": {
		"Additions": number of code additions in this commit,
		"Deletions": number of code deletions in this commit,
	},
	"Comment": {
		"Additions": number of comment line additions in this commit,
		"Deletions": number of comment line deletions in this commit,
	},
	"Blank": {
		"Additions": number of blank line additions in this commit,
		"Deletions": number of blank line deletions in this commit,
	},
	"Other": {
		"Additions": number of other additions in this commit,
		"Deletions": number of other deletions in this commit,
	},
	"Total": {
		"Additions": number of total additions in this commit,
		"Deletions": number of total deletions in this commit,
	},
}

NOTE: Files that are considered vendored files are ignored for the purpose of computing these statistics. Note that .gitignore is considered a vendored file.

The result returned by this function is a JSON, which means that to access its fields, the use of JSON_EXTRACT is needed.

For example, code additions would be accessed like this:

JSON_EXTRACT(COMMIT_STATS(repository_id, commit_hash), '$.Code.Additions')

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

functions.md

functions.md

Functions

gitbase functions

Standard functions

Note about uast, uast_mode, uast_xpath and uast_children functions

How to formulate XPath queries when using uast and uast_xpath functions

How to use uast_extract

How to use `loc`

How to use `commit_file_stats`

How to use `commit_stats`

Files

functions.md

Latest commit

History

functions.md

File metadata and controls

Functions

gitbase functions

Standard functions

Note about uast, uast_mode, uast_xpath and uast_children functions

How to formulate XPath queries when using uast and uast_xpath functions

How to use uast_extract

How to use loc

How to use commit_file_stats

How to use commit_stats

How to use `loc`

How to use `commit_file_stats`

How to use `commit_stats`