Changelog

This changelog documents all notable changes to VAST and is updated on every release.

v2.3.0-rc3

Changes

We improved the operability of VAST servers under high load from automated low-priority queries. VAST now assigns even less priority to queries issued with --low-priority, such as automated retro-match queries. Relative to regular queries, their share drops from 33.3% to 4%, and relative to the internal high-priority queries used for rebuilding and compaction, it drops from 12.5% to 1%. #2484
The default value for vast.active-partition-timeout is now 5 minutes (down from 1 hour), causing VAST to persist underfull partitions earlier. #2493
We split the vast rebuild command into two: vast rebuild start and vast rebuild stop. Rebuild orchestration now runs server-side, and only a single rebuild may run at a given time. We also made it more intuitive to use: --undersized now implies --all, and a new --detached option allows for running rebuilds in the background. #2493

Features

VAST's partition indexes are now optional, allowing operators to control the trade-off between disk usage and query performance for every field. #2430
Matchers can now be used in AWS via the vast-cloud CLI matcher plugin. #2473
VAST now continuously rebuilds outdated partitions and merges undersized partitions in the background. The new option vast.automatic-rebuild controls how many resources to spend on this. To disable this behavior, set the option to 0; the default is 1. #2493
Rebuilding now emits metrics under the keys rebuilder.partitions.{remaining,rebuilding,completed}. The vast status rebuild command additionally shows information about the ongoing rebuild. #2493
The new vast.connection-timeout option allows for configuring the timeout VAST clients use when connecting to a VAST server. The value defaults to 10s; setting it to a zero duration produces an infinite timeout. #2499
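As a minimal sketch of the timeout semantics above, the following Python snippet assumes a client that hands the configured duration to Python's socket.create_connection, where None means "no timeout". The helper names effective_timeout and connect are hypothetical and are not VAST APIs.

```python
import socket
from datetime import timedelta
from typing import Optional

# Documented default for vast.connection-timeout.
DEFAULT_CONNECTION_TIMEOUT = timedelta(seconds=10)


def effective_timeout(value: timedelta = DEFAULT_CONNECTION_TIMEOUT) -> Optional[float]:
    """Translate a configured duration into a socket timeout in seconds.

    A zero duration means "wait forever", which Python's socket API
    expresses as None.
    """
    if value == timedelta(0):
        return None
    return value.total_seconds()


def connect(host: str, port: int,
            timeout: timedelta = DEFAULT_CONNECTION_TIMEOUT) -> socket.socket:
    # Hypothetical client helper; not VAST's actual connection code.
    return socket.create_connection((host, port), timeout=effective_timeout(timeout))
```

Mapping zero to None keeps the special case in one place, so callers never pass a literal 0 to the socket layer (which Python would interpret as non-blocking rather than infinite).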

Bug Fixes

VAST now properly processes queries for fields with the #skip attribute. #2430
VAST can now store data in segments larger than 2 GiB each. #2449
VAST can now store column indices that are bigger than 2GiB. #2449
VAST no longer occasionally prints warnings about partitions that are no longer available when queries run concurrently with imports. #2500
Configuration options representing durations with an associated command-line option like vast.connection-timeout and --connection-timeout were not picked up from configuration files or environment variables. This now works as expected. #2503
Partitions now fail early when their stores fail to load from disk, detailing what went wrong in an error message. #2507
The rebuild command, automatic rebuilds, and compaction are now much faster, and match the performance of the import command for building indexes. #2515

v2.2.0

Changes

Metrics for VAST's store lookups now use the keys {active,passive}-store.lookup.{runtime,hits}. The store type metadata field now distinguishes between the various supported store types, e.g., parquet, feather, or segment-store, rather than containing active or passive. #2413
The summarize pipeline operator is now a builtin; the previously bundled summarize plugin no longer exists. Aggregation functions in the summarize operator are now plugins, which makes them easily extensible. The syntax of summarize now supports specification of output field names, similar to SQL's AS in SELECT f(x) AS name. #2417
The undocumented count pipeline operator no longer exists. #2417
The put pipeline operator is now called select, as we've abandoned plans to integrate the functionality of replace into it. #2423
The replace pipeline operator now supports multiple replacements in one configuration, which aligns the behavior with other operators. #2423
Transforms are now called pipelines. In your configuration, replace transform with pipeline in all keys. #2429
An init command was added to vast-cloud to help get out of inconsistent Terraform states. #2435

Features

The new flush command causes VAST to decommission all currently active partitions, i.e., write all active partitions to disk immediately regardless of their size or the active partition timeout. This is particularly useful for testing, or when needing to guarantee in automated scripts that input is available for operations that only work on persisted passive partitions. The flush command returns only after all active partitions were flushed to disk. #2396
The summarize operator supports three new aggregation functions: sample takes the first value in every group, distinct filters out duplicate values, and count yields the number of values. #2417
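The grouping semantics of these three aggregation functions can be sketched in plain Python. This is an illustration only, assuming that count counts every value in a group (the entry does not say how nulls are treated); summarize, sample, distinct, and count here are hypothetical Python stand-ins, not the pipeline operator's implementation or its configuration syntax.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Sequence, Tuple


def sample(values: List[Any]) -> Any:
    # The first value in the group.
    return values[0]


def distinct(values: List[Any]) -> List[Any]:
    # Duplicates removed, keeping first-seen order (values must be hashable).
    return list(dict.fromkeys(values))


def count(values: List[Any]) -> int:
    # Number of values in the group.
    return len(values)


Aggregation = Tuple[Callable[[List[Any]], Any], str]


def summarize(records: Sequence[Dict[str, Any]],
              group_by: Sequence[str],
              aggregations: Dict[str, Aggregation]) -> List[Dict[str, Any]]:
    """Group records by `group_by` and apply each aggregation per group.

    `aggregations` maps an output field name (like SQL's AS) to a pair of
    (aggregation function, input field).
    """
    groups: Dict[tuple, List[Dict[str, Any]]] = defaultdict(list)
    for record in records:
        groups[tuple(record[f] for f in group_by)].append(record)
    result = []
    for key, members in groups.items():
        row = dict(zip(group_by, key))
        for name, (func, field) in aggregations.items():
            row[name] = func([m[field] for m in members])
        result.append(row)
    return result
```

For example, summarizing flows grouped by proto with {"n": (count, "port")} yields one row per protocol with an n field, mirroring SELECT count(port) AS n ... GROUP BY proto.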
The drop pipeline operator now drops entire schemas specified by name in the schemas configuration key, in addition to dropping fields by extractors in the fields configuration key. #2419
The new extend pipeline operator allows for adding new fields with fixed values to data. #2423
The cloud execution commands (run-lambda and execute-command) now accept scripts from file-like handles. To improve the usability of this feature, the whole host file system is now mounted into the CLI container. #2446

Bug Fixes

VAST now consistently exports real values in JSON with at least one decimal place. #2393
VAST is now able to detect corrupt index files and will attempt to repair them on startup. #2431
The JSON export with --omit-nulls now correctly handles nested records whose first field is null instead of dropping them entirely. #2447
We fixed a race condition that led to data duplication when VAST crashed while applying a partition transform. #2465
The rebuild command no longer crashes on failure, and displays the encountered error instead. #2466
Missing arguments for the --plugins, --plugin-dirs, and --schema-dirs command line options no longer cause VAST to crash occasionally. #2470

v2.1.0

Changes

The mdx-regenerate tool is no longer part of VAST binary releases. #2260
Partition transforms now always emit homogeneous partitions, i.e., one schema per partition. This makes compaction and aging more efficient. #2277
VAST now requires Arrow >= v8.0.0. #2284
The vast.store-backend configuration option no longer supports archive, and instead always uses the superior segment-store. Events stored in the archive will continue to be available in queries. #2290
The vast.use-legacy-query-scheduler option is now ignored because the legacy query scheduler has been removed. #2312
VAST now always formats time and timestamp values with six decimal places (microsecond precision). The old behavior used a precision that depended on the actual value. This may require action for downstream tooling like metrics collectors that expect nanosecond granularity. #2380
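To illustrate fixed microsecond precision, here is a short Python sketch using datetime.isoformat(timespec="microseconds"). It assumes timezone-aware input; the exact layout VAST prints (separators, time zone suffix) is not specified in this entry and may differ, and format_time is a hypothetical helper name.

```python
from datetime import datetime, timezone


def format_time(ts: datetime) -> str:
    # Expects a timezone-aware datetime (a naive one would be treated as local time).
    # Always emits exactly six fractional digits, even when they are all zero,
    # so the output width no longer depends on the value.
    return ts.astimezone(timezone.utc).isoformat(timespec="microseconds")
```

Downstream tooling that previously expected a variable number of fractional digits can parse this fixed-width form with datetime.fromisoformat; anything finer than microseconds is simply not present.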

Features

The lsvast tool can now print contents of individual .mdx files. It now has an option to print raw Bloom filter contents of string and IP address synopses. #2260
The mdx-regenerate tool was renamed to vast-regenerate and can now also regenerate an index file from a list of partition UUIDs. #2260
VAST now compresses data with Zstd. When persisting data to the segment store, the default configuration achieves over 2x space savings. When transferring data between client and server processes, compression reduces the amount of transferred data by up to 5x. This allowed us to increase the default partition size from 1,048,576 to 4,194,304 events, and the default number of events in a single batch from 1,024 to 65,536. The performance increase comes at the cost of a ~20% memory footprint increase at peak load. Use the option vast.max-partition-size to tune this space-time tradeoff. #2268
VAST now produces additional metrics under the keys ingest.events, ingest.duration and ingest.rate. Each of those gets issued once for every schema that VAST ingested during the measurement period. Use the metadata_schema key to disambiguate the metrics. #2274
A new parquet store plugin allows VAST to store its data as Parquet files, increasing storage efficiency at the expense of higher deserialization costs. Storage requirements for the VAST database are reduced by another 15-20% compared to the existing segment store with Zstd compression enabled. CPU usage for suricata import is up ~10%, mostly due to the more expensive serialization. Deserialization (reading) of a partition is significantly more expensive, increasing CPU utilization by about 100%, so this cost should be weighed carefully against the potential reduction in storage cost and I/O operations. #2284
The status command now supports filtering by component name. E.g., vast status importer index only shows the status of the importer and index components. #2288
VAST emits the new metric partition.events-written when writing a partition to disk. The metric's value is the number of events written, and the metadata_schema field contains the name of the partition's schema. #2302
The new rebuild command rebuilds old partitions to take advantage of improvements in newer VAST versions. Rebuilding takes place in the VAST server in the background. This process merges partitions up to the configured max-partition-size, turns VAST v1.x's heterogeneous partitions into VAST v2.x's homogeneous partitions, migrates all data to the currently configured store-backend, and upgrades to the most recent internal batch encoding and indexes. #2321
PyVAST now supports running client commands for VAST servers running in a container environment, if no local VAST binary is available. Specify the container keyword to customize this behavior. It defaults to {"runtime": "docker", "name": "vast"}. #2334 @KaanSK
The csv import gained a new --seperator='x' option that defaults to ','. Set it to '\t' to import tab-separated values, or ' ' to import space-separated values. #2336
VAST now compresses on-disk indexes with Zstd, resulting in a 50-80% size reduction depending on the type of indexes used, and reducing the overall index size to below the raw data size. This improves retention spans significantly. For example, using the default configuration, the indexes for suricata.ftp events now use 75% less disk space, and suricata.flow 30% less. #2346
The index statistics in vast status --detailed now show the event distribution per schema as a percentage of the total number of events in addition to the per-schema number, e.g., for suricata.flow events under the key index.statistics.layouts.suricata.flow.percentage. #2351
The output of vast status --detailed now shows metadata from all partitions under the key .catalog.partitions. Additionally, the catalog emits metrics under the keys catalog.num-events and catalog.num-partitions containing the number of events and partitions, respectively. The metrics contain the schema name in the field metadata_schema and the (internal) partition version in the field metadata_partition-version. #2360 #2363
The VAST Cloud CLI can now authenticate to the Tenzir private registry and download the vast-pro image (including plugins such as Matcher). The deployment script can now be configured to use a specific image and can thus be set to use vast-pro. #2415

Bug Fixes

VAST no longer crashes when importing map or pattern data annotated with the #skip attribute. #2286
The command-line options --plugins, --plugin-dirs, and --schema-dirs now correctly overwrite their corresponding configuration options. #2289
VAST no longer crashes when a query arrives at a newly created active partition in the time window between the partition creation and the first event arriving at the partition. #2295
Setting the environment variable VAST_ENDPOINT to a host:port pair no longer fails on startup with a parse error. #2305
VAST no longer hangs when it is shut down while still importing events. #2324
VAST now reads the default false-positive rate for sketches correctly. This broke accidentally with the v2.0 release. The option moved from vast.catalog-fp-rate to vast.index.default-fp-rate. #2325
The parser for real values now understands scientific notation, e.g., 1.23e+42. #2332
The csv import no longer crashes when the CSV file contains columns not present in the selected schema. Instead, it imports these columns as strings. #2336
vast export csv now renders enum columns in their string representation instead of their internal numerical representation. #2336
The JSON import now treats time and duration fields correctly for JSON strings containing a number, i.e., the JSON string "1654735756" now behaves just like the JSON number 1654735756 and for a time field results in the value 2022-06-09T00:49:16.000Z. #2340
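The normalization described above can be sketched in Python, assuming both forms are interpreted as seconds since the Unix epoch. This sketch only handles numeric strings (not ISO 8601 text), and parse_time_field is a hypothetical helper, not part of VAST.

```python
from datetime import datetime, timezone
from typing import Union


def parse_time_field(value: Union[int, float, str]) -> datetime:
    """Interpret a JSON number, or a JSON string containing a number,
    as seconds since the Unix epoch (UTC)."""
    if isinstance(value, str):
        # "1654735756" is treated exactly like the number 1654735756.
        value = float(value)
    return datetime.fromtimestamp(value, tz=timezone.utc)
```

Both parse_time_field("1654735756") and parse_time_field(1654735756) yield 2022-06-09 00:49:16 UTC, matching the value quoted in the entry.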
VAST no longer terminates when it can't write any more data to disk. Incoming data is still accepted but discarded. We encourage all users to enable the disk-monitor or compaction features as a proper solution to this problem. #2376
VAST no longer ignores environment variables for plugin-specific options. E.g., the environment variable VAST_PLUGINS__FOO__BAR now correctly refers to the bar option of the foo plugin, i.e., plugins.foo.bar. #2390
We improved the mechanism to recover the database state after an unclean shutdown. #2394

v2.0.0

Breaking Changes

We removed the experimental vast get command. It relied on an internal unique event ID that was only exposed to the user in debug messages. This removal is a preparatory step towards a simplification of some of the internal workings of VAST. #2121
The meta-index is now called the catalog. This affects multiple metrics and entries in the output of vast status, and the configuration option vast.meta-index-fp-rate, which is now called vast.catalog-fp-rate. #2128
The command line option --verbosity has the new name --console-verbosity. This synchronizes the CLI interface with the configuration file that solely understands the option vast.console-verbosity. #2178
Multiple transform steps now have new names: select is now called where, delete is now called drop, project is now called put, and aggregate is now called summarize. This breaking change is in preparation for an upcoming feature that improves the capability of VAST's query language. #2228
The layout-names option of the rename transform step was renamed schemas. The step now additionally supports renaming fields. #2228

Changes

VAST ships experimental Terraform scripts to deploy on AWS Lambda and Fargate. #2108
We revised the query scheduling logic to exploit synergies when multiple queries run at the same time. In that vein, we updated the related metrics with more accurate names to reflect the new mechanism. The new keys scheduler.partition.materializations, scheduler.partition.scheduled, and scheduler.partition.lookups provide periodic counts of partitions loaded from disk, partitions scheduled for lookup, and the overall number of queries issued to partitions, respectively. The keys query.workers.idle and query.workers.busy were renamed to scheduler.partition.remaining-capacity and scheduler.partition.current-lookups, respectively. Finally, the key scheduler.partition.pending counts the number of currently pending partitions. It is still possible to opt out of the new scheduling algorithm with the (deprecated) option --use-legacy-query-scheduler. #2117
VAST now requires Apache Arrow >= v7.0.0. #2122
VAST's internal data model now completely preserves the nesting of the stored data when using the arrow encoding, and maps the pattern, address, subnet, and enumeration types onto Arrow extension types rather than using the underlying representation directly. This change enables use of the export arrow command without needing information about VAST's type system. #2159
Transform steps that add or modify columns now transform the columns in-place rather than at the end, preserving the nesting structure of the original data. #2159
The deprecated msgpack encoding no longer exists. Data imported using the msgpack encoding can still be accessed, but new data will always use the arrow encoding. #2159
Client commands such as vast export or vast status now create fewer threads at runtime, reducing the risk of hitting system resource limits. #2193
The index section in the status output no longer contains the catalog and catalog-bytes keys. The information is already present in the top-level catalog section. #2233

Features

The new vast.index section in the configuration supports adjusting the false-positive rate of first-stage lookups for individual fields, allowing users to optimize the time/space trade-off for expensive queries. #2065
VAST now creates one active partition per layout, rather than having a single active partition for all layouts. #2096
The new option vast.active-partition-timeout controls the time after which an active partition is flushed to disk. The timeout may hit before the partition size reaches vast.max-partition-size, allowing for an additional temporal control for data freshness. The active partition timeout defaults to 1 hour. #2096
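A minimal Python sketch of the flush rule described above, assuming the timeout is measured from the partition's creation (the entry does not say whether it counts from creation or from the last event). ActivePartition and should_flush are hypothetical names; the defaults come from later entries in this changelog: 4,194,304 events (v2.1.0) and 5 minutes (v2.3.0-rc3, down from the 1 hour stated here).

```python
from dataclasses import dataclass
from datetime import timedelta

# Defaults as stated elsewhere in this changelog.
MAX_PARTITION_SIZE = 4_194_304
ACTIVE_PARTITION_TIMEOUT = timedelta(minutes=5)


@dataclass
class ActivePartition:
    events: int      # number of events buffered so far
    age: timedelta   # time since the partition was created (assumption)


def should_flush(p: ActivePartition,
                 max_size: int = MAX_PARTITION_SIZE,
                 timeout: timedelta = ACTIVE_PARTITION_TIMEOUT) -> bool:
    # Whichever limit is reached first triggers the flush to disk:
    # the size limit for busy schemas, the timeout for sparse ones.
    return p.events >= max_size or p.age >= timeout
```

The timeout matters mostly for low-volume schemas: since each layout has its own active partition, a rarely seen schema would otherwise stay in memory indefinitely.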
The output of vast status now displays the total number of events stored under the key index.statistics.events.total. #2133
The disk monitor has new status entries blacklist and blacklist-size containing information about partitions that failed to be erased. #2160
VAST now fully supports passing configuration via environment variables as an alternative to configuration files. Environment variables have lower precedence than CLI arguments and higher precedence than configuration files. Variable names of the form VAST_FOO__BAR_BAZ map to vast.foo.bar-baz, i.e., __ is a record separator and _ translates to -. This does not apply to the prefix VAST_, which is considered the application identifier. Only variables with non-empty values are considered. #2162
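The naming rule and precedence order can be sketched in Python as follows. The function names are hypothetical, and the special case for plugin options is inferred from the v2.1.0 fix that maps VAST_PLUGINS__FOO__BAR to plugins.foo.bar; this is an illustration, not VAST's parser, and it leaves all values as strings.

```python
from typing import Dict, Mapping

PREFIX = "VAST_"


def env_to_option(name: str) -> str:
    """Map an environment variable name to a dotted option key.

    VAST_FOO__BAR_BAZ -> vast.foo.bar-baz
    """
    if not name.startswith(PREFIX):
        raise ValueError(f"not a VAST variable: {name}")
    # The VAST_ prefix is the application identifier, so strip it first.
    rest = name[len(PREFIX):]
    # "__" separates records; a remaining single "_" becomes "-".
    parts = [p.lower().replace("_", "-") for p in rest.split("__")]
    # Plugin options live under plugins.<plugin>.<option> (see the v2.1.0 fix).
    if parts[0] == "plugins":
        return ".".join(parts)
    return ".".join(["vast", *parts])


def options_from_env(environ: Mapping[str, str]) -> Dict[str, str]:
    # Only variables with non-empty values are considered.
    return {env_to_option(k): v for k, v in environ.items()
            if k.startswith(PREFIX) and v}


def merge_options(config_file: Dict[str, str], env: Dict[str, str],
                  cli: Dict[str, str]) -> Dict[str, str]:
    # Later sources win: configuration files < environment < command line.
    return {**config_file, **env, **cli}
```

In practice one would call options_from_env(os.environ) and merge the result between the file-based and command-line options.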
VAST v1.0 deprecated the experimental aging feature. Given popular demand we've decided to un-deprecate it, and to actually implement it on top of the same building blocks the compaction mechanism uses. This means that it is now fully working and no longer considered experimental. #2186
The replace transform step now allows for setting values of complex types, e.g., lists or records. #2228
The lsvast tool now prints the whole store contents when given a store file as an argument. #2247

Bug Fixes

The explore command now properly terminates after the requested number of results are delivered. #2120
The count --estimate command erroneously materialized store files from disk, resulting in an unneeded performance penalty. VAST now answers approximate count queries by solely consulting the relevant index files. #2146
The import zeek command now correctly marks the event timestamp using the timestamp type alias for all inferred schemas. #2155
Some queries could get stuck when an importer would time out during the meta index lookup. This race condition no longer exists. #2167
We optimized the queue size of the logger for commands other than vast start. Client commands now show a significant reduction in memory usage and startup time. #2176
The CSV parser no longer fails when encountering integers where floating-point values are expected. #2184
The vast(1) man-page is no longer empty for VAST distributions with static binaries. #2190
VAST servers no longer accept queries after initiating shutdown. This fixes a potential infinite hang if new queries were coming in faster than VAST was able to process them. #2215
VAST no longer occasionally crashes when aging or compaction erase whole partitions. #2227
Environment variables for options that specify lists now consistently use comma-separators and respect escaping with backslashes. #2236
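A minimal Python sketch of comma-separated list splitting with backslash escaping, assuming a backslash escapes whichever character follows it (so \, yields a literal comma and \\ a literal backslash). split_list is a hypothetical helper; VAST's exact handling of edge cases such as a trailing backslash may differ (this sketch silently drops it).

```python
from typing import List


def split_list(value: str) -> List[str]:
    """Split on unescaped commas; a backslash escapes the next character."""
    items: List[str] = []
    current: List[str] = []
    escaped = False
    for ch in value:
        if escaped:
            # Keep the escaped character literally, even if it is a comma.
            current.append(ch)
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == ",":
            items.append("".join(current))
            current = []
        else:
            current.append(ch)
    items.append("".join(current))
    return items
```

For example, the value a,b\,c,d splits into three items: a, b,c, and d.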
The JSON import no longer rejects non-string selector fields. Instead, it always uses the textual JSON representation as a selector. E.g., the JSON object {id:1,...} imported via vast import json --selector=id:mymodule now matches the schema named mymodule.1 rather than erroring because the id field is not a string. #2255
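The selector rule can be sketched in Python: string values are used verbatim (as in --selector=event_type:suricata from the 2021.12.16 entry), while any other JSON value contributes its textual JSON representation. select_schema is a hypothetical helper, and using json.dumps for the textual form is an assumption that matches the mymodule.1 example.

```python
import json
from typing import Any, Dict


def select_schema(event: Dict[str, Any], field: str, prefix: str) -> str:
    """Derive a schema name from a selector field, as in --selector=field:prefix."""
    value = event[field]
    # Strings are used verbatim; other values use their textual JSON form,
    # e.g. the number 1 becomes "1" and true becomes "true".
    text = value if isinstance(value, str) else json.dumps(value)
    return f"{prefix}.{text}"
```

With this rule, {"id": 1} and --selector=id:mymodule select mymodule.1 instead of producing an error.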
Transform steps removing all nested fields from a record leaving only empty nested records no longer cause VAST to crash. #2258
The query optimizer incorrectly transformed queries with conjunctions or disjunctions with several operands testing against the same string value, leading to missing results. This was rarely an issue in practice before the introduction of homogeneous partitions with the v2.0 release. #2264

v1.1.2

Bug Fixes

Terminating or timing out exports during the catalog lookup no longer causes query workers to become stuck indefinitely. #2165

v1.1.1

Bug Fixes

The disk monitor now correctly continues deleting until below the low water mark after a partition failed to delete. #2160
We fixed a rarely occurring race condition that caused query workers to become stuck after delivering all results until the corresponding client process terminated. #2160
Queries that timed out or were externally terminated while in the query backlog and with more than five unhandled candidate partitions no longer permanently get stuck. #2160

v1.1.0

Changes

VAST no longer attempts to interpret query expressions as Sigma rules automatically. Instead, this functionality moved to a dedicated sigma query language plugin that must explicitly be enabled at build time. #2074
The msgpack encoding option is now deprecated. VAST issues a warning on startup and automatically uses the arrow encoding instead. A future version of VAST will remove this option entirely. #2087
The experimental aging feature is now deprecated. The compaction plugin offers a superset of the aging functionality. #2087
Actor names in log messages now have an -ID suffix to make it easier to tell multiple instances of the same actor apart, e.g., exporter-42. #2119
We fixed an issue where partition transforms that erase complete partitions trigger an internal assertion failure. #2123

Features

The built-in select and project transform steps now correctly handle dropping all rows and columns respectively, effectively deleting the input data. #2064 #2082
VAST has a new query language plugin type that allows for adding additional query language frontends. The plugin performs one function: compile user input into a VAST expression. The new sigma plugin demonstrates usage of this plugin type. #2074
The new built-in rename transform step allows for renaming event types during a transformation. This is useful when you want to ensure that a repeatedly triggered transformation does not affect already transformed events. #2076
The new aggregate transform plugin allows for flexibly grouping and aggregating events. We recommend using it alongside the compaction plugin, e.g., for rolling up events into a more space-efficient representation after a certain amount of time. #2076

Bug Fixes

A performance bug in the first stage of query evaluation caused VAST to return too many candidate partitions when querying for a field suffix. For example, a query for the ts field commonly used in Zeek logs also included partitions for netflow.pkts from suricata.netflow events. This bug no longer exists, resulting in a considerable speedup of affected queries. #2086
VAST no longer loses query capacity when backlogged queries are cancelled. #2092
VAST now correctly adjusts the index statistics when applying partition transforms. #2097
We fixed a bug that potentially resulted in the wrong subset of partitions to be considered during query evaluation. #2103

v1.0.0

Changes

Building VAST now requires Arrow >= 6.0. #2033
VAST no longer uses calendar-based versioning. Instead, it uses a semantic versioning scheme. A new VERSIONING.md document installed alongside VAST explores the semantics in-depth. #2035
Plugins now have a separate version. The build scaffolding installs README.md and CHANGELOG.md files in the plugin source tree root automatically. #2035

Features

VAST has a new transform step: project, which keeps the fields with configured key suffixes and removes the rest from the input. At the same time, the delete transform step can remove not only one but multiple fields from the input based on the configured key suffixes. #2000
The new --omit-nulls option to the vast export json command causes VAST to skip over fields in JSON objects whose value is null when rendering them. #2004
VAST has a new transform step: select, which keeps rows matching the configured expression and removes the rest from the input. #2014
The #import_time meta extractor allows for querying events based on the time they arrived at the VAST server process. It may only be used for comparisons with time value literals, e.g., vast export json '#import_time > 1 hour ago' exports all events that were imported within the last hour as NDJSON. #2019
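The semantics of '#import_time > 1 hour ago' can be mirrored in a short Python sketch over in-memory events. Event and imported_within are hypothetical names; this only illustrates the comparison against a relative time, not how VAST evaluates queries against its indexes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class Event:
    data: dict
    import_time: datetime  # when the event arrived at the server


def imported_within(events: List[Event], window: timedelta,
                    now: Optional[datetime] = None) -> List[Event]:
    # "#import_time > 1 hour ago" keeps events that arrived after now - 1h.
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - window
    return [e for e in events if e.import_time > cutoff]
```

Note that the comparison uses arrival time at the server, not any timestamp field inside the event itself.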

Bug Fixes

The index now emits the metrics query.backlog.{low,normal} and query.workers.{idle,busy} reliably. #2032
VAST no longer ignores the --schema-dirs option when using --bare-mode. #2046
Starting VAST no longer fails if creating the database directory requires creating intermediate directories. #2046

2021.12.16

Changes

VAST's internal type system has a new on-disk data representation. While we still support reading older databases, reverting to an older version of VAST will not be possible after this change. Alongside this change, we've implemented numerous fixes and streamlined handling of field name lookups, which now more consistently handles the dot-separator. E.g., the query #field == "ip" still matches the field source.ip, but no longer the field source_ip. The change is also performance-relevant in the long-term: For data persisted from previous versions of VAST we convert to the new type system on the fly, and for newly ingested data we now have near zero-cost deserialization for types, which should result in an overall speedup once the old data is rotated out by the disk monitor. #1888
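The dot-separator rule for field name lookups can be sketched in Python as matching whole dot-separated components from the end of the field name. field_matches is a hypothetical helper that illustrates the documented behavior, not VAST's lookup code; the same rule also explains the v1.1.0 fix in which a query for ts no longer matched netflow.pkts.

```python
def field_matches(field: str, query: str) -> bool:
    """True if `query` names a suffix of `field` made of whole dot-separated components."""
    parts = field.split(".")
    wanted = query.split(".")
    # Slicing more components than exist returns the whole list,
    # which then cannot equal a longer `wanted`.
    return parts[-len(wanted):] == wanted
```

Under this rule, ip matches source.ip but neither source_ip nor a field such as sourceip, because only complete components separated by dots are compared.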

Features

All metrics events now contain the version of VAST. Additionally, VAST now emits startup and shutdown metrics at the start and stop of the VAST server. #1973
JSON field selectors are now configurable instead of being hard-coded for Suricata Eve JSON and Zeek Streaming JSON. E.g., vast import json --selector=event_type:suricata is now equivalent to vast import suricata. This allows for easier integration of JSONL data containing a field that indicates its type. #1974
Metrics events now optionally contain a metadata field that is a key-value mapping of string to string, allowing for finer-grained introspection. For now this enables correlation of metrics events and individual queries. A set of new metrics for query lookup use this feature to include the query ID. #1987 #1992

Bug Fixes

The field-based default selector of the JSON import now correctly matches types with nested record types. #1988

2021.11.18

Changes

The max-queries configuration option now works at a coarser granularity. It used to limit the number of queries that could simultaneously retrieve data, but it now sets the number of queries that can be processed at the same time. #1896
VAST no longer vendors xxHash, which is now a regular required dependency. Internally, VAST switched its default hash function to XXH3, providing a speedup of up to 3x. #1905
Building VAST from source now requires CMake 3.18+. #1914
A recently added feature allows for exporting everything when no query is provided. We've restricted this to prefer reading a query from stdin if available. Additionally, conflicting ways to read the query now trigger errors. #1917

Features

A new 'apply' handler in the index gives plugin authors the ability to apply transforms over entire partitions. Previously, transforms were limited to streams of table slices during import or export. #1887
The export command now has a --low-priority option to reduce the priority of the request while query backlogs are being worked down. #1929 #1947
The keys query.backlog.normal and query.backlog.low have been added to the metrics output. The values indicate the number of queries that are currently in the backlog. #1942

Bug Fixes

The timeout duration to delete partitions has been increased to one minute, reducing the frequency of warnings for hitting this timeout significantly. #1897
When reading IPv6 addresses from PCAP data, VAST previously considered only the first 4 bytes. It now stores all 16 bytes. #1905
Store files now get deleted correctly if the database directory differs from the working directory. #1912
Debug builds of VAST no longer segfault on a status request with the --debug option. #1915
The suricata.dns schema has been updated to match the currently used EVE-JSON structure output by recent Suricata versions. #1919
VAST no longer tries to create indexes for fields of type list<record{...}> as that wasn't supported in the first place. #1933
Static plugins are no longer always loaded, but rather need to be explicitly enabled as documented. To restore the behavior from before this bug fix, set vast.plugins: [bundled] in your configuration file. #1959

2021.09.30

Changes

The default store backend is now segment-store, in order to enable the use of partition transforms in the future. To continue using the (now deprecated) legacy store backend, set vast.store-backend to archive. #1876
Example configuration files are now installed to the datarootdir as opposed to the sysconfdir in order to avoid overriding previously installed configuration files. #1880

Features

If present in the plugin source directory, the build scaffolding now automatically installs <plugin>.yaml.example files, commenting out every line so the file has no effect. This serves as documentation for operators that can modify the installed file in-place. #1860
The broker plugin is now also a writer plugin on top of already being a reader plugin. The new plugin enables exporting query results directly into a Zeek process, e.g., to write Zeek scripts that incorporate context from the past. Run vast export broker <expr> to ship events via Broker that Zeek dispatches under the event VAST::data(layout: string, data: any). #1863
The new tool mdx-regenerate allows operators to re-create all .mdx files in a database directory to the latest file format version while VAST is running. This is useful for advanced users in preparation for version upgrades that bump the format version. #1866
Running vast status --detailed now lists all loaded configuration files under system.config-files. #1871
The query argument to the export and count commands may now be omitted, which causes the commands to operate on all data. Note that this may be a very expensive operation, so use with caution. #1879
The output of vast status --detailed now contains information about queries that are currently being processed in the index. #1881

Bug Fixes

The status command no longer occasionally contains garbage keys when the VAST server is under high load. #1872
Remote sources and sinks are no longer erroneously included in the output of vast status. #1873
The index now correctly cancels pending queries when the requester dies. #1884
Import filter expressions now work correctly with queries using field extractors, e.g., vast import suricata 'event_type == "alert"' < path/to/eve.json. #1885
Expression predicates of the #field type now produce error messages instead of empty result sets for operations that are not supported. #1886
The disk monitor no longer fails to delete segments of particularly busy partitions with the segment-store store backend. #1892

2021.08.26

Changes

VAST no longer strips link-layer framing when ingesting PCAPs. The stored payload is the raw PCAP packet. Similarly, vast export pcap now includes an Ethernet link-layer framing, per libpcap's DLT_EN10MB link type. #1797
Strings in error or warning log messages are no longer escaped, greatly improving readability of messages containing nested error contexts. #1842
VAST now supports building against {fmt} 8 and spdlog 1.9.2, and now requires at least {fmt} 7.1.3. #1846
VAST now ships with an updated schema type for the suricata.dhcp event, covering all fields of the extended output. #1854

Features

The segment-store store backend works correctly with vast get and vast explore. #1805
VAST can now process Eve JSON events of type suricata.packet that Suricata emits when the config option tagged-packets is set and a rule tags a packet using, e.g., tag:session,5,packets;. #1819 #1833

Bug Fixes

Previously missing fields of suricata event types are now part of the concept definitions of net.src.ip, net.src.port, net.dst.ip, net.dst.port, net.app, net.proto, net.community_id, net.vlan, and net.packets. #1798
Invalid segment files will no longer crash VAST at startup. #1820
Plugins in the prebuilt Docker images no longer show unspecified as their version. #1828
The configuration options vast.metrics.{file,uds}-sink.path now correctly specify paths relative to the database directory of VAST, rather than the current working directory of the VAST server. #1848
The segment-store store backend and built-in transform steps (hash, replace, and delete) now function correctly in static VAST binaries. #1850
The output of vast status now includes status information for sources and sinks spawned in the VAST node, i.e., via vast spawn source|sink <format> rather than vast import|export <format>. #1852
In order to align with the GNU Coding Standards, the static binary (and other relocatable binaries) now uses /etc as sysconfdir for installations to /usr/bin/vast. #1856
VAST now only switches to journald-style logging by default when it is actually supported. #1857
The CSV parser now correctly parses quoted fields in non-string types. E.g., "127.0.0.1" in CSV now successfully parses when a matching schema contains an address type field. #1858
The memory counts in the output of vast status now represent bytes consistently, as opposed to a mix of bytes and kilobytes. #1862
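The CSV fixes (#1712, #1858) come down to stripping the quotes around a field before parsing the value according to its schema type, and treating separators inside quotes as part of the field. A Python sketch of that order of operations using the standard csv and ipaddress modules; the two-column schema is hypothetical and stands in for a VAST schema with an address field.

```python
import csv
import io
import ipaddress

# Hypothetical schema: column name -> converter for the declared field type.
SCHEMA = {
    "src": ipaddress.ip_address,  # stands in for an address-typed field
    "note": str,
}

def parse_rows(text: str):
    """Yield one dict per CSV row, converting each field per SCHEMA.

    csv.reader removes the surrounding quotes first, so "127.0.0.1" reaches
    the address converter as 127.0.0.1, and a comma inside quotes stays part
    of the field instead of acting as a separator.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    for row in reader:
        yield {name: SCHEMA[name](value) for name, value in zip(header, row)}
```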

2021.07.29

Changes

VAST no longer officially supports Debian Buster with GCC-8. In CI, VAST now runs on Debian Bullseye with GCC-10. The provided Docker images now use debian:bullseye-slim as base image. Users that require Debian Buster support should use the provided static builds instead. #1765
From now on, VAST is compiled with the C++20 language standard. Minimum compiler versions have increased to GCC 10, Clang 11, and AppleClang 12.0.5. #1768
The vast binaries in our prebuilt Docker images no longer contain AVX instructions for increased portability. Building the image locally continues to add supported auto-vectorization flags automatically. #1778
The following new build options exist: VAST_ENABLE_AUTO_VECTORIZATION enables/disables all auto-vectorization flags, and VAST_ENABLE_SSE_INSTRUCTIONS enables -msse; similar options exist for SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, and AVX2. #1778

Features

VAST has a new store_plugin type for custom store backends that hold the raw data of a partition. The new setting vast.store-backend controls the selection of the store implementation, whose default value is segment-store. This is still an opt-in feature: unless the configuration value is set, VAST defaults to the old implementation. #1720 #1762 #1802
VAST now supports import filter expressions. They act as the dual to export query expressions: vast import suricata '#type == "suricata.alert"' < eve.json will import only suricata.alert events, discarding all other events. #1742
VAST now comes with a tenzir/vast-dev Docker image in addition to the regular tenzir/vast. The vast-dev image targets development contexts, e.g., when building additional plugins. The image contains all build-time dependencies of VAST and runs as root rather than the vast user. #1749
lsvast now prints extended information for hash indexes. #1755
The new Broker plugin enables seamless log ingestion from Zeek to VAST via a TCP socket. Broker is Zeek's messaging library and the plugin turns VAST into a Zeek logger node. Use vast import broker to establish a connection to a Zeek node and acquire logs. #1758
Plugin versions are now unique to facilitate debugging. They consist of three optional parts: (1) the CMake project version of the plugin, (2) the Git revision of the last commit that touched the plugin, and (3) a dirty suffix for uncommitted changes to the plugin. Plugin developers no longer need to specify the version manually in the plugin entrypoint. #1764
VAST now supports the arm64 architecture. #1773
Installing VAST now includes a vast.yaml.example configuration file listing all available options. #1777
VAST now exports per-layout import metrics under the key <reader>.events.<layout-name> in addition to the regular <reader>.events. This makes it easier to understand the event type distribution. #1781
The static binary now bundles the Broker plugin. #1789
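Conceptually, an import filter evaluates its expression on each incoming event and keeps only the matches. A Python sketch of the '#type == "suricata.alert"' case from #1742; it assumes, for illustration only, that an EVE event's layout name is "suricata." followed by its event_type value. It is not VAST's implementation.

```python
import json

def import_filter(lines, wanted_type):
    """Yield parsed EVE JSON events whose layout name equals wanted_type."""
    for line in lines:
        event = json.loads(line)
        # Assumption: layout name = "suricata." + event_type, e.g. suricata.alert.
        layout = "suricata." + event.get("event_type", "")
        if layout == wanted_type:
            yield event
```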

Bug Fixes

Configuring VAST to use CAF's built-in OpenSSL module via the caf.openssl.* options now works again as expected. #1740
The status command now prints information about input and output transformations. #1748
A [*** LOG ERROR #0001 ***] error message on startup under Linux no longer occurs. #1754
Queries against fields using a #index=hash attribute could have missed some results. Fixing a bug in the offset calculation during bitmap processing resolved the issue. #1755
A regression caused VAST's plugins to be loaded in random order, which printed a warning about mismatching plugins between client and server. The order is now deterministic. #1756
VAST no longer aborts JSON imports when encountering something other than a JSON object, e.g., a number or a string. Instead, VAST skips the offending line. #1759
Import processes now respond more quickly. Shutdown requests are no longer delayed when the server process has busy imports, and metrics reports are now written in a timely manner. #1771
Particularly busy imports caused the shutdown of the server process to hang, if import processes were still running or had not yet flushed all data. The server now shuts down correctly in these cases. #1771
The static binary no longer behaves differently than the regular build with regards to its configuration directories: system-wide configuration files now reside in <prefix>/etc/vast/vast.yaml rather than /etc/vast/vast.yaml. #1777
The VAST_ENABLE_JOURNALD_LOGGING CMake option is no longer ignored. #1780
Plugins built against an external libvast no longer require the CMAKE_INSTALL_LIBDIR to be specified as a path relative to the configured CMAKE_INSTALL_PREFIX. This fixes an issue with plugins in separate packages for some package managers, e.g., Nix. #1786
The official Docker image and static binary distribution of VAST now produce the correct version output for plugins from the vast version command. #1799
The disk budget feature no longer triggers a rare segfault while deleting partitions. #1804 #1809
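The JSON import fix in #1759 amounts to skipping, rather than failing on, lines whose top-level value is not an object. A minimal Python sketch of that rule, not VAST's reader; malformed lines still raise here, since the entry does not say how VAST treats them.

```python
import json

def read_json_objects(lines):
    """Yield top-level JSON objects from line-delimited input.

    Lines whose value is a number, string, array, boolean, or null are
    skipped instead of aborting the import.
    """
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        value = json.loads(line)
        if isinstance(value, dict):
            yield value
```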

2021.06.24

Breaking Changes

Apache Arrow is now a required dependency. The previously deprecated build option -DVAST_ENABLE_ARROW=OFF no longer exists. #1683
VAST no longer loads static plugins by default. Generally, VAST now treats static plugins and bundled dynamic plugins equally, allowing users to enable or disable static plugins as needed for their deployments. #1703

Changes

The VAST community chat moved from Gitter to Slack. Join us in the #vast channel for vibrant discussions. #1696
The tenzir/vast Docker image bundles the PCAP plugin. #1705
VAST merges lists from configuration files. E.g., running VAST with --plugins=some-plugin and vast.plugins: [other-plugin] in the configuration now results in both some-plugin and other-plugin being loaded (sorted by the usual precedence), instead of just some-plugin. #1721 #1734
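A Python sketch of the list merging from #1721 #1734, taking the sources in precedence order (command line first, then configuration). Dropping duplicates and keeping first-seen order are assumptions of this sketch, not documented guarantees.

```python
def merge_lists(*sources):
    """Concatenate list-valued options from highest to lowest precedence.

    Duplicate entries are dropped, keeping their first (highest-precedence)
    position.
    """
    merged = []
    for source in sources:
        for item in source:
            if item not in merged:
                merged.append(item)
    return merged
```

With --plugins=some-plugin and vast.plugins: [other-plugin], merge_lists(["some-plugin"], ["other-plugin"]) yields both names.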

Features

The new option vast.start.commands allows for specifying an ordered list of VAST commands that run after successful startup. The effect is the same as first starting a node, and then using another VAST client to issue commands. This is useful for commands that have side effects that cannot be expressed through the config file, e.g., starting a source inside the VAST server that listens on a socket or reads packets from a network interface. #1699
The options vast.plugins and vast.plugin-dirs may now be specified on the command line as well as in the configuration. Use the options --plugins and --plugin-dirs, respectively. #1703
Add the reserved plugin name bundled to vast.plugins to load all bundled plugins, i.e., static or dynamic plugins built alongside VAST, or use --plugins=bundled on the command line. The reserved plugin name all causes all bundled and external plugins to be loaded, i.e., all shared libraries matching libvast-plugin-* from the configured vast.plugin-dirs. #1703
It's now possible to configure the VAST endpoint as an environment variable by setting VAST_ENDPOINT. This has higher precedence than setting vast.endpoint in configuration files, but lower precedence than passing --endpoint= on the command-line. #1714
Plugins load their respective configuration from <configdir>/vast/plugin/<plugin-name>.yaml in addition to the regular configuration file at <configdir>/vast/vast.yaml. The new plugin-specific file does not require putting configuration under the key plugins.<plugin-name>. This allows for deploying plugins without needing to touch the <configdir>/vast/vast.yaml configuration file. #1724
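The endpoint precedence from #1714, expressed as a small Python function. The function and its parameters are illustrative; only the order (command line, then VAST_ENDPOINT, then vast.endpoint in configuration files) comes from the entry above.

```python
import os

def resolve_endpoint(cli_endpoint=None, config=None, env=None):
    """Pick the endpoint: --endpoint > VAST_ENDPOINT > vast.endpoint."""
    env = os.environ if env is None else env
    if cli_endpoint is not None:
        return cli_endpoint
    if "VAST_ENDPOINT" in env:
        return env["VAST_ENDPOINT"]
    return (config or {}).get("vast", {}).get("endpoint")
```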

Bug Fixes

VAST no longer crashes when querying for string fields with non-string values. Instead, an error message warns the user about an invalid query. #1685
Building plugins against an installed VAST no longer requires manually specifying -DBUILD_SHARED_LIBS=ON. The option is now correctly enabled by default for external plugins. #1697
The UDS metrics sink continues to send data when the receiving socket is recreated. #1702
The vast.log-rotation-threshold option was silently ignored, causing VAST to always use the default log rotation threshold of 10 MiB. The option works as expected now. #1709
The tenzir/vast Docker image now has additional tags for the release versions, e.g., tenzir/vast:2021.05.27. #1711
The import csv command handles quoted fields correctly. Previously, the quotes were part of the parsed value, and field separators in quoted strings caused the parser to fail. #1712
Import processes no longer hang on receiving SIGINT or SIGKILL. Instead, they shut down properly after flushing data that is yet to be processed. #1718

2021.05.27

Breaking Changes

Schemas are no longer implicitly shared between sources, i.e., an import process importing data with a custom schema will no longer affect other sources started at a later point in time. Schemas known to the VAST server process are still available to all import processes. We do not expect this change to have a real-world impact, but it could break setups where some sources have been installed on hosts without their own schema files, the VAST server did not have up-to-date schema files, and other sources were (ab)used to provide the latest type information. #1656
The configure script was removed. This was a custom script that mimicked the functionality of an autotools-based configure script by writing directly to the cmake cache. Instead, users now must use the cmake and/or ccmake binaries directly to configure VAST. #1657

Changes

Building VAST without Apache Arrow via -DVAST_ENABLE_ARROW=OFF is now deprecated, and support for the option will be removed in a future release. As the Arrow ecosystem and libraries matured, we feel confident in making it a required dependency and plan to build upon it more in the future. #1682

Features

The new transforms feature allows VAST to apply transformations to incoming and outgoing data. A transform consists of a sequence of steps that execute sequentially, e.g., to remove, overwrite, hash, or encrypt data. A new plugin type makes it easy to write custom transforms. #1517 #1656
Plugin schemas are now installed to <datadir>/vast/plugin/<plugin>/schema, while VAST's built-in schemas reside in <datadir>/vast/schema. The load order guarantees that plugins are able to reliably override the schemas bundled with VAST. #1608
The new option vast export --timeout=<duration> allows for setting a timeout for VAST queries. Cancelled exports result in a non-zero exit code. #1611
To enable easier post-processing, the new option vast.export.json.numeric-durations switches JSON output of duration types from human-readable strings (e.g., "4.2m") to numeric (e.g., 252.15) in fractional seconds. #1628
The status command now prints the VAST server version information under the version key. #1652
The new setting vast.disk-monitor-step-size enables the disk monitor to remove N partitions at once before re-checking if the new size of the database directory is now small enough. This is useful when checking the size of a directory is an expensive operation itself, e.g., on compressed filesystems. #1655
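Post-processing the human-readable form means parsing unit suffixes, which vast.export.json.numeric-durations avoids. For comparison, a Python sketch of such a parser; the unit table is an assumed subset, not VAST's full duration grammar. The string form is also presumably rounded: converting "4.2m" back yields 252 seconds, while the example above shows 252.15.

```python
import re

# Assumed subset of duration suffixes, mapped to seconds.
UNIT_SECONDS = {
    "ns": 1e-9, "us": 1e-6, "ms": 1e-3,
    "s": 1.0, "m": 60.0, "h": 3600.0, "d": 86400.0,
}

def duration_to_seconds(text: str) -> float:
    """Convert a string such as "4.2m" into fractional seconds."""
    match = re.fullmatch(r"\s*(-?\d+(?:\.\d+)?)\s*([a-z]+)\s*", text)
    if match is None or match.group(2) not in UNIT_SECONDS:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = match.groups()
    return float(value) * UNIT_SECONDS[unit]
```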

Bug Fixes

VAST now correctly refuses to run when loaded plugins fail their initialization, i.e., are in a state that cannot be reasoned about. #1618
A recent change caused imports over UDP not to forward their events to the VAST server process. Running vast import -l :<port>/udp <format> now works as expected again. #1622
Non-relocatable VAST binaries no longer look for configuration, schemas, and plugins in directories relative to the binary location. Vice versa, relocatable VAST binaries no longer look for configuration, schemas, and plugins in their original install directory, and instead always use paths relative to their binary location. On macOS, we now always build relocatable binaries. Relocatable binaries now work correctly on systems where the library install directory is lib64 instead of lib. #1624
VAST no longer erroneously skips the version mismatch detection between client and server. The check now additionally compares running plugins. #1652
Executing VAST's unit test suite in parallel no longer fails. #1659
VAST and transform plugins now build without Arrow support again. #1673
The delete transform step correctly deletes fields from the layout when running VAST with Arrow disabled. #1673
VAST no longer erroneously warns about a version mismatch between client and server when their plugin load order differs. #1679

2021.04.29

Breaking Changes

The previously deprecated (#1409) option vast.no-default-schema no longer exists. #1507
Plugins configured via vast.plugins in the configuration file can now be specified using either the plugin name or the full path to the shared plugin library. We no longer allow omitting the extension from specified plugin files, and recommend using the plugin name as a more portable solution, e.g., example over libexample and /path/to/libexample.so over /path/to/libexample. #1527
The previously deprecated usage (#1354) of format-independent options after the format in commands is now no longer possible. This affects the options listen, read, schema, schema-file, type, and uds for import commands and the write and uds options for export commands. #1529
Plugins must define a separate entrypoint in their build scaffolding using the argument ENTRYPOINT to the CMake function VASTRegisterPlugin. If only a single value is given to the argument SOURCES, it is interpreted as the ENTRYPOINT automatically. #1549
To avoid confusion between the PCAP plugin and libpcap, which both have a library file named libpcap.so, we now generally prefix the plugin library output names with vast-plugin-. For example, the PCAP plugin library file is now named libvast-plugin-pcap.so. Plugins specified with a full path in the configuration under vast.plugins must be adapted accordingly. #1593
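To illustrate the two accepted forms of vast.plugins entries (#1527) and the new library prefix (#1593), here is a Python sketch. Treating any entry that contains a "/" as a path is an assumption made for this example, and the .so suffix reflects Linux only.

```python
def plugin_library_name(plugin_name: str) -> str:
    """Library file name for a plugin, e.g. libvast-plugin-pcap.so."""
    return f"libvast-plugin-{plugin_name}.so"

def classify_plugin_entry(entry: str):
    """Return ("path", entry) or ("name", entry) for a vast.plugins entry.

    Assumption: entries containing "/" are library paths, which must now
    include the file extension.
    """
    if "/" in entry:
        if not entry.endswith(".so"):
            raise ValueError(f"plugin path must include the extension: {entry}")
        return ("path", entry)
    return ("name", entry)
```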

Changes

The metrics for Suricata Eve JSON and Zeek Streaming JSON imports are now under the categories suricata-reader and zeek-reader respectively so they can be distinguished from the regular JSON import, which is still under json-reader. #1498
VAST now ships with a schema record type for Suricata's rfb event type. #1499 @satta
The exporter.hits metric has been removed. #1514 #1574
We upstreamed the Debian patches provided by @satta. VAST now prefers an installed tsl-robin-map>=0.6.2 to the bundled one unless configured with --with-bundled-robin-map, and we provide a manpage for lsvast if pandoc is installed. #1515
The Suricata dns schema type now defines the dns.grouped.A field containing a list of all returned addresses. #1531
The status output of Analyzer Plugins moved from the importer.analyzers key into the top-level record. #1544
The new option --disable-default-config-dirs disables the loading of user and system configuration, schema, and plugin directories. We use this option internally when running integration tests. #1557
Building VAST now requires CMake >= 3.15. #1559
The VAST community chat moved from Element to Gitter. Join us at gitter.im/tenzir/vast or via Matrix at #tenzir_vast:gitter.im. #1591

Features

The disk monitor gained a new vast.start.disk-budget-check-binary option that can be used to specify an external binary to determine the size of the database directory. This can be useful in cases where stat() does not give the correct answer, e.g., on compressed filesystems. #1453
The VAST_PLUGIN_DIRS and VAST_SCHEMA_DIRS environment variables allow for setting additional plugin and schema directories separated with : with higher precedence than other plugin and schema directories. #1532 #1541
It is now possible to build plugins against an installed VAST. This requires a slight adaptation to every plugin's build scaffolding. The example plugin was updated accordingly. #1532
Component Plugins are a new category of plugins that execute code within the VAST server process. Analyzer Plugins are now a specialization of Component Plugins, and their API remains unchanged. #1544 #1547 #1588
Reader Plugins and Writer Plugins are a new family of plugins that add import/export formats. The previously optional PCAP format moved into a dedicated plugin. Configure with --with-pcap-plugin and add pcap to vast.plugins to enable the PCAP plugin. #1549
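A Python sketch of how a ":"-separated VAST_PLUGIN_DIRS value (#1532 #1541) could be placed ahead of the default directories so that it takes precedence; the same shape applies to VAST_SCHEMA_DIRS. The function name and the default directory list are hypothetical.

```python
import os

def search_dirs(variable, default_dirs, env=None):
    """Directories from a colon-separated environment variable come first."""
    env = os.environ if env is None else env
    extra = [d for d in env.get(variable, "").split(":") if d]
    return extra + [d for d in default_dirs if d not in extra]
```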

Bug Fixes

VAST no longer erroneously tries to dynamically load explicitly specified plugins that are linked statically. #1528
Custom commands from plugins whose names end in start no longer write to the server log file instead of the client log file. #1530
Linking against an installed VAST via CMake now correctly resolves VAST's dependencies. #1532
VAST no longer refuses to start when any of the configuration file directories is unreadable, e.g., because VAST is running in a sandbox. #1533
The CSV reader no longer crashes when encountering nested type aliases. #1534
The command-line parser no longer crashes when encountering a flag with missing value in the last position of a command invocation. #1536
A bug in the parsing of ISO8601 formatted dates that incorrectly adjusted the time to the UTC timezone has been fixed. #1537
Plugin unit tests now correctly load and initialize their respective plugins. #1549
The shutdown logic contained a bug that would make the node fail to terminate in case a plugin actor is registered at said node. #1563
A race condition in the shutdown logic that caused an assertion was fixed. #1563
VAST now correctly builds within shallow clones of the repository. If the build system is unable to determine the correct version from git-describe, it now always falls back to the version of the last release. #1570
We fixed a regression that made it impossible to build static binaries from outside of the repository root directory. #1573
The VASTRegisterPlugin CMake function now correctly removes the ENTRYPOINT from the given SOURCES, allowing for plugin developers to easily glob for sources again. #1573
The exporter.selectivity metric is now 1.0 instead of NaN for idle periods. #1574
VAST no longer renders non-finite JSON numbers as NaN, -NaN, inf, or -inf, which produced invalid JSON output. Instead, such numbers are now rendered as null. #1574
Specifying relative CMAKE_INSTALL_*DIR in the build configuration no longer causes VAST not to pick up system-wide installed configuration files, schemas, and plugins. The configured install prefix is now used correctly. The defunct VAST_SYSCONFDIR, VAST_DATADIR, and VAST_LIBDIR CMake options no longer exist. Use a combination of CMAKE_INSTALL_PREFIX and CMAKE_INSTALL_*DIR instead. #1580
Spaces before SI prefixes in command line arguments and configuration options are now generally ignored, e.g., it is now possible to set the disk monitor budgets to 2 GiB rather than 2GiB. #1590
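With #1590, "2 GiB" and "2GiB" denote the same value. A Python sketch of a byte-size parser that tolerates the space; the unit table (decimal kB/MB/GB/TB and binary KiB/MiB/GiB/TiB, bare numbers as bytes) is an assumption and not necessarily the exact set VAST accepts.

```python
import re

# Assumed unit table: decimal (SI) and binary (IEC) byte multiples.
BYTE_UNITS = {
    "B": 1,
    "kB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12,
    "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40,
}

def parse_bytes(text: str) -> int:
    """Parse sizes such as "2GiB" or "2 GiB"; whitespace before the unit is ignored."""
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]*)\s*", text)
    if match is None:
        raise ValueError(f"invalid size: {text!r}")
    number, unit = match.groups()
    unit = unit or "B"  # assumption: a bare number counts bytes
    if unit not in BYTE_UNITS:
        raise ValueError(f"unknown unit: {unit!r}")
    return int(number) * BYTE_UNITS[unit]
```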

2021.03.25

Breaking Changes

The previously deprecated #timestamp extractor has been removed from the query language entirely. Use :timestamp instead. #1399
Plugins can now be linked statically against VAST. A new VASTRegisterPlugin CMake function enables easy setup of the build scaffolding required for plugins. Configure with --with-static-plugins or build a static binary to link all plugins built alongside VAST statically. All plugin build scaffoldings must be adapted; older plugins no longer work. #1445 #1452

Changes

The default size of table slices (event batches) created by vast import processes has changed from 1,000 to 1,024. #1396
VAST now ships with schema record types for Suricata's mqtt and anomaly event types. #1408 @satta
The option vast.no-default-schema is deprecated, as it is no longer needed to override types from bundled schemas. #1409
Query latency for expressions that contain concept names has improved substantially. For DB sizes in the TB region, and with a large variety of event types, queries with a high selectivity experience speedups of up to 5x. #1433
The zeek-to-vast utility was moved to the tenzir/zeek-vast repository. All options related to zeek-to-vast and the bundled Broker submodule were removed. #1435
The type extractor in the expression language now works with type aliases. For example, given the type definition for port from the base schema type port = count, a search for :count will also consider fields of type port. #1446
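With alias-aware type extractors, :count also selects fields declared as port because port resolves to count. A Python sketch of that resolution over a hypothetical layout; the alias table mirrors the type port = count definition, and following alias chains of any length is an assumption of this sketch.

```python
# Hypothetical alias table mirroring `type port = count` from the base schema.
ALIASES = {"port": "count"}

def alias_chain(type_name):
    """Return the type and every type it aliases, e.g. ["port", "count"]."""
    chain = [type_name]
    while chain[-1] in ALIASES:
        target = ALIASES[chain[-1]]
        if target in chain:
            raise ValueError(f"alias cycle at {target}")
        chain.append(target)
    return chain

def fields_for_extractor(layout, extractor_type):
    """Names of fields that a :<extractor_type> extractor selects in layout."""
    return [name for name, declared in layout.items()
            if extractor_type in alias_chain(declared)]
```

A :port extractor still selects only fields declared as port, since count does not alias port.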

Features

The schema language now supports 4 operations on record types: + combines the fields of 2 records into a new record. <+ and +> are variations of + that give precedence to the left and right operand respectively. - creates a record with the field specified as its right operand removed. #1407 #1487 #1490
VAST now supports nested records in Arrow table slices and in the JSON import, e.g., data of type list<record<name: string, age: count>>. While nested record fields are not yet queryable, ingesting such data will no longer cause VAST to crash. MessagePack table slices don't support records in lists yet. #1429
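The four record operations from #1407 #1487 #1490 can be modeled with ordered Python dicts that map field names to types. This is a sketch of the semantics described above, not VAST's schema parser; raising an error when + sees a duplicate field name is our assumption.

```python
def plus(left, right):
    """left + right: combine fields; assumed to fail on duplicate names."""
    duplicates = left.keys() & right.keys()
    if duplicates:
        raise ValueError(f"duplicate fields: {sorted(duplicates)}")
    return {**left, **right}

def plus_left(left, right):
    """left <+ right: on a name clash, the left operand's field wins."""
    result = dict(left)
    for name, type_ in right.items():
        result.setdefault(name, type_)
    return result

def plus_right(left, right):
    """left +> right: on a name clash, the right operand's field wins."""
    result = dict(left)
    result.update(right)
    return result

def minus(record, field):
    """record - field: a copy of record without the named field."""
    return {name: type_ for name, type_ in record.items() if name != field}
```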

Bug Fixes

Some non-null pointers were incorrectly rendered as *nullptr in log messages. #1430
Data that was ingested before the deprecation of the #timestamp attribute wasn't exported correctly with newer versions. This is now corrected. #1432
The JSON parser now accepts data with numerical or boolean values in fields that expect strings according to the schema. VAST converts these values into string representations. #1439
A query for a field or field name suffix that matches multiple fields of different types would erroneously return no results. #1447
The disk monitor now correctly erases partition synopses from the meta index. #1450
The archive, index, source, and sink components now report metrics when idle instead of omitting them entirely. This allows for distinguishing between idle and non-running components from the metrics. #1451
VAST no longer crashes when the disk monitor tries to calculate the size of the database while files are being deleted. Instead, it will retry after the configured scan interval. #1458
Insufficient permissions for one of the paths in the schema-dirs option would lead to a crash in vast start. #1472
A race condition during server shutdown could lead to an invariant violation, resulting in a firing assertion. Streamlining the shutdown logic resolved the issue. #1473 #1485
Enabling the disk budget feature no longer prevents the server process from exiting after it was stopped. #1495

2021.02.24

Breaking Changes

VAST switched to spdlog >= 1.5.0 for logging. For users, this means: the vast.console-format and vast.file-format options must now be specified using the spdlog pattern syntax, as described in the spdlog documentation. All settings under caf.logger.* are now ignored by VAST, and only the vast.* counterparts are used for logger configuration. #1223 #1328 #1334 #1390 @a4z
VAST now requires {fmt} >= 5.2.1 to be installed. #1330
All options in vast.metrics.* had underscores in their names replaced with dashes to align with other options. For example, vast.metrics.file_sink is now vast.metrics.file-sink. The old options no longer work. #1368
User-supplied schema files are now picked up from <SYSCONFDIR>/vast/schema and <XDG_CONFIG_HOME>/vast/schema instead of <XDG_DATA_HOME>/vast/schema. #1372
The previously deprecated options vast.spawn.importer.ids and vast.schema-paths no longer work. Furthermore, queries spread over multiple arguments are now disallowed instead of triggering a deprecation warning. #1374
The special meaning of the #timestamp attribute has been removed from the schema language. From now on, timestamps can be marked as such by using the timestamp type instead. Queries of the form #timestamp <op> value remain operational but are deprecated in favor of :timestamp. Note that this change also affects :time queries, which aren't supersets of #timestamp queries any longer. #1388

Changes

Schema parsing now uses a 2-pass loading phase so that type aliases can reference other types that are later defined in the same directory. Additionally, type definitions from already parsed schema dirs can be referenced from schema types that are parsed later. Types can also be redefined in later directories, but a type cannot be defined twice in the same directory. #1331
The infer command has an improved heuristic for the number types int, count, and real. #1343 #1356 @ngrodzitski
The options listen, read, schema, schema-file, type, and uds can from now on be supplied to the import command directly. Similarly, the options write and uds can be supplied to the export command. All options can still be used after the format subcommand, but that usage is deprecated. #1354
The query normalizer interprets value predicates of type subnet more broadly: given a subnet S, the parser expands this to the expression :subnet == S || :addr in S. This change makes it easier to search for IP addresses belonging to a specific subnet. #1373
The output of vast help and vast documentation now goes to stdout instead of to stderr. Erroneous invocations of vast also print the helptext, but in this case the output still goes to stderr to avoid interference with downstream tooling. #1385
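Given a bare subnet value S, the normalizer from #1373 checks subnet-typed fields for equality and address-typed fields for membership. A Python sketch of evaluating that expanded predicate against one event, represented here (hypothetically) as a list of (type, value) pairs; it uses only the standard ipaddress module.

```python
import ipaddress

def matches_subnet(fields, subnet_text):
    """Evaluate a bare subnet value S as `:subnet == S || :addr in S`.

    fields: iterable of (type_name, value) pairs for one event.
    """
    subnet = ipaddress.ip_network(subnet_text)
    for type_name, value in fields:
        if type_name == "subnet" and ipaddress.ip_network(value) == subnet:
            return True
        # Membership is False for mixed IPv4/IPv6 operands.
        if type_name == "addr" and ipaddress.ip_address(value) in subnet:
            return True
    return False
```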

Experimental Features

Sigma rules are now a valid format to represent query expressions. VAST parses the detection attribute of a rule and translates it into a native query expression. To run a query using a Sigma rule, pass it on standard input, e.g., vast export json < rule.yaml. #1379

Features

VAST rotates server logs by default. The new config options vast.disable-log-rotation and vast.log-rotation-threshold can be used to control this behaviour. #1223 #1362
The meta index now stores partition synopses in separate files. This will decrease restart times for systems with large databases, slow disks and aggressive readahead settings. A new config setting vast.meta-index-dir allows storing the meta index information in a separate directory. #1330 #1376
The JSON import now always relies upon simdjson. The previously experimental --simdjson option to the vast import json|suricata|zeek-json commands no longer exists, as the feature is considered stable. #1343 #1356 @ngrodzitski
The new options vast.metrics.file-sink.real-time and vast.metrics.uds-sink.real-time enable real-time metrics reporting for the file sink and UDS sink respectively. #1368
The type extractor in the expression language now works with user defined types. For example the type port is defined as type port = count in the base schema. This type can now be queried with an expression like :port == 80. #1382

Bug Fixes

An ordering issue introduced in #1295 that could lead to a segfault with long-running queries was reverted. #1381
A bug in the new simdjson based JSON reader introduced in #1356 could trigger an assertion in the vast import process if an input field could not be converted to the field type in the target layout. This is no longer the case. #1386

2021.01.28

Breaking Changes

The new short options -v, -vv, -vvv, -q, -qq, and -qqq map onto the existing verbosity levels. The existing short syntax, e.g., -v debug, no longer works. #1244
The GitHub CI changed to Debian Buster and produces Debian artifacts instead of Ubuntu artifacts. Similarly, the Docker images we provide on Docker Hub use Debian Buster as base image. To build Docker images locally, users must set DOCKER_BUILDKIT=1 in the build environment. #1294

Changes

VAST preserves nested JSON objects in events instead of formatting them in a flattened form when exporting data with vast export json. The old behavior can be enabled with vast export json --flatten. #1257 #1289
vast start prints the endpoint it is listening on when providing the option --print-endpoint. #1271
The option vast.schema-paths is renamed to vast.schema-dirs. The old option is deprecated and will be removed in a future release. #1287

Experimental Features

VAST features a new plugin framework to support efficient customization points at various places of the data processing pipeline. There exist several base classes that define an interface, e.g., for adding new commands or spawning a new actor that processes the incoming stream of data. The directory examples/plugins/example contains an example plugin. #1208 #1264 #1275 #1282 #1285 #1287 #1302 #1307 #1316
VAST relies on simdjson for JSON parsing. The substantial gains in throughput shift the bottleneck of the ingest path from parsing input to indexing at the node. To use the (yet experimental) feature, use vast import json|suricata|zeek-json --simdjson. #1230 #1246 #1281 #1314 #1315 @ngrodzitski

Features

The new import zeek-json command allows for importing line-delimited Zeek JSON logs as produced by the json-streaming-logs package. Unlike stock Zeek JSON logs, where one file contains exactly one log type, the streaming format contains different log event types in a single stream and uses an additional _path field to disambiguate the log type. For stock Zeek JSON logs, use the existing import json with the -t flag to specify the log type. #1259
VAST queries also accept nanoseconds, microseconds, milliseconds, seconds, and minutes as units for a duration. #1265
The output of vast status contains detailed memory usage information about active and cached partitions. #1297
VAST installations bundle a LICENSE.3rdparty file alongside the regular LICENSE file that lists all embedded code that is under a separate license. #1306
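Because the streaming format interleaves log types, a reader has to inspect _path on every line to determine the log type. A Python sketch of that dispatch step, not VAST's zeek-json reader, grouping line-delimited records by their _path value.

```python
import json
from collections import defaultdict

def group_by_path(lines):
    """Group line-delimited Zeek streaming JSON records by their _path field."""
    groups = defaultdict(list)
    for line in lines:
        record = json.loads(line)
        groups[record["_path"]].append(record)
    return dict(groups)
```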

Bug Fixes

Invalid Arrow table slices read from disk no longer trigger a segmentation fault. Instead, the invalid on-disk state is ignored. #1247
Manually specified configuration files may reside in the default location directories. Configuration files can be symlinked. #1248
For relocatable installations, the list of schema loading paths does not include a build-time configured path any more. #1249
Values in JSON fields that can't be converted to the type that is specified in the schema won't cause the containing event to be dropped any longer. #1250
Line based imports correctly handle read timeouts that occur in the middle of a line. #1276
Disk monitor quota settings not ending in a 'B' are no longer silently discarded. #1278
A potential race condition that could lead to a hanging export if a partition was persisted just as it was scanned no longer exists. #1295

2020.12.16

Breaking Changes

The splunk-to-vast script has a new name: taxonomize. The script now also generates taxonomy declarations for Azure Sentinel. #1134
CAF-encoded table slices no longer exist. As such, the option vast.import.batch-encoding now only supports arrow and msgpack as arguments. #1142
The on-disk format for table slices now supports versioning of table slice encodings. This breaking change makes it possible to add further encodings or new versions of existing encodings without breaking compatibility again in the future. #1143 #1157 #1160 #1165
Archive segments no longer include an additional, unnecessary version identifier. We took the opportunity to clean this up together with the other recent breaking changes. #1168
The build configuration of VAST received a major overhaul. Inclusion of libvast in other projects via add_subdirectory(path/to/vast) is now easily possible. The names of all build options were aligned, and the new build summary shows all available options. #1175
The port type is no longer a first-class type. The new way to represent transport-layer ports relies on count instead. In the schema, VAST ships with a new alias type port = count to keep existing schema definitions intact. However, this is a breaking change because the on-disk format and Arrow data representation changed. Queries with :port type extractors no longer work. Similarly, the syntax 53/udp no longer exists; use count syntax 53 instead. Since most port occurrences do not carry a known transport-layer type, and the type information typically exists in a separate field, removing port as a native type streamlines the data model. #1187

Changes

VAST no longer requires you to manually remove a stale PID file from a no-longer running vast process. Instead, VAST prints a warning and overwrites the old PID file. #1128
VAST does not produce metrics by default any more. The option --disable-metrics has been renamed to --enable-metrics accordingly. #1137
VAST now processes the schema directory recursively, as opposed to stopping at nested directories. #1154
The default segment size in the archive is now 1 GiB. This reduces fragmentation of the archive metadata and speeds up VAST startup time. #1166
VAST now listens on port 42000 instead of letting the operating system choose the port if the option vast.endpoint specifies an endpoint without a port. To restore the old behavior, set the port to 0 explicitly. #1170
The Suricata schemas received an overhaul: there now exist vlan and in_iface fields in all types. In addition, VAST ships with new types for ikev2, nfs, snmp, tftp, rdp, sip and dcerpc. The tls type gets support for the additional sni and session_resumed fields. #1176 #1180 #1186 #1237 @satta
Installed schema definitions now reside in <datadir>/vast/schema/types, taxonomy definitions in <datadir>/vast/schema/taxonomy, and concept definitions in <datadir>/vast/schema/concepts, as opposed to them all being in the schema directory directly. When overwriting an existing installation, you may have to delete the old schema definitions by hand. #1194
The zeek export format now strips off the prefix zeek. to ensure full compatibility with regular Zeek output. For all non-Zeek types, the prefix remains intact. #1205
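
The endpoint rule from #1170 boils down to: a missing port becomes 42000, while an explicit port, including 0, is kept as given. A minimal Python sketch under that reading; resolve_endpoint is a hypothetical helper, and the handling of bracketed IPv6 addresses is an assumption.

```python
DEFAULT_PORT = 42000  # port used when vast.endpoint omits one


def resolve_endpoint(endpoint: str) -> tuple[str, int]:
    """Split 'host[:port]' into (host, port), defaulting the port to 42000.

    Bracketed IPv6 literals such as '[::1]:5158' are supported; an
    explicit port of 0 is preserved so the OS picks a free port.
    """
    if endpoint.startswith("["):
        # '[addr]' or '[addr]:port'
        host, _, rest = endpoint[1:].partition("]")
        port = rest[1:] if rest.startswith(":") else ""
    elif endpoint.count(":") == 1:
        host, _, port = endpoint.partition(":")
    else:
        # No colon at all, or an unbracketed IPv6 address without a port.
        host, port = endpoint, ""
    return host, (int(port) if port else DEFAULT_PORT)
```

For example, localhost resolves to port 42000, whereas localhost:0 keeps port 0 and thus lets the operating system choose.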

Experimental Features

VAST now ships with its own taxonomy and basic concept definitions for Suricata, Zeek, and Sysmon. #1135 #1150
The query language now supports models. Models combine a list of concepts into a semantic unit that an event fulfills if its type contains a field for every concept in the model. Turn to the documentation for more information. #1185 #1228
The expression language gained support for the #field meta extractor. It is the complement of #type and uses suffix matching for field names at the layout level. #1228
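
The rule for models from #1185 can be expressed as a set check: a type fulfills a model if, for every concept in the model, the type contains at least one field that the concept maps to. A minimal Python sketch; the concept and model definitions are made-up examples, not VAST's bundled taxonomy.

```python
# Hypothetical concept definitions: concept name -> candidate field names.
CONCEPTS = {
    "net.src.ip": {"zeek.conn.id.orig_h", "suricata.flow.src_ip"},
    "net.src.port": {"zeek.conn.id.orig_p", "suricata.flow.src_port"},
    "net.dst.ip": {"zeek.conn.id.resp_h", "suricata.flow.dest_ip"},
}

# Hypothetical model: a list of concepts that form one semantic unit.
MODELS = {"net.connection": ["net.src.ip", "net.src.port", "net.dst.ip"]}


def fulfills(type_fields: set[str], model: str) -> bool:
    """True if the type has at least one field for every concept of the model."""
    return all(CONCEPTS[concept] & type_fields for concept in MODELS[model])
```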

Features

The new option vast.client-log-file enables client-side logging. By default, VAST only writes log files for the server process. #1132
The new option --print-bytesizes of lsvast prints information about the size of certain fields of the FlatBuffers inside a VAST database directory. #1149
The storage required for indexing IP addresses has been optimized. This should result in significantly reduced memory usage over time, as well as faster restart times and reduced disk space requirements. #1172 #1200 #1216
A new key 'meta-index-bytes' appears in the status output generated by vast status --detailed. #1193
The new dump command prints configuration and schema-related information. The implementation allows for printing all registered concepts and models via vast dump concepts and vast dump models. The --yaml flag of dump switches from JSON to YAML output, such that it conforms to the taxonomy configuration syntax. #1196 #1233
On Linux, VAST now contains a set of built-in USDT tracepoints that can be used by tools like perf or bpftrace when debugging. Initially, we provide the two tracepoints chunk_make and chunk_destroy, which trigger every time a vast::chunk is created or destroyed. #1206
Low-selectivity string (in)equality queries now run up to 30x faster, thanks to more intelligent selection of relevant index partitions. #1214

Bug Fixes

vast import no longer stalls when it doesn't receive any data for more than 10 seconds. #1136
The vast.yaml.example contained syntax errors. The example config file now works again. #1145
VAST no longer starts if the specified config file does not exist. #1147
The output of vast status --detailed now contains information about running sinks, e.g., vast export <format> <query> processes. #1155
VAST no longer blocks when an invalid query operation is issued. #1189
The type registry now detects and handles breaking changes in schemas, e.g., when a field type changes or a field is dropped from a record. #1195
The index now correctly drops further results when queries finish early, thus improving the performance of queries for a limited number of events. #1209
The index no longer crashes when too many parallel queries are running. #1210
The index no longer causes exporters to deadlock when the meta index produces false positives. #1225
The summary log message of vast export now contains the correct number of candidate events. #1228
The vast status command does not collect status information from sources and sinks any longer. They were often too busy to respond, leading to a long delay before the command completed. #1234
Concepts that reference other concepts are now loaded correctly from their definition. #1236

2020.10.29

Changes

The new option import.read-timeout allows for setting an input timeout for low volume sources. Reaching the timeout causes the current batch to be forwarded immediately. This behavior was previously controlled by import.batch-timeout, which now only controls the maximum buffer time before the source forwards batches to the server. #1096
VAST will now warn if a client command connects to a server that runs on a different version of the vast binary. #1098
Log files are now less verbose because class and function names are not printed on every line. #1107
The default database directory moved to /var/lib/vast for Linux deployments. #1116
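
The split between import.read-timeout and import.batch-timeout from #1096 can be illustrated by replaying arrival times through a simulated source. In this hypothetical Python sketch, a batch is forwarded when it is full, when the input stayed silent longer than the read timeout, or when the batch has been buffered longer than the batch timeout. SimulatedBatcher and its fields are illustrative names, not VAST internals.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SimulatedBatcher:
    """Hypothetical model of source-side batching with simulated timestamps.

    A real source flushes when a timer fires; replaying recorded arrival
    times yields the same batch boundaries, which keeps the sketch testable.
    """

    batch_size: int
    read_timeout: float   # seconds without input before flushing early
    batch_timeout: float  # maximum seconds a batch may stay buffered
    buffer: List[str] = field(default_factory=list)
    batch_start: Optional[float] = None
    forwarded: List[List[str]] = field(default_factory=list)

    def _flush(self) -> None:
        if self.buffer:
            self.forwarded.append(self.buffer)
        self.buffer = []
        self.batch_start = None

    def replay(self, arrivals: List[Tuple[float, str]]) -> List[List[str]]:
        last_arrival: Optional[float] = None
        for t, event in arrivals:
            if last_arrival is not None and t - last_arrival > self.read_timeout:
                self._flush()  # the input went quiet: forward what we have
            if self.batch_start is not None and t - self.batch_start > self.batch_timeout:
                self._flush()  # the batch has waited too long
            if self.batch_start is None:
                self.batch_start = t
            self.buffer.append(event)
            last_arrival = t
            if len(self.buffer) >= self.batch_size:
                self._flush()  # the batch is full
        self._flush()  # end of input
        return self.forwarded
```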

Experimental Features

The query language now comes with support for concepts, the first part of taxonomies. Concepts are a mechanism to unify the various naming schemes of different data formats into a single, coherent nomenclature. #1102
A new disk monitor component can now monitor the database size and delete data that exceeds a specified threshold. Once VAST reaches the maximum amount of disk space, the disk monitor deletes the oldest data. The command-line options --disk-quota-high, --disk-quota-low, and --disk-quota-check-interval control the rotation behavior. #1103
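
The two quota options from #1103 describe a hysteresis: nothing is deleted until the database exceeds the high quota, and then the oldest data goes until usage drops to the low quota, so the next check does not fire again immediately. A minimal Python sketch of one such check, which would run every --disk-quota-check-interval; Partition and select_for_deletion are hypothetical names, and the sketch models only the selection policy.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    """Hypothetical stand-in for an on-disk unit of data."""

    name: str
    created: float  # creation time; smaller means older
    size: int       # bytes on disk


def select_for_deletion(partitions: list[Partition], high: int, low: int) -> list[Partition]:
    """Return the oldest partitions to erase once usage exceeds `high`.

    Deletion continues until usage is at or below `low`.
    """
    total = sum(p.size for p in partitions)
    if total <= high:
        return []
    victims = []
    for partition in sorted(partitions, key=lambda p: p.created):
        if total <= low:
            break
        victims.append(partition)
        total -= partition.size
    return victims
```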

Features

When running VAST under systemd supervision, it is now possible to use the Type=notify directive in the unit file to let VAST notify the service manager when it becomes ready. #1091
The new options vast.segments and vast.max-segment-size control how the archive generates segments. #1103
The new script splunk-to-vast converts a Splunk CIM model file in JSON to a VAST taxonomy. For example, splunk-to-vast < Network_Traffic.json renders the concept definitions for the Network Traffic datamodel. The generated taxonomy does not include field definitions, which users should add separately according to their data formats. #1121
The expression language now accepts records without field names. For example, id == <192.168.0.1, 41824, 143.51.53.13, 25, "tcp"> is now valid syntax and instantiates a record with 5 fields. Note: expressions with records currently do not execute. #1129

Bug Fixes

The lookup for schema directories now happens in a fixed order. #1086
Sources that receive no or very little input do not block vast status any longer. #1096
The vast status --detailed command now correctly shows the status of all sources, i.e., vast import or vast spawn source commands. #1109
VAST no longer opens a random public port, which used to be enabled in the experimental VAST cluster mode in order to transparently establish a full mesh. #1110
The lsvast tool failed to print FlatBuffers schemas correctly. The output now renders correctly. #1123

2020.09.30

Breaking Changes

Data exported in the Apache Arrow format now contains the name of the payload record type in the metadata section of the schema. #1072
The persistent storage format of the index now uses FlatBuffers. #863

Changes

The JSON export format now renders duration and port fields using strings as opposed to numbers. This avoids a possible loss of information and enables users to re-use the output in follow-up queries directly. #1034
The delay between the periodic log messages for reporting the current event rates has been increased to 10 seconds. #1035
The global VAST configuration now always resides in <sysconfdir>/vast/vast.conf, and bundled schemas always in <datadir>/vast/schema/. VAST no longer supports reading a vast.conf file in the current working directory. #1036
The proprietary VAST configuration file has changed to the more ops-friendly industry standard YAML. This change also introduced a new dependency: yaml-cpp version 0.6.2 or greater. The top-level vast.yaml.example illustrates what the new YAML config looks like. Please rename existing configuration files from vast.conf to vast.yaml. VAST still reads vast.conf but will soon only look for vast.yaml or vast.yml files in available configuration file paths. #1045 #1055 #1059 #1062
The options that affect batches in the import command received new, more user-facing names: import.table-slice-type, import.table-slice-size, and import.read-timeout are now called import.batch-encoding, import.batch-size, and import.batch-timeout respectively. #1058
All configuration options are now grouped into vast and caf sections, depending on whether they affect VAST itself or are handed through to the underlying actor framework CAF directly. Take a look at the bundled vast.yaml.example file for an explanation of the new layout. #1073
We refactored the index architecture to improve stability and responsiveness. This includes fixes for several shutdown issues. #863

Experimental Features

The vast get command has been added. It retrieves events from the database directly by their ids. #938

Features

VAST now supports the XDG base directory specification: The vast.conf is now found at ${XDG_CONFIG_HOME:-${HOME}/.config}/vast/vast.conf, and schema files at ${XDG_DATA_HOME:-${HOME}/.local/share}/vast/schema/. The user-specific configuration file takes precedence over the global configuration file in <sysconfdir>/vast/vast.conf. #1036
VAST now merges the contents of all used configuration files instead of using only the most user-specific file. The file specified using --config takes the highest precedence, followed by the user-specific path ${XDG_CONFIG_HOME:-${HOME}/.config}/vast/vast.conf, and the compile-time path <sysconfdir>/vast/vast.conf. #1040
VAST now ships with a new tool lsvast to display information about the contents of a VAST database directory. See lsvast --help for usage instructions. #863
The output of the status command was restructured with a strong focus on usability. The new flags --detailed and --debug add additional content to the output. #995
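
To make the lookup and precedence rules from #1036 and #1040 concrete, the following Python sketch resolves the user-specific path and merges configuration trees so that more specific files override individual values. The recursive, key-by-key merge is an assumption about how VAST combines nested sections, all function names are hypothetical, and file parsing is omitted.

```python
import os
from pathlib import Path


def user_config_path(env: dict[str, str]) -> Path:
    """Mirror ${XDG_CONFIG_HOME:-${HOME}/.config}/vast/vast.conf."""
    # Like the shell's :- operator, fall back when the variable is unset or empty.
    base = env.get("XDG_CONFIG_HOME") or os.path.join(env["HOME"], ".config")
    return Path(base) / "vast" / "vast.conf"


def merge(lower: dict, higher: dict) -> dict:
    """Recursively merge two config trees; values in `higher` win."""
    result = dict(lower)
    for key, value in higher.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = value
    return result


def effective_config(system: dict, user: dict, cli: dict) -> dict:
    """Apply precedence: <sysconfdir> < user-specific file < --config file."""
    return merge(merge(system, user), cli)
```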

Bug Fixes

Stalled sources that were unable to generate new events no longer stop import processes from shutting down under rare circumstances. #1058

2020.08.28

Breaking Changes

We now bundle a patched version of CAF, with a changed ABI. This means that if you're linking against the bundled CAF library, you also need to distribute that library so that VAST can use it at runtime. The versions are API compatible so linking against a system version of CAF is still possible and supported. #1020

Changes

The set type has been removed. Experience with the data model showed that there is no strong use case to separate sets from vectors in the core. While this may be useful in programming languages, VAST deals with immutable data where set constraints have been enforced upstream. This change requires updating existing schemas by changing set<T> to vector<T>. In the query language, the new symbol for the empty map changed from {-} to {}, as it now unambiguously identifies map instances. #1010
The vector type has been renamed to list. In an effort to streamline the type system vocabulary, we favor list over vector because it's closer to existing terminology (e.g., Apache Arrow). This change requires updating existing schemas by changing vector<T> to list<T>. #1016
The expression field parser now allows the '-' character. #999
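
Together, #1010 and #1016 mean that both set<T> and vector<T> in existing schemas become list<T>. A naive Python sketch of that textual migration; it assumes the keywords only appear as type constructors directly followed by <, so review the result before deploying it. migrate_schema is a hypothetical helper.

```python
import re

# Matches 'set' or 'vector' as whole words when they open a type argument,
# e.g. 'set<string>' or 'vector <count>', but not 'subset<...>'.
CONTAINER_RE = re.compile(r"\b(?:set|vector)\s*<")


def migrate_schema(text: str) -> str:
    """Rewrite set<T> and vector<T> to list<T>, including nested uses."""
    return CONTAINER_RE.sub("list<", text)
```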

Features

VAST now writes a PID lock file on startup to prevent multiple server processes from accessing the same persistent state. The pid.lock file resides in the vast.db directory. #1001
The default schema for Suricata has been updated to support the suricata.ftp and suricata.ftp_data event types. #1009
VAST now prints the location of the configuration file that is used. #1009

Bug Fixes

The shutdown process of the server process could potentially hang forever. VAST now uses a 2-step procedure that first attempts to terminate all components cleanly. If that fails, it will attempt a hard kill afterwards, and if that fails after another timeout, the process will call abort(3). #1005
When a continuous query in a client process terminated, the node did not clean up the corresponding server-side state. This memory leak no longer exists. #1006
The port encoding for Arrow-encoded table slices is now host-independent and always uses network-byte order. #1007
Importing JSON no longer fails for JSON fields containing null when the corresponding VAST type in the schema is a non-trivial type like vector<string>. #1009
Some file descriptors remained open when they weren't needed any more. This descriptor leak has been fixed. #1018
When running VAST under heavy load, CAF stream slot ids could wrap around after a few days and deadlock the system. As a workaround, we extended the slot id bit width to make the time until this happens unrealistically large. #1020
Incomplete reads were not handled properly, which manifested for files larger than 2GB. On macOS, writing files larger than 2GB may have failed previously. VAST now respects OS-specific constraints on the maximum block size. #1025
VAST would overwrite existing on-disk state data when encountering a partial read during startup. This state-corrupting behavior no longer exists. #1026
VAST did not terminate when a critical component failed during startup. VAST now binds the lifetime of the node to all critical components. #1028
MessagePack-encoded table slices now work correctly for nested container types. #984
A bug in the expression parser prevented the correct parsing of fields starting with either 'F' or 'T'. #999
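
Network byte order, as used by the fix in #1007, is big-endian, so the encoded bytes are the same on every host. A tiny Python illustration with the standard struct module; it shows only the byte order, not VAST's Arrow layout for ports, and the helper names are hypothetical.

```python
import struct


def encode_port(port: int) -> bytes:
    """Encode a 16-bit port in network byte order ('!' means big-endian)."""
    return struct.pack("!H", port)


def decode_port(data: bytes) -> int:
    """Decode a 16-bit port from network byte order."""
    (port,) = struct.unpack("!H", data)
    return port
```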

2020.07.28

Breaking Changes

FlatBuffers is now a required dependency for VAST. The archive and the segment store use FlatBuffers to store and version their on-disk persistent state. #972

Changes

The suricata schema file contains new type definitions for the stats, krb5, smb, and ssh events. #954 #986
VAST now recognizes /etc/vast/schema as an additional default directory for schema files. #980

Features

Starting with this release, installing VAST on any Linux distribution becomes significantly easier: a static binary is provided with each release on the GitHub releases page. #966
We open-sourced our MessagePack-based table slice implementation, which provides a compact row-oriented encoding of data. This encoding works well for binary formats (e.g., PCAP) and access patterns that involve materializing entire rows. The MessagePack table slice is the new default when Apache Arrow is unavailable. To enable parsing into MessagePack, you can pass --table-slice-type=msgpack to the import command, or set the configuration option import.table-slice-type to 'msgpack'. #975

Bug Fixes

The PCAP reader now correctly shows the amount of generated events. #954

2020.06.25

Changes

The options system.table-slice-type and system.table-slice-size have been removed, as they duplicated import.table-slice-type and import.table-slice-size respectively. #908 #951
The JSON export format now renders timestamps using strings instead of numbers in order to avoid possible loss of precision. #909
The default table slice type has been renamed to caf. It has not been the default when built with Apache Arrow support for a while now, and the new name more accurately reflects what it is doing. #948

Experimental Features

VAST now supports aging out existing data. This feature currently only concerns data in the archive. The options system.aging-frequency and system.aging-query configure a query that runs on a regular schedule to determine which events to delete. It is also possible to trigger an aging cycle manually. #929

Features

VAST now has options to limit the number of results produced by an invocation of vast explore. #882
The import json command's type restrictions are more relaxed now, and can additionally convert from JSON strings to VAST internal data types. #891
VAST now supports /etc/vast/vast.conf as an additional fallback for the configuration file. The following file locations are looked at in order: Path specified on the command line via --config=path/to/vast.conf, vast.conf in current working directory, ${INSTALL_PREFIX}/etc/vast/vast.conf, and /etc/vast/vast.conf. #898
The import command gained a new --read-timeout option that forces data to be forwarded to the importer regardless of the internal batching parameters and table slices being unfinished. This allows for reducing the latency between the import command and the node. The default timeout is 10 seconds. #916
The output format for the explore and pivot commands can now be set using the explore.format and pivot.format options respectively. Both default to JSON. #921
The meta index now uses Bloom filters for equality queries involving IP addresses. This especially accelerates queries where the user wants to know whether a certain IP address exists in the entire database. #931
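
A Bloom filter answers the question whether a partition may contain an IP address with no false negatives and a tunable false-positive rate, which makes it a good fit for skipping partitions in the meta index (#931). A minimal Python sketch; the filter sizes, the SHA-256-based double hashing, and the class name are assumptions, not VAST's implementation.

```python
import hashlib
import ipaddress


class BloomFilter:
    """A minimal Bloom filter over IP addresses (illustrative only)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, addr: str):
        # Hash the packed address once and derive k positions by double hashing.
        digest = hashlib.sha256(ipaddress.ip_address(addr).packed).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step size
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, addr: str) -> None:
        for pos in self._positions(addr):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, addr: str) -> bool:
        """False means definitely absent; True means possibly present."""
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(addr))
```

A lookup for a single address would then only load partitions whose filter reports might_contain as true.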

Bug Fixes

A use-after-free bug would sometimes crash the node while it was shutting down. #896
A bogus import process that assembled table slices with a greater number of events than expected by the node was able to lead to wrong query results. #908
The export json command now correctly unescapes its output. #910
VAST now correctly checks for control characters in inputs. #910

2020.05.28

Changes

The command line flag for disabling the accountant has been renamed to --disable-metrics to more accurately reflect its intended purpose. The internal vast.statistics event has been renamed to vast.metrics. #870
Spreading a query over multiple command line arguments in commands like explore/export/pivot/etc. has been deprecated. #878

Experimental Features

Added a new explore command to VAST that shows data records within a certain time span of the results of a query. #873 #877

Features

All input parsers now support mixed \n and \r\n line endings. #865
When importing events of a new or updated type, VAST now only requires the type to be specified once (e.g., in a schema file). For consecutive imports, the event type does not need to be specified again. A list of registered types can now be viewed using vast status under the key node.type-registry.types. #875
When importing JSON data without knowing the type of the imported events a priori, VAST now supports automatic event type deduction based on the JSON object keys in the data. VAST selects a type iff the set of fields matches a known type. The --type / -t option to the import command restricts the matching to the set of types that share the provided prefix. Omitting -t attempts to match JSON against all known types. If only a single variant of a type is matched, the import falls back to the old behavior and fills in nil for mismatched keys. #875
VAST now prints a message when it is waiting for user input to read a query from a terminal. #878
VAST now ships with a schema suitable for Sysmon import. #886
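
The selection rule from #875 can be sketched as follows: restrict the known types to those sharing the -t prefix, take the remaining type directly if exactly one is left, and otherwise pick the type whose field set equals the JSON keys. This Python version is a simplification under stated assumptions: the single-candidate fallback is one reading of the entry's last sentence, keys are compared as flat dotted names, the type names are made up, and deduce_type is hypothetical.

```python
from typing import Dict, Optional, Set


def deduce_type(obj: dict, known_types: Dict[str, Set[str]], prefix: str = "") -> Optional[str]:
    """Pick the schema type for one JSON object, or None if nothing matches."""
    eligible = {name: fields for name, fields in known_types.items() if name.startswith(prefix)}
    if len(eligible) == 1:
        # Single candidate: use it; missing fields would be filled with nil.
        return next(iter(eligible))
    keys = set(obj)
    for name, fields in eligible.items():
        if fields == keys:  # the set of fields must match exactly
            return name
    return None
```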

Bug Fixes

The parser for Zeek tsv data used to ignore attributes that were defined for the Zeek-specific types in the schema files. It has been modified to respect and prefer the specified attributes for the fields that are present in the input data. #847
Fixed a bug that caused vast import processes to produce 'default' table slices, despite having the 'arrow' type as the default. #866
Fixed a bug where setting the logger.file-verbosity in the config file would not have an effect. #866

2020.04.29

Changes

The index-specific options max-partition-size, max-resident-partitions, max-taste-partitions, and max-queries can now be specified on the command line when starting a node. #728
The default bind address has been changed from :: to localhost. #828
The option --skip-candidate-checks / -s for the count command was renamed to --estimate / -e. #843

Features

Packet drop and discard statistics are now reported to the accountant for PCAP import, and are available using the keys pcap-reader.recv, pcap-reader.drop, pcap-reader.ifdrop, pcap-reader.discard, and pcap-reader.discard-rate in the vast.statistics event. If the number of dropped packets exceeds a configurable threshold, VAST additionally warns about packet drops on the command line. #827 #844
Bash autocompletion for vast is now available via the autocomplete script located at scripts/vast-completions.bash in the VAST source tree. #833

Bug Fixes

Archive lookups are now interruptible. This change fixes an issue that caused consecutive exports to slow down the node, which improves the overall performance for larger databases considerably. #825
Fixed a crash when importing data while a continuous export was running for unrelated events. #830
Queries of the form x != 80/tcp were falsely evaluated as x != 80/? && x != ?/tcp. (The syntax in the second predicate does not yet exist; it only illustrates the bug.) Port inequality queries now correctly evaluate x != 80/? || x != ?/tcp. E.g., the result now contains values like 80/udp and 80/?, but also 8080/tcp. #834
Fixed a bug that could cause stalled input streams not to forward events to the index and archive components for the JSON, CSV, and Syslog readers, when the input stopped arriving but no EOF was sent. This is a follow-up to #750. A timeout now ensures that the readers continue when some events were already handled, but the input appears to be stalled. #835
For some queries, the index evaluated only a subset of all relevant partitions in a non-deterministic manner. Fixing a violated evaluation invariant now guarantees deterministic execution. #842
The stop command always returned immediately, regardless of whether it succeeded. It now blocks until the remote node has shut down properly or returns an error exit code upon failure. #849
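
The port fix in #834 applies De Morgan's law: not (a and b) equals (not a) or (not b), so x != 80/tcp must hold whenever either the port number or the protocol differs. A Python sketch with a hypothetical Port type, where None stands for an unknown protocol as in 80/?.

```python
from typing import NamedTuple, Optional


class Port(NamedTuple):
    """Hypothetical port value; proto=None models an unknown protocol ('?')."""

    number: int
    proto: Optional[str]


def port_not_equal(x: Port, y: Port) -> bool:
    # Correct negation of (number equal AND proto equal).
    # The buggy evaluation effectively used AND here instead of OR.
    return x.number != y.number or x.proto != y.proto
```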

2020.03.26

Changes

The VERBOSE log level has been added between INFO and DEBUG. This level is enabled at build time for all build types, making it possible to get more detailed logging output from release builds. #787
The internal statistics event type vast.account has been renamed to vast.statistics for clarity. #789
The command line options prefix for changing CAF options was changed from --caf# to --caf.. #797
The log folder vast.log/ in the current directory will not be created by default any more. Users must explicitly set the system.file-verbosity option if they wish to keep the old behavior. #803
The config option system.log-directory was deprecated and replaced by the new option system.log-file. All logs will now be written to a single file. #806

Features

The new vast import syslog command allows importing Syslog messages as defined in RFC5424. #770
The option --disable-community-id has been added to the vast import pcap command for disabling the automatic computation of Community IDs. #777
Continuous export processes can now be stopped correctly. Before this change, the node showed an error message and the exporting process exited with a non-zero exit code. #779
The short option -c for setting the configuration file has been removed. The long option --config must now be used instead. This fixed a bug that did not allow for -c to be used for continuous exports. #781
Expressions must now be parsed to the end of input. This fixes a bug that caused malformed queries to be evaluated until the parser failed. For example, the query #type == "suricata.http" && .dest_port == 80 was erroneously evaluated as #type == "suricata.http" instead. #791
The hash index has been re-enabled after it was outfitted with a new high-performance hash map implementation that increased performance to the point where it is on par with the regular index. #796
An under-the-hood change to our parser-combinator framework makes sure that we do not discard possibly invalid input data up to the end of input. This uncovered a bug in our MRT/bgpdump integrations, which have thus been disabled (for now), and will be fixed at a later point in time. #808

2020.02.27

Changes

The build system now tries to use the CAF library from the system, if one is provided. If it is not found, the CAF submodule will be used as a fallback. #740
VAST now supports (and requires) Apache Arrow >= 0.16. #751
The option --historical for export commands has been removed, as it was the default already. #754
The option --directory has been replaced by --db-directory and --log-directory, which set directories for persistent state and log files respectively. The default log file path has changed from vast.db/log to vast.log. #758
Hash indices have been disabled again due to a performance regression. #765

Features

For users of the Nix package manager, expressions have been added to generate reproducible development environments with nix-shell. #740

Bug Fixes

Continuously importing events from a Zeek process with a low rate of emitted events resulted in a long delay until the data would be included in the result set of queries. This is because the import process would buffer up to 10,000 events before sending them to the server as a batch. The algorithm has been tuned to flush its buffers if no data is available for more than 500 milliseconds. #750

2020.01.31

Changes

The import pcap command no longer takes interface names via --read,-r, but instead from a separate option named --interface,-i. This change has been made for consistency with other tools. #641
Record field names can now be entered as quoted strings in the schema and expression languages. This lifts a restriction where JSON fields with whitespaces or special characters could not be ingested. #685
Build configuration defaults have been adapted for a better user experience. Installations are now relocatable by default, which can be reverted by configuring with --without-relocatable. Additionally, new sets of defaults named --release and --debug (renamed from --dev-mode) have been added. #695
Two minor modifications were made in the parsing framework: (i) the parsers for enums and records now allow trailing separators, and (ii) the dash (-) was removed from the allowed characters of schema type names. #706
VAST is switching to a calendar-based versioning scheme starting with this release. #739

Features

When a record field has the #index=hash attribute, VAST will choose an optimized index implementation. This new index type only supports (in)equality queries and is therefore intended to be used with opaque types, such as unique identifiers or random strings. #632 #726
Added Apache Arrow as new export format. This allows users to export query results as Apache Arrow record batches for processing the results downstream, e.g., in Python or Spark. #633
The import pcap command now takes an optional snapshot length via --snaplen. If the snapshot length is set to snaplen, and snaplen is less than the size of a packet that is captured, only the first snaplen bytes of that packet will be captured and provided as packet data. #642
An experimental new Python module enables querying VAST and processing results as pyarrow tables. #685
The long option --config, which sets an explicit path to the VAST configuration file, now also has the short option -c. #689
On FreeBSD, a VAST installation now includes an rc.d script that simplifies spinning up a VAST node. CMake installs the script at PREFIX/etc/rc.d/vast. #693

Bug Fixes

In some cases it was possible that a source would connect to a node before it was fully initialized, resulting in a hanging vast import process. #647
PCAP ingestion failed for traces containing VLAN tags. VAST now strips IEEE 802.1Q headers instead of skipping VLAN-tagged packets. #650
Importing events over UDP with vast import <format> --listen :<port>/udp failed to register the accountant component. This caused an unexpected message warning to be printed on startup and resulted in losing import statistics. VAST now correctly registers the accountant. #655
The import process did not print statistics when importing events over UDP. Additionally, warnings about dropped UDP packets are no longer shown per packet, but rather periodically reported in a readable format. #662
A bug in the quoted string parser caused a parsing failure if an escape character occurred in the last position. #685
A race condition in the index logic was able to lead to incomplete or empty result sets for vast export. #703
The example configuration file contained an invalid section vast. This has been changed to the correct name system. #705

0.2 - 2019.10.30

Changes

The query language has been extended to support expressions of the form X == /pattern/, where X is a compatible LHS extractor. Previously, patterns only supported the match operator ~. The two operators have the same semantics when one operand is a pattern.
CAF and Broker are no longer required to be installed prior to building VAST. These dependencies are now tracked as git submodules to ensure version compatibility. Specifying a custom build is still possible via the CMake variables CAF_ROOT_DIR and BROKER_ROOT_DIR.
When exporting data in pcap format, it is no longer necessary to manually restrict the query by adding the predicate #type == "pcap.packet" to the expression. This now happens automatically because only this type contains the raw packet data.
When defining schema attributes in key-value pair form, the value no longer requires double-quotes. For example, #foo=x is now the same as #foo="x". The form without double-quotes consumes the input until the next space and does not support escaping. In case an attribute value contains whitespace, double-quotes must be provided, e.g., #foo="x y z".
The PCAP packet type gained the additional field community_id that contains the Community ID flow hash. This identifier facilitates pivoting to a specific flow from data sources with connection-level information, such as Zeek or Suricata logs.
Log files generally have some notion of timestamp for recorded events. To make the query language more intuitive, the syntax for querying time points thus changed from #time to #timestamp. For example, #time > 2019-07-02+12:00:00 now reads #timestamp > 2019-07-02+12:00:00.
Default schema definitions for certain import formats changed from hard-coded to runtime-evaluated. The default location of the schema definition files is $(dirname vast-executable)/../share/vast/schema. Currently this is used for the Suricata JSON log reader.
The default directory name for persistent state changed from vast to vast.db. This makes it possible to run ./vast in the current directory without having to specify a different state directory on the command line.
Nested types are now accessed with the . syntax, giving VAST a unified syntax for selecting nested types and fields. For example, what used to be zeek::http is now just zeek.http.
The short form of the (internal) --node option for the import and export commands has been renamed from -n to -N, to allow using -n for --max-events.
To make the export option to limit the number of events to be exported more idiomatic, it has been renamed from --events,e to --max-events,n. Now vast export -n 42 generates at most 42 events.

Features

The default schema for Suricata has been updated to support the new suricata.smtp event type in Suricata 5.
The export null command retrieves data, but never prints anything. Its main purpose is to make benchmarking VAST easier and faster.
The new pivot command retrieves data of a related type. It inspects each event in a query result to find an event of the requested type. If a common field exists in the schema definition of the requested type, VAST will dynamically create a new query to fetch the contextual data according to the type relationship. For example, if two records T and U share the same field x, and the user requests to pivot via T.x == 42, then VAST will fetch all data for U.x == 42. An example use case would be to pivot from a Zeek or Suricata log entry to the corresponding PCAP packets. VAST uses the field community_id to pivot between the logs and the packets. Pivoting is currently implemented for Suricata, Zeek (with community ID computation enabled), and PCAP.
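The pivot logic described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not VAST's code: the function pivot_queries is hypothetical, events are plain dictionaries, and it emits one equality query per distinct value of every field shared by the source and target types (VAST itself uses community_id when pivoting between logs and packets).

```python
def pivot_queries(source_fields, target_type, target_fields, events):
    """Build hypothetical follow-up queries for a pivot to target_type.

    For every field shared by both types, emit '<target>.<field> == <value>'
    once per distinct value found in the source events.
    """
    shared = sorted(set(source_fields) & set(target_fields))
    queries, seen = [], set()
    for event in events:
        for field in shared:
            value = event.get(field)
            if value is None:
                continue
            # Quote strings; render other values verbatim (simplification).
            literal = f'"{value}"' if isinstance(value, str) else str(value)
            query = f"{target_type}.{field} == {literal}"
            if query not in seen:
                seen.add(query)
                queries.append(query)
    return queries
```

For the T/U example above, pivoting from events with x == 42 yields the single query U.x == 42, no matter how many source events carry that value.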
The new infer command performs schema inference of input data. The command can deduce the input format and creates a schema definition that is suitable to use with the supplied data. Supported input types include Zeek TSV and JSONLD.
The newly added count command allows counting hits for a query without exporting data.
Commands now support a --documentation option, which returns Markdown-formatted documentation text.
A new schema for Argus CSV output has been added. It parses the output of ra(1), which produces CSV output when invoked with -L 0 -c ,.
The schema language now supports comments. A double-slash (//) begins a comment. Comments last until the end of the line, i.e., until a newline character (\n).
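As a rough illustration of the comment rule, the following Python sketch removes everything from // to the end of each line before the schema would be parsed. The function strip_comments is hypothetical, and it is deliberately naive: it does not recognize // inside quoted attribute values, which a real tokenizer would have to handle.

```python
def strip_comments(schema_text):
    """Remove '//' comments up to the end of each line (naive sketch)."""
    lines = []
    for line in schema_text.split("\n"):
        idx = line.find("//")
        # Keep only the text before the comment marker, if any.
        lines.append(line[:idx] if idx != -1 else line)
    return "\n".join(lines)
```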
The import command now supports CSV formatted data. The type for each column is automatically derived by matching the column names from the CSV header in the input with the available types from the schema definitions.
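The header-to-type matching can be pictured with this Python sketch. It assumes a schema given as a mapping from record type name to {field name: type name} and picks the first record type that contains every header column. The function infer_column_types and the example type example.flow are hypothetical; VAST's actual matching rules may differ.

```python
import csv
import io


def infer_column_types(csv_text, schema):
    """Return (type name, column types) for the first record type that
    contains all CSV header columns, or None if no type matches."""
    header = next(csv.reader(io.StringIO(csv_text)))
    for type_name, fields in schema.items():
        if all(column in fields for column in header):
            return type_name, [fields[column] for column in header]
    return None


# Hypothetical schema for illustration only.
SCHEMA = {"example.flow": {"ts": "time", "src": "addr", "dst": "addr"}}
```

Note that column order in the input does not need to match field order in the schema; each column's type is looked up by name.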
Configuring how much status information gets printed to STDERR previously required obscure config settings. From now on, users can simply use --verbosity=<level>,-v <level>, where <level> is one of quiet, error, warn, info, debug, or trace. However, debug and trace are only available for debug builds (otherwise they fall back to log level info).
The query expression language now supports data predicates, which are a shorthand for a type extractor in combination with an equality operator. For example, the data predicate 6.6.6.6 is the same as :addr == 6.6.6.6.
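The shorthand can be thought of as a rewrite from a bare literal to a type-extractor equality. The Python sketch below handles only addresses, subnets, and double-quoted strings; the function expand_data_predicate is hypothetical, and the choice of which literal forms map to which VAST types is an assumption for illustration.

```python
import ipaddress


def expand_data_predicate(literal):
    """Rewrite a bare literal into ':type == literal', or None if unrecognized."""
    try:
        ipaddress.ip_address(literal)
        return f":addr == {literal}"
    except ValueError:
        pass
    try:
        ipaddress.ip_network(literal, strict=False)
        return f":subnet == {literal}"
    except ValueError:
        pass
    if len(literal) >= 2 and literal[0] == literal[-1] == '"':
        return f":string == {literal}"
    return None  # other literal types are out of scope for this sketch
```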
The index object in the output of vast status has a new field statistics that provides a high-level summary of the indexed data. Currently, it contains a nested layouts object with per-layout statistics about the number of indexed events.
The accountant object in the output from vast status has a new field log-file that points to the filesystem path of the accountant log file.
Data extractors in the query language can now contain a type prefix. This makes it easier to extract data from a specific type. For example, a query to look for Zeek conn log entries with responder IP address 1.2.3.4 had to be written with two terms, #type == zeek.conn && id.resp_h == 1.2.3.4, because the nested id record can occur in other types as well. Such queries can now be written more tersely as zeek.conn.id.resp_h == 1.2.3.4.
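One way to picture the resolution of a type-prefixed extractor is the Python sketch below: if the extractor starts with a known type name, the remainder must name a field of that type; otherwise it is matched as a field suffix across all types. The function resolve and the example layouts are hypothetical and only mirror the zeek.conn example above.

```python
def resolve(extractor, layouts):
    """Resolve a dotted extractor to (type, field) pairs.

    layouts maps a type name to its flattened field names.
    """
    matches = []
    # First, try to interpret the extractor as '<type>.<field>'.
    for layout, fields in layouts.items():
        prefix = layout + "."
        remainder = extractor[len(prefix):]
        if extractor.startswith(prefix) and remainder in fields:
            matches.append((layout, remainder))
    if matches:
        return matches
    # Otherwise, match it as a field suffix in every type.
    for layout, fields in layouts.items():
        for field in fields:
            if field == extractor or field.endswith("." + extractor):
                matches.append((layout, field))
    return matches


LAYOUTS = {
    "zeek.conn": ["ts", "id.orig_h", "id.resp_h"],
    "zeek.http": ["ts", "id.orig_h", "id.resp_h", "method"],
}
```

Without the prefix, id.resp_h resolves in both zeek.conn and zeek.http, which is why the older query needed the extra #type term.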
VAST gained support for importing Suricata JSON logs. The import command has a new suricata format that can ingest EVE JSON output.
The data parser now supports count and integer values according to the International System of Units (SI). For example, 1k is equal to 1000 and 1Ki is equal to 1024.
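The suffix arithmetic works as in this Python sketch, where decimal suffixes multiply by powers of 1000 and binary (i-suffixed) ones by powers of 1024. The function parse_si_count is hypothetical, it only handles unsigned counts, and the suffix set beyond k and Ki is an assumption rather than a list taken from VAST.

```python
import re

# Assumption: suffixes beyond k and Ki are illustrative, not VAST's exact set.
_FACTORS = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18,
}
_PATTERN = re.compile(r"(\d+)(" + "|".join(_FACTORS) + r")?")


def parse_si_count(text):
    """Parse an unsigned count with an optional SI or binary suffix."""
    match = _PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a count with an optional SI suffix: {text!r}")
    number, suffix = match.groups()
    return int(number) * (_FACTORS[suffix] if suffix else 1)
```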
VAST can now ingest JSON data. The import command gained the json format, which allows for parsing line-delimited JSON (LDJSON) according to a user-selected type with --type. The --schema or --schema-file options can be used in conjunction to supply custom types. The JSON objects in the input must match the selected type, that is, the keys of the JSON object must be equal to the record field names and the object values must be convertible to the record field types.
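The matching requirement for JSON objects can be sketched in Python as follows. Python converters such as int, float, and str stand in for VAST's record field types, which is a simplification; the function parse_ldjson is hypothetical, and skipping blank lines is an assumption of this sketch.

```python
import json


def parse_ldjson(text, record):
    """Parse line-delimited JSON against a record {field name: converter}.

    Each object's keys must equal the record's field names, and each value
    must be convertible with the field's converter.
    """
    events = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored
        obj = json.loads(line)
        if not isinstance(obj, dict) or set(obj) != set(record):
            raise ValueError(f"line {number}: keys do not match record fields")
        events.append({name: record[name](obj[name]) for name in record})
    return events


RECORD = {"ts": float, "host": str, "n": int}
```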
For symmetry to the export command, the import command gained the --max-events,n option to limit the number of events that will be imported.
The import command gained the --listen,l option to receive input from the network. Currently only UDP is supported. Previously, one had to use a clever netcat pipe with enough receive buffer to achieve the same effect, e.g., nc -I 1500 -p 4200 | vast import pcap. Now this pipe degenerates to vast import pcap -l.
The new --disable-accounting option shuts off periodic gathering of system telemetry in the accountant actor. This also disables output in the accounting.log.

Bug Fixes

The user environment's LDFLAGS were erroneously passed to ar. Now the user environment's ARFLAGS are used instead.
Exporting data with export -n <count> crashed when count was a multiple of the table slice size. The command now works as expected.
Queries of the form #type ~ /pattern/ used to be rejected erroneously. The validation code has been corrected and such queries are now working as expected.
When specifying enum types in the schema, ingestion failed because there was no implementation for such types. It is now possible to define enumerations in the schema as expected and query them as strings.
Queries with the less-than (<) or greater-than (>) operators produced off-by-one results for durations when the query contained a finer resolution than the index. The operators now work as expected.
Timestamps were always printed in millisecond resolution, which led to loss of precision when the internal representation had a higher resolution. Timestamps are now rendered up to nanosecond resolution, the maximum resolution supported.
All query expressions in the form #type != X were falsely evaluated as #type == X and consequently produced wrong results. These expressions now behave as expected.
Parsers for reading log input that relied on recursive rules leaked memory by creating cyclic references. All recursive parsers have been updated to break such cycles and no longer leak memory.
The Zeek reader failed upon encountering logs with a double column, as it occurs in capture_loss.log. The Zeek parser generator has been fixed to handle such types correctly.
Some queries returned duplicate events because the archive did not filter the result set properly. This no longer occurs after fixing the table slice filtering logic.
The map data parser did not parse negative values correctly. It was not possible to parse strings of the form "{-42 -> T}" because the parser attempted to parse the token for the empty map "{-}" instead.
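The fix amounts to recognizing the empty-map token {-} only when it is the complete input, rather than as soon as {- appears. The Python sketch below shows that idea; the function parse_map is hypothetical, keys are assumed to be integers, and values are kept as raw strings for simplicity.

```python
def parse_map(text):
    """Parse '{k -> v, ...}' with integer keys; '{-}' is the empty map."""
    text = text.strip()
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError("map must be enclosed in braces")
    inner = text[1:-1].strip()
    # Only a lone '-' between the braces denotes the empty map, so a
    # leading minus sign of a negative key is not mistaken for it.
    if inner == "-":
        return {}
    result = {}
    for entry in inner.split(","):
        key, sep, value = entry.partition("->")
        if not sep:
            raise ValueError(f"missing '->' in entry {entry!r}")
        result[int(key.strip())] = value.strip()
    return result
```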
The CSV printer of the export command used to insert 2 superfluous fields when formatting an event: The internal event ID and a deprecated internal timestamp value. Both fields have been removed from the output, bringing it into line with the other output formats.
When a node terminated during an import, the client process previously remained unaffected and kept processing input. Now the client terminates when a remote node terminates.
Evaluation of predicates with negations returned incorrect results. For example, the expression :addr !in 10.0.0.0/8 created a disjunction over all fields to which :addr resolved, without properly applying De Morgan's laws. The same bug also existed for key extractors. De Morgan's laws are now applied properly for the operations !in and !~.
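The corrected expansion can be sketched in Python: a positive operator must hold for at least one resolved field (a disjunction), whereas a negated operator negates that whole disjunction, which by De Morgan's laws becomes a conjunction of per-field negations. The function expand_type_extractor and its string output format are hypothetical and only illustrate the logic.

```python
def expand_type_extractor(op, fields, value):
    """Expand ':type op value' over all fields the extractor resolves to.

    not (f1 in S or f2 in S)  ==  (f1 !in S) and (f2 !in S)
    """
    if not fields:
        raise ValueError("type extractor resolved to no fields")
    # Negated operators such as !in and !~ combine with a conjunction.
    joiner = " && " if op.startswith("!") else " || "
    return joiner.join(f"{field} {op} {value}" for field in fields)
```

For an event with addresses in two fields, the negated query thus only matches when neither address lies in 10.0.0.0/8, rather than when at least one of them does not.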

0.1 - 2019.02.28

This is the first official release.
