8. Hardware guide

If you are building or upgrading a system for Kilosort2, the only important components are the GPU and the SSD. However, if you plan to use this computer for other computational tasks, it might be worth investing a little more in the CPU. See the second part below for such more general recommendations.

We start with a list of recommendations for a standard Neuropixels recording (384 channels, <3 hours), and then we continue with a list of modifications depending on how your recordings differ from this standard one.

Recommendations for standard Neuropixels recordings

GPU: only Nvidia GPUs are supported. Of the currently available GTX or RTX gpus, anything starting at GTX 1660 should work fine, but the speed will typically be proportional to the cost, except for the 2080Ti, which is not 2x faster than 2080 despite being 2x more expensive. The sweet spot is probably the 2070. Do not consider the Titan RTX or Tesla GPUs: they are very expensive for no performance gains.
SSD: check for read/write speed. Any good SATA SSD is around 500MB/s, and PCI-based SSD are usually a lot better. You shouldn't need anything faster than 500MB/s, but you might want to have generous capacity for the purpose of batch processing of many datasets. A typical workflow is to copy datasets you want processed to the SSD, then run Kilosort2 on them (which makes an additional temporary copy of the data on this SSD), then run Phy on the local copy of the data with the Kilosort2 results. Once you are happy with the sorting, you would typically free up the local copy of the data to make space for the next round of spike sorting.
rest of the system: 16GB of RAM are recommended and a 4-core CPU. There is little performance to be gained by increasing these. However, if you want to use the same system for other computational tasks, it would be a good idea to increase to a 6- or 8-core CPU, with 32 GB of RAM (see below for more details). A 650W power supply will be sufficient even for this slightly beefed up system with an RTX 2080Ti.

Modifications depending on your recording type

Same number of channels, but longer recordings, like 6h or more. Right now, this situation requires more RAM, like 32 or 64 GB. Kilosort2 splits the data into small batches (default = 2s) and sorts each separately. However, a few steps (drift sorting, final extraction) require keeping some information from each batch in RAM, before it is used or saved at the end of the run through the entire data. This situation is also likely to arise with fewer channels, and correspondingly longer recordings. We will aim to remove this limitation in future development.
More channels. Currently Kilosort2 is limited to 1024 channels and will crash with more. This is due to low-level shared memory management on the GPU. You should not need more computer resources for 1024 channels, but we have not tested this thoroughly. It should not be too difficult to remove the channel count limitation in future versions.
Very few channels. Currently Kilosort2 will fail for 1 channel, and there are reports of failures for 4 channels. We have been able to run it in the latter case without problems. You will still need a GPU for this, and we do not plan to support a CPU version in the future.

Let us know if you have other use cases, and we can try to add them to the list.

Recommendations for general numerical computation tasks

There are three main factors (we know of) when choosing a good CPU for numerical computing. The considerations below are likely to stay true for future Intel CPUs, but may change for AMD CPUs, which have been evolving rapidly with the new lines of Ryzen/Threadripper/Epyc processors.

The Intel Math Kernel Library (MKL). These are optimized computational routines which run almost everything under the hood, regardless of whether you are using Matlab, Python, Julia etc, if you have an Intel CPU. If you don't, you might still be running the Intel MKL, because it also tends to be the best numerical library for AMD processors. However, it just doesn't perform as well as it could for obvious reasons...
The AVX-512 instruction set. These are hardware features that make vectorized computations much faster, which are exclusive to Intel CPUs (for now, AMD still uses AVX2). They are known as SIMD instructions (single instruction multiple data), which is the same principle at work in GPU processors. This makes even the Threadripper 32-core (AMD), slower at numerical compute than an Intel CPU with 14 cores (see here). The same applies for the server version of the AMD processors (Epyc). AVX512 is not found on Intel "consumer" CPUs, like 9900K.
Memory bandwidth: dual vs quad channel. All "consumer" CPUs from Intel and AMD use dual channel RAM. Quad-channel memory almost doubles memory bandwidth, which tends to be a limiting factor for many numerical applications. Almost all servers have quad (or even 6-channel) RAM, but these tend to be more expensive, both for the motherboards and the CPUs themselves. In addition, Ryzen/Threadripper/Epyc series CPUs from AMD have a tricky compromise, where not all CPUs have direct access to the RAM, which makes their effective bandwidth limited by the communication bandwidth between CPUs, which tends to be low because these are made from different dies.

Taking all three factors in consideration, it would seem the only CPUs which have all three features are Intel server CPUs. However, there is one other CPU line that has them all, referred to as the "Intel enthusiast consumer" line (usually appended with -X, like Skylake-X). These have: AVX512 instructions, quad channel memory and good support for Intel MKL. The cheapest processor in these series typically is about ~50-100% faster in standard numerical compute benchmarks than a similarly priced, top of the line processor from their main "consumer" series. See here for example. What is not evident in those benchmarks is performance on unoptimized code, which most of us are likely to write, especially in Python, Matlab or Julia. That's where relying on MKL libraries makes all the difference, as you can see here.

For all these reasons, my current recommendation is an Intel 9800X, which has a list price of $589. Get the cheapest X299 motherboard to go with that, and make sure your RAM has 4 memory sticks, even if the total RAM is low, otherwise quad channel RAM won't be activated. This configuration will incur an extra ~$200 compared to a configuration using the top of the line consumer Intel CPU (9900K), and much cheaper than a server configuration. As always, make sure you have a dedicated SSD for the operating system, and another SSD for buffering big datasets (i.e. what Kilosort does, but can also apply more generally). The cheapest X299 motherboards will only have one on-board PCIe SSD slot, but SATA SSDs are just as good for Kilosort(2). CPU fans will typically be rated for CPU power: get the cheapest one rated for ~165mW, should be in the $30-40 range. Go with a utilitarian case (i.e. lots of built-in fans), in the $50 range, or get a showy $100 case. A (semi-) modular PSU rated for ~650mW is sufficient if it's from a good named brand (i.e. Corsair, Rosewill, EVGA etc), for ~$70. If you want large capacity HDD, remember that they can fail, and get two smaller ones rather than one big one.

Finally, try Linux! You might get extra performance, but that depends heavily on your programming patterns (i.e. language, dataset size etc). Windows is a safe bet too, but Mac often requires modifications/special libraries for a lot of packages, including Kilosort2.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

8. Hardware guide

Recommendations for standard Neuropixels recordings

Modifications depending on your recording type

Recommendations for general numerical computation tasks

Clone this wiki locally