This is a cross-platform .NET Standard wrapper for Facebook's fastText library. The wrapper comes with bundled precompiled native binaries for all three platforms: Windows, Linux and macOS.
Just add it to your project and start using it! No additional setup is required: the library will unpack and call the appropriate native binary depending on the target platform.
Of course not! It's just complete :) There are no major updates to fastText itself, and most bugs in this repository have been fixed. All features should work, and if something doesn't — just ping me with an issue and I will try to get back to you.
Library API closely follows fastText command-line interface, so you can jump right in.
The simplest use case is to train a supervised model with default parameters. We create a FastTextWrapper and call Supervised():
var fastText = new FastTextWrapper();
fastText.Supervised("cooking.train.txt", "cooking");
Note the arguments:
- We specify an input file with one labeled example per line. Here we use the Stack Overflow cooking dataset from Facebook:
https://dl.fbaipublicfiles.com/fasttext/data/cooking.stackexchange.tar.gz. You can find the extracted files split into training and validation sets in the UnitTests directory of this repository.
- Your model will be saved to cooking.bin, and cooking.vec with pretrained vectors will be placed in the same directory.
- Here we use the Supervised() overload with 2 arguments. This means that training will be done with default parameters. It's a good starting point and is the same as calling fastText this way:
./fasttext supervised -input cooking.train.txt -output cooking
Call LoadModel() and specify the path to the .bin model file:
var fastText = new FastTextWrapper();
fastText.LoadModel("model.bin");
To use pretrained vectors for your supervised model, create an instance of SupervisedArgs and customize it:
❗ Important ❗ It doesn't say this anywhere in the original documentation, but you must use pretrained vectors in text format (.vec file extension), not in binary format. If you try to use binary vectors, you will get an error about your vectors having dimension 0.
var fastText = new FastTextWrapper();
var args = new SupervisedArgs
{
PretrainedVectors = "cooking.unsup.300.vec",
dim = 300
};
fastText.Supervised("cooking.train.txt", "cooking", args);
Here we get default training arguments, supply a path to the pretrained vectors file and adjust the vector dimension accordingly.
❗ Important ❗ Be sure to always check the dimension of your pretrained vectors! Many vectors on the internet have dimension 300, but the default dimension for fastText supervised model training is 100.
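Since the .vec text format starts with a header line containing the number of words and the vector dimension, you can check the dimension before training. A minimal standalone sketch (the helper name and the fake file are made up for illustration):

```csharp
using System;
using System.IO;
using System.Linq;

class VecHeaderCheck
{
    // Returns the vector dimension declared in a .vec file header.
    // The text format starts with a line: "<numberOfWords> <dimension>".
    static int GetVecDimension(string path)
    {
        using (var reader = new StreamReader(path))
        {
            var header = reader.ReadLine();
            var parts = header.Split(' ');
            return int.Parse(parts[1]);
        }
    }

    static void Main()
    {
        // Create a tiny fake .vec file just for this demonstration.
        var path = Path.GetTempFileName();
        File.WriteAllLines(path, new[]
        {
            "2 300", // 2 words, dimension 300
            "pizza " + string.Join(" ", Enumerable.Repeat("0.1", 300)),
            "pasta " + string.Join(" ", Enumerable.Repeat("0.2", 300))
        });

        Console.WriteLine(GetVecDimension(path)); // prints 300
    }
}
```

If the reported dimension doesn't match the `dim` you set in SupervisedArgs, training will fail or produce a dimension mismatch error.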
Now you can easily test a supervised model against a validation set. You can specify different values for k and threshold as well.
var result = fastText.Test("cooking.valid.txt");
You will get an instance of TestResult where you can find aggregated or per-label metrics:
Console.WriteLine($"Results:\n\tPrecision: {result.GlobalMetrics.GetPrecision()}" +
$"\n\tRecall: {result.GlobalMetrics.GetRecall()}" +
$"\n\tF1: {result.GlobalMetrics.GetF1()}");
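Testing with the default settings evaluates precision and recall at the single top prediction. As a sketch of passing k and threshold explicitly (the exact parameter names and order are an assumption — check the Test() overloads in your version):

```csharp
// Evaluate at top-5 predictions with a minimum probability threshold.
// These extra parameters mirror fastText's -k and -t CLI options;
// their names and order here are illustrative.
var resultAt5 = fastText.Test("cooking.valid.txt", 5, 0.1f);

Console.WriteLine($"P@5: {resultAt5.GlobalMetrics.GetPrecision()}, " +
                  $"R@5: {resultAt5.GlobalMetrics.GetRecall()}");
```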
You can even get a precision-recall curve (aggregated or per-label)! Here is an example of exporting an SVG plot with the cross-platform OxyPlot library:
var result = fastText.Test("cooking.valid.txt");
var curve = result.GetPrecisionRecallCurve();
var series = new LineSeries {StrokeThickness = 1};
series.Points.AddRange(curve.Select(x => new DataPoint(x.recall, x.precision)).OrderBy(x => x.X));
var plotModel = new PlotModel
{
Series = { series },
Axes =
{
new LinearAxis {Position = AxisPosition.Bottom, Title = "Recall"},
new LinearAxis {Position = AxisPosition.Left, Title = "Precision"}
}
};
using (var stream = new FileStream("precision-recall.svg", FileMode.Create, FileAccess.Write))
{
SvgExporter.Export(plotModel, stream, 600, 600, false);
}
You can train a new supervised model and quantize it immediately by replacing SupervisedArgs with QuantizedSupervisedArgs:
var fastText = new FastTextWrapper();
fastText.Supervised("cooking.train.txt", "cooking", new QuantizedSupervisedArgs());
You can also load an existing model and quantize it:
var fastText = new FastTextWrapper();
fastText.LoadModel("model.bin");
fastText.Quantize();
Use the Unsupervised() method, specifying the model type: Skipgram or Cbow:
var fastText = new FastTextWrapper();
fastText.Unsupervised(UnsupervisedModel.SkipGram, "cooking.train.nolabels.txt", "cooking");
You can use an optional UnsupervisedArgs argument to customize training.
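A minimal sketch of such customization (the field values are illustrative, and the field names are an assumption based on the fastText CLI options that the args classes mirror):

```csharp
// Customize unsupervised training; fields follow fastText CLI option names.
var unsupArgs = new UnsupervisedArgs
{
    dim = 100,  // vector dimension
    epoch = 10, // number of passes over the training data
    minn = 3,   // min length of character n-grams
    maxn = 6    // max length of character n-grams
};
fastText.Unsupervised(UnsupervisedModel.SkipGram, "cooking.train.nolabels.txt", "cooking", unsupArgs);
```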
You can use fastText autotune to do an automatic hyperparameter search.
Refer to https://github.com/facebookresearch/fastText/blob/master/docs/autotune.md for complete parameter reference.
Use AutotuneArgs to control tuning:
var fastText = new FastTextWrapper();
var autotuneArgs = new AutotuneArgs
{
Duration = 30, // in seconds
Metric = "precisionAtRecall:30", // supports custom metrics
Predictions = 2, // Supports @k predictions
ModelSize = "10M", // Set this to train a quantized model and do an
// additional quantization hyperparameter search. Requires QuantizedSupervisedArgs.
ValidationFile = "cooking.valid.txt" // REQUIRED: path to a validation file
};
fastText.Supervised("cooking.train.txt", "cooking", new QuantizedSupervisedArgs(), autotuneArgs);
You can get progress callbacks from the native library. To do so, add a handler to (Un)SupervisedArgs.TrainProgressCallback for simple training, or to AutotuneArgs.AutotuneProgressCallback for hyperparameter tuning.
See the ConsoleTest project for an example of using training callbacks with the ShellProgressBar library:
using (var pBar = new ProgressBar(100, "Training"))
{
var ftArgs = new SupervisedArgs
{
// ... Other args
verbose = 0,
TrainProgressCallback = (progress, loss, wst, lr, eta) =>
{
pBar.Tick((int)Math.Ceiling(progress * 100), $"Loss: {loss}, words/thread/sec: {wst}, LR: {lr}, ETA: {eta}");
}
};
fastText.Supervised("cooking.train.txt", outPath, ftArgs);
}
The native fastText library reports training progress to stderr by default. You can turn off this output by setting (Un)SupervisedArgs.verbose = 0 for simple training and AutotuneArgs.Verbose = 0 for hyperparameter tuning.
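For example, to silence a supervised training run:

```csharp
// Suppress the native library's stderr progress output.
var quietArgs = new SupervisedArgs
{
    verbose = 0
};
fastText.Supervised("cooking.train.txt", "cooking", quietArgs);
```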
FastTextWrapper can produce a small amount of logs, mostly concerning native library management. You can turn logging on by providing an instance of Microsoft.Extensions.Logging.ILoggerFactory. In this example we use Serilog with a console sink. You can also inject your standard ILoggerFactory through .NET Core DI.
// Add the following Nuget packages to your project:
// * Serilog.Sinks.Console
// * Serilog.Extensions.Logging
Log.Logger = new LoggerConfiguration()
.MinimumLevel.Debug()
.WriteTo.Console(theme: ConsoleTheme.None)
.CreateLogger();
var fastText = new FastTextWrapper(loggerFactory: new SerilogLoggerFactory());
In version 1.1 I've added much better native error handling. Now, in case of most native errors you will get a nice NativeLibraryException which you can inspect for a detailed error description.
Since this wrapper uses native C++ binaries under the hood, you will need to have the Visual C++ Runtime, version 140, installed when running under Windows. Visit the MS Downloads page (https://support.microsoft.com/en-us/help/2977003/the-latest-supported-visual-c-downloads) and select the appropriate redistributable.
If you are interested in using FastText with C-style API, here is my fork of the official library: https://github.com/olegtarasov/fastText.
- Updated fastText binaries with latest improvements from the Facebook repo.
- Native libraries are now explicitly included in target project and copied to output directory. Hopefully, this solves a couple of problems with the previous approach of dynamically extracting libraries from resources.
- Fixed progress callbacks for unsupervised model training.
- Added progress callbacks for model training and autotuning.
- Added supervised model quantization with the Quantize method.
- Stable version released! 🎉
- Merged #20 with new GetWordVector method.
- Added model autotuning with quantization support.
- Fixed a horrible bug with bool marshalling.
Version 1.2.0 introduces a few breaking changes to the library API. If you are not ready to migrate, use v. 1.1.2.
- ❗️ Breaking change: Removed both deprecated Train() methods.
- ❗️ Breaking change: Removed deprecated SupervisedArgs class.
- ❗️ Breaking change: Removed FastTextArgs.SupervisedDefaults() in favor of new SupervisedArgs with default constructor.
- ❗️ Breaking change: FastTextArgs class can't be constructed directly; use the new SupervisedArgs and UnsupervisedArgs classes.
- Added an Unsupervised() method to train Skipgram or Cbow models.
- Fixed a horrible bug with bool marshalling on the 1.1.* branch.
- Added new Supervised() method as part of streamlining the API.
- Added new Test() method for testing a supervised model.
- Deprecated both Train() methods. They will be removed in v. 1.2.0.
- Fixed a horrible bug with bool marshalling on the 1.0.* branch.
- Instead of the old Train() methods, use the Supervised() and Unsupervised() methods.
- Instead of FastTextArgs.SupervisedDefaults(), use SupervisedArgs or the Supervised() overload with 2 arguments.