Merge pull request #1 from Lombiq/issue/HAST-155
HAST-155: Vitis communication library
Piedone authored Sep 10, 2020
2 parents 7a9851d + c3aedb1 commit 31e5bd1
Showing 135 changed files with 3,029 additions and 1,090 deletions.
13 changes: 13 additions & 0 deletions Docs/Benchmarks.md
@@ -7,6 +7,7 @@ Here are some basic performance benchmarks on how Hastlayer-accelerated code com

## Notes on the hardware used

- "Vitis": [Xilinx Vitis Unified Software Platform](https://www.xilinx.com/products/design-tools/vitis/vitis-platform.html) cards were used (e.g. [Alveo U280 Data Center Accelerator Card](https://www.xilinx.com/products/boards-and-kits/alveo/u280.html)).
- "Catapult": [Microsoft Project Catapult](https://www.microsoft.com/en-us/research/project/project-catapult/) servers used via the [Project Catapult Academic Program](https://www.microsoft.com/en-us/research/academic-program/project-catapult-academic-program/). These contain the following hardware:
    - FPGA: Mt Granite card with an Altera Stratix V 5SGSMD5H2F35 FPGA and two channels of 4 GB DDR3 RAM, connected to the host via PCIe Gen3 x8. Main clock is 150 MHz, power consumption is at most 29 W (source: "[A Cloud-Scale Acceleration Architecture](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/10/Cloud-Scale-Acceleration-Architecture.pdf)").
    - Host PC: 2 x Intel Xeon E5-2450 CPUs with 16 physical, 32 logical cores in total, with a base clock of 2.1 GHz. Power consumption is around 95 W under load (based on [the processor's TDP](https://ark.intel.com/content/www/us/en/ark/products/64611/intel-xeon-processor-e5-2450-20m-cache-2-10-ghz-8-00-gt-s-intel-qpi.html); this is just a rough number and power draw is likely larger when the CPU increases its clock speed under load).
@@ -27,6 +28,18 @@ Here you can find some measurements of execution times of various algorithms on
- FPGA resource utilization figures are based on the "main" resource's utilization with all other resource types assumed to be below 100%. For Xilinx FPGAs the main resource type is LUT, for Intel (Altera) ones ALM.
- For FPGA measurements "total" means the total execution time, including the communication latency of the FPGA; since this varies because of the host PC's load the lowest achieved number is used. "Net" means just the execution of the algorithm itself on the FPGA, not including the time it took to send data to and receive from the device; FPGA execution time is deterministic and doesn't vary significantly. With faster communication channels "total" can be closer to "net". If the input and output data is small then the two measurements will practically be the same.

### Vitis

Comparing the performance of a Vitis platform FPGA (Xilinx Alveo U280) to the host PC's performance on a [Nimbix](https://www.nimbix.net/alveo) "Xilinx Vitis Unified Software Platform 2019.2" instance. Only a single CPU is assumed to be running under 100% load for the power usage figures for the sake of simplicity.


| Algorithm | Speed advantage | Power advantage | Parallelism | CPU | CPU power | FPGA utilization | Net FPGA | Total FPGA | FPGA power |
|:----------------------|:---------------:|:---------------:|:--------------:|:------:|:---------:|:----------------:|:--------:|:----------:|:----------:|
| ImageContrastModifier | 568% | ???% | 25 | 568 ms | ?? Ws | ??% | 28 ms | 85 ms | ?? Ws |
| ImageContrastModifier | 620% | ???% | 150<sup>1</sup>| 568 ms | ?? Ws | ??% | 24 ms | 79 ms | ?? Ws |

<sup>1</sup>More could actually fit; this needs more testing.
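
The "Speed advantage" figures above appear consistent with comparing the CPU time to the "Total FPGA" time, relative to the FPGA time. The following is a minimal sketch under that assumption; `SpeedAdvantagePercent` is a hypothetical helper for illustration, not part of Hastlayer.

```csharp
using System;

public static class SpeedAdvantageExample
{
    // Assumed formula (not confirmed by the documentation): the time saved by the FPGA,
    // expressed as a percentage of the FPGA's total execution time.
    public static double SpeedAdvantagePercent(double cpuMilliseconds, double fpgaTotalMilliseconds) =>
        (cpuMilliseconds - fpgaTotalMilliseconds) / fpgaTotalMilliseconds * 100;

    public static void Main()
    {
        // Values taken from the two ImageContrastModifier rows of the Vitis table.
        Console.WriteLine(Math.Round(SpeedAdvantagePercent(568, 85))); // prints 568
        Console.WriteLine(Math.Round(SpeedAdvantagePercent(568, 79))); // prints 619
    }
}
```

The second row comes out as roughly 619% rather than the 620% in the table, which would be expected if the table was computed from unrounded measurements.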

### Catapult

Comparing the performance of the Catapult FPGA (i.e. the Mt Granite card) to the Catapult node's host PC's performance. Only a single CPU is assumed to be running under 100% load for the power usage figures for the sake of simplicity.
4 changes: 2 additions & 2 deletions Docs/GettingStarted.md
@@ -12,7 +12,7 @@ To begin working with Hastlayer you'll need the following:
- For simpler workloads and testing: The [Nexys A7 (formerly known as Nexys 4 DDR)](https://store.digilentinc.com/nexys-a7-fpga-trainer-board-recommended-for-ece-curriculum/) board (which is **NOT** the same as the non-DDR Nexys 4, be sure to purchase the linked board!) is suitable, so you'll need to purchase one. The **Nexys A7-100T** version is required. Note that this is a relatively low-end development board that can't fit huge algorithms and only supports slow communication channels, so with this board Hastlayer is only suitable for simpler algorithms that only need to exchange small amounts of data.
- For academic workloads: Microsoft's FPGA platform, [Project Catapult](https://www.microsoft.com/en-us/research/project/project-catapult/) is supported too, which offers high-end hardware. You'll need to apply for a cloud Catapult node via the [Project Catapult Academic Program](https://www.microsoft.com/en-us/research/academic-program/project-catapult-academic-program/). Be sure to [let us know](https://hastlayer.com/contact) if you'd like to use Catapult and we'll help you get going.
- For production-level commercial workloads:
- Using [Xilinx Alveo U200, U250 or U280 Data Center Accelerator Cards](https://www.xilinx.com/products/boards-and-kits/alveo.html) on-premise or in the cloud. In the cloud these cards are currently available at [Nimbix](https://www.nimbix.net/).
- Using [Xilinx Alveo U50, U200, U250 or U280 Data Center Accelerator Cards](https://www.xilinx.com/products/boards-and-kits/alveo.html) on-premise or in the cloud. In the cloud these cards are currently available at [Nimbix](https://www.nimbix.net/).
- Using [AWS EC2 F1 instances](https://aws.amazon.com/ec2/instance-types/f1/).
- [Visual Studio 2019 or later](https://www.visualstudio.com/downloads/) installed (any edition will work).
- On Linux, if you are going to use System.Drawing (e.g. the ImageProcessingAlgorithms sample), you need to install the [Mono project's](https://www.mono-project.com/) implementation of [libgdiplus](https://github.com/mono/libgdiplus). On CentOS you need the "libgdiplus" package, while on Debian-based systems such as Ubuntu you need both "libgdiplus" and "libc6-dev".
@@ -45,4 +45,4 @@ These would be your first steps on starting to work with Hastlayer by getting th
5. Start the sample project. That will by default run the sample that is also added by default to the Hardware project.
6. You should be able to see the results of the sample in its console window.

If everything is alright, continue with the rest of this documentation to write your first own Hastlayer-using algorithm. You can also check out the many documented samples under the *Samples* solution folder.
4 changes: 2 additions & 2 deletions Docs/ReleaseNotes.md
@@ -7,7 +7,7 @@ Note that the Hardware Framework projects have their own release cycle and relea

## vNext

- Added support for the high-end [Xilinx Alveo U200, U250 or U280 Data Center Accelerator Cards](https://www.xilinx.com/products/boards-and-kits/alveo.html). These are suitable hardware for any kind of demanding production-level workload. Apart from using such devices on-premise they're also available in the cloud.
- Added support for the high-end [Xilinx Alveo U50, U200, U250 or U280 Data Center Accelerator Cards](https://www.xilinx.com/products/boards-and-kits/alveo.html). These are suitable hardware for any kind of demanding production-level workload. Apart from using such devices on-premise they're also available in the cloud.
- Added support for the high-end [AWS EC2 F1 FPGA cloud instances](https://aws.amazon.com/ec2/instance-types/f1/).
- Migrated the code generating Transformer to the latest version of [ILSpy](https://github.com/icsharpcode/ILSpy), see [the issue](https://github.com/Lombiq/Hastlayer-SDK/issues/20). Hastlayer uses the .NET decompilation tool ILSpy in the background to process .NET assemblies. Previously Hastlayer depended on an old version of ILSpy, 2.3.1, which came out in 2015. Now ILSpy is at version 6 so you can imagine all the changes such a jump brings! This Hastlayer release is updated to the most recent ILSpy binaries, bringing better support for .NET language features. Shoutout to the ILSpy developers for their awesome work!
- Migrated to .NET Core ([see issue](https://github.com/Lombiq/Hastlayer-SDK/issues/34)). Not only can Hastlayer process .NET Standard assemblies as before, but now the whole project is built on .NET Core/.NET Standard. While the apps target .NET Core 3.1 directly, all the libraries are .NET Standard, so Hastlayer continues to support .NET Framework projects.
@@ -203,4 +203,4 @@ For all publicly tracked issues resolved with this release [see the correspondin
- Documented samples.
- Runs on [Orchard](http://orchardproject.net) with [Orchard Application Host](https://github.com/Lombiq/Orchard-Application-Host).

Before this: Various proof of concept and experimental versions.
@@ -87,7 +87,7 @@ public override async Task<IHardwareExecutionInformation> Execute(
// The illegal endpoint number messages are normal for higher endpoints if they aren't
// populated, so it's OK to suppress them.
if (!(i > 0 && ex.Status == Status.IllegalEndpointNumber))
_logger.LogError(ex, $"Received {ex.Status} while trying to instantiate CatapultLibrary on EndPoint {i}. This device won't be used.");
Logger.LogError(ex, $"Received {ex.Status} while trying to instantiate CatapultLibrary on EndPoint {i}. This device won't be used.");
return null;
}
})));
@@ -120,7 +120,7 @@ public override async Task<IHardwareExecutionInformation> Execute(

if (outputPayloadByteCount > SimpleMemory.MemoryCellSizeBytes) outputBuffer = HotfixOutput(outputBuffer);
dma.Set(outputBuffer, OutputHeaderSizes.Total / SimpleMemory.MemoryCellSizeBytes);
_logger.LogInformation("Incoming data size in bytes: {0}", outputPayloadByteCount);
Logger.LogInformation("Incoming data size in bytes: {0}", outputPayloadByteCount);

EndExecution(context);

@@ -18,5 +18,7 @@ public class CatapultManifestProvider : IDeviceManifestProvider
AvailableMemoryBytes = 8_000_000_000UL / 16,
ToolChainName = CommonToolChainNames.QuartusPrime
};

public void ConfigureMemory(MemoryConfiguration memory, IHardwareGenerationConfiguration hardwareGeneration) { }
}
}
@@ -1,18 +1,18 @@
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <TargetFramework>netstandard2.0</TargetFramework>
    <GenerateAssemblyInfo>false</GenerateAssemblyInfo>
  </PropertyGroup>
  <ItemGroup>
    <Compile Include="..\..\SharedAssemblyInfo.cs">
      <Link>Properties\SharedAssemblyInfo.cs</Link>
    </Compile>
  </ItemGroup>
  <ItemGroup>
    <ProjectReference Include="..\..\Hast.Common\Hast.Common.csproj" />
  </ItemGroup>
  <ItemGroup>
    <PackageReference Include="Microsoft.CSharp" Version="4.7.0" />
    <PackageReference Include="System.Data.DataSetExtensions" Version="4.5.0" />
  </ItemGroup>
</Project>
@@ -1,10 +1,13 @@
using Hast.Common.Interfaces;
using Hast.Layer;
using Microsoft.Extensions.Configuration;

namespace Hast.Synthesis.Abstractions
{
    public interface IDeviceManifestProvider : ISingletonDependency
    {
        IDeviceManifest DeviceManifest { get; }

        void ConfigureMemory(MemoryConfiguration memory, IHardwareGenerationConfiguration hardwareGeneration);
    }
}
@@ -0,0 +1,18 @@
namespace Hast.Synthesis.Abstractions
{
    public interface IMemoryConfiguration
    {
        /// <summary>
        /// The alignment value. If set to greater than 0, the starting address of the content is aligned to be a
        /// multiple of that number, which must then be a power of 2. It can only be set before any instances are
        /// created.
        /// </summary>
        int Alignment { get; }

        /// <summary>
        /// The minimum cell count to be reserved in front of the payload. It's required for device-specific headers.
        /// </summary>
        int MinimumPrefix { get; }
    }
}
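
To illustrate what aligning a starting address to a multiple of a power of 2 means, here is a minimal sketch. `AlignUp` is a hypothetical helper written for this example; it is not part of Hastlayer and the commit doesn't show how the alignment is actually applied.

```csharp
using System;

public static class AlignmentExample
{
    // Hypothetical helper: rounds address up to the next multiple of alignment.
    // Assumes alignment is either 0 (meaning no alignment) or a power of 2.
    public static long AlignUp(long address, int alignment)
    {
        if (alignment <= 0) return address;

        // For a power of 2, alignment - 1 is a mask of the low bits that must end up as zero.
        long mask = (long)alignment - 1;
        return (address + mask) & ~mask;
    }

    public static void Main()
    {
        Console.WriteLine(AlignUp(4097, 4096)); // prints 8192
        Console.WriteLine(AlignUp(4096, 4096)); // prints 4096
        Console.WriteLine(AlignUp(123, 0));     // prints 123
    }
}
```

The bit mask only works because the alignment is a power of 2, which is presumably why the interface restricts the value this way.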
@@ -0,0 +1,44 @@
using System;
using System.Collections.Generic;
using System.Linq;
using Hast.Layer;

namespace Hast.Synthesis.Abstractions
{
    public class MemoryConfiguration : IMemoryConfiguration
    {
        private int _alignment = 0;

        public int Alignment
        {
            get => _alignment;
            set
            {
                // A power of 2 has a single bit set, so clearing its lowest set bit yields 0. Note that 0 (meaning
                // no alignment) passes this check too.
                if (value < 0 || (value & (value - 1)) != 0)
                {
                    throw new InvalidOperationException("The alignment value must be a power of 2.");
                }

                _alignment = value;
            }
        }

        public int MinimumPrefix { get; set; }


        private MemoryConfiguration() { }


        // Creates the configuration for the device named in the hardware generation configuration and lets that
        // device's manifest provider adjust it.
        public static IMemoryConfiguration Create(
            IHardwareGenerationConfiguration hardwareGenerationConfiguration,
            IEnumerable<IDeviceManifestProvider> deviceManifestProviders)
        {
            var memoryConfiguration = new MemoryConfiguration();
            var deviceManifestProvider = deviceManifestProviders.First(manifestProvider =>
                manifestProvider.DeviceManifest.Name == hardwareGenerationConfiguration.DeviceName);
            deviceManifestProvider.ConfigureMemory(memoryConfiguration, hardwareGenerationConfiguration);
            return memoryConfiguration;
        }
    }
}
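
The condition in the `Alignment` setter can be tried out in isolation. This sketch copies the check into a hypothetical `IsValidAlignment` helper (a name introduced here for illustration only) to show which values the setter would accept.

```csharp
using System;

public static class AlignmentCheckExample
{
    // Mirrors the setter's condition: non-negative, and at most one bit set.
    public static bool IsValidAlignment(int value) => value >= 0 && (value & (value - 1)) == 0;

    public static void Main()
    {
        foreach (var value in new[] { 0, 1, 2, 3, 6, 8, 4096, -8 })
        {
            Console.WriteLine($"{value}: {IsValidAlignment(value)}");
        }
    }
}
```

Under this check 0, 1, 2, 8 and 4096 are accepted, while 3, 6 and -8 are rejected.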
@@ -0,0 +1,16 @@
using System.Runtime.InteropServices;

namespace System
{
    public static class SimpleMemoryExtensions
    {
        // Writes the given integers into the buffer one after another, starting at the startIndex byte offset.
        public static void SetIntegers(this Span<byte> buffer, int startIndex, params int[] values)
        {
            for (int i = 0, index = startIndex; i < values.Length; i++, index += sizeof(int))
            {
                var slice = buffer.Slice(index, sizeof(int));
                MemoryMarshal.Write(slice, ref values[i]);
            }
        }
    }
}
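
A short usage sketch for the extension above. To keep it self-contained, the same loop is repeated here as a plain static method; the buffer size and the values are made up for illustration.

```csharp
using System;
using System.Runtime.InteropServices;

public static class SetIntegersExample
{
    // Same logic as the SetIntegers extension, repeated so that this sketch compiles on its own.
    public static void SetIntegers(Span<byte> buffer, int startIndex, params int[] values)
    {
        for (int i = 0, index = startIndex; i < values.Length; i++, index += sizeof(int))
        {
            MemoryMarshal.Write(buffer.Slice(index, sizeof(int)), ref values[i]);
        }
    }

    public static void Main()
    {
        Span<byte> buffer = new byte[16];

        // The first integer argument is the byte offset (4), the rest are the values to write (1 and 2).
        SetIntegers(buffer, 4, 1, 2);

        Console.WriteLine(MemoryMarshal.Read<int>(buffer.Slice(0))); // prints 0 (untouched)
        Console.WriteLine(MemoryMarshal.Read<int>(buffer.Slice(4))); // prints 1
        Console.WriteLine(MemoryMarshal.Read<int>(buffer.Slice(8))); // prints 2
    }
}
```

Because `startIndex` and the `params` values are all `int`, it's easy to forget the offset at the call site; the first integer is always treated as the starting byte index.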
@@ -1,15 +1,17 @@
<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <TargetFramework>netstandard2.0</TargetFramework>
    <GenerateAssemblyTitleAttribute>false</GenerateAssemblyTitleAttribute>
  </PropertyGroup>

  <ItemGroup>
    <ProjectReference Include="..\..\Hast.Common\Hast.Common.csproj" />
  </ItemGroup>
  <ItemGroup>
    <PackageReference Include="Microsoft.CSharp" Version="4.7.0" />
    <PackageReference Include="System.Data.DataSetExtensions" Version="4.5.0" />
  </ItemGroup>
</Project>
<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <TargetFramework>netstandard2.0</TargetFramework>
    <GenerateAssemblyTitleAttribute>false</GenerateAssemblyTitleAttribute>
    <AllowUnsafeBlocks>true</AllowUnsafeBlocks>
  </PropertyGroup>

  <ItemGroup>
    <ProjectReference Include="..\..\Hast.Common\Hast.Common.csproj" />
    <ProjectReference Include="..\Hast.Synthesis.Abstractions\Hast.Synthesis.Abstractions.csproj" />
  </ItemGroup>
  <ItemGroup>
    <PackageReference Include="Microsoft.CSharp" Version="4.7.0" />
    <PackageReference Include="System.Data.DataSetExtensions" Version="4.5.0" />
  </ItemGroup>
</Project>