Note: NFAI is in a very early stage of development. Expect rapid changes, missing features, and breaking changes.
NFAI is a native .NET inference engine for GGUF models, focused on efficient inference for Llama 3.2 models in FP16 and FP32 precision. It leverages Vulkan compute shaders for high performance on modern GPUs.
- Native .NET: Written in C#, no Python dependencies.
- GGUF Support: Loads and runs GGUF models (currently Llama 3.2 only; in theory, any Llama 3.x model should work).
- Vulkan Backend: Uses Vulkan compute shaders for fast inference.
- Precision: Supports both FP16 and FP32 weights.
- Cross-vendor GPU: Tested on Windows with both AMD and NVIDIA GPUs.
Requirements:

- .NET 9.0+
- `glslangValidator` (must be on your PATH; I personally use the LunarG Vulkan SDK for this)
- Vulkan-compatible GPU and drivers (AMD or NVIDIA)
- Windows OS (tested)
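Before building, it can help to confirm both prerequisites are reachable from a terminal. This is just a quick sanity check; `-v` asks `glslangValidator` to print its version banner:

```shell
# Check the .NET SDK version (should report 9.0 or newer)
dotnet --version

# Check that glslangValidator resolves on PATH and print its version
glslangValidator -v
```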
- Install dependencies
  - Ensure Vulkan drivers are installed for your GPU.
  - Download and install `glslangValidator` and add it to your system PATH.
- Prepare a GGUF model
  - Obtain a Llama 3.2 model in GGUF format (FP16 or FP32).
- Set the GGUF path
  - Set the `GGUF_PATH` environment variable to point to your model file.
- Run the program
  - Run `dotnet run --project NFAI`; you will be prompted for input in the console.
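As a concrete illustration, setting the model path and launching from a PowerShell session might look like this (the model filename below is a placeholder, not a file shipped with the repo):

```shell
# PowerShell: point NFAI at a local GGUF model, then run the console app.
# The path below is a placeholder; substitute your own FP16/FP32 Llama 3.2 file.
$env:GGUF_PATH = "C:\models\Llama-3.2-1B-Instruct-f16.gguf"
dotnet run --project NFAI
```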
Below is a minimal example from `Program.cs`:

```csharp
using Microsoft.Extensions.AI;
using NFAI.GGUF;

// parse the GGUF file
var path = Environment.GetEnvironmentVariable("GGUF_PATH")
    ?? throw new ArgumentNullException("GGUF_PATH", "GGUF_PATH environment variable is not set.");
var parser = new Parser();
var model = parser.Parse(path);

Console.WriteLine("Enter 'quit' to quit: ");
var input = string.Empty;
while (input != "quit")
{
    Console.Write("User: ");
    input = Console.ReadLine() ?? string.Empty;
    if (input == "quit") break;

    Console.Write("Assistant: ");
    await foreach (var part in model.GetStreamingResponseAsync(input, default))
    {
        Console.Write(part);
    }
    Console.WriteLine();
}
```

- Only Llama 3.2 GGUF models are supported at this time.
- Model quantization formats other than FP16/FP32 are not yet supported.
- Performance and compatibility may vary depending on your GPU and drivers.
Roadmap:

- Add support for more GGUF model types and quantizations.
- Improve tokenizer compatibility.
- Add Linux support.
This project is licensed under the Mozilla Public License 2.0 (MPL-2.0). See the LICENSE file for details.
Contributions and issues welcome!