Changes from all commits
Commits
55 commits
c045bf3
support audio streaming-csharp
Mar 5, 2026
9a1578c
support audio streaming-js
Mar 5, 2026
3970936
delete dll mock test
Mar 5, 2026
ef2e9e0
update core api
Mar 5, 2026
535b735
update sdk
Mar 11, 2026
f5bd916
update the api
Mar 13, 2026
6d067e0
rename LiveAudioTranscription
Mar 13, 2026
eb6598d
Merge branch 'main' into ruiren/audio-streaming-support-sdk
rui-ren Mar 13, 2026
6dee740
fix: add missing using directives for EnumeratorCancellation and Channel
Mar 13, 2026
b89e1bd
update test
Mar 13, 2026
4cf6cb4
update js package
Mar 13, 2026
eb9f282
e2e test
Mar 18, 2026
5e98119
update for test
Mar 18, 2026
d2e3513
Fix C# SDK audio streaming PR: namespace corrections, restored public…
Copilot Mar 20, 2026
ed9e350
merge the main
Mar 20, 2026
0cac7f3
update response type
Mar 22, 2026
06dc45c
fix nenad
Mar 23, 2026
709788c
add unitest
Mar 24, 2026
24aacb1
update the ci core package
Mar 24, 2026
eeb34b8
update the ci core package
Mar 24, 2026
292a5bc
Add live audio transcription support to JS SDK
Mar 24, 2026
4c2eb5e
Merge branch 'ruiren/audio-streaming-support-sdk-js-2' into ruiren/au…
Mar 24, 2026
5287519
Remove leftover sdk_v2/ directory
Mar 24, 2026
57ce460
Update Core version to 0.9.0 in JS install script
Mar 24, 2026
e9f2b5f
Merge branch 'ruiren/audio-streaming-support-sdk' into ruiren/audio-s…
Mar 24, 2026
18389cb
Update Core version to 0.9.0 in JS install script
Mar 24, 2026
10bbcb8
update the npkg
Mar 25, 2026
27e358c
update the npkg
Mar 25, 2026
a02e381
Trigger CI re-run
Mar 25, 2026
c17a74d
Update package versions
kunal-vaishnavi Mar 25, 2026
fc0fa6e
Temporarily use different FL Core WinML version
kunal-vaishnavi Mar 25, 2026
5678587
Use ORT nightly feed for getting ORT GenAI in JS builds
kunal-vaishnavi Mar 25, 2026
a373cd7
erge branch 'ruiren/audio-streaming-support-sdk' into ruiren/audio-st…
Mar 25, 2026
3ae22b4
update test pkg
Mar 25, 2026
8a897c5
update unitest
Mar 25, 2026
be21735
update unitest
Mar 25, 2026
093653b
update rust build
Mar 25, 2026
0feb274
update genai version
Mar 25, 2026
95fccd4
update genai version
Mar 25, 2026
795d281
update rust build
Mar 26, 2026
24fe228
update rust build
Mar 26, 2026
2ac5b1d
update rust build
Mar 26, 2026
2e464cb
update rust build
Mar 26, 2026
40794d6
update rust build
Mar 26, 2026
1693c90
update rust unitest
Mar 26, 2026
ec0038a
update rust unitest
Mar 26, 2026
6615654
reverse rust version
Mar 26, 2026
80665a0
update js & rust
Mar 26, 2026
e1cef6f
update js & rust
Mar 26, 2026
6d43bf9
update rust
Mar 26, 2026
7b1f735
update rust
Mar 26, 2026
fc0c5a5
bitsPerSample
Mar 26, 2026
4f656f5
Merge remote-tracking branch 'origin/main' into ruiren/audio-streamin…
Mar 26, 2026
3322120
lint
Mar 26, 2026
53d4ad7
Merge branch 'ruiren/audio-streaming-support-sdk' into ruiren/audio-s…
Mar 26, 2026
4 changes: 2 additions & 2 deletions .github/workflows/build-js-steps.yml
Original file line number Diff line number Diff line change
@@ -95,12 +95,12 @@ jobs:
- name: npm install (WinML)
if: ${{ inputs.useWinML == true }}
working-directory: sdk/js
-        run: npm install --winml
+        run: npm install --winml --nightly

- name: npm install (Standard)
if: ${{ inputs.useWinML == false }}
working-directory: sdk/js
-        run: npm install
+        run: npm install --nightly

- name: Set package version
working-directory: sdk/js
2 changes: 1 addition & 1 deletion .github/workflows/build-rust-steps.yml
@@ -28,7 +28,7 @@ jobs:
working-directory: sdk/rust

env:
-      CARGO_FEATURES: ${{ inputs.useWinML && '--features winml' || '' }}
+      CARGO_FEATURES: ${{ inputs.useWinML && '--features winml,nightly' || '--features nightly' }}

steps:
- name: Checkout repository
Binary file added Notes-Audio.docx
Binary file not shown.
2 changes: 1 addition & 1 deletion samples/cs/GettingStarted/Directory.Packages.props
@@ -1,7 +1,7 @@
<Project>
<PropertyGroup>
<ManagePackageVersionsCentrally>true</ManagePackageVersionsCentrally>
-    <OnnxRuntimeGenAIVersion>0.12.1</OnnxRuntimeGenAIVersion>
+    <OnnxRuntimeGenAIVersion>0.13.0-dev-20260319-1131106-439ca0d51</OnnxRuntimeGenAIVersion>
<OnnxRuntimeVersion>1.23.2</OnnxRuntimeVersion>
</PropertyGroup>
<ItemGroup>
@@ -0,0 +1,32 @@
<Project Sdk="Microsoft.NET.Sdk">

<PropertyGroup>
<OutputType>Exe</OutputType>
<TargetFramework>net9.0</TargetFramework>
<ImplicitUsings>enable</ImplicitUsings>
<Nullable>enable</Nullable>
</PropertyGroup>

<PropertyGroup Condition="'$(RuntimeIdentifier)'==''">
<RuntimeIdentifier>$(NETCoreSdkRuntimeIdentifier)</RuntimeIdentifier>
</PropertyGroup>

<!-- Include the main program -->
<ItemGroup>
<Compile Include="../../src/LiveAudioTranscriptionExample/*.cs" />
<Compile Include="../../src/Shared/*.cs" />
</ItemGroup>

<!-- Packages -->
<ItemGroup>
<PackageReference Include="Microsoft.AI.Foundry.Local" />
<PackageReference Include="NAudio" Version="2.2.1" />
</ItemGroup>

<!-- ONNX Runtime GPU and CUDA provider (required for Linux)-->
<ItemGroup Condition="'$(RuntimeIdentifier)' == 'linux-x64'">
<PackageReference Include="Microsoft.ML.OnnxRuntime.Gpu" />
<PackageReference Include="Microsoft.ML.OnnxRuntimeGenAI.Cuda" />
</ItemGroup>

</Project>
105 changes: 105 additions & 0 deletions samples/cs/GettingStarted/src/LiveAudioTranscriptionExample/Program.cs
@@ -0,0 +1,105 @@
// Live Audio Transcription — Foundry Local SDK Example
//
// Demonstrates real-time microphone-to-text using:
// SDK (FoundryLocalManager) → Core (NativeAOT DLL) → onnxruntime-genai (StreamingProcessor)

using Microsoft.AI.Foundry.Local;
using NAudio.Wave;

Console.WriteLine("===========================================================");
Console.WriteLine(" Foundry Local -- Live Audio Transcription Demo");
Console.WriteLine("===========================================================");
Console.WriteLine();

var config = new Configuration
{
AppName = "foundry_local_samples",
LogLevel = Microsoft.AI.Foundry.Local.LogLevel.Information
};

await FoundryLocalManager.CreateAsync(config, Utils.GetAppLogger());
var mgr = FoundryLocalManager.Instance;

await Utils.RunWithSpinner("Registering execution providers", mgr.EnsureEpsDownloadedAsync());

var catalog = await mgr.GetCatalogAsync();

var model = await catalog.GetModelAsync("nemotron") ?? throw new Exception("Model \"nemotron\" not found in catalog");

await model.DownloadAsync(progress =>
{
Console.Write($"\rDownloading model: {progress:F2}%");
if (progress >= 100f)
{
Console.WriteLine();
}
});

Console.Write($"Loading model {model.Id}...");
await model.LoadAsync();
Console.WriteLine("done.");

var audioClient = await model.GetAudioClientAsync();
var session = audioClient.CreateLiveTranscriptionSession();
session.Settings.SampleRate = 16000;
session.Settings.Channels = 1;
session.Settings.Language = "en";

await session.StartAsync();
Console.WriteLine(" Session started");

var readTask = Task.Run(async () =>
{
try
{
await foreach (var result in session.GetTranscriptionStream())
{
if (result.IsFinal)
{
Console.WriteLine();
Console.WriteLine($" [FINAL] {result.Text}");
Console.Out.Flush();
}
else if (!string.IsNullOrEmpty(result.Text))
{
Console.ForegroundColor = ConsoleColor.Cyan;
Console.Write(result.Text);
Console.ResetColor();
Console.Out.Flush();
}
}
}
catch (OperationCanceledException) { }
});

using var waveIn = new WaveInEvent
{
WaveFormat = new WaveFormat(rate: 16000, bits: 16, channels: 1),
BufferMilliseconds = 100
};

waveIn.DataAvailable += (sender, e) =>
{
if (e.BytesRecorded > 0)
{
_ = session.AppendAsync(new ReadOnlyMemory<byte>(e.Buffer, 0, e.BytesRecorded));
}
};

Console.WriteLine();
Console.WriteLine("===========================================================");
Console.WriteLine(" LIVE TRANSCRIPTION ACTIVE");
Console.WriteLine(" Speak into your microphone.");
Console.WriteLine(" Transcription appears in real-time (cyan text).");
Console.WriteLine(" Press ENTER to stop recording.");
Console.WriteLine("===========================================================");
Console.WriteLine();

waveIn.StartRecording();
Console.ReadLine();
waveIn.StopRecording();

await session.StopAsync();
await readTask;

await model.UnloadAsync();
@@ -0,0 +1,30 @@
<Project Sdk="Microsoft.NET.Sdk">

<PropertyGroup>
<OutputType>Exe</OutputType>
<ImplicitUsings>enable</ImplicitUsings>
<Nullable>enable</Nullable>
<!-- For Windows use the following -->
<TargetFramework>net9.0-windows10.0.26100</TargetFramework>
<WindowsAppSDKSelfContained>false</WindowsAppSDKSelfContained>
<Platforms>ARM64;x64</Platforms>
<WindowsPackageType>None</WindowsPackageType>
<EnableCoreMrtTooling>false</EnableCoreMrtTooling>
</PropertyGroup>

<PropertyGroup Condition="'$(RuntimeIdentifier)'==''">
<RuntimeIdentifier>$(NETCoreSdkRuntimeIdentifier)</RuntimeIdentifier>
</PropertyGroup>

<ItemGroup>
<Compile Include="../../src/LiveAudioTranscriptionExample/*.cs" />
<Compile Include="../../src/Shared/*.cs" />
</ItemGroup>

<!-- Use WinML package for local Foundry SDK on Windows -->
<ItemGroup>
<PackageReference Include="Microsoft.AI.Foundry.Local.WinML" />
<PackageReference Include="NAudio" Version="2.2.1" />
</ItemGroup>

</Project>
60 changes: 60 additions & 0 deletions sdk/cs/README.md
@@ -233,6 +233,64 @@ audioClient.Settings.Language = "en";
audioClient.Settings.Temperature = 0.0f;
```

### Live Audio Transcription (Real-Time Streaming)

For real-time microphone-to-text transcription, use `CreateLiveTranscriptionSession()`. Audio is pushed as raw PCM chunks and transcription results stream back as an `IAsyncEnumerable`.

The streaming result type (`LiveAudioTranscriptionResponse`) extends `AudioCreateTranscriptionResponse` from the Betalgo OpenAI SDK, so it's compatible with the file-based transcription output format while adding streaming-specific fields.

```csharp
var audioClient = await model.GetAudioClientAsync();
var session = audioClient.CreateLiveTranscriptionSession();

// Configure audio format (must be set before StartAsync)
session.Settings.SampleRate = 16000;
session.Settings.Channels = 1;
session.Settings.Language = "en";

await session.StartAsync();

// Push audio from a microphone callback (thread-safe)
waveIn.DataAvailable += (sender, e) =>
{
_ = session.AppendAsync(new ReadOnlyMemory<byte>(e.Buffer, 0, e.BytesRecorded));
};

// Read transcription results as they arrive
await foreach (var result in session.GetTranscriptionStream())
{
// result inherits from AudioCreateTranscriptionResponse
// - result.Text — incremental transcribed text (per chunk, not accumulated)
// - result.IsFinal — true for final results, false for interim hypotheses
// - result.Segments — segment-level timing data (Start/End in seconds)
// - result.Language — language code
Console.Write(result.Text);
}

await session.StopAsync();
```
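
The chunk size passed to `AppendAsync` follows directly from the configured format. As a rough sizing guide (a standalone arithmetic sketch, not SDK code), 100 ms of 16 kHz, 16-bit, mono PCM works out to:

```csharp
// Sketch only: bytes per capture buffer for the format configured above.
// bytes = sampleRate * (bitsPerSample / 8) * channels * (bufferMs / 1000)
int sampleRate = 16000;
int bitsPerSample = 16;
int channels = 1;
int bufferMs = 100;   // matches BufferMilliseconds in the sample

int bytesPerChunk = sampleRate * (bitsPerSample / 8) * channels * bufferMs / 1000;
Console.WriteLine(bytesPerChunk); // 3200 bytes per 100 ms chunk
```

At that rate a bounded queue of even a few hundred entries holds well under a minute of audio, which is why pushing from the `DataAvailable` callback stays cheap.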

#### Output Type

| Field | Type | Description |
|-------|------|-------------|
| `Text` | `string` | Transcribed text from this audio chunk (inherited from `AudioCreateTranscriptionResponse`) |
| `IsFinal` | `bool` | Whether this is a final or interim result. Nemotron always returns `true`. |
| `Language` | `string` | Language code (inherited) |
| `Duration` | `float` | Audio duration in seconds (inherited) |
| `Segments` | `List<Segment>` | Segment timing with `Start`/`End` offsets (inherited) |
| `Words` | `List<WordSegment>` | Word-level timing (inherited, when available) |

#### Session Lifecycle

| Method | Description |
|--------|-------------|
| `StartAsync()` | Initialize the streaming session. Settings are frozen after this call. |
| `AppendAsync(pcmData)` | Push a chunk of raw PCM audio. Thread-safe (bounded internal queue). |
| `GetTranscriptionStream()` | Async enumerable of transcription results. |
| `StopAsync()` | Signal end-of-audio, flush remaining audio, and clean up. |
| `DisposeAsync()` | Calls `StopAsync` if needed. Use `await using` for automatic cleanup. |

### Web Service

Start an OpenAI-compatible REST endpoint for use by external tools or processes:
@@ -297,6 +355,8 @@ Key types:
| [`ModelVariant`](./docs/api/microsoft.ai.foundry.local.modelvariant.md) | Specific model variant (hardware/quantization) |
| [`OpenAIChatClient`](./docs/api/microsoft.ai.foundry.local.openaichatclient.md) | Chat completions (sync + streaming) |
| [`OpenAIAudioClient`](./docs/api/microsoft.ai.foundry.local.openaiaudioclient.md) | Audio transcription (sync + streaming) |
| [`LiveAudioTranscriptionSession`](./docs/api/microsoft.ai.foundry.local.openai.liveaudiotranscriptionsession.md) | Real-time audio streaming session |
| [`LiveAudioTranscriptionResponse`](./docs/api/microsoft.ai.foundry.local.openai.liveaudiotranscriptionresponse.md) | Streaming transcription result (extends `AudioCreateTranscriptionResponse`) |
| [`ModelInfo`](./docs/api/microsoft.ai.foundry.local.modelinfo.md) | Full model metadata record |

## Tests