Performance Best Practices

Prerequisites: Defining Pipelines, Custom Nodes

NPipeline includes analyzers that detect many of these issues at build time. In the HighThroughput optimization profile, all performance analyzers are active and enforce these practices automatically. In the Default profile, the most aggressive rules (NP9103–NP9107) are suppressed, so apply these guidelines yourself when you're optimizing for throughput.

Do's

Use ValueTask Fast Paths for Synchronous Transforms

If your transform completes synchronously (e.g., mapping, filtering, simple calculations), override ExecuteValueTaskAsync to avoid Task allocations:

csharp
public class ToUpper : TransformNode<string, string>
{
    public override Task<string> TransformAsync(
        string item, PipelineContext ctx, CancellationToken ct)
        => Task.FromResult(item.ToUpperInvariant());

    // Fast path - avoids Task allocation
    protected internal override ValueTask<string> ExecuteValueTaskAsync(
        string item, PipelineContext ctx, CancellationToken ct)
        => new(item.ToUpperInvariant());
}

See Synchronous Fast Paths for details.
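
To see where the saving comes from, here is a minimal standalone sketch that uses only the base class library, not NPipeline types. The class FastPathDemo and its method names are hypothetical. A ValueTask<T> built directly from a result is a struct that is already completed, so no task object has to be created for it, whereas Task.FromResult generally returns a new heap-allocated Task<T>.

```csharp
using System.Threading.Tasks;

// Hypothetical demo type; not part of NPipeline.
public static class FastPathDemo
{
    // Returns a Task<string> object, which generally means one heap allocation per call.
    public static Task<string> ToUpperTask(string item)
        => Task.FromResult(item.ToUpperInvariant());

    // Wraps the result in a ValueTask<string> struct; no Task object is created.
    public static ValueTask<string> ToUpperValueTask(string item)
        => new ValueTask<string>(item.ToUpperInvariant());
}
```

Both methods produce the same value. The difference only matters when the method runs once per item in a high-volume pipeline.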

Stream Large Datasets

Use DataStream<T> lazy streaming rather than materializing entire datasets:

csharp
// Good - streams one item at a time
public override DataStream<Record> OpenStream(PipelineContext ctx, CancellationToken ct)
    => DataStream.FromAsyncEnumerable(ReadRecordsAsync(ct));

// Bad - loads everything into memory
public override DataStream<Record> OpenStream(PipelineContext ctx, CancellationToken ct)
    => DataStream.FromEnumerable(ReadAllRecords()); // OOM risk
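
The body of ReadRecordsAsync is not shown above. As a rough sketch of what a lazy reader can look like, the following async iterator yields one line at a time from a TextReader using only the base class library. The RecordReader class, the ReadLinesAsync name, and the choice of plain strings as records are assumptions made for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper; not part of NPipeline.
public static class RecordReader
{
    // Yields one line at a time, so only the current line is held by the iterator.
    public static async IAsyncEnumerable<string> ReadLinesAsync(
        TextReader reader,
        [EnumeratorCancellation] CancellationToken ct = default)
    {
        while (true)
        {
            ct.ThrowIfCancellationRequested();
            var line = await reader.ReadLineAsync();
            if (line is null) yield break; // end of input
            yield return line;
        }
    }
}
```

Because the iterator only advances when the consumer asks for the next item, a slow downstream node naturally slows the reader instead of letting unread data pile up in memory.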

Use Batching for I/O-Bound Operations

Batch database writes and API calls to amortize per-call overhead:

csharp
handle.WithBatching(builder, new BatchingOptions { BatchSize = 100 });
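
For intuition about what batching does to a stream, the sketch below groups an IAsyncEnumerable<T> into fixed-size lists using only the base class library. It illustrates the general technique and is not a description of how WithBatching is implemented; BatchingSketch and BatchAsync are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper; not part of NPipeline.
public static class BatchingSketch
{
    // Collects items into lists of up to batchSize elements and yields each list.
    public static async IAsyncEnumerable<IReadOnlyList<T>> BatchAsync<T>(
        IAsyncEnumerable<T> source,
        int batchSize,
        [EnumeratorCancellation] CancellationToken ct = default)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batch = new List<T>(batchSize);
        await foreach (var item in source.WithCancellation(ct))
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize); // start a fresh batch
            }
        }

        // Flush the final, possibly smaller, batch.
        if (batch.Count > 0) yield return batch;
    }
}
```

With a batch size of 100, a sink that writes to a database pays the round-trip cost once per 100 items rather than once per item.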

Use Parallel Execution for CPU-Bound Transforms

csharp
handle.WithParallelExecution(builder, options =>
    options.WithDegreeOfParallelism(Environment.ProcessorCount));
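
As a point of comparison, bounded parallelism can be sketched with the base class library alone. The example below assumes .NET 6 or later and uses Parallel.ForEachAsync with a capped MaxDegreeOfParallelism; ParallelSketch and SumOfSquaresAsync are hypothetical names, and this is not how the NPipeline extension is implemented.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper; not part of NPipeline.
public static class ParallelSketch
{
    // Sums the squares of the inputs using at most `degree` concurrent workers.
    public static async Task<long> SumOfSquaresAsync(IEnumerable<int> items, int degree)
    {
        long total = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = degree };

        await Parallel.ForEachAsync(items, options, (item, ct) =>
        {
            // Interlocked keeps the shared accumulator safe across workers.
            Interlocked.Add(ref total, (long)item * item);
            return ValueTask.CompletedTask;
        });

        return total;
    }
}
```

For CPU-bound work, a degree close to Environment.ProcessorCount is a reasonable starting point, since extra workers beyond the core count mostly add scheduling overhead.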

Don'ts

Avoid LINQ in Hot Paths

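LINQ operators allocate iterator objects on every call, plus closure objects when a lambda captures state, and ToList() allocates a new list each time. In TransformAsync methods that run per item, use loops instead:

csharp
// Bad - allocates an iterator and a new list on every item (NP9103)
var filtered = items.Where(x => x.IsValid).ToList();

// Good - no LINQ iterator or delegate allocations (results is a list declared outside the loop)
foreach (var item in items)
{
    if (item.IsValid) results.Add(item);
}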

Avoid Blocking on Async Code

Never use .Result, .Wait(), or .GetAwaiter().GetResult() in nodes (NP9101):

csharp
// Bad - deadlock risk and thread pool starvation
var data = httpClient.GetAsync(url).Result;

// Good
var data = await httpClient.GetAsync(url, ct);

Avoid String Concatenation in Loops

Use StringBuilder instead of + in hot paths (NP9104):

csharp
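// Bad - each += allocates a new string and copies the old one: O(n²) copying overall
foreach (var item in items) result += item.ToString();

// Good - appends into a single growable buffer
var sb = new StringBuilder();
foreach (var item in items) sb.Append(item);
result = sb.ToString();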

Avoid Anonymous Objects in Hot Paths

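Anonymous types compile to classes, so creating one per item allocates on the heap and adds GC pressure (NP9105). Use named records or, to avoid the per-item heap allocation, structs or record structs instead.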
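
A minimal sketch of the replacement, assuming C# 10 or later for record struct syntax. The Pair type and the ProjectionSketch class are hypothetical names chosen for illustration.

```csharp
// Hypothetical value type standing in for an anonymous { Id, Name } projection.
public readonly record struct Pair(int Id, string Name);

public static class ProjectionSketch
{
    // Anonymous type: a compiler-generated class, so each call allocates on the heap.
    public static object ProjectAnonymous(int id, string name)
        => new { Id = id, Name = name };

    // Record struct: a value type, so no separate heap object is created for the pair.
    public static Pair ProjectStruct(int id, string name)
        => new Pair(id, name);
}
```

The named type also gives downstream nodes a real type to declare in their signatures, which an anonymous type cannot provide.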

Analyzer-Backed Rules

NPipeline's Roslyn analyzers enforce these practices at build time:

| Rule | Severity | What It Catches |
| --- | --- | --- |
| NP9101 | Warning | Blocking on async (.Result, .Wait()) |
| NP9103 | Warning | LINQ in TransformAsync hot paths |
| NP9104 | Warning | String concatenation in loops |
| NP9105 | Warning | Anonymous object allocations in hot paths |
| NP9106 | Info | Missing ValueTask fast path override |

See Build-Time Analyzers for the complete list.

Performance Characteristics

Memory Model

NPipeline streams data lazily via IAsyncEnumerable<T>. Memory usage is proportional to items in flight, not total dataset size:

Streaming (default):
  Item 1: [Read] → [Transform] → [Write] → [GC] → Item 2
  Memory: O(k) where k = items in flight (typically 1–2)

Eager (.ToList()):
  [All N items in memory] → Process → [GC]
  Memory: O(N) - entire dataset

For a 1 million row CSV at 500 bytes per row: streaming uses ~1–2 MB; .ToList() requires ~500 MB (1,000,000 × 500 bytes) for the raw data alone, before per-object overhead.

Throughput

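Sequential (default): items are processed end-to-end, one at a time. Throughput is one item per processing cycle, that is, roughly 1 / (time to read, transform, and write one item).

Parallel (with NPipeline.Extensions.Parallelism): multiple items are processed concurrently. For CPU-bound work, throughput scales with MaxDegreeOfParallelism up to the number of available cores. For I/O-bound work, where threads are mostly waiting, a degree of parallelism above the core count can raise throughput further.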

Built-In Optimizations

| Optimization | Impact | Configuration |
| --- | --- | --- |
| Context caching | ~150–250 μs saved per 1K items by caching retry options, tracer, and logger at node scope | Automatic |
| ValueTask fast path | Up to 90% reduction in GC pressure for synchronous transforms | Override ExecuteValueTaskAsync |
| Compiled expression factories | Node instantiation as fast as new() after first call | Automatic |
| Execution plan caching | Skips type inspection on repeated pipeline runs | Automatic (disable with WithoutExecutionPlanCache()) |
| Object pooling | Reuses common collection types during orchestration | Automatic |

NPipeline vs Alternatives

| Aspect | NPipeline | LINQ Streaming | Message Queues | Manual Iteration |
| --- | --- | --- | --- | --- |
| Memory | O(k) active items | O(1) per item | O(batch) | O(N) all items |
| Latency to first item | < 1 ms | < 1 ms | 10–100 ms | N/A (batch) |
| Typed composition | Yes | Yes | Weak | No |
| Error handling | Retry, skip, dead-letter, circuit breaker | Basic try/catch | Rich (platform-specific) | Manual |
| Observability | Built-in extension | Limited | Rich (platform-specific) | Manual |
| Parallel execution | Bounded with backpressure | PLINQ (unbounded) | Consumer groups | Manual threading |

Released under the MIT License.