Scroll Top

Critical Path Optimization in Performance-Critical Applications

Critical Path Optimization in Performance-Critical Applications

In performance-sensitive applications, such as trading systems, a portion of the code handles the core logic that directly responds to market changes. This section, often referred to as the Critical Path or Hot Path, is responsible for processing market fluctuations, executing trades, and identifying profitable opportunities. Since this logic generates revenue, it must be exceptionally optimized.

Trading applications operate in highly competitive environments, where delays measured in microseconds can cause missed opportunities. Competitors with more efficient Critical Paths can exploit such delays to gain a market advantage. Hence, every nanosecond counts.

Key Considerations for Optimizing the Critical Path

  1. Algorithm Efficiency: While having a high-performing algorithm is essential, it represents only one aspect of optimization.
  2. Memory Allocation: Minimize memory allocation to avoid wasting CPU cycles that can add crucial microseconds to execution time. Structs, rather than classes, can play a pivotal role here.

Choosing Between Struct and Class

  • Structs are value types and are allocated on the stack, making them faster to allocate compared to classes, which are allocated on the heap. Heap allocation takes longer due to garbage collection overhead. However, there’s a trade-off: structs are copied by value.
    • Passing structs by value is efficient for small data types (e.g., int, double, bool) but introduces overhead for larger structs since the entire object is copied to the stack.
    • To avoid this, use the ref keyword to pass structs by reference, preventing costly copying operations.

void Compute(ref StructType s) { … }

Avoiding Boxing Overhead

When working with structs, boxing is a common pitfall. Boxing occurs when a struct is treated as an object, causing it to be copied to the heap. Consider this example:

 

struct T { … }
T t = new T();
Object o = t; // This triggers boxing

 

Boxing introduces significant performance penalties. To safeguard against such scenarios, C# provides the ref struct type, which ensures that the struct cannot be boxed. If the compiler detects a situation where the ref struct would be allocated on the heap, it throws a compilation error.

Important Note: Declaring a ref struct ensures heap allocation is avoided but does not imply that the struct is passed by reference. You must explicitly use the ref keyword in method definitions and calls to achieve this.

 

ref struct RT { … }
Compute1(RT rt);    // Passed by value
Compute2(ref RT rt); // Passed by reference

 

Performance Benchmarking

Below are the performance benchmarks for various implementation approaches. Execution times are measured in ticks:

Method

Execution Time (ticks)

performComputationStruct(T t)

3802

performComputationBoxing(Object o)

7616

performComputationRef(ref T)

2011

performComputationRT(RT)

1780

performComputationClass(C c)

5299

Full Benchmark Code

The following C# code demonstrates the performance characteristics of different approaches:

using System;

using System.Diagnostics;

struct T {

    public int id;

    public double value;

    public bool valid;

}

ref struct RT {

    public int id;

    public double value;

    public bool valid;

}

class C {

    public int id;

    public double value;

    public bool valid;

}

public class CriticalPathBenchmark

{

    const int MAX = 10000;

    public static void Main(string[] args)

    {

        Console.WriteLine($"Execution Time:\n");

        MeasureExecutionTime(() =>

        {

            for (int i = 0; i < MAX; i++)

            {

                var t = new T { id = 1, value = 1.0, valid = false };

                performComputationStruct(t);

            }

        }, "performComputationStruct(T t)");

        MeasureExecutionTime(() =>

        {

            for (int i = 0; i < MAX; i++)

            {

                var t = new T { id = 1, value = 1.0, valid = false };

                performComputationBoxing(t);

            }

        }, "performComputationBoxing(Object o)");

        MeasureExecutionTime(() =>

        {

            for (int i = 0; i < MAX; i++)

            {

                var t = new T { id = 1, value = 1.0, valid = false };

                performComputationRef(ref t);

            }

        }, "performComputationRef(ref T)");

        MeasureExecutionTime(() =>

        {

            for (int i = 0; i < MAX; i++)

            {

                var rt = new RT { id = 1, value = 1.0, valid = false };

                performComputationRT(ref rt);

            }

        }, "performComputationRT(RT)");

        MeasureExecutionTime(() =>

        {

            for (int i = 0; i < MAX; i++)

            {

                var c = new C { id = 1, value = 1.0, valid = false };

                performComputationClass(c);

            }

        }, "performComputationClass(C c)");

    }

    static void MeasureExecutionTime(Action action, string label)

    {

        Stopwatch stopwatch = Stopwatch.StartNew();

        action();

        stopwatch.Stop();

        Console.WriteLine($"{label} : {stopwatch.ElapsedTicks} ticks");

    }

    static T performComputationStruct(T t)

    {

        t.value *= 10;

        t.valid = !t.valid;

        return t;

    }

    static T performComputationBoxing(Object o)

    {

        T t = (T)o;

        t.value *= 10;

        t.valid = !t.valid;

        return t;

    }

    static void performComputationRef(ref T t)

    {

        t.value *= 10;

        t.valid = !t.valid;

    }

    static void performComputationRT(ref RT rt)

    {

        rt.value *= 10;

        rt.valid = !rt.valid;

    }

    static void performComputationClass(C c)

    {

        c.value *= 10;

        c.valid = !c.valid;

    }

}

Takeaways

  1. Use structs for lightweight, stack-allocated data structures but avoid large structs unless you can pass them by reference using ref.
  2. Employ ref struct to prevent heap allocation and avoid boxing overhead in performance-critical scenarios.
  3. Benchmark your code to ensure your optimizations align with real-world performance expectations.