When Files Outperform Databases

By Csharp_Admin Advanced SQL/Database May 10, 2024

When Files Outperform Databases

At first glance, comparing file systems to databases might seem counterintuitive. Databases are generally favored for their robust capabilities in data manipulation, offering a vast array of tools and features. While this is undoubtedly true, there are scenarios where system files can outperform databases, particularly in terms of raw performance.

One such scenario arises in the financial sector, where market data providers deliver real-time information about financial instruments, their prices, and related metrics. This data arrives as a continuous stream of tickers, each containing a TickerID, timestamp, and values such as Ask, Bid, and Mean.

Given the high velocity of incoming data—potentially thousands of records in seconds—it’s crucial to store this information efficiently for later processing. Attempting to insert each record into a traditional database in real-time can result in significant performance bottlenecks.

The Problem with Direct Database Inserts

Consider the following MySQL table structure for storing tickers:

CREATE TABLE `tickers` (

`TIK_ID` VARCHAR(50) NOT NULL,

`TIK_DATE` DATETIME NOT NULL,

`TIK_BID` DOUBLE NOT NULL,

`TIK_ASK` DOUBLE NOT NULL,

`TIK_MEAN` DOUBLE NOT NULL,

PRIMARY KEY (`TIK_ID`)

) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Inserting records into this table requires executing SQL statements like:

INSERT INTO tickers (TIK_ID, TIK_DATE, TIK_ASK, TIK_BID, TIK_MEAN)

VALUES (@id, @date, @ask, @bid, @mean);

If you need to insert 1,000 tickers, this command must be executed 1,000 times. Even with optimizations like stored procedures, the execution time can range from 30 to 40 seconds. This is problematic for systems that cannot afford such delays or resource-intensive operations.

A File-Based Solution

A more efficient alternative involves writing the ticker data to a text file and periodically importing it into the database in bulk. Instead of executing an SQL command for each incoming record, you append the data to a text file. Once the file reaches a sufficient size, it is imported into the database using a bulk import tool—most database systems include such functionality.

Using this approach, the time required to write 1,000 tickers to a file and import them into the database is approximately 500 milliseconds, a performance improvement of over 600%.

Here’s an example implementation using modern C#:

private async Task SaveToFileAndImportAsync()

{

    var startTime = DateTime.Now;

    var filePath = "dbvsfile.txt";

    try

    {

        // Generate ticker data and write to file

        var random = new Random();

        var tickerData = new StringBuilder();

        

        for (int i = 0; i < 1000; i++)

        {

            string id = $"ticker{i}";

            DateTime timestamp = DateTime.Now;

            double ask = random.NextDouble() * 100;

            double bid = random.NextDouble() * 100;

            double mean = (ask + bid) / 2;

            tickerData.AppendLine($"{id};{timestamp:O};{ask:F2};{bid:F2};{mean:F2}");

        }

        await File.WriteAllTextAsync(filePath, tickerData.ToString());

        // Import the file into the database

        var connectionString = ConfigurationManager.ConnectionStrings["dbvsfileConnectionString"].ConnectionString;

        using var connection = new MySqlConnection(connectionString);

        await connection.OpenAsync();

        var command = connection.CreateCommand();

        command.CommandText = @"

            LOAD DATA INFILE @filePath

            INTO TABLE tickers

            FIELDS TERMINATED BY ';'

            LINES TERMINATED BY '\n'

            (TIK_ID, TIK_DATE, TIK_ASK, TIK_BID, TIK_MEAN);";

        command.Parameters.AddWithValue("@filePath", Path.GetFullPath(filePath));

        await command.ExecuteNonQueryAsync();

    }

    catch (Exception ex)

    {

        Console.Error.WriteLine($"Error: {ex.Message}");

    }

    finally

    {

        var endTime = DateTime.Now;

        Console.WriteLine($"Elapsed time: {(endTime - startTime).TotalMilliseconds} ms");

    }

}

Benefits and Drawbacks of the File-Based Approach

Advantages:

Massive Performance Gains: Writing to a file and importing in bulk significantly reduces the time and resources needed compared to individual SQL inserts.
Simplified Operations: Bulk imports bypass the overhead of repeated database communication layers.

Disadvantages:

Delayed Data Availability: Since data resides in the text file before being imported, there’s a slight delay (often measured in minutes) during which the data is unavailable for queries.
Increased Management Complexity: Additional code is needed to handle file management, ensure successful imports, and manage error handling.

Why the File Approach is Faster

The performance advantage lies in avoiding the repeated overhead of executing individual SQL commands. Each SQL statement must traverse multiple layers—application framework, database drivers, and network communication—before being processed by the database. By contrast, bulk import tools write directly to the database, eliminating this overhead.

Conclusion

While APIs and direct SQL commands are often the default choice, they’re not always the optimal solution for high-throughput scenarios. As demonstrated here, leveraging a text file as an intermediary can deliver substantial performance improvements, albeit with trade-offs in real-time availability and added complexity. Carefully evaluating your system’s requirements will help determine the best approach for your specific use case.