Scroll Top
Create Expression Parser with Parlot
Industry Leading Active Noise Cancellation
Create Expression Parser with Parlot

In the landscape of modern software development, the ability to parse custom expressions and allow users to create their own formulas, akin to the functionality found in Microsoft Excel, represents a significant leap in usability, flexibility, and power for end-users. This article explores the benefits and uses of custom expression parsing in software applications, highlighting its impact on user experience and software versatility.

Custom expression parsing refers to the process where software interprets and executes user-defined commands or formulas. This feature is instrumental in applications that require mathematical computations, data analysis, and dynamic content generation. By integrating custom expression parsing, developers can offer users a more interactive and tailored experience, significantly expanding the software’s utility beyond its predefined functions.

One of the primary benefits of custom expression parsing is enhanced user empowerment. Users are not limited to a set of hard-coded operations or functions; instead, they can define their own calculations or logic to suit their specific needs. This capability is invaluable in fields such as finance, engineering, and data science, where custom formulas are often necessary to analyze complex datasets or perform specialized calculations.

The grammar of simple arithmetic expression would be like the following:

 

 Grammar:
 statement      => ((assignment | expression) ";")* ;
 assignment     => identifier "=" expression ;
 expression     => factor ( ( "-" | "+" ) factor )* ;
 factor         => unary ( ( "/" | "*" ) unary )* ;
 unary          => ( "-" ) unary
                 | primary ;
 primary        => NUMBER | identifier
                  | "(" expression ")" ;

This grammar outlines the structure of simple arithmetic expressions. It breaks down the parsing process into hierarchical components, from broader expressions to individual numerical values or nested expressions.

  • Statement: At the highest level, a statement consists of a series of assignments or expressions followed by semi-colon (;).
  • Assignment: An assignment is where a variable is created (updated) and assigned a value, it consists of an identifier followed by (=) followed by an expression.
  • Expression: An expression consists of factors that can be combined using addition (+) or subtraction (-) operators. An expression can contain multiple factors, separated by these operators.
  • Factor: A factor is formed by unaries, which are the basic units that can be multiplied (*) or divided (/). Factors are essentially the product or quotient of these unary elements.
  • Unary: A unary can either be a straightforward primary value or a negated unary, indicated by a preceding minus (-) sign. This structure allows for the expression of negative numbers or negated expressions.
  • Primary: The most basic element, a primary, is either a numerical value (NUMBER) or an expression enclosed in parentheses. This recursive definition enables the nesting of expressions, allowing for complex arithmetic calculations.

This hierarchical arrangement ensures that arithmetic operations adhere to the conventional order of operations, with parentheses dictating the precedence of sub-expressions, followed by multiplication and division, and finally, addition and subtraction.

This grammar makes sure that the numbers and the parenthesis are considered first, then the unary, then the multiplication or division, and lastly the addition or subtraction.

Developing a parser for this particular grammar from the ground up can be an overwhelming challenge. Therefore, leveraging an established and efficient library, such as Parlot, is a more advisable approach. 

Parlot

Parlot is a fast, lightweight and simple to use .NET parser combinator.
Parlot provides a fluent API based on parser combinators that provide a more readable grammar definition.

Fluent API

The Fluent API provides simple parser combinators that are assembled to express more complex expressions. The main goal of this API is to provide an easy-to-read grammar. Another advantage is that grammars are built at runtime, and they can be extended dynamically.

The goal is to swiftly and effortlessly design compact parsers for domain-specific expressions, sidestepping the extensive time and effort usually required to establish underlying infrastructure. Parlot stands out as the ideal instrument for this purpose. In our upcoming example, we will demonstrate how to both create and implement the previously mentioned grammar using merely two C# classes: one for initializing the parser and another for constructing the syntax tree.

Through the FluentParser class, we illustrate the simplicity with which the grammar can be translated into a sequence of Fluent API calls. This process facilitates the creation of a hierarchical assembly of parsers. These parsers are adept at analyzing the input and generating an Abstract Syntax Tree (AST), all while relying on the Expression class. This approach not only streamlines the development process but also enhances the efficiency and readability of the resulting parser architecture.

using Parlot.Fluent;
using static Parlot.Fluent.Parsers;

namespace Parlot.Example
{
    public class FluentParser
    {
        public static readonly Parser<BlockExpression> Statements;

        static FluentParser()
        {
            /*
             * Grammar:
             * statement      => ((assignment | expression) ";")* ;
             * assignment     => identifier "=" expression ;
             * expression     => factor ( ( "-" | "+" ) factor )* ;
             * factor         => unary ( ( "/" | "*" ) unary )* ;
             * unary          => ( "-" ) unary
             *                 | primary ;
             * primary        => NUMBER | identifier
             *                  | "(" expression ")" ;
            */

            // The Deferred helper creates a parser that can be referenced by others before it is defined
            var expression = Deferred<Expression>();

            var number = Terms.Decimal()
                .Then<Expression>(static d => new Number(d))
                ;

            var divided = Terms.Char('/');
            var times = Terms.Char('*');
            var minus = Terms.Char('-');
            var plus = Terms.Char('+');
            var openParen = Terms.Char('(');
            var closeParen = Terms.Char(')');
            var semiColon = Terms.Char(';');
            var assign = Terms.Char('=');
            var identifier = Terms.Identifier().Then<Expression>(t => new IdentifierExpr(t.ToString()));

            // "(" expression ")"
            var groupExpression = Between(openParen, expression, closeParen);

            // primary => NUMBER | identifier | "(" expression ")";
            var primary = OneOf(number, identifier, groupExpression);
 
            // The Recursive helper allows to create parsers that depend on themselves.
            // ( "-" ) unary | primary;
            var unary = Recursive<Expression>((u) =>
                minus.And(u)
                    .Then<Expression>(static x => new NegateExpression(x.Item2))
                    .Or(primary));

            // factor => unary ( ( "/" | "*" ) unary )* ;
            var factor = unary.And(ZeroOrMany(divided.Or(times).And(unary)))
                .Then(static x =>
                {
                    // unary
                    var result = x.Item1;

                    // (("/" | "*") unary ) *
                    foreach (var op in x.Item2)
                    {
                        result = op.Item1 switch
                        {
                            '/' => new Division(result, op.Item2),
                            '*' => new Multiplication(result, op.Item2),
                            _ => null
                        };
                    }

                    return result;
                });

            // expression => factor ( ( "-" | "+" ) factor )* ;
            expression.Parser = factor.And(ZeroOrMany(plus.Or(minus).And(factor)))
                .Then(static x =>
                {
                    // factor
                    var result = x.Item1;

                    // (("-" | "+") factor ) *
                    foreach (var op in x.Item2)
                    {
                        result = op.Item1 switch
                        {
                            '+' => new Addition(result, op.Item2),
                            '-' => new Subtraction(result, op.Item2),
                            _ => null
                        };
                    }

                    return result;
                });

            // assignment     => identifier "=" expression ; 
            var assignement = identifier.And(assign).And(expression).Then<Expression>(static x=>
            {
                return new AssignmentExpression(((IdentifierExpr)x.Item1).Id, x.Item3);
            });

            // statement      => ((assignment | expression) ";")* ;
            var block = ZeroOrMany(assignement.Or<Expression>(expression).And(semiColon)).Then(static x => 
            {
                return new BlockExpression( x.Select(a => a.Item1).ToList());
            });

            Statements = block;
        }
    }
}

The Expression file houses an array of compact classes, each symbolizing a component of the Abstract Syntax Tree (AST) crafted by the builder. These classes are pivotal in executing each segment of the code through the invocation of the Evaluate() method, seamlessly translating the AST into operational outcomes.

namespace Parlot.Example
{
    public abstract class Expression
    {
        public abstract decimal Evaluate();
    }

    public abstract class BinaryExpression : Expression
    {
        protected BinaryExpression(Expression left, Expression right)
        {
            Left = left;
            Right = right;
        }

        public Expression Left { get; }
        public Expression Right { get; }

    }

    public abstract class UnaryExpression : Expression
    {
        protected UnaryExpression(Expression inner)
        {
            Inner = inner;
        }

        public Expression Inner { get; }
    }

    public class NegateExpression : UnaryExpression
    {
        public NegateExpression(Expression inner) : base(inner)
        {
        }

        public override decimal Evaluate()
        {
            return -1 * Inner.Evaluate();
        }
    }


    public class Addition : BinaryExpression
    {
        public Addition(Expression left, Expression right) : base(left, right)
        {
        }

        public override decimal Evaluate()
        {
            return Left.Evaluate() + Right.Evaluate();
        }
    }

    public class Subtraction : BinaryExpression
    {
        public Subtraction(Expression left, Expression right) : base(left, right)
        {
        }

        public override decimal Evaluate()
        {
            return Left.Evaluate() - Right.Evaluate();
        }
    }


    public class Multiplication : BinaryExpression
    {
        public Multiplication(Expression left, Expression right) : base(left, right)
        {
        }

        public override decimal Evaluate()
        {
            return Left.Evaluate() * Right.Evaluate();
        }
    }


    public class Division : BinaryExpression
    {
        public Division(Expression left, Expression right) : base(left, right)
        {
        }

        public override decimal Evaluate()
        {
            return Left.Evaluate() / Right.Evaluate();
        }
    }

    public class Number : Expression
    {
        public Number(decimal value)
        {
            Value = value;
        }

        public decimal Value { get; }

        public override decimal Evaluate()
        {
            return Value;
        }
    }

    public class IdentifierExpr : Expression
    {
        public IdentifierExpr(String id)
        {
            Id = id;
            IdentifierRepo.PutIfMissing(Id, 0);
        }

        public String Id { get; }

        public override decimal Evaluate()
        {
            return IdentifierRepo.Get(Id);
        }
    }

    public class AssignmentExpression : Expression
    {
        public AssignmentExpression(String id, Expression expr)
        {
            Id = id;
            Expr = expr;
        }

        public String Id { get; }
        public Expression Expr { get; }

        public override decimal Evaluate()
        {
            var v = Expr.Evaluate();
            IdentifierRepo.Put(Id, v);
            return v;
        }
    }

    public class BlockExpression : Expression
    {
        private List<Expression> exprLst;
        public BlockExpression(List<Expression> exprLst)
        {
            this.exprLst = exprLst;
        }

        public Expression Expr { get; }

        public override decimal Evaluate()
        {
            decimal v = 0;
            foreach (var expr in exprLst)
            {
                v = expr.Evaluate();
            }
            return v;
        }
    }

    public static class IdentifierRepo{
        private static Dictionary<String, decimal> values = new();

        public static void Put(String id, decimal v) {
            values[id] = v;

        }

        public static decimal Get(String id)
        {
            return values[id];
        }
                
        internal static void PutIfMissing(string id, int v)
        {
            if (!values.ContainsKey(id))
            {
                Put(id, v);
            }
        }
    }
}

It’s important to highlight that this parser offers the capability to interact with the software’s internal variables, provided the developer opts to enable this feature. To achieve this, one simply needs to integrate variables or constants into the IdentifierRepo instance.
Consider, for example, a scenario where a user desires to incorporate formulas that rely on various factors such as product quantity, price, and dimensions. By adding these properties to the IdentifierRepo, it becomes possible for the end user to directly reference and manipulate them within their formulas, thereby significantly enhancing the flexibility and utility of the software for customized calculations.

To evaluate the effectiveness of this grammar, we’ll construct a string encompassing variable assignments alongside computational operations. This string is then parsed through the FluentParser, followed by the execution of the encapsulated code via the Expression.Evaluate() method, allowing us to observe the grammar’s performance in practical scenarios.

using Parlot.Example;

// add predefined varibales coming from the system or the database
IdentifierRepo.Put("availableQty", 10);
IdentifierRepo.Put("unitPrice", 10);

var code = """
    sold = availableQty / 2;
    newPrice = unitPrice * 1.5;
    total = sold * newPrice;
    totalWithDiscount = total - (total * 0.05);
    totalWithVAT = totalWithDiscount * 1.2;
    """;

FluentParser.Statements.TryParse(code, out var expression);
var v = expression.Evaluate();
Console.Write("result: ");
Console.WriteLine(v); // result is 85.5
Console.ReadLine();

Conclusion

So in conclusion, custom expression parsing facilitates a higher degree of flexibility and adaptability. Software that supports user-defined expressions can easily adapt to various contexts and requirements without the need for extensive modifications or updates from the developers. This gives the software the ability to accommodate new use cases and scenarios that were not initially anticipated.

The integration of custom expression parsing in software offers a myriad of benefits, from enhancing user empowerment and flexibility to facilitating collaboration and innovation. By allowing users to define their own formulas, software developers can create more versatile and adaptive applications that cater to a wide range of needs and preferences. Despite the technical challenges, the advantages of custom expression parsing make it a valuable feature in the development of dynamic and user-friendly software.