Source-generated RegEx (C#)

Karen Payne - Aug 24 - Dev Community

Introduction

Learn about source generated (.NET 7 and higher) regular expressions to improve performance and documentation.

  • Rather than a long article with benchmarks, this article is kept short; the explanations and the accompanying source code provide plenty to learn from.

  • Creating a generated regular expression is a manual process, although JetBrains ReSharper, when installed, assists with creation and even recommends useful names.

Source code (NET8)

The source project contains two classes with several helpful examples to learn from.

  • Credit card masking.
  • Remove extra spaces from strings.
  • Conditionally extracting parts from hyperlinks.
  • Uncommon date extraction from a string with multiple dates.
  • Method to increment alphanumeric strings.

Note
There is a secondary project below with source code.

Performance

Rather than repeat a great resource, see Regular Expression Improvements in .NET 7 by Stephen Toub - MSFT.
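
For orientation only, the two approaches being compared can be sketched as follows. DigitsSketch and its members are hypothetical names, not part of the article's source, and the snippet assumes a .NET 7 or later project.

```csharp
using System;
using System.Text.RegularExpressions;

// Both expressions use the same pattern and give the same answer.
Console.WriteLine(DigitsSketch.Conventional.IsMatch("abc123")); // prints "True"
Console.WriteLine(DigitsSketch.Generated().IsMatch("abc123"));  // prints "True"

// Hypothetical class for illustration only.
public static partial class DigitsSketch
{
    // Conventional: the pattern is parsed and compiled at run time.
    public static readonly Regex Conventional = new(@"\d+", RegexOptions.Compiled);

    // Source-generated: the matching code is emitted as C# at build time.
    [GeneratedRegex(@"\d+")]
    public static partial Regex Generated();
}
```

The calling code is the same in both cases; only the declaration changes.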

Implementation

To use source-generated regular expressions, create a static partial class and apply the GeneratedRegex attribute to a partial method that returns Regex.

A simple example which proper-cases a string.

using System.Text.RegularExpressions;

namespace GeneratedRegexSamplesApp.Classes;
public static partial class Helpers
{
    public static string ProperCased(this string source)
        => SentenceCaseRegex()
            .Replace(source.ToLower(), s => s.Value.ToUpper());

    [GeneratedRegex(@"(^[a-z])|\.\s+(.)", RegexOptions.ExplicitCapture)]
    private static partial Regex SentenceCaseRegex();
}

After adding the class to a project, SentenceCaseRegex() will show a red squiggly underline until the project is built.

Once the project is built, the generated source code can be viewed under Dependencies ➡️ Analyzers.

Shows RegexGenerator.g.cs
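
With the project built, the extension method can be called like any other. The following is a minimal sketch that repeats the Helpers class (without its namespace) so it compiles on its own; the sample strings are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// The first letter of the string and the first letter after ". " are upper-cased.
Console.WriteLine("HELLO WORLD. GOOD DAY.".ProperCased());
// prints "Hello world. Good day."
Console.WriteLine("one sentence only".ProperCased());
// prints "One sentence only"

// Copy of the Helpers class shown earlier, repeated so the sketch is self-contained.
public static partial class Helpers
{
    public static string ProperCased(this string source)
        => SentenceCaseRegex()
            .Replace(source.ToLower(), s => s.Value.ToUpper());

    [GeneratedRegex(@"(^[a-z])|\.\s+(.)", RegexOptions.ExplicitCapture)]
    private static partial Regex SentenceCaseRegex();
}
```

Note that the pattern only treats a period followed by whitespace as a sentence boundary.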

Documentation

Source generation has a bonus: XML documentation of the regular expression pattern, which helps in two ways. First, a developer who did not write the pattern gets an explanation of it. Second, when the expression lives in a library, the documentation helps developers decide whether a method using that pattern fits their needs.

To see the documentation, hover over the implementation or the method as shown below.

Shows the explanation for the regular expression

Important: the generated documentation for the pattern does not remove the need to document the method that uses the regular expression.

A perfect example is a method to determine if a social security number is valid, where the social security number is passed with dashes.

SSN validation

Hovering over the method shows the following, which is correct but does not explain the why.

Shows XML documentation

In this case the developer needs to explain the why (the rules exist to help prevent fraud), as shown below.

/// <summary>
/// Is a valid SSN
/// </summary>
/// <returns>True if valid, false if invalid SSN</returns>
/// <remarks>
/// 
/// Guaranteed to never be an empty string or null, client code handles this. 
/// 
/// ^                                       #Start of expression
/// (?!\b(\d)\1+-(\d)\1+-(\d)\1+\b)         #Don't allow all matching digits for every field
/// (?!123-45-6789|219-09-9999|078-05-1120) #Don't allow "123-45-6789", "219-09-9999" or "078-05-1120"
/// (?!666|000|9\d{2})\d{3}                 #Don't allow the SSN to begin with 666, 000 or anything between 900-999
/// -                                       #A dash (separating Area and Group numbers)
/// (?!00)\d{2}                             #Don't allow the Group Number to be "00"
/// -                                       #Another dash (separating Group and Serial numbers)
/// (?!0{4})\d{4}                           #Don't allow last four digits to be "0000"
/// $                                       #End of expression
/// </remarks>
public static bool IsValidSocialSecurityNumber(string value) => SSNValidationRegex().IsMatch(value.Replace("-", ""));
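
The pattern behind SSNValidationRegex() is not shown here. Since the method strips dashes before matching, one plausible form is a dash-free adaptation of the rules listed in the remarks. The sketch below is an assumption along those lines, with a hypothetical class name SsnSketch, and is not the project's actual pattern.

```csharp
using System;
using System.Text.RegularExpressions;

Console.WriteLine(SsnSketch.IsValidSocialSecurityNumber("001-01-0001")); // prints "True"
Console.WriteLine(SsnSketch.IsValidSocialSecurityNumber("078-05-1120")); // prints "False" (listed number)
Console.WriteLine(SsnSketch.IsValidSocialSecurityNumber("111-11-1111")); // prints "False" (one repeated digit)
Console.WriteLine(SsnSketch.IsValidSocialSecurityNumber("666-12-3456")); // prints "False" (area 666)
Console.WriteLine(SsnSketch.IsValidSocialSecurityNumber("123-45-0000")); // prints "False" (serial 0000)

// Hypothetical class; the pattern is an assumed dash-free variant of the documented rules.
public static partial class SsnSketch
{
    [GeneratedRegex(@"^(?!\b(\d)\1+\b)(?!123456789|219099999|078051120)(?!666|000|9\d{2})\d{3}(?!00)\d{2}(?!0{4})\d{4}$")]
    private static partial Regex SSNValidationRegex();

    // Dashes are removed first, so the pattern only ever sees digits.
    public static bool IsValidSocialSecurityNumber(string value)
        => SSNValidationRegex().IsMatch(value.Replace("-", ""));
}
```

Because the dashes are removed before matching, this version also accepts input that was passed without dashes.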

Separation of GeneratedRegex

Consider placing GeneratedRegex declarations in a separate file if there are many. For instance, given a class named StringExtensions in StringExtensions.cs, create a new file named GeneratedRegularExpressions.cs and change the class name in it to StringExtensions. Both files must declare the class as partial.

Example for GeneratedRegularExpressions.cs

Project code

using System.Text.RegularExpressions;

public static partial class StringExtensions
{
    [GeneratedRegex(@"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[^\da-zA-Z]).{8,}$")]
    private static partial Regex PasswordRegEx();

    [GeneratedRegex(@"([A-Z][a-z]+)")]
    private static partial Regex CaseRegEx();

    [GeneratedRegex("[0-9]+$")]
    private static partial Regex NumericSuffixRegEx();

    [GeneratedRegex("[0-9][0-9 ]{13,}[0-9]")]
    private static partial Regex CreditCardMaskRegEx();
}
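
To get a feel for what each pattern matches before reading the methods that use them, they can be exercised directly. This is a minimal sketch: PatternSketch is a hypothetical class, the generated methods are made internal purely so the top-level statements can reach them, and the sample inputs are invented.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Password: needs a lowercase letter, an uppercase letter, a digit, a symbol, and 8+ characters.
Console.WriteLine(PatternSketch.PasswordRegEx().IsMatch("Passw0rd!")); // prints "True"
Console.WriteLine(PatternSketch.PasswordRegEx().IsMatch("password"));  // prints "False"

// Case: each match is an uppercase letter followed by lowercase letters.
var words = PatternSketch.CaseRegEx().Matches("FirstNameLastName").Select(m => m.Value);
Console.WriteLine(string.Join(" ", words)); // prints "First Name Last Name"

// Numeric suffix: the run of digits at the end of the string.
Console.WriteLine(PatternSketch.NumericSuffixRegEx().Match("ABC009").Value); // prints "009"

// Credit card: a run of at least 15 digits and spaces that starts and ends with a digit.
Console.WriteLine(PatternSketch.CreditCardMaskRegEx().Match("Card: 4111 1111 1111 1111 exp").Value);
// prints "4111 1111 1111 1111"

// Hypothetical class repeating the article's patterns for illustration.
public static partial class PatternSketch
{
    [GeneratedRegex(@"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[^\da-zA-Z]).{8,}$")]
    internal static partial Regex PasswordRegEx();

    [GeneratedRegex(@"([A-Z][a-z]+)")]
    internal static partial Regex CaseRegEx();

    [GeneratedRegex("[0-9]+$")]
    internal static partial Regex NumericSuffixRegEx();

    [GeneratedRegex("[0-9][0-9 ]{13,}[0-9]")]
    internal static partial Regex CreditCardMaskRegEx();
}
```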

The main code, in StringExtensions.cs, is shown below.

using System.Diagnostics;
using System.Linq;

public static partial class StringExtensions
{
    /// <summary>
    /// Splits the given string into separate words based on the case of the letters.
    /// </summary>
    /// <param name="sender">The string to split.</param>
    /// <returns>A new string with the words separated by spaces.</returns>
    [DebuggerStepThrough]
    public static string SplitCase(this string sender) =>
        string.Join(" ", CaseRegEx().Matches(sender)
            .Select(m => m.Value));

    /// <summary>
    /// Validates the password based on a regular expression pattern.
    /// </summary>
    /// <param name="password">The password to validate.</param>
    /// <returns>True if the password is valid, otherwise false.</returns>
    [DebuggerStepThrough]
    public static bool ValidatePassword(this string password)
        => PasswordRegEx().IsMatch(password);


    /// <summary>
    /// Gets the next value by incrementing the numeric suffix in the given string.
    /// </summary>
    /// <param name="sender">The string to get the next value from.</param>
    /// <returns>The next value with the numeric suffix incremented.</returns>
    [DebuggerStepThrough]
    public static string NextValue(string sender)
    {
        var value = NumericSuffixRegEx().Match(sender).Value;
        return sender[..^value.Length] + (long.Parse(value) + 1)
            .ToString().PadLeft(value.Length, '0');
    }

    /// <summary>
    /// Masks the credit card number in the given string by replacing the digits with a specified mask character.
    /// </summary>
    /// <param name="sender">The string containing the credit card number.</param>
    /// <param name="maskCharacter">The character used as a mask. Default is 'X'.</param>
    /// <returns>A new string with the credit card number masked.</returns>
    public static string MaskCreditCardNumber(this string sender, char maskCharacter = 'X')
    {
        if (string.IsNullOrEmpty(sender))
        {
            return sender;
        }

        return CreditCardMaskRegEx().Replace(sender, match =>
        {
            var digits = string.Concat(match.Value.Where(char.IsDigit));

            return digits.Length is 16 or 15
                ? new string(maskCharacter, digits.Length - 4) + digits[^4..]
                : match.Value;
        });
    }
}
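
One thing to keep in mind with NextValue as written: when the input has no numeric suffix, the match value is an empty string and long.Parse throws a FormatException. Where that can happen, a guarded variant is one option. The sketch below is an assumption, with hypothetical names SuffixSketch and TryNextValue, and is not part of the article's project.

```csharp
using System;
using System.Text.RegularExpressions;

Console.WriteLine(SuffixSketch.TryNextValue("INV-0099", out var next) ? next : "(no suffix)");
// prints "INV-0100"
Console.WriteLine(SuffixSketch.TryNextValue("INV", out next) ? next : "(no suffix)");
// prints "(no suffix)"

// Hypothetical class for illustration only.
public static partial class SuffixSketch
{
    [GeneratedRegex("[0-9]+$")]
    private static partial Regex NumericSuffixRegEx();

    // Returns false, leaving result equal to the input, when there is no
    // numeric suffix or the suffix does not fit in a long.
    public static bool TryNextValue(string sender, out string result)
    {
        result = sender;
        var match = NumericSuffixRegEx().Match(sender);
        if (!match.Success || !long.TryParse(match.Value, out var number))
        {
            return false;
        }

        // Keep the prefix, increment the suffix, and preserve its zero padding.
        result = sender[..match.Index]
                 + (number + 1).ToString().PadLeft(match.Length, '0');
        return true;
    }
}
```

A null argument is still not handled here; as with the SSN method, that is left to the calling code.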

Summary

Information and code samples have been provided to show how to implement source generation for regular expressions (RegEx), which can perform better than conventionally constructed regular expressions, with the bonus of XML documentation.
