Introduction
Learn how source generated regular expressions (.NET 7 and higher) improve performance and provide documentation.
Rather than a long article with benchmarks, this one is kept short: the information and source code provided are enough to learn from.
Creating them is a manual process, although JetBrains ReSharper, if installed, assists with creation and even recommends useful names.
The source project contains two classes with several helpful examples to learn from.
- Credit card masking.
- Remove extra spaces from strings.
- Conditionally extracting parts from hyperlinks.
- Uncommon date extraction from a string with multiple dates.
- A method to increment alphanumeric strings.
Note
There is a secondary project below with source code.
Note
Some code did not format correctly in the last section, so a link is provided instead.
Performance
Rather than repeat a great resource, see Regular Expression Improvements in .NET 7 by Stephen Toub - MSFT.
Implementation
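To use source generated regular expressions, apply the GeneratedRegex attribute to a partial method that returns Regex, declared inside a partial class (the class below is also static because it holds an extension method).
A simple example that capitalizes the first letter of each sentence in a string: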
using System.Text.RegularExpressions;

namespace GeneratedRegexSamplesApp.Classes;

public static partial class Helpers
{
    public static string ProperCased(this string source)
        => SentenceCaseRegex()
            .Replace(source.ToLower(), s => s.Value.ToUpper());

    [GeneratedRegex(@"(^[a-z])|\.\s+(.)", RegexOptions.ExplicitCapture)]
    private static partial Regex SentenceCaseRegex();
}
After adding the class to a project, SentenceCaseRegex() will show a red squiggly underline until the project is built.
Once the project is built, the generated source code can be viewed under Dependencies ➡️ Analyzers.
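To see the helper in use, the following is a minimal sketch of a console program, assuming a .NET 7 or later project with top-level statements. The namespace is omitted so everything fits in one file, and the runtime-constructed Regex is included only for contrast with the conventional approach; the sample input is made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

const string input = "hello world. this is a test.";

// Source generated version: the matching code is emitted when the project is built.
Console.WriteLine(input.ProperCased()); // Hello world. This is a test.

// Conventional version: the same pattern is parsed and compiled at run time.
var runtimeRegex = new Regex(@"(^[a-z])|\.\s+(.)",
    RegexOptions.ExplicitCapture | RegexOptions.Compiled);
Console.WriteLine(runtimeRegex.Replace(input.ToLower(), m => m.Value.ToUpper())); // Hello world. This is a test.

// Same class as in the article, without the namespace.
public static partial class Helpers
{
    public static string ProperCased(this string source)
        => SentenceCaseRegex()
            .Replace(source.ToLower(), s => s.Value.ToUpper());

    [GeneratedRegex(@"(^[a-z])|\.\s+(.)", RegexOptions.ExplicitCapture)]
    private static partial Regex SentenceCaseRegex();
}
```

Both calls produce the same text; the difference is when the work of building the matcher happens.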
Documentation
Source generation has a bonus: XML documentation of the regular expression pattern, which helps in two ways. First, if a developer did not write the pattern, the documentation helps explain it. Second, when the expression lives in a library, it helps developers decide whether the method using that pattern fits their needs.
To see the documentation, hover over the implementation or the method as shown below.
Important: the generated documentation does not remove the need to document the method that uses the regular expression.
A perfect example is a method that determines whether a social security number is valid, where the social security number is passed with dashes.
Hovering over the method shows the following, which is correct but does not explain the why.
In this case the developer should explain the why behind each rule, which is to prevent fraud, as shown below.
/// <summary>
/// Is a valid SSN
/// </summary>
/// <returns>True if valid, false if invalid SSN</returns>
/// <remarks>
///
/// Guaranteed to never be an empty string or null, client code handles this.
///
/// ^ #Start of expression
/// (?!\b(\d)\1+-(\d)\1+-(\d)\1+\b) #Don't allow all matching digits for every field
/// (?!123-45-6789|219-09-9999|078-05-1120) #Don't allow "123-45-6789", "219-09-9999" or "078-05-1120"
/// (?!666|000|9\d{2})\d{3} #Don't allow the SSN to begin with 666, 000 or anything between 900-999
/// - #A dash (separating Area and Group numbers)
/// (?!00)\d{2} #Don't allow the Group Number to be "00"
/// - #Another dash (separating Group and Serial numbers)
/// (?!0{4})\d{4} #Don't allow last four digits to be "0000"
/// $ #End of expression
/// </remarks>
public static bool IsValidSocialSecurityNumber(string value) => SSNValidationRegex().IsMatch(value.Replace("-", ""));
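The pattern behind SSNValidationRegex() is not shown here, and the method above strips dashes before matching, while the pattern in the remarks expects them. As a hedged illustration only, the sketch below assumes a variant that applies the documented pattern directly to dashed input. SsnSketch, IsValidDashedSsn and DashedSsnRegex are hypothetical names, and the sample values are made up.

```csharp
using System;
using System.Text.RegularExpressions;

Console.WriteLine(SsnSketch.IsValidDashedSsn("123-45-6780")); // True
Console.WriteLine(SsnSketch.IsValidDashedSsn("078-05-1120")); // False (explicitly excluded)
Console.WriteLine(SsnSketch.IsValidDashedSsn("000-12-3456")); // False (area number 000)
Console.WriteLine(SsnSketch.IsValidDashedSsn("123456780"));   // False (dashes required)

// Hypothetical class used only for this sketch.
public static partial class SsnSketch
{
    // Dashes are kept in the input because the pattern itself expects them.
    public static bool IsValidDashedSsn(string value) => DashedSsnRegex().IsMatch(value);

    // The pattern from the remarks, joined into a single line.
    [GeneratedRegex(@"^(?!\b(\d)\1+-(\d)\1+-(\d)\1+\b)(?!123-45-6789|219-09-9999|078-05-1120)(?!666|000|9\d{2})\d{3}-(?!00)\d{2}-(?!0{4})\d{4}$")]
    private static partial Regex DashedSsnRegex();
}
```

Whichever form the real pattern takes, the pattern and any normalization of the input (such as removing dashes) need to agree, and the remarks should describe the pattern that is actually used.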
Separation of GeneratedRegex
Consider placing the GeneratedRegex methods in their own file if there are many. For instance, given a class named StringExtensions in StringExtensions.cs, create a new file named GeneratedRegularExpressions.cs and change the class name in it to StringExtensions. Both files must declare the class as partial so the compiler merges them into one class.
Example for GeneratedRegularExpressions.cs
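using System.Text.RegularExpressions;

public static partial class StringExtensions
{
    // At least one lowercase, one uppercase, one digit, one symbol, 8+ characters
    [GeneratedRegex(@"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[^\da-zA-Z]).{8,}$")]
    private static partial Regex PasswordRegEx();

    // An uppercase letter followed by one or more lowercase letters
    [GeneratedRegex(@"([A-Z][a-z]+)")]
    private static partial Regex CaseRegEx();

    // One or more digits at the end of the string
    [GeneratedRegex("[0-9]+$")]
    private static partial Regex NumericSuffixRegEx();

    // A run of 15 or more digits and spaces that starts and ends with a digit
    [GeneratedRegex("[0-9][0-9 ]{13,}[0-9]")]
    private static partial Regex CreditCardMaskRegEx();
}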
The main code in StringExtensions.cs then calls these methods, as shown here.
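Since the linked code is not reproduced in this article, the following is only a sketch of how the public half of the class might call three of the generated methods. IsStrongPassword, SplitCamelCase and NextValue are hypothetical names and implementations, not the source project's code, and both partial declarations are placed in one file here only so the example is self-contained.

```csharp
using System;
using System.Text.RegularExpressions;

Console.WriteLine("Pa$$w0rd".IsStrongPassword());        // True
Console.WriteLine("FirstNameLastName".SplitCamelCase()); // First Name Last Name
Console.WriteLine("Invoice0099".NextValue());            // Invoice0100

// Stand-in for StringExtensions.cs: the public methods (hypothetical).
public static partial class StringExtensions
{
    public static bool IsStrongPassword(this string value)
        => PasswordRegEx().IsMatch(value);

    // Appends a space after each capitalized word, then trims the trailing one.
    public static string SplitCamelCase(this string value)
        => CaseRegEx().Replace(value, "$1 ").TrimEnd();

    // Increments the numeric suffix while keeping its zero padding.
    public static string NextValue(this string value)
        => NumericSuffixRegEx().Replace(value, m =>
            (long.Parse(m.Value) + 1).ToString().PadLeft(m.Value.Length, '0'));
}

// Stand-in for GeneratedRegularExpressions.cs: the generated patterns.
public static partial class StringExtensions
{
    [GeneratedRegex(@"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[^\da-zA-Z]).{8,}$")]
    private static partial Regex PasswordRegEx();

    [GeneratedRegex(@"([A-Z][a-z]+)")]
    private static partial Regex CaseRegEx();

    [GeneratedRegex("[0-9]+$")]
    private static partial Regex NumericSuffixRegEx();
}
```

Keeping the patterns private and exposing only named methods means callers never need to read the regular expressions themselves.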
Summary
Information and code samples have been provided to show how to implement source generation for regular expressions (RegEx), which can perform better than conventional regular expressions and comes with the bonus of XML documentation.