Introduction
When working with Excel frequently while coding in C#, the smart direction is to pay for a quality library which supports all Excel formats even if not needed as there may be a time a format is needed.
While Excel operations are infrequent the choices are code yourself without a library which means if Microsoft makes breaking changes the developer must figure out how to fix code to handle the breaking changes.
In this article learn how to read sheets, write to sheets, update cells. For the code samples ExcelMapper and SpreadSheetLight NuGet packages are used where ExcelMapper does the core work.
Data operations are performed by Microsoft Entity Framework Core 8.
Data validation
After reading data from Excel with the intention to save the Excel data to a database, in some cases it would be prudent to validate the incoming data. For example, several columns in a worksheet have unacceptable characters as shown below.
To handle unacceptable characters, create a model to represent the WorkSheet.
public partial class Products : INotifyPropertyChanged
{
[NotMapped]
public int RowIndex { get; set; }
public int Id { get; set; }
public int ProductID { get; set; }
public string ProductName { get; set; }
public string CategoryName { get; set; }
public int? SupplierID { get; set; }
public string Supplier { get; set; }
public int? CategoryID { get; set; }
public string QuantityPerUnit { get; set; }
public decimal UnitPrice { get; set; }
public int? UnitsOnOrder { get; set; }
public int? ReorderLevel { get; set; }
public event PropertyChangedEventHandler? PropertyChanged;
protected virtual void OnPropertyChanged([CallerMemberName] string propertyName = null!)
{
PropertyChanged?.Invoke(this, new PropertyChangedEventArgs(propertyName));
}
}
Decide how to validate, for this we will use FluentValidation NuGet package.
For those new to FluentValidation see FluentValidation tips.
Since the validation needed for unacceptable characters is not handled natively by FluentValidation a custom extension is needed.
The following language extension method will be used in a FluentValidation extension method to follow.
public static class StringExtensions
{
/// <summary>
/// Used for validating a class string property is valid via FluentValidation
/// </summary>
/// <param name="text">Text to validate</param>
/// <returns>True if valid and false if invalid</returns>
/// <remarks>
/// What it considers if there are foreign characters in the string, allows spaces and numbers
/// </remarks>
public static bool IsOnlyAsciiLetters(this string text)
{
foreach (var item in text)
{
if (char.IsNumber(item))
{
continue;
}
switch (item)
{
case >= 'A' and <= 'Z':
case >= 'a' and <= 'z':
case ' ':
case '.':
case ',':
case '/':
case '\'':
case '&':
continue;
default:
return false;
}
}
return true;
}
}
FluentValidation extension method using the extension method above.
public static class RuleBuilderExtensions
{
public static IRuleBuilderOptions<T, string> NoNoneAsciiCharacters<T>(this IRuleBuilder<T, string> ruleBuilder)
=> ruleBuilder
.NotEmpty()
.Must(m => m.IsOnlyAsciiLetters())
.WithMessage("'{PropertyName}' is not valid");
}
Validator
public class ProductsValidator : AbstractValidator<Products>
{
public ProductsValidator()
{
RuleFor(p => p.ProductName).NoNoneAsciiCharacters();
RuleFor(p => p.CategoryName).NoNoneAsciiCharacters();
RuleFor(p => p.QuantityPerUnit).NotEmpty();
RuleFor(p => p.Supplier).NoNoneAsciiCharacters();
RuleFor(p => p.ProductID).GreaterThan(0);
}
}
Read and validate
Using the following method. Read data from Excel Iterate each row, create an instance of the validator, pass in the row data, see if the data is valid.
There are two list, one for valid data and one for invalid data. Invalidate data is saved to a database table which can be used to inspect, delete or fix.
The entire process is done in a Windows Forms project yet since there is limited code in the forms a developer can do the same in other project types.
Note
The entire section for validating is not covered in respect to saving bad data as the reader may have their own ideas.
internal class ImportOperations
{
public static async Task<(string badRecord, List<Products> badRecords, int saved, int rejected)> Validate(string fileName = "Products.xlsx")
{
ExcelMapper excel = new();
var products = (await excel.FetchAsync<Products>(fileName, nameof(Products))).ToList();
List<Products> goodList = [];
List<Products> badList = [];
StringBuilder builder = new();
int rejected = 0;
for (int index = 0; index < products.Count; index++)
{
var validator = new ProductsValidator();
var result = await validator.ValidateAsync(products[index]);
if (result.IsValid == false)
{
rejected++;
foreach (var error in result.Errors)
{
builder.AppendLine($"{index + 1,-10} {error.PropertyName,-30}{error.AttemptedValue}");
}
products[index].RowIndex = index + 1;
badList.Add(products[index]);
}
else
{
goodList.Add(products[index]);
}
}
var saved = 0;
if (goodList.Count > 0)
{
await using var context = new Context();
await context.Database.EnsureDeletedAsync();
await context.Database.EnsureCreatedAsync();
context.Products.AddRange(goodList);
saved = await context.SaveChangesAsync();
}
return (builder.ToString(), badList, saved, rejected);
}
}
Calling the above where the variable returned is deconstruction, for more on this see The art of Deconstructing.
var (badRecords, badList, saved, rejected) = await ImportOperations.Validate();
Screenshots
Important
All data read from Excel files in the code sample are tubular for 99% of the examples presented and the data is perfect, meaning that for instance if a column contains dates, all rows of that column have valid dates.
In the real world there will be imperfect data which means that a developer should always consider at least one cell has bad or invalid data. With that known there is no one method to checks for proper data. For each Excel file a developer must understand assertion must be performed such as, for example for a sheet containing customer data can first and last name be empty or that for a sheet of products, what to do if data is missing.
Example of checking if the fourth column contains valid dates. This is a hard coded sample which is provided in source code.
using SpreadsheetLight;
namespace ExcelMapperApp1.Classes;
internal class LightOperations
{
/// <summary>
/// For article to show an example to test if the person's birthdate can be read as a date
/// </summary>
/// <returns>
/// If there are issues, the list of rows with issues is returned
/// </returns>
public static (List<int> rows, bool hasIssues) Iterate()
{
List<int> list = [];
const string excelFile = "Nested1.xlsx";
const int columnIndex = 4;
using SLDocument document = new(excelFile);
var stats = document.GetWorksheetStatistics();
// skip header row
for (int rowIndex = 2; rowIndex < stats.EndRowIndex + 1; rowIndex++)
{
var date = document.GetCellValueAsDateTime(rowIndex, columnIndex);
if (date == new DateTime(1900,1,1))
{
list.Add(rowIndex);
}
}
return (list, list.Any());
}
}
Other considerations, does the file exists? Can the file be opened? Does the worksheet exists?
Always check if the file exists. Concerning if a file can be opened, if a user has the file open outside of the program and a developer’s code attempting to read the file, an exception is throw so wrap the code in a try/catch. Regarding sheet names, if they happen to changed, code will fail. Code has been provided to show how to check for sheet existence.
Read sheet data to a SQL-Server table
First check if the worksheet exists, if not return to caller.
Next, reset the SQL-Server database table to zero rows and reset identity.
Read worksheet data using ExcelMapper using the following model.
public partial class Customers
{
public int Id { get; set; }
public string Company { get; set; }
public string ContactType { get; set; }
public string ContactName { get; set; }
public string Country { get; set; }
public DateOnly JoinDate { get; set; }
public override string ToString() => Company;
}
To match this sheet.
Next, take the list returned and add to EF Core database. If an exception is thrown it is written to the console screen while for a real application the exception should be written to a log file using a package like SeriLog. See Serilog logging and EF Core logging.
public static async Task CustomersToDatabase()
{
PrintCyan();
const string excelFile = "Customers.xlsx";
if (SheetExists(excelFile, nameof(Customers)) == false)
{
AnsiConsole.MarkupLine($"[red]Sheet {nameof(Customers)} not found in {excelFile}[/]");
return;
}
try
{
DapperOperations operations = new();
operations.Reset();
ExcelMapper excel = new();
await using var context = new Context();
var customers = (await excel.FetchAsync<Customers>(excelFile, nameof(Customers))).ToList();
context.Customers.AddRange(customers);
var affected = await context.SaveChangesAsync();
AnsiConsole.MarkupLine(affected > 0 ? $"[cyan]Saved[/] [b]{affected}[/] [cyan]records[/]" : "[red]Failed[/]");
}
catch (Exception ex)
{
ex.ColorWithCyanFuchsia();
}
}
Read/write to secondary Excel file
In this example the idea is say business asked for a report and only a few columns are required.
Data is read using the following model.
public class Products
{
public int ProductID { get; set; }
public string ProductName { get; set; }
public string CategoryName { get; set; }
public int? SupplierID { get; set; }
public int? CategoryID { get; set; }
public string Supplier { get; set; }
public string QuantityPerUnit { get; set; }
public decimal? UnitPrice { get; set; }
public short? UnitsInStock { get; set; }
public short? UnitsOnOrder { get; set; }
public short? ReorderLevel { get; set; }
public override string ToString() => ProductName;
}
Then written to another Excel file using the following model.
public class ProductItem
{
public int ProductID { get; set; }
public string ProductName { get; set; }
public string CategoryName { get; set; }
public decimal? UnitPrice { get; set; }
}
Code first checks if the Excel file exists (as mentioned above), delete the file for a fresh start.
Next, read the worksheet into a list then create a second list with less properties than the first list.
Save the smaller list to a new file.
/// <summary>
/// Read products from Products.xlsx as list of <see cref="Products"/> then write to a new
/// file as <see cref="ProductItem"/> ProductsCopy.xlsx
/// </summary>
/// <returns></returns>
public static async Task ReadProductsCreateCopyWithLessProperties()
{
PrintCyan();
const string excelReadFile = "Products.xlsx";
const string excelWriteFile = "ProductsCopy.xlsx";
if (File.Exists(excelWriteFile))
{
try
{
File.Delete(excelWriteFile);
}
catch (Exception ex)
{
ex.ColorWithCyanFuchsia();
return;
}
}
ExcelMapper excel = new();
var products = (await excel.FetchAsync<Products>(excelReadFile,
nameof(Products))).ToList();
var productItems = products.Select(p => new ProductItem
{
ProductID = p.ProductID,
ProductName = p.ProductName,
CategoryName = p.CategoryName,
UnitPrice = p.UnitPrice
}).ToList();
await new ExcelMapper().SaveAsync("productsCopy.xlsx", productItems, "Products");
}
Read sheet - update properties and save
This example demonstrates reading a worksheet, making edits and removal of a row. Other than the fact edits are made, this code sample follows the same path as the example above.
/// <summary>
/// Read products from Products.xlsx as list of <see cref="Products"/> then update
/// several products and save to a new file ProductsOut.xlsx
/// </summary>
public static async Task ReadProductsAndUpdate()
{
PrintCyan();
const string excelReadFile = "Products.xlsx";
const string excelWriteFile = "ProductsOut.xlsx";
if (File.Exists(excelWriteFile))
{
try
{
File.Delete(excelWriteFile);
}
catch (Exception ex)
{
ex.ColorWithCyanFuchsia();
return;
}
}
ExcelMapper excel = new();
var products = excel.Fetch<Products>(excelReadFile, nameof(Products)).OrderBy(x => x.ProductName).ToList();
var p1 = products.FirstOrDefault(x => x.ProductName == "CÃ\u00b4te de Blaye");
if (p1 is not null)
{
p1.ProductName = "Cafe de Blave";
}
var p2 = products.FirstOrDefault(x => x.Supplier == "Aux joyeux ecclÃ\u00a9siastiques");
if (p2 is not null)
{
p2.Supplier = "Aux Joy";
}
var p3 = products.FirstOrDefault(x => x.ProductID == 48);
if (p3 is not null)
{
products.Remove(p3);
}
await excel.SaveAsync(excelWriteFile, products, "Products");
}
Read nested properties
This code sample has person details and their address on the same row and the task is to separate both.
Models
public class Person
{
public int Id { get; set; }
public string FirstName { get; set; }
public string LastName { get; set; }
public DateOnly BirthDate { get; set; }
public Address Address { get; set; }
public override string ToString() => $"{FirstName} {LastName} {Address}";
}
public class Address
{
public string Street { get; set; }
public string City { get; set; }
public string State { get; set; }
public string ZipCode { get; set; }
public override string ToString() => State;
}
ExcelMapper seamlessly figures out what to do with the models above.
public static async Task NestedReadPeople()
{
PrintCyan();
const string excelFile = "Nested.xlsx";
ExcelMapper excel = new();
var contactList = (await excel.FetchAsync<Person>(excelFile, "Contacts")).ToList();
AnsiConsole.MarkupLine(ObjectDumper.Dump(contactList)
.Replace("{Person}", "[cyan]{Person}[/]")
.Replace("Address:", "[cyan]Address:[/]"));
}
Below is the output using Nuget package ObjectDumper.NET which is great for examining results.
Header not the first row
A developer may need to read a WorkSheet where the header row is not the first row. ExcelMapper can handle this by telling which row is the header row.
This example has the header row at row 10, two nested records.
Models to read data.
public class Person
{
public int Id { get; set; }
[Column("First Name")]
public string FirstName { get; set; }
[Column("Last Name")]
public string LastName { get; set; }
[Column("Birth Date")]
public DateOnly BirthDate { get; set; }
public Address Address { get; set; }
public override string ToString() => $"{FirstName} {LastName} {BirthDate}";
}
public class Address
{
public int Id { get; set; }
public string Street { get; set; }
public string City { get; set; }
public string Zip { get; set; }
public override string ToString() => $"{Street} {City} {Zip}";
}
In this case to read from row 10, HeaderRowNumber indicates the header row and MinRowNumber (optional) how many rows to read.
ObjectDumper.Dump writes results to Visual Studio Output window.
public async Task HeaderNotAtFirstRow()
{
ExcelMapper excel = new()
{
HeaderRowNumber = 9,
MinRowNumber = 2
};
var people = (await excel.FetchAsync<Person>("ExcelFiles\\Header1.xlsx", "People")).ToList();
Debug.WriteLine(ObjectDumper.Dump(people));
}
Results
{Person}
Id: 0
FirstName: "Karen"
LastName: "Payne"
BirthDate: {DateOnly}
Year: 1956
Month: 9
Day: 24
DayOfWeek: DayOfWeek.Monday
DayOfYear: 268
DayNumber: 714315
Address: {Address}
Id: 0
Street: "111 Cherry Lane"
City: "Salem"
Zip: "98765"
{Person}
Id: 0
FirstName: "Mike"
LastName: "Smith"
BirthDate: {DateOnly}
Year: 1988
Month: 7
Day: 3
DayOfWeek: DayOfWeek.Sunday
DayOfYear: 185
DayNumber: 725920
Address: {Address}
Id: 0
Street: "222 Apple Way"
City: "Portland"
Zip: "22222"
Reading worksheet modify and save to database
Suppose the task is to read data from a Worksheet followed by filtering out data then save the filtered data to a database? The process is to use Excel Mapper to read a Worksheet, filter or perform validation using a lambda statement or FluidValidation NuGet package for instance then in this case save data to a SQL-Server database table using Microsoft EF Core.
The code which follows reads customer data and filters out any row where the country is Germany then uses EF Core to save the data.
Note
operations.Reset(); removes rows from the table and resets the primary key.
private static async Task ReadCustomersFromExcelToDatabase()
{
SpectreConsoleHelpers.PrintCyan();
try
{
DapperOperations operations = new();
operations.Reset();
const string excelFile = "Customers.xlsx";
ExcelMapper excel = new();
await using var context = new Context();
var customers = (
await excel.FetchAsync<Customers>(excelFile, nameof(Customers)))
.ToList();
var germanyItems =
customers.Where(c => c.Country == "Germany").ToArray();
foreach (var c in germanyItems)
customers.Remove(c);
context.Customers.AddRange(customers);
var affected = await context.SaveChangesAsync();
AnsiConsole.MarkupLine(affected > 0 ?
$"[cyan]Saved[/] [b]{affected}[/] [cyan]records[/]" :
"[red]Failed[/]");
}
catch (Exception ex)
{
ex.ColorWithCyanFuchsia();
}
}
Reading data to model with enum
ExcelMapper handles enumerations.
Example worksheet were WineType is an enum.
Sample code.
using Ganss.Excel;
using SampleApp6.Classes;
using static ObjectDumper;
namespace SampleApp6;
internal partial class Program
{
static void Main(string[] args)
{
AnsiConsole.MarkupLine("[yellow]Working with[/] [cyan]enum[/]");
var excel = new ExcelMapper("Products.xlsx");
var products = excel.Fetch<Wines>().ToList();
AnsiConsole.MarkupLine(Dump(products).Replace("WineType:", "[cyan]WineType:[/]"));
SpectreConsoleHelpers.ExitPrompt();
}
}
public class Wines
{
public int WineId { get; set; }
public string Name { get; set; }
public WineType WineType { get; set; }
}
public enum WineType
{
Red = 1,
White = 2,
Rose = 3
}
Results:
Before running provided code
See the following.
Summary
Code has been provided to read tubular data working with several NuGet packages were ExcelMapper performs most of the work along with Microsoft Entity Framework Core for database operations.
See ExcelMapper GitHub page for more e.g.
- Map to Excel files using header rows (column names) or column indexes (no header row)
- Optionally skip blank lines when reading
- Preserve formatting when saving back files
- Map formulas or formula results depending on property type
- Use records
- Provide custom object factories