Unifying Documentation and Provenance for AI and ML: A Developer’s Guide to Navigating the Chaos

Gorkem Ercan - Oct 17 - Dev Community

In the fast-paced, constantly evolving world of artificial intelligence (AI) and machine learning (ML), you might expect a well-defined standard for something as critical as model documentation. Yet the current reality falls far short of that expectation. While AI model documentation tools like Model Cards were meant to streamline accountability and transparency, we’ve instead landed in a fragmented space that lacks consistency.

What Is a Model Card?

A Model Card is a standardized documentation framework designed to provide essential information about a machine learning (ML) model, including its attributes, performance metrics, and ethical considerations. Model Cards help developers, researchers, and end-users better understand the model's intended use, its limitations, and any potential risks or biases associated with it. This documentation aims to improve transparency, accountability, and trust in AI and ML systems.

Key Information in a Model Card:

  • Model Overview: A description of the model, its architecture, and its intended use case.
  • Performance: Detailed metrics on how the model performs across different datasets, environments, or user demographics.
  • Ethical Considerations: Information on potential biases in the model and any fairness or safety concerns.
  • Training Data: Description of the data used to train the model, including its provenance, size, and any preprocessing steps.
  • Limitations: Clear details about where and how the model should not be used, including scenarios where it might fail.
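
To make those sections concrete, here is a minimal sketch of a model card expressed as plain Python data and serialized to JSON. The model name, metric values, and field names are hypothetical and illustrative only; they are not tied to any particular platform’s schema.

```python
import json

# A minimal, hypothetical model card expressed as plain Python data.
# Field names mirror the sections listed above; values are illustrative
# and not tied to any particular platform's schema.
model_card = {
    "model_overview": {
        "name": "sentiment-classifier-v1",  # hypothetical model
        "architecture": "DistilBERT fine-tuned for binary classification",
        "intended_use": "English product-review sentiment analysis",
    },
    "performance": {
        "metric": "accuracy",
        "in_domain_test_set": 0.91,      # illustrative numbers only
        "out_of_domain_reviews": 0.78,
    },
    "ethical_considerations": [
        "May under-perform on dialects under-represented in the training data",
    ],
    "training_data": {
        "source": "public product reviews",
        "size": "1.2M examples",
        "preprocessing": "lower-casing, deduplication",
    },
    "limitations": [
        "Not suitable for clinical or legal text",
    ],
}

print(json.dumps(model_card, indent=2))
```

Real model cards usually wrap this structured core in free-form prose, which is part of why the formats discussed below drift apart.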

Model Cards were introduced in a 2019 paper by Margaret Mitchell and her collaborators at Google AI. The idea emerged from the recognition that machine learning models, especially those deployed in real-world applications, often have far-reaching ethical and societal implications. Without clear and transparent documentation, these models can be misused or misunderstood, potentially leading to harmful outcomes, such as biased predictions or unfair decision-making processes.

The paper proposed Model Cards as a way to address these challenges by offering a standardized and accessible format for documenting models. It drew inspiration from nutrition labels, which give consumers clear and consistent information about the contents of food products. Similarly, Model Cards are intended to serve as "nutrition labels" for ML models, presenting critical details in a standardized and understandable format. In reality, it’s a bit more complex.

The Model Card Maze

Model Cards, in theory, are straightforward. They’re designed to offer clear, standardized documentation on the attributes, performance, and ethical considerations of machine learning models. The idea behind them is solid: a one-size-fits-all tool for explaining how models work and what their implications are.

However, in practice, Model Cards have taken on multiple forms:

  • HuggingFace uses YAML frontmatter and Markdown for its Model Cards.
  • AWS SageMaker employs a JSON schema.
  • VerifyML has its own unique spin on the format.
  • Google? They follow a different JSON schema entirely.

And that’s not even touching on the original Model Card proposal from the foundational paper. The variation across platforms is not just about different tastes or minor tweaks. These differences reflect deeper structural and intent-based divergences. HuggingFace's Markdown-driven simplicity is very different from SageMaker's JSON schema-based precision, and that disparity matters. Developers trying to adhere to best practices for AI accountability are left grappling with a lack of coherence.
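
To make the divergence concrete, here is a small, hedged sketch that renders one set of metadata both as HuggingFace-style YAML frontmatter with Markdown and as a single JSON document in the spirit of SageMaker’s approach. The field names are illustrative and are not taken from either platform’s official specification.

```python
import json

# Illustrative metadata; field names are hypothetical, not an official schema.
metadata = {
    "license": "apache-2.0",
    "language": "en",
    "tags": ["text-classification"],
}

# HuggingFace-style: YAML frontmatter followed by free-form Markdown prose.
# json.dumps produces valid YAML for these simple scalar and list values.
frontmatter = "\n".join(f"{key}: {json.dumps(value)}" for key, value in metadata.items())
hf_style_card = f"---\n{frontmatter}\n---\n\n# Model Card\n\nIntended use, limitations, evaluation...\n"

# SageMaker-style: the entire card as one structured JSON document.
json_style_card = json.dumps({"model_card": metadata}, indent=2)

print(hf_style_card)
print(json_style_card)
```

Even in this toy example, tooling written for one shape cannot read the other without a translation layer, which is the practical cost of the fragmentation described above.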

Model Cards Are More Than Just Documentation

These differences aren’t just a matter of aesthetics. Model Cards play a critical role in demonstrating compliance with a growing web of AI regulations, standards, and frameworks, including:

  • The EU AI Act
  • The NIST AI Risk Management Framework (RMF)
  • ISO 42001

All of these call for robust documentation, and without a unified standard, developers are left to navigate a growing regulatory minefield without clear guidance. The result? Increased risk of non-compliance and, potentially, the perpetuation of biased or unsafe AI systems.

SBOMs: A Glimmer of Hope for Standardization

But all is not lost. Amid the chaos, there’s a promising development: Software Bill of Materials (SBOM) formats like SPDX 3.0 and CycloneDX. While not originally created for AI, these formats have started to incorporate AI models and datasets. This is a crucial step forward: SBOMs are a logical way to provide the standardization that Model Cards currently lack, and they are already commonplace in software development practices.

Why SBOMs Matter for AI

  • Comprehensive Coverage: SBOMs can include both models and data, giving developers a more complete view of their AI systems.
  • Standardization: With a unified format like SPDX 3.0 or CycloneDX, we could bridge the gap left by the fragmented Model Card landscape.
  • Provenance and Trust: SBOMs offer a way to trace the lineage of AI models—what they do, where they came from, how they were trained, and under what conditions they should be used.
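
As a rough illustration of that last point (provenance in a standard, machine-readable inventory), the sketch below assembles a CycloneDX-flavored BOM entry for a hypothetical model. Recent CycloneDX versions added a machine-learning component type with model card fields, but the exact field names used here are approximations for illustration and should be checked against the published schema.

```python
import json

# A hypothetical, CycloneDX-flavored ML-BOM entry. Field names approximate
# the spec's modelCard structure and should be verified against the real
# CycloneDX JSON schema before use.
ml_bom = {
    "bomFormat": "CycloneDX",
    "specVersion": "1.5",
    "components": [
        {
            "type": "machine-learning-model",
            "name": "sentiment-classifier-v1",  # hypothetical model
            "version": "1.0.0",
            "modelCard": {
                "modelParameters": {
                    "task": "text-classification",
                    "architectureFamily": "transformer",
                    "datasets": [{"type": "dataset", "name": "public-product-reviews"}],
                },
                "considerations": {
                    "ethicalConsiderations": [
                        {"name": "dialect bias", "mitigationStrategy": "targeted evaluation"}
                    ]
                },
            },
        }
    ],
}

print(json.dumps(ml_bom, indent=2))
```

The detail that matters is not the exact schema but that the model, its training data, and its considerations sit in one machine-readable document next to the rest of the software inventory.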

A Path Forward

The inclusion of AI models in SBOM standards like SPDX 3.0 and CycloneDX is a critical advancement. If these formats gain widespread adoption, they could provide the transparency and accountability that the AI industry so desperately needs. This isn’t just about technical improvements—embracing SBOMs is a moral imperative to ensure that AI is developed and deployed ethically and transparently.

In the end, the future of AI documentation depends on our ability to standardize and unify our approaches. It’s time for the industry to rally around SBOMs and adopt standards like SPDX 3.0 and CycloneDX, before the lack of coherence in documentation leads us down a risky path.

Let’s not wait for regulations, like the EU AI Act, to force our hand. The time to act is now.
