Python Wheels vs Eggs (And How Data-Driven Decisions Must Become The Norm in Open-Source)

Avi Press - May 10 '22 - Dev Community

Originally posted on Hackernoon.

The world of software development is always evolving, and every now and then you reach a moment when the path forks ahead of you. As developers, we choose which path to take, but each choice carries consequences for the effectiveness and long-term impact of the code going forward.

Currently, we’re going through one of those moments when it comes to Python Eggs. A recent tweet from Dustin Ingram showed that eggs accounted for less than 1% of built distribution uploads in December 2021.

This data kickstarted a vibrant debate as to whether Python eggs should be deprecated. The open-source community can learn a lot from this conversation, so we thought we’d explore the lessons in this post.

Python Wheels vs Eggs

The .egg format for Python packages was first introduced in 2004 and served an important purpose for much of the time since. However, when Python Wheels were introduced in 2012, the weaknesses of the egg format became better known, such as the lack of support for clean uninstallations or upgrades, and the fact that only a single version of a project can be installed in any single directory.

Python Wheels provided meaningful improvements on these points, along with further improvements to the distribution format: a richer file naming convention, versioning, and better internal organization.
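To make the "richer file naming convention" concrete, here is a minimal sketch of how a wheel filename can be split into its fields. It assumes the layout described in PEP 427 (distribution, version, optional build tag, Python tag, ABI tag, platform tag, separated by hyphens). The helper names and the example filenames are hypothetical, and the sketch only inspects filenames; it does not validate the archive contents.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    """Fields encoded in a wheel filename, following the PEP 427 layout."""
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel filename into its hyphen-separated fields.

    Illustrative helper (not part of any library). Expects either five
    fields, or six when an optional build tag is present.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        name, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    return WheelName(name, version, build, py, abi, plat)


def distribution_kind(filename: str) -> str:
    """Classify a built distribution by file extension only."""
    if filename.endswith(".whl"):
        return "wheel"
    if filename.endswith(".egg"):
        return "egg"
    return "other"


if __name__ == "__main__":
    # Hypothetical filenames, used purely for illustration.
    print(parse_wheel_filename("example_pkg-1.0.0-py3-none-any.whl"))
    print(distribution_kind("example_pkg-1.0.0-py3.9.egg"))
```

The tags in the wheel name tell an installer which interpreter, ABI, and platform a file targets before it is downloaded or unpacked, which is part of what the richer naming convention buys.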

In light of this, many are debating whether deprecating uploads of new eggs is the right path to take.

How should developers make this decision?

The Importance of Data-Driven Decisions

As developers, we have limited time and resources with which to accomplish our objectives. Our efficiency relies a lot on prioritization: choosing what to spend time on and, more importantly, what not to spend time on. So, when we have to consider whether deprecating eggs is the right decision, we want the right data backing that decision up.

As a result, many have pointed to these upload statistics as a clear sign that Python eggs are no longer the force they once were. Indeed, this data enables a vastly more informed discussion of the decision than if it were absent.

However, that doesn’t tell the full story. Even if the proportion of uploads is immaterial, we still don’t have much clarity on how existing eggs are being used and relied on. Download data for eggs may exist somewhere, but it’s not generally available in a form that allows proper analysis.

This is indicative of a wide range of other key decisions that currently get made on the basis of anecdotal experience, gut intuition, and personal preferences. Without the data being readily accessible, we are impairing our ability to make informed and unbiased decisions.

As a community, we should be pushing for better visibility, so that we can serve users better and unlock the leverage that we have built up over time.

Introducing Scarf Gateway for Python

This data is crucial to our long-term decisions as developers because it helps us understand what aspects of our code are truly driving results for our users. It’s with this in mind that we’re excited to have launched Python support for Scarf Gateway, which will bring better visibility for any and all Python packages.

This is an important step toward securing the data that we need as an open-source community to make better decisions on all kinds of key matters, from infrastructure to packaging formats to vulnerability assessments and beyond.

For every decision point that you arrive at as a developer, having timely and accurate data on how your projects are being used helps you to sidestep your own biases and push towards the path that is making the largest impact for your key stakeholders.

Data-driven decision-making is how the open-source world can work smarter, not harder.
