Live Blogging at DeveloperWeek: Promoting Reliable Code

Erik Dietrich - Feb 12 '20 - Dev Community

I'm live blogging my experience at DeveloperWeek, en masse, at my site. You can follow along there, or look for me to post some talk summaries here, individually.

This is a talk by Chen Harel, co-founder of OverOps.

He wants to suggest a few quality gates that those of us in attendance can take back to our engineering groups.  This is ultimately about preventing severity-one production defects.

Consider that speed and stability are naturally at odds when it comes to software development and deployment.  With an emphasis on time to market, the cost of poorly written software is actually growing, notwithstanding agile methodologies and increased awareness of the importance of software quality.

And here's a contextual linchpin for the talk:

"Speed is important. Quality is fundamental."

So how do organizations address this today?  Here are some ways:

  1. DevOps: "you build it, you run it," to increase accountability.
  2. A "shift left" approach to software quality -- bake quality concerns into the process earlier.

But how do we measure quality?

Measuring Code Quality

Measuring software quality is all about good data.  And we tend not to have that data readily at our disposal as much as we might want.

Here are some conventional sources of this type of data:

  • Static code analysis
  • Code coverage
  • Log files

But what about the following as a new approach?

  • New errors
  • Increasing errors
  • Slowdowns

Using these metrics, the OverOps team was able to create a composite means of scoring code quality.

A Scoring Process

So, let's look at the reliability score. Releases are scored for stability and safety based on these measures, which requires the following data-gathering activities (sketched in code after the list):

  • Detect all errors and slowdowns
  • Classify each detected event
  • Prioritize them by severity
  • Score the build
  • Block builds that score too low
  • And then, in retrospect, visualize the data
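
To make that flow concrete, here's a minimal sketch of how those activities might chain together. The function names, types, and threshold are my own placeholders, not OverOps' actual implementation.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Event:
    """A detected error or slowdown (illustrative fields only)."""
    name: str
    kind: str            # e.g. "error" or "slowdown"
    severity: int = 0    # filled in by the prioritization step

def detect_events(release: str) -> list[Event]:
    """Pull errors and slowdowns for a release from logs, APM, error tracking."""
    return []   # placeholder: wire this to your monitoring stack

def classify(event: Event) -> Event:
    """Attach ownership, type, affected services, and volume/rate."""
    return event

def prioritize(event: Event) -> Event:
    """Assign a severity based on the classification."""
    event.severity = 1
    return event

def score_release(events: list[Event]) -> int:
    """Roll individual severities up into a single reliability score."""
    return 100 - sum(e.severity for e in events)

def gate(release: str, threshold: int = 70) -> bool:
    """Score the release and decide whether to block the build."""
    events = [prioritize(classify(e)) for e in detect_events(release)]
    score = score_release(events)
    print(f"{release}: reliability score = {score}")
    return score >= threshold   # False would mean: block the build

if __name__ == "__main__":
    gate("build-123")
```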

To gather that data, consider the detection methods available.  Manually, teams have log files and metrics libraries.  But they can detect issues automatically using APM tools, log aggregators, and error tracking.

(Halfway through the talk, and I'm really enjoying this. As both a techie and business builder, gathering actionable data is a constant focus for me these days. I love the idea of measuring code quality both with leading and lagging indicators, and feeding both into a feedback loop.)

Classification

Now, let's drill a little further into the second of those activities: classification.  Ask questions like the following (one way to capture the answers is sketched after the list):

  • Is the dev group responsible?
  • What's the type or potential impact?
  • What dependent services are affected?
  • What are the volume and rate of this issue -- how many times did it happen, and in what proportion of calls?
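
As a rough illustration, the answers to those questions could live in a small record per event. These fields are my own guess at what such a classification might hold, not the speaker's schema.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Classification:
    """One classified error or slowdown event (illustrative fields only)."""
    owning_team: str              # is the dev group responsible?
    error_type: str               # e.g. "uncaught NullPointerException", "timeout"
    affected_services: list[str]  # dependent services impacted
    volume: int                   # how many times it occurred
    rate: float                   # occurrences per request

example = Classification(
    owning_team="payments",
    error_type="uncaught NullPointerException",
    affected_services=["checkout-api", "billing"],
    volume=42,
    rate=0.003,
)
print(example)
```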

Prioritizing

When it comes to prioritizing, we can think of new errors as errors that have never been observed before. A new error is severe if it's any of the following (a quick sketch follows the list):

  • Uncaught
  • A critical exception
  • Volume/rate exceeds a threshold
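
In code, that rule of thumb might look roughly like this. The threshold and the list of "critical" exception types are placeholders -- the talk didn't specify them.

```python
# Placeholder values; substitute whatever counts as critical in your system.
CRITICAL_TYPES = {"NullPointerException", "OutOfMemoryError"}
VOLUME_THRESHOLD = 100   # occurrences per hour

def new_error_is_severe(uncaught: bool, error_type: str, volume_per_hour: int) -> bool:
    """A new error is severe if any one of the three conditions holds."""
    return (
        uncaught
        or error_type in CRITICAL_TYPES
        or volume_per_hour > VOLUME_THRESHOLD
    )

print(new_error_is_severe(uncaught=False,
                          error_type="NullPointerException",
                          volume_per_hour=3))   # True: critical exception type
```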

There's also the idea of increasing errors, which can help determine severity: an error becomes severe if its rate or volume increases past a threshold.

And you can think about errors in terms of seasonality as well, to mitigate this concern a bit. That is, do you have cyclical error rates, depending on time of day, day of week, or other cyclical factors? If so, you want to account for that, so that rate increases that are just the normal course of business don't get flagged as severe.
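
One simple way to fold seasonality in -- purely as an illustration, not anything from the talk -- is to compare the current rate against a baseline from the same point in a previous cycle (say, the same hour last week) rather than against a flat threshold.

```python
def error_rate_is_increasing(current_rate: float,
                             seasonal_baseline: float,
                             tolerance: float = 1.5) -> bool:
    """Flag an error as increasing only if it exceeds the rate observed at the
    same point in a previous cycle (e.g. the same hour last week) by more than
    the tolerance factor. Rates here are errors per 1,000 requests."""
    if seasonal_baseline == 0:
        return current_rate > 0    # errors where there used to be none
    return current_rate / seasonal_baseline > tolerance

# Monday-morning traffic doubles the absolute error count every week, so a
# rate close to last Monday's baseline is business as usual...
print(error_rate_is_increasing(current_rate=2.1, seasonal_baseline=2.0))  # False
# ...but tripling it is not.
print(error_rate_is_increasing(current_rate=6.0, seasonal_baseline=2.0))  # True
```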

And, finally, you can think about prioritizing slowdowns. A slowdown means response times start to take longer, and it becomes severe based on how many standard deviations it sits away from normal operation.
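
A standard-deviation check of that kind might look like this sketch; the two- and three-sigma cutoffs are my own illustrative values.

```python
from __future__ import annotations
import statistics

def slowdown_severity(recent_ms: list[float], baseline_ms: list[float]) -> str:
    """Classify a slowdown by how many standard deviations the recent mean
    response time sits above the baseline mean (cutoffs are illustrative)."""
    baseline_mean = statistics.mean(baseline_ms)
    baseline_stdev = statistics.stdev(baseline_ms)
    if baseline_stdev == 0:
        return "none"
    z = (statistics.mean(recent_ms) - baseline_mean) / baseline_stdev
    if z > 3:
        return "severe slowdown"
    if z > 2:
        return "slowdown"
    return "none"

baseline = [100, 105, 98, 102, 110, 97, 103]          # normal response times (ms)
print(slowdown_severity([150, 160, 155], baseline))   # "severe slowdown"
```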

Scoring Formula

So based on classification and priority, the OverOps team starts to assign points to errors that occur. They took a look at severity, as measured by things like "did this get us up in the middle of the night," and adjusted scoring weights accordingly until they fit known data.

This then provides the basis for future prediction and a reliable scoring mechanism.
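
To give a rough sense of what a weighted, point-based score could look like, here's a sketch. The event categories and weights are invented for illustration; in practice they'd be tuned against historical incidents the way the speaker describes.

```python
from __future__ import annotations

# Invented weights. You'd adjust them against known data -- "did this get us
# up in the middle of the night?" -- until scores match real-world severity.
WEIGHTS = {
    "severe_new_error": 25,
    "new_error": 5,
    "severe_increasing_error": 15,
    "severe_slowdown": 10,
    "slowdown": 3,
}

def reliability_score(event_counts: dict[str, int]) -> int:
    """Start from 100 and subtract weighted points per detected event."""
    penalty = sum(WEIGHTS.get(kind, 0) * count
                  for kind, count in event_counts.items())
    return max(0, 100 - penalty)

print(reliability_score({"new_error": 2, "severe_slowdown": 1}))  # 80
```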

Now, assume all of this is in place. You can automate the gathering of this type of data and generate scores right from within your CI/CD setup, using them as a quality gate.
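
In a CI/CD pipeline, the gate itself can be as simple as a script that exits nonzero when the score falls below the bar, which most CI systems treat as a failed stage. A hypothetical sketch:

```python
import sys

THRESHOLD = 70   # hypothetical minimum acceptable reliability score

def main() -> int:
    # A real pipeline would fetch the score from whatever computed it --
    # an API call, or a file written by an earlier stage. Hard-coded here.
    score = 64
    if score < THRESHOLD:
        print(f"Quality gate FAILED: score {score} is below {THRESHOLD}")
        return 1    # nonzero exit status -> CI marks the stage as failed
    print(f"Quality gate passed: score {score}")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```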

A Code Quality Report

Having this integrated into your build not only allows you to reject builds that don't pass the quality gate; it also lets you generate some nice reporting.

Have a readout for why a given build failed, and have general reporting on the measured quality of each build that you do.
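
Even a plain-text readout generated from the same data goes a long way. Everything in this example is made-up sample data, just to show the shape.

```python
from __future__ import annotations

def build_report(build_id: str, score: int, threshold: int,
                 reasons: list[str]) -> str:
    """Render a small pass/fail readout for one build."""
    status = "PASSED" if score >= threshold else "FAILED"
    lines = [f"Build {build_id}: {status} (score {score}, threshold {threshold})"]
    lines.extend(f"  - {reason}" for reason in reasons)
    return "\n".join(lines)

print(build_report("2020.02.12-481", 58, 70, [
    "2 new uncaught exceptions in checkout-api",
    "p95 latency up 3.4 standard deviations on /orders",
]))
```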

My Takeaway

I've spent a lot of time in my career on static code analysis, which I find to be a fascinating topic. It promises to be a compile-time, leading indicator of code quality, and, in some ways, it does this quite well. But the weakness here is that it's never really tied reliably into actual runtime behaviors.

In a sense, a lot of static analysis involves predicting the future. "Methods this complex will probably result in bugs" or "you might have exposed yourself to a SQL injection."

But the loop never gets closed. Does this code wind up causing problems?

I love this approach because it starts to close the loop. By all means, keep doing static analysis. But also run experiments and measure what's actually happening when you deploy code, and feed that information back into how you work.
