Reliability of open source from a software engineering point of view


At the Philly ETE conference, Michael Tiemann presented some interesting facts about open source quality, and in particular mentioned that open source software has an average defect density that is 50-150 times lower than proprietary software. As it stands, this statement is somewhat imprecise, and I would like to provide a small clarification of the context and the actual values:

  • First of all, the average Michael mentions relates to a small number of projects, in particular the Linux kernel, the Apache web server (and later the entire LAMP stack), and a handful of other well-known projects. For all of these projects, the defect density is indeed substantially lower than that of comparable proprietary products. A very good article on this is Paulson, Succi, and Eberlein, “An Empirical Study of Open-Source and Closed-Source Software Products”, IEEE Transactions on Software Engineering, vol. 30, no. 4, April 2004. It was not the only study on the subject, but all of them pointed to more or less the same results.
  • Beyond the software engineering community, companies working in the code defect identification industry have also published results, such as Reasoning Inc.’s A Quantitative Analysis of TCP/IP Implementations in Commercial Software and in the Linux Kernel, and How Open Source and Commercial Software Compare: Database Implementations in Commercial Software and in MySQL. All of these results confirm the much higher quality (in terms of defects per line of code) found by the academic research.
  • Additional research identified a common pattern: the initial quality of the source code is roughly the same for proprietary and open source software, but the defect density decreases much faster in open source. So it is not that OSS coders are, on average, coding prodigies, but that the process itself creates more opportunities for defect resolution. As Paulson et al. pointed out: “In terms of defects, our analysis finds that the changing rate or the functions modified as a percentage of the total functions is higher in open-source projects than in closed-source projects. This supports the hypothesis that defects may be found and fixed more quickly in open-source projects than in closed-source projects and may be an added benefit for using the open-source development model.” (emphasis mine). A small numeric sketch of this dynamic follows the list.
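
To make that concrete, here is a minimal sketch (all rates are hypothetical and purely illustrative, not taken from the studies cited above): assume both code bases start with the same defect density, but the open process fixes a larger fraction of outstanding defects per release cycle.

```python
# Illustrative sketch only: identical starting quality, different fix rates.
# All numbers are hypothetical and not taken from the cited studies.

initial_density = 5.0    # defects per KLOC at first release (same for both)
fix_rate_closed = 0.10   # share of outstanding defects fixed per cycle (hypothetical)
fix_rate_open = 0.30     # public bug data, more eyes -> faster fixing (hypothetical)

density_closed = density_open = initial_density
for cycle in range(1, 11):
    density_closed *= 1 - fix_rate_closed
    density_open *= 1 - fix_rate_open
    print(f"cycle {cycle:2d}: closed {density_closed:.2f}  open {density_open:.2f}  defects/KLOC")
```

After ten cycles the closed code base still carries roughly 35% of its initial defects, while the open one carries roughly 3%: an order-of-magnitude gap produced by process speed alone, with identical starting quality.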

I have a personal opinion on why this happens, and it is related to two distinct phenomena. The first is code reuse: the general modularity and extensive reuse of components help developers because, instead of recoding something (and introducing new bugs), reusing an already debugged component reduces the overall defect density. The same effect was found by other research groups focusing on reuse; for example, in the work by Mohagheghi, Conradi, Killi and Schwarz, “An Empirical Study of Software Reuse vs. Defect-Density and Stability” (available here), we find that reuse brings a similar degree of improvement in both the defect density and the number of trouble reports:

[Figure: defect density of reused vs. non-reused code, from Mohagheghi et al.]

As can be observed from the graph, code originating from reuse has significantly higher quality than traditionally developed code, and the gap between the two grows with size (as expected from basic probabilistic models of defect generation and discovery).
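
To see why the gap should widen with size, a basic probabilistic sketch is enough (the per-KLOC rates below are hypothetical): if every line carries a small independent probability of holding a latent defect, the expected defect count grows linearly with size, so any difference in the per-line rate turns into an absolute gap that grows with the code base.

```python
import random

random.seed(42)

# Hypothetical per-KLOC latent-defect rates: reused code has already been
# debugged in earlier deployments, so fewer defects survive per line.
RATE_NEW = 4.0     # expected defects per KLOC in newly written code
RATE_REUSED = 1.0  # expected defects per KLOC in reused code

def sample_defects(kloc: int, rate_per_kloc: float) -> int:
    """One Bernoulli trial per line, each with probability rate/1000."""
    p = rate_per_kloc / 1000.0
    return sum(random.random() < p for _ in range(kloc * 1000))

for kloc in (10, 50, 100, 500):
    new = sample_defects(kloc, RATE_NEW)
    reused = sample_defects(kloc, RATE_REUSED)
    print(f"{kloc:3d} KLOC: new ~{new:4d}, reused ~{reused:4d}, gap ~{new - reused}")
```

The densities stay roughly constant, but the absolute number of avoided defects scales with the size of the code base, which is exactly the widening gap visible in the graph.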

The second aspect is that public bug data allows a “prioritization” and better coordination of developers in triaging and, in general, fixing things. This explains why the faster improvement appears not only in reused code, but in newly written code as well; the sum of the two effects explains the remarkable difference in quality (50-150 times), larger than that achieved by any previous effort such as formal methods or automated code generation. And this quality differential can only grow with time, leading to a long-term push for proprietary vendors to include more and more open source code inside their own products, to reduce the growing effort of bug isolation and fixing.
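
As a back-of-the-envelope check that two modest, independent multipliers can compound into the quoted range, consider the following (both factors are hypothetical, chosen only to illustrate the multiplication, not measured values):

```python
# Hypothetical, illustrative factors; neither comes from a measured study.
reuse_factor = 5       # fewer defects injected thanks to component reuse
fix_speed_factor = 20  # faster resolution thanks to public bug data and triage

print(reuse_factor * fix_speed_factor)  # -> 100, within the quoted 50-150x range
```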

  1. #1 by Michael - April 12th, 2009 at 03:42

    An aspect of the projects mentioned is that they are used by skilled users themselves – often developers.

    This is a fundamentally different model to the typical ‘consumer’ model which shoves a shrink-wrapped product down the ‘luser’s throat and expects them to pay for every upgrade, driven by features not stability.

    Of course, this improved model is a direct result of free software’s 4 fundamental freedoms, and not merely because the source is accessible.

  2. #2 by Yonah - April 12th, 2009 at 09:25

    So…. the Vice President of Open Source Affairs at Red Hat Inc and a well known Linux advocate tells us that “Open Source” is better. Great! I’m sure he’s completely lacking in any bias or conflicts of interest.

  3. #3 by cdaffara - April 12th, 2009 at 14:43

    Everyone has biases or conflicts: simply because Michael is a Linux advocate does not mean that he is not entitled to an opinion, exactly like any Microsoft representative who would like to write a comment here. It is true that the availability of source code (not only from a technical point of view, but from a legal one as well) introduces several potential advantages; this has been studied in other sectors as well (like the work of von Hippel on user-driven innovation). I also share the view that the availability of code is not in itself sufficient to guarantee an advantage in every situation.
