A small and unscientific exploration of OSS license use


I was intrigued by an excellent (as usual) post by Matthew Aslett of the 451 Group, titled “On the fall and rise of the GNU GPL”, where Matthew muses on the impact of cloud computing and other factors on the decreasing role of the GPLv2 versus other types of licenses. Simon Phipps tweeted “you only consider number of projects and not volume of deployed code. I have never found number of projects compelling”, which is something that I absolutely believe is true. It is, however, quite difficult to imagine other ways to measure the “impact” of a project. Do we have to add a weight related to usage? Then, given the large use of Linux, GNOME or KDE, OpenOffice and Firefox, we would probably see a huge jump in the GPL and MPL percentages, at the cost of added uncertainty (as usage estimates are variable at best).

As I am desperately trying to avoid doing real work, I used the Ohloh web site to extract slightly fewer than 100 projects (among the “active” ones, so there is already an initial preselection), along with the license and the number of committers for each project. My idea was to measure not only the number of projects, but also how many people contribute to each. In a sense, the number of committers is a measure of “activity” or community interest in a project, so I wanted to see whether the percentages obtained by counting the projects listed under each license differ from those obtained by counting the committers working under each license. The result is this:

license  projects  committers  %projects  %committers  blackduck %
gpl2           49       15878      52.1%        62.9%        48.83
lgpl            8        2286       8.5%         9.1%         9.35
mit             6        1668       6.4%         6.6%         4
bsd             8        1150       8.5%         4.6%         6.26
gpl3            3         988       3.2%         3.9%         5.5
php             2         730       2.1%         2.9%         0.24
cddl            1         673       1.1%         2.7%         0.32
mpl             2         655       2.1%         2.6%         1.22
apache         10         557      10.6%         2.2%         4.02
boost           1         266       1.1%         1.1%
epl             2         241       2.1%         1.0%         0.46
python          1         133       1.1%         0.5%
cpl             1           6       1.1%         0.0%         0.56

The result is interesting. First of all, looking at contributors, the GPLv2 has a higher percentage of committers than of projects; that is, GPLv2 projects have more committers per project than their share of projects would suggest. The percentage of projects obtained is similar to that from BlackDuck (52.1% versus 48.83%), so I think that there is not too much bias in the choice of projects. The LGPL has more or less its fair share of committers, on a par with the number of projects and the results from BlackDuck. MIT is slightly higher, both in projects and committers, while the GPLv3 is under-represented, probably because the sample is too small and the “new” projects under the GPLv3 simply were not among the first 100 or so selected. A substantial difference exists for Apache-licensed projects, where the number of committers is much smaller than their share of projects would suggest; this may be an artefact of the projects selected, or simply an effect of how Ohloh measures active committers (I find it strange that Boost alone has about half as many committers as all the Apache projects together!).
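For readers who want to check the arithmetic, here is a minimal Python sketch that recomputes the two percentage columns and the average number of committers per project from the counts in the table. The names `DATA` and `license_shares` are mine, chosen for illustration; this is not the script used to collect the Ohloh numbers, only a recalculation from the published counts.

```python
# Counts copied from the table: license -> (projects, committers).
# DATA and license_shares are illustrative names, not part of any tool.
DATA = {
    "gpl2": (49, 15878),
    "lgpl": (8, 2286),
    "mit": (6, 1668),
    "bsd": (8, 1150),
    "gpl3": (3, 988),
    "php": (2, 730),
    "cddl": (1, 673),
    "mpl": (2, 655),
    "apache": (10, 557),
    "boost": (1, 266),
    "epl": (2, 241),
    "python": (1, 133),
    "cpl": (1, 6),
}


def license_shares(data):
    """Return per-license project share, committer share and average team size."""
    total_projects = sum(projects for projects, _ in data.values())
    total_committers = sum(committers for _, committers in data.values())
    rows = {}
    for name, (projects, committers) in data.items():
        rows[name] = {
            "pct_projects": 100.0 * projects / total_projects,
            "pct_committers": 100.0 * committers / total_committers,
            "committers_per_project": committers / projects,
        }
    return rows


if __name__ == "__main__":
    for name, row in license_shares(DATA).items():
        print(
            f"{name:8s} {row['pct_projects']:5.1f}% "
            f"{row['pct_committers']:5.1f}% "
            f"{row['committers_per_project']:8.1f}"
        )
```

The last column makes the gap explicit: the GPLv2 projects in this sample average roughly 324 committers each, against roughly 56 for the Apache-licensed ones.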

As I said, this is a small, unscientific experiment designed to explore what we can invent to better measure the “impact” of an OSS project. I would love to receive your comments and suggestions; on my side, I will try to use the FLOSSMETRICS database to find some numbers on a more consistent data sample.


  1. #1 by Joe Walker - March 18th, 2010 at 13:38

    It would be interesting to see this alongside some language and LOC statistics.

    I’m willing to bet that Apache 2 projects are more likely to be Java based, and that Java programmers are more likely to contribute on work time than on home time. Hence Apache 2 projects have more work done by fewer people.

    Thanks.

  2. #2 by cdaffara - March 18th, 2010 at 13:46

    It’s an interesting consideration. I have not included the development language, to avoid difficulties related to code productivity, as it is difficult to estimate the effort spent per line of code. I will try to use the results from FLOSSMETRICS to obtain two separate numbers for “core developers” and peripheral contributors. This distinction should show the effect you hint at, namely that the participants are fewer but more consistent in their effort.
