To summarise the prior writings from my doctoral thesis: I've shown that there are both notable differences and similarities between certain measures of FLOSS repositories (such as number of contributors attracted, rate of contributions, and complexity control work). The pattern of similarities and differences that emerged clearly differentiated one group (containing Debian, GNOME, and KDE) from the other (Savannah and SourceForge), which I tentatively named "controlled" and "open" repositories respectively. Furthermore, the results demonstrated that Debian deserved to be differentiated from GNOME and KDE, although to a lesser extent.
What is also interesting is that when we compare repositories that are grouped together (GNOME to KDE, or Savannah to SourceForge) we observe very close similarities in these same measures. This raises the first intriguing possibility: that one can establish "types" by which repositories can be classified, defined not only by the sorts of measures we can expect but also by the way a repository functions and organises itself. For example, Savannah and SourceForge often incubate newer software of any kind and are open to any contributor. Conversely, GNOME and KDE deliver more well-defined programs and erect meritocratic barriers to entry for contributors and new software.
Now, when you consider that individual projects can move between these repositories, this raises the second intriguing possibility: these types could be arranged into an evolutionary framework, an ecosystem of repositories, if you like. This diagram is an attempt to visualise it.
- Open Repository: A repository with a low barrier to entry. That is, the process of joining and adding software is essentially trivial and uncontrolled. Projects tend to be independent and no guiding policies cover the development or organisation process (aside from any terms of service agreements). Examples: Savannah, SourceForge.
- Controlled Repository: A repository with a higher barrier to entry. That is, joining as a developer or adding a new project is subject to the control of existing members. There is likely a set of guiding policies and development standards enforced throughout the community, as well as goals/roadmaps. Within this group, one can further differentiate:
  - Distributions: E.g. Debian. The projects hosted here are part of the larger GNU/Linux operating system. From the point of view of the process, Debian developers are not typically programming for other Debian projects.
  - Meta-Projects: E.g. GNOME and KDE. The projects contained in a meta-project are each part of a wider system (in these examples, each is a complete desktop environment), and there are typically several glue projects as well.
- Transition: Moving a project from one repository to another. This could be a migration (whereby the storage location of a project is changed), or an inclusion (whereby another repository distributes the project, but its location remains the same).
- Bold arrows: These are typically observed pathways between repository types taken by FLOSS projects.
- Dashed arrows: These represent atypical transitions, observed much less frequently. No empirical study has been performed on these transitions.
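The legend above can also be read as a small data model. The following is only a minimal sketch of that reading, not something taken from the thesis: the class and function names are my own, the project name in the usage note is hypothetical, and the arrows of the diagram are deliberately not encoded because their endpoints are not listed in the legend.

```python
from dataclasses import dataclass
from enum import Enum


class RepositoryType(Enum):
    OPEN = "open"
    DISTRIBUTION = "distribution"   # sub-type of controlled repository
    META_PROJECT = "meta-project"   # sub-type of controlled repository

    @property
    def is_controlled(self) -> bool:
        # Everything that is not an open repository counts as controlled.
        return self is not RepositoryType.OPEN


class TransitionKind(Enum):
    MIGRATION = "migration"   # the project's storage location changes
    INCLUSION = "inclusion"   # another repository distributes it; location stays


# Classification of the repositories named in the legend.
REPOSITORY_TYPES = {
    "Savannah": RepositoryType.OPEN,
    "SourceForge": RepositoryType.OPEN,
    "Debian": RepositoryType.DISTRIBUTION,
    "GNOME": RepositoryType.META_PROJECT,
    "KDE": RepositoryType.META_PROJECT,
}


@dataclass(frozen=True)
class Transition:
    project: str
    source: str
    target: str
    kind: TransitionKind

    def hosting_repository(self) -> str:
        # After a migration the project lives in the target repository;
        # after an inclusion it stays where it was.
        if self.kind is TransitionKind.MIGRATION:
            return self.target
        return self.source

    def describe(self) -> str:
        src = REPOSITORY_TYPES[self.source].value
        dst = REPOSITORY_TYPES[self.target].value
        return (f"{self.project}: {self.source} ({src}) -> "
                f"{self.target} ({dst}) by {self.kind.value}")
```

For instance, `Transition("exampleproj", "SourceForge", "Debian", TransitionKind.INCLUSION)` describes a hypothetical project that Debian distributes while it continues to be hosted on SourceForge.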
The evidence I have gathered points to this being an evolutionary framework, because the differences observed are mostly between rates of activity rather than absolute differences in quality. Note that these statements pertain to the average project observed in each repository; there are overlaps in the process and product attributes. Therefore it should not be expected that all projects in a repository will necessarily perform at the levels established as average for that repository.
What might reasonably be expected of an arbitrary project's measured attributes no doubt depends on a number of other factors that influence a free software project's success, such as the understandability of the initial version and the existence of alternative projects with similar or identical functionality. While this framework holds that the repository is an important driver of project evolution (especially at the macro level), for the individual project it is one of a number of factors. When considering the consequences of a repository's effects, this contextual detail determines how the framework may be useful.
Understandably, a software developer is expected to be primarily interested in how their own individual project may be influenced, yet this framework is informed by measurements of a collection of projects. The range of values observed for a metric acts as an indicator of the expected value for a repository type, or even a specific repository. It is feasible that any desired quantifiable attribute of a project can be measured and built up into a set of values, similarly to the work shown in previous posts. Critically, this gives a developer a quantifiable basis for judging comparatively where a project is best placed for its particular evolutionary needs, and for judging how and why a transition should be brought about in future. This planning will always need to be balanced against the typical participatory requirements of each repository type.
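As a rough illustration of that comparative judgement, the sketch below places one project's value for a metric against the values observed in each repository type. It is an assumption-laden example rather than the method used in the thesis: the function names are mine, the choice of summary statistics is only one option, and the sample numbers are invented purely so that the code runs.

```python
from statistics import median


def summarise(values):
    """Return the minimum, median and maximum of the observed metric values."""
    ordered = sorted(values)
    return {"min": ordered[0], "median": median(ordered), "max": ordered[-1]}


def share_at_or_below(values, project_value):
    """Fraction of observed projects whose value does not exceed the project's."""
    return sum(v <= project_value for v in values) / len(values)


def compare_project(project_value, samples_by_repository):
    """Summarise each repository's sample and locate the project within it."""
    report = {}
    for name, values in samples_by_repository.items():
        report[name] = {
            **summarise(values),
            "share_at_or_below": share_at_or_below(values, project_value),
        }
    return report


# Invented sample data (NOT measurements from the thesis), e.g. a count of
# contributors per project in two repository types.
samples = {
    "open_repository": [1, 1, 2, 2, 3, 5],
    "controlled_repository": [2, 4, 6, 9, 12, 20],
}

report = compare_project(4, samples)
for name, stats in report.items():
    print(name, stats)
```

A project whose value sits near the top of the range for its current repository type, but near the middle for another type, would be one candidate for considering a transition, subject to the other factors discussed above.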
For researchers it is useful to understand a repository's influence at both the individual and collective project levels. When embarking upon an empirical study of free software projects, knowledge of the evolutionary effects that a repository, or a repository type, can be expected to have on a project is useful, as it may have a bearing on how results are interpreted.