Collaborative software development can be hugely successful or fail spectacularly. An analysis of the metadata associated with these projects is teasing apart the difference.
Yoshikawa and co begin by downloading the data associated with over 300,000 projects from the GitHub website. This includes the number of internal developers, the number of stars a project receives over time and the number of pull requests it gets.
The team then analyse the effectiveness of each project by calculating factors such as the number of commits per internal team member, the popularity of the project over time, the number of pull requests that are fulfilled and so on.
The results provide a fascinating insight into the nature of social coding. Yoshikawa and co say the number of internal developers on a project plays a significant role in its success. “Projects with larger numbers of internal members have higher activity, popularity and sociality,” they say.
However, there is a downside to large projects as well. One measure of the efficiency of a project is the number of commits per internal team member. Yoshikawa and co say the data shows that the most efficient projects involve a single person working alone.
As a project grows, efficiency is roughly constant in projects with between two and 60 members but falls sharply after this. “We conclude that it is undesirable to involve more than 60 developers in a project if we want the project members to work efficiently,” they say.
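To make the efficiency measure concrete, here is a minimal sketch of how commits per internal member might be computed and averaged by team size. The function names, the three size bands and the sample numbers are all illustrative assumptions; they are not taken from the paper or from GitHub's data.

```python
from collections import defaultdict
from statistics import mean


def commits_per_member(total_commits, internal_members):
    """Efficiency as described in the article: commits per internal member."""
    if internal_members <= 0:
        raise ValueError("a project needs at least one internal member")
    return total_commits / internal_members


def mean_efficiency_by_band(projects):
    """Average efficiency for solo, 2-60 member and over-60 member projects.

    `projects` is an iterable of (total_commits, internal_members) pairs.
    The band boundaries mirror the thresholds quoted in the article.
    """
    bands = defaultdict(list)
    for total_commits, members in projects:
        if members == 1:
            label = "solo"
        elif members <= 60:
            label = "2-60"
        else:
            label = "over 60"
        bands[label].append(commits_per_member(total_commits, members))
    return {label: mean(values) for label, values in bands.items()}


# Made-up projects, purely for illustration: (total commits, internal members)
sample = [(400, 1), (900, 3), (3000, 10), (7000, 100)]
result = mean_efficiency_by_band(sample)
print(result)
```

With real data, the interesting part would be how the averages in each band compare, which is the pattern Yoshikawa and co report.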
The team also study how work is distributed between internal members. In general, teams with more evenly distributed work are more likely to have higher activity.
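The article does not say how the team quantify how evenly work is shared. One common choice, used here purely as an assumption, is the Gini coefficient of commits per member: 0 means everyone contributes equally, and values near 1 mean one person does almost everything. The function name and sample numbers below are hypothetical.

```python
def gini(commit_counts):
    """Gini coefficient of a list of per-member commit counts.

    An assumed evenness measure, not necessarily the one used in the paper.
    Returns 0.0 for a perfectly even split.
    """
    counts = sorted(commit_counts)
    n = len(counts)
    total = sum(counts)
    if n == 0 or total == 0:
        raise ValueError("need at least one member and one commit")
    # Standard formula on ascending-sorted values, with ranks starting at 1.
    weighted = sum(rank * value for rank, value in enumerate(counts, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


even_team = gini([10, 10, 10, 10])    # everyone commits the same amount
lopsided_team = gini([0, 0, 0, 40])   # one member does all the work
print(even_team, lopsided_team)
```

Under this measure, the finding in the article would correspond to projects with a lower coefficient tending to show higher activity.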
And when projects receive requests for changes from external developers, those that fulfil these requests faithfully are likely to be more popular.
They also measure which types of projects are more popular. Unsurprisingly, they say that software designed to run on Apple’s various products has the highest popularity.
That is an interesting insight into an increasingly common form of software development. GitHub alone says it has 6 million registered users.
Of course, these guys have found correlations, and an important question is one of causation. It is possible, for example, that the positive correlations they have found are the result of some hidden variables that are not revealed in this study.
The best way to find out is for somebody to put into practice the lessons learnt in this study and see whether they work. There is certainly good reason to think that many of their conclusions are related to good practice.
Over to the developers!
Ref: arxiv.org/abs/1408.6012 : Collaboration on Social Media: Analyzing Successful Projects on Social Coding