This paper reports on a data analysis that compares <removed for anonymity> to assess the efficiency of the various projects. Efficiency here relates to <removed for anonymity>.
Overall, I do not think this paper is ready for publication. The analysis rests on coarse approximations (e.g., <removed for anonymity>) rather than actual data. Moreover, the results provide little new understanding of /why/ some <removed for anonymity> projects are more efficient than others.
FRAMING: The paper is somewhat oddly framed. The abstract states, more or less, the problem the paper is tackling, but the introduction never restates it; the reader has to rely on the abstract to situate the work. At the end of the introduction, the authors claim that the problem of understanding <removed for anonymity> efficiency (comparing inputs to outputs) has not been studied. I am not very familiar with the <removed for anonymity> literature, so I can only assume this is true. Thinking more deeply about the problem, however, suggests that it has not been addressed because the data needed to do so would be very difficult to obtain: many factors likely affect efficiency, such as <removed for anonymity>. The authors have sidestepped this difficulty through approximations instead.
RELATED WORK: The authors cite a large body of literature that feels quite comprehensive. The one downside is that it is not synthesized in a single, succinct treatment that lets the reader understand how the current work goes beyond it. Instead, the literature is sprinkled throughout the theoretical section and the discussion, amid descriptions of the authors' own work.
DATA ANALYSIS: The data analysis is based on two hypotheses stated at the end of the theoretical section. These are not hypotheses in any formal, testable sense, however, nor are they posed as research questions. As a result, it is unclear at the end of the study whether anything has actually been solved or proven.
The data analysis compares the efficiency of <removed for anonymity>. The authors exclude <removed for anonymity> because it is an "outlier", yet the reason for this exclusion is never explained. They then collected data from <removed for anonymity> for each <removed for anonymity> project; this step seems sound to me and is likely quite accurate in terms of how many people contributed, what the output was, and so on, so it provides sufficient data at a granular level. My concern is with the data used to estimate the total number of <removed for anonymity> to a project. The authors calculated this by starting with stats from <removed for anonymity> and then calculating <removed for anonymity>. It is not at all clear to me how this kind of calculation can estimate how many <removed for anonymity> there are to a <removed for anonymity> project: no rationale is provided, and the approach appears highly prone to error. A similar calculation is done to estimate the number of <removed for anonymity>.
The results of the data analysis are weak, and most are presented in tables and figures that the reader must parse on their own, with little interpretation from the authors.
DISCUSSION: At a high level, the results show that there are indeed differences in efficiency across the various <removed for anonymity> projects. The authors then pose several possible reasons for these differences, e.g., <removed for anonymity>. These reasons are purely speculative, as the authors have no data to back them up. Beyond this, the work offers no real implications that one could use to better understand the <removed for anonymity> research space.