Xiao-Li Meng writes, in his third President’s Column:

“Where should I put my name in the author list when I write a paper with my students?” is one recurring question I have been asked by junior colleagues since I earned the privilege to enjoy my senior moments. To people outside of the publication world, this question may sound purely academic, puzzling, or even silly. With zillions of publications available, shouldn’t there already be a well-established rule for such a simple matter? Besides, how many readers actually pay attention to the authorship order?

In the IMS community, however, we all understand that the matter is far from simple, and indeed it is a main source of frustration, even unhappiness, in the academic world. Many of us also have encountered situations where we have tried to infer the contribution made by a particular author from the authorship order, such as when we serve as a reference letter writer for someone we don’t know, as a member of a promotion or appointment committee, or on a funding or award panel. We understand that, whether in perception or in reality, being a lead author or mid author could mean the difference between having a leading professional life or a mediocre one.

At the societal and professional level, properly documenting, measuring, and conveying the contributions made by those who work with large collaborative teams is a critical step “to equalize our value systems for influential scholarly pursuits and for impactful collaborative effort” (https://imstat.org/2018/11/the-world-is-loving-us-almost-surely-can-we-love-back-passionately/). The current authorship metric adopted by all of our journals is structurally deficient for this task.

As we all know, our current authorship convention relies on a one-dimensional ordering. We use the ordering to index the degrees of contributions, usually with the leading author going first. Or we invoke a non-informative alphabetical ordering to signal equal contributions or, more likely, an agreement among the authors that there is no better way to make everyone happier. Consequently, the information conveyed by the current authorship data is insufficient, ambiguous, and even deceiving, the worst kind of data design and collection that we tell our students to avoid like fast food.

Successful collaborative projects are, by default, multi-dimensional, as the resulting impact of the project relies on different expertise and skills. Any univariate index is mathematically inadequate to represent multi-dimensional information, no matter how cleverly it is constructed (e.g., the h-index). Worse, inadequate representations tend to induce bad behaviors. Once, I had a potential collaborator who announced on day one that his position on our authorship list must be invariant to his actual contributions, that is, always the first. I appreciated his candidness, as it helped me establish my own invariance principle: I won’t co-author papers with any such invariant authors.
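As a small illustration of how a single number discards information, here is a minimal Python sketch of the h-index (the largest h such that at least h papers have at least h citations each). The function name `h_index` and both citation lists are invented for illustration and do not describe any real author.

```python
def h_index(citations):
    """Return the largest h such that at least h papers have >= h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical citation records, made up for this example.
steady = [3, 3, 3]            # three modestly cited papers
skewed = [500, 40, 3, 0, 0]   # one heavily cited paper plus a long tail

print(h_index(steady), h_index(skewed))  # prints "3 3"
```

The two records differ in almost every respect, yet both collapse to the same index, which is the sense in which a univariate summary cannot carry multi-dimensional information.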

The scientific community is fueled by innovation and creativity, which have greatly advanced human societies and civilizations. Most ironically, the same community has been embarrassingly slow to come up with a creative solution to address this long-standing issue of its own: properly documenting authors’ contributions in research publications. And the statistical community has not helped much either, despite the problem being about data (accurately documenting each author’s roles) and inference (about the authors’ contributions to the overall project).

The movie industry solved this problem more than half a century ago, when films started to have both opening and closing credits (granted, few viewers care to stay until the very end, unless there are out-takes or alternative endings). Years ago, I joked with some colleagues that someday we would learn from Hollywood about crediting contributions. I am therefore particularly pleased to learn that the scientific publication world is indeed moving from authorship to contributorship by explicitly acknowledging the specific roles of each author, just as in movie credits.

The article “Credit where credit is due” in Nature (April 2014, pp. 312-313; https://www.nature.com/news/publishing-credit-where-credit-is-due-1.15033) proposed the Contributor Roles Taxonomy (CRediT) methodology for documenting authors’ contributions. The method has been endorsed by leaders of major scientific organizations and publications, such as the US National Academy of Sciences (NAS), Science, New England Journal of Medicine, PLOS, Cell Press, and SAGE Publishing; see “Transparency in authors’ contributions and responsibilities to promote integrity in scientific publication” in PNAS (March 2018, pp. 2557-2560; https://www.pnas.org/content/115/11/2557). So far, CRediT has been adopted in various forms by Nature, PLOS, and Cell, and in about 120 journals, as most recently reported in https://onlinelibrary.wiley.com/doi/full/10.1002/leap.1210.

At the heart of CRediT is a taxonomy of research contributions to list individual authors’ specific roles for articles. The table below, which lists 14 roles, is reproduced from the 2014 Nature article.

Table 1: A taxonomy of research contributions to list individual authors’ specific roles for articles (reproduced from Nature with permission)

| Taxonomy category | Description of role |
| --- | --- |
| Study conception | Ideas; formulation of research question; statement of hypothesis |
| Methodology | Development or design of methodology; creation of models |
| Computation | Programming, software development; designing computer programs; implementation of the computer code and supporting algorithms |
| Formal analysis | Application of statistical, mathematical or other formal techniques to analyse study data |
| Investigation: perform the experiments | Conducting the research and investigation process, specifically performing the experiments |
| Investigation: data/evidence collection | Conducting the research and investigation process, specifically data/evidence collection |
| Resources | Provision of study materials, reagents, materials, patients, laboratory samples, animals, instrumentation or other analysis tools |
| Data curation | Management activities to annotate (produce metadata) and maintain research data for initial use and later re-use |
| Writing/manuscript preparation: writing the initial draft | Preparation, creation and/or presentation of the published work, specifically writing the initial draft |
| Writing/manuscript preparation: critical review, commentary or revision | Preparation, creation and/or presentation of the published work, specifically critical review, commentary or revision |
| Writing/manuscript preparation: visualization/data presentation | Preparation, creation and/or presentation of the published work, specifically visualization/data presentation |
| Supervision | Responsibility for supervising research; project orchestration; principal investigator or other lead stakeholder |
| Project administration | Coordination or management of research activities leading to this publication |
| Funding acquisition | Acquisition of the financial support for the project leading to this publication |

I am sure not everyone would agree on the specific descriptions of each role, or even the list of roles. Indeed, the authors of CRediT recognized that some roles, such as “Project administration” or “Funding acquisition,” might not even belong on the list (e.g., it is debatable if a lab director is automatically entitled to put her/his name on every paper from the lab because it is supported by funds s/he raised). Nevertheless, CRediT is a much-needed step in the right direction, providing richer data for inferring each author’s contributions. Therefore, I would strongly encourage our journals, especially those that publish more collaborative work (e.g., Annals of Applied Statistics), to consider adopting a form of CRediT. In doing so, I also hope we will be mindful about balancing appropriateness for our fields and adherence to a common standard across fields, especially considering the interdisciplinary nature of the collaborative work that we want to be appropriately credited for performing.

Of course, no metric or system is perfect, and each of them can (and will) be abused. If all authors put down their names for all the roles, then the CRediT system achieves little as a data collection process. There will be times when such “all-for-all” is appropriate, especially for some theoretical papers, which tend to have a smaller number of authors who engage in all aspects of the project. But for large collaborative projects, where CRediT is needed most, it is typically not difficult to delineate the roles. I have been involved in multiple large scientific projects (e.g., in astrophysics, environmental sciences, health disparities) where I had no role in data collection, data curation, or data visualization. Not claiming credit for any process to which I made no contribution is obviously the right thing to do, but it also relieves me of accountability for oversights, mistakes, or (God forbid!) even plagiarism in these processes.
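To make the data-collection view concrete, here is a minimal Python sketch of how role claims might be recorded and checked. It is only one possible representation, not a format prescribed by CRediT or by any journal. The role names are copied from Table 1; the function names `credits_by_role` and `is_all_for_all` and the three authors are hypothetical.

```python
from collections import defaultdict

# The 14 role names as listed in Table 1.
CREDIT_ROLES = (
    "Study conception",
    "Methodology",
    "Computation",
    "Formal analysis",
    "Investigation: perform the experiments",
    "Investigation: data/evidence collection",
    "Resources",
    "Data curation",
    "Writing/manuscript preparation: writing the initial draft",
    "Writing/manuscript preparation: critical review, commentary or revision",
    "Writing/manuscript preparation: visualization/data presentation",
    "Supervision",
    "Project administration",
    "Funding acquisition",
)


def credits_by_role(contributions):
    """Invert an author -> roles mapping into role -> sorted list of authors."""
    by_role = defaultdict(list)
    for author, roles in contributions.items():
        for role in roles:
            by_role[role].append(author)
    return {role: sorted(names) for role, names in by_role.items()}


def is_all_for_all(contributions, roles=CREDIT_ROLES):
    """True when every author claims every role, so the record is uninformative."""
    full = set(roles)
    return bool(contributions) and all(
        set(claimed) == full for claimed in contributions.values()
    )


# Hypothetical paper with three made-up authors.
paper = {
    "Author A": {"Study conception", "Methodology", "Formal analysis"},
    "Author B": {"Investigation: data/evidence collection", "Data curation"},
    "Author C": {"Methodology", "Funding acquisition"},
}

print(credits_by_role(paper)["Methodology"])  # ['Author A', 'Author C']
print(is_all_for_all(paper))                  # False

# The degenerate case described above: everyone claims everything.
everyone = {author: set(CREDIT_ROLES) for author in paper}
print(is_all_for_all(everyone))               # True
```

Inverting the mapping gives a movie-style credit list, one line per role, while the second check flags the “all-for-all” pattern in which the record no longer distinguishes one author from another.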

Spider-Man said, “With great power comes great responsibility,” and the aforementioned PNAS article followed suit by stating that authorship implies both credit and accountability. Without proper attribution, all authors would be held accountable for any error or flaw in the paper; as NAS specifies, “an author who is willing to take credit for a paper must also bear responsibility for its contents. Thus, unless a footnote or the text of the paper explicitly assigns responsibility for different parts of the paper to different authors, the authors whose names appear on a paper must share responsibility for all of it” (https://www.nap.edu/catalog/12192/on-being-a-scientist-a-guide-to-responsible-conduct-in). For large collaborative projects, it is typically impossible for any author to know much about what every co-author did, let alone to watch for errors and flaws in every process. (As a matter of fact, of my roughly 150 co-authors, more than 10% are people with whom I have never corresponded, all of them from these large collaborative projects.)

Of course, there will be a few people who would hate anything like CRediT. A digital scholar told me recently, “Someone got really mad at me after I talked about the CRediT system in a presentation.” Others who knew the person explained that this someone is known to fight for credit that he does not deserve, including insisting on being the leading author without leading the project. I took it as a good sign that CRediT has frustrated such people, because its very purpose is to follow Samuel Adams’s advice and give credit only to whom credit is due. If you find yourself frustrated by CRediT or other similar authorship contribution taxonomies, I’d suggest you first get a case of Samuel Adams and then apply to become a dean or president, where you will be credited or blamed for things of which you are completely unaware.