Best practices for making reliable inferences from citizen science data: case study using eBird to estimate species distributions
Wesley M Hochachka,
Viviana Ruiz Gutierrez,
Orin J Robinson,
Eliot T Miller,
Steve T Kelling
Posted 12 Mar 2019
bioRxiv DOI: 10.1101/574392
Citizen science data are valuable for addressing a wide range of ecological research questions, and there has been a rapid increase in the scope and volume of data available. However, data from large-scale citizen science projects typically present a number of challenges that can inhibit robust ecological inferences. These challenges include species bias, spatial bias, variation in effort, and variation in observer skill. To demonstrate key challenges in analysing citizen science data, we use the example of estimating species distributions with data from eBird, a large semi-structured citizen science project. We estimate three widely applied metrics for describing species distributions: encounter rate, occupancy probability, and relative abundance. For each method, we outline approaches for data processing and modelling that are suitable for using citizen science data for estimating species distributions. Model performance improved when data processing and analytical methods addressed the challenges arising from citizen science data. The largest gains in model performance were achieved with two key processes: 1) the use of complete checklists rather than presence-only data, and 2) the use of covariates describing variation in effort and detectability for each checklist. Including these covariates accounted for heterogeneity in detectability and reporting, and resulted in substantial differences in predicted distributions. The data processing and analytical steps we outlined led to improved model performance across a range of sample sizes. When using citizen science data, it is imperative to carefully consider the appropriate data processing and analytical procedures required to address the bias and variation. Here, we describe the consequences and utility of applying our suggested approach to semi-structured citizen science data to estimate species distributions.
The methods we have outlined are also likely to improve other forms of inference and will enable researchers to conduct robust analyses and harness the vast ecological knowledge that exists within citizen science data.
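The two key processes named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' workflow: the `Checklist` record, its field names, the toy data, and the 60-minute effort threshold are all assumptions made for illustration. The sketch keeps only complete checklists, zero-fills them into detection/non-detection data for one species, and then compares encounter rate across two effort classes to show why an effort covariate matters.

```python
from dataclasses import dataclass


@dataclass
class Checklist:
    # Hypothetical record layout for illustration; not the eBird schema.
    checklist_id: str
    complete: bool            # observer reported every species they identified
    duration_minutes: float   # a simple effort covariate
    species: frozenset        # species reported on this checklist


def zero_fill(checklists, target):
    """Keep complete checklists and attach 1/0 detection of `target`.

    On a complete checklist, absence of the target from the species list
    is treated as a non-detection; incomplete checklists are dropped
    because they only provide presence information.
    """
    return [(c, int(target in c.species)) for c in checklists if c.complete]


def encounter_rate(pairs):
    """Fraction of checklists on which the target species was reported."""
    if not pairs:
        return float("nan")
    return sum(detected for _, detected in pairs) / len(pairs)


def encounter_rate_by_effort(pairs, threshold_minutes=60):
    """Encounter rate split by an (assumed) effort threshold."""
    short = [(c, d) for c, d in pairs if c.duration_minutes < threshold_minutes]
    long_ = [(c, d) for c, d in pairs if c.duration_minutes >= threshold_minutes]
    return {"short": encounter_rate(short), "long": encounter_rate(long_)}


# Toy data, invented for this example.
checklists = [
    Checklist("c1", True, 20, frozenset({"robin"})),
    Checklist("c2", True, 30, frozenset({"sparrow"})),
    Checklist("c3", True, 90, frozenset({"robin", "wood thrush"})),
    Checklist("c4", True, 120, frozenset({"wood thrush"})),
    Checklist("c5", False, 15, frozenset({"wood thrush"})),  # incomplete: dropped
]

pairs = zero_fill(checklists, "wood thrush")
overall = encounter_rate(pairs)
by_effort = encounter_rate_by_effort(pairs)
print(overall, by_effort)
```

In this toy data set the species is reported only on the longer checklists, so a single pooled encounter rate would hide the dependence of detection on effort. A fuller analysis would include effort variables as covariates in a model instead of splitting at a fixed threshold.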
- Downloaded 864 times
- Download rankings, all-time:
  - Site-wide: 7,264 out of 53,027
  - In ecology: 174 out of 2,198
- Year to date:
  - Site-wide: 879 out of 53,027
- Since beginning of last month:
  - Site-wide: 1,424 out of 53,027