*This talk is part of the TECHNOLOGY AND CITIZEN SCIENCE track*
Robert Costello (Smithsonian’s National Museum of Natural History), William McShea (Smithsonian Conservation Biology Institute), Roland Kays (North Carolina Museum of Natural Sciences), Tavis Forrester (Smithsonian Conservation Biology Institute)
eMammal Citizen Scientist Data Quality and Cyber Infrastructure
eMammal is a collaboration among scientists and educators across multiple institutions and citizen scientists to determine how mammal populations are affected by human and environmental factors. The program has recruited around 200 citizen scientists to deploy camera traps across the Mid-Atlantic region of the United States. In the past two years this effort has yielded 2.5 million photos from 2,300 camera trap deployments. Camera traps use motion sensors to detect wildlife within a field of view; the sensors trigger the camera to automatically capture images of the animals. eMammal developed a cyber infrastructure that enables all parties to focus on the tasks they do best: collecting and analyzing a large mammal population data set, evaluating the performance and knowledge base of public participants, and supporting communication across the community.
This presentation will give an overview of the cyber infrastructure used by both citizen scientists and scientists; the data workflow and management; an analysis of the types and rates of errors associated with the different tasks performed by a sizeable volunteer corps managing a large dataset; and the reality of achieving a level of data quality that makes the science authoritative and repeatable.
Because the data collected by citizen scientists will be used to statistically model the effects of land use on wildlife populations, actual land use decisions could be influenced by this study. The quality of the data collected by citizen scientists must therefore be known and fall within a margin of error that does not affect the analyses. The citizen scientists participating in this project collect and manage the metadata associated with each image. They review each sequence of images and tag it with a species identification and the number of animals captured in the photos before uploading the package to the cloud-based eMammal software environment. Experts review the images and tags, and the error rate is captured and analyzed for behavioral patterns and for ways software can mitigate the tendencies human beings have to introduce error into large camera-trapping datasets. Here we report on data quality, patterns of error, the cyber infrastructure, and solutions for increasing the accuracy of the data produced by a project reliant on citizen scientists.
Deploying camera traps requires some skill and the ability to follow a protocol, yet even properly deployed cameras sometimes fail, and the deployment must then be rejected. The citizen scientists working with eMammal have deployed cameras with a success rate of 94%; the 6% failure rate is acceptable for the purposes of this study and comparable to deployment failure rates for other camera-trapping projects our scientists have conducted without citizen scientists. Volunteers correctly recognize and identify species, and count the number of animals in an image, 96% of the time overall. The remaining 4% of errors are corrected by experts and are the focus of our efforts to reduce error to a level that no longer requires expert involvement, or that at least limits the labor hours experts spend correcting metadata. We have found patterns of error associated with the level of difficulty of the identification task.
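The accuracy bookkeeping described above can be sketched in a few lines. The tag structure (dictionaries keyed by an image-sequence id) is an illustrative assumption, not eMammal's actual schema:

```python
def identification_accuracy(volunteer_tags, expert_tags):
    """Fraction of sequences where the volunteer's species tag matches
    the expert review, the overall figure reported as 96% above.
    Both arguments map a sequence id to a species label (assumed format)."""
    matches = sum(1 for seq_id, species in volunteer_tags.items()
                  if expert_tags.get(seq_id) == species)
    return matches / len(volunteer_tags)

# Tiny worked example: two of three volunteer tags agree with the expert review.
volunteer = {"seq-1": "Gray Squirrel", "seq-2": "Fox Squirrel", "seq-3": "White-tailed Deer"}
expert = {"seq-1": "Gray Squirrel", "seq-2": "Gray Squirrel", "seq-3": "White-tailed Deer"}
print(round(identification_accuracy(volunteer, expert), 2))  # 0.67
```

In production the same comparison would run over the full expert-reviewed dataset rather than a toy dictionary.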
Some species are difficult to distinguish because closely related species occur within the study area (for example, Gray Squirrels and Fox Squirrels); some images capture only part of an animal; and some animals are quite small, partially concealed, or hard to find in the background or on the periphery of a nighttime photograph. When citizen scientists cannot find an animal in a photo they tag the image with ‘no animal’; 31,000 of the 2.5 million images are tagged this way. One solution to finding cryptic animals in photos is computer vision. With computer vision integrated, the eMammal software can detect animals in frames and place a red bounding box around them. Once an animal is discovered, the citizen scientist can then identify it.
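The detection-review step can be sketched as a simple confidence filter over model output. The box format and threshold here are assumptions for illustration, not eMammal's actual detection API; in practice the red box itself might be drawn with a call such as OpenCV's `cv2.rectangle`:

```python
def boxes_to_review(detections, threshold=0.5):
    """Keep candidate animal detections confident enough to show a
    volunteer for confirmation. Each detection is a hypothetical tuple
    (x, y, width, height, score); the format is illustrative only."""
    return [d for d in detections if d[4] >= threshold]

# A strong candidate survives the filter; low-score sensor noise does not.
candidates = [(120, 340, 60, 40, 0.91), (10, 10, 5, 5, 0.12)]
print(boxes_to_review(candidates))  # [(120, 340, 60, 40, 0.91)]
```

Thresholding before display keeps volunteers from wading through spurious boxes while still surfacing cryptic animals they might otherwise miss.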
Species mislabeling is concentrated in a small subset of similar species. Because only a few species account for a disproportionate amount of the error, there is an opportunity to help citizen scientists distinguish between similar species and improve their overall accuracy. The citizen scientists working on the eMammal project have been tested on their wildlife knowledge before and after becoming participants, and their scores have been compared to a control group, so error rates can be examined with respect to individual effort and knowledge gain.
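One way to surface the species pairs that drive most of the error is a simple confusion tally over the expert-reviewed tags. As above, the dictionary-based tag structure is an assumption for illustration:

```python
from collections import Counter

def confusion_pairs(volunteer_tags, expert_tags):
    """Count (volunteer label, expert label) pairs on disagreements,
    most frequent first, revealing which similar species are confused
    most often. Tag dicts keyed by sequence id are an assumed format."""
    pairs = Counter()
    for seq_id, guess in volunteer_tags.items():
        truth = expert_tags.get(seq_id)
        if truth is not None and truth != guess:
            pairs[(guess, truth)] += 1
    return pairs.most_common()

# Toy example: Fox/Gray Squirrel confusion dominates the error.
volunteer = {"a": "Fox Squirrel", "b": "Fox Squirrel", "c": "Coyote", "d": "Red Fox"}
expert = {"a": "Gray Squirrel", "b": "Gray Squirrel", "c": "Coyote", "d": "Coyote"}
print(confusion_pairs(volunteer, expert))
# [(('Fox Squirrel', 'Gray Squirrel'), 2), (('Red Fox', 'Coyote'), 1)]
```

Ranking pairs this way points training materials (and any in-software hints) at exactly the species comparisons where volunteers need the most help.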