Ethics and Reproducibility

As part of our discussion of the ethics of using big data and the reproduction of existing studies, half of the students involved in this debate watched this webinar, a case study by geographer Xun Shi. It presents a bottom-up approach to epidemic modeling: starting from individual-level data and aggregating upward, rather than starting from aggregate data and working down.

Shi uses the metaphor of forests and trees to describe this process. Consider first the top-down approach to analyzing epidemics and other events. By working at a broad level (looking at the forest), one can grasp the general trends affecting an entire system. This certainly has its benefits: a less invasive analysis at a larger scale can still provide valuable insights to inform policy interventions, without the access or privacy issues that a study centered on big data may raise.

Shi also offers a bottom-up approach, which differs from the path studies typically take. With the wider availability of big data, analysis at the level of the person (the trees in this metaphor) is becoming increasingly easy to conduct. At the level of the individual, it may be possible to identify issues that are not apparent in a data set that has already been aggregated: a forest may seem healthy, yet a small infection in one tree can pose a significant risk to the whole. This is the logic behind the study. Shi builds an “epidemic forest” from the trajectories and contacts of individuals, tracing those contacts to produce a model that maps transmission relationships. Such a model could help predict future epidemics and inform policy interventions in those cases.
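To make the idea of an “epidemic forest” more concrete, here is a minimal sketch in Python. It is not Shi's method, which the webinar summary above does not specify in detail. It is a toy heuristic under stated assumptions: each person has a known infection time, each contact is a record of two people and a time, and a person's likely infector is whoever they most recently met who was already infected. The function names, the data layout, and the sample records are all hypothetical.

```python
def build_epidemic_forest(infection_time, contacts):
    """Link each infected person to a likely infector.

    infection_time: dict mapping person -> time of infection (hypothetical data).
    contacts: iterable of (person_a, person_b, time) contact records.
    Returns a dict mapping each infected person to a parent, or None for a root.
    Assumption: the most recent contact with an already-infected person
    is treated as the transmission event. This is a toy rule, not a real model.
    """
    parent = {person: None for person in infection_time}
    latest_contact = {}
    for a, b, t in contacts:
        # A contact is undirected, so test both possible directions.
        for source, target in ((a, b), (b, a)):
            if source not in infection_time or target not in infection_time:
                continue  # one of the two was never infected
            source_first = infection_time[source] < infection_time[target]
            in_window = infection_time[source] <= t <= infection_time[target]
            if source_first and in_window:
                if target not in latest_contact or t > latest_contact[target]:
                    latest_contact[target] = t
                    parent[target] = source
    return parent


def roots(parent):
    """People with no identified infector: the base of each tree in the forest."""
    return sorted(p for p, infector in parent.items() if infector is None)


# Invented example records, for illustration only.
infection_time = {"A": 0, "B": 3, "C": 5, "D": 1}
contacts = [("A", "B", 2), ("B", "C", 4), ("A", "C", 1), ("D", "E", 2)]

forest = build_epidemic_forest(infection_time, contacts)
print(forest)         # maps each person to a likely infector
print(roots(forest))  # people who start their own tree
```

Even this toy version shows why the approach needs individual-level records: the trees cannot be recovered once contacts have been aggregated into counts per region.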

Unfortunately, this raises concerns about the privacy of such individual data. Even the “big data” used in this and similar studies is generally already aggregated to some level (though still at a fine resolution), yet collecting and using it risks a serious invasion of personal privacy and autonomy. Controls are placed on how such data can and should be used, but these rest on the assumption that researchers are not bad actors and will adhere to the research plans they submitted to obtain the data. As fine-grained data mapping individual trajectories becomes more commonly used, especially outside academia, it risks enabling the over-policing of individuals, as well as inadvertently revealing what might otherwise be considered sensitive information.

Creative Commons License This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.