- Open Access
Limits of data anonymity: lack of public awareness risks trust in health system activities
Life Sciences, Society and Policy volume 17, Article number: 7 (2021)
Public trust is paramount for the proper functioning of data-driven healthcare activities such as digital health interventions, contact tracing or the build-up of electronic health records. As the use of personal data is the common denominator of these activities, healthcare actors have an interest in ensuring the privacy and anonymity of the personal data they depend on. Maintaining the privacy and anonymity of personal data contributes to the trustworthiness of these activities and is associated with the public’s willingness to entrust them with personal data. An analysis of online news readership comments about the failed care.data programme in England revealed that parts of the public have a false understanding of anonymity in the context of protecting the privacy of personal data used for healthcare management and medical research. Some commenters demanded complete anonymity of their data as a condition for trusting the process of data collection and analysis. Because this demand cannot be fulfilled and the resulting trust rests on a false understanding of anonymity, the inability to meet it risks undermining public trust. Since public concerns about the anonymity and privacy of personal data appear to be increasing, a large-scale information campaign about the limits and possibilities of anonymity with respect to the various uses of personal health data is urgently needed to help the public make better-informed choices about providing personal data.
There is broad public and professional agreement that high levels of trust are critical for the proper functioning of data-driven activities within health systems (Lawler et al. 2018). Sztompka defines trust as “a bet about the future contingent actions of others” (Sztompka 1999, 25). In data-driven activities, trust is a relational construct: data donors entrust the health system with their data in anticipation of a net benefit for the health system, the donor and society (Gille et al. 2020). Examples of such trust relationships include the build-up of electronic health records to improve health management and quality of care (Hays and Daker-White 2015), the collection of data to curb the coronavirus disease 2019 (COVID-19) pandemic (Ienca and Vayena 2020), the donation of personal data and biospecimens to biobanks for personalized health research (Brall et al. 2021), and the acceptance of digital health interventions in general (Brall et al. 2019). As the use of personal data is the common denominator of these health system activities, all of them have an interest in ensuring the privacy and anonymity of the personal data they depend on. Recognising that both are contested concepts, we understand anonymity as follows: “… one has anonymity or is anonymous when others are unable to relate a given feature of the person to other characteristics” (Wallace 1999, 24). Wallace’s definition suits the issue discussed in this article because Wallace’s work on anonymity centres on public perceptions, social interactions and information systems. Regarding privacy, Ostherr and colleagues argue that “health data privacy is not a stable natural object that has value regardless of the subjects who enact it; rather, health data privacy is a multifaceted cultural artifact that becomes assembled and maintained within a complex ecology of alliances and disconnections.” (Ostherr et al. 2017, 6).
In contrast to fixed definitions of privacy, one could follow Nissenbaum and understand privacy as contextual integrity, whereby privacy is defined by appropriate information flows in a given context. Appropriateness is shaped by the norms and values of that context, and the context is described by a set of parameters: data subject, data sender, data recipient, information type and transmission principle (Nissenbaum 2010). Furthermore, privacy is largely governed by existing oversight systems, laws and regulations rather than controlled by individuals (Ballantyne 2020). Hence, for many individuals, whether data users and institutions can be trusted to maintain anonymity and privacy is a critical issue.
Research shows that concerns about privacy affect patients’ willingness to share their medical history (Walker et al. 2017). For donors to make an informed decision about sharing medical data and to build trust in the health actors using the data, a precise understanding of what data anonymity means is essential. Otherwise, both the decision to share data and the donor’s subsequent trust rest on false assumptions.
In previous research on public trust in the English health system, we observed a tension between the limits of data anonymity and the rigorous expectation of parts of the public that their data be completely anonymised throughout their use, both for the analysis of health service performance and for medical research (Gille et al. 2020). While a popular view is that full anonymity should be guaranteed in all circumstances, the view in the research arena is that full anonymity is not feasible. From a research perspective, different data sets need to be linked, because linked personal health data offer the greatest value in the public interest, notably for scientific discovery. Data linkage offers many benefits, for example: increased accuracy in long-term studies; the possibility of having a treatment and a control group in one data set; the ability to study the impact of the environment on health by linking environmental data with health data; and research on rare diseases in large data sets (Wellcome Trust 2015). The observed tension threatens public trust in the health system, as the public regards the preservation of data anonymity as an important element of that trust. In this article, we aim to highlight this observation and to discuss possible remedies.
Members of the public demand complete anonymity, but science recognizes the limits of anonymity in an era of linked health data
As part of a research project on public trust in the health system, a secondary analysis of British national online news articles (published between 2013 and 2015) and their readership comments about the NHS’s care.data programme (articles: British Broadcasting Corporation n = 2; Daily Mail n = 16; Guardian n = 14; Independent n = 15; Telegraph n = 11; readership comments in total: n = 1625) identified the importance of data anonymity for public trust in the health system (Gille et al. 2020). Care.data aimed to link and share general practitioner and hospital data about individuals’ care to improve the quality of care for all. Yet the programme failed as a result of public concerns about the ability to keep sensitive information secure and the potential for commercial gain to be made from patients’ personal data (Carter et al. 2015; NHS 2014). In short, “The programme aimed at securing the bare minimum of trust while maximising potential returns on investment. It thus quickly dismissed privacy and respect for individual autonomy as individualistic rights opposing wider prosperity, rather than seeing them as principles of social trust and public engagement” (Vezyridis and Timmons 2019). The news outlets were chosen to achieve national coverage. News articles covering care.data were identified in 2015 through Google.com or the search functions of the news websites. Readership comments were downloaded and analysed inductively using a thematic analysis approach in NVIVO 9.
From a wide range of readership comments, the following direct quotes show how parts of the public understand the role of anonymity for trust in the care.data programme and the wider health system. Other readership comments undoubtedly suggest that other parts of the public do understand the limits of anonymity. At the time, news articles in England discussed the possible introduction of the care.data programme and its implications for the NHS in England. Mirroring the critical voices of many professional associations and members of the public, the majority of articles were written in a critical, if not negative, tone towards the care.data programme.
… Destroy our faith and trust … and there will be no return. The only way to do this is either absolute and unbreakable anonymity or not at all …
Comment on: (Chapman and Dolan 2014)
I might trust that the information is well-protected, completely anonymous and requiring consent but one failing (and it doesn’t have to be my information specifically) and trust in the system is gone.
Comment on: (Naughton 2014)
… I’d love my data to be shared, with complete anonymity.
Comment on: (Goldacre 2014)
The quotes above show that the underlying fear of parts of the public regarding anonymity is that anonymised data will be re-identified and misused. The comments show that such a breach of confidentiality is anticipated and state that it will lower trust in the health system. Fear of data misuse is a major concern for individuals considering sharing their health data, as various surveys confirm (Middleton et al. 2019; Milne et al. 2019; Melas et al. 2010). Further, a study on public attitudes towards commercial access to health data is consistent with the quotes above, as it shows that members of the public have a limited understanding of the purpose of data aggregation and anonymisation (IpsosMORI 2016). In addition, several studies have found that the public is often unaware of research processes and of how personal data are used in practice (Hill et al. 2013; Aitken et al. 2016; Wellcome Trust 2013; IpsosMORI 2016). By contrast, those who have already participated in research studies involving the sharing of genetic health data indicate that an important motivator for them is the possibility of receiving individual study results (Goodman et al. 2018; Clayton et al. 2018; Brall et al. 2021). Clearly, individual results can only be returned when data are not fully anonymised and individuals can be re-contacted. These findings from the US and Swiss contexts support our view that, if sceptical members of the public better understood how data are used and what the benefits of different levels of anonymity are, they might weigh the utility of receiving individual results against the demand for full anonymity.
In contrast to the quotes above calling for full anonymity, the scientific community has stressed for some time, across various research areas, that maintaining complete anonymity in the health system will become increasingly difficult. As early as 2008, Lunshof et al. stated clearly that, in the context of genetic research, data donors “need to realize that they are potentially identifiable and that their privacy cannot be guaranteed” (Lunshof et al. 2008, 409). This split between (a) the scientific debate about, and knowledge of, the limits of anonymity and (b) the wish of parts of the public for full anonymity opens a gap between science and the public that needs to be closed to protect public trust in data use.
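The point that removing names does not by itself guarantee anonymity can be made concrete with a minimal Python sketch of a linkage attack. It rests on loudly stated assumptions: all records, field names and the function `link_on_quasi_identifiers` are hypothetical and invented for illustration, and the example is not drawn from care.data or from any study cited here. A “de-identified” extract keeps quasi-identifiers (postcode district, date of birth and sex), and matching these against a public register that also contains names re-identifies every record whose combination is unique.

```python
from typing import Dict, List, Tuple

# Hypothetical quasi-identifiers: fields that are not names but can single people out.
QUASI_IDENTIFIERS = ("postcode_district", "birth_date", "sex")

# Toy "de-identified" health extract: names removed, quasi-identifiers retained.
health_extract: List[Dict[str, str]] = [
    {"postcode_district": "LS6", "birth_date": "1961-03-14", "sex": "F", "diagnosis": "type 2 diabetes"},
    {"postcode_district": "M14", "birth_date": "1985-11-02", "sex": "M", "diagnosis": "asthma"},
]

# Toy public register (hypothetical) that contains names plus the same fields.
public_register: List[Dict[str, str]] = [
    {"name": "Alice Example", "postcode_district": "LS6", "birth_date": "1961-03-14", "sex": "F"},
    {"name": "Bob Example", "postcode_district": "M14", "birth_date": "1985-11-02", "sex": "M"},
    {"name": "Carol Example", "postcode_district": "M14", "birth_date": "1990-06-21", "sex": "F"},
]


def key(record: Dict[str, str]) -> Tuple[str, ...]:
    """Build the linkage key from the quasi-identifier fields."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link_on_quasi_identifiers(
    extract: List[Dict[str, str]], register: List[Dict[str, str]]
) -> List[Tuple[str, str]]:
    """Return (name, diagnosis) pairs where a key matches exactly one register entry."""
    index: Dict[Tuple[str, ...], List[str]] = {}
    for person in register:
        index.setdefault(key(person), []).append(person["name"])
    matches = []
    for row in extract:
        candidates = index.get(key(row), [])
        if len(candidates) == 1:  # a unique match re-identifies the record
            matches.append((candidates[0], row["diagnosis"]))
    return matches


print(link_on_quasi_identifiers(health_extract, public_register))
```

Both toy records are re-identified because each combination of quasi-identifiers occurs only once in the register. Coarsening or suppressing such fields can lower this risk, usually at some cost to the usefulness of the data, which is one reason why researchers treat complete anonymity of linkable data as unattainable rather than as a guarantee.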
Initiating public engagement activities on the limits of anonymity
Since the scientific community has discussed the limits of anonymity for some time (Kaye 2012; Dankar et al. 2019), it is worrying that parts of the public are still not fully informed about these limits and about what anonymity means in the context of national data programmes and health research. People with false expectations are at risk of feeling betrayed because of an unnecessary knowledge gap. This gap will become increasingly dangerous as current health research depends more and more on linked data, e.g. in genomics research or disease control. Because personal data are increasingly disclosed and used in all areas of life, a reasonable understanding of what anonymity implies is advantageous. Further, if the public is well informed about anonymity and data linkage for research, it is much more likely to support research using linked data sets and to trust research institutions (Aitken et al. 2016). Savage (2016) argues that, because full anonymity is impossible, anonymity is not the solution to privacy concerns, and that it would therefore be sensible to openly discuss and explain the benefits and risks of identification in the consent process, and even before the public comes into close contact with research. Hence, there is a strong case for a large-scale information campaign to discuss with the public what anonymity means in the context of health research. When planning such a campaign, several questions need to be considered, foremost: What is the right format to address public concerns in a meaningful way? Who should lead such an endeavour? What are effective arguments?
The way forward to close this knowledge gap is likely a mixed approach that includes open public debate about concerns regarding anonymity and public engagement in health data governance processes, through which public participants are not only empowered but also improve their health data literacy (Fischer 2012). Initiatives such as Understanding Patient Data can help support public debate in a meaningful way (Understanding Patient Data 2020). Recent research on fair partnerships within the NHS in England and on the use of NHS patients’ data confirms that public participation in data governance processes, as a way to empower citizens, is favourable for increasing trust (Understanding Patient Data and Ada Lovelace Institute 2020). Public participation can take the form of citizen juries or of alternative health data governance models such as health data cooperatives, in which cooperative members have control over health data processes within the cooperative (Blasimme et al. 2018). Besides these ways of actively involving the public, health system representatives need to demonstrate how privacy can still be protected even though full anonymity cannot be provided. Here, the responsibility to inform the public lies with the health data users. The information should directly address public concerns about anonymity and privacy and explain how privacy can be protected when data are used in a non-anonymised form. In answer to the broader question ‘How is data kept safe?’, the Understanding Patient Data initiative suggests removing identifying information, independent review processes, legal contracts and security standards (Understanding Patient Data 2021).
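The first safeguard listed above, removing identifying information, can be implemented as pseudonymisation rather than full anonymisation, which is what keeps data sets linkable. The following Python sketch illustrates one common technique, keyed hashing with HMAC-SHA-256 from the standard library. This is an assumption chosen for illustration, not the method of Understanding Patient Data, care.data or any NHS system, and the key, identifiers and the function `pseudonymise` are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret held only by a trusted data custodian, never by analysts.
CUSTODIAN_KEY = b"example-secret-key-held-by-custodian"


def pseudonymise(identifier: str, key: bytes = CUSTODIAN_KEY) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA-256)."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# Two hypothetical sources that share a (fictitious) patient identifier.
gp_records = [{"patient_id": "PATIENT-0001", "gp_event": "blood pressure check"}]
hospital_records = [{"patient_id": "PATIENT-0001", "admission": "cardiology"}]

# The custodian drops the identifier and substitutes the pseudonym in both sources.
gp_pseudo = [{"pid": pseudonymise(r["patient_id"]), "gp_event": r["gp_event"]} for r in gp_records]
hosp_pseudo = [{"pid": pseudonymise(r["patient_id"]), "admission": r["admission"]} for r in hospital_records]

# Analysts can still link the two sources on the pseudonym without seeing the identifier...
linked = [{**g, **h} for g in gp_pseudo for h in hosp_pseudo if g["pid"] == h["pid"]]
print(len(linked))  # 1

# ...but whoever holds the key can recompute the pseudonym for a known person,
# so the linked data are pseudonymised, not anonymous.
print(linked[0]["pid"] == pseudonymise("PATIENT-0001"))  # True
```

This design choice matters for the argument in the text: because the same input always yields the same pseudonym under a given key, linkage remains possible, but so does re-identification by anyone who obtains the key. Protecting privacy then depends on the other safeguards mentioned, such as independent review, legal contracts and security standards, rather than on anonymity alone.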
To generate trust, the information conveyed should ideally (a) draw on comparative experiences to create a feeling of familiarity with the intended data use; (b) present the data user’s current capabilities, to show that they can protect privacy without fully anonymising data; and (c) explain how privacy will be maintained in the future (Gille and Brall 2021; Gille et al. 2020). The finer details of the information will remain country- and context-specific. In general, the effort to inform the public about anonymity, and thereby contribute to trust building, should be shared among all actors involved in health data use (Gille et al. 2017). Every actor has to take responsibility for improving health data use, since “actions and circumstances of one stakeholder also affect a variety of other stakeholders within the network” (Brall 2018, 129). Fig. 1, adapted from our own research, shows such a network of actors involved in health data use. The public is at the centre of the figure and provides health data. The grey-shaded actors are health data users. The media (including social media) might not initially be considered a health data user, but it is an important actor and communication channel that influences the public understanding of anonymity.
Actors that can shape the discussion around anonymity include academia, which performs research using health data and can contribute expert opinion, and the government, which shapes the regulatory landscape in which health data use takes place. Further actors are public institutions (including research and healthcare service institutions), for-profit organisations (such as pharmaceutical or health technology companies) and non-profit organisations (such as charities or professional associations). To support this endeavour on a broader scale, the media should be involved, as they are a major outreach instrument and, as the readership comments above show, members of the public engage in such public forums. Previous research on health campaign communication strategies also suggests involving media outlets (Wakefield et al. 2010). For example, newspapers with a high national readership could address these topics weekly through a health data series in which readers ask questions and experts or health data users provide answers. Rethinking networks and their interdependencies to achieve sustainable structures is key, not only in public health in general but also in health data use specifically. Informing people about anonymity only during an informed consent process (where applicable) will not be sufficient, as this comes too late in the overall recruitment process: some people will have been put off participating long before reaching the consent stage because of a false understanding of anonymity. That said, for participants who do engage in the consent process, it remains an effective and important means of conveying the limits of anonymity and of clearly describing the practical limitations of data anonymisation.
In summary, stakeholders have to work together to involve the public:
To openly discuss and explain the benefits and risks concerning identification.
To empower citizens by engaging them in decisions about health data use.
To demonstrate how privacy can be protected despite not being able to provide full anonymity.
If the public is well informed about the limits of anonymity, people can adapt their expectations of anonymity and make better-informed choices. This would not only benefit research but also support public trust in the wider health system.
Availability of data and materials
All data generated or analysed during this study are included in this published article.
Aitken, Mhairi, Jenna de St. Jorre, Claudia Pagliari, Ruth Jepson, and Sarah Cunningham-Burley. 2016. Public responses to the sharing and linkage of health data for research purposes: A systematic review and thematic synthesis of qualitative studies. BMC Medical Ethics 17 (1): 73. https://doi.org/10.1186/s12910-016-0153-x.
Ballantyne, Angela. 2020. How should we think about clinical data ownership? Journal of Medical Ethics 46 (5): 289–294. https://doi.org/10.1136/medethics-2018-105340.
Blasimme, Alessandro, Effy Vayena, and Ernst Hafen. 2018. Democratizing health research through data cooperatives. Philosophy and Technology 31 (3): 473–479. https://doi.org/10.1007/s13347-018-0320-8.
Brall, Caroline, Els Maeckelberghe, Rouven Porz, Jihad Makhoul, and Peter Schröder-Bäck. 2017. Research ethics 2.0: New perspectives on norms, values, and integrity in genomic research in times of even scarcer resources. Public Health Genomics 20 (1): 27–35. https://doi.org/10.1159/000462960.
Brall, Caroline. 2018. Health under austerity in Europe: Ethical considerations. PhD thesis, Maastricht University. https://doi.org/10.26481/dis.20180709cb.
Brall, Caroline, Claudia Berlin, Marcel Zwahlen, Kelly E. Ormond, Matthias Egger, and Effy Vayena. 2021. Public willingness to participate in personalized health research and biobanking: A large-scale Swiss survey. Plos One 16 (4): e0249141. https://doi.org/10.1371/journal.pone.0249141.
Brall, Caroline, Peter Schröder-Bäck, and Els Maeckelberghe. 2019. Ethical aspects of digital health from a justice point of view. European Journal of Public Health 29 (Supplement_3): 18–22. https://doi.org/10.1093/eurpub/ckz167.
Carter, Pam, Graeme T. Laurie, and Mary Dixon-Woods. 2015. The social licence for research: Why care.data ran into trouble. Journal of Medical Ethics 41 (5): 404–409. https://doi.org/10.1136/medethics-2014-102374.
Chapman, James, and Andy Dolan. 2014. “Cashing in on patient records to be banned: But You’ll still have to opt out to keep private details off database.” Mail Online, 2014. https://www.dailymail.co.uk/news/article-2570567/Cashing-patient-records-banned-But-youll-opt-private-details-database.html. Accessed 29 June 2021.
Clayton, Ellen W., Colin M. Halverson, Nila A. Sathe, and Bradley A. Malin. 2018. A systematic literature review of individuals’ perspectives on privacy and genetic information in the United States. Plos One 13 (10): e0204417. https://doi.org/10.1371/journal.pone.0204417.
Dankar, Fida K., Marton Gergely, and Samar K. Dankar. 2019. Informed consent in biomedical research. Computational and Structural Biotechnology Journal 17: 463–474. https://doi.org/10.1016/j.csbj.2019.03.010.
Fischer, Frank. 2012. Participatory governance: From theory to practice. In The Oxford Handbook of Governance, edited by David Levi-Faur. https://doi.org/10.1093/oxfordhb/9780199560530.013.0032.
Gille, Felix, and Caroline Brall. 2021. Can we know if donor trust expires? About trust relationships and time in the context of open consent for future data use. Journal of Medical Ethics. https://doi.org/10.1136/medethics-2020-106244.
Gille, Felix, Sarah Smith, and Nicholas Mays. 2017. Towards a broader conceptualisation of ‘public trust’ in the health care system. Social Theory & Health 15 (1): 25–43. https://doi.org/10.1057/s41285-016-0017-y.
Gille, Felix, Sarah Smith, and Nicholas Mays. 2020. What is public trust in the healthcare system? A new conceptual framework developed from qualitative data in England. Social Theory & Health 19 (1): 1–20. https://doi.org/10.1057/s41285-020-00129-x.
Goldacre, Ben. 2014. “The NHS plan to share our medical data can save lives – But must be done right.” The Guardian. 2014. https://www.theguardian.com/society/2014/feb/21/nhs-plan-share-medical-data-save-lives. Accessed 29 June 2021.
Goodman, Deborah, Deborah Bowen, Lari Wenzel, Paris Tehrani, Francis Fernando, Araksi Khacheryan, Farihah Chowdhury, Catherine O. Johnson, and Karen Edwards. 2018. The research participant perspective related to the conduct of genomic cohort studies: A systematic review of the quantitative literature. Translational Behavioral Medicine 8 (1): 119–129. https://doi.org/10.1093/tbm/ibx056.
Hays, Rebecca, and Gavin Daker-White. 2015. The care.data consensus? A qualitative analysis of opinions expressed on twitter. BMC Public Health 15 (1): 838. https://doi.org/10.1186/s12889-015-2180-9.
Hill, Elizabeth M., Emma L. Turner, Richard M. Martin, and Jenny L. Donovan. 2013. ‘Let’s get the best quality research we can’: Public awareness and acceptance of consent to use existing data in health research: A systematic review and qualitative study. BMC Medical Research Methodology 13 (1): 72. https://doi.org/10.1186/1471-2288-13-72.
Ienca, Marcello, and Effy Vayena. 2020. On the responsible use of digital data to tackle the COVID-19 pandemic. Nature Medicine 26 (4): 463–464. https://doi.org/10.1038/s41591-020-0832-5.
IpsosMORI. 2016. The one-way mirror: Public attitudes to commercial access to health data.
Kaye, Jane. 2012. The tension between data sharing and the protection of privacy in genomics research. Annual Review of Genomics and Human Genetics 13 (1): 415–431. https://doi.org/10.1146/annurev-genom-082410-101454.
Lawler, Mark, Andrew D. Morris, Richard Sullivan, Ewan Birney, Anna Middleton, Lydia Makaroff, Bartha M. Knoppers, Denis Horgan, and Alexander Eggermont. 2018. A roadmap for restoring trust in big data. The Lancet Oncology 19 (8): 1014–1015. https://doi.org/10.1016/S1470-2045(18)30425-X.
Lunshof, Jeantine E., Ruth Chadwick, Daniel B. Vorhaus, and George M. Church. 2008. From genetic privacy to open consent. Nature Reviews Genetics 9 (5): 406–411. https://doi.org/10.1038/nrg2360.
Melas, Philippe A., Louise K. Sjöholm, Tord Forsner, Maigun Edhborg, Niklas Juth, Yvonne Forsell, and Catharina Lavebratt. 2010. Examining the public refusal to consent to DNA biobanking: Empirical data from a Swedish population-based study. Journal of Medical Ethics 36 (2): 93–98. https://doi.org/10.1136/jme.2009.032367.
Middleton, Anna, Richard Milne, Adrian Thorogood, Erika Kleiderman, Emilia Niemiec, Barbara Prainsack, Lauren Farley, Paul Bevan, Claire Steed, James Smith, Danya Vears, Jerome Atutornu, Heidi C. Howard, and Katherine I. Morley. 2019. Attitudes of publics who are unwilling to donate DNA data for research. European Journal of Medical Genetics 62 (5): 316–323. https://doi.org/10.1016/j.ejmg.2018.11.014.
Milne, Richard, Katherine I. Morley, Heidi Howard, Emilia Niemiec, Dianne Nicol, Christine Critchley, Barbara Prainsack, et al. 2019. Trust in genomic data sharing among members of the general public in the UK, USA, Canada and Australia. Human Genetics 138 (11): 1237–1246. https://doi.org/10.1007/s00439-019-02062-0.
Naughton, John. 2014. “Why your health secrets may no longer be safe with your GP.” The Guardian. 2014. https://www.theguardian.com/society/2014/jan/26/health-secrets-not-safe-with-gp. Accessed 29 June 2021.
NHS. 2014. “Better Information Means Better Care: NHS Contacts All English Households from Today.”
Nissenbaum, Helen. 2010. Privacy in Context: Technology, Policy, and the Integrity of Social Life. Stanford: Stanford law books, an imprint of Stanford University Press.
Ostherr, Kirsten, Svetlana Borodina, Rachel Conrad Bracken, Charles Lotterman, Eliot Storer, and Brandon Williams. 2017. Trust and privacy in the context of user-generated health data. Big Data & Society 4 (1): 2053951717704673. https://doi.org/10.1177/2053951717704673.
Savage, Neil. 2016. “Privacy: The Myth of Anonymity.” Nature 537: S70. https://doi.org/10.1038/537S70a.
Sztompka, Piotr. 1999. Trust: A Sociological Theory. Cambridge: Cambridge University Press.
Understanding Patient Data. 2020. “Supporting conversations.” https://understandingpatientdata.org.uk/supporting-conversations.
Understanding Patient Data. 2021. “How is data kept safe?” https://understandingpatientdata.org.uk/how-data-kept-safe. Accessed 4 July 2021.
Understanding Patient Data, and Ada Lovelace Institute. 2020. “Foundations of fairness: Where next for NHS health data partnerships?”
Vezyridis, Paraskevas, and Stephen Timmons. 2019. Resisting big data exploitations in public healthcare: Free riding or distributive justice? Sociology of Health & Illness 41 (8): 1585–1599. https://doi.org/10.1111/1467-9566.12969.
Wakefield, Melanie A., Barbara Loken, and Robert C. Hornik. 2010. Use of mass media campaigns to change health behaviour. Lancet (London, England) 376 (9748): 1261–1271. https://doi.org/10.1016/S0140-6736(10)60809-4.
Walker, Daniel M., Tyler Johnson, Eric W. Ford, and Timothy R. Huerta. 2017. Trust me, I’m a doctor: Examining changes in how privacy concerns affect patient withholding behavior. Journal of Medical Internet Research 19 (1): e2. https://doi.org/10.2196/jmir.6296.
Wallace, Kathleen A. 1999. Anonymity. Ethics and Information Technology 1 (1): 21–31. https://doi.org/10.1023/A:1010066509278.
Wellcome Trust. 2013. Summary report of qualitative research into public attitudes to personal data and linking personal data. Wellcome Trust, London.
Wellcome Trust. 2015. Enabling data linkage to maximise the value of public health research data: Full report. Wellcome Trust, London.
We would like to thank Prof Nicholas Mays and Dr. Sarah Smith, both from London School of Hygiene and Tropical Medicine, for their valuable guidance and encouragement as supervisors during Felix’s PhD, as well as Dr. Peter Schröder-Bäck from Maastricht University for the fruitful exchange and supervision during Caroline’s PhD. The outcome of these collaborations forms the basis for this article.
No funding was received.
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Gille, F., Brall, C. Limits of data anonymity: lack of public awareness risks trust in health system activities. Life Sci Soc Policy 17, 7 (2021). https://doi.org/10.1186/s40504-021-00115-9
- Identifiable data
- Privacy protection
- Data literacy