Tomorrow I will be giving an invited talk at the Human Genome Variation 2015 conference in San Francisco. The title of my talk is “Impediments to Interoperability: Why Can’t We All Just Get Along?” For ethical and scientific reasons, some of which I will briefly outline in my talk, data sharing is imperative in genomics. And yet, very little data sharing actually happens. Why?
Technical standards are an integral part of the problem: systems need to be able to interact with one another at the level of bits and bytes. But making technical standards is the easy part. Organizational issues, often overlooked, can impede interoperability just as effectively as a lack of common technical standards. For example, if a system requires that users be recognized as bona fide researchers, which authority is empowered to make this assertion? If there are multiple such authorities, how do they interact with one another? Are assertions made by one authority necessarily respected by the others? Are their assertions semantically equivalent? The assertion that a given person is a legitimate researcher depends on knowing the person’s identity, so which authority (or authorities) has verified that identity, and to what degree? And so on.
Even organizational issues seem minor in comparison to the many systemic factors and disincentives that frustrate interoperability. For example, academic or commercial competition can (and often does) prevent systems from working together. Privacy risks and fear of liability impede the flow of data, as do national laws and institutional policies. These issues appear to be almost insurmountable barriers to interoperability, but I will argue that organizations like the Global Alliance for Genomics and Health are ideally positioned to address these challenges, and indeed are making progress in understanding the underlying dynamics through a series of high-profile demonstrator projects. If we can clearly understand the factors that impede interoperability, then we can design systems that account for them rather than wishing (in vain) that they did not exist. This reality-based approach is the best way to unlock the potential of data sharing.
The official abstract for my talk:
At this very moment, a hard drive full of data is lying idle in a box under a desk in a nondescript laboratory. Buried among the countless gigabytes, there is a single bit of information that can save a life — a variant of unknown significance observed only once before, halfway around the world. This variant and its associated phenotype hold the key to understanding and treating a devastating disease. Unfortunately, the connections will never be made. A paper has been published, the researchers have moved on, and an anonymous child in a foreign country will finally succumb, without answers. Studies have shown that approximately 96% of variants predicted to be functionally important are rare, with an allele frequency below 0.5%. As a result, it is clear that no single institution will hold enough samples to achieve sufficient statistical power. In this context, collaboration is a necessity, not an option. Never before have we seen such a confluence of enabling technologies: inexpensive next-generation sequencing, virtually limitless storage and computational power, high-bandwidth communications channels, sophisticated machine learning. These technologies, which have revolutionized so many aspects of our lives, are poised to do the same for human health. And yet data sharing is still the exception rather than the norm. Why? The Global Alliance for Genomics and Health (GA4GH) is addressing these questions on multiple fronts, from file formats and API interoperability, to security and privacy, to nuanced regulatory and ethics considerations. In this talk, I will outline some of the challenges that impede our collective progress and describe how the GA4GH is helping to overcome them. As a member of the Security Working Group, I will focus in particular on issues of security, privacy, and trust, which can enable or inhibit interoperability just as effectively as agreement or disagreement over data formats. 
No single individual can ensure that the data on the hard drive underneath the desk will get into the hands of the geneticist who desperately needs it, but by working together we can build systems and institutions that do. I will conclude my talk with a discussion of how everyone in the community can get involved to help realize this important vision.