Skip to main content


Completing the sequence - Part 3

A ‘big data, AI, and genomics’ approach to identifying variation

31 March 2022

A ‘big data, AI, and genomics’ approach to identifying variation

During the COVID-19 lockdown, Matthew Borchers, then a graduate student at the University of Oregon, started a remote internship with the Gerton Lab. His task was to identify short unique sequences in certain parts of the genome for use by members of the consortium during genome assembly.

Soon after, he moved to Kansas City, Missouri, to join the Stowers Institute’s Big Data, AI, and Genomics group as a bioinformatics researcher.

Microscopic image on a screen showing fluorescently labeled human chromosomes showing attachment points called centromeres and regions of ribosomal DNA.

“Our group is eclectic,” said Borchers. “It’s a bunch of people with computational backgrounds in different bioscience fields. We spend less time processing data and more time developing the approaches and methods to address the questions that need answering for projects that are computational in nature.”

During his internship, Borchers developed a computational method to estimate the size of human centromeres, which he brought to the T2T project.

“We compared the newly-assembled centromeric sequences from the X chromosome of the CHM13 cell line with those from the 1000 Genomes Project, which had samples from different individuals from all over the world, to see how CHM13 compared to a more diverse representation of centromere sequences,” said Borchers.

Consistent with previous research, they found substantial size variation in centromeres between individuals with different geographic ancestry. The researchers published their work in a third Science paper, led by Nick Altemose, PhD, and Miga, reporting the characterization of the genetic and epigenetic landscape of human centromeres.

Productive during the pandemic

Initial efforts from the T2T consortium resulted in the complete assembly of chromosomes X and 8, coauthored by Stowers researchers and published in 2020 and 2021 in Nature. It has since grown into a team of about 100 talented individuals, mostly computational biologists.

“In twenty years from now, when we look back on this human genome project 2.0, it will become clear that it really happened during the pandemic,” reflected Gerton.

Gerton, Potapova, and Gomes de Lima all gave credit to Miga and Phillippy, the two co-leaders of the T2T consortium.

“They established some ground rules early on about how they wanted the consortium to work. It’s a public good we are generating, and they really want people to be collaborative and work together,” said Gerton.

Tamara Potapova, PhD, Leonardo Gomes de Lima, PhD, and Matt Borchers sitting at a table.

Leonardo Gomes de Lima, PhD, Tamara Potapova, PhD, and Matt Borchers

“The pandemic forced our consortium to go fully virtual in the summer of 2020, but this actually increased our interactions,” said Phillippy. Suddenly we could bring together the world’s top experts on an almost daily basis, which broadened participation and accelerated our progress. If I had a question about the chromosome structure of the CHM13 cell line, I could just message the Gerton lab and have an answer in real time."

Gomes de Lima was impressed by how Miga and Phillippy could keep track of every paper and idea being developed. “They are an amazing task force,” he said.

Potapova agreed. “There is a high degree of collegiality, mutual respect, and friendliness between members of the consortium, which makes me want to contribute more. Jennifer is really good at this, but so are the other T2T leaders. We have colleagues involved in this at all levels—from well-established investigators to graduate students. No single individual gets more respect than another. As the consortium grows, I hope this continues.”

“The next phase for us and the whole T2T consortium is the human pan-genome project. We have learned all these skills and peered into the dark matter of a single genome,” said Gerton. “Now, can we apply that to dozens more genomes?”

“One genome is not very representative of all humans. In my lab, and in the broader consortium, what that means is, we’re going to be taking a deeper dive into the dark matter across a geographically diverse panel of cell lines. We want to understand the diversity of the human genome. That’s the big picture. This sequence enables the discovery process to begin, so we can build a foundation for understanding genetic underpinnings of health and disease.”

Related content
Completing the human genome – FAQ
Telomere-to-Telomere - Multimedia educational resources and links to all six Science articles
Telomere-to-Telomere consortium website
Researchers generate the first complete, gapless sequence of a human genome - National Human Genome Research Institute
First complete, gapless sequence of a human genome reveals hidden regions- University of California, Santa Cruz
Completing the Human Genome Sequence (Again) - Scientific American
Most complete human genome yet reveals previously indecipherable DNA - Science
Human blueprint breakthrough: Scientists publish ‘gapless’ human genome - Washington Post
Complete Human Genome Deciphered for the First Time - Howard Hughes Medical Institute
Scientists Finish the Human Genome at Last - New York Times

Newsletter & Alerts