The team investigated how two transcription factors that belong to the Hippo signaling pathway, YAP1 and TEAD4, bind DNA together with transcription factors unique to mouse trophoblast stem cells to control which genes are turned on. The researchers used BPNet, an interpretable deep learning framework that can both make predictions and explain how it arrived at them, to learn genome-wide relationships between DNA sequence patterns and transcription factor binding profiles.
“We reasoned that if the binding of YAP1 and TEAD4 is driven by the regulatory code, then the model should be able to learn their binding profiles from DNA sequence alone,” said Zeitlinger. “Moreover, analyzing the rules that the model learned should provide insights into exactly how the regulatory code is read.”
First, the researchers trained BPNet on experimental data showing where TEAD4, YAP1, and several other transcription factors bind in mouse trophoblast stem cells. After the model showed it could make accurate predictions from DNA sequence alone, they tested whether it could recognize patterns in DNA sequences it had not seen before, and it could.
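For readers curious what "learning from DNA sequence alone" and "sequences it had not seen" can look like in practice, here is a minimal sketch in Python. It is not the team's pipeline: the function names, the toy sequences, and the choice to hold out a whole chromosome are all assumptions made for illustration. It shows two common preparatory steps, turning DNA letters into numbers a model can read and setting aside regions that the model never sees during training.

```python
BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows (A, C, G, T).

    Unknown bases such as N are left as all zeros.
    """
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        idx = BASES.find(base)
        if idx >= 0:
            row[idx] = 1
        rows.append(row)
    return rows

def split_by_chromosome(regions, held_out):
    """Split (chromosome, sequence) pairs into training and unseen test sets.

    Holding out whole chromosomes is one common way (assumed here) to make
    sure test sequences were never shown to the model during training.
    """
    train = [r for r in regions if r[0] not in held_out]
    test = [r for r in regions if r[0] in held_out]
    return train, test

# Toy data, invented for this example only.
regions = [
    ("chr1", "ACGTGGAATG"),
    ("chr2", "TTGACATTCC"),
    ("chr8", "GGAATGNACG"),
]
train, test = split_by_chromosome(regions, held_out={"chr8"})
x_test = one_hot(test[0][1])
```

A model trained only on `train` can then be scored on `test`, which is the kind of check the article describes.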
“BPNet learned genome-wide rules that are predictive, showing that the information is encoded in the DNA sequence,” said Zeitlinger. “This is a step toward applying this approach more broadly to learn the regulatory code in the human genome by which signaling pathways instruct cells.”
“How exactly cells respond to signaling pathways has puzzled me since my Ph.D.,” said Zeitlinger. “These pathways are often the targets of therapeutic drugs, yet how different cell types respond is still poorly understood—finding a solution to this problem was very gratifying. Even more exciting was that we could extract the learned rules from the model, and these rules made sense and taught us something new about how transcription factors function.”
The team focused on two discoveries that emerged from the model and validated both in the lab. First, they characterized how YAP1 and TEAD4 work together with the trophoblast-specific transcription factor TFAP2C. They found that when TFAP2C binds to DNA, it becomes easier for TEAD4 to bind to nearby regions, which in turn helps YAP1 connect to TEAD4 and activate genes.
“What was surprising was that this boost by TFAP2C is higher the closer it occurs to TEAD4,” said Zeitlinger. “Nevertheless, it is a flexible mechanism, which may explain how signaling pathways can receive regulatory input from a variety of transcription factors in different cell types.”
Second, the researchers discovered that pairs of TEAD4 binding sequences, previously thought to be rare, have very strong effects on binding and are much more common in the genome than expected. Although scientists had seen these double TEAD4 sites before, they had not realized how important they are to the Hippo signaling pathway.
“The widespread double TEAD4 patterns had remained hidden because they don’t look very similar,” said Zeitlinger. “However, AI was able to recognize them because they have strong effects on binding.”
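To make the idea of a "double site" concrete, the sketch below scans a sequence for two nearby copies of a motif on either DNA strand. Everything in it is an assumption for illustration: `GGAATG` is used as a simplified TEAD consensus, and the `max_gap` spacing limit and the toy sequence are hypothetical. Because it only finds exact matches, a scan like this would likely miss paired sites that diverge from the consensus, which is the gap a model that learns from binding data is meant to fill.

```python
import re

TEAD_MOTIF = "GGAATG"  # assumption: simplified consensus, for illustration only

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_sites(seq, motif=TEAD_MOTIF):
    """Return sorted start positions of exact motif matches on either strand."""
    seq = seq.upper()
    starts = set()
    for pattern in {motif, reverse_complement(motif)}:
        # A lookahead lets the scan report overlapping matches as well.
        for match in re.finditer(f"(?={pattern})", seq):
            starts.add(match.start())
    return sorted(starts)

def find_pairs(seq, motif=TEAD_MOTIF, max_gap=10):
    """Return (start1, start2) for neighboring, non-overlapping sites.

    max_gap is a hypothetical limit on the number of bases between the end
    of the first site and the start of the second.
    """
    sites = find_sites(seq, motif)
    width = len(motif)
    return [
        (a, b)
        for a, b in zip(sites, sites[1:])
        if 0 <= b - (a + width) <= max_gap
    ]

# Toy sequence: one site on each strand, separated by two bases.
toy = "TTGGAATGACCATTCCAA"
pairs = find_pairs(toy)
```

Here `pairs` holds the start positions of the two sites in the toy sequence; a sequence with only one site returns an empty list.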
The rules for how DNA-protein interactions drive gene activation and cell fate specialization are highly complex, but AI is proving to be a powerful tool that is changing how biologists study gene regulation.
“Our work strongly supports the notion that the code of gene regulation can be deciphered,” said Zeitlinger. “If so, this would have huge implications for human health, not just for predicting drug targets, but also for helping identify disease susceptibilities and enabling personalized medicine.”
Additional authors include Charles McAnany, Ph.D., Melanie Weilert, Mary Cathleen McKinney, Ph.D., and Sabrina Krueger, Ph.D.
This work was funded by the National Human Genome Research Institute of the National Institutes of Health (NIH) (award: R01HG010211) and by institutional support from the Stowers Institute for Medical Research. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.