Computational Biology

The Computational Biology team operates at the interface of the fields of biology, computer science, and statistics by developing and applying algorithms and models to understand biological systems and relationships.

Overview of Services

Research Services

The Computational Biology team at the Stowers Institute assists investigators with the analysis of biological data. The group combines software development and technical skills with biological insights to help find answers in complex and massive datasets. The bioinformatics expertise spans a wide range of topics, including but not limited to:

  • Sequence QA/QC
  • RNA-seq, single cell RNA-seq, and spatial RNA-seq
  • Small RNA-seq
  • ATAC-seq, single cell ATAC-seq
  • ChIP-seq, CUT&Tag
  • Ribosome profiling
  • Transcriptome/Genome assembly and annotation
  • Variant Calling
  • PacBio and Oxford Nanopore long read data analysis
  • Custom code and pipeline creation

Technologies

Open source software

  • Bowtie, STAR, and bwa are used for basic alignment.
  • Trinity and StringTie are used for assembly.
  • R and Bioconductor packages are used for data analysis and visualization.
  • GATK and DeepVariant are used for variant analysis.

Custom Tools
When a research project calls for it, Computational Biology can develop custom tools for specific needs.

Software & Computing

Tools We Use

The Computational Biology group uses a variety of software packages, both open-source and commercial, to assist us in our analysis process. Here are a few tools we currently use to process NGS data:

  • Bowtie2, STAR, bwa, and RSEM are used for basic alignment and analysis of ChIP-seq and RNA-seq.
  • R is used heavily for statistical analysis and data visualization. We commonly use edgeR and Seurat.
  • We use RMarkdown to prepare analysis reports.

Team Contact

Hua Li

Director, Computational Biology, Bioinformatics, and Biostatistics

Stowers Institute for Medical Research

New Developments in Computational Biology

The field of bioinformatics moves quickly, constantly adapting to new developments in biological protocols and experiments. Our group keeps up with changes in the field by following the literature, attending conferences, and procuring new software.

Data visualization of single cell sequencing.

Single Cell Sequencing

Single cell methods began with single cell RNA sequencing and have expanded to other technologies such as scATAC-seq (which measures regions of open chromatin). They allow researchers to measure genes or other features of interest in thousands of individual cells at a time. This increases both the resolution of the data and the types of questions researchers can answer, but it also magnifies the complexity of the data. We are always looking for better ways to analyze and visualize these data so that we can provide our collaborators with more useful and meaningful results. Currently, we use the Cell Ranger software from 10x Genomics and the Seurat R package, among other tools, to check data quality, cluster cells into types, identify marker genes, and compare samples.
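
The data quality check mentioned above usually starts from a few per-cell summaries. The following is a minimal Python sketch of that idea, not the team's workflow and not how Cell Ranger or Seurat are implemented. The toy count matrix, the function names, the `mt-` gene prefix (a mouse naming convention), and the thresholds are all assumptions chosen for illustration.

```python
from typing import Dict, List

# Hypothetical toy count matrix: cell barcode -> {gene: UMI count}.
counts: Dict[str, Dict[str, int]] = {
    "cellA": {"Actb": 50, "Gapdh": 30, "mt-Co1": 5, "Sox2": 15},
    "cellB": {"Actb": 4, "mt-Co1": 16},
    "cellC": {"Actb": 20, "Gapdh": 10, "Sox2": 8, "mt-Nd1": 2},
}


def qc_metrics(cell: Dict[str, int], mito_prefix: str = "mt-") -> Dict[str, float]:
    """Summarize one cell: total counts, genes detected, mitochondrial fraction."""
    total = sum(cell.values())
    detected = sum(1 for n in cell.values() if n > 0)
    # Assumption: mitochondrial genes are recognizable by a name prefix.
    mito = sum(n for gene, n in cell.items() if gene.startswith(mito_prefix))
    return {
        "total_counts": total,
        "genes_detected": detected,
        "mito_fraction": mito / total if total else 0.0,
    }


def passing_cells(matrix: Dict[str, Dict[str, int]],
                  min_genes: int = 3,
                  max_mito_fraction: float = 0.2) -> List[str]:
    """Keep barcodes with enough detected genes and a low mitochondrial fraction.

    The default thresholds are arbitrary illustration values, not recommendations.
    """
    kept = []
    for barcode, cell in matrix.items():
        metrics = qc_metrics(cell)
        if (metrics["genes_detected"] >= min_genes
                and metrics["mito_fraction"] <= max_mito_fraction):
            kept.append(barcode)
    return kept


for barcode, cell in counts.items():
    print(barcode, qc_metrics(cell))
print("kept:", passing_cells(counts))
```

In practice, thresholds like these are chosen per dataset after inspecting the distribution of each metric, and clustering and marker gene detection then run on the cells that remain.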

Image showing results from slide-seq technique.

Spatial Transcriptomics

New techniques like Slide-seq and 10x Visium place a slice of tissue on a slide and use barcoded beads or spots with known positions to associate each spatial position with gene expression measurements for thousands of genes at a time. This lets biologists use their understanding of anatomy to inform cell type identification, and it lets them find genes with interesting, spatially variable expression patterns in their systems of study. Many new approaches to analyzing spatial data, and to integrating it with single cell data, are under development, and our group is putting them to use.
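
The core idea described above, linking a barcode with a known position to the expression measured under that barcode, can be sketched as a simple join. This is a toy Python illustration only. The barcodes, coordinates, gene names, and the `spatial_profile` function are hypothetical, and real Slide-seq or Visium outputs use their own file formats.

```python
from typing import Dict, List, Tuple

# Hypothetical inputs: barcode -> (x, y) position on the slide,
# and barcode -> {gene: count} measured for that barcode.
positions: Dict[str, Tuple[float, float]] = {
    "AAAC": (0.0, 0.0),
    "AAAG": (1.0, 0.0),
    "AAAT": (0.0, 1.0),
}
expression: Dict[str, Dict[str, int]] = {
    "AAAC": {"Shh": 12},
    "AAAG": {"Shh": 0, "Pax6": 7},
    "AAAT": {"Pax6": 3},
    "CCCC": {"Shh": 1},  # no known position, so it is ignored below
}


def spatial_profile(gene: str,
                    positions: Dict[str, Tuple[float, float]],
                    expression: Dict[str, Dict[str, int]]) -> List[Tuple[float, float, int]]:
    """Return (x, y, count) for every barcode that has a known position."""
    profile = []
    for barcode, (x, y) in sorted(positions.items()):
        # A gene that was not detected under a barcode counts as zero.
        count = expression.get(barcode, {}).get(gene, 0)
        profile.append((x, y, count))
    return profile


for x, y, count in spatial_profile("Shh", positions, expression):
    print(f"x={x} y={y} Shh={count}")
```

A table of this shape is what plotting or tests for spatially variable genes would typically take as input.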

Deep Learning

Deep learning technologies are widely used in biology to learn patterns from rapidly growing datasets and to solve a variety of problems. Our team has applied a set of deep learning tools for SNP and indel analysis (DeepVariant), peak calling (LanceOtron), motif discovery (BPNet), and protein structure prediction (AlphaFold). We also actively develop deep learning methods for applications such as flow cytometry image classification and multi-omics integration. We participate in an institute-wide deep learning journal club to keep abreast of the latest developments and to better understand how deep learning can help researchers at the institute.

In-house tools and pipelines

For data types we frequently encounter (RNA-seq, ChIP-seq, and single cell RNA-seq), we have developed robust pipelines that automatically run the first few steps of analysis and quality control. This saves time for the more challenging and interesting downstream analyses, which vary from project to project.

For analyses that we or our collaborators perform regularly, we have developed a number of in-house web applications: RNA-seq differential expression, gene ontology enrichment, Venn diagram construction, and estimation of the sequencing depth needed for a given experiment. These tools enable institute members who would rather not learn programming to perform some basic analyses themselves.
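
Gene ontology enrichment, one of the analyses listed above, is commonly framed as a hypergeometric (one-sided Fisher) test. The sketch below is a generic Python illustration of that test. It is not the implementation behind the in-house web applications, and the function name and example numbers are made up.

```python
from math import comb


def enrichment_p_value(universe: int, annotated: int, selected: int, overlap: int) -> float:
    """Probability of seeing at least `overlap` annotated genes by chance.

    universe  -- number of genes considered in total
    annotated -- genes in the universe that carry the term of interest
    selected  -- genes in the list being tested (e.g. differentially expressed)
    overlap   -- selected genes that carry the term
    """
    if not (0 <= overlap <= min(annotated, selected)
            and max(annotated, selected) <= universe):
        raise ValueError("inconsistent counts")
    total = comb(universe, selected)
    upper = min(annotated, selected)
    # Sum the hypergeometric probabilities for k = overlap .. upper.
    tail = sum(
        comb(annotated, k) * comb(universe - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Made-up example: 20 genes, 5 annotated, 4 selected, 2 of them annotated.
p = enrichment_p_value(universe=20, annotated=5, selected=4, overlap=2)
print(f"p = {p:.4f}")
```

Real enrichment tools also correct for testing many terms at once, which this sketch leaves out.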

Bioinformatics Software

Optimal performance from our team requires many different types of software. From initial alignment and processing to custom scripts for making figures and tables of genes, we are constantly installing, testing, reading documentation, and writing code ourselves to help our collaborators find solutions to their questions.

Alignment and Processing

When data comes off a sequencing machine, it is encoded in a binary file format that is only meaningful to a computer. The first few steps of analysis turn these raw files into text files of DNA sequences, usually millions of short (50-100 base) reads made up of As, Cs, Ts, and Gs. Once we have constructed these .fastq files, which hold the sequences and their quality values, we use alignment software to align the reads to a genome or transcriptome so that we can identify what they represent. Depending on the type of data, the aligned reads may tell us how strongly a gene is expressed in a given condition (or cell), how open a region of chromatin is, or whether two regions of a genome are in contact.
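
To make the .fastq format described above concrete, here is a minimal Python sketch that reads four-line records and converts the quality characters to Phred scores. It assumes unwrapped four-line records and the common Phred+33 encoding. The names `FastqRecord` and `read_fastq` and the two example reads are hypothetical and are not taken from the team's pipelines.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: List[int]


def read_fastq(handle: TextIO, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-read FASTQ stream (assumed Phred+33)."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # Each quality character encodes one Phred score as ASCII code minus offset.
        yield FastqRecord(header[1:], sequence, [ord(c) - offset for c in quality])


# Two made-up reads, for illustration only.
EXAMPLE = "@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!5?I\n"
records = list(read_fastq(io.StringIO(EXAMPLE)))
for rec in records:
    mean_q = sum(rec.qualities) / len(rec.qualities)
    print(rec.name, rec.sequence, f"mean quality {mean_q:.1f}")
```

Production tools handle compressed input, paired reads, and other edge cases. This sketch only shows how a sequence and its quality values sit side by side in each record.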

Sequencing data encoded in binary file format.

Data analysis and visualization

Custom data analysis in R and Python makes up most of our day-to-day work. This could mean running a package to analyze a specific type of data, generating data visualizations, making interactive plots, or developing R/Shiny applications that let users explore their data through a graphical interface.

Waterfall and scatter plot of the interaction of two Drosophila proteins with DNA.

Featured Publications

The methyltransferase SETD2 couples transcription and splicing by engaging mRNA processing factors through its SHI domain

Bhattacharya S, Levy MJ, Zhang N, Li H, Florens L, Washburn MP, Workman JL. Nat Commun. 2021;12:1443. doi: 10.1038/s41467-021-21663-w.

Adaptation to low parasite abundance affects immune investment and immunopathological responses of cavefish

Peuß R, Box AC, Chen S, Wang Y, Tsuchiya D, Persons JL, Kenzior A, Maldonado E, Krishnan J, Scharsack JP, Slaughter BD, Rohner N. Nat Ecol Evol. 2020;4:1416-1430.

Translation of small downstream ORFs enhances translation of canonical main open reading frames

Wu Q, Wright M, Gogol MM, Bradford WD, Zhang N, Bazzini AA. EMBO J. 2020;39:e104763. doi: 10.15252/embj.2020104763.

Prospectively Isolated Tetraspanin(+) Neoblasts Are Adult Pluripotent Stem Cells Underlying Planaria Regeneration

Zeng A, Li H, Guo L, Gao X, McKinney S, Wang Y, Yu Z, Park J, Semerad C, Ross E, Cheng LC, Davies E, Lei K, Wang W, Perera A, Hall K, Peak A, Box A, Sánchez Alvarado A. Cell. 2018;173:1593-1608.e20.

Set2 methylation of histone H3 lysine36 suppresses histone exchange on transcribed genes

Venkatesh S, Smolle M, Li H, Gogol MM, Saint M, Kumar S, Natarajan K, Workman JL. Nature. 2012;489:452-455.

Retinoid-Sensitive Epigenetic Regulation of the Hoxb Cluster Maintains Normal Hematopoiesis and Inhibits Leukemogenesis

Qian P, De Kumar B, He XC, Nolte C, Gogol M, Ahn Y, Chen S, Li Z, Xu H, Perry JM, Hu D, Tao F, Zhao M, Han Y, Hall K, Peak A, Paulson A, Zhao C, Venkatraman A, Box A, Perera A, Haug JS, Parmely T, Li H, Krumlauf R, Li L. Cell Stem Cell. 2018;22:740-754.e7.
