Tools of the Trade

By Anissa Anderson Orr

New approaches help analyze emerging model organisms

On a typical morning, Sofia Robb, PhD, and her three-year-old son Cedar tend to the sheep, goats, alpacas, and chickens behind her two-and-a-half-acre home in the stunning Utah mountains. When morning chores are finished, she drops Cedar off at preschool, then heads back to her home office, where she works remotely as a genomics scientist for the Stowers Institute.

After a cup of tea, Robb fires up her computer and launches SIMRbase, a website she constructed for the Institute to host genome sequences and related data, as well as computing tools to study them.

Up pop images of the menagerie whose genomic data she collects and cares for online—apple snail, mouse, planarian flatworm, cavefish, zebrafish, sea anemone, sea lamprey, and killifish.

They’re not quite as cute and cuddly as the animals in her backyard petting zoo, but these emerging model organisms are vitally important to her colleagues more than 1,000 miles away in Kansas City, Missouri. Stowers scientists study a wide variety of organisms, rich in biological diversity, to explore questions that expand our understanding of life’s biological processes and behavior. Robb gives their photos a glance and gets to work.

Challenging to study

Compared to model organisms that have been part of scientific research for decades, emerging model organisms can be challenging to study. There are online repositories for the human genome (Ensembl), plant genomes (JGI Phytozome), and common model organisms like the fruit fly Drosophila melanogaster (FlyBase) and the roundworm Caenorhabditis elegans (WormBase), but few for the cavefish or apple snail, for example.

And while advances in DNA sequencing technology have made it easier than ever to sequence genes and genomes, making sense of the resulting data isn’t simple.

“Right now, it’s fairly common to say, ‘Let’s sequence an organism.’ But if you don’t have a bioinformaticist in your lab to help you organize and evaluate all of the data you generate, then what the heck do you do with it?” Robb says.

Fortunately, she has some ideas. Robb draws on her experience as a bench scientist to find solutions for Stowers scientists.

Robb began integrating computer scripting and databases with her lab experiments as a technician in the laboratory of Alejandro Sánchez Alvarado, PhD, at the Carnegie Institution for Science in Baltimore, Maryland. She stayed with the Sánchez Alvarado Lab after its move to the University of Utah, where she did her doctoral work on histone-modifying enzymes and their role in stem cell biology and regeneration in planaria. In the course of her thesis research, she constructed genomic tools for planaria. As a postdoctoral associate at the University of California, Riverside, she further honed her bioinformatics skills while working with the genome of rice. Now, she works on several genomics initiatives for the Stowers Institute, including SIMRbase, and customizes tools for Stowers scientists, including Sánchez Alvarado, who moved his lab to the Stowers Institute in 2011.

A home for genomes

Launched in 2015, SIMRbase is built on open source code and provides a common framework of genomics tools that can be tailored to specific organisms. Researchers can upload sequenced genes to the site, curate genes, and browse other genes to make comparisons. It uses a customizable plug-and-play approach to accommodate emerging research models, and to encourage collaboration and data sharing between scientists.

“The guiding idea was to help labs with their genomics research by saving them time and effort, and to help the ones that couldn’t make the tools themselves,” says Robb. As SIMRbase grows to include more research model genomes, the general approach is to provide initial access to Stowers scientists to support ongoing research at the Institute, and then, after further development and refinement, open access to the rest of the scientific community.

The site features many tools that assist researchers, but Robb mentions three in particular. JBrowse allows researchers to browse genomes and line up genomic data, such as genes that are switched on under certain conditions. Tripal, a web tool that interfaces with a database of genes and associated information, is used to create descriptive pages about genes. And Apollo helps researchers describe and edit gene features.

Like a Sudoku puzzle

It’s this last tool that Hugo Parker, PhD, uses the most, both in the lab and out.

Nights and weekends, you might find him using Apollo to curate sea lamprey genes. He inspects and aligns them. He changes the size of their exons, the portions of genes that encode proteins. He merges two genes that were initially predicted to be separate. The overall goal is to describe and validate sea lamprey genomic information based on experimental results.

“It’s a little kind of mind puzzle, like doing a Sudoku puzzle. It’s fun, and a nice distraction,” he says.

Parker, a postdoctoral research associate in the lab of Stowers Scientific Director and Investigator Robb Krumlauf, PhD, and Jeramiah Smith, PhD, an associate professor in the Department of Biology at the University of Kentucky, used SIMRbase tools in their groundbreaking work, published in the January 2018 issue of Nature Genetics, to report the germline genome sequence of the sea lamprey.


These parasitic fish evolved from a lineage of jawless vertebrates that diverged early from all other vertebrates, about 550 million years ago, and they are an important model organism for studying early vertebrate evolution.

Parker and Smith are investigating Hox genes, which control the layout of a developing embryo, marking where structures should appear along the body from head to tail.

“Looking at the Hox genes, the big question was, ‘How many Hox genes do lamprey have? How many Hox clusters do lamprey have?’ These are important developmental clusters of genes that control anterior to posterior patterning. And to answer that, we needed a genome,” says Parker.

While the lamprey genome had been sequenced before, the DNA was taken from samples of the lamprey’s blood and liver and didn’t represent the full lamprey genome. The team extracted DNA from lamprey sperm, sequenced it, and sent the newly sequenced genome to Robb to upload to SIMRbase and make it public.

Parker uses SIMRbase to identify important Hox gene clusters, look at the timing of expression of these genes in lamprey embryonic development, and design RNA probes to characterize where in the embryo these genes are activated. Ultimately, this enables comparisons between vertebrates as to how they are using these important genes during their embryonic development. Designing probes in lamprey has been difficult, because the organisms have many repetitive sequences within their genome. Now that the entire genome assembly is available in SIMRbase, with all the repeats mapped, Parker can better spot the genes he’s seeking.

Both Parker and Smith say SIMRbase played a key role in their research. “For me, and for lamprey scientists in general, SIMRbase is critical because it serves as a place where anyone can go in and look at the same pocket of genome and the same annotations and use those for their work. It’s a common framework,” Smith says.

Understanding cavefish

Robert Peuss, PhD, a postdoctoral research associate working in the lab of Nick Rohner, PhD, on a fellowship funded by the German Research Society, uses SIMRbase tools to look at the immune cell composition of the blind cavefish Astyanax mexicanus.

The Mexican cavefish evolved from a species common in Mexican rivers. Between 100,000 and 200,000 years ago, flooding flushed some of the population into caves, trapping them in an environment that lacks most of their common parasites.

Over time, the fish underwent dramatic changes to survive in an environment devoid of light and food, losing their eyes and eating only when seasonal flooding pushed food into their caves. They also developed high body fat and insulin resistance, a discovery recently reported by Peuss, Rohner, and collaborators in a March 2018 paper in Nature.

“Our ultimate goal is to understand how these cavefish adapted to an environment with low parasite diversity without having these auto-inflammatory diseases that we see in human populations under similar environmental conditions,” Peuss says, citing the rise in allergies, diabetes, and other inflammatory illnesses in humans. The rise of these illnesses is thought to be due in part to living in much cleaner environments compared to past generations, with less exposure to parasites that help activate the immune system. Cavefish live in a similarly tidy environment.

“That makes them interesting in regard to how an immune system evolves,” says Peuss. “How do you come to a point when your immune system attacks yourself? What are the environmental conditions that cause this? What are the genetic changes in the genome of cavefish that have enabled them to live under these parasite-free conditions?”

To find out, Peuss and his colleagues are using a process called QTL (quantitative trait locus) analysis to match sets of cavefish characteristics with specific genetic changes. SIMRbase is helpful for this type of analysis because of the sheer volume of cavefish that need to be sampled—up to 300 cavefish and their hybrid offspring—to complete a thorough analysis. With that data in hand, the scientists can target the genomic regions responsible for producing a fish with cavefish traits, Peuss says.

A planaria by any other name

Over in the Sánchez Alvarado Lab, Erin Davies, PhD, a postdoctoral research associate, studies flatworm regeneration and embryogenesis, the process of growing from a single fertilized egg into a properly formed organism.

Planarian flatworms have an unparalleled ability to regenerate. If an adult worm is cut apart, almost any piece can form a new, fully functional animal within just two weeks. But this phenomenon is still poorly understood.

Wanting to know more, Davies and colleagues generated a staging series, or a set of unique molecular fingerprints, for planaria embryos, as well as a gene expression atlas describing embryonic tissues and the formation of major organ systems during embryogenesis. Robb created a user-friendly, accessible community resource to house this data, called Planosphere, a SIMRbase spinoff site launched in January 2017 in conjunction with the researchers’ report in eLife.


Their study was the first to discover that adult stem cells called neoblasts, key to planaria regeneration, arise during a specific stage of embryonic development, findings that could guide future therapeutic advances for patients suffering from degenerative diseases or traumatic injuries.

Now, Davies and Postdoctoral Research Associate Stephanie Nowotarski, PhD, have been busy adding terms to Planosphere’s Planarian Anatomy Ontology, a standardized vocabulary for data annotation and cross-species comparisons. The ontology includes more than 300 terms and definitions for cellular organelles, cell types, tissues, organ systems, anatomical entities, life cycle stages, and developmental processes described in scientific literature.

“With planaria, there’s a lot of cell biology that’s really just being discovered and described for the first time,” Davies says. “Our hope is that the Planarian Anatomy Ontology will be used by ourselves and other planarian researchers to curate genomic, imaging, and phenotypic data sets, and by researchers looking to make comparisons across different species.”

Davies also uses SIMRbase to compare molecular functions during embryogenesis and adulthood, overlaying expression data to understand how genes behave during development, homeostasis, and regeneration. SIMRbase is packed with information and tools, making such comparisons easy, she says.

Plans to grow

Going forward, Robb intends to add to the database of genomes available on SIMRbase and help provide scientists with whatever tools they need to succeed. In addition to the lamprey genome, she plans to make the genomes of other emerging model organisms public, along with some of the tools.

“The goal is to help the scientific community,” she says, “and to promote sharing and collaboration, especially as a team.”