Erin H. Wilson

Data Scientist at LanzaTech || Pursuing a career at the intersection of data science, biology, and sustainability!


Current resume as pdf.
See my Research

Education

Ph.D. Computer Science & Engineering
      University of Washington, Seattle, WA
      June 2023
Visiting PhD student, DTU Center for Biosustainability
      Technical University of Denmark, Lyngby, Denmark
      April-August 2022
M.S. Computer Science & Engineering
      University of Washington, Seattle, WA
      Fall 2019  
B.A. Computational Biology
      Carleton College, Northfield, MN
      June 2014  

Awards & Honors

Scan Design Foundation Fellowship
      Technical University of Denmark (DTU)     2022
      Support for research and cultural exchange between Danish and American students
NSF Graduate Research Fellow
      University of Washington     2019
      Three years of research funding from the National Science Foundation
Marilyn Fries Fellow
      University of Washington     2017-2018
      First year funding from UW CSE
Graduation Honors
      Carleton College     2014
      Magna Cum Laude
      Received "Distinction" on Senior Thesis
Clare Boothe Luce Scholar
      Carleton College     Summer 2012
      Summer research funding for women in physics and computer science

Research Interests & Projects

My early scientific career has focused on metabolic engineering: the science of using microorganisms as tiny biological factories that can more sustainably produce everyday materials. To do this, we can edit microorganism genomes to convert renewable feedstocks (e.g., sugar) or waste streams (e.g., methane) into a desired target molecule such as medicine, biofuel, or theoretically any molecule found in nature.

My PhD research aimed to develop computational methods to better understand the "genetic grammar" underlying how methane-consuming bacteria control gene expression and use these insights to more efficiently engineer them to convert methane waste into useful materials.

  Using deep learning approaches to identify regulatory motifs involved in methanotroph gene regulation

  with Mary Lidstrom & Dave Beck  
Built a framework to apply deep learning models to predict RNA-seq expression levels directly from DNA sequences (upstream promoter regions) in the methanotroph M. buryatense. The goal was to identify the sub-sequences within promoters that are particularly important for influencing expression changes across a variety of growth conditions, as these sequences may be candidates for further development as metabolic switching tools. After initial modeling approaches were unsuccessful, we shifted towards a deeper characterization of deep learning model performance in data-limited regimes.
      Workshop Proposal
      Tutorials exploring simple and complex synthetic prediction tasks
      Git Repo

  Predicting iModulon membership from promoter regions

  with Mary Lidstrom (UW), Dave Beck (UW) & Lars Nielsen (DTU)  
Used Independent Component Analysis (ICA) to identify independently regulated gene modules (iModulons) from a compendium of bulk RNA-seq data in M. buryatense. Subsequently, we are working to build deep learning models to predict each gene's iModulon label from its promoter region in order to learn sequence patterns relevant to iModulons' regulation.

 Developed a computational framework to identify strong promoters in non-model organisms

  with Mary Lidstrom & Dave Beck  
Analyzed genomic sequence and RNA-seq data in the methanotroph Methylotuvimicrobium buryatense 5GB1 to identify promoter sequence patterns that confer constitutive, strong expression. Our computational pipeline may be similarly applied to other non-model organisms that lack extensive genetic characterization to help identify key pieces of their regulatory grammars.
      Publication
      Project Page
      Git Repo

 Decoding yeast gene regulation from millions of random sequences

  with Georg Seelig  
Trained machine learning models on massively parallel reporter data from millions of randomized promoter sequences to characterize gene regulation in yeast.

 Understanding gene expression patterns in developing heart tissue

  with Georg Seelig  
Analyzed single-cell RNA-sequencing data to understand gene expression patterns in differentiating cardiomyocytes. (In collaboration with the Allen Institute for Cell Science)

Industry & Work Experience

I began my research journey as a biologist and have since grown into a computer scientist with an interest in understanding biological data. I am excited about opportunities that allow me to span across fields and require computational skillsets to dig into outstanding challenges in biology and global sustainability.

LanzaTech, Data Scientist

  Remote     2024 - present
Working within the AI-CompBio team to analyze omics and biological data from LanzaTech bioproduction strains for converting industrial carbon emissions into renewable materials.

University of Washington, Graduate Researcher, Computer Science

  Seattle, WA     2017 - 2023
Built computational frameworks to accelerate genetic engineering efforts in methane-consuming bacteria, a promising carbon removal platform; established a suite of new methanotroph promoter tools; characterized the effectiveness of machine learning models for discovering influential genetic patterns from RNA-seq experiments in microorganisms with limited data.

Zymergen, Intern, Data Science

  Seattle, WA     June 2018 - August 2018
Prototyped machine learning and convolutional neural network models for predicting DNA regulatory features in non-standard microbe genomes.

Amyris, Associate Scientist, Scientific Computing

  Emeryville, CA     July 2014 - July 2017
In my 3 years in the Scientific Computing group at Amyris, I applied my background in genetics and computer science to various computational projects in R&D. My roles ranged from designated computational resource for a given project, to member of a team of computational experts, to communication bridge between software engineers and biologists. Specific projects I worked on include:
  • characterizing the genomic impact of chemical mutagens
  • maintaining the company's whole genome sequencing pipeline
  • developing and training the Amyris community in Genotype Specification Language (a DNA design tool invented at Amyris)
  • building a Genotype Generator tool to translate high level designs for metabolic pathways into concrete build instructions for strains that can carry out pathway designs

Amyris, Intern, Scientific Computing

  Emeryville, CA     December 2013
Coded a data visualization tool to help strain engineers overlay experimental data onto yeast metabolic pathways.

 University of Minnesota, Research Assistant, Myers Lab (Computational Biology)

  Minneapolis, MN     June 2013 - August 2013
Used genetic interaction and chemical genetic interaction data to code a target prediction pipeline in Python. Developed a benchmark standard for accurately predicting gene targets for chemicals of interest.

 Carleton College, Research Assistant, Goings Lab (Evolutionary Computing)

  Northfield, MN     June 2012 - August 2012
Performed experiments on evolving populations of digital organisms to examine the effects of limited CPU resources on the populations’ ability to evolve complex Boolean logic functions.

 UCSF, Research Assistant, Ahituv Lab (Genetics)

  San Francisco, CA     June 2011 - August 2011
Performed chromatin immunoprecipitation sequencing experiments on mouse limb tissue to find enhancer candidates involved in limb patterning and development.

Scientific Communication

Publications

  • L. He, J. D. Groom, E. H. Wilson, J. Fernandez, M. C. Konopka, D. A. C. Beck, M. E. Lidstrom. (2023) “A methanotrophic bacterium to enable methane removal for climate mitigation.” PNAS. [Article] [Interactive viz gallery]

  • A. H. Singh, B. B. Kaufmann-Malaga, J. A. Lerman, D. P. Dougherty, Y. Zhang, A. L. Kilbo, E. H. Wilson, C. Y. Ng, O. Erbilgin, K. A. Curran, C. D. Reeves, J. E. Hung, S. Mantovani, Z. A. King, M. J. Ayson, J. R. Denery, C. Lu, P. Norton, C. Tran, D. M. Platt, J. R. Cherry, S. S. Chandran, A. L. Meadows. (2023) “An Automated Scientist to Design and Optimize Microbial Strains for the Industrial Production of Small Molecules” [bioRxiv]

  • E. H. Wilson, M. E. Lidstrom, and D. A. C. Beck. (2021) "A multi-task learning approach to enhance sustainable biomolecule production in engineered microorganisms." Tackling Climate Change with Machine Learning, workshop at ICML 2021. [Video Recording] [Proposal]

  • E. H. Wilson, J. D. Groom, M. C. Sarfatis, S. M. Ford, M. E. Lidstrom, and D. A. C. Beck. (2021) "A Computational Framework for Identifying Promoter Sequences in Nonmodel Organisms Using RNA-seq Data Sets." ACS Synthetic Biology. [Article]

  • E. H. Wilson, C. Macklin, and D. Platt. (2018) "Engineering genomes with Genotype Specification Language." In Methods in Molecular Biology, Synthetic Biology. J.C. Braman, ed. Springer Publishing Company, New York, NY. [PubMed]

  • S. W. Simpkins, J. Nelson, R. Deshpande, S.C. Li, J. S. Piotrowski, E. H. Wilson, A. A. Gebre, R. Okamoto, M. Yoshimura, M. Costanzo, Y. Yashiroda, Y. Ohya, H. Osada, M. Yoshida, C. Boone, C. L. Myers. (2018) “Predicting bioprocess targets of chemical compounds through integration of chemical-genetic and genetic interactions.” PLoS Computational Biology. [PubMed]

  • E. H. Wilson, S. Sagawa, J. Weis, M. Shubert, M. Bissell, B. Hawthorne, C. Reeves, J. Dean, and D. Platt. (2016) "Genotype Specification Language." ACS Synthetic Biology. 5(6), pp 471-478. [PubMed]

Technical Tutorials

Presentations & Posters

  • E. H. Wilson, M. E. Lidstrom, D. A. C. Beck. “Probing the limits of deep learning methods for predicting gene expression in non-model microbes.” Rapid talk and poster at SBFC. Portland, OR, April 2023. [Poster PDF]

  • E. H. Wilson, M. E. Lidstrom, D. A. C. Beck. “Methane, Microbes, and Machine Learning: Engineering biology to combat climate change.” Poster at Industry Affiliates Research Symposium at the University of Washington, November 2022. [Poster PDF]

  • "Using microorganisms to mitigate macro problems." Oral talk at Virtual Women's Research Day, University of Washington, 2020. [Video Recording]

  • "Using microorganisms to solve macro problems: untangling the genetic circuitry of methane-eating bacteria." Oral talk at MIDAS Data Science Symposium, University of Michigan, 2019.

  • "Can deep learning help us program biology?" Oral talk at Industry Affiliates Research Day, University of Washington, 2018.

  • E. H. Wilson, D. Platt. “Genotype Specification Language: Programming in DNA!” Poster at Synthetic Biology, Engineering, Evolution & Design (SEED) conference in Chicago, July 2016.

Outreach & Community Building

Science Communication for General Audiences

Leadership and Volunteering

  • Technical mentor for Paper Airplanes, Women in Tech intro Python course (2023-present)
  • Research mentor for undergraduate student (2020-2023)
  • Pre-application review mentor for prospective graduate students from diverse backgrounds (2021-2022)
  • Peer mentor for groups of incoming graduate students (2018-2022)
  • Graduate Peer Mentorship Program Organizer (2019-2020)
  • TGIF Social Chair (2018-2019)
  • New Grad Orientation Organizer (2018)

Youth Education

  • Programming Organisms with DNA Puzzles! - Developed an interactive activity to teach elementary/middle schoolers about genetic engineering.
    • Engineering Discovery Days, University of Washington
    • Introduce a Girl to CoRDS (Coding, Robotics, and Data Science), University of Washington

General Volunteering

  • PAWS - Wildlife Hospital Volunteer
  • MeadoWatch - Field Data Collector, Mt. Rainier National Park

Miscellaneous Science Nerdisms


Goofy data analysis

Puns and Lyrics

Sometimes my brain makes jokes. Often they involve Disney songs and/or science puns. Here are a few of them :)

Inspired by working at Amyris


Inspired by UW classes (ML, NLP, SynBio)


Contact Me

I'm always excited to learn more about how a computer/data scientist can help solve problems in biology and sustainability! Feel free to connect :)

Also, if you're considering exploring the intersection of Biology and Computer Science, I'd be happy to chat about my experience navigating undergrad, working in industry, and transitioning back to grad school.

You can reach me at erinhwilson gmail.com.

I also have a LinkedIn.