Often we do not know all the biological mechanisms needed to predict the behavior of a bioengineered system through mechanistic modeling (e.g., how induction time or inducer amount affects flux through a pathway). In those cases, a data-intensive, statistics-based approach can still predict the behavior of bioengineered systems well enough to drive metabolic engineering efforts. We use a variety of machine learning approaches for this purpose, ranging from classical methods in scikit-learn to deep learning. We have used these techniques to link proteomic profiles to production and suggest improved proteomic profiles, as well as to predict pathway dynamics. The combination of machine learning with the ability to generate our own data in an automated fashion using the BioLector and RoboLector is unique within the national labs. Machine learning provides a systematic way to leverage data stored in the Experiment Data Depot (EDD) to guide metabolic engineering without requiring a deep mechanistic understanding. This data-intensive approach is usually limited by the availability of data: successful predictions typically require 50-100 conditions (strains), as well as 3-4 consecutive rounds in which predictions are tested and the new data are used to improve predictions for the next round. This capability is available internally to ABF researchers, as well as to ABF CRADA projects.
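The Python sketch below illustrates the iterative loop described above. It fits a model that maps proteomic profiles to production, uses the model to recommend new profiles, "tests" those profiles, and then refits with the new data. This is a minimal illustration under stated assumptions, not the ABF pipeline. The data are synthetic, and `measure_titer`, `TRUE_WEIGHTS`, `fit_ridge` and `run_rounds` are hypothetical names. A closed-form ridge regression written in NumPy stands in for the scikit-learn and deep learning models mentioned in the text. The defaults (50 initial strains, 3 rounds) only mirror the rough figures given above.

```python
import numpy as np

N_PROTEINS = 5
# Hypothetical per-protein effects on titer, hidden from the model; they stand
# in for the real (unknown) biology that a lab assay would reveal.
TRUE_WEIGHTS = np.array([2.0, -1.0, 0.5, 1.5, 0.0])


def measure_titer(profiles, rng):
    """Hypothetical stand-in for building strains and assaying production."""
    noise = rng.normal(0.0, 0.1, size=len(profiles))
    return profiles @ TRUE_WEIGHTS + noise


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression; the intercept is left unpenalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean  # center so the intercept is not shrunk
    A = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    coef = np.linalg.solve(A, Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return coef, intercept


def predict(model, X):
    coef, intercept = model
    return X @ coef + intercept


def run_rounds(n_initial=50, n_rounds=3, batch=20, n_candidates=500, seed=0):
    rng = np.random.default_rng(seed)
    # Initial library: random relative protein levels in [0, 1) per strain.
    X = rng.uniform(0.0, 1.0, size=(n_initial, N_PROTEINS))
    y = measure_titer(X, rng)
    best_so_far = [float(y.max())]
    for _ in range(n_rounds):
        model = fit_ridge(X, y)  # learn: fit on all data collected so far
        # Design: score random candidate profiles and keep the top `batch`.
        candidates = rng.uniform(0.0, 1.0, size=(n_candidates, N_PROTEINS))
        top = np.argsort(predict(model, candidates))[-batch:]
        X_new = candidates[top]
        y_new = measure_titer(X_new, rng)  # build and test the recommendations
        # Feed the new measurements back so the next round's model improves.
        X = np.vstack([X, X_new])
        y = np.concatenate([y, y_new])
        best_so_far.append(float(y.max()))
    return best_so_far, X, y


if __name__ == "__main__":
    best, X, y = run_rounds()
    for round_idx, titer in enumerate(best):
        print(f"round {round_idx}: best titer so far = {titer:.2f}")
```

In a real workflow, `measure_titer` would be replaced by strain construction and assays (for example, cultivations on the BioLector with results recorded in EDD). Likewise, `fit_ridge` could be swapped for an estimator such as scikit-learn's `Ridge` or `RandomForestRegressor`. Tracking the best titer after each round shows whether additional rounds are still paying off.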
References and Additional Information:
Denby, Charles M., et al. “Industrial brewing yeast engineered for the production of primary flavor determinants in hopped beer.” Nature Communications 9.1 (2018): 965.
Alonso-Gutierrez, Jorge, et al. “Principal component analysis of proteomics (PCAP) as a tool to direct metabolic engineering.” Metabolic Engineering 28 (2015): 123-133.
Argonne National Laboratory
Lawrence Berkeley National Laboratory
Pacific Northwest National Laboratory
National Renewable Energy Laboratory