CADLIVE (Computer-Aided Design of LIVing systEms) Project

INTRODUCTION

Our goals are to explore bioalgorithms (design principles), that is, the fundamental mechanisms by which a biochemical network generates particular cellular functions, and to design cells based on such bioalgorithms at the molecular interaction level. Bioalgorithms create Design Engineering for Cellular Functions.

References regarding the CADLIVE Project

Research

1 Computational tools for analyzing and designing biochemical networks (CADLIVE)

1.1 Definition of graphical notation for biochemical networks

We propose the CADLIVE graphical notation and compare it with other proposals.

(See Extended CADLIVE and Its supplementary data)

Presentation PPT file

1.2 Dynamic Simulator: Automatic conversion from a biochemical network to dynamic simulation

We first propose rules by which a complicated biochemical network is automatically converted into a dynamic model.
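As a rough illustration of what converting a network into a dynamic model can involve, the sketch below turns a reaction list into mass-action rate equations and integrates them. It is not the CADLIVE rule set: the reaction format, the function names `mass_action_rhs` and `simulate`, the mass-action assumption, and all rate constants are assumptions made for this example.

```python
# Hypothetical sketch: convert a reaction list into mass-action rate equations.
# Reaction format (reactants, products, rate constant) and all values are illustrative.
reactions = [
    (["A", "B"], ["AB"], 1.0),   # binding:      A + B -> AB
    (["AB"], ["A", "B"], 0.5),   # dissociation: AB -> A + B
]


def mass_action_rhs(reactions, conc):
    """Return d[X]/dt for every species, assuming mass-action kinetics."""
    dxdt = {species: 0.0 for species in conc}
    for reactants, products, k in reactions:
        rate = k
        for species in reactants:
            rate *= conc[species]
        for species in reactants:
            dxdt[species] -= rate
        for species in products:
            dxdt[species] += rate
    return dxdt


def simulate(reactions, conc, dt=0.001, steps=20000):
    """Integrate the generated equations with a plain explicit Euler scheme."""
    conc = dict(conc)
    for _ in range(steps):
        dxdt = mass_action_rhs(reactions, conc)
        for species in conc:
            conc[species] += dt * dxdt[species]
    return conc


final = simulate(reactions, {"A": 1.0, "B": 1.0, "AB": 0.0})
```

With these illustrative constants the complex AB settles where the binding and dissociation rates balance. A real converter would also have to handle regulation, modification and transport, which this sketch ignores.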

Computational tools for optimization, system analysis, S-system and grid computing are available.

Presentation PPT file

Go to CADLIVE

Go to a summary of CADLIVE

2 Dynamic model construction

2.1 Forward and reverse engineering

Generally, there are two approaches to building molecular systems: reverse engineering and forward engineering. The very abstract model generally employs reverse engineering, whereas the concrete model adopts forward engineering. Reverse engineering typically requires the use of simplistic parametric models of a large-scale network, e.g., Bayesian networks and Boolean networks, whose parameters are adjusted to fit real-world data. In forward engineering, a dynamic model is built based on detailed molecular interactions with exact kinetic parameters to achieve biological reality. This requires extensive knowledge of the system being studied.

Ideally, we would like to gain access to the activities of all important molecular species, including complexes and modified molecules. There is a strong need for methods that can handle concrete and complicated molecular systems at an intermediate level without going all the way down to exact biochemical reactions. A solution to this requirement is to combine forward engineering and reverse engineering. Forward engineering builds mathematical models with kinetic-related parameters from biochemical maps, and reverse engineering explores the kinetic parameters to fit experimental data. From this viewpoint, the model would focus on capturing the intrinsic architecture of molecular networks rather than their detailed kinetics, where gene regulatory and metabolic network maps should play a central role in simulating their dynamics. The research of "biochemical maps to dynamics" is a promising field.

Cited from

Hiroyuki Kurata, Kouichi Masaki, Yoshiyuki Sumida, Rei Iwasaki, CADLIVE Dynamic Simulator: Direct Link of Biochemical Networks to Dynamic Models. Genome Res., 15: 590-600, 2005.

2.2 Parameter estimation (Inverse problem)

2.2.1 Two-phase search algorithm

Dynamic simulations are essential for understanding how biochemical networks generate properties that are robust to environmental stresses or genetic changes. However, typical dynamic modeling and analysis yield only local properties for a particular choice of plausible kinetic parameter values, because it is hard to measure the exact values in vivo. Global and firm analyses are needed that consider how changes in parameter values affect the results. A typical solution is to systematically analyze the dynamic behaviors in a large parameter space by searching all plausible parameter values without bias. However, a random search needs an enormous number of trials to obtain such parameter values. Ordinary evolutionary searches obtain plausible parameters quickly, but the searches are biased. To overcome these problems, we propose the two-phase search method, which consists of a random search and an evolutionary search, to effectively explore all possible solution vectors of kinetic parameters satisfying the target dynamics. We demonstrate that the proposed method enables a nonbiased and high-speed parameter search for dynamic models of biochemical networks through its application to several benchmark functions and to the heat shock response model.
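The two phases described above can be sketched roughly as follows. This is a simplified reading of the idea, not the published TPS implementation: the toy misfit function `error`, the tolerances, the parameter bounds and the simple (1+1) evolutionary refinement are all assumptions chosen to keep the example short.

```python
import random


def error(params):
    """Toy misfit (hypothetical): zero whenever k1 * k2 == 1."""
    k1, k2 = params
    return abs(k1 * k2 - 1.0)


def random_phase(rng, n_seeds, loose_tol, lo=0.1, hi=10.0):
    """Phase 1: uniform random sampling keeps any point that roughly fits."""
    seeds = []
    while len(seeds) < n_seeds:
        candidate = [rng.uniform(lo, hi), rng.uniform(lo, hi)]
        if error(candidate) < loose_tol:
            seeds.append(candidate)
    return seeds


def evolutionary_phase(rng, seed, tight_tol, sigma=0.05, max_iter=100000):
    """Phase 2: a simple (1+1) evolutionary refinement started from one seed."""
    best = list(seed)
    for _ in range(max_iter):
        if error(best) < tight_tol:
            break
        # Multiplicative Gaussian mutation; keep the child only if it fits better.
        child = [x * (1.0 + rng.gauss(0.0, sigma)) for x in best]
        if all(x > 0 for x in child) and error(child) < error(best):
            best = child
    return best


def two_phase_search(n_solutions=20, loose_tol=0.5, tight_tol=1e-3, seed=0):
    rng = random.Random(seed)
    seeds = random_phase(rng, n_solutions, loose_tol)
    return [evolutionary_phase(rng, s, tight_tol) for s in seeds]


solutions = two_phase_search()
```

Because every refined solution starts from an independently sampled seed, the final set stays spread along the whole solution curve instead of collapsing around a single optimum, which is the motivation for putting the random phase first.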

Kazuhiro Maeda, Hiroyuki Kurata, Two-phase search (TPS) method: Nonbiased and high-speed parameter search for dynamic models of biochemical networks, IPSJ Transactions on Bioinformatics, 2: 2-14, 2009.

PPT file

3 System analysis

3.1 Module-based analysis of robustness

Biological systems maintain phenotypic stability in the face of various perturbations arising from environmental changes, stochastic fluctuations, and genetic variations. This robustness, which seems to be an inherent property of such systems, is still poorly understood at the molecular level. At the same time, systems approaches that were used with great success in the study and design of complex engineered systems provide a unique opportunity for investigating the basic tenets of robustness in cellular mechanisms. This is motivated by the fact that at the system level, biology and engineering seem to have a large number of common features despite their extremely different physical implementations.

The heat shock response is one such robust cellular system, which interestingly achieves its seemingly simple objective of refolding or eliminating heat-denatured proteins through a complicated set of interactions. In analogy to engineering control architectures, the complex regulation strategies seem to be a specifically designed solution to generate robustness against different types of perturbations.

Cited from

H. Kurata*, H. El Samad*, R. Iwasaki, H. Ohtake, J.C. Doyle, I. Grigorova, C. Gross, and M. Khammash, Module-Based Analysis of Robustness Tradeoffs in the Heat Shock Response System. PLoS Computational Biology, 2: e59, 2006. *Both contributed equally.

Heat shock response

Using module-based analysis coupled with rigorous mathematical comparisons, we propose that in analogy to control engineering architectures, the complexity of cellular systems and the presence of hierarchical modular structures can be attributed to the necessity of achieving robustness in the heat shock response.

Presentation PPT file

Hierarchical modular architecture

Presentation PPT file

3.2 Mathematical Analysis of Robustness (MAR): Beyond parameter problems

Dynamic simulations are necessary for understanding how biochemical networks generate properties that are robust to environmental stresses or genetic changes. Sensitivity analysis of mathematical models allows robustness to be linked to network structure. However, ordinary numerical analysis yields only local properties for a particular choice of plausible parameter values, because it is hard to know the exact parameter values in vivo. We need global and firm results that do not depend on particular parameter values.

We propose a mathematical analysis for robustness (MAR) that combines sensitivity analysis with novel evolutionary searches that explore many solution vectors of kinetic parameters, thereby determining critical reactions. We analyze the sensitivity of amplitudes and periods to changes in kinetic parameters in the Drosophila interlocked circadian clock system and clearly identify the critical reactions responsible for determining the circadian cycle. This work suggests that the circadian clock intensively evolves or designs its kinetic parameters so that it creates a highly robust cycle.
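The combination of sensitivity analysis with many parameter vectors can be illustrated with a small sketch. It assumes a logarithmic (relative) sensitivity estimated by central differences and averaged over several parameter sets. The closed-form `period` function is a hypothetical stand-in for a simulated circadian period, and the parameter names `k1`, `k2`, `k3` and their values are invented for the example.

```python
import math


def period(p):
    """Toy stand-in for a simulated oscillation period (hypothetical closed form)."""
    return 2.0 * math.pi / math.sqrt(p["k1"] * p["k2"]) * (1.0 + 0.01 * p["k3"])


def log_sensitivity(output, params, name, h=1e-4):
    """Central-difference estimate of d ln(output) / d ln(parameter)."""
    up = dict(params, **{name: params[name] * (1.0 + h)})
    down = dict(params, **{name: params[name] * (1.0 - h)})
    d_ln_output = math.log(output(up)) - math.log(output(down))
    d_ln_param = math.log(1.0 + h) - math.log(1.0 - h)
    return d_ln_output / d_ln_param


def rank_reactions(output, param_sets):
    """Average |sensitivity| over many parameter vectors, largest first."""
    names = list(param_sets[0])
    mean_abs = {
        name: sum(abs(log_sensitivity(output, p, name)) for p in param_sets) / len(param_sets)
        for name in names
    }
    return sorted(mean_abs.items(), key=lambda item: item[1], reverse=True)


# Several plausible parameter vectors, e.g. the output of a parameter search.
param_sets = [
    {"k1": 1.0, "k2": 2.0, "k3": 1.0},
    {"k1": 0.5, "k2": 4.0, "k3": 2.0},
    {"k1": 3.0, "k2": 0.2, "k3": 0.5},
]
ranking = rank_reactions(period, param_sets)
```

Parameters that keep a large average sensitivity across all vectors are candidates for critical reactions, and the conclusion no longer rests on one particular parameter choice. In this toy model `k1` and `k2` dominate and `k3` is ranked last.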

Cited from

H. Kurata, T. Tanaka, and F. Ohnishi, Mathematical identification of critical reactions in the interlocked feedback model, PLoS One, 2(10): e1103, 2007

Main text Supplementary data

Presentation PPT file

4 Computer-Aided Rational Design (CARD)

The goals of systems biology are to understand the mechanisms of how biochemical networks generate particular cellular functions in response to environmental stresses or genetic changes, and to rationally design these molecular processes to meet an engineering purpose. To design biological systems at the molecular interaction level, it is essential to identify a biochemical network map, to build a dynamic model of the system, and to perform system analysis. Perturbation analysis is useful for identifying critical parameters that affect the system's performance. CAD is now a key technology to simulate or design the molecular architecture of a genetically engineered cell.

To rationally design a biochemical network, we propose a Computer-Aided Design (CAD) based strategy that consists of biochemical network design, module decomposition analysis, perturbation analysis for a dynamic model and experimental verification.

Assuming that the E. coli glucose phosphotransferase system (PTS) aims at controlling the glucose uptake rate, the PTS network model was decomposed into hierarchical modules in analogy to engineering control architectures, and the effect of changes in gene expression on the glucose uptake rate was simulated to plan how the gene regulatory network should be engineered. This design and analysis predicted that the mlc knockout mutant with ptsI gene overexpression greatly increases the specific glucose uptake rate, and biological experiments validated the prediction, thereby demonstrating the feasibility of the proposed strategy.

Cited from

Yousuke Nishio, Yoshihiro Usuda, Kazuhiko Matsui, Hiroyuki Kurata, Computer-aided rational design of the phosphotransferase system for enhanced glucose uptake in Escherichia coli, Molecular Systems Biology, 4:160.

Main text Supplemental data

Presentation PPT file

5 Mathematical tools for metabolic flux analysis

We propose various mathematical methods to construct and analyze large-scale metabolic networks.

5.1 Integration of proteome data into metabolic flux analysis

Network-based pathway analysis facilitates understanding or designing metabolic systems and enables prediction of metabolic flux distributions. Network-based flux analysis requires considering not only pathway architectures but also the proteome or transcriptome to predict flux distributions, because recombinant microbes significantly change the distribution of gene expressions. The current problem is how to integrate such heterogeneous data to build a network-based model.

To link enzyme activity data to flux distributions of metabolic networks, we have proposed Enzyme Control Flux (ECF), a novel model that integrates enzyme activity into elementary mode analysis (EMA). ECF presents a power-law formula describing how changes in enzyme activities between the wild type and a mutant are related to changes in the elementary mode coefficients (EMCs). To validate the feasibility of ECF, we integrated enzyme activity data into the EMCs of wild-type Escherichia coli and Bacillus subtilis. The ECF model effectively uses an enzyme activity profile to estimate the flux distribution of the mutants, and increasing the number of incorporated enzyme activities decreases the model error of ECF.
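To make the idea of a power-law link between enzyme activities and EMCs concrete, here is a minimal sketch. The fact that a flux distribution is an EMC-weighted sum of elementary modes is standard. The specific update rule in `ecf_like_update` (multiply each EMC by the product of activity ratios raised to a power `beta`, then renormalise to the wild-type uptake flux) is an assumed form for illustration, and the exact formula in the cited paper may differ. The three-reaction network and all numbers are invented.

```python
# Rows are elementary modes, columns are reactions (r1 uptake, r2 branch A, r3 branch B).
elementary_modes = [
    [1.0, 1.0, 0.0],   # EM1: uptake -> branch A
    [1.0, 0.0, 1.0],   # EM2: uptake -> branch B
]


def flux_from_emcs(modes, emcs):
    """Flux distribution as the EMC-weighted sum of elementary modes."""
    n_reactions = len(modes[0])
    return [sum(c * mode[j] for c, mode in zip(emcs, modes)) for j in range(n_reactions)]


def ecf_like_update(modes, wt_emcs, activity_ratio, beta=1.0, uptake_index=0):
    """Assumed power-law rule: scale each EMC by the product of
    (mutant / wild-type activity) ** beta over the reactions the mode uses,
    then renormalise so that the uptake flux is unchanged."""
    scaled = []
    for c, mode in zip(wt_emcs, modes):
        factor = 1.0
        for j, coeff in enumerate(mode):
            if coeff != 0.0:
                factor *= activity_ratio[j] ** beta
        scaled.append(c * factor)
    wt_uptake = flux_from_emcs(modes, wt_emcs)[uptake_index]
    new_uptake = flux_from_emcs(modes, scaled)[uptake_index]
    return [c * wt_uptake / new_uptake for c in scaled]


wt_emcs = [0.5, 0.5]
activity_ratio = [1.0, 0.25, 1.0]   # branch-A enzyme at 25 % of wild-type activity
mutant_emcs = ecf_like_update(elementary_modes, wt_emcs, activity_ratio)
mutant_flux = flux_from_emcs(elementary_modes, mutant_emcs)
```

In this toy case, lowering the branch-A enzyme activity shifts weight from EM1 to EM2 while the uptake flux stays fixed, so the predicted mutant flux is redirected toward branch B.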

Cited from

Hiroyuki Kurata, Quanyu Zhao, Ryuichi Okuda, Kazuyuki Shimizu: Integration of enzyme activities into metabolic flux distributions by elementary mode analysis, BMC Systems Biology, 1:31, 2007.

Presentation PPT file

5.2 Elementary mode-based prediction of a broad range of genetically modified mutants

Gene deletion and over-expression are critical technologies for designing or improving the metabolic flux distribution of microbes. Some algorithms including flux balance analysis (FBA) and minimization of metabolic adjustment (MOMA) predict a flux distribution from a stoichiometric matrix in the mutants in which some metabolic genes are deleted or non-functional, but there are few algorithms that predict how a broad range of genetic modifications, such as over-expression and under-expression of metabolic genes, alters the phenotypes of the mutants at the metabolic flux level.

To overcome these limitations, we develop a novel algorithm, based on elementary mode analysis, that predicts the flux distribution of mutants with a broad range of genetic modifications. It is denoted as Genetic Modification of Flux (GMF), which couples two algorithms that we have developed: Modified Control Effective Flux (mCEF) and Enzyme Control Flux (ECF). mCEF is proposed based on CEF to estimate the gene expression patterns in genetically modified mutants in terms of specific biological functions. GMF is demonstrated to predict the flux distribution of not only gene deletion mutants but also mutants with under-expressed and over-expressed genes in Escherichia coli and Corynebacterium glutamicum. This achieves a breakthrough in the a priori flux prediction of a broad range of genetically modified mutants.

Cited from:

Quanyu Zhao, Hiroyuki Kurata, Genetic modification of flux for flux prediction of mutants, Bioinformatics, 25: 1702-1708, 2009

Presentation PPT file

5.3 Maximum entropy principle (MEP) for a new objective function

Elementary Mode (EM) analysis is potentially effective in integrating transcriptome or proteome data into metabolic network analyses and in exploring how phenotypic or metabolic flux distributions change in response to environmental and genetic perturbations. The EM Coefficients (EMCs) indicate the quantitative contribution of their associated EMs and, as shown in our previous study, can be estimated by maximizing Shannon's entropy as a general objective function, but the use of EMCs is still restricted to relatively small-scale networks. We propose a fast and universal method that optimizes hundreds of thousands of EMCs under the constraint of the Maximum Entropy Principle (MEP). Lagrange multipliers (LMs) are applied to maximize the Shannon entropy-based objective function, analytically solving each EMC as a function of the LMs. Consequently, the number of search variables is dramatically reduced from the number of EMCs to the number of reactions. To demonstrate the feasibility of the MEP with Lagrange multipliers (MEPLM), it is coupled with Enzyme Control Flux (ECF) to predict the flux distributions of E. coli and S. cerevisiae for different conditions (gene deletion, adaptive evolution, temperature and dilution rate) and to provide a quantitative understanding of how metabolic or physiological states change in response to these genetic or environmental perturbations at the elementary mode level. It is shown that the ECF-based method is a feasible framework for predicting metabolic flux distributions under genetic and environmental perturbations by integrating enzyme activity data into EMs.
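The reduction in search variables can be seen in a small numerical sketch. Maximizing the entropy -sum_i lambda_i ln(lambda_i) subject to the flux constraints sum_i P[j][i] * lambda_i = v[j] gives, from the stationarity of the Lagrangian, lambda_i = exp(-1 - sum_j mu_j * P[j][i]), so only the multipliers mu_j (one per constrained reaction) remain unknown. The code below assumes this unnormalised entropy form and solves the convex dual by plain gradient descent. The matrix `P`, the fluxes `v`, the step size and the iteration count are illustrative assumptions, and the published MEPLM formulation and solver may differ.

```python
import math

# Columns are elementary modes; rows are the constrained (measured) fluxes.
# Values are illustrative: row 0 is uptake, row 1 is a product flux.
P = [
    [1.0, 1.0, 1.0],
    [1.0, 0.5, 0.0],
]
v = [1.0, 0.7]


def emcs_from_multipliers(P, mu):
    """Stationarity of the Lagrangian: lambda_i = exp(-1 - sum_j mu_j * P[j][i])."""
    n_modes = len(P[0])
    return [
        math.exp(-1.0 - sum(mu[j] * P[j][i] for j in range(len(P))))
        for i in range(n_modes)
    ]


def solve_mep(P, v, step=0.3, iterations=5000):
    """Gradient descent on the convex dual; the unknowns are the multipliers only."""
    mu = [0.0] * len(P)
    for _ in range(iterations):
        emcs = emcs_from_multipliers(P, mu)
        for j in range(len(P)):
            predicted = sum(P[j][i] * emcs[i] for i in range(len(emcs)))
            # Raise mu_j when the predicted flux is too large, lower it otherwise.
            mu[j] += step * (predicted - v[j])
    return mu, emcs_from_multipliers(P, mu)


mu, emcs = solve_mep(P, v)
```

Here two multipliers determine three EMCs. In a genome-scale setting the same structure would let a search over as many variables as there are constrained reactions determine a much larger number of EMCs.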

Cited from:

Quanyu Zhao, Hiroyuki Kurata, Use of maximum entropy principle with Lagrange multipliers extends the feasibility of elementary mode analysis. J Biosci Bioeng, 110: 254-261, 2010.

Quanyu Zhao, Hiroyuki Kurata, Maximum entropy decomposition of flux distribution at steady state to elementary modes. J Biosci Bioeng, 107: 84-89, 2009.

6 Statistical analysis of genome-scale networks

6.1 Spectral clustering for protein-protein interaction networks

A goal of systems biology is to analyze large-scale molecular networks including gene expressions and protein-protein interactions, revealing the relationships between network structures and their biological functions. Dividing a protein-protein interaction (PPI) network into naturally grouped parts is an essential way to investigate the relationship between topology of networks and their functions. However, clear modular decomposition is often hard due to the heterogeneous or scale-free properties of PPI networks.

To address this problem, we propose a diffusion model-based spectral clustering algorithm, which analytically solves the cluster structure of PPI networks as a problem of random walks in the diffusion process in them. To cope with the heterogeneity of the networks, the power factor is introduced to adjust the diffusion matrix by weighting the transition (adjacency) matrix according to a node degree matrix. This algorithm is named the adjustable diffusion matrix-based spectral clustering (ADMSC). To demonstrate the feasibility of ADMSC, we apply it to decomposition of a yeast PPI network, identifying biologically significant clusters with approximately equal size. Compared with other established algorithms, ADMSC facilitates clear and fast decomposition of PPI networks.

ADMSC is proposed by introducing the power factor that adjusts the diffusion matrix to the heterogeneity of the PPI networks. ADMSC effectively partitions PPI networks into biologically significant clusters of almost equal size, while being very fast, robust and appealingly simple.
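A rough sketch of degree-adjusted spectral partitioning is given below. It assumes the adjusted matrix has the form M = D^-beta A D^-beta, where A is the adjacency matrix, D the diagonal degree matrix and beta the power factor. This form, the toy network, and the use of plain power iteration instead of a library eigensolver are all assumptions for illustration. The exact ADMSC definition is given in the cited paper and may differ.

```python
import math

# Toy undirected network: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6


def adjusted_matrix(edges, n, beta):
    """Assumed form M = D^-beta A D^-beta (requires every node to have an edge)."""
    A = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = A[j][i] = 1.0
    degree = [sum(row) for row in A]
    return [[A[i][j] / (degree[i] ** beta * degree[j] ** beta) for j in range(n)]
            for i in range(n)]


def leading_eigenvector(M, start, orthogonal_to=(), iterations=1000):
    """Power iteration on M + shift*I, kept orthogonal to already-found vectors."""
    shift = max(sum(abs(x) for x in row) for row in M)   # makes all eigenvalues >= 0
    vec = list(start)
    for _ in range(iterations):
        for u in orthogonal_to:
            dot = sum(a * b for a, b in zip(vec, u))
            vec = [a - dot * b for a, b in zip(vec, u)]
        vec = [sum(M[i][j] * vec[j] for j in range(len(vec))) + shift * vec[i]
               for i in range(len(vec))]
        norm = math.sqrt(sum(a * a for a in vec))
        vec = [a / norm for a in vec]
    return vec


def bisect(edges, n, beta=0.5):
    """Split the nodes by the sign of the second leading eigenvector."""
    M = adjusted_matrix(edges, n, beta)
    first = leading_eigenvector(M, [1.0] * n)
    second = leading_eigenvector(M, [float(i + 1) for i in range(n)],
                                 orthogonal_to=[first])
    return [0 if x < 0 else 1 for x in second]


labels = bisect(edges, n)
```

On this toy network the two triangles end up in different groups. Changing `beta` changes how strongly high-degree nodes are down-weighted, which is the role the power factor plays in coping with heterogeneous, scale-free networks.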

Cited from

Kentaro Inoue, Weijiang Li and Hiroyuki Kurata, Diffusion Model Based Spectral Clustering For Protein-Protein Interaction Networks. PLOS ONE, 5:e12623, 2010.

Manuals and program for ADMSC

Supplementary File

7 Biological Functional Networks

In synthetic biology and systems biology, a bottom-up approach can be used to construct a complex, modular, hierarchical structure of biological networks. To analyze or design such networks, it is critical to understand the relationship between network structure and function, that is, the mechanism through which biological parts or biomolecules are assembled into building blocks or functional networks. A functional network is defined as a subnetwork of biomolecules that performs a particular function. Understanding how functional networks are built would help develop a methodology for analyzing the structure of large-scale networks and for designing a robust biological circuit that performs a target function. We propose a biological functional network database, named BioFNet, which can cover the whole cell at the level of molecular interactions. BioFNet has the advantage of implementing simulation programs for the mathematical models of the functional networks and visualizing the simulated results. It presents a sound basis for the rational design of biochemical networks and for understanding how functional networks are assembled to create complex, high-level functions, which would reveal the design principles underlying molecular architectures.

Cited from

BioFNet: biological functional network database for analysis and synthesis of biological systems

BioFNet: Biological Functional Network Database