  "ICD is an interdisciplinary research center focused on employing massively parallel strategies towards the rapid discovery of new materials."  
     
The Institute for Combinatorial Discovery
   
 
Informatics & Statistical Analysis Research

 

Informatics and Statistical Analysis. The radical changes in information generation and structure driven by CombiSci require sophisticated informatics tools to digest massive data sets and advanced statistical analysis methods to address multi-dimensional error analysis and experimental design. Our focus is on the application of informatics tools to rationalize the vast experimental data sets while developing statistical tools to address library design and screening uncertainties.

Informatics (Rajan): The sheer quantity and complexity of data generated from CombiSci experiments far exceed those of traditional experimental science. For example, library design parameters often carry complex measurement and control errors with interdependent data components, so that analysis requires intricate control of multiple risks. These issues are further compounded by split-and-pool processing, performance screening, and interpolation or extrapolation of outcomes. Thus, meta-analytic inference procedures are required to verify the scientific context of the data. Our informatics infrastructure collects and processes experimental data using a variety of multivariate analyses (e.g., principal component analysis, partial least squares, and self-organizing maps). We have successfully employed these and other analytical tools in prior predictive and classification analyses based on data generated by CombiSci experiments. Outcomes from the informatics analysis guide the next generation of CombiSci experiments. To validate the hypotheses and predictive models, we use cross-validation methods common in machine learning, in addition to experimental validation of data mining results. We have also created “virtual” combinatorial libraries by interfacing informatics with large data sets to develop new and better descriptors, which will serve as conduits for the design and interpretation of experiments.
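As a rough illustration of the cross-validation step mentioned above, the following minimal sketch estimates the held-out prediction error of a simple one-descriptor linear model using k-fold cross-validation. It is not the Institute's pipeline: the data are synthetic, and the function names (fit_line, k_fold_rmse), the descriptor, and the noise level are all assumptions made for the example. A real analysis would swap in the multivariate model actually being validated.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def k_fold_rmse(xs, ys, k=5, seed=0):
    """Mean root-mean-square error on the held-out fold, averaged over k folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint, near-equal folds
    errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        # Fit only on the training samples, then score on the held-out ones.
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        sq = [(ys[i] - (a + b * xs[i])) ** 2 for i in held_out]
        errors.append(statistics.fmean(sq) ** 0.5)
    return statistics.fmean(errors)


# Hypothetical library: one composition descriptor x and a measured property y.
rng = random.Random(42)
xs = [i / 49 for i in range(50)]
ys = [2.0 + 3.0 * x + rng.gauss(0.0, 0.1) for x in xs]

cv_rmse = k_fold_rmse(xs, ys, k=5)
print(f"5-fold cross-validated RMSE: {cv_rmse:.3f}")
```

Because every sample is scored only by a model that never saw it, the cross-validated error is a more honest guide to how the model will perform on the next library than the error on the fitted data.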

Statistical Analysis (Morris, Rollins): We are developing advanced statistical methods that relate the uncertainties associated with CombiSci experiments to relevant outcome measures. Understanding the relationship between predictive parameters and laboratory uncertainty requires a unified analysis of the relevant theory and the experimental process parameters. Characterization of uncertainty for a single experiment then leads to an understanding of how it must be characterized for
1) multiple experiments using the same library,
2) combined analyses based on similar libraries, and ultimately
3) higher-level synthesis of information from related studies that do not rely on the same physical equipment or protocols.
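
To make the step from a single experiment to multiple experiments concrete, here is a minimal sketch of one standard way to combine estimates: a fixed-effect, inverse-variance weighted average. This is an illustrative technique and is not stated in the text as the group's method. It assumes the experiments are independent and measure the same underlying quantity, which is most plausible for case 1. The function name pool_fixed_effect and the numbers are hypothetical.

```python
import math


def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance weighted mean of independent estimates and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]  # more precise runs count more
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


# Hypothetical repeat measurements of one property on the same library.
estimates = [10.2, 9.8, 10.5]
std_errors = [0.2, 0.4, 0.4]

pooled, pooled_se = pool_fixed_effect(estimates, std_errors)
print(f"pooled estimate: {pooled:.3f} +/- {pooled_se:.3f}")
```

The pooled standard error is smaller than that of any single run, which is the gain from repetition. For cases 2 and 3, where libraries, equipment, or protocols differ, the shared-quantity assumption weakens and an additional between-study variance term would typically be needed; that extension is not shown here.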