Ruijie Shi

Date of Award


Degree Type


Degree Name

Doctor of Philosophy (PhD)


Chemical Engineering


John F. MacGregor


Subspace Identification Methods (SIMs) are a class of new identification methods that have drawn considerable interest in recent years. The key idea of these methods is to estimate the process states or the extended observability matrix directly from the process input and output data. The best-known SIMs are Canonical Variate Analysis (CVA), Numerical Algorithms for Subspace State-Space System Identification (N4SID) and Multivariable Output-Error State Space (MOESP). This thesis covers both fundamental research on SIMs and a study of their application. The first part of the fundamental research analyzes SIM algorithms from a statistical estimation viewpoint. For this purpose, a multi-step state-space model is first set up to reveal the relationships between the process states and the process data sets. Based on this model, SIM algorithms are analyzed to reveal their basic principles and bias issues. Several new SIM algorithms are proposed and shown to perform similarly to the existing algorithms. Relationships between SIMs and Latent Variable Methods (LVMs) for identification are then explored. It is shown that N4SID can be derived from Reduced-Rank Analysis (RRA), just as CVA is developed from Canonical Correlation Analysis (CCA). Insights from this relationship lead to a variety of approaches for improving the performance of N4SID. The similarities and differences between SIMs and LVMs are investigated, with emphasis on their causality, data collection and applications. For estimating the states, CCA and RRA are shown to be more efficient than Principal Component Analysis (PCA) and Partial Least Squares (PLS). A general statistical framework is proposed to unify SIM algorithms.
The framework breaks all SIMs down into three common steps: (1) use of a linear regression method to estimate the predictable subspace, (2) use of a latent variable method to estimate a minimal set of state variables, and (3) fitting of the estimated process states to the state-space model. Combining the approaches available in the first two steps leads to a whole set of new SIM algorithms. Simulation studies show that these new SIM algorithms perform similarly to the existing SIMs. This framework reveals the nature of the computation steps in SIM algorithms and the fundamental ideas behind them. It also discloses the relationships among different SIM algorithms. The applicability of SIMs to closed-loop data is investigated. The original N4SID algorithm and the CVA algorithm based on regressing out the effects of future inputs are shown to give biased results. In general, whether a subspace identification algorithm is applicable to closed-loop data depends on how the effects of future inputs are treated in estimating the predictable subspace (step one of the proposed framework). Based on this analysis, several new N4SID and CVA algorithms are proposed for closed-loop data. Practical issues arising from applications of SIMs are also discussed. SIMs are shown to be able to handle delays, common dynamics, and non-stationary and co-integrating disturbances in the process. The advantages of SIMs, as well as solutions for possible problems, are also presented. Some general guidelines are provided for applications of SIMs.
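The three steps of the framework can be illustrated with a minimal Python sketch. This is not the algorithm from the thesis: the function names (`simulate`, `stack_lags`, `identify`), the first-order test system with pole 0.8, the noise level and the horizons are all assumptions made for illustration. Step 2 uses an unweighted SVD of the fitted values, which is one possible latent variable choice in the spirit of reduced-rank analysis, and the data are assumed to be collected in open loop.

```python
import numpy as np


def simulate(a, b, c, n_samples, noise_std, rng):
    """Simulate x[k+1] = a*x[k] + b*u[k], y[k] = c*x[k] + e[k] (open loop)."""
    u = rng.standard_normal((n_samples, 1))
    e = noise_std * rng.standard_normal((n_samples, 1))
    y = np.zeros((n_samples, 1))
    x = 0.0
    for k in range(n_samples):
        y[k, 0] = c * x + e[k, 0]
        x = a * x + b * u[k, 0]
    return u, y


def stack_lags(signal, start, width, rows):
    """Row i holds signal[start+i], ..., signal[start+i+width-1] side by side."""
    return np.hstack([signal[start + j:start + j + rows] for j in range(width)])


def identify(u, y, past, future, order):
    """Illustrative three-step subspace identification (assumed variant)."""
    rows = len(u) - past - future + 1
    Zp = np.hstack([stack_lags(y, 0, past, rows), stack_lags(u, 0, past, rows)])
    Uf = stack_lags(u, past, future, rows)
    Yf = stack_lags(y, past, future, rows)

    # Step 1: linear regression of future outputs on past data and future
    # inputs; the part explained by the past data is the predictable subspace.
    coef, *_ = np.linalg.lstsq(np.hstack([Zp, Uf]), Yf, rcond=None)
    Lp = coef[:Zp.shape[1]]
    predictable = Zp @ Lp

    # Step 2: latent variable step. Keep the dominant directions of the
    # fitted values and express the states as a fixed linear map of past data.
    _, s, Vt = np.linalg.svd(predictable, full_matrices=False)
    J = Lp @ Vt[:order].T
    X = Zp @ J

    # Step 3: fit the estimated states to the state-space model by least
    # squares: [x[k+1], y[k]] regressed on [x[k], u[k]].
    u_now = u[past:past + rows]
    y_now = y[past:past + rows]
    regressors = np.hstack([X[:-1], u_now[:-1]])
    targets = np.hstack([X[1:], y_now[:-1]])
    theta, *_ = np.linalg.lstsq(regressors, targets, rcond=None)
    A = theta[:order, :order].T
    B = theta[order:, :order].T
    C = theta[:order, order:].T
    D = theta[order:, order:].T
    return A, B, C, D, s


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    u, y = simulate(a=0.8, b=1.0, c=1.0, n_samples=2000, noise_std=0.05, rng=rng)
    A, B, C, D, s = identify(u, y, past=5, future=5, order=1)
    print("estimated pole:", A[0, 0])
    print("leading singular values:", s[:3])
```

In this sketch the estimated states are identified only up to scaling and sign, so the pole of `A` is the quantity to compare against the simulated system rather than `B` or `C` individually. The gap between the first and second singular values gives a rough indication of the model order. As the abstract notes, regressing out future inputs in step 1 in this way is expected to give biased results on closed-loop data.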
