Doctor of Philosophy (PhD)
Statistical classification and regression are two widely used statistical techniques. The former is concerned with the problem of classifying an object of unknown origin into one of two or more distinct groups or populations on the basis of observations made on it. The broad appeal of regression techniques results from the conceptually simple process of using an equation to express the relationship between a set of variables. These techniques are said to be resistant when the result is not greatly altered if a small fraction of the data is altered; they are said to be robust of efficiency when their statistical efficiency remains high under conditions more realistic than the utopian case of normal distributions. These properties are particularly important in the formative stages of model building, when the form of the response is not known exactly. Techniques with these properties are proposed and discussed.
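As a small illustration of the notion of resistance defined above, the following sketch alters one observation in ten and compares how far the sample mean and the sample median move. The data values and variable names are invented for illustration and are not taken from the dissertation.

```python
from statistics import mean, median

# Hypothetical measurements clustered near 10 (illustrative values only).
clean = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 9.9]

# Alter a small fraction of the data: one observation becomes a gross error.
altered = clean[:-1] + [1000.0]

# How far does each summary move when a single observation is altered?
shift_mean = abs(mean(altered) - mean(clean))
shift_median = abs(median(altered) - median(clean))

print(f"shift in mean:   {shift_mean:.2f}")
print(f"shift in median: {shift_median:.2f}")
```

The mean is carried far from the bulk of the data by the single altered value, while the median barely moves; in the sense used above, the median is resistant and the mean is not.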
First, we present Tiku's (1967, 1970, 1980) univariate MML (Modified Maximum Likelihood) estimation method based on Type-II symmetrically censored normal samples. Next, we describe the extension of this method of estimation given by Tiku (1988) for the bivariate case and by Tiku and Balakrishnan (1988) for the general multivariate case. Some asymptotic distributional properties of these multivariate MML estimators are also discussed. Then, we develop a robust multivariate linear two-way classification procedure based on the MML estimators and demonstrate that, for a fixed value of one of the two errors of misclassification, it performs more efficiently overall than the classical linear two-way classification procedure. When both errors of misclassification are allowed to float, we also show that the robust procedure has a smaller error rate than the classical procedure and a much smaller error rate than non-parametric classification procedures such as the nearest neighbour method and the method based on density estimates. For this comparative study of the performance of the various classification procedures, we use data obtained from an elaborate and extensive anthropometric survey conducted by Majumdar (1941) in the United Provinces of India.
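The univariate MML idea can be sketched in a few lines. The version below is a minimal sketch, not the dissertation's own code: it assumes the commonly quoted tangent-line form of the linearization, in which the hazard-type term f(z)/(1 - F(z)) in the censored-normal likelihood equations is replaced by alpha + beta*z at the point t satisfying F(t) = 1 - r/n, and it uses the divisor 2*sqrt(A(A-1)) for the scale estimate. The function name `mml_censored_normal` and the sample values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mml_censored_normal(sample, r):
    """Sketch of MML estimates of (mu, sigma) from a normal sample with the
    r smallest and r largest observations censored (Type-II, symmetric)."""
    x = sorted(sample)
    n = len(x)
    if r < 1 or n - 2 * r < 2:
        raise ValueError("need r >= 1 and at least two uncensored observations")

    std = NormalDist()              # standard normal distribution
    q = r / n                       # proportion censored on each side
    t = std.inv_cdf(1.0 - q)        # point where F(t) = 1 - q
    g = std.pdf(t) / q              # f(t) / (1 - F(t))
    beta = g * (g - t)              # slope of the tangent-line approximation
    alpha = g - beta * t            # intercept of the approximation

    kept = x[r:n - r]               # uncensored order statistics
    lo, hi = kept[0], kept[-1]      # smallest and largest retained values
    A = n - 2 * r                   # number of retained observations
    m = A + 2 * r * beta

    # Location: weighted mean with extra weight r*beta on lo and hi.
    mu = (sum(kept) + r * beta * (lo + hi)) / m

    # Scale: positive root of the linearized likelihood equation for sigma.
    B = r * alpha * (hi - lo)
    C = sum((v - mu) ** 2 for v in kept) \
        + r * beta * ((lo - mu) ** 2 + (hi - mu) ** 2)
    sigma = (B + sqrt(B * B + 4 * A * C)) / (2 * sqrt(A * (A - 1)))
    return mu, sigma


# Usage: the extreme observations are censored, so gross errors in the
# tails leave the estimates unchanged.
mu, sigma = mml_censored_normal([-500, 2, 3, 4, 5, 6, 7, 8, 9, 1000], r=1)
print(mu, sigma)
```

Because only the retained order statistics enter the formulas, any change to the censored extremes that preserves the ordering has no effect on the estimates, which is the source of the resistance exploited in the classification procedure.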
Next, we study classification procedures in which the classification has to be based on the observed value of a pair of variables, one being a dichotomous random variable (univariate or multivariate) and the other a univariate or multivariate continuous variable. For this study, we consider the existing classification procedures due to Chang and Afifi (1974) and Balakrishnan and Tiku (1988) and extend the latter method to the case when the data contain a multivariate dichotomous variable and an associated multivariate continuous variable. We illustrate the non-robust character of the Chang and Afifi (1974) procedure through a study of the distribution functions and expected values of the errors of misclassification when both errors are allowed to float. After showing that the linear classification procedure can be sensitive to departures from homogeneity of variances, we propose quadratic and transformed linear classification procedures for the problem of classification based on dichotomous and continuous variables when the populations differ from each other not only in means but also in variances.
Next, we adopt the modified maximum likelihood approach and derive a robust method of estimation of the parameters in a simple linear regression model. We derive asymptotic variances and covariances of these estimators via the information matrix. Then we compare the performance of this procedure based on MML estimators with many prominent procedures, both under normal and under a wide range of non-normal models for the error variable, and also under departures from linearity in the simple linear regression model. We present two examples to illustrate the various methods of estimation considered in this study. Finally, we extend these results to the case of multiple linear regression models. Once again, through a comparative study, we show that the MML estimators derived in this chapter are quite robust to departures from normality and remain highly efficient under normal and several non-normal models for the error variable.
Ambagaspitiya, R.S., "MML estimators and robust classification and linear regression procedures" (1991). Open Access Dissertations and Theses. Paper 3608.