Classification Methods

Hierarchical Methodology

A hierarchical architecture divides the classification process into a sequence of simpler tasks based on class separability. The three-level hierarchical classifier shown in the figure below, originally employed for Landsat data over the KSC test site [10], was also applied to the AVIRIS data. The approach first separates "super" classes which are quite different and then discriminates between the spectrally more similar classes within each super class. Additionally, this structure allows a potentially different algorithm to be utilized at each level of the discrimination process. Because the approach provided improved results for the KSC Landsat data, the same structure was retained for the AVIRIS imagery.


Hierarchical structure of classification methodology

The Level-1 classifier separates the image into water and land, and the Level-2 classifier discriminates between the land classes consisting of wetlands and uplands. At Level-3, the different land cover types are classified into the "final" classes.
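The three-level flow described above can be sketched as a routing function. This is a hedged illustration only: the level classifiers are stand-in callables (the real Level-1 used isoclustering, Level-2 a biomass index, and Level-3 the statistical and neural classifiers discussed later), and the toy thresholds on a single "band" value are pure assumptions.

```python
# Hedged sketch of the three-level hierarchical decision flow.
# The level classifiers here are stand-in callables, not the
# algorithms used in the study.

def classify_pixel(pixel, level1, level2, level3_upland, level3_wetland):
    """Route a pixel down the hierarchy to a final class label."""
    if level1(pixel) == "water":
        return "water"                    # Level 1: water vs. land
    if level2(pixel) == "uplands":
        return level3_upland(pixel)       # Level 3: e.g. scrub, slash pine
    return level3_wetland(pixel)          # Level 3: e.g. cattail marsh

# Toy stand-ins: thresholds on a single "band" value (pure assumption).
label = classify_pixel(
    0.8,
    level1=lambda p: "water" if p < 0.1 else "land",
    level2=lambda p: "uplands" if p > 0.5 else "wetlands",
    level3_upland=lambda p: "scrub",
    level3_wetland=lambda p: "graminoid marsh",
)
print(label)  # → scrub
```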

The Level-1 classifier uses a simple isoclustering method. A biomass index similar to an NDVI is employed at Level 2 to separate the uplands from the wetland vegetation. Because the willow swamps and hardwood swamps have much more biomass than the marshes, they are included in the "uplands" category at this level, even though they are actually wetland communities.
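The Level-2 separation can be sketched as an NDVI-style index with a biomass threshold. The band choice and threshold value below are illustrative assumptions, not those of the original study.

```python
import numpy as np

# Sketch of the Level-2 split, assuming an NDVI-like biomass index
# computed from near-infrared (NIR) and red reflectance. The 0.5
# threshold is an illustrative assumption.

def biomass_index(nir, red):
    """NDVI-style index: (NIR - red) / (NIR + red)."""
    nir = nir.astype(float)
    red = red.astype(float)
    return (nir - red) / (nir + red + 1e-12)

def split_uplands_wetlands(nir, red, threshold=0.5):
    """Label high-biomass pixels (including the willow and hardwood
    swamps) as 'uplands' and the rest as 'wetlands'."""
    idx = biomass_index(nir, red)
    return np.where(idx > threshold, "uplands", "wetlands")

nir = np.array([0.60, 0.30, 0.55])
red = np.array([0.10, 0.25, 0.40])
print(split_uplands_wetlands(nir, red))  # → ['uplands' 'wetlands' 'wetlands']
```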

Several classification approaches were investigated for the Level-3 analysis: Gaussian Maximum Likelihood (ML), a Gaussian Markov Random Field contextual model, Canonical Analysis, and a multi-layer perceptron neural network with one hidden layer (NN). Inputs to the classifiers were derived both from statistical feature-extraction approaches [Principal Component Analysis (PCA), Minimum Noise Fraction (MNF), Decision Boundary Surfaces (DBS)] and from direct utilization of target response characteristics. The ML classifier was used primarily as a comparative baseline.

Determination of Outputs

Training sites for each class were determined using previous classification maps, aerial photography, and extensive knowledge of the area by KSC personnel. The initial "training" data were randomly divided into training and test data sets with equal numbers of observations. The output classes for the uplands were scrub, willow swamp, cabbage palm hammock, cabbage palm/oak hammock, slash pine, oak/broadleaf hammock, and hardwood swamp. Wetland classes were selected to be graminoid marsh, Spartina bakerii marsh, cattail marsh, salt marsh, and mud flats. The graminoid marsh class is actually a mixture of marsh grasses which do not appear in large homogeneous, spatially contiguous groups that can be readily identified in the imagery.

Classification Algorithms

Gaussian Maximum Likelihood and Gaussian MRF Pixel Based Classifiers
Maximum likelihood classification is widely used for classification of remotely sensed optical data. Maximum likelihood estimates of the class parameters are computed, and each pixel is assigned to the class which maximizes the likelihood function. As a pixel-by-pixel method, this approach does not take into account contextual information about the classes of neighboring pixels when labeling a pixel. Contextual models such as Markov Random Field (MRF) approaches utilize the conditional probability that a pixel belongs to a particular class given the classes of its neighbors [11]. Although the MRF classifiers are pixel based, the increased information provided by the spatial context of the neighbors' classes tends to mitigate the effects of noise, isolated mixed pixels, and individual pixels whose values lie in the tails of the distribution of spectral values of a class. Thus, the output is generally smoother than that of the standard ML classifier.
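The per-pixel Gaussian ML rule can be sketched with the log-likelihood discriminant. Class means and covariances would normally be estimated from training pixels; the two-class data below are synthetic illustrations, not the study's spectra.

```python
import numpy as np

# Hedged sketch of per-pixel Gaussian maximum likelihood classification.
# Each pixel is assigned to the class maximizing the Gaussian
# log-likelihood discriminant g_k(x) = -0.5*ln|C_k|
#   - 0.5*(x - m_k)' C_k^{-1} (x - m_k) + ln P(k).

def gaussian_ml_classify(X, means, covs, priors):
    """Assign each row of X to the class with the largest discriminant."""
    scores = []
    for mu, cov, p in zip(means, covs, priors):
        inv = np.linalg.inv(cov)
        _, logdet = np.linalg.slogdet(cov)
        d = X - mu
        maha = np.einsum("ij,jk,ik->i", d, inv, d)  # Mahalanobis distances
        scores.append(-0.5 * logdet - 0.5 * maha + np.log(p))
    return np.argmax(np.stack(scores, axis=1), axis=1)

# Synthetic two-class example (illustrative values only).
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
covs = [np.eye(2), np.eye(2)]
priors = [0.5, 0.5]
X = np.array([[0.1, -0.2], [2.9, 3.1]])
print(gaussian_ml_classify(X, means, covs, priors))  # → [0 1]
```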

Neural Networks
Within the past decade, neural networks have been applied increasingly to remotely sensed data sets because they are non-parametric (i.e., they do not assume a Gaussian distribution). For this project, a multi-layer perceptron neural network with one hidden layer and a scaled conjugate gradient training algorithm was applied to various combinations of input data.
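A minimal one-hidden-layer perceptron can be sketched as follows. Note the hedges: plain gradient descent stands in for the scaled conjugate gradient training used in the study, and the synthetic two-class "spectra", layer width, and learning rate are all illustrative assumptions.

```python
import numpy as np

# Minimal one-hidden-layer perceptron for two-class labeling, trained
# with plain gradient descent (the study used scaled conjugate
# gradient; this is a simplified stand-in).

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mlp(X, y, hidden=8, lr=0.5, epochs=2000, seed=0):
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, (hidden, 1))
    b2 = np.zeros(1)
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)            # hidden-layer activations
        p = sigmoid(H @ W2 + b2).ravel()    # class-1 probability
        g = (p - y)[:, None] / len(y)       # cross-entropy output gradient
        gh = (g @ W2.T) * H * (1 - H)       # backpropagated hidden gradient
        W2 -= lr * H.T @ g
        b2 -= lr * g.sum(axis=0)
        W1 -= lr * X.T @ gh
        b1 -= lr * gh.sum(axis=0)
    return W1, b1, W2, b2

def predict(params, X):
    W1, b1, W2, b2 = params
    return (sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2).ravel() > 0.5).astype(int)

# Synthetic, well-separated two-class "spectra" (assumed for illustration).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (40, 4)), rng.normal(1, 0.3, (40, 4))])
y = np.array([0] * 40 + [1] * 40)
params = train_mlp(X, y)
acc = (predict(params, X) == y).mean()
print(acc)
```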

Canonical Analysis
Generalized Discriminant or Canonical Analysis (CA) [15] optimizes class separability via the linear combination that maximizes the ratio of the between-class to within-class covariance. Although discriminant analysis is widely applied, it does not discriminate reliably between classes whose means are very similar. Likewise, when the variance of one class is quite different from the others, that class dominates the determination of the resulting linear combinations.
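The ratio criterion above leads to the eigenproblem of the inverse within-class scatter times the between-class scatter. The sketch below is an assumed generic formulation, with synthetic two-class data standing in for the study's bands.

```python
import numpy as np

# Sketch of canonical (generalized discriminant) analysis: find the
# linear combinations maximizing between-class over within-class
# scatter by solving the eigenproblem of inv(S_w) @ S_b.

def canonical_axes(X, y):
    """Return eigenvalues (separability) and canonical axes,
    sorted by decreasing eigenvalue."""
    classes = np.unique(y)
    mean_all = X.mean(axis=0)
    Sw = np.zeros((X.shape[1], X.shape[1]))  # within-class scatter
    Sb = np.zeros_like(Sw)                   # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        d = (mc - mean_all)[:, None]
        Sb += len(Xc) * (d @ d.T)
    vals, vecs = np.linalg.eig(np.linalg.inv(Sw) @ Sb)
    order = np.argsort(vals.real)[::-1]
    return vals.real[order], vecs.real[:, order]

# Two synthetic classes separated along the first feature (assumed data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 0.5, (30, 2)),
               rng.normal([2, 0], 0.5, (30, 2))])
y = np.array([0] * 30 + [1] * 30)
vals, vecs = canonical_axes(X, y)
print(vals)  # one dominant eigenvalue: with 2 classes, rank(S_b) = 1
```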