Testing Codes and Data

*These codes and datasets are distributed under the GNU license. If you use these data or code in a publication, please cite the papers listed with each item below.


This is the most recent version of the code developed by Alex Henneguelle. It is a Matlab package that includes the BHC and BPC algorithms. The code also handles small training sample sizes by aggregating similar bands. CSRC

Further instructions for running this code are in the Readme file. The classification package needs four basic data files to run: 'site'_data.bin, 'site'_data.info, 'site'_classes.bin, 'site'_classes.info. Example train/test datasets are available below for trying the code.
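Before launching the Matlab package, it can help to confirm that all four files are present. The following is a minimal sketch in Python, not part of the package. It assumes that 'site' stands for a site-name prefix (for example `KSC`), and the helper names `required_files` and `missing_files` are hypothetical.

```python
from pathlib import Path

# Suffixes taken from the file list above; treating 'site' as a plain
# name prefix is an assumption.
SUFFIXES = ("_data.bin", "_data.info", "_classes.bin", "_classes.info")

def required_files(site):
    """Return the four file names expected for a given site prefix."""
    return [site + suffix for suffix in SUFFIXES]

def missing_files(site, folder="."):
    """Return the required files that are not present in `folder`."""
    return [name for name in required_files(site)
            if not (Path(folder) / name).is_file()]
```

Calling `missing_files("KSC", "path/to/data")` returns an empty list when the folder holds the complete set of files.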

~J.T. Morgan, A. Henneguelle, M.M. Crawford, J. Ghosh, and A.L. Neuenschwander, "Adaptive Feature Spaces for Land Cover Classification with Limited Ground Truth Data," Lecture Notes in Computer Science, Ed. F. Roli and J. Kittler, vol. 2364, 189-200, 2002.


The hierarchical support vector machine (HSVM) code was developed by Yangchi Chen. It is also a Matlab package; it uses max-cut class decomposition and SVMs to create a fast SVM classifier. More details can be seen here: HSVM
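To illustrate the idea of a max-cut class decomposition, the sketch below splits a set of classes into two groups so that the total dissimilarity across the split is as large as possible. It is written in Python, not Matlab, and is not the HSVM implementation: the function name `max_cut_split`, the use of a precomputed class-dissimilarity matrix, and the exhaustive search are all assumptions chosen for clarity.

```python
from itertools import combinations

def max_cut_split(dist):
    """Brute-force max-cut over a symmetric class-dissimilarity matrix.

    Returns (left, right, weight), where left and right are lists of
    class indices. Needs at least two classes; the exhaustive search is
    only practical for a small number of classes.
    """
    n = len(dist)
    best = (None, None, float("-inf"))
    # Fix class 0 on the left so each bipartition is visited once, and
    # stop at n - 2 extra classes so the right side is never empty.
    for size in range(0, n - 1):
        for extra in combinations(range(1, n), size):
            left = [0, *extra]
            right = [c for c in range(n) if c not in left]
            weight = sum(dist[i][j] for i in left for j in right)
            if weight > best[2]:
                best = (left, right, weight)
    return best

# Hypothetical dissimilarities: classes 0 and 1 are close, 2 and 3 are close.
example = [[0, 1, 10, 10],
           [1, 0, 10, 10],
           [10, 10, 0, 1],
           [10, 10, 1, 0]]
left, right, weight = max_cut_split(example)
```

Under these assumptions, a hierarchy could be built by applying the split recursively to each group and training one binary SVM per split.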

~Y. Chen, M.M. Crawford, and J. Ghosh, "Integrating Support Vector Machines in a Hierarchical Output Decomposition Framework," Proc. 2004 International Geoscience and Remote Sensing Symposium, Anchorage, Alaska, Sept 20-24, 949-953, 2004.


The shortest path k-nearest neighbor classifier (SkNN), which uses nonlinear manifold learning, is proposed for the analysis of hyperspectral data. In contrast to classifiers that operate directly in the high-dimensional feature space, this approach uses the pairwise distance matrix over a nonlinear manifold to classify novel observations. Manifold learning preserves local pairwise distances and updates the distances from a sample to samples beyond the user-defined neighborhood along the shortest path on the manifold, so similar samples are moved into closer proximity. High classification accuracies are then achieved with the simple k-nearest neighbor (kNN) classifier. Download
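The description above can be sketched as follows. This is an illustrative Python example, not the downloadable package: it assumes an Isomap-style construction (a symmetric nearest-neighbor graph with Euclidean edge lengths, then all-pairs shortest paths), and the names `geodesic_distances` and `sknn_predict` are hypothetical.

```python
import numpy as np
from collections import Counter

def geodesic_distances(X, n_neighbors):
    """Shortest-path distances over a symmetric nearest-neighbor graph."""
    diff = X[:, None, :] - X[None, :, :]
    euclid = np.sqrt((diff ** 2).sum(axis=-1))
    n = len(X)
    graph = np.full((n, n), np.inf)
    np.fill_diagonal(graph, 0.0)
    for i in range(n):
        # Position 0 of the argsort is the sample itself
        # (assuming no duplicate points), so skip it.
        nearest = np.argsort(euclid[i])[1:n_neighbors + 1]
        graph[i, nearest] = euclid[i, nearest]
        graph[nearest, i] = euclid[i, nearest]
    # Floyd-Warshall, O(n^3): suitable for a small illustration only.
    for k in range(n):
        graph = np.minimum(graph, graph[:, k:k + 1] + graph[k:k + 1, :])
    return graph

def sknn_predict(X_train, y_train, X_test, n_neighbors=5, k=1):
    """Classify test samples by kNN over shortest-path distances."""
    X = np.vstack([X_train, X_test])
    geo = geodesic_distances(X, n_neighbors)
    n_train = len(X_train)
    preds = []
    # Rows: test samples; columns: training samples.
    for row in geo[n_train:, :n_train]:
        nearest = np.argsort(row)[:k]
        votes = Counter(y_train[j] for j in nearest)
        preds.append(votes.most_common(1)[0][0])
    return np.array(preds)

# Tiny synthetic example with two well-separated groups.
X_train = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [11.0, 0.0]])
y_train = [0, 0, 1, 1]
X_test = np.array([[0.5, 0.0], [10.5, 0.0]])
preds = sknn_predict(X_train, y_train, X_test, n_neighbors=2, k=1)
```

Training samples that cannot be reached through the neighbor graph keep an infinite distance and therefore are never chosen as nearest neighbors.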

~Y. Chen, M.M. Crawford, and J. Ghosh, "Applying Nonlinear Manifold Learning to Hyperspectral Data for Land Cover Classification," International Geoscience and Remote Sensing Symposium, July 2005.


Data Process
These are "Swiss Army knife" Matlab codes that convert the original CSR data to different data formats. Working with them is easier once you know how the CSR data are organized. Data Process
  • Data2Matlab.m converts *data.bin and *classes.bin files to Matlab matrices.
  • Mat2Arff.m and arffwrite.m convert *_AllData.mat to *.arff, the format used by WEKA.
  • More to come.
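For readers unfamiliar with the ARFF layout that WEKA reads, the sketch below writes a small feature matrix and its class labels as ARFF text. It is a Python illustration and does not reproduce Mat2Arff.m or arffwrite.m; the function name `to_arff`, the relation name, and the `band_N` attribute names are assumptions.

```python
def to_arff(features, labels, relation="hyperspectral"):
    """Return ARFF text with one numeric attribute per band plus a
    nominal class attribute. Attribute names are hypothetical."""
    n_bands = len(features[0])
    classes = sorted(set(labels))
    lines = [f"@RELATION {relation}", ""]
    lines += [f"@ATTRIBUTE band_{b + 1} NUMERIC" for b in range(n_bands)]
    lines.append("@ATTRIBUTE class {" + ",".join(str(c) for c in classes) + "}")
    lines += ["", "@DATA"]
    # One comma-separated row per sample, class label last.
    for row, label in zip(features, labels):
        lines.append(",".join(str(v) for v in row) + f",{label}")
    return "\n".join(lines) + "\n"

arff_text = to_arff([[0.1, 0.2], [0.3, 0.4]], [1, 2])
```

The returned string can be written to a file with an `.arff` extension and opened in WEKA.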

  • ~Y. Chen, M.M. Crawford, and J. Ghosh, "Integrating Support Vector Machines in a Hierarchical Output Decomposition Framework," Proc. 2004 International Geoscience and Remote Sensing Symposium, Anchorage, Alaska, Sept 20-24, 949-953, 2004.


  • Kennedy Space Center
  • The NASA AVIRIS (Airborne Visible/Infrared Imaging Spectrometer) instrument acquired data over the Kennedy Space Center (KSC), Florida, on March 23, 1996. AVIRIS acquires data in 224 bands of 10 nm width with center wavelengths from 400-2500 nm. The KSC data, acquired from an altitude of approximately 20 km, have a spatial resolution of 18 m. After removing water absorption and low SNR bands, 176 bands were used for the analysis. Training data were selected using land cover maps derived from color infrared photography provided by the Kennedy Space Center and Landsat Thematic Mapper (TM) imagery. The vegetation classification scheme was developed by KSC personnel in an effort to define functional types that are discernible at the spatial resolution of Landsat and these AVIRIS data. Discrimination of land cover for this environment is difficult due to the similarity of spectral signatures for certain vegetation types. For classification purposes, 13 classes representing the various land cover types that occur in this environment were defined for the site (Table 1). Classes 4 and 6 represent mixed classes.

  • Botswana
  • The NASA EO-1 satellite acquired a sequence of data over the Okavango Delta, Botswana, in 2001-2004. The Hyperion sensor on EO-1 acquires data at 30 m pixel resolution over a 7.7 km strip in 242 bands covering the 400-2500 nm portion of the spectrum in 10 nm windows. Preprocessing of the data was performed by the UT Center for Space Research to mitigate the effects of bad detectors, inter-detector miscalibration, and intermittent anomalies. Uncalibrated and noisy bands that cover water absorption features were removed, and the remaining 145 bands were included as candidate features: [10-55, 82-97, 102-119, 134-164, 187-220]. The data analyzed in this study, acquired May 31, 2001, consist of observations from 14 identified classes representing the land cover types in seasonal swamps, occasional swamps, and drier woodlands located in the distal portion of the Delta [22]. These classes were chosen to reflect the impact of flooding on vegetation in the study area. The class names and corresponding numbers of ground truth observations used in the experiments are listed in Table 2. Classes 3 and 4 are both floodplain grasses that are seasonally inundated, but they differ in their hydroperiod (the amount of time inundated). Classes 9, 10, and 11 represent different mixtures of acacia woodlands, shrublands, and grasslands and are named according to the dominant class. Training data were selected manually using a combination of GPS-located vegetation surveys, aerial photography from the Aquarap (2000) project, and 2.6 m resolution IKONOS multispectral imagery.
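The band list above can be expanded and checked against the stated count of 145. The following Python sketch does so; it assumes the ranges are inclusive, 1-based band numbers into the 242-band Hyperion data and that an image is held as a (rows, cols, bands) array. The names `selected_bands` and `subset_cube` are hypothetical.

```python
import numpy as np

# Inclusive band ranges as listed for the Botswana Hyperion data.
BAND_RANGES = [(10, 55), (82, 97), (102, 119), (134, 164), (187, 220)]

def selected_bands(ranges=BAND_RANGES):
    """Expand inclusive ranges into a flat list of band numbers."""
    return [b for lo, hi in ranges for b in range(lo, hi + 1)]

def subset_cube(cube, bands):
    """Keep the chosen bands of a (rows, cols, bands) array.

    Band numbers are treated as 1-based, which is an assumption.
    """
    return cube[:, :, np.asarray(bands) - 1]

bands = selected_bands()
reduced = subset_cube(np.zeros((2, 3, 242)), bands)
```

The five ranges contribute 46, 16, 18, 31, and 34 bands respectively, which sums to the 145 candidate features quoted in the text.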

    ~J. Ham, Y. Chen, M. Crawford, and J. Ghosh, "Investigation of the Random Forest Framework for Classification of Hyperspectral Data," IEEE Trans. on Geoscience and Remote Sensing, accepted for publication.


    ~ this site is managed by Yangchi Chen