
Comparison of silhouette-based reallocation methods for vegetation classification

By Attila Lengyel, David W. Roberts, Zoltán Botta-Dukát

Posted 07 May 2019
bioRxiv DOI: 10.1101/630384

Aims: To introduce REMOS, a new iterative reallocation method (with two variants) for vegetation classification, and to compare its performance with OPTSIL. We test (1) how effectively REMOS and OPTSIL maximize mean silhouette width and minimize the number of negative silhouette widths when run on classifications with different structure; (2) how the three methods (the two REMOS variants and OPTSIL) differ in runtime at different sample sizes; and (3) whether classifications produced by the three reallocation methods differ in the number of diagnostic species, a surrogate for interpretability.

Study area: Simulation; example data sets from grasslands in Hungary and forests in Wyoming and Utah, USA.

Methods: We classified random subsets of simulated data with the flexible-beta algorithm for different values of beta. These classifications were subsequently optimized by REMOS and OPTSIL and compared for mean silhouette width and proportion of negative silhouette widths. Then, we classified three vegetation data sets of different sizes into two to ten clusters, optimized the classifications with the reallocation methods, and compared their runtimes, mean silhouette widths, numbers of negative silhouette widths, and numbers of diagnostic species.

Results: In terms of mean silhouette width, OPTSIL performed best when the initial classification already had high mean silhouette width. REMOS algorithms reached slightly lower mean silhouette width than the maximum achievable with OPTSIL, but their efficiency was consistent across different initial classifications; thus REMOS was significantly superior to OPTSIL when the initial classification had low mean silhouette width. REMOS resulted in zero or a negligible number of negative silhouette widths across all classifications. OPTSIL performed similarly when the initial classification was effective but could not reach as low a proportion of misclassified objects when the initial classification was inefficient. REMOS algorithms were typically more than an order of magnitude faster to run than OPTSIL. There was no clear difference between REMOS and OPTSIL in the number of diagnostic species.

Conclusions: REMOS algorithms may be preferable to OPTSIL when (1) the primary objective is to reduce or eliminate negative silhouette widths in a classification, (2) the initial classification has low mean silhouette width, or (3) the time efficiency of the algorithm is important because of the size of the data set or the high number of clusters.

Abbreviations: MSW, mean silhouette width; MR, misclassification rate.
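The abstract relies on silhouette widths without defining them, so a small illustration may help. The sketch below uses the standard definition s(i) = (b - a) / max(a, b), where a is the mean distance from object i to the other members of its own cluster and b is the mean distance to its nearest neighbouring cluster. It then applies a deliberately simplified reallocation loop that moves every object with negative silhouette width to its nearest neighbouring cluster. This is not the authors' REMOS implementation, whose exact rules are not given in the abstract; the function names `silhouette_widths` and `reallocate_negative`, the Euclidean distance, and the toy data are all assumptions made for illustration.

```python
from math import dist  # Euclidean distance (assumed; any dissimilarity could be used)


def silhouette_widths(points, labels):
    """Return (widths, nearest): s(i) per object and its nearest other cluster."""
    clusters = sorted(set(labels))
    widths, nearest = [], []
    for i, p in enumerate(points):
        # Mean distance from object i to the members of each cluster (excluding i).
        mean_dist = {}
        for c in clusters:
            members = [j for j, lab in enumerate(labels) if lab == c and j != i]
            if members:
                mean_dist[c] = sum(dist(p, points[j]) for j in members) / len(members)
        own = labels[i]
        others = {c: d for c, d in mean_dist.items() if c != own}
        if own not in mean_dist or not others:
            # Singleton cluster or a single cluster overall: s(i) = 0 by convention.
            widths.append(0.0)
            nearest.append(own)
            continue
        a = mean_dist[own]
        neighbour = min(others, key=others.get)
        b = others[neighbour]
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
        nearest.append(neighbour)
    return widths, nearest


def reallocate_negative(points, labels, max_iter=100):
    """Hypothetical, simplified loop: move objects with s(i) < 0 to their neighbour."""
    labels = list(labels)
    for _ in range(max_iter):
        widths, nearest = silhouette_widths(points, labels)
        negative = [i for i, s in enumerate(widths) if s < 0]
        if not negative:
            break  # no negative silhouette widths remain
        for i in negative:
            labels[i] = nearest[i]
    return labels


# Toy data: two well-separated groups, with the third object initially misassigned.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
initial = [0, 0, 1, 1, 1, 1]

before, _ = silhouette_widths(points, initial)
final = reallocate_negative(points, initial)
after, _ = silhouette_widths(points, final)

msw_before = sum(before) / len(before)  # mean silhouette width of the initial labels
msw_after = sum(after) / len(after)     # mean silhouette width after reallocation
neg_before = sum(s < 0 for s in before) / len(before)  # proportion of negative widths
```

On this toy example the single misassigned object has a negative silhouette width and is moved to the other cluster, after which no negative widths remain and the mean silhouette width increases. Reallocating all negative objects at once is only one possible rule; real data may need the iteration cap to guard against cycling.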

Download data

  • Downloaded 337 times
  • Download rankings, all-time:
    • Site-wide: 90,334
    • In bioinformatics: 8,005
  • Year to date:
    • Site-wide: 96,354
  • Since beginning of last month:
    • Site-wide: 119,432
