Clustering algorithms fall into two categories: hierarchical clustering and partitional clustering. Hierarchical algorithms are static in the sense that they never undo a previous step: an object committed to a cluster in an early stage cannot later move to another cluster. This often lowers clustering accuracy, especially for poorly separated data sets. Partitional clustering does not suffer from this problem, but it requires the number of output clusters to be specified in advance, which many applications cannot supply. This paper presents a hybrid hierarchical clustering method, called the Hybrid Hierarchical Clustering Algorithm, that combines the advantages of hierarchical and partitional clustering techniques. The proposed algorithm does not require the number of output clusters before clustering, and the clusters can be rearranged according to a quality measurement. We apply this method to Web page classification and present the corresponding experimental results.
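To make the contrast concrete, the following is a minimal sketch of a generic two-stage hybrid, not the paper's Hybrid Hierarchical Clustering Algorithm, whose details are not given here. It assumes 1-D points, single-linkage agglomeration stopped by a hypothetical distance `threshold` (so no cluster count is fixed in advance), a k-means style relocation pass that lets objects leave the cluster they were committed to, and within-cluster sum of squared errors (`sse`) as a stand-in quality measure. The names `single_link`, `relocate`, `sse` and `threshold` are illustrative only.

```python
def centroid(cluster):
    return sum(cluster) / len(cluster)


def sse(clusters):
    """Within-cluster sum of squared errors (assumed quality measure)."""
    return sum((x - centroid(c)) ** 2 for c in clusters for x in c)


def single_link(points, threshold):
    """Single-linkage agglomerative clustering on 1-D points.

    Merges are final: once two clusters are joined they are never split.
    Merging stops when the closest pair of clusters is farther apart than
    `threshold` (a hypothetical stopping rule), so no k is given up front.
    """
    clusters = [[p] for p in sorted(points)]
    while len(clusters) > 1:
        # Smallest point-to-point gap over all pairs of clusters (i < j).
        d, i, j = min(
            (min(abs(a - b) for a in ci for b in cj), i, j)
            for i, ci in enumerate(clusters)
            for j, cj in enumerate(clusters)
            if i < j
        )
        if d > threshold:
            break
        merged = clusters.pop(j)  # j > i, so index i is unaffected
        clusters[i].extend(merged)
    return clusters


def relocate(clusters, max_iter=100):
    """Partitional refinement: reassign every point to its nearest centroid
    until nothing moves, so early hierarchical commitments can be undone."""
    for _ in range(max_iter):
        centers = [centroid(c) for c in clusters]
        moved = [[] for _ in centers]
        for c in clusters:
            for x in c:
                nearest = min(range(len(centers)), key=lambda k: abs(x - centers[k]))
                moved[nearest].append(x)
        moved = [sorted(c) for c in moved if c]  # drop emptied clusters
        if moved == clusters:
            break
        clusters = moved
    return clusters


if __name__ == "__main__":
    points = [0, 1, 2, 3, 4, 5, 6, 7.5, 8]
    before = single_link(points, threshold=1.2)
    after = relocate(before)
    print(before, sse(before))  # [[0, 1, 2, 3, 4, 5, 6], [7.5, 8]] 28.125
    print(after, sse(after))    # [[0, 1, 2, 3, 4], [5, 6, 7.5, 8]] 15.6875
```

In this toy run, single linkage chains the points 0 to 6 into one cluster and never revisits that decision; the relocation pass then moves 5 and 6 to the other cluster and lowers the error. Each reassignment-plus-recentering step cannot increase `sse`, which is why it serves here as the illustrative quality measure.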