This book provides the reader with a basic understanding of the formal concepts of the cluster, clustering, partition, cluster analysis etc.
The book explains feature-based, graph-based and spectral clustering methods and discusses their formal similarities and differences. Understanding the related formal concepts is particularly vital in the epoch of Big Data; due to the volume and characteristics of the data, it is no longer feasible to predominantly rely on merely viewing the data when facing a clustering problem.
Usually clustering involves choosing similar objects and grouping them together. To facilitate the choice of similarity measures for complex and big data, various measures of object similarity, based on quantitative (like numerical measurement results) and qualitative features (like text), as well as combinations of the two, are described, as well as graph-based similarity measures for (hyper) linked objects and measures for multilayered graphs. Numerous variants demonstrating how such similarity measures can be exploited when defining clustering cost functions are also presented.
In addition, the book provides an overview of approaches to handling large collections of objects in a reasonable time. In particular, it addresses grid-based methods, sampling methods, parallelization via Map-Reduce, usage of tree-structures, random projections and various heuristic approaches, especially those used for community detection.