Abstract:
Recovering a tree model that accurately represents the developmental process of high-dimensional data is a key challenge in multiple domains. A common setting is the latent tree model, where the task is to infer the tree structure given only observations of its terminal nodes. For example, in phylogenetics, the evolutionary history of a set of organisms is inferred by their nucleotide or protein sequences.
In our work, we incorporate results from spectral graph theory to develop novel methods for recovering latent tree models. We show that the tree structure is strongly related to the spectral properties of a fully connected graph defined over the terminal nodes of the tree. This relation forms the theoretical basis of two new methods: (i) spectral neighbor-joining, where subsets of nodes are iteratively merged to form the full tree, and (ii) spectral top-down recovery, where the terminal nodes are iteratively partitioned into smaller subsets. Comparing our approach to several competing methods, we show that in many settings, spectral methods have stronger theoretical guarantees and work better in practice.
The talk is based on the following two papers:
- "Spectral neighbor joining for reconstruction of latent tree models", A Jaffe, N Amsel, Y Aizenbud, B Nadler, JT Chang, Y Kluger SIAM journal on mathematics of data science (2021)
- "Spectral top-down recovery of latent tree models", Y Aizenbud, A Jaffe, M Wang, A Hu, N Amsel, B Nadler, JT Chang, Yuval Kluger