Abstract:
Machine learning problems involving non-Euclidean data, such as sets, point clouds, and graphs, require theoretical tools and practical architectures that respect the inherent symmetries of the data. Efficient methods for embedding such data into Euclidean space are therefore valuable, as they allow the application of neural architectures to these non-Euclidean domains.
In this talk, I will present two novel methods for embedding sets and distributions over R^d into Euclidean space: one based on moments of shallow MLPs with analytic activations, and the other on Fourier sampling of projected quantile functions. I will discuss their theoretical properties (injectivity, bi-Lipschitzness) and demonstrate their efficient application in practical learning tasks. Additionally, I will present two impossibility results: (1) embeddings based on moments cannot be bi-Lipschitz on multisets, and (2) no embedding can be bi-Lipschitz on distributions.
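To make the first construction concrete, here is a minimal sketch of a moment-type embedding: each point is passed through a one-hidden-layer feature map with an analytic activation, and the features are averaged over the set. It illustrates the general idea only and is not the construction presented in the talk. The function name `moment_embedding`, the choice of `tanh`, the random Gaussian weights, and the embedding width of 16 are all assumptions made for the example.

```python
import numpy as np

def moment_embedding(points, weights, biases, activation=np.tanh):
    """Average a one-hidden-layer feature map over the points of a set.

    Hypothetical helper for illustration; tanh stands in for a generic
    analytic activation.
    """
    points = np.asarray(points, dtype=float)         # shape (n, d)
    hidden = activation(points @ weights + biases)   # shape (n, m)
    return hidden.mean(axis=0)                       # shape (m,)

rng = np.random.default_rng(0)
d, m = 3, 16                                         # assumed dimensions
weights = rng.normal(size=(d, m))                    # assumed random parameters
biases = rng.normal(size=m)

cloud = rng.normal(size=(5, d))                      # a toy point cloud in R^3
shuffled = cloud[rng.permutation(len(cloud))]        # same set, different order

emb = moment_embedding(cloud, weights, biases)
emb_shuffled = moment_embedding(shuffled, weights, biases)
```

Because the features are averaged over points, `emb` and `emb_shuffled` agree up to floating-point error, so the map respects the permutation symmetry of the input. Averaging treats the input as an empirical distribution; summing instead of averaging would also retain the cardinality of a multiset.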