Abstract:
Recent advances in machine learning have focused on non-Euclidean data, such as sets, point clouds, and graphs. Developing efficient methods to embed these data types into Euclidean space is critical, as it allows for the direct application of standard machine learning tools such as neural networks. In particular, Euclidean embeddings for more complex structures like point clouds and graphs typically rely on Euclidean embeddings for multisets as a fundamental building block.
In this talk, I will present two novel methods for embedding multisets and measures into Euclidean space: the first is based on sum-pooled shallow neural networks with carefully chosen activation functions [1]; the second, inspired by optimal transport, uses Fourier sampling of projected quantile functions [2]. I will discuss the theoretical properties of these methods, including injectivity and bi-Lipschitzness, and demonstrate their advantages in practical learning tasks. Additionally, I will present two impossibility results: (1) embeddings based on sum-pooling can never be bi-Lipschitz on multisets, and (2) no embedding can be bi-Lipschitz on distributions.
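To make the sum-pooling construction concrete, here is a minimal sketch (not the method of [1] itself): each point of the multiset is passed through a shared one-layer network and the outputs are summed, which makes the embedding permutation-invariant by construction. The ReLU activation and the random weights below are illustrative choices; [1] studies which activation functions make such embeddings injective.

```python
import numpy as np

def sum_pool_embed(X, W, b):
    """Embed a multiset X (n points in R^d, one per row) into R^m by
    applying a shared shallow network to each point and sum-pooling.
    Reordering the rows of X does not change the output."""
    # Illustrative choice of activation: ReLU.
    return np.maximum(X @ W + b, 0.0).sum(axis=0)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))   # a multiset of 5 points in R^3
W = rng.normal(size=(3, 8))   # shared weights; embedding dimension 8
b = rng.normal(size=(1, 8))

z1 = sum_pool_embed(X, W, b)
z2 = sum_pool_embed(X[::-1], W, b)  # same multiset, rows reordered
assert np.allclose(z1, z2)          # permutation invariance
```

The talk's first impossibility result concerns exactly this family: however the activation and weights are chosen, such sum-pooled embeddings cannot be bi-Lipschitz on multisets.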
References
[1] Amir, T., Gortler, S., Avni, I., Ravina, R., & Dym, N. (2023). Neural injective functions for multisets, measures and graphs via a finite witness theorem. Advances in Neural Information Processing Systems, 36, 42516-42551.
[2] Amir, T., & Dym, N. (2025). Fourier sliced-Wasserstein embedding for multisets and measures. To appear in Proceedings of the International Conference on Learning Representations (ICLR).