Abstract:
I will present a theoretical framework for analyzing learning algorithms that rely on dependent, rather than independent, observations. A common assumption is that the learning algorithm receives independent datapoints, such as unrelated images or texts, but this assumption often does not hold. An example is data on opinions across a social network, where the opinions of related people are often correlated, for instance as a consequence of their interactions. I will present a line of work that models the dependence between such related datapoints using a probabilistic framework in which the observed datapoints are assumed to be drawn from a joint distribution, rather than sampled i.i.d. The joint distribution is modeled via the Ising model, which originated in statistical physics, underlies the theory of spin glasses, and has since been used across many research areas. We frame the problem of learning from dependent data as that of learning the parameters of the Ising model from a training set consisting of only a single sample from the joint distribution over all datapoints. We then propose using the Pseudo-MLE algorithm and provide a corresponding analysis, improving upon prior work, which required multiple samples from this joint distribution. Our proof benefits from sparsifying the model's interaction network by conditioning on subsets of variables that make the dependencies in the resulting conditional distribution sufficiently weak. We use this sparsification technique to prove generic concentration and anti-concentration results for the Ising model, which have found applications beyond the scope of our work.
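
For concreteness, here is a minimal sketch of the setup; the notation (interaction matrix A, external field \theta) is my own choice and not necessarily that of the talk. The Ising model over configurations x \in \{-1,+1\}^n, with A symmetric, is

\[
p_{A,\theta}(x) \;=\; \frac{1}{Z(A,\theta)}\,\exp\!\Big(\tfrac{1}{2}\, x^{\top} A\, x + \theta^{\top} x\Big),
\]

and the pseudo-likelihood of the single observed sample x replaces the intractable joint likelihood with a product of node-wise conditionals,

\[
\mathrm{PL}(A,\theta;x) \;=\; \prod_{i=1}^{n} p_{A,\theta}\!\big(x_i \mid x_{-i}\big)
\;=\; \prod_{i=1}^{n} \frac{\exp\!\big(x_i\,(\sum_{j\ne i} A_{ij} x_j + \theta_i)\big)}{2\cosh\!\big(\sum_{j\ne i} A_{ij} x_j + \theta_i\big)},
\]

which the Pseudo-MLE maximizes over the parameters. Each conditional depends only on one coordinate given the rest, so the objective avoids computing the partition function Z(A,\theta).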