Poisson calculus and exposure theory

A statistical mechanics description of a system has three parts: a way to count the outcomes, a way to weigh the outcomes, and a connection from weight to probability. To turn it into a computational framework, to "glorify" the accounting, we just need to add rules for coarse-graining and for computing expectation values. I wrote extensively about how all these components fit together for the usual Boltzmann statistical mechanics, but couldn't find a good alternative example. Until now.

For the past year, I have been working on the problem of human learning of complex networks - seemingly far from my previous forays into ship design and self-assembly. Humans natively build models of the world's structure by analyzing the statistics of incoming signals, even as preverbal infants. Human learning is not perfect, and the models of the world that humans build are quite distorted, but in the steady state this distortion can have consistent statistics, as measured in this recent work on small model networks. But when networks get large, we cannot take the steady-state limit anymore, because that limit is quite far from the actual operational regime. We need a new view.

How much can one learn about a network by performing a finite random walk on it? That's a hard question, but we can start with the heavy hammers: just perform some random walks and see what you learn. With the magic of Signac data wrangling, this becomes just a matter of crunching a few CPU hours on a laptop. I did this and saw patterns qualitatively similar to what others saw before: for unweighted networks, the number of learned nodes and edges first grows linearly and then saturates. But for weighted networks the behavior is different, and saturation is much slower. Running longer simulations just to trace this saturation in detail would be a waste of computational resources. There needs to be a better, more sustainable way.
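For concreteness, here is a minimal sketch of that brute-force baseline: a single unweighted random walk that records how many distinct edges it has seen after every step. This is not the Signac-managed workflow from the paper, and the graph and walk length are arbitrary choices; the point is only that the curve of learned edges grows roughly linearly before bending toward saturation.

```python
# Minimal brute-force baseline (illustrative, not the paper's Signac workflow):
# walk randomly on an unweighted network and track how many distinct edges
# have been encountered after each step.
import random
import networkx as nx

def learned_edges_curve(G, n_steps, seed=0):
    rng = random.Random(seed)
    node = rng.choice(list(G.nodes))
    seen = set()
    curve = []                             # distinct edges learned after each step
    for _ in range(n_steps):
        nxt = rng.choice(list(G.neighbors(node)))
        seen.add(frozenset((node, nxt)))   # undirected edge, order-free key
        curve.append(len(seen))
        node = nxt
    return curve

# a small-world network: connected, with short correlation times
G = nx.connected_watts_strogatz_graph(200, 10, 0.3, seed=1)
curve = learned_edges_curve(G, 5000)
print(curve[99], curve[999], curve[4999], "of", G.number_of_edges(), "edges")
```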

The problem of random walks was posed a hundred years ago for the kind of networks people cared about back then: regular lattices in one, two, or three dimensions. Those lattice networks are but a special corner of the space of all possible networks: the structure of low-dimensional lattices leads to long correlation times in random walks. In other words, after a few steps of a random walk on a lattice you are probably still close to where you started. Compare this to the World Wide Web - in three clicks from this page you can get to an arbitrarily distant topic. Short correlation times are not a rarity: they appear in regular lattices in high dimensions, in randomized networks, and, importantly, in most complex networks we measure in the real world.

From the knowledge of a short correlation time, or weak correlation, we can make the assumption of no correlation at all. As one walks across the network, there is a small probability of gaining a memory count of any particular edge, independent of any other edge. If one walks for a while, this low-probability event can trigger many times, perhaps even with a changing probability as the network morphs and evolves. The setup I just described is the textbook definition of a Poisson process. The number of memories of a particular edge is a random number with a Poisson distribution, with a different parameter for every edge, which I term the edge exposure. The mean and the variance of a Poisson distribution are trivial to compute: both are equal to its only parameter. As the parameter gets larger and larger, the distribution concentrates around its mean with vanishing relative fluctuations.
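A quick numerical way to see this, with made-up numbers: treat a particular edge as a rare event that fires with a small probability p on every step. Over T steps the count of memories is then approximately Poisson with parameter T·p, the edge exposure, and its mean and variance both sit at that value.

```python
# Rare-event counting converging to a Poisson distribution (illustrative numbers).
import numpy as np

rng = np.random.default_rng(0)
p, T, trials = 0.002, 5000, 100_000   # small per-step hit probability, long walk
exposure = p * T                      # Poisson parameter, lambda = 10

counts = rng.binomial(T, p, size=trials)          # memories gathered in each trial
print(exposure, counts.mean(), counts.var())      # mean and variance both near 10
```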

If this seems complicated, let's talk about photography instead. The main principle of photography is that the mechanical shutter is opened for some finite time, during which light passes through the lens and hits a photosensitive medium: film for analog cameras or a matrix of light-sensitive pixels for digital ones. A camera thus naturally integrates the light flux, even if it fluctuates over time. A professional photographer, given a particular scene, can easily predict how the picture would change with a longer or shorter exposure time. A shorter exposure makes for a crisper but darker image. A longer exposure smudges moving objects, but the picture gets brighter. Across the image, the darker areas inevitably have more noise: fewer photons hit those photosensitive areas on average, so the relative fluctuations are larger there.

Inspired by these camera mechanisms and my own brief photographic stint, I named the approach in my fresh paper exposure theory. The discrete photons of light turn into discrete visits of network edges. The scene a photographer sees with their naked eye turns into the specific exposure. The camera settings that convert the scene into a picture become the random walk setup that turns specific exposure into integral exposure. And even our propensity to see areas of a photo instead of individual pixels is reflected in the coarse-graining rule, since the sum of several Poisson random numbers is itself a Poisson random number.

What I created is essentially a canonical ensemble of memories of random walks on complex networks. It is a statistical mechanics framework that satisfies the desiderata laid out above. The specific exposure of each edge plays the role of the Hamiltonian, telling us how to weigh the edge. The prefactor encoding the random walk length is like the inverse temperature: as it gets larger, we are more and more likely to encounter each edge on our random walk. The coarse-graining rule lets us study network learning across scales, from edges to nodes to communities, or distinguish the memory weight in the real edges from that in the spurious non-edges. Instead of the exponential Boltzmann weight of states, we add up their exposures.
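Under the Poisson description above, two consequences are worth spelling out with a toy example (the exposure values below are made up). First, coarse-graining is just addition: lumping edges into a node or community sums their exposures, because a sum of independent Poisson counts is again Poisson. Second, the chance that a given edge has been memorized at least once is one minus the exponential of its exposure.

```python
# Toy illustration of the coarse-graining rule and the learning probability,
# assuming the Poisson description of edge memories. Exposure values are arbitrary.
import numpy as np

edge_exposures = np.array([0.3, 1.2, 2.5])      # hypothetical per-edge exposures

# coarse-grain: the exposure of a group of edges is the sum of their exposures
group_exposure = edge_exposures.sum()

# probability of at least one memory is 1 - P(zero Poisson counts) = 1 - exp(-lambda)
p_edge_learned = 1.0 - np.exp(-edge_exposures)
p_group_learned = 1.0 - np.exp(-group_exposure)

print(group_exposure)       # 4.0
print(p_edge_learned)       # per-edge learning probabilities
print(p_group_learned)      # probability the group was touched at all
```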

While a random walk simulation requires drawing a lot of pseudorandom numbers on a computer - a costly operation in itself - most exposure computations consist of adding or multiplying deterministic floating-point numbers and occasionally computing an element-wise exponential function. All of these operations are very fast in modern numerical libraries. The specific exposure of a network can be precomputed, looked up, and multiplied by the appropriate factor to convert it into the integral exposure. The computational cost of predicting what a long random walk learns no longer scales with the walk length. In practical terms, exposure predictions are just as accurate as simulations but save a factor of a thousand to a million in computation time.
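In code, the whole prediction pipeline amounts to a handful of array operations. The sketch below uses hypothetical per-edge values for the precomputed specific exposure; the only work per prediction is a scalar multiplication and an element-wise exponential, independent of how long a walk we are asking about.

```python
# Exposure-theory shortcut: precompute the specific exposure once, then predict
# learning for any walk length without drawing a single pseudorandom number.
# The specific_exposure values here are hypothetical placeholders.
import numpy as np

specific_exposure = np.array([0.010, 0.002, 0.030, 0.0005])   # per-edge rates

def expected_edges_learned(specific_exposure, walk_length):
    integral_exposure = walk_length * specific_exposure       # scale by walk length
    return np.sum(1.0 - np.exp(-integral_exposure))           # expected distinct edges

# the same precomputed array answers for any walk length, short or long
for T in (100, 10_000, 1_000_000):
    print(T, expected_edges_learned(specific_exposure, T))
```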

Exposure theory should apply to a wide class of Markov processes with correlation times much shorter than their run times, since those processes are almost isomorphic to random walks on networks. Numerically, it taught me that one doesn't always need a bigger computer: a clever approximation can cut the computation time by a factor of thousands. Morally, it taught me that deconstructing and reconstructing statistical mechanics was not a waste of time.
