Some members of the Algorithmic Network Science Group in the Network Science Center have been studying identifying spreaders in complex networks. We have mostly focused on the Susceptible-Infected-Recovered (SIR) disease spread model. We’ll describe this model in words, but the model is perhaps best understood with an animation, so let’s show one first:

In an SIR model, we start with one infected node. At the first time step, that node affects it’s adjacent nodes with some probability p. The node itself moves to the recovered state, meaning it no longer has the disease and can no longer get the disease.

At the next time step, each of nodes infected during the first time step will transmit the disease to each of their neighbors with probability p, and then move to the recovered state. This process repeats until there are no more infected nodes left.

If we start this process with different nodes in the network as the initial node, the simulations can look very different. With certain initial nodes, like those connected to very few other nodes, the disease may die out quickly. For other initial nodes, the disease may spread rapidly to the majority of the nodes in the network. A question one might ask is, “what nodes are the best spreaders of disease?” This question is relevant if you are modeling the spread of disease with an SIR model, but it could also apply if the stuff that is spreading is information instead of a disease, or if you are using something other than SIR to model the spreading process.

One way to identify the best spreaders in a network is to run lots of simulations. We can infect one initial node, run a simulation similar to the one shown in the animation, and record the number of nodes that are infected in that simulation before the disease dies out. Then we can repeat this simulation 100s of times for that node, and compute the average number of nodes that become infected during those simulations. We can also repeat this for different initial nodes in the network. The best spreaders could be defined as the ones that, on average, infect the highest number of nodes.

We can run simulations like these fairly easily on small networks, but simulations become computationally expensive when the size of the network becomes very large. So we aim to find the best spreaders using other methods.

In a recent paper, we (MAJ Paulo Shakarian, MAJ Nick Howard, CDT Geoffrey Moores, and I) showed that eigenvector centrality does the best job of identifying spreaders, among the centrality measures we tried. Currently, we are working on three projects that are closely related to the work in that paper. CDT Geoffrey Moores is developing algorithms to identify the best subset of nodes, rather than the best individual node. CDT Rob Delaney is studying the Susceptible-Infected-Susceptible (SIS) disease spread model instead of the SIR model. MAJ Nick Howard is determining what, if any, conclusions we can draw when we have a partially observed network; that is, when our information about the network is incomplete.