As detailed in a previous Network Science Center Newsblast, the goal of our research is to compare the entrepreneurial ecosystems in different locations. Mathematically, comparing entrepreneurial environments equates to comparing networks. To what degree are two networks the same or different? This illustration is a bit simplistic but gets to the general goal.
Are the following two networks similar or different, mathematically, and if they are different, how so?
Our team conducted an extensive literature review and to our surprise, we found that network comparison methods remain scarce. Current methods are imprecise and lack technical rigor. The list below summarizes the approaches that we have explored:
- Simple Metrics
- Structural Comparison and Fitting
- Connectedness and Robustness
- Future-Oriented Dynamic Analysis
- Exponential Random Graph Models – our method of choice
We first consider simple metrics, such as node-level centrality metrics. The network science community typically uses degree, betweenness centrality, closeness centrality, and eigenvector centrality to assess the importance of particular nodes. To characterize a network as a whole using these metrics, we can either examine centralizations or directly analyze centrality distributions. Centralization quantifies how influential the most important node in the network is relative to all others, and can be evaluated with respect to any centrality metric. Alternatively, a researcher could analyze the distribution of a particular metric, which details the proportion of nodes in the graph with a specific centrality value. Then, networks could be classified and compared based on their centralizations and centrality distributions.
However, these simple metrics these metrics are highly sensitive, and provide imprecise “high level” descriptions of networks. Consequently, these measurements would not inform researchers into the inner workings of the systems they represent. Small modifications to a network can radically change the aforementioned metrics:
- Minor changes to the number of nodes or edges
- “Rewiring” a small proportion of edges
- Slight changes to edge weights
We also note that these modifications arise with errors in data collection, a problem all too familiar to the network science community. It would be difficult to argue that simple metrics adequately compare networks when the metrics depend so heavily on reliable data. Two networks could actually be similar with very different metrics, or quite different with similar metrics.
Structural Comparison and Fitting
Network scientists have already addressed the problems with simple metrics described above, with generally accepted structural characterizations. Some popular structures found in the literature are star, circle, Erdős–Rényi, small-world, and scale-free networks. Star networks contain edges only from a center node to all others, making them highly centralized. In contrast, circle networks are highly decentralized, and contain edges between adjacent nodes, forming a ring. Erdős–Rényi random networks arise when the existence of edges is subject to assigned probabilities. Small-world networks are formed by randomly “rewiring” some edges of a nonrandom lattice structure. This rewiring drastically reduces shortest path lengths between node-pairs throughout the network. Scale-free networks exhibit a degree distribution obeying a power-law. The proportion of nodes in the network with degree k obeys P(k) ~ k-γ, where γ is a parameter typically between 2 and 3. Researchers can formally classify other networks by specifying a particular degree distribution. Networks in the same category could be deemed “similar.” For network structures that are based upon degree-distributions we could use statistical tests, such as a Chi-squared test, to determine if the network in question fits the postulated distribution.
Connectedness and Robustness
While these structural frameworks improve upon the descriptive strength of simple metrics they are still inadequate. Just because two networks can be classified into the same broad structural category does not mean they are similar. Also, real-world networks do not fit neatly into predefined categories. It is rare to find a network that can be perfectly described by one of these models. Statistically-based distribution fitting also proves ineffective. When performing a Chi-squared test we might reject the null hypothesis of a specified distribution for many, or all, networks under consideration. This could happen because few networks are well-described by a particular distribution. In this case, rejecting numerous hypotheses has not brought us any closer to understanding our observed networks, nor comparing them.
The concept of a network’s robustness against node or link deletions has long been a part of the network science literature. Examples include node and edge connectivity. These figures represent the minimum number of nodes or edges that must be removed to disconnect a graph. Networks exhibiting similar robustness metrics could be said to be similar.
But observe, robustness measures themselves are sensitive to minor modifications in network structure, and carry the same burdens as their simple metric counterparts. Also note that connectedness figures do not account for network size, in terms of node-set or edge-set cardinalities. Two networks could have the same connectivity, yet differ in size.
Future-Oriented Dynamic Analysis
Many real-world networks represent dynamic processes, with ties changing over time. Ties present now may absent later, and ties absent now may be present later. We could observe networks at discrete time points and attempt to predict their future structures. It might be informative to compare networks by their future structures. This projection comparative approach could be complemented with classic forecasting techniques such as exponential smoothing. Two networks could be declared similar if they are projected to have similar characteristics, such as counts of local structures (e.g., reciprocated ties) or summary statistics (e.g., mean degree). Alternatively, network “transition paths” could be compared. Networks could be deemed similar if the way they change over time is similar.
But how would we select an end-point? Two networks might look similar at some future times, but not others. It would be arbitrary to pick the time for which the projected state comparison should take place. Further, comparisons based on projections are risky and would likely lead to invalid comparisons, even with troves of data. However, comparing network evolution is safer than projection because it involves documenting observed changes in real data. It is most meaningful to track changes in particular characteristics and local structures. But to track these salient characteristics we must first identify them. The current project is a prerequisite, and this temporal aspect constitutes a natural extension.
Exponential Random Graph Models
Exponential Random Graph Models (ERGMs), our method of choice, provide a statistical framework to analyze the forces governing link-formation in networks. ERGMs adapt logistic regression techniques, where the dependent variable is binary, to account for dependencies inherent to networks. One approach to comparing networks is to use the ERGM fit from one observed network to predict another observed network. Then we could analyze the error. Further, we could gauge the goodness-of-fit of one network’s ERGM with respect to another network. Researchers have begun to explore these techniques. But beyond the estimation of the ERGM itself, these methods are not statistically rigorous. They are begging for extension, which is where this project picks up.
Our network comparison extension is as follows. We aim to compare coefficients across models. To compare two networks, we fit two ERGMs, one for each observed network. This is analogous to estimating two logistic regression models, each with a different dependent variable and the same set of explanatory variables. Then we will compare parameters associated with the same explanatory variable to determine if the effects are similar in each system, i.e., network. Specifically, we seek to determine whether the parameters are statistically different from one another. Parameter comparison would allow us to see if the same forces govern two networks, and if the forces have the same magnitude of effect. Two networks are deemed similar if their ERGM parameters for the same explanatory variable are statistically similar, in terms of magnitude. More “common-magnitude parameters” means greater similarity. The merits of these comparative ERGM approaches, currently under development, are their statistical rigor without depending on imprecise global structural properties.