Network Analysis Tutorial - Interpreting the Results

Created by Steve Hoover, Modified on Mon, Aug 19, 2024 at 3:47 PM by Steve Hoover

Interpreting the Network Analysis Results

The following results are from the OfficeStar Tutorial data set, which loads automatically when you select the Tutorial link in the Enginius Dashboard. The analysis was run with the parameters indicated in the Running a Network Analysis article.

Network visualization

The first output provides a visual representation of the network. If you choose the web page output option, Enginius will generate an interactive network graph which allows you to zoom into different areas of the network. Here is an example of the network without node labels, where the nodes are clustered using greedy clustering. The colors of the nodes indicate the different segments to which the nodes belong.

Network visualization (2D): To move the entire network in any direction, hold down the left mouse button anywhere on the graph and drag. To zoom in or out of a specific area of the network, use the mouse scroll wheel. To highlight just the links associated with a specific node, left-click that node. (Similar operations can also be done using a touchpad.)

Here is an example of the display for hierarchical clustering with node labels, where we have zoomed in on the node Na-E and are displaying only Na-E’s direct connections to other nodes. The red and blue colors indicate that Na-E’s connections belong to two different segments.

If you select other output options (e.g., Word or PDF), the network display looks like this:

[Figure: network graph as rendered in static output formats]

The next set of outputs summarizes the network structure via several metrics.

Metric	Value
Number of nodes	152
Number of connections	256
Average degree of a node	3.368
Density	0.0223
Average path length	3.927
Global clustering coefficient	0.0911

Global network metrics

The numbers of nodes and links provide basic information about the size of the network. The average degree of a node is the average number of links per node in the network. The density is the number of links in the network expressed as a proportion of the maximum number of links that could exist in the network. If there are n nodes, then the maximum number of links possible is n*(n-1)/2 if the links are bi-directional. The average path length is the average of the shortest paths between all pairs of nodes in the network. In social networks, this number should generally be less than 6; the notion of “six degrees of separation” is based on the average path length connecting everyone on the planet. The global clustering coefficient is a measure of the overall clustering in a network, denoting the extent to which, on average across the entire network, one’s friends are also friends of each other.
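
The definitions above can be checked with a few lines of Python. The sketch below is illustrative only and is not Enginius code: it assumes bi-directional links, uses a hypothetical four-node toy network for the path-length and clustering calculations, and takes the global clustering coefficient to be the transitivity ratio (closed triples divided by all connected triples), which is one of two common definitions and may differ from the one Enginius uses. With 152 nodes and 256 links, the first two functions reproduce the average degree and density reported in the table.

```python
from collections import deque
from itertools import combinations

def average_degree(n_nodes, n_links):
    """Each bi-directional link adds one to the degree of two nodes."""
    return 2 * n_links / n_nodes

def density(n_nodes, n_links):
    """Links present as a share of the n*(n-1)/2 links that could exist."""
    return n_links / (n_nodes * (n_nodes - 1) / 2)

def average_path_length(adj):
    """Mean shortest-path length over all connected ordered pairs (BFS per node)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

def global_clustering(adj):
    """Transitivity: closed triples divided by all connected triples."""
    closed = triples = 0
    for center, neighbors in adj.items():
        for a, b in combinations(neighbors, 2):
            triples += 1
            if b in adj[a]:
                closed += 1
    return closed / triples if triples else 0.0

# Node and link counts taken from the global metrics table.
print(round(average_degree(152, 256), 3))  # 3.368
print(round(density(152, 256), 4))         # 0.0223

# Hypothetical toy network: a triangle A-B-C with one extra node D attached to C.
toy = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
print(round(average_path_length(toy), 3))  # 1.333
print(round(global_clustering(toy), 3))    # 0.6
```

In the toy network, three of the five connected triples are closed by the single triangle, which gives the 0.6 value.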

The average degree of a node, density, average path length, and global clustering coefficient can be compared across networks to provide an intuitive feel for the reach and speed of an influence process in a network. A network with a higher average degree, higher density, or higher global clustering coefficient than another network should have higher reach and a faster influence process. Likewise, a network with a lower average path length will tend to have higher reach and a faster influence process.

The next set of results is particularly useful for identifying important influencers in the social network based on their positions in the network.

Rank	Node ID	Degree	Closeness centrality	Betweenness centrality	Page rank	Local clustering coefficient
1	Jillian - M	20	0.39	4 227.24	5.22	0.037
2	Na - E	17	0.30	1 857.19	4.89	0.059
3	John-2	15	0.37	1 923.95	3.65	0.048
4	Rex	14	0.31	833.84	3.20	0.088
5	Tai - M	13	0.36	1 616.69	3.37	0.064
6	Cynthia	12	0.38	3 439.65	3.01	0.076
7	Kathryn	11	0.35	1 213.10	3.18	0.036
8	Rex2	11	0.35	1 678.14	2.84	0.073
9	Cherrie	11	0.21	1 453.00	4.42	0.036
10	Alexis	10	0.32	1 083.07	3.14	0.044

Node-level network metrics (excerpt). (NC: not-computable)

The degree is the number of links incident at a node (incoming, outgoing, or bi-directional). The output is listed in decreasing order of degree. In this set of 10 nodes, Jillian has the highest degree. The designation M indicates that she is a manager (the designation E stands for employee, and the others are customers).

Closeness centrality is a measure of how fast an influencer’s message could spread within a network. When an influencer is within easy reach of everyone in the network, their closeness centrality will be high, and their messages are likely to propagate faster in the network.

Betweenness centrality of a node indicates how likely that node is to be on the most direct route (i.e., the shortest route) between any two people in the network. It is also a measure of how much disruption to the influence process would occur if this node were to disappear; i.e., nodes with high betweenness centrality serve as gatekeepers in the network. Betweenness centrality is particularly useful when considering the total direct and indirect reach of an influencer within a network, say for viral marketing.

Page rank (a variant of eigenvector centrality) is a recursive measure: a node has high page rank if it is connected to other nodes with high page rank.

The local clustering coefficient is a measure of the connectedness around the focal node: the extent to which nodes that are connected to the focal node are themselves connected to each other. In other words, it is a measure of the extent to which one’s friends are also friends with each other.

Among the 10 nodes listed above, Jillian has the highest degree, closeness centrality, betweenness centrality, and page rank, so she is likely the most important influencer in this network. Shani, who does not appear in this excerpt, has the highest local clustering coefficient and might be effective in spreading a message within her own immediate network.
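
To make the node-level definitions concrete, here is a minimal Python sketch that computes each metric on a hypothetical five-node toy network. It is not Enginius code and does not reproduce the table values. The assumptions are: links are bi-directional and unweighted, closeness is the number of reachable nodes divided by the sum of distances to them, betweenness is the unnormalized count from Brandes' algorithm, and page rank uses a damping factor of 0.85 with ranks summing to 1. Enginius may scale or normalize these measures differently.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy network: a triangle A-B-C with a tail C-D-E.
toy = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}

def bfs(adj, source):
    """Return hop distances, shortest-path counts, and visit order from source."""
    dist, sigma, order = {source: 0}, {source: 1}, []
    queue = deque([source])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma, order

def closeness(adj, node):
    """Reachable nodes divided by the sum of shortest distances to them."""
    dist, _, _ = bfs(adj, node)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

def betweenness(adj):
    """Brandes' algorithm: shortest paths between other pairs that pass through each node."""
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        dist, sigma, order = bfs(adj, s)
        delta = dict.fromkeys(order, 0.0)
        for w in reversed(order):  # farthest nodes first
            for v in adj[w]:
                if dist[v] == dist[w] - 1:  # v precedes w on a shortest path
                    delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return {v: b / 2 for v, b in score.items()}  # each pair was counted twice

def pagerank(adj, damping=0.85, iterations=100):
    """Power iteration; assumes every node has at least one link."""
    n = len(adj)
    rank = dict.fromkeys(adj, 1 / n)
    for _ in range(iterations):
        rank = {v: (1 - damping) / n
                   + damping * sum(rank[u] / len(adj[u]) for u in adj[v])
                for v in adj}
    return rank

def local_clustering(adj, node):
    """Share of pairs of the node's neighbors that are linked to each other."""
    neighbors = adj[node]
    if len(neighbors) < 2:
        return 0.0
    linked = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return linked / (len(neighbors) * (len(neighbors) - 1) / 2)

between = betweenness(toy)
ranks = pagerank(toy)
for node in sorted(toy):
    print(node, len(toy[node]), round(closeness(toy, node), 3), between[node],
          round(ranks[node], 3), round(local_clustering(toy, node), 3))
```

In this toy network, C sits on every shortest path between the triangle and the tail, so it has the highest betweenness (it is the gatekeeper), while A and B have the highest local clustering coefficient because their two neighbors are linked to each other.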

To get the metrics for all nodes, obtain the output in Excel format. You can then sort and filter the table to help identify influencers in large networks.
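
The same sorting and filtering can be done in code once the node metrics are exported. The sketch below is illustrative only: it hard-codes the 10 rows of the excerpt above instead of reading a file, and the field names and the two thresholds (closeness of at least 0.35, betweenness of at least 1,500) are hypothetical choices, not Enginius defaults.

```python
from collections import namedtuple

NodeMetrics = namedtuple(
    "NodeMetrics",
    "node degree closeness betweenness page_rank local_clustering",
)

# Rows copied from the node-level metrics excerpt.
rows = [
    NodeMetrics("Jillian - M", 20, 0.39, 4227.24, 5.22, 0.037),
    NodeMetrics("Na - E", 17, 0.30, 1857.19, 4.89, 0.059),
    NodeMetrics("John-2", 15, 0.37, 1923.95, 3.65, 0.048),
    NodeMetrics("Rex", 14, 0.31, 833.84, 3.20, 0.088),
    NodeMetrics("Tai - M", 13, 0.36, 1616.69, 3.37, 0.064),
    NodeMetrics("Cynthia", 12, 0.38, 3439.65, 3.01, 0.076),
    NodeMetrics("Kathryn", 11, 0.35, 1213.10, 3.18, 0.036),
    NodeMetrics("Rex2", 11, 0.35, 1678.14, 2.84, 0.073),
    NodeMetrics("Cherrie", 11, 0.21, 1453.00, 4.42, 0.036),
    NodeMetrics("Alexis", 10, 0.32, 1083.07, 3.14, 0.044),
]

# Sort: the three strongest gatekeepers by betweenness centrality.
top_gatekeepers = sorted(rows, key=lambda r: r.betweenness, reverse=True)[:3]
print([r.node for r in top_gatekeepers])
# ['Jillian - M', 'Cynthia', 'John-2']

# Filter: nodes that are both close to everyone and on many shortest paths.
# The thresholds are hypothetical and should be tuned to the network at hand.
well_placed = [r.node for r in rows if r.closeness >= 0.35 and r.betweenness >= 1500]
print(well_placed)
# ['Jillian - M', 'John-2', 'Tai - M', 'Cynthia', 'Rex2']
```

Ranking by betweenness rather than degree moves Cynthia from sixth place to second, which shows why it helps to sort on more than one metric.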

The next set of outputs provides information about the cluster (community) structure in the network. To help with interpretation, the segments are color-coded, and there is also a tabular output showing the segment to which each node belongs.

Clustering via greedy method

Here greedy clustering identified 7 segments. We can see that Jillian, Na, and Tai are important influencers, and Agnus is an important gatekeeper for reaching the nodes on the top right of the network.

Clustering by greedy method

In addition to the graphical output, we also get a report showing the segment to which each node is assigned. If you wish to obtain the full listing of node assignments for further analysis, download the report in Excel format.

Node	Segment
Tai - M	6
Nickolas	4
Na - E	4
Bessie	7
Ettie	4
Doyle	3
Jillian - M	3
Ena	2
Agnus	2
Shani - M	5

Segment membership (excerpt)
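
For further analysis it is often convenient to turn the node-to-segment listing around and list the members of each segment. A minimal Python sketch, using only the 10 rows of the excerpt above (the variable names are illustrative):

```python
from collections import defaultdict

# Node-to-segment assignments copied from the excerpt above.
membership = {
    "Tai - M": 6,
    "Nickolas": 4,
    "Na - E": 4,
    "Bessie": 7,
    "Ettie": 4,
    "Doyle": 3,
    "Jillian - M": 3,
    "Ena": 2,
    "Agnus": 2,
    "Shani - M": 5,
}

# Invert the mapping: segment -> list of member nodes.
segments = defaultdict(list)
for node, segment in membership.items():
    segments[segment].append(node)

for segment in sorted(segments):
    print(segment, segments[segment])
```

With the full Excel listing, the same grouping shows the size of every segment and which influencers belong to each one.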

Clustering via hierarchical clustering (with default cosine similarity distance metric)

Enginius® implements the agglomerative hierarchical clustering method (also used in the segmentation module), where distances between nodes are measured via the similarity metric you specify. The three-segment structure is evident from the dendrogram shown below (for this data, hierarchical clustering yields fewer segments than greedy clustering). There is one large segment (community) with 82% of the nodes.

Dendrogram. The dendrogram is a tree diagram that illustrates the arrangement of clusters produced by hierarchical clustering and how the nodes are incrementally clustered together.
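
The idea behind agglomerative clustering with a cosine similarity metric can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the Enginius implementation: each node is represented by its closed neighborhood (the node plus its direct connections), the distance between two nodes is 1 minus the cosine similarity of those neighborhoods, and clusters are merged by average linkage. The article does not say which node representation or linkage rule Enginius uses, and the six-node network is hypothetical.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy network: two triangles (A-B-C and D-E-F) joined by the link C-D.
toy = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E", "F"},
    "E": {"D", "F"},
    "F": {"D", "E"},
}

def cosine_similarity(adj, u, v):
    """Cosine similarity of the closed neighborhoods of u and v (assumed representation)."""
    nu, nv = adj[u] | {u}, adj[v] | {v}
    return len(nu & nv) / sqrt(len(nu) * len(nv))

def agglomerate(adj, n_segments):
    """Merge the two closest clusters repeatedly until n_segments remain."""
    clusters = [[node] for node in sorted(adj)]

    def distance(c1, c2):
        # Average linkage (assumed): mean pairwise distance between the clusters.
        pairs = [(u, v) for u in c1 for v in c2]
        return sum(1 - cosine_similarity(adj, u, v) for u, v in pairs) / len(pairs)

    while len(clusters) > n_segments:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters

result = agglomerate(toy, n_segments=2)
print([sorted(cluster) for cluster in result])
# [['A', 'B', 'C'], ['D', 'E', 'F']]
```

Recording the order and distance of each merge is what produces the tree shown in a dendrogram; cutting that tree at a chosen height gives the segments.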

In the network graph shown below, we see that Jillian, Na-E, and Cherrie are important influencers located in different parts of the network. Agnus is a gatekeeper for a section of the network (located in the east part of the graph). To further examine the structure of the network, you can download the Excel version of the output and explore the segment assignments of all nodes for both clustering methods. These results suggest that different methods yield different community structures, and that it is unlikely we will be able to detect a unique community structure in large networks.

[Figure: network graph with segments from hierarchical clustering]

Finally, the model output consists of the diffusion paths based on 10 simulations, each started by randomly seeding the nodes (here we selected a 1% random seeding). The results indicate that with the relatively low values of p and q in the Bass model, the diffusion process is slow and requires 60 periods to get close to full penetration.

[Figure: simulated diffusion paths]
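
The kind of simulation described above can be sketched as follows. This is an illustrative network version of the Bass model under stated assumptions, not the Enginius implementation: in each period a node that has not yet adopted does so with probability p + q × (share of its direct connections that have already adopted). The network (a ring of 100 nodes, each linked to its four nearest neighbors), the values p = 0.01 and q = 0.3, and the random seed are all hypothetical.

```python
import random

def simulate_bass(adj, p, q, seed_share, periods, rng):
    """Return the cumulative number of adopters per period for one simulated path."""
    nodes = sorted(adj)
    n_seeds = max(1, round(seed_share * len(nodes)))
    adopted = set(rng.sample(nodes, n_seeds))  # random seeding of initial adopters
    path = [len(adopted)]
    for _ in range(periods):
        new_adopters = set()
        for node in nodes:
            if node in adopted:
                continue
            neighbors = adj[node]
            share = sum(v in adopted for v in neighbors) / len(neighbors) if neighbors else 0.0
            # p: adoption from external influence; q: adoption from imitation.
            if rng.random() < p + q * share:
                new_adopters.add(node)
        adopted |= new_adopters  # adopters of this period influence the next one
        path.append(len(adopted))
    return path

# Hypothetical network: 100 nodes on a ring, each linked to its 4 nearest neighbors.
n = 100
ring = {i: {(i + d) % n for d in (-2, -1, 1, 2)} for i in range(n)}

rng = random.Random(42)  # fixed seed so the sketch is repeatable
paths = [
    simulate_bass(ring, p=0.01, q=0.3, seed_share=0.01, periods=60, rng=rng)
    for _ in range(10)
]
for path in paths:
    print(path[0], path[30], path[-1])  # adopters at the start, midpoint, and end
```

Each of the 10 paths starts from a different random seed node, so plotting them together gives a band of diffusion curves similar in spirit to the chart described above. Raising p or q in the sketch makes penetration faster.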