Social Network Analysis with igraph

Chung-hong Chan

MZES, Social Science Data Lab #4.3

Front matters

whoami

Chung-hong Chan

twitter / github: @chainsawriot

Maintainer:

Author:

Agenda

  1. Social Network
    • Representation
    • Data collection
  2. Analysis
    • node, edge, graph
  3. igraph
    • example: MZES collabration network

Social Network / Graph

Graph

  • G = (V, E)

Example

Germany

Definitions of V and E

Germany

  • Vertices (entities): Länder
  • Edges (relationships): Share borders

Representation

Germany

  • Vertices = {BW, BY, RP, HE, SL, NW….}
  • Edges = {{BW, BY}, {BW, RP}, {BW, HE}, {HE, NW}, …}

Representation

Edgelist
src dest
BW BY
BW HE
BW RP
BY HE
BY TH
BY SN

Visualization

Vertice size by no of degrees

igraph

require(igraph)
germany <- read.csv('germany_map.csv')
g <- igraph::simplify(graph.data.frame(germany, directed = FALSE))
plot(g, vertex.size = degree(g) * 3)

Terminology

crowd V E
Mathematicians Vertices Edges
Computer Scientists Nodes Arcs, Edges
Social Scientists Actors Ties
Most People (me included) Nodes Edges
Some People Points Lines

Directed graph

undirected: always with reciprocal edges

directed: edges are not always reciprocal.

Twitter

kfc’s Twitter

kfc1

Representation

Edgelist
from to
chainsawriot kfc
kfc gerihalliwell
kfc officialmelb
kfc emmabunton
kfc melaniecmusic
kfc victoriabeckham

Other directed graph examples

  • Retweet/sharing network
  • Citation network

Weighted network

Both node and edge can be weighted.

Representation

Edgelist
from to distance
Mannheim Ludwigshanfen 2.9
Mannheim Heidelberg 20.3
Ludwigshanfen Heidelberg 22.0
Mannheim Frankfurt 76.7

Representation

  1. edgelist
  2. adjaceny matrix

Edgelist is the adjacency matrix in triplet format(i,j,k).

Example

Edgelist
from to distance
Mannheim Ludwigshanfen 2.9
Mannheim Heidelberg 20.3
Ludwigshanfen Heidelberg 22.0
Mannheim Frankfurt 76.7
Ludwigshanfen Frankfurt 78.8
Heidelberg Frankfurt 84.3

Adj. matrix

Mannheim Ludwigshafen Heidelberg Frankfurt
Mannheim 0.0 2.9 20.3 76.7
Ludwigshafen 2.9 0.0 22.0 78.8
Heidelberg 20.3 22.0 0.0 84.3
Frankfurt 76.7 78.8 84.3 0.0

Edgelist

Edgelist
from to
chainsawriot kfc
kfc gerihalliwell
kfc officialmelb
kfc emmabunton
kfc melaniecmusic
kfc victoriabeckham

adj. matrix

me kfc geri melb emma melc vic
me 0 1 0 0 0 0 0
kfc 0 0 1 1 1 1 1
geri 0 0 0 0 0 0 0
melb 0 0 0 0 0 0 0
emma 0 0 0 0 0 0 0
melc 0 0 0 0 0 0 0
vic 0 0 0 0 0 0 0

Data collection

Usual patterns

  1. Ego network
  2. Affiliation/bipartite network

Ego network

From an ego, ask for alters.

Affiliation/bipartite network

aff

Example

mzesgoogle

Representation

author paper
Jäger 1
Stecker 2
Däubler 2,6
Scholten 3
Tieben 3
Braun 4
Popa 4
Theocharis 5
van Deth 5
Benoit 6

Normalization: 3NF

author paper
Jäger 1
Stecker 2
Däubler 2
Scholten 3
Tieben 3
Braun 4
Popa 4
Theocharis 5
van Deth 5
Däubler 6
Benoit 6

Affiliation Matrix

P1 P2 P3 P4 P5 P6
Jäger 1 0 0 0 0 0
Stecker 0 1 0 0 0 0
Däubler 0 1 0 0 0 1
Scholten 0 0 1 0 0 0
Tieben 0 0 1 0 0 0
Braun 0 0 0 1 0 0
Popa 0 0 0 1 0 0
Theocharis 0 0 0 0 1 0
van Deth 0 0 0 0 1 0
Benoit 0 0 0 0 0 1

R internal

matrix(c(1,0,0,0,0,0,
         0,1,0,0,0,0,
         0,1,0,0,0,1,
         0,0,1,0,0,0,
         0,0,1,0,0,0,
         0,0,0,1,0,0,
         0,0,0,1,0,0,
         0,0,0,0,1,0,
         0,0,0,0,1,0,
         0,0,0,0,0,1), nrow = 10, 
         byrow = TRUE, 
         dimnames = list(
         c('Jäger', 'Stecker', 'Däubler', 'Scholten', 
         'Tieben', 'Braun', 'Popa', 'Theocharis', 
         'van Deth','Benoit'), 
         c('P1', "P2", "P3", "P4", "P5", "P6")))

Affiliation data frame

ap <- data.frame(
author = c('Jäger', 'Stecker', 'Däubler', 'Scholten', 'Tieben', 'Braun', 'Popa', 'Theocharis', 'van Deth', 'Däubler','Benoit'), 
paper = c('p1', 'p2', 'p2', 'p3', 'p3', 'p4', 'p4', 'p5', 'p5', 'p6','p6'))
knitr::kable(ap)
author paper
Jäger p1
Stecker p2
Däubler p2
Scholten p3
Tieben p3
Braun p4
Popa p4
Theocharis p5
van Deth p5
Däubler p6
Benoit p6

using spMatrix

require(Matrix)
A <- spMatrix(nrow = 10, ncol = 6,
         i = as.numeric(as.factor(ap$author)),
         j = as.numeric(as.factor(ap$paper)),
         x = rep(1, nrow(ap)))
row.names(A) <- levels(factor(ap$author))
colnames(A) <- levels(factor(ap$paper))
A
## 10 x 6 sparse Matrix of class "dgTMatrix"
##            p1 p2 p3 p4 p5 p6
## Benoit      .  .  .  .  .  1
## Braun       .  .  .  1  .  .
## Däubler     .  1  .  .  .  1
## Jäger       1  .  .  .  .  .
## Popa        .  .  .  1  .  .
## Scholten    .  .  1  .  .  .
## Stecker     .  1  .  .  .  .
## Theocharis  .  .  .  .  1  .
## Tieben      .  .  1  .  .  .
## van Deth    .  .  .  .  1  .

Why are you doing that?

To generate co-affiliation network

## A %*% t(A)
## tcrossprod(A)
tcrossprod(A)
## 10 x 10 sparse Matrix of class "dsCMatrix"
##    [[ suppressing 10 column names 'Benoit', 'Braun', 'Däubler' ... ]]
##                               
## Benoit     1 . 1 . . . . . . .
## Braun      . 1 . . 1 . . . . .
## Däubler    1 . 2 . . . 1 . . .
## Jäger      . . . 1 . . . . . .
## Popa       . 1 . . 1 . . . . .
## Scholten   . . . . . 1 . . 1 .
## Stecker    . . 1 . . . 1 . . .
## Theocharis . . . . . . . 1 . 1
## Tieben     . . . . . 1 . . 1 .
## van Deth   . . . . . . . 1 . 1

Visualize

coaff <- graph.adjacency(tcrossprod(A), 
    "undirected", weighted = TRUE, 
    diag = FALSE)
plot(coaff)

Analysis

Level

  1. node level
  2. edge level
  3. graph level

Example: Karate club data

karate <- make_graph("Zachary")
plot(karate)

Node level

Centrality - measurement to identify the most important vertices within a graph.

Example: degree centrality, eigenvector centrality, betweenness

Betweenness: measurement of information brokerage

  1. For each pair of vertices (s,t), compute the shortest paths between them.
  2. For each pair of vertices (s,t), determine the fraction of shortest paths that pass through the vertex in question (here, vertex v).
  3. Sum this fraction over all pairs of vertices (s,t).

Source: Wikipedia. Freeman, Linton. Sociometry 1977.

Betweenness

V(karate)
## + 34/34 vertices, from 7798f72:
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
## [24] 24 25 26 27 28 29 30 31 32 33 34
betweenness(karate, directed = FALSE)
##  [1] 231.0714286  28.4785714  75.8507937   6.2880952   0.3333333
##  [6]  15.8333333  15.8333333   0.0000000  29.5293651   0.4476190
## [11]   0.3333333   0.0000000   0.0000000  24.2158730   0.0000000
## [16]   0.0000000   0.0000000   0.0000000   0.0000000  17.1468254
## [21]   0.0000000   0.0000000   0.0000000   9.3000000   1.1666667
## [26]   2.0277778   0.0000000  11.7920635   0.9476190   1.5428571
## [31]   7.6095238  73.0095238  76.6904762 160.5515873

Visualize

plot(karate, vertex.size = (betweenness(karate) + 2) / 15)

Edge level

bridge

“Local bridge” (Granovetter. The strength of weak ties. Socialogical Theory. 1983)

Edges

E(karate)
## + 78/78 edges from 7798f72:
##  [1]  1-- 2  1-- 3  1-- 4  1-- 5  1-- 6  1-- 7  1-- 8  1-- 9  1--11  1--12
## [11]  1--13  1--14  1--18  1--20  1--22  1--32  2-- 3  2-- 4  2-- 8  2--14
## [21]  2--18  2--20  2--22  2--31  3-- 4  3-- 8  3--28  3--29  3--33  3--10
## [31]  3-- 9  3--14  4-- 8  4--13  4--14  5-- 7  5--11  6-- 7  6--11  6--17
## [41]  7--17  9--31  9--33  9--34 10--34 14--34 15--33 15--34 16--33 16--34
## [51] 19--33 19--34 20--34 21--33 21--34 23--33 23--34 24--26 24--28 24--33
## [61] 24--34 24--30 25--26 25--28 25--32 26--32 27--30 27--34 28--34 29--32
## [71] 29--34 30--33 30--34 31--33 31--34 32--33 32--34 33--34

Edge betweenness

edge_betweenness(karate, directed = FALSE)
##  [1] 14.166667 43.638889 11.500000 29.333333 43.833333 43.833333 12.802381
##  [8] 41.648413 29.333333 33.000000 26.100000 23.770635 22.509524 25.770635
## [15] 22.509524 71.392857 13.033333  4.333333  4.164286  6.959524 10.490476
## [22]  8.209524 10.490476 18.109524 12.583333 14.145238 23.108730 12.780952
## [29] 38.701587 17.280952  5.147619  4.280952  1.888095  6.900000  8.371429
## [36]  2.666667  1.666667  1.666667  2.666667 16.500000 16.500000  5.500000
## [43] 17.077778 22.684921 16.614286 38.049206 13.511111 19.488889 13.511111
## [50] 19.488889 13.511111 19.488889 33.313492 13.511111 19.488889 13.511111
## [57] 19.488889 11.094444  5.911111 12.533333 18.327778  3.733333  2.366667
## [64] 10.466667 22.500000 23.594444  2.542857 30.457143 17.097619  8.333333
## [71] 13.780952 13.087302 16.722222  9.566667 15.042857 23.244444 29.953968
## [78]  4.614286

Visualize

plot(karate, 
edge.width = edge_betweenness(karate, directed = FALSE) / 10, 
vertex.size = (betweenness(karate, directed = FALSE) + 2) / 15)

Modeling

  • Exponential Random Graph Model (ERGM)

Graph level

Community detection

Community - a subset of individuals within the network such that connections between the individuals are denser than the rest (Radicchi, Castellano, Cecconi, Loreto, & Parisi, 2004).

Example

conover

Conover et al. Political Polarization on Twitter. ICWSM 2011.

Finding community

comm <- walktrap.community(karate)
comm
## IGRAPH clustering walktrap, groups: 5, mod: 0.35
## + groups:
##   $`1`
##   [1]  1  2  4  8 12 13 18 20 22
##   
##   $`2`
##   [1]  3  9 10 14 29 31 32
##   
##   $`3`
##   [1] 15 16 19 21 23 27 30 33 34
##   
##   $`4`
##   + ... omitted several groups/vertices

Visualize

plot(comm, karate)

Artificial manipulation

comm2 <- structure(list(membership=cutat(comm, 2)), class="communities")
plot(comm2, karate)

Story-telling

Zachary’s karate club

Other analysis

  • Access to structural holes
  • Embeddedness
  • K-core decomposition

Summary

Sumary

  • Graph (undirected, directed, weighted)
  • Data collection (Ego, affiliation)
  • Analysis
    • Node level: Centrality
    • Edge level: Structural bridge
    • Graph level: Community detection

Example: MZES collabration network

whoami

Chung-hong Chan twitter / github: @chainsawriot

My latest publication!

jcmc

End