Package 'GoodFitSBM'

Title: Monte Carlo goodness-of-fit tests for Stochastic Blockmodels
Description: Performs goodness-of-fit tests for stochastic blockmodels (SBMs) fitted to network data. Of the three SBM variants discussed in Karwa et al. (2023), goodness-of-fit tests are implemented for the Erdős-Rényi (ER) and beta versions.
Authors: Soham Ghosh [aut, cre], Somjit Roy [aut], Debdeep Pati [aut]
Maintainer: Soham Ghosh <[email protected]>
License: GPL (>= 3)
Version: 0.0.1
Built: 2025-02-18 04:10:43 UTC

Help Index

Maximum Likelihood Estimation of edge probabilities between blocks of a graph, under beta-SBM


Description

get_mle_BetaSBM computes the maximum likelihood estimates (MLEs) of the edge probabilities between blocks of a graph. These estimates are used to calculate the goodness-of-fit test statistic for the beta-SBM (Karwa et al. (2023)).

Usage

get_mle_BetaSBM(G, C)

Arguments

G    an igraph object representing an undirected graph with no self-loops

C    a positive integer vector of length n giving the block assignment of each node, with values from 1 to K (the number of blocks)

Value

A matrix of maximum likelihood estimates, containing the estimated edge probabilities between blocks of the graph.

References

Karwa et al. (2023). "Monte Carlo goodness-of-fit tests for degree corrected and related stochastic blockmodels", Journal of the Royal Statistical Society Series B: Statistical Methodology,

See Also

goftest_BetaSBM() performs the goodness-of-fit test for the beta-SBM, which requires the MLEs of the edge probabilities.

Examples


RNGkind(sample.kind = "Rounding")

# We model a network with 3 even classes
n1 = 50
n2 = 50
n3 = 50

# Generating block assignments for each of the nodes
n = n1 + n2 + n3
class = rep(c(1, 2, 3), c(n1, n2, n3))

# Generating the adjacency matrix of the network
# Generate the matrix of connection probabilities
cmat = matrix(
  c(
    30, 0.05, 0.05,
    0.05, 30, 0.05,
    0.05, 0.05, 30
  ),
  ncol = 3,
  byrow = TRUE
)
pmat = cmat / n

# Creating the n x n adjacency matrix
adj <- matrix(0, n, n)
for (i in 2:n) {
  for (j in 1:(i - 1)) {
    p = pmat[class[i], class[j]] # We find the probability of connection with the weights
    adj[i, j] = rbinom(1, 1, p) # We include the edge with probability p
  }
}

adjsymm = adj + t(adj)

# graph from the adjacency matrix
G = igraph::graph_from_adjacency_matrix(adjsymm, mode = "undirected", weighted = NULL)

# mle of the edge probabilities
get_mle_BetaSBM(G, class)

Maximum Likelihood Estimation of edge probabilities between blocks of a graph, under ERSBM


Description

get_mle_ERSBM computes the maximum likelihood estimates (MLEs) of the edge probabilities between blocks of a graph. These estimates are used to calculate the goodness-of-fit test statistic for the ERSBM (Karwa et al. (2023)).

Usage

get_mle_ERSBM(G, C)

Arguments

G    an igraph object representing an undirected graph with no self-loops

C    a positive integer vector of length n giving the block assignment of each node, with values from 1 to K (the number of blocks)

Value

A matrix of maximum likelihood estimates, containing the estimated edge probabilities between blocks of the graph.

References

Karwa et al. (2023). "Monte Carlo goodness-of-fit tests for degree corrected and related stochastic blockmodels", Journal of the Royal Statistical Society Series B: Statistical Methodology,

See Also

goftest_ERSBM() performs the goodness-of-fit test for the ERSBM, which requires the MLEs of the edge probabilities.

Examples


RNGkind(sample.kind = "Rounding")

# We model a network with 3 even classes
n1 = 50
n2 = 50
n3 = 50

# Generating block assignments for each of the nodes
n = n1 + n2 + n3
class = rep(c(1, 2, 3), c(n1, n2, n3))

# Generating the adjacency matrix of the network
# Generate the matrix of connection probabilities
cmat = matrix(
  c(
    30, 0.05, 0.05,
    0.05, 30, 0.05,
    0.05, 0.05, 30
  ),
  ncol = 3,
  byrow = TRUE
)
pmat = cmat / n

# Creating the n x n adjacency matrix
adj <- matrix(0, n, n)
for (i in 2:n) {
  for (j in 1:(i - 1)) {
    p = pmat[class[i], class[j]] # We find the probability of connection with the weights
    adj[i, j] = rbinom(1, 1, p) # We include the edge with probability p
  }
}

adjsymm = adj + t(adj)

# graph from the adjacency matrix
G = igraph::graph_from_adjacency_matrix(adjsymm, mode = "undirected", weighted = NULL)

# mle of the edge probabilities
get_mle_ERSBM(G, class)
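
For an Erdős-Rényi SBM with known block labels, the MLE of each block-pair edge probability is the observed edge density of that block pair. The base-R sketch below illustrates this on a toy graph. The helper block_density_mle is hypothetical and is not part of GoodFitSBM; whether get_mle_ERSBM computes its estimates in exactly this way is an assumption.

```r
# Hypothetical helper (not a GoodFitSBM function): block-wise edge densities
# for a symmetric 0/1 adjacency matrix A with zero diagonal and labels C in 1..K.
block_density_mle <- function(A, C) {
  K <- max(C)
  p_hat <- matrix(0, K, K)
  for (k in 1:K) {
    for (l in k:K) {
      rows <- which(C == k)
      cols <- which(C == l)
      if (k == l) {
        n_k <- length(rows)
        edges <- sum(A[rows, rows]) / 2   # each within-block edge is counted twice
        pairs <- n_k * (n_k - 1) / 2
      } else {
        edges <- sum(A[rows, cols])
        pairs <- length(rows) * length(cols)
      }
      # density = observed edges / possible node pairs (NA if no pairs exist)
      p_hat[k, l] <- p_hat[l, k] <- if (pairs > 0) edges / pairs else NA
    }
  }
  p_hat
}

# Toy graph: nodes 1-3 form a triangle (block 1), nodes 4-5 are joined (block 2),
# and a single edge 3-4 links the two blocks.
A <- matrix(0, 5, 5)
edge_list <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 5), c(3, 4))
A[edge_list] <- 1
A <- A + t(A)
C <- c(1, 1, 1, 2, 2)

p_hat <- block_density_mle(A, C)
p_hat
```

In the toy graph both blocks are fully connected internally, and one of the six possible between-block pairs carries an edge, so the off-diagonal density is 1/6.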

Monte Carlo goodness-of-fit test for a beta stochastic blockmodel (beta-SBM)


Description

goftest_BetaSBM performs a Monte Carlo chi-square goodness-of-fit test for network data under the beta-SBM (Karwa et al. (2023)).

Usage

goftest_BetaSBM(A, K = NULL, C = NULL, numGraphs = 100)

Arguments

A    an n by n binary symmetric adjacency matrix representing an undirected graph, where n is the number of nodes

K    a positive integer scalar giving the number of blocks; K > 1

C    a positive integer vector of length n giving the block assignment of each node, with values from 1 to K (the number of blocks)

numGraphs    the number of graphs to be sampled; the default is 100

Value

A list with the elements

statistic    the values of the chi-square test statistic on each sampled graph

p.value    the p-value of the test

References

Karwa et al. (2023). "Monte Carlo goodness-of-fit tests for degree corrected and related stochastic blockmodels", Journal of the Royal Statistical Society Series B: Statistical Methodology,

Examples


# Example 1

RNGkind(sample.kind = "Rounding")

# We model a network with 3 even classes
n1 = 50
n2 = 50
n3 = 50

# Generating block assignments for each of the nodes
n = n1 + n2 + n3
class = rep(c(1, 2, 3), c(n1, n2, n3))

# Generating the adjacency matrix of the network
# Generate the matrix of connection probabilities
cmat = matrix(
  c(
    30, 0.05, 0.05,
    0.05, 30, 0.05,
    0.05, 0.05, 30
  ),
  ncol = 3,
  byrow = TRUE
)
pmat = cmat / n

# Creating the n x n adjacency matrix
adj <- matrix(0, n, n)
for (i in 2:n) {
  for (j in 1:(i - 1)) {
    p = pmat[class[i], class[j]] # We find the probability of connection with the weights
    adj[i, j] = rbinom(1, 1, p) # We include the edge with probability p
  }
}

adjsymm = adj + t(adj)

# When class assignment is known
out = goftest_BetaSBM(adjsymm, C = class, numGraphs = 100)

chi_sq_seq = out$statistic
pvalue = out$p.value

# Plotting histogram of the sequence of the test statistics
hist(chi_sq_seq, 20, xlab = "chi-square test statistics", main = NULL)
abline(v = chi_sq_seq[1], col = "red", lwd = 5) # adding test statistic on the observed network
legend("topleft", legend = paste("observed GoF = ", chi_sq_seq[1]))

# Example 2

RNGkind(sample.kind = "Rounding")

# We model a network with 3 classes of unequal size
n1 = 30
n2 = 20
n3 = 50

# Generating block assignments for each of the nodes
n = n1 + n2 + n3
class = rep(c(1, 2, 3), c(n1, n2, n3))

# Generating the adjacency matrix of the network
# Generate the matrix of connection probabilities
cmat = matrix(
  c(
    30, 0.05, 0.05,
    0.05, 30, 0.05,
    0.05, 0.05, 30
  ),
  ncol = 3,
  byrow = TRUE
)
pmat = cmat / n

# Creating the n x n adjacency matrix
adj <- matrix(0, n, n)
for (i in 2:n) {
  for (j in 1:(i - 1)) {
    p = pmat[class[i], class[j]] # We find the probability of connection with the weights
    adj[i, j] = rbinom(1, 1, p) # We include the edge with probability p
  }
}

adjsymm = adj + t(adj)

# When class assignment is known
out = goftest_BetaSBM(adjsymm, C = class, numGraphs = 100)

chi_sq_seq = out$statistic
pvalue = out$p.value

# Plotting histogram of the sequence of the test statistics
hist(chi_sq_seq, 20, xlab = "chi-square test statistics", main = NULL)
abline(v = chi_sq_seq[1], col = "red", lwd = 5) # adding test statistic on the observed network
legend("topleft", legend = paste("observed GoF = ", chi_sq_seq[1]))

# Application on real dataset: Testing on the Zachary's Karate Club Data



d = zachary # the Zachary's Karate Club data set

# the adjacency matrix
A_zachary = as.matrix(d[1:34, ])
colnames(A_zachary) = 1:34

# obtaining the graph from the adjacency matrix above
g_zachary = igraph::graph_from_adjacency_matrix(A_zachary, mode = "undirected", weighted = NULL)

# plotting the graph (network) obtained
plot(g_zachary,
  main = "Network (Graph) for the Zachary's Karate Club data set; reference clustering")

# block assignments
K = 2 # no. of blocks

n1 = 10
n2 = 24
n = n1 + n2

# known class assignments
class = rep(c(1, 2), c(n1, n2))
# goodness-of-fit tests for the Zachary's Karate Club data set
out_zachary = goftest_BetaSBM(A_zachary, C = class, numGraphs = 100)

chi_sq_seq = out_zachary$statistic
pvalue = out_zachary$p.value

# Plotting histogram of the sequence of the test statistics
hist(chi_sq_seq, 20, xlab = "chi-square test statistics", main = NULL)
abline(v = chi_sq_seq[1], col = "red", lwd = 5) # adding test statistic on the observed network
legend("topleft", legend = paste("observed GoF = ", chi_sq_seq[1]))
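
The examples above treat the first entry of the statistic sequence as the value for the observed network. Under that convention, a Monte Carlo p-value can be estimated as the share of statistics at least as large as the observed one. The sketch below shows the arithmetic on made-up numbers. The helper mc_pvalue is hypothetical, and the exact formula used inside goftest_BetaSBM is an assumption here.

```r
# Hypothetical helper (not a GoodFitSBM function): Monte Carlo p-value,
# assuming stat_seq[1] is the statistic of the observed network.
mc_pvalue <- function(stat_seq) {
  mean(stat_seq >= stat_seq[1])
}

# Made-up statistics for illustration only; the first value plays the
# role of the observed network.
stat_seq <- c(12.5, 8.1, 9.7, 13.2, 7.4, 10.0, 11.9, 6.3, 14.8, 9.1)
p_val <- mc_pvalue(stat_seq)
p_val
```

With this estimator, a sequence of length m can only produce p-values that are multiples of 1/m, so a larger number of sampled graphs gives a finer estimate at a higher computational cost.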

Monte Carlo goodness-of-fit test for an Erdős-Rényi stochastic blockmodel (ERSBM)


Description

goftest_ERSBM performs a Monte Carlo chi-square goodness-of-fit test for network data under the ERSBM (Karwa et al. (2023)).

Usage

goftest_ERSBM(A, K = NULL, C = NULL, numGraphs = 100)

Arguments

A    an n by n binary symmetric adjacency matrix representing an undirected graph, where n is the number of nodes

K    a positive integer scalar giving the number of blocks; K > 1

C    a positive integer vector of length n giving the block assignment of each node, with values from 1 to K (the number of blocks)

numGraphs    the number of graphs to be sampled; the default is 100

Value

A list with the elements

statistic    the values of the chi-square test statistic on each sampled graph

p.value    the p-value of the test

References

Karwa et al. (2023). "Monte Carlo goodness-of-fit tests for degree corrected and related stochastic blockmodels", Journal of the Royal Statistical Society Series B: Statistical Methodology,

Examples


# Example 1

RNGkind(sample.kind = "Rounding")

# We model a network with 3 even classes
n1 = 50
n2 = 50
n3 = 50

# Generating block assignments for each of the nodes
n = n1 + n2 + n3
class = rep(c(1, 2, 3), c(n1, n2, n3))

# Generating the adjacency matrix of the network
# Generate the matrix of connection probabilities
cmat = matrix(
  c(
    30, 0.05, 0.05,
    0.05, 30, 0.05,
    0.05, 0.05, 30
  ),
  ncol = 3,
  byrow = TRUE
)
pmat = cmat / n

# Creating the n x n adjacency matrix
adj <- matrix(0, n, n)
for (i in 2:n) {
  for (j in 1:(i - 1)) {
    p = pmat[class[i], class[j]] # We find the probability of connection with the weights
    adj[i, j] = rbinom(1, 1, p) # We include the edge with probability p
  }
}

adjsymm = adj + t(adj)

# When class assignment is known
out = goftest_ERSBM(adjsymm, C = class, numGraphs = 100)

chi_sq_seq = out$statistic
pvalue = out$p.value

# Plotting histogram of the sequence of the test statistics
hist(chi_sq_seq, 20, xlab = "chi-square test statistics", main = NULL)
abline(v = chi_sq_seq[1], col = "red", lwd = 5) # adding test statistic on the observed network
legend("topleft", legend = paste("observed GoF = ", chi_sq_seq[1]))

# Example 2

RNGkind(sample.kind = "Rounding")

# We model a network with 3 classes of unequal size
n1 = 30
n2 = 20
n3 = 50

# Generating block assignments for each of the nodes
n = n1 + n2 + n3
class = rep(c(1, 2, 3), c(n1, n2, n3))

# Generating the adjacency matrix of the network
# Generate the matrix of connection probabilities
cmat = matrix(
  c(
    30, 0.05, 0.05,
    0.05, 30, 0.05,
    0.05, 0.05, 30
  ),
  ncol = 3,
  byrow = TRUE
)
pmat = cmat / n

# Creating the n x n adjacency matrix
adj <- matrix(0, n, n)
for (i in 2:n) {
  for (j in 1:(i - 1)) {
    p = pmat[class[i], class[j]] # We find the probability of connection with the weights
    adj[i, j] = rbinom(1, 1, p) # We include the edge with probability p
  }
}

adjsymm = adj + t(adj)

# When class assignment is known
out = goftest_ERSBM(adjsymm, C = class, numGraphs = 100)

chi_sq_seq = out$statistic
pvalue = out$p.value

# Plotting histogram of the sequence of the test statistics
hist(chi_sq_seq, 20, xlab = "chi-square test statistics", main = NULL)
abline(v = chi_sq_seq[1], col = "red", lwd = 5) # adding test statistic on the observed network
legend("topleft", legend = paste("observed GoF = ", chi_sq_seq[1]))

# Application on real dataset: Testing on the Zachary's Karate Club Data



d = zachary # the Zachary's Karate Club data set

# the adjacency matrix
A_zachary = as.matrix(d[1:34, ])
colnames(A_zachary) = 1:34

# obtaining the graph from the adjacency matrix above
g_zachary = igraph::graph_from_adjacency_matrix(A_zachary, mode = "undirected", weighted = NULL)

# plotting the graph (network) obtained
plot(g_zachary,
  main = "Network (Graph) for the Zachary's Karate Club data set; reference clustering")

# block assignments
K = 2 # no. of blocks

n1 = 10
n2 = 24
n = n1 + n2

# known class assignments
class = rep(c(1, 2), c(n1, n2))
# goodness-of-fit tests for the Zachary's Karate Club data set
out_zachary = goftest_ERSBM(A_zachary, C = class, numGraphs = 100)

chi_sq_seq = out_zachary$statistic
pvalue = out_zachary$p.value

# Plotting histogram of the sequence of the test statistics
hist(chi_sq_seq, 20, xlab = "chi-square test statistics", main = NULL)
abline(v = chi_sq_seq[1], col = "red", lwd = 5) # adding test statistic on the observed network
legend("topleft", legend = paste("observed GoF = ", chi_sq_seq[1]))
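
The adjacency matrix passed to the test is described as binary and symmetric. A quick pre-check in base R can catch malformed input before a long Monte Carlo run. The helper check_adjacency below is hypothetical and not part of GoodFitSBM; the zero-diagonal check is an added assumption, based on the "no self loop" requirement stated for graph inputs elsewhere in this manual.

```r
# Hypothetical helper (not a GoodFitSBM function): basic sanity checks for an
# adjacency matrix of an undirected graph without self-loops.
check_adjacency <- function(A) {
  stopifnot(
    is.matrix(A),
    is.numeric(A),
    nrow(A) == ncol(A),
    all(A %in% c(0, 1)),
    # isSymmetric() also compares row and column names, so drop them first
    isSymmetric(unname(A)),
    all(diag(A) == 0)
  )
  invisible(TRUE)
}

# A valid 3-node path graph passes the checks
A_ok <- matrix(c(0, 1, 0,
                 1, 0, 1,
                 0, 1, 0), nrow = 3, byrow = TRUE)
check_adjacency(A_ok)

# An asymmetric matrix fails; tryCatch() turns the error into FALSE
A_bad <- A_ok
A_bad[1, 3] <- 1
bad_result <- tryCatch(check_adjacency(A_bad), error = function(e) FALSE)
bad_result
```

Dropping dimnames matters for matrices such as A_zachary above, where colnames are reassigned separately from the row names.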

Computation of the chi-square test statistic for goodness-of-fit, under beta-SBM


Description

graphchi_BetaSBM computes the value of the chi-square test statistic used in the goodness-of-fit test for a beta-SBM (Karwa et al. (2023)).

Usage

graphchi_BetaSBM(G, C, p_mle)

Arguments

G    an igraph object representing an undirected graph with no self-loops

C    a positive integer vector of length n giving the block assignment of each node, with values from 1 to K (the number of blocks)

p_mle    a matrix containing the MLEs of the edge probabilities

Value

A numeric value: the value of the chi-square test statistic.

See Also

goftest_BetaSBM() performs the goodness-of-fit test for the beta-SBM, which requires the values of the chi-square test statistic.

Examples


RNGkind(sample.kind = "Rounding")

# We model a network with 3 even classes
n1 = 50
n2 = 50
n3 = 50

# Generating block assignments for each of the nodes
n = n1 + n2 + n3
class = rep(c(1, 2, 3), c(n1, n2, n3))

# Generating the adjacency matrix of the network
# Generate the matrix of connection probabilities
cmat = matrix(
  c(
    30, 0.05, 0.05,
    0.05, 30, 0.05,
    0.05, 0.05, 30
  ),
  ncol = 3,
  byrow = TRUE
)
pmat = cmat / n

# Creating the n x n adjacency matrix
adj <- matrix(0, n, n)
for (i in 2:n) {
  for (j in 1:(i - 1)) {
    p = pmat[class[i], class[j]] # We find the probability of connection with the weights
    adj[i, j] = rbinom(1, 1, p) # We include the edge with probability p
  }
}

adjsymm = adj + t(adj)

# graph from the adjacency matrix
G = igraph::graph_from_adjacency_matrix(adjsymm, mode = "undirected", weighted = NULL)

# mle of the edge probabilities
p.hat = get_mle_BetaSBM(G, class)

# chi-square test statistic values
graphchi_BetaSBM(G, class, p.hat)

Computation of the chi-square test statistic for goodness-of-fit, under ERSBM


Description

graphchi_ERSBM computes the value of the chi-square test statistic used in the goodness-of-fit test for an ERSBM (Karwa et al. (2023)).

Usage

graphchi_ERSBM(G, C, p_mle)

Arguments

G    an igraph object representing an undirected graph with no self-loops

C    a positive integer vector of length n giving the block assignment of each node, with values from 1 to K (the number of blocks)

p_mle    a matrix containing the MLEs of the edge probabilities

Value

A numeric value: the value of the chi-square test statistic.

See Also

goftest_ERSBM() performs the goodness-of-fit test for the ERSBM, which requires the values of the chi-square test statistic.

Examples


RNGkind(sample.kind = "Rounding")

# We model a network with 3 even classes
n1 = 50
n2 = 50
n3 = 50

# Generating block assignments for each of the nodes
n = n1 + n2 + n3
class = rep(c(1, 2, 3), c(n1, n2, n3))

# Generating the adjacency matrix of the network
# Generate the matrix of connection probabilities
cmat = matrix(
  c(
    30, 0.05, 0.05,
    0.05, 30, 0.05,
    0.05, 0.05, 30
  ),
  ncol = 3,
  byrow = TRUE
)
pmat = cmat / n

# Creating the n x n adjacency matrix
adj <- matrix(0, n, n)
for (i in 2:n) {
  for (j in 1:(i - 1)) {
    p = pmat[class[i], class[j]] # We find the probability of connection with the weights
    adj[i, j] = rbinom(1, 1, p) # We include the edge with probability p
  }
}

adjsymm = adj + t(adj)

# graph from the adjacency matrix
G = igraph::graph_from_adjacency_matrix(adjsymm, mode = "undirected", weighted = NULL)

# mle of the edge probabilities
p.hat = get_mle_ERSBM(G, class)

# chi-square test statistic values
graphchi_ERSBM(G, class, p.hat)
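
To give a feel for what a chi-square statistic on a blockmodel can look like, the sketch below compares, for every node and every block, the observed number of neighbours in that block with the number expected under a K by K matrix of estimated probabilities. This particular form is an assumption made for illustration; the helper block_chisq is hypothetical, and the statistic computed by graphchi_ERSBM may be defined differently.

```r
# Hypothetical helper (not a GoodFitSBM function).
# A: symmetric 0/1 adjacency matrix; C: block labels in 1..K;
# p_hat: K x K matrix of estimated block-pair edge probabilities.
block_chisq <- function(A, C, p_hat) {
  K <- max(C)
  n <- nrow(A)
  stat <- 0
  for (i in 1:n) {
    for (k in 1:K) {
      members <- setdiff(which(C == k), i)   # node i cannot be its own neighbour
      expected <- length(members) * p_hat[C[i], k]
      observed <- sum(A[i, members])
      if (expected > 0) {                    # skip cells with zero expectation
        stat <- stat + (observed - expected)^2 / expected
      }
    }
  }
  stat
}

# Toy graph: a triangle (block 1), one edge (block 2), and one linking edge 3-4
A <- matrix(0, 5, 5)
edge_list <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 5), c(3, 4))
A[edge_list] <- 1
A <- A + t(A)
C <- c(1, 1, 1, 2, 2)

# Block-pair edge densities of the toy graph, entered by hand
p_hat <- matrix(c(1, 1 / 6,
                  1 / 6, 1), nrow = 2, byrow = TRUE)

chisq_value <- block_chisq(A, C, p_hat)
chisq_value
```

Within-block cells contribute nothing here because both blocks are fully connected; the whole statistic comes from how unevenly the single between-block edge is spread across nodes.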

Sampling a graph through a Markov move (basis) for beta-SBM


Description

sample_a_move_BetaSBM samples a graph in the same fiber as the current graph by applying one Markov move, according to the beta-SBM (Karwa et al. (2023)).

Usage

sample_a_move_BetaSBM(C, G_current)

Arguments

C    a positive integer vector of length n giving the block assignment of each node, with values from 1 to K (the number of blocks)

G_current    an igraph object representing an undirected graph with no self-loops

Value

A graph: the sampled graph after one Markov move under the beta-SBM.

References

Karwa et al. (2023). "Monte Carlo goodness-of-fit tests for degree corrected and related stochastic blockmodels", Journal of the Royal Statistical Society Series B: Statistical Methodology,

See Also

goftest_BetaSBM() performs the goodness-of-fit test for the beta-SBM, in which graphs are sampled with this function.

Examples


RNGkind(sample.kind = "Rounding")

# We model a network with 3 even classes
n1 = 50
n2 = 50
n3 = 50

# Generating block assignments for each of the nodes
n = n1 + n2 + n3
class = rep(c(1, 2, 3), c(n1, n2, n3))

# Generating the adjacency matrix of the network
# Generate the matrix of connection probabilities
cmat = matrix(
  c(
    30, 0.05, 0.05,
    0.05, 30, 0.05,
    0.05, 0.05, 30
  ),
  ncol = 3,
  byrow = TRUE
)
pmat = cmat / n

# Creating the n x n adjacency matrix
adj <- matrix(0, n, n)
for (i in 2:n) {
  for (j in 1:(i - 1)) {
    p = pmat[class[i], class[j]] # We find the probability of connection with the weights
    adj[i, j] = rbinom(1, 1, p) # We include the edge with probability p
  }
}

adjsymm = adj + t(adj)

# graph from the adjacency matrix
G = igraph::graph_from_adjacency_matrix(adjsymm, mode = "undirected", weighted = NULL)

# sampling a Markov move for the beta-SBM
G_sample = sample_a_move_BetaSBM(class, G)

# plotting the sampled graph
plot(G_sample, main = "The sampled graph after one Markov move for beta-SBM")
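
A move that stays in the fiber must leave the model's sufficient statistics unchanged. Assuming these are the degree of every node and the number of edges between every pair of blocks, one simple candidate is an edge swap that is accepted only when it preserves the block-pair counts. The base-R sketch below works on an adjacency matrix; the helper beta_swap_move is hypothetical, and the moves actually used by sample_a_move_BetaSBM may be richer.

```r
# Hypothetical helper (not a GoodFitSBM function): one degree-preserving edge
# swap that also keeps the number of edges between each pair of blocks fixed.
beta_swap_move <- function(A, C) {
  edges <- which(upper.tri(A) & A == 1, arr.ind = TRUE)
  if (nrow(edges) < 2) return(A)
  pick <- sample.int(nrow(edges), 2)
  e1 <- edges[pick[1], ]
  e2 <- edges[pick[2], ]
  if (runif(1) < 0.5) e2 <- rev(e2)   # random orientation of the second edge
  a <- e1[1]; b <- e1[2]; u <- e2[1]; v <- e2[2]
  # Proposed rewiring: {a-b, u-v} -> {a-v, u-b}
  if (length(unique(c(a, b, u, v))) < 4) return(A)   # edges share a node
  if (A[a, v] == 1 || A[u, b] == 1) return(A)        # would duplicate an edge
  # Sufficient condition for unchanged block-pair counts
  if (!(C[b] == C[v] || C[a] == C[u])) return(A)
  A[a, b] <- A[b, a] <- 0
  A[u, v] <- A[v, u] <- 0
  A[a, v] <- A[v, a] <- 1
  A[u, b] <- A[b, u] <- 1
  A
}

# Toy graph with two blocks of three nodes each
set.seed(1)
A <- matrix(0, 6, 6)
edge_list <- rbind(c(1, 2), c(2, 3), c(4, 5), c(5, 6), c(3, 4), c(1, 6))
A[edge_list] <- 1
A <- A + t(A)
C <- c(1, 1, 1, 2, 2, 2)

B <- A
for (s in 1:100) B <- beta_swap_move(B, C)
rbind(before = rowSums(A), after = rowSums(B))   # degrees are unchanged
```

Rejected proposals return the current graph, so repeated calls define a chain that may stay in place for several steps.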

Sampling a graph through a Markov move (basis) for ERSBM


Description

sample_a_move_ERSBM samples a graph in the same fiber as the current graph by applying one Markov move, according to the ERSBM (Karwa et al. (2023)).

Usage

sample_a_move_ERSBM(C, G_current)

Arguments

C    a positive integer vector of length n giving the block assignment of each node, with values from 1 to K (the number of blocks)

G_current    an igraph object representing an undirected graph with no self-loops

Value

A graph: the sampled graph after one Markov move under the ERSBM.

References

Karwa et al. (2023). "Monte Carlo goodness-of-fit tests for degree corrected and related stochastic blockmodels", Journal of the Royal Statistical Society Series B: Statistical Methodology,

See Also

goftest_ERSBM() performs the goodness-of-fit test for the ERSBM, in which graphs are sampled with this function.

Examples


RNGkind(sample.kind = "Rounding")

# We model a network with 3 even classes
n1 = 50
n2 = 50
n3 = 50

# Generating block assignments for each of the nodes
n = n1 + n2 + n3
class = rep(c(1, 2, 3), c(n1, n2, n3))

# Generating the adjacency matrix of the network
# Generate the matrix of connection probabilities
cmat = matrix(
  c(
    30, 0.05, 0.05,
    0.05, 30, 0.05,
    0.05, 0.05, 30
  ),
  ncol = 3,
  byrow = TRUE
)
pmat = cmat / n

# Creating the n x n adjacency matrix
adj <- matrix(0, n, n)
for (i in 2:n) {
  for (j in 1:(i - 1)) {
    p = pmat[class[i], class[j]] # We find the probability of connection with the weights
    adj[i, j] = rbinom(1, 1, p) # We include the edge with probability p
  }
}

adjsymm = adj + t(adj)

# graph from the adjacency matrix
G = igraph::graph_from_adjacency_matrix(adjsymm, mode = "undirected", weighted = NULL)

# sampling a Markov move for the ERSBM
G_sample = sample_a_move_ERSBM(class, G)

# plotting the sampled graph
plot(G_sample, main = "The sampled graph after one Markov move for ERSBM")
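
For an Erdős-Rényi SBM with fixed block labels, the number of edges between each pair of blocks is what a move has to preserve in order to stay in the fiber. A minimal sketch of such a move, in base R on an adjacency matrix, relocates one edge within a randomly chosen block pair. The helper er_block_move is hypothetical, and the proposal scheme inside sample_a_move_ERSBM is not assumed to match it.

```r
# Hypothetical helper (not a GoodFitSBM function): move one edge to an empty
# node pair inside the same block pair, keeping all block-pair edge counts.
er_block_move <- function(A, C) {
  K <- max(C)
  k <- sample.int(K, 1)
  l <- sample.int(K, 1)
  # all node pairs (i < j) with one endpoint in block k and the other in block l
  idx <- which(upper.tri(A), arr.ind = TRUE)
  in_pair <- (C[idx[, 1]] == k & C[idx[, 2]] == l) |
    (C[idx[, 1]] == l & C[idx[, 2]] == k)
  dyads <- idx[in_pair, , drop = FALSE]
  is_edge <- A[dyads] == 1
  # nothing to move if the block pair has no edges or no empty pairs
  if (!any(is_edge) || all(is_edge)) return(A)
  edge_rows <- which(is_edge)
  gap_rows <- which(!is_edge)
  # sample positions with sample.int() to avoid sample()'s length-one behaviour
  old <- dyads[edge_rows[sample.int(length(edge_rows), 1)], ]
  new <- dyads[gap_rows[sample.int(length(gap_rows), 1)], ]
  A[old[1], old[2]] <- A[old[2], old[1]] <- 0
  A[new[1], new[2]] <- A[new[2], new[1]] <- 1
  A
}

# Toy graph with two blocks of three nodes each
set.seed(1)
A <- matrix(0, 6, 6)
edge_list <- rbind(c(1, 2), c(2, 3), c(4, 5), c(3, 4))
A[edge_list] <- 1
A <- A + t(A)
C <- c(1, 1, 1, 2, 2, 2)

B <- A
for (s in 1:50) B <- er_block_move(B, C)
sum(B) / 2   # total number of edges is unchanged
```

Unlike a degree-preserving swap, this move changes node degrees, which is consistent with a model whose edge probabilities depend only on block membership.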

Zachary Karate Club Data


Description

Zachary’s Karate Club data is a classic, well-studied social network of friendships among the 34 members of a karate club at a US university, collected by Wayne Zachary in 1977. Each node represents a club member, and each edge represents a tie between two members. The network is undirected. A frequently studied problem with this data set is recovering the two groups into which the club split after an argument between two teachers.

Format

Two 34 by 34 matrices:

a symmetric, binary 34 by 34 adjacency matrix

a symmetric, valued 34 by 34 matrix indicating the relative strength of the associations


(Zachary, 1977),