Title: | Monte Carlo goodness-of-fit tests for Stochastic Blockmodels |
---|---|
Description: | Performs goodness-of-fit tests for stochastic blockmodels (SBMs) fitted to network data. Of the three SBM variants discussed in <https://doi.org/10.1093/jrsssb/qkad084>, goodness-of-fit tests are implemented for the Erdős-Rényi (ER) and beta versions. |
Authors: | Soham Ghosh [aut, cre], Somjit Roy [aut], Debdeep Pati [aut] |
Maintainer: | Soham Ghosh |
License: | GPL (>= 3) |
Version: | 0.0.1 |
Built: | 2025-02-18 04:10:43 UTC |
Source: | https://github.com/roy-sr-007/goodfitsbm |
get_mle_BetaSBM
Obtains the maximum likelihood estimates (MLEs) of the edge probabilities between the blocks of a graph. These estimates are used to compute the goodness-of-fit test statistic for the beta-SBM (Karwa et al. (2023)).
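The following base-R sketch shows the summaries of the data on which a beta-SBM fit depends. It assumes the parametrization logit P(A_ij = 1) = beta_i + beta_j + alpha[C_i, C_j]. Under that form, the log-likelihood depends on the graph only through two quantities: the degree sequence and the number of edges between each pair of blocks. The helper `suff_stats_beta_sbm` is hypothetical. It works on an adjacency matrix rather than an igraph object, and it does not compute the MLE itself, so it is not the package's estimator.

```r
# Hypothetical helper (not part of the package): sufficient statistics of a
# beta-SBM, assuming logit P(A_ij = 1) = beta_i + beta_j + alpha[C_i, C_j].
suff_stats_beta_sbm <- function(A, C) {
  K <- max(C)
  Z <- outer(C, seq_len(K), "==") * 1  # n x K block-membership indicators
  E <- t(Z) %*% A %*% Z                 # E[k, l]: sum of A_ij over i in k, j in l
  diag(E) <- diag(E) / 2                # within-block edges were counted twice
  list(degrees = rowSums(A), block_edges = E)
}

# Toy graph: nodes 1, 2 in block 1 and nodes 3, 4 in block 2; edges 1-2, 2-3, 3-4
A <- matrix(0, 4, 4)
edges <- rbind(c(1, 2), c(2, 3), c(3, 4))
A[edges] <- 1
A[edges[, 2:1]] <- 1
C <- c(1, 1, 2, 2)
s <- suff_stats_beta_sbm(A, C)
s$degrees      # 1 2 2 1
s$block_edges  # one edge inside each block and one edge between them
```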
get_mle_BetaSBM(G, C)
G | an igraph object representing an undirected graph without self-loops |
---|---|
C | a positive integer vector of length n giving the block assignment of each node, with values from 1 to K (the number of blocks) |
A matrix of maximum likelihood estimates
mleMatr | a matrix containing the estimated edge probabilities between the blocks of the graph |
---|---|
Karwa et al. (2023). "Monte Carlo goodness-of-fit tests for degree corrected and related stochastic blockmodels", Journal of the Royal Statistical Society Series B: Statistical Methodology, https://doi.org/10.1093/jrsssb/qkad084
goftest_BetaSBM()
performs the goodness-of-fit test for the beta-SBM, which requires the MLEs of the edge probabilities
RNGkind(sample.kind = "Rounding")
set.seed(1729)
# We model a network with 3 even classes
n1 <- 50
n2 <- 50
n3 <- 50
# Generating block assignments for each of the nodes
n <- n1 + n2 + n3
class <- rep(c(1, 2, 3), c(n1, n2, n3))
# Generating the adjacency matrix of the network
# Generate the matrix of connection probabilities
cmat <- matrix(
  c(
    30, 0.05, 0.05,
    0.05, 30, 0.05,
    0.05, 0.05, 30
  ),
  ncol = 3,
  byrow = TRUE
)
pmat <- cmat / n
# Creating the n x n adjacency matrix
adj <- matrix(0, n, n)
for (i in 2:n) {
  for (j in 1:(i - 1)) {
    p <- pmat[class[i], class[j]] # We find the probability of connection with the weights
    adj[i, j] <- rbinom(1, 1, p) # We include the edge with probability p
  }
}
adjsymm <- adj + t(adj)
# graph from the adjacency matrix
G <- igraph::graph_from_adjacency_matrix(adjsymm, mode = "undirected", weighted = NULL)
# mle of the edge probabilities
get_mle_BetaSBM(G, class)
get_mle_ERSBM
Obtains the maximum likelihood estimates (MLEs) of the edge probabilities between the blocks of a graph. These estimates are used to compute the goodness-of-fit test statistic for the ERSBM (Karwa et al. (2023)).
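Under the ERSBM, each pair of nodes in blocks k and l is connected independently with probability p_kl. The MLE therefore has a closed form. It is the number of edges between blocks k and l, divided by the number of node pairs available: n_k n_l when k != l, and n_k (n_k - 1) / 2 when k = l. The helper `er_sbm_mle` is a hypothetical base-R sketch of that formula, applied to an adjacency matrix. The package's get_mle_ERSBM may differ in how it handles its input and lays out its output.

```r
# Hypothetical helper (not the package's implementation): closed-form MLE of
# the block connection probabilities of an Erdos-Renyi SBM.
# Assumes every block has at least 2 nodes (otherwise the diagonal is 0/0).
er_sbm_mle <- function(A, C) {
  K <- max(C)
  Z <- outer(C, seq_len(K), "==") * 1  # n x K block-membership indicators
  E <- t(Z) %*% A %*% Z                 # block-pair edge counts (diagonal doubled)
  diag(E) <- diag(E) / 2
  sizes <- colSums(Z)                   # block sizes n_k
  N <- outer(sizes, sizes)              # node pairs between blocks k != l
  diag(N) <- sizes * (sizes - 1) / 2    # node pairs within block k
  E / N
}

# Toy graph: nodes 1, 2 in block 1 and nodes 3, 4 in block 2; edges 1-2, 2-3, 3-4
A <- matrix(0, 4, 4)
edges <- rbind(c(1, 2), c(2, 3), c(3, 4))
A[edges] <- 1
A[edges[, 2:1]] <- 1
C <- c(1, 1, 2, 2)
P_hat <- er_sbm_mle(A, C)
P_hat  # 1 on the diagonal (1 edge / 1 pair), 0.25 off it (1 edge / 4 pairs)
```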
get_mle_ERSBM(G, C)
G | an igraph object representing an undirected graph without self-loops |
---|---|
C | a positive integer vector of length n giving the block assignment of each node, with values from 1 to K (the number of blocks) |
A matrix of maximum likelihood estimates
mleMatr | a matrix containing the estimated edge probabilities between the blocks of the graph |
---|---|
Karwa et al. (2023). "Monte Carlo goodness-of-fit tests for degree corrected and related stochastic blockmodels", Journal of the Royal Statistical Society Series B: Statistical Methodology, https://doi.org/10.1093/jrsssb/qkad084
goftest_ERSBM()
performs the goodness-of-fit test for the ERSBM, which requires the MLEs of the edge probabilities
RNGkind(sample.kind = "Rounding")
set.seed(1729)
# We model a network with 3 even classes
n1 <- 50
n2 <- 50
n3 <- 50
# Generating block assignments for each of the nodes
n <- n1 + n2 + n3
class <- rep(c(1, 2, 3), c(n1, n2, n3))
# Generating the adjacency matrix of the network
# Generate the matrix of connection probabilities
cmat <- matrix(
  c(
    30, 0.05, 0.05,
    0.05, 30, 0.05,
    0.05, 0.05, 30
  ),
  ncol = 3,
  byrow = TRUE
)
pmat <- cmat / n
# Creating the n x n adjacency matrix
adj <- matrix(0, n, n)
for (i in 2:n) {
  for (j in 1:(i - 1)) {
    p <- pmat[class[i], class[j]] # We find the probability of connection with the weights
    adj[i, j] <- rbinom(1, 1, p) # We include the edge with probability p
  }
}
adjsymm <- adj + t(adj)
# graph from the adjacency matrix
G <- igraph::graph_from_adjacency_matrix(adjsymm, mode = "undirected", weighted = NULL)
# mle of the edge probabilities
get_mle_ERSBM(G, class)
goftest_BetaSBM
Performs a Monte Carlo chi-square goodness-of-fit test for network data, taking the beta-SBM as the model (Karwa et al. (2023)).
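The examples for this function treat the first element of `statistic` as the value for the observed network and plot it against the other values. A common convention for Monte Carlo tests is to take the p-value as the share of sampled graphs whose statistic is at least as large as the observed one. The package's exact rule may differ; another variant, for instance, uses (1 + count) / (1 + number of samples). The sketch below uses the hypothetical helper `mc_pvalue` and toy numbers, not package output.

```r
# Hypothetical recomputation of a Monte Carlo p-value from a vector of
# statistics whose first element is the observed value (as in the examples).
mc_pvalue <- function(stats) {
  observed <- stats[1]
  sampled <- stats[-1]
  mean(sampled >= observed)  # share of sampled graphs at least as extreme
}

stats <- c(12, 8, 15, 11, 12, 9)  # toy values, not produced by the package
mc_pvalue(stats)                  # 2 of the 5 sampled values are >= 12, giving 0.4
```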
goftest_BetaSBM(A, K = NULL, C = NULL, numGraphs = 100)
A | an n by n binary, symmetric adjacency matrix of an undirected graph, where n is the number of nodes |
---|---|
K | a positive integer scalar giving the number of blocks; K > 1 |
C | a positive integer vector of length n giving the block assignment of each node, with values from 1 to K (the number of blocks) |
numGraphs | the number of graphs to be sampled; defaults to 100 |
A list with the elements:
statistic | the values of the chi-square test statistic on each sampled graph (the examples use the first element as the value for the observed network) |
---|---|
p.value | the p-value of the test |
Karwa et al. (2023). "Monte Carlo goodness-of-fit tests for degree corrected and related stochastic blockmodels", Journal of the Royal Statistical Society Series B: Statistical Methodology, https://doi.org/10.1093/jrsssb/qkad084
# Example 1
RNGkind(sample.kind = "Rounding")
set.seed(1729)
# We model a network with 3 even classes
n1 <- 50
n2 <- 50
n3 <- 50
# Generating block assignments for each of the nodes
n <- n1 + n2 + n3
class <- rep(c(1, 2, 3), c(n1, n2, n3))
# Generating the adjacency matrix of the network
# Generate the matrix of connection probabilities
cmat <- matrix(
  c(
    30, 0.05, 0.05,
    0.05, 30, 0.05,
    0.05, 0.05, 30
  ),
  ncol = 3,
  byrow = TRUE
)
pmat <- cmat / n
# Creating the n x n adjacency matrix
adj <- matrix(0, n, n)
for (i in 2:n) {
  for (j in 1:(i - 1)) {
    p <- pmat[class[i], class[j]] # We find the probability of connection with the weights
    adj[i, j] <- rbinom(1, 1, p) # We include the edge with probability p
  }
}
adjsymm <- adj + t(adj)
# When class assignment is known
out <- goftest_BetaSBM(adjsymm, C = class, numGraphs = 100)
chi_sq_seq <- out$statistic
pvalue <- out$p.value
print(pvalue)
# Plotting histogram of the sequence of the test statistics
hist(chi_sq_seq, breaks = 20, xlab = "chi-square test statistics", main = NULL)
abline(v = chi_sq_seq[1], col = "red", lwd = 5) # adding test statistic on the observed network
legend("topleft", legend = paste("observed GoF =", chi_sq_seq[1]))
# Example 2
RNGkind(sample.kind = "Rounding")
set.seed(1729)
# We model a network with 3 classes of unequal sizes
n1 <- 30
n2 <- 20
n3 <- 50
# Generating block assignments for each of the nodes
n <- n1 + n2 + n3
class <- rep(c(1, 2, 3), c(n1, n2, n3))
# Generating the adjacency matrix of the network
# Generate the matrix of connection probabilities
cmat <- matrix(
  c(
    30, 0.05, 0.05,
    0.05, 30, 0.05,
    0.05, 0.05, 30
  ),
  ncol = 3,
  byrow = TRUE
)
pmat <- cmat / n
# Creating the n x n adjacency matrix
adj <- matrix(0, n, n)
for (i in 2:n) {
  for (j in 1:(i - 1)) {
    p <- pmat[class[i], class[j]] # We find the probability of connection with the weights
    adj[i, j] <- rbinom(1, 1, p) # We include the edge with probability p
  }
}
adjsymm <- adj + t(adj)
# When class assignment is known
out <- goftest_BetaSBM(adjsymm, C = class, numGraphs = 100)
chi_sq_seq <- out$statistic
pvalue <- out$p.value
print(pvalue)
# Plotting histogram of the sequence of the test statistics
hist(chi_sq_seq, breaks = 20, xlab = "chi-square test statistics", main = NULL)
abline(v = chi_sq_seq[1], col = "red", lwd = 5) # adding test statistic on the observed network
legend("topleft", legend = paste("observed GoF =", chi_sq_seq[1]))
# Application on real dataset: Testing on the Zachary's Karate Club Data
set.seed(100000)
data("zachary")
d <- zachary # the Zachary's Karate Club data set
# the adjacency matrix
A_zachary <- as.matrix(d[1:34, ])
colnames(A_zachary) <- 1:34
# obtaining the graph from the adjacency matrix above
g_zachary <- igraph::graph_from_adjacency_matrix(A_zachary, mode = "undirected", weighted = NULL)
# plotting the graph (network) obtained
plot(g_zachary, main = "Network (Graph) for the Zachary's Karate Club data set; reference clustering")
# block assignments
K <- 2 # no. of blocks
n1 <- 10
n2 <- 24
n <- n1 + n2
# known class assignments
class <- rep(c(1, 2), c(n1, n2))
# goodness-of-fit tests for the Zachary's Karate Club data set
out_zachary <- goftest_BetaSBM(A_zachary, C = class, numGraphs = 100)
chi_sq_seq <- out_zachary$statistic
pvalue <- out_zachary$p.value
print(pvalue)
# Plotting histogram of the sequence of the test statistics
hist(chi_sq_seq, breaks = 20, xlab = "chi-square test statistics", main = NULL)
abline(v = chi_sq_seq[1], col = "red", lwd = 5) # adding test statistic on the observed network
legend("topleft", legend = paste("observed GoF =", chi_sq_seq[1]))
goftest_ERSBM
Performs a Monte Carlo chi-square goodness-of-fit test for network data, taking the ERSBM as the model (Karwa et al. (2023)).
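The following self-contained base-R sketch makes the Monte Carlo procedure concrete for the ERSBM with known block assignments, in the spirit of Karwa et al. (2023). It proceeds in four steps:

1. Fit the block probabilities by their closed-form MLE.
2. Compute a Pearson statistic on the node-to-block edge counts.
3. Run a Markov chain on the fiber, i.e. the graphs with the same block-pair edge counts, recording the statistic every few moves.
4. Report the share of sampled values that reach the observed one.

Several choices are assumptions made for illustration: the statistic, the move (relocating one edge within its block pair), the number of moves between recorded graphs, the p-value rule, and the requirement that every block has at least 2 nodes. `mc_gof_er_sbm` is a hypothetical name and is not the package's goftest_ERSBM.

```r
# Hypothetical sketch (not the package's goftest_ERSBM): Monte Carlo
# goodness-of-fit test for an ER-SBM with known block assignments C.
# Assumes every block has at least 2 nodes.
mc_gof_er_sbm <- function(A, C, num_graphs = 100, steps = 50) {
  K <- max(C)
  Z <- outer(C, seq_len(K), "==") * 1  # n x K block-membership indicators
  sizes <- colSums(Z)
  # Closed-form MLE of the block connection probabilities
  E <- t(Z) %*% A %*% Z
  diag(E) <- diag(E) / 2
  N <- outer(sizes, sizes)
  diag(N) <- sizes * (sizes - 1) / 2
  P <- E / N
  # Expected node-to-block edge counts; identical for every graph in the
  # fiber, because all of them share the block-pair edge counts (hence P)
  avail <- matrix(sizes, nrow(A), K, byrow = TRUE) - Z
  Ex <- P[C, , drop = FALSE] * avail
  ok <- Ex > 0
  chisq <- function(G) {
    D <- G %*% Z                        # observed node-to-block edge counts
    sum((D[ok] - Ex[ok])^2 / Ex[ok])
  }
  # One fiber move: relocate a uniformly chosen edge to a uniformly chosen
  # non-edge in the same block pair (a symmetric proposal)
  move <- function(G) {
    edges <- which(upper.tri(G) & G == 1, arr.ind = TRUE)
    if (nrow(edges) == 0) return(G)
    e <- edges[sample.int(nrow(edges), 1), ]
    k <- C[e[1]]
    l <- C[e[2]]
    same <- outer(C, C, function(x, y) (x == k & y == l) | (x == l & y == k))
    free <- which(upper.tri(G) & G == 0 & same, arr.ind = TRUE)
    if (nrow(free) == 0) return(G)      # block pair is complete; stay put
    f <- free[sample.int(nrow(free), 1), ]
    G[e[1], e[2]] <- G[e[2], e[1]] <- 0
    G[f[1], f[2]] <- G[f[2], f[1]] <- 1
    G
  }
  stats <- numeric(num_graphs + 1)
  stats[1] <- chisq(A)                  # observed graph first
  G <- A
  for (s in seq_len(num_graphs)) {
    for (step in seq_len(steps)) G <- move(G)
    stats[s + 1] <- chisq(G)
  }
  list(statistic = stats, p.value = mean(stats[-1] >= stats[1]))
}

# Usage on a small simulated two-block network
set.seed(1)
C <- rep(1:2, each = 15)
n <- length(C)
Pb <- matrix(c(0.3, 0.05, 0.05, 0.3), 2)
A <- matrix(0, n, n)
up <- upper.tri(A)
A[up] <- rbinom(sum(up), 1, Pb[C, C][up])
A <- A + t(A)
out <- mc_gof_er_sbm(A, C, num_graphs = 50, steps = 20)
out$p.value
```

The MLE and the expected counts are computed only once. Because every graph in the fiber has the same block-pair edge counts, refitting on each sampled graph would give the same values.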
goftest_ERSBM(A, K = NULL, C = NULL, numGraphs = 100)
A | an n by n binary, symmetric adjacency matrix of an undirected graph, where n is the number of nodes |
---|---|
K | a positive integer scalar giving the number of blocks; K > 1 |
C | a positive integer vector of length n giving the block assignment of each node, with values from 1 to K (the number of blocks) |
numGraphs | the number of graphs to be sampled; defaults to 100 |
A list with the elements:
statistic | the values of the chi-square test statistic on each sampled graph (the examples use the first element as the value for the observed network) |
---|---|
p.value | the p-value of the test |
Karwa et al. (2023). "Monte Carlo goodness-of-fit tests for degree corrected and related stochastic blockmodels", Journal of the Royal Statistical Society Series B: Statistical Methodology, https://doi.org/10.1093/jrsssb/qkad084
# Example 1
RNGkind(sample.kind = "Rounding")
set.seed(1729)
# We model a network with 3 even classes
n1 <- 50
n2 <- 50
n3 <- 50
# Generating block assignments for each of the nodes
n <- n1 + n2 + n3
class <- rep(c(1, 2, 3), c(n1, n2, n3))
# Generating the adjacency matrix of the network
# Generate the matrix of connection probabilities
cmat <- matrix(
  c(
    30, 0.05, 0.05,
    0.05, 30, 0.05,
    0.05, 0.05, 30
  ),
  ncol = 3,
  byrow = TRUE
)
pmat <- cmat / n
# Creating the n x n adjacency matrix
adj <- matrix(0, n, n)
for (i in 2:n) {
  for (j in 1:(i - 1)) {
    p <- pmat[class[i], class[j]] # We find the probability of connection with the weights
    adj[i, j] <- rbinom(1, 1, p) # We include the edge with probability p
  }
}
adjsymm <- adj + t(adj)
# When class assignment is known
out <- goftest_ERSBM(adjsymm, C = class, numGraphs = 100)
chi_sq_seq <- out$statistic
pvalue <- out$p.value
print(pvalue)
# Plotting histogram of the sequence of the test statistics
hist(chi_sq_seq, breaks = 20, xlab = "chi-square test statistics", main = NULL)
abline(v = chi_sq_seq[1], col = "red", lwd = 5) # adding test statistic on the observed network
legend("topleft", legend = paste("observed GoF =", chi_sq_seq[1]))
# Example 2
RNGkind(sample.kind = "Rounding")
set.seed(1729)
# We model a network with 3 classes of unequal sizes
n1 <- 30
n2 <- 20
n3 <- 50
# Generating block assignments for each of the nodes
n <- n1 + n2 + n3
class <- rep(c(1, 2, 3), c(n1, n2, n3))
# Generating the adjacency matrix of the network
# Generate the matrix of connection probabilities
cmat <- matrix(
  c(
    30, 0.05, 0.05,
    0.05, 30, 0.05,
    0.05, 0.05, 30
  ),
  ncol = 3,
  byrow = TRUE
)
pmat <- cmat / n
# Creating the n x n adjacency matrix
adj <- matrix(0, n, n)
for (i in 2:n) {
  for (j in 1:(i - 1)) {
    p <- pmat[class[i], class[j]] # We find the probability of connection with the weights
    adj[i, j] <- rbinom(1, 1, p) # We include the edge with probability p
  }
}
adjsymm <- adj + t(adj)
# When class assignment is known
out <- goftest_ERSBM(adjsymm, C = class, numGraphs = 100)
chi_sq_seq <- out$statistic
pvalue <- out$p.value
print(pvalue)
# Plotting histogram of the sequence of the test statistics
hist(chi_sq_seq, breaks = 20, xlab = "chi-square test statistics", main = NULL)
abline(v = chi_sq_seq[1], col = "red", lwd = 5) # adding test statistic on the observed network
legend("topleft", legend = paste("observed GoF =", chi_sq_seq[1]))
# Application on real dataset: Testing on the Zachary's Karate Club Data
set.seed(100000)
data("zachary")
d <- zachary # the Zachary's Karate Club data set
# the adjacency matrix
A_zachary <- as.matrix(d[1:34, ])
colnames(A_zachary) <- 1:34
# obtaining the graph from the adjacency matrix above
g_zachary <- igraph::graph_from_adjacency_matrix(A_zachary, mode = "undirected", weighted = NULL)
# plotting the graph (network) obtained
plot(g_zachary, main = "Network (Graph) for the Zachary's Karate Club data set; reference clustering")
# block assignments
K <- 2 # no. of blocks
n1 <- 10
n2 <- 24
n <- n1 + n2
# known class assignments
class <- rep(c(1, 2), c(n1, n2))
# goodness-of-fit tests for the Zachary's Karate Club data set
out_zachary <- goftest_ERSBM(A_zachary, C = class, numGraphs = 100)
chi_sq_seq <- out_zachary$statistic
pvalue <- out_zachary$p.value
print(pvalue)
# Plotting histogram of the sequence of the test statistics
hist(chi_sq_seq, breaks = 20, xlab = "chi-square test statistics", main = NULL)
abline(v = chi_sq_seq[1], col = "red", lwd = 5) # adding test statistic on the observed network
legend("topleft", legend = paste("observed GoF =", chi_sq_seq[1]))
graphchi_BetaSBM
Computes the value of the chi-square test statistic used in the goodness-of-fit test for the beta-SBM (Karwa et al. (2023)).
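The exact statistic is not spelled out here. One plausible form is a Pearson statistic on node-to-block edge counts: for each node i and block l, compare the number of edges from i into l with its expected value under the fitted model. This is an assumption, not a description of graphchi_BetaSBM. In the beta-SBM the edge probabilities vary by node, so the hypothetical helper `node_block_chisq` takes an n by n matrix `Pn` of fitted edge probabilities. That input may not match what get_mle_BetaSBM returns.

```r
# Hypothetical Pearson statistic on node-to-block edge counts, given an
# n x n matrix Pn of fitted edge probabilities (assumed input format).
node_block_chisq <- function(A, C, Pn) {
  K <- max(C)
  Z <- outer(C, seq_len(K), "==") * 1  # n x K block-membership indicators
  diag(Pn) <- 0                         # no self-loops
  D <- A %*% Z                          # D[i, l]: edges from node i into block l
  Ex <- Pn %*% Z                        # expected edges from node i into block l
  ok <- Ex > 0                          # skip cells with zero expectation
  sum((D[ok] - Ex[ok])^2 / Ex[ok])
}

# Toy graph: nodes 1, 2 in block 1 and nodes 3, 4 in block 2; edges 1-2, 2-3, 3-4
A <- matrix(0, 4, 4)
edges <- rbind(c(1, 2), c(2, 3), c(3, 4))
A[edges] <- 1
A[edges[, 2:1]] <- 1
C <- c(1, 1, 2, 2)
P <- matrix(c(1, 0.25, 0.25, 1), 2)  # block-level probabilities for illustration
Pn <- P[C, C]                         # expand to an n x n matrix
node_block_chisq(A, C, Pn)            # four cells contribute 0.5 each, giving 2
```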
graphchi_BetaSBM(G, C, p_mle)
G | an igraph object representing an undirected graph without self-loops |
---|---|
C | a positive integer vector of length n giving the block assignment of each node, with values from 1 to K (the number of blocks) |
p_mle | a matrix with the maximum likelihood estimates of the edge probabilities |
A numeric value
teststat_val | the value of the chi-square test statistic |
---|---|
goftest_BetaSBM()
performs the goodness-of-fit test for the beta-SBM, which requires the values of the chi-square test statistic
RNGkind(sample.kind = "Rounding")
set.seed(1729)
# We model a network with 3 even classes
n1 <- 50
n2 <- 50
n3 <- 50
# Generating block assignments for each of the nodes
n <- n1 + n2 + n3
class <- rep(c(1, 2, 3), c(n1, n2, n3))
# Generating the adjacency matrix of the network
# Generate the matrix of connection probabilities
cmat <- matrix(
  c(
    30, 0.05, 0.05,
    0.05, 30, 0.05,
    0.05, 0.05, 30
  ),
  ncol = 3,
  byrow = TRUE
)
pmat <- cmat / n
# Creating the n x n adjacency matrix
adj <- matrix(0, n, n)
for (i in 2:n) {
  for (j in 1:(i - 1)) {
    p <- pmat[class[i], class[j]] # We find the probability of connection with the weights
    adj[i, j] <- rbinom(1, 1, p) # We include the edge with probability p
  }
}
adjsymm <- adj + t(adj)
# graph from the adjacency matrix
G <- igraph::graph_from_adjacency_matrix(adjsymm, mode = "undirected", weighted = NULL)
# mle of the edge probabilities
p.hat <- get_mle_BetaSBM(G, class)
# chi-square test statistic values
graphchi_BetaSBM(G, class, p.hat)
graphchi_ERSBM
Computes the value of the chi-square test statistic used in the goodness-of-fit test for the ERSBM (Karwa et al. (2023)).
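Under the ERSBM, the edge probability depends only on the two blocks, so a block-level K by K matrix is enough to form expected counts. This sketch assumes a Pearson statistic on node-to-block edge counts. Node i has n_l - 1 possible partners in its own block l and n_l in any other block, so its expected number of edges into block l is p[C_i, l] times that number. The helper `er_sbm_chisq` is hypothetical, and the statistic is an assumption rather than a description of graphchi_ERSBM.

```r
# Hypothetical Pearson statistic on node-to-block edge counts for an ER-SBM,
# given a K x K matrix P of block connection probabilities.
er_sbm_chisq <- function(A, C, P) {
  K <- max(C)
  Z <- outer(C, seq_len(K), "==") * 1  # n x K block-membership indicators
  sizes <- colSums(Z)
  D <- A %*% Z                          # D[i, l]: edges from node i into block l
  # Possible partners of node i in block l (node i itself is excluded)
  avail <- matrix(sizes, nrow(A), K, byrow = TRUE) - Z
  Ex <- P[C, , drop = FALSE] * avail    # expected edges from node i into block l
  ok <- Ex > 0                          # skip cells with zero expectation
  sum((D[ok] - Ex[ok])^2 / Ex[ok])
}

# Toy graph: nodes 1, 2 in block 1 and nodes 3, 4 in block 2; edges 1-2, 2-3, 3-4
A <- matrix(0, 4, 4)
edges <- rbind(c(1, 2), c(2, 3), c(3, 4))
A[edges] <- 1
A[edges[, 2:1]] <- 1
C <- c(1, 1, 2, 2)
P <- matrix(c(1, 0.25, 0.25, 1), 2)  # the closed-form ER-SBM MLE for this graph
er_sbm_chisq(A, C, P)                 # four cells contribute 0.5 each, giving 2
```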
graphchi_ERSBM(G, C, p_mle)
G | an igraph object representing an undirected graph without self-loops |
---|---|
C | a positive integer vector of length n giving the block assignment of each node, with values from 1 to K (the number of blocks) |
p_mle | a matrix with the maximum likelihood estimates of the edge probabilities |
A numeric value
teststat_val | the value of the chi-square test statistic |
---|---|
goftest_ERSBM()
performs the goodness-of-fit test for the ERSBM, which requires the values of the chi-square test statistic
RNGkind(sample.kind = "Rounding")
set.seed(1729)
# We model a network with 3 even classes
n1 <- 50
n2 <- 50
n3 <- 50
# Generating block assignments for each of the nodes
n <- n1 + n2 + n3
class <- rep(c(1, 2, 3), c(n1, n2, n3))
# Generating the adjacency matrix of the network
# Generate the matrix of connection probabilities
cmat <- matrix(
  c(
    30, 0.05, 0.05,
    0.05, 30, 0.05,
    0.05, 0.05, 30
  ),
  ncol = 3,
  byrow = TRUE
)
pmat <- cmat / n
# Creating the n x n adjacency matrix
adj <- matrix(0, n, n)
for (i in 2:n) {
  for (j in 1:(i - 1)) {
    p <- pmat[class[i], class[j]] # We find the probability of connection with the weights
    adj[i, j] <- rbinom(1, 1, p) # We include the edge with probability p
  }
}
adjsymm <- adj + t(adj)
# graph from the adjacency matrix
G <- igraph::graph_from_adjacency_matrix(adjsymm, mode = "undirected", weighted = NULL)
# mle of the edge probabilities
p.hat <- get_mle_ERSBM(G, class)
# chi-square test statistic values
graphchi_ERSBM(G, class, p.hat)
sample_a_move_BetaSBM
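Samples a graph in the same fiber as the current graph by making one Markov move under the beta-SBM (Karwa et al. (2023)). The fiber is the set of graphs that share the observed values of the model's sufficient statistics.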
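This sketch assumes that the beta-SBM's sufficient statistics are the degree sequence and the block-pair edge counts. Under that assumption, any move that preserves both stays in the fiber. A double-edge swap (u, v), (x, y) to (u, y), (x, v) keeps every degree. If v and y lie in the same block, it also keeps the block-pair edge counts. The hypothetical helper `swap_move_beta_sbm` implements one such proposal and rejects swaps that would create a self-loop or a repeated edge. It is an illustration only. It is not the package's sampler, and swaps of this simple kind are not guaranteed to connect the whole fiber.

```r
# Hypothetical illustration (not the package's sampler): one double-edge swap
# that preserves the degree sequence and the block-pair edge counts.
swap_move_beta_sbm <- function(A, C) {
  edges <- which(upper.tri(A) & A == 1, arr.ind = TRUE)
  if (nrow(edges) < 2) return(A)
  pick <- edges[sample.int(nrow(edges), 2), ]
  # Randomly orient each edge so both readings of it can be proposed
  e1 <- if (runif(1) < 0.5) pick[1, ] else rev(pick[1, ])
  e2 <- if (runif(1) < 0.5) pick[2, ] else rev(pick[2, ])
  u <- e1[1]; v <- e1[2]; x <- e2[1]; y <- e2[2]
  # Rewire (u,v),(x,y) -> (u,y),(x,v); block counts are kept when C[v] == C[y]
  valid <- C[v] == C[y] && u != y && x != v && A[u, y] == 0 && A[x, v] == 0
  if (!valid) return(A)  # reject: stay at the current graph
  A[u, v] <- A[v, u] <- 0
  A[x, y] <- A[y, x] <- 0
  A[u, y] <- A[y, u] <- 1
  A[x, v] <- A[v, x] <- 1
  A
}

# Usage: run many proposals and confirm the statistics are unchanged
set.seed(42)
C <- rep(1:3, each = 8)
n <- length(C)
A <- matrix(0, n, n)
up <- upper.tri(A)
A[up] <- rbinom(sum(up), 1, 0.25)
A <- A + t(A)
block_counts <- function(A, C) {
  Z <- outer(C, seq_len(max(C)), "==") * 1
  t(Z) %*% A %*% Z
}
B <- A
for (step in 1:200) B <- swap_move_beta_sbm(B, C)
all(rowSums(B) == rowSums(A))                    # degrees preserved
all(block_counts(B, C) == block_counts(A, C))    # block-pair counts preserved
```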
sample_a_move_BetaSBM(C, G_current)
C | a positive integer vector of length n giving the block assignment of each node, with values from 1 to K (the number of blocks) |
---|---|
G_current | an igraph object representing the current graph: undirected and without self-loops |
A graph
sampled graph | the graph obtained after one Markov move under the beta-SBM |
---|---|
Karwa et al. (2023). "Monte Carlo goodness-of-fit tests for degree corrected and related stochastic blockmodels", Journal of the Royal Statistical Society Series B: Statistical Methodology, https://doi.org/10.1093/jrsssb/qkad084
goftest_BetaSBM()
performs the goodness-of-fit test for the beta-SBM, which relies on sampling graphs from the fiber
RNGkind(sample.kind = "Rounding")
set.seed(1729)
# We model a network with 3 even classes
n1 <- 50
n2 <- 50
n3 <- 50
# Generating block assignments for each of the nodes
n <- n1 + n2 + n3
class <- rep(c(1, 2, 3), c(n1, n2, n3))
# Generating the adjacency matrix of the network
# Generate the matrix of connection probabilities
cmat <- matrix(
  c(
    30, 0.05, 0.05,
    0.05, 30, 0.05,
    0.05, 0.05, 30
  ),
  ncol = 3,
  byrow = TRUE
)
pmat <- cmat / n
# Creating the n x n adjacency matrix
adj <- matrix(0, n, n)
for (i in 2:n) {
  for (j in 1:(i - 1)) {
    p <- pmat[class[i], class[j]] # We find the probability of connection with the weights
    adj[i, j] <- rbinom(1, 1, p) # We include the edge with probability p
  }
}
adjsymm <- adj + t(adj)
# graph from the adjacency matrix
G <- igraph::graph_from_adjacency_matrix(adjsymm, mode = "undirected", weighted = NULL)
# sampling a Markov move for the beta-SBM
G_sample <- sample_a_move_BetaSBM(class, G)
# plotting the sampled graph
plot(G_sample, main = "The sampled graph after one Markov move for beta-SBM")
sample_a_move_ERSBM
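Samples a graph in the same fiber as the current graph by making one Markov move under the ERSBM (Karwa et al. (2023)). The fiber is the set of graphs that share the observed values of the model's sufficient statistics.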
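For the ERSBM, the sufficient statistics are the edge counts within and between blocks. Moving a single edge to a currently empty node pair in the same block pair therefore stays in the fiber, although degrees may change. The hypothetical helper `relocate_move_er_sbm` picks an edge uniformly, then an empty pair in the same block pair uniformly. The reverse move has the same probability, so the proposal is symmetric. This is an illustrative sketch and not necessarily the move that sample_a_move_ERSBM makes.

```r
# Hypothetical illustration (not the package's sampler): relocate one edge to
# an empty node pair in the same block pair, preserving block-pair edge counts.
relocate_move_er_sbm <- function(A, C) {
  edges <- which(upper.tri(A) & A == 1, arr.ind = TRUE)
  if (nrow(edges) == 0) return(A)
  e <- edges[sample.int(nrow(edges), 1), ]
  k <- C[e[1]]
  l <- C[e[2]]
  same_pair <- outer(C, C, function(x, y) (x == k & y == l) | (x == l & y == k))
  free <- which(upper.tri(A) & A == 0 & same_pair, arr.ind = TRUE)
  if (nrow(free) == 0) return(A)  # block pair is complete; stay put
  f <- free[sample.int(nrow(free), 1), ]
  A[e[1], e[2]] <- A[e[2], e[1]] <- 0
  A[f[1], f[2]] <- A[f[2], f[1]] <- 1
  A
}

# Usage: run many moves and confirm the block-pair counts are unchanged
set.seed(7)
C <- rep(1:3, each = 8)
n <- length(C)
A <- matrix(0, n, n)
up <- upper.tri(A)
A[up] <- rbinom(sum(up), 1, 0.25)
A <- A + t(A)
block_counts <- function(A, C) {
  Z <- outer(C, seq_len(max(C)), "==") * 1
  t(Z) %*% A %*% Z
}
B <- A
for (step in 1:200) B <- relocate_move_er_sbm(B, C)
all(block_counts(B, C) == block_counts(A, C))  # block-pair counts preserved
```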
sample_a_move_ERSBM(C, G_current)
C | a positive integer vector of length n giving the block assignment of each node, with values from 1 to K (the number of blocks) |
---|---|
G_current | an igraph object representing the current graph: undirected and without self-loops |
A graph
sampled graph | the graph obtained after one Markov move under the ERSBM |
---|---|
Karwa et al. (2023). "Monte Carlo goodness-of-fit tests for degree corrected and related stochastic blockmodels", Journal of the Royal Statistical Society Series B: Statistical Methodology, https://doi.org/10.1093/jrsssb/qkad084
goftest_ERSBM()
performs the goodness-of-fit test for the ERSBM, which relies on sampling graphs from the fiber
RNGkind(sample.kind = "Rounding")
set.seed(1729)
# We model a network with 3 even classes
n1 <- 50
n2 <- 50
n3 <- 50
# Generating block assignments for each of the nodes
n <- n1 + n2 + n3
class <- rep(c(1, 2, 3), c(n1, n2, n3))
# Generating the adjacency matrix of the network
# Generate the matrix of connection probabilities
cmat <- matrix(
  c(
    30, 0.05, 0.05,
    0.05, 30, 0.05,
    0.05, 0.05, 30
  ),
  ncol = 3,
  byrow = TRUE
)
pmat <- cmat / n
# Creating the n x n adjacency matrix
adj <- matrix(0, n, n)
for (i in 2:n) {
  for (j in 1:(i - 1)) {
    p <- pmat[class[i], class[j]] # We find the probability of connection with the weights
    adj[i, j] <- rbinom(1, 1, p) # We include the edge with probability p
  }
}
adjsymm <- adj + t(adj)
# graph from the adjacency matrix
G <- igraph::graph_from_adjacency_matrix(adjsymm, mode = "undirected", weighted = NULL)
# sampling a Markov move for the ERSBM
G_sample <- sample_a_move_ERSBM(class, G)
# plotting the sampled graph
plot(G_sample, main = "The sampled graph after one Markov move for ERSBM")
zachary
Zachary’s karate club data is a classic, well-studied social network of friendships between 34 members of a karate club at a US university, described by Wayne Zachary (1977). Each node represents a member of the club, and each edge represents a tie between two members. The network is undirected. A frequently studied problem with this data set is to recover the two groups into which the club split after a dispute between the club's instructor and its administrator.
zachary
Two 34 by 34 matrices:
a symmetric, binary 34 by 34 adjacency matrix;
a symmetric, valued 34 by 34 matrix indicating the relative strength of the associations.
(Zachary, 1977), http://vlado.fmf.uni-lj.si/pub/networks/data/Ucinet/UciData.htm#zachary.