probinet.models.mtcov

Class definition of MTCOV, the generative algorithm that incorporates both the topology of interactions and node attributes to extract overlapping communities in directed and undirected multilayer networks [CPDB20].

Classes

MTCOV([err_max, num_realizations])

Class definition of MTCOV, the generative algorithm that incorporates both the topology of interactions and node attributes to extract overlapping communities in directed and undirected multilayer networks.

class probinet.models.mtcov.MTCOV(err_max: float = 1e-07, num_realizations: int = 1, **kwargs: Any)

Class definition of MTCOV, the generative algorithm that incorporates both the topology of interactions and node attributes to extract overlapping communities in directed and undirected multilayer networks.

additional_fields = ['egoX', 'cov_name', 'attr_name']
compute_likelihood()

Compute the pseudo log-likelihood of the data.

Returns:

loglik – Pseudo log-likelihood value.

Return type:

float
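
As a rough illustration of what a pseudo log-likelihood of this kind can look like, the sketch below follows the formulation in the MTCOV paper [CPDB20], where a network-structure term and a node-attribute term are combined through gamma. It is not the library's implementation: the function name pseudo_loglik, the argument shapes, and the eps smoothing constant are assumptions made for this example.

```python
import numpy as np


def pseudo_loglik(A, X, u, v, w, beta, gamma=0.5, eps=1e-12):
    """Hypothetical sketch of a gamma-weighted pseudo log-likelihood.

    Assumed shapes: A (L, N, N) adjacency tensor, X (N, Z) one-hot attributes,
    u and v (N, K) memberships, w (L, K, K) affinity tensor, beta (K, Z).
    """
    # Expected edge weights: M[a, i, j] = sum_{k,q} u[i,k] * v[j,q] * w[a,k,q]
    M = np.einsum("ik,jq,akq->aij", u, v, w)
    # Structural (Poisson) term, dropping constants that do not depend on parameters
    l_g = (A * np.log(M + eps)).sum() - M.sum()
    # Attribute probabilities: pi[i, z] = 0.5 * sum_k beta[k,z] * (u[i,k] + v[i,k])
    pi = 0.5 * (u + v) @ beta
    # Attribute (multinomial) term
    l_x = (X * np.log(pi + eps)).sum()
    # gamma balances the attribute term against the structural term
    return (1 - gamma) * l_g + gamma * l_x
```

With gamma = 0 only the network structure contributes, and with gamma = 1 only the node attributes do.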

fit(gdata: GraphData, batch_size: int | None = None, gamma: float = 0.5, K: int = 2, initialization: int = 0, undirected: bool = False, assortative: bool = False, out_inference: bool = True, out_folder: Path = PosixPath('outputs'), end_file: str | None = None, files: PathLike | None = None, rng: Generator | None = None, **_MTCOV__kwargs: Any) -> tuple[ndarray[Any, dtype[float64]], ndarray[Any, dtype[float64]], ndarray[Any, dtype[float64]], ndarray[Any, dtype[float64]], float]

Perform community detection in multilayer networks considering both the topology of interactions and node attributes via EM updates. Save the membership matrices U and V, the affinity tensor W, and the beta matrix.

Parameters:
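  • gdata – Graph data, including the adjacency tensor.

  • batch_size – Size of the subset of nodes to compute the likelihood with, by default None.

  • gamma – Scaling hyperparameter that weights the node-attribute term against the network-structure term in the likelihood, by default 0.5.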

  • K – Number of communities, by default 2.

  • initialization – How to initialize u, v, and w: 0 generates all three randomly; 1 loads only the affinity tensor w from file; 2 loads only the membership matrices u and v from file; 3 loads u, v, and w from file. By default 0.

  • eta0 – Initial value for the reciprocity coefficient, by default None.

  • undirected – Whether to treat the network as undirected, by default False.

  • assortative – Whether to assume an assortative community structure, by default False.

  • fix_eta – Flag to fix the eta parameter, by default False.

  • fix_communities – Flag to fix the community memberships, by default False.

  • fix_w – Flag to fix the affinity tensor, by default False.

  • use_approximation – Flag to use approximation in updates, by default False.

  • out_inference – Flag to evaluate inference results, by default True.

  • out_folder – Output folder for inference results, by default OUTPUT_FOLDER.

  • end_file – Suffix for the evaluation file, by default None.

  • files – Path to the file for initialization, by default None.

  • rng – Random number generator, by default None.

Returns:

  • u_f – Out-going membership matrix.

  • v_f – In-coming membership matrix.

  • w_f – Affinity tensor.

  • beta_f – Beta matrix linking communities to node attribute categories.

  • maxL – Maximum pseudo log-likelihood.
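
The four initialization modes described above can be summarized in a small sketch. This is only an illustration of the documented behaviour, not the library's code: the names FROM_FILE and init_params, the loaded dictionary standing in for the contents of the input file, and the array shapes are all assumptions.

```python
import numpy as np

# Which parameters each initialization mode takes from file (per the description above)
FROM_FILE = {0: set(), 1: {"w"}, 2: {"u", "v"}, 3: {"u", "v", "w"}}


def init_params(initialization, N, L, K, loaded=None, assortative=False, rng=None):
    """Hypothetical helper: build u, v, w from file contents or at random."""
    if initialization not in FROM_FILE:
        raise ValueError("initialization must be 0, 1, 2, or 3")
    rng = np.random.default_rng() if rng is None else rng
    loaded = loaded or {}
    # Assumed shapes: one affinity value per community when assortative,
    # a full K x K block per layer otherwise
    shapes = {"u": (N, K), "v": (N, K), "w": (L, K) if assortative else (L, K, K)}
    params = {}
    for name, shape in shapes.items():
        if name in FROM_FILE[initialization]:
            # Stand-in for reading the parameter from the file given by `files`
            params[name] = np.asarray(loaded[name], dtype=float)
        else:
            # Random initialization, reproducible when a seeded rng is passed
            params[name] = rng.random(shape)
    return params


# Mode 1: only w comes from "file"; u and v are drawn at random
params = init_params(1, N=4, L=2, K=3, loaded={"w": np.ones((2, 3, 3))},
                     rng=np.random.default_rng(0))
```

Passing a seeded generator, as in the usage line, keeps the random part of the initialization reproducible across runs.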

get_params_to_load_data(args: Namespace) -> Dict[str, Any]

Get the parameters needed to load the data from the parsed arguments.

load_data(files: str, adj_name: str, **kwargs)

Load data from the input folder.

preprocess_data_for_fit(data: COO | ndarray, data_X: ndarray, batch_size: int | None = None) -> Tuple[COO | ndarray, ndarray, Sequence[ndarray], Sequence[ndarray], ndarray | None, Sequence[ndarray] | None, Sequence[ndarray] | None]

Preprocesses the input data for fitting the model.

This method handles the sparsity of the data, saves the indices of the non-zero entries, and optionally selects a subset of nodes for batch processing.

Parameters:
  • data (GraphDataType) – The graph adjacency tensor to be preprocessed.

  • data_X (np.ndarray) – The one-hot encoding version of the design matrix to be preprocessed.

  • batch_size (Optional[int], default=None) – The size of the subset of nodes to compute the likelihood with. If None, the method will automatically determine the batch size based on the number of nodes.

Returns:

  • preprocessed_data (GraphDataType) – The preprocessed graph adjacency tensor.

  • preprocessed_data_X (np.ndarray) – The preprocessed one-hot encoding version of the design matrix.

  • subs_nz (TupleArrays) – The indices of the non-zero entries in the data.

  • subs_X_nz (TupleArrays) – The indices of the non-zero entries in the design matrix.

  • subset_N (Optional[np.ndarray]) – The subset of nodes selected for batch processing. None if no subset is selected.

  • Subs (Optional[TupleArrays]) – The list of tuples representing the non-zero entries in the data. None if no subset is selected.

  • SubsX (Optional[TupleArrays]) – The list of tuples representing the non-zero entries in the design matrix. None if no subset is selected.
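
The preprocessing steps above (recording non-zero indices and optionally drawing a node subset) can be sketched with plain NumPy. This is a simplified stand-in, not the library's method: the name preprocess_sketch is hypothetical, only dense arrays are handled, and, unlike the documented behaviour, a batch_size of None is assumed here to mean that no subset is drawn.

```python
import numpy as np


def preprocess_sketch(data, data_X, batch_size=None, rng=None):
    """Hypothetical sketch: non-zero indices plus an optional random node subset."""
    subs_nz = data.nonzero()      # (layer, source, target) index arrays
    subs_X_nz = data_X.nonzero()  # (node, attribute category) index arrays
    n_nodes = data.shape[1]
    # Assumption: no batch_size (or one covering all nodes) means no subsetting
    if batch_size is None or batch_size >= n_nodes:
        return data, data_X, subs_nz, subs_X_nz, None, None, None
    rng = np.random.default_rng() if rng is None else rng
    # Draw the node subset without replacement
    subset_N = np.sort(rng.choice(n_nodes, size=batch_size, replace=False))
    # Keep only edges whose source and target both fall in the subset
    keep = np.isin(subs_nz[1], subset_N) & np.isin(subs_nz[2], subset_N)
    Subs = list(zip(*(idx[keep] for idx in subs_nz)))
    # Keep only attribute entries of nodes in the subset
    keep_X = np.isin(subs_X_nz[0], subset_N)
    SubsX = list(zip(*(idx[keep_X] for idx in subs_X_nz)))
    return data, data_X, subs_nz, subs_X_nz, subset_N, Subs, SubsX
```

Working with index tuples instead of the full tensor means later updates only need to touch the non-zero entries, which is the usual motivation for this kind of preprocessing on sparse networks.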