probinet.utils.tools

This module contains utility functions for data manipulation and file I/O.

Functions

build_edgelist(A, layer)

Build the edgelist for a given layer of an adjacency tensor in DataFrame format.

can_cast_to_int(string)

Check whether an object can be converted to an integer.

check_symmetric(a[, rtol, atol])

Check if a matrix or a list of matrices is symmetric.

create_design_matrix(metadata[, nodeID, ...])

Create the design matrix DataFrame from metadata.

flt(x[, d])

Round a number to a specified number of decimal places.

get_item_array_from_subs(A, ref_subs)

Retrieves the values of specific entries in a dense tensor.

get_or_create_rng([rng])

Return the given random number generator, or create a new one if none is provided.

is_sparse(X)

Check whether the input tensor is sparse.

log_and_raise_error(error_type, message)

Logs an error message and raises an exception of the specified type.

output_adjacency(A, out_folder, label)

Save the adjacency tensor to a file.

save_design_matrix(X, perc[, folder, fname])

Save the design matrix to file.

sptensor_from_dense_array(X)

Create a sparse tensor from a dense array using sparse.COO.

write_adjacency(G[, folder, fname, ego, alter])

Save the adjacency tensor to file.

write_design_matrix(metadata, perc[, ...])

Save the design matrix to file.

probinet.utils.tools.build_edgelist(A: COO, layer: int) → DataFrame

Build the edgelist for a given layer of an adjacency tensor in DataFrame format.

Parameters:
  • A (COO) – Sparse tensor in COOrdinate format (sparse.COO) representing the adjacency tensor.

  • layer (int) – Index of the layer for which the edgelist is to be built.

Returns:

DataFrame containing the edgelist for the specified layer with columns ‘source’, ‘target’, and ‘L<layer>’.

Return type:

pd.DataFrame

probinet.utils.tools.can_cast_to_int(string: int | float | str) → bool

Check whether an object can be converted to an integer.

Parameters:

string (int or float or str) – Name of the node.

Returns:

True if the input can be converted to an integer, False otherwise.

Return type:

bool
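The check described above can be sketched with a plain try/except around `int()`. This is a minimal illustration, not the library's implementation: the name `can_cast_to_int_sketch` is hypothetical, and the handling of edge cases such as NaN or infinity is an assumption, since the reference does not specify it.

```python
def can_cast_to_int_sketch(value):
    """Return True if int(value) succeeds, False otherwise (hypothetical helper)."""
    try:
        int(value)
        return True
    except (ValueError, TypeError, OverflowError):
        # ValueError: non-numeric strings or NaN; OverflowError: infinity;
        # TypeError: objects that int() does not accept at all.
        return False


print(can_cast_to_int_sketch("12"))      # numeric string
print(can_cast_to_int_sketch("node_a"))  # non-numeric node label
```

A check like this is useful when node labels are read from file as strings and should be converted to integers only when every label allows it.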

probinet.utils.tools.check_symmetric(a: ndarray | List[ndarray], rtol: float = 1e-05, atol: float = 1e-08) → bool

Check if a matrix or a list of matrices is symmetric.

Parameters:
  • a (ndarray or list) – Matrix, or list of matrices, to check.

  • rtol (float) – Relative tolerance.

  • atol (float) – Absolute tolerance.

Returns:

True if the matrix is symmetric, False otherwise.

Return type:

bool
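A symmetry check with relative and absolute tolerances can be sketched with `numpy.allclose`, which treats two values as equal when `|a - b| <= atol + rtol * |b|`. The sketch below is an illustration only: the name `check_symmetric_sketch` is hypothetical, and it assumes that a list counts as symmetric only when every matrix in it is symmetric.

```python
import numpy as np


def check_symmetric_sketch(a, rtol=1e-05, atol=1e-08):
    """Compare each (square) matrix with its transpose within the tolerances."""
    if isinstance(a, list):
        # Assumption: a list is symmetric only if all of its matrices are.
        return all(np.allclose(m, m.T, rtol=rtol, atol=atol) for m in a)
    return bool(np.allclose(a, a.T, rtol=rtol, atol=atol))


sym = np.array([[1.0, 2.0], [2.0, 1.0]])
asym = np.array([[1.0, 2.0], [3.0, 1.0]])

print(check_symmetric_sketch(sym))          # symmetric matrix
print(check_symmetric_sketch([sym, asym]))  # list containing an asymmetric matrix
```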

probinet.utils.tools.create_design_matrix(metadata: Dict[str, str], nodeID: str = 'Name', attr_name: str = 'Metadata') → DataFrame

Create the design matrix DataFrame from metadata.

Parameters:
  • metadata (dict) – Dictionary where the keys are the node labels and the values are the metadata associated to them.

  • nodeID (str) – Name of the column with the node labels.

  • attr_name (str) – Name of the column to consider as attribute.

Returns:

X – Design matrix

Return type:

DataFrame

probinet.utils.tools.flt(x: float, d: int = 3) → float

Round a number to a specified number of decimal places.

Parameters:
  • x (float) – Number to be rounded.

  • d (int) – Number of decimal places to round to.

Returns:

The input number rounded to the specified number of decimal places.

Return type:

float
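The rounding described above matches what Python's built-in `round` does with a second argument. The following sketch assumes that behavior; `flt_sketch` is a hypothetical name and not the library's code.

```python
def flt_sketch(x, d=3):
    """Round x to d decimal places using the built-in round()."""
    return round(x, d)


print(flt_sketch(3.14159))     # default of three decimal places
print(flt_sketch(3.14159, 1))  # one decimal place
```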

probinet.utils.tools.get_item_array_from_subs(A: ndarray, ref_subs: Sequence[ndarray]) → ndarray

Retrieves the values of specific entries in a dense tensor. The output is a 1-D array with one value per requested entry, typically one per non-zero entry.

Parameters:
  • A (np.ndarray) – The input tensor from which values are to be retrieved.

  • ref_subs (Tuple[np.ndarray]) – A tuple containing arrays of indices. Each array in the tuple corresponds to indices along one dimension of the tensor.

Returns:

A 1-dimensional array containing the values of the tensor at the specified indices.

Return type:

np.ndarray
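Looking up tensor entries from per-dimension index arrays corresponds to NumPy's integer array indexing. The sketch below illustrates the idea under that assumption; `get_item_array_from_subs_sketch` is a hypothetical name, and the library may implement the lookup differently.

```python
import numpy as np


def get_item_array_from_subs_sketch(A, ref_subs):
    """Return A[i, j, k, ...] for each index tuple given by ref_subs."""
    # Converting to a tuple triggers element-wise (fancy) indexing:
    # one index array per dimension, one output value per position.
    return A[tuple(ref_subs)]


A = np.arange(24).reshape(2, 3, 4)
subs = (np.array([0, 1]), np.array([2, 0]), np.array([3, 1]))

values = get_item_array_from_subs_sketch(A, subs)
print(values)  # entries A[0, 2, 3] and A[1, 0, 1]
```

Passing `A.nonzero()` as `ref_subs` returns the values of all non-zero entries, which is the typical use when aligning a dense tensor with the support of a sparse one.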

probinet.utils.tools.get_or_create_rng(rng: Generator | None = None) → Generator

Return the given random number generator, or create a new one if none is provided.

Parameters:

rng (Optional[np.random.Generator]) – Random number generator. If None, a new generator is created.

Returns:

Initialized random number generator.

Return type:

np.random.Generator
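The "use the generator if given, otherwise create one" pattern can be sketched as follows. How the library seeds a newly created generator is not stated in this reference, so the sketch assumes an unseeded `numpy.random.default_rng()`; `get_or_create_rng_sketch` is a hypothetical name.

```python
import numpy as np


def get_or_create_rng_sketch(rng=None):
    """Return rng unchanged if provided, else a fresh Generator."""
    if rng is None:
        # Assumption: no seed is available here, so fresh OS entropy is used.
        return np.random.default_rng()
    return rng


seeded = np.random.default_rng(42)
same = get_or_create_rng_sketch(seeded)  # the caller's generator is reused
fresh = get_or_create_rng_sketch()       # a new generator is created
print(same is seeded, isinstance(fresh, np.random.Generator))
```

Passing one seeded generator through every function that needs randomness keeps an experiment reproducible without relying on global random state.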

probinet.utils.tools.is_sparse(X: ndarray) → bool

Check whether the input tensor is sparse. It implements a heuristic definition of sparsity: given

  • M = number of modes,

  • S = number of entries,

  • I = number of non-zero entries,

the tensor is considered sparse if S > M(I + 1).

Parameters:

X (ndarray) – Input data.

Returns:

True if the input tensor is sparse, False otherwise.

Return type:

bool
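The heuristic can be sketched with NumPy as shown below. The sketch assumes the condition compares the total number of entries S with M(I + 1), which is roughly the number of values needed to store the tensor in coordinate format; `is_sparse_sketch` is a hypothetical name, not the library's code.

```python
import numpy as np


def is_sparse_sketch(X):
    """Heuristic sparsity test: S > M * (I + 1)."""
    M = X.ndim               # number of modes (dimensions)
    S = X.size               # total number of entries
    I = np.count_nonzero(X)  # number of non-zero entries
    return S > M * (I + 1)


dense = np.ones((3, 3))      # S = 9, M = 2, I = 9
mostly_zero = np.zeros((10, 10))
mostly_zero[0, 1] = 1.0      # S = 100, M = 2, I = 1

print(is_sparse_sketch(dense), is_sparse_sketch(mostly_zero))
```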

probinet.utils.tools.log_and_raise_error(error_type: Type[BaseException], message: str) → None

Logs an error message and raises an exception of the specified type.

Parameters:
  • error_type (Type[BaseException]) – The type of the exception to be raised.

  • message (str) – The error message to be logged and included in the exception.

Raises:

BaseException – An exception of the specified type with the given message.
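A log-then-raise helper of this kind can be sketched with the standard `logging` module. Which logger and log level the library uses is not stated in this reference, so the root logger at ERROR level is an assumption; `log_and_raise_error_sketch` is a hypothetical name.

```python
import logging


def log_and_raise_error_sketch(error_type, message):
    """Log the message at ERROR level, then raise error_type(message)."""
    logging.error(message)
    raise error_type(message)


try:
    log_and_raise_error_sketch(ValueError, "Invalid layer index.")
except ValueError as exc:
    caught = str(exc)  # the exception carries the same message that was logged

print(caught)
```

Routing errors through one helper keeps the logged text and the exception message identical, so a failure in the logs can be matched to its traceback.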

probinet.utils.tools.output_adjacency(A: List, out_folder: PathLike, label: str)

Save the adjacency tensor to a file. The default format is a space-separated .csv with L+2 columns: source_node target_node edge_l0 … edge_lL.

Parameters:
  • A (ndarray) – Adjacency tensor.

  • out_folder (str) – Output folder.

  • label (str) – Label of the evaluation file.

probinet.utils.tools.save_design_matrix(X: DataFrame, perc: float, folder: str = './', fname: str = 'X_')

Save the design matrix to file.

Parameters:
  • X (DataFrame) – Design matrix.

  • perc (float) – Fraction of match between communities and metadata.

  • folder (str) – Path of the folder where to save the files.

  • fname (str) – Name of the design matrix file.

probinet.utils.tools.sptensor_from_dense_array(X: ndarray) → COO

Create a sparse tensor from a dense array using sparse.COO.

Parameters:

X (ndarray) – Input data.

Returns:

Sparse tensor created from the dense array.

Return type:

COO

probinet.utils.tools.write_adjacency(G: List[MultiDiGraph], folder: str = './', fname: str = 'multilayer_network.csv', ego: str = 'source', alter: str = 'target')

Save the adjacency tensor to file.

Parameters:
  • G (list) – List of MultiDiGraph NetworkX objects.

  • folder (str) – Path of the folder where to save the files.

  • fname (str) – Name of the adjacency tensor file.

  • ego (str) – Name of the column to consider as source of the edge.

  • alter (str) – Name of the column to consider as target of the edge.

probinet.utils.tools.write_design_matrix(metadata: Dict[str, str], perc: float, folder: str = './', fname: str = 'X_', nodeID: str = 'Name', attr_name: str = 'Metadata') → DataFrame

Save the design matrix to file.

Parameters:
  • metadata (dict) – Dictionary where the keys are the node labels and the values are the metadata associated to them.

  • perc (float) – Fraction of match between communities and metadata.

  • folder (str) – Path of the folder where to save the files.

  • fname (str) – Name of the design matrix file.

  • nodeID (str) – Name of the column with the node labels.

  • attr_name (str) – Name of the column to consider as attribute.

Returns:

X – Design matrix

Return type:

DataFrame