probinet.input.loader

Functions for handling the data.

Functions

build_adjacency_and_design_from_file(in_folder)

Import data, i.e. the adjacency tensor and the design matrix, from a given folder.

build_adjacency_from_file(path_to_file[, ...])

Import data, i.e., the adjacency matrix, from a given file.

build_adjacency_from_networkx(network, ...)

Import a NetworkX graph and convert it to a GraphData object.

read_and_process_design_matrix(...)

Read and process the design matrix with covariates.

read_design_matrix(df_X, nodes[, attribute, ego])

Create the design matrix with the one-hot encoding of the given attribute.

read_graph(df_adj[, ego, alter, undirected, ...])

Create the graph by adding edges and nodes.

probinet.input.loader.build_adjacency_and_design_from_file(in_folder: str, adj_name: str = 'synthetic_multilayer_network.csv', cov_name: str = 'synthetic_design_matrix.csv', ego: str = 'source', egoX: str = 'Name', alter: str = 'target', attr_name: str = 'Metadata', undirected: bool = False, binary: bool = True, force_dense: bool = True, noselfloop: bool = True, sep: str = ',', header: int | None = 0, return_X_as_np: bool = True) → GraphData

Import data, i.e. the adjacency tensor and the design matrix, from a given folder.

Parameters:
  • in_folder (str) – Path of the folder containing the input files.

  • adj_name (str) – Input file name of the adjacency tensor.

  • cov_name (str) – Input file name of the design matrix.

  • ego (str) – Name of the column to consider as the source of the edge.

  • egoX (str) – Name of the column to consider as node IDs in the design matrix-attribute dataset.

  • alter (str) – Name of the column to consider as the target of the edge.

  • attr_name (str) – Name of the attribute to consider in the analysis.

  • undirected (bool) – If set to True, the network is considered undirected.

  • binary (bool) – If set to True, the network is treated as binary.

  • force_dense (bool) – If set to True, the network is saved in a dense adjacency tensor.

  • noselfloop (bool) – If set to True, the self-loops are removed.

  • sep (str) – Separator to use when reading the dataset.

  • header (int | None) – Row number to use as the column names, and the start of the data.

  • return_X_as_np (bool) – If set to True, the design matrix is returned as a numpy array.

Returns:

  • A (list of nx.MultiDiGraph) – List of MultiDiGraph NetworkX objects representing the layers of the network.

  • B (ndarray or sparse.COO) – Graph adjacency tensor. If force_dense is True, returns a dense ndarray. Otherwise, returns a sparse COO tensor.

  • X_attr (pd.DataFrame, ndarray, or None) – One-hot encoded version of the design matrix. It is a Pandas DataFrame unless return_X_as_np is True, in which case it is returned as a numpy array. Returns None if the design matrix is not provided.

  • nodes (list of str) – List of node IDs.
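The defaults above imply a folder holding two comma-separated files. The following is a minimal sketch of such a folder, not a specification of the expected format: the layer columns `L1` and `L2`, the node names, and the helper names `write_example_inputs` and `load_example` are all hypothetical, and the assumption that each column after `source` and `target` holds the edge weight in one layer is not stated in this page.

```python
import csv
import tempfile
from pathlib import Path


def write_example_inputs(folder):
    """Write two small CSV files using the default file and column names."""
    adj_path = Path(folder) / "synthetic_multilayer_network.csv"
    cov_path = Path(folder) / "synthetic_design_matrix.csv"

    # Edge list: 'source' and 'target' match the ego/alter defaults.
    # Assumption: the remaining columns hold one weight per layer.
    with adj_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["source", "target", "L1", "L2"])
        writer.writerows([["n1", "n2", 1, 0], ["n2", "n3", 1, 1], ["n3", "n1", 0, 1]])

    # Covariates: 'Name' and 'Metadata' match the egoX/attr_name defaults.
    with cov_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Name", "Metadata"])
        writer.writerows([["n1", "a"], ["n2", "b"], ["n3", "a"]])

    return adj_path, cov_path


def load_example(folder):
    """Call the loader on the folder (not executed in this sketch)."""
    # Imported lazily so the file-writing part runs without probinet installed.
    from probinet.input.loader import build_adjacency_and_design_from_file

    return build_adjacency_and_design_from_file(
        str(folder), undirected=False, binary=True, force_dense=True
    )


tmp_folder = tempfile.mkdtemp()
adj_path, cov_path = write_example_inputs(tmp_folder)
```

Calling `load_example(tmp_folder)` would then hand the folder to the loader with the documented keyword arguments; whether the folder path needs a trailing separator is not specified here.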

probinet.input.loader.build_adjacency_from_file(path_to_file: PathLike, ego: str = 'source', alter: str = 'target', force_dense: bool = True, undirected: bool = False, noselfloop: bool = True, sep: str = '\\s+', binary: bool = True, header: int | None = 0, **_kwargs: Any) → GraphData

Import data, i.e., the adjacency matrix, from a given file.

Return the NetworkX graph and its numpy adjacency matrix.

Parameters:
  • path_to_file – Path of the input file.

  • ego – Name of the column to consider as the source of the edge.

  • alter – Name of the column to consider as the target of the edge.

  • force_dense – If set to True, the network is saved in a dense adjacency tensor.

  • undirected – If set to True, the network is considered undirected.

  • noselfloop – If set to True, the self-loops are removed.

  • sep – Separator to use when reading the dataset.

  • binary – If set to True, the network is treated as binary.

  • header – Row number to use as the column names, and the start of the data.

Returns:

Named tuple containing the graph list, the adjacency tensor, the transposed tensor, the data values, and the nodes.

Return type:

GraphData
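Unlike the folder-based loader, this function defaults to `sep='\\s+'`, a regular expression that matches any run of whitespace, and to `header=0`, so the first row supplies the column names. The sketch below only illustrates what that separator means for an edge-list file, using the standard library; it is not the library's parsing code, and the sample rows and the `weight` column are made up.

```python
import re

# A whitespace-delimited edge list; spacing is deliberately irregular
# (single space, several spaces, and a tab).
raw = """source target weight
a b 1
b   c 2
c\ta 1
"""

lines = raw.strip().splitlines()

# header=0: the first row gives the column names.
columns = re.split(r"\s+", lines[0].strip())

# Every later row is split on runs of whitespace, as the pattern \s+ implies.
records = [dict(zip(columns, re.split(r"\s+", line.strip()))) for line in lines[1:]]

for record in records:
    print(record["source"], "->", record["target"], "weight", record["weight"])
```

A comma-separated file would need `sep=','` passed explicitly to this function.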

probinet.input.loader.build_adjacency_from_networkx(network: Graph, weight_list: list[str], file_name: PathLike | None = None) → GraphData

Import a NetworkX graph and convert it to a GraphData object.

Parameters:
  • network – NetworkX graph to be converted to a GraphData object.

  • weight_list – List of the names of the edge weights to use from the NetworkX graph.

  • file_name – Name (including path) of the CSV file created from the NetworkX graph and used to build the GraphData object, e.g. /path/to/file/file_name.csv.

Returns:

GraphData object created from the NetworkX graph.

Return type:

GraphData
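As a rough sketch of how the arguments fit together, the snippet below builds a small directed graph whose edges carry two named weights and prepares a matching `weight_list`. The attribute names `weight` and `trust`, the node names, and the wrapper `to_graph_data` are hypothetical; how the listed weights are mapped onto the resulting GraphData is not described in this page.

```python
import networkx as nx

# A small directed graph whose edges carry two named weights.
network = nx.DiGraph()
network.add_edge("n1", "n2", weight=1.0, trust=3)
network.add_edge("n2", "n3", weight=2.0, trust=1)
network.add_edge("n3", "n1", weight=1.0, trust=2)

# Names of the edge attributes to hand over to the loader.
weight_list = ["weight", "trust"]

# Sanity check before conversion: every requested name exists on the edges.
edge_attrs = {name for _, _, data in network.edges(data=True) for name in data}
missing = [name for name in weight_list if name not in edge_attrs]


def to_graph_data(network, weight_list, file_name=None):
    """Convert the graph with probinet (not executed in this sketch)."""
    # Imported lazily so the graph-building part runs without probinet installed.
    from probinet.input.loader import build_adjacency_from_networkx

    return build_adjacency_from_networkx(network, weight_list, file_name=file_name)
```

Checking `missing` first is a design choice of this sketch, meant to surface misspelled attribute names before the conversion is attempted.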

probinet.input.loader.read_and_process_design_matrix(in_folder_path: PathLike, cov_name: str, sep: str, header: int | None, nodes: list[str], attr_name: str, egoX: str) → DataFrame

Read and process the design matrix with covariates.

Parameters:
  • in_folder_path – Path to the folder containing the input files.

  • cov_name – Name of the covariate file.

  • sep (str) – Separator to use when reading the covariate file.

  • header – Row number to use as the column names, and the start of the data.

  • nodes – List of node IDs.

  • attr_name – Name of the attribute to consider in the analysis.

  • egoX (str) – Name of the column to consider as node IDs in the design matrix.

Returns:

X_attr – Pandas DataFrame that represents the one-hot encoding version of the design matrix.

Return type:

DataFrame

probinet.input.loader.read_design_matrix(df_X: DataFrame, nodes: list, attribute: str | None = None, ego: str = 'Name')

Create the design matrix with the one-hot encoding of the given attribute.

Parameters:
  • df_X (DataFrame) – Pandas DataFrame object containing the covariates of the nodes.

  • nodes (list) – List of node IDs.

  • attribute (str | None) – Name of the attribute to consider in the analysis.

  • ego (str) – Name of the column to consider as node IDs in the design matrix.

Returns:

X_attr – Pandas DataFrame that represents the one-hot encoding version of the design matrix.

Return type:

DataFrame
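To make "one-hot encoding of the given attribute" concrete, here is a minimal sketch using `pandas.get_dummies`. It illustrates the idea only and is not the library's implementation; the sample data are made up, and aligning the rows to the order of `nodes` is an assumption of this sketch.

```python
import pandas as pd

# Covariate table using the default column names 'Name' and 'Metadata'.
df_X = pd.DataFrame(
    {"Name": ["n1", "n2", "n3"], "Metadata": ["a", "b", "a"]}
)

# Node order to align the rows with (assumed here to follow the graph's nodes).
nodes = ["n3", "n1", "n2"]

# Select the attribute for each node, in the order given by `nodes`.
attribute_per_node = df_X.set_index("Name").loc[nodes, "Metadata"]

# One column per attribute value; 1 marks the value a node takes.
X_attr = pd.get_dummies(attribute_per_node).astype(int)

print(X_attr)
```

Each row of `X_attr` contains exactly one 1, in the column of the attribute value assigned to that node.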

probinet.input.loader.read_graph(df_adj: DataFrame, ego: str = 'source', alter: str = 'target', undirected: bool = False, noselfloop: bool = True, binary: bool = True, label: str = 'weight') → list[MultiDiGraph]

Create the graph by adding edges and nodes.

Return a list of MultiGraph (or MultiDiGraph if undirected=False) NetworkX objects. Each graph is built by adding the edges and nodes found in the given DataFrame, and every edge carries an attribute whose name is given by label.

Parameters:
  • df_adj (DataFrame) – Pandas DataFrame object containing the edges of the graph.

  • ego (str) – Name of the column to consider as the source of the edge.

  • alter (str) – Name of the column to consider as the target of the edge.

  • undirected (bool) – If set to True, the network is considered undirected.

  • noselfloop (bool) – If set to True, the self-loops are removed.

  • binary (bool) – If set to True, the network is treated as binary.

  • label (str) – Name to be assigned to the edge attribute, across all layers.

Returns:

A – List of MultiGraph (or MultiDiGraph if undirected=False) NetworkX objects.

Return type:

list
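The effect of the `noselfloop` and `binary` flags can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: it assumes each row holds a source, a target, and one weight per layer, and that a weight of zero means the edge is absent in that layer. The helper `layer_edges` and the sample rows are hypothetical, and the `undirected` flag is not covered.

```python
def layer_edges(rows, n_layers, noselfloop=True, binary=True):
    """Split rows of (source, target, w_0, ..., w_{L-1}) into per-layer edge lists."""
    layers = [[] for _ in range(n_layers)]
    for source, target, *weights in rows:
        # noselfloop=True: skip edges that start and end at the same node.
        if noselfloop and source == target:
            continue
        for layer, weight in enumerate(weights):
            # Assumption: a zero weight means no edge in this layer.
            if weight > 0:
                # binary=True: keep only the presence of the edge, not its weight.
                layers[layer].append((source, target, 1 if binary else weight))
    return layers


rows = [
    ("a", "b", 2, 0),
    ("b", "a", 1, 1),
    ("c", "c", 3, 0),  # self-loop
]

binary_layers = layer_edges(rows, n_layers=2)
weighted_layers = layer_edges(rows, n_layers=2, binary=False)
with_self_loops = layer_edges(rows, n_layers=2, noselfloop=False)
```

With the defaults, the self-loop on `c` is dropped and the weight 2 on the edge from `a` to `b` is reduced to 1.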