probinet.input.loader

Functions for handling the data.

Functions

build_adjacency_and_design_from_file(in_folder)

Import data, i.e. the adjacency tensor and the design matrix, from a given folder.

build_adjacency_from_file(path_to_file[, ...])

Import data, i.e. the adjacency matrix, from a given file.

build_adjacency_from_networkx(network, ...)

Import a NetworkX graph and convert it to a GraphData object.

read_and_process_design_matrix(...)

Read and process the design matrix with covariates.

read_design_matrix(df_X, nodes[, attribute, ego])

Create the design matrix with the one-hot encoding of the given attribute.

read_graph(df_adj[, ego, alter, undirected, ...])

Create the graph by adding edges and nodes.

probinet.input.loader.build_adjacency_and_design_from_file(in_folder: str, adj_name: str = 'multilayer_network.csv', cov_name: str = 'X.csv', ego: str = 'source', egoX: str = 'Name', alter: str = 'target', attr_name: str = 'Metadata', undirected: bool = False, force_dense: bool = True, noselfloop: bool = True, sep: str = ',', header: int | None = 0, return_X_as_np: bool = True, **_kwargs) -> GraphData

Import data, i.e. the adjacency tensor and the design matrix, from a given folder.

Parameters:
  • in_folder (str) – Path of the folder containing the input files.

  • adj_name (str) – Input file name of the adjacency tensor.

  • cov_name (str) – Input file name of the design matrix.

  • ego (str) – Name of the column to consider as the source of the edge.

  • egoX (str) – Name of the column to consider as node IDs in the design matrix-attribute dataset.

  • alter (str) – Name of the column to consider as the target of the edge.

  • attr_name (str) – Name of the attribute to consider in the analysis.

  • undirected (bool) – If set to True, the algorithm considers an undirected graph.

  • force_dense (bool) – If set to True, the algorithm is forced to consider a dense adjacency tensor.

  • noselfloop (bool) – If set to True, the algorithm removes the self-loops.

  • sep (str) – Separator to use when reading the dataset.

  • header (int | None) – Row number to use as the column names, and the start of the data.

  • return_X_as_np (bool) – If set to True, the design matrix is returned as a numpy array.

  • _kwargs – Additional keyword arguments.

Returns:

  • A (list of nx.MultiDiGraph) – List of MultiDiGraph NetworkX objects representing the layers of the network.

  • B (ndarray or sparse.COO) – Graph adjacency tensor. If force_dense is True, returns a dense ndarray. Otherwise, returns a sparse COO tensor.

  • X_attr (pd.DataFrame or None) – Pandas DataFrame object representing the one-hot encoding version of the design matrix. Returns None if the design matrix is not provided.

  • nodes (list of str) – List of node IDs.
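The default arguments imply a folder holding two comma-separated files, multilayer_network.csv and X.csv. The sketch below builds such a folder with the standard library only. The source, target, Name, and Metadata column names come from the defaults above; the layer weight columns (L1, L2), the node IDs, and the attribute values are hypothetical, and the exact edge-list layout the loader expects is an assumption. The final call is left commented out because it requires probinet to be installed.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical input folder; a temporary directory keeps the sketch self-contained.
in_folder = Path(tempfile.mkdtemp())

# Edge list with the default ego/alter column names and one ASSUMED
# weight column per layer (L1 and L2 are illustrative names).
edge_rows = [
    {"source": "a", "target": "b", "L1": 1, "L2": 0},
    {"source": "b", "target": "c", "L1": 0, "L2": 1},
    {"source": "c", "target": "a", "L1": 1, "L2": 1},
]
with open(in_folder / "multilayer_network.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["source", "target", "L1", "L2"])
    writer.writeheader()
    writer.writerows(edge_rows)

# Covariates with the default egoX ("Name") and attr_name ("Metadata") columns.
covariate_rows = [
    {"Name": "a", "Metadata": "group_1"},
    {"Name": "b", "Metadata": "group_2"},
    {"Name": "c", "Metadata": "group_1"},
]
with open(in_folder / "X.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["Name", "Metadata"])
    writer.writeheader()
    writer.writerows(covariate_rows)

# With probinet installed, the folder could then be loaded along these lines:
# from probinet.input.loader import build_adjacency_and_design_from_file
# gdata = build_adjacency_and_design_from_file(str(in_folder))
```

If the files use other column names, the ego, alter, egoX, and attr_name arguments are the ones to change.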

probinet.input.loader.build_adjacency_from_file(path_to_file: PathLike, ego: str = 'source', alter: str = 'target', force_dense: bool = True, undirected: bool = False, noselfloop: bool = True, sep: str = '\\s+', binary: bool = True, header: int | None = 0, **_kwargs: Any) -> GraphData

Import data, i.e. the adjacency matrix, from a given file.

Return the NetworkX graph and its numpy adjacency matrix.

Parameters:
  • path_to_file (PathLike) – Path of the input file.

  • ego (str) – Name of the column to consider as the source of the edge.

  • alter (str) – Name of the column to consider as the target of the edge.

  • force_dense (bool) – If set to True, the algorithm is forced to consider a dense adjacency tensor.

  • undirected (bool) – If set to True, the algorithm considers an undirected graph.

  • noselfloop (bool) – If set to True, the algorithm removes the self-loops.

  • sep (str) – Separator to use when reading the dataset.

  • binary (bool) – If set to True, the algorithm reads the graph with binary edges.

  • header (int | None) – Row number to use as the column names, and the start of the data.

Returns:

Named tuple containing the graph list, the adjacency tensor, the transposed tensor, the data values, and the nodes.

Return type:

GraphData
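Unlike build_adjacency_and_design_from_file, this function defaults to the regular expression \s+ as separator, so columns may be split by any run of spaces or tabs. The stdlib sketch below illustrates that splitting rule on hypothetical file contents; it does not reproduce the loader's own parsing, and the weight column name is an assumption.

```python
import re

SEP = r"\s+"  # same pattern as the default `sep` of build_adjacency_from_file

# Hypothetical file contents: a header row, then one edge per line,
# with columns separated by a mix of spaces and tabs.
raw = "source target weight\na   b\t1\nb c  2\n"

# Split every line on runs of whitespace.
rows = [re.split(SEP, line.strip()) for line in raw.splitlines()]
header, records = rows[0], rows[1:]
```

For a comma-separated file, passing sep=',' would be the corresponding choice.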

probinet.input.loader.build_adjacency_from_networkx(network: Graph, weight_list: list[str], file_name: PathLike | None = None) -> GraphData

Import a NetworkX graph and convert it to a GraphData object.

Parameters:
  • network – NetworkX graph to convert to a GraphData object.

  • weight_list – List of the names of the weights to use from the NetworkX graph.

  • file_name – Path and name of the CSV file created from the NetworkX graph and used to build the GraphData object, e.g. /path/to/file/file_name.csv.

Returns:

GraphData object created from the NetworkX graph.

Return type:

GraphData

probinet.input.loader.read_and_process_design_matrix(in_folder_path: PathLike, cov_name: str, sep: str, header: int | None, nodes: list[str], attr_name: str, egoX: str) -> DataFrame

Read and process the design matrix with covariates.

Parameters:
  • in_folder_path – Path to the folder containing the input files.

  • cov_name – Name of the covariate file.

  • sep (str) – Separator to use when reading the covariate file.

  • header – Row number to use as the column names, and the start of the data.

  • nodes – List of node IDs.

  • attr_name – Name of the attribute to consider in the analysis.

  • egoX (str) – Name of the column to consider as node IDs in the design matrix.

Returns:

X_attr – Pandas DataFrame that represents the one-hot encoding version of the design matrix.

Return type:

DataFrame

probinet.input.loader.read_design_matrix(df_X: DataFrame, nodes: list, attribute: str | None = None, ego: str = 'Name')

Create the design matrix with the one-hot encoding of the given attribute.

Parameters:
  • df_X (DataFrame) – Pandas DataFrame object containing the covariates of the nodes.

  • nodes (list) – List of node IDs.

  • attribute (str) – Name of the attribute to consider in the analysis.

  • ego (str) – Name of the column to consider as node IDs in the design matrix.

Returns:

X_attr – Pandas DataFrame that represents the one-hot encoding version of the design matrix.

Return type:

DataFrame
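One-hot encoding turns a categorical attribute into one indicator column per category. The stdlib sketch below illustrates the idea on hypothetical data. It is not the library's implementation, which operates on pandas DataFrames, and ordering the rows by nodes and the columns alphabetically is an assumption made for the example.

```python
# Hypothetical covariate rows keyed by the default ID column "Name".
rows = [
    {"Name": "a", "Metadata": "red"},
    {"Name": "b", "Metadata": "blue"},
    {"Name": "c", "Metadata": "red"},
]
nodes = ["a", "b", "c"]
attribute = "Metadata"

# Map each node ID to its attribute value.
value_of = {row["Name"]: row[attribute] for row in rows}

# One indicator column per distinct category, in sorted order.
categories = sorted(set(value_of.values()))

# One row per node: 1 in the column of its category, 0 elsewhere.
one_hot = {
    node: [int(value_of[node] == cat) for cat in categories]
    for node in nodes
}
```

Each row contains exactly one 1, so the row sums of the encoded matrix are all equal to one.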

probinet.input.loader.read_graph(df_adj: DataFrame, ego: str = 'source', alter: str = 'target', undirected: bool = False, noselfloop: bool = True, binary: bool = True, label: str = 'weight') -> list[MultiDiGraph]

Create the graph by adding edges and nodes.

Return the list of MultiGraph (or MultiDiGraph if undirected=False) NetworkX objects. The graphs are built by adding edges and nodes from the given DataFrame. Each graph in the output has an edge attribute whose name is given by label.

Parameters:
  • df_adj (DataFrame) – Pandas DataFrame object containing the edges of the graph.

  • ego (str) – Name of the column to consider as the source of the edge.

  • alter (str) – Name of the column to consider as the target of the edge.

  • undirected (bool) – If set to True, the algorithm considers an undirected graph.

  • noselfloop (bool) – If set to True, the algorithm removes the self-loops.

  • binary (bool) – If set to True, read the graph with binary edges.

  • label (str) – Name to be assigned to the edge attribute, across all layers.

Returns:

A – List of MultiGraph (or MultiDiGraph if undirected=False) NetworkX objects.

Return type:

list
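The noselfloop and binary flags can be pictured as a filter and a transformation applied to the edge list before the graphs are built. The stdlib sketch below shows one plausible reading of the two flags on hypothetical edges: it assumes that binary=True replaces every weight with 1 and that noselfloop=True drops edges whose source equals their target. It is not the library's code.

```python
# Hypothetical weighted edges for a single layer: (source, target, weight).
edges = [("a", "b", 3), ("b", "b", 1), ("c", "a", 2)]

noselfloop = True  # assumed meaning: drop edges with source == target
binary = True      # assumed meaning: replace every weight with 1

kept = [
    (u, v, 1 if binary else w)
    for u, v, w in edges
    if not (noselfloop and u == v)
]
```

With both flags set to False, the edge list would pass through unchanged under these assumptions.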