probinet.input.loader

Functions for handling the data.

Functions

`build_adjacency_and_design_from_file`(in_folder)	Import data, i.e. the adjacency tensor and the design matrix, from a given folder.
`build_adjacency_from_file`(path_to_file[, ...])	Import data, i.e. the adjacency matrix, from a given file.
`build_adjacency_from_networkx`(network, ...)	Import a NetworkX graph and convert it to a GraphData object.
`read_and_process_design_matrix`(...)	Read and process the design matrix with covariates.
`read_design_matrix`(df_X, nodes[, attribute, ego])	Create the design matrix with the one-hot encoding of the given attribute.
`read_graph`(df_adj[, ego, alter, undirected, ...])	Create the graph by adding edges and nodes.

probinet.input.loader.build_adjacency_and_design_from_file(in_folder: str, adj_name: str = 'synthetic_multilayer_network.csv', cov_name: str = 'synthetic_design_matrix.csv', ego: str = 'source', egoX: str = 'Name', alter: str = 'target', attr_name: str = 'Metadata', undirected: bool = False, binary: bool = True, force_dense: bool = True, noselfloop: bool = True, sep: str = ',', header: int | None = 0, return_X_as_np: bool = True) → GraphData

Import data, i.e. the adjacency tensor and the design matrix, from a given folder.

Parameters:

in_folder (str) – Path of the folder containing the input files.
adj_name (str) – Input file name of the adjacency tensor.
cov_name (str) – Input file name of the design matrix.
ego (str) – Name of the column to consider as the source of the edge.
egoX (str) – Name of the column to consider as node IDs in the design matrix-attribute dataset.
alter (str) – Name of the column to consider as the target of the edge.
attr_name (str) – Name of the attribute to consider in the analysis.
undirected (bool) – If set to True, the network is considered undirected.
binary (bool) – If set to True, the network is treated as binary.
force_dense (bool) – If set to True, the network is saved in a dense adjacency tensor.
noselfloop (bool) – If set to True, the self-loops are removed.
sep (str) – Separator to use when reading the dataset.
header (int | None) – Row number to use as the column names, and the start of the data.
return_X_as_np (bool) – If set to True, the design matrix is returned as a numpy array.

Returns:

A (list of nx.MultiDiGraph) – List of MultiDiGraph NetworkX objects representing the layers of the network.
B (ndarray or sparse.COO) – Graph adjacency tensor. If force_dense is True, returns a dense ndarray. Otherwise, returns a sparse COO tensor.
X_attr (pd.DataFrame or None) – Pandas DataFrame object representing the one-hot encoding version of the design matrix. Returns None if the design matrix is not provided.
nodes (list of str) – List of node IDs.
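
As a rough illustration of how the defaults line up with the input files, the sketch below writes two small CSV files using the default file and column names from the signature above. The layer column name (`L1`), the node IDs, and the attribute values are made up for the example, and the layout of the adjacency file (one weight column per layer after the source and target columns) is an assumption rather than something stated on this page. The commented-out call at the end shows how the loader might then be invoked; it is untested here.

```python
import csv
import tempfile
from pathlib import Path


def write_example_inputs(folder: Path) -> tuple[Path, Path]:
    """Write a toy edge list and a toy covariate table into `folder`."""
    adj_path = folder / "synthetic_multilayer_network.csv"  # default adj_name
    cov_path = folder / "synthetic_design_matrix.csv"  # default cov_name

    # Edge list: default ego/alter column names. "L1" is a hypothetical
    # layer-weight column (assumed layout, not confirmed by the docs).
    with adj_path.open("w", newline="") as f:
        writer = csv.writer(f)  # default delimiter "," matches sep=','
        writer.writerow(["source", "target", "L1"])
        writer.writerows([["a", "b", 1], ["b", "c", 1], ["c", "a", 1]])

    # Covariates: default egoX ("Name") and attr_name ("Metadata") columns.
    with cov_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Name", "Metadata"])
        writer.writerows([["a", "red"], ["b", "blue"], ["c", "red"]])

    return adj_path, cov_path


tmp_dir = Path(tempfile.mkdtemp())
adj_file, cov_file = write_example_inputs(tmp_dir)

# With probinet installed, the files could then be loaded along these lines:
# from probinet.input.loader import build_adjacency_and_design_from_file
# gdata = build_adjacency_and_design_from_file(str(tmp_dir))
```

Keeping the default names means only `in_folder` has to be passed; any file that uses different column names would need the matching `ego`, `alter`, `egoX`, and `attr_name` arguments.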

probinet.input.loader.build_adjacency_from_file(path_to_file: PathLike, ego: str = 'source', alter: str = 'target', force_dense: bool = True, undirected: bool = False, noselfloop: bool = True, sep: str = '\\s+', binary: bool = True, header: int | None = 0, **_kwargs: Any) → GraphData[source]#

Import data, i.e. the adjacency matrix, from a given file.

The NetworkX graphs and the adjacency tensor are returned together in a GraphData object.

Parameters:

path_to_file (PathLike) – Path of the input file.
ego (str) – Name of the column to consider as the source of the edge.
alter (str) – Name of the column to consider as the target of the edge.
force_dense (bool) – If set to True, the network is saved in a dense adjacency tensor.
undirected (bool) – If set to True, the network is considered undirected.
noselfloop (bool) – If set to True, the self-loops are removed.
sep (str) – Separator to use when reading the dataset.
binary (bool) – If set to True, the network is treated as binary.
header (int | None) – Row number to use as the column names, and the start of the data.

Returns:

Named tuple containing the graph list, the adjacency tensor, the transposed tensor, the data values, and the nodes.

Return type:

GraphData
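
The `force_dense` option chooses between two representations of the same adjacency tensor. The sketch below is not probinet's code; it uses plain Python lists and two hypothetical helpers (`to_dense`, `to_coo`) to show the difference for a tensor of shape (layers, nodes, nodes). The COO form here is just parallel coordinate and value lists, standing in for a sparse COO object.

```python
# (layer, source index, target index, weight)
Edge = tuple[int, int, int, float]


def to_dense(edges: list[Edge], n_layers: int, n_nodes: int) -> list[list[list[float]]]:
    """Store every entry, zeros included, in a layers x nodes x nodes array."""
    tensor = [[[0.0] * n_nodes for _ in range(n_nodes)] for _ in range(n_layers)]
    for layer, i, j, w in edges:
        tensor[layer][i][j] = w
    return tensor


def to_coo(edges: list[Edge]) -> tuple[tuple[list[int], list[int], list[int]], list[float]]:
    """Store only the non-zero entries as coordinate lists plus a value list."""
    layers, rows, cols, values = [], [], [], []
    for layer, i, j, w in edges:
        if w != 0:
            layers.append(layer)
            rows.append(i)
            cols.append(j)
            values.append(w)
    return (layers, rows, cols), values


edges = [(0, 0, 1, 1.0), (0, 1, 2, 1.0), (1, 2, 0, 1.0)]
dense = to_dense(edges, n_layers=2, n_nodes=3)
coords, values = to_coo(edges)

n_dense_entries = sum(len(row) for layer in dense for row in layer)
n_sparse_entries = len(values)
```

In this toy case the dense form holds 2 x 3 x 3 entries while the coordinate form holds one entry per edge, which is why a sparse representation can be preferable for large, sparse networks.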

probinet.input.loader.build_adjacency_from_networkx(network: Graph, weight_list: list[str], file_name: PathLike | None = None) → GraphData

Import a NetworkX graph and convert it to a GraphData object.

Parameters:

network (Graph) – NetworkX graph to convert to a GraphData object.
weight_list (list[str]) – Names of the edge weights to use from the NetworkX graph.
file_name (PathLike | None) – Name (and path) of the CSV file created from the NetworkX graph and used to build the GraphData object, e.g. /path/to/file/file_name.csv.

Returns:

GraphData object created from the NetworkX graph.

Return type:

GraphData
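
The `file_name` description suggests that the graph is first written out as a CSV edge list, with one column per entry in `weight_list`. A minimal sketch of that flattening step follows; it is an illustration under assumptions, not the library's implementation. The edge triples mimic what `G.edges(data=True)` yields in NetworkX, the helper name `edges_to_csv` is hypothetical, and filling missing weights with 0 is a choice made for this example.

```python
import csv
import io


def edges_to_csv(edges, weight_list, ego="source", alter="target") -> str:
    """Flatten (u, v, attribute-dict) triples into CSV text.

    One row per edge, one column per requested weight name.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([ego, alter, *weight_list])
    for u, v, attrs in edges:
        # Assumption for this sketch: an edge without a given weight gets 0.
        writer.writerow([u, v, *(attrs.get(name, 0) for name in weight_list)])
    return buffer.getvalue()


# Same shape as the items of G.edges(data=True) for a NetworkX graph G.
edges = [
    ("a", "b", {"weight": 2, "trust": 1}),
    ("b", "c", {"weight": 1}),
]
csv_text = edges_to_csv(edges, ["weight", "trust"])
```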

probinet.input.loader.read_and_process_design_matrix(in_folder_path: PathLike, cov_name: str, sep: str, header: int | None, nodes: list[str], attr_name: str, egoX: str) → DataFrame

Read and process the design matrix with covariates.

Parameters:

in_folder_path (PathLike) – Path to the folder containing the input files.
cov_name (str) – Name of the covariate file.
sep (str) – Separator to use when reading the covariate file.
header (int | None) – Row number to use as the column names, and the start of the data.
nodes (list[str]) – List of node IDs.
attr_name (str) – Name of the attribute to consider in the analysis.
egoX (str) – Name of the column to consider as node IDs in the design matrix.

Returns:

X_attr – Pandas DataFrame that represents the one-hot encoding version of the design matrix.

Return type:

DataFrame

probinet.input.loader.read_design_matrix(df_X: DataFrame, nodes: list, attribute: str | None = None, ego: str = 'Name')

Create the design matrix with the one-hot encoding of the given attribute.

Parameters:

df_X (DataFrame) – Pandas DataFrame object containing the covariates of the nodes.
nodes (list) – List of node IDs.
attribute (str | None) – Name of the attribute to consider in the analysis.
ego (str) – Name of the column to consider as node IDs in the design matrix.

Returns:

X_attr – Pandas DataFrame that represents the one-hot encoding version of the design matrix.

Return type:

DataFrame
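
One-hot encoding turns a categorical attribute into one 0/1 column per category. The sketch below illustrates the idea with plain Python rather than pandas (where `pandas.get_dummies` performs a comparable encoding); it is not the function's actual implementation. The helper name `one_hot`, the example data, and the choices to order rows by `nodes` and to skip nodes without a covariate row are assumptions.

```python
def one_hot(records, nodes, attribute="Metadata", ego="Name"):
    """Return (categories, matrix) with one row per node and one column per category."""
    # Map each node ID to its attribute value.
    by_node = {row[ego]: row[attribute] for row in records}
    # Keep only nodes that have a covariate row, in the order given by `nodes`.
    kept = [n for n in nodes if n in by_node]
    # Sorted categories give a stable column order.
    categories = sorted({by_node[n] for n in kept})
    matrix = [[1 if by_node[n] == c else 0 for c in categories] for n in kept]
    return categories, matrix


records = [
    {"Name": "a", "Metadata": "red"},
    {"Name": "b", "Metadata": "blue"},
    {"Name": "c", "Metadata": "red"},
]
categories, matrix = one_hot(records, nodes=["a", "b", "c"])
```

Each row of `matrix` contains exactly one 1, in the column of that node's category.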

probinet.input.loader.read_graph(df_adj: DataFrame, ego: str = 'source', alter: str = 'target', undirected: bool = False, noselfloop: bool = True, binary: bool = True, label: str = 'weight') → list[MultiDiGraph]

Create the graph by adding edges and nodes.

Return a list of MultiGraph (or MultiDiGraph if undirected=False) NetworkX objects. The graphs are built by adding edges and nodes from the given DataFrame, and each graph in the output has an edge attribute whose name is given by label.

Parameters:

df_adj (DataFrame) – Pandas DataFrame object containing the edges of the graph.
ego (str) – Name of the column to consider as the source of the edge.
alter (str) – Name of the column to consider as the target of the edge.
undirected (bool) – If set to True, the network is considered undirected.
noselfloop (bool) – If set to True, the self-loops are removed.
binary (bool) – If set to True, the network is treated as binary.
label (str) – Name to be assigned to the edge attribute, across all layers.

Returns:

A – List of MultiGraph (or MultiDiGraph if undirected=False) NetworkX objects.

Return type:

list
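
To make the effect of `undirected`, `noselfloop`, and `binary` concrete, the sketch below builds one edge dictionary per layer from edge-list rows. It is a simplified stand-in, not the function itself: dictionaries replace the NetworkX graphs, the helper name `rows_to_layers` is hypothetical, and it assumes each row holds the source, the target, and then one weight per layer, with non-positive weights meaning no edge.

```python
def rows_to_layers(rows, n_layers, undirected=False, noselfloop=True, binary=True):
    """Build one {(source, target): weight} dictionary per layer."""
    layers = [dict() for _ in range(n_layers)]
    for source, target, *weights in rows:
        if noselfloop and source == target:
            continue  # drop self-loops in every layer
        for layer, w in zip(layers, weights):
            if w <= 0:
                continue  # assumed convention: no edge in this layer
            value = 1 if binary else w  # binary discards the edge weight
            layer[(source, target)] = value
            if undirected:
                # Stand-in for an undirected graph: store both directions.
                layer[(target, source)] = value
    return layers


rows = [("a", "b", 3, 0), ("b", "b", 1, 1), ("c", "a", 0, 2)]
layers = rows_to_layers(rows, n_layers=2)
weighted = rows_to_layers(rows, n_layers=2, binary=False, noselfloop=False)
symmetric = rows_to_layers(rows, n_layers=2, undirected=True)
```

With the defaults, the self-loop on `b` disappears and all weights collapse to 1; with `binary=False` and `noselfloop=False`, the original weights and the self-loop are kept.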
