probinet.input.loader
Functions for handling the data.
Functions
| build_adjacency_and_design_from_file | Import data, i.e. the adjacency tensor and the design matrix, from a given folder. |
| build_adjacency_from_file | Import data, i.e. the adjacency matrix, from a given file. |
| build_adjacency_from_networkx | Import a NetworkX graph and convert it to a GraphData object. |
| read_and_process_design_matrix | Read and process the design matrix with covariates. |
| read_design_matrix | Create the design matrix with the one-hot encoding of the given attribute. |
| read_graph | Create the graph by adding edges and nodes. |
- probinet.input.loader.build_adjacency_and_design_from_file(in_folder: str, adj_name: str = 'multilayer_network.csv', cov_name: str = 'X.csv', ego: str = 'source', egoX: str = 'Name', alter: str = 'target', attr_name: str = 'Metadata', undirected: bool = False, force_dense: bool = True, noselfloop: bool = True, sep: str = ',', header: int | None = 0, return_X_as_np: bool = True, **_kwargs) -> GraphData
Import data, i.e. the adjacency tensor and the design matrix, from a given folder.
- Parameters:
in_folder (str) – Path of the folder containing the input files.
adj_name (str) – Input file name of the adjacency tensor.
cov_name (str) – Input file name of the design matrix.
ego (str) – Name of the column to consider as the source of the edge.
egoX (str) – Name of the column to consider as node IDs in the design matrix-attribute dataset.
alter (str) – Name of the column to consider as the target of the edge.
attr_name (str) – Name of the attribute to consider in the analysis.
undirected (bool) – If set to True, the algorithm considers an undirected graph.
force_dense (bool) – If set to True, the algorithm is forced to consider a dense adjacency tensor.
noselfloop (bool) – If set to True, the algorithm removes the self-loops.
sep (str) – Separator to use when reading the dataset.
header (int | None) – Row number to use as the column names, and the start of the data.
return_X_as_np (bool) – If set to True, the design matrix is returned as a numpy array.
_kwargs – Additional keyword arguments.
- Returns:
A (list of nx.MultiDiGraph) – List of MultiDiGraph NetworkX objects representing the layers of the network.
B (ndarray or sparse.COO) – Graph adjacency tensor. If force_dense is True, returns a dense ndarray. Otherwise, returns a sparse COO tensor.
X_attr (pd.DataFrame or None) – Pandas DataFrame object representing the one-hot encoding version of the design matrix. Returns None if the design matrix is not provided.
nodes (list of str) – List of node IDs.
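As a rough illustration of the inputs this function reads, the sketch below writes a tiny edge list and covariate file using the default file and column names from the signature above. The layer column names (`L1`, `L2`), the node names, and the helper `write_example_inputs` are hypothetical. The layout with one weight column per layer is an assumption, not something this page states, and should be checked against the library's own example data.

```python
import csv
import tempfile
from pathlib import Path


def write_example_inputs(folder: Path) -> tuple[Path, Path]:
    """Write a toy edge list and covariate file (hypothetical data)."""
    adj_path = folder / "multilayer_network.csv"  # default adj_name
    cov_path = folder / "X.csv"                   # default cov_name

    # Assumed layout: 'source' and 'target' columns (defaults of ego/alter),
    # followed by one weight column per layer (L1 and L2 are made-up names).
    with adj_path.open("w", newline="") as f:
        writer = csv.writer(f)  # default delimiter ',' matches sep=','
        writer.writerow(["source", "target", "L1", "L2"])
        writer.writerows([["a", "b", 1, 0], ["b", "c", 1, 1], ["c", "a", 0, 2]])

    # Covariates: node IDs in 'Name' (default egoX) and the attribute of
    # interest in 'Metadata' (default attr_name).
    with cov_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Name", "Metadata"])
        writer.writerows([["a", "red"], ["b", "blue"], ["c", "red"]])

    return adj_path, cov_path


example_folder = Path(tempfile.mkdtemp())
adj_file, cov_file = write_example_inputs(example_folder)
```

With probinet installed, passing this folder as `in_folder` and leaving the other arguments at their defaults should pick up both files, provided the assumed layout matches what the loader expects.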
- probinet.input.loader.build_adjacency_from_file(path_to_file: PathLike, ego: str = 'source', alter: str = 'target', force_dense: bool = True, undirected: bool = False, noselfloop: bool = True, sep: str = '\\s+', binary: bool = True, header: int | None = 0, **_kwargs: Any) -> GraphData
Import data, i.e. the adjacency matrix, from a given file.
Return the NetworkX graph and its numpy adjacency matrix.
- Parameters:
path_to_file (PathLike) – Path of the input file.
ego (str) – Name of the column to consider as the source of the edge.
alter (str) – Name of the column to consider as the target of the edge.
force_dense (bool) – If set to True, the algorithm is forced to consider a dense adjacency tensor.
undirected (bool) – If set to True, the algorithm considers an undirected graph.
noselfloop (bool) – If set to True, the algorithm removes the self-loops.
sep (str) – Separator to use when reading the dataset.
binary (bool) – If set to True, the algorithm reads the graph with binary edges.
header (int | None) – Row number to use as the column names, and the start of the data.
- Returns:
Named tuple containing the graph list, the adjacency tensor, the transposed tensor, the data values, and the nodes.
- Return type:
GraphData
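To make the effect of `undirected`, `noselfloop`, and `binary` concrete, the following sketch applies them to a whitespace-separated edge list using only the standard library. It illustrates the documented semantics and is not probinet's implementation. The helper `dense_adjacency`, the single `weight` column, and the toy data are assumptions made for the example.

```python
import re

RAW = """source target weight
a b 2
b b 5
b c 1
"""


def dense_adjacency(text, undirected=False, noselfloop=True, binary=True):
    """Build a dense adjacency matrix (list of lists) from an edge list."""
    # Split each line on runs of whitespace, like a '\s+' separator.
    rows = [re.split(r"\s+", line.strip()) for line in text.strip().splitlines()]
    header, records = rows[0], rows[1:]  # first row holds the column names
    src, dst, w = (header.index(c) for c in ("source", "target", "weight"))

    nodes = sorted({r[src] for r in records} | {r[dst] for r in records})
    idx = {n: i for i, n in enumerate(nodes)}
    adj = [[0] * len(nodes) for _ in nodes]

    for r in records:
        i, j = idx[r[src]], idx[r[dst]]
        if noselfloop and i == j:
            continue  # drop self-loops
        weight = int(r[w])
        value = int(weight > 0) if binary else weight  # binary keeps presence only
        adj[i][j] = value
        if undirected:
            adj[j][i] = value  # mirror the edge
    return nodes, adj


nodes, adj = dense_adjacency(RAW)
```

With the defaults, the self-loop on `b` is dropped and the weights collapse to 1. Passing `undirected=True, noselfloop=False, binary=False` keeps the original weights, retains the self-loop, and mirrors each edge.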
- probinet.input.loader.build_adjacency_from_networkx(network: Graph, weight_list: list[str], file_name: PathLike | None = None) -> GraphData
Import a NetworkX graph and convert it to a GraphData object.
- Parameters:
network – NetworkX graph to convert to a GraphData object.
weight_list – Names of the edge weights to use from the NetworkX graph.
file_name – Name (including path) of the CSV file created from the NetworkX graph and used to build the GraphData object, e.g. /path/to/file/file_name.csv.
- Returns:
GraphData object created from the NetworkX graph.
- Return type:
GraphData
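The `file_name` description suggests that the graph is first written to a CSV file, which is then used to build the GraphData object. A sketch of that intermediate step, under stated assumptions, might look like the following. It uses plain `(u, v, attrs)` triples, the shape that NetworkX's `G.edges(data=True)` yields, so that it runs without NetworkX. The helper `edges_to_csv`, the attribute names, and the choice to write 0 for a missing attribute are all hypothetical and not taken from the library.

```python
import csv
import io

# Stand-in for networkx's `G.edges(data=True)`, which yields (u, v, attrs) triples.
edges = [
    ("a", "b", {"calls": 3, "emails": 1}),
    ("b", "c", {"calls": 1}),
]
weight_list = ["calls", "emails"]  # edge attributes to export (made-up names)


def edges_to_csv(edge_triples, weights):
    """Serialise edges to CSV text with one column per selected weight."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["source", "target", *weights])
    for u, v, attrs in edge_triples:
        # Assumption: an attribute missing on an edge is written as 0.
        writer.writerow([u, v, *(attrs.get(name, 0) for name in weights)])
    return buffer.getvalue()


csv_text = edges_to_csv(edges, weight_list)
```

Only the attributes named in `weight_list` end up as columns, which is the selection role the parameter description implies.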
- probinet.input.loader.read_and_process_design_matrix(in_folder_path: PathLike, cov_name: str, sep: str, header: int | None, nodes: list[str], attr_name: str, egoX: str) -> DataFrame
Read and process the design matrix with covariates.
- Parameters:
in_folder_path – Path to the folder containing the input files.
cov_name – Name of the covariate file.
sep (str) – Separator to use when reading the covariate file.
header – Row number to use as the column names, and the start of the data.
nodes – List of node IDs.
attr_name – Name of the attribute to consider in the analysis.
egoX (str) – Name of the column to consider as node IDs in the design matrix.
- Returns:
X_attr – Pandas DataFrame that represents the one-hot encoding version of the design matrix.
- Return type:
DataFrame
- probinet.input.loader.read_design_matrix(df_X: DataFrame, nodes: list, attribute: str | None = None, ego: str = 'Name')
Create the design matrix with the one-hot encoding of the given attribute.
- Parameters:
df_X (DataFrame) – Pandas DataFrame object containing the covariates of the nodes.
nodes (list) – List of node IDs.
attribute (str | None) – Name of the attribute to consider in the analysis.
ego (str) – Name of the column to consider as node IDs in the design matrix.
- Returns:
X_attr – Pandas DataFrame that represents the one-hot encoding version of the design matrix.
- Return type:
DataFrame
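For readers unfamiliar with one-hot encoding, the sketch below shows the idea on plain Python data: each distinct value of the chosen attribute becomes a 0/1 column, with one row per node. It is a minimal standard-library illustration, not the library's code. The helper `one_hot_design_matrix`, the toy records, and the assumption that rows are restricted to and ordered by `nodes` are hypothetical.

```python
def one_hot_design_matrix(records, nodes, attribute, ego="Name"):
    """Return (categories, matrix) with one row per node, in `nodes` order."""
    value_of = {rec[ego]: rec[attribute] for rec in records}
    # Only values carried by the requested nodes become columns.
    categories = sorted({value_of[n] for n in nodes})
    col = {c: k for k, c in enumerate(categories)}
    matrix = []
    for n in nodes:
        row = [0] * len(categories)
        row[col[value_of[n]]] = 1  # exactly one active column per node
        matrix.append(row)
    return categories, matrix


records = [
    {"Name": "a", "Metadata": "red"},
    {"Name": "b", "Metadata": "blue"},
    {"Name": "c", "Metadata": "red"},
    {"Name": "d", "Metadata": "green"},  # not in `nodes`, so ignored here
]
categories, X = one_hot_design_matrix(records, ["c", "a", "b"], "Metadata")
```

Here `categories` is `['blue', 'red']` and `X` is `[[0, 1], [0, 1], [1, 0]]`, following the order of the `nodes` argument rather than the order of the records.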
- probinet.input.loader.read_graph(df_adj: DataFrame, ego: str = 'source', alter: str = 'target', undirected: bool = False, noselfloop: bool = True, binary: bool = True, label: str = 'weight') -> list[MultiDiGraph]
Create the graph by adding edges and nodes.
Return a list of MultiGraph (or MultiDiGraph if undirected=False) NetworkX objects. The graph is built by adding edges and nodes from the given DataFrame. The graphs in the output list have an edge attribute whose name is given by label.
- Parameters:
df_adj (DataFrame) – Pandas DataFrame object containing the edges of the graph.
ego (str) – Name of the column to consider as the source of the edge.
alter (str) – Name of the column to consider as the target of the edge.
undirected (bool) – If set to True, the algorithm considers an undirected graph.
noselfloop (bool) – If set to True, the algorithm removes the self-loops.
binary (bool) – If set to True, read the graph with binary edges.
label (str) – Name to be assigned to the edge attribute, across all layers.
- Returns:
A – List of MultiGraph (or MultiDiGraph if undirected=False) NetworkX objects.
- Return type:
list
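The sketch below illustrates how a single edge table could be turned into a list of per-layer edge sets, each edge carrying an attribute named by `label`. It assumes that every column other than `ego` and `alter` is one layer of weights, which this page does not state explicitly. The helper `layers_from_rows` and the layer names are hypothetical. Plain `(u, v, {label: value})` triples stand in for NetworkX edges so that the example runs with the standard library alone.

```python
def layers_from_rows(rows, ego="source", alter="target", noselfloop=True,
                     binary=True, label="weight"):
    """Return one edge list per layer; each edge is (u, v, {label: value})."""
    # Assumption: every column except ego/alter holds one layer's weights.
    layer_names = [c for c in rows[0] if c not in (ego, alter)]
    layers = []
    for name in layer_names:
        edges = []
        for row in rows:
            u, v, w = row[ego], row[alter], row[name]
            if w <= 0 or (noselfloop and u == v):
                continue  # no edge in this layer, or a dropped self-loop
            edges.append((u, v, {label: 1 if binary else w}))
        layers.append(edges)
    return layers


rows = [
    {"source": "a", "target": "b", "L1": 2, "L2": 0},
    {"source": "b", "target": "b", "L1": 1, "L2": 1},
    {"source": "b", "target": "c", "L1": 0, "L2": 3},
]
layers = layers_from_rows(rows, binary=False)
```

In this toy input the self-loop on `b` is removed from both layers, leaving one weighted edge in each. With `binary=True` the stored values would all be 1.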