graphtoolbox.data.preprocessing

Functions

`create_variable`(df, var_name, val)	Adds a new column to a DataFrame with specified values.
`extract_dataframe`(df[, day_inf, day_sup])	Extracts a subset of a DataFrame based on a date range.
`extract_dummies`(df, var_names)	Converts specified categorical variables in a DataFrame to dummy/indicator variables.
`sub_df`(df, var_name, val)	Filters a DataFrame to include only rows where a specified column matches a given value.

graphtoolbox.data.preprocessing.extract_dataframe(df: DataFrame, day_inf: str | None = None, day_sup: str | None = None) → DataFrame

Extracts a subset of a DataFrame based on a date range.

This function filters the rows of the input DataFrame df to include only those within the half-open date range [day_inf, day_sup): day_inf is included and day_sup is excluded. The DataFrame must have a column named ‘date’ whose values are in ‘YYYY-MM-DD’ format.

Parameters:

df (pd.DataFrame) – The input DataFrame with a ‘date’ column.
day_inf (str, optional) – The start date of the range (inclusive) in ‘YYYY-MM-DD’ format. If None, it defaults to the earliest date in the DataFrame.
day_sup (str, optional) – The end date of the range (exclusive) in ‘YYYY-MM-DD’ format. If None, it defaults to the latest date in the DataFrame.

Returns:

A new DataFrame containing only the rows within the specified date range.

Return type:

pd.DataFrame
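
The half-open range described above can be illustrated with plain pandas. This is a minimal sketch of the documented behaviour, not the library's implementation: `extract_dataframe_sketch` and the sample data are hypothetical, and the sketch treats a bound of None as "no limit on that side", which may differ from how the library resolves its defaults.

```python
import pandas as pd


def extract_dataframe_sketch(df, day_inf=None, day_sup=None):
    """Hypothetical stand-in: keep rows with day_inf <= date < day_sup."""
    dates = pd.to_datetime(df["date"])
    # Start with every row selected, then narrow by each bound that is given.
    mask = pd.Series(True, index=df.index)
    if day_inf is not None:
        mask &= dates >= pd.Timestamp(day_inf)  # lower bound is inclusive
    if day_sup is not None:
        mask &= dates < pd.Timestamp(day_sup)   # upper bound is exclusive
    return df.loc[mask].copy()


df = pd.DataFrame(
    {
        "date": ["2023-01-01", "2023-01-02", "2023-01-03", "2023-01-04"],
        "load": [10.0, 12.5, 11.0, 9.5],
    }
)

# Rows dated 2023-01-02 and 2023-01-03 are kept; 2023-01-04 is excluded.
subset = extract_dataframe_sketch(df, "2023-01-02", "2023-01-04")
print(subset)
```

Because the upper bound is exclusive, consecutive calls such as [a, b) and [b, c) partition the data without duplicating the rows dated b.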

graphtoolbox.data.preprocessing.create_variable(df: DataFrame, var_name: str, val: ndarray) → DataFrame

Adds a new column to a DataFrame with specified values.

This function creates a new column in the input DataFrame df with the name var_name and populates it with the values provided in the val array. The length of the val array must match the number of rows in the DataFrame.

Parameters:

df (pd.DataFrame) – The input DataFrame to which the new column will be added.
var_name (str) – The name of the new column to be created.
val (np.ndarray) – A NumPy array containing the values to be added to the new column. The length of this array must match the number of rows in the DataFrame.

Returns:

A new DataFrame with the additional column var_name containing the values from val.

Return type:

pd.DataFrame
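
As a rough illustration of the contract above, the following sketch adds a column with `DataFrame.assign` and checks the length requirement explicitly. The helper `create_variable_sketch` and the sample data are hypothetical. Whether the library raises on a length mismatch, or leaves the input DataFrame unmodified, is an assumption made here and is not stated in this reference.

```python
import numpy as np
import pandas as pd


def create_variable_sketch(df, var_name, val):
    """Hypothetical stand-in: return a copy of df with column var_name set to val."""
    if len(val) != len(df):
        # Assumption: a mismatch is treated as an error in this sketch.
        raise ValueError(
            f"val has {len(val)} entries but the DataFrame has {len(df)} rows"
        )
    # assign() returns a new DataFrame and leaves the input untouched.
    return df.assign(**{var_name: val})


df = pd.DataFrame(
    {"date": ["2023-01-01", "2023-01-02", "2023-01-03"], "load": [10.0, 12.5, 11.0]}
)
temperature = np.array([3.5, 4.0, 2.0])

out = create_variable_sketch(df, "temperature", temperature)
print(out.columns.tolist())
```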

graphtoolbox.data.preprocessing.sub_df(df: DataFrame, var_name: str, val: Any) → DataFrame

Filters a DataFrame to include only rows where a specified column matches a given value.

This function creates a new DataFrame containing only the rows from the input DataFrame df where the values in the column var_name are equal to val.

Parameters:

df (pd.DataFrame) – The input DataFrame to be filtered.
var_name (str) – The name of the column to be filtered on.
val (Any) – The value that the column var_name should match to be included in the output DataFrame.

Returns:

A new DataFrame containing only the rows where df[var_name] equals val.

Return type:

pd.DataFrame
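
The equality filter described above corresponds to ordinary boolean indexing in pandas. The sketch below shows that idea; `sub_df_sketch` and the sample columns are hypothetical names used only for illustration, and the library's own code may differ (for example in whether it copies the result).

```python
import pandas as pd


def sub_df_sketch(df, var_name, val):
    """Hypothetical stand-in: keep the rows where df[var_name] == val."""
    # The comparison yields a boolean Series used as a row mask.
    return df.loc[df[var_name] == val].copy()


df = pd.DataFrame(
    {
        "region": ["north", "south", "north", "east"],
        "load": [10.0, 12.5, 11.0, 9.5],
    }
)

north = sub_df_sketch(df, "region", "north")
print(north)
```

Note that the filtered rows keep their original index labels (0 and 2 here); call `reset_index(drop=True)` on the result if a fresh 0-based index is needed.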

graphtoolbox.data.preprocessing.extract_dummies(df: DataFrame, var_names: ndarray) → DataFrame

Converts specified categorical variables in a DataFrame to dummy/indicator variables.

Parameters:

df (pd.DataFrame) – The input DataFrame containing the data.
var_names (np.ndarray) – An array of column names to be converted to dummy variables.

Returns:

A new DataFrame with the original columns and the added dummy variables.

Return type:

pd.DataFrame
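
One way to obtain a result matching this description is `pandas.get_dummies` followed by a concatenation that keeps the original columns. This is a sketch under that assumption: `extract_dummies_sketch` and the sample data are hypothetical, and the library's actual column naming and dtypes are not specified in this reference.

```python
import numpy as np
import pandas as pd


def extract_dummies_sketch(df, var_names):
    """Hypothetical stand-in: append indicator columns for each name in var_names."""
    cols = [str(name) for name in var_names]
    # get_dummies names each indicator column "<column>_<category>".
    dummies = pd.get_dummies(df[cols], columns=cols)
    # Keep the original columns and add the indicator columns alongside them.
    return pd.concat([df, dummies], axis=1)


df = pd.DataFrame(
    {"region": ["north", "south", "north"], "load": [10.0, 12.5, 11.0]}
)

out = extract_dummies_sketch(df, np.array(["region"]))
print(out.columns.tolist())
```

Each row has exactly one active indicator per encoded variable, so the indicator columns for "region" sum to 1 across every row.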