graphtoolbox.data.preprocessing

Functions

create_variable(df, var_name, val)

Adds a new column to a DataFrame with specified values.

extract_dataframe(df[, day_inf, day_sup])

Extracts a subset of a DataFrame based on a date range.

extract_dummies(df, var_names)

Converts specified categorical variables in a DataFrame to dummy/indicator variables.

sub_df(df, var_name, val)

Filters a DataFrame to include only rows where a specified column matches a given value.

graphtoolbox.data.preprocessing.extract_dataframe(df: DataFrame, day_inf: str | None = None, day_sup: str | None = None) → DataFrame

Extracts a subset of a DataFrame based on a date range.

This function keeps only the rows of the input DataFrame df whose date falls within the half-open range [day_inf, day_sup), i.e. day_inf is included and day_sup is excluded. The DataFrame must have a column named ‘date’, and its values should be in ‘YYYY-MM-DD’ format.

Parameters:
  • df (pd.DataFrame) – The input DataFrame with a ‘date’ column.

  • day_inf (str, optional) – The start date of the range (inclusive) in ‘YYYY-MM-DD’ format. If None, it defaults to the earliest date in the DataFrame.

  • day_sup (str, optional) – The end date of the range (exclusive) in ‘YYYY-MM-DD’ format. If None, it defaults to the latest date in the DataFrame.

Returns:

A new DataFrame containing only the rows within the specified date range.

Return type:

pd.DataFrame
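
The following is a minimal pandas sketch of the date filtering described above, not graphtoolbox's actual source. The name extract_dataframe_sketch is hypothetical, and the handling of a missing bound is an assumption: here a None bound simply disables filtering on that side, so rows on the latest date are kept when day_sup is None.

```python
from typing import Optional

import pandas as pd


def extract_dataframe_sketch(
    df: pd.DataFrame, day_inf: Optional[str] = None, day_sup: Optional[str] = None
) -> pd.DataFrame:
    """Hypothetical sketch of extract_dataframe; not the library's own code."""
    # Parse the 'YYYY-MM-DD' strings so comparisons are chronological.
    dates = pd.to_datetime(df["date"], format="%Y-%m-%d")
    mask = pd.Series(True, index=df.index)
    # Lower bound is inclusive.
    if day_inf is not None:
        mask &= dates >= pd.Timestamp(day_inf)
    # Upper bound is exclusive, giving the half-open range [day_inf, day_sup).
    # Assumption: with day_sup=None no upper filter is applied at all.
    if day_sup is not None:
        mask &= dates < pd.Timestamp(day_sup)
    return df.loc[mask].copy()


df = pd.DataFrame(
    {"date": ["2021-01-01", "2021-01-02", "2021-01-03"], "load": [1.0, 2.0, 3.0]}
)
sub = extract_dataframe_sketch(df, "2021-01-01", "2021-01-03")
print(sub["date"].tolist())  # ['2021-01-01', '2021-01-02']
```

Because the upper bound is exclusive, selecting a whole month means passing the first day of the following month as day_sup, e.g. extract_dataframe_sketch(df, "2021-01-01", "2021-02-01").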

graphtoolbox.data.preprocessing.create_variable(df: DataFrame, var_name: str, val: ndarray) → DataFrame

Adds a new column to a DataFrame with specified values.

This function creates a new column in the input DataFrame df with the name var_name and populates it with the values provided in the val array. The length of the val array must match the number of rows in the DataFrame.

Parameters:
  • df (pd.DataFrame) – The input DataFrame to which the new column will be added.

  • var_name (str) – The name of the new column to be created.

  • val (np.ndarray) – A NumPy array containing the values to be added to the new column. The length of this array must match the number of rows in the DataFrame.

Returns:

A new DataFrame with the additional column var_name containing the values from val.

Return type:

pd.DataFrame
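
A minimal sketch of the documented behavior using plain pandas column assignment. The name create_variable_sketch is hypothetical, and copying the input first is an assumption based on the statement that a new DataFrame is returned; the actual function may instead modify df in place.

```python
import numpy as np
import pandas as pd


def create_variable_sketch(df: pd.DataFrame, var_name: str, val: np.ndarray) -> pd.DataFrame:
    """Hypothetical sketch of create_variable; not the library's own code."""
    # The documented contract: exactly one value per row.
    if len(val) != len(df):
        raise ValueError(f"len(val) = {len(val)} but df has {len(df)} rows")
    # Assumption: work on a copy so the caller's DataFrame is left unchanged.
    out = df.copy()
    out[var_name] = val  # a NumPy array is assigned positionally, row by row
    return out


df = pd.DataFrame({"load": [10.0, 12.5, 11.0]})
df2 = create_variable_sketch(df, "load_sq", df["load"].to_numpy() ** 2)
print(df2["load_sq"].tolist())  # [100.0, 156.25, 121.0]
```

Checking the length explicitly gives a clearer error message than relying on pandas, which also rejects an array whose length does not match the index.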

graphtoolbox.data.preprocessing.sub_df(df: DataFrame, var_name: str, val: Any) → DataFrame

Filters a DataFrame to include only rows where a specified column matches a given value.

This function creates a new DataFrame containing only the rows from the input DataFrame df where the values in the column var_name are equal to val.

Parameters:
  • df (pd.DataFrame) – The input DataFrame to be filtered.

  • var_name (str) – The name of the column to filter on.

  • val (Any) – The value to compare against; a row is kept only if its entry in column var_name equals val.

Returns:

A new DataFrame containing only the rows where df[var_name] equals val.

Return type:

pd.DataFrame
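
The equality filter described above corresponds to ordinary boolean-mask indexing in pandas. The sketch below illustrates the idea; sub_df_sketch is a hypothetical name, not the library's own code.

```python
from typing import Any

import pandas as pd


def sub_df_sketch(df: pd.DataFrame, var_name: str, val: Any) -> pd.DataFrame:
    """Hypothetical sketch of sub_df; not the library's own code."""
    # Element-wise comparison yields a boolean Series used as a row mask.
    return df[df[var_name] == val].copy()


df = pd.DataFrame({"station": ["A", "B", "A"], "load": [1.0, 2.0, 3.0]})
only_a = sub_df_sketch(df, "station", "A")
print(only_a["load"].tolist())  # [1.0, 3.0]
```

The result keeps the original index labels (0 and 2 here); call reset_index(drop=True) if a contiguous index is needed. Because NaN never compares equal to anything, passing val=float("nan") returns an empty DataFrame in this sketch; df[var_name].isna() is the usual way to select missing values.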

graphtoolbox.data.preprocessing.extract_dummies(df: DataFrame, var_names: ndarray) → DataFrame

Converts specified categorical variables in a DataFrame to dummy/indicator variables.

Parameters:
  • df (pd.DataFrame) – The input DataFrame containing the data.

  • var_names (np.ndarray) – An array of column names to be converted to dummy variables.

Returns:

A new DataFrame with the original columns and the added dummy variables.

Return type:

pd.DataFrame
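
One common way to produce such indicator columns is pandas.get_dummies. The sketch below uses it to encode the requested columns and appends the result to the original DataFrame, matching the documented return value. The name extract_dummies_sketch is hypothetical, and the column naming (<column>_<category>) follows the pandas default, which graphtoolbox may or may not use.

```python
import numpy as np
import pandas as pd


def extract_dummies_sketch(df: pd.DataFrame, var_names: np.ndarray) -> pd.DataFrame:
    """Hypothetical sketch of extract_dummies; not the library's own code."""
    cols = [str(v) for v in var_names]  # plain str labels for column lookup
    # One 0/1 indicator column per category, named '<column>_<category>'.
    dummies = pd.get_dummies(df[cols], columns=cols, dtype=int)
    # Keep the original columns and append the indicators, as documented.
    return pd.concat([df, dummies], axis=1)


df = pd.DataFrame(
    {"day_type": ["weekday", "weekend", "weekday"], "load": [1.0, 2.0, 3.0]}
)
out = extract_dummies_sketch(df, np.array(["day_type"]))
print(out.columns.tolist())
# ['day_type', 'load', 'day_type_weekday', 'day_type_weekend']
```

Passing columns= explicitly makes get_dummies encode the listed columns even when they are numeric (for example an integer month), which it would otherwise leave unchanged.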