graphtoolbox.data.preprocessing
Functions
create_variable(df, var_name, val) | Adds a new column to a DataFrame with specified values.
extract_dataframe(df[, day_inf, day_sup]) | Extracts a subset of a DataFrame based on a date range.
extract_dummies(df, var_names) | Converts specified categorical variables in a DataFrame to dummy/indicator variables.
sub_df(df, var_name, val) | Filters a DataFrame to include only rows where a specified column matches a given value.
- graphtoolbox.data.preprocessing.extract_dataframe(df: DataFrame, day_inf: str | None = None, day_sup: str | None = None) -> DataFrame
Extracts a subset of a DataFrame based on a date range.
This function keeps only the rows of the input DataFrame df whose date falls within the half-open range [day_inf, day_sup): day_inf is included and day_sup is excluded. The date column must be named ‘date’ and its values should be in ‘YYYY-MM-DD’ format.
- Parameters:
df (pd.DataFrame) – The input DataFrame with a ‘date’ column.
day_inf (str, optional) – The start date of the range (inclusive) in ‘YYYY-MM-DD’ format. If None, it defaults to the earliest date in the DataFrame.
day_sup (str, optional) – The end date of the range (exclusive) in ‘YYYY-MM-DD’ format. If None, it defaults to the latest date in the DataFrame.
- Returns:
A new DataFrame containing only the rows within the specified date range.
- Return type:
pd.DataFrame
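The half-open range semantics described above can be illustrated with plain pandas. This is a minimal sketch, not the library's implementation: `extract_range_sketch` and the sample data are hypothetical, and the sketch treats a missing bound as unbounded, which is an assumption about how the None defaults behave.

```python
import pandas as pd

# Hypothetical sample data; the 'date' column holds 'YYYY-MM-DD' strings.
df = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"],
    "load": [10.0, 12.5, 11.0, 9.5],
})

def extract_range_sketch(df, day_inf=None, day_sup=None):
    """Stand-in for the documented behaviour (assumption, not library code)."""
    dates = pd.to_datetime(df["date"])
    mask = pd.Series(True, index=df.index)
    if day_inf is not None:
        mask &= dates >= pd.Timestamp(day_inf)  # lower bound is inclusive
    if day_sup is not None:
        mask &= dates < pd.Timestamp(day_sup)   # upper bound is exclusive
    return df.loc[mask].copy()

# Keeps 2024-01-02 and 2024-01-03; 2024-01-04 is excluded.
subset = extract_range_sketch(df, "2024-01-02", "2024-01-04")
```

With the real function, the equivalent call would presumably be `extract_dataframe(df, day_inf="2024-01-02", day_sup="2024-01-04")`.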
- graphtoolbox.data.preprocessing.create_variable(df: DataFrame, var_name: str, val: ndarray) -> DataFrame
Adds a new column to a DataFrame with specified values.
This function creates a new column in the input DataFrame df with the name var_name and populates it with the values provided in the val array. The length of the val array must match the number of rows in the DataFrame.
- Parameters:
df (pd.DataFrame) – The input DataFrame to which the new column will be added.
var_name (str) – The name of the new column to be created.
val (np.ndarray) – A NumPy array containing the values to be added to the new column. The length of this array must match the number of rows in the DataFrame.
- Returns:
A new DataFrame with the additional column var_name containing the values from val.
- Return type:
pd.DataFrame
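As an illustration of the length requirement and the "new DataFrame" return value, the following sketch reproduces the documented behaviour with `DataFrame.assign`. It is an assumption about how such a function might work, not the library's code; the column name `is_holiday` and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-02", "2024-01-03"],
    "load": [10.0, 12.5, 11.0],
})
var_name = "is_holiday"
val = np.array([1, 0, 0])

# The array must supply exactly one value per row.
if len(val) != len(df):
    raise ValueError("val must have one entry per row of df")

# assign returns a new DataFrame and leaves df unchanged; the dict
# unpacking lets the column name come from a variable.
out = df.assign(**{var_name: val})
```

With the real function, the equivalent call would presumably be `create_variable(df, "is_holiday", val)`.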
- graphtoolbox.data.preprocessing.sub_df(df: DataFrame, var_name: str, val: Any) -> DataFrame
Filters a DataFrame to include only rows where a specified column matches a given value.
This function creates a new DataFrame containing only the rows from the input DataFrame df where the values in the column var_name are equal to val.
- Parameters:
df (pd.DataFrame) – The input DataFrame to be filtered.
var_name (str) – The name of the column to be filtered on.
val (Any) – The value that the column var_name should match to be included in the output DataFrame.
- Returns:
A new DataFrame containing only the rows where df[var_name] equals val.
- Return type:
pd.DataFrame
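The equality filter described above corresponds to ordinary boolean-mask indexing in pandas. The sketch below shows that idiom on hypothetical data; it is an assumed equivalent, not the library's implementation.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "region": ["north", "south", "north"],
    "load": [10.0, 12.5, 11.0],
})

# Keep only the rows whose 'region' value equals "north".
# Boolean-mask indexing preserves the original row labels.
north = df[df["region"] == "north"]
```

With the real function, the equivalent call would presumably be `sub_df(df, "region", "north")`.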
- graphtoolbox.data.preprocessing.extract_dummies(df: DataFrame, var_names: ndarray) -> DataFrame
Converts specified categorical variables in a DataFrame to dummy/indicator variables.
- Parameters:
df (pd.DataFrame) – The input DataFrame containing the data.
var_names (np.ndarray) – An array of column names to be converted to dummy variables.
- Returns:
A new DataFrame with the original columns and the added dummy variables.
- Return type:
pd.DataFrame
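A rough picture of the result can be obtained with `pandas.get_dummies`. The sketch below keeps the original columns and appends the indicator columns, matching the return description above; whether the library names or types the new columns the same way is an assumption, and the sample data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "region": ["north", "south", "north"],
    "load": [10.0, 12.5, 11.0],
})
var_names = np.array(["region"])

# One indicator column per category, prefixed with the source column name.
dummies = pd.get_dummies(df[list(var_names)])

# Append the indicators to the untouched original columns.
out = pd.concat([df, dummies], axis=1)
```

With the real function, the equivalent call would presumably be `extract_dummies(df, var_names)`.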