klib.data_cleaning() performs a number of steps, among them:

- cleaning the column names: This unifies the column names by formatting them, splitting, among others, CamelCase into camel_case, removing special characters as well as leading and trailing whitespace, and formatting all column names to lowercase_and_underscore_separated. This also checks for and fixes duplicate column names, which you sometimes get when reading data from a file. Some column name examples:
  Yards.Gained -> yards_gained
  PlayAttempted -> play_attempted
  Challenge.Replay -> challenge_replay
- dropping empty and virtually empty columns: You can use the parameters drop_threshold_cols and drop_threshold_rows to adjust the dropping to your needs. The default is to drop columns and rows with more than 90% of the values missing.
- removing single-valued columns: As the name states, this removes columns in which each cell contains the same value. This comes in handy when columns such as “year” are included while you’re just looking at a single year. Other examples are “download_date” or indicator variables which are identical for all entries.
- dropping duplicate rows: This is a straightforward drop of entirely duplicate rows. If you are dealing with data where duplicates add value, consider setting drop_duplicates=False.
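
To make the column-name cleaning step more concrete, here is a minimal standard-library sketch of the kind of transformation described above. It is not klib's actual implementation: the helper names `clean_name` and `clean_column_names` are hypothetical, and the `_2`, `_3` suffix scheme for duplicate names is an assumption chosen for illustration.

```python
import re
from collections import Counter
from typing import List


def clean_name(name: str) -> str:
    """Convert one column name to lowercase_and_underscore_separated."""
    # Remove leading and trailing whitespace.
    name = name.strip()
    # Split CamelCase: insert "_" between a lowercase letter or digit
    # and a following uppercase letter.
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Replace every run of special characters with a single underscore.
    name = re.sub(r"[^0-9A-Za-z]+", "_", name)
    return name.strip("_").lower()


def clean_column_names(names: List[str]) -> List[str]:
    """Clean all names and make duplicates unique with a numeric suffix."""
    seen: Counter = Counter()
    result = []
    for name in names:
        cleaned = clean_name(name)
        seen[cleaned] += 1
        # Assumed scheme: the second occurrence gets "_2", the third "_3", ...
        if seen[cleaned] == 1:
            result.append(cleaned)
        else:
            result.append(f"{cleaned}_{seen[cleaned]}")
    return result


columns = ["Yards.Gained", "PlayAttempted", "Challenge.Replay", "yards_gained"]
print(clean_column_names(columns))
```

The three example names from the text map to yards_gained, play_attempted and challenge_replay, and the repeated name receives a suffix so that every column stays addressable.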