How to handle missing data in Python?
I have started a machine learning course on Udemy. To take care of missing data in the datasets, the course imports Imputer from sklearn.preprocessing, but neither Imputer nor SimpleImputer is available to download. Any help would be hugely appreciated. Thank you!
1 Answer
One common way to handle missing data is pandas' df.fillna(), where you pass the replacement value inside the parentheses, for example a zero. As an example:
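```python
# Import pandas and assign as pd
import pandas as pd

# Read in the file to a dataframe named df
df = pd.read_csv('yourfile.csv')

# Show the dataframe to see the missing values (NaN)
print(df)

# fillna() replaces the NaNs with zeroes in this case.
# It returns a new DataFrame, so assign the result back to df.
df = df.fillna(0)

# Show the dataframe now to confirm there are no missing values
print(df)
print(df.isna().sum())  # remaining missing values per column (all 0)
```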
If you want to get fancier and, say, use the average of a column to fill the missing values in that column, pandas can do that as well. Do a descriptive statistics pass of the dataframe (for example with df.describe()), take the mean of the column in question, pass it to fillna(), and voila!
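A minimal sketch of the mean-fill approach described above. The small DataFrame and its column names (age, salary) are made up for illustration and stand in for your own pd.read_csv('yourfile.csv') data. It uses the "mean" row of df.describe() as the descriptive statistics pass and passes it to fillna(), which fills each column's NaNs with that column's own mean.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for pd.read_csv('yourfile.csv')
df = pd.DataFrame({
    "age": [25.0, np.nan, 35.0, 40.0],
    "salary": [50000.0, 60000.0, np.nan, 80000.0],
})

# Descriptive statistics pass: describe() summarises each numeric column
# (NaNs are skipped), and its "mean" row is a Series indexed by column name.
column_means = df.describe().loc["mean"]

# Passing that Series to fillna() fills each column's NaNs with its own mean.
# fillna() returns a new DataFrame, so keep the result.
df_filled = df.fillna(column_means)

# The same idea for a single column only
age_filled = df["age"].fillna(df["age"].mean())

print(df_filled)
```

Filling with the mean keeps the column's average unchanged, which is often a more reasonable default than zero for numeric features, though it does shrink the column's spread.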