How to handle missing data in Python?

I have started a machine learning course on Udemy. To take care of missing data in the datasets, they import Imputer from sklearn.preprocessing, but neither Imputer nor SimpleImputer is available to download. Any help would be hugely appreciated. Thank you!

28th Oct 2018, 3:18 AM
Risav Ghosh
1 Answer
A simple way to handle missing data is df.fillna(), where inside the () you put the value to fill with, for example a zero. One caveat: fillna() returns a new dataframe rather than changing df in place, so assign the result back. As an example:

# Import pandas and assign as pd
import pandas as pd

# Read in the file to a dataframe named df
df = pd.read_csv('yourfile.csv')

# Show the dataframe to see missing values
print(df)

# fillna() replaces the NaNs, with zeroes in this case;
# it returns a new dataframe, so assign it back to df
df = df.fillna(0)

# Show the dataframe now to confirm no missing values
print(df)

If you want to get fancy and, say, use the average of a column to fill missing values in that column, there is functionality to do that as well. Do a descriptive statistics pass of the dataframe with df.describe(), take the mean of the column in question, put it into fillna(), and voila!
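To make the mean-fill idea concrete, here is a minimal sketch. The small dataframe and its column names ("age", "salary") are made up for illustration and stand in for whatever your CSV contains. It computes the mean with the column's own mean() method rather than reading it off describe(), which should give the same number.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for 'yourfile.csv'
df = pd.DataFrame({
    "age": [25.0, np.nan, 35.0],
    "salary": [50000.0, 60000.0, np.nan],
})

# Mean of one column; pandas skips NaN values when computing it
age_mean = df["age"].mean()

# fillna() returns a new object, so assign the result back
df["age"] = df["age"].fillna(age_mean)

# Alternatively, fill every numeric column with its own mean in one step
df = df.fillna(df.mean(numeric_only=True))

print(df)
```

Filling with a column mean keeps the column's average unchanged, whereas filling with zero can pull it down, so which one suits you depends on what the column represents.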
13th Dec 2018, 12:00 AM
Chris Ford