0
How to normalise only specific columns of a pandas Dataframe?
I am currently working on a Pandas dataframe in which I have a total of 52 columns (the features). I added a 53rd column which is my "Y" or the output column which contains numerical values. When applying the MinMax normalisation to the dataframe, I don't want it to apply to the "Y" column. How can we do that?
11 Answers
+ 2
Amarjeet Singh tries this one i forget that index can't be copied in case of the data frame.
try this code.
x = df.iloc[:,0:53]
min_max_scaler = preprocessing.MinMaxScaler()
x_scaled = min_max_scaler.fit_transform(x)
dataset = pd.DataFrame(x_scaled)
dataset["Your 53rd column name"] = df["Your 53rd column name"]
+ 2
Amarjeet Singh i don't why you are getting these types of errors in my case it works perfect.
You can do one thing, first of all, drop your 53rd column from dataset then save this column data in another CSV file.
Then do min_max normalise or do what you want to do with the left database. After done normalise on your data then again add your y column or your 53rd column with your normalised data.
+ 1
Try this code.
import pandas as pd
from sklearn import preprocessing
x = df.iloc[:, 0:-1] #returns a numpy array
min_max_scaler =
preprocessing.MinMaxScaler()
x_scaled = min_max_scaler.fit_transform(x)
df = pd.DataFrame(x_scaled)
+ 1
Tibor Santa
Okay so I have two separate datasets for testing and training and this is the code I am using to normalise them:
import pandas as pd
from sklearn.preprocessing import minmax_scale
#train_df is my training dataframe.
#train_norm is my required new normalised train dataframe.
#labels is a list that includes the names of the 52+1 columns in my dataframe.
#I only want to normalise the first 52 columns, and let the 53rd column be as it is, but I don't wana drop it because that is my target or "Y" column.
#I am aware that the code below normalises all the 53 columns. But I have tried many variations and nothing is working.
#I am also aware that I can add the 53rd column after normalisation but then it would be a tideous process as I will have to particularly fill all the rows of the last column.
#10580 rows x 53 columns is the size of my train_df.
train_norm = minmax_scale(train_df, feature_range=(0,1), axis=0)
train_norm = pd.DataFrame(train_norm)
train_norm.columns = labels
train_norm
+ 1
try this code.
x = df.iloc[:,0:53]
min_max_scaler = preprocessing.MinMaxScaler()
x_scaled = min_max_scaler.fit_transform(x)
dataset = pd.DataFrame(x_scaled)
dataset["Your 53rd column name"] = df.iloc[53]
0
Maninder $ingh hello, I appreciate your effort for helping out but the code your posted results in final two columns getting dropped out of the dataframe. I don't want to drop any columns. I need all 53 of them but I simply don't want to normalise the last 53rd column.
0
Amarjeet Singh how are you applying the MinMax normalization? Share your code, otherwise it will be more difficult to help you.
0
Nope. Doesn't work. Says, "cannot copy index"
0
It says "cannot reindex from a duplicate axis"
0
X = df.iloc[:,0:53]
from sklearn.preprocessing import MinMaxScaler
min_max_scaler = MinMaxScaler()
X_scaled = min_max_scaler.fit_transform(X)
df_norm = pd.DataFrame(X_scaled)
df_norm ["name of 53rd column"] = df["name of 53rd column"]
This will work