Automate X = df.iloc[:,[7+i[0],7+i[1],...]]

Question

Hi,

How can I automate the second line of code so that it depends on len(i), with a for loop or something similar?
I want to define the list i, but every time values are added to or removed from it, I have to redefine the "X = df.iloc" 2D list by hand. How can I avoid that?

i = [0,1,3,6,13,14,15,16,17,18,19,20,21,23,24,26,30,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58]

X = df.iloc[:,[7+i[0],7+i[1],7+i[2],7+i[3],7+i[4],7+i[5],7+i[6],7+i[7],7+i[8],7+i[9],
                   7+i[10],7+i[11],7+i[12],7+i[13],7+i[14],7+i[15],7+i[16],7+i[17],7+i[18],7+i[19],
                   7+i[20],7+i[21],7+i[22],7+i[23],7+i[24],7+i[25],7+i[26],7+i[27],7+i[28],7+i[29],
                   7+i[30],7+i[31],7+i[32],7+i[33],7+i[34],7+i[35],7+i[36],7+i[37],7+i[38],7+i[39],
                   7+i[40],7+i[41],7+i[42],7+i[43] ]]
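A minimal sketch of one way to do this, assuming the layout described later in the thread (7 target columns first, then the features). A small toy DataFrame stands in for the real data, and its column names are made up. A list comprehension builds the position list from i, so X follows whatever is in i without editing the iloc call:

```python
import pandas as pd

# Toy stand-in for the real data: 7 target columns followed by 10 feature columns.
# (The real file has 66 columns; these column names are hypothetical.)
df = pd.DataFrame([[r * 100 + c for c in range(17)] for r in range(3)],
                  columns=[f"col{c}" for c in range(17)])

# Feature positions counted from 0, relative to the first feature column.
i = [0, 1, 3, 6]

# Shift every index past the 7 target columns; works for any len(i).
cols = [7 + k for k in i]
X = df.iloc[:, cols]

print(cols)      # [7, 8, 10, 13]
print(X.shape)   # (3, 4)
```

Adding or removing entries in i now changes X automatically.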

Thank you.

FS

Answer

Filipe Santos, could you please add a tag with the programming language? Thanks!

Answer

I think I understand. Maybe try a list comprehension?

[x+7 for x in i if x+7 <= len(i)]

It would be easier to help if you shared more of the code that is giving you trouble.
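One caveat about the comprehension above: the filter compares against len(i), the length of the index list, not against the number of columns. With the 44-entry list from the question, every x above 37 is silently dropped. A sketch of a bounds check against the actual column count, using a toy 66-column frame and a shortened, made-up i:

```python
import pandas as pd

df = pd.DataFrame([[0] * 66], columns=[f"c{n}" for n in range(66)])  # 66 columns, as in the thread
i = [0, 1, 3, 6, 13, 40, 58]

# Filter as written: len(i) is 7 here, so only x + 7 <= 7 survives.
buggy = [x + 7 for x in i if x + 7 <= len(i)]

# Check against the real number of columns instead (valid positions are 0..n_cols-1).
n_cols = df.shape[1]
safe = [x + 7 for x in i if x + 7 < n_cols]

print(buggy)  # [7]
print(safe)   # [7, 8, 10, 13, 20, 47, 65]
```

Silently dropping out-of-range positions can also hide a typo in i. Without any filter, df.iloc raises an IndexError for an out-of-range position, which may be preferable.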

Answer

Hey,

Lothar: done (Python). Thanks for the advice.
Steven M: thank you very much, I will try that code.

FS.

Answer

This is an interesting issue, though I am still not sure I understand it completely. Typically the target is a single column, while the rest of the DataFrame holds the features used for prediction. I imagine your dataset would need some manipulation to obtain a single target column, and the remaining DataFrame would then need the same number of rows. So I expect a mismatch between row and column shapes to become an issue later on. You could try merging or concatenating the columns into a single target column, but remember to apply the same transformation to the remaining DataFrame so that your target and features have the same number of rows.

This is a logistic regression prediction done with k-fold cross-validation (KFold) on the Breast Cancer dataset:

https://code.sololearn.com/cHHQ2iGB72ai/?ref=app
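The row-count point can be checked directly after splitting the frame. A minimal sketch with a hypothetical toy DataFrame (2 targets, 3 features; names made up), selecting one target column and all feature columns by position:

```python
import pandas as pd

# Hypothetical toy frame: 2 target columns, then 3 feature columns.
df = pd.DataFrame({
    "t1": [0, 1, 0, 1], "t2": [1, 1, 0, 0],
    "f1": [0.1, 0.2, 0.3, 0.4], "f2": [5, 6, 7, 8], "f3": [9, 8, 7, 6],
})
n_targets = 2

y = df.iloc[:, 0]             # one target column -> a Series
X = df.iloc[:, n_targets:]    # all feature columns -> a DataFrame

# Features and target must describe the same samples, i.e. have the same number of rows.
assert len(X) == len(y)
print(X.shape, y.shape)       # (4, 3) (4,)
```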

Answer

My last attempt :)  
I am out of ideas...

https://code.sololearn.com/cDR07MF9tE0w/?ref=app

Answer

Hey Steven, thanks anyway.

I tried this line:
[x+7 for x in i if x+7 <= len(df_features.columns)]

instead of
[x+7 for x in i if x+7 <= len(i)]

but it gives me this error:
"ValueError: at least one array or dtype is required"

I will try to adapt all my code based on your sample.
It defines the features and targets in a different way, but maybe it works.
Your code itself runs fine.

Thank you very much.

FS

Answer

Steven, that code does not do what I need.
Let me describe my needs in a little more detail.
I have a DataFrame (df) with 66 columns, of which the first 7 are my targets; that's why I want to build my X from column position 7 to the last column.
My list i specifies which of those columns to use in the regression, but in this list I count the indices from 0 and then add 7.
So, to define the X DataFrame used in the regression, I have to write it as in my first post. I just wanted to find a way to automate it.
Otherwise I have to add or remove 7+i[n] elements by hand so that they match the number of items in the list i.
Not sure if I explained it well enough. :D
Thank you once more.
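Since i already counts from the first feature column, another option is to slice off the feature block first and then use i unchanged, with no "+7" at all. A sketch on a toy one-row, 66-column frame (values and names are made up):

```python
import pandas as pd

N_TARGETS = 7  # the first 7 columns are targets, per the description above
df = pd.DataFrame([list(range(66))], columns=[f"c{n}" for n in range(66)])
i = [0, 1, 3, 6, 13]

features = df.iloc[:, N_TARGETS:]   # columns 7..65, now addressed by positions 0..58
X = features.iloc[:, i]             # i is used as-is

# Same result as offsetting every index by hand.
assert X.equals(df.iloc[:, [N_TARGETS + k for k in i]])
print(X.iloc[0].tolist())           # [7, 8, 10, 13, 20]
```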

PS:
Forgot to mention that your code results in this error:
"ValueError: X has 44 features per sample; expecting 29"
Maybe that's because the list i has 44 items, and X = df.iloc[:,[7+i[0], ... needs all of them, up to 7+i[43] ]].
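That message usually means the model was fitted on one number of columns and then asked to predict on another. One way to keep the two in step is to give pred the same feature column names as df and apply the same selection to both. A sketch, assuming pred holds exactly one value per feature column, in the same order; the toy frame and values are hypothetical:

```python
import pandas as pd

N_TARGETS = 7
# Hypothetical toy data: 7 target columns + 5 feature columns.
df = pd.DataFrame([[r * 10 + c for c in range(12)] for r in range(4)],
                  columns=[f"c{c}" for c in range(12)])
pred = [[0.5, 1.5, 2.5, 3.5, 4.5]]   # assumed: one value per feature column
i = [0, 2, 4]

feature_cols = df.columns[N_TARGETS:]
X = df[feature_cols].iloc[:, i]
pred_df = pd.DataFrame(pred, columns=feature_cols).iloc[:, i]

# The model must see the same number (and order) of features at fit and predict time.
assert X.shape[1] == pred_df.shape[1]
assert list(X.columns) == list(pred_df.columns)
```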

FS

Answer

Steven, thanks for your efforts to understand my issue.
Let me try to clarify it as much as possible.
My DataFrame has 66 columns, and the first seven are my targets.
I will make predictions for each of those 7 targets, one at a time.
So the columns from position 7 to the end are my features, but I don't want to use all of them; I use the list i to specify which features to use.
After that, I need to adapt X to these features (the same goes for the pred list, but that one is easier to manipulate). The issue is defining X from the features in the list i.
The code in my first post works, but if I add an extra feature to i, I have to add another element to the X = df.iloc[[]] 2D list by hand, and I want to automate it based on len(i).
Hope this helps you understand what I need to do.

FS

Answer

I think I understand a little better: you want to test several targets individually. Try splitting your DataFrame into two DataFrames, one with the targets and the other with the features, and then iterate through your targets. The other option I can think of is a Pipeline.

https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html
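A minimal sketch of the split-and-iterate idea on a hypothetical toy frame: X is built once from the selected features, and each of the 7 target columns is taken in turn. The model step is left as a comment because the thread does not fix a specific estimator:

```python
import pandas as pd

N_TARGETS = 7
df = pd.DataFrame([[r + c for c in range(12)] for r in range(5)],
                  columns=[f"c{c}" for c in range(12)])
i = [0, 1, 3]

targets = df.iloc[:, :N_TARGETS]            # the 7 target columns
X = df.iloc[:, N_TARGETS:].iloc[:, i]       # selected features, built once

shapes = {}
for name in targets.columns:
    y = targets[name]
    # Fit/evaluate a model here, e.g. model.fit(X, y); this sketch only records shapes.
    shapes[name] = (X.shape, y.shape)
```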

Does something like this help?...

https://code.sololearn.com/cDR07MF9tE0w/?ref=app

Answer

Hey Steven, thanks again.
My problem is not the targets;
it's the features.
My code reads the whole DataFrame, and I want to control which features are used in the regression.
Let me post more of my code; maybe that way you will get the whole picture.

df = pd.read_csv(Path + FileName)

X = df[['NumConc','3F6F',... ,'TOT_SunQ-LI190','MAX_SunQ-I190']].values

y = df['N1'].values

pred = [[62,3,2020,8,31,4,... ,104.367,649.7,339.66,335.79]]

With these first lines of code, I get df with all columns, X with all features, y with the current target, and pred with the values for the prediction.
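Since X here is selected by column name, positions from an index list (like i below) can also be turned into names with df.columns, keeping the df[[...]].values style. A sketch on a one-row toy frame: "N1", "NumConc" and "3F6F" come from the question, while "N2" to "N7" and "featC" are hypothetical placeholders:

```python
import pandas as pd

N_TARGETS = 7
df = pd.DataFrame([list(range(10))],
                  columns=["N1", "N2", "N3", "N4", "N5", "N6", "N7",
                           "NumConc", "3F6F", "featC"])
i = [0, 2]

selected = df.columns[[N_TARGETS + k for k in i]]   # a pandas Index of column names
print(list(selected))                               # ['NumConc', 'featC']
X = df[selected].values                             # NumPy array, like df[[...]].values
```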

But I want to use only some of the features; that's why I use:

i = [0,1,3,6,13,14,15,16,17,18,19,20,21,23,24,26,30,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58]

This way I specify which features to use out of all the imported ones.
Then I rebuild X so that it uses only the features listed in i:

X = df.iloc[:,[7+i[0],7+i[1],7+i[2],7+i[3],7+i[4],7+i[5],7+i[6],7+i[7],7+i[8],7+i[9],
                   7+i[10],7+i[11],7+i[12],7+i[13],7+i[14],7+i[15],7+i[16],7+i[17],7+i[18],7+i[19],
                   7+i[20],7+i[21],7+i[22],7+i[23],7+i[24],7+i[25],7+i[26],7+i[27],7+i[28],7+i[29],
                   7+i[30],7+i[31],7+i[32],7+i[33],7+i[34],7+i[35],7+i[36],7+i[37],7+i[38],7+i[39],
                   7+i[40],7+i[41],7+i[42],7+i[43] ]]

for m in range(len(i)):
     p[m] = pred[0][i[m]]

pred = [[p[0],p[1],p[2],p[3],p[4],p[5],p[6],p[7],p[8],p[9],p[10],p[11],p[12],p[13],p[14],p[15],p[16],p[17],p[18],p[19],p[20],p[21],p[22],p[23],p[24],p[25],p[26],p[27],p[28],p[29],p[30],p[31],p[32],p[33],p[34],p[35],p[36],p[37],p[38],p[39],p[40],p[41],p[42],p[43] ]]

Now, if I add or remove features in i, I have to change both X and pred by hand, and that's what I want to automate in these last two lines of code.
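Both hand-written lists can be replaced by comprehensions over i. A sketch, assuming (as the p[m] = pred[0][i[m]] loop implies) that pred holds one value per feature in the same order as the feature columns; the toy frame and values are made up:

```python
import pandas as pd

N_TARGETS = 7
df = pd.DataFrame([[r * 100 + c for c in range(12)] for r in range(3)])  # 7 targets + 5 features (toy)
pred = [[10.0, 11.0, 12.0, 13.0, 14.0]]   # one value per feature column
i = [0, 1, 4]

X = df.iloc[:, [N_TARGETS + k for k in i]]
pred = [[pred[0][k] for k in i]]   # replaces the p[m] loop and the p[0], p[1], ... list
```

Because both lines read from the same i, X and pred always have the same number of features.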

Sorry to insist on this issue, but by now it's also a challenge for me to make you understand my problem. :D

FS

Answer

I think we are taking the long way around.
In simple terms, what I need is this:

if len(i) = 1
then X = df.iloc[: , [7+i[0] ]]

if len(i) = 2
then X = df.iloc[: , [7+i[0], 7+i[1] ]]

and so on....
There should be an easy way to automate this...
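The cases above all follow one rule (position = 7 + k for each k in i), so a single expression covers every length. A sketch using a NumPy vectorized offset; select_features is a hypothetical helper name, and the one-row frame is toy data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([list(range(66))])   # one toy row, 66 columns

def select_features(df, i, offset=7):
    """Return the columns at positions offset + k for every k in i (hypothetical helper)."""
    return df.iloc[:, np.asarray(i, dtype=int) + offset]

X1 = select_features(df, [0])        # same as df.iloc[:, [7+i[0]]] with i = [0]
X2 = select_features(df, [0, 3])     # same as df.iloc[:, [7+i[0], 7+i[1]]] with i = [0, 3]
```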

Thank you, and sorry for all the confusion about this subject; I could have put it more simply from the start. :D

FS

Answer

No ideas here?

FS
