+ 1

Python logistic regression with sklearn.linear_model

Hello, I am using this model to make predictions from a large database. My dataframe has 60 columns and is growing. I have noticed that some of the new columns lower the final scores, and that the column order of the table also influences them. How can I find the best column order and the best subset of columns to get the highest possible final score? Thank you. Filipe Santos

8th Aug 2020, 11:14 AM
Filipe Santos
13 Answers
+ 2
Here you will have to apply feature reduction techniques to improve the performance of your model. You can start with PCA, which stands for Principal Component Analysis.
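As a rough illustration of the PCA suggestion, here is a minimal sketch that puts PCA in front of a logistic regression. The dataset is synthetic (generated with `make_classification` as a stand-in for the 60-column dataframe), and the 95% variance threshold is only an assumed starting point, not a recommendation from this thread.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real 60-column dataframe (assumption).
X, y = make_classification(
    n_samples=500, n_features=60, n_informative=10, random_state=0
)

# Scale first, because PCA is sensitive to the scale of each column.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95),  # keep enough components to explain 95% of variance
    LogisticRegression(max_iter=1000),
)

# 5-fold cross-validated accuracy of the whole pipeline.
scores = cross_val_score(model, X, y, cv=5)
print("mean accuracy:", scores.mean())

# Fit once on all data to see how many components were kept.
model.fit(X, y)
n_kept = model.named_steps["pca"].n_components_
print("components kept:", n_kept)
```

Keeping the scaler and PCA inside the pipeline means they are refitted on each training fold, so the cross-validation score is not inflated by information from the held-out fold.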
8th Aug 2020, 12:51 PM
Peter Parker
+ 1
To the point Peter Parker is making about PCA, it might also be beneficial to check how the columns correlate with each other and/or use feature engineering techniques, feature importance, etc. Good luck 👍
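One way to look at column correlation is sketched below, assuming numeric columns in a pandas dataframe. The column names, the synthetic data, and the 0.9 cutoff are all made up for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic dataframe with one nearly duplicated column (assumption).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, 4)), columns=["a", "b", "c", "d"])
df["b_copy"] = df["b"] * 2 + rng.normal(scale=0.01, size=200)

# Absolute pairwise correlation between all columns.
corr = df.corr().abs()

# Keep only the upper triangle so that each pair is looked at once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Drop the later column of every pair whose correlation exceeds the cutoff.
to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]
reduced = df.drop(columns=to_drop)
print("dropped:", to_drop)
```

Highly correlated columns carry mostly the same information, so dropping one of each pair is a cheap first pass before trying heavier feature selection.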
8th Aug 2020, 1:32 PM
Steven M
+ 1
Thank you for your answers. I am very new to all this... I learned ML here a few weeks ago and am applying it to some needs of mine. So, based on all your useful information, I will google these techniques and try to apply them in my code. Thank you once again. FS
8th Aug 2020, 2:22 PM
Filipe Santos
+ 1
Hey again, I tried PCA but got many errors while adapting my code. Then I started to apply feature selection techniques, which seem closer to what I need, but again some odd errors arise. I am using SelectKBest and chi2 from sklearn and getting this error: "ValueError: Input X must be non-negative." But there are no negative values in my database... I do not understand what it means. Any clues? Thank you. FS
8th Aug 2020, 4:08 PM
Filipe Santos
+ 1
It is tough to say anything without seeing the code and the dataset. It would be better if you could share them.
8th Aug 2020, 4:44 PM
Peter Parker
+ 1
Try encoding the data using LabelEncoder or OneHotEncoder. You might also benefit from MinMaxScaler. However, you should do these things before you run the data through K-Fold, chi-squared, fit, and predict. https://code.sololearn.com/cHHQ2iGB72ai/?ref=app
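The error message comes from `chi2` itself, which rejects any input containing values below zero. Negative values can appear after an earlier preprocessing step (for example standardization) even when the raw table has none. A minimal sketch of scaling before selection, on synthetic data and with an assumed `k=5`:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic data that contains negative values (assumption, not the real table).
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, random_state=0
)

# Calling chi2 directly on data with negative values raises a ValueError.
try:
    chi2(X, y)
except ValueError as err:
    print("chi2 failed:", err)

# MinMaxScaler maps every column into [0, 1], so chi2 accepts the input.
selector = make_pipeline(MinMaxScaler(), SelectKBest(chi2, k=5))
X_selected = selector.fit_transform(X, y)

# Indices of the columns that were kept.
kept = selector.named_steps["selectkbest"].get_support(indices=True)
print("kept columns:", kept)
```

Note that `chi2` is designed for non-negative features such as counts or frequencies, so for general continuous columns another score function (for example `f_classif`) may be a better fit.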
8th Aug 2020, 6:09 PM
Steven M
+ 1
Thank you all. I think I have solved the case for now. I followed the instructions from this link: https://machinelearningmastery.com/feature-selection-machine-learning-python/ Specifically, section 2, Recursive Feature Elimination. It does just what I wanted. Thank you. You guys rock. Keep it up. FS
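For readers who want to see what Recursive Feature Elimination looks like with a logistic regression, here is a minimal sketch. It is not the code from the linked article; the data is synthetic and the choice of 5 features to keep is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real dataframe (assumption).
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, random_state=0
)

# Scale so that coefficient sizes are comparable across columns.
X_scaled = StandardScaler().fit_transform(X)

# RFE repeatedly fits the model and removes the weakest feature
# until only n_features_to_select remain.
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=5)
rfe.fit(X_scaled, y)

# support_ is a boolean mask; ranking_ is 1 for every selected feature.
selected = [i for i, keep in enumerate(rfe.support_) if keep]
print("selected column indices:", selected)
print("ranking:", rfe.ranking_)
```

If the right number of columns is unknown, `RFECV` from the same module chooses it by cross-validation.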
8th Aug 2020, 7:25 PM
Filipe Santos
+ 1
Filipe Santos good deal, that's awesome. For what it's worth, the Machine Learning Mastery website is fantastic, I love that site 😀 👍
8th Aug 2020, 7:30 PM
Steven M
+ 1
Yes, it seems like a good place to learn. Anyway, it solved half of my issues (the case is not closed yet). Now I need to find a way to get the right order for the data columns, since it also influences the final scores. FS.
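In principle, the objective that logistic regression minimizes does not depend on the order of the columns, so large score changes after reordering usually have another cause, such as an unseeded train/test split or a solver that stopped before converging. This is a general observation and not a diagnosis of the code in question. A minimal sketch of a check on synthetic data, with the evaluation kept deterministic:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption, not the real table).
X, y = make_classification(n_samples=400, n_features=8, random_state=0)

# Same columns, different order.
perm = np.random.default_rng(1).permutation(X.shape[1])
X_shuffled = X[:, perm]


def mean_cv_score(features):
    """Mean 5-fold accuracy with unshuffled, deterministic folds."""
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    return cross_val_score(model, features, y, cv=5).mean()


score_original = mean_cv_score(X)
score_shuffled = mean_cv_score(X_shuffled)
print("original order:", score_original)
print("shuffled order:", score_shuffled)
```

If the two scores differ noticeably on the real data, fixing `random_state` in the split and raising `max_iter` are reasonable first things to try.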
8th Aug 2020, 7:44 PM
Filipe Santos
8th Aug 2020, 11:38 PM
Steven M
+ 1
Hi again, I have another basic question. In another model, I need to get predictions for a target ranging from 0 to 10. Can I use logistic regression here? Would it give me good predictions? Thank you. FS
9th Aug 2020, 11:17 AM
Filipe Santos
+ 1
Maybe, if you can find a clever way to convert the integers 0-10 into a binary target... Logistic regression is typically done on discrete targets or targets with binary results, like 0 or 1, True or False, predicting Titanic survivors, or predicting whether a tumor is malignant or benign. Logistic regression typically produces a sigmoid curve. https://medium.com/greyatom/logistic-regression-89e496433063 I like to use scikit-learn's cheat sheet, it helps me... https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html
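As a side note, scikit-learn's `LogisticRegression` accepts a target with more than two classes directly, so the integers 0-10 can be passed as 11 class labels without any manual conversion. A minimal sketch on synthetic data follows; whether this predicts well on a real 0-10 target is a separate question that this example does not answer.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data with 11 classes labelled 0..10 (assumption).
X, y = make_classification(
    n_samples=1100,
    n_features=12,
    n_informative=8,
    n_redundant=0,
    n_classes=11,
    n_clusters_per_class=1,
    random_state=0,
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# The same estimator used for binary targets handles 11 classes.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

predictions = model.predict(X_test)          # one label in 0..10 per row
probabilities = model.predict_proba(X_test)  # one probability per class per row
print("test accuracy:", model.score(X_test, y_test))
```

This treats 0-10 as unordered categories. If the values are an ordered score, where predicting 7 for a true 8 is better than predicting 0, a regression model may suit the problem better.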
9th Aug 2020, 12:19 PM
Steven M
0
Yes, I understand that. I had figured as much, since logistic regression is intended for categorical targets. Anyway, converting the integers 0-10 to binary is easy (10 becomes 1010), but that is not suitable as a target. If you instead make 10 separate targets (is it a 0, yes/no; is it a 1, yes/no; ...), that could be a good option, but it means far too many targets in the end. :D Thanks for the scikit-learn cheat sheet. It is very helpful. FS
10th Aug 2020, 7:25 AM
Filipe Santos