+ 1
Python logistic regression with sklearn.linear_model
Hello, I'm using this model to make predictions from a big database. My dataframe has 60 columns and growing. I've noticed that some of the new columns contribute negatively to the final scores, and that the column order also influences the scores. How can I find the optimal column order and the best set of data columns to reach the best possible final score? Thank you. Filipe Santos
13 answers
+ 2
Here you will have to apply feature reduction techniques to improve your model's performance. You can start with PCA, which stands for Principal Component Analysis.
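A minimal sketch of the PCA idea above (the data here is a random stand-in for the real dataframe, and the 95% variance threshold is just an illustrative choice):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 60))  # stand-in for a 60-column dataframe

# PCA is sensitive to scale, so standardize the columns first
X_scaled = StandardScaler().fit_transform(X)

# Keep enough components to explain 95% of the variance
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_scaled)
print(X_reduced.shape)  # fewer columns than the original 60
```

The transformed columns are combinations of the originals, so PCA reduces dimensionality rather than picking out specific named columns.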
+ 1
To the point Peter Parker is making about PCA, it might also be beneficial to correlate the columns and/or use feature engineering techniques, feature importance, etc. Good luck 👍
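One common way to act on column correlations, as suggested above, is to drop columns that are nearly duplicates of earlier ones. A small sketch with made-up data (the 0.95 threshold is an arbitrary illustrative cutoff):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 5)), columns=list("abcde"))
df["f"] = df["a"] * 2 + 0.01 * rng.normal(size=100)  # nearly duplicates "a"

# Absolute correlation matrix, upper triangle only (avoid double counting)
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Drop any column correlated above 0.95 with an earlier one
to_drop = [col for col in upper.columns if (upper[col] > 0.95).any()]
print(to_drop)  # ['f']
```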
+ 1
Thank you for your answers.
I'm very new to all this stuff...
I learned ML here a few weeks ago and am applying it to some needs of mine.
So, based on all your useful information, I will google it and try to apply it in my code.
Thank you once again.
FS
+ 1
Hey again,
I tried PCA techniques but got many errors while trying to adapt my code.
Then I started to apply feature selection techniques, which seem more suited to what I need, but again some odd errors arise.
I'm using SelectKBest and chi2 from sklearn and getting this error:
"ValueError: Input X must be non-negative."
But there are no negative values in my database... I'm not getting what it means.
Any clues?
Thank you.
FS
+ 1
It is tough to say anything without seeing the code and dataset. It would be better if you could share these.
+ 1
Try encoding the data using LabelEncoder or OneHotEncoder. You might also benefit from MinMaxScaler. However, you should do these things before you run the data through K-Fold, chi-squared, fit, predict.
https://code.sololearn.com/cHHQ2iGB72ai/?ref=app
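The scaling step above is the key to that ValueError: chi2 requires every feature value to be non-negative, and preprocessing can introduce negatives even when the raw database has none. A hedged sketch of the fix with stand-in data, using MinMaxScaler to map everything into [0, 1] before SelectKBest:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.feature_selection import SelectKBest, chi2

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))        # contains negative values
y = rng.integers(0, 2, size=100)     # stand-in binary target

# chi2 requires non-negative inputs, so rescale each column to [0, 1]
X_scaled = MinMaxScaler().fit_transform(X)

selector = SelectKBest(chi2, k=3).fit(X_scaled, y)
X_best = selector.transform(X_scaled)
print(X_best.shape)  # (100, 3)
```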
+ 1
Thank you to all.
I think I have solved the case for now.
I followed the instructions from this link:
https://machinelearningmastery.com/feature-selection-machine-learning-python/
Specifically; 2. Recursive Feature Elimination
It does just what I wanted.
Thank you. You guys rock. Keep it up.
FS
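For anyone landing here later, the Recursive Feature Elimination approach mentioned above can be sketched like this (synthetic data; keeping 4 features is an arbitrary illustrative choice):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic dataset with 10 columns, only 4 of them informative
X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=4, random_state=0)

# Recursively drop the weakest feature until 4 remain
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=4)
rfe.fit(X, y)
print(rfe.support_)   # boolean mask of the kept columns
print(rfe.ranking_)   # 1 = selected; higher = eliminated earlier
```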
+ 1
Filipe Santos good deal, that's awesome, and for what it's worth, the Machine Learning Mastery website is fantastic, love that site 😀 👍
+ 1
Yes, it seems a good place to learn.
Anyway, it solved half of my issues (not a closed case yet).
Now I need to find a way to get the right data column order.
It also influences the final scores.
FS.
+ 1
Filipe Santos have you checked out GridSearch? It may help with this... https://medium.com/data-science-reporter/feature-selection-via-grid-search-in-supervised-models-4dc0c43d7ab1
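A minimal sketch of the GridSearch idea: putting feature selection and the model in a pipeline, then letting GridSearchCV try different numbers of kept columns and regularization strengths (the parameter grid here is made up for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

pipe = Pipeline([
    ("select", SelectKBest(f_classif)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Search over how many columns to keep and the regularization strength
grid = GridSearchCV(pipe,
                    {"select__k": [3, 5, 8],
                     "clf__C": [0.1, 1.0, 10.0]},
                    cv=5)
grid.fit(X, y)
print(grid.best_params_)
```

Note that for sklearn estimators it is the *set* of selected columns that matters, so a search like this tunes which columns to keep rather than their order.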
+ 1
Hi again,
I have another basic question.
In another model, I need to get predictions with a target ranging from 0 to 10. Can I use logistic regression here? Will it still give me good predictions?
Thank you.
FS
+ 1
Maybe, if you can find a clever way to convert the integers 0-10 into a binary target... Logistic regression is typically done on discrete targets or targets with binary results, like 0 or 1, True or False, predicting Titanic survivors or predicting cancerous tumors (malignant/benign), etc. Logistic regression typically produces a sigmoid curve.
https://medium.com/greyatom/logistic-regression-89e496433063
I like to use SciKit Learns Cheat Sheet, it helps me...
https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html
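The binary case described above, in a minimal sketch (synthetic data; `predict_proba` is the sigmoid output the answer refers to):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = (X[:, 0] > 0).astype(int)  # binary target: 0 or 1

clf = LogisticRegression().fit(X, y)

# predict_proba gives the sigmoid output: P(y=0 | x) and P(y=1 | x)
print(clf.predict_proba([[2.0]]))  # high probability for class 1
```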
0
Yes, I understand that.
I had figured that, since logistic regression is intended for category-style targets.
Anyway, converting the integers 0-10 to binary is easy (1010), but that's not suitable for a target. But if you make it into separate targets (is it a 0, yes/no; is it a 1, yes/no...), that way you could get a good option, but way too many targets in the end. :D
Thanks for the SciKit Learn cheat sheet tip. It is very helpful.
FS
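Worth noting: the "one yes/no model per value" scheme described above is exactly what sklearn's `LogisticRegression` does internally for multiclass targets (one-vs-rest or multinomial), so you can pass the 0-10 target directly. A sketch with random stand-in data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = rng.integers(0, 11, size=300)  # integer classes 0..10 (11 classes)

# sklearn treats each distinct target value as a class and handles
# the one-vs-rest / multinomial splitting internally
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(len(clf.classes_))  # 11
```

If the 0-10 target is really an ordered score rather than 11 unrelated categories, it may also be worth comparing against a plain regression model.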