+ 1
Decision Tree Score seems overfitted
I have a large dataset with a label column. I use DecisionTreeClassifier (from sklearn.tree import DecisionTreeClassifier) to build a tree and score it with .score(x, y). Before scoring, I extract the label from the dataset and encode the rest of the dataset to boolean columns using get_dummies(). After all that, the model seems overfitted because I get an accuracy score of 100%. No matter what I change, it always gives me 100% accuracy. Is this normal?
6 Answers
+ 4
Hmm.. decision trees tend to score really high, but a perfect score in *all* cases surely indicates overfitting. Could you share the code? Is the dataset split into train/validation/test sets? Maybe you should shuffle the data or do a proper cross-validation?
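To make the split and cross-validation suggestion concrete, here is a minimal sketch. It assumes the built-in breast cancer dataset from sklearn.datasets as a stand-in, since the asker's data is not shown in this post, and it compares accuracy on the training rows with accuracy on rows the tree never saw.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset (an assumption; the asker's own data is not used here).
X, y = load_breast_cancer(return_X_y=True)

# Shuffled, stratified split so the test rows are never seen during fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, shuffle=True, stratify=y, random_state=0
)

tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_train, y_train)

train_acc = tree.score(X_train, y_train)  # accuracy on rows the tree was fitted on
test_acc = tree.score(X_test, y_test)     # accuracy on held-out rows

# 5-fold cross-validation: each fold is scored by a tree that never saw it.
cv_scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)

print(f"train accuracy: {train_acc:.3f}")
print(f"test accuracy:  {test_acc:.3f}")
print(f"5-fold CV mean: {cv_scores.mean():.3f}")
```

If .score() is called on the same rows that were passed to .fit(), an unpruned tree will usually report a perfect score, so the held-out and cross-validated numbers are the ones worth looking at.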
+ 4
I'll add a comment in the code section in a while..
+ 1
Here is my code. If you have the dataset, you will see that the accuracy score is always 100%. That seems very abnormal to me.
https://code.sololearn.com/cLlY2KmwlZr5/?ref=app
+ 1
The model is almost certainly overfitted, as 100% accuracy on a large dataset is very unlikely.
You can fix it by:
1) Pruning
2) Using a different classifier
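Both options can be sketched roughly as follows. This is only an illustration: the dataset is the built-in breast cancer set (an assumption, not the asker's data), and the values max_depth=3 and ccp_alpha=0.01 are arbitrary examples rather than tuned settings. RandomForestClassifier is used as one possible "different classifier".

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset (an assumption; swap in your own X and y).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

models = {
    # Baseline: the tree grows until every training row is classified correctly.
    "unpruned tree": DecisionTreeClassifier(random_state=0),
    # Pre-pruning: stop growing the tree at a fixed depth.
    "max_depth=3": DecisionTreeClassifier(max_depth=3, random_state=0),
    # Post-pruning: minimal cost-complexity pruning of the fully grown tree.
    "ccp_alpha=0.01": DecisionTreeClassifier(ccp_alpha=0.01, random_state=0),
    # A different classifier: an ensemble of randomized trees.
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Store (train accuracy, test accuracy) to compare the gap per model.
    results[name] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(f"{name:15s} train={results[name][0]:.3f} test={results[name][1]:.3f}")
```

A smaller gap between the train and test columns is the thing to look for; which setting works best depends on the data, so the pruning values are worth searching over with cross-validation.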
0
For the train/test split, I used train_test_split from sklearn.model_selection. I applied it after get_dummies, and there seems to be no problem there, so I don't think any further cross-validation is needed. As for my code, please wait a while; I have to insert it. You may need the dataset too, so could you please send me your email or some other way for me to send you the file?
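For reference, the order of steps described here (pull out the label, get_dummies, then train_test_split) can be sketched like this. The DataFrame, its column names, and the 20% label noise are all made up for illustration and are not the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical categorical data; every column name here is invented.
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "color": rng.choice(["red", "green", "blue"], size=n),
    "size": rng.choice(["S", "M", "L"], size=n),
    "shape": rng.choice(["round", "square"], size=n),
})

# The label follows "color", but 20% of the labels are flipped at random,
# so no model should be able to reach 100% on unseen rows.
signal = (df["color"] == "red").to_numpy()
flip = rng.random(n) < 0.2
df["label"] = np.where(signal ^ flip, "yes", "no")

# 1) Remove the label first, so it cannot end up among the encoded features.
y = df.pop("label")
# 2) One-hot encode the remaining categorical columns.
X = pd.get_dummies(df)
# 3) Split, fit on the training part only, and score on the held-out part.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

train_acc = tree.score(X_train, y_train)
test_acc = tree.score(X_test, y_test)
print(f"features: {list(X.columns)}")
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
```

Two things are worth checking against the real code: that no column derived from the label is left in X after get_dummies, and that .score() is called with X_test and y_test rather than the full or training data.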
0
Ah... I forgot one thing. You can download the dataset from the website I mentioned in a comment in the code. Also, when I try other datasets from sklearn.datasets, I always get a 100% accuracy score too. That's why I had to ask.