+ 1

Decision Tree Score seems overfitted

I have a large dataset with a label column. I use DecisionTreeClassifier from sklearn.tree to build a tree and score it with .score(x, y). Before scoring, I extract the label from the dataset and encode the rest of the dataset as boolean columns using get_dummies(). After all this, the model seems overfitted because I get 100% accuracy. No matter what I change, it always gives me a 100% accuracy score. Is this normal?

python decisiontree sklearn

2nd Sep 2017, 7:51 PM

Sura Wankam

6 answers

+ 4

Hmm.. decision trees tend to have really high scores, but getting 100% in *all* cases surely indicates overfitting. Could you share the code? Is the dataset split into train/validation/test sets? Maybe you should shuffle the data or do proper cross-validation?
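A minimal sketch of the shuffled cross-validation suggested here. It assumes scikit-learn's built-in breast cancer dataset as a stand-in, since the original dataset is not shown in the thread; the fold count and random seed are arbitrary choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): replace X, y with your own encoded features and label.
X, y = load_breast_cancer(return_X_y=True)

# Shuffle before splitting into 5 folds, keeping class proportions in each fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Each score is accuracy on a fold the tree did NOT see during fitting.
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=cv)

print("per-fold accuracy:", scores)
print("mean accuracy:", scores.mean())
```

If the per-fold scores on your own data are still all exactly 1.0, the features probably contain the answer in some form, which is worth checking before blaming the model.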

2nd Sep 2017, 8:05 PM

Kuba Siekierzyński

+ 4

I'll add a comment in the code section in a while..

2nd Sep 2017, 8:49 PM

Kuba Siekierzyński

+ 1

Here is my code. If you have the dataset, you will see that the accuracy score is always 100%. That seems very abnormal to me. https://code.sololearn.com/cLlY2KmwlZr5/?ref=app

2nd Sep 2017, 8:17 PM

Sura Wankam

+ 1

The model is very likely overfitted, as 100% accuracy on a large dataset is rarely achievable. You can fix it by: 1) Pruning the tree 2) Using a different classifier
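A rough sketch of what pruning could look like with scikit-learn, assuming the built-in breast cancer dataset as a stand-in. The values max_depth=3 and ccp_alpha=0.01 are illustrative guesses, not tuned settings.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): swap in your own features and label.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Fully grown tree: keeps splitting until every training leaf is pure.
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Pruned tree: depth is capped and weak branches are removed by
# cost-complexity pruning (illustrative values, not tuned).
pruned = DecisionTreeClassifier(
    max_depth=3, ccp_alpha=0.01, random_state=0
).fit(X_train, y_train)

for name, model in [("full", full), ("pruned", pruned)]:
    print(
        name,
        "nodes:", model.tree_.node_count,
        "train:", model.score(X_train, y_train),
        "test:", model.score(X_test, y_test),
    )
```

The gap between the train and test columns is the thing to watch: a large gap suggests overfitting, and pruning usually narrows it.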

27th Dec 2022, 1:22 PM

Omanshu

For the train/test split, I used train_test_split from sklearn.model_selection, after calling get_dummies. There seems to be no problem there, so I don't think any further cross-validation is needed. As for my code, please wait a moment, I have to insert it. You may need the dataset too. Could you please send me your email or some other way I can send the file to you?
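One thing worth ruling out with this workflow is the label staying in the frame that is passed to get_dummies. The sketch below uses a small synthetic DataFrame with hypothetical column names (color, size, label), where the label is random and unrelated to the features, to show how that alone can produce a perfect held-out score.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (assumption): the label is random, so no honest model
# should be able to predict it much better than chance.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "color": rng.choice(["red", "green", "blue"], size=n),
    "size": rng.choice(["S", "M", "L"], size=n),
    "label": rng.choice(["yes", "no"], size=n),
})
y = df["label"]


def held_out_accuracy(features):
    """One-hot encode, split, fit on the train part, score on the test part."""
    X = pd.get_dummies(features)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0
    )
    model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
    return model.score(X_test, y_test)


# Label column still inside the features: the tree just reads the answer.
leaky_score = held_out_accuracy(df)

# Label column dropped before encoding: the honest result.
clean_score = held_out_accuracy(df.drop(columns="label"))

print("with label leaked:", leaky_score)
print("label removed:   ", clean_score)
```

Printing the column names of the encoded frame and checking that nothing derived from the label is in there is a quick sanity check.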

2nd Sep 2017, 8:11 PM

Sura Wankam

Ah... I forgot one thing. You can download the dataset from the website I mentioned in a comment in the code. Also, when I try other datasets from sklearn.datasets, I always get a 100% accuracy score. That's why I had to ask.
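A possible explanation for seeing 100% on the built-in datasets too is that .score is being called on the same rows the tree was fitted on. This sketch, assuming the iris dataset from sklearn.datasets, compares the two calls side by side.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Scoring on the rows used for fitting: an unpruned tree memorises them.
train_score = model.score(X_train, y_train)

# Scoring on rows the tree has never seen: the number that actually matters.
test_score = model.score(X_test, y_test)

print("train accuracy:", train_score)
print("test accuracy: ", test_score)
```

An unpruned tree reaches a perfect training score whenever no two identical rows carry different labels, so only the test accuracy says anything about overfitting.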

2nd Sep 2017, 8:20 PM

Sura Wankam