+ 3

How to properly use MLPClassifier?

In the machine learning tutorial, MLPClassifier is used with randomly generated numbers, and the whole thing seems random. They skimmed over it and I don't understand. Where's the input data? Where's the output data? It isn't a neural network without them, right? You just give it random numbers, train it, and it gives back a random percentage. The random numbers didn't have an expected output, so the result seems meaningless. How do you give input data and output data to the classifier?

19th Jul 2020, 12:39 PM
Clueless Coder
8 Answers
+ 1
Clueless Coder I took scikit-learn's built-in "breast cancer" dataset and made a quick MLP example. https://code.sololearn.com/cR4156l00Nsl/?ref=app
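In case the link is unavailable, here is a minimal sketch of what such an example might look like. It is not the linked code; the split size, `max_iter`, and `random_state` values are assumptions chosen for illustration. `X` is the input data and `y` is the expected output for each row.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# X: one row of measurements per sample (the input data).
# y: the known answer for each row (0 = malignant, 1 = benign).
X, y = load_breast_cancer(return_X_y=True)

# Hold back a quarter of the rows so the model can be checked on data it has not seen.
# test_size and random_state are illustrative choices.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Train on the training rows only.
model = MLPClassifier(max_iter=1000, random_state=0)
model.fit(X_train, y_train)

# Predict labels for the held-back rows and measure the fraction that are correct.
predictions = model.predict(X_test)
accuracy = model.score(X_test, y_test)
print(accuracy)
```

The number printed is the share of test rows the model labelled correctly, so it only means something because `y_test` holds the real answers.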
19th Jul 2020, 6:38 PM
Steven M
+ 2
Steven M Sorry, I've been revisiting the tutorial and have more questions 😅. First, why do we need two sets of data, training and test (xTest, yTest, etc.)? Second, you know how we have X = [[]] and Y = []: why does X have to be a two-dimensional array while Y is one-dimensional?
19th Jul 2020, 7:55 PM
Clueless Coder
+ 2
Steven M Thanks for the answer!
19th Jul 2020, 8:31 PM
Clueless Coder
+ 1
Steven M Wow! This is awesome! You're great at this. I'm struggling to wrap my head around the training/test data for the MLPClassifier though. How do you give custom input data without using train_test_split()?
19th Jul 2020, 7:20 PM
Clueless Coder
+ 1
Clueless Coder great question. The "train_test_split()" function returns the four pieces of the split, and "tuple unpacking" makes it easy to assign X_train, X_test, y_train, y_test in one line, set test_size, etc. However, you can also set these variables manually: X_train= X_test= y_train= y_test=
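A minimal sketch of setting them by hand might look like this. The data is made up for illustration (each row is a hypothetical [hours_studied, hours_slept] pair, and the label is 1 for "passed" and 0 for "failed"), and the `solver`, `hidden_layer_sizes`, and `random_state` settings are assumptions, not values from the tutorial.

```python
from sklearn.neural_network import MLPClassifier

# Made-up training data: inputs (X_train) and the expected output for each row (y_train).
X_train = [[1, 4], [2, 5], [3, 6], [7, 7], [8, 8], [9, 6]]
y_train = [0, 0, 0, 1, 1, 1]

# Made-up test data the model never sees during training.
X_test = [[2, 4], [8, 7]]
y_test = [0, 1]

# "lbfgs" is a solver scikit-learn offers that tends to suit very small datasets.
model = MLPClassifier(
    solver="lbfgs", hidden_layer_sizes=(5,), max_iter=1000, random_state=1
)
model.fit(X_train, y_train)

predictions = model.predict(X_test)   # one predicted label per test row
score = model.score(X_test, y_test)   # fraction of test rows predicted correctly
print(predictions)
print(score)
```

With so few rows the score is not a reliable measure of anything; the point is only to show where the input and expected output go.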
19th Jul 2020, 7:42 PM
Steven M
+ 1
Clueless Coder another great question.

FIRST: Generally speaking, you would have 2 separate files, typically in csv, xls, pkl, or a similar format. You train your models on the training dataset and you predict on the testing dataset, often called the "holdout" dataset. Keeping the two separate helps you determine whether your model is working properly. If your models work great on the training dataset but fail when you give them the holdout dataset, that is typically a sign that something is wrong with the models and they need adjusting. Check out "Overfitting vs Underfitting" and "Cross Validation" for more ways to handle issues like these.

SECOND: The y variable is always your "target", the column you are predicting, for both train and test. The X variable is everything else in the dataset, minus the target. We essentially tell the models: "We want to predict y; use all of X to get the predictions."
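As a small illustration of the X/y split, here is a sketch using a made-up table where the last column is the target. It also shows where the shapes come from: X has one row per sample and one entry per feature (so it is a list of lists), while y has a single value per sample (so it is a flat list).

```python
# Hypothetical table: two feature columns followed by the target column.
rows = [
    [5.1, 3.5, 0],
    [4.9, 3.0, 0],
    [6.2, 2.9, 1],
    [5.9, 3.0, 1],
]

# X: everything except the target, one inner list per sample.
X = [row[:-1] for row in rows]

# y: only the target, one value per sample.
y = [row[-1] for row in rows]

print(X)  # [[5.1, 3.5], [4.9, 3.0], [6.2, 2.9], [5.9, 3.0]]
print(y)  # [0, 0, 1, 1]
```

X and y always have the same number of entries, because each row of X needs exactly one expected answer in y.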
19th Jul 2020, 8:28 PM
Steven M
+ 1
A great article well worth a read 👍👍 https://www.datarobot.com/wiki/training-validation-holdout/
19th Jul 2020, 8:33 PM
Steven M
+ 1
Steven M Thanks so much for the help and the article. I remade the original code, just using ints instead of strings. Success? https://code.sololearn.com/cHQJfAHKAHuW/?ref=app I don't know if it's actually predicting anything, as I'm still a bit confused about what happens behind the scenes.
19th Jul 2020, 8:36 PM
Clueless Coder