+ 2
Machine learning model debugging
I have developed a model for handwritten character recognition using PyTorch and mltu. The dataset is the IAM dataset from Kaggle. I have been trying to train the model for the past 7 days. Something goes wrong after the epochs complete, and the run is unable to finish. The logs showed some errors, which I corrected, but it still gives an error. Can anyone please help me debug this code? I have attached it below. I tried running it in both the Google Colab and Kaggle environments using a GPU. https://sololearn.com/compiler-playground/cC662yAm9rpt/?ref=app
15 Answers
0
Aysha I think I might have found the problem: the dataset from Kaggle doesn't contain any labels. On the official website of the dataset, labels should be part of the dataset, but it seems you need to register there to get the full dataset that includes them.
That could be the reason why the loss function doesn't work in your model.
+ 3
Yeah Jan, but I had to submit it for the internship. It's OK though; what can I do if it's not getting resolved? Anyway, thanks for the suggestions.
+ 2
Yeah, that's the reason I used Kaggle's Tesla P100 GPU for training on the dataset.
+ 2
Jan Even after I tried using the real dataset, the issue didn't get resolved.
+ 1
Sorry, I can't help with PyTorch and mltu. If you had made it using TensorFlow, Keras or scikit-learn, I could help.
But you can seek help from ChatGPT. Also, Google Colab's AI co-pilot points out the cause of bugs and exceptions.
+ 1
Yeah it's OK Captain Thunder ⚡ I had even tried TensorFlow, Keras and scikit-learn. The problem there was with the CTC loss: with a lambda, the model failed to load, and when I changed it to work without the lambda, the loss failed to decrease. ChatGPT also couldn't help in this scenario. So I tried PyTorch, which is pretty good at training and reducing the loss, but at the end, without any warning, it shows a sad emoji and is still running in the background. After a thorough check, model2onnx seemed to be responsible, but even after removing it and using pure PyTorch to save the model, it still didn't work in either Kaggle or Google Colab. Thanks for your suggestion; I will try to resolve the error.
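For anyone following along, saving with pure PyTorch is usually done through the model's `state_dict`. The snippet below is only a minimal sketch of that pattern: the tiny `nn.Sequential` network and the file name `model_weights.pt` are placeholders chosen for illustration, not the model from the linked code.

```python
import os
import tempfile

import torch
import torch.nn as nn


def build_model():
    # Hypothetical stand-in architecture; the real recognition model is not shown in the thread.
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))


model = build_model()
model.eval()

# Save only the learned parameters, not the whole Python object.
path = os.path.join(tempfile.mkdtemp(), "model_weights.pt")
torch.save(model.state_dict(), path)

# To restore, rebuild the same architecture and load the parameters into it.
restored = build_model()
restored.load_state_dict(torch.load(path, map_location="cpu"))
restored.eval()

# Both models should now produce identical outputs for the same input.
x = torch.randn(3, 4)
with torch.no_grad():
    outputs_match = torch.equal(model(x), restored(x))
```

If saving like this works on a small model but the notebook still hangs at the end of training, the problem is more likely in a callback or export step than in the save itself.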
+ 1
Yes Jan, in my model training I have used both PyTorch and mltu. mltu works much like TensorFlow and makes it easy to train the model together with PyTorch. I even used plain PyTorch, but again I got stuck on the CTC loss. I don't know what's wrong with the way I am using CTC. I could not find a built-in CTC loss in PyTorch, and my custom one gives a shape incompatibility.
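For reference, PyTorch does ship a CTC loss as `torch.nn.CTCLoss`, and shape errors with it often come down to the expected tensor layout. The sketch below uses random tensors and made-up sizes (`T`, `N`, `C`, `S` are illustrative assumptions, not values from the linked code) just to show which shapes the loss expects.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes only: time steps, batch size, number of classes (including blank), max label length.
T, N, C, S = 50, 4, 20, 10

# Stand-in for the network output. CTCLoss expects log-probabilities shaped (T, N, C).
logits = torch.randn(T, N, C, requires_grad=True)
log_probs = logits.log_softmax(dim=2)

# Padded integer labels shaped (N, S). Index 0 is reserved for the blank symbol here.
targets = torch.randint(low=1, high=C, size=(N, S), dtype=torch.long)

# Per-sample lengths: how many time steps and how many label characters are valid.
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.randint(low=5, high=S + 1, size=(N,), dtype=torch.long)

ctc = nn.CTCLoss(blank=0, zero_infinity=True)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()
```

A common cause of shape mismatches is a model that outputs `(N, T, C)`; in that case a `permute(1, 0, 2)` before the loss is typically needed.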
+ 1
Thank you Jan, I will surely download the dataset from the official website and try to train the model. Yeah, the labels are TXT files, but each line holds fields such as the file name, status and more, so it doesn't need NLP. I will let you know once my model is successfully trained.
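Reading labels out of such a text file needs only plain string handling. The sketch below assumes, purely for illustration, that each non-comment line starts with a sample id, then a status field, and ends with the transcription; the sample lines are invented, so check the header of the real label file before relying on this layout.

```python
def parse_label_line(line):
    """Return (sample_id, transcription), or None for lines that should be skipped."""
    line = line.strip()
    # Skip blank lines and comment lines (assumed to start with '#').
    if not line or line.startswith("#"):
        return None
    fields = line.split(" ")
    if len(fields) < 3:
        return None
    sample_id, status, transcription = fields[0], fields[1], fields[-1]
    # Assumption: only entries with status "ok" are kept for training.
    if status != "ok":
        return None
    return sample_id, transcription


# Invented example lines in the assumed layout, not real dataset entries.
sample_lines = [
    "# comment line",
    "sample-000-00 ok 154 408 768 27 51 AT Hello",
    "sample-000-01 err 154 507 766 213 48 NN world",
]

pairs = [p for p in map(parse_label_line, sample_lines) if p is not None]
```

The resulting `(sample_id, transcription)` pairs can then be matched to image paths and encoded as integer label sequences for the loss.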
0
I had a look at that dataset from Kaggle. Are you actually aware how huge those images are and how many features every image contains? You need a supercomputer and a bunch of super GPUs!
0
As far as I can see, you are training the model the same way as in TensorFlow, with model.fit. Why are you using that method, when PyTorch normally uses model.train?
0
Aysha That's because in PyTorch you have to handle the loss and the training loop manually, and that can be a little bit tricky.
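To make the manual part concrete: plain PyTorch has no `model.fit`, and `model.train()` only switches the model into training mode, so the loop itself is written by hand. Below is a minimal sketch on a toy regression problem; the linear model, the random data and the hyperparameters are placeholders, not the handwriting model from the question.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy data: the target is the sum of the 8 input features.
x = torch.randn(64, 8)
y = x.sum(dim=1, keepdim=True)

model = nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

losses = []
model.train()  # sets training mode (affects dropout/batch norm); it does not run any training
for epoch in range(20):
    optimizer.zero_grad()      # clear gradients left over from the previous step
    prediction = model(x)      # forward pass
    loss = loss_fn(prediction, y)
    loss.backward()            # compute gradients
    optimizer.step()           # update the weights
    losses.append(loss.item())
```

Wrappers such as mltu hide this loop behind a `fit`-style call, which is convenient but makes it harder to see where a shape error or a hang actually happens.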
0
Jan Yeah, thanks for the clarification. I tried to do it but failed because of the shape incompatibility. I guess there are other issues which I am not able to figure out. The error message says to alter built-in functions, which is not feasible.
0
Aysha It also seems you should use an NLP model if the labels are TXT files.
0
Aysha I look forward to that💪👍
0
Aysha That's okay, because it's also a heavy task with those images.