
Machine learning model debugging

I have developed a model for handwritten character recognition using PyTorch and mltu. The dataset is Kaggle's IAM dataset. I have been trying to train the model for the past 7 days, but something fails after the epochs finish and training never completes. I fixed the errors that showed up in the logs, but it still gives an error. Can anyone please help me debug this code? I have linked it below. I tried running it in both Google Colab and Kaggle with a GPU. https://sololearn.com/compiler-playground/cC662yAm9rpt/?ref=app

25th Nov 2024, 6:30 PM
Aysha
Aysha, I think I might have found the problem: the dataset from Kaggle doesn't contain any labels. The official website of the dataset should provide labels as part of the dataset, but it seems you need to register there to get the full dataset including the labels. That could be why the loss function doesn't work in your model.
27th Nov 2024, 10:09 AM
Jan
Yeah Jan, but I had to submit it for the internship. It's OK; there's nothing I can do if it doesn't get resolved. Anyway, thanks for the suggestions.
30th Nov 2024, 6:51 PM
Aysha
Yeah, that's the reason I used Kaggle's Tesla P100 GPU to train on the dataset.
26th Nov 2024, 2:06 PM
Aysha
Jan, even when I used the real dataset, the issue didn't get resolved.
30th Nov 2024, 4:57 PM
Aysha
Sorry, I can't help with PyTorch and mltu. If you had built it with TensorFlow, Keras or Scikit-Learn, I could help. But you can seek help from ChatGPT. Also, Google Colab's AI co-pilot points out bugs and exceptions.
27th Nov 2024, 4:40 AM
Captain Thunder ⚡
Yeah, it's OK, Captain Thunder ⚡. I had also used TensorFlow, Keras and Scikit-Learn. There, the problem was the CTC loss implemented with a Lambda layer, which made the saved model fail to load. I changed it to work without the Lambda, but then the loss wouldn't decrease. ChatGPT couldn't help in this scenario either. So I tried PyTorch, which is pretty good at training and reducing the loss, but at the end, without any warning, it shows a sad-face emoji while still running in the background. After a thorough check, I found that model2onnx was responsible. I removed it and saved the model with pure PyTorch, but it still didn't work in either Kaggle or Google Colab. Thanks for your suggestion; I will try to resolve the error.
27th Nov 2024, 5:07 AM
Aysha
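For the saving step described above, a common pure-PyTorch pattern is to save only the model's `state_dict` (its parameter tensors) rather than the whole pickled module or an ONNX export. A minimal sketch follows; `TinyRecognizer`, `save_weights` and `load_weights` are hypothetical names, and because the real model from the linked code isn't shown, the tiny network only stands in for it.

```python
import os
import tempfile

import torch
import torch.nn as nn


class TinyRecognizer(nn.Module):
    """Hypothetical stand-in for the thread's model (the real one isn't shown)."""

    def __init__(self, num_classes: int = 80):
        super().__init__()
        self.conv = nn.Conv2d(1, 16, kernel_size=3, padding=1)
        self.fc = nn.Linear(16, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = torch.relu(self.conv(x))        # (N, 16, H, W)
        x = x.mean(dim=2).permute(0, 2, 1)  # (N, W, 16): one time step per column
        return self.fc(x)                   # (N, W, num_classes)


def save_weights(model: nn.Module, path: str) -> None:
    # Store only the parameter tensors, not the whole pickled module.
    torch.save(model.state_dict(), path)


def load_weights(path: str, num_classes: int = 80) -> nn.Module:
    # Rebuild the architecture in code, then copy the saved tensors into it.
    model = TinyRecognizer(num_classes)
    model.load_state_dict(torch.load(path, map_location="cpu"))
    model.eval()  # switch to inference mode before predicting
    return model


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "recognizer_weights.pt")
        save_weights(TinyRecognizer(), path)
        restored = load_weights(path)
        print(restored(torch.randn(1, 1, 32, 128)).shape)  # torch.Size([1, 128, 80])
```

Because only tensors are stored, loading needs the same class definition and constructor arguments that were used when saving.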
Yes Jan, in my model training I have used both PyTorch and mltu. mltu provides functions similar to TensorFlow's, which makes it easy to train the model together with PyTorch. I even tried plain PyTorch, but again I was stuck with the CTC loss. I don't know what's wrong with the way I'm using CTC. I couldn't find a built-in CTC in PyTorch, and the custom one I created gives shape incompatibility errors.
27th Nov 2024, 6:59 AM
Aysha
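PyTorch does include a built-in CTC loss, `torch.nn.CTCLoss`, and many shape errors with it come from its input layout: log-probabilities must be time-first `(T, N, C)`, targets must not contain the blank index, and each label must fit in the `T` output steps (repeated characters need an extra blank between them). A minimal sketch for a batch-first model output; `ctc_loss_from_logits` is a hypothetical helper name, and blank index 0 is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def ctc_loss_from_logits(logits, targets, target_lengths, blank=0):
    """CTC loss for batch-first logits.

    logits:         (N, T, C) raw model scores, C includes the blank class
    targets:        (N, S) padded label indices in 1..C-1 (blank is 0 here)
    target_lengths: (N,) true label length of each sample
    """
    n, t, _ = logits.shape
    if int(target_lengths.max()) > t:
        raise ValueError("a label is longer than the number of output time steps")
    # CTCLoss wants log-probabilities shaped (T, N, C): time first.
    log_probs = F.log_softmax(logits, dim=2).permute(1, 0, 2)
    # Every sample uses all T output steps (no padding on the input side).
    input_lengths = torch.full((n,), t, dtype=torch.long)
    ctc = nn.CTCLoss(blank=blank, zero_infinity=True)
    return ctc(log_probs, targets, input_lengths, target_lengths)


if __name__ == "__main__":
    logits = torch.randn(4, 32, 80, requires_grad=True)    # N=4, T=32, C=80
    targets = torch.randint(1, 80, (4, 10), dtype=torch.long)
    target_lengths = torch.tensor([10, 7, 5, 9], dtype=torch.long)
    loss = ctc_loss_from_logits(logits, targets, target_lengths)
    loss.backward()
    print(loss.item())
```

`zero_infinity=True` replaces impossible alignments (infinite losses) with zero instead of letting them turn the whole batch's gradient into NaN.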
Thank you Jan, I will surely download the dataset from the official website and try to train the model. Yes, the labels are TXT files, but each line holds a file name, a status and several other fields, so it doesn't need NLP. I will let you know once my model is successfully trained.
27th Nov 2024, 6:21 PM
Aysha
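Label files laid out as described above can be read with plain string splitting, no NLP model needed. A minimal sketch, assuming the IAM `words.txt` conventions: lines starting with `#` are comments, the first field is the word id, the second is the segmentation status (`ok` or `er`), and the last is the transcription; the fields in between are ignored. The folder layout in the hypothetical `word_image_path` helper (`a01/a01-000u/a01-000u-00-00.png`) is also an assumption, so check it against your copy of the dataset.

```python
import os
from typing import Iterable, Iterator, Tuple


def parse_iam_words(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (word_id, transcription) pairs from words.txt-style lines."""
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the commented header
        parts = line.split(" ")
        if len(parts) < 3 or parts[1] != "ok":
            continue  # skip malformed lines and words flagged as badly segmented
        yield parts[0], parts[-1]  # first field: id, last field: transcription


def word_image_path(root: str, word_id: str) -> str:
    """Map 'a01-000u-00-00' to root/a01/a01-000u/a01-000u-00-00.png (assumed layout)."""
    parts = word_id.split("-")
    top = parts[0]                   # e.g. "a01"
    form = f"{parts[0]}-{parts[1]}"  # e.g. "a01-000u"
    return os.path.join(root, top, form, f"{word_id}.png")


if __name__ == "__main__":
    sample = [
        "#--- header comment",
        "a01-000u-00-00 ok 154 408 768 27 51 AT A",  # illustrative data line
    ]
    for word_id, text in parse_iam_words(sample):
        print(word_image_path("words", word_id), text)
```

Passing an open file object instead of the sample list works the same way, since iterating a file yields its lines.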
I had a look at that dataset on Kaggle. Are you aware how huge those images are and how many features each image contains? You need a supercomputer and a bunch of super GPUs!
26th Nov 2024, 1:29 PM
Jan
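If the images are much larger than the model needs, one way to cut the per-image feature count is to convert them to grayscale and downscale them to a fixed size before training. A minimal sketch with `torch.nn.functional.interpolate`; the 32x128 target size and the `shrink_batch` name are assumptions for illustration, not values from the thread.

```python
import torch
import torch.nn.functional as F


def shrink_batch(images: torch.Tensor, height: int = 32, width: int = 128) -> torch.Tensor:
    """Downscale a float batch (N, C, H, W); RGB input becomes one grayscale channel."""
    if images.shape[1] == 3:
        # Average the RGB channels into a single grayscale channel.
        images = images.mean(dim=1, keepdim=True)
    # Bilinear resize of every image in the batch to the target size.
    return F.interpolate(images, size=(height, width), mode="bilinear", align_corners=False)


if __name__ == "__main__":
    big = torch.rand(2, 3, 400, 1600)  # hypothetical large RGB scans
    small = shrink_batch(big)
    print(big[0].numel(), "->", small[0].numel())  # 1920000 -> 4096
```

A fixed width stretches or squeezes words of different lengths; padding each image to keep its aspect ratio is a common alternative.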
As far as I can see, you are training the model the same way as in TensorFlow, with model.fit. Why are you using that method, since PyTorch normally uses model.train?
27th Nov 2024, 6:52 AM
Jan
Aysha, that's because in PyTorch you have to compute the loss and update the weights yourself, and that can be a little bit tricky.
27th Nov 2024, 8:13 AM
Jan
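For the manual approach described above: `model.train()` on its own only puts the model into training mode; the actual training is a loop you write yourself. A minimal sketch of one epoch with `torch.nn.CTCLoss`, assuming a loader that yields `(images, targets, target_lengths)` batches and a model that returns `(N, T, C)` logits; `train_one_epoch` is a hypothetical helper, not part of mltu.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def train_one_epoch(model, loader, optimizer, device="cpu", blank=0):
    """Run one epoch of a hand-written training loop with CTC loss.

    Assumes `loader` yields (images, targets, target_lengths) batches and that
    `model` (already moved to `device`) returns batch-first logits (N, T, C).
    """
    ctc = nn.CTCLoss(blank=blank, zero_infinity=True)
    model.train()  # only switches layers such as dropout to training behaviour
    total, batches = 0.0, 0
    for images, targets, target_lengths in loader:
        logits = model(images.to(device))                          # (N, T, C)
        log_probs = F.log_softmax(logits, dim=2).permute(1, 0, 2)  # (T, N, C)
        input_lengths = torch.full((logits.size(0),), logits.size(1), dtype=torch.long)
        loss = ctc(log_probs, targets, input_lengths, target_lengths)

        optimizer.zero_grad()  # clear gradients left from the previous step
        loss.backward()        # backpropagate the CTC loss
        optimizer.step()       # update the weights
        total += loss.item()
        batches += 1
    return total / max(batches, 1)  # mean loss over the epoch
```

Calling it once per epoch, for example `for epoch in range(num_epochs): print(train_one_epoch(model, train_loader, optimizer, device))`, takes the place of a `model.fit` call.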
Jan, yeah, thanks for the clarification. I tried to do it but failed with the shape incompatibility. I guess there are other issues I'm not able to figure out. The error says to alter built-in functions, which is not feasible.
27th Nov 2024, 8:53 AM
Aysha
Aysha, it also seems you should use an NLP model if the labels are TXT files.
27th Nov 2024, 10:19 AM
Jan
Aysha, I look forward to that 💪👍
27th Nov 2024, 7:33 PM
Jan
Aysha, that's okay, because it's a heavy task with those images anyway.
30th Nov 2024, 6:05 PM
Jan