I have images of handwritten lines and I need to recognize the text in those images. For that, I am using 4 CNN layers followed by 2 Bi-LSTM layers, trained with the CTC loss function.
I am using MXNet Gluon to do this.
I am doing word embedding on the labels using `mxnet.contrib.text.embedding` with pretrained fastText word embeddings. For each label I get a vector of shape (n, 300), where n is the number of words in that label (line) and 300 is the embedding dimension. I then pad each vector to a fixed shape (seq_len, 300) with seq_len = 100, so every label becomes a (100, 300) array.
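To make the padding step concrete, here is a minimal sketch of what I am doing (plain NumPy; the function name and random input are my own, standing in for the (n, 300) matrix returned by the fastText lookup):

```python
import numpy as np

SEQ_LEN = 100   # fixed label length after padding
EMB_DIM = 300   # fastText embedding dimension

def pad_label_embeddings(label_vecs):
    """Zero-pad an (n, 300) embedding matrix to (SEQ_LEN, 300)."""
    padded = np.zeros((SEQ_LEN, EMB_DIM), dtype=np.float32)
    n = min(label_vecs.shape[0], SEQ_LEN)
    padded[:n] = label_vecs[:n]
    return padded

# e.g. a 7-word line -> (7, 300) from the embedding lookup
vecs = np.random.rand(7, EMB_DIM).astype(np.float32)
print(pad_label_embeddings(vecs).shape)  # (100, 300)
```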
But when I feed the labels to the model for training, I get an error saying "label array must be of rank 2 but got 3". When I flattened the labels, I got another error: "number of labels should be <= sequence length".
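For reference, here is a minimal NumPy sketch (the names and vocabulary mapping are my own, for illustration) of the shape mismatch I think the first error is describing: a batch of my padded embedded labels is rank 3, whereas a rank-2 label array would hold one integer index per word position rather than a 300-d vector:

```python
import numpy as np

batch_size, seq_len, emb_dim = 32, 100, 300

# What I am currently feeding: one 300-d vector per word -> rank 3
embedded_labels = np.zeros((batch_size, seq_len, emb_dim), dtype=np.float32)
print(embedded_labels.ndim)  # 3 -> "label array must be of rank 2 but got 3"

# What a rank-2 label array would look like: one integer index per
# word position (hypothetical vocabulary mapping), padded with -1
vocab = {"the": 0, "quick": 1, "brown": 2, "fox": 3}
words = ["the", "quick", "brown", "fox"]
int_labels = np.full((1, seq_len), -1, dtype=np.int32)
int_labels[0, :len(words)] = [vocab[w] for w in words]
print(int_labels.shape)  # (1, 100) -> rank 2
```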
Is my approach correct? Please help me solve this issue.
Please find attached a screenshot of the code I used to create the word embeddings for the labels.