I implemented a CNN-LSTM model for text recognition in images. A CNN extracts image features, and those features are passed to an LSTM layer. When I train the model on images of the same size (128, 1600), it works well. But when I try to train it on images of different sizes, I get the following error:
AssertionError: Expected shape (800, 4000) is incompatible with given shape (800, 16384).
The error is raised at the LSTM. For an image of size (128, 1600), the CNN output has shape (Batch_size, 32, 64, 800). I flatten this to (Batch_size, 1638400) and make 100 (sequence_length) splits along axis 1. The resulting ndarray of shape (100, Batch_size, 16384) is sent to the LSTM.
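To make the shape arithmetic above concrete, here is a minimal sketch in plain Python. The helper name `lstm_step_features` and the second CNN output width (400) are assumptions for illustration only, not values taken from my model.

```python
from math import prod


def lstm_step_features(cnn_shape, seq_len):
    """Per-time-step feature size after flattening a CNN output and
    splitting it into seq_len equal chunks along axis 1.

    cnn_shape is the per-sample CNN output shape (batch axis excluded).
    """
    flat = prod(cnn_shape)  # size of axis 1 after flattening
    if flat % seq_len != 0:
        raise ValueError(f"{flat} cannot be split into {seq_len} equal parts")
    return flat // seq_len


# CNN output shape reported above for a (128, 1600) image.
fixed = lstm_step_features((32, 64, 800), 100)

# Hypothetical CNN output for a narrower image (width 400 is an assumption).
narrow = lstm_step_features((32, 64, 400), 100)

print(fixed, narrow)
```

Because the number of splits is fixed at 100, the per-step feature size follows the image size, and that is the dimension the LSTM input weights depend on.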
The LSTM weights are initialized in the first forward pass. When the first image is of size (128, 1600), the weights are initialized with shape (800, 16384), so when I then feed an image of a different size, I get the above error.
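The behaviour described above can be imitated with a toy class. This is only a sketch of the mechanism as I understand it, not the framework's actual code: the class `LazyInputWeights` and its error text are made up, and only the numbers 800, 16384 and 4000 come from my run.

```python
class LazyInputWeights:
    """Toy stand-in for a layer whose input weights are shaped on first use."""

    def __init__(self, rows):
        self.rows = rows    # 800 in my case
        self.shape = None   # unknown until the first forward pass

    def forward(self, step_features):
        given = (self.rows, step_features)
        if self.shape is None:
            self.shape = given  # the first pass fixes the weight shape
        elif self.shape != given:
            raise AssertionError(
                f"weights were initialized as {self.shape}, "
                f"but this input needs {given}"
            )
        return self.shape


layer = LazyInputWeights(rows=800)
layer.forward(16384)  # first image, (128, 1600): shape becomes (800, 16384)

try:
    layer.forward(4000)  # an image of a different size gives a different feature size
    mismatch = None
except AssertionError as err:
    mismatch = str(err)

print(mismatch)
```

Once the first image has fixed the second dimension, any later input with a different per-step feature size fails the shape check.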
Here 800 is: 2 (bidirectional) * 2 (Num LSTM layers) * 200 (LSTM Hidden Units)
How can I resolve this issue and make the LSTM handle images of different sizes?
Any suggestions will be helpful.
Thanks in advance.