Encoder-Decoder Architecture



First of all, I'm a total beginner at this. I am working on a project, and I will explain what I want to do with the LSTM architecture. I would appreciate some help with it.

As mentioned in the architecture description, the LSTM can take two inputs: the output vector that comes from the encoder, and an additional input. How can I provide this optional input so that it conditions the output?

  1. It's an image captioning project in which the encoder takes the image and encodes it into a vector.
  2. I want the LSTM (decoder) to take the image vector plus another vector (maybe one-hot encoded) and output the caption based on the additional vector. In other words, I want to condition the network.
    Example: if I pass image + [0 1], I should get a romantic caption. If I pass image + [1 0], I should get a caption in another style (say, witty) as output…
    How can I do that using an encoder-decoder architecture?
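
To make the setup above concrete, here is a minimal sketch of one common way to condition an LSTM decoder. It assumes PyTorch, which the post does not specify. It concatenates the image vector with the style one-hot to build the LSTM's initial state, and also appends the style vector to every word embedding. The class name `StyleConditionedDecoder`, all the dimensions, and the random stand-in tensors are hypothetical choices for illustration, not part of the original project.

```python
import torch
import torch.nn as nn


class StyleConditionedDecoder(nn.Module):
    """Hypothetical LSTM caption decoder conditioned on image vector + style one-hot."""

    def __init__(self, vocab_size, image_dim=256, num_styles=2,
                 embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Project [image vector ; style one-hot] to the LSTM's initial state.
        self.init_h = nn.Linear(image_dim + num_styles, hidden_dim)
        self.init_c = nn.Linear(image_dim + num_styles, hidden_dim)
        # The style vector is also appended to every word embedding so the
        # condition stays visible at each time step.
        self.lstm = nn.LSTM(embed_dim + num_styles, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, image_vec, style, captions):
        # image_vec: (B, image_dim), style: (B, num_styles), captions: (B, T) token ids
        cond = torch.cat([image_vec, style], dim=1)
        h0 = torch.tanh(self.init_h(cond)).unsqueeze(0)   # (1, B, hidden_dim)
        c0 = torch.tanh(self.init_c(cond)).unsqueeze(0)   # (1, B, hidden_dim)
        words = self.embed(captions)                       # (B, T, embed_dim)
        # Repeat the style vector along the time axis: (B, T, num_styles).
        style_steps = style.unsqueeze(1).expand(-1, words.size(1), -1)
        lstm_in = torch.cat([words, style_steps], dim=2)
        hidden, _ = self.lstm(lstm_in, (h0, c0))
        return self.out(hidden)                            # (B, T, vocab_size)


decoder = StyleConditionedDecoder(vocab_size=1000)
image_vec = torch.randn(4, 256)                     # stand-in for the encoder output
romantic = torch.tensor([[0.0, 1.0]]).repeat(4, 1)  # style code [0 1]
witty = torch.tensor([[1.0, 0.0]]).repeat(4, 1)     # style code [1 0]
captions = torch.randint(0, 1000, (4, 12))          # teacher-forcing input tokens

logits_romantic = decoder(image_vec, romantic, captions)
logits_witty = decoder(image_vec, witty, captions)
```

The same image with a different style code gives different word scores, which is the conditioning described in the example. For the network to actually learn distinct styles, it would presumably need training captions labeled with their style, and at inference time the caption would be generated one token at a time instead of being passed in whole.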