I am trying to fine-tune a pre-trained awd_lstm_lm_1150
language model on a dataset of my own, building on this tutorial.
Here is what the original model looks like when loaded from the gluonnlp
model zoo:
import gluonnlp as nlp

dataset_name = 'wikitext-2'
awd_model_name = 'awd_lstm_lm_1150'
awd_model, voc = nlp.model.get_model(
    awd_model_name,
    vocab=None,  # the vocab shipped with the wikitext-2 weights is returned as voc
    dataset_name=dataset_name,
    pretrained=True)
print(awd_model)
print(voc)
>>> AWDRNN(
  (embedding): HybridSequential(
    (0): Embedding(33278 -> 400, float32)
    (1): Dropout(p = 0.65, axes=(0,))
  )
  (encoder): Sequential(
    (0): LSTM(400 -> 1150, TNC)
    (1): LSTM(1150 -> 1150, TNC)
    (2): LSTM(1150 -> 400, TNC)
  )
  (decoder): HybridSequential(
    (0): Dense(400 -> 33278, linear)
  )
)
>>> Vocab(size=33278, unk="<unk>", reserved="['<eos>']")
Since my own vocab has a length of 1031, I'd like to either:

1. edit the last Dense layer so it outputs 1031 classes, or
2. add an additional Dense layer on top, like the following: nn.Dense(in_units=33278, units=1031)
I cannot seem to figure out how to achieve #1.
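The closest I've come is overwriting the decoder attribute with a freshly initialized layer, on the assumption that Gluon re-registers a child block when you assign over it. A rough sketch of what I mean (the sizes are read off the printout above, mx.cpu() is just my context, and note the pretrained decoder ties its weights to the embedding, so the replacement would be untied):

import mxnet as mx
from mxnet.gluon import nn

# fresh decoder: 400 (last LSTM layer) -> 1031 (my vocab size);
# flatten=False keeps the per-time-step layout of the original decoder
new_decoder = nn.HybridSequential()
new_decoder.add(nn.Dense(1031, in_units=400, flatten=False))
new_decoder.initialize(mx.init.Xavier(), ctx=mx.cpu())

# assigning over the existing child attribute should swap out
# the pretrained 400 -> 33278 decoder for the new one
awd_model.decoder = new_decoder

I have no idea whether this is the idiomatic way to do it, though.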
As for #2 (the less optimal option by definition), I declared a Sequential
model, like this:
net = nn.Sequential()
net.add(awd_model)
net.add(nn.Dense(in_units=33278, units=1031))
but then the training process errors out, because the Sequential
object lacks several attributes that the original gluonnlp.model.language_model.AWDRNN
object featured.
For instance, running the train
function from the tutorial linked above, I get:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-31-4b4b6bfe8cc9> in <module>()
----> 1 train(net, farewell_train_data, epochs, lr=0.01)
<ipython-input-29-7991a5aa7704> in train(model, train_data, epochs, lr)
22 start_log_interval_time = time.time()
23 hiddens = [model.begin_state(batch_size//len(context), func=mx.nd.zeros, ctx=ctx)
---> 24 for ctx in context]
25 for i, (data, target) in enumerate(train_data):
26 data_list = gluon.utils.split_and_load(data, context,
<ipython-input-29-7991a5aa7704> in <listcomp>(.0)
22 start_log_interval_time = time.time()
23 hiddens = [model.begin_state(batch_size//len(context), func=mx.nd.zeros, ctx=ctx)
---> 24 for ctx in context]
25 for i, (data, target) in enumerate(train_data):
26 data_list = gluon.utils.split_and_load(data, context,
AttributeError: 'Sequential' object has no attribute 'begin_state'
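I suppose I could work around this by wrapping everything in a custom Block that delegates begin_state to the inner AWDRNN. A rough sketch of what I have in mind (the class name and the extra head layer are my own invention, not anything from gluonnlp):

import mxnet as mx
from mxnet import gluon
from mxnet.gluon import nn

class AWDWithHead(gluon.Block):
    """Hypothetical wrapper: run the pretrained AWDRNN, then project
    its 33278-way output down to my 1031-class vocab."""
    def __init__(self, awd, **kwargs):
        super(AWDWithHead, self).__init__(**kwargs)
        with self.name_scope():
            self.awd = awd
            self.head = nn.Dense(1031, in_units=33278, flatten=False)

    def forward(self, inputs, hidden):
        output, hidden = self.awd(inputs, hidden)
        return self.head(output), hidden

    def begin_state(self, *args, **kwargs):
        # delegate, so the tutorial's train() finds begin_state
        return self.awd.begin_state(*args, **kwargs)

net = AWDWithHead(awd_model)
net.head.initialize(mx.init.Xavier(), ctx=mx.cpu())

But that feels clunky, and I'd still prefer #1 if it's doable.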
Does anyone have a clue how to do either of these properly?