I’m following the training tutorial here: https://gluon-nlp.mxnet.io/examples/word_embedding/word_embedding_training.html
But using my own custom dataset (space-separated and cleaned).
The error I am getting is:
Beginnign epoch 1 and resampling data.
---------------------------------------------------------------------------
TypingError Traceback (most recent call last)
in
----> 1 train_embedding(num_epochs=5)
<ipython-input-37-98c218d225ec> in train_embedding(num_epochs)
8
9 print('Beginnign epoch %d and resampling data.' % epoch)
---> 10 for i, batch in enumerate(batches):
11 batch = [array.as_in_context(context) for array in batch]
12 with mx.autograd.record():
~/anaconda3/envs/mxnet/lib/python3.7/site-packages/gluonnlp/data/stream.py in _closure()
120 istuple = isinstance(item, tuple)
121 if istuple:
--> 122 yield self._fn(*item)
123 while True:
124 try:
/Volumes/archive/deardenlab/guhlin/kmer_vec_embed/data.py in cbow_fasttext_batch(centers, contexts, num_tokens, subword_lookup, dtype, index_dtype)
324 """Create a batch for CBOW training objective with subwords."""
325 _, contexts_row, contexts_col = contexts
--> 326 data, row, col = subword_lookup(contexts_row, contexts_col)
327 centers = mx.nd.array(centers, dtype=index_dtype)
328 contexts = mx.nd.sparse.csr_matrix(
~/anaconda3/envs/mxnet/lib/python3.7/site-packages/numba/dispatcher.py in _compile_for_args(self, *args, **kws)
374 e.patch_message(msg)
375
--> 376 error_rewrite(e, 'typing')
377 except errors.UnsupportedError as e:
378 # Something unsupported is present in the user code, add help info
~/anaconda3/envs/mxnet/lib/python3.7/site-packages/numba/dispatcher.py in error_rewrite(e, issue_type)
341 raise e
342 else:
--> 343 reraise(type(e), e, None)
344
345 argtypes = []
~/anaconda3/envs/mxnet/lib/python3.7/site-packages/numba/six.py in reraise(tp, value, tb)
656 value = tp()
657 if value.__traceback__ is not tb:
--> 658 raise value.with_traceback(tb)
659 raise value
660
TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Invalid use of Function(<built-in function getitem>) with argument(s) of type(s): (array(int64, 1d, C), float64)
* parameterized
In definition 0:
All templates rejected with literals.
In definition 1:
All templates rejected without literals.
In definition 2:
All templates rejected with literals.
In definition 3:
All templates rejected without literals.
In definition 4:
All templates rejected with literals.
In definition 5:
All templates rejected without literals.
In definition 6:
All templates rejected with literals.
In definition 7:
All templates rejected without literals.
In definition 8:
All templates rejected with literals.
In definition 9:
All templates rejected without literals.
In definition 10:
TypeError: unsupported array index type float64 in [float64]
raised from /Volumes/userdata/staff_users/josephguhlin/anaconda3/envs/mxnet/lib/python3.7/site-packages/numba/typing/arraydecl.py:71
In definition 11:
TypeError: unsupported array index type float64 in [float64]
raised from /Volumes/userdata/staff_users/josephguhlin/anaconda3/envs/mxnet/lib/python3.7/site-packages/numba/typing/arraydecl.py:71
This error is usually caused by passing an argument of a type that is unsupported by the named function.
[1] During: typing of intrinsic-call at /Volumes/archive/deardenlab/guhlin/kmer_vec_embed/data.py (481)
File "data.py", line 481:
def cbow_lookup(context_row, context_col, subwordidxs, subwordidxsptr,
<source elided>
for i, idx in enumerate(context_col):
start = subwordidxsptr[idx]
^
This is not usually a problem with Numba itself but instead often caused by
the use of unsupported features or an issue in resolving types.
To see Python/NumPy features supported by the latest release of Numba visit:
http://numba.pydata.org/numba-doc/latest/reference/pysupported.html
and
http://numba.pydata.org/numba-doc/latest/reference/numpysupported.html
For more information about typing errors and how to debug them visit:
http://numba.pydata.org/numba-doc/latest/user/troubleshoot.html#my-code-doesn-t-compile
If you think your code should work with Numba, please report the error message
and traceback, along with a minimal reproducer at:
https://github.com/numba/numba/issues/new
(The “Beginnign” typo comes from the tutorial.)
My vocab size is 28,852,611 and I’m only looking for subwords of size 7 or 9 (or both), though there is still a decent number of subwords (3,065,880).
The same code works with the Text8 dataset, so I’m not sure where mine diverges and goes wrong, since much of it is copied from the tutorial.
Thanks,
–Joseph
The code I’m using to prepare the dataset is:
dataset = nlp.data.CorpusDataset("out.ftinput")
counter = nlp.data.count_tokens(itertools.chain.from_iterable(dataset))
vocab = nlp.Vocab(counter, unknown_token=None, padding_token=None,
                  bos_token=None, eos_token=None, min_freq=5)
idx_to_counts = [counter[w] for w in vocab.idx_to_token]

def code(sentence):
    return [vocab[token] for token in sentence if token in vocab]

dataset_t = dataset.transform(code)
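In case it matters, this is the kind of dtype check I’ve been adding while debugging. `as_int64` is a hypothetical helper of my own, and the commented-out line just mirrors the `subword_lookup` call from the tutorial’s `data.py`:

```python
import numpy as np

def as_int64(arr):
    """Cast an index array to int64 so Numba's typed indexing accepts it."""
    arr = np.asarray(arr)
    if arr.dtype != np.int64:
        arr = arr.astype(np.int64)
    return arr

# Hypothetical usage before calling the jitted lookup:
# data, row, col = subword_lookup(as_int64(contexts_row),
#                                 as_int64(contexts_col))

row = as_int64([0.0, 1.0, 2.0])
print(row.dtype)  # int64
```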