I’m trying to train a Persian G2P (grapheme-to-phoneme) model.
I have a CSV file containing the data, where each row looks like word,pos,phoneset.
I read it with Python’s csv module and try to encode it as a one-hot representation,
but I get the error “ValueError: Setting an array element with a sequence”.
I know my arrays have different sizes, but I’m not sure how to fix the error.
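To narrow it down, here is a minimal sketch that I believe shows the same kind of failure with plain numpy. The one_hot and try_stack helpers and the index values are made up for illustration and are not part of my real code; the only assumption is that two words of different lengths give one-hot matrices with different numbers of rows:

```python
import numpy as np


def one_hot(indices, depth):
    """One-hot encode a list of integer indices into a (len, depth) array."""
    out = np.zeros((len(indices), depth), dtype=np.float32)
    out[np.arange(len(indices)), indices] = 1.0
    return out


def try_stack(arrays):
    """Try to build a single float32 array; return the ValueError, or None."""
    try:
        np.array(arrays, dtype=np.float32)
    except ValueError as err:
        return err
    return None


# Two hypothetical words of different lengths (the indices are arbitrary).
short_word = one_hot([1, 2, 3], depth=32)        # 3 characters
long_word = one_hot([4, 5, 6, 7, 8], depth=32)   # 5 characters

# Stacking matrices with different row counts is what fails for me.
error = try_stack([short_word, long_word])
print(short_word.shape, long_word.shape)
print(type(error).__name__)
```

Stacking two matrices of the same shape works fine in this sketch, so the different word lengths do seem to be what triggers the error.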
This is my preprocessing and training code:
import csv
from seq2seq import *
from scikit_learn import *
import numpy as np
import mxnet as mx
from mxnet import autograd, gluon, nd

print("preprocessing data...")
x, y = [], []

# this function converts each character to its ASCII code and returns an nd.array
def convert_ascii(t):
    return nd.array([ord(c) for c in t])

with open("l1.csv", "r") as f:
    s = 0
    r = csv.reader(f)
    for row in r:
        a = [nd.one_hot(convert_ascii(row[0]), depth=32),
             nd.one_hot(convert_ascii(row[1]), depth=10)]
        b = nd.one_hot(convert_ascii(row[2]), depth=32)
        x.append(a)
        y.append(b)
        s += 1

x = np.array(x, dtype=np.float32)
y = np.array(y, dtype=np.float32)
net = seq2seq(s, x.size, 5000000, 5000000)

# train
print("training the data...")
clf = GluonClassifier(model=net,
                      loss_function=gluon.loss.SoftmaxCrossEntropyLoss,
                      init_function=mxnet.initializer.Xavier,
                      batch_size=256,
                      epochs=1000000,
                      verbose=True)
clf.fit(x, y)
GluonClassifier is my scikit-learn-like wrapper around Gluon, and seq2seq is an LSTM sequence-to-sequence model.
Thanks in advance.