My first neural network for classification in MXNet Gluon: I don't understand what the problem is

Hello, I have built my first neural network, but I think it’s completely wrong and I don’t understand exactly how. This is my code:

NUM_LABELS = 2    # The number of labels.
file_train = open('Youtube_ex.csv')
file_test = open('Train_Youtube_ex.csv')

test_data,test_labels = extract_data(file_test)
test_data_nd = nd.array(test_data)
train_data,train_labels = extract_data(file_train)
train_data_nd = nd.array(train_data)

dataset = ArrayDataset(train_data, train_labels)
dataloader = DataLoader(dataset, batch_size=30, shuffle=True, num_workers=2)  
datasetvalidation = ArrayDataset(test_data, test_labels)
data_validation = DataLoader(datasetvalidation, batch_size=30, shuffle=True, num_workers=2)

batch_size = 30
net = nn.Sequential()
net.add(nn.Dense(256, activation='relu'),
        nn.Dense(2))
net.initialize(init.Normal(sigma=0.01))

softmax_cross_entropy = gluon.loss.SoftmaxCrossEntropyLoss(sparse_label=False)
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.01})

def acc(output, label):
    # output: (batch, num_output) float32 ndarray
    # label: (batch, ) int32 ndarray
    return (output.argmax(axis=1) ==
            label.astype('float32').argmax(axis=1)).mean().asscalar()

for epoch in range(10):
    train_loss, train_acc, valid_acc = 0., 0., 0.
    tic = time.time()
    for data, label in dataloader:
        with autograd.record():
            output = net(data)
            loss = softmax_cross_entropy(output, label)
        loss.backward()
        # update parameters
        trainer.step(batch_size)
        # calculate training metrics
        train_loss += loss.mean().asscalar()
        train_acc += acc(output, label)
    # calculate validation accuracy  
    for data, label in data_validation:
        valid_acc += acc(net(data), label)
    print("Epoch %d: loss %.3f, train acc %.3f, test acc %.3f, in %.1f sec" % (
            epoch, train_loss/len(data_validation), train_acc/len(data_validation),
            valid_acc/len(data), time.time()-tic))

And this is the result:

Epoch 0: loss 0.142, train acc 0.166, test acc 2.660, in 0.1 sec
Epoch 1: loss 0.142, train acc 0.134, test acc 2.703, in 0.1 sec
Epoch 2: loss 0.142, train acc 0.181, test acc 2.703, in 0.1 sec
Epoch 3: loss 0.142, train acc 0.150, test acc 2.730, in 0.1 sec
Epoch 4: loss 0.142, train acc 0.179, test acc 2.740, in 0.1 sec
Epoch 5: loss 0.142, train acc 0.175, test acc 2.753, in 0.1 sec
Epoch 6: loss 0.142, train acc 0.202, test acc 2.730, in 0.1 sec
Epoch 7: loss 0.142, train acc 0.179, test acc 2.743, in 0.1 sec
Epoch 8: loss 0.142, train acc 0.183, test acc 2.763, in 0.1 sec
Epoch 9: loss 0.142, train acc 0.169, test acc 2.757, in 0.1 sec

I think it’s wrong because my loss is always the same. I’m trying to use a .csv dataset with 2 classes. I also don’t know exactly how to normalize the data: whether I have to do this for my whole dataset, including the one-hot encoded labels, or not, and whether there is a function to do it. Most of the tutorials on the internet use image datasets and functions meant for image classification problems.
Could someone give me some direction?
Also, how can I test my trained network (giving it some data and checking the answer to know whether it is correctly trained)?
My classes are numbers and in the code I one-hot encode them, but I don’t know how to map my labels back to names afterwards.

It is hard to debug the code as posted. What I can see for certain is that when you print the metrics, you divide them by the wrong quantities:

  1. train_loss. You divide it by len(data_validation), but since you already take the mean of each batch loss, you should divide by the number of training batches, len(dataloader), to get the average loss per example.
  2. train_acc. You divide it by len(data_validation), but should divide by len(dataloader).
  3. valid_acc. You divide it by len(data), which is the size of the last validation batch, but should divide by len(data_validation). The wrong divisor explains how the reported test accuracy can exceed 1.
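
The three corrections above amount to averaging each running sum over the number of batches in the loader that produced it. Here is a minimal sketch in plain Python. The per-batch numbers are made-up placeholders, not results from the original run, and `mean_over_batches` is a hypothetical helper name:

```python
def mean_over_batches(batch_values):
    """Average per-batch metrics, each of which is already a mean over its batch."""
    if not batch_values:
        raise ValueError("no batches were processed")
    return sum(batch_values) / len(batch_values)


# Placeholder per-batch values, standing in for loss.mean().asscalar()
# and acc(output, label) collected inside the two loops.
train_losses = [0.70, 0.65, 0.60]   # one entry per batch of `dataloader`
train_accs = [0.50, 0.60, 0.70]     # one entry per batch of `dataloader`
valid_accs = [0.55, 0.65]           # one entry per batch of `data_validation`

epoch_loss = mean_over_batches(train_losses)
epoch_train_acc = mean_over_batches(train_accs)
epoch_valid_acc = mean_over_batches(valid_accs)

print("loss %.3f, train acc %.3f, test acc %.3f"
      % (epoch_loss, epoch_train_acc, epoch_valid_acc))
```

In the original loop, the same effect comes from dividing train_loss and train_acc by len(dataloader) and valid_acc by len(data_validation). An accuracy averaged this way always stays between 0 and 1. If the last batch is smaller than the others, the mean of batch means differs slightly from the exact per-example average.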

I would also check that the data processing code (extract_data) is correct.
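
As a rough idea of what such a check could look like, here is a sketch in plain Python. It assumes that extract_data returns rows of numeric features and rows of one-hot labels, which the post does not show, so the toy data and both helper names (`check_one_hot`, `standardize_columns`) are hypothetical. It also illustrates one common answer to the normalization question: rescale the feature columns only and leave the labels as they are.

```python
from statistics import mean, pstdev


def check_one_hot(labels, num_labels):
    """Raise ValueError unless every row is a one-hot vector of length num_labels."""
    expected = [0] * (num_labels - 1) + [1]
    for i, row in enumerate(labels):
        if len(row) != num_labels:
            raise ValueError("row %d has %d entries, expected %d"
                             % (i, len(row), num_labels))
        if sorted(row) != expected:
            raise ValueError("row %d is not one-hot: %r" % (i, row))


def standardize_columns(features):
    """Rescale each feature column to zero mean and unit variance."""
    columns = list(zip(*features))
    means = [mean(col) for col in columns]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)]
            for row in features]


# Toy rows standing in for the (unknown) output of extract_data.
features = [[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]]
labels = [[1, 0], [0, 1], [1, 0]]

check_one_hot(labels, num_labels=2)   # labels are checked, not rescaled
scaled = standardize_columns(features)
print(scaled)
```

In practice the column means and standard deviations would usually be computed on the training set and then reused for the test set, so that both are scaled the same way.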

If you have categorical data, you can use an Embedding layer instead of one-hot encoding. There is an example of this for a recommendation problem, which uses an Embedding layer for items and users. You could do something similar: https://github.com/apache/incubator-mxnet/blob/master/example/recommenders/demo1-MF.ipynb
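
To see why an Embedding layer can replace one-hot encoding for categorical inputs, note that multiplying a one-hot vector by a weight matrix simply selects one row of that matrix. The sketch below shows this in plain Python, without MXNet; the weight values and function names are hypothetical and chosen only for illustration:

```python
# Hypothetical vocabulary of 3 categories with 2-dimensional embeddings.
weights = [
    [0.1, 0.2],   # category 0
    [0.3, 0.4],   # category 1
    [0.5, 0.6],   # category 2
]


def one_hot(index, size):
    """Return a list of `size` floats with 1.0 at `index` and 0.0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def one_hot_times_weights(index, weights):
    """Multiply a one-hot row vector by the weight matrix."""
    vec = one_hot(index, len(weights))
    dim = len(weights[0])
    return [sum(vec[r] * weights[r][c] for r in range(len(weights)))
            for c in range(dim)]


def embedding_lookup(index, weights):
    """Select the row directly, which is what an embedding layer does."""
    return list(weights[index])


# Both routes give the same vector for every category.
for category in range(len(weights)):
    assert one_hot_times_weights(category, weights) == embedding_lookup(category, weights)
```

In Gluon the corresponding layer is nn.Embedding(input_dim, output_dim), which takes integer category indices as input, so the one-hot step is not needed for such features.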