Question 1 part 3


#1

I don’t want to convert the mxnet dataset to an ndarray and then manually filter out the elements not of interest.

First I’m relabeling with a transform function and then providing a custom batchify function. But the design of mxnet is making this painful, as the zipper only allows access to the labels after it has stacked the predictor variables into a big ndarray.

How do I write a batchify function to remove the unwanted data points?

Edit: I’m aware that this won’t work because the batches would differ in size after removing unwanted elements, but I can’t think of a reasonable solution, so I’m giving up.


#2

There are probably several ways to do this, but you can access the data directly after importing with:

train_labels = mnist_train._label
train_data = mnist_train._data

and then from there select only the indices of interest (i.e. where the label is 2, 5, 6, or 7, I believe).
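A minimal sketch of that selection step, using NumPy only. The arrays here are random stand-ins with MNIST-like shapes, not the real dataset, and the names classes_of_interest, keep, filtered_data and filtered_labels are made up for illustration. With the real dataset, I believe mnist_train._data is an MXNet NDArray, so you may need to call .asnumpy() on it before indexing with a boolean mask.

```python
import numpy as np

# Stand-in arrays with MNIST-like shapes (assumption: the real ones come
# from mnist_train._data and mnist_train._label as shown above).
rng = np.random.default_rng(0)
train_data = rng.integers(0, 256, size=(100, 28, 28, 1), dtype=np.uint8)
train_labels = rng.integers(0, 10, size=100).astype(np.int32)

# Classes to keep, as mentioned in the post.
classes_of_interest = [2, 5, 6, 7]

# Boolean mask: True where the label is one of the classes of interest.
keep = np.isin(train_labels, classes_of_interest)

# Apply the same mask to data and labels so they stay aligned.
filtered_data = train_data[keep]
filtered_labels = train_labels[keep]

print(filtered_data.shape, filtered_labels.shape)
```

Filtering before building the DataLoader also sidesteps the uneven-batch problem from the first post, since every batch is drawn from data that has already been reduced.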


#3

I used that method to keep only the rows of interest, but how do we feed it into the trainer?

I attempted using gdata.DataLoader(gdata.ArrayDataset(…)) (where gdata is mxnet.gluon.data), as in homework 4’s train method, to get train_iter and test_iter, since I had split the data and labels into separate arrays. The result was an incompatible data type error (expected uint8 but got float32).

I checked that the data and labels are uint8, but I’m still receiving this error.


#4

@rdutta

I ran into the same error and googled it. The error message is backward: it actually expects float32, but the data is uint8. Casting the arrays to float32 fixed it for me.

See here for more info: https://stackoverflow.com/questions/49961351/mxnet-augmentations-expected-uint8-got-float32
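A rough sketch of the cast, again with NumPy stand-ins rather than the real data. The names filtered_data, filtered_labels, data_f32 and labels_f32 are hypothetical, and whether the labels need casting as well may depend on the loss you use; the post only says "the arrays", so both are cast here.

```python
import numpy as np

# Hypothetical stand-ins for the already-filtered arrays; both start as uint8,
# which is the situation described in post #3.
filtered_data = np.zeros((8, 28, 28, 1), dtype=np.uint8)
filtered_labels = np.array([2, 5, 6, 7, 2, 5, 6, 7], dtype=np.uint8)

# astype returns a new array with the requested dtype; values are preserved.
data_f32 = filtered_data.astype(np.float32)
labels_f32 = filtered_labels.astype(np.float32)

print(data_f32.dtype, labels_f32.dtype)
```

The cast arrays can then be passed to gdata.ArrayDataset and gdata.DataLoader as described in post #3.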