Simple network does not learn on own images


I tested the Python MNIST example with the Alex network, and it works fine.
However, I then changed the code to feed in my own images. These are artificial two-class images (red outlined circle vs. blue outlined square). In Mathematica with the Alex network, this learns very fast (1 iteration) and reaches 100% accuracy. Of course, I changed the last FC layer to 2 outputs.

In MXNet, however, it does not learn. The output probabilities are around [0.4966, 0.5034] and are the same for every image.
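For reference, a two-class softmax output of about [0.4966, 0.5034] corresponds to two logits that differ by only about 0.014, so the final layer is producing almost identical scores for both classes. A minimal pure-Python sketch of that arithmetic (the `softmax` helper and the logit values are illustrative assumptions, not values read from the network):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits: the second is only 0.0136 larger than the first.
probs = softmax([0.0, 0.0136])
print([round(p, 4) for p in probs])

# The logit gap can be recovered from the reported probabilities.
gap = math.log(0.5034 / 0.4966)
print(round(gap, 4))
```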

I checked whether the batch contains the right encoded image and corresponding label. That seems OK. I used several values for learning rate and initialization but nothing seems to work. The only difference is that MNIST image pixel values are within 0…1, and my images have pixel values between 0…255.
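To make the range difference concrete, here is a minimal NumPy sketch of bringing 0…255 pixels closer to the MNIST range by subtracting a per-channel mean and multiplying by a scale factor. The mean of 220 matches the `mean_r`/`mean_g`/`mean_b` values used in the iterator in the code; the scale of 1/255 and the random test image are assumptions for illustration only. As far as I can tell, `ImageRecordIter` accepts a `scale` argument that applies such a factor, but whether that resolves the problem is untested.

```python
import numpy as np

# Hypothetical 3x28x28 image with integer pixel values in 0..255.
rng = np.random.RandomState(0)
image = rng.randint(0, 256, size=(3, 28, 28)).astype(np.float32)

# Per-channel mean, shaped to broadcast over height and width.
mean = np.array([220.0, 220.0, 220.0], dtype=np.float32).reshape(3, 1, 1)
scale = 1.0 / 255.0  # assumed scale factor, not taken from the post

normalized = (image - mean) * scale

print(image.min(), image.max())
print(normalized.min(), normalized.max())
```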

Any idea what’s wrong?

Using MXNet 0.11 with CUDA 8.0 / cuDNN 7 (installed with pip), on Windows 10, VS2017, Anaconda 2, Python 2.7, Titan X GPU.

Code (rec/lst files can be downloaded from: download link rec/lst files):

import numpy as np
import mxnet as mx
import matplotlib.pyplot as plt

batch_size = 10
#train_iter = mx.io.NDArrayIter(mnist['train_data'], mnist['train_label'], batch_size, shuffle=True)
#val_iter = mx.io.NDArrayIter(mnist['test_data'], mnist['test_label'], batch_size)

def get_iterators(batch_size, data_shape=(3, 28, 28)):
    train = mx.io.ImageRecordIter(
        path_imgrec         = 'C:/DataSets/CirclesAndRectsBase28Color/CirclesAndRectsBase28Color_train.rec',
        path_imglist        = 'C:/DataSets/CirclesAndRectsBase28Color/CirclesAndRectsBase28Color_train.lst',
        data_name           = 'data',
        label_name          = 'softmax_label',
        batch_size          = batch_size,
        data_shape          = data_shape,
        mean_r              = 220,
        mean_g              = 220,
        mean_b              = 220,
        shuffle             = False,
        rand_crop           = False,
        rand_mirror         = False)
    val = mx.io.ImageRecordIter(
        path_imgrec         = 'C:/DataSets/CirclesAndRectsBase28Color/CirclesAndRectsBase28Color_val.rec',
        path_imglist        = 'C:/DataSets/CirclesAndRectsBase28Color/CirclesAndRectsBase28Color_val.lst',
        data_name           = 'data',
        label_name          = 'softmax_label',
        batch_size          = batch_size,
        data_shape          = data_shape,
        rand_crop           = False,
        rand_mirror         = False)
    return (train, val)

data = mx.sym.var('data')

# first conv layer
conv1 = mx.sym.Convolution(data=data, kernel=(5,5), num_filter=20)
tanh1 = mx.sym.Activation(data=conv1, act_type="tanh")
pool1 = mx.sym.Pooling(data=tanh1, pool_type="max", kernel=(2,2), stride=(2,2))

# second conv layer
conv2 = mx.sym.Convolution(data=pool1, kernel=(5,5), num_filter=50)
tanh2 = mx.sym.Activation(data=conv2, act_type="tanh")
pool2 = mx.sym.Pooling(data=tanh2, pool_type="max", kernel=(2,2), stride=(2,2))

# first fullc layer
flatten = mx.sym.flatten(data=pool2)
fc1 = mx.symbol.FullyConnected(data=flatten, num_hidden=500)
tanh3 = mx.sym.Activation(data=fc1, act_type="tanh")

# second fullc
fc2 = mx.sym.FullyConnected(data=tanh3, num_hidden=2)

# softmax loss
lenet = mx.sym.SoftmaxOutput(data=fc2, name='softmax')

import logging
logging.getLogger().setLevel(logging.DEBUG) # logging to stdout

# create a trainable module on CPU
(train, val) = get_iterators(batch_size)
# test if image is the image we expect
batch = train.next()
dataN = batch.data[0]
im = dataN[0].asnumpy()
imT = im.transpose(2,1,0)
imTscaled = imT * (1.0/255.0)

lenet_model = mx.mod.Module(symbol=lenet, context=mx.cpu())
lenet_model.fit(train,
                batch_end_callback = mx.callback.Speedometer(batch_size, 20),
                initializer = mx.initializer.Xavier)

prob = lenet_model.predict(val)
probNP = prob.asnumpy()
a = probNP.shape
b = prob.ndim
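
When checking a batch image visually, the array comes out in (channels, height, width) order, while matplotlib's `imshow` expects (height, width, channels). Below is a small NumPy sketch of the conversion on a made-up array (the array contents are an assumption for illustration). Note that `transpose(1, 2, 0)` keeps rows and columns in place, whereas `transpose(2, 1, 0)` additionally swaps height and width, which is easy to miss on symmetric shapes such as circles and squares.

```python
import numpy as np

# Hypothetical CHW image: 3 channels, 2 rows, 4 columns.
chw = np.arange(3 * 2 * 4).reshape(3, 2, 4)

hwc = chw.transpose(1, 2, 0)   # (height, width, channels)
whc = chw.transpose(2, 1, 0)   # (width, height, channels): image is transposed

print(hwc.shape)
print(whc.shape)

# The pixel at row 1, column 3 keeps its position in the HWC layout.
assert (hwc[1, 3] == chw[:, 1, 3]).all()
# In the WHC layout the same pixel is found at index [3, 1].
assert (whc[3, 1] == chw[:, 1, 3]).all()
```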