topic: write a meaningful custom iterator which supports shuffling and composite datatypes.
Two ways: (A) I write the iterator from scratch myself, or (B) I try to embed ImageIter in my own iterator class.
(B) has one problem I noticed: the DataBatch returned by mx.image.ImageIter has its member DataBatch.index set to None. This happens whenever path_imglist is provided, whether or not I also provide path_imgidx.
Note: my path_imglist has the sample indices in it. So the information is there (also in the path_imgidx).
Is DataBatch.index = None the desired behaviour?
Why do I consider it a problem? Well, suppose I want to provide composite data, for example a batch of images together with a batch of some other datatype, and I want to use ImageIter with shuffling for the images. Then I would need to access DataBatch.index to know which images are currently in my batch. Or have I overlooked something?
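A minimal sketch of what the index is needed for, in plain Python without mxnet: an iterator shuffles the sample order once and reports the indices with every batch, so a second data source can be kept aligned with the images. The names IndexedBatch and ShuffledIndexIter are hypothetical and only stand in for DataBatch and a custom iterator; this is not how ImageIter is implemented.

```python
import random
from collections import namedtuple

# Hypothetical stand-in for a batch that carries its sample indices,
# which is the role DataBatch.index would play.
IndexedBatch = namedtuple("IndexedBatch", ["data", "index"])


class ShuffledIndexIter:
    """Yield batches from several parallel sources in the same shuffled order."""

    def __init__(self, sources, batch_size, shuffle=True, seed=None):
        lengths = {len(s) for s in sources}
        if len(lengths) != 1:
            raise ValueError("all sources must have the same number of samples")
        self.sources = sources
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.rng = random.Random(seed)
        self.order = list(range(lengths.pop()))
        self.reset()

    def reset(self):
        # Reshuffle the sample order and start again from the first batch.
        if self.shuffle:
            self.rng.shuffle(self.order)
        self.cursor = 0

    def __iter__(self):
        return self

    def __next__(self):
        # Drop the last incomplete batch to keep the sketch short.
        if self.cursor + self.batch_size > len(self.order):
            raise StopIteration
        idx = self.order[self.cursor:self.cursor + self.batch_size]
        self.cursor += self.batch_size
        data = [[src[i] for i in idx] for src in self.sources]
        return IndexedBatch(data=data, index=idx)


images = ["img0", "img1", "img2", "img3", "img4", "img5"]   # placeholders for image data
captions = ["zero", "one", "two", "three", "four", "five"]  # some other datatype

for batch in ShuffledIndexIter([images, captions], batch_size=2, seed=0):
    # batch.index tells which samples are in the batch, so both sources stay aligned.
    print(batch.index, batch.data)
```

With ImageIter doing the shuffling internally, the same alignment is only possible if the batch exposes the indices it drew.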
The doc says DataBatch.index should be a numpy array. Is that right: numpy or an mxnet array? It's not a big thing at all, though.
(A) works, but then I have to do all image augmentations myself (and it could be slow if I manually use opencv or PIL).
So that's what I would love to avoid, even though I have coded it already.
some other observations:
I. the return type of DataBatch.data:
Do I misread the following?
data : list of NDArray, each array containing batch_size examples. A list of input data.
The doc says it should be a list of mx.nd.array. Is that really a requirement from the mxnet side?
Currently mx.nd.array does not support np.unicode_ type … if one wanted to provide strings as data …
There is a workaround: it seems that in my custom iterator DataBatch.data can be a Python list, and nothing has complained so far (but I have not tried any transfer learning with it yet).
II. The SimpleIter example in https://mxnet.incubator.apache.org/tutorials/basic/data.html confuses me:
it uses zip for _provide_data, which I thought creates (an iterator of) plain tuples. However, provide_data is expected to contain DataDesc objects, and DataDesc is derived from namedtuple. Any method that expects something from DataDesc which a plain tuple does not have may fail.
Am I imagining things, or would it be better to use a DataDesc object in that example?
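The difference can be shown without mxnet. The sketch uses a hypothetical two-field namedtuple called Desc as a stand-in for mx.io.DataDesc; the real class may carry more fields, so this only illustrates the tuple versus namedtuple point.

```python
from collections import namedtuple

# Hypothetical stand-in for mx.io.DataDesc, reduced to two fields for illustration.
Desc = namedtuple("Desc", ["name", "shape"])

data_names = ["data"]
data_shapes = [(5, 3, 300, 300)]

# What zip produces: plain tuples (an iterator in Python 3, hence the list()).
plain = list(zip(data_names, data_shapes))

# The same information as namedtuple instances.
described = [Desc(name, shape) for name, shape in zip(data_names, data_shapes)]

# Positional unpacking works for both ...
for name, shape in plain:
    print(name, shape)
for name, shape in described:
    print(name, shape)

# ... but attribute access only works on the namedtuple.
print(described[0].name, described[0].shape)
try:
    plain[0].name
except AttributeError as err:
    print("plain tuple:", err)
```

Code that only unpacks (name, shape) pairs accepts either form, which may be why the tutorial example runs; code that reads attributes would need the namedtuple.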
III. The ImageIter doc could link to image.CreateAugmenter:
aug_list=None … there seems to be no link explaining what aug_list could be.
minor: IV. DataDesc description [link removed, new user can post only 2 links]
cls (DataDesc) – The class. … that part confuses me, seems to work without it
V. data augmentation with image.CreateAugmenter(data_shape, resize=0, rand_crop=False, rand_resize=False, rand_mirror=False, mean=None, std=None, brightness=0, contrast=0, saturation=0, hue=0, pca_noise=0, rand_gray=0, inter_method=2)
How that works is not clear from its doc. Can you turn on resize to one size, then rand_crop to another size?
That would require two size parameters, while data_shape allows only one. But maybe I am wrong here.
In general the relationship between multiple augmentations and data shape is not clear to me.
Some more info on data augmentation would be helpful (e.g. in a tutorial).
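One possible reading, which is an assumption here and not something the doc quoted above states: resize gives the target length of the shorter image edge, and the crop step then cuts the result down to the height and width in data_shape. Under that assumption the two sizes come from two different parameters. The plain-Python sketch below only does the size arithmetic; resize_short and center_crop_box are hypothetical helpers written for this example, and the input size and resize value are made up.

```python
def resize_short(width, height, size):
    """Scale so the shorter edge equals `size`, keeping the aspect ratio."""
    if height > width:
        return size, int(size * height / width)
    return int(size * width / height), size


def center_crop_box(width, height, crop_w, crop_h):
    """Return (x0, y0, w, h) of a centred crop, which must fit inside the image."""
    if crop_w > width or crop_h > height:
        raise ValueError("crop is larger than the image")
    return (width - crop_w) // 2, (height - crop_h) // 2, crop_w, crop_h


data_shape = (3, 224, 224)  # (channels, height, width): assumed to be the final crop size
resize = 256                # hypothetical target for the shorter edge

# A 640x480 input is first resized, then cropped to the data_shape size.
w, h = resize_short(640, 480, resize)
print((w, h))
print(center_crop_box(w, h, data_shape[2], data_shape[1]))
```

If this reading is right, resize has to be at least as large as the crop size in data_shape, otherwise the crop does not fit; the sketch raises an error in that case as a simplification.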
VI. strange error when trying to use data augmentation:
augs = mx.image.CreateDetAugmenter(data_shape=(3, 300, 300), rand_crop=0.5, rand_mirror=True, brightness=0.125, contrast=0.125, saturation=0.125)
imgiter = mx.image.ImageIter(batch_size=5, data_shape=(3, 300, 300), label_width=2, path_imgrec=None, path_imglist=imglist, path_root=path_root, path_imgidx=indexlist, shuffle=False, part_index=0, num_parts=1, aug_list=augs, imglist=None, data_name='data', label_name='softmax_label')
python mxiter_py [modified to avoid link filter]
Traceback (most recent call last):
File "mxiter_py", line 260, in
File "mxiter_py", line 245, in tester2
File "…", line 1181, in next
data = self.augmentation_transform(data)
File "…", line 1239, in augmentation_transform
data = aug(data)
TypeError: __call__() missing 1 required positional argument: 'label'
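The message points to a calling-convention mismatch: the traceback shows the augmenter being called with the data only, while its __call__ also expects a label. A likely cause, stated here as an assumption, is that augmenters built by CreateDetAugmenter take (src, label) and are meant for mx.image.ImageDetIter, whereas ImageIter expects the image-only augmenters from CreateAugmenter. The plain-Python sketch below reproduces the pattern with hypothetical classes and no mxnet.

```python
class ImageOnlyAug:
    """Hypothetical augmenter called as aug(src)."""

    def __call__(self, src):
        return src[::-1]  # stand-in for a mirror operation


class ImageAndLabelAug:
    """Hypothetical augmenter called as aug(src, label)."""

    def __call__(self, src, label):
        return src[::-1], label


def augmentation_transform(data, aug_list):
    # Same shape as the loop in the traceback: every augmenter gets the data only.
    for aug in aug_list:
        data = aug(data)
    return data


# Works: the augmenter matches the one-argument call.
print(augmentation_transform([1, 2, 3], [ImageOnlyAug()]))

# Fails with a TypeError about the missing 'label' argument.
try:
    augmentation_transform([1, 2, 3], [ImageAndLabelAug()])
except TypeError as err:
    print("TypeError:", err)
```

If the assumption holds, either building the list with mx.image.CreateAugmenter or switching the iterator to mx.image.ImageDetIter should make the two sides agree.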