Dataset bizarre behavior after using .transform

#1

I’m confused about how exactly .transform works.

I thought it was just used to map a function over dataset items, but it’s clearly not that simple:

def dummy_transform(input, output):
    return input.split(), output.split()

dummy_dataset = gluon.data.SimpleDataset([('1 2 a', '1'), ('3 4', '5')])
transformed_dummy_dataset = dummy_dataset.transform(dummy_transform)

transformed_dummy_dataset[0] works as expected: it returns dummy_transform(*dummy_dataset[0]). But when I try

transformed_dummy_dataset[:1]

I get

TypeError Traceback (most recent call last)
in ()
----> 1 dummy_dataset[:1]

~/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/data/dataset.py in __getitem__(self, idx)
127 if isinstance(item, tuple):
128 return self._fn(*item)
--> 129 return self._fn(item)
130
131

TypeError: __call__() missing 1 required positional argument: 'output'

From what I’ve checked it seems like using .transform feeds the whole Dataset to DummyTransform().

Why does that happen?

EDIT:
I’ve simplified the example

#2

The error has nothing to do with Gluon. There is an error in your code. You have defined a class without any inheritance whose __call__ method takes 2 positional arguments (input, output), and then you are calling the class object without passing any arguments!
This is a Python TypeError. There’s nothing wrong with Gluon.
Replace your class with:

def DummyTransform(input, output):
    return input.split(), output.split()

#3

@mouryarishik You’re wrong. The Python code is correct, for example DummyTransform()(*dummy_dataset[0]) works just fine.

BTW the function you proposed gives exactly the same error as using DummyTransform as I outlined.

#4

Why the hell are you still calling the function?
You are supposed to pass the function name, that’s it. Gluon will call it internally itself.
Remove the “()” after DummyTransform in “transformed_dummy_dataset = dummy_dataset.transform(DummyTransform())”.
Just write
“transformed_dummy_dataset = dummy_dataset.transform(DummyTransform)”

#5

Watch the tutorial thoroughly here https://gluon.mxnet.io/chapter02_supervised-learning/softmax-regression-gluon.html.
You are missing the basic understanding of gluon.
Hope this helps

#6

First of all, I know I’m not supposed to do
dummy_dataset.transform(DummyTransform())
with your function, and I didn’t do that. I renamed your function to dummy_transform, following Python naming conventions, and passed it without calling it:
transformed_dummy_dataset = dummy_dataset.transform(dummy_transform)
The result is the same.

Your answer in fact isn’t helpful, you only add confusion to my point. Please do not expand this confusion further.

#7

Please open the link I’ve provided; the explanation of how to use transform is available there.

#8

Hi @lambdaofgod,

To add a little more information to what I already provided when answering your other question, the following source code snippets might be useful for your understanding.

When calling dummy_dataset.transform(dummy_transform) you create a _LazyTransformDataset from the current Dataset using the dummy_transform function:

class Dataset(object):
    ...

    def transform(self, fn):
        trans = _LazyTransformDataset(self, fn)
        return trans

And a _LazyTransformDataset implements a __getitem__ method that’s used for retrieving single samples from the original Dataset (self._data) and passing them to the transform function (self._fn).

class _LazyTransformDataset(Dataset):
    """Lazily transformed dataset."""
    def __init__(self, data, fn):
        self._data = data
        self._fn = fn
    ...

    def __getitem__(self, idx):
        item = self._data[idx]
        if isinstance(item, tuple):
            return self._fn(*item)
        return self._fn(item)

Strange things happen when you start requesting multiple indexes in one go (e.g. transformed_dummy_dataset[:2]), because this isn’t intended usage. item ends up being a list of samples, which isn’t a tuple (the if condition in the code) and so the list of samples gets passed to the transform function. dummy_transform wasn’t expecting a single input (since you specified dummy_transform(input, output)) and everything breaks giving you:

TypeError: __call__() missing 1 required positional argument: 'output'
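To see the mechanism without installing MXNet, here is a minimal sketch that mimics the __getitem__ logic quoted above. SimpleDatasetSketch and LazyTransformSketch are hypothetical stand-ins written for this illustration, not the real Gluon classes, and they assume the underlying data is a plain Python list of tuples.

```python
class SimpleDatasetSketch:
    """Hypothetical stand-in for a list-backed dataset (not the Gluon class)."""

    def __init__(self, data):
        self._data = data

    def __len__(self):
        return len(self._data)

    def __getitem__(self, idx):
        # An int returns one sample (a tuple); a slice returns a list of samples.
        return self._data[idx]


class LazyTransformSketch:
    """Hypothetical stand-in that copies the __getitem__ logic quoted above."""

    def __init__(self, data, fn):
        self._data = data
        self._fn = fn

    def __len__(self):
        return len(self._data)

    def __getitem__(self, idx):
        item = self._data[idx]
        if isinstance(item, tuple):
            # Single sample: unpack the tuple into the transform's arguments.
            return self._fn(*item)
        # Anything else (including a list produced by a slice) is passed whole.
        return self._fn(item)


def dummy_transform(input, output):
    return input.split(), output.split()


dataset = SimpleDatasetSketch([('1 2 a', '1'), ('3 4', '5')])
transformed = LazyTransformSketch(dataset, dummy_transform)

# Integer index: the sample is a tuple, so it is unpacked into two arguments.
single = transformed[0]

# Slice: the item is a list, so dummy_transform receives one argument and fails.
try:
    transformed[:1]
    slice_error = None
except TypeError as err:
    slice_error = str(err)
```

With an integer index the tuple branch runs; with a slice the list falls through to the single-argument call, which is where the TypeError comes from.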

I hope that explains everything. In summary, don’t use [:n] slicing on a Dataset.
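If you do need several samples at once, one option is to index them one at a time and collect the results. The sketch below assumes only that the dataset supports integer indexing; take is a hypothetical helper name, and LazyPairs is a hypothetical stand-in used so the example runs without MXNet.

```python
class LazyPairs:
    """Hypothetical stand-in that applies fn to one sample per integer index."""

    def __init__(self, data, fn):
        self._data = data
        self._fn = fn

    def __len__(self):
        return len(self._data)

    def __getitem__(self, idx):
        item = self._data[idx]
        if isinstance(item, tuple):
            return self._fn(*item)
        return self._fn(item)


def take(dataset, indices):
    """Fetch samples one by one so each goes through the transform separately."""
    return [dataset[i] for i in indices]


def dummy_transform(input, output):
    return input.split(), output.split()


transformed = LazyPairs([('1 2 a', '1'), ('3 4', '5')], dummy_transform)

# Equivalent in spirit to transformed[:2], but using only integer indexing.
first_two = take(transformed, range(2))
```

The same list comprehension should work on any dataset object that supports integer indexing, since it never asks the dataset to handle a slice.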