I'm confused about how exactly `.transform` works.
I thought it was just used to map a function over dataset items, but it's clearly not that simple:
```python
from mxnet import gluon

def dummy_transform(input, output):
    return input.split(), output.split()

dummy_dataset = gluon.data.SimpleDataset([('1 2 a', '1'), ('3 4', '5')])
transformed_dummy_dataset = dummy_dataset.transform(dummy_transform)
```
`transformed_dummy_dataset[0]` works as expected: it returns `dummy_transform(*dummy_dataset[0])`. But when I try
```python
transformed_dummy_dataset[:1]
```
I get
```
TypeError                                 Traceback (most recent call last)
 in ()
----> 1 dummy_dataset[:1]

~/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/data/dataset.py in __getitem__(self, idx)
    127         if isinstance(item, tuple):
    128             return self._fn(*item)
--> 129         return self._fn(item)
    130 
    131 

TypeError: __call__() missing 1 required positional argument: 'output'
```
From what I've checked, it seems that `.transform` feeds the whole `Dataset` to `DummyTransform()`.
Why does that happen?
EDIT: I've simplified the example.
The error has nothing to do with Gluon; there is an error in your code. You have made a class without any inheritance and defined a `__call__` method that takes 2 positional arguments (`input`, `output`), and then you are calling the class object without passing any arguments!
This is a plain Python `TypeError`. There's nothing wrong with Gluon.
Replace your class with:
```python
def DummyTransform(input, output):
    return input.split(), output.split()
```
@mouryarishik You're wrong. The Python code is correct; for example, `DummyTransform()(*dummy_dataset[0])` works just fine.
BTW, the function you proposed gives exactly the same error as using `DummyTransform` as I outlined.
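For reference, here is a minimal sketch of the kind of callable class the pre-edit question presumably used. The thread never shows its body, so this `DummyTransform` is a hypothetical reconstruction based on the traceback and the replies. Calling `DummyTransform()` only creates an instance; that instance is itself callable, so it is a valid argument to `.transform`, and calling it on an unpacked sample works:

```python
# Hypothetical reconstruction of the original callable class (not shown in the thread).
class DummyTransform:
    def __call__(self, input, output):
        # Split both fields of an (input, output) sample on whitespace.
        return input.split(), output.split()


sample = ('1 2 a', '1')         # first item of dummy_dataset
transform = DummyTransform()    # creates an instance; nothing is transformed yet

print(callable(transform))      # True: the instance can be passed to .transform
print(transform(*sample))       # (['1', '2', 'a'], ['1'])
```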
Why the hell are you still calling the function?
You are supposed to pass just the function name; Gluon will call it internally.
Remove the `()` after `DummyTransform` in `transformed_dummy_dataset = dummy_dataset.transform(DummyTransform())`.
Just write:
`transformed_dummy_dataset = dummy_dataset.transform(DummyTransform)`
Go through the tutorial here thoroughly: https://gluon.mxnet.io/chapter02_supervised-learning/softmax-regression-gluon.html
You are missing a basic understanding of Gluon.
Hope this helps.
First of all, I know I'm not supposed to do `dummy_dataset.transform(DummyTransform())` with your function. I didn't do that. I renamed your function to `dummy_transform` (following Python naming conventions) and called:
```python
transformed_dummy_dataset = dummy_dataset.transform(dummy_transform)
```
The result is the same.
Your answer in fact isn't helpful; it only adds confusion to my point. Please don't add to this confusion further.
Please open the link I've provided; it explains how to use `transform`.
Hi @lambdaofgod,
Adding a little more information to what I already provided in answer to your other question, the following source code snippets might help you understand what is going on.
When you call `dummy_dataset.transform(dummy_transform)`, you create a `_LazyTransformDataset` from the current `Dataset` using the `dummy_transform` function:
```python
class Dataset(object):
    ...
    def transform(self, fn):
        trans = _LazyTransformDataset(self, fn)
        return trans
```
And a `_LazyTransformDataset` implements a `__getitem__` method that retrieves single samples from the original `Dataset` (`self._data`) and passes them to the transform function (`self._fn`).
```python
class _LazyTransformDataset(Dataset):
    """Lazily transformed dataset."""
    def __init__(self, data, fn):
        self._data = data
        self._fn = fn
    ...
    def __getitem__(self, idx):
        item = self._data[idx]
        if isinstance(item, tuple):
            return self._fn(*item)
        return self._fn(item)
```
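To see this behavior without installing MXNet, here is a simplified pure-Python stand-in that mirrors the `__getitem__` logic quoted above. The class name `LazyTransformDataset` and its `__len__` method are my own additions for illustration, not Gluon's actual implementation. It also shows the "lazy" part: the transform only runs when a sample is requested.

```python
# Simplified stand-in for Gluon's _LazyTransformDataset (illustration only,
# not the library class): it stores the data and the function, and applies
# the function only when an item is requested.
class LazyTransformDataset:
    def __init__(self, data, fn):
        self._data = data
        self._fn = fn

    def __len__(self):
        return len(self._data)

    def __getitem__(self, idx):
        item = self._data[idx]
        if isinstance(item, tuple):
            return self._fn(*item)   # (input, output) -> fn(input, output)
        return self._fn(item)        # anything else is passed as one argument


calls = []

def dummy_transform(input, output):
    calls.append(input)              # record when the function actually runs
    return input.split(), output.split()


ds = LazyTransformDataset([('1 2 a', '1'), ('3 4', '5')], dummy_transform)
print(calls)   # [] (nothing has been transformed yet)
print(ds[1])   # (['3', '4'], ['5'])
print(calls)   # ['3 4'] (only the requested sample was transformed)
```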
Strange things happen when you start requesting multiple indexes in one go (e.g. `transformed_dummy_dataset[:2]`), because this isn't the intended usage. `item` ends up being a list of samples, which isn't a `tuple` (the `if` condition in the code), so the whole list of samples gets passed to the transform function as a single argument. `dummy_transform` wasn't expecting a single input (you specified `dummy_transform(input, output)`), so everything breaks, giving you:
```
TypeError: __call__() missing 1 required positional argument: 'output'
```
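The same failure can be reproduced in plain Python, assuming (as the error suggests) that `SimpleDataset` simply indexes the Python list it wraps, so a slice comes back as a list rather than a tuple:

```python
def dummy_transform(input, output):
    return input.split(), output.split()

data = [('1 2 a', '1'), ('3 4', '5')]   # the list passed to SimpleDataset

# Integer index: a single (input, output) tuple, so it gets unpacked.
item = data[0]
print(type(item).__name__)        # tuple
print(dummy_transform(*item))     # (['1', '2', 'a'], ['1'])

# Slice: a *list* of tuples. It fails the isinstance(item, tuple) check,
# so the whole list is passed as the single argument `input`.
item = data[:1]
print(type(item).__name__)        # list
try:
    dummy_transform(item)
except TypeError as err:
    print(err)  # dummy_transform() missing 1 required positional argument: 'output'
```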
I hope that explains everything. In summary, don't use `[:n]` slicing on a `Dataset`.
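If you do need several transformed samples at once, one option is to index them one integer at a time and collect the results. The helper `take_transformed` below is a hypothetical name, and the small `_StandIn` class only mimics the two things the helper relies on (`len()` and integer indexing), which transformed Gluon datasets also support:

```python
def take_transformed(dataset, n):
    """Return the first n transformed samples, indexing one at a time."""
    # Each dataset[i] is an integer lookup, so the transform receives one
    # unpacked (input, output) sample per call, never a list of samples.
    return [dataset[i] for i in range(min(n, len(dataset)))]


class _StandIn:
    """Stand-in for a transformed dataset (not Gluon's class)."""
    def __init__(self, samples, fn):
        self._samples, self._fn = samples, fn

    def __len__(self):
        return len(self._samples)

    def __getitem__(self, idx):
        return self._fn(*self._samples[idx])


def dummy_transform(input, output):
    return input.split(), output.split()


ds = _StandIn([('1 2 a', '1'), ('3 4', '5')], dummy_transform)
print(take_transformed(ds, 1))   # [(['1', '2', 'a'], ['1'])]
```

With Gluon, the equivalent call would be `take_transformed(transformed_dummy_dataset, 1)`. A `gluon.data.DataLoader` also requests samples by integer index before batching them, so it does not run into this problem.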