MXNetError - dataset does not start with a valid magic number

I am trying to use the Amazon SageMaker linear-learner algorithm, which supports the content type ‘application/x-recordio-protobuf’. In the preprocessing phase, I used scikit-learn preprocessing to one-hot-encode my features. Then I pass the RecordIO-converted input data to the linear-learner estimator.

I used the package below, and the preprocessing conversion was successful.

from sagemaker.amazon.common import write_spmatrix_to_sparse_tensor

But when linear-learner reads the input records, it fails with the error below:

Caused by: [15:53:30] /opt/brazil-pkg-cache/packages/AIAlgorithmsCppLibs/AIAlgorithmsCppLibs-2.0.774.0/AL2012/generic-flavor/src/src/aialgs/io/iterator_base.cpp:100:

(Input Error) The header of the MXNet RecordIO record at position 810 in the dataset does not start with a valid magic number.

#sagemaker #linear-learner #mxnet #application/x-recordio-protobuf

I assume the problem comes from the preprocessing. Can you share a code example for how you preprocess and convert the data?

I have posted the same question on Stack Overflow…

Did you ever figure it out? I'm getting the same error with the factorization machine, and I have already converted the data to protobuf…

@SarathChandran Did you ever figure this out? I ended up with something very similar to yours in my output_fn, trying to pass the output to KMeans. Same error. The record position that MXNet complains about (no magic number) is the same as the number of samples in the file.
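
In case it helps anyone debugging this: a rough way to check the file locally before handing it to the algorithm. This is only a sketch, not SageMaker's own reader. It assumes the usual RecordIO framing (a 4-byte magic number 0xCED7230A, a 4-byte length field whose upper 3 bits are a flag, then the payload padded to a multiple of 4 bytes) and little-endian byte order. The names scan_recordio and frame are made up for this example. If the reported position equals the number of samples, one possibility (a guess, not confirmed in this thread) is that there are stray bytes after the last record, and a scan like this would show the offset where the framing stops being valid.

```python
import struct

# Magic number that starts every RecordIO record header.
RECORDIO_MAGIC = 0xCED7230A


def scan_recordio(data: bytes):
    """Walk the RecordIO framing in `data`.

    Returns (records_read, bad_offset). bad_offset is None when the whole
    buffer parses cleanly, otherwise the byte offset where framing breaks.
    Assumes little-endian headers (an assumption of this sketch).
    """
    offset = 0
    count = 0
    while offset < len(data):
        if len(data) - offset < 8:
            return count, offset  # too few bytes left for a header
        magic, length_field = struct.unpack_from("<II", data, offset)
        if magic != RECORDIO_MAGIC:
            return count, offset  # header does not start with the magic number
        length = length_field & ((1 << 29) - 1)  # drop the 3 flag bits
        padded = (length + 3) & ~3  # payload is padded to a multiple of 4
        if offset + 8 + padded > len(data):
            return count, offset  # payload is truncated
        offset += 8 + padded
        count += 1
    return count, None


def frame(payload: bytes) -> bytes:
    """Hypothetical helper: wrap one payload in a RecordIO header plus padding."""
    pad = (-len(payload)) % 4
    return struct.pack("<II", RECORDIO_MAGIC, len(payload)) + payload + b"\x00" * pad


good = frame(b"abc") + frame(b"defgh")
print(scan_recordio(good))          # (2, None)
print(scan_recordio(good + b"\n"))  # (2, 28): one stray byte after record 2
```

To check a real file, read it in binary mode and pass the bytes to scan_recordio. If it reports a bad offset right after the last expected record, look at whatever writes or post-processes the buffer (for example, anything that appends a newline or concatenates extra output) rather than at the encoding of the features themselves.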