Can someone help me with the fastest way to compute cosine similarities for pairs of ndarrays? Each ndarray has length 2048.
I have 10000 such pairs.
In Python, this takes me 2-3 minutes.
How can we achieve this in Scala?
Do you mean numpy.ndarray or mxnet.ndarray?
If you mean mxnet.ndarray, this might be helpful:
https://gluon-nlp.mxnet.io/_modules/gluonnlp/embedding/evaluation.html#CosineSimilarity
You can also run it on a GPU.
I meant mxnet.ndarray.
How can we parallelise computing cosine similarities of ndarray pairs in Scala? Each ndarray has length 2048.
I have 10000 such pairs.
Here is the solution in Python.
You can reproduce it in Scala using the Scala API:
Here are some useful tutorials:
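import time

import mxnet as mx

tic = time.time()
# Two batches of 10000 random vectors of length 2048, created directly on the GPU
first_term = mx.nd.random.uniform(shape=(10000, 2048), ctx=mx.gpu())
second_term = mx.nd.random.uniform(shape=(10000, 2048), ctx=mx.gpu())
# Scale every row to unit L2 norm
first_term_normalized = first_term / mx.nd.norm(first_term, axis=1, keepdims=True)
second_term_normalized = second_term / mx.nd.norm(second_term, axis=1, keepdims=True)
# Row-wise dot product: (10000, 1, 2048) x (10000, 2048, 1) -> (10000, 1, 1), squeezed to (10000,)
cosine_similarity = mx.nd.batch_dot(first_term_normalized.expand_dims(axis=1), second_term_normalized.expand_dims(axis=2)).squeeze()
# MXNet runs asynchronously, so wait for all pending work before reading the timer
mx.nd.waitall()
print(time.time() - tic)
print(cosine_similarity)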
(It takes about 10 ms on a GPU.)
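To sanity-check the batched result on a few pairs, a plain-Python reference can help. The following is only a minimal sketch using the standard library. The helper names `cosine_similarity` and `pairwise_cosine` are made up for illustration and are not part of MXNet. It computes the same quantity as the snippet above, one pair at a time: the dot product of row i of the first batch with row i of the second, divided by the product of their L2 norms.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length sequences of floats.

    Assumes neither vector is all zeros (otherwise this divides by zero).
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pairwise_cosine(first, second):
    """Compare row i of `first` with row i of `second` (hypothetical helper)."""
    return [cosine_similarity(a, b) for a, b in zip(first, second)]


# Three small pairs: orthogonal, parallel, and a 3-4-5 style example
first = [[1.0, 0.0], [1.0, 1.0], [3.0, 4.0]]
second = [[0.0, 1.0], [2.0, 2.0], [4.0, 3.0]]
sims = pairwise_cosine(first, second)
print(sims)
```

The three values should come out at approximately 0, 1 and 0.96. This loop is far too slow for 10000 pairs of length 2048 compared with the GPU version, so it is only meant as a correctness check on a small sample.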