In the definition of the `bayespost(data)` function, the `x` in `logpost += (logpx * x + logpxneg * (1-x)).sum(0)` should be `data`.
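Below is a minimal NumPy sketch of the function with that fix applied. It is not the chapter's code. It assumes hypothetical shapes: `logpx` and `logpxneg` are (784, 10) arrays holding log p(x_i = 1 | y) and log p(x_i = 0 | y), `logpy` is a length-10 array of log p(y), and `data` is one binary image flattened to shape (784, 1). The model values are random stand-ins.

```python
import numpy as np

# Hypothetical toy model: 784 binary pixels, 10 classes (random stand-in values).
rng = np.random.default_rng(0)
px = rng.uniform(0.05, 0.95, size=(784, 10))   # assumed p(x_i = 1 | y)
py = np.full(10, 0.1)                          # assumed uniform prior p(y)

logpx = np.log(px)
logpxneg = np.log(1 - px)
logpy = np.log(py)

def bayespost(data):
    """Return p(y | data) for one flattened binary image of shape (784, 1)."""
    # Start from log p(y): the posterior is proportional to p(data | y) p(y).
    logpost = logpy.copy()
    # Use `data` (not `x`): add log p(data_i | y) summed over all pixels i.
    logpost += (logpx * data + logpxneg * (1 - data)).sum(0)
    # Subtract the max before exponentiating to avoid overflow/underflow.
    logpost -= logpost.max()
    post = np.exp(logpost)
    return post / post.sum()

image = (rng.uniform(size=(784, 1)) > 0.5).astype(float)
post = bayespost(image)
print(post.argmax(), post.sum())
```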
There are certain terms, like softmax, that I feel are unfamiliar to a beginner. Is this a concept that will be covered later, or is there a resource where we can learn about it?
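As a rough illustration (not the chapter's code), softmax turns a vector of real-valued scores into probabilities by exponentiating each score and dividing by the total. A minimal NumPy sketch, including the common trick of subtracting the largest score first so that `exp` cannot overflow; subtracting a constant does not change the result:

```python
import numpy as np

def softmax(scores):
    """Map a 1-D array of real scores to probabilities that sum to 1."""
    # Shifting by the max does not change the result but keeps exp() in range.
    shifted = scores - np.max(scores)
    exps = np.exp(shifted)
    return exps / exps.sum()

print(softmax(np.array([1.0, 2.0, 3.0])))
# A naive exp(1000.0) would overflow, but the shifted version gives the same answer.
print(softmax(np.array([1000.0, 1001.0, 1002.0])))
```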
I got confused by the notation P(x), P(y), P(x|y), etc.
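One way to read this notation is through counts on a small dataset (made up here purely for illustration): P(y) is the fraction of examples with label y, P(x) is the fraction with feature value x, and P(x | y) is the fraction with feature value x among only those examples labeled y, i.e. P(x, y) / P(y).

```python
from collections import Counter

# Hypothetical toy data: (x, y) pairs with a binary feature x and a label y.
data = [(1, "cat"), (1, "cat"), (0, "cat"), (0, "dog"), (0, "dog"), (1, "dog")]
n = len(data)

count_y = Counter(y for _, y in data)    # how often each label occurs
count_x = Counter(x for x, _ in data)    # how often each feature value occurs
count_xy = Counter(data)                 # how often each (x, y) pair occurs

P_y = {y: c / n for y, c in count_y.items()}   # P(y)
P_x = {x: c / n for x, c in count_x.items()}   # P(x)
# P(x | y) = P(x, y) / P(y): restrict to examples labeled y, then count x.
P_x_given_y = {(x, y): c / count_y[y] for (x, y), c in count_xy.items()}

print(P_y["cat"], P_x[1], P_x_given_y[(1, "cat")])
```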
To all: this chapter has been rewritten to be more beginner-friendly. (You may need to force-refresh the page in case a cached version is shown.)
I don't really understand the Bayes prediction code in Section 2.5.4:
```python
def bayes_pred(x):
    x = x.expand_dims(axis=0)  # (28, 28) -> (1, 28, 28)
    p_xy = P_xy * x + (1-P_xy)*(1-x)
    p_xy = p_xy.reshape((10,-1)).prod(axis=1)  # p(x|y)
    return p_xy * P_y
```
What is line 3 doing, and how are we getting the value of p(x|y)?
@harrysun23w
The original shape of p_xy is (10, 28, 28); line 3 reshapes it into (10, 784) and then multiplies together the 784 per-pixel probabilities for each class.
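A small NumPy sketch of the same shapes may help. Here `P_xy` (shape (10, 28, 28), per-class pixel probabilities) and the image `x` are random stand-ins rather than trained values; the point is how broadcasting, `reshape((10, -1))` and `prod(axis=1)` turn per-pixel terms into one likelihood per class. Multiplying 784 numbers below 1 gives a very small result, which is why the log-space version discussed in this thread is preferred.

```python
import numpy as np

rng = np.random.default_rng(0)
P_xy = rng.uniform(0.4, 0.6, size=(10, 28, 28))        # stand-in for p(x_i = 1 | y)
x = (rng.uniform(size=(28, 28)) > 0.5).astype(float)   # stand-in binary image

x = np.expand_dims(x, axis=0)              # (28, 28) -> (1, 28, 28), broadcasts over classes
p_xy = P_xy * x + (1 - P_xy) * (1 - x)     # p(x_i | y) for every pixel and class: (10, 28, 28)
flat = p_xy.reshape((10, -1))              # (10, 784): one row of pixel terms per class
likelihood = flat.prod(axis=1)             # (10,): product over the 784 pixels = p(x | y)
print(p_xy.shape, flat.shape, likelihood.shape)
```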
If we can estimate $\prod_i p(x_i=1 \mid y)$ for every $i$ and $y$, and save its value in $P_{xy}[i,y]$, here $P_{xy}$ is a $d\times n$ matrix with $n$ being the number of classes and $y\in\{1,\ldots,n\}$.
It seems that $\prod_i p(x_i=1 \mid y)$ should be $p(x_i=1 \mid y)$ instead.
we could compute

$$\hat{y} = \operatorname*{argmax}_y \prod_{i=1}^d P_{xy}[x_i, y]\,P_y[y], \tag{2.5.5}$$

This equation seems incorrect. It should probably be:

$$\hat{y} = \operatorname*{argmax}_y \; \Big(\prod_{i=1}^d \big(x_i P_{xy}[i, y] + (1 - x_i)(1 - P_{xy}[i, y])\big)\Big) P_y[y]$$
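A minimal NumPy sketch of this proposed rule, assuming `P_xy` is stored as a d × n matrix (d pixels, n classes) as in the quoted text, `P_y` is a length-n prior, and `x` is a binary vector of length d. The function name `predict` and the tiny model values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def predict(x, P_xy, P_y):
    """Return argmax_y P_y[y] * prod_i (x_i P_xy[i,y] + (1-x_i)(1-P_xy[i,y]))."""
    x = x[:, None]                                  # (d,) -> (d, 1) to broadcast over classes
    per_pixel = x * P_xy + (1 - x) * (1 - P_xy)     # p(x_i | y), shape (d, n)
    scores = per_pixel.prod(axis=0) * P_y           # (n,) unnormalized posteriors
    return int(np.argmax(scores))

# Hypothetical model with d = 3 pixels and n = 2 classes.
P_xy = np.array([[0.9, 0.2],
                 [0.8, 0.3],
                 [0.1, 0.7]])
P_y = np.array([0.5, 0.5])
print(predict(np.array([1, 1, 0]), P_xy, P_y))   # pixels resemble class 0
print(predict(np.array([0, 0, 1]), P_xy, P_y))   # pixels resemble class 1
```

With these values, class 0 scores 0.9 · 0.8 · 0.9 · 0.5 for the first input versus 0.2 · 0.3 · 0.3 · 0.5 for class 1, so the prior and the per-pixel terms enter exactly as in the equation.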
I don't understand this line: `p_xy = P_xy * x + (1-P_xy)*(1-x)`, which is not explained in the text.
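Since $x_i$ can only be 1 or 0, we should have

$$
\begin{aligned}
p(x_i \mid y) &= p(x_i \mid y)\,\delta(x_i - 1) + p(x_i \mid y)\,\delta(x_i - 0) \\
&= p(x_i = 1 \mid y)\,x_i + p(x_i = 0 \mid y)\,(1 - x_i),
\end{aligned}
$$

where $\delta$ is the Kronecker delta ($\delta(0) = 1$ and $\delta(t) = 0$ for $t \neq 0$), so that $\delta(x_i - 1) = x_i$ and $\delta(x_i - 0) = 1 - x_i$ for binary $x_i$. Since $p(x_i = 0 \mid y) = 1 - p(x_i = 1 \mid y)$, this gives

$$
p(x_i \mid y) = p(x_i = 1 \mid y)\,x_i + \big(1 - p(x_i = 1 \mid y)\big)(1 - x_i).
$$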
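A quick numerical check of the last line: for binary $x_i$, the expression picks out $p(x_i = 1 \mid y)$ when $x_i = 1$ and $1 - p(x_i = 1 \mid y)$ when $x_i = 0$. The helper name `bernoulli_pmf` and the probabilities below are arbitrary choices for illustration.

```python
def bernoulli_pmf(x, p):
    """p(x_i = x | y) for binary x, where p = p(x_i = 1 | y)."""
    return p * x + (1 - p) * (1 - x)

for p in (0.0, 0.25, 0.9):   # arbitrary example values of p(x_i = 1 | y)
    print(p, bernoulli_pmf(1, p), bernoulli_pmf(0, p))
```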
If $P_{xy}[i, y]$ represents $p(x_i = 1 \mid y)$, we have

$$\hat{y} = \operatorname*{argmax}_y \; \Big(\prod_{i=1}^d \big(P_{xy}[i, y]\,x_i + (1-P_{xy}[i,y])(1-x_i)\big)\Big) P_y[y].$$
For the log case,

$$\hat{y} = \operatorname*{argmax}_y \; \sum_{i=1}^d \big(\log P_{xy}[i, y]\,x_i + \log(1 - P_{xy}[i, y])\,(1-x_i)\big) + \log P_y[y].$$
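The log form matters in practice because the product of 784 probabilities underflows in floating point. A small NumPy sketch with two hypothetical classes whose per-pixel likelihoods p(x_i | y) are constant (0.1 and 0.2, made-up values): both products round to 0.0, so the product form cannot tell the classes apart, while the sums of logs stay in range and keep the correct ordering.

```python
import numpy as np

# Hypothetical per-pixel likelihoods p(x_i | y) for two classes and 784 pixels.
per_pixel = np.stack([np.full(784, 0.1),    # class 0
                      np.full(784, 0.2)])   # class 1: clearly more likely
log_prior = np.log(np.array([0.5, 0.5]))

# Product form: 0.1**784 and 0.2**784 are far below the float64 range.
products = per_pixel.prod(axis=1) * np.exp(log_prior)
# Log form: sums of logs are ordinary floats and preserve the ordering.
log_scores = np.log(per_pixel).sum(axis=1) + log_prior

print(products)               # both entries underflow to 0.0
print(np.argmax(log_scores))  # picks class 1
```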
Hi Yayun, thank you for your reply. Basically it is just a mathematical transformation, right?
Hi mru4913, you're welcome. I think so. Just keep in mind that our goal is to find the value of p(x_i | y). I was confused at first, but p(x_i = 1 | y) reminded me that x_i could also be 0, and then I realized that we also need to calculate p(x_i = 0 | y).
What is the answer to the third exercise question?
What is δ (delta) in this case?
In the line `n_x[y] = nd.array(X.asnumpy()[Y==y].sum(axis=0))`, why do we have to convert X to NumPy and then index it? Why not index it directly, as in `X[Y==y]`?
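In plain NumPy, boolean-mask indexing works directly, so the per-class pixel counts can be computed without any conversion. Older `mxnet.nd` NDArrays may not support this kind of boolean-mask indexing, which would explain the round trip through `asnumpy()`; treat that as an assumption and check it against your MXNet version. A sketch with made-up arrays `X` (binary 28 × 28 images) and labels `Y`:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 100 binary 28x28 images with labels 0..9.
X = (rng.uniform(size=(100, 28, 28)) > 0.5).astype(np.float32)
Y = rng.integers(0, 10, size=100)

n_y = np.zeros(10)
n_x = np.zeros((10, 28, 28))
for y in range(10):
    mask = Y == y                    # boolean array selecting images with label y
    n_y[y] = mask.sum()              # number of images in class y
    n_x[y] = X[mask].sum(axis=0)     # per-pixel count of "on" pixels in class y
```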
I think it is a bug. To be consistent with the later code snippet that demonstrates the trick for avoiding underflow and overflow, the code here should be `p_xy = P_xy ** x * (1-P_xy) ** (1-x)`.
Yes, I think so.
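For binary x, the power form `P_xy ** x * (1 - P_xy) ** (1 - x)` and the linear form `P_xy * x + (1 - P_xy) * (1 - x)` give the same values. The power form is the one whose logarithm is `x * log(P_xy) + (1 - x) * log(1 - P_xy)`, which matches the `logpx * x + logpxneg * (1-x)` pattern in the log-space code. A quick check with random stand-in arrays (not trained values):

```python
import numpy as np

rng = np.random.default_rng(0)
P_xy = rng.uniform(0.01, 0.99, size=(10, 28, 28))          # stand-in p(x_i = 1 | y)
x = (rng.uniform(size=(1, 28, 28)) > 0.5).astype(float)    # stand-in binary image

linear = P_xy * x + (1 - P_xy) * (1 - x)
power = P_xy ** x * (1 - P_xy) ** (1 - x)
log_form = x * np.log(P_xy) + (1 - x) * np.log(1 - P_xy)

print(np.allclose(linear, power), np.allclose(np.log(linear), log_form))
```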
should be
instead.