Naive Bayes Classification


In the definition of the bayespost(data) function, the x in logpost += (logpx * x + logpxneg * (1-x)).sum(0) should be data.
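
For reference, here is a minimal NumPy sketch of what the function could look like once it uses its own argument. The signature, the array shapes, and the toy numbers are assumptions for illustration; only the names logpx, logpxneg, logpost and the corrected line come from the chapter.

```python
import numpy as np

def bayespost(data, logpx, logpxneg, logpy):
    """Posterior over classes for one binary example (hypothetical signature).

    data:     (d, 1) array of 0/1 feature values
    logpx:    (d, n) array, log p(x_i = 1 | y)
    logpxneg: (d, n) array, log p(x_i = 0 | y)
    logpy:    (n,)   array, log p(y)
    """
    logpost = logpy.copy()
    # Use the argument `data` here, not a global `x`.
    logpost += (logpx * data + logpxneg * (1 - data)).sum(0)
    # Subtract the largest value before exponentiating to avoid underflow.
    logpost -= logpost.max()
    post = np.exp(logpost)
    return post / post.sum()

# Toy check with d = 3 features and n = 2 classes (made-up numbers).
px = np.array([[0.9, 0.2], [0.8, 0.3], [0.1, 0.7]])
py = np.array([0.5, 0.5])
data = np.array([[1.0], [1.0], [0.0]])
post = bayespost(data, np.log(px), np.log(1 - px), np.log(py))
```

With the global x in place of data, the function would silently ignore its input and score whatever x happened to be defined in the notebook.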


There are certain terms, such as softmax, that I feel a beginner won't know. Will this concept be covered later, or is there a resource where we can learn about it?

I got confused about the meaning of the notations P(x), P(y), P(x|y), etc.

To all: this chapter has been rewritten to be more beginner-friendly. (You may need to force a refresh in case you have a cached version.)

I don't really understand the Bayes prediction code in Section 2.5.4:

def bayes_pred(x):
    x = x.expand_dims(axis=0)  # (28, 28) -> (1, 28, 28)
    p_xy = P_xy * x + (1-P_xy)*(1-x)
    p_xy = p_xy.reshape((10,-1)).prod(axis=1) # p(x|y)
    return p_xy * P_y

What is line 3 doing, and how does it give us the value of p(x|y)?

The original shape of p_xy is (10, 28, 28), so line 3 reshapes p_xy into (10, 784) and then multiplies together all 784 probabilities for each class. Because naive Bayes treats the pixels as conditionally independent given the class, that product is p(x|y).
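
To make the shapes concrete, here is a small NumPy sketch of the same steps (the book's snippet uses MXNet arrays; NumPy is swapped in here). The random P_xy table and the random binary image are stand-ins, not the learned values, and the explicit loop is only there to confirm what the reshape plus prod computes.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the learned table: P_xy[y, r, c] = p(pixel (r, c) = 1 | y).
P_xy = rng.uniform(0.4, 0.6, size=(10, 28, 28))
x = (rng.uniform(size=(28, 28)) > 0.5).astype(float)  # a binary "image"

x1 = np.expand_dims(x, axis=0)            # (28, 28) -> (1, 28, 28)
p_xy = P_xy * x1 + (1 - P_xy) * (1 - x1)  # broadcasts to (10, 28, 28)
flat = p_xy.reshape((10, -1))             # (10, 784): one row per class
lik = flat.prod(axis=1)                   # (10,): p(x | y) for each class

# The same quantity for class 3, written as an explicit loop over pixels.
manual = 1.0
for r in range(28):
    for c in range(28):
        manual *= P_xy[3, r, c] if x[r, c] == 1 else 1 - P_xy[3, r, c]
```

Here lik[3] and manual agree up to floating-point rounding. Note how tiny the values are: a product of 784 numbers below 1 shrinks very quickly, which is why log-probabilities and sums are usually preferred in practice.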

If we can estimate \prod_i p(x_i=1 | y) for every i and y, and save its value in P_{xy}[i,y], here P_{xy} is a d\times n matrix with n being the number of classes and y\in\{1,\ldots,n\}.

It seems that \prod_i p(x_i=1 | y) should be p(x_i=1 | y) instead.
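
Reading P_{xy}[i, y] as the single probability p(x_i=1 | y), the table can be filled by counting. The sketch below is only an illustration under assumptions: the function name estimate_tables is hypothetical, labels are 0-indexed rather than 1..n, the toy data is made up, and add-one (Laplace) smoothing is my own choice to avoid zero probabilities.

```python
import numpy as np

def estimate_tables(X, Y, n_classes):
    """X: (m, d) binary features; Y: (m,) integer labels in [0, n_classes)."""
    m, d = X.shape
    P_xy = np.zeros((d, n_classes))  # P_xy[i, y] = p(x_i = 1 | y)
    P_y = np.zeros(n_classes)        # P_y[y] = p(y)
    for y in range(n_classes):
        rows = X[Y == y]
        P_y[y] = len(rows) / m
        # Add-one smoothing (an assumption): count of x_i = 1, plus 1,
        # over the class count, plus 2.
        P_xy[:, y] = (rows.sum(axis=0) + 1) / (len(rows) + 2)
    return P_xy, P_y

# Toy data: 4 examples, d = 2 features, n = 2 classes.
X = np.array([[1, 0], [1, 1], [0, 1], [0, 1]])
Y = np.array([0, 0, 1, 1])
P_xy, P_y = estimate_tables(X, Y, n_classes=2)
```

Each entry is one conditional probability per (feature, class) pair; the product over i only appears later, at prediction time.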

we could compute \hat{y} = \operatorname*{argmax}_y \prod_{i=1}^d P_{xy}[x_i, y]P_y[y], (2.5.5)

This equation seems incorrect. It should probably be something like:

\hat{y} = \operatorname*{argmax}_y \left[\prod_{i=1}^d \left(x_i P_{xy}[i, y] + (1 - x_i)(1 - P_{xy}[i, y])\right)\right] P_y[y]
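
As a sanity check, the proposed formula can be written out in a few lines of NumPy. This is a sketch under assumptions: predict is a hypothetical helper, the tables are toy values with 0-indexed classes, and the prior P_y[y] is multiplied in once per class, outside the product over features.

```python
import numpy as np

def predict(x, P_xy, P_y):
    """x: (d,) binary vector; P_xy: (d, n); P_y: (n,)."""
    xc = x[:, None]                                   # (d, 1), broadcasts over classes
    # Pick P_xy[i, y] where x_i = 1 and 1 - P_xy[i, y] where x_i = 0.
    per_feature = xc * P_xy + (1 - xc) * (1 - P_xy)   # (d, n)
    scores = per_feature.prod(axis=0) * P_y           # (n,): p(x | y) p(y)
    return int(scores.argmax()), scores

# Toy tables with d = 2 features and n = 2 classes (made-up numbers).
P_xy = np.array([[0.75, 0.25], [0.5, 0.75]])
P_y = np.array([0.5, 0.5])
y_hat, scores = predict(np.array([1, 0]), P_xy, P_y)
```

For x = (1, 0) the scores are 0.75 * 0.5 * 0.5 = 0.1875 for class 0 and 0.25 * 0.25 * 0.5 = 0.03125 for class 1, so the prediction is class 0. Indexing P_{xy}[x_i, y] as in (2.5.5) would instead use the feature value as a row index, which is not what the table stores.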