What does the y’ signify here? It may seem like a rather elementary question, but I haven’t been able to find the answer. My assumption is that it refers to all of the labels other than the one we’re focusing on. Would that be correct?
y' also includes the label you’re focusing on. You can think of it as an index running through all possible labels for the summation.
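To make the role of y' concrete, here is a sketch of the usual naive Bayes form of the equation. The feature symbols x_1, ..., x_n are notation assumed for illustration and may not match the book's exactly:

$$
p(y \mid x_1, \dots, x_n) = \frac{p(y) \prod_{i=1}^{n} p(x_i \mid y)}{\sum_{y'} p(y') \prod_{i=1}^{n} p(x_i \mid y')}
$$

The sum in the denominator ranges over every possible label y', and one of its terms is the numerator itself (the case y' = y). That is why the result is a proper probability between 0 and 1.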
Hello, thanks for the reply! So if that were the case, the equation is basically saying that if we want to calculate the probability of a certain label y given some features, it’s equivalent to calculating the naive probability of seeing features given that specific label y. Is my understanding correct? I hope I’m making sense.
Yes, your understanding is mostly correct. It’s equivalent to calculating the naive probability of seeing those features given that specific label y, multiplied by the prior probability of that label, p(y), and normalized by the probability of seeing those features at all. I updated the book to include the p(y) term that was missing from the statement of Bayes' theorem.
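A small numeric sketch may help show how the three pieces (prior, naive likelihood, normalizer) fit together. The function name `naive_bayes_posterior`, the labels, the feature names, and all probabilities below are made up for illustration and do not come from the book:

```python
from math import prod


def naive_bayes_posterior(priors, likelihoods, features):
    """Return p(y | features) for every label y.

    priors:      dict mapping label -> p(y)
    likelihoods: dict mapping label -> {feature: p(feature | y)}
    features:    list of observed feature names
    """
    # Unnormalized score for each label y': p(y') * prod_i p(x_i | y')
    scores = {
        label: priors[label] * prod(likelihoods[label][f] for f in features)
        for label in priors
    }
    # The normalizer sums over every label y', including the one of interest.
    evidence = sum(scores.values())
    return {label: score / evidence for label, score in scores.items()}


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"free": 0.5, "meeting": 0.1},
    "ham": {"free": 0.1, "meeting": 0.4},
}

posterior = naive_bayes_posterior(priors, likelihoods, ["free", "meeting"])
print(posterior)
```

With these numbers the unnormalized scores are 0.4 × 0.5 × 0.1 = 0.02 for "spam" and 0.6 × 0.1 × 0.4 = 0.024 for "ham", so the normalizer is 0.044 and the posteriors are roughly 0.455 and 0.545. Dropping the prior p(y) would change both scores, which is why the missing term mattered.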