HW5, Q3 Weighting


@gold_piggy: reminder to finalize the weighting proportions.


In our example, the truth distribution is 50% shirts and 50% shoes, while the training distribution is 10% shirts and 90% shoes. As a result, a model trained on the training set will predict shoes far more often than shirts. That is why we need to reweight the samples using the formulas below.
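For intuition, the reweighting factors for this two-class example can be computed directly (a toy sketch; the dictionaries below just restate the percentages from the example):

```python
# Importance ratios q(x)/p(x) for the shirts/shoes example.
truth = {"shirts": 0.5, "shoes": 0.5}  # q(x), the truth distribution
train = {"shirts": 0.1, "shoes": 0.9}  # p(x), the training distribution

# Each training sample is reweighted by q(x)/p(x):
ratios = {k: truth[k] / train[k] for k in truth}
# shirts are up-weighted (ratio 5.0), shoes are down-weighted (ratio ~0.56)
```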

[Background.] Suppose we have
x_i^{'} \sim q(x): n_q samples in the test set (truth distribution);
x_i \sim p(x): n_p samples in the training set.

To train a classifier f_c to distinguish the training and truth distributions, we define a “classifier training set” C = \{(a_i, b_i), \ldots \}, the combination of the training and test sets, with n_c = n_p + n_q samples, where

b_i = -1, if a_i \sim p(x), i.e. drawn from the training set;
b_i = 1, if a_i \sim q(x), i.e. drawn from the test set.
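A minimal sketch of how C could be assembled (the toy Gaussian distributions and sample sizes are my own, not from the assignment):

```python
import numpy as np

rng = np.random.default_rng(0)
n_p, n_q = 100, 50

train_x = rng.normal(0.0, 1.0, size=n_p)  # x_i ~ p(x), training set
test_x = rng.normal(0.5, 1.0, size=n_q)   # x_i' ~ q(x), test set

# Classifier training set C: features a_i with labels b_i in {-1, +1}
a = np.concatenate([train_x, test_x])
b = np.concatenate([-np.ones(n_p), np.ones(n_q)])  # n_c = n_p + n_q samples
```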

As b_i is a binary label, by Bayes' rule,

p(b_i = 1 \mid a_i) = \frac{q(a_i)\, w_q}{p(a_i)\, w_p + q(a_i)\, w_q} \text{ where } w_p = \frac{n_p}{n_p + n_q},\ w_q = \frac{n_q}{n_p + n_q}

and hence,

\frac{\Pr(b_i = 1 \mid a_i)}{\Pr(b_i = -1 \mid a_i)} = \frac{q(a_i)\, n_q}{p(a_i)\, n_p}
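As a sanity check, the posterior formula and this odds ratio agree numerically (a hypothetical 1-D Gaussian example of my own, not part of the homework):

```python
import math

n_p, n_q = 100, 50
w_p, w_q = n_p / (n_p + n_q), n_q / (n_p + n_q)

def p(x):  # train density: standard normal
    return math.exp(-x**2 / 2) / math.sqrt(2 * math.pi)

def q(x):  # truth density: mean shifted to 0.5
    return math.exp(-((x - 0.5) ** 2) / 2) / math.sqrt(2 * math.pi)

def posterior(x):  # Pr(b_i = 1 | a_i = x) from the formula above
    return q(x) * w_q / (p(x) * w_p + q(x) * w_q)

x = 0.3
odds = posterior(x) / (1 - posterior(x))  # equals q(x) n_q / (p(x) n_p)
```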

Also, from HW3, we know

\frac{\Pr(b_i = 1 \mid a_i)}{\Pr(b_i = -1 \mid a_i)} = \exp(f_c(a_i)),

Hence, combining the two equations above, for unbalanced training and test sets:

\frac{q(x)}{p(x)} = \frac{n_p}{n_q} \exp(f_c(x))

i.e. the estimated ratio puts more weight on q(x) when n_p > n_q.

Now, back to our covariate shift problem: we can approximate the expectation under the truth distribution using the classifier:

\int q(x) f(x)\, dx = \int p(x)\, \alpha(x) f(x)\, dx, \text{ where } \alpha(x) = \frac{q(x)}{p(x)} = \frac{n_p}{n_q} \exp(f_c(x))

i.e. for each sample x, we can approximate its probability under the truth distribution by p(x) \alpha(x), where p(x) comes from the training distribution and \alpha(x) is computed from the classifier f_c above,
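A quick Monte Carlo illustration (toy densities of my own, not from the homework): draw samples from p, reweight by \alpha(x), and recover an expectation under q. With p = N(0, 1) and q = N(0.5, 1), the exact ratio is \alpha(x) = \exp(0.5x - 0.125), and E_q[x^2] = 0.5^2 + 1 = 1.25.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

xs = rng.normal(0.0, 1.0, size=n)   # samples from p = N(0, 1)
alpha = np.exp(0.5 * xs - 0.125)    # exact q(x)/p(x) for q = N(0.5, 1)

# Importance-weighted estimate of E_q[x^2]; the true value is 1.25
estimate = np.mean(alpha * xs**2)
```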

or in the discrete cases,

\sum_i \text{loss}(y_i, \hat{y}_i) \to \sum_i \text{loss}(y_i, \hat{y}_i) \cdot \exp(f_c(x_i))

Note that the weighting factors can be implemented via the sample_weight parameter of SigmoidBinaryCrossEntropyLoss.
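A hedged numpy sketch of what such a per-sample-weighted sigmoid BCE amounts to (the function name and toy values are my own; in practice the weights would be passed through sample_weight):

```python
import numpy as np

def weighted_sigmoid_bce(logits, labels, sample_weight):
    """Per-sample-weighted sigmoid binary cross-entropy, labels in {0, 1}."""
    z = np.asarray(logits, dtype=float)
    y = np.asarray(labels, dtype=float)
    # numerically stable form: max(z, 0) - z*y + log(1 + exp(-|z|))
    loss = np.maximum(z, 0) - z * y + np.log1p(np.exp(-np.abs(z)))
    return np.sum(sample_weight * loss)

logits = np.array([2.0, -1.0, 0.5])
labels = np.array([1.0, 0.0, 1.0])
w = np.exp(np.array([0.1, -0.2, 0.3]))  # weights exp(f_c(x_i))
total = weighted_sigmoid_bce(logits, labels, w)
```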

HW5, Q3.2

To clarify… does $$f_c(x_i)$$ take the value 1 or -1, or the original output of the data classifier, which may be any float value?