Can the course staff please clarify what this question means or point to some related resources/examples? Thank you!

# HW8.3 Clarification

**gold_piggy**#3

We use part of the original string to predict what comes next: e.g., after `"But B"` comes `"r"`, and after `"ut Br"` comes `"u"`. We are preparing training data here for the model to learn what comes after certain characters. The above is a 5-gram, which takes a 5-character sliding window for each X.
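A minimal sketch of that sliding-window construction (the string and variable names are illustrative, not from the assignment):

```python
# Build (X, y) training pairs with a 5-character sliding window.
text = "But Brutus"
n = 5  # window size (5-gram)

# Each X is a 5-character window; each y is the character that follows it.
pairs = [(text[i:i + n], text[i + n]) for i in range(len(text) - n)]

# The first two pairs match the examples above:
# ("But B", "r") and ("ut Br", "u")
print(pairs[:2])
```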

**Benson_Yuan**#4

Regarding sequential encoding, do we always have a sequence of 5?

From my understanding, for "Use a bag of characters encoding that sums over all occurrences.", we simply turn the 5 characters into one-hot encoded vectors and sum them. So for sequential encoding, can we just retain it as a matrix of shape (vocab size, 5) as the input, where the ith column is the one-hot vector of the ith character?

Edit: It seems to work
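That matrix encoding might look something like this (a sketch, with a made-up vocabulary built from the example string; `sequential_encoding` is a hypothetical helper, not an assignment-provided function):

```python
import numpy as np

def sequential_encoding(window, vocab):
    """One-hot encode each character in order: column i of the result
    is the one-hot vector of the i-th character of the window."""
    char_to_idx = {c: i for i, c in enumerate(vocab)}
    mat = np.zeros((len(vocab), len(window)))
    for i, c in enumerate(window):
        mat[char_to_idx[c], i] = 1.0
    return mat

vocab = sorted(set("But Brutus"))   # toy vocabulary for illustration
m = sequential_encoding("But B", vocab)
print(m.shape)  # (vocab_size, 5) -- each column sums to 1
```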

**gold_piggy**#5

That’s an interesting approach; I’m not sure how well it works. My understanding is to sum the 5 one-hot character encodings into a single vector rather than keep them as a matrix.

Yeah, 5-gram is fine.

**Benson_Yuan**#6

I thought the question asked us to use two models:

- In one case use a sequential encoding to obtain an embedding proportional to the length of the sequence. (each example is a matrix)
- Use a bag of characters encoding that sums over all occurrences. (each example is a vector)

And the result is consistent with our intuition: one should work significantly better than the other.

**annashang**#8

If that’s the case, we would lose sequential information when we sum the matrix into a vector. E.g., “aab” would have the same bag-of-characters encoding as “baa” and “aba”.
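A quick sketch illustrating that point (the `bag_encoding` helper and toy vocabulary are made up for this example):

```python
import numpy as np

def bag_encoding(window, vocab):
    """Sum of one-hot vectors: a count of each character, order discarded."""
    char_to_idx = {c: i for i, c in enumerate(vocab)}
    vec = np.zeros(len(vocab))
    for c in window:
        vec[char_to_idx[c]] += 1.0
    return vec

vocab = ["a", "b"]
# All three orderings collapse to the same count vector [2, 1].
print(np.array_equal(bag_encoding("aab", vocab), bag_encoding("baa", vocab)))  # True
print(np.array_equal(bag_encoding("aab", vocab), bag_encoding("aba", vocab)))  # True
```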