Can someone please specify where the 2,500 comes from?
There are 5 vectors per stock (open, high, low, close, volume) and 500 companies in the S&P 500 list, so 5 x 500 = 2,500. Actually, there are 500 companies but 505 stock tickers (for example, Alphabet has two tickers, “GOOG” and “GOOGL”), so you can use 5 x 505 = 2,525 dimensions for convenience.
So the MXNet array should be of size (2525)x(# of trading days over 5 years)? Or do we want trading days as rows and 2525 columns?
For each trading day we have a vector of size 2525 (the concatenation of every stock’s prices and volume for that day).
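One way to read the answer above is a matrix with one row per trading day and one column per (ticker, field) pair. The sketch below is only an illustration in plain Python: the tickers "AAA" and "BBB", the helper names, and the toy quotes are all made up, and the resulting list of lists could be handed to NumPy or MXNet afterwards.

```python
# Field order is an assumption; any fixed order works if it never changes.
FIELDS = ["open", "high", "low", "close", "volume"]  # 5 values per ticker


def day_vector(day_data, tickers):
    """Concatenate each ticker's 5 fields, always in the same ticker order."""
    vec = []
    for ticker in tickers:
        quote = day_data[ticker]
        vec.extend(quote[field] for field in FIELDS)
    return vec


def make_quote(base):
    """Build a made-up quote so the example is self-contained."""
    return {"open": base, "high": base + 1.0, "low": base - 1.0,
            "close": base + 0.5, "volume": 1000.0}


# Toy data: 2 hypothetical tickers over 3 trading days.
tickers = ["AAA", "BBB"]
days = [
    {t: make_quote(10.0 * (i + 1) + d) for i, t in enumerate(tickers)}
    for d in range(3)
]

# Rows are trading days, columns are (ticker, field) pairs.
matrix = [day_vector(day, tickers) for day in days]
n_days, n_cols = len(matrix), len(matrix[0])
print(n_days, n_cols)  # 3 rows, 5 * 2 = 10 columns
```

With the 505 real tickers the same layout gives 5 x 505 = 2525 columns and one row per trading day.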
So if there are only 490 stocks on a particular day, do we need to pad the data to a length of 2525?
And if we get NaN for some data, do we fill it with the previous value, or just with the mean?
The same way you would deal with any other missing data.
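The reply does not prescribe a method, so the following is only a sketch of the two options raised in the question, forward fill and mean fill, applied to a single column (for example one ticker's close prices). The function names and the sample values are hypothetical. It assumes a ticker that is absent on a given day has already been padded with NaN so that every day has the full 2525 entries.

```python
import math


def forward_fill(column):
    """Replace each NaN with the most recent non-NaN value before it.

    Leading NaNs have no earlier value, so they are left as NaN.
    """
    filled = []
    last = math.nan
    for x in column:
        if not math.isnan(x):
            last = x
        filled.append(last)
    return filled


def mean_fill(column):
    """Replace each NaN with the mean of the observed values.

    Assumes the column has at least one non-NaN entry.
    """
    valid = [x for x in column if not math.isnan(x)]
    mean = sum(valid) / len(valid)
    return [mean if math.isnan(x) else x for x in column]


# Made-up close prices with gaps (NaN marks a missing day).
closes = [math.nan, 10.0, math.nan, 12.0, math.nan]
ffilled = forward_fill(closes)
mfilled = mean_fill(closes)
print(ffilled)
print(mfilled)
```

Forward fill only uses past values, while the mean computed over the whole column also draws on later days, which may matter if the data feeds a forecasting model. Values before a stock's first observation stay NaN in this sketch.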
For some stocks, we don’t have records for the full 5 years. For example, for a stock that only has 3 years of history, how do we deal with the missing data in the first two years? Thanks.