We write a trading robot ourselves
For example, if we only ever traversed the data frame in a serial fashion, our observation space could only ever take on a discrete number of states at each time step.
However, by randomly traversing slices of the data frame, we essentially manufacture more unique data points by creating more interesting combinations of account balance, trades taken, and previously seen price action for each time step in our initial data set. Let me explain with an example. At time step 10 after resetting a serial environment, our agent will always be at the same point within the data frame, and will have had three choices to make at each time step: buy, sell, or hold.
Now consider our randomly sliced environment. At time step 10, our agent could be at any of len(df) time steps within the data frame. While this may add quite a bit of noise to large data sets, I believe it should allow the agent to learn more from our limited amount of data.
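To make the difference concrete, here is a minimal sketch of the two reset strategies. It is not the environment's actual code: the names reset_start, prices, and window are hypothetical, and a plain list stands in for the data frame. The counts at the end are only a rough proxy for how much variety each strategy exposes, pairing each possible starting row with each possible sequence of buy, sell, or hold decisions.

```python
import random


def reset_start(num_rows, window, serial, rng):
    """Return the row index at which a new episode begins.

    Hypothetical helper: serial traversal always restarts at row 0,
    while random traversal picks any start that leaves a full window.
    """
    if serial:
        return 0
    # randint is inclusive on both ends, so the last valid start is
    # num_rows - window (the slice then ends exactly at the last row).
    return rng.randint(0, num_rows - window)


# Hypothetical stand-in for the data frame: one entry per time step.
prices = list(range(1000))
window = 10
rng = random.Random(0)  # seeded so the sketch is reproducible

# Collect the distinct starting rows seen over 500 resets of each kind.
serial_starts = {reset_start(len(prices), window, True, rng) for _ in range(500)}
random_starts = {reset_start(len(prices), window, False, rng) for _ in range(500)}

# One randomly sliced episode: `window` consecutive rows from a random start.
start = reset_start(len(prices), window, False, rng)
episode = prices[start:start + window]

# Rough proxy for variety: 3 actions (buy, sell, hold) over 10 steps,
# multiplied by the number of rows an episode may start from.
action_sequences = 3 ** window
serial_paths = 1 * action_sequences
random_paths = (len(prices) - window + 1) * action_sequences

print(len(serial_starts), len(random_starts))
print(serial_paths, random_paths)
```

The serial environment only ever starts from one row, so every episode replays the same price history. The random version spreads its resets across the whole data set, which is where the extra variety (and the extra noise) comes from.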
For example, here is a visualization of our observation space rendered using OpenCV. The first 4 rows of frequency-like red lines represent the OHLC data, and the spurious orange and yellow dots directly below represent the volume.
If you squint, you can just make out a candlestick graph, with volume bars below it and, below that, a strange Morse-code-like interface that shows trade history.