For example, for the dancing scenes (rows 1-2 and rows 9-10) and the skating scene (rows 5-6), the estimates remain accurate enough to provide the corresponding 3D positions for each frame despite fast body movement and self-occlusion. Note that this is just one selected frame from the walking sequence, a common activity in which the left and right legs alternate in a repetitive manner. To further verify robustness, different sports activities with novel body poses (rows 3-4, rows 7-8, and rows 11-12) are also processed. Figure 14 shows several simulated outdoor conditions on the standard activities, with snow, fog, and occlusion effects (each column). Figure 18 shows the results of this experiment on various activities. Furthermore, the second part of Table 6 reports the results with ground-truth (GT) 2D input. The architecture of the causal attention model is shown in Fig. 12. It is similar to the one described in Fig. 2, except that only the left half of the input video sequence is considered. In particular, the improvements become more noticeable as the number of input frames increases.
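To make the "left half of the input video sequence" concrete, the following is a minimal sketch of how a causal input window could differ from a symmetric one. The function names, the odd window length `n`, the inclusion of the current frame in the left half, and the clamping of indices at the clip boundaries are all assumptions for illustration; they are not taken from the architecture in Fig. 12.

```python
import numpy as np


def symmetric_window(num_frames: int, t: int, n: int) -> np.ndarray:
    """Frame indices of an n-frame window centred on frame t.

    Assumes n is odd. Indices that fall outside the clip are clamped
    to the first or last frame (an assumed padding choice).
    """
    half = n // 2
    idx = np.arange(t - half, t + half + 1)
    return np.clip(idx, 0, num_frames - 1)


def causal_window(num_frames: int, t: int, n: int) -> np.ndarray:
    """Left half of the symmetric window: frames up to and including t.

    No frame after t is used, so the estimate for frame t does not
    depend on future input.
    """
    half = n // 2
    idx = np.arange(t - half, t + 1)
    return np.clip(idx, 0, num_frames - 1)


if __name__ == "__main__":
    # Hypothetical clip of 100 frames, target frame 50, window length 9.
    print(symmetric_window(100, 50, 9))
    print(causal_window(100, 50, 9))
```

Under these assumptions, the causal window contains roughly half as many frames as the symmetric one for the same `n`, so a causal model of this kind sees less temporal context per output frame.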