Hi,
Thanks for creating this repo, it's really helpful!
Currently, I would like to use `s3d` to get features for each frame by setting `step_size=1` and `stack_size=20`. When looking at the code in `models/s3d/extract_s3d.py`, I wasn't sure how the temporal window is determined, as there is no `...timestamps_ms.npy` output file like the one the `i3d` code produces.
Looking at the code below, it appears that for a given sample, the window is forward-looking. E.g. for sample 0, the features would be determined via the window [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], and for sample 5 via the window [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]. Is this correct? A forward-looking window seems a bit counter-intuitive to me; a backward-looking window, or one centered on the sample, would make more sense.

video_features/models/s3d/extract_s3d.py, lines 60 to 69 in a2f61b7:

    slices = form_slices(rgb.size(2), self.stack_size, self.step_size)

    vid_feats = []

    for stack_idx, (start_idx, end_idx) in enumerate(slices):
        # inference
        rgb_stack = rgb[:, :, start_idx:end_idx, :, :].to(self.device)
        output = self.name2module['model'](rgb_stack, features=True)
        vid_feats.extend(output.tolist())
        self.maybe_show_pred(rgb_stack, start_idx, end_idx)

video_features/utils/utils.py, lines 62 to 71 in a2f61b7:

    def form_slices(size: int, stack_size: int, step_size: int) -> list((int, int)):
        '''print(form_slices(100, 15, 15) - example'''
        slices = []
        # calc how many full stacks can be formed out of framepaths
        full_stack_num = (size - stack_size) // step_size + 1
        for i in range(full_stack_num):
            start_idx = i * step_size
            end_idx = start_idx + stack_size
            slices.append((start_idx, end_idx))
        return slices

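
To double-check my reading, here is a minimal standalone sketch. It copies `form_slices` as quoted above (only the return type hint is changed) and prints the windows for my settings. The 100-frame clip length, the 25 fps value, and the `window_center_ms` helper are my own assumptions for illustration; they are not part of the repo.

```python
def form_slices(size: int, stack_size: int, step_size: int) -> list[tuple[int, int]]:
    """Copy of the helper quoted above; only the return annotation differs."""
    slices = []
    # number of full stacks that fit into `size` frames
    full_stack_num = (size - stack_size) // step_size + 1
    for i in range(full_stack_num):
        start_idx = i * step_size
        end_idx = start_idx + stack_size  # exclusive end index
        slices.append((start_idx, end_idx))
    return slices


def window_center_ms(start_idx: int, end_idx: int, fps: float) -> float:
    """Hypothetical helper (not in the repo): midpoint of a window in milliseconds."""
    center_frame = (start_idx + end_idx - 1) / 2  # end_idx is exclusive
    return center_frame * 1000 / fps


# Assumed values for illustration: a 100-frame clip at 25 fps.
slices = form_slices(size=100, stack_size=20, step_size=1)
print(len(slices))  # 81
print(slices[0])    # (0, 20)  -> frames 0..19
print(slices[5])    # (5, 25)  -> frames 5..24
print(window_center_ms(*slices[5], fps=25.0))  # 580.0
```

If this matches what the extractor does, sample `i` covers frames `i` to `i + 19`, and a 100-frame clip yields 81 feature vectors rather than 100, since the last 19 frames cannot start a full window.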
Appreciate the help!
Akseli