Explain windowing/slicing in extract_s3d.py #149

@Akseli-Ilmanen

Hi,

Thanks for creating this repo, it's really helpful!

I would like to use s3d to get features for each frame by setting step_size=1 and stack_size=20. Looking at the code in models/s3d/extract_s3d.py, I wasn't sure how the temporal window is determined, since there is no ...timestamps_ms.npy output file as there is in the i3d code.

Looking at the code below, it appears that for a given sample the window is forward-looking. E.g. for sample 0, the features would be determined via the window [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], and for sample 5 via the window [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]. Is this correct? A forward-looking window seems a bit counter-intuitive to me; a backward-looking window, or one centered on the sample, would make more sense.
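To make the three alignments concrete, here is a minimal sketch of what I mean. `window_for_sample` is a hypothetical helper I wrote for illustration, not something from the repo, and it ignores the video boundaries (a backward or centered window near frame 0 would need padding or clipping).

```python
def window_for_sample(t: int, stack_size: int, mode: str) -> tuple[int, int]:
    """Hypothetical helper: half-open frame range [start, end) tied to sample t."""
    if mode == "forward":
        start = t                     # window starts at the sample
    elif mode == "backward":
        start = t - stack_size + 1    # window ends at the sample (inclusive)
    elif mode == "centered":
        start = t - stack_size // 2   # sample sits roughly in the middle
    else:
        raise ValueError(f"unknown mode: {mode}")
    return start, start + stack_size


for mode in ("forward", "backward", "centered"):
    print(mode, window_for_sample(30, 20, mode))
# forward (30, 50)
# backward (11, 31)
# centered (20, 40)
```

As far as I can tell, only the "forward" case matches what the extraction loop does.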

slices = form_slices(rgb.size(2), self.stack_size, self.step_size)
vid_feats = []
for stack_idx, (start_idx, end_idx) in enumerate(slices):
    # inference
    rgb_stack = rgb[:, :, start_idx:end_idx, :, :].to(self.device)
    output = self.name2module['model'](rgb_stack, features=True)
    vid_feats.extend(output.tolist())
    self.maybe_show_pred(rgb_stack, start_idx, end_idx)

def form_slices(size: int, stack_size: int, step_size: int) -> list((int, int)):
    '''print(form_slices(100, 15, 15) - example'''
    slices = []
    # calc how many full stacks can be formed out of framepaths
    full_stack_num = (size - stack_size) // step_size + 1
    for i in range(full_stack_num):
        start_idx = i * step_size
        end_idx = start_idx + stack_size
        slices.append((start_idx, end_idx))
    return slices
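To check my reading, I reproduced the slicing outside the repo. The sketch below re-implements the same arithmetic as form_slices and adds `window_timestamps_ms`, a hypothetical helper (not part of the repo) that assigns one timestamp per window. The 100-frame length and the 25 fps value are assumptions made up for the example.

```python
def form_slices(size: int, stack_size: int, step_size: int) -> list[tuple[int, int]]:
    """Same arithmetic as the form_slices quoted above: only full stacks are kept."""
    full_stack_num = (size - stack_size) // step_size + 1
    return [(i * step_size, i * step_size + stack_size) for i in range(full_stack_num)]


def window_timestamps_ms(slices, fps: float, anchor: str = "center") -> list[float]:
    """Hypothetical helper: one timestamp in ms per (start_idx, end_idx) window."""
    timestamps = []
    for start_idx, end_idx in slices:
        if anchor == "start":
            frame = start_idx
        elif anchor == "end":
            frame = end_idx - 1                    # last frame inside the window
        elif anchor == "center":
            frame = (start_idx + end_idx - 1) / 2  # midpoint of first and last frame
        else:
            raise ValueError(f"unknown anchor: {anchor}")
        timestamps.append(frame * 1000.0 / fps)
    return timestamps


slices = form_slices(size=100, stack_size=20, step_size=1)
print(len(slices))                                  # 81
print(slices[0], slices[5])                         # (0, 20) (5, 25)
print(window_timestamps_ms(slices[:2], fps=25.0))   # [380.0, 420.0]
```

If this is right, a 100-frame video gives 81 feature vectors rather than 100, so the last 19 frames never start a window of their own, and treating each feature as centered would amount to shifting its timestamp by about half a stack.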

Appreciate the help!
Akseli
