Thanks, Samuele. This is brilliant. Clearly the cutoff assumption plays a big role in generating the predictor matrix. Do you have any recommendations in this regard, such as trying a range of cutoffs and averaging the results? And how much does the choice affect the outcome, if at all?
That's because we don't want to average samples across different channels (and risk losing some of the signal in the data) just to convert the audio into a 1-D array, especially when HuBERT can handle multi-dimensional audio arrays.
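
To illustrate the signal-loss risk, here is a minimal NumPy sketch. It is a deliberately extreme, made-up case and not taken from the actual pipeline: the sample rate, the 440 Hz tone, and the variable names are all assumptions for illustration. The two channels carry the same tone with opposite phase, so averaging them down to mono cancels the signal completely.

```python
import numpy as np

# Assumed values for illustration only: 1 second of audio at 16 kHz.
sr = 16_000
t = np.arange(sr) / sr

# Two channels carrying the same 440 Hz tone with opposite phase.
left = np.sin(2 * np.pi * 440 * t)
right = -left

# Channels-first stereo array, shape (2, 16000).
stereo = np.stack([left, right])

# Down-mixing to mono by averaging across the channel axis.
mono = stereo.mean(axis=0)

peak_stereo = float(np.abs(stereo).max())
peak_mono = float(np.abs(mono).max())

print("stereo shape:", stereo.shape, "peak:", peak_stereo)
print("mono shape:  ", mono.shape, "peak:", peak_mono)
```

Real recordings are rarely this pathological, but any content that differs between channels gets attenuated to some degree by the average.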