
That's because we don't want to average samples across the different channels (and risk losing some signal in the data) and end up converting the audio into a 1-D array, especially when HuBERT can handle multi-dimensional audio arrays.
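To see why averaging across channels can lose signal, here is a minimal sketch in plain Python. The `to_mono` helper and the stereo clip are hypothetical and only for illustration (they are not part of any audio library); the clip is an extreme case where the right channel is the left channel inverted.

```python
import math


def to_mono(channels):
    """Average samples across channels, yielding a single 1-D list."""
    n_channels = len(channels)
    return [sum(frame) / n_channels for frame in zip(*channels)]


# Hypothetical stereo clip: the right channel is the left channel inverted.
left = [math.sin(2 * math.pi * 5 * t / 100) for t in range(100)]
right = [-s for s in left]

mono = to_mono([left, right])

# Each channel carries a full-amplitude sine wave...
peak_stereo = max(abs(s) for s in left)
# ...but the averaged (mono) version is silent, because the channels cancel.
peak_mono = max(abs(s) for s in mono)

print(f"peak per channel: {peak_stereo:.2f}, peak after averaging: {peak_mono:.2f}")
```

Real recordings rarely cancel this completely, but any content that differs between channels is attenuated in the same way once the channels are averaged.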


Dr. Varshita Sher

Senior Data Scientist | Explain like I am 5 | Oxford & SFU Alumni | https://podurama.com | Top writer on Medium