Removed NEST layer outputs and added Streaming Sortformer v2 weights

#6

The NEST layer outputs were previously exposed by the model for use as speaker embeddings. Testing showed that they encode the arrival-order slot rather than the speaker identity, so they have been removed.
I also added variants for Streaming Sortformer v2 (the existing weights are v2.1). v2 performed better on DIHARD III and may work better outside meeting environments, though this is unconfirmed.

bweng changed pull request status to merged
