Peter Szemraj PRO

pszemraj

499 363 1064

https://pszemraj.carrd.co/

AI & ML interests

metallic intuition

Recent Activity

liked a model about 16 hours ago

Xenova/GIST-small-Embedding-v0

liked a model 1 day ago

NX-AI/TiRex-2

liked a Space 1 day ago

victor/nemotron-3-5-asr-streaming

liked a model about 16 hours ago

Xenova/GIST-small-Embedding-v0

Feature Extraction • Updated Jul 22, 2025 • 1.34k • 3

liked a model 1 day ago

NX-AI/TiRex-2

Time Series Forecasting • Updated 2 days ago • 15

liked a Space 1 day ago

Nemotron 3.5 ASR Streaming

🎙

Multilingual streaming ASR with NeMo

commented on Introducing the FFASR Leaderboard: Benchmarking ASR in the Real World 7 days ago

Much needed and very cool work!

Btw, one related idea I've had in the back of my mind: there's a class of synthetic audio scenarios that don't really occur naturally but are both tricky and relevant to real deployments.

Most recent ASR models seem geared toward clean meeting transcription. But if you run a Granola-style setup that captures every audio stream on your machine and mixes them together, things get messy fast. The system audio from the meeting, your hardware mic, and sometimes your own voice echoing back through the meeting can all end up mixed, delayed, and noisy in the single track the model sees. (I use VibeVoice for this myself since I figured it would be more robust; it has held up okay, but I haven't done a real comparison.)

Mixed multi-stream audio like that feels like a natural fit for the kind of real-world robustness this benchmark measures, even though it's a synthetic condition rather than a recorded room.
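
A minimal sketch of the kind of synthetic mix described above, using only the Python standard library. Everything here is an assumption for illustration: the function name `mix_streams`, the per-stream sample delays and gains, and the Gaussian noise model are hypothetical and are not taken from the FFASR benchmark or from any capture tool. Streams are plain lists of float samples in [-1, 1] at a shared sample rate.

```python
import random


def mix_streams(streams, delays, gains, noise_std=0.0, seed=0):
    """Mix several mono streams into one track (hypothetical helper).

    streams: list of lists of float samples in [-1, 1]
    delays:  per-stream offset in samples (non-negative ints)
    gains:   per-stream linear gain
    """
    if not (len(streams) == len(delays) == len(gains)):
        raise ValueError("streams, delays, and gains must have the same length")

    rng = random.Random(seed)
    # The mixed track must be long enough for the latest-ending stream.
    length = max(len(s) + d for s, d in zip(streams, delays))
    mixed = [0.0] * length

    # Sum each stream into the track at its own offset and gain.
    for stream, delay, gain in zip(streams, delays, gains):
        for i, sample in enumerate(stream):
            mixed[i + delay] += gain * sample

    # Optional additive Gaussian noise over the whole track.
    if noise_std > 0:
        mixed = [x + rng.gauss(0.0, noise_std) for x in mixed]

    # Hard-clip to the valid sample range.
    return [max(-1.0, min(1.0, x)) for x in mixed]


# Toy example: system audio, a mic that starts one sample late,
# and a quieter, delayed copy of the system audio standing in for an echo.
system_audio = [0.5, 0.5, 0.5, 0.5]
mic = [0.2, 0.2]
track = mix_streams(
    [system_audio, mic, system_audio],
    delays=[0, 1, 2],
    gains=[1.0, 1.0, 0.3],
)
```

In a real test set the delays would be tens of milliseconds rather than single samples, and the echo path would likely need filtering, not just attenuation; this only shows the overlap structure.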

upvoted 2 articles 9 days ago

Article

Introducing the FFASR Leaderboard: Benchmarking ASR in the Real World

daniel-treble, whojavumusic, alessia-treble, georg-goetz, bezzam • 12 days ago • 7

Article

Which tokens does a hybrid model predict better?

allenai • 10 days ago • 8

commented on Which tokens does a hybrid model predict better? 9 days ago

Cool work! I've been quite excited about AllenAI's improved take on the hybrid arch. Question for you, though:

The one genuinely matched-data comparison in the paper is the 1B ladder (transformer / hybrid / pure-RNN, identical mix), which you use for the 6 filtered-loss eval, but only as aggregate loss, not the POS/bracket/copy decomposition. Since that analysis needs only forward passes on released checkpoints, have you run (or could you run) the same tag-stratified analysis on those models? It would help show whether the content-word / open-close / copy structure survives when the data is actually held constant (vs. the ~7B case).

Curious if you've looked at this internally as well.
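
For concreteness, a rough sketch of what a tag-stratified comparison could look like once per-token losses are in hand. This is not the paper's method: the function names `mean_loss_by_tag` and `tag_gap`, the tag labels, and the loss values are all made up for illustration, and it assumes both models were scored on the same tokens with the same tagging.

```python
from collections import defaultdict


def mean_loss_by_tag(losses, tags):
    """Average per-token loss within each tag (hypothetical helper)."""
    if len(losses) != len(tags):
        raise ValueError("losses and tags must have the same length")
    totals = defaultdict(float)
    counts = defaultdict(int)
    for loss, tag in zip(losses, tags):
        totals[tag] += loss
        counts[tag] += 1
    return {tag: totals[tag] / counts[tag] for tag in totals}


def tag_gap(losses_a, losses_b, tags):
    """Per-tag mean loss of model A minus model B on identical tokens."""
    mean_a = mean_loss_by_tag(losses_a, tags)
    mean_b = mean_loss_by_tag(losses_b, tags)
    return {tag: mean_a[tag] - mean_b[tag] for tag in mean_a}


# Made-up numbers: three tokens, two tags, two models.
tags = ["NOUN", "NOUN", "DET"]
transformer_losses = [2.0, 4.0, 1.0]
hybrid_losses = [1.0, 4.0, 0.5]
gap = tag_gap(transformer_losses, hybrid_losses, tags)
# A positive gap for a tag means the hybrid has lower loss on that tag.
```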