Gabriel Mongaras

gmongaras

16 26 15

https://gmongaras.me/

AI & ML interests

None yet

Recent Activity

updated a collection 2 days ago

Papers I'm going to read

updated a collection 3 days ago

Papers I'm going to read

Organizations

upvoted a paper 8 days ago

Nemotron-TwoTower: Diffusion Language Modeling with Pretrained Autoregressive Context

Paper • 2606.26493 • Published 9 days ago • 3

upvoted a collection 8 days ago

Nemotron-Labs-TwoTower

Collection

Diffusion Language Modeling with Pretrained Autoregressive Nemotron 3 Models • 1 item • Updated 3 days ago • 5

upvoted a paper 17 days ago

Rethinking the Role of Efficient Attention in Hybrid Architectures

Paper • 2606.15378 • Published 21 days ago • 18

upvoted 3 papers about 1 month ago

upvoted a paper 3 months ago

Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation

Paper • 2604.10098 • Published Apr 11 • 82

upvoted a paper 4 months ago

Memory Caching: RNNs with Growing Memory

Paper • 2602.24281 • Published Feb 27 • 13

upvoted 2 papers 7 months ago

InfiniteVL: Synergizing Linear and Sparse Attention for Highly-Efficient, Unlimited-Input Vision-Language Models

Paper • 2512.08829 • Published Dec 9, 2025 • 23

DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models

Paper • 2512.02556 • Published Dec 2, 2025 • 270

upvoted 2 papers 8 months ago

Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds

Paper • 2511.08892 • Published Nov 12, 2025 • 218

Brain-IT: Image Reconstruction from fMRI via Brain-Interaction Transformer

Paper • 2510.25976 • Published Oct 29, 2025 • 16

upvoted an article 8 months ago

Why Did MiniMax M2 End Up as a Full Attention Model?

Article • MiniMax-AI • Oct 30, 2025 • 80

upvoted 2 papers 9 months ago

Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6, 2025 • 517

Fast-dLLM v2: Efficient Block-Diffusion LLM

Paper • 2509.26328 • Published Sep 30, 2025 • 59

upvoted an article 9 months ago

There is no such thing as a tokenizer-free lunch

Article • catherinearnett • Sep 25, 2025 • 101

upvoted a paper 12 months ago

A Systematic Analysis of Hybrid Linear Attention

Paper • 2507.06457 • Published Jul 8, 2025 • 26

upvoted a paper about 1 year ago

Fast and Simplex: 2-Simplicial Attention in Triton

Paper • 2507.02754 • Published Jul 3, 2025 • 25

upvoted 2 papers over 1 year ago

RWKV-7 "Goose" with Expressive Dynamic State Evolution

Paper • 2503.14456 • Published Mar 18, 2025 • 153

Transformers without Normalization

Paper • 2503.10622 • Published Mar 13, 2025 • 172
