DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22, 2025
Space The Smol Training Playbook • The secrets to building world-class LLMs
Article The Open Medical-LLM Leaderboard: Benchmarking Large Language Models in Healthcare • Apr 19, 2024
Article 🦸🏻#12: How Do Agents Learn from Their Own Mistakes? The Role of Reflection in AI • Mar 9, 2025
Article Welcome PaliGemma 2 – New vision language models by Google • Dec 5, 2024
Article Introducing Idefics2: A Powerful 8B Vision-Language Model for the community • Apr 15, 2024
Post I have put together a notebook on Multimodal RAG, where we do not process the documents with hefty pipelines but natively use:
- vidore/colpali for retrieval: it doesn't need indexing with image-text pairs, just images!
- Qwen/Qwen2-VL-2B-Instruct for generation 💬: directly feed images as-is to a vision language model, with no conversion to text!
I used the ColPali implementation from the new Byaldi library by @bclavie 🤗: https://github.com/answerdotai/byaldi
Link to notebook: https://github.com/merveenoyan/smol-vision/blob/main/ColPali_%2B_Qwen2_VL.ipynb
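The two models plug together with very little glue code. Below is a minimal sketch of that pipeline, assuming byaldi's RAGMultiModalModel API and the Qwen2-VL usage pattern from the transformers and qwen-vl-utils READMEs; the docs/ folder, index name, and query are hypothetical placeholders, not values taken from the notebook.

```python
# A minimal sketch of the notebook's pipeline, assuming the byaldi and
# transformers APIs from their public READMEs. The docs/ folder, index
# name, and query below are hypothetical placeholders.
import torch
from byaldi import RAGMultiModalModel
from qwen_vl_utils import process_vision_info
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

# Retrieval: index raw document pages as images with ColPali.
# No OCR and no image-text pairs -- the page images themselves are the index.
retriever = RAGMultiModalModel.from_pretrained("vidore/colpali")
retriever.index(
    input_path="docs/",                # hypothetical folder of PDFs
    index_name="multimodal_rag",       # hypothetical index name
    store_collection_with_index=True,  # keep base64 page images for generation
    overwrite=True,
)
query = "What does the revenue chart show?"  # hypothetical query
results = retriever.search(query, k=1)

# Generation: hand the retrieved page image directly to Qwen2-VL,
# with no intermediate conversion to text.
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct", torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": f"data:image;base64,{results[0].base64}"},
        {"type": "text", "text": query},
    ],
}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, _ = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=128)
answer = processor.batch_decode(
    output_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```

Setting store_collection_with_index=True trades index size for convenience: the retrieved page comes back as base64 and can be passed straight into the VLM's chat template without re-rendering the source document.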
microsoft/Phi-3.5-vision-instruct Image-Text-to-Text • 4B • Updated Dec 10, 2025