title: GPU Memory Calculator
emoji: 🎮
colorFrom: blue
colorTo: purple
sdk: docker
pinned: true
license: mit
tags:
- llm
- gpu
- deep-learning
- pytorch
- training
- inference
- memory-calculator
- deepspeed
- megatron
- fsdp
- vllm
- quantization
- machine-learning
- ai
- tools
GPU Memory Calculator for LLM Training & Inference
Instantly calculate GPU memory requirements for training and running Large Language Models. Plan your infrastructure, avoid OOM errors, and optimize costs before you start.
Why Use This Tool?
- Save Money - Know what GPUs you need before spending thousands
- Avoid OOM - Validate that your config fits in memory before training
- Compare Strategies - DeepSpeed vs Megatron vs FSDP at a glance
- Plan Infrastructure - From 7B to 175B+ parameter models
- Export Configs - Generate working configs for your training framework
Features
Training Memory Calculation
Calculate memory for all major training frameworks:
- PyTorch DDP - Baseline distributed training
- DeepSpeed ZeRO (Stages 0-3) with CPU/NVMe offloading
- Megatron-LM - Tensor + Pipeline parallelism
- PyTorch FSDP - Fully sharded data parallel
- Megatron + DeepSpeed - Hybrid approach
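As a rough illustration of the kind of estimate involved, here is a minimal sketch of per-GPU model-state memory under DeepSpeed ZeRO. It assumes mixed-precision Adam (2 bytes each for weights and gradients, 12 bytes of fp32 optimizer state per parameter) and ignores activations, buffers, and fragmentation. The function name `zero_model_state_bytes` is hypothetical and is not taken from this tool's code.

```python
def zero_model_state_bytes(num_params: int, num_gpus: int, stage: int) -> float:
    """Per-GPU bytes for weights, gradients and Adam states under ZeRO.

    Assumption: mixed-precision Adam, i.e. 2 + 2 + 12 = 16 bytes per parameter
    before any sharding. Activations are NOT included.
    """
    if stage not in (0, 1, 2, 3):
        raise ValueError("ZeRO stage must be 0, 1, 2 or 3")
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")

    weights = 2.0 * num_params     # fp16/bf16 weights
    grads = 2.0 * num_params       # fp16/bf16 gradients
    optimizer = 12.0 * num_params  # fp32 master weights + Adam momentum + variance

    if stage >= 1:                 # stage 1 shards optimizer states
        optimizer /= num_gpus
    if stage >= 2:                 # stage 2 also shards gradients
        grads /= num_gpus
    if stage >= 3:                 # stage 3 also shards weights
        weights /= num_gpus
    return weights + grads + optimizer


# Example: a 7.5B-parameter model on 64 GPUs
for stage in range(4):
    gb = zero_model_state_bytes(7_500_000_000, 64, stage) / 1e9
    print(f"ZeRO-{stage}: {gb:.1f} GB per GPU")
```

Treat the result as a lower bound: activation memory depends on batch size, sequence length, and checkpointing, and has to be added on top.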
Inference Memory Estimation
Optimize your deployment with:
- HuggingFace Transformers - Baseline inference
- vLLM - PagedAttention optimization
- TGI - Text Generation Inference
- TensorRT-LLM - Maximum throughput
- SGLang - RadixAttention caching
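For inference, most of the memory goes to the weights plus the KV cache. The sketch below shows the usual back-of-the-envelope formula. It is an assumption about the general approach, not this tool's exact code, and the names `kv_cache_bytes` and `inference_bytes` are hypothetical. Engine-specific overheads (CUDA context, paging metadata, activation workspace) are left out.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int,
                   bytes_per_value: float = 2.0) -> float:
    """KV cache size: one key and one value vector per layer, head and token."""
    # The leading 2 accounts for storing both keys and values.
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)


def inference_bytes(num_params: int, weight_bytes_per_param: float,
                    kv_bytes: float) -> float:
    """Weights plus KV cache; other runtime overheads are ignored here."""
    return num_params * weight_bytes_per_param + kv_bytes


# LLaMA 2 7B shape: 32 layers, 32 KV heads, head dimension 128.
# The 7e9 parameter count is the nominal size, used for illustration.
kv_fp16 = kv_cache_bytes(32, 32, 128, seq_len=4096, batch_size=1)
kv_int8 = kv_cache_bytes(32, 32, 128, seq_len=4096, batch_size=1,
                         bytes_per_value=1.0)
total = inference_bytes(7_000_000_000, 2.0, kv_fp16)

print(f"KV cache (FP16): {kv_fp16 / 2**30:.1f} GiB")
print(f"KV cache (INT8): {kv_int8 / 2**30:.1f} GiB")
print(f"Weights + KV cache: {total / 1e9:.1f} GB")
```

Because the cache grows linearly with both sequence length and batch size, long-context or high-concurrency serving can need more memory for the cache than for the weights.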
Smart Features
- Model Presets - LLaMA 2, GPT-3, Mixtral, GLM, Qwen, DeepSeek-MoE
- Export Configs - Accelerate, Lightning, Axolotl, DeepSpeed, YAML, JSON
- Batch Optimizer - Auto-find max batch size for your hardware
- Multi-Node - Calculate network overhead for distributed training
- KV Cache - Quantization options (INT4/INT8/FP8/None)
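One plausible way to implement a batch optimizer is a binary search over batch size against a memory estimate. The sketch below assumes the estimate never decreases as the batch grows. `max_batch_size` and `toy_estimate` are hypothetical names, and the numbers in `toy_estimate` are made up for illustration.

```python
from typing import Callable


def max_batch_size(memory_for_batch: Callable[[int], float],
                   gpu_memory_bytes: float,
                   upper_bound: int = 4096) -> int:
    """Largest batch size in [1, upper_bound] that fits, or 0 if none does.

    Assumption: memory_for_batch is non-decreasing in the batch size.
    """
    lo, hi = 0, upper_bound
    while lo < hi:
        mid = (lo + hi + 1) // 2        # round up so the loop always progresses
        if memory_for_batch(mid) <= gpu_memory_bytes:
            lo = mid                    # mid fits, try larger
        else:
            hi = mid - 1                # mid does not fit, try smaller
    return lo


def toy_estimate(batch_size: int) -> float:
    """Illustrative only: fixed model-state cost plus a per-sample activation cost."""
    fixed = 28e9        # made-up model-state footprint in bytes
    per_sample = 1.5e9  # made-up activation cost per sample in bytes
    return fixed + per_sample * batch_size


print(max_batch_size(toy_estimate, gpu_memory_bytes=80e9))
```

Binary search needs only about log2(upper_bound) evaluations of the estimate, which keeps the search cheap even when the memory model is expensive to evaluate.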
Supported Models
| Model | Parameters | Use Case |
|---|---|---|
| LLaMA 2 | 7B, 13B, 70B | General purpose |
| GPT-3 | 175B | Large scale training |
| Mixtral 8x7B | 47B | Mixture of Experts |
| GLM-4 | 9B - 355B | Chinese/English |
| Qwen MoE | 2.7B | Efficient inference |
| DeepSeek-MoE | 16B | Sparse training |
How to Use
- Select a Model - Choose from presets or enter custom parameters
- Pick Your Engine - Training (DeepSpeed/Megatron/FSDP) or Inference (vLLM/TGI/SGLang)
- Configure - Adjust batch size, GPUs, precision, offloading
- Calculate - Get instant memory breakdown
- Export - Generate working configs for your framework
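To give a sense of what an exported config could look like, here is a minimal sketch that builds a DeepSpeed ZeRO config as JSON. The keys are standard DeepSpeed configuration fields. The helper `deepspeed_zero_config` and the chosen values are hypothetical, and the tool's actual export may differ.

```python
import json


def deepspeed_zero_config(stage: int, micro_batch_size: int,
                          grad_accum_steps: int,
                          offload_to_cpu: bool = False) -> dict:
    """Build a small DeepSpeed config dict for the given ZeRO stage."""
    if stage not in (0, 1, 2, 3):
        raise ValueError("ZeRO stage must be 0, 1, 2 or 3")

    zero: dict = {"stage": stage}
    if offload_to_cpu and stage >= 1:
        # Optimizer-state offload is available for ZeRO stages 1 to 3.
        zero["offload_optimizer"] = {"device": "cpu"}
    if offload_to_cpu and stage == 3:
        # Parameter offload is only valid with ZeRO stage 3.
        zero["offload_param"] = {"device": "cpu"}

    return {
        "train_micro_batch_size_per_gpu": micro_batch_size,
        "gradient_accumulation_steps": grad_accum_steps,
        "bf16": {"enabled": True},
        "zero_optimization": zero,
    }


config = deepspeed_zero_config(stage=3, micro_batch_size=4,
                               grad_accum_steps=8, offload_to_cpu=True)
print(json.dumps(config, indent=2))
```

The printed JSON can be saved to a file and passed to a DeepSpeed launch as its config, after checking the batch and precision settings against your own setup.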
Example Use Cases
- "Can I train a 7B model on 4x A100s?" → Calculate and find out
- "What's the max batch size for DeepSpeed ZeRO-3?" → Batch optimizer tells you
- "vLLM vs TGI - which uses less memory?" → Compare instantly
- "How many GPUs for 175B with Megatron?" → Plan your cluster
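Questions like the GPU-count one above can be approximated by hand before opening the calculator. The sketch below assumes 16 bytes of model state per parameter (mixed-precision Adam), model states fully partitioned across GPUs, and a fixed per-GPU reserve for activations. The helper name and the 20 GB reserve are illustrative assumptions, not values from this tool.

```python
def min_gpus_for_model_states(num_params: int, gpu_memory_gb: int,
                              reserved_gb: int,
                              bytes_per_param: int = 16) -> int:
    """Smallest GPU count whose combined budget holds all model states.

    Assumptions: states are evenly partitioned across GPUs, and reserved_gb
    per GPU is set aside for activations and runtime overhead.
    """
    budget_per_gpu = (gpu_memory_gb - reserved_gb) * 10**9
    if budget_per_gpu <= 0:
        raise ValueError("reserved_gb must be smaller than gpu_memory_gb")
    total_bytes = num_params * bytes_per_param
    return -(-total_bytes // budget_per_gpu)  # integer ceiling division


# 80 GB GPUs with 20 GB reserved per GPU (illustrative reserve)
print(min_gpus_for_model_states(175_000_000_000, 80, 20))  # 175B model
print(min_gpus_for_model_states(7_000_000_000, 80, 20))    # 7B model
```

A real Megatron layout needs a GPU count that factors into tensor, pipeline, and data parallel sizes, so the result is a floor to round up from, not a cluster plan.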
Links & Resources
- GitHub Repository - Star us on GitHub! ⭐
- Full Documentation - Complete guide
- Report Issues - Bug reports & feature requests
- Contributing Guide - Pull requests welcome!
Technical Details
Built with:
- FastAPI - High-performance web framework
- Pydantic - Data validation and settings
- Python 3.12 - Application runtime
Formulas verified against:
License
MIT License - Free for commercial and personal use.
Made with ❤️ by the AI community