George Yang
metadata
title: GPU Memory Calculator
emoji: 🎮
colorFrom: blue
colorTo: purple
sdk: docker
pinned: true
license: mit
tags:
  - llm
  - gpu
  - deep-learning
  - pytorch
  - training
  - inference
  - memory-calculator
  - deepspeed
  - megatron
  - fsdp
  - vllm
  - quantization
  - machine-learning
  - ai
  - tools

🎮 GPU Memory Calculator for LLM Training & Inference

Instantly calculate GPU memory requirements for training and running Large Language Models. Plan your infrastructure, avoid OOM errors, and optimize costs before you start.


🚀 Why Use This Tool?

  • 💰 Save Money - Estimate which GPUs you need before spending thousands
  • ⚡ Avoid OOM - Check that your config fits in memory before training
  • 📊 Compare Strategies - DeepSpeed vs Megatron vs FSDP at a glance
  • 🎯 Plan Infrastructure - From 7B to 175B+ parameter models
  • ⚙️ Export Configs - Generate working configs for your training framework

✨ Features

Training Memory Calculation

Calculate memory for all major training frameworks:

  • PyTorch DDP - Baseline distributed training
  • DeepSpeed ZeRO (Stages 0-3) with CPU/NVMe offloading
  • Megatron-LM - Tensor + Pipeline parallelism
  • PyTorch FSDP - Fully sharded data parallel
  • Megatron + DeepSpeed - Hybrid approach
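
For the DeepSpeed ZeRO stages, per-GPU memory for model states can be approximated with the accounting from the ZeRO paper (Rajbhandari et al.): with mixed-precision Adam, each parameter costs 2 bytes of fp16 weights, 2 bytes of fp16 gradients, and K = 12 bytes of fp32 optimizer state, and each stage shards one more of these across the N data-parallel GPUs (stage 1 the optimizer states, stage 2 also the gradients, stage 3 also the parameters). The sketch below illustrates that textbook estimate and is not this tool's implementation; the function name `zero_model_state_bytes` is hypothetical, and activations, communication buffers, and allocator fragmentation are left out.

```python
def zero_model_state_bytes(num_params: float, num_gpus: int, stage: int, k: int = 12) -> float:
    """Approximate per-GPU model-state memory (bytes) for mixed-precision Adam under ZeRO.

    2 bytes/param fp16 weights + 2 bytes/param fp16 gradients + k bytes/param
    optimizer state (fp32 master weights, momentum, variance). Activations,
    buffers, and fragmentation are NOT included.
    """
    if stage not in (0, 1, 2, 3):
        raise ValueError("ZeRO stage must be 0, 1, 2, or 3")
    weights = 2 * num_params
    grads = 2 * num_params
    optim = k * num_params
    if stage >= 1:
        optim /= num_gpus    # stage 1: shard optimizer states
    if stage >= 2:
        grads /= num_gpus    # stage 2: also shard gradients
    if stage >= 3:
        weights /= num_gpus  # stage 3: also shard parameters
    return weights + grads + optim


if __name__ == "__main__":
    # 7.5B parameters on 64 GPUs, the worked example from the ZeRO paper
    for s in range(4):
        gb = zero_model_state_bytes(7.5e9, 64, s) / 1e9
        print(f"ZeRO-{s}: {gb:.1f} GB per GPU")
```

Run on its own, this prints 120.0, 31.4, 16.6, and 1.9 GB per GPU for stages 0 to 3, the figures the ZeRO paper reports for that configuration.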

Inference Memory Estimation

Optimize your deployment with:

  • HuggingFace Transformers - Baseline inference
  • vLLM - PagedAttention optimization
  • TGI - Text Generation Inference
  • TensorRT-LLM - Maximum throughput
  • SGLang - RadixAttention caching
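
To see why block-based KV cache allocation such as vLLM's PagedAttention saves memory, compare how many token slots are reserved when each request pre-allocates a contiguous region for the maximum sequence length versus when the cache is handed out in fixed-size blocks as the sequence grows. This is a simplified sketch: the function names are hypothetical, the 16-token block size is an assumed value, and prefix sharing and copy-on-write are ignored.

```python
import math


def contiguous_reserved_slots(seq_lens: list[int], max_seq_len: int) -> int:
    """Token slots reserved if every request pre-allocates max_seq_len entries."""
    return len(seq_lens) * max_seq_len


def paged_reserved_slots(seq_lens: list[int], block_size: int = 16) -> int:
    """Token slots reserved if the KV cache is allocated in fixed-size blocks on demand.

    Only the last block of each sequence can be partially empty.
    """
    return sum(math.ceil(n / block_size) * block_size for n in seq_lens)


if __name__ == "__main__":
    lens = [100, 700, 1500]  # tokens actually held per request
    used = sum(lens)
    contiguous = contiguous_reserved_slots(lens, max_seq_len=2048)
    paged = paged_reserved_slots(lens)
    print(f"used={used} contiguous={contiguous} paged={paged}")
```

This prints `used=2300 contiguous=6144 paged=2320`; multiplying slot counts by the per-token KV size converts them to bytes.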

Smart Features

  • 🎯 Model Presets - LLaMA 2, GPT-3, Mixtral, GLM, Qwen, DeepSeek-MoE
  • 📦 Export Configs - Accelerate, Lightning, Axolotl, DeepSpeed, YAML, JSON
  • 🔢 Batch Optimizer - Auto-find max batch size for your hardware
  • 🌐 Multi-Node - Calculate network overhead for distributed training
  • 💾 KV Cache - Quantization options (INT4/INT8/FP8/None)
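
The KV cache size behind these quantization options follows from the model shape: each layer stores one key and one value vector per KV head for every token in every sequence. A minimal sketch, assuming the unquantized ("None") cache is kept in fp16/bf16 at 2 bytes per element and INT4 packs two values per byte; the function and dictionary names are hypothetical, and the per-block scales or other metadata that real quantized caches keep are ignored.

```python
BYTES_PER_ELEMENT = {
    "none": 2.0,  # assumption: unquantized cache kept in fp16/bf16
    "fp8": 1.0,
    "int8": 1.0,
    "int4": 0.5,  # two 4-bit values per byte
}


def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int, kv_dtype: str = "none") -> float:
    """Approximate KV cache size in bytes (quantization metadata ignored)."""
    # Factor 2: one K and one V vector per layer, KV head, and token.
    per_token = 2 * num_layers * num_kv_heads * head_dim * BYTES_PER_ELEMENT[kv_dtype]
    return per_token * seq_len * batch_size


if __name__ == "__main__":
    # LLaMA 2 7B shape: 32 layers, 32 KV heads (no GQA), head_dim 128
    for dtype in ("none", "int8", "int4"):
        gib = kv_cache_bytes(32, 32, 128, seq_len=4096, batch_size=1, kv_dtype=dtype) / 2**30
        print(f"{dtype}: {gib:.2f} GiB")
```

For LLaMA 2 7B at 4,096 tokens and batch size 1 this gives 2 GiB unquantized, 1 GiB at INT8, and 0.5 GiB at INT4. For grouped-query-attention models such as LLaMA 2 70B, pass the number of KV heads rather than query heads.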

🎯 Supported Models

Model          Parameters     Use Case
LLaMA 2        7B, 13B, 70B   General purpose
GPT-3          175B           Large-scale training
Mixtral 8x7B   47B            Mixture of Experts
GLM-4          9B - 355B      Chinese/English
Qwen MoE       2.7B           Efficient inference
DeepSeek-MoE   16B            Sparse training

📖 How to Use

  1. Select a Model - Choose from presets or enter custom parameters
  2. Pick Your Engine - Training (DeepSpeed/Megatron/FSDP) or Inference (vLLM/TGI/SGLang)
  3. Configure - Adjust batch size, GPUs, precision, offloading
  4. Calculate - Get instant memory breakdown
  5. Export - Generate working configs for your framework
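
As an illustration of what the Export step might produce for DeepSpeed, the sketch below assembles a minimal config dictionary from a chosen batch size, GPU count, and ZeRO stage. The keys are standard DeepSpeed config fields, but this stripped-down shape is an assumption rather than the tool's actual exporter output, and `make_deepspeed_config` and its parameters are hypothetical. DeepSpeed expects `train_batch_size` to equal the micro-batch per GPU times the gradient accumulation steps times the number of GPUs, so the sketch derives it instead of taking it as input.

```python
import json


def make_deepspeed_config(micro_batch_per_gpu: int, grad_accum_steps: int, world_size: int,
                          zero_stage: int = 3, offload_optimizer: bool = False,
                          bf16: bool = True) -> dict:
    """Build a minimal DeepSpeed config dict (hypothetical helper)."""
    zero = {"stage": zero_stage}
    if offload_optimizer:
        # Move optimizer states to CPU memory to free GPU memory
        zero["offload_optimizer"] = {"device": "cpu"}
    return {
        # Must equal micro batch * grad accumulation * number of GPUs
        "train_batch_size": micro_batch_per_gpu * grad_accum_steps * world_size,
        "train_micro_batch_size_per_gpu": micro_batch_per_gpu,
        "gradient_accumulation_steps": grad_accum_steps,
        "zero_optimization": zero,
        "bf16": {"enabled": bf16},
    }


if __name__ == "__main__":
    config = make_deepspeed_config(4, 8, 4, offload_optimizer=True)
    print(json.dumps(config, indent=2))  # write this to ds_config.json
```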

💡 Example Use Cases

  • "Can I train a 7B model on 4x A100s?" → Calculate and find out
  • "What's the max batch size for DeepSpeed ZeRO-3?" → Batch optimizer tells you
  • "vLLM vs TGI - which uses less memory?" → Compare instantly
  • "How many GPUs for 175B with Megatron?" → Plan your cluster
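
The max-batch-size question reduces to a search: if estimated memory never decreases as the batch grows, the largest batch that fits a per-GPU budget can be found by binary search. A sketch under that assumption; `max_batch_size` is a hypothetical helper, and the linear memory model in the example (20 GB fixed plus 1.5 GB per sample against an 80 GB A100) uses made-up numbers, not measurements.

```python
from typing import Callable


def max_batch_size(memory_fn: Callable[[int], float], budget: float, upper: int = 4096) -> int:
    """Largest batch size b in [1, upper] with memory_fn(b) <= budget, or 0 if none fits.

    Assumes memory_fn is non-decreasing in b, which is what makes binary search valid.
    """
    # Invariant: lo is 0 or a batch size that fits; every b > hi does not fit (or exceeds upper).
    lo, hi = 0, upper
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if memory_fn(mid) <= budget:
            lo = mid
        else:
            hi = mid - 1
    return lo


def estimate(batch: int) -> float:
    """Hypothetical linear model: fixed model states + per-sample activations (bytes)."""
    return 20e9 + 1.5e9 * batch


if __name__ == "__main__":
    print(max_batch_size(estimate, budget=80e9))  # 40
```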

🔗 Links & Resources

📚 Technical Details

Built with:

  • FastAPI - High-performance web framework
  • Pydantic - Data validation and settings
  • Python 3.12 - Modern Python runtime

Formulas verified against:

📊 License

MIT License - Free for commercial and personal use.


Made with ❤️ by the AI community
