AttributeError: 'Gemma3Config' object has no attribute 'vocab_size'
(* marks a folder name that I changed.)
(venv) E:*>python message.py
tokenizer_config.json: 100%|███████████████████████████████████████████████████████| 1.16M/1.16M [00:01<00:00, 592kB/s]
tokenizer.model: 100%|████████████████████████████████████████████████████████████| 4.69M/4.69M [00:00<00:00, 16.9MB/s]
tokenizer.json: 100%|█████████████████████████████████████████████████████████████| 33.4M/33.4M [00:01<00:00, 28.9MB/s]
added_tokens.json: 100%|████████████████████████████████████████████████████████████████████| 35.0/35.0 [00:00<?, ?B/s]
special_tokens_map.json: 100%|████████████████████████████████████████████████████████████████| 662/662 [00:00<?, ?B/s]
config.json: 100%|████████████████████████████████████████████████████████████████████████████| 855/855 [00:00<?, ?B/s]
model.safetensors.index.json: 100%|████████████████████████████████████████████████| 90.6k/90.6k [00:00<00:00, 879kB/s]
model-00001-of-00002.safetensors: 100%|███████████████████████████████████████████| 4.96G/4.96G [02:21<00:00, 35.0MB/s]
model-00002-of-00002.safetensors: 100%|███████████████████████████████████████████| 3.64G/3.64G [01:39<00:00, 36.7MB/s]
Downloading shards: 100%|███████████████████████████████████████████████████████████████| 2/2 [04:01<00:00, 120.83s/it]
Traceback (most recent call last):
File "E:*\message.py", line 5, in <module>
model = AutoModelForCausalLM.from_pretrained(model_path)
File "C:\Projects*\venv\lib\site-packages\transformers\models\auto\auto_factory.py", line 564, in from_pretrained
return model_class.from_pretrained(
File "C:\Projects*\venv\lib\site-packages\transformers\modeling_utils.py", line 273, in _wrapper
return func(*args, **kwargs)
File "C:\Projects*\venv\lib\site-packages\transformers\modeling_utils.py", line 4313, in from_pretrained
model = cls(config, *model_args, **model_kwargs)
File "C:\Projects*\venv\lib\site-packages\transformers\models\gemma3\modeling_gemma3.py", line 889, in __init__
self.model = Gemma3TextModel(config)
File "C:\Projects*\venv\lib\site-packages\transformers\models\gemma3\modeling_gemma3.py", line 622, in __init__
self.vocab_size = config.vocab_size
File "C:\Projects*\venv\lib\site-packages\transformers\configuration_utils.py", line 214, in __getattribute__
return super().__getattribute__(key)
AttributeError: 'Gemma3Config' object has no attribute 'vocab_size'
Another butchered release by Google 🤦.
They forgot to include the vocab_size parameter in config.json for the 4b, 12b, and 27b models, but it is there for 1b.
Is there a fix for this?
After more debugging:
The 1B model is Gemma3ForCausalLM, but the 4B+ models are Gemma3ForConditionalGeneration. So for 4B+ models, text-only code and pipelines don't work as is. You need to use code similar to the example in the model card.
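To make that distinction concrete, here is a minimal sketch that picks the Auto class from the `architectures` field of a checkpoint's config. The helper names `auto_class_name` and `load_model` are hypothetical, and pairing Gemma3ForConditionalGeneration with AutoModelForImageTextToText follows the suggestion made elsewhere in this thread rather than anything official.

```python
from typing import Any, Sequence

# Architecture names as reported in this thread (1B vs. 4B+ checkpoints).
# The choice of Auto class per architecture is an assumption based on the
# advice given in the thread, not an official mapping.
_AUTO_CLASS_BY_ARCH = {
    "Gemma3ForCausalLM": "AutoModelForCausalLM",
    "Gemma3ForConditionalGeneration": "AutoModelForImageTextToText",
}


def auto_class_name(architectures: Sequence[str]) -> str:
    """Return the Auto class name for the first architecture we recognise."""
    for arch in architectures or ():
        if arch in _AUTO_CLASS_BY_ARCH:
            return _AUTO_CLASS_BY_ARCH[arch]
    raise ValueError(f"No known Auto class for: {list(architectures or ())}")


def load_model(model_id: str, **kwargs: Any):
    """Load model_id with the Auto class that matches its config (hypothetical helper)."""
    import transformers  # imported lazily so the mapping works without it installed

    config = transformers.AutoConfig.from_pretrained(model_id)
    auto_cls = getattr(transformers, auto_class_name(config.architectures))
    return auto_cls.from_pretrained(model_id, **kwargs)
```

Calling `load_model("google/gemma-3-4b-it")` would download the weights, so try `auto_class_name` on its own first if you only want to see which loader would be picked.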
The it versions are fine: no problems during training (apart from an exception when saving). The problem is in the pt versions, where only the 1b config.json is correct.
You should use AutoModelForImageTextToText to load the model with transformers v4.50.0.dev0, which can be installed via git: "pip install git+https://github.com/huggingface/transformers@v4.49.0-Gemma-3".
PS: installing from that 4.49 branch results in v4.50 being installed.
I have this library installed from the 4.49.0-Gemma-3 branch
Maybe try loading the model with AutoModelForImageTextToText; that works for me, @MinecAnton209.
Apologies for the delayed response. We can confirm that this issue has been addressed and resolved in Transformers version 4.53.0. Please try again by installing the latest transformers version (4.53.0) using
!pip install -U transformers
and load the gemma-3-4b-it model using the following code:
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("google/gemma-3-4b-it")
Please let us know if this resolves the issue or if you continue to experience the same problem. Thank you.
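Before retrying, it may help to confirm which transformers version is actually installed, since the branch install mentioned earlier in the thread reports itself as v4.50. The sketch below is one rough way to do that with the standard library only; the helper names are hypothetical, and the comparison deliberately ignores suffixes such as `.dev0`, so treat it as a quick check, not a full version parser.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def release_tuple(version_string: str) -> tuple:
    """Return the leading numeric part, e.g. '4.50.0.dev0' -> (4, 50, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version_string!r}")
    parts = tuple(int(p) for p in match.group(0).split("."))
    # Pad short versions such as '4.53' so they compare equal to '4.53.0'.
    return parts + (0,) * (3 - len(parts))


def meets_minimum(installed: str, minimum: str = "4.53.0") -> bool:
    """True if the numeric release of `installed` is at least `minimum`."""
    return release_tuple(installed) >= release_tuple(minimum)


def installed_transformers_ok(minimum: str = "4.53.0") -> bool:
    """Check the transformers package in the current environment, if any."""
    try:
        return meets_minimum(version("transformers"), minimum)
    except PackageNotFoundError:
        return False
```

If `installed_transformers_ok()` returns False inside the same virtual environment that runs your script, the upgrade did not land where you expected.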
I used the latest transformers version but still encounter the vocab size issue. I am trying to deploy the 27b-it model.
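For anyone still hitting the error, a small diagnostic can show where `vocab_size` lives on the loaded config. This sketch assumes the multimodal config nests its text settings under a `text_config` attribute; the helper name `find_vocab_size` is hypothetical, and the stand-in objects and the value 1000 are placeholders, not real Gemma 3 numbers.

```python
from types import SimpleNamespace
from typing import Any, Optional


def find_vocab_size(config: Any) -> Optional[tuple]:
    """Return (location, value) for vocab_size, or None if it is not found."""
    value = getattr(config, "vocab_size", None)
    if value is not None:
        return ("config.vocab_size", value)
    # Assumption: multimodal configs keep text settings in `text_config`.
    text_config = getattr(config, "text_config", None)
    value = getattr(text_config, "vocab_size", None)
    if value is not None:
        return ("config.text_config.vocab_size", value)
    return None


# Stand-in objects shaped like a text-only and a multimodal config.
text_only = SimpleNamespace(vocab_size=1000)
multimodal = SimpleNamespace(text_config=SimpleNamespace(vocab_size=1000))

print(find_vocab_size(text_only))
print(find_vocab_size(multimodal))
```

With a real checkpoint you could pass the object returned by `AutoConfig.from_pretrained(...)` instead of the stand-ins; a result pointing at `text_config` suggests that text-only code is reading the wrong level of the config.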