How to use nm-testing/Llama-3.3-70B-Instruct-QKV-Cache-FP8-Per-Tensor with Transformers:
```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "nm-testing/Llama-3.3-70B-Instruct-QKV-Cache-FP8-Per-Tensor",
    dtype="auto",
)
```
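Note that `AutoModel` loads the bare backbone without a language-modeling head; for text generation with this instruct checkpoint you would typically use `AutoModelForCausalLM` instead. Below is a minimal, hedged generation sketch: the chat-template call, generation parameters, and `device_map="auto"` (which requires the `accelerate` package) are illustrative assumptions, not part of the original snippet, and a 70B model generally needs multiple GPUs or offloading to load at all.

```python
# Hypothetical usage sketch, assuming a recent transformers version and enough
# GPU memory (device_map="auto" needs the `accelerate` package installed).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nm-testing/Llama-3.3-70B-Instruct-QKV-Cache-FP8-Per-Tensor"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, dtype="auto", device_map="auto")

# Llama 3.3 Instruct is a chat model, so format the prompt with its chat template.
messages = [{"role": "user", "content": "Explain FP8 KV-cache quantization in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate a reply and decode only the newly produced tokens.
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```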