---
license: apache-2.0
tags:
  - biomedical
  - medical
  - mistral
  - fp8
  - quantization
  - vllm
  - text-generation
library_name: transformers
---

# BioMistral-7B-FP8-Dynamic

## Overview

**BioMistral-7B-FP8-Dynamic** is an **FP8 Dynamic-quantized** version of the **BioMistral-7B** model, built for high-throughput inference while preserving output quality on biomedical and medical NLP tasks.

This model is primarily intended for deployment with **vLLM** on modern GPUs with native FP8 support (NVIDIA Hopper / Ada Lovelace architectures).

---

## Base Model

- **Base model**: BioMistral-7B
- **Architecture**: Mistral-style decoder-only Transformer
- **Domain**: Biomedical / medical natural language processing

---

## Quantization

- **Method**: FP8 Dynamic (weights pre-quantized; activation scales computed at runtime)
- **Scope**: Linear layers
- **Objective**: Reduce VRAM usage and improve inference throughput

### Notes

- The weights are **already quantized**.
- Do **not** apply additional runtime quantization.

---

## Intended Use

- Biomedical and medical text generation
- Medical writing assistance
- Summarization and analysis of scientific literature
- Medical RAG pipelines (clinical notes, research papers)

---

## Deployment (vLLM)

### Recommended

```bash
vllm serve ig1/BioMistral-7B-FP8-Dynamic \
  --served-model-name biomistral-7b-fp8 \
  --dtype auto
```
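## How FP8 Dynamic Scaling Works

"Dynamic" here means the activation scale is derived at runtime from each tensor's absolute maximum, so no offline calibration data is required. The sketch below illustrates only the per-tensor scaling idea in NumPy; it is not the actual vLLM kernel, and it stands in for the hardware `e4m3` cast with a `float16` cast (448 is the largest finite `e4m3` magnitude):

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite magnitude representable in e4m3


def dynamic_fp8_scale(x: np.ndarray) -> float:
    """Per-tensor dynamic scale: map the runtime absmax onto the FP8 range."""
    return float(np.abs(x).max()) / FP8_E4M3_MAX


def quant_dequant(x: np.ndarray) -> np.ndarray:
    """Simulated round trip: scale into the FP8 range, cast down
    (float16 as a stand-in for e4m3), then rescale back."""
    s = dynamic_fp8_scale(x)
    q = (x / s).astype(np.float16)  # real kernels cast to e4m3 here
    return q.astype(np.float32) * s


# After scaling, every value fits within the FP8-representable range,
# and the dequantized tensor stays close to the original.
x = np.random.randn(4, 8).astype(np.float32)
y = quant_dequant(x)
```

Because the scale tracks each tensor's own dynamic range, outlier-heavy activations do not require a pre-collected calibration set, which is what makes this variant drop-in for serving.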