### LMDeploy

#### Service

To deploy InternVL2 as an API, configure the chat template first by creating the following JSON file, `chat_template.json`:

```json
{
  "model_name": "internvl-internlm2",
  "meta_instruction": "我是书生·万象,英文名是InternVL,是由上海人工智能实验室及多家合作单位联合开发的多模态大语言模型。",
  "stop_words": ["<|im_start|>", "<|im_end|>"]
}
```

(The Chinese `meta_instruction` is the model's system prompt; in English it reads: "I am InternVL, Chinese name 书生·万象, a multimodal large language model jointly developed by Shanghai AI Laboratory and several partner institutions.")
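If you prefer to generate the config from code rather than writing the file by hand, a minimal sketch using Python's standard `json` module follows; the dict literal mirrors the file above, and the script itself is illustrative, not part of LMDeploy:

```python
import json

# The same chat template config as the JSON file shown above.
chat_template = {
    "model_name": "internvl-internlm2",
    "meta_instruction": "我是书生·万象,英文名是InternVL,是由上海人工智能实验室及多家合作单位联合开发的多模态大语言模型。",
    "stop_words": ["<|im_start|>", "<|im_end|>"],
}

# ensure_ascii=False keeps the Chinese system prompt readable in the file.
with open("chat_template.json", "w", encoding="utf-8") as f:
    json.dump(chat_template, f, ensure_ascii=False, indent=2)

# Round-trip check: the written file parses back to the same dict.
with open("chat_template.json", encoding="utf-8") as f:
    loaded = json.load(f)
```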

LMDeploy's `api_server` enables models to be easily packed into services with a single command. The provided RESTful APIs are compatible with OpenAI's interfaces. Below is an example of service startup:

> **⚠️ Warning**: Please make sure to install Flash Attention; otherwise, using `--tp` will cause errors.

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 lmdeploy serve api_server OpenGVLab/InternVL2-Llama3-76B --backend turbomind --server-port 23333 --chat-template chat_template.json --tp 4
```

To use the OpenAI-style interface, you need to install the OpenAI Python package:

```shell
pip install openai
```

Then, use the code below to make the API call:

```python
from openai import OpenAI

client = OpenAI(api_key='YOUR_API_KEY', base_url='http://0.0.0.0:23333/v1')
model_name = client.models.list().data[0].id
response = client.chat.completions.create(
    model=model_name,
    messages=[{
        'role': 'user',
        'content': [{
            'type': 'text',
            'text': 'describe this image',
        }, {
            'type': 'image_url',
            'image_url': {
                'url': 'https://modelscope.oss-cn-beijing.aliyuncs.com/resource/tiger.jpeg',
            },
        }],
    }],
    temperature=0.8,
    top_p=0.8)
print(response)
```
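The nested `messages` payload in the call above is plain JSON-style data, so it can be built without a running server. A small helper (hypothetical, not part of LMDeploy or the OpenAI client) makes the multimodal structure easier to reuse:

```python
def build_image_message(text: str, image_url: str) -> dict:
    """Build an OpenAI-style multimodal user message as a plain dict."""
    return {
        'role': 'user',
        'content': [
            {'type': 'text', 'text': text},
            {'type': 'image_url', 'image_url': {'url': image_url}},
        ],
    }

msg = build_image_message(
    'describe this image',
    'https://modelscope.oss-cn-beijing.aliyuncs.com/resource/tiger.jpeg',
)
```

The resulting dict can then be passed as `messages=[msg]` to `client.chat.completions.create`.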
### vLLM