---
inference: false
language:
- ja
- en
- de
- is
- zh
- cs
---
# New Version has been released.

2024/03/04

[webbigdata/C3TR-Adapter](https://huggingface.co/webbigdata/C3TR-Adapter)

The GPU memory requirement has increased to 8.1 GB. However, it can still run on the free version of Colab, and performance is much improved!

2023/10/21

[ALMA-7B-Ja-V2](https://huggingface.co/webbigdata/ALMA-7B-Ja-V2)

Overall performance has been improved.

Below is a description of the old version. We urge you to try the newer versions above.
# webbigdata/ALMA-7B-Ja

ALMA-7B-Ja (13.3 GB) is a machine translation model that uses ALMA's training method to translate between Japanese and English.

The [original ALMA-7B (26.95 GB)](https://huggingface.co/haoranxu/ALMA-7B) supports translation between English and Russian (ru). This model supports translation between Japanese (ja) and English instead of Russian.

Like the original model, this model has also been verified to translate between the language pairs below. If you mainly need translation for these languages, however, the original [ALMA-13B model](https://huggingface.co/haoranxu/ALMA-13B) is the better choice.

- German (de) and English (en)
- Chinese (zh) and English (en)
- Icelandic (is) and English (en)
- Czech (cs) and English (en)
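
As a rough illustration of how the model might be called with Transformers, here is a minimal sketch. The prompt template follows the format used by the original ALMA project and is an assumption here, since this card does not state it. The names `build_prompt`, `translate`, and `LANG_NAMES` are hypothetical helpers, and the generation settings are illustrative rather than recommended values.

```python
MODEL_ID = "webbigdata/ALMA-7B-Ja"

# Language codes from this card, mapped to the names used inside the prompt.
LANG_NAMES = {
    "ja": "Japanese", "en": "English", "de": "German",
    "zh": "Chinese", "is": "Icelandic", "cs": "Czech",
}


def build_prompt(text, src="ja", tgt="en"):
    """Build an ALMA-style translation prompt (assumed template)."""
    src_name, tgt_name = LANG_NAMES[src], LANG_NAMES[tgt]
    return f"Translate this from {src_name} to {tgt_name}:\n{src_name}: {text}\n{tgt_name}:"


def translate(text, src="ja", tgt="en", max_new_tokens=128):
    """Load the model and translate one sentence (reloads on every call)."""
    # Heavy imports stay local so build_prompt works without them installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # device_map="auto" needs the `accelerate` package.
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.float16, device_map="auto"
    )
    inputs = tokenizer(build_prompt(text, src, tgt), return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, num_beams=5, max_new_tokens=max_new_tokens)
    # Decode only the tokens generated after the prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
```

Calling `translate("今日はいい天気ですね。")` should return an English rendering of the sentence; for repeated use, the tokenizer and model would normally be loaded once and reused.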
Translating from English (en→xx), BLEU/COMET

In the `ru/ja` column, the score is for Russian (ru) in the NLLB-54B, GPT-3.5-D, and ALMA-7B rows and for Japanese (ja) in the ALMA-7B-Ja row.

| Models | de | cs | is | zh | ru/ja | Avg. |
|----------------|--------|--------|--------|--------|--------|--------|
| NLLB-54B | 34.50/86.45 | 37.60/90.15 | 24.15/81.76 | 27.38/78.91 | 30.96/87.92 | 30.92/85.04 |
| GPT-3.5-D | 31.80/85.61 | 31.30/88.57 | 15.90/76.28 | 38.30/85.76 | 27.50/86.74 | 28.96/84.59 |
| ALMA-7B (Original) | 30.31/85.59 | 29.88/89.10 | 25.71/85.52 | 36.87/85.11 | 27.13/86.98 | 29.89/86.49 |
| ALMA-7B-Ja (Ours) | 23.70/82.04 | 18.58/81.36 | 12.20/71.59 | 29.06/82.45 | 14.82/85.40 | 19.67/80.57 |
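
For the ALMA-7B-Ja row of the en→xx table, the `Avg.` entry appears to be the plain arithmetic mean over the five language columns. The short sketch below recomputes it from the values in that row; the helper name `average` is hypothetical.

```python
# BLEU/COMET pairs copied from the ALMA-7B-Ja row of the en→xx table.
row = {
    "de": (23.70, 82.04),
    "cs": (18.58, 81.36),
    "is": (12.20, 71.59),
    "zh": (29.06, 82.45),
    "ja": (14.82, 85.40),
}


def average(scores):
    """Return (mean BLEU, mean COMET), each rounded to two decimals."""
    bleu = sum(b for b, _ in scores.values()) / len(scores)
    comet = sum(c for _, c in scores.values()) / len(scores)
    return round(bleu, 2), round(comet, 2)


avg_bleu, avg_comet = average(row)
print(f"{avg_bleu:.2f}/{avg_comet:.2f}")  # prints 19.67/80.57
```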
Translating to English (xx→en), BLEU/COMET

| Models | de | cs | is | zh | ru/ja | Avg. |
|----------------|--------|--------|--------|--------|--------|--------|
| NLLB-54B | 26.89/78.94 | 39.11/80.13 | 23.09/71.66 | 16.56/70.70 | 39.11/81.88 | 28.95/76.66 |
| GPT-3.5-D | 30.90/84.79 | 44.50/86.16 | 31.90/82.13 | 25.00/81.62 | 38.50/84.80 | 34.16/83.90 |
| ALMA-7B (Original) | 30.26/84.00 | 43.91/85.86 | 35.97/86.03 | 23.75/79.85 | 39.37/84.58 | 34.55/84.02 |
| ALMA-7B-Ja (Ours) | 26.41/83.13 | 34.39/83.50 | 24.77/81.12 | 20.60/78.54 | 15.57/78.61 | 24.35/81.76 |
[Sample Code For Free Colab](https://github.com/webbigdata-jp/python_sample/blob/main/ALMA_7B_Ja_Free_Colab_sample.ipynb)

## Other Versions

### webbigdata-ALMA-7B-Ja-gguf

mmnga made a llama.cpp (gguf) version, [webbigdata-ALMA-7B-Ja-gguf](https://huggingface.co/mmnga/webbigdata-ALMA-7B-Ja-gguf). Thank you!

llama.cpp is a tool used primarily on Macs, and gguf is its latest model file format. It can run without a GPU.
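
A minimal sketch of CPU-only inference with a gguf file, assuming the `llama-cpp-python` bindings are installed. The file name in `GGUF_PATH` is a hypothetical placeholder for whichever file you download from the repository above, and the prompt template is assumed from the original ALMA project rather than taken from this card.

```python
GGUF_PATH = "ALMA-7B-Ja.gguf"  # hypothetical local file name; point this at your download


def ja_to_en_prompt(text):
    """Assumed ALMA-style prompt for Japanese to English."""
    return f"Translate this from Japanese to English:\nJapanese: {text}\nEnglish:"


def translate_cpu(text, max_tokens=128):
    """Translate one sentence on the CPU with llama.cpp."""
    from llama_cpp import Llama  # pip install llama-cpp-python

    # n_gpu_layers=0 keeps every layer on the CPU.
    llm = Llama(model_path=GGUF_PATH, n_ctx=2048, n_gpu_layers=0, verbose=False)
    result = llm(
        ja_to_en_prompt(text),
        max_tokens=max_tokens,
        temperature=0.0,
        stop=["\n"],  # stop at the end of the translated line
    )
    return result["choices"][0]["text"].strip()
```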
[ALMA-7B-Ja-gguf Free Colab sample](https://github.com/webbigdata-jp/python_sample/blob/main/ALMA_7B_Ja_gguf_Free_Colab_sample.ipynb)

### ALMA-7B-Ja-GPTQ-Ja-En

GPTQ is a quantization method that reduces the size of a model. ALMA-7B-Ja-GPTQ-Ja-En is a GPTQ-quantized version with a smaller model size (3.9 GB) and lower memory usage.

However, translation quality is probably lower, and translation ability for languages other than Japanese and English has deteriorated significantly.
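
To see roughly where the sizes quoted in this card come from, the sketch below estimates raw weight size from a parameter count and a bit width. The parameter count (about 6.74 billion, typical of a Llama-2-7B-sized model) and the 4-bit width are assumptions, not figures stated in this card.

```python
PARAMS = 6.74e9  # assumed parameter count for a Llama-2-7B-sized model


def weight_size_gb(bits_per_param, params=PARAMS):
    """Approximate size of the raw weights in GB (10^9 bytes)."""
    return params * bits_per_param / 8 / 1e9


for label, bits in [("float32", 32), ("float16", 16), ("4-bit", 4)]:
    print(f"{label:>8}: {weight_size_gb(bits):.2f} GB")
# float32: 26.96 GB
# float16: 13.48 GB
#   4-bit: 3.37 GB
```

Under these assumptions the estimates land close to the 26.95 GB, 13.3 GB, and 3.9 GB figures above. A quantized file is usually somewhat larger than the raw 4-bit estimate, presumably because it also stores quantization metadata and some layers at higher precision.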
[Sample Code For Free Colab webbigdata/ALMA-7B-Ja-GPTQ-Ja-En](https://huggingface.co/webbigdata/ALMA-7B-Ja-GPTQ-Ja-En)

If you want to translate an entire file at once, try the Colab notebook below.

[ALMA_7B_Ja_GPTQ_Ja_En_batch_translation_sample](https://github.com/webbigdata-jp/python_sample/blob/main/ALMA_7B_Ja_GPTQ_Ja_En_batch_translation_sample.ipynb)
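
The general idea of translating a whole file can be sketched as follows. This is not the notebook's code: `translate_file` and `batched` are hypothetical helpers, and `translate_batch` stands for any function you supply that takes a list of source lines and returns their translations in the same order.

```python
from pathlib import Path
from typing import Callable, Iterator, List


def batched(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate_file(src_path, dst_path,
                   translate_batch: Callable[[List[str]], List[str]],
                   batch_size: int = 8) -> int:
    """Translate a UTF-8 text file line by line and return the line count."""
    lines = Path(src_path).read_text(encoding="utf-8").splitlines()
    output: List[str] = []
    for batch in batched(lines, batch_size):
        # Send only non-empty lines to the model; blank lines stay in place.
        non_empty = [line for line in batch if line.strip()]
        translated = iter(translate_batch(non_empty) if non_empty else [])
        for line in batch:
            output.append(next(translated) if line.strip() else "")
    Path(dst_path).write_text("\n".join(output) + "\n", encoding="utf-8")
    return len(lines)
```

Batching keeps each model call small enough for limited GPU memory while preserving the line layout of the input file.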
**ALMA** (**A**dvanced **L**anguage **M**odel-based tr**A**nslator) is an LLM-based translation model, which adopts a new translation model paradigm: it begins with fine-tuning on monolingual data and is further optimized using high-quality parallel data. This two-step fine-tuning process ensures strong translation performance.

Please find more details in their [paper](https://arxiv.org/abs/2309.11674).
```
@misc{xu2023paradigm,
      title={A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models},
      author={Haoran Xu and Young Jin Kim and Amr Sharaf and Hany Hassan Awadalla},
      year={2023},
      eprint={2309.11674},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
## About this work

- **This work was done by:** [webbigdata](https://webbigdata.jp/).