Any plans to open-source the dataset?
Thanks for the great work!
May I know if there are plans to open-source the datasets for pre-training or SFT? Thanks.
Thank you for your interest in our work!
The pre-training corpus for POLAR is extremely large (approximately 3.6T tokens), and it was derived from the InternLM pre-training corpus. Unfortunately, we currently have no plans to publicly release POLAR’s pre-training corpus.
However, I would like to emphasize that POLAR's pre-training data is relatively easy to reproduce. You can take open-source corpora such as Common Crawl (CC) and publicly available LLMs such as Qwen, then run large-scale inference sampling following our methodology to form positive and negative samples.
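For readers who want a starting point, here is a rough sketch of what such a sampling loop might look like. It is an illustration only, not the pipeline used to build POLAR's corpus. It assumes that a positive sample is a second completion drawn from the same policy (model) as a reference completion, and a negative sample is a completion drawn from a different policy. The names `Policy`, `build_triplets`, and `make_stub_policy` are hypothetical, and the stub policies stand in for real LLM calls (for example, sampling from Qwen on prompts cut from CC text).

```python
import random
from typing import Callable, Dict, List

# Hypothetical type: a "policy" is any function that maps a prompt
# to one sampled completion (in practice, a call to an LLM).
Policy = Callable[[str], str]


def build_triplets(prompts: List[str],
                   policies: Dict[str, Policy],
                   seed: int = 0) -> List[dict]:
    """Form (reference, positive, negative) samples for each prompt.

    Assumption: positive = another sample from the same policy as the
    reference; negative = a sample from a different policy.
    """
    names = sorted(policies)
    if len(names) < 2:
        raise ValueError("need at least two policies to form a negative sample")
    rng = random.Random(seed)
    triplets = []
    for prompt in prompts:
        # Pick two distinct policies: one for reference/positive, one for negative.
        same, other = rng.sample(names, 2)
        triplets.append({
            "prompt": prompt,
            "reference": policies[same](prompt),
            "positive": policies[same](prompt),
            "negative": policies[other](prompt),
            "positive_policy": same,
            "negative_policy": other,
        })
    return triplets


def make_stub_policy(tag: str) -> Policy:
    """Stand-in for a real model; returns a distinct string on every call."""
    calls = {"n": 0}

    def policy(prompt: str) -> str:
        calls["n"] += 1
        return f"[{tag} sample {calls['n']}] continuation of: {prompt}"

    return policy


policies = {
    "model_a": make_stub_policy("model_a"),
    "model_b": make_stub_policy("model_b"),
}
triplets = build_triplets(["The capital of France is"], policies)
```

To use this with real models, replace each stub with a function that samples a completion with non-zero temperature, so that the reference and the positive differ even though they come from the same policy.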
Thanks for the info.