# Chess BPE Tokenizer
A BPE tokenizer trained on chess moves with rustbpe, with inference through tiktoken.
## Installation
```bash
pip install rustbpe tiktoken datasets huggingface_hub
```
## Quick Start
### Load from HuggingFace & Inference
```python
from chess_tokenizer import load_tiktoken

enc = load_tiktoken("ItsMaxNorm/chess-bpe-tokenizer")

# Encode chess moves
ids = enc.encode("w.♘g1♘f3.. b.♟c7♟c5.. w.♙d2♙d4..")
print(ids)   # [token_ids...]

# Decode back
text = enc.decode(ids)
print(text)  # "w.♘g1♘f3.. b.♟c7♟c5.. w.♙d2♙d4.."
```
Or load it directly with tiktoken:
```python
import json

import tiktoken
from huggingface_hub import hf_hub_download

# Download the tokenizer files from the Hub
config = json.load(open(hf_hub_download("ItsMaxNorm/bpess", "config.json")))
vocab = json.load(open(hf_hub_download("ItsMaxNorm/bpess", "vocab.json")))

enc = tiktoken.Encoding(
    name="chess",
    pat_str=config["pattern"],
    mergeable_ranks={k.encode("utf-8", errors="replace"): v for k, v in vocab.items()},
    special_tokens={},
)
```
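tiktoken expects the keys of `mergeable_ranks` to be `bytes`, which is why the string keys from `vocab.json` are re-encoded to UTF-8 before constructing the `Encoding`.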
## Train Your Own
```python
from chess_tokenizer import train, upload

# Train on the chess dataset
tok = train(vocab_size=4096, split="train[0:10000]")

# Upload to the HuggingFace Hub
upload(tok, "YOUR_USERNAME/chess-bpe-tokenizer")
```
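If you want a local copy before pushing, the `save` helper listed in the API table below writes the same files the Hub repo contains. The output path here is only an example:

```python
from chess_tokenizer import save

# Write vocab.json + config.json locally (path is illustrative)
save(tok, "chess-bpe-tokenizer")
```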
## Full Pipeline
```bash
python chess_tokenizer.py
```
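Running the script covers the train → save → upload → reload steps end to end. A rough sketch of the equivalent calls, using only the functions from the API table below (the repo id is a placeholder), looks like this:

```python
from chess_tokenizer import train, save, upload, load_tiktoken

# Train on a slice of the dataset, keep a local copy, push to the Hub,
# then reload through tiktoken and sanity-check a round trip.
tok = train(vocab_size=4096, split="train[0:10000]")
save(tok, "chess-bpe-tokenizer")
upload(tok, "YOUR_USERNAME/chess-bpe-tokenizer")

enc = load_tiktoken("YOUR_USERNAME/chess-bpe-tokenizer")
sample = "w.♘g1♘f3.. b.♟c7♟c5.. w.♙d2♙d4.."
assert enc.decode(enc.encode(sample)) == sample
```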
## Move Format

The tokenizer is trained on a custom chess notation:
| Move | Meaning |
|---|---|
| `w.♘g1♘f3..` | White knight g1 to f3 |
| `b.♟c7♟c5..` | Black pawn c7 to c5 |
| `b.♟c5♟d4.x.` | Black pawn captures on d4 |
| `w.♔e1♔g1♖h1♖f1..` | White kingside castle |
| `b.♛d7♛d5..+` | Black queen to d5 with check |
## Piece Symbols
| White | Black | Piece |
|---|---|---|
| ♔ | ♚ | King |
| ♕ | ♛ | Queen |
| ♖ | ♜ | Rook |
| ♗ | ♝ | Bishop |
| ♘ | ♞ | Knight |
| ♙ | ♟ | Pawn |
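For reference, the two tables above can be combined into a small lookup plus a string builder. The `PIECES` mapping and `format_move` helper below are purely illustrative, not part of the package; they are inferred from the example moves shown above:

```python
# Unicode piece symbols from the table above.
PIECES = {
    "w": {"K": "♔", "Q": "♕", "R": "♖", "B": "♗", "N": "♘", "P": "♙"},
    "b": {"K": "♚", "Q": "♛", "R": "♜", "B": "♝", "N": "♞", "P": "♟"},
}

def format_move(color: str, piece: str, src: str, dst: str,
                capture: bool = False, check: bool = False) -> str:
    """Illustrative only: build a move string matching the examples above."""
    sym = PIECES[color][piece]
    return f"{color}.{sym}{src}{sym}{dst}.{'x' if capture else ''}.{'+' if check else ''}"

print(format_move("w", "N", "g1", "f3"))                # w.♘g1♘f3..
print(format_move("b", "P", "c5", "d4", capture=True))  # b.♟c5♟d4.x.
print(format_move("b", "Q", "d7", "d5", check=True))    # b.♛d7♛d5..+
```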
## API
| Function | Description |
|---|---|
| `train(vocab_size, split)` | Train BPE on angeluriot/chess_games |
| `save(tok, path)` | Save vocab.json + config.json |
| `upload(tok, repo_id)` | Push to the HuggingFace Hub |
| `load_tiktoken(repo_id)` | Load as a tiktoken Encoding |
## License
MIT