A geometric deep learning system for analyzing and interpreting the Voynich Manuscript using KSimplex similarity assessment trained on Latin Wikipedia.
This system combines:
Input Text
    │
    ├──► SBERT (all-MiniLM-L6-v2) ──► 384-dim
    │                                    │
    ├──► Char TF-IDF (3-5 grams)  ──► 30k-dim
    │                                    │
    ▼                                    ▼
┌─────────────────────────────────────────────┐
│        KSimplex Similarity Assessor         │
├─────────────────────────────────────────────┤
│  SBERT Projection  ──► 256-dim              │
│  TF-IDF Projection ──► 256-dim              │
│           │                                 │
│           ▼                                 │
│  Fusion Layer ──► 256-dim                   │
│           │                                 │
│           ▼                                 │
│  SimplexSimilarityLayer × 3 (k=4)           │
│  ┌─────────────────────────────┐            │
│  │ Route Projection (→4 edges) │            │
│  │ Edge Transforms (4×Linear)  │            │
│  │ Weighted Sum + LayerNorm    │            │
│  └─────────────────────────────┘            │
│           │                                 │
│           ▼                                 │
│  Similarity Head ──► 128-dim                │
│  (L2 normalized)                            │
└─────────────────────────────────────────────┘
                      │
                      ▼
         128-dim Similarity Embedding
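The inner block of the diagram (route projection, four edge transforms, weighted sum, LayerNorm) followed by the L2-normalized head can be sketched without any dependencies. This is an illustration of the data flow only, not the code in `ksimplex_model.py`: the softmax over the route logits, the random untrained weights, and the shrunken dimensions (16 instead of 256, 8 instead of 128) are all assumptions, and the SBERT/TF-IDF projections and fusion layer are replaced by a random stand-in vector.

```python
import math
import random

def linear(x, weight, bias):
    """y = W x + b, with W stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

def init_linear(rng, n_in, n_out):
    """Small random weights and zero bias (illustrative, not trained)."""
    bound = 1.0 / math.sqrt(n_in)
    weight = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    return weight, [0.0] * n_out

def softmax(z):
    peak = max(z)
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def layer_norm(x, eps=1e-5):
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def l2_normalize(x):
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]

def simplex_layer(x, route, edges):
    """Route projection -> k edge transforms -> weighted sum -> LayerNorm."""
    edge_weights = softmax(linear(x, *route))  # ASSUMPTION: softmax routing
    edge_outputs = [linear(x, *edge) for edge in edges]
    mixed = [sum(w * out[i] for w, out in zip(edge_weights, edge_outputs))
             for i in range(len(x))]
    return layer_norm(mixed), edge_weights

# Shrunken sizes for a quick demo; the diagram uses 256, k=4, 3 layers, 128.
DIM, K, N_LAYERS, OUT_DIM = 16, 4, 3, 8
rng = random.Random(0)
layers = [(init_linear(rng, DIM, K), [init_linear(rng, DIM, DIM) for _ in range(K)])
          for _ in range(N_LAYERS)]
head = init_linear(rng, DIM, OUT_DIM)

h = [rng.gauss(0.0, 1.0) for _ in range(DIM)]  # stand-in for the fused vector
for route, edges in layers:
    h, edge_weights = simplex_layer(h, route, edges)
embedding = l2_normalize(linear(h, *head))
print(len(embedding), round(sum(v * v for v in embedding), 6))
```

Because the head output is L2-normalized, dot products between two embeddings can be read directly as cosine similarities.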
| Section | Folios | Character | Style Group |
|---|---|---|---|
| Herbal A | f1-f57 | Dense prose, plant descriptions | A |
| Herbal B | f58-f66 | Variant herbal style | A |
| Astronomical | f67-f73 | Zodiac, celestial diagrams | A |
| Biological | f75-f84 | Nymph figures, labels | B |
| Cosmological | f85-f86 | Rosette foldouts | C |
| Pharmaceutical | f87-f102 | Recipe format (p...am) | C |
| Recipes | f103-f116 | Cross-references, star labels | B |
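As a small illustration, the folio ranges in the table can be turned into a lookup from a folio ID to its section and style group. The function name `section_for_folio` and the ID pattern (`f`, a number, an optional `r`/`v` side, an optional trailing digit) are assumptions made for this sketch; the table lists no range containing f74, so that folio returns `None`.

```python
import re

# (section, first folio, last folio, style group), copied from the table above.
SECTIONS = [
    ("Herbal A", 1, 57, "A"),
    ("Herbal B", 58, 66, "A"),
    ("Astronomical", 67, 73, "A"),
    ("Biological", 75, 84, "B"),
    ("Cosmological", 85, 86, "C"),
    ("Pharmaceutical", 87, 102, "C"),
    ("Recipes", 103, 116, "B"),
]

def section_for_folio(folio_id):
    """Return (section, style_group) for an ID such as 'f75r', or None."""
    match = re.fullmatch(r"f(\d+)([rv]\d*)?", folio_id.strip().lower())
    if match is None:
        return None
    number = int(match.group(1))
    for name, first, last, group in SECTIONS:
        if first <= number <= last:
            return name, group
    return None

print(section_for_folio("f75r"))  # ('Biological', 'B')
print(section_for_folio("f74r"))  # None
```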
Structural Markers (Greek-derived):
p = Recipe/paragraph start (π)
m, g = Line-end markers (μ, γ)
s, l, o = Label markers (σ, λ, ο)
-am, -dam, -ram = Recipe terminators (measurement)

Morphological System:
(PREFIX) + STEM + (SUFFIX) + (n)
Prefixes: qok- (the-), ok- (this-), ot- (other-), da- (of-)
Suffixes: -dy (matter), -ey (type), -in (of), -ol (liquid), -ar (part)
Bound 'n': Attaches to -ai- stems (daiin, qokaiin, okaiin)
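The `(PREFIX) + STEM + (SUFFIX) + (n)` pattern can be illustrated with a greedy segmenter. This is a hypothetical sketch, not the parser in `voynich_translator.py`: the function name `segment`, the order in which the rules are tried, and the requirement that a non-empty stem remains are all assumptions.

```python
PREFIXES = ("qok", "ok", "ot", "da")   # longest first, so qok- wins over ok-
SUFFIXES = ("dy", "ey", "in", "ol", "ar")

def segment(word):
    """Greedy split into prefix, stem, suffix, and a bound-n flag."""
    rest = word
    # ASSUMPTION: a final 'n' is "bound" when the rest of the word contains 'ai'.
    bound_n = rest.endswith("n") and "ai" in rest[:-1]
    if bound_n:
        rest = rest[:-1]
    # Strip a prefix only if something is left over to serve as the stem.
    prefix = next((p for p in PREFIXES
                   if rest.startswith(p) and len(rest) > len(p)), None)
    if prefix:
        rest = rest[len(prefix):]
    # Same rule for suffixes: never consume the whole remainder.
    suffix = next((s for s in SUFFIXES
                   if rest.endswith(s) and len(rest) > len(s)), None)
    if suffix:
        rest = rest[:-len(suffix)]
    return {"prefix": prefix, "stem": rest, "suffix": suffix, "bound_n": bound_n}

for w in ("qokeey", "chedy", "chol", "okaiin"):
    print(w, segment(w))
```

Forms such as daiin are ambiguous under these rules (a da- prefix versus an -ai- stem), and the sketch does not try to resolve that.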
Section Similarity Matrix:
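| Section | Herbal A | Astronomical | Biological | Cosmological | Pharmaceutical | Recipes |
|---|---|---|---|---|---|---|
| Herbal A | 1.00 | 0.99 | 0.93 | 0.77 | 0.77 | 0.88 |
| Astronomical | 0.99 | 1.00 | 0.96 | 0.83 | 0.82 | 0.92 |
| Biological | 0.93 | 0.96 | 1.00 | 0.94 | 0.94 | 0.99 |
| Cosmological | 0.77 | 0.83 | 0.94 | 1.00 | 0.98 | 0.97 |
| Pharmaceutical | 0.77 | 0.82 | 0.94 | 0.98 | 1.00 | 0.95 |
| Recipes | 0.88 | 0.92 | 0.99 | 0.97 | 0.95 | 1.00 |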
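Three Style Groups (as assigned in the section table):
A: Herbal A, Herbal B, Astronomical
B: Biological, Recipes
C: Cosmological, Pharmaceutical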
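One way to check the style groups from the section table against the similarity matrix is to cluster the six sections that appear in the matrix. The sketch below uses plain average-linkage agglomerative clustering; that choice of method is an assumption for illustration, and nothing here claims it is how the groups were originally derived. Herbal B is absent because it has no row in the matrix.

```python
SECTIONS = ["Herbal A", "Astronomical", "Biological",
            "Cosmological", "Pharmaceutical", "Recipes"]
SIM = [
    [1.00, 0.99, 0.93, 0.77, 0.77, 0.88],
    [0.99, 1.00, 0.96, 0.83, 0.82, 0.92],
    [0.93, 0.96, 1.00, 0.94, 0.94, 0.99],
    [0.77, 0.83, 0.94, 1.00, 0.98, 0.97],
    [0.77, 0.82, 0.94, 0.98, 1.00, 0.95],
    [0.88, 0.92, 0.99, 0.97, 0.95, 1.00],
]

def average_similarity(a, b):
    """Mean pairwise similarity between two clusters of section indices."""
    return sum(SIM[i][j] for i in a for j in b) / (len(a) * len(b))

def cluster(n_groups):
    """Merge the two most similar clusters until n_groups remain."""
    clusters = [[i] for i in range(len(SECTIONS))]
    while len(clusters) > n_groups:
        pairs = [(average_similarity(clusters[p], clusters[q]), p, q)
                 for p in range(len(clusters))
                 for q in range(p + 1, len(clusters))]
        _, p, q = max(pairs)
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return [[SECTIONS[i] for i in c] for c in clusters]

groups = cluster(3)
for group in groups:
    print(group)
```

With three clusters requested, the merges pair Herbal A with Astronomical, Biological with Recipes, and Cosmological with Pharmaceutical, which agrees with the Style Group column.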
pip install torch sentence-transformers scikit-learn datasets
from voynich_translator import VoynichTranslator
translator = VoynichTranslator()
# Translate text
result = translator.translate("daiin chedy qokeey shedy chol daiin")
print(result['english']) # "the herb bloom leaf stem the"
print(result['section']) # "Herbal A"
print(result['confidence']) # 1.0
# Translate with verbose analysis
result = translator.translate("p ol shy am", verbose=True)
# Returns word-by-word analysis and similar Latin passages
# Translate entire folio
folio = translator.translate_folio('f75r')
print(folio['full_english'])
# Find similar passages
similar = translator.find_similar_voynich("chedy qokeey")
latin = translator.find_similar_latin("chedy qokeey")
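Since the similarity embeddings are L2-normalized, a lookup such as `find_similar_voynich` or `find_similar_latin` can plausibly be reduced to ranking stored embeddings by dot product with the query. The sketch below shows that idea on toy 3-dim vectors; the helper `top_k`, the passage names, and the vectors are hypothetical, and the real methods may work differently.

```python
import math

def l2_normalize(vec):
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]

def top_k(query, corpus, k=3):
    """Rank (text, embedding) pairs by dot product with the query.

    For unit-length vectors the dot product equals cosine similarity.
    """
    scored = [(sum(q * c for q, c in zip(query, emb)), text)
              for text, emb in corpus]
    return sorted(scored, reverse=True)[:k]

# Toy 3-dim embeddings; the real ones would be 128-dim.
corpus = [
    ("passage A", l2_normalize([1.0, 0.0, 0.0])),
    ("passage B", l2_normalize([0.9, 0.1, 0.0])),
    ("passage C", l2_normalize([0.0, 1.0, 0.0])),
]
query = l2_normalize([1.0, 0.05, 0.0])
results = top_k(query, corpus, k=2)
for score, text in results:
    print(f"{text}: {score:.4f}")
```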
# Requires: Latin Wikipedia reload for TF-IDF vocabulary
# See standalone cell in repository for complete setup
from datasets import load_dataset
# 1. Load Latin corpus (same as training)
ds = load_dataset("wikimedia/wikipedia", "20231101.la", split="train", streaming=True)
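# ... build windows, fit TF-IDF (this is where the fitted vectorizer `vec_lat` comes from)
# 2. Transform Voynich using Latin vectorizer
# voynich_texts: the Voynich transcription strings to embed
X_voy_tfidf = vec_lat.transform(voynich_texts)
# 3. Encode through KSimplex model
# Inputs are the SBERT (384-dim) and TF-IDF (30k-dim) features for the same texts;
# emb is the 128-dim similarity embedding, the second return value is unused here
emb, _ = model(sbert_emb, tfidf_emb)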
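Step 2 matters because the vectorizer's vocabulary comes from Latin only: a Voynich string contributes just the character n-grams it happens to share with that vocabulary. The dependency-free sketch below shows the mechanism with 3-5 character n-grams and a smoothed IDF, `ln((1 + N) / (1 + df)) + 1`. The repository presumably uses scikit-learn for this, but its exact settings are not given here, so the helper names, the toy Latin strings, and the weighting details are assumptions.

```python
import math
from collections import Counter

def char_ngrams(text, n_min=3, n_max=5):
    """All character n-grams of length n_min..n_max, spaces included."""
    return [text[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]

def fit_idf(docs):
    """Smoothed IDF per n-gram: ln((1 + N) / (1 + df)) + 1."""
    df = Counter()
    for doc in docs:
        df.update(set(char_ngrams(doc)))
    n_docs = len(docs)
    return {g: math.log((1 + n_docs) / (1 + c)) + 1.0 for g, c in df.items()}

def transform(text, idf):
    """L2-normalized sparse TF-IDF vector restricted to the fitted vocabulary."""
    counts = Counter(g for g in char_ngrams(text) if g in idf)
    vec = {g: c * idf[g] for g, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {g: v / norm for g, v in vec.items()} if norm else {}

# Hypothetical stand-ins for the Latin Wikipedia windows.
latin_docs = ["herba viridis in horto", "oleum ex radice herbae"]
idf = fit_idf(latin_docs)

# Only n-grams seen in the Latin corpus survive the transform.
print(len(transform("herba", idf)))
print(len(transform("qokeey", idf)))
```

A text with no overlapping n-grams maps to an empty (all-zero) vector, in which case only the SBERT branch would carry information for that input.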
Core vocabulary mappings based on frequency and morphological analysis:
| Voynich | English | Category |
|---|---|---|
| daiin | the | Determiner |
| aiin | this | Determiner |
| qokaiin | the-said | Determiner |
| chedy | herb | Plant |
| shedy | leaf | Plant |
| qokeedy | blossom | Plant |
| chol | stem | Plant |
| ol | oil | Preparation |
| ar | root | Plant part |
| or | seed | Plant part |
| p | ¶ (recipe start) | Marker |
| am | (measure end) | Marker |
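A word-by-word gloss using only the entries in the table above can be sketched as a dictionary lookup. The function name `gloss` and the choice to keep unknown words in square brackets are assumptions for illustration; the translator itself may handle unknown words, markers, and morphology differently.

```python
# Entries copied from the table above (the markers p and am are omitted).
LEXICON = {
    "daiin": "the", "aiin": "this", "qokaiin": "the-said",
    "chedy": "herb", "shedy": "leaf", "qokeedy": "blossom",
    "chol": "stem", "ol": "oil", "ar": "root", "or": "seed",
}

def gloss(text):
    """Word-by-word lookup; unknown words are kept in square brackets."""
    return " ".join(LEXICON.get(word, f"[{word}]") for word in text.split())

print(gloss("daiin chedy chol ol"))   # the herb stem oil
print(gloss("daiin chedy qokeey"))    # the herb [qokeey]
```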
voynich-ksimplex-translator/
├── README.md                      # This file
├── voynich_translator.py          # Complete standalone translator
├── ksimplex_model.py              # Model architecture
├── ksimplex_similarity_model.pt   # Trained weights
├── similarity_embeddings.npz      # Pre-computed embeddings
│   ├── voynich_emb                # (N_voy, 128) Voynich embeddings
│   ├── voynich_labels             # Cluster assignments
│   ├── latin_emb                  # (N_lat, 128) Latin embeddings
│   └── latin_labels               # Latin bucket assignments
└── voynich_analysis_results.json  # Statistical analysis
⚠️ This is interpretive translation, not decipherment.
The Voynich cipher has not been broken. This system provides:
The lexicon is based on:
@software{voynich_ksimplex_2026,
title={Voynich KSimplex Translator: Geometric Deep Learning for Manuscript Analysis},
author={AbstractPhil},
year={2026},
url={https://huggingface.co/AbstractPhil/sbert-voynich-translation}
}
MIT License - See LICENSE file for details.
"The Voynich appears to be a practical document (recipes, medical prescriptions) using Greek-derived notation for structure, with verbose cipher encoding the content, and a cross-reference system linking sections."