NLKI: A lightweight Natural Language Knowledge Integration Framework for Improving Small VLMs in Commonsense VQA Tasks
Paper: arXiv:2508.19724
ColBERT (Contextualized Late Interaction over BERT) is a retrieval model that scores queries vs. passages using fine-grained token-level interactions (“late interaction”). This repo hosts a fine-tuned ColBERT checkpoint for neural information retrieval.
ColBERT encodes queries and passages into token-level embedding matrices and uses MaxSim to compute relevance at search time. It typically outperforms single-vector embedding retrievers while remaining scalable.
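For intuition about late interaction, here is a minimal sketch of MaxSim over two token-embedding matrices. It uses torch with random toy embeddings rather than this checkpoint's encoder, and the maxsim_score helper is illustrative only, not the library's internal implementation: each query token keeps its best-matching passage token, and those maxima are summed into the relevance score.

import torch

# Illustrative MaxSim scoring (not the library's internal code):
# Q: query token embeddings,  shape (num_query_tokens, dim)
# D: passage token embeddings, shape (num_doc_tokens, dim)
def maxsim_score(Q: torch.Tensor, D: torch.Tensor) -> torch.Tensor:
    # Normalize, take all pairwise similarities, keep each query token's
    # best-matching passage token, then sum over query tokens.
    Q = torch.nn.functional.normalize(Q, dim=-1)
    D = torch.nn.functional.normalize(D, dim=-1)
    sim = Q @ D.T                        # (num_query_tokens, num_doc_tokens)
    return sim.max(dim=-1).values.sum()  # scalar relevance score

# Toy example with random embeddings (8 query tokens, 40 passage tokens, dim 128)
score = maxsim_score(torch.randn(8, 128), torch.randn(40, 128))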
This checkpoint was fine-tuned from colbert-ir/colbertv1.9 on training triples ([qid, pid+, pid-]), with TSV files queries.tsv and collection.tsv supplying the IDs and text (a fine-tuning sketch using the same API appears after the usage example below).

from colbert.infra import Run, RunConfig, ColBERTConfig
from colbert import Indexer, Searcher
from colbert.data import Queries
# 1) Index your collection (pid \t passage)
with Run().context(RunConfig(nranks=1, experiment="my-exp")):
    cfg = ColBERTConfig(root="/path/to/experiments")
    indexer = Indexer(checkpoint="dutta18/Colbert-Finetuned", config=cfg)
    indexer.index(
        name="my.index",
        collection="/path/to/collection.tsv",  # "pid \t passage text"
    )

# 2) Search with queries (qid \t query)
with Run().context(RunConfig(nranks=1, experiment="my-exp")):
    cfg = ColBERTConfig(root="/path/to/experiments")
    searcher = Searcher(index="my.index", config=cfg)
    queries = Queries("/path/to/queries.tsv")  # "qid \t query text"
    ranking = searcher.search_all(queries, k=20)
    ranking.save("my.index.top20.tsv")
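The saved ranking is a plain TSV, typically one row per retrieved passage (query ID, passage ID, rank, score). If you want to reproduce or continue the fine-tuning described above on your own [qid, pid+, pid-] triples, ColBERT's Trainer follows the same Run/RunConfig pattern. The sketch below is illustrative only: the file paths, batch size, and the choice to warm-start from colbert-ir/colbertv1.9 are assumptions, not the exact recipe used for this checkpoint.

from colbert.infra import Run, RunConfig, ColBERTConfig
from colbert import Trainer

# 3) (Optional) Fine-tune your own checkpoint on triples (hypothetical paths/settings)
with Run().context(RunConfig(nranks=1, experiment="my-exp")):
    cfg = ColBERTConfig(bsize=32, root="/path/to/experiments")
    trainer = Trainer(
        triples="/path/to/triples.tsv",        # "qid \t pid+ \t pid-" per line
        queries="/path/to/queries.tsv",        # "qid \t query text"
        collection="/path/to/collection.tsv",  # "pid \t passage text"
        config=cfg,
    )
    # Warm-start from the public base model; returns the path of the saved checkpoint
    checkpoint_path = trainer.train(checkpoint="colbert-ir/colbertv1.9")
    print(checkpoint_path)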
Base model: colbert-ir/colbertv1.9