Dataset: ai4privacy/pii-masking-200k
How to use SoelMgd/bert-pii-detection with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("token-classification", model="SoelMgd/bert-pii-detection")
# Load model directly
from transformers import AutoTokenizer, AutoModelForTokenClassification
tokenizer = AutoTokenizer.from_pretrained("SoelMgd/bert-pii-detection")
model = AutoModelForTokenClassification.from_pretrained("SoelMgd/bert-pii-detection")

Fine-tuned DistilBERT model for Personally Identifiable Information (PII) detection and classification.
Base model: distilbert-base-uncased
This model can detect 56 different types of PII entities, grouped into the following categories:
Personal Information
Address Information
Financial Information
Identification
Professional Information
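The individual entity types are not listed above, but they can be read from the label map stored in the model's configuration. The following is a minimal sketch, assuming the labels follow a BIO scheme (`B-`/`I-` prefixes plus `O`); the card does not confirm this. The helper names `entity_types` and `load_entity_types` and the small demo label map are hypothetical, not taken from the model.

```python
def entity_types(id2label):
    """Return the distinct entity types in a label map, in label-id order."""
    types = []
    for _, label in sorted(id2label.items()):
        if label == "O":
            continue  # "O" marks tokens outside any entity
        # Assumption: BIO-style prefixes. Labels without one pass through unchanged.
        name = label[2:] if label[:2] in ("B-", "I-") else label
        if name not in types:
            types.append(name)
    return types


def load_entity_types(model_name="SoelMgd/bert-pii-detection"):
    """Download the model config and list its entity types (needs network access)."""
    from transformers import AutoConfig

    config = AutoConfig.from_pretrained(model_name)
    return entity_types(config.id2label)


# Hypothetical label map, used only to show the helper's behaviour offline.
demo_labels = {0: "O", 1: "B-EMAIL", 2: "I-EMAIL", 3: "B-FIRSTNAME"}
print(entity_types(demo_labels))  # ['EMAIL', 'FIRSTNAME']
```

Calling `load_entity_types()` should return the model's actual entity types, which can then be checked against the categories above.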
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline
# Load model and tokenizer
model_name = "SoelMgd/bert-pii-detection"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)
# Create NER pipeline
ner_pipeline = pipeline(
"ner",
model=model,
tokenizer=tokenizer,
aggregation_strategy="simple"
)
# Example usage
text = "Hi, my name is John Smith and my email is john.smith@company.com"
entities = ner_pipeline(text)
print(entities)
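With `aggregation_strategy="simple"`, the pipeline returns a list of dicts with `entity_group`, `score`, `word`, `start`, and `end` keys. One common follow-up is to mask the detected spans in the original text. The sketch below is one way to do that, not something the model card prescribes. The helper `mask_entities`, the `min_score` threshold, and the hand-written sample entities (including their label names and scores) are all illustrative assumptions.

```python
def mask_entities(text, entities, min_score=0.5):
    """Replace each detected span with a [LABEL] placeholder."""
    kept = [e for e in entities if e["score"] >= min_score]
    # Replace from right to left so earlier character offsets stay valid.
    for e in sorted(kept, key=lambda e: e["start"], reverse=True):
        text = text[: e["start"]] + f"[{e['entity_group']}]" + text[e["end"] :]
    return text


# Hand-written stand-in for ner_pipeline(text); labels and scores are made up.
sample_text = "Hi, my name is John Smith and my email is john.smith@company.com"
sample_entities = [
    {"entity_group": "FIRSTNAME", "score": 0.99, "word": "john", "start": 15, "end": 19},
    {"entity_group": "LASTNAME", "score": 0.98, "word": "smith", "start": 20, "end": 25},
    {"entity_group": "EMAIL", "score": 0.97, "word": "john.smith@company.com", "start": 42, "end": 64},
]

print(mask_entities(sample_text, sample_entities))
```

In real use, pass `entities` from the pipeline call above in place of `sample_entities`, and tune `min_score` to trade missed PII against over-masking.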
The model achieves high performance on PII detection tasks with good precision and recall across different entity types.
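No figures are given here, so it may be worth measuring precision and recall on your own labelled data. The sketch below scores predictions per entity type using exact span matching. This is an assumed evaluation protocol, since the card does not say how its metrics were computed. The function `span_prf` and the example spans are hypothetical.

```python
from collections import Counter


def span_prf(gold, pred):
    """Per-type precision, recall and F1 over (start, end, label) spans.

    A prediction counts as correct only if its start, end and label all
    match a gold span exactly.
    """
    gold_set, pred_set = set(gold), set(pred)
    true_pos = Counter(label for (_, _, label) in gold_set & pred_set)
    n_gold = Counter(label for (_, _, label) in gold_set)
    n_pred = Counter(label for (_, _, label) in pred_set)

    scores = {}
    for label in sorted(set(n_gold) | set(n_pred)):
        p = true_pos[label] / n_pred[label] if n_pred[label] else 0.0
        r = true_pos[label] / n_gold[label] if n_gold[label] else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        scores[label] = {"precision": p, "recall": r, "f1": f1}
    return scores


# Made-up gold and predicted spans for a single sentence.
gold_spans = [(15, 19, "FIRSTNAME"), (20, 25, "LASTNAME"), (42, 64, "EMAIL")]
pred_spans = [(15, 19, "FIRSTNAME"), (42, 60, "EMAIL")]  # EMAIL end offset is wrong

for label, s in span_prf(gold_spans, pred_spans).items():
    print(label, s)
```

Predicted spans can be built from the pipeline output as `(e["start"], e["end"], e["entity_group"])`. Exact matching is strict: a partially overlapping span counts as both a false positive and a false negative.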
This model is designed for:
Base model
distilbert/distilbert-base-uncased