Bot or Not — Denoising Trajectory Detector

Logistic-regression classifier on top of denoising-trajectory features extracted with CLIP ViT-L/14 + Stable Diffusion v1.5. Reproduces the method from Liang et al., "Denoising Trajectory Biases for Zero-Shot AI-Generated Image Detection" (NeurIPS 2025).

How it works

For each input image:

Encode to SD v1.5 latent space (VAE).
Add DDPM noise at timesteps (50, 150, 300, 500, 800).
Run one UNet denoising step per timestep with an empty-prompt embedding.
Decode each denoised latent back to image space.
Compute CLIP-cosine similarity between the original and each reconstruction.
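The noising and single-step denoising in the steps above can be sketched in plain Python. This is an illustration under assumptions, not the repository's code: it assumes SD v1.5's default scaled-linear beta schedule (0.00085 to 0.012 over 1,000 steps), reads "one denoising step" as a one-shot estimate of the clean latent from the predicted noise, and replaces the UNet with a perfect noise predictor. The function names are hypothetical.

```python
import math
import random

# Assumption: SD v1.5's default "scaled_linear" schedule.
NUM_TRAIN_STEPS = 1000
BETA_START, BETA_END = 0.00085, 0.012
TIMESTEPS = (50, 150, 300, 500, 800)  # timesteps listed in the steps above


def alphas_cumprod(n=NUM_TRAIN_STEPS, b0=BETA_START, b1=BETA_END):
    """Cumulative product of (1 - beta_t) for a scaled-linear schedule."""
    out, running = [], 1.0
    for i in range(n):
        # Interpolate linearly in sqrt(beta) space, then square.
        beta = (math.sqrt(b0) + (math.sqrt(b1) - math.sqrt(b0)) * i / (n - 1)) ** 2
        running *= 1.0 - beta
        out.append(running)
    return out


def add_noise(x0, eps, abar_t):
    """Forward DDPM step: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    a, s = math.sqrt(abar_t), math.sqrt(1.0 - abar_t)
    return [a * x + s * e for x, e in zip(x0, eps)]


def predict_x0(xt, eps_hat, abar_t):
    """Invert the forward step given a noise estimate eps_hat."""
    a, s = math.sqrt(abar_t), math.sqrt(1.0 - abar_t)
    return [(x - s * e) / a for x, e in zip(xt, eps_hat)]


rng = random.Random(0)
latent = [rng.uniform(-1, 1) for _ in range(8)]  # toy stand-in for a VAE latent
abar = alphas_cumprod()

reconstructions = {}
for t in TIMESTEPS:
    eps = [rng.gauss(0, 1) for _ in latent]
    noisy = add_noise(latent, eps, abar[t])
    # A real pipeline would call the UNet here. Passing the true noise
    # stands in for a perfect predictor, so the latent is recovered exactly.
    reconstructions[t] = predict_x0(noisy, eps, abar[t])
```

With a real UNet the noise estimate is imperfect, so each reconstruction drifts from the original by an amount that depends on the timestep; that drift is what the similarity features measure.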

This yields a 6-D feature vector [sim_mean, sim_t50, sim_t150, sim_t300, sim_t500, sim_t800], which a logistic regression (class_weight='balanced', solver='lbfgs') classifies as AI / Real.
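A minimal sketch of how such a feature vector could be assembled from embeddings. It assumes sim_mean is the arithmetic mean of the five per-timestep similarities, which the card does not state explicitly; build_features and the toy 3-D embeddings are hypothetical stand-ins for CLIP ViT-L/14 image embeddings.

```python
import math

TIMESTEPS = (50, 150, 300, 500, 800)


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def build_features(original_emb, recon_embs):
    """Return [sim_mean, sim_t50, sim_t150, sim_t300, sim_t500, sim_t800].

    recon_embs maps each timestep to the embedding of its reconstruction.
    """
    sims = [cosine_similarity(original_emb, recon_embs[t]) for t in TIMESTEPS]
    sim_mean = sum(sims) / len(sims)  # assumption: plain mean of the five
    return [sim_mean] + sims


# Toy embeddings: the reconstruction drifts further away as t grows.
original = [1.0, 0.0, 0.0]
recons = {
    50: [1.0, 0.0, 0.0],
    150: [1.0, 0.1, 0.0],
    300: [1.0, 0.5, 0.0],
    500: [1.0, 1.0, 0.0],
    800: [0.0, 1.0, 0.0],
}
features = build_features(original, recons)
```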

Training data

AI images: 2,500 images generated by diffusion models (1024×1024).
Real images: 2,500 images sampled from COCO 2017 train2017.
80/20 stratified split, random_state=42.
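The split sizes follow directly from the counts above. The sketch below is a pure-Python stand-in for a stratified split, used only to show the arithmetic; the wording suggests scikit-learn's train_test_split with stratify, but that is an assumption, and this helper will not reproduce the same indices.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Split indices so each class keeps the same train/test proportion."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx


labels = ["AI"] * 2500 + ["Real"] * 2500
train_idx, test_idx = stratified_split(labels)
test_counts = {c: sum(labels[i] == c for i in test_idx) for c in ("AI", "Real")}
print(len(train_idx), len(test_idx), test_counts)
```

This gives 4,000 training images and 1,000 test images, with 500 of each class in the test portion.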

Test metrics

Held-out test set: the 20% portion of the 5,000-image split, i.e. 1,000 images (500 Real, 500 AI), random_state=42.

Metric	Value
Accuracy	0.7940
ROC AUC	0.8679
F1 (AI class)	0.7876

Per-class breakdown:

	Precision	Recall	F1	Support
Real	0.78	0.82	0.80	500
AI	0.81	0.76	0.79	500

Confusion matrix (rows = true, cols = predicted):

	Pred Real	Pred AI
True Real	412	88
True AI	118	382
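The headline and per-class figures can be re-derived from the confusion matrix as a consistency check. The sketch treats AI as the positive class, which matches the reported F1 of 0.7876.

```python
# Confusion matrix from the table above (rows = true, cols = predicted).
tn, fp = 412, 88    # True Real: predicted Real, predicted AI
fn, tp = 118, 382   # True AI:   predicted Real, predicted AI

total = tn + fp + fn + tp
accuracy = (tp + tn) / total


def prf(true_pos, false_pos, false_neg):
    """Precision, recall and F1 for one class."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


precision_ai, recall_ai, f1_ai = prf(tp, fp, fn)
# For the Real class the roles of the two error types swap.
precision_real, recall_real, f1_real = prf(tn, fn, fp)

print(round(accuracy, 4))   # 0.794
print(round(f1_ai, 4))      # 0.7876
print(round(precision_real, 2), round(recall_real, 2), round(f1_real, 2))  # 0.78 0.82 0.8
print(round(precision_ai, 2), round(recall_ai, 2), round(f1_ai, 2))        # 0.81 0.76 0.79
```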

Usage

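from huggingface_hub import hf_hub_download
import joblib, json
# Download the artefacts listed under "Files" (assuming they sit at the repo root).
repo_id = "bezand/BoN1"
classifier = joblib.load(hf_hub_download(repo_id=repo_id, filename="classifier.joblib"))
scaler = joblib.load(hf_hub_download(repo_id=repo_id, filename="scaler.joblib"))
with open(hf_hub_download(repo_id=repo_id, filename="config.json")) as f:
    config = json.load(f)
# Or use the bundled inference module:
# from inference import BotOrNotDetector
# detector = BotOrNotDetector.from_pretrained("bezand/BoN1")
# detector.predict("image.jpg")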

Practical inference requires a CUDA GPU (~30s/image on a T4). CPU inference is impractical because each prediction runs five SD denoising steps.

Files

classifier.joblib — trained sklearn.linear_model.LogisticRegression.
scaler.joblib — StandardScaler fit on training features.
config.json — feature-extractor config (timesteps, CLIP and SD model IDs).
inference.py, feature_extractor.py — inference wrappers.
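At inference time the scaler and classifier combine in a simple way: standardise the 6-D feature vector, take a weighted sum, and pass it through a sigmoid. The sketch below spells out that arithmetic in plain Python. All numeric values are hypothetical placeholders, not the released weights, and treating the output as the probability of the AI class is an assumption about the label encoding. In scikit-learn the corresponding fitted attributes are mean_ and scale_ on StandardScaler, and coef_ and intercept_ on LogisticRegression.

```python
import math


def standardize(x, mean, scale):
    """What StandardScaler.transform does per feature: (x - mean) / scale."""
    return [(xi - m) / s for xi, m, s in zip(x, mean, scale)]


def predict_proba_positive(x, mean, scale, coef, intercept):
    """Binary logistic regression: sigmoid(w . z + b) on standardised features."""
    z = standardize(x, mean, scale)
    logit = sum(w * zi for w, zi in zip(coef, z)) + intercept
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical parameters, for illustration only.
mean = [0.6] * 6
scale = [0.1] * 6
coef = [-1.0] * 6
intercept = 0.0

p_at_mean = predict_proba_positive(mean, mean, scale, coef, intercept)
p_low_sim = predict_proba_positive([0.5] * 6, mean, scale, coef, intercept)
```

A feature vector equal to the training mean standardises to all zeros, so with a zero intercept the probability is exactly 0.5; any other input moves the logit by the weighted, standardised deviation.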

Limitations and biases

Trained on a single AI-image source at fixed 1024×1024 resolution. Real images (COCO) vary in size and content, which may bias the classifier toward resolution/aspect-ratio cues rather than denoising-trajectory artefacts.
Single-step denoising with an empty prompt; full multi-step trajectories may give cleaner signal but were not used in training.
Only tested against SD-family generators. Performance on other generators (Midjourney, FLUX, autoregressive models) is unknown.

License

The trained classifier weights and StandardScaler are released under CC-BY-NC-4.0. Inference also requires Stable Diffusion v1.5 (CreativeML Open RAIL-M) and CLIP ViT-L/14, each governed by its own license.

Citation

@inproceedings{liang2025denoising,
  title  = {Denoising Trajectory Biases for Zero-Shot AI-Generated Image Detection},
  author = {Liang et al.},
  booktitle = {NeurIPS},
  year   = {2025}
}
