Bot or Not — Denoising Trajectory Detector

Logistic-regression classifier on top of denoising-trajectory features extracted with CLIP ViT-L/14 + Stable Diffusion v1.5. Reproduces the method from Liang et al., "Denoising Trajectory Biases for Zero-Shot AI-Generated Image Detection" (NeurIPS 2025).

How it works

For each input image:

  1. Encode to SD v1.5 latent space (VAE).
  2. Add DDPM noise at timesteps (50, 150, 300, 500, 800).
  3. Run one UNet denoising step per timestep with an empty-prompt embedding.
  4. Decode each denoised latent back to image space.
  5. Compute CLIP-cosine similarity between the original and each reconstruction.
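Steps 2 and 3 can be illustrated without loading any model. The sketch below is a toy, pure-Python version and rests on two assumptions that the card does not state: that the noise schedule is SD v1.5's default "scaled_linear" schedule (beta_start=0.00085, beta_end=0.012, 1000 training steps), and that the "one denoising step" is the usual one-step estimate of the clean latent from the predicted noise. The helper names (alphas_cumprod, add_noise, estimate_x0) are hypothetical and are not taken from feature_extractor.py.

```python
import math
import random

# Assumption: SD v1.5's default "scaled_linear" beta schedule.
def alphas_cumprod(num_steps=1000, beta_start=0.00085, beta_end=0.012):
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    out, prod = [], 1.0
    for i in range(num_steps):
        beta = (lo + (hi - lo) * i / (num_steps - 1)) ** 2
        prod *= 1.0 - beta
        out.append(prod)
    return out

def add_noise(x0, eps, abar_t):
    # Forward DDPM step: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps
    a, s = math.sqrt(abar_t), math.sqrt(1.0 - abar_t)
    return [a * x + s * e for x, e in zip(x0, eps)]

def estimate_x0(xt, eps_hat, abar_t):
    # One-step estimate of the clean latent from a predicted noise eps_hat
    # (assumed form of the "one UNet denoising step").
    a, s = math.sqrt(abar_t), math.sqrt(1.0 - abar_t)
    return [(x - s * e) / a for x, e in zip(xt, eps_hat)]

TIMESTEPS = (50, 150, 300, 500, 800)
ABAR = alphas_cumprod()

rng = random.Random(0)
x0 = [rng.gauss(0.0, 1.0) for _ in range(16)]  # toy stand-in for a VAE latent
for t in TIMESTEPS:
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = add_noise(x0, eps, ABAR[t])
    # A real UNet would predict eps_hat from (xt, t, empty-prompt embedding);
    # here the true eps is reused, so the estimate recovers x0 up to rounding.
    x0_hat = estimate_x0(xt, eps, ABAR[t])
    print(t, round(ABAR[t], 4), max(abs(a - b) for a, b in zip(x0, x0_hat)))
```

In the real pipeline the UNet's noise prediction is imperfect, so each reconstruction drifts from the original by an amount that depends on the timestep; that drift is what step 5 measures.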

This yields a 6-D feature vector [sim_mean, sim_t50, sim_t150, sim_t300, sim_t500, sim_t800], which a logistic regression (class_weight='balanced', solver='lbfgs') classifies as AI / Real.
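As a minimal sketch of how such a feature vector could be assembled, the snippet below computes cosine similarities on toy 3-D vectors that stand in for CLIP ViT-L/14 image embeddings. It assumes that sim_mean is the arithmetic mean of the five per-timestep similarities; build_features and cosine_similarity are hypothetical names, not the API of feature_extractor.py.

```python
import math

TIMESTEPS = (50, 150, 300, 500, 800)

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def build_features(original_emb, recon_embs):
    """recon_embs maps timestep -> embedding of that reconstruction."""
    sims = [cosine_similarity(original_emb, recon_embs[t]) for t in TIMESTEPS]
    # Assumption: sim_mean is the arithmetic mean of the five similarities.
    return [sum(sims) / len(sims)] + sims

# Toy 3-D embeddings standing in for real CLIP image embeddings.
original = [1.0, 0.0, 0.0]
recons = {t: [1.0, t / 1000.0, 0.0] for t in TIMESTEPS}
features = build_features(original, recons)
print(len(features))  # one mean plus five per-timestep similarities
```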

Training data

  • AI images: 2,500 images generated by diffusion models (1024×1024).
  • Real images: 2,500 images sampled from COCO 2017 train2017.
  • 80/20 stratified split, random_state=42.

Test metrics

Held-out test set: 1,000 images (500 Real, 500 AI), random_state=42.

Metric      Value
Accuracy    0.7940
ROC AUC     0.8679
F1          0.7876

Per-class breakdown:

Class   Precision  Recall  F1    Support
Real    0.78       0.82    0.80  500
AI      0.81       0.76    0.79  500

Confusion matrix (rows = true, cols = predicted):

             Pred Real  Pred AI
True Real    412        88
True AI      118        382
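The reported metrics can be rederived from the confusion matrix as a consistency check. The short script below does that arithmetic; the variable names are illustrative only.

```python
# Confusion matrix from the test set (rows = true, cols = predicted).
real_as_real, real_as_ai = 412, 88
ai_as_real, ai_as_ai = 118, 382

def prf(tp, fp, fn):
    """Precision, recall and F1 for one class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall, 2 * precision * recall / (precision + recall)

total = real_as_real + real_as_ai + ai_as_real + ai_as_ai
accuracy = (real_as_real + ai_as_ai) / total
real_p, real_r, real_f1 = prf(tp=real_as_real, fp=ai_as_real, fn=real_as_ai)
ai_p, ai_r, ai_f1 = prf(tp=ai_as_ai, fp=real_as_ai, fn=ai_as_real)

print(f"accuracy={accuracy:.4f}")
print(f"Real: P={real_p:.2f} R={real_r:.2f} F1={real_f1:.2f}")
print(f"AI:   P={ai_p:.2f} R={ai_r:.2f} F1={ai_f1:.2f}")
```

The results match the reported accuracy and per-class figures, and the headline F1 of 0.7876 corresponds to the AI class. ROC AUC cannot be recovered this way because it depends on the per-image scores, not on the thresholded counts.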

Usage

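from huggingface_hub import hf_hub_download
import joblib, json
# Load the released artifacts (see Files):
classifier = joblib.load(hf_hub_download("bezand/BoN1", "classifier.joblib"))
scaler = joblib.load(hf_hub_download("bezand/BoN1", "scaler.joblib"))
with open(hf_hub_download("bezand/BoN1", "config.json")) as f:
    config = json.load(f)
# Or use the bundled inference module:
# from inference import BotOrNotDetector
# detector = BotOrNotDetector.from_pretrained("bezand/BoN1")
# detector.predict("image.jpg")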

A CUDA GPU is required for practical inference (~30s/image on a T4; CPU inference is impractical because each prediction runs five SD denoising steps).

Files

  • classifier.joblib — trained sklearn.linear_model.LogisticRegression.
  • scaler.joblib — StandardScaler fit on training features.
  • config.json — feature-extractor config (timesteps, CLIP and SD model IDs).
  • inference.py, feature_extractor.py — inference wrappers.
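For intuition about how the scaler and the classifier combine at prediction time, here is a pure-Python sketch of the arithmetic that a StandardScaler followed by a binary logistic regression performs. All numbers are hypothetical placeholders, not the released weights, and treating AI as the positive class is an assumption; the real parameters are the fitted mean_ and scale_ of the scaler and coef_ and intercept_ of the classifier.

```python
import math

def standardize(features, mean, scale):
    # What StandardScaler.transform does per feature: (x - mean) / scale
    return [(x - m) / s for x, m, s in zip(features, mean, scale)]

def predict_proba_ai(z, coef, intercept):
    # Binary logistic regression: sigmoid(w . z + b)
    logit = sum(w * v for w, v in zip(coef, z)) + intercept
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical numbers for illustration only; the real values live in
# scaler.joblib (mean_, scale_) and classifier.joblib (coef_, intercept_).
mean = [0.80, 0.95, 0.90, 0.85, 0.75, 0.55]
scale = [0.05, 0.02, 0.03, 0.05, 0.08, 0.10]
coef = [1.2, 0.4, 0.6, 0.8, 1.0, 0.9]
intercept = -0.1

# Feature order: [sim_mean, sim_t50, sim_t150, sim_t300, sim_t500, sim_t800]
features = [0.84, 0.96, 0.93, 0.88, 0.80, 0.63]
p_ai = predict_proba_ai(standardize(features, mean, scale), coef, intercept)
label = "AI" if p_ai >= 0.5 else "Real"  # assumption: AI is the positive class
print(label, round(p_ai, 3))
```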

Limitations and biases

  • Trained on a single AI-image source at fixed 1024×1024 resolution. Real images (COCO) vary in size and content, which may bias the classifier toward resolution/aspect-ratio cues rather than denoising-trajectory artefacts.
  • Single-step denoising with an empty prompt; full multi-step trajectories may give cleaner signal but were not used in training.
  • Only tested against SD-family generators. Performance on other generators (Midjourney, FLUX, autoregressive models) is unknown.

License

The trained classifier weights and StandardScaler are released under CC-BY-NC-4.0. Inference also requires Stable Diffusion v1.5 (CreativeML Open RAIL-M) and CLIP ViT-L/14, each governed by its own license.

Citation

@inproceedings{liang2025denoising,
  title  = {Denoising Trajectory Biases for Zero-Shot AI-Generated Image Detection},
  author = {Liang et al.},
  booktitle = {NeurIPS},
  year   = {2025}
}