GeoLIP Spectral Encoder — Test Manifest

Geometric Primitives for Constellation-Anchored Classification

Target: CIFAR-10 (baseline), then generalize
Constraint: Zero or minimal learned encoder params. All learning in constellation anchors, patchwork, classifier.
Metrics: Val accuracy, CV convergence, anchor activation, InfoNCE lock, train/val gap
Baseline to beat: 88.0% (conv encoder + SquaredReLU + full trainer, 1.6M params)
Current best spectral: 46.8% (STFT + Cholesky + SVD, v4, 137K params, CE-only carry)


STATUS KEY

  • [ ] — Not started
  • [R] — Running
  • [X] — Completed
  • [F] — Failed (with reason)
  • [S] — Skipped (with reason)
  • [P] — Partially completed

COMPLETED EXPERIMENTS (prior sessions + this session)

Conv Encoder Baselines (Form 1 Core)

  • Linear baseline, 100 epochs → 67.0%, 422K params, overfits at E31
  • MLP baseline, 100 epochs → 65.0%, 687K params, overfits at E10
  • Core CE-only, 100 epochs → 63.4%, 820K params, CV=0.70, never converges
  • Core CE+CV, 100 epochs → 62.7%, 820K params, CV=0.61, worse than CE-only
  • Core 32 anchors, interrupted E20 → 59.2%, 1.8M params, slow convergence
  • Full trainer GELU, 100 epochs → 88.0%, 1.6M params (original proven result)
  • Full trainer SquaredReLU, 100 epochs → 88.0%, 1.6M params, E96 best

Spectral Encoder Experiments

  • [F] Spectral v1: flat FFT → 768-d → single constellation → collapsed
    • Cause: concat norm √48≈6.93 vs anchor norm 1, not on same sphere
  • [F] Spectral v2: per-band constellation (48×64=3072 anchors) → ~35%
    • Cause: 3072 tri dims too diffuse, InfoNCE dead at 0.45, no cross-band structure
  • [F] Spectral v3: FFT → 8 channels (spherical mean) → 128 anchors → 27%
    • Cause: cos≈0.99, spherical mean collapsed all images to same point
  • [P] Spectral v4: STFT + Cholesky + SVD → S^43 → 64 anchors → 46.8% (still running)
    • CE carrying alone, CosineEmbeddingLoss frozen at 0.346, InfoNCE dead at 0.15
    • Cholesky+SVD signature IS discriminative, contrastive losses unable to contribute

CATEGORY 1: SIGNAL DECOMPOSITION TO GEOMETRY

1.1 Wavelet Scattering Transform (Mallat)

Formula: S_J[p]x(u) = |||x * ψ_{λ₁}| * ψ_{λ₂}| ... | * φ_{2^J}(u)
Library: kymatio (pip install kymatio)
GitHub: https://github.com/kymatio/kymatio
Expected output: ~10K-dim feature vector for 32×32
Literature baseline: ~82% CIFAR-10 with SVM, ~70.5% with linear
Properties: Deterministic, Lipschitz-continuous, approximately energy-preserving

  • 1.1a Scattering order 2, J=2, L=8 → L2 normalize → flat constellation on S^d
    • Hypothesis: scattering features are rich enough that flat constellation should work
    • Compare: direct linear classifier on scattering vs constellation pipeline
  • 1.1b Scattering → JL projection to S^127 → constellation (64 anchors)
    • JL preserves distances; S^127 matches our proven dim
  • 1.1c Scattering → JL → S^43 → Cholesky/SVD signature → constellation
    • Stack v4's geometric signature on top of scattering features
  • 1.1d Scattering order 1 vs order 2 ablation
    • Order 1 is ~Gabor magnitude; order 2 adds inter-frequency structure
  • 1.1e Scattering + InfoNCE: does augmentation invariance help or hurt?
    • Scattering is already translation-invariant; InfoNCE may be redundant
  • 1.1f Scattering hybrid: scattering front-end + lightweight learned projection + constellation
    • Test minimal learned params needed to bridge the 82→88% gap
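For sizing the JL stage in 1.1b/1.1c, the scattering output dimension has a closed form. This sketch assumes kymatio's Scattering2D coefficient-count convention (1 order-0 band, J·L order-1 bands, L²·J(J−1)/2 order-2 bands, spatial resolution divided by 2^J):

```python
def scattering_dim(H, W, C=3, J=2, L=8):
    """Flattened Scattering2D output size for a (C, H, W) image.

    Assumes kymatio's channel layout: 1 order-0 band, J*L order-1 bands,
    and L**2 * J*(J-1)//2 order-2 bands, each at resolution (H/2^J, W/2^J).
    """
    channels = 1 + J * L + (L ** 2) * J * (J - 1) // 2
    return C * channels * (H // 2 ** J) * (W // 2 ** J)
```

For CIFAR-10 (3×32×32, J=2, L=8) this gives 3 · 81 · 8 · 8 = 15,552 coefficients, consistent with the "~10K-dim" estimate above.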

1.2 Gabor Filter Banks

Formula: g(x,y) = exp(−(x'²+γ²y'²)/(2σ²)) · exp(i(2πx'/λ+ψ))
Expected: S scales × K orientations → S×K magnitude responses
Properties: Deterministic, O(N·S·K), first-order scattering ≈ Gabor modulus

  • 1.2a Gabor bank (4 scales × 8 orientations = 32 filters) → L2 norm → S^31
    • Each filter response is a spatial map; pool to scalar per filter
  • 1.2b Gabor → per-filter spatial statistics (mean, std, skew, kurtosis) → S^127
    • 32 filters × 4 stats = 128-d, matches conv encoder output dim
  • 1.2c Gabor vs scattering order 1 A/B test
    • Validate that scattering order 1 ≈ Gabor + modulus
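A minimal numpy sketch of the 1.2a bank, using the Gabor formula above (real part only; the filter size and the σ/λ schedules are illustrative choices, not tuned values):

```python
import numpy as np

def gabor_kernel(size, theta, sigma, lam, gamma=0.5, psi=0.0):
    """Real Gabor kernel: exp(-(x'^2 + γ²y'^2)/(2σ²)) · cos(2πx'/λ + ψ)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)      # rotated coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    env = np.exp(-(xr ** 2 + gamma ** 2 * yr ** 2) / (2 * sigma ** 2))
    return env * np.cos(2 * np.pi * xr / lam + psi)

def gabor_bank(scales=(2, 4, 6, 8), n_orient=8, size=11):
    """4 scales × 8 orientations = 32 filters, as in 1.2a."""
    thetas = [np.pi * k / n_orient for k in range(n_orient)]
    return [gabor_kernel(size, t, sigma=s, lam=2 * s)
            for s in scales for t in thetas]
```

Convolving each filter with the image and pooling the magnitude map to a scalar yields the 32-d vector for S^31; the four spatial moments in 1.2b replace the single pooled scalar.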

1.3 Radon Transform

Formula: Rf(ω,t) = ∫ f(x) δ(x·ω − t) dx
Properties: Deterministic, exactly invertible via filtered back-projection

  • 1.3a Radon at K angles → sinogram → L2 norm per angle → K points on S^d
    • K angles = K geometric addresses, constellation measures the cloud
  • 1.3b Radon → 1D wavelet per projection (= ridgelet) → aggregate to S^d
    • Composition: Radon → Ridgelet, captures linear singularities

1.4 Curvelet Transform

Formula: c_{j,l,k} = ⟨f, φ_{j,l,k}⟩, parabolic scaling: width ≈ length²
Properties: Deterministic, exactly invertible (tight frame), O(N² log N)

  • 1.4a Curvelet energy per (scale, orientation) band → L2 norm → S^d
    • Captures directional frequency that scattering misses
  • 1.4b Curvelet + scattering concatenation → JL → constellation
    • Test complementarity of isotropic (scattering) + anisotropic (curvelet) features

1.5 Persistent Homology (TDA)

Formula: Track birth/death of β₀ (components), β₁ (loops) across filtration
Library: giotto-tda or ripser
Properties: Deterministic, O(n³), captures topology no other transform sees

  • 1.5a Sublevel set filtration on grayscale → persistence image → L2 norm → S^d
  • 1.5b PH on scattering feature maps (topology of the representation)
    • Captures whether scattering features form clusters, loops, voids
  • 1.5c PH Betti curve as additional channel in multi-signature pipeline
  • 1.5d PH standalone classification baseline on CIFAR-10
    • Literature suggests ~60-70% standalone; valuable as complementary signal

1.6 STFT Variants (improving v4)

  • 1.6a 2D STFT via patch-wise FFT (overlapping patches) instead of row/col STFT
    • True spatial-frequency decomposition vs row+col approximation
  • 1.6b STFT with larger n_fft=32 (current: 16) → more frequency resolution
  • 1.6c STFT preserving phase (not just magnitude) via analytic signal
    • Phase encodes spatial structure; current pipeline discards it
  • 1.6d Multi-window STFT (different window sizes for different frequency ranges)
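The 1.6a patch-wise variant can be sketched with overlapping windowed FFTs; patch size, stride, and the Hann window are illustrative choices:

```python
import numpy as np

def patch_fft_mag(img, patch=8, stride=4):
    """2D STFT via overlapping patch-wise FFT magnitudes (1.6a sketch)."""
    H, W = img.shape
    win2d = np.hanning(patch)[:, None] * np.hanning(patch)[None, :]
    out = []
    for i in range(0, H - patch + 1, stride):
        for j in range(0, W - patch + 1, stride):
            win = img[i:i + patch, j:j + patch] * win2d
            out.append(np.abs(np.fft.rfft2(win)))   # magnitude only, as in v4
    return np.stack(out)   # (n_patches, patch, patch//2 + 1)
```

For a 32×32 image with patch=8, stride=4 this gives 7×7 = 49 patches, each a true local 2D spectrum rather than the row/col approximation; 1.6c would keep `np.fft.rfft2(win)` complex instead of taking `np.abs`.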

CATEGORY 2: MANIFOLD STRUCTURES

2.1 Hopf Fibration

Formula: h(z₁,z₂) = (2z̄₁z₂, |z₁|²−|z₂|²) : S³ → S²
Properties: Deterministic, O(1), hierarchical (base + fiber)

  • 2.1a Encode 4-d feature vectors on S³ → Hopf project to S² + fiber coordinate
    • Coarse triangulation on S², fine discrimination in fiber
  • 2.1b Quaternionic Hopf S⁷ → S⁴ for 8-d features
    • Natural for 8-channel spectral decomposition (v3/v4 channel count)
  • 2.1c Hopf foliation spherical codes for anchor initialization
    • Replace uniform_hypersphere_init with Hopf-structured codes
  • 2.1d Hierarchical constellation: coarse anchors on base S², fine anchors per fiber

2.2 Grassmannian Class Representations

Formula: Class = k-dim subspace of ℝⁿ, distances via principal angles
Properties: Requires SVD, O(nk²)

  • 2.2a Replace class vectors with class subspaces on Gr(k,n)
    • Each class owns a k-dim subspace; classification = nearest subspace
    • Literature: +1.3% on ImageNet over single class vectors
  • 2.2b Grassmannian distance metrics ablation: geodesic vs chordal vs projection
  • 2.2c Per-class anchor subspace: each anchor defines a subspace, not a point

2.3 Flag Manifold (Nested Subspace Hierarchy)

Formula: V₁ ⊂ V₂ ⊂ ... ⊂ Vₖ, nested subspaces
Properties: Generalizes Grassmannian, natural for multi-resolution

  • 2.3a Flag decomposition of frequency channels (DC ⊂ low ⊂ mid ⊂ high)
    • Test whether nesting constraint improves spectral encoder
  • 2.3b Flag-structured anchors: coarse-to-fine anchor hierarchy

2.4 Von Mises-Fisher Mixture

Formula: f(x; μ, κ) = C_p(κ) exp(κ μᵀx), soft clustering on S^d
Properties: Natural density model for hyperspherical data

  • 2.4a Replace hard nearest-anchor assignment with vMF soft posteriors
    • p(j|x) = α_j f(x;μ_j,κ_j) / Σ α_k f(x;μ_k,κ_k)
    • Learned κ per anchor = adaptive influence radius
  • 2.4b vMF mixture EM for anchor initialization (replace uniform hypersphere init)
  • 2.4c vMF concentration κ as a diagnostic: track per-class κ convergence
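A sketch of the 2.4a posterior. The dimension-dependent normalizer C_p(κ_j) is omitted here, which is exact only when all κ_j are equal; with per-anchor learned κ (as 2.4a intends) the full C_p(κ_j) term would need to be included:

```python
import numpy as np

def vmf_posteriors(x, mus, kappas, log_alphas=0.0):
    """p(j|x) ∝ α_j exp(κ_j μ_jᵀx); C_p(κ_j) omitted (shared-κ sketch)."""
    logits = kappas * (mus @ x) + log_alphas
    logits = logits - logits.max()      # log-sum-exp stability
    w = np.exp(logits)
    return w / w.sum()
```

With hard assignment as the κ → ∞ limit, this is a drop-in soft replacement for nearest-anchor lookup.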

2.5 Optimal Anchor Placement

  • 2.5a E₈ lattice anchors for 8-d constellation (240 maximally separated points)
  • 2.5b Spherical t-design initialization vs uniform hypersphere init
  • 2.5c Thomson problem solver for N anchors on S^d (energy minimization)
    • Compare: QR + iterative repulsion (current) vs Coulomb energy minimization
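A minimal Coulomb-energy repulsion sketch for 2.5c; the step size and iteration count are arbitrary, and this is projected gradient descent rather than a serious Thomson solver:

```python
import numpy as np

def coulomb_energy(X):
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    iu = np.triu_indices(len(X), k=1)
    return (1.0 / D[iu]).sum()

def thomson_anchors(n, d, steps=300, lr=0.01, seed=0):
    """Repel n points on S^{d-1} under a 1/r Coulomb potential."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    e0 = coulomb_energy(X)
    for _ in range(steps):
        diff = X[:, None, :] - X[None, :, :]
        dist = np.linalg.norm(diff, axis=-1)
        np.fill_diagonal(dist, np.inf)                       # no self-force
        force = (diff / dist[..., None] ** 3).sum(axis=1)    # -grad of Σ 1/r
        force -= (force * X).sum(axis=1, keepdims=True) * X  # tangent projection
        X += lr * force
        X /= np.linalg.norm(X, axis=1, keepdims=True)        # retract to sphere
    return X, e0, coulomb_energy(X)
```

Comparing the final energy against the QR + iterative repulsion initializer's energy gives a direct apples-to-apples separation metric.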

CATEGORY 3: COMPACT REPRESENTATIONS

3.1 Random Fourier Features

Formula: z(x) = √(2/D) [cos(ω₁ᵀx+b₁), ..., cos(ωDᵀx+bD)]
Properties: Pseudo-deterministic, preserves kernel structure, maps to S^d via cos/sin

  • 3.1a RFF on raw pixels → S^d → constellation
    • Baseline: how much does nonlinear kernel approximation help raw pixels?
  • 3.1b RFF on scattering features → constellation
    • Composition: scattering (linear invariants) → RFF (nonlinear kernel)
  • 3.1c Fourier feature positional encoding (Tancik/Mildenhall style)
    • γ(v) = [cos(2πBv), sin(2πBv)]ᵀ explicitly maps to hypersphere

3.2 Johnson-Lindenstrauss Projection

Formula: f(x) = (1/√k)Ax, preserves distances with k = O(ε⁻² log n)
Properties: Pseudo-deterministic, near-isometric

  • 3.2a JL from scattering (~10K) to 128-d → L2 norm → constellation
    • Test: does JL + L2 norm preserve enough structure?
  • 3.2b JL target dimension sweep: 32, 64, 128, 256, 512
    • Find minimum k where constellation accuracy saturates
  • 3.2c Fast JL (randomized Hadamard) vs Gaussian JL speed/accuracy tradeoff
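A Gaussian JL sketch for 3.2a/3.2b; the 1/√k scaling follows the formula above, and the fixed seed makes it pseudo-deterministic across runs:

```python
import numpy as np

def jl_project(X, k, seed=0):
    """f(x) = (1/√k) A x with A_ij ~ N(0, 1); near-isometric for k = O(ε⁻² log n)."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((k, X.shape[1]))
    return (X @ A.T) / np.sqrt(k)
```

Following with L2 normalization places the projected scattering vector on S^{k−1}, which is the 3.2a pipeline.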

3.3 Compressed Sensing on Scattering Coefficients

Formula: y = Φx, recover via ℓ₁ minimization if x is k-sparse
Properties: Exact recovery for sparse signals, O(k log(N/k)) measurements

  • 3.3a Measure sparsity of scattering coefficients (how many are near-zero?)
    • If sparse: CS can compress much more than JL
  • 3.3b CS measurement matrix → L2 norm → constellation
    • Compare: CS vs JL at same target dimension

3.4 Spherical Harmonics

Formula: Y_l^m(θ,φ), complete basis on S², (l_max+1)² coefficients
Properties: Deterministic, native Fourier on sphere, exactly invertible

  • 3.4a Expand constellation triangulation profile in spherical harmonics
    • Which angular frequencies carry discriminative info?
  • 3.4b Spherical harmonic coefficients of embedding distribution as class signature
  • 3.4c Hyperspherical harmonics for S^15 and S^43 (higher-dim generalization)

CATEGORY 4: INVERTIBLE GEOMETRIC TRANSFORMS

4.1 Stereographic Projection

Formula: σ(x) = x_{1:n}/(1−x_{n+1})
Formula: σ⁻¹(y) = (2y, ‖y‖²−1)/(‖y‖²+1)
Properties: Conformal bijection S^n \ {pole} ↔ ℝⁿ, preserves angles

  • 4.1a Stereographic → Euclidean scattering → inverse stereographic → S^d
    • Apply scattering in flat space, project back to sphere
  • 4.1b Stereographic projection as constellation readout alternative
    • Instead of triangulation distances, read local coordinates via stereographic
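Both directions of the projection are cheap; a numpy sketch matching the formulas above:

```python
import numpy as np

def stereo(x):
    """σ: S^n minus the north pole → ℝⁿ."""
    return x[:-1] / (1.0 - x[-1])

def stereo_inv(y):
    """σ⁻¹: ℝⁿ → S^n (always lands on the unit sphere)."""
    n2 = y @ y
    return np.concatenate([2.0 * y, [n2 - 1.0]]) / (n2 + 1.0)
```

The round trip σ⁻¹(σ(x)) = x holds for every x off the pole, which is what makes 4.1a's flat-space detour lossless.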

4.2 Exponential / Logarithmic Maps

Formula: exp_p(v) = cos(‖v‖)·p + sin(‖v‖)·v/‖v‖
Formula: log_p(q) = arccos(⟨q,p⟩) · (q−⟨q,p⟩p)/‖q−⟨q,p⟩p‖
Properties: Deterministic, locally invertible, O(n)

  • 4.2a Replace triangulation (1−cos) with log map coordinates at each anchor
    • Log map gives direction + distance in tangent space (richer than scalar distance)
    • Each anchor contributes d-dim tangent vector instead of 1-d distance
  • 4.2b Log map triangulation → parallel transport to common tangent space → aggregate
    • Geometrically principled alternative to patchwork concatenation
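The exp/log pair from the formulas above, in numpy; 4.2a would emit `log_map(anchor, x)` per anchor in place of the scalar 1−cos triangulation value:

```python
import numpy as np

def log_map(p, q):
    """Tangent vector at p pointing to q; ‖log_p(q)‖ = geodesic distance."""
    t = np.clip(p @ q, -1.0, 1.0)
    u = q - t * p                       # component of q orthogonal to p
    n = np.linalg.norm(u)
    return np.zeros_like(p) if n < 1e-12 else np.arccos(t) * u / n

def exp_map(p, v):
    """Inverse of log_map for a tangent vector v at p."""
    n = np.linalg.norm(v)
    return p if n < 1e-12 else np.cos(n) * p + np.sin(n) * v / n
```

Each anchor then contributes a d-dim tangent vector (direction plus distance) instead of one scalar, exactly as 4.2a describes.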

4.3 Parallel Transport

Formula: Γ^q_p(v) = v − ((⟨v,p⟩+⟨v,q⟩)/(1+⟨p,q⟩))·(p+q) on S^n
Properties: Isometric between tangent spaces, exactly invertible

  • 4.3a Compute log maps at K anchors → parallel transport all to north pole → aggregate
    • Creates a canonical tangent-space representation independent of anchor positions
  • 4.3b Parallel transport as inter-anchor communication in constellation
    • How does the same input look from different anchor tangent spaces?
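A sketch of the transport formula above. For a vector already tangent at p the ⟨v,p⟩ term vanishes, so this projects to the tangent space first and then applies the closed form (valid whenever q ≠ −p):

```python
import numpy as np

def transport(p, q, v):
    """Parallel transport of tangent vector v from T_p S^n to T_q S^n."""
    v = v - (v @ p) * p                                # enforce tangency at p
    return v - ((v @ q) / (1.0 + p @ q)) * (p + q)     # assumes q != -p
```

The result is tangent at q with the same norm, which is what makes the pole-aggregation in 4.3a well defined.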

4.4 Möbius Transformations

Formula: h_ω(z) = ((1−‖ω‖²)/‖z−ω‖²)·(z−ω) − ω
Properties: Conformal automorphism of S^d, invertible, O(d)

  • 4.4a Möbius "geometric attention": transform sphere to zoom into anchor regions
    • Expand region near anchor, compress far regions
    • Each anchor applies its own Möbius transform before measuring distance
  • 4.4b Composition of Möbius transforms as normalizing flow on S^d
    • Learned flow that warps embedding distribution toward better separation
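The Möbius transform of the sphere is one line; a sketch of the primitive behind 4.4a, where ω would be placed along an anchor direction (at some hypothetical scale < 1) to warp the sphere around that anchor:

```python
import numpy as np

def mobius(z, w):
    """h_w(z) = ((1-‖w‖²)/‖z-w‖²)(z-w) - w; maps S^{d-1} to itself for ‖w‖ < 1."""
    d = z - w
    return ((1.0 - w @ w) / (d @ d)) * d - w
```

Since the map is conformal and sphere-preserving, distances measured after the transform remain valid spherical distances; only the local metric scale changes.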

4.5 Procrustes + Polar Decomposition

Formula: R* = argmin_R ‖RA−B‖_F = UVᵀ, where BAᵀ = UΣVᵀ
Formula: A = UP (rotation × stretch)

  • 4.5a Procrustes-align channel cloud to canonical pose before Cholesky/SVD
    • Remove rotation variability, isolate shape information
  • 4.5b Polar decomposition of channel matrix: U (rotation) + P (stretch) as separate features
    • U encodes orientation of frequency cloud; P encodes shape/scale
    • Both are geometric, both are deterministic from the channel matrix
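The polar factors for 4.5b drop out of a single SVD; a numpy sketch:

```python
import numpy as np

def polar_decompose(A):
    """A = R P with R orthogonal (nearest rotation) and P symmetric PSD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    R = U @ Vt                       # orientation of the channel cloud
    P = Vt.T @ (s[:, None] * Vt)     # shape/scale: V diag(s) Vᵀ
    return R, P
```

Feeding P (or its eigenvalues) to the constellation isolates shape from pose, and R alone answers the Procrustes-alignment question in 4.5a.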

CATEGORY 5: MATRIX DECOMPOSITION SIGNATURES

5.1 Already Tested

  • Cholesky of Gram matrix → 36 lower-tri values (in v4, working)
  • SVD singular values → 8 values (in v4, working)
  • Concatenated 44-d signature on S^43 → 46.8% with CE-only
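For reference, the v4 signature path can be sketched in a few lines; the 1e-6 jitter for Cholesky stability is an assumption here, not necessarily what v4 uses:

```python
import numpy as np

def chol_svd_signature(C, eps=1e-6):
    """44-d geometric signature of an 8×d channel matrix C, normalized to S^43."""
    G = C @ C.T + eps * np.eye(C.shape[0])      # 8×8 Gram (+ jitter for PD)
    L = np.linalg.cholesky(G)
    tri = L[np.tril_indices(C.shape[0])]        # 36 lower-triangular values
    s = np.linalg.svd(C, compute_uv=False)      # 8 singular values
    sig = np.concatenate([tri, s])              # 44-d signature
    return sig / np.linalg.norm(sig)            # point on S^43
```

Swapping the STFT channel matrix C for scattering-derived channels is exactly experiment 7.2b.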

5.2 Remaining Decompositions

  • 5.2a QR decomposition: Q (rotation) and R diagonal (scale per channel)
    • R diagonal = per-channel magnitude; Q = inter-channel angular structure
  • 5.2b Schur decomposition: T diagonal = eigenvalues, T off-diagonal = coupling
    • For the Gram matrix: Schur gives eigenstructure in triangular form
  • 5.2c Eigendecomposition of Gram: eigenvalues as spectral signature
    • Compare: eigenvalues vs SVD singular values vs Cholesky diagonal
    • These are related but not identical (λ_i = σ_i² for Gram = AᵀA)
  • 5.2d NMF of magnitude spectrum: parts-based decomposition
    • Requires iterative optimization (not fully deterministic)
    • But finds additive, non-negative parts — texture components
  • 5.2e Tucker tensor decomposition of spatial×frequency×channel tensor
    • 3D structure: (H, W, freq_bins) per color channel
    • Core tensor encodes interactions between spatial, frequency, channel modes
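The 5.2c relation is easy to sanity-check numerically: the eigenvalues of the Gram matrix AᵀA are the squared singular values of A, term by term:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
G = A.T @ A                                   # Gram matrix
lam = np.sort(np.linalg.eigvalsh(G))[::-1]    # eigenvalues, descending
sig = np.linalg.svd(A, compute_uv=False)      # singular values, descending
# lam[i] == sig[i]**2 for all i
```

So the 5.2c ablation is really comparing three parameterizations of the same spectrum (plus the Cholesky diagonal, which mixes in ordering information).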

CATEGORY 6: INFORMATION-THEORETIC LOSSES

6.1 Already Tested

  • InfoNCE (self-contrastive, two augmented views) — dead at 0.15 in spectral v4
  • CosineEmbeddingLoss — frozen at 0.346 (margin-saturated)
  • CV loss (Cayley-Menger volume) — running but not in 0.18-0.25 band

6.2 Loss Modifications

  • 6.2a Drop contrastive losses entirely, CE-only + geometric losses
    • v4 shows CE is the only contributor; contrastive is dead weight
    • Hypothesis: removing dead losses may speed convergence
  • 6.2b Class-conditional InfoNCE: positive = same class, not same image
    • Requires labels but gives much stronger supervision signal
  • 6.2c vMF-based contrastive loss: replace dot-product similarity with vMF log-likelihood
    • κ-adaptive: high-κ for nearby pairs, low-κ for far pairs
  • 6.2d Fisher-Rao distance as loss: d_FR(p,q) = 2·arccos(∫√(pq))
    • Natural distance for distributions on the sphere
  • 6.2e Sliced spherical Wasserstein distance as distribution matching loss
    • Matches embedding distribution to target (e.g., uniform on sphere)
  • 6.2f Geometric autograd (from GM3): tangential projection + separation preservation
    • Adam + geometric autograd > AdamW on geometric tasks (proven)
    • Operates on gradient direction, not loss value
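A numpy sketch of 6.2b, SupCon-flavored: positives are all other same-class embeddings rather than augmented views. The temperature and the self-exclusion convention are choices, not a spec:

```python
import numpy as np

def class_conditional_infonce(Z, y, tau=0.1):
    """Mean negative log-probability of same-class pairs under softmax similarity."""
    Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    S = (Z @ Z.T) / tau
    np.fill_diagonal(S, -np.inf)                    # exclude self-pairs
    m = S.max(axis=1, keepdims=True)                # log-sum-exp stability
    logp = S - (m + np.log(np.exp(S - m).sum(axis=1, keepdims=True)))
    pos = (y[:, None] == y[None, :]) & ~np.eye(len(y), dtype=bool)
    return -logp[pos].mean()
```

Unlike view-based InfoNCE, this loss has a signal even when the deterministic encoder reports every pixel difference between crops, because the positives come from labels rather than augmentations.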

6.3 Anchor Management

  • 6.3a Anchor push frequency sweep: every 10, 25, 50, 100, 200 batches
  • 6.3b Anchor push with vMF-weighted centroids instead of hard class centroids
  • 6.3c Anchor birth/death: add anchors where density is high, remove where unused
  • 6.3d Anchor dropout sweep: 0%, 5%, 15%, 30%, 50%

CATEGORY 7: COMPOSITE PIPELINE TESTS

7.1 The Reference Pipeline (from research article)

  • 7.1a Scattering(J=2,L=8) → JL(128) → L2 norm → constellation(64) → classify
    • The "canonical" pipeline; expected ~75-80% based on literature
  • 7.1b Same as 7.1a but with learned 2-layer projection replacing JL
    • Minimal learned params (~16K), test if projection adaptation matters
  • 7.1c Scattering → curvelet energy → concat → JL → constellation
    • Test complementarity

7.2 Hybrid: Spectral + Scattering

  • 7.2a STFT channels (v4) + scattering features → concat → JL → S^d → constellation
    • STFT gives spatial-frequency; scattering gives multi-scale invariants
  • 7.2b Scattering → Cholesky Gram + SVD signature → constellation
    • Apply v4's geometric signature to scattering output instead of STFT

7.3 Multi-Signature Constellation

  • 7.3a Parallel extraction: scattering + Gabor + Radon → separate constellations → fusion
    • Each primitive captures different geometric aspect
    • Fusion: concatenate patchwork outputs → shared classifier
  • 7.3b Hierarchical constellation: scattering → coarse anchors → residual → fine anchors
    • Two-stage: first stage identifies broad category, second refines

7.4 Minimal Learned Params Tests

  • 7.4a Best deterministic pipeline + 1 learned linear layer (d_in → 128) before constellation
    • Measure: how much does a single projection layer help?
    • Count: exact learned param count
  • 7.4b Same as 7.4a but with SquaredReLU + LayerNorm (the proven patchwork block)
  • 7.4c Sweep learned projection sizes: 0, 1K, 5K, 10K, 50K, 100K params
    • Find the elbow where adding params stops helping

PRIORITY QUEUE (recommended execution order)

Tier 1: Highest Expected Impact

  1. 1.1a — Scattering + flat constellation (the literature leader)
  2. 1.1b — Scattering + JL → S^127 + constellation
  3. 6.2a — Drop dead contrastive losses from v4, measure CE-only ceiling
  4. 2.4a — vMF soft assignment replacing hard nearest-anchor
  5. 4.2a — Log map triangulation (richer than scalar distance)

Tier 2: High Expected Impact

  1. 7.1a — Full reference pipeline
  2. 1.1f — Scattering hybrid with minimal learned projection
  3. 1.2b — Gabor spatial statistics → S^127
  4. 5.2c — Eigendecomposition vs SVD vs Cholesky ablation
  5. 2.1b — Quaternionic Hopf S⁷→S⁴ for 8-channel data

Tier 3: Exploratory

  1. 1.5a — Persistent homology standalone
  2. 3.1b — RFF on scattering features
  3. 4.4a — Möbius geometric attention
  4. 7.3a — Multi-signature parallel constellations
  5. 2.2a — Grassmannian class subspaces

Tier 4: Deep Exploration

  1. 1.3a — Radon cloud on S^d
  2. 1.4b — Curvelet + scattering concat
  3. 2.3a — Flag decomposition of frequency channels
  4. 4.3a — Parallel transport aggregation
  5. 3.4c — Hyperspherical harmonics analysis

RUNNING SCOREBOARD

| Experiment | Val Acc | Params (learned) | CV | Anchors Active | InfoNCE | Key Finding |
|---|---|---|---|---|---|---|
| Linear baseline | 67.0% | 423K | | | | Overfits E31 |
| MLP baseline | 65.0% | 687K | | | | Overfits E10 |
| Core CE-only | 63.4% | 820K | 0.70 | | | CV never converges |
| Core CE+CV | 62.7% | 820K | 0.61 | | | CV hurts accuracy |
| Full GELU | 88.0% | 1.6M | 0.14-0.17 | 64/64 | 1.00 | Reference |
| Full SquaredReLU | 88.0% | 1.6M | 0.15 | 64/64 | 1.00 | Matches GELU |
| Spectral v1 (flat FFT) | FAIL | | | 1/64 | | Norm mismatch |
| Spectral v2 (per-band) | ~35% | 1.2M | 0.17-0.19 | 900/3072 | 0.45 | Too diffuse |
| Spectral v3 (sph mean) | ~27% | 130K | 0.27-0.34 | 110/128 | 0.35 | Collapsed to point |
| Spectral v4 (STFT+Chol+SVD) | 46.8% | 137K | 0.52-0.66 | 53/64 | 0.15 | CE-only carry |
| Scattering baseline | ~82%* | 0 | | | | Literature (SVM) |

Entries marked * are literature values, not our runs.


NOTES & INSIGHTS

Why contrastive losses die on deterministic encoders

The STFT/FFT faithfully reports every pixel-level difference between augmented views. Two crops of the same image produce signatures as different as two different images. Without a learned layer to absorb augmentation variance, InfoNCE has nothing to align. Solutions: (a) augmentation-invariant features (scattering), (b) thin learned projection, (c) class-conditional contrastive (6.2b), (d) drop contrastive entirely (6.2a).

The Cholesky insight

L diagonal encodes "new angular information per tier given all lower tiers." This IS discriminative (proved by v4 reaching 46.8% with CE alone). The 44-d signature on S^43 carries real inter-channel geometry. Next question: is the STFT front-end the bottleneck, or the 44-d signature?

Scattering is the clear next step

82% on CIFAR-10 with zero learned params (literature) vs our 46.8%. Scattering is translation-invariant AND deformation-stable (Lipschitz). This directly addresses the augmentation sensitivity problem. kymatio provides GPU-accelerated PyTorch implementation.

The dimension question

  • S^15 (band_dim=16) vs S^43 (signature) vs S^127 (conv encoder output)
  • E₈ lattice gives 240 optimal anchors on S^7
  • Proven CV attractor at ~0.20 is on S^15
  • Need to test which target sphere dimension is optimal for spectral features


Last updated: 2026-03-18, session with Opus
Next: run scattering baseline (1.1a), then decide pipeline direction
