Physics-Based AI Image Detection
Can you detect AI-generated images by checking if they obey the laws of physics? That's the core question behind this project.
The Insight
AI image generators (DALL-E, Midjourney, Stable Diffusion) produce visually stunning images, but they do not model physics explicitly. They approximate what scenes look like without simulating how light, depth, and surfaces interact.
This project exploits that gap.
Three Physics Pipelines
We extract 27 features from three complementary physics-based analysis pipelines, applied to 30 real images (COCO) and 30 AI-generated images (AIGenBench):
1. Depth Map Statistics
Using monocular depth estimation, we analyze the statistical properties of predicted depth maps:
- gradient mean/std — how sharply depth transitions occur
- skewness/kurtosis — asymmetry and tail behavior of depth distributions
- entropy — information content of the depth field
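As a rough illustration of the statistics listed above, here is a minimal sketch that computes them from a depth map already loaded as a 2-D NumPy array. The function name `depth_statistics`, the 64-bin histogram used for the entropy, and the use of excess kurtosis are assumptions made for this example; the project's actual definitions may differ.

```python
import numpy as np

def depth_statistics(depth, n_bins=64):
    """Summary statistics of a 2-D depth map (hypothetical helper)."""
    depth = np.asarray(depth, dtype=np.float64)

    # Gradient magnitude: how sharply depth changes between neighbouring pixels.
    gy, gx = np.gradient(depth)
    grad_mag = np.hypot(gx, gy)

    values = depth.ravel()
    mean, std = values.mean(), values.std()
    if std > 0:
        z = (values - mean) / std
        skewness = float(np.mean(z ** 3))        # asymmetry of the distribution
        kurtosis = float(np.mean(z ** 4) - 3.0)  # excess kurtosis (tail weight)
    else:
        skewness, kurtosis = 0.0, 0.0            # perfectly flat depth map

    # Shannon entropy (in bits) of the histogram of depth values.
    counts, _ = np.histogram(values, bins=n_bins)
    p = counts[counts > 0] / counts.sum()
    entropy = float(-(p * np.log2(p)).sum())

    return {
        "grad_mean": float(grad_mag.mean()),
        "grad_std": float(grad_mag.std()),
        "skewness": skewness,
        "kurtosis": kurtosis,
        "entropy": entropy,
    }
```

Calling `depth_statistics(depth)` on the output of any monocular depth estimator returns one dictionary per image, which can then be stacked into a feature table.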
2. Brightness-Depth Consistency
In real photographs, brightness and depth are physically coupled: for example, illumination falls off with distance from a light source, and shadows often coincide with depth changes. We measure:
- Pearson/Spearman correlation between brightness and depth
- Local patch correlations — spatial consistency of the brightness-depth relationship
- Brightness at depth edges — what happens to brightness where depth changes sharply
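The measurements above could be sketched roughly as follows, given a grayscale brightness image and a depth map of the same shape. Several choices here are assumptions for illustration only: the 16-pixel patch size, the 90th-percentile threshold that defines a "depth edge", and the reading of "brightness at depth edges" as the mean brightness-gradient magnitude at those pixels. The Spearman helper also ignores ties, which is a simplification.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation of two arrays (flattened)."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def spearman(a, b):
    """Spearman correlation: Pearson on ranks (ties are not averaged)."""
    ra = np.argsort(np.argsort(np.asarray(a).ravel()))
    rb = np.argsort(np.argsort(np.asarray(b).ravel()))
    return pearson(ra, rb)

def local_patch_correlations(brightness, depth, patch=16):
    """Pearson correlation inside each non-overlapping patch."""
    h, w = brightness.shape
    out = []
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            out.append(pearson(brightness[y:y + patch, x:x + patch],
                               depth[y:y + patch, x:x + patch]))
    return np.array(out)

def brightness_change_at_depth_edges(brightness, depth, quantile=0.9):
    """Mean brightness-gradient magnitude where the depth gradient is strongest."""
    dgy, dgx = np.gradient(np.asarray(depth, dtype=np.float64))
    bgy, bgx = np.gradient(np.asarray(brightness, dtype=np.float64))
    depth_grad = np.hypot(dgx, dgy)
    edge_mask = depth_grad >= np.quantile(depth_grad, quantile)
    return float(np.hypot(bgx, bgy)[edge_mask].mean())
```

Summaries of the per-patch values (for example their absolute mean, or the fraction above some correlation threshold) would then serve as image-level features.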
3. Light Estimation
Many real scenes are lit by a single dominant source, so one light model should explain most of the shading. AI-generated images can deviate from this in subtle ways, which we quantify with:
- Global residual — how well a single light model fits the scene
- Angular deviation — variance in estimated light direction across patches
- Fraction anomalous patches — percentage of regions with inconsistent lighting
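One way to make these three quantities concrete is a Lambertian least-squares fit. The sketch below is an assumption, not necessarily the model used in this project: it derives surface normals from the depth map, fits a single light vector so that brightness ≈ normal · light, and then repeats the fit per patch. The patch size, the 30° anomaly threshold, and all function names are hypothetical.

```python
import numpy as np

def normals_from_depth(depth):
    """Unit surface normals estimated from depth gradients."""
    gy, gx = np.gradient(np.asarray(depth, dtype=np.float64))
    n = np.dstack([-gx, -gy, np.ones_like(gx)])
    return n / np.linalg.norm(n, axis=2, keepdims=True)

def fit_light(normals, brightness):
    """Least-squares light vector L with brightness ≈ normals @ L, plus RMS residual."""
    A = normals.reshape(-1, 3)
    b = np.asarray(brightness, dtype=np.float64).reshape(-1)
    L, *_ = np.linalg.lstsq(A, b, rcond=None)
    residual = float(np.sqrt(np.mean((A @ L - b) ** 2)))
    return L, residual

def lighting_features(depth, brightness, patch=16, max_angle_deg=30.0):
    normals = normals_from_depth(depth)
    brightness = np.asarray(brightness, dtype=np.float64)
    L_global, global_residual = fit_light(normals, brightness)
    g_dir = L_global / (np.linalg.norm(L_global) + 1e-12)

    # Angle between each patch's fitted light direction and the global one.
    angles = []
    h, w = brightness.shape
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            L_p, _ = fit_light(normals[y:y + patch, x:x + patch],
                               brightness[y:y + patch, x:x + patch])
            norm = np.linalg.norm(L_p)
            if norm < 1e-8:
                continue  # skip patches with no usable shading signal
            cos = np.clip(np.dot(L_p / norm, g_dir), -1.0, 1.0)
            angles.append(np.degrees(np.arccos(cos)))
    angles = np.array(angles)

    return {
        "global_residual": global_residual,
        "angular_deviation": float(angles.std()) if angles.size else 0.0,
        "frac_anomalous": float(np.mean(angles > max_angle_deg)) if angles.size else 0.0,
    }
```

A scene that really is lit by one directional source gives a small global residual and patch directions that agree with the global estimate; disagreement between patches raises the angular deviation and the anomalous fraction.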
Key Findings
The Paradox of AI Images

AI-generated images are simultaneously too consistent AND too sharp:
| Finding | What It Means |
|---|---|
| Smooth brightness-depth | AI produces overly uniform brightness-depth relationships (d=0.445) |
| Symmetric depth distributions | Real images have skewed depth (0.556 vs 0.236) — AI makes everything too balanced |
| More uniform lighting | Fewer anomalous patches in AI images (41% vs 51%) |
| But sharper depth gradients | AI images have abrupt depth transitions (d=0.653) — the strongest signal |
The last finding is the most interesting. A plausible explanation is that generators reproduce learned textures rather than actual 3D geometry, so the estimated depth maps show sharp, almost cartoon-like boundaries where real scenes tend to have more gradual depth transitions.
Feature Ranking by Effect Size
The top discriminative features ranked by Cohen's d:
| Rank | Feature | Cohen's d | Direction |
|---|---|---|---|
| 1 | Depth gradient mean | 0.653 | Fake > Real |
| 2 | Brightness at depth edges | 0.547 | Real > Fake |
| 3 | Local correlation (abs mean) | 0.445 | Fake > Real |
| 4 | Fraction strong correlation | 0.428 | Fake > Real |
| 5 | Gradient magnitude correlation | 0.376 | Fake > Real |
| 6 | Depth skewness | 0.356 | Real > Fake |
| 7 | Fraction anomalous patches | 0.307 | Real > Fake |
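For reference, Cohen's d is the difference between two group means divided by their pooled standard deviation. The sketch below uses the standard pooled-variance form with sample variances; whether the values in the table were computed with exactly this variant is an assumption, and the function name is ours.

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Cohen's d with pooled sample standard deviation (positive if a > b)."""
    na, nb = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_var = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
    return (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(pooled_var)
```

With per-image feature values split into fake and real groups, `cohens_d(fake_values, real_values)` is positive when the feature is larger for fake images, matching the "Fake > Real" direction in the table. Sorting features by the absolute value of d gives the ranking.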
Less Is More: The Classifier

We trained logistic regression classifiers with leave-one-out cross-validation:
| Model | LOO Accuracy | F1 | Train Accuracy | Gap (Train − LOO) |
|---|---|---|---|---|
| All 27 features | 55.0% | 0.542 | 81.7% | 26.7pp |
| 3 features | 68.3% | 0.678 | 71.7% | 3.4pp |
The 3-feature model uses only grad_mean, brightness_at_depth_edges, and n_valid_patches, and it generalizes far better. Likely reasons:
- 27 features overfit on 60 samples
- The top features capture complementary physics violations
- The small train/test gap (3.4pp) suggests genuine signal rather than memorization
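The evaluation protocol can be sketched as follows: hold out one image, standardize the features using only the remaining images, fit a logistic regression, and score the held-out image. This is a minimal NumPy sketch rather than the project's code; the plain gradient-descent fit, the L2 strength, the learning rate, and the in-fold standardization are all assumptions.

```python
import numpy as np

def fit_logreg(X, y, l2=1.0, lr=0.1, n_iter=1000):
    """L2-regularised logistic regression fitted by gradient descent."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted P(fake)
        w -= lr * (X.T @ (p - y) / n + l2 * w / n)
        b -= lr * np.mean(p - y)
    return w, b

def loo_accuracy(X, y, **fit_kwargs):
    """Leave-one-out accuracy; labels are 0 (real) and 1 (fake)."""
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    n = len(y)
    correct = 0
    for i in range(n):
        train = np.arange(n) != i
        # Standardise with training-fold statistics only, to avoid leakage.
        mu = X[train].mean(axis=0)
        sd = X[train].std(axis=0)
        sd[sd == 0] = 1.0
        w, b = fit_logreg((X[train] - mu) / sd, y[train], **fit_kwargs)
        score = ((X[i] - mu) / sd) @ w + b
        correct += int((score > 0) == bool(y[i]))
    return correct / n
```

Running `loo_accuracy` once on all 27 columns and once on a 3-column subset reproduces the kind of comparison shown in the table, although the exact numbers depend on the regularization and optimizer settings.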
The Decision Rule
Classify as FAKE if:
1.43 × z(grad_mean) − 1.26 × z(brightness_at_depth_edges) + 0.04 > 0
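Translation: an image is likely AI-generated if it has sharp depth gradients (a sign of poorly modeled 3D geometry) combined with low brightness variation at depth edges (a sign of poorly modeled light-surface interaction). Here z(·) denotes a feature standardized to zero mean and unit variance. The rule as written shows only two of the model's three features; it includes no term for n_valid_patches.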
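Written as code, the rule looks like the sketch below. The coefficients come from the rule above; the `stats` dictionary holding each feature's training mean and standard deviation is hypothetical, and the placeholder values (mean 0, std 1) simply mean the inputs in the usage example are already z-scored.

```python
def z_score(value, mean, std):
    """Standardise a raw feature value with training-set statistics."""
    return (value - mean) / std

def classify(grad_mean, brightness_at_depth_edges, stats):
    """Return (is_fake, score) for one image using the two-term rule."""
    score = (
        1.43 * z_score(grad_mean, *stats["grad_mean"])
        - 1.26 * z_score(brightness_at_depth_edges, *stats["brightness_at_depth_edges"])
        + 0.04
    )
    return score > 0, score

# Placeholder statistics (mean, std); real values would come from the training set.
stats = {"grad_mean": (0.0, 1.0), "brightness_at_depth_edges": (0.0, 1.0)}

# One standard deviation above average depth sharpness, average edge brightness.
is_fake, score = classify(1.0, 0.0, stats)
```

In this example the score is 1.43 + 0.04 = 1.47, so the image is classified as fake. An image with average depth sharpness but edge brightness one standard deviation above the mean scores −1.26 + 0.04 = −1.22 and is classified as real.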
PCA Analysis

The 27 features compress into 6 principal components that capture ~77% of the variance. The first five, listed below, account for 71.3%:
| PC | Variance | Interpretation |
|---|---|---|
| PC1 | 24.1% | Scene complexity (entropy, skewness, kurtosis) |
| PC2 | 15.3% | Lighting quality (angular deviation, anomalous patches) |
| PC3 | 13.8% | Brightness-depth coupling (pearson_r, spearman_r) |
| PC4 | 10.6% | Depth distribution shape (std, iqr, grad_mean) |
| PC5 | 7.5% | Information content (mutual_information) |
Critical insight: The real/fake signal is NOT the dominant axis of variation. Scene-level variation (complexity, depth range) dominates, which is why targeted feature selection outperforms using all features.
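The explained-variance figures above come from a standard PCA, which can be sketched with an SVD of the standardized feature matrix. The function names are ours, and standardizing each feature before the decomposition is an assumption about the preprocessing.

```python
import numpy as np

def pca_explained_variance(X):
    """Explained-variance ratios and component directions of standardised X."""
    X = np.asarray(X, dtype=np.float64)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # assumes no constant columns
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    var = s ** 2
    return var / var.sum(), vt                  # ratios are sorted descending

def n_components_for(ratios, target):
    """Smallest number of leading components whose ratios sum to at least target."""
    cumulative = np.cumsum(ratios)
    k = int(np.searchsorted(cumulative, target)) + 1
    return min(k, len(cumulative))

# Ratios for PC1 to PC5 taken from the table above.
table_ratios = [0.241, 0.153, 0.138, 0.106, 0.075]
k = n_components_for(table_ratios, 0.70)
```

Here `k` is 5, since the cumulative sums are 0.241, 0.394, 0.532, 0.638, and 0.713. To check whether any component separates real from fake, one could project the standardized features onto the rows of `vt` and compute an effect size per component.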
Limitations & Future Work
- Sample size: 30 images per class is enough only for a proof of concept; scaling to thousands of images would strengthen the results
- Generator diversity: Tested on AIGenBench; extending to DALL-E 3, Midjourney v6, Flux would test generalizability
- Depth estimator dependency: Results depend on the quality of the monocular depth model
- Complementary approaches: Physics features could be combined with frequency-domain or learned detectors for higher accuracy
Conclusion
This project suggests that physics-based features can help detect AI-generated images without relying on generator-specific training, although the signal is modest (68.3% leave-one-out accuracy on 60 images). The key insight is that AI images are paradoxically too consistent while having unnaturally sharp depth transitions. This points to a limitation of current image generators: they learn to approximate visual appearance without modeling the physical world that produces it.