AI-driven industrial robotic arms rely on compressed vision models to perform high-precision tasks such as component classification and sorting on resource-constrained edge hardware. Post-training quantization (PTQ), the standard technique for shrinking models to fit small accelerators, changes weight and activation distributions in ways that can silently degrade robustness, particularly under the physical perturbations common on factory floors: dust, shadows, reflections, and partial occlusions. ARCADE investigates how FP8 post-training quantization affects model resilience in realistic industrial conditions, going beyond synthetic benchmarks to evaluate robustness under authentic, physically induced corruptions.

The project makes three contributions. First, it introduces ScrewSet, a large-scale industrial image-classification benchmark comprising 102,400 clean RGB images of 40 screw types photographed from 32 robot-controlled angles, together with ScrewSet-C, a companion set of 7,680 images exhibiting six types of real, physically induced corruptions: sawdust, shadow, reflection, and three forms of occlusion. Unlike standard robustness benchmarks such as CIFAR-10-C and ImageNet-C, which apply pixel-level synthetic noise, ScrewSet-C captures the structured, spatially meaningful perturbations that edge-deployed models actually encounter in manufacturing.

Second, the project conducts a systematic evaluation of FP8 PTQ on MobileNetV3-Small across ScrewSet, CIFAR-10, and ImageNet. Results reveal that the FP8 E4M3 format best preserves both clean and corrupted accuracy, while formats with less mantissa precision (E5M2) or fewer bits (FP4) suffer severe or total collapse. Crucially, performance on synthetic benchmarks dramatically overstates real-world robustness: models that retain over 90 percent accuracy on ScrewSet lose nearly all accuracy on ScrewSet-C, exposing a critical gap between laboratory evaluation and factory deployment.
Third, the project develops Hessian-spectrum diagnostics — lightweight, label-free geometric tools based on effective dimensionality and spectral curvature — that predict how much accuracy a quantized model will lose under distribution shift. These diagnostics require only a small calibration sample and no retraining, offering a practical toolkit for selecting among quantization formats before deployment.
All data were collected using a Yahboom DOFBOT AI-Vision robotic arm with a 0.3 MP USB industrial camera, controlled by ROS and Python 3 on a Raspberry Pi 5. Motion scripts guide the arm to sample each screw from 32 positions on a spherical grid, recording an image and the arm's pose at each angle.
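The exact layout of the spherical grid is not specified beyond 32 views per screw; a minimal sketch of the pose-generation step, assuming a regular grid of 4 elevation rings by 8 azimuths over the −35° to +55° elevation range reported for ScrewSet (function name and grid split are illustrative):

```python
import itertools

def spherical_view_grid(n_elev=4, n_azim=8, elev_min=-35.0, elev_max=55.0):
    """Yield (elevation, azimuth) pairs in degrees for a regular spherical grid.

    The arm's motion script would visit each pose in turn, capture an image,
    and log the pose alongside it.
    """
    elev_step = (elev_max - elev_min) / (n_elev - 1)
    for i, j in itertools.product(range(n_elev), range(n_azim)):
        elevation = elev_min + i * elev_step   # evenly spaced elevation rings
        azimuth = j * (360.0 / n_azim)         # evenly spaced azimuths per ring
        yield elevation, azimuth

views = list(spherical_view_grid())  # 4 x 8 = 32 poses, one per recorded angle
```

With these assumed parameters the grid yields exactly the 32 robot-controlled angles per screw, with elevation rings at −35°, −5°, 25°, and 55°.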
ScrewSet and ScrewSet-C. ScrewSet contains 102,400 RGB images of 40 industrial screw types, photographed from 32 robot-controlled angles on four backgrounds and covering elevation angles from −35° to +55°. Both standard and slightly defective examples are included. ScrewSet-C uses the same screws and views, adding six types of real, physical corruptions: sawdust, shadow, reflection, occlusion (top left), occlusion (bottom right), and multiple objects. This yields 7,680 images (40 screw types × 32 views × 6 corruptions) capturing the authentic perturbations that AI models encounter on factory floors: corruptions that are spatially structured and semantically meaningful, unlike the pixel-level noise of synthetic benchmarks.
FP8 Quantization Analysis. The project systematically benchmarks MobileNetV3-Small quantized to FP8 (E4M3, E5M2) and FP4 using post-training quantization, comparing against FP32 baselines across ScrewSet/-C, CIFAR-10/-C, ImageNet/-C, and ImageNet-A. FP8-E4M3 consistently preserves clean accuracy closest to FP32 and incurs only minor losses under corruption. FP4, by contrast, collapses to chance-level performance without retraining, making it unusable for classification. On ScrewSet-C, even FP32 models that achieve over 90 percent clean accuracy drop to under 10 percent mean corruption accuracy — demonstrating that physical corruptions pose qualitatively different challenges from synthetic noise.
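For intuition about why E4M3 degrades gracefully while lower-precision formats collapse, here is a minimal pure-Python sketch of round-to-nearest quantization to the E4M3 format (1 sign bit, 4 exponent bits, 3 mantissa bits, exponent bias 7, maximum finite value 448). The project would use a PTQ toolkit rather than this scalar routine; the sketch only illustrates the format's dynamic range and precision:

```python
import math

def quantize_e4m3(x: float) -> float:
    """Round x to the nearest finite value representable in FP8 E4M3."""
    if x == 0.0:
        return 0.0
    sign = math.copysign(1.0, x)
    a = min(abs(x), 448.0)          # saturate at the largest finite E4M3 value
    # Exponent of the binade containing a, clamped to E4M3's exponent range
    # (e = -6 also covers subnormals, whose spacing is 2**-9).
    e = min(max(math.floor(math.log2(a)), -6), 8)
    step = 2.0 ** (e - 3)           # spacing between representable values here
    q = round(a / step) * step      # round-half-to-even, as FP hardware does
    return sign * min(q, 448.0)
```

With only 3 mantissa bits there are just 8 representable steps per binade, so relative rounding error is around 6 percent, and anything above 448 saturates; E5M2 halves the mantissa steps for extra exponent range, and FP4 keeps so few values that classification logits collapse, consistent with the results above.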
Hessian-Spectrum Diagnostics. The project develops geometric diagnostics based on the Hessian eigenspectrum of quantized models. By computing effective dimensionality, spectral entropy, and curvature shifts across quantization formats and corruption types, these metrics predict accuracy loss without requiring labels or retraining. The slope of the change in effective dimensionality is the most consistent predictor of robustness loss, with strong negative correlations (Spearman ρ up to −0.91) between spectral shift and accuracy drop. These findings hold across both MobileNetV3 and ShuffleNetV2 backbones, confirming cross-architecture generality.
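The report does not give closed forms for these metrics; one common definition of effective dimensionality is N_eff = Σ_i λ_i / (λ_i + α) over the Hessian eigenvalues λ_i, paired with the Shannon entropy of the normalized spectrum. A sketch under those assumed definitions (function names, α, and the synthetic spectra are illustrative):

```python
import math

def effective_dimensionality(eigvals, alpha=1.0):
    """N_eff = sum_i lam_i / (lam_i + alpha): counts eigendirections whose
    curvature is large relative to the regularization scale alpha."""
    return sum(lam / (lam + alpha) for lam in eigvals if lam > 0)

def spectral_entropy(eigvals):
    """Shannon entropy of the positive eigenspectrum, normalized to sum to 1."""
    pos = [lam for lam in eigvals if lam > 0]
    total = sum(pos)
    return -sum((lam / total) * math.log(lam / total) for lam in pos)

# Curvature spread over many directions -> high N_eff and entropy;
# curvature concentrated in one dominant mode -> both metrics drop.
flat = [1.0] * 8
peaked = [8.0] + [1e-6] * 7
```

A quantization format or corruption that concentrates curvature into fewer directions lowers both quantities, and tracking the size of that spectral shift across formats is what lets the diagnostics anticipate accuracy loss from a small unlabeled calibration sample.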