Quantize NafNet deblurring: ORT pipeline + validation script + robust handling#299
Hariom-Nagar211 wants to merge 2 commits into opencv:main
Conversation
Commits:
- …t from 4-point corners); keep original quad dets. Fixes opencv#275
- …robust quantization handling
jay7-tech left a comment:
Went through the changes. A few things worth resolving before merge:
- HandAlign lazy-load is incomplete
The PR description says HandAlign is lazy-loaded to avoid importing the palm detector on unrelated runs. But `HandAlign.__init__` still initializes the palm detector at construction time, not inside `__call__`. The real problem is that the `models = dict(...)` block is at module level, so every Quantize object (including mp_handpose with its HandAlign) gets instantiated on import, regardless of what the user passes on the command line. The fix is to defer `models` construction inside `if __name__ == '__main__':`, or to make DataReader construction lazy until `.run()` is called.
- `calibration_image_dir='path/to/dataset'` in the existing mp_palmdet and mp_handpose entries (in main)
This is unrelated to this PR but is a live crash on main right now: `python quantize-ort.py mp_palmdet` immediately throws `FileNotFoundError: 'path/to/dataset'`. It should be fixed alongside this PR, since this PR touches the same file.
- max_samples=1 for NafNet calibration
MinMax calibration on a single sample produces scales tuned to one image's dynamic range, which results in severe accuracy loss on diverse inputs. At least 64 samples would give representative ranges; if that is not feasible, the tradeoff should at minimum be documented.
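A configurable sample cap makes the memory/accuracy tradeoff explicit. The sketch below is illustrative, not the repo's actual DataReader; it only assumes ORT's convention that the static quantizer calls `get_next()` until it returns `None`:

```python
class ImageCalibrationReader:
    """Minimal stand-in for an ORT CalibrationDataReader: serves at most
    max_samples inputs. Raising max_samples (e.g. to 64) widens the MinMax
    statistics at the cost of holding more preprocessed images in memory."""

    def __init__(self, images, input_name='input', max_samples=64):
        self._it = iter(images[:max_samples])
        self._input_name = input_name

    def get_next(self):
        # ORT's static quantizer calls get_next() until it returns None.
        img = next(self._it, None)
        return None if img is None else {self._input_name: img}
```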
The validate_quantization.py PSNR/SSIM comparison script is a great addition.
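The deferred-construction fix suggested in the first point can be sketched with factories that are only invoked for the model the user selected. Names here (`make_quantizer`, `build_models`) are hypothetical stand-ins for the script's real `Quantize(...)` entries:

```python
created = []

def make_quantizer(name):
    # Placeholder for the real Quantize(...) constructor; records that a
    # (potentially expensive) model object was actually built.
    created.append(name)
    return {'name': name}

# Factories are cheap closures: nothing heavy runs at import time.
MODEL_FACTORIES = {
    'deblurring_nafnet': lambda: make_quantizer('deblurring_nafnet'),
    'mp_handpose': lambda: make_quantizer('mp_handpose'),
}

def build_models(selected=None):
    """Instantiate only the requested model (or all, if none is given)."""
    names = [selected] if selected else list(MODEL_FACTORIES)
    return {n: MODEL_FACTORIES[n]() for n in names}
```

Calling `build_models('deblurring_nafnet')` from `if __name__ == '__main__':` then leaves mp_handpose (and its HandAlign) untouched.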
Summary
Adds ONNX Runtime static quantization support for deblurring_nafnet, and a validation script to compare FP32 vs INT8 outputs with PSNR/SSIM and timing.
Improves quantization robustness and memory handling across the tools.
Changes
tools/quantize/quantize-ort.py
Adds deblurring_nafnet entry to the central models dict.
Uses QuantFormat.QOperator + op_types_to_quantize=['Conv','MatMul'] for better CPU EP performance, with an automatic fallback to QDQ Conv-only when needed.
Optional pre-processing (skips if onnxruntime-extensions is missing).
Lazy `DataReader` creation to avoid importing all datasets at module import.
MinMax calibration; input resize to 512x512; max_samples=1 to reduce memory.
Auto-excludes Conv nodes without bias initializers to avoid bias=None errors.
tools/quantize/transform.py
Makes `HandAlign` lazy-load its palm detector so unrelated quantization runs don't load the Mediapipe ONNX at import time.
models/deblurring_nafnet/validate_quantization.py
New: Runs FP32 and INT8 models via ORT, reports timing, PSNR, SSIM, and shows side-by-side results.
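The PSNR half of such a comparison is a few lines of NumPy (SSIM typically comes from `skimage.metrics.structural_similarity`); this is a sketch of the metric, not the script's exact code:

```python
import numpy as np

def psnr(ref, test, data_range=255.0):
    """Peak signal-to-noise ratio between FP32 and INT8 outputs, in dB.
    Higher is better; identical images give infinity."""
    ref = np.asarray(ref, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    mse = np.mean((ref - test) ** 2)
    if mse == 0.0:
        return float('inf')
    return 10.0 * np.log10(data_range ** 2 / mse)
```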
Usage
Quantize NafNet:
```shell
cd tools/quantize
python quantize-ort.py deblurring_nafnet
```
Output: `models/deblurring_nafnet/deblurring_nafnet_2025may_int8.onnx`
Validate:
```shell
cd models/deblurring_nafnet
python validate_quantization.py --input example_outputs/licenseplate_motion.jpg --model_fp32 deblurring_nafnet_2025may.onnx --model_int8 deblurring_nafnet_2025may_int8.onnx
```
Notes on performance
On CPUExecutionProvider, INT8 speed-ups are not guaranteed unless the provider fuses/accelerates int8 kernels.
For acceleration, consider provider-specific EPs: OpenVINOExecutionProvider (CPU) or DmlExecutionProvider (GPU), if available.
The script quantizes to QOperator Conv/MatMul by default; falls back to QDQ Conv-only for robustness.
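The try-then-fall-back behavior can be sketched as a small wrapper. To stay self-contained, `run_quant` is injected; in the real script it would wrap `onnxruntime.quantization.quantize_static` with `QuantFormat.QOperator` or `QuantFormat.QDQ`:

```python
def quantize_with_fallback(run_quant):
    """Try QOperator with Conv+MatMul first; if the quantizer rejects
    the model, retry with QDQ format and Conv only. run_quant(fmt, ops)
    is assumed to perform the actual static quantization call."""
    try:
        return run_quant('QOperator', ['Conv', 'MatMul'])
    except Exception:
        # Broad catch: ORT raises various error types for unsupported
        # op/shape combinations; the fallback path is deliberately narrow.
        return run_quant('QDQ', ['Conv'])
```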
Known limitations
Quantization depends on ORT support for the model’s ops and shapes; the script includes automatic mitigations (resize, sample limit, excludes).
We do not commit generated ONNX or output images.