Abstract

Deep learning has revolutionized genomic variant calling, yet the computational cost of current systems continues to limit scalability. We present a controlled efficiency study of DeepVariant-style pileup architectures under identical training and inference conditions, comparing architectural downsizing with a hybrid CNN–local attention design. Inc3ViTs pairs a streamlined InceptionV3 stem with a lightweight local attention head based on patch tokenization and windowed self-attention, enabling a direct comparison with a CNN-only reduced Inception baseline. Across whole-genome and whole-exome short-read datasets, Inc3ViTs cuts training time by ~40–50% and lowers inference runtime relative to the original DeepVariant. The CNN-only baseline indicates that most speedups stem from architectural simplification, whereas the hybrid design achieves competitive accuracy, with statistically comparable SNP performance and marginal INDEL trade-offs at high coverage, alongside improved robustness at lower sequencing depth. Our evaluation is restricted to GIAB short-read small-variant benchmarking (SNPs and INDELs); structural variants and long-read settings were not evaluated. Experiments were conducted on a single workstation equipped with an RTX 3070 Ti under controlled hardware conditions.
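The local attention head described above combines patch tokenization with windowed self-attention. As a minimal sketch of these two operations, the NumPy snippet below tokenizes a pileup-style feature map into non-overlapping patches and applies single-head self-attention restricted to fixed-size token windows. All concrete values (patch size, window size, projection dimension) and function names are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

def patch_tokenize(feature_map, patch=2):
    # feature_map: (H, W, C). Split into non-overlapping patch x patch blocks,
    # flattening each block into one token vector of length patch*patch*C.
    H, W, C = feature_map.shape
    h, w = H // patch, W // patch
    x = feature_map[:h * patch, :w * patch].reshape(h, patch, w, patch, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(h * w, patch * patch * C)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def windowed_self_attention(tokens, window=4, d_k=16, rng=None):
    # tokens: (N, D). Attention is computed only within consecutive,
    # non-overlapping windows of `window` tokens, so cost grows linearly
    # in N instead of quadratically (the core saving of local attention).
    rng = np.random.default_rng(0) if rng is None else rng
    N, D = tokens.shape
    Wq, Wk, Wv = (rng.standard_normal((D, d_k)) / np.sqrt(D) for _ in range(3))
    out = np.empty((N, d_k))
    for s in range(0, N, window):
        t = tokens[s:s + window]
        q, k, v = t @ Wq, t @ Wk, t @ Wv
        attn = softmax(q @ k.T / np.sqrt(d_k))  # (window, window) scores
        out[s:s + window] = attn @ v
    return out

# Usage on a toy 8x8 "pileup" with 4 channels: 16 patch tokens, windowed attention.
fm = np.random.default_rng(1).standard_normal((8, 8, 4))
toks = patch_tokenize(fm, patch=2)           # shape (16, 16)
feat = windowed_self_attention(toks, window=4, d_k=8)  # shape (16, 8)
```

Because each window attends only to itself, the attention matrices stay small and fixed-size regardless of input resolution, which is what makes such a head cheap enough to pair with a pruned CNN stem.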

Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.
