Abstract

Deep learning has revolutionized genomic variant calling, yet the computational cost of current systems continues to limit scalability. We present a controlled efficiency study of DeepVariant-style pileup architectures under identical training and inference conditions, comparing architectural downsizing with a hybrid CNN–local attention design. Inc3ViTs pairs a streamlined InceptionV3 stem with a lightweight local attention head based on patch tokenization and windowed self-attention, enabling a direct comparison with a CNN-only reduced Inception baseline. Across whole-genome and whole-exome short-read datasets, Inc3ViTs cuts training time by ~40–50% and lowers inference runtime relative to the original DeepVariant. The CNN-only baseline indicates that most speedups stem from architectural simplification, whereas the hybrid design achieves competitive accuracy, with statistically comparable SNP performance and marginal INDEL trade-offs at high coverage, alongside improved robustness at lower sequencing depth. Our evaluation is restricted to GIAB short-read small-variant benchmarking (SNPs and INDELs); structural variants and long-read settings were not evaluated. Experiments were conducted on a single workstation equipped with an RTX 3070 Ti under controlled hardware conditions.
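The local attention head described above combines patch tokenization with windowed self-attention. As a minimal sketch of these two operations, the NumPy snippet below tokenizes a pileup-style feature map into non-overlapping patches and applies single-head self-attention restricted to fixed-size token windows. All concrete values (patch size, window size, projection dimension) and function names are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

def patch_tokenize(feature_map, patch=2):
    # feature_map: (H, W, C). Split into non-overlapping patch x patch blocks,
    # flattening each block into one token vector of length patch*patch*C.
    H, W, C = feature_map.shape
    h, w = H // patch, W // patch
    x = feature_map[:h * patch, :w * patch].reshape(h, patch, w, patch, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(h * w, patch * patch * C)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def windowed_self_attention(tokens, window=4, d_k=16, rng=None):
    # tokens: (N, D). Attention is computed only within consecutive,
    # non-overlapping windows of `window` tokens, so cost grows linearly
    # in N instead of quadratically (the core saving of local attention).
    rng = np.random.default_rng(0) if rng is None else rng
    N, D = tokens.shape
    Wq, Wk, Wv = (rng.standard_normal((D, d_k)) / np.sqrt(D) for _ in range(3))
    out = np.empty((N, d_k))
    for s in range(0, N, window):
        t = tokens[s:s + window]
        q, k, v = t @ Wq, t @ Wk, t @ Wv
        attn = softmax(q @ k.T / np.sqrt(d_k))  # (window, window) scores
        out[s:s + window] = attn @ v
    return out

# Usage on a toy 8x8 "pileup" with 4 channels: 16 patch tokens, windowed attention.
fm = np.random.default_rng(1).standard_normal((8, 8, 4))
toks = patch_tokenize(fm, patch=2)           # shape (16, 16)
feat = windowed_self_attention(toks, window=4, d_k=8)  # shape (16, 8)
```

Because each window attends only to itself, the attention matrices stay small and fixed-size regardless of input resolution, which is what makes such a head cheap enough to pair with a pruned CNN stem.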

Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.
