
Abstract

A crime is an illegal or violent act committed by one individual against another. The rising crime rate has become a major concern because it lowers people's quality of life and generates significant social and economic costs. This study aims to identify the most widely used machine learning (ML) models for crime prediction, determine the evaluation metrics used to assess model performance, and analyze the key data characteristics that support real-world implementation. The study follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) methodology. A search string was formulated using the population, intervention, comparison, and outcomes (PICO) framework and applied to the Scopus and Web of Science databases. After applying eligibility criteria, 50 articles were selected for in-depth analysis. The findings indicate that the most prominent ML models include extreme gradient boosting (XGBoost), random forest (RF), gradient boosting decision trees (GBDT), and auto-regressive integrated moving average (ARIMA), as well as deep learning models such as long short-term memory (LSTM), which showed high performance in dynamic urban environments. The most relevant metrics for classification are accuracy, recall, F1-score, precision, and area under the curve (AUC), while for regression, mean absolute error (MAE), root mean squared error (RMSE), and R-squared (R²) are preferred. Key data features include date, time, age, gender, education level, location, and coordinates. Additionally, integrating climate and temperature data is recommended. This study provides a structured analysis of crime prediction models and proposes an architecture for their development and deployment, offering valuable insights for future research and practical applications.

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.
