
Artificial intelligence model system for bone age assessment of preschool children


Bone age (BA) serves as a quantitative measure of skeletal maturity.1 The use of X-ray wrist images for bone age assessment (BAA) in children has become widespread.10,11,12,13,14 In the past, manual BAA methods required observers to meticulously compare or score individual bones.16 Deep learning (DL) offers a faster and more consistent solution. We divided multiple observers into groups, compared their diagnostic accuracy with and without AI assistance, and assessed inter-observer consistency and intra-observer reproducibility. The findings reveal that experienced radiologists can improve the precision of BAA with the aid of AI. At the same time, AI can reduce inter-observer variability and improve intra-observer reproducibility.

AI is a prominent technology in medical imaging, with applications that include the diagnosis of lung nodules and the detection of bladder cancer.17,18 DL can precisely quantify the shape and position of each target bone in the wrist for BAA, an application that dates back to 2017.19 Researchers now build algorithmic models that predict BA rapidly and accurately,20 drawing on large repositories of images.9,21 Spampinato et al.9 were pioneers in applying DL to BAA, reporting an average deviation of about 0.8 years compared with manual evaluations. In 2020, Reddy et al.22 used a publicly available anonymized dataset from the Radiological Society of North America pediatric bone age challenge.2 The mean absolute errors (MAEs) of their whole-hand and index-finger models were comparable (0.392 years vs. 0.425 years, p = 0.14), and both were significantly smaller than the MAE of three pediatric radiologists reading single-finger radiographs (0.667 years, p < 0.0001). Larson et al.21 developed a DL model for BAA using 12,611 clinical hand radiographs and the corresponding clinical radiology reports, which were based on the Greulich and Pyle (GP) atlas. The mean difference between the model's BA estimates and those of the reviewers was 0 years, with a mean root mean square error (RMSE) and MAE of 0.63 and 0.50 years, respectively, and all assessments fell within each other's 95% limits of agreement. A Residual Network (ResNet) model effectively extracted X-ray bone image features and determined bone age autonomously, achieving a BA prediction accuracy of 97.6% and an MAE of 0.455 years.12 AI models have consistently demonstrated high accuracy in BAA,21,22,23 and the results of this study reaffirm this. Radiologists can improve their diagnostic accuracy in BA evaluation with the assistance of AI models.
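The accuracy figures cited above (MAE, RMSE, and the mean difference between model and reviewer estimates) are standard summaries of paired errors. As a minimal sketch, assuming paired estimated and reference bone ages in years, they can be computed as follows. `error_metrics` is a hypothetical helper, not code from this study or any cited study; it also reports the share of cases within a tolerance (1 year by default), the "accuracy within 1 year" type of metric reported in some studies of AI-assisted reading.

```python
import math
import statistics


def error_metrics(predicted, reference, tolerance_years=1.0):
    """Summarize how closely estimated bone ages match reference bone ages.

    predicted, reference: equal-length sequences of ages in years, paired by
    radiograph (e.g. observer or model estimate vs. reference standard).
    """
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    # Signed error per radiograph: positive means the estimate is too old.
    diffs = [p - r for p, r in zip(predicted, reference)]
    return {
        # Mean absolute error: average size of the error, ignoring direction.
        "mae": statistics.fmean([abs(d) for d in diffs]),
        # Root mean square error: penalizes large errors more heavily than MAE.
        "rmse": math.sqrt(statistics.fmean([d * d for d in diffs])),
        # Mean and median signed difference: systematic over- or underestimation.
        "mean_diff": statistics.fmean(diffs),
        "median_diff": statistics.median(diffs),
        # Share of radiographs whose estimate lies within the tolerance.
        "within_tolerance": sum(abs(d) <= tolerance_years for d in diffs) / len(diffs),
    }
```

For example, `error_metrics(observer_ages, reference_ages)` could be run once on readings made without AI and once on readings made with AI to compare the two scenarios; the variable names are hypothetical.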

Environmental and ethnic factors influence bone development to varying degrees, leading to differing outcomes in BAA. We employed two BAA methods commonly applied to Chinese children, the TW3 method and the RUS-CHN method, both of which are widely used to assess preschool children. The TW3 method, published in 2001, evaluates and scores the maturity of each region-of-interest bone and draws its reference data from children living in Europe and America.4 It is a quantitative approach that scores and sums 20 hand-wrist bones; its strong objectivity yields highly accurate assessments, with a precision of less than one month.24 However, it is time-consuming and entails a complex evaluation process. Several studies have confirmed the high accuracy of the TW3 method for BAA.3,25 In a sample of British children, TW3-derived BA underestimated chronological age (CA) in females older than 3 years, resulting in a significant difference between BA and CA (−0.43 years, p < 0.001), whereas no such difference was observed in males (0.01 years, p = 0.760).3 In an analysis of 9059 clinical left-hand radiographs, an optimized TW3-AI system for BAA showed strong concordance with the overall assessment of reviewers, with an RMSE of 0.50 years.25 In our study, with the aid of the AI model system, the RMSE of observations by mid-level doctors decreased from 0.358 to 0.151. This further suggests that AI can narrow the gap between observers' BAA results and the reference standard in the TW3 method, thereby helping physicians improve diagnostic accuracy.

In 2006, researchers5 revised the TW3-based standards using samples from urban areas in China and established the RUS-CHN method. Building on the original bone development framework of the TW3 method, the RUS-CHN method identifies new maturity characteristics that better reflect the actual skeletal condition of children during rapid growth and development. It also subdivides the prolonged fusion process of the radius and ulna into five distinct grades, improving accuracy across the entire growth and development period.26 However, the RUS-CHN method requires more steps, takes longer to perform, and is challenging to master. In a preliminary study of 390 preschool children, our team observed that although the TW3 method outperformed the RUS-CHN method, it was not entirely reliable on its own, because both methods tended to overestimate age in both sexes. Nevertheless, the median difference for the TW3 method approached zero.27 In the current study, when observers used the RUS-CHN method without and with AI assistance, the RMSE was 0.359 and 0.148 and the MAE was 0.309 and 0.113, respectively, indicating high diagnostic performance. Moreover, AI assistance can further improve observers' diagnostic accuracy.
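Both methods discussed above rest on the TW3 framework of rating the maturity stage of individual bones, converting each stage to a score, and summing the scores. A minimal sketch of the final step follows, assuming the total score is converted to BA by linear interpolation in a sex-specific reference table. `bone_age_from_scores` is a hypothetical helper, and any bone names, scores, or table rows passed to it are placeholders, not the published TW3 or RUS-CHN tables.

```python
from bisect import bisect_right


def bone_age_from_scores(stage_scores, reference_table):
    """Sum per-bone maturity scores and convert the total to a bone age.

    stage_scores: dict of bone name -> maturity score for the rated stage.
        Real scores come from the method's published tables; any values used
        with this sketch are placeholders.
    reference_table: list of (total_score, bone_age_years) pairs sorted by
        strictly increasing total_score (assumed sex-specific; placeholder).
    """
    total = sum(stage_scores.values())
    scores = [s for s, _ in reference_table]
    ages = [a for _, a in reference_table]
    # Clamp totals that fall outside the table's range.
    if total <= scores[0]:
        return ages[0]
    if total >= scores[-1]:
        return ages[-1]
    # Find the bracketing rows: scores[i - 1] <= total < scores[i].
    i = bisect_right(scores, total)
    s0, s1 = scores[i - 1], scores[i]
    a0, a1 = ages[i - 1], ages[i]
    # Linear interpolation between the two rows (an assumption of this sketch).
    return a0 + (a1 - a0) * (total - s0) / (s1 - s0)
```

Keeping the score tables and the reference table as inputs separates the shared scoring pipeline from the method-specific reference data, which is where TW3 and RUS-CHN differ.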

Applying AI systems to BAA presents two primary challenges: ensuring inter-observer consistency and ensuring intra-observer reproducibility. In an investigation involving American children, researchers compared the BAA performance of a group of pediatric radiologists with and without AI support. With AI assistance, BAA accuracy improved: overall accuracy was 68.2% versus 63.6%, and accuracy within 1 year was 98.6% versus 97.4%. In addition, the intraclass correlation coefficient (ICC) was 0.9951 with AI and 0.9914 without it.10 Lee KC et al.28 found that a deep learning-based model was accurate in BAA across a total of 102 hand radiographs. It also appeared to improve clinical efficacy by increasing inter-observer reliability, raising the ICC between the two observers from 0.945 to 0.990 with AI. More recently, Wang X et al.15 concluded that an AI model improves both the accuracy and the consistency of BAA for physicians at all experience levels. The accuracy of senior, mid-level, and junior physicians was significantly better with AI assistance than without it (MAEs of 0.325, 0.344, and 0.370 vs. 0.403, 0.469, and 0.755, respectively), and so was their consistency (ICCs of 0.996, 0.996, and 0.992 vs. 0.987, 0.989, and 0.941, respectively). In this study, for inter-observer agreement, the ICC values for both BAA methods reached 0.991 in the 1st interpretation with the aid of AI. For intra-observer reproducibility between the 1st and 2nd interpretations, the ICC increased to 0.998 for the TW3 method and to 0.997 for the RUS-CHN method (Reviewer 4). Bland-Altman plots also showed excellent agreement among the reviewers for both methods. Using AI-assisted software in BAA can therefore help reviewers reduce both inter-observer and intra-observer variability.
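The agreement results above are reported as ICCs and Bland-Altman plots, but the section does not state which ICC form was used. As a minimal sketch, the code below assumes ICC(2,1) in the Shrout and Fleiss convention (two-way random effects, absolute agreement, single rater), computed from two-way ANOVA mean squares, and computes the Bland-Altman bias with limits of agreement at the mean difference ± 1.96 SD. The function names `icc_2_1` and `bland_altman` are hypothetical. Rows are radiographs; columns are observers (inter-observer agreement) or the 1st and 2nd readings of one observer (intra-observer reproducibility).

```python
import statistics


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: one row per radiograph, one column per observer (inter-observer)
    or per reading session of one observer (intra-observer).
    """
    n = len(ratings)
    k = len(ratings[0]) if ratings else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a rectangular table with >= 2 rows and >= 2 columns")
    grand = statistics.fmean([y for row in ratings for y in row])
    row_means = [statistics.fmean(row) for row in ratings]
    col_means = [statistics.fmean([row[j] for row in ratings]) for j in range(k)]
    # Two-way ANOVA sums of squares: subjects (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((y - grand) ** 2 for row in ratings for y in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    # Shrout-Fleiss ICC(2,1); systematic rater offsets lower the value.
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def bland_altman(a, b, z=1.96):
    """Return (bias, lower limit, upper limit) of agreement for paired readings."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length sequences with >= 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    bias = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - z * sd, bias + z * sd
```

Unlike a Pearson correlation, this ICC form drops when one reader is consistently older or younger than another, which fits the goal of checking absolute agreement between observers.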

The development of AI software has simplified and accelerated the BAA process. Numerous studies have compared BAA results between AI tools and radiologists,13,16,21,28,29,30 and their findings confirm that AI can improve diagnostic accuracy. However, relying solely on AI results without confirmation from a radiologist is not considered reliable.31 AI software is therefore intended to help radiologists make faster and more accurate diagnoses rather than to replace them outright. In this study, two scenarios were established for observers, one with and one without the AI model system, and BAA accuracy was calculated separately for each. Our results align with previous findings and further support that AI can help radiologists improve the accuracy of BAA, particularly in preschool children, using both the TW3 and RUS-CHN methods.

The present study has several limitations: 1) This is a single-center, cross-sectional study with a small sample size, focused only on a specific population of children aged 3–6 years in China. 2) The study compared only the TW3 and RUS-CHN methods; other methods, such as the GP method, which is commonly used in various regions and hospitals, were not considered. 3) All observers in this study were mid-level attending physicians, so there was no comparison with physicians of other levels, such as junior and senior physicians. 4) The time taken for bone age assessment was not recorded, even though previous studies have found that AI can reduce assessment time; time consumption should be compared in future work. Therefore, further multicenter studies incorporating various BAA methods and observers with different levels of experience are needed to validate these findings.

During the process of BAA for preschool children, the use of AI model systems can significantly improve not only the diagnostic accuracy of physicians but also the consistency among observers and the reproducibility within observers. As a result, AI model systems hold great promise for X-ray hand-wrist bone age assessment and are a valuable tool in the clinical work of radiologists.


