Screening and diagnosis of cardiovascular disease using artificial intelligence-enabled cardiac magnetic resonance imaging

May 16, 2024

Ethics approval

The CMR datasets were acquired retrospectively under the approval of the institutional review boards (IRBs) at each participating institution, including Beijing Fuwai Hospital, Beijing Anzhen Hospital, Guangdong Provincial People’s Hospital, the 2nd Affiliated Hospital of Harbin Medical University, the First Hospital of Lanzhou University, Renji Hospital, Tongji Hospital and Peking Union Medical College Hospital. Informed consent was waived by the IRBs. Before model training, testing and reader studies, all data underwent deidentification processes.

Datasets

The CMR database search was performed for all eight centers to identify CVDs and normal controls. All data were anonymized and deidentified, as per the Health Insurance Portability and Accountability Act Safe Harbor provision⁵⁶. Inclusion criteria were (1) patients with a definitive diagnosis of CVD and (2) patients with CMR scans at baseline before surgical treatment, if any. Exclusion criteria were (1) incomplete cine or LGE modalities, (2) SAX cine with fewer than five views, (3) CMR images with insufficient scan quality, (4) CVD patients missing clinical data and (5) CMR examinations that could not be interpreted and agreed upon by the committee cardiologists according to the diagnostic criteria (Methods). The detailed diagnostic criteria for the 11 types of CVD and the normal controls included in this study are described in Methods. Table 1 and Extended Data Table 1 present the detailed demographics and distribution of the primary dataset and the external validation sets collected from the other seven medical centers across China. To characterize the primary development dataset more fully, we collected the LV ejection fraction (LVEF) for all 7,900 subjects (1,250 normal controls and 6,650 patients with CVD) in the primary dataset. Supplementary Table 5 summarizes the distribution of demographics and LVEF across the 11 CVD classes and the normal control class, and Supplementary Fig. 1 shows density plots of the LVEF distribution for each class.

The fresh consecutive testing set is designed to capture the genuine spectrum of disease phenotypes at real-world clinical prevalence. To characterize the severity of cases in this set, we report five key cardiac function metrics: LVEF, LV mass, LV mass index (LVMi), LV end-diastolic volume and LV end-diastolic volume index. Supplementary Table 6 presents the distribution of demographics and cardiac function across the 11 CVD classes and the normal control class in the fresh consecutive testing set. Supplementary Fig. 2 shows pie charts of the prevalence of the 11 CVD classes in both the fresh consecutive testing set (n = 532 patients with CVD) and the primary discovery dataset (n = 6,650 patients with CVD). Direct comparison shows that the primary dataset and the consecutive testing set exhibit very similar CVD prevalence and distribution. The three most prevalent CVDs referred for CMR examination remain HCM, DCM and CAD.

All images were acquired with breath-holding and electrocardiographic gating. A balanced steady-state free precession sequence was used for cine images, with continuous sampling from the basal to the apical levels on SAX views and on two-chamber, three-chamber and 4CH long-axis views. We included cine MRI from two views in this study: the standard SAX cine and the long-axis 4CH cine. The SAX cine clearly depicts the RV and the LV. The 4CH cine shows the four chambers of the heart: right atrium, left atrium, RV and LV.

LGE MRI has been established as the gold standard reference for myocardial viability and replacement fibrosis in the myocardium^57,58. In our CMR cohorts, the LGE images were obtained using phase-sensitive inversion recovery sequence with a segmented FLASH readout scheme performed 10–15 min after injection of gadolinium-based contrast with 0.15 mmol kg⁻¹ per bolus. Gadolinium contrast agents can be used to detect areas of fibrosis, as the prolonged washout of the contrast correlates with a reduction in functional capillary density in the irreversibly injured myocardium⁵⁹. The SAX LGE used in the study was acquired from the SAX view with the same section thickness, covering the entire left ventricle from the base to the apex (nine parallel views for most cases). Note that LGE is an invasive examination that requires contrast injection and was therefore not performed for normal controls.

The typical CMR scan protocol and scanner parameters for the primary and external validation sets are presented in Supplementary Table 7. Extended Data Fig. 2 shows an illustration of cardiac MRIs (SAX cine, 4CH cine and SAX LGE) utilized in model development. Supplementary Videos 1–11 demonstrate example CMR of the 11 types of CVDs.

Annotation procedures

For each patient in the disease cohort, the textual description of the abnormalities in the CMR and the clinical report was extracted as the main reference. Besides that, all CMR records underwent additional annotation procedures. To annotate the disease cohort, a group of certified CMR experts reviewed all records and clinical reports. Every record was randomly assigned to be reviewed by a single physician specifically for this task, not for any other purpose. All annotators received specific instructions and training regarding how to annotate CMR data to improve labeling consistency. The diagnostic criteria we adopted in this study for each CVD class are described in Methods. CMR examinations that could not be interpreted by physicians received further annotation from a consensus committee of board-certified practicing cardiologists (with >15 years of experience in CMR reading) working in Fuwai Hospital. The CMR examinations that could not be interpreted or agreed upon by the committee were removed from our dataset.

For the independent gold-standard test dataset with 500 patients (Extended Data Table 6) for human–machine comparison, six physicians working in the MRI department at Fuwai Hospital contributed directly to its annotation (the six physicians were not involved in dataset annotation as described above). All participating physicians received specific instructions and training regarding how to annotate CMRs to ensure consistency. We divided the physicians into three groups according to their reading experience in CMR: 3–5 years, 5–10 years and more than 10 years. CMR physicians in each group reviewed a randomly selected set of the 500 CMRs in a nonrepetitive manner.

CMR preprocessing

The CMR preprocessing pipeline aimed to relieve the deep neural network of the additional burden of learning to find patterns between images for disease classification. All cardiac MRIs were preprocessed to (1) resample the images to the same spatial resolution and (2) localize the heart region of interest (ROI) and crop the image to it. We detail the preprocessing steps for cine and LGE MRI below and in Extended Data Fig. 4.

SAX cine comprises nine parallel views (for most cases) covering the apical to the basal levels of the LV. Each view contains 25 frames (cardiac phases), leading to 225 images in one single SAX cine record. We examined the representational power of different numbers of input views in developing the classification model. Balancing efficiency and effectiveness, the three-view input scheme achieved a greater representation of SAX cine and is therefore adopted throughout the rest of the study. The three-view input scheme includes the middle layer (the mid slice among the parallel layers spanning from the base to the apex), the second layer above the middle layer and the second layer below the middle layer (Extended Data Fig. 2). We extract the ‘ImagePositionPatient’ tag and the ‘ImageOrientationPatient’ tag from each DICOM header to locate the three layers. Then, three-spline interpolation provided by the SimpleITK⁶⁰ library (https://simpleitk.org/) is applied to resample the raw cine MRIs to the same spatial resolution of 0.994 mm × 0.994 mm, which is the most common spatial resolution across all subjects investigated in this study. We developed a heart ROI segmentation model (the following section) and used it to localize the region of the heart for each cine MRI. The heart ROI segmentations predicted by the AI models were manually checked to ensure their accuracy. The extracted ROIs are padded to keep the aspect ratio the same without distortion, and then resized to 224 × 224. The top and bottom 0.1% of the pixels in cine MRI images are clipped to avoid pixels that are outliers of the distribution. The cine images are scaled between 1 and 255, and then normalized to zero mean and unit variance before being fed to the model. We sample every second frame from each 25-frame cine sequence (a temporal stride of two), resulting in 13 frames as inputs to model development.

The 4CH cine shares the same preprocessing pipeline as SAX cine, except that only one single layer (mid slice) is used to represent the 4CH view. For SAX LGE, all layers covering from the base to the apex of the heart are used for diagnostic model development. The preprocessing steps for SAX LGE are similar to those for cine MRI. We resampled SAX LGE along the z-axis to ensure that each LGE sequence contains nine slices because nine is the most common number of views for SAX LGE included in this study.
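The view selection, frame sampling and intensity normalization described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' pipeline: the function names, the assumption that slices are already ordered from base to apex, and the nearest-rank rule used for the 0.1% clipping bounds are all assumptions of the sketch.

```python
import statistics


def select_three_views(num_views):
    """Indices of the mid slice and the slices two positions to either side.

    Assumes slices are already sorted from base to apex (for example by
    ImagePositionPatient) and that at least five views are present.
    """
    mid = num_views // 2
    return [mid - 2, mid, mid + 2]


def sample_frame_indices(num_frames=25, stride=2):
    """Every `stride`-th frame index; 25 frames with stride 2 gives 13."""
    return list(range(0, num_frames, stride))


def normalize_intensity(pixels, clip_fraction=0.001):
    """Clip the top and bottom 0.1% of pixels, scale to [1, 255], then z-score.

    `pixels` is a flat list of intensities. The clipping bounds use a simple
    nearest-rank rule, which is an assumption of this sketch.
    """
    ordered = sorted(pixels)
    k = int(len(ordered) * clip_fraction)
    low, high = ordered[k], ordered[len(ordered) - 1 - k]
    span = (high - low) or 1.0  # guard against constant images
    scaled = [1.0 + 254.0 * (min(max(p, low), high) - low) / span for p in pixels]
    mean = statistics.fmean(scaled)
    std = statistics.pstdev(scaled) or 1.0
    return [(s - mean) / std for s in scaled]
```

With nine SAX views, `select_three_views(9)` picks the zero-based slices 2, 4 and 6, and `sample_frame_indices()` keeps 13 of the 25 cardiac phases.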

Heart ROI extraction

We developed heart detection DNN models to automatically extract the heart ROI regions (Extended Data Fig. 4). Three DNN models for SAX cine, 4CH cine and SAX LGE were trained and evaluated, respectively. We applied nnU-Net⁶¹ as our model backbone and generated the ground-truth segmentation masks for model supervision using a semi-automatic approach. (1) Automatic localization: for SAX cine and 4CH cine, we selected the pixel region with maximum standard deviation across all frames. These regions localize the heart ROI because the heart is a beating organ, so its pixels show a high standard deviation over the cardiac cycle. Specifically, for each cine movie sequence $s=\{{x}_{1},\ldots ,{x}_{n}\}$, we computed a single pixel map of standard deviations across all frames ${x}_{\mathrm{std}}=\sigma (\{{x}_{1},\ldots ,{x}_{n}\})$. This map was used to compute an Otsu threshold to binarize and label regions with the greatest variation in cine modality²¹. For each cine sequence, a binary segmentation mask of the heart ROI is defined for the length of the cardiac cycle. All segmentation masks went through manual checking. The localization procedure captures the heart ROI in around 90% of cases. The rest of the cases are labeled manually. (2) Manual labeling: we manually drew the bounding box capturing the heart ROI, using 3D Slicer⁶² and ITK-SNAP⁶³. We used the Scissors tool provided by the Segment Editor in 3D Slicer and the Polygon Inspector in ITK-SNAP to locate the heart ROI. A binary segmentation mask was saved for each CMR sequence. For SAX LGE, we manually drew the annotations as model supervision.
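The automatic localization step (a per-pixel standard deviation map followed by an Otsu threshold) can be illustrated with a small pure-Python sketch. The list-of-lists image representation, the 64-bin histogram and the function names are assumptions made for illustration; the authors' implementation is not given in the text.

```python
import statistics


def temporal_std_map(frames):
    """Per-pixel standard deviation across frames (frames: list of 2D lists)."""
    rows, cols = len(frames[0]), len(frames[0][0])
    return [
        [statistics.pstdev([frame[r][c] for frame in frames]) for c in range(cols)]
        for r in range(rows)
    ]


def otsu_threshold(values, bins=64):
    """Threshold that maximizes the between-class variance of a histogram."""
    low, high = min(values), max(values)
    if high == low:
        return low
    width = (high - low) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - low) / width), bins - 1)] += 1
    centers = [low + (i + 0.5) * width for i in range(bins)]
    total = len(values)
    total_sum = sum(h * c for h, c in zip(hist, centers))
    best_threshold, best_variance = low, -1.0
    weight0, sum0 = 0, 0.0
    for i in range(bins - 1):
        weight0 += hist[i]
        sum0 += hist[i] * centers[i]
        weight1 = total - weight0
        if weight0 == 0 or weight1 == 0:
            continue
        mean0, mean1 = sum0 / weight0, (total_sum - sum0) / weight1
        variance = weight0 * weight1 * (mean0 - mean1) ** 2
        if variance > best_variance:
            best_variance, best_threshold = variance, low + (i + 1) * width
    return best_threshold


def heart_mask(frames):
    """Binary mask of pixels whose temporal variation exceeds the Otsu threshold."""
    std = temporal_std_map(frames)
    threshold = otsu_threshold([v for row in std for v in row])
    return [[v > threshold for v in row] for row in std]
```

On a synthetic sequence in which only a central block of pixels changes over time, the mask selects exactly that block; static background pixels have zero standard deviation and fall below the threshold.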

In terms of model architecture, the detection model shares the classic U-net⁶⁴ backbone with three small adjustments: (1) batch normalization is replaced with instance normalization⁶⁵, (2) rectified linear unit (ReLU) is replaced with leaky ReLU⁶⁶ as the activation function and (3) additional auxiliary losses are added in the decoder to all but the two lowest resolutions. The model outputs the binary bounding box that extracts the heart ROI. For model training, we adopted Adam optimizer and stochastic gradient descent (SGD) with Nesterov momentum (μ = 0.99). The initial learning rate was set to be 0.01, and the decay of the learning rate followed the ‘Poly’ learning rate policy⁶⁷. Batch size was set to 36. Data augmentation included rotations, scaling, gamma correction and mirroring. The loss function is the sum of cross-entropy and Dice loss⁶⁸.
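As a rough illustration of the training settings above, the sketch below shows a ‘Poly’ learning rate decay and a loss formed as the sum of cross-entropy and Dice loss for the binary (heart versus background) case. The exponent of 0.9, the `1 - Dice` form of the Dice term and the function names are assumptions; the text specifies only the initial learning rate of 0.01 and that the two losses are summed.

```python
import math


def poly_lr(step, total_steps, base_lr=0.01, power=0.9):
    """'Poly' decay: base_lr * (1 - step / total_steps) ** power."""
    return base_lr * (1.0 - step / total_steps) ** power


def dice_ce_loss(probs, targets, eps=1e-6):
    """Sum of binary cross-entropy and soft Dice loss over flattened pixels.

    probs: predicted foreground probabilities; targets: 0/1 labels.
    """
    clamped = [min(max(p, eps), 1.0 - eps) for p in probs]
    cross_entropy = -sum(
        t * math.log(p) + (1 - t) * math.log(1.0 - p)
        for p, t in zip(clamped, targets)
    ) / len(targets)
    intersection = sum(p * t for p, t in zip(probs, targets))
    dice = (2.0 * intersection + eps) / (sum(probs) + sum(targets) + eps)
    return cross_entropy + (1.0 - dice)
```

The learning rate starts at 0.01 and reaches zero at the final step, and a prediction that overlaps the target well yields a lower combined loss than one that does not.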

Video-based deep learning models and training details

Model architecture

For models based on cine sequences, we sampled a clip of 13 frames from each 25-frame cine video using a temporal stride of 2 and a spatial size of 224 × 224, resulting in 7 × 56 × 56 input 3D tokens. The 3D patch partitioning layer produces these tokens, each consisting of a 128-dimensional feature. In practice, non-overlapping 3D convolution is applied for this tokenization, with the number of output channels set to 128 so that each token is projected to a 128-dimensional feature.

The developed model consists of four stages, that is, four video swin transformer blocks. Each stage, besides the last stage, performs 2× spatial downsampling in the patch merging layer. It is worth noting that we do not downsample along the temporal dimension. The patch merging layer concatenates the features of each group of 2 × 2 spatially neighboring patches and applies a linear layer to project the concatenated features to half of their dimension. The video swin transformer block consists of a 3D window-based multihead self-attention module and a 3D-shifted window-based multihead self-attention module, followed by a feedforward network, that is, a two-layer multilayer perceptron, with Gaussian error linear unit nonlinearity in between. Layer normalization is applied before each multihead self-attention module and multilayer perceptron, and a residual connection is applied after each module. We used the base version of VST. The number of heads for each stage is 4, 8, 16 and 32. Extended Data Fig. 3a shows the schematic overview of the VST-based framework for modeling SAX cine.
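The token and feature shapes implied by this description can be traced with a short calculation. The sketch below is illustrative only: the (2, 4, 4) patch size and rounding the temporal axis up (padding the 13th frame) are assumptions chosen because they reproduce the stated 7 × 56 × 56 token grid, and the function names are hypothetical.

```python
import math


def token_grid(frames=13, height=224, width=224, patch=(2, 4, 4)):
    """Token grid (T, H, W) after non-overlapping 3D patch partitioning.

    The patch size and the temporal round-up are assumptions of this sketch.
    """
    return (math.ceil(frames / patch[0]), height // patch[1], width // patch[2])


def stage_shapes(grid, embed_dim=128, heads=(4, 8, 16, 32)):
    """(T, H, W, channels, heads) processed by each of the four stages."""
    t, h, w = grid
    channels = embed_dim
    shapes = []
    for stage, num_heads in enumerate(heads):
        shapes.append((t, h, w, channels, num_heads))
        if stage < len(heads) - 1:
            # Patch merging: 2 x 2 spatial neighbors are concatenated (4C) and
            # linearly projected to half of that (2C); time is not downsampled.
            h, w, channels = h // 2, w // 2, (4 * channels) // 2
    return shapes
```

Under these assumptions the four stages operate on 56 × 56, 28 × 28, 14 × 14 and 7 × 7 spatial grids with 128, 256, 512 and 1,024 channels, so the last stage yields 1,024-dimensional features while the temporal length stays at 7.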

Data augmentation

Model performance improved with increasing training data sample size. For the screening model, we used random rotation, random color jitter and a random brightness offset. During each step of SGD in the training process, we perturbed each training sample (a cine video sequence) with a random rotation (between −45 and +45 degrees for SAX cine and between −20 and +20 degrees for 4CH cine), random color jitter and a brightness offset sampled uniformly between −0.1 and 0.1 and added to the normalized pixel values. For LGE, we used random rotation between −45 and +45 degrees, random color jitter and random flip along the z-axis. Data augmentation resulted in improvement for all models.

Multimodality fusion

First, we developed VST-based models for SAX cine, 4CH cine and SAX LGE, respectively. Then, to fuse information from different modalities, we added a global average pooling layer following the last self-attention module for each VST model. This resulted in a 1,024-dimension feature vector from each modality. We further concatenated the 1,024-dimension vectors and added a fully connected layer on top of that to aggregate the features. The final fully connected softmax layer produces a distribution over the output classes. In terms of training, we loaded and froze the pretrained weights of each VST branch from different modalities using transfer learning⁶⁹ and only finetuned the last fully connected layers for feature aggregation.
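The late-fusion step can be sketched in a few lines: per-modality feature vectors are concatenated, passed through one fully connected layer, and converted to class probabilities with a softmax. The sketch below uses plain Python lists and toy dimensions; the function names and the toy weights are hypothetical, and in the paper each modality contributes a 1,024-dimensional vector from a frozen VST branch.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(modality_features, weights, bias):
    """Concatenate modality features, apply one linear layer, then softmax.

    modality_features: list of feature vectors, one per modality.
    weights: one row per output class, as long as the concatenated vector.
    bias: one value per output class.
    """
    fused = [x for features in modality_features for x in features]
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)
```

Because only `weights` and `bias` would be trained in this arrangement, the sketch mirrors the idea of freezing the single-modality branches and finetuning just the aggregation layer.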

Implementation details

Following the classic VST configuration²⁷, we employed an AdamW optimizer with a cosine decay learning rate scheduler and 2.5 epochs of linear warmup. A batch size of 32 was used. The backbone VST is initialized from the ImageNet⁷⁰ and Kinetics-600 (ref. ⁷¹) pretrained model; the head is randomly initialized. Model pretraining plays a strikingly important role in VST-based CMR interpretation. We also found that multiplying the learning rate of the backbone by 0.1 improves performance. Specifically, the initial learning rates for the pretrained backbone and the randomly initialized head were set to 1 × 10⁻⁴ and 1 × 10⁻³, respectively. The impact of the backbone learning rate is examined in the section ‘Learning rate on the VST backbone’. We adopted a stochastic depth rate of 0.2 and a weight decay of 0.05 for the Swin base model used in this study. To prevent the models from becoming biased toward one class, we balanced the training datasets for both screening and diagnostics using the ClassBalancedDataset sampling strategy⁷². Each single-modality VST branch was trained for 150 epochs and then fed into the fusion model, followed by 20 epochs of finetuning of the fusion layers only. For inference, we set the batch size to one and the number of workers to four. The training time for model development using four NVIDIA GeForce RTX 3090 graphics processing units with 24 GB VRAM was about 77 h, and the inference time for each subject was 0.233 s.
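The learning rate schedule described above (linear warmup for 2.5 epochs, then cosine decay, with the backbone rate at 0.1 times the head rate) can be written as a small function. This is a sketch under stated assumptions: warmup is taken to start from zero and the cosine decay to end at zero, neither of which is specified in the text, and the names are hypothetical.

```python
import math


def lr_at_epoch(epoch, peak_lr, warmup_epochs=2.5, total_epochs=150):
    """Linear warmup to peak_lr, then cosine decay to zero (assumed endpoints)."""
    if epoch < warmup_epochs:
        return peak_lr * epoch / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


HEAD_LR = 1e-3                # randomly initialized head
BACKBONE_LR = 0.1 * HEAD_LR   # pretrained backbone, multiplier of 0.1
```

Both parameter groups follow the same curve shape; only the peak differs, which keeps the pretrained backbone weights closer to their initialization than the freshly initialized head.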

Learning rate on the VST backbone

The impact of the learning rate on the VST backbone was examined in a controlled experiment using the AI diagnostic model based on SAX cine. The experiment was conducted on the primary cohort (6,650 CVD patients), using two folds for training and the remaining fold for testing. The model was trained for 150 epochs with five different initial learning rates for the model backbone: 1 × 10⁻², 1 × 10⁻³, 1 × 10⁻⁴ (as applied in this study), 1 × 10⁻⁵ and 1 × 10⁻⁶. Other configurations were kept the same for a fair and direct comparison, and the training loss for each scheme was plotted (Supplementary Fig. 3). When the learning rate is set too high (1 × 10⁻², blue curve), the model struggles to converge and the training loss fails to descend, in contrast to the setting of 1 × 10⁻⁴ (green curve). The model trained with a learning rate of 1 × 10⁻² classified all samples into the HCM class during testing. Conversely, when the learning rate is set too low (1 × 10⁻⁶, purple curve), the loss descends very slowly over the training period. The loss curves for 1 × 10⁻⁵ and 1 × 10⁻⁶ remain at a relatively high level compared with that for 1 × 10⁻⁴. We also calculated the F₁ and area under the receiver operating characteristic curve scores on the testing fold under the same settings (Supplementary Fig. 3). Because the model trained with a learning rate of 1 × 10⁻² failed to converge, it was excluded from the quantitative metrics. In this evaluation, the initial learning rate of 1 × 10⁻⁴ outperformed the other settings, and we therefore selected 1 × 10⁻⁴ as the initial learning rate for our experiments.

CNN–LSTM

We examined the conventional CNN–LSTM architecture in CMR interpretation. The CNN–LSTM consists of a DenseNet encoder with 40 layers and a growth rate of 12 for feature extraction and an LSTM for temporal feature aggregation. The DenseNet encoder comprises a series of two-dimensional convolutions with kernel sizes 1 × 1 and 3 × 3 and global average pooling to extract the feature vector for each input frame. The feature vector for each input frame is fed into the LSTM module sequentially. The LSTM fuses the feature vectors and produces the final classification score after one fully connected layer. For the training configuration of the CNN–LSTM model, we adopted the SGD optimizer with a learning rate of 0.001, a momentum of 0.9 and a weight decay of 0.001. A batch size of four is used for training and one is used for testing. The DenseNet encoder of the CNN–LSTM model is initialized from the pretrained model²¹ and the LSTM component is randomly initialized. We kept data augmentation, the input scheme and computational resources the same as for the VST models, with one difference: SAX cine inputs are resized to 64 × 64 due to CNN–LSTM memory constraints.

Quantitative assessment and statistical analysis

The performance of the AI models was evaluated by assessing their sensitivity, specificity, precision and F₁ score (harmonic mean of the positive predictive value and sensitivity), with two-sided 95% CIs, as well as the AUC of the ROC with two-sided CIs. The F₁ score is complementary to the AUC; it is particularly useful in the setting of multiclass prediction and is less sensitive than the AUC in settings of class imbalance. For an aggregate measure of model performance, we computed the class frequency-weighted mean for the F₁ score and the AUC⁷³.

The cutoff value was set to 0.5 for screening; the CVD class with the highest probability was the diagnostic prediction. Precision, sensitivity (recall), specificity, PPV, NPV and F₁ score of each class are computed from the true-positive (TP), true-negative (TN), false-positive (FP) and false-negative (FN) counts, with formulas as follows:

$$\text{Sensitivity}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}},$$

$$\text{Specificity}=\frac{\mathrm{TN}}{\mathrm{TN}+\mathrm{FP}},$$

$$\mathrm{Precision}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}},$$

$$\mathrm{PPV}\,=\frac{\mathrm{TP}\,}{\mathrm{TP}+\mathrm{FP}},$$

$$\mathrm{NPV}\,=\frac{\mathrm{TN}\,}{\mathrm{TN}+\mathrm{FN}},$$

$${F}_{1}\text{-score}=\frac{2\times \mathrm{Precision}\times \mathrm{Sensitivity}}{\mathrm{Precision}+\mathrm{Sensitivity}}.$$
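The formulas above translate directly into code. The following sketch computes the per-class metrics from the four confusion counts; the function name and the dictionary layout are illustrative choices, and the sketch assumes every denominator is nonzero.

```python
def classification_metrics(tp, tn, fp, fn):
    """Per-class metrics from true/false positive/negative counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)  # identical to the PPV
    npv = tn / (tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "ppv": precision,
        "npv": npv,
        "f1": f1,
    }
```

For example, with TP = 8, TN = 9, FP = 1 and FN = 2, the sensitivity is 0.8, the specificity is 0.9 and the F₁ score is 16/19.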

The ROC space is defined by 1 – specificity and sensitivity as the x axis and the y axis, respectively. It depicts relative trade-offs between true positive and false positive, as the classification threshold goes from zero to one. A random guess will give a point along the diagonal line from the bottom left to the top right. Points above the diagonal line represent good classification results and points below the line represent bad results. We applied the class frequency-weighted F₁ score and class frequency-weighted AUC to evaluate the performance of our diagnostic model, with the following formulas:

$${\rm{Weighted}}\,{F}_{1}\text{-}{\rm{score}}=\mathop{\sum }\limits_{i=1}^{C}{\mathrm{ratio}}_{i}\,{F}_{1}\text{-}{\mathrm{score}}_{i},$$

$${\rm{Weighted}}\,\mathrm{AUC}=\mathop{\sum }\limits_{i=1}^{C}{\mathrm{ratio}}_{i}\,{\mathrm{AUC}}_{i},$$

where ${F}_{1}\text{-}{\mathrm{score}}_{i}$ and ${\mathrm{AUC}}_{i}$ denote the F₁ score and AUC for class $i$, respectively, ${\mathrm{ratio}}_{i}$ denotes the frequency ratio of class $i$ and $C$ is the number of classes.
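A minimal sketch of the class frequency-weighted mean follows. It assumes that the frequency ratio of a class is its sample count divided by the total sample count, which the text implies but does not state explicitly; the function name is hypothetical.

```python
def frequency_weighted_mean(per_class_scores, class_counts):
    """Weighted mean of per-class scores, weights = class frequency ratios."""
    total = sum(class_counts)
    return sum(
        score * count / total
        for score, count in zip(per_class_scores, class_counts)
    )
```

The same function applies to both the weighted F₁ score and the weighted AUC. With per-class scores of 0.9 and 0.5 and class counts of 3 and 1, the weighted mean is 0.8, so frequent classes dominate the aggregate.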

In addition, to improve the model interpretability and visualize the features used by the DNN model that determine the final prediction, we used Grad-CAM²⁹ to localize important regions—saliency regions—by visualizing class-specific gradient information. In Grad-CAM, the neuron importance weight ${\alpha }_{k}^{\,c}$ is estimated as

$${\alpha }_{k}^{\,c}=\frac{1}{Z}\sum _{i}\sum _{j}\frac{\partial {y}^{\,c}}{\partial {A}_{{ij}}^{k}},$$

where $y^{c}$ denotes the score for class $c$ before the softmax, ${A}^{k}$ denotes the $k$th feature map activation with spatial indices $i$ and $j$, and $Z$ is the number of spatial locations in the feature map. After computing the neuron importance weights for each feature map, we can generate a heat map indicating the significant regions related to class $c$ by performing a weighted linear combination of the feature maps, followed by a ReLU activation function:

$${L}_{\mathrm{Grad}-\mathrm{CAM}}^{c}=\mathrm{ReLU}\left(\sum _{k}{\alpha }_{k}^{\,c}{A}^{k}\right).$$
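The two Grad-CAM equations can be illustrated with a small sketch that takes precomputed feature maps and their gradients as plain nested lists. In practice both would come from a deep learning framework's forward and backward passes; here they are passed in directly, and the function name is hypothetical.

```python
def grad_cam(activations, gradients):
    """Grad-CAM heat map from K feature maps and the matching gradients.

    activations[k][i][j] is A^k_ij; gradients[k][i][j] is d y^c / d A^k_ij.
    """
    rows, cols = len(activations[0]), len(activations[0][0])
    z = rows * cols
    # Neuron importance weights: spatial average of the gradients per map.
    alphas = [sum(sum(row) for row in grad) / z for grad in gradients]
    # Weighted combination of the feature maps followed by a ReLU.
    return [
        [
            max(0.0, sum(a * fmap[r][c] for a, fmap in zip(alphas, activations)))
            for c in range(cols)
        ]
        for r in range(rows)
    ]
```

The ReLU keeps only locations that push the class score up, so regions whose weighted activation is negative are zeroed in the heat map.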

We then used the Shapley values⁷⁴ to evaluate the influence of each input modality (SAX cine, 4CH cine and SAX LGE). The Shapley value is a principled attribution method used in AI to quantify the contribution of individual input features by assigning each input modality an importance value for a particular prediction. The definition of the Shapley value⁷⁵ is given in equations below:

$${{{\phi }}}_{i}\left(v\right)=\sum _{S\subseteq N\setminus \{i\}}{\left(\begin{array}{c}n\\ 1,\left|S\right|,n-\left|S\right|-1\end{array}\right)}^{-1}\left(v\left(S\cup \{i\}\right)-v\left(S\right)\right),$$

where ${\phi}_{i}\left(v\right)$ denotes the contribution value of input component $i$, namely the Shapley value of each input modality (player), $N$ is the set of players with $n=\left|N\right|$ and $v$ is a function mapping subsets of players to the real numbers: $v:{2}^{N}\to {\mathbb{R}}$, with $v\left(\varnothing \right)=0$, where $\varnothing$ denotes the empty set. A set of players is called a coalition. The function $v$ is called a characteristic function: if $S$ is a coalition of players, then $v(S)$, called the worth of coalition $S$, describes the total expected sum of payoffs the members of $S$ can obtain by cooperation. The sum extends over all subsets $S$ of $N$ not containing input component $i$; also note that $\left(\begin{array}{c}n\\ a,{b},{c}\end{array}\right)$ is the multinomial coefficient. This formula can also be interpreted as

$$\begin{array}{l}{{{\phi }}}_{i}\left(v\right)=\frac{1}{{\mathrm{Number}}\;{\rm{of}}\;{\rm{players}}}\\\sum _{{\mathrm{coalitions}}\; {\mathrm{excluding}}\;i}\frac{{\mathrm{Marginal}}\;{\mathrm{contribution}}\; {\mathrm{of}}\;i\;{\mathrm{to}}\;{\mathrm{coalition}}}{{\mathrm{Number}}\; {\mathrm{of}}\; {\mathrm{coalitions}}\; {\mathrm{excluding}}\;i\; {\mathrm{of}}\; {\mathrm{this}}\; {\mathrm{size}}}.\end{array}$$
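With only three input modalities, the Shapley value can be computed exactly by enumerating all coalitions. The sketch below implements the formula above; the worth values in the example are hypothetical numbers chosen for illustration (they are not results from the study), and how the characteristic function is evaluated for a subset of modalities is left open here.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, worth):
    """Exact Shapley values; `worth` maps a frozenset of players to v(S)."""
    n = len(players)
    values = {}
    for player in players:
        others = [p for p in players if p != player]
        total = 0.0
        for size in range(n):
            # Inverse multinomial coefficient (n; 1, |S|, n - |S| - 1).
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                total += weight * (worth(coalition | {player}) - worth(coalition))
        values[player] = total
    return values


# Hypothetical worths for each subset of modalities (illustration only).
EXAMPLE_WORTH = {
    frozenset(): 0.0,
    frozenset({"sax_cine"}): 0.6,
    frozenset({"4ch_cine"}): 0.5,
    frozenset({"sax_lge"}): 0.4,
    frozenset({"sax_cine", "4ch_cine"}): 0.8,
    frozenset({"sax_cine", "sax_lge"}): 0.75,
    frozenset({"4ch_cine", "sax_lge"}): 0.7,
    frozenset({"sax_cine", "4ch_cine", "sax_lge"}): 0.9,
}
example_phi = shapley_values(
    ["sax_cine", "4ch_cine", "sax_lge"], EXAMPLE_WORTH.__getitem__
)
```

A useful sanity check is the efficiency property: the Shapley values of all players sum to the worth of the full coalition, so the three modality contributions in the example add up to 0.9.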

Diagnostic criteria of the CVDs and normal control

CAD or ischemic cardiomyopathy

The diagnosis of myocardial infarction or ischemic cardiomyopathy is based on the European Society of Cardiology, American College of Cardiology and American Heart Association committee criteria⁷⁶, with significant stenosis on invasive coronary angiography (CAG) or coronary computed tomography angiography and CMR showing subendocardial or transmural LGE in the territory of the matching coronary arteries. We excluded cases without available CAG or with inadequate image quality due to arrhythmia or respiratory motion artifact.

HCM

We followed the 2020 American Heart Association and American College of Cardiology guidelines for the diagnosis of patients with HCM⁷⁷. The clinical diagnosis of HCM was made by CMR showing a maximal end-diastolic wall thickness of ≥15 mm anywhere in the LV, in the absence of another cause of hypertrophy in adults. More limited hypertrophy (13–14 mm) can be diagnostic when present in family members of a patient with HCM or in conjunction with a positive genetic test.

We excluded cases with the following conditions:

1. Valvular heart disease (aortic valve stenosis, etc.)
2. Long-term uncontrolled hypertension
3. Inflammatory heart disease (sarcoidosis, etc.)
4. Infiltrative cardiomyopathy (amyloidosis, Fabry disease, etc.)
5. Septal myectomy or alcohol ablation before CMR
6. CMR images with poor quality

DCM

The diagnosis of DCM is based on the diagnostic criteria of the World Health Organization⁷⁸. Inclusion criteria were based on enlarged LV end-diastolic dimension (>60 mm) and reduced LVEF (<45%). The exclusion criteria were as follows:

1. Significant stenosis of coronary artery (>50% stenosis, assessed on CAG or coronary computed tomography angiography)
2. Severe valvular disease, hypertension or congenital heart disease
3. Evidence of acute or subacute myocarditis (T2 weighted image and laboratory tests)
4. Any other metabolic disease through medical documentation
5. Inadequate CMR quality

LVNC

The diagnosis of LVNC is based on previous studies^32,79, as follows:

1. The presence of noncompacted and compacted LV myocardium with a two-layered appearance, with at least involvement of the LV apex
2. End-diastolic noncompaction/compaction ratio >2.3 on long-axis views and ≥3 on SAX views
3. Noncompacted mass >20% of the global LV mass
4. No pathologic (pressure/volume load, for example, hypertension) or physiologic (for example, pregnancy and vigorous physical activity) remodeling factors leading to excessive trabeculation

ARVC

The diagnostic standards for ARVC were based on the revised Task Force Criteria⁸⁰ score with either two major criteria, one major and two minor criteria or four minor criteria. The major criteria include regional RV akinesia or dyskinesia or dyssynchronous RV contraction, ratio of RV end-diastolic volume to body surface area >110 ml m⁻² (male) or >100 ml m⁻² (female) or RV ejection fraction <40%; fibrous replacement of the RV free wall myocardium, with or without fatty replacement of tissue on endomyocardial biopsy; repolarization abnormalities and depolarization or conduction abnormalities on ECG test.
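The counting rule stated above (two major criteria, one major plus two minor criteria, or four minor criteria) can be expressed as a one-line check. This is an illustration of the counting logic only, with a hypothetical function name; it does not model how individual criteria are assessed or any requirement in the Task Force Criteria that counted criteria come from different categories.

```python
def meets_arvc_task_force_score(num_major, num_minor):
    """True if the counts satisfy 2 major, 1 major + 2 minor, or 4 minor."""
    return (
        num_major >= 2
        or (num_major >= 1 and num_minor >= 2)
        or num_minor >= 4
    )
```

For instance, one major and one minor criterion do not meet the threshold, whereas one major and two minor criteria do.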

CAM

The diagnosis of CAM is based on endomyocardial biopsy or extracardiac biopsy specimens showing positive birefringence with Congo red staining under polarized light, and with native and enhanced CMR imaging in a pattern consistent with CAM: LV wall thickness of more than 12 mm shown by CMR without other known cause, with and without diffuse LGE⁸¹.

RCM

RCM is characterized by ventricular filling difficulties with increased stiffness of the myocardium. The restrictive cardiomyopathies are defined as restrictive ventricular physiology in the presence of normal or reduced diastolic volumes^52,82, as follows:

1. Nondilated LV or RV with diastolic dysfunction
2. Bi-atrial dilation
3. Preserved ejection fraction (LVEF ≥50%)

We excluded subjects that met the following criteria:

1. With a reduced LV systolic function
2. Severe atrial fibrillation
3. Severe valvular disease, hypertension or congenital heart disease
4. Significant stenosis of coronary artery

PAH

The diagnosis of PAH is based on the results of right heart catheterization. Patients were included in this study if they were clinically diagnosed with PAH⁸³ according to the following criteria:

1. Mean pulmonary artery pressure (mPAP) ≥25 mmHg
2. Pulmonary capillary wedge pressure (PCWP) <15 mmHg
3. Pulmonary vascular resistance (PVR) >3 Wood units at rest
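The three hemodynamic thresholds listed above can be written as a simple predicate. The sketch assumes that all three conditions must hold together, and the function and parameter names are hypothetical; it is an illustration of the thresholds, not a diagnostic tool.

```python
def meets_pah_hemodynamic_criteria(mpap_mmhg, pcwp_mmhg, pvr_wood_units):
    """True if mPAP >= 25 mmHg, PCWP < 15 mmHg and PVR > 3 Wood units."""
    return mpap_mmhg >= 25 and pcwp_mmhg < 15 and pvr_wood_units > 3
```

Note the mix of inclusive and strict inequalities: an mPAP of exactly 25 mmHg qualifies, whereas a PCWP of exactly 15 mmHg or a PVR of exactly 3 Wood units does not.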

We excluded subjects with the following criteria:

1. Any evidence of cardiomyopathy, myocarditis, CAD, myocardial infarction, valvular disease or constrictive pericarditis
2. Any evidence of respiratory diseases
3. History of cardiac surgery

Congenital heart disease—Ebstein’s anomaly

The diagnosis of Ebstein’s anomaly is based on apical displacement of tricuspid valve leaflets (≥8 mm m⁻²) with fibrous and muscular attachments to the underlying myocardium³¹. Patients with other concomitant malformations (for example, congenitally corrected transposition with Ebstein’s anomaly) or a history of cardiac surgery were excluded.

Acute myocarditis

The diagnosis of acute myocarditis is based on the diagnostic criteria for clinically suspected myocarditis, as recommended by the European Society of Cardiology Working Group on Myocardial and Pericardial Diseases⁸⁴, and is fulfilled by meeting the Lake Louise criteria⁸⁵ or by confirmation through endomyocardial biopsy.

Patients with clinically acute myocarditis had the following: acute chest pain, signs of acute myocardial injury (electrocardiographic changes and/or elevated troponin level) and increased laboratory markers of inflammation (for example, C-reactive protein level). CAD was excluded before cardiac MRI. Patients with preexisting CVD were excluded.

HHD

The diagnostic criteria for HHD include (1) a history of prolonged, uncontrolled arterial hypertension and (2) concentric hypertrophy with left ventricular maximal wall thickness ≥12 mm.

We excluded patients with the following conditions:

1. Any other causes of LV hypertrophy
2. Cardiomyopathy
3. Obstructive coronary heart disease
4. Severe valvular disease
5. Inflammatory heart disease
6. Severe ventricular arrhythmia such as ventricular tachycardia or left bundle branch block
7. Poor CMR imaging quality

Normal controls

Healthy controls were recruited as volunteers without CVDs (including cardiomyopathy, CAD, severe arrhythmia or conduction block, valvular disease, congenital heart disease and so on) or other organic or systemic diseases, based on a comprehensive evaluation of patient history, clinical assessment, ECG and echocardiography.