Automatic segmentation of brain tissues and white matter hyperintensities of presumed vascular origin (WMH) in MRI of older patients is widely described in the literature. Although brain abnormalities and motion artefacts are common in this age group, most segmentation methods are not evaluated in a setting that includes these items. In the present study, our tissue segmentation method for brain MRI was extended and evaluated for additional WMH segmentation. Furthermore, our method was evaluated in two large cohorts with a realistic variation in brain abnormalities and motion artefacts.
The method uses a multi-scale convolutional neural network with a T1-weighted image, a T2-weighted fluid attenuated inversion recovery (FLAIR) image and a T1-weighted inversion recovery (IR) image as input. The method automatically segments white matter (WM), cortical grey matter (cGM), basal ganglia and thalami (BGT), cerebellum (CB), brain stem (BS), lateral ventricular cerebrospinal fluid (lvCSF), peripheral cerebrospinal fluid (pCSF), and WMH.
Our method was evaluated quantitatively with images publicly available from the MRBrainS13 challenge (n = 20), quantitatively and qualitatively in relatively healthy older subjects (n = 96), and qualitatively in patients from a memory clinic (n = 110). The method can accurately segment WMH (Overall Dice coefficient in the MRBrainS13 data of 0.67) without compromising performance for tissue segmentations (Overall Dice coefficients in the MRBrainS13 data of 0.87 for WM, 0.85 for cGM, 0.82 for BGT, 0.93 for CB, 0.92 for BS, 0.93 for lvCSF, 0.76 for pCSF). Furthermore, the automatic WMH volumes showed a high correlation with manual WMH volumes (Spearman's ρ = 0.83 for relatively healthy older subjects). In both cohorts, our method produced reliable segmentations (as determined by a human observer) in most images (relatively healthy/memory clinic: tissues 88%/77% reliable, WMH 85%/84% reliable) despite various degrees of brain abnormalities and motion artefacts.
In conclusion, this study shows that a convolutional neural network-based segmentation method can accurately segment brain tissues and WMH in MR images of older patients with varying degrees of brain abnormalities and motion artefacts.
Observing the vessel-like structure of PVS, we propose a segmentation technique based on the 3D Frangi filtering12, largely used for enhancing blood vessels, for instance in retinal images19. Given the absence of an accurate computational “ground truth” (i.e. manual labels of each PVS by experts), we propose a modelling technique to use the available information (i.e. PVS burden assessed using visual rating scales) to optimize the filter parameters. For this scope, an ordered logit model17 has been used to simulate the relationship between the number of PVS and the rating categories, taking into account the uncertainty in the measurements. The framework of the proposed optimization process is illustrated in Fig. 2.
PVS masks, obtained as described in Ramirez et al.10, were available for the SDS dataset. These masks obtained using Lesion Explorer20, which implements 2 false positive minimization strategies: (i) in order to reduce errors from minor imaging artifacts and improve differentiation from lacunar infarcts, candidate PVS are required to satisfy acceptance criteria from both T1W and T2W, and rejection criteria from PD, and (ii) to address potential registration errors and partial volume effects, the cortical Gray Matter segmentation was dilated by 1 voxel. This resulted in a relatively conservative estimate of the overall PVS burden and thus, limited its utility as a Ground Truth (GT) for segmentation optimization, as well as for pixel-wise evaluation of the results.
Two established visual rating scales for PVS severity were used in the present work. Previous work has demonstrated their comparability7,10.
The visual rating scale developed by Potter et al.7 (in the following called Wardlaw scale) required users to rate PVS burden on T2-weighted MRI in each of three major anatomical brain regions: midbrain, basal ganglia and centrum semiovale. According to the online user guide (http://www.sbirc.ed.ac.uk/documents/epvs-rating-scale-user-guide.pdf), PVS in the latter region should be assessed in the slice and hemisphere with the highest number, and rated as 0 (no PVS), 1 (mild; 1–10 PVS), 2 (moderate; 11–20 PVS), 3 (frequent; 21–40 PVS) or 4 (severe; >40 PVS).
The PVS scores proposed by Patankar et al.21 were based principally on the appearances seen on T1W inversion recovery images. PVS should be scored in the centrum semiovale as 0 (none), 1 (less than five per side), 2 (more than five on one or both sides), reflecting the lesser visibility of PVS on T1W.
Two slightly modified versions of these rating methods, as previously described10, were also used in this work. Coregistered MRIs were used for assessment, with T2W for primary identification, T1W for confirmation, and Proton Density (PD) for rejection as required. To reduce ceiling effects and account for a greater range of PVS, the Patankar scale was standardized: 0 (none), 1 (one to five), 2 (six to ten), 3 (eleven to fifteen), 4 (sixteen or more). To reduce double-counting, a slice increment of 3 was implemented as a standardized rating protocol. Centrum Semiovale was defined as the White Matter (WM) projections superior to the ventricles, present in each of the cerebral hemispheres under the cerebral cortex.
PVS were assessed in 20 representative cases of the SDS dataset by three raters: two experienced neuroradiologists using the two modified Wardlaw and Patankar visual rating scales, and a third rater strictly following the guideline of the original Wardlaw7 and Patankar21 rating methods. The two ratings (modified Wardlaw and Patankar) of the first raters were close to the conservative estimate of PVS burden obtained as described above. Inter-rater reliability was high (ICC = 0.99, <0.001) as previously discussed10. The third rater counted all visible PVS in the slice with the highest number in T1W and T2W, including the very small ones discarded by the first raters. All raters were blind to each other.
Frangi12 analyses the second order derivatives of an image I, defined in the Hessian matrix Hs(v) as:
to describe the “vesselness” F(v) of a voxel v at scale s as:
where λ1, λ2 and λ3 are the ordered eigenvalues (|λ1| ≤ |λ2| ≤ |λ3|) of the Hessian matrix, RA = |λ2|/|λ3|, RB = |λ1|/(|λ2λ3|)1/2, , and α, β, c are thresholds which control the sensitivity of the filter to the measures RA, RB and S.
For a bright tubular structure in a 3D image we expect: |λ1| ≤ |λ2|, |λ3| and ; and λ2, λ3 ≤ 0. For a dark structure λ2, λ3 ≥ 0 and the conditions in Eq. 2 should be reversed.
Given a set of scales s∈ [smin, smax], the responses are combined as:
where smin and smax are the minimum and maximum scales at which relevant structures are expected to be found12.
Ordered Logit Model
An ordered logit model defines the relationship between an ordinal variable (y) which can vary between 0 and m(m∈N+), and the vector of independent variables (x) by using a latent continuous variable () defined in an one-dimensional space characterized by threshold points (μ0, …, μm−1) as described in equation:
where β and μi are parameters to be estimated, ε is the error component which has a logistic random distribution with expected value equal to 0 and variance equal to , that accounts for the measurement error. This modelling approach provides a relevant methodology for capturing the sources of influence (independent variables) that explain an ordinal variable (dependent variable) taking into account the measurement uncertainty of such data17.
Since is not a deterministic quantity, it is only possible to define the probability to belong to each class:
where L is the logistic cumulative distribution function.
In our work, the ordinal variable (y) is the rating class (from 0 to 4) and the independent variable (x) is the number of PVS.
The ordered logit model has been calibrated by maximizing a likelihood function based on a synthetic dataset generated in 3 steps. In the first step 1000 numbers of PVS Count (PCi, i = 1, ..., 1000) have been generated using a log-normal distribution (see Fig. 3a), that reflects the observed PVS distribution in known datasets11. In the second step, the uncertainty has been simulated for each PCi casting a New value of PVS Count (NPCi) using a normal distribution with mean equal to PCi and standard deviation equal to one. Therefore, the probability that NPCi is included between PCi − 3 and PCi + 3 is 0.997. These values reflect our measurements uncertainty11. In the third step, a Rating Class (RCij) has been assigned to each generated NPCi.
Assuming m classes, the log-likelihood function can be written as:
where RCij is equal to one if the ith generated number belong to the jth rating class and it is equal to zero otherwise. The Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm has been used to estimate the ordered logit parameters.
For the Wardlaw scale7, a rating class from 0 to 4, being 0(none), 1(1–10), 2(11–20), 3(21–40), 4(>40) PVS, has been assigned to each generated number. The estimated parameters are β = 0.514, μ0 = −2.840, μ1 = 5.708, μ2 = 10.497, μ3 = 20.040, and the model is illustrated in Fig. 3b.
For the Patankar scale10 a rating class from 0 to 4, being 0(none), 1(1–5), 2(6–10), 3(11–15), 4(>15) PVS, has been assigned to each generated number. The estimated parameters for the Patankar rating scale are β = 1.906, μ0 = 2.269, μ1 = 9.569, μ2 = 18.995, μ3 = 28.639, and the model is illustrated in Fig. 3c.
Images were preprocessed to generate the Region-of-Interest (ROI) masks. A fuzzy C-means clustering algorithm was applied to T1 images22. This is an unsupervised iterative clustering technique that effectively assigns each voxel to one of 4 membership classes: background, Cerebrospinal Fluid (CSF), Gray Matter (GM), and White Matter (WM). After a series of morphological and thresholding operations, the CSF and GM re-labelled voxels were combined to generate the final CSFGM mask which was used for false positive minimization. To avoid PVS mislabelled as GM, an hole filling procedure was used. The Centrum Semiovale (CS) was automatically identified as the region of WM, superior to the lateral ventricles previously obtained using Lesion Explorer