Authors: Lucile Ter-Minassian; Natalia Viani; Alice Wickersham; Lauren Cross; Robert Stewart; Sumithra Velupillai; Johnny Downs
Can Machine Learning Predict ADHD Using School Records?
Researchers applied machine learning to school records to predict ADHD diagnosis with high accuracy, while reducing bias against ethnic minority and non-English-speaking pupils.
Source: Ter-Minassian, L., Viani, N., Wickersham, A., Cross, L., Stewart, R., Velupillai, S., & Downs, J. (2022). Assessing machine learning for fair prediction of ADHD in school pupils using a retrospective cohort study of linked education and healthcare data. BMJ Open, 12(12), e058058. https://doi.org/10.1136/bmjopen-2021-058058
What you need to know
- Machine learning models using school records can predict ADHD diagnosis with high accuracy
- These models could help identify areas of need and guide resource allocation for ADHD services
- Techniques were used to reduce biases against ethnic minority and non-English-speaking groups
- Further validation is needed before these models could be used in practice
Using school data to predict ADHD
Attention-deficit/hyperactivity disorder (ADHD) is a common neurodevelopmental condition in children, yet it often goes unrecognized and untreated. Improving access to services depends on accurately predicting which children are at high risk of having ADHD, so that resources and support can be allocated effectively.
Researchers from King’s College London investigated whether machine learning techniques applied to school records could predict ADHD diagnosis. They used a large dataset that linked education records to healthcare data for over 56,000 pupils in South London.
The education data included school performance, attendance, special educational needs status, and demographic information. These records were linked to data on ADHD diagnoses from child mental health services.
Different machine learning models were tested to see how accurately they could identify children diagnosed with ADHD based only on their school records. The researchers also examined which factors were most important for predicting ADHD.
High accuracy in predicting ADHD
The study found that machine learning models could predict ADHD diagnosis with a high degree of accuracy using only school data. The best-performing models were able to correctly identify over 85% of ADHD cases.
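To make "correctly identify ... of ADHD cases" concrete: that phrasing reads like sensitivity, the share of diagnosed children that a model flags. The sketch below is only an illustration of how sensitivity and its counterpart, specificity, are computed. The labels are invented toy values, and whether the study's headline figure is sensitivity or a different metric is an assumption, not something this summary states.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = ADHD diagnosis)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # diagnosed and flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # diagnosed but missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # not diagnosed, not flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # not diagnosed but flagged
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels invented for illustration only (not study data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
# prints: sensitivity=0.75, specificity=0.83
```

A model can score well on one of these and poorly on the other, which is why a single "accuracy" figure is worth reading alongside both.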
Some of the most important factors for predicting ADHD were:
- Writing performance at age 6-7
- Personal, social and emotional development in early years
- School attendance
- Gender (boys were more likely to be diagnosed)
- Special educational needs status
Interestingly, factors like ethnicity and language were less important predictors once other variables were accounted for.
The models performed well at distinguishing ADHD from the general population. However, they were less accurate at differentiating ADHD from other mental health conditions when tested on a clinical sample.
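One common way to quantify how well a model "distinguishes" two groups is the area under the ROC curve (AUC): the chance that a randomly chosen child with ADHD receives a higher risk score than a randomly chosen child without it. The sketch below assumes AUC as the measure, which this summary does not confirm, and uses invented scores to show why the same model can look weaker against a clinical comparison group.

```python
def auc(scores_pos, scores_neg):
    """Chance that a random positive outscores a random negative (ties count half)."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical risk scores, invented for illustration only.
adhd = [0.9, 0.8, 0.6]
general_population = [0.1, 0.2, 0.3, 0.7]
other_conditions = [0.5, 0.7, 0.85, 0.4]  # children with other mental health conditions

print(f"ADHD vs general population: {auc(adhd, general_population):.2f}")  # 0.92
print(f"ADHD vs clinical sample:    {auc(adhd, other_conditions):.2f}")    # 0.75
```

In this toy case the scores of children with other conditions overlap more with the ADHD scores, so the ranking is less clean and the AUC drops.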
Addressing potential biases
An important consideration with any predictive model is the potential for bias. The researchers recognized that ADHD is often underdiagnosed in ethnic minority groups and non-English speakers.
To address this, they used techniques to “reweight” the data and reduce biases against these groups. This helped ensure the models weren’t systematically overlooking ADHD risk in minority populations.
Promisingly, the bias-reduction techniques narrowed disparities between groups without reducing the overall accuracy of the predictions.
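A disparity of this kind can be checked by comparing how often a model flags pupils in each group. The sketch below is a minimal illustration under stated assumptions: the group labels "A" and "B", the predictions, and the choice of a simple flag-rate gap as the disparity measure are all hypothetical, and the study's own fairness metric may differ.

```python
from collections import defaultdict


def flag_rate_by_group(groups, y_pred):
    """Share of pupils flagged as high risk (1) within each group."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for g, p in zip(groups, y_pred):
        totals[g] += 1
        flagged[g] += p
    return {g: flagged[g] / totals[g] for g in totals}


def parity_gap(rates):
    """Difference between the highest and lowest group flag rates."""
    return max(rates.values()) - min(rates.values())


# Hypothetical groups and predictions, invented for illustration only.
groups = ["A"] * 5 + ["B"] * 5
before = [1, 1, 0, 0, 0, 1, 0, 0, 0, 0]  # model trained without reweighting
after = [1, 1, 0, 0, 0, 1, 1, 0, 0, 0]   # model trained with reweighting

print(flag_rate_by_group(groups, before), parity_gap(flag_rate_by_group(groups, before)))
print(flag_rate_by_group(groups, after), parity_gap(flag_rate_by_group(groups, after)))
```

A gap near zero means the groups are flagged at similar rates; it says nothing on its own about accuracy, which has to be checked separately.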
Potential applications and limitations
If further validated, these types of machine learning models could have several useful applications:
- Estimating ADHD prevalence and need for services in different areas or schools
- Helping to identify children who may benefit from ADHD assessment
- Informing decisions about resource allocation for ADHD support
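As a rough illustration of the first use, per-pupil risk estimates can be summed within each school to give an expected number of pupils with ADHD. This is a sketch, not the study's procedure: the school names and probabilities are invented, and the sum is only meaningful if the model's probabilities are well calibrated, which is an assumption here.

```python
from collections import defaultdict


def expected_cases_by_school(records):
    """Aggregate (school, predicted_probability) pairs into per-school estimates."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for school, prob in records:
        totals[school] += prob
        counts[school] += 1
    return {
        school: {
            "pupils": counts[school],
            "expected_cases": totals[school],
            "estimated_prevalence": totals[school] / counts[school],
        }
        for school in totals
    }


# Hypothetical schools and risk estimates, invented for illustration only.
records = [
    ("School X", 0.02), ("School X", 0.10), ("School X", 0.30),
    ("School Y", 0.01), ("School Y", 0.05),
]

for school, summary in expected_cases_by_school(records).items():
    print(school, summary)
```

Because the output is a school-level total rather than a label on any one child, this kind of summary fits the population-level use the authors have in mind.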
However, there are some important limitations to consider:
- The study was conducted in one region of London. The models would need to be tested in other areas to ensure they generalize.
- While accuracy was high, it wasn’t perfect. The models shouldn’t be used to diagnose individuals, only to estimate population-level risk.
- The education data may have its own inherent biases that weren’t fully addressed.
- Some relevant factors, like parental characteristics, weren’t captured in the data.
How machine learning works
To understand these findings, it’s helpful to have a basic grasp of how machine learning works:
Machine learning involves using algorithms (sets of rules) that can automatically detect patterns in large amounts of data. The algorithms “learn” from examples to make predictions on new data.
In this study, the machine learning models were trained on the school records of thousands of children, along with information on which ones were diagnosed with ADHD. By analyzing all this data, the models were able to pick up on patterns and combinations of factors associated with ADHD.
Once trained, the models could then look at a new child’s school record and estimate how likely they were to have ADHD, based on the patterns identified in the training data.
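The train-then-predict cycle described above can be sketched in a few lines. The example below fits a small logistic regression by gradient descent in plain Python. Everything in it is an assumption made for illustration: the two features (attendance rate and a special educational needs flag), the eight toy records, and the training settings are invented and are not the study's data or code.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """Estimated probability of an ADHD diagnosis for one pupil's features."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # prediction minus true label
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Toy training records: [attendance rate, has special educational needs (0/1)].
X = [[0.98, 0], [0.95, 0], [0.97, 0], [0.96, 0],
     [0.85, 1], [0.80, 1], [0.90, 1], [0.88, 0]]
y = [0, 0, 0, 0, 1, 1, 1, 0]  # 1 = diagnosed with ADHD (invented labels)

w, b = train_logistic(X, y)

# "New" pupils the model has not seen during training.
print(predict_proba(w, b, [0.82, 1]))
print(predict_proba(w, b, [0.97, 0]))
```

The first new pupil resembles the diagnosed children in the toy data, so the model gives them a higher estimated probability than the second.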
Different types of machine learning models were tested, including:
- Logistic regression: A statistical method that estimates the probability of an outcome based on multiple variables
- Random forests: An approach that combines many decision trees to make predictions
- Support vector machines: A technique that finds the optimal boundary between different categories in the data
- Neural networks: Models inspired by the human brain that can detect complex patterns
The logistic regression and random forest models performed best in this particular study.
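To give a feel for how a random forest "combines many decision trees", the sketch below shows only the voting step. The three trees are hand-written stand-ins with invented thresholds and hypothetical field names. A real random forest learns many deeper trees automatically from random samples of the training data, which this example does not do.

```python
def tree_attendance(pupil):
    """Hypothetical tree: vote 1 (higher risk) when attendance is low."""
    return 1 if pupil["attendance"] < 0.90 else 0


def tree_sen(pupil):
    """Hypothetical tree: vote 1 when the pupil has special educational needs."""
    return 1 if pupil["has_sen"] else 0


def tree_writing(pupil):
    """Hypothetical tree: vote 1 when the early writing score is low."""
    return 1 if pupil["writing_score"] < 2 else 0


def forest_predict(trees, pupil):
    """Fraction of trees voting 'higher risk', used as the forest's estimate."""
    votes = [tree(pupil) for tree in trees]
    return sum(votes) / len(votes)


trees = [tree_attendance, tree_sen, tree_writing]
pupil = {"attendance": 0.85, "has_sen": True, "writing_score": 3}  # invented record

print(forest_predict(trees, pupil))  # two of the three trees vote 1
```

Averaging many imperfect trees tends to give steadier predictions than relying on any single tree.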
Reducing algorithmic bias
An important aspect of this research was the effort to reduce potential biases in the machine learning models.
Algorithmic bias occurs when a model systematically produces unfair outcomes for certain groups. This can happen if the training data is biased or if the model picks up on and amplifies existing societal biases.
In the case of ADHD, we know that children from some ethnic minority groups and non-English speakers are less likely to be diagnosed, even when they have symptoms. This could be due to factors like reduced access to healthcare or cultural differences in recognizing ADHD.
If a machine learning model is trained on biased data, it may learn to predict lower risk of ADHD for these groups, further perpetuating the disparity.
To address this, the researchers used a technique called “fairness weighting.” This involved adjusting the importance of different examples in the training data to ensure fair representation across ethnic and language groups.
The goal was for the model to predict similar rates of ADHD across demographic groups, all else being equal. This helps prevent the model from learning and amplifying existing biases in diagnosis rates.
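One well-known way to do this kind of weighting is "reweighing", where each training example gets the weight P(group) × P(label) / P(group and label). The sketch below implements that scheme on invented data. Whether it matches the exact procedure used in the study is an assumption, and the group labels "A" and "B" are hypothetical.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Weight each example by P(group) * P(label) / P(group, label)."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


def weighted_positive_rate(groups, labels, weights, group):
    """Diagnosis rate within one group after applying the example weights."""
    num = sum(w for g, y, w in zip(groups, labels, weights) if g == group and y == 1)
    den = sum(w for g, w in zip(groups, weights) if g == group)
    return num / den


# Invented data: group B is diagnosed less often (1 in 4) than group A (3 in 6).
groups = ["A"] * 6 + ["B"] * 4
labels = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0]

weights = reweighing_weights(groups, labels)
print(weighted_positive_rate(groups, labels, weights, "A"))
print(weighted_positive_rate(groups, labels, weights, "B"))
```

After weighting, diagnosed children in the under-diagnosed group count for more during training, and both groups show the same weighted diagnosis rate, so group membership alone no longer signals lower risk.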
Conclusions
- Machine learning models using school data show promise for predicting ADHD risk at a population level
- These approaches could help estimate need and inform resource allocation for ADHD services
- Techniques to reduce algorithmic bias are important to ensure fair predictions across demographic groups
- Further validation in other populations is needed before such models could be implemented in practice
While these findings are encouraging, machine learning models are tools to support human decision-making, not to replace it. Any prediction would need to be followed up with a proper clinical assessment.
With further development and validation, this type of approach could help identify earlier the children who may benefit from ADHD evaluation, leading to more timely support for those who need it.