Stress-GPT: Towards Stress Quantification with an EEG-based Foundation Model
Proceedings of the 30th Annual International Conference on Mobile Computing and Networking (ACM MobiCom '24), Washington DC, USA, November 2024, pp. 2341–2346.
Stress is a pervasive health concern with significant implications for physical and mental wellbeing. Electroencephalography (EEG) offers a non-invasive window into brain activity that can reveal stress-related changes, but traditional approaches using statistical distributions and wavelet transforms often struggle with the variable and dynamic characteristics of EEG signals. This project explores whether large pre-trained foundation models can overcome these limitations. The work fine-tunes Neuro-GPT, a model pre-trained on over 20,000 EEG recordings from the Temple University Hospital EEG dataset, on a 40-subject open stress dataset to classify low-stress and high-stress states. Two fine-tuning strategies are evaluated: Encoder+GPT, which uses both the EEG encoder and the GPT decoder, and Encoder-Only, which uses only the encoder component. The model achieves an average accuracy of 74.4% in quantifying stress levels. The study also benchmarks against traditional machine-learning methods, providing key observations to guide future research on foundation models for biosignal analysis.
Motivation. Traditional machine-learning approaches to physiological sensing have relied on specialised models trained for narrow tasks such as epileptic seizure detection or sleep staging. These task-specific models require extensive labelled training data and often fail to generalise across different EEG contexts. Foundation models, pre-trained on large corpora and then fine-tuned for downstream tasks, offer a promising alternative: their learned representations can transfer to new tasks even when task-specific data is limited.
Approach. The project leverages Neuro-GPT, which combines an EEG encoder with a GPT-based decoder. The architecture allows multiple fine-tuning strategies: the full Encoder+GPT pipeline retains both components for maximum representational power, while the Encoder-Only variant uses just the pre-trained encoder, offering faster inference at the cost of some expressiveness. Both strategies are evaluated on the SAM 40 open stress dataset comprising recordings from 40 subjects under controlled low-stress and high-stress conditions.
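The two fine-tuning strategies can be illustrated with a minimal NumPy sketch. This is a hypothetical stand-in, not the actual Neuro-GPT code: the dimensions, the linear encoder/decoder stand-ins, and the classification head are all assumptions made for illustration; the real model uses learned convolutional and transformer components.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions; the real Neuro-GPT configuration differs.
N_CHANNELS, N_SAMPLES, EMBED_DIM, N_CLASSES = 19, 500, 64, 2

def encoder(x, W_enc):
    """Stand-in for the pre-trained EEG encoder: flatten + linear + ReLU."""
    h = x.reshape(x.shape[0], -1) @ W_enc
    return np.maximum(h, 0.0)

def gpt_decoder(h, W_dec):
    """Stand-in for the GPT decoder (here just a nonlinear projection)."""
    return np.tanh(h @ W_dec)

def classify(h, W_cls):
    """Linear head mapping features to low-stress / high-stress logits."""
    logits = h @ W_cls
    return logits.argmax(axis=1)

# Random weights standing in for pre-trained / fine-tuned parameters.
W_enc = rng.normal(size=(N_CHANNELS * N_SAMPLES, EMBED_DIM)) * 0.01
W_dec = rng.normal(size=(EMBED_DIM, EMBED_DIM)) * 0.1
W_cls = rng.normal(size=(EMBED_DIM, N_CLASSES))

batch = rng.normal(size=(8, N_CHANNELS, N_SAMPLES))  # 8 EEG epochs

# Strategy 1: Encoder+GPT -- features pass through both components.
preds_full = classify(gpt_decoder(encoder(batch, W_enc), W_dec), W_cls)

# Strategy 2: Encoder-Only -- drop the decoder for faster inference.
preds_enc = classify(encoder(batch, W_enc), W_cls)

print(preds_full.shape, preds_enc.shape)  # both (8,)
```

The sketch makes the trade-off concrete: the Encoder-Only path skips one component entirely, reducing compute per epoch, while the Encoder+GPT path retains the decoder's additional representational capacity.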
Results. The fine-tuned foundation model achieves 74.4% average accuracy in binary stress classification. Comparisons with traditional baselines including Support Vector Machines, Random Forests, and conventional neural networks highlight the strengths and current limitations of the foundation model approach, offering practical guidance for researchers applying large pre-trained models to biosignal classification.
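A traditional baseline of the kind benchmarked here can be sketched with scikit-learn. The feature matrix below is synthetic random data standing in for hand-crafted EEG features (e.g. per-channel band powers); the actual study's feature extraction and hyperparameters are not specified in this summary, so everything beyond the classifier names is an assumption.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)

# Synthetic stand-in features: 80 epochs x 95 features
# (e.g. 5 frequency bands x 19 channels, flattened).
X = rng.normal(size=(80, 95))
y = rng.integers(0, 2, size=80)  # binary low-stress / high-stress labels

for name, clf in [
    ("SVM", SVC(kernel="rbf")),
    ("Random Forest", RandomForestClassifier(n_estimators=100, random_state=0)),
]:
    scores = cross_val_score(clf, X, y, cv=5)  # 5-fold cross-validation
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```

On random features like these the baselines hover near chance (~50%); the point of the real comparison is how much hand-crafted features plus classical classifiers close the gap to the fine-tuned foundation model's 74.4%.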