ICA/PCA
Artifacts are abundant in EEG data. Electrodes collect activity from a variety of sources: including external noise generated from monitors, AC/DC currents, keyboards or phones as well as from a participant’s eye movements, heartbeat and actions. Detecting these artifacts before analysis is critical because they are often much higher in amplitude than brain signals and as a consequence can greatly dilute the results of an experiment. Independent component analysis (ICA) has been proven to be an effective way to isolate signals from independent brain and non-brain sources. However, EEG datasets often contain electrical activity measured at 100s of thousands of time points in several electrodes, across multiple subjects and conditions. Making sense of this high dimensional data can be a daunting task that is both extraordinarily time consuming for ICA, and often leaves the data too granular to correctly identify which components are related to artifacts. Principle component analysis (PCA) is a dimensionality reduction technique that can reduce the number of dimensions that are fed into the ICA beforehand. Although there are several reasons why PCA should be avoided before ICA, when using high density EEG, it is often a better choice. For more information on this see the paper by Artoni et. al. (2018).
As discussed above, ICA is an effective way to separate EEG data into neural activity and artifact because it isolates the data into components that are unique. As a general rule, finding stable components from N channels typically requires far more than data sample points. Because our data was too short for this given the high number of electrodes, we used PCA to narrow it down to 32 components and EEGlab will run ICA on those components. If you follow the tutorials on preprocessing and ERPs, you should already have the EEGlab toolbox in matlab. To conduct ICA/PCA on your data, the following code can be applied. This will output files that have ICA components saved for 32 components.
clear all addpath('...pathtomatlab/MATLAB/eeglab2019_0'); eeglab sfpfile ='GSN-HydroCel-257.sfp'; elecSetup = {'1:256' 'Cz'}; parentfolder = '...where_data_is_located'; fileList = dir(fullfile(parentfolder, '.set')); numsubjects=length(fileList) ALLERP = buildERPstruct([]); CURRENTERP = 0; for s=1:numsubjects subject = fileList(s).name; subjname = char(extractBefore(subject, '_')) fprintf('\n\n\n*** Processing subject %d (%s) ***\n\n\n', s, subject); EEG = pop_loadset('filename',subject,'filepath',parentfolder); EEG = pop_runica(EEG, 'icatype', 'binica', 'pca',32); EEG.setname = [subjname '_ica.set']; EEG = eeg_checkset(EEG); EEG = pop_saveset( EEG, 'filename', EEG.setname,'filepath', parentfolder); [ALLEEG EEG] = eeg_store(ALLEEG, EEG, CURRENTSET); fprintf('ICA saved and complete.\n\n'); end
Once you have the ICA, you will need to go through and look at each component to decide which ones are artifacts and which ones are related to brain activity. We discuss 4 different factors that should be considered when selecting which components are a result of artifacts:
1) The scalp distribution of the component
2) Inspecting the original EEG signal overlaid on the recovered signal after the selected component is removed
3) Looking at the time locked activity of the component per trial
4) The power spectrum of the component
We discuss how to identify each of these components in a few examples of artifacts, with images to reference them
EKG Artifact:
The topographical map for EKG/heart beat artifacts typically look like a gradient from left to right as in figure 1 (and usually start from the top corner to the bottom corner). For heartbeats, we see electrical activity peak periodically every ~1000ms as in figure 2 (can range from 500 to 1200 ms). In addition, the power spectrum activity has small peaks in power around 4 and 6 Hz, and you should not see a characteristic alpha power that would indicate that there is brain activity (see figure 3). The last way to confirm that this is the right component is to plot the EEG that is reconstructed after the specified component is removed on top of the original EEG as in figure 4. The reconstructed signal should be very close to the original EEG except for where you expect the deflections from the specific artifact to occur.
The code needed to get these components to appear is shown below:
fileList = dir(fullfile(parentfolder, '*_ica.set')); numsubjects=length(fileList) ALLERP = buildERPstruct([]); CURRENTERP = 0; for s=1:numsubjects subject = fileList(s).name; subjname = char(extractBefore(subject, '_')) fprintf('\n\n\n*** Processing subject %d (%s) ***\n\n\n', s, subject); EEG = pop_loadset('filename',subject,'filepath',parentfolder); eeglab redraw pop_selectcomps(EEG, [1:32] ); visualcheckcomp = input(['Press 1 to reject by visual inspection, press 0 to SKIP visual inspection and continue:']); while visualcheckcomp==1 num_comp = input(['Which component would you like to look at?']); EEG = pop_subcomp( EEG, [num_comp], 2); visualcheckcomp = input(['Press 1 to reject by visual inspection, press 0 to SKIP visual inspection and continue:']); end visualcheckcomp = input(['Press 1 to reject by visual inspection, press 0 to SKIP visual inspection and continue:']); if visualcheckcomp==0 comp_to_remove = input(['Which components would you like to remove?']); EEG = pop_subcomp( EEG, comp_to_remove, 0); savecomptoremove = ['comp_to_remove' subjname '.mat'] save(savecomptoremove, 'comp_to_remove') end EEG.setname = [subjname '_ica_rej.set']; % Save _be EEG = eeg_checkset(EEG); EEG = pop_saveset( EEG, 'filename', EEG.setname,'filepath', parentfolder); [ALLEEG EEG] = eeg_store(ALLEEG, EEG, CURRENTSET); fprintf('ICA saved and complete.\n\n'); eeglab redraw moveonwithloop = input(['Press any Key to Move ON']); close all end
Figures 4-6 show the characteristic properties you should look for for additional artifacts that are typically found in EEG data.
Methodological Concerns
One study has shown that using PCA to limit the number of dimensions before applying ICA can adversely affect the stability of ICA components under repeated decomposition. However, since we have shown this tutorial using high density EEG, doing an ICA without a PCA is not feasible. Not only would this drastically increase computational time, but correctly identifying each artifact component when they are split into numerous sub-components would not be possible. Another alternative to removing epochs that contain artifacts would reduce the power so significantly that it would render the results meaningless.