ICA/PCA

Artifacts are abundant in EEG data. Electrodes collect activity from a variety of sources: including external noise generated from monitors, AC/DC currents, keyboards or phones as well as from a participant’s eye movements, heartbeat and actions. Detecting these artifacts before analysis is critical because they are often much higher in amplitude than brain signals and as a consequence can greatly dilute the results of an experiment. Independent component analysis (ICA) has been proven to be an effective way to isolate signals from independent brain and non-brain sources. However, EEG datasets often contain electrical activity measured at 100s of thousands of time points in several electrodes, across multiple subjects and conditions. Making sense of this high dimensional data can be a daunting task that is both extraordinarily time consuming for ICA, and often leaves the data too granular to correctly identify which components are related to artifacts. Principle component analysis (PCA) is a dimensionality reduction technique that can reduce the number of dimensions that are fed into the ICA beforehand. Although there are several reasons why PCA should be avoided before ICA, when using high density EEG, it is often a better choice. For more information on this see the paper by Artoni et. al. (2018).

As discussed above, ICA is an effective way to separate EEG data into neural activity and artifact because it isolates the data into components that are unique. As a general rule, finding stable components from N channels typically requires far more than data sample points. Because our data was too short for this given the high number of electrodes, we used PCA to narrow it down to 32 components and EEGlab will run ICA on those components. If you follow the tutorials on preprocessing and ERPs, you should already have the EEGlab toolbox in matlab. To conduct ICA/PCA on your data, the following code can be applied. This will output files that have ICA components saved for 32 components.

clear all
addpath('...pathtomatlab/MATLAB/eeglab2019_0');
eeglab
sfpfile ='GSN-HydroCel-257.sfp';
elecSetup = {'1:256' 'Cz'};
parentfolder = '...where_data_is_located';

fileList = dir(fullfile(parentfolder, '.set'));
numsubjects=length(fileList)
ALLERP = buildERPstruct([]);
CURRENTERP = 0;


 for s=1:numsubjects
    subject = fileList(s).name;
    subjname = char(extractBefore(subject, '_'))
    fprintf('\n\n\n*** Processing subject %d (%s) ***\n\n\n', s, subject);
    EEG = pop_loadset('filename',subject,'filepath',parentfolder);
    EEG = pop_runica(EEG, 'icatype', 'binica', 'pca',32);
    EEG.setname = [subjname '_ica.set'];
    EEG = eeg_checkset(EEG);
    EEG = pop_saveset( EEG, 'filename', EEG.setname,'filepath', parentfolder);
    [ALLEEG EEG] = eeg_store(ALLEEG, EEG, CURRENTSET);
    fprintf('ICA saved and complete.\n\n');
 end

Once you have the ICA, you will need to go through and look at each component to decide which ones are artifacts and which ones are related to brain activity. We discuss 4 different factors that should be considered when selecting which components are a result of artifacts:

1) The scalp distribution of the component

2) Inspecting the original EEG signal overlaid on the recovered signal after the selected component is removed

Figure 1: A topographical map of what a heart blink artifact looks like when detected with an EEG

3) Looking at the time locked activity of the component per trial

4) The power spectrum of the component

We discuss how to identify each of these components in a few examples of artifacts, with images to reference them

EKG Artifact:

The topographical map for EKG/heart beat artifacts typically look like a gradient from left to right as in figure 1 (and usually start from the top corner to the bottom corner). For heartbeats, we see electrical activity peak periodically every ~1000ms as in figure 2 (can range from 500 to 1200 ms). In addition, the power spectrum activity has small peaks in power around 4 and 6 Hz, and you should not see a characteristic alpha power that would indicate that there is brain activity (see figure 3). The last way to confirm that this is the right component is to plot the EEG that is reconstructed after the specified component is removed on top of the original EEG as in figure 4. The reconstructed signal should be very close to the original EEG except for where you expect the deflections from the specific artifact to occur.

Figure 3: Characteristic power spectrum for the component that we suspect represents heart beats.

Figure 4: The reconstructed EEG signal with the EKG artifact component removed is shown in red, while the original component is in blue. The blue circles show where the EKG deflection occurs and it is clear that once this component is removed, you d… — Figure 4: The reconstructed EEG signal with the EKG artifact component removed is shown in red, while the original component is in blue. The blue circles show where the EKG deflection occurs and it is clear that once this component is removed, you don’t see the deflection anymore.

The code needed to get these components to appear is shown below:

fileList = dir(fullfile(parentfolder, '*_ica.set'));
numsubjects=length(fileList)
ALLERP = buildERPstruct([]);
CURRENTERP = 0;

 for s=1:numsubjects
    subject = fileList(s).name;
    subjname = char(extractBefore(subject, '_'))
    fprintf('\n\n\n*** Processing subject %d (%s) ***\n\n\n', s, subject);
    EEG = pop_loadset('filename',subject,'filepath',parentfolder);
    eeglab redraw
    pop_selectcomps(EEG, [1:32] );
    visualcheckcomp = input(['Press 1 to reject by visual inspection, press 0 to SKIP visual inspection and continue:']);                
    while visualcheckcomp==1
       num_comp = input(['Which component would you like to look at?']);
       EEG = pop_subcomp( EEG, [num_comp], 2); 
       visualcheckcomp = input(['Press 1 to reject by visual inspection, press 0 to SKIP visual inspection and continue:']);                
    end
    visualcheckcomp = input(['Press 1 to reject by visual inspection, press 0 to SKIP visual inspection and continue:']);                

    if visualcheckcomp==0
        comp_to_remove = input(['Which components would you like to remove?']);
        EEG = pop_subcomp( EEG, comp_to_remove, 0);
        savecomptoremove = ['comp_to_remove' subjname '.mat']
        save(savecomptoremove, 'comp_to_remove')
    end
    EEG.setname = [subjname '_ica_rej.set']; % Save _be
    EEG = eeg_checkset(EEG);
    EEG = pop_saveset( EEG, 'filename', EEG.setname,'filepath', parentfolder);
    [ALLEEG EEG] = eeg_store(ALLEEG, EEG, CURRENTSET);
    fprintf('ICA saved and complete.\n\n');
    eeglab redraw
    moveonwithloop = input(['Press any Key to Move ON']);  
    close all
 end

Figures 4-6 show the characteristic properties you should look for for additional artifacts that are typically found in EEG data.

Figure 5: Left-right eye movement components typically show a topographical map whereby the activity with opposing dipoles are found on the left and right frontal electrodes. In addition, depending on the task you typically see that there are certai… — Figure 5: Left-right eye movement components typically show a topographical map whereby the activity with opposing dipoles are found on the left and right frontal electrodes. In addition, depending on the task you typically see that there are certain times when eye movements are more common. Moreover, you should always expect that there is no characteristic alpha power peak (a peak at 10 Hz) in the power spectrum if the signal is independent of brain activity.

Figure 6: Eye blink components typically show a topographical map whereby the activity with the highest power is found at the frontal-central component of the head.

Figure 7: Electrical button press components typically show a topographical map whereby the activity is highly concentrated in one specific localized area of the brain. In addition, if you have an an event that corresponds to the button press in you… — Figure 7: Electrical button press components typically show a topographical map whereby the activity is highly concentrated in one specific localized area of the brain. In addition, if you have an an event that corresponds to the button press in your data you should expect to see the activity centered around or right next to the event. As always there should be no characteristic alpha power peak (a peak at 10 Hz) in the power spectrum if the signal is independent of brain activity.

Methodological Concerns

One study has shown that using PCA to limit the number of dimensions before applying ICA can adversely affect the stability of ICA components under repeated decomposition. However, since we have shown this tutorial using high density EEG, doing an ICA without a PCA is not feasible. Not only would this drastically increase computational time, but correctly identifying each artifact component when they are split into numerous sub-components would not be possible. Another alternative to removing epochs that contain artifacts would reduce the power so significantly that it would render the results meaningless.