Preprocessing

When EEG data is collected, it is contaminated by both external and internal noise. Sources include electrical signals in the room and electrical activity from muscle movements, heartbeats and button presses, among many other confounding factors. To conduct a proper EEG study, it is critical to preprocess the data so that you keep as much brain signal as possible while minimizing the effects of external and internal noise.

First, determine what format your EEG data was collected in so that it can be imported into EEGLAB, the MATLAB toolbox used here for preprocessing. The steps followed are outlined in Figure 1 and discussed in more detail below.

Figure 1: Preprocessing steps that we outline

Filtering

EEG signals are composed of multiple frequencies. Some of these frequencies are not physiologically meaningful, and removing them can greatly increase the signal-to-noise ratio. Based on recommendations in “An Introduction to the Event-Related Potential Technique”, a band-pass filter of 0.05 Hz to 30 Hz was selected. In addition, because the alternating current from electrical outlets in North America oscillates at 60 Hz, a notch filter was applied to further attenuate this frequency in the data.
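To make the idea of a pass band concrete, here is a minimal sketch in base MATLAB. It is not the ERPLAB filter used in this pipeline: it applies an idealized frequency-domain mask to a synthetic signal, and the 250 Hz sampling rate, the 10 Hz "brain" rhythm and the 60 Hz line-noise amplitude are all assumptions chosen for illustration.

```matlab
% Idealized band-pass illustration (NOT the ERPLAB filter used in the pipeline).
fs = 250;                                   % assumed sampling rate in Hz
t  = (0:fs*4-1) / fs;                       % 4 s of samples
x  = sin(2*pi*10*t) + 0.5*sin(2*pi*60*t);   % 10 Hz rhythm plus 60 Hz line noise

N = numel(x);
X = fft(x);
f = (0:N-1) * fs / N;                       % frequency of each FFT bin
f = min(f, fs - f);                         % fold mirrored bins onto positive frequencies
keep = (f >= 0.05) & (f <= 30);             % pass band from the text: 0.05 Hz to 30 Hz
y = real(ifft(X .* keep));                  % zero every bin outside the pass band

% Amplitude of a single frequency in a signal (exact for whole numbers of cycles)
amp = @(sig, freq) 2 * abs(sum(sig .* exp(-2i*pi*freq*t))) / N;

fprintf('10 Hz amplitude before/after: %.3f / %.3f\n', amp(x, 10), amp(y, 10));
fprintf('60 Hz amplitude before/after: %.3f / %.3f\n', amp(x, 60), amp(y, 60));
```

The 10 Hz component passes through unchanged while the 60 Hz component is removed. Real filters have a gradual roll-off rather than a hard cut, which is one reason a separate notch at 60 Hz can still be useful.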

Identify and Interpolate Bad Channels

With high-density EEG, there are often several electrodes that either fail to make good contact with the scalp or are contaminated by artifactual noise during data collection. We identify these electrodes using an estimate implemented in the “FASTER” algorithm. FASTER stands for Fully Automated Statistical Thresholding for EEG artifact Rejection and uses the variance, amplitude range, median gradient and channel deviation to threshold the data. We flag a channel as noisy when its Z-score on these properties is greater than 3. The flagged ‘bad’ channels are then replaced using spherical interpolation.

It is recommended that you validate how well this function works by identifying bad channels manually on a few subjects and comparing the results. Every electrode flagged by FASTER should also look bad on visual inspection; however, visual inspection will usually turn up additional noisy channels that FASTER did not flag. This is expected: it is beneficial to keep the automatic threshold conservative, since automatic thresholding relies on correlations between channels (i.e. if ‘bad’ electrodes are correlated with each other, they will not be picked up). After the automatically flagged channels are interpolated, each dataset is inspected visually for additional noisy channels that may have been missed. These channels are then also interpolated using spherical interpolation. In cases where more than 10% of channels were interpolated (more than 25 of the 256 channels), the subject was removed from the dataset.
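The thresholding idea can be sketched in a few lines of base MATLAB. This is a simplified illustration, not the FASTER implementation: it uses only two of the properties named above (variance and amplitude range), and the 32-channel synthetic recording with one artificially noisy channel is an assumption made for the example.

```matlab
% Simplified z-score thresholding on per-channel statistics (not FASTER itself).
nChan = 32;                         % assumed channel count for the toy example
nSamp = 1000;
t = (0:nSamp-1) / 250;              % assumed 250 Hz sampling rate

data = zeros(nChan, nSamp);
for c = 1:nChan
    data(c, :) = sin(2*pi*10*t + c);    % same amplitude, different phase per channel
end
data(7, :) = 20 * data(7, :);           % make channel 7 artificially noisy

chanVar   = var(data, 0, 2);                        % variance of each channel
chanRange = max(data, [], 2) - min(data, [], 2);    % amplitude range of each channel

zscoreOf = @(v) (v - mean(v)) ./ std(v);            % z-score across channels
isBad = abs(zscoreOf(chanVar)) > 3 | abs(zscoreOf(chanRange)) > 3;
badChannels = find(isBad)';

% Exclusion rule from the text: drop the subject if more than 10% of channels are bad
excludeSubject = numel(badChannels) > 0.10 * nChan;

disp(badChannels);
disp(excludeSubject);
```

Because the z-score is computed across channels, a group of channels that are all noisy in the same way inflates the mean and standard deviation and can slip under the threshold, which is why the visual check described above is still needed.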

Re-referencing to average

It is critical to set a reference in EEG because, in nature, only the difference between two potentials can be measured [56,57]. Selecting an appropriate reference against which to compare all the electrodes depends on three critical considerations:

1)    The choice of reference should not affect the source reconstruction.

2)    The reference electrode should not be positioned close to an area where we expect our main effects to be.

3)    The reference should not sit in either hemisphere, as this would introduce an undesired laterality bias in the data.
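An average reference, as named in this section's heading, amounts to subtracting the mean across all channels from every channel at each time point. The following is a minimal sketch on a made-up 3-channel, 3-sample matrix; the numbers are arbitrary and only serve to show the arithmetic.

```matlab
% Average reference on a toy matrix (rows = channels, columns = time points).
data = [ 4  6  8;
         1  2  3;
        -2  1  4];

% Subtract the across-channel mean at every time point.
% Implicit expansion requires MATLAB R2016b or later.
reref = data - mean(data, 1);

disp(reref);
disp(sum(reref, 1));    % each time point now sums to zero across channels
```

Differences between any two channels are unchanged by this operation, for example `reref(1,:) - reref(2,:)` equals `data(1,:) - data(2,:)`. Only the common offset shared by all channels is removed.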

Epoch the data

Events were added to the EEG data using the behavioral output files. The EEG data was then clipped around the events of interest, retaining 500 ms before and 1000 ms after each event. This window was selected because it allows enough time to see the ERP components without extending into the next stimulus. Epoching was done using the ERPLAB and EEGLAB toolboxes in MATLAB.
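As an illustration of what clipping around events means in terms of samples, here is a base MATLAB sketch. It does not use the ERPLAB or EEGLAB epoching functions; the 250 Hz sampling rate, the toy data and the event latencies are assumptions, and every event is assumed to lie far enough from the edges of the recording for a full window to fit.

```matlab
% Cut fixed windows of -500 ms to +1000 ms around event latencies (toy example).
fs    = 250;                             % assumed sampling rate in Hz
nChan = 4;
nSamp = 5000;
data  = repmat(1:nSamp, nChan, 1);       % toy data: each value equals its sample index
eventSamples = [1000 2500 4000];         % hypothetical event latencies, in samples

pre     = round(0.500 * fs);             % samples kept before the event
post    = round(1.000 * fs);             % samples kept from the event onward
offsets = -pre : post - 1;               % sample offsets relative to the event

epochs = zeros(nChan, numel(offsets), numel(eventSamples));
for e = 1:numel(eventSamples)
    idx = eventSamples(e) + offsets;     % absolute sample indices for this epoch
    epochs(:, :, e) = data(:, idx);
end

timesMs = offsets / fs * 1000;           % time axis of each epoch in milliseconds
disp(size(epochs));
```

The result is a channels x time x epochs array, which is the same layout EEGLAB uses for epoched data.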

Manually Remove bad epochs

The clipped data was manually inspected for bad epochs, which were marked and removed. If more than half of the epochs were removed in any condition, the data for that participant was removed.
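The exclusion rule above can be checked with a few lines of base MATLAB. This is a sketch under assumed inputs: the condition labels and rejection marks below are made up, and in practice they would come from your own epoch bookkeeping.

```matlab
% Per-condition check of the "more than half of the epochs removed" rule.
condition = [1 1 1 1 2 2 2 2 2 2];              % hypothetical condition label per epoch
rejected  = logical([1 1 1 0 0 1 0 0 0 0]);     % true where an epoch was marked bad

conds = unique(condition);
fracRemoved = zeros(size(conds));
for k = 1:numel(conds)
    % Fraction of this condition's epochs that were rejected
    fracRemoved(k) = mean(rejected(condition == conds(k)));
end

% Exclude the participant if ANY condition lost more than half of its epochs
excludeParticipant = any(fracRemoved > 0.5);

disp(fracRemoved);
disp(excludeParticipant);
```

Here condition 1 loses three of its four epochs, so the participant would be excluded even though condition 2 is mostly clean.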

ICA/PCA to get rid of eyeblinks

Independent Component Analysis (ICA) is an effective way to separate EEG data into neural activity and artifact because it decomposes the data into statistically independent components. With high-density EEG, it is important to combine ICA with PCA to reduce the number of components. Four factors should be considered when identifying which components are the result of artifacts:

1)    The scalp distribution of the component

2)    The original EEG signal overlaid on the signal recovered after the selected component is removed

3)    The time-locked activity of the component per trial

4)    The power spectrum of the component
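The PCA reduction step mentioned above can be illustrated with base MATLAB. The sketch below shows only the dimensionality reduction, not ICA itself, and it is not the code used in this pipeline: the 8-channel recording is synthetic, built from three assumed sources, and the 99.9% variance cut-off is an arbitrary choice for the example.

```matlab
% PCA via SVD: find how many components are needed to describe the data.
fs = 250;                                   % assumed sampling rate in Hz
t  = (0:fs*2-1) / fs;                       % 2 s of samples
S  = [sin(2*pi*5*t); sin(2*pi*10*t); sin(2*pi*20*t)];   % three synthetic sources

A = [1  1  1;                               % hypothetical mixing matrix:
     1 -1  1;                               % how each source projects
     1  1 -1;                               % onto each of 8 channels
     1 -1 -1;
     1  1  1;
     1 -1  1;
     1  1 -1;
     1 -1 -1];
data = A * S;                               % 8 channels x 500 samples

centered = data - mean(data, 2);            % remove each channel's mean
[U, Sv, ~] = svd(centered, 'econ');
s = diag(Sv);
explained = cumsum(s.^2) / sum(s.^2);       % cumulative fraction of variance

k = find(explained >= 0.999, 1);            % smallest number of components needed
reduced = U(:, 1:k)' * centered;            % data projected onto k components

disp(k);
disp(size(reduced));
```

Although there are 8 channels, only 3 components carry variance here, so ICA would be run on 3 dimensions instead of 8. In EEGLAB, the `runica` algorithm accepts a `'pca'` option that performs a comparable reduction before the decomposition.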

The preprocessing script starts by adding EEGLAB to the path and defining the files and folders it will use:

addpath(genpath('/MATLAB/eeglab2019_0'));
eeglab;
% make sure that the .sfp (electrode locations) file is in the specified folder
sfpfile = 'GSN-HydroCel-257.sfp';
elecSetup = {'1:256' 'Cz'};
save_everything = 1; % set to 0 if you don't want the intermediate steps saved
DependentFiles = 'filepath_that_contains_helper_functions';
parentfolder = 'filepath_with_raw_data/';
stepsPath = 'filepath_to_contain_steps_of_preprocessing/';
outputFilesPath = 'filepath_with_preprocessed_data/';
fileList = dir(fullfile(parentfolder, '*.raw')); % one .raw file per subject
numsubjects = numel(fileList);
ALLERP = buildERPstruct([]); % empty ERPLAB structure
CURRENTERP = 0;

Next, import the data for each subject as follows.

for s = 1:numsubjects
    subject = fileList(s).name;
    subjname = char(extractBefore(subject, '_')); % subject ID = file name up to the first underscore
    subjectfile = fullfile(parentfolder, subject);
    EEG = pop_readegi(subjectfile, [], [], 'auto');
    EEG = eeg_checkset(EEG);
    EEG.setname = subjname;
    EEG = pop_saveset(EEG, 'filename', subjname, 'filepath', stepsPath);
    eventspath = 'if_you_have_modified_events/';
    EEG = pop_importevent(EEG, 'append', 'no', 'event', [eventspath subjname 'events.csv'], ...
        'fields', {'number' 'type' 'latency' 'uevent'}, 'skipline', 1, 'timeunit', 0.001, 'align', 0);
    % the loop body continues with the steps shown below

The rest of the code for this can be requested if you email me via the contact page. You can apply the 60 Hz notch filter using the following code:

    EEG = pop_basicfilter(EEG, 1:256, 'Boundary', 'epoc', ...
        'Cutoff', 60, 'Design', 'notch', 'Filter', 'PMnotch', ...
        'Order', 180, 'RemoveDC', 'on');

To identify bad channels, first download the FASTER algorithm. You can then write a wrapper function that runs FASTER in MATLAB (amnaFASTER in the code below) and call it from your preprocessing script. The following code also plots the data so that you can identify bad channels visually.

    [badChannels, EEG] = amnaFASTER(EEG, outputFilesPath, subjname);
    BadChannelsPath = 'save_bad_channels/';
    save([BadChannelsPath subjname], 'badChannels');
    pop_spectopo(EEG); % plot channel spectra for visual inspection
    badChannelsVis = input('Enter BAD electrodes here as a vector: ');
    disp(' ');
    YesorNo = input('Definitely interpolate these channels? Press 1 for yes, press 0 for no: ');
    if YesorNo == 1
        % assumption: confirmed visual picks are added to the FASTER list
        badChannels = union(badChannels(:)', badChannelsVis(:)');
    end
    EEG = pop_interp(EEG, badChannels, 'spherical');

To re-reference to the average, use the following code:

    EEG = pop_reref( EEG, []);

For the ICA and epoching steps, see the ICA/PCA section and the ERP section.