Cluster Permutation

With electrical information measured at 1000s of time points for 100s of sensors, EEG experiments typically produce very high dimensional data. Whenever there is analyses with higher dimensions than the subject pool, the results may fail to fit additional data or predict future observations reliably. In other words, when numerous individual tests with the p < 0.05 threshold are conducted, the actual error rate greatly exceeds the nominal rate (5%). Correction for multiple comparisons must be applied, however many of these methods reduce power and curtail the likelihood of revealing a true effect – if there is one. Although increasing the subject pool to an appropriate level is not a feasible option, there are a few ways in which statistics can be performed on EEG data in a valid way. One way to do this is to have certain electrodes or time points of interest set a priori. Another approach is to group certain electrodes and time points a priori and determine which parameters are significant with a correction that takes into account the smaller number of tests that are done. However, if a more exploratory approach is required, cluster permutation analyses can address the multiple comparisons problem.

Cluster permutation relies on the assumption that true effects should be clustered in both time and space and has two major components:

1. One component is the cluster‐forming algorithm, which reduces the high dimensional data into smaller units based on spatio-temporal clustering.

2. The other requires a null hypothesis, against which the observed data is compared to obtain p-values using permutation tests. Performing a full permutation test would be computationally intractable. However, a special class of approximations, Monte‐Carlo sampling, can be done and yield satisfactory results. The Monte-Carlo simulation repeats random sampling to better determine the underlying correlation.

It is important to recognize that because p-values are determined from cluster level statistics, thee p-value of a cluster does not necessarily represent that of a single member of that cluster. Thus, cluster based statistics only provide weak [family-wise] error rate control, and other statistics would need to be performed to determine more precise location and timing of the effect.

The specific parameters that we used for cluster permutation are outlined below. Cluster‐based permutation tests do not, however, control the false alarm rate at the level of the (channel, frequency, time)‐triplets or the (channel, frequency)‐pairs. According to a review of permutation tests by Groppe et al., (2007):

“It is important to note that because p values are derived from cluster level statistics, the p value of a cluster may not be representative of any single member of that cluster. For example, if the p value for a cluster is 5%, one cannot be 95% certain that any single member of that cluster is itself significant […]. One is only 95% certain that there is some effect in the data. Technically, this means that cluster‐based tests provide only weak [family‐wise error rate] control”.

Specification
Description
Selected Value






cfg.latency
Whatever time range you expect to see changes in (in seconds after event onset)
Variable


cfg.frequency
The frequency(s) at which we expect to see differences
Variable


cfg.method
We select monte carlo stimulation to sample several times and increase the accuracy.
‘montecarlo’


cfg.statistic
What statistical method is used? The one we selected calculates the dependent samples T-statistic
‘ft_statfun_depsamplesT’

cfg.clusteralpha
The alpha threshold for each cluster (two tailed)
0.025 or lower

cfg.correctm
The method for correction
Cluster

cfg.minnbchan
The minimum number of channels in a cluster
2

cfg.alpha
The overall alpha threshold
0.025 or lower

cfg.numrandomization
The number of randomizations in the monte carlo stimulation. Over 1000 is ideal
1000+

cfg_neighb.method
This parameter specifies how to construct the neighborhood. Triangulation selects the nearest direct neighbors whereas distance selects the electrodes within a 3-D Euclidean distance 
‘distance’

Specification	Description	Selected Value
cfg.latency	Whatever time range you expect to see changes in (in seconds after event onset)	Variable
cfg.frequency	The frequency(s) at which we expect to see differences	Variable
cfg.method	We select monte carlo stimulation to sample several times and increase the accuracy.	‘montecarlo’
cfg.statistic	What statistical method is used? The one we selected calculates the dependent samples T-statistic	‘ft_statfun_depsamplesT’
cfg.clusteralpha	The alpha threshold for each cluster (two tailed)	0.025 or lower
cfg.correctm	The method for correction	Cluster
cfg.minnbchan	The minimum number of channels in a cluster	2
cfg.alpha	The overall alpha threshold	0.025 or lower
cfg.numrandomization	The number of randomizations in the monte carlo stimulation. Over 1000 is ideal	1000+
cfg_neighb.method	This parameter specifies how to construct the neighborhood. Triangulation selects the nearest direct neighbors whereas distance selects the electrodes within a 3-D Euclidean distance	‘distance’

The code used for this is shown below. You will need to have Fieldtrip downloaded, and it can be tricky getting data for both ERPs and fieldtrip in the right format to use. The tutorial on time frequency analysis will outline how to do it for time frequency analysis. If you need help applying cluster permutation on your ERPs directly, please reach out to me and I can provide some guidance!

avgTFRList = []; 
parentfolder = 'where_files_are_located';
fileList = dir(fullfile(parentfolder, 'what_file_names_start_with*'));
for s = 1:length(fileList)
    subject = fileList(s).name;
    subjname = char(extractAfter(subject, '_'));
    if startsWith(subjname,'T')
        avgTFRList{1, end+1} = subject;
    end
end

cfg = [];
cfg.channel          = {'all'}; %put in different channels here
cfg.latency          = [0 1.8];
cfg.frequency        = 20;
cfg.method           = 'montecarlo';
cfg.statistic        = 'ft_statfun_depsamplesT';
cfg.correctm         = 'cluster';
cfg.clusteralpha     = 0.05;
cfg.clusterstatistic = 'maxsum';
cfg.minnbchan        = 2;
cfg.tail             = 0;
cfg.clustertail      = 0;
cfg.alpha            = 0.025;
cfg.numrandomization = 2000;
% specifies with which sensors other sensors can form clusters
cfg_neighb.method    = 'distance';
cfg.neighbours       = ft_prepare_neighbours(cfg_neighb, file1);

subj = 10;
design = zeros(2,2*subj);
for i = 1:subj
design(1,i) = i;
end
for i = 1:subj
design(1,subj+i) = i;
end
design(2,1:subj)        = 1;
design(2,subj+1:2*subj) = 2;

cfg.design   = design;
cfg.uvar     = 1;
cfg.ivar     = 2;

[stat] = ft_freqstatistics(cfg, file1, file2)