Cluster Permutation

32443683-29c3e71a-c300-11e7-88c0-fc26d36b2fc8.png

With electrical information measured at 1000s of time points for 100s of sensors, EEG experiments typically produce very high dimensional data. Whenever there is analyses with higher dimensions than the subject pool, the results may fail to fit additional data or predict future observations reliably. In other words, when numerous individual tests with the p < 0.05 threshold are conducted, the actual error rate greatly exceeds the nominal rate (5%). Correction for multiple comparisons must be applied, however many of these methods reduce power and curtail the likelihood of revealing a true effect – if there is one. Although increasing the subject pool to an appropriate level is not a feasible option, there are a few ways in which statistics can be performed on EEG data in a valid way.  One way to do this is to have certain electrodes or time points of interest set a priori. Another approach is to group certain electrodes and time points a priori and determine which parameters are significant with a correction that takes into account the smaller number of tests that are done. However, if a more exploratory approach is required, cluster permutation analyses can address the multiple comparisons problem.

Cluster permutation relies on the assumption that true effects should be clustered in both time and space and has two major components:

1.     One component is the cluster‐forming algorithm, which reduces the high dimensional data into smaller units based on spatio-temporal clustering.

2.     The other requires a null hypothesis, against which the observed data is compared to obtain p-values using permutation tests. Performing a full permutation test would be computationally intractable. However, a special class of approximations, Monte‐Carlo sampling, can be done and yield satisfactory results. The Monte-Carlo simulation repeats random sampling to better determine the underlying correlation.

It is important to recognize that because p-values are determined from cluster level statistics, thee p-value of a cluster does not necessarily represent that of a single member of that cluster. Thus, cluster based statistics only provide weak [family-wise] error rate control, and other statistics would need to be performed to determine more precise location and timing of the effect.

The specific parameters that we used for cluster permutation are outlined below. Cluster‐based permutation tests do not, however, control the false alarm rate at the level of the (channel, frequency, time)‐triplets or the (channel, frequency)‐pairs.  According to a review of permutation tests by Groppe et al., (2007):

“It is important to note that because p values are derived from cluster level statistics, the p value of a cluster may not be representative of any single member of that cluster. For example, if the p value for a cluster is 5%, one cannot be 95% certain that any single member of that cluster is itself significant […]. One is only 95% certain that there is some effect in the data. Technically, this means that cluster‐based tests provide only weak [family‐wise error rate] control”.

Specification Description Selected Value
cfg.latency Whatever time range you expect to see changes in (in seconds after event onset) Variable
cfg.frequency The frequency(s) at which we expect to see differences Variable
cfg.method We select monte carlo stimulation to sample several times and increase the accuracy. ‘montecarlo’
cfg.statistic What statistical method is used? The one we selected calculates the dependent samples T-statistic ‘ft_statfun_depsamplesT’
cfg.clusteralpha The alpha threshold for each cluster (two tailed) 0.025 or lower
cfg.correctm The method for correction Cluster
cfg.minnbchan The minimum number of channels in a cluster 2
cfg.alpha The overall alpha threshold 0.025 or lower
cfg.numrandomization The number of randomizations in the monte carlo stimulation. Over 1000 is ideal 1000+
cfg_neighb.method This parameter specifies how to construct the neighborhood. Triangulation selects the nearest direct neighbors whereas distance selects the electrodes within a 3-D Euclidean distance ‘distance’

The code used for this is shown below. You will need to have Fieldtrip downloaded, and it can be tricky getting data for both ERPs and fieldtrip in the right format to use. The tutorial on time frequency analysis will outline how to do it for time frequency analysis. If you need help applying cluster permutation on your ERPs directly, please reach out to me and I can provide some guidance!

avgTFRList = []; 
parentfolder = 'where_files_are_located';
fileList = dir(fullfile(parentfolder, 'what_file_names_start_with*'));
for s = 1:length(fileList)
    subject = fileList(s).name;
    subjname = char(extractAfter(subject, '_'));
    if startsWith(subjname,'T')
        avgTFRList{1, end+1} = subject;
    end
end

cfg = [];
cfg.channel          = {'all'}; %put in different channels here
cfg.latency          = [0 1.8];
cfg.frequency        = 20;
cfg.method           = 'montecarlo';
cfg.statistic        = 'ft_statfun_depsamplesT';
cfg.correctm         = 'cluster';
cfg.clusteralpha     = 0.05;
cfg.clusterstatistic = 'maxsum';
cfg.minnbchan        = 2;
cfg.tail             = 0;
cfg.clustertail      = 0;
cfg.alpha            = 0.025;
cfg.numrandomization = 2000;
% specifies with which sensors other sensors can form clusters
cfg_neighb.method    = 'distance';
cfg.neighbours       = ft_prepare_neighbours(cfg_neighb, file1);

subj = 10;
design = zeros(2,2*subj);
for i = 1:subj
design(1,i) = i;
end
for i = 1:subj
design(1,subj+i) = i;
end
design(2,1:subj)        = 1;
design(2,subj+1:2*subj) = 2;

cfg.design   = design;
cfg.uvar     = 1;
cfg.ivar     = 2;

[stat] = ft_freqstatistics(cfg, file1, file2)