Cluster Permutation
With electrical activity measured at thousands of time points across hundreds of sensors, EEG experiments typically produce very high-dimensional data. Whenever an analysis has more dimensions than subjects, the results may fail to fit additional data or predict future observations reliably. In particular, when numerous individual tests are each conducted at the p < 0.05 threshold, the actual family-wise error rate (the chance of at least one false positive) greatly exceeds the nominal rate (5%). A correction for multiple comparisons must be applied; however, many of these methods reduce power and curtail the likelihood of revealing a true effect, if there is one. Although increasing the subject pool to an appropriate level is usually not feasible, there are a few valid ways to perform statistics on EEG data. One is to select electrodes or time points of interest a priori. Another is to group electrodes and time points a priori and test only those groups, applying a correction that accounts for the smaller number of tests. If a more exploratory approach is required, however, cluster permutation analyses can address the multiple comparisons problem.
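To give a feel for how quickly the error rate inflates, here is a minimal MATLAB sketch. It assumes the tests are independent, which EEG samples are not, so treat the numbers as an illustration rather than an estimate for real data. The variable names are made up for this example.

```matlab
% Chance of at least one false positive across m tests at alpha = 0.05.
% Assumption: the tests are independent (neighbouring EEG samples are correlated).
alpha  = 0.05;
nTests = [1 10 100 1000];
fwer   = 1 - (1 - alpha) .^ nTests;

for k = 1:numel(nTests)
    fprintf('%5d tests -> P(at least one false positive) = %.3f\n', nTests(k), fwer(k));
end
```

With only 10 independent tests the chance of at least one false positive is already about 40%.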
Cluster permutation relies on the assumption that true effects should be clustered in both time and space and has two major components:
1. One component is the cluster‐forming algorithm, which reduces the high dimensional data into smaller units based on spatio-temporal clustering.
2. The other is a permutation test, in which the observed cluster statistics are compared against a null distribution to obtain p-values. Enumerating every possible permutation would be computationally intractable. However, the null distribution can be approximated by Monte Carlo sampling, which yields satisfactory results: the condition labels are randomly shuffled many times, and the largest cluster statistic from each shuffle is collected to build the null distribution.
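As a rough illustration of the two components, the sketch below runs a cluster permutation test on simulated one-dimensional (time only) paired data in plain MATLAB, without FieldTrip. Everything in it is an assumption made for illustration: the simulated data, the hard-coded threshold `tcrit` (the two-tailed 0.05 critical t value for 9 degrees of freedom), and the random sign flips of each subject's difference as the permutation scheme for a within-subject design. FieldTrip additionally clusters across channels and frequencies, so this is not its implementation.

```matlab
% Toy data: paired differences (condition A minus condition B), subjects x time points.
rng(1);                                  % fixed seed so the run is repeatable
nSubj = 10;
nTime = 200;
d = randn(nSubj, nTime);                 % noise only
d(:, 81:120) = d(:, 81:120) + 1.5;       % injected effect at samples 81 to 120

tcrit = 2.262;                           % two-tailed 0.05 threshold for df = 9 (hard-coded assumption)
nPerm = 1000;                            % number of Monte Carlo draws
maxMass = zeros(1, nPerm + 1);           % entry 1 holds the observed statistic

for p = 0:nPerm
    if p == 0
        flip = ones(nSubj, 1);                    % observed data: no sign flips
    else
        flip = 2 * (rand(nSubj, 1) > 0.5) - 1;    % randomly swap the condition labels per subject
    end
    x = bsxfun(@times, d, flip);
    t = mean(x, 1) ./ (std(x, 0, 1) / sqrt(nSubj));   % dependent-samples t value per time point

    % Cluster forming: sum |t| over each run of adjacent supra-threshold samples
    % with the same sign, and keep the largest sum.
    best = 0;
    mass = 0;
    prevSign = 0;
    for k = 1:nTime
        s = sign(t(k)) * (abs(t(k)) > tcrit);     % -1, 0 or +1
        if s ~= 0 && s == prevSign
            mass = mass + abs(t(k));              % extend the current cluster
        elseif s ~= 0
            mass = abs(t(k));                     % start a new cluster
        else
            mass = 0;                             % below threshold: no cluster
        end
        best = max(best, mass);
        prevSign = s;
    end
    maxMass(p + 1) = best;
end

obsMass = maxMass(1);
pValue  = mean(maxMass(2:end) >= obsMass);        % share of shuffles at least as extreme
fprintf('Largest observed cluster mass = %.1f, Monte Carlo p = %.3f\n', obsMass, pValue);
```

The p-value belongs to the cluster as a whole: it says the largest observed cluster is unusual under shuffling, not that any particular time point inside it is significant.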
It is important to recognize that because p-values are determined from cluster-level statistics, the p-value of a cluster does not necessarily represent that of a single member of that cluster. Thus, cluster-based statistics provide only weak family-wise error rate control, and other statistics would need to be performed to determine the precise location and timing of the effect.
Cluster‐based permutation tests therefore do not control the false alarm rate at the level of the (channel, frequency, time)‐triplets or the (channel, frequency)‐pairs. According to a review of permutation tests by Groppe et al. (2007):
“It is important to note that because p values are derived from cluster level statistics, the p value of a cluster may not be representative of any single member of that cluster. For example, if the p value for a cluster is 5%, one cannot be 95% certain that any single member of that cluster is itself significant […]. One is only 95% certain that there is some effect in the data. Technically, this means that cluster‐based tests provide only weak [family‐wise error rate] control”.
The specific parameters that we used for cluster permutation are outlined below.
Specification | Description | Selected Value |
---|---|---|
cfg.latency | Whatever time range you expect to see changes in (in seconds after event onset) | Variable |
cfg.frequency | The frequency(s) at which we expect to see differences | Variable |
cfg.method | We select Monte Carlo simulation, which draws many random permutations to approximate the null distribution | ‘montecarlo’ |
cfg.statistic | What statistical method is used? The one we selected calculates the dependent samples T-statistic | ‘ft_statfun_depsamplesT’ |
cfg.clusteralpha | The alpha threshold (two tailed) applied to each sample's test statistic to decide whether the sample can join a cluster | 0.025 or lower |
cfg.correctm | The method for multiple comparisons correction | ‘cluster’ |
cfg.minnbchan | The minimum number of channels in a cluster | 2 |
cfg.alpha | The overall alpha threshold | 0.025 or lower |
cfg.numrandomization | The number of randomizations in the Monte Carlo simulation. Over 1000 is ideal | 1000+ |
cfg_neighb.method | This parameter specifies how to construct the neighbourhood. ‘triangulation’ selects the nearest direct neighbours, whereas ‘distance’ selects all electrodes within a given 3-D Euclidean distance | ‘distance’ |
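
Because the neighbourhood definition decides which sensors can join the same cluster, it can be worth checking it before running the statistics. The fragment below is a sketch under assumptions: it needs FieldTrip on the MATLAB path, `file1` is a placeholder for a FieldTrip data structure that contains electrode positions, and the commented-out `neighbourdist` value is hypothetical and would have to match the units of your electrode positions.

```matlab
% Build the neighbourhood structure with the 'distance' method.
cfg_neighb = [];
cfg_neighb.method = 'distance';
% cfg_neighb.neighbourdist = 40;   % hypothetical cut-off; only meaningful for 'distance'
neighbours = ft_prepare_neighbours(cfg_neighb, file1);   % file1: placeholder data structure

% Each entry holds one channel label and the labels of the channels it may cluster with.
for k = 1:numel(neighbours)
    fprintf('%s: %d neighbours\n', neighbours(k).label, numel(neighbours(k).neighblabel));
end
```

Channels with very few or very many neighbours suggest that the distance cut-off does not suit the cap layout.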
The code used for this is shown below. You will need to have FieldTrip downloaded, and it can be tricky to get your data, ERPs included, into the format FieldTrip expects. The tutorial on time-frequency analysis outlines how to do this for time-frequency data. If you need help applying cluster permutation to your ERPs directly, please reach out to me and I can provide some guidance!

    % Collect the file names of the subjects to include
    avgTFRList = {};
    parentfolder = 'where_files_are_located';
    fileList = dir(fullfile(parentfolder, 'what_file_names_start_with*'));
    for s = 1:length(fileList)
        subject = fileList(s).name;
        subjname = char(extractAfter(subject, '_'));
        if startsWith(subjname, 'T')
            avgTFRList{1, end+1} = subject;
        end
    end

    % Settings for the cluster permutation test
    cfg = [];
    cfg.channel = {'all'};            % put in different channels here
    cfg.latency = [0 1.8];            % seconds after event onset
    cfg.frequency = 20;               % Hz
    cfg.method = 'montecarlo';
    cfg.statistic = 'ft_statfun_depsamplesT';
    cfg.correctm = 'cluster';
    cfg.clusteralpha = 0.05;
    cfg.clusterstatistic = 'maxsum';
    cfg.minnbchan = 2;
    cfg.tail = 0;                     % two-tailed test
    cfg.clustertail = 0;
    cfg.alpha = 0.025;
    cfg.numrandomization = 2000;

    % Specify which sensors can form clusters with which other sensors.
    % file1 and file2 are the FieldTrip data structures for the two conditions
    % and must already be loaded in the workspace.
    cfg_neighb = [];
    cfg_neighb.method = 'distance';
    cfg.neighbours = ft_prepare_neighbours(cfg_neighb, file1);

    % Design matrix: row 1 = subject number, row 2 = condition (1 or 2)
    subj = 10;
    design = zeros(2, 2*subj);
    design(1, :) = [1:subj, 1:subj];
    design(2, 1:subj) = 1;
    design(2, subj+1:2*subj) = 2;
    cfg.design = design;
    cfg.uvar = 1;                     % row holding the unit of observation (subject)
    cfg.ivar = 2;                     % row holding the independent variable (condition)

    stat = ft_freqstatistics(cfg, file1, file2);
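
Once the test has run, the cluster-level p-values can be read from the output. The sketch below assumes the output has the `posclusters` and `negclusters` fields, each a struct array with a `prob` field. So that it runs without FieldTrip, it first builds a stand-in `stat` with made-up values; those lines are an illustration only and should be deleted when a real `stat` is in the workspace.

```matlab
% Stand-in for the output of ft_freqstatistics (values are made up for illustration).
stat = struct();
stat.posclusters = struct('prob', {0.004, 0.31}, 'clusterstat', {182.5, 12.1});
stat.negclusters = struct('prob', {0.62}, 'clusterstat', {-9.8});

alpha = 0.025;   % same value as cfg.alpha

% Gather the cluster p-values, guarding against the case where no clusters were found.
posP = [];
if isfield(stat, 'posclusters') && ~isempty(stat.posclusters)
    posP = [stat.posclusters.prob];
end
negP = [];
if isfield(stat, 'negclusters') && ~isempty(stat.negclusters)
    negP = [stat.negclusters.prob];
end

sigPos = find(posP < alpha);   % indices of positive clusters below alpha
sigNeg = find(negP < alpha);   % indices of negative clusters below alpha
fprintf('%d positive and %d negative clusters below alpha = %.3f\n', ...
    numel(sigPos), numel(sigNeg), alpha);
```

A cluster below alpha indicates that the conditions differ somewhere in the data; it does not establish that every channel or time point inside the cluster is significant.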