DSPSR documentation: dspsr

hosted by

DSPSR user documentation: `dspsr`

Spectral Kurtosis based Radio Frequency Interference Mitigation

Depending on the input data type, dspsr can perform Radio Frequency Interference mitigation using either

the Spectral Kurtosis statistic (Nita & Gary 2010a) for either undetected/baseband voltage time series or Nyquist-sampled intensities; or
the Generalized Spectral Kurtosis statistic (Nita & Gary 2010b) for time series of integrated intensities.

Spectral Kurtosis based outlier detection is enabled with the -skz command line option. By default, dspsr will estimate the Spectral Kurtosis (SK) using blocks of 128 samples and apply a cutoff threshold of 3 standard deviations (false alarm rate of 0.3%).

The number of samples used to estimate the SK can be set to M with -skzm M and the cutoff threshold can be set to sigma standard deviations with -skzs sigma.

To oversample the SK estimates by a factor of N, such that sample blocks overlap by a factor of (N-1)/N, use -skzover N.

By default, dspsr tests three different SK estimates in the following order.

sk-time-frequency: for each block of M time samples, the SK is computed for each frequency channel;
sk-time: for each block of M time samples, the SK estimates computed for each frequency channel are averaged over all frequency channels; and
sk-frequency: for each frequency channel, a single SK estimate is re-computed using all time samples currently in memory.

Each of these independent tests can be disabled with the -skz_no_ft, -skz_no_fscr, and -skz_no_tscr command-line options, respectively.

By default, any block of samples identified as an outlier in any one of the above three tests will not be masked in subsequent tests. This default behaviour also applies when computing the SK statistic on multiple time scales, as described below. To omit outliers detected during an earlier test from SK estimates computed during later tests, add -skz_omit to the command line.

Multiple Comparisons

When the input data include timeseries for two polarizations, the SK is computed and tested independently for each polarization. In this case, the false alarm rate is treated as the family wide error rate, and the threshold applied to each polarization is increased accordingly.

Currently, a similar correction is not implemented to compensate for the multiple comparisons made when sample blocks overlap, when multiple tests (sk-time-frequency, sk-time, and sk-frequency) are performed, or when the SK is computed on multiple time scales. Therefore, when oversampling, multiple tests, and/or multiple timescales are used, the false-alarm rate will be greater than that expected for a given threshold.

Computing the Spectral Kurtosis on multiple time scales

Important Note: To compile dspsr's new YAML-based configuration file parser, it is necessary to install yaml-cpp; e.g. from the source code (requires cmake)

git clone https://github.com/jbeder/yaml-cpp.git
mkdir yaml-cpp/build
cd yaml-cpp/build
module load cmake/3.10.2
cmake -DCMAKE_INSTALL_PREFIX:PATH=$PSRHOME/$LOGIN_ARCH -DCMAKE_CXX_FLAGS="$CXXFLAGS -fPIC" ..
make
make install

It is also necessary to configure dspsr with C++11 enabled and PKG_CONFIG_PATH set to find the yaml-cpp.pc file; e.g.

setenv CXXFLAGS "$CXXFLAGS -std=c++11"
setenv PKG_CONFIG_PATH $PSRHOME/$LOGIN_ARCH/share/pkgconfig:$PKG_CONFIG_PATH

Finally, reconfigure and recompile dspsr; e.g.

cd dspsr/
./update
./configure options
make && make install

YAML Format Configuration File

With YAML support enabled, you can specify a Spectral Kurtosis configuration file in YAML format on the command line; e.g.

dspsr -skz -skz_config skz.yaml ...

For example, the following YAML configuration file

[
{ M: 128, overlap: 2 },
{ M: 256, overlap: 2 }
]

instructs dspsr to run spectral kurtosis in two iterations, each time with different block lengths (M=128 and M=256); for each iteration, blocks used to estimate the SK statistic are oversampled by a factor of 2.

Note that iterations can be configured independently, and the configuration for an iteration is a comma-separated list of parameters enclosed in {braces}. Also note that M must be divisible by the specified overlap, and adjacent blocks overlap by (overlap-1)/overlap; e.g. for overlap=2, adjacent blocks overlap by 50%.

The following YAML configuration excludes two blocks of 1024 channels from RFI mitigation in only the first iteration (with M=128). Note that channel index ranges are inclusive.

[
{ M: 128, overlap: 2, exclude: [[101,1124],[3001,4024]] },
{ M: 256, overlap: 2 }
]

The following YAML configuration specifies that RFI mitigation will be performed on only one block of 1024 channels in both iterations.

[
{ M: 128, overlap: 2, include: [3001,4024] },
{ M: 256, overlap: 2, include: [3001,4024] }
]

The YAML interface is flexible. For example, the input need not be a sequence of maps. If executing only a single iteration, a single map can also be input on several rows; e.g.

M: 128
overlap: 2
include: [3001,4024]

Similarly, the include and exclude attributes will accept either a single pair of start and end channel indeces, as in

include: [3001,4024]

or a sequence of start/end channel pairs, as in

exclude: [[101,512],[1096,2048],[3001,4024]]

An arbitrary number of iterations and include/exclude windows can be specified. The logic follows this sequence:

If no include ranges are specified, then all channels are included for zapping
If one or more include ranges are specified, then all channels are excluded from zapping except those ranges specified
If one or more exclude ranges are specified, then those ranges specifed are excluded from zapping

Developer Notes: Input Buffering

Blocks of M samples are oversampled by a factor of n_overlap (blocks overlap by a factor of (n_overlap-1)/n_overlap)
Neighbouring blocks are offset by idat_off time samples
Nblock is the number of overlapping blocks that fit within a block of ndat time samples
Only time samples that have been analysed in n_overlap overlapping blocks are considered "done"
Therefore, only those time samples between idat_start and idat_end are passed on to the next step in the signal chain
The sums for the last n_overlap -1 blocks (highlighted in yellow) are re-computed on the next call to this routine
The input buffer stores all time samples from idat_next to the end of the data block for use on the next iteration
Given that the first idat_start samples will also be discarded from the start on the next iteration, the next idat_start will be the current idat_end

Developer Notes: Multiple Timescales

The input sums (orange) integrated over M₁ samples with an overlap factor of n_overlap,1 and idat_off,1 = M₁ / n_overlap,1 can be used to calculate output sums (green), integrated over M₂ samples with an overlap factor of n_overlap,2 and idat_off,2 = M₂ / n_overlap,2, where

stride between input sums = idat_off,2 / idat_off,1
number of input sums integrated = M₂ / M₁