Software Open Access
Sample Entropy Estimation
Published: Nov. 8, 2004. Version: 1.0.0
Goldberger AL, Amaral LAN, Glass L, Hausdorff JM, Ivanov PCh, Mark RG, Mietus JE, Moody GB, Peng C-K, Stanley HE. PhysioBank, PhysioToolkit, and PhysioNet: Components of a New Research Resource for Complex Physiologic Signals (2000). Circulation. 101(23):e215-e220.
sampen [ option ... ] [ input-file ]
Sample Entropy is a useful tool for investigating the dynamics of heart rate and other time series. Sample Entropy is the negative natural logarithm of an estimate of the conditional probability that subseries (epochs) of length m that match pointwise within a tolerance r also match at the next point.
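As a point of reference, the definition can be computed directly by comparing every pair of templates. The sketch below is a plain quadratic-time Python version of that definition, not the run-based algorithm that sampen uses. It assumes that "match within r" means every pointwise difference is strictly less than r, and that only the first N - m templates are compared so that each counted template has a next point. The function name `sampen_direct` is hypothetical.

```python
import math


def sampen_direct(x, m, r):
    """Direct sample entropy estimate for one template length m (sketch).

    B counts pairs of length-m templates that match pointwise within r
    (equivalently, Chebyshev distance < r); A counts the pairs among them
    that also match at the next point. Self-matches are excluded.
    """
    n = len(x)
    count_b = 0
    count_a = 0
    # only templates starting at 0 .. n-m-1 have a "next point"
    for i in range(n - m):
        for j in range(i + 1, n - m):
            # strict inequality assumed for "within a tolerance r"
            if all(abs(x[i + t] - x[j + t]) < r for t in range(m)):
                count_b += 1
                # does the match extend to the next point?
                if abs(x[i + m] - x[j + m]) < r:
                    count_a += 1
    if count_a == 0:
        # no matches of length m + 1: the estimate is infinite (or undefined)
        return math.inf
    return -math.log(count_a / count_b)
```

With m = 0 every pair of (empty) templates matches, so B becomes N(N-1)/2, the number of distinct pairs of points.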
This program calculates the sample entropy of the time series given in the specified (text format) input-file. (If no input-file is specified, sampen reads the time series from its standard input.) The outputs are the sample entropies of the input, for all epoch lengths of 1 to a specified maximum length, m.
The algorithm builds up runs of points that match within the tolerance r until a mismatch occurs, and keeps track of template matches in counters A(k) and B(k) for all lengths k up to m. If a particular run ends up being of length 4, for example, then 1 is added to the count of template matches of length 4; in addition, there are 2 template matches of length 3, 3 of length 2, and 4 of length 1 that must be added to the corresponding counts. A special case is needed when a run ends at the last point in the data: the A(k) counters are incremented but the B(k) counters are not. Once all the matches are counted, the sample entropy values are calculated as

$$\mathrm{SampEn}(k,r,N) = -\ln\frac{A(k+1)}{B(k)}, \qquad k = 0, 1, \ldots, m-1,$$

with B(0) = N(N-1)/2, the number of distinct pairs of points in an input series of length N.
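The counting rule can be illustrated with two small Python helpers. They are hypothetical names written for this page, not part of sampen. `run_contributions` lists what one completed run adds to the counters, including the last-point special case. `sampen_from_counters` turns the totals into entropies under the assumption SampEn(k) = -ln(A(k+1)/B(k)) with B(0) = N(N-1)/2, the usual Richman and Moorman convention for template length k.

```python
import math


def run_contributions(run_length, m, ends_at_last_point=False):
    """Counts added to A(k) and B(k) by one completed run (sketch).

    A run of length L contains L - k + 1 matches of length k, for
    k = 1 .. min(L, m). If the run ends at the last data point, its final
    match of each length is counted in A(k) but not in B(k).
    """
    a = {k: run_length - k + 1 for k in range(1, min(run_length, m) + 1)}
    b = {k: c - 1 if ends_at_last_point else c for k, c in a.items()}
    return a, b


def sampen_from_counters(A, B, n, m):
    """SampEn(k) = -ln(A(k+1) / B(k)) for k = 0 .. m-1, with B(0) = n(n-1)/2.

    A and B are dicts keyed by match length k >= 1.
    """
    counts_b = dict(B)
    counts_b[0] = n * (n - 1) // 2  # every pair "matches" at length 0
    return [math.inf if A.get(k + 1, 0) == 0 else -math.log(A[k + 1] / counts_b[k])
            for k in range(m)]
```

For example, with the series [1, 2, 1, 2, 1], r = 0.5 and m = 2, the only runs are one of length 3 (pairs (0,2), (1,3), (2,4)) and one of length 1 (pair (0,4)), both ending at the last point. Summing their contributions gives A = {1: 4, 2: 2} and B = {1: 2, 2: 1}, and the entropies are ln 2.5 and 0.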
To find runs, the algorithm starts by finding all points that match the first point within the tolerance r. The points that match begin a run of length 1, and those that don't match have runs of length 0. Next, points are compared with the second point: if the point after one with a run of length 1 matches the second point, that run grows to length 2; otherwise, the run ends. If the point after one with a run of length 0 matches the second point, it begins a new run of length 1. This procedure continues with the third point, the fourth point, and so on, until the end of the data.
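A minimal Python sketch of this run-finding procedure follows. It is an illustrative re-implementation, not the sampen source, and it assumes the strict `< r` comparison and the convention SampEn(k) = -ln(A(k+1)/B(k)) with B(0) = N(N-1)/2. For each reference point i, `lastrun[j]` holds the run length of the pair (i-1, j), so a match at (i, j) extends the run of the pair (i-1, j-1) on the same diagonal.

```python
import math


def sampen_runs(x, m=2, r=0.2):
    """Sample entropy by run counting (illustrative sketch).

    Returns [SampEn(0, r, N), ..., SampEn(m-1, r, N)], using
    SampEn(k) = -ln(A(k+1) / B(k)) with B(0) = N(N-1)/2.
    Time is O(N^2); memory is O(N).
    """
    n = len(x)
    A = [0] * (m + 1)  # A[k]: matching template pairs of length k (k >= 1)
    B = [0] * (m + 1)  # B[k]: same, but not counting matches ending at the last point
    lastrun = [0] * n  # lastrun[j]: run length of the pair (i - 1, j)
    for i in range(n - 1):
        run = [0] * n  # run[j]: run length of the pair (i, j); 0 means no match
        for j in range(i + 1, n):
            if abs(x[j] - x[i]) < r:  # strict inequality assumed
                # extend the run of the pair (i - 1, j - 1) on the same diagonal
                run[j] = lastrun[j - 1] + 1
                # one new match of every length 1 .. min(m, run length)
                for k in range(1, min(m, run[j]) + 1):
                    A[k] += 1
                    if j < n - 1:  # special case: a match at the last point counts in A only
                        B[k] += 1
        lastrun = run
    B[0] = n * (n - 1) // 2
    # A[k+1] <= B[k], so A[k+1] == 0 also covers B[k] == 0 (no matches)
    return [math.inf if A[k + 1] == 0 else -math.log(A[k + 1] / B[k])
            for k in range(m)]
```

On the series [1, 2, 1, 2, 1] with m = 2 and r = 0.5 it returns ln 2.5 (about 0.916) and 0, the same values a direct pair-by-pair count gives.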
The -n option normalizes the data before finding matches, which is equivalent to the common practice of expressing the tolerance as r times the standard deviation of the series. The -v option estimates the standard deviation of each SampEn estimate, using a procedure described in the references below. This calculation is involved and can increase the computation time considerably, especially for large N.
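The equivalence holds because |(a - mean)/sd - (b - mean)/sd| = |a - b|/sd, so a tolerance of r on normalized data is a tolerance of r times sd on the raw data. The Python sketch below checks this on a small example. The helper names are hypothetical, and the n - 1 denominator for the sample variance is an assumption about what -n does.

```python
import statistics


def normalize(x):
    """Rescale x to sample mean 0 and sample variance 1 (n - 1 denominator assumed)."""
    mu = statistics.mean(x)
    sd = statistics.stdev(x)  # raises StatisticsError for fewer than 2 points
    if sd == 0:
        raise ValueError("constant series cannot be normalized")
    return [(v - mu) / sd for v in x]


def matching_pairs(x, r):
    """Index pairs (i, j), i < j, whose values differ by less than r."""
    return [(i, j) for i in range(len(x)) for j in range(i + 1, len(x))
            if abs(x[i] - x[j]) < r]
```

For instance, `matching_pairs(normalize(data), 0.2)` and `matching_pairs(data, 0.2 * statistics.stdev(data))` select the same pairs (up to floating-point rounding at the boundary).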
- -h
- Print a usage summary.
- -m m
- Set the maximum epoch length to m. Default is m = 2.
- -n
- "Normalize" the time series prior to the estimation of sample entropy, by transforming the time series to have sample mean 0 and sample variance 1.
- -r r
- Set the tolerance to r. Default is r = 0.2.
- -v
- Output an estimate of the standard deviation of the sample entropy estimate for each epoch length.
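When sampen is driven from a script, the documented options map directly onto an argument list. The Python sketch below is a hypothetical wrapper, not part of the package. It assumes sampen is installed on the PATH and writes its results to standard output, and that a series passed on standard input is accepted as one value per line. It returns the raw text output without parsing it.

```python
import subprocess


def build_sampen_command(m=None, r=None, normalize=False, estimate_sd=False,
                         input_file=None, program="sampen"):
    """Assemble an argument list from the documented sampen options."""
    cmd = [program]
    if m is not None:
        cmd += ["-m", str(m)]   # maximum epoch length (default 2)
    if r is not None:
        cmd += ["-r", str(r)]   # tolerance (default 0.2)
    if normalize:
        cmd.append("-n")        # rescale to mean 0, variance 1 first
    if estimate_sd:
        cmd.append("-v")        # also estimate the SD of each SampEn value
    if input_file is not None:
        cmd.append(input_file)  # otherwise sampen reads standard input
    return cmd


def run_sampen(series=None, **options):
    """Run sampen and return its raw text output (format not parsed here).

    If no input_file is given, the series is written to standard input,
    one value per line (an assumed text format).
    """
    cmd = build_sampen_command(**options)
    stdin_text = None
    if options.get("input_file") is None and series is not None:
        stdin_text = "\n".join(str(v) for v in series) + "\n"
    result = subprocess.run(cmd, input=stdin_text, capture_output=True,
                            text=True, check=True)
    return result.stdout
```

Options are placed before the input file, following the synopsis `sampen [ option ... ] [ input-file ]`.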
The following references discuss Sample Entropy and a closely related entropy measure called Approximate Entropy:
- Lake, D. E., J. S. Richman, M. P. Griffin, and J. R. Moorman. Sample entropy analysis of neonatal heart rate variability. Am J Physiol 2002; 283(3):R789-R797.
- Richman, J. S. and J. R. Moorman. Physiological time series analysis using approximate entropy and sample entropy. Am J Physiol 2000; 278(6):H2039-H2049. http://ajpheart.physiology.org/content/278/6/H2039.abstract
Matlab code for calculating Sample Entropy is also available; see http://www.physionet.org/physiotools/sampen/matlab/.
Also see this note about testing sampen and interpreting its output.
DK Lake (dlake at virginia dot edu), JR Moorman and Cao Hanqing.
Total uncompressed size: 572.7 KB.