A widely used dimension algorithm in data analysis is the correlation dimension. Fix $m$, a positive integer, and $r$, a positive real number. Given a time series of data $u(1), u(2), \ldots, u(N)$ from measurements equally spaced in time, form a sequence of vectors $x(1), x(2), \ldots, x(N-m+1)$ in $\mathbb{R}^m$, defined by $x(i) = [u(i), u(i+1), \ldots, u(i+m-1)]$. Next, define for each $i$, $1 \le i \le N-m+1$,
$$C_i^m(r) = \frac{\text{number of } j \text{ such that } d[x(i), x(j)] \le r}{N-m+1} \qquad [1]$$
We must define d[x(i), x(j)] for vectors x(i) and x(j). We define
$$d[x(i), x(j)] = \max_{k=1,2,\ldots,m} |u(i+k-1) - u(j+k-1)| \qquad [2]$$
From the $C_i^m(r)$, define
$$C^m(r) = (N-m+1)^{-1} \sum_{i=1}^{N-m+1} C_i^m(r) \qquad [3]$$
and define
$$\beta_m = \lim_{r \to 0} \lim_{N \to \infty} \frac{\log C^m(r)}{\log r} \qquad [4]$$
The assertion is that for $m$ sufficiently large, $\beta_m$ is the correlation dimension. Such a limiting slope has been shown to exist for the commonly studied chaotic attractors. This procedure has frequently been applied to experimental data; investigators seek a "scaling range" of $r$ values for which $\log C^m(r)/\log r$ is nearly constant for large $m$, and they infer that this ratio is the correlation dimension. In some instances, investigators have concluded that this procedure establishes deterministic chaos.
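The definitions [1] to [4] can be sketched numerically as follows. This is a minimal illustration, assuming NumPy is available; the function names `embed`, `correlation_integral`, and `log_ratio` are hypothetical and do not come from the text. It evaluates the ratio at finite $N$ and finite $r$ only, so it cannot by itself establish that the limit in [4] exists.

```python
import numpy as np

def embed(u, m):
    """Rows are x(i) = [u(i), ..., u(i+m-1)], for i = 1, ..., N-m+1."""
    u = np.asarray(u, dtype=float)
    n_vec = len(u) - m + 1
    return np.stack([u[k:k + n_vec] for k in range(m)], axis=1)

def correlation_integral(u, m, r):
    """C^m(r) from [1]-[3]; self-matches (j == i) are counted, as in [1]."""
    x = embed(u, m)
    n_vec = x.shape[0]
    d = np.zeros((n_vec, n_vec))
    for k in range(m):
        # Running maximum over coordinates gives the max-norm distance [2].
        d = np.maximum(d, np.abs(x[:, k, None] - x[None, :, k]))
    c_i = np.mean(d <= r, axis=1)  # C_i^m(r), equation [1]
    return float(np.mean(c_i))     # C^m(r), equation [3]

def log_ratio(u, m, r):
    """log C^m(r) / log r at finite N and r (requires r != 1)."""
    return np.log(correlation_integral(u, m, r)) / np.log(r)

if __name__ == "__main__":
    # Illustrative data only: uniform noise, not a chaotic attractor.
    rng = np.random.default_rng(0)
    u = rng.uniform(-1.0, 1.0, size=500)
    for r in (0.4, 0.2, 0.1):
        print(r, [round(log_ratio(u, m, r), 3) for m in (1, 2, 3)])
```

The pairwise distance matrix needs memory of order $(N-m+1)^2$, so this direct form is only practical for series of a few thousand points.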
The latter conclusion is not necessarily correct: a converged, finite correlation dimension value does not guarantee that the defining process is deterministic. Consider the following stochastic process. Fix $0 \le p \le 1$. Define $X_j = \alpha^{-1/2} \sin(2\pi j/12)$ for all $j$, where $\alpha$ is specified below. Define $Y_j$ as a family of independent identically distributed (i.i.d.) real random variables, with uniform density on the interval $[-\sqrt{3}, \sqrt{3}]$. Define $Z_j$ as a family of i.i.d. random variables, with $Z_j = 1$ with probability $p$ and $Z_j = 0$ with probability $1 - p$.
Let
$$\alpha = \frac{1}{12} \sum_{j=1}^{12} \sin^2(2\pi j/12) \qquad [5]$$
and define $\mathrm{MIX}_j = (1 - Z_j) X_j + Z_j Y_j$. Intuitively, MIX(p) is generated by first ascertaining, for each $j$, whether the $j$th sample will be from the deterministic sine wave or from the random uniform deviate, with likelihood $(1 - p)$ of the former choice, then calculating either $X_j$ or $Y_j$. Increasing $p$ marks a tendency towards greater system randomness.
We now show that almost surely $\beta_m$ in [4] equals 0 for all $m$ for the MIX(p) process, $p \ne 1$. Fix $m$, define $k_j = (12m)j - 12m$, and define $N_j = 1$ if $(\mathrm{MIX}_{k_j+1}, \ldots, \mathrm{MIX}_{k_j+m}) = (X_1, \ldots, X_m)$, $N_j = 0$ otherwise. Because each $k_j$ is a multiple of 12 and $X_j$ has period 12, the event $N_j = 1$ occurs whenever all $m$ samples in the block come from the sine wave; the blocks are disjoint for distinct $j$. The $N_j$ are therefore i.i.d. random variables, with the expected value of $N_j$, $E(N_j) \ge (1 - p)^m$. By the Strong Law of Large Numbers,
$$\lim_{N \to \infty} \frac{1}{N} \sum_{j=1}^{N} N_j = E(N_1) \ge (1 - p)^m$$
Observe that $\left(\sum_{j=1}^{N} N_j / 12mN\right)^2$ is a lower bound to $C^m(r)$ (computed from the first $12mN$ points of the series), since $x(k_i+1) = x(k_j+1)$ if $N_i = N_j = 1$. Thus for $r < 1$,
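$$\limsup_{N \to \infty} \frac{\log C^m(r)}{\log r} \le \frac{1}{\log r} \lim_{N \to \infty} \log \left( \frac{\sum_{j=1}^{N} N_j}{12mN} \right)^2 \le \frac{\log \left[ (1 - p)^{2m} / (12m)^2 \right]}{\log r}$$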
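Since $(1 - p)^{2m}/(12m)^2$ is independent of $r$, the right-hand side tends to 0 as $r \to 0$; the ratio $\log C^m(r)/\log r$ is nonnegative for $r < 1$ because $C^m(r) \le 1$, so $\beta_m = \lim_{r \to 0} \lim_{N \to \infty} \log C^m(r)/\log r = 0$. Since $\beta_m \ne 0$ with probability 0 for each $m$, by countable additivity, almost surely $\beta_m = 0$ for all $m$.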
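The argument above can be illustrated by simulation. The sketch below assumes NumPy; the names `mix`, `correlation_integral`, and `lower_bound` are hypothetical, and the choices of $p$, $m$, series length, and seed are arbitrary. It draws a MIX(p) series, evaluates $C^m(r)$ for shrinking $r$, and compares it with the bound $(1-p)^{2m}/(12m)^2$. A finite sample can only suggest the plateau in $C^m(r)$; it does not prove the almost-sure statement.

```python
import numpy as np

def mix(p, n, rng):
    """Draw MIX_1, ..., MIX_n for the MIX(p) process."""
    j = np.arange(1, n + 1)
    # Equation [5]: alpha is the mean of sin^2 over one period of 12 samples.
    alpha = np.sum(np.sin(2 * np.pi * np.arange(1, 13) / 12) ** 2) / 12
    x = np.sin(2 * np.pi * j / 12) / np.sqrt(alpha)   # deterministic X_j
    y = rng.uniform(-np.sqrt(3), np.sqrt(3), size=n)  # i.i.d. uniform Y_j
    z = rng.random(n) < p                             # Z_j = 1 with probability p
    return np.where(z, y, x)                          # (1 - Z_j) X_j + Z_j Y_j

def correlation_integral(u, m, r):
    """C^m(r) from [1]-[3], with the max-norm distance [2]."""
    u = np.asarray(u, dtype=float)
    n_vec = len(u) - m + 1
    d = np.zeros((n_vec, n_vec))
    for k in range(m):
        col = u[k:k + n_vec]
        d = np.maximum(d, np.abs(col[:, None] - col[None, :]))
    return float(np.mean(d <= r))

def lower_bound(p, m):
    """Asymptotic lower bound on C^m(r) used in the proof."""
    return (1 - p) ** (2 * m) / (12 * m) ** 2

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p, m = 0.3, 2
    u = mix(p, 1200, rng)
    print("bound:", lower_bound(p, m))
    for r in (1e-2, 1e-4, 1e-8):
        c = correlation_integral(u, m, r)
        # C^m(r) stays bounded away from 0 while log r diverges,
        # so the ratio log C / log r shrinks toward 0.
        print(r, c, np.log(c) / np.log(r))
```

A small positive tolerance such as `1e-8` is used instead of $r = 0$ because floating-point evaluations of the sine at phases that differ by a whole period agree only up to rounding error.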
The MIX(p) process can be motivated by considering an autonomous unit that produces sinusoidal output, surrounded by a world of interacting processes that in ensemble produce output that resembles noise relative to the timing of the unit. The extent to which the surrounding world interacts with the unit could be controlled by a gateway between the two, with a larger gateway admitting greater apparent noise to compete with the sinusoidal signal.
It is easy to show that, given a sequence $X_j$, a sequence of i.i.d. $Y_j$ defined by a density function and independent of the $X_j$, and $Z_j = X_j + Y_j$ (here $Z_j$ denotes this sum, not the indicator variable used to define MIX(p)), then $Z_j$ has an infinite correlation dimension. It appears that correlation dimension distinguishes between correlated and uncorrelated successive iterates, with larger estimates of dimension corresponding to uncorrelated data. For a more complete interpretation of correlation dimension results, stochastic processes with correlated increments should be analyzed.
Error estimates in dimension calculations are commonly seen. In statistics, one presumes a specified underlying stochastic distribution to estimate misclassification probabilities. Without knowing the form of a distribution, or whether the system is deterministic or stochastic, one must be suspicious of error estimates. There often appears to be a desire to establish a noninteger dimension value, to give a fractal and chaotic interpretation to the result, but again, prior to a thorough study of the relationship between the geometric Hausdorff dimension and the time-series formula labeled correlation dimension, it is speculation to draw conclusions from a noninteger correlation dimension value.
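The remark above that additive i.i.d. noise with a density yields an infinite correlation dimension can be probed with a rough numerical sketch. The example assumes NumPy and takes the special case $X_j = 0$, so the series is pure uniform noise. It also swaps the ratio in [4] for a local slope of $\log C^m(r)$ against $\log r$ between two radii, which is a common finite-sample stand-in and not the definition used in the text. The names `correlation_integral` and `local_slope`, the radii, and the series length are illustrative choices.

```python
import numpy as np

def correlation_integral(u, m, r):
    """C^m(r) from [1]-[3], with the max-norm distance [2]."""
    u = np.asarray(u, dtype=float)
    n_vec = len(u) - m + 1
    d = np.zeros((n_vec, n_vec))
    for k in range(m):
        col = u[k:k + n_vec]
        d = np.maximum(d, np.abs(col[:, None] - col[None, :]))
    return float(np.mean(d <= r))

def local_slope(u, m, r_lo, r_hi):
    """Slope of log C^m(r) versus log r between two radii (an assumption,
    used here in place of the double limit in [4])."""
    c_lo = correlation_integral(u, m, r_lo)
    c_hi = correlation_integral(u, m, r_hi)
    return (np.log(c_hi) - np.log(c_lo)) / (np.log(r_hi) - np.log(r_lo))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.uniform(-np.sqrt(3), np.sqrt(3), size=1500)  # i.i.d. noise
    for m in (1, 2, 3, 4):
        # The slope keeps growing with m instead of saturating.
        print(m, round(local_slope(y, m, 0.2, 0.4), 3))
```

With a finite series the growth eventually stalls, because self-matches and sparse neighborhoods flatten $\log C^m(r)$ at small $r$; the absence of saturation over the first few $m$ is the feature of interest, not the specific slope values.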