Crist Clark, in a posting on the NANOG mailing list, started an interesting thread on analyzing network traffic in the frequency domain rather than with the traditional time-based analysis. He opened by asking about Fourier analysis of network traffic time series. A number of responses indicated that wavelet analysis might be the 'more modern' approach, and noted that this type of analysis has been used for network traffic anomaly detection. The responses also indicate that operating system characteristics can be inferred from analysis of the RTD (Round Trip Delay) of ping-generated traffic.
The thread started with a message from Crist Clark:
Has anyone found any value in examining network utilization numbers
with Fourier analyses? After staring at pretty MRTG graphs for a bit
too long today, I'm wondering if there are some interesting periodic
characteristics in the data that could be easily teased out beyond,
"Well, the diurnal fluctuations are obvious, but looks like we may
have some hourly traffic spikes in there too. And maybe some of those
are bigger every fourth hour."
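As a rough illustration of the question being asked, here is a minimal sketch of a periodogram over a utilization series. Everything in it is an assumption made for the example: the series is synthetic (a baseline plus 24-hour, 4-hour and 1-hour sine components), the 5-minute sample interval mimics typical MRTG polling, and the function names are hypothetical. It uses a naive DFT from the standard library so it runs anywhere; with NumPy installed, `numpy.fft.rfft` would be the usual choice.

```python
import cmath
import math

def periodogram(samples, interval_s):
    """Return (period_in_hours, power) pairs from a naive DFT of a real series."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [x - mean for x in samples]  # remove the mean so DC does not dominate
    # Precompute the n complex roots of unity used by every DFT bin.
    twiddle = [cmath.exp(-2j * math.pi * j / n) for j in range(n)]
    result = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x * twiddle[(k * t) % n] for t, x in enumerate(centered))
        period_hours = n * interval_s / k / 3600.0
        result.append((period_hours, abs(coeff) ** 2 / n))
    return result

INTERVAL_S = 300                       # assumed 5-minute polling interval
N = 4 * 24 * 3600 // INTERVAL_S        # four days of samples

def synthetic_utilization(t):
    """Hypothetical link utilization: baseline plus three periodic components."""
    hours = t * INTERVAL_S / 3600.0
    return (100.0
            + 40.0 * math.sin(2 * math.pi * hours / 24)   # diurnal
            + 10.0 * math.sin(2 * math.pi * hours / 1)    # hourly
            + 5.0 * math.sin(2 * math.pi * hours / 4))    # every fourth hour

series = [synthetic_utilization(t) for t in range(N)]
spectrum = periodogram(series, INTERVAL_S)
top = sorted(spectrum, key=lambda pair: pair[1], reverse=True)[:3]
top_periods = sorted(round(period, 2) for period, _ in top)
print(top_periods)  # [1.0, 4.0, 24.0]
```

On real data the peaks will be far less clean: the periodic components are rarely pure sines and rarely line up with whole DFT bins, so some smearing across neighbouring periods should be expected.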
Dave Plonka responded:
Such techniques are used in the area of network anomaly detection.
For instance, a search for "network anomaly detection" at scholar.google.com will yield very many results.
Our 2002 paper, "A Signal Analysis of Network Traffic Anomalies"
[ACM SIGCOMM Internet Measurement Workshop 2002, Barford, et al.], is one such work. We mention that we use wavelet analysis
rather than Fourier analysis because wavelet/framelet analysis is able to localize events both in the frequency and time
domains, whereas Fourier analysis would localize the events only in frequency, so an iterative approach (with varying intervals
of time) would be necessary.
In general, this is the reason why Fourier analysis has not been a common technique used in network anomaly detection.
That work used data stored in RRD files at five minute intervals.
Our subsequent work used data stored at one second intervals, again in RRD files.
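The localization argument can be sketched with the simplest possible wavelet. The example below is not the method of the cited paper; it swaps in a plain Haar transform, written out by hand, purely to show that a detail coefficient carries a position as well as a scale. The series, the spike location and the function names are all hypothetical. In practice a library such as PyWavelets would normally be used instead.

```python
import math

def haar_step(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_details(signal, levels):
    """Detail coefficients for each level, finest scale first."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return details

def locate_anomaly(signal, levels):
    """For each level, the sample range covered by the largest detail coefficient."""
    located = []
    for level, detail in enumerate(haar_details(signal, levels), start=1):
        idx = max(range(len(detail)), key=lambda i: abs(detail[i]))
        width = 2 ** level              # samples spanned by one coefficient
        located.append((level, idx * width, idx * width + width - 1))
    return located

# Hypothetical series: flat traffic with a single-sample spike at index 41.
series = [100.0] * 64
series[41] = 400.0

located = locate_anomaly(series, 3)
print(located)  # [(1, 40, 41), (2, 40, 43), (3, 40, 47)]
```

Each level pins the spike to a window of samples, narrowest at the finest scale. A single Fourier transform of the same 64 samples would show the spike's energy spread across all frequencies with no indication of when it happened, which is the limitation described above.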
Anton Kapela had a couple of messages and a link (look for Kapela):
Indeed, there are. Interesting things emerge in frequency (or phase) space - bits/sec, packets/sec, and ave size, etc. - all
have new meaning, often revealing subtle details otherwise missed. The UW paper [Barford/Plonka et al.] is one of my favorites
and often referenced in other publications.
Along similar lines, I presented a lightning talk at NANOG that demonstrates using windowed FTs (mostly Gaussian or Hamming)
in three-axis graphs (i.e. 'waterfalls') available in common tools (baudline, SigView, LabVIEW, etc.) for characterizing round
trip times through various network queues and queue states. Unexpectedly, interesting details regarding host IP stacks and OS
scheduler behavior became visible.
I want to suggest that a time-windowed FT might be a reasonable middle ground, certainly for Crist's case. Naturally, the
trade-offs will be in frequency accuracy (i.e. longer window) vs. temporal accuracy (i.e. short window).
Another solution for your needs might be cascaded FIR "bandpass" filters, but again, you're subject to
time/frequency error trade-offs as related to a filter's bandwidth.
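The window-length trade-off can be made concrete with a small windowed-FT sketch. It is only an illustration under stated assumptions: the input is a synthetic series whose period switches from 8 samples to 4 samples halfway through, the window is a Hamming window, frames do not overlap, and the helper names are hypothetical. The DFT is written out with the standard library; `scipy.signal.stft` is the usual tool when SciPy is available.

```python
import cmath
import math

def hamming(n):
    """Symmetric Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def dominant_periods(samples, window_len, hop):
    """Dominant period (in samples) of each Hamming-windowed frame, DC excluded."""
    win = hamming(window_len)
    twiddle = [cmath.exp(-2j * math.pi * j / window_len) for j in range(window_len)]
    periods = []
    for start in range(0, len(samples) - window_len + 1, hop):
        frame = [samples[start + i] * win[i] for i in range(window_len)]
        mags = []
        for k in range(1, window_len // 2 + 1):
            coeff = sum(frame[i] * twiddle[(k * i) % window_len]
                        for i in range(window_len))
            mags.append(abs(coeff))
        peak_bin = 1 + max(range(len(mags)), key=mags.__getitem__)
        periods.append(window_len / peak_bin)
    return periods

# Hypothetical series: period 8 samples for the first half, period 4 afterwards.
signal = [math.sin(2 * math.pi * t / 8) if t < 128 else math.sin(2 * math.pi * t / 4)
          for t in range(256)]

short = dominant_periods(signal, window_len=32, hop=32)
long_ = dominant_periods(signal, window_len=64, hop=64)
print(short)  # [8.0, 8.0, 8.0, 8.0, 4.0, 4.0, 4.0, 4.0]
print(long_)  # [8.0, 8.0, 4.0, 4.0]
```

The 32-sample window places the change within a 32-sample frame but resolves frequency in steps of 1/32 cycles per sample. The 64-sample window halves the frequency step and also halves the number of time slots.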
While you're at it, consider processing your time series data into histogram stacks, or nested histograms. I haven't
specifically seen a paper covering this, but another UW gent (DW, are you reading this?) used to process their 30 second ifmib
data into a raw .ps file, and printed this out weekly/daily. The trends visible here were quite interesting, but I don't think
much further work was done to see if anything super-interesting was more/less visible in this form than traditional ones.
... one point - since packets/bits/etc data is more monotonic than not (math wizards, please debate/chime in) and
since it's not a 'signal' in the continuous sense, you might find value in differentially filtering the input data *before* FT
or wavelet processing. This would serve to remove the weird-looking "DC" offset in the output simply by creating a semi-even
distribution of both positive and negative input sample values.
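A minimal sketch of that differencing step follows, assuming a simple first difference is what is meant by "differentially filtering". The packet-rate series is synthetic (a 500 pps baseline plus a period-16 oscillation) and the helper names are hypothetical.

```python
import math

def first_difference(samples):
    """y[t] = x[t+1] - x[t]; the output is one sample shorter than the input."""
    return [b - a for a, b in zip(samples, samples[1:])]

def dft_magnitude(samples, k):
    """Magnitude of DFT bin k of a real-valued series."""
    n = len(samples)
    re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(samples))
    im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(samples))
    return math.hypot(re, im)

# Hypothetical packets/sec series: always positive, with a period-16 oscillation.
raw = [500.0 + 50.0 * math.sin(2 * math.pi * t / 16) for t in range(129)]
diffed = first_difference(raw)            # 128 values, both positive and negative

dc_raw = dft_magnitude(raw[:128], 0)      # dominated by the 500 pps baseline
dc_diff = dft_magnitude(diffed, 0)        # baseline cancels in the difference
peak_diff = dft_magnitude(diffed, 8)      # bin 8 of 128 samples = period 16

print(round(dc_raw), round(dc_diff, 6), round(peak_diff, 1))
```

The baseline that dominates bin 0 of the raw series disappears after differencing, while the periodic component is still clearly present. One caveat: a first difference is a high-pass filter, so slow components such as the diurnal cycle are attenuated relative to fast ones, and magnitudes before and after differencing are not directly comparable.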