Time resolution of Red Book <=45ns

Discussion in 'Audio Hardware' started by Publius, Jul 6, 2006.

  1. Publius

    Publius Active Member

    Location:
    Austin, TX
    Since there is a lot of back and forth about just how little time resolution 16/44 has, I decided to run a little test, to simulate how well a subsample delay would be encoded. I did the following:
    1. Created a waveform with a single impulse at the center of magnitude 32767. Assume that this is at 44.1 kHz (the exact rate is actually irrelevant for the rest of this).
    2. Upsampled the waveform by a factor of 500 using an FIR filter. This is about 22 MHz.
    3. Delayed the waveform by one sample at the upsampled rate. That's a 45 ns delay.
    4. Downsampled the waveform by a factor of 500, taking it back to 44,100 Hz. (I just did a simple sample of every 500 points, since I already know the signal is bandlimited and won't alias.)
    5. Quantized back to integer boundaries after adding in 0.5 bits white noise (primitive dither). Given that the original impulse was of magnitude 32767, this is equivalent to quantizing back to 16 bits.
    6. Upsampled the quantized waveform by a factor of 500.
    7. Compared the final upsampled waveform to the original and delayed upsampled waveforms.
    The final waveform was correctly delayed by a single sample at 22 MHz. I therefore conclude that under ideal conditions, the minimum encodable delay of a waveform at 16 bits and 44.1 kHz is no greater than 45 nanoseconds.
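    A scaled-down sketch of this kind of test, for anyone who wants to try it. It is not the original script: instead of literally upsampling by 500 with an FIR filter, it evaluates an ideal band-limited impulse (a sinc) directly at the 44.1 kHz sample instants, and it finds the delay by locating the peak of the sinc-interpolated record. The record length `N`, the uniform +/-0.5 LSB dither and the search grid are all assumptions made for illustration.

```python
import math
import random

FS = 44100.0      # CD sample rate in Hz
PEAK = 32767      # full-scale positive value of a 16-bit sample
N = 257           # samples kept around the impulse (hypothetical choice)
CENTER = N // 2   # index of the undelayed impulse


def sinc(x):
    """Normalized sinc, the ideal band-limited impulse."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def encode(delay, rng):
    """Sample an impulse delayed by `delay` samples, add dither, round to 16 bits."""
    out = []
    for n in range(N):
        value = PEAK * sinc(n - CENTER - delay) + rng.uniform(-0.5, 0.5)
        out.append(max(-32768, min(32767, round(value))))
    return out


def reconstruct(samples, t):
    """Band-limited (sinc) interpolation of the record at time t, in samples."""
    return sum(s * sinc(t - n) for n, s in enumerate(samples))


def estimate_delay(samples, span=0.01, step=0.0002):
    """Locate the reconstructed peak on a fine grid around the centre sample."""
    steps = round(span / step)
    best = max(range(-steps, steps + 1),
               key=lambda k: reconstruct(samples, CENTER + k * step))
    return best * step


rng = random.Random(0)
delay = 1.0 / 500    # one sample at 500x upsampling, about 45 ns
estimate = estimate_delay(encode(delay, rng))
print(f"true delay      {delay / FS * 1e9:.2f} ns")
print(f"estimated delay {estimate / FS * 1e9:.2f} ns")
```

    The search grid here is 1/5000 of a sample, so the sketch can only report the delay to within roughly 4.5 ns; a finer grid or a longer record would be needed to push it further.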
  2. lukpac

    lukpac Forum Resident

    Location:
    Madison, WI
    Wow, cool stuff. Hopefully this myth goes away now...
  3. soundQman

    soundQman Idealist of the Musical Apocalypse

    Location:
    Arlington, VA, USA
    Interesting. I take it this is orders of magnitude below the threshold of audible detection. On the other hand, jitter was shown to be detectable below this amount of time, wasn't it? I wonder why that is.
  4. Publius

    Publius Active Member

    Location:
    Austin, TX
    Jitter distorts by frequency modulation, which is a different beast entirely. This is a linear transform (a time delay) that only introduces distortion due to quantization error, which is pretty low to begin with, especially with proper dither.
  5. House de Kris

    House de Kris Active Member

    Location:
    Texas
    Why 500?

    Just a newb here, but why did you choose 500 as your upsampling multiplier? If you selected 5000, couldn't your last sentence be "I therefore conclude that under ideal conditions, the minimum encodable delay of a waveform at 16 bits and 44.1 kHz is no greater than 4.5 nanoseconds." Or, if you upsampled with 50000, then your last sentence could have been, "I therefore conclude that under ideal conditions, the minimum encodable delay of a waveform at 16 bits and 44.1 kHz is no greater than 450 picoseconds."

    So really, why stop at 500? Of course, I'm probably missing the obvious here.
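    The arithmetic behind those figures is just the length of one sample at the upsampled rate. A small sketch, where `step_ns` is a made-up helper name:

```python
FS = 44100.0  # CD sample rate in Hz


def step_ns(factor):
    """Duration of one sample, in nanoseconds, after upsampling FS by `factor`."""
    return 1e9 / (FS * factor)


for factor in (500, 5000, 50000):
    print(f"{factor:>6}x -> {step_ns(factor):8.3f} ns")
```

    This only says how small a delay each factor lets you try; whether a delay that small actually survives the trip through 16 bits is what the test itself has to show.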
  6. LeeS

    LeeS Tubes Rule

    Location:
    Atlanta
    I'm not sure what this proves...if we can hear 200 picosecond jitter with the human ear that still seems within the threshold of audibility.
  7. soundQman

    soundQman Idealist of the Musical Apocalypse

    Location:
    Arlington, VA, USA
    See post #4. Apparently the ear/brain perceives jitter sensitively but requires a much larger group delay of a waveform to notice. Perhaps this would be the rationale behind the relatively much larger time delay setting adjustment increments used in surround processors?
  8. Black Elk

    Black Elk Music Lover

    Location:
    Bay Area, U.S.A.
    Who says we can hear 200 ps of jitter, and at what frequency? As has been pointed out in the giant 'vinyl versus CD' thread, AES preprint 4826 reports that 10 ns RMS is audible with 20 kHz tones, but this becomes 100 ns RMS at 4 kHz, and by extrapolation is 2 microseconds at 20 Hz. Even at 20 kHz these figures are 50 times the ones you quote. Are you getting confused with the jitter requirement to guarantee no reduction in signal-to-noise ratio for a 16/44.1 system, which works out to be about 120 ps at 20 kHz?
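    One common back-of-envelope way to arrive at a figure of about 120 ps is to ask how large a timing error can be before a full-scale sine, at its steepest point, moves by half an LSB. The sketch below works that out; the half-LSB criterion is an assumption made here, not necessarily how the quoted number was derived, and `max_jitter_seconds` is a hypothetical helper.

```python
import math


def max_jitter_seconds(bits, freq_hz):
    """Timing error that moves a full-scale sine by half an LSB at its steepest point."""
    amplitude_lsb = 2 ** (bits - 1)                      # full-scale peak, in LSBs
    max_slope = 2 * math.pi * freq_hz * amplitude_lsb    # LSBs per second at the zero crossing
    return 0.5 / max_slope


print(f"{max_jitter_seconds(16, 20000) * 1e12:.1f} ps")
```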

    As for the 'subsample delay issue', this is very easy to explain. The Nyquist theorem states that any signal containing no frequencies above F can be exactly reconstructed from samples taken at a rate of 2F (strictly, anything greater than 2F). To ensure no aliasing, all frequencies above F are eliminated by the anti-alias filter (which we'll assume to be an ideal brickwall for now). Nyquist makes no restriction on where you begin sampling.

    Now, picture yourself in a room with a plethora of sound sources (could be massed violins, or massed voices or solo instruments in a reverberant space). All these sound sources have a particular spectrum, and all the sounds arrive at your observation point with different arrival times. What you hear is the vector sum of all these randomly delayed elements. A microphone at the same point 'sees' the same thing. Provided all the spectra can be contained in F, then the resulting summed signal gets through the anti-alias filter unchanged. Now, by Nyquist you can sample that signal anywhere in its phase and exactly reconstruct it. (Even if the signal has a spectrum larger than F, the filtered version will still exhibit the same behavior.)
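    A minimal numerical sketch of the "sample anywhere in its phase" point. To keep the reconstruction exact it assumes a periodic test signal whose harmonics all lie below half the sample rate, so a plain DFT can stand in for the ideal reconstruction filter. The signal, the period `N` and the function names are invented for illustration.

```python
import cmath
import math

N = 32  # samples per period of the test signal


def signal(t):
    """Band-limited test signal: harmonics 1, 5 and 13 of the period, all below N/2."""
    return (1.00 * math.sin(2 * math.pi * 1 * t / N + 0.3)
            + 0.50 * math.cos(2 * math.pi * 5 * t / N + 1.1)
            + 0.25 * math.sin(2 * math.pi * 13 * t / N - 0.7))


def sample(phase):
    """Take N samples starting `phase` samples after t = 0."""
    return [signal(n + phase) for n in range(N)]


def reconstruct(samples, phase, t):
    """Evaluate the band-limited interpolant of the samples at continuous time t."""
    total = 0.0
    for k in range(-N // 2 + 1, N // 2):
        c_k = sum(s * cmath.exp(-2j * math.pi * k * n / N)
                  for n, s in enumerate(samples)) / N
        total += (c_k * cmath.exp(2j * math.pi * k * (t - phase) / N)).real
    return total


t = 7.37  # an arbitrary instant that falls between samples for both phases
print(signal(t))
print(reconstruct(sample(0.0), 0.0, t))
print(reconstruct(sample(0.4), 0.4, t))
```

    Both sets of samples, taken 0.4 of a sample apart, reconstruct the same value at the same instant to within floating-point error.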

    Where theory and practice differ is that Nyquist assumed a perfect anti-alias filter (which is impossible to build, so a slightly higher sampling frequency is needed in practice, cf. 44.1 kHz for a 20 kHz bandwidth in CD) and perfect sample values (which have to be approximated in a real system by the length of the PCM word, cf. 16 bits in CD). So, as Publius pointed out, in a real world system you may see a very small difference as you change the sampling phase due to the quantization error.
  9. dekkersj

    dekkersj New Member

    Good question. Why stop at 500? Not that I'm arguing against the test as such. I find it a funny test. And a valid one. It proves the Nyquist theorem once more.

    Regards,
    Jacco
  10. Publius

    Publius Active Member

    Location:
    Austin, TX
    I stopped at 500 because that was as far as I could take it and still get a waveform that looked about right - signal peak was clearly where it was supposed to be, etc. I could take it to 1000 but then the delay was not quite as pronounced. Either the delay looks more like half a sample than a whole sample, or it looks like there are high-frequency artifacts in the signal, or stuff of that nature. I throw those results out.

    In general you're going to be limited by three things: the signal quantization, the quality of the filter for upsampling, and the internal computation word length. At these levels I would not be surprised if double precision is sort of a limiting factor here. Unfortunately I'm not set up to test with extended precision, otherwise I could probably knock an order of magnitude or so off of the numbers. And the test is getting pretty crazy as it is (it's using about a gig of RAM and takes a few minutes to run).

    At the same time, I hadn't optimized the resampling filters at ALL. I just did that and now I'm seeing good subsample delays at 20,000x upsampling. So that's an effective delay, of, um, 1.13ns?
  11. Publius

    Publius Active Member

    Location:
    Austin, TX
    Oh yeah, one more thing:

    According to my estimates, if you want to reproduce this result with a DSP, you'd need a 2.4 million point FIR filter for the upsample, and at least 48 bits of internal precision. If you want to upsample from 44.1 kHz you would need to upsample to 882 MHz.
  12. LeeS

    LeeS Tubes Rule

    Location:
    Atlanta
    I've done tests using master clocks which indicate we can hear down to 200 ps. This is actually a conservative estimate. We have used clocks accurate to 45 ps and had the same results.

    This is actually something I have researched as a side study since I worked with Bob Katz and got quite interested in the phenomenon. The report you reference is older, as is Julian Dunn's work in the area. Most people who look at this now believe the next order of magnitude down, the picosecond range, is actually what we are capable of hearing.

    We've done a simple test: if you record music into a hard drive, you can simulate varying levels of jitter by using differing master clocks.

    Of course you raise a good point about jitter in that it is not just the timing in question but also the frequency spectrum of jitter. What we did was just isolate the timing difference.

    It's taken me quite a while since I have to pay out of pocket but we are lining up some musicians to make a disc where the same track is recorded under different master clocks and the tracks are jumbled to do an A/B comparison.

    This is the problem with redbook as brickwall filters are far from perfect. DSD does not have this problem.
  13. Black Elk

    Black Elk Music Lover

    Location:
    Bay Area, U.S.A.
    Are you saying that with your most accurate clock that the jitter is still audible?

    A jump from 10 ns at 20 kHz to 200 ps at 20 kHz would be nearly two orders of magnitude.

    But haven't you stated above that the jitter is audible with your most accurate (45 ps) clock? Why do you need to simulate higher levels of jitter? Surely, you need an ADC/record/playback/DAC chain that has inaudible jitter levels to begin with before you can start adding known jitter to look for audibility? (Or have I misunderstood your comment above about the 45 ps clock?)

    I dealt with the non-perfect nature in the last paragraph of the post you quoted.
  14. Steve G

    Steve G Well-Known Member

    Location:
    los angeles
    wouldn't this test only work if you were actually playing the wave containing the impulse into the air with something capable of producing sound and then actually recording it digitally using actual sampling, and then manually time shifting the impulse by LESS than 45 ns and then again playing it with something and recording it digitally again, with the record points being specifically set by the computer so the samples were being taken in the same synchronization with the wave?

    once it's all on the computer I don't see how it shows anything other than that computer will not forget information once it has been stored

    -s
  15. LeeS

    LeeS Tubes Rule

    Location:
    Atlanta
    I need to be more clear. The most accurate clock is not audible at 45ps, but we compare it to a known clock of 200-500ps.

    We are not looking at frequency, we are doing sonic comparisons between masterclock, masterclock2, and no clock at all. And we have done MC v. no clock and MC2 v. no clock. I will try to send you a disc at some point.
  16. lukpac

    lukpac Forum Resident

    Location:
    Madison, WI
    The point is that in the past, some have claimed that you can't have sub-sample timing differences with PCM. I.e., that the smallest shift something could have is 1/44,100 of a second. This proves that isn't true.
  17. felimid

    felimid New Member

    Location:
    ulster
    It proves you can infer some time details beyond the sample rate, but this is practically a kind of limited exception to the rule implied by the sampling rate - not an escape from accuracy limitations.

    It was shown that the phase of a sinusoidal pattern which is assumed to be perfect and constant can be resolved to a fraction of the sampling interval.
    This subsample accuracy was possible because the pattern recorded is not a discrete event; its impression is recorded throughout many consecutive samples and its exact formation is inferable (ideally).
    For all discrete or unassumable events, PCM records can only specify time of occurrence to within a whole length of the sampling interval. Time resolution can only be improved when a known pattern can be observed throughout multiple samples - which is the case for computing the phase of synthetic frequency components, but not at all when trying to refine the temporal location of unassumed events.

    To visualise simply, imagine a weather barometer which records the atmospheric pressure once a day. It creates a PCM record of pressure level with a sample interval of 1 day. At the end of each recording interval, the record does not say whether the pressure changed within the last hour, changed greatly throughout the day, or changed once just after the previous record. Individual subsample details cannot be explicitly recorded, only sometimes suggested by assuming usual weather patterns over multiple records.

    In other words, the timing of clicks, pops, attacks etc. cannot be resolved to less than the PCM sampling interval - unless patterns strictly following or accompanying the events are assumable and included in computations to resolve them.

    With many digital audio filters acting in the domain of ideal standardised sinusoidal components, phase of idealised frequencies can be discerned well within sampling intervals and maintained through processing chains, but in audio processing the temporal location of most discrete events would be mostly impractical (often impossible) to discern beyond the sample interval. Speculative attempts can be made with important astronomical or forensic data (by relating accompanying patterns).

    Hope this cumbersome explanation helps a bit.
  18. Publius

    Publius Active Member

    Location:
    Austin, TX
    You say that for any "unassumable" signal, that is something that is not defined a priori, you can't define a delay shorter than the sample period. However, the example you give is of a weather station where the signal between the sampled points is not well defined. Simply put, this isn't a valid example. This is no longer a band-limited signal! You should never, ever be in such a situation in a data processing application. This is why, as a commonly cited rule of thumb for DAQ, one should sample at 10X the actual signal bandwidth, to give you more headroom for alias rejection. (This is also why you should never use a DAQ card for audio I/O unless you know exactly what you're doing. Many DAQ devices, IIRC, are NOS sample-and-hold unless otherwise specified.)

    The only thing I am "assuming" here is that the input signal is bandlimited to under half the sample rate. This is a perfectly valid assumption - if you want to use PCM, you gotta have the input filter. Once you assume that - and I think it is a very reasonable assumption that it is working properly, it is something that is not open to debate here - then for any possible signal, the spaces between the points are completely defined. That's the point of the Nyquist-Shannon theorem.

    I admit I may not completely understand the fundamental point you're trying to make. The way I see it, as long as I have control over my reconstruction filters, I can numerically prove that subsample delays are resolvable, under any circumstance. That is, I take situation A, and situation B which is just like situation A with a little delay, and PCM can resolve the difference just fine even when it is shorter than the sampling period. The way I see your argument, you can never disprove that, since your argument is inherently non-numeric and relies on "unknown" signals. I'm sorry, but I find that illogical as well.

    I don't see how this is a valid point. If you define any sort of "event" in a signal, as long as the definition is reasonable enough, you can define it on a subsample basis. For the case of pops and ticks, if you define a pop's "location" in terms of its peak, the peaks can obviously occur between samples if you upsample/oversample. This oversampling is precisely what happens in a DAC. Moreover, the peaks on a record are uncorrelated with the sampling rate - if you upsample, and if they are uncorrelated with each other, their sub-sample phases will be random. Are you saying that when you upsample, the peaks will still always be aligned on the original sampling period?

    If you define them in terms of peak RMS energy across sample blocks, then obviously you're going to be a little constrained. But that's not pertinent to the conversation. Nobody's stopping you from upsampling.
  19. felimid

    felimid New Member

    Location:
    ulster
    This is all laboured and long-winded of me, but what the hey.

    What could define the signal between the sample points, except through patterns necessarily present in other points? Only at each sample point is a definition made. The in-betweens can only be presumed. Confidently presumed regarding provided synthetic patterns - but nothing else really, not live recording, be it weather station, radio telescopic or digital audio data or anything 'naturally' sourced and intending to record what actually happened (progressively).

    The demonstration does not test 'delay', it tests 'phase resolution', which you must realise is not the same thing. You cannot determine the exact position of shock fronts or other discrete details on any recording of dynamic unpredictable data - beyond the instances which are defined in the recording.
    How accurately such details need to be known is a different argument, but whether bandpassed or not, the only explicit informing of events is done at each sample point; the states in between are necessarily ambiguous by virtue of the sample rate.

    The situation is similar with visual data: the boundaries between pixels have to be assumed to be fairly certain and constant (or else the visual effect will be very hit or miss, non-photorealistic).

    I think confusion is caused while interpreting theory in this area over how authoritative frequency domain manipulation of signals might be. Any signal can be viewed exclusively in the frequency domain, and often transparently manipulated there, but everything that is transmitted through the medium of space or datasets is not a collection of frequencies (as everything we see is not a collection of colours).

    A frequency is an abstract mathematical identity of infinite length and unchanging power. A frequency has a phase, which relates its cycle throughout time but does not locate it in time (since it is infinitely long).
    By domain transformation methods, waveforms are processed as intricate collections of coinciding frequencies with assumed phases and powers. The impression of a vibrating guitar string in a PCM record will be represented by multiple fundamental frequencies after domain transformation, and many many more lower powered but just as constant and simple 'ghost' frequencies which are used in the domain to optimally encode non-linear tendencies and deviations present in the natural signal.

    A single spike surrounded by silence can be present in a record just as fairly as long wiggling 'frequency' impressions can. Faced with a single spike, talk of its component frequencies and their phases is entirely optional. If you needed to manually look through a very dull PCM record for a spike of unknown size or duration, once you found the spike you would have no means of determining its subsample timing, or at least I would have no means of determining that. Imagine looking through the record with your eyes on a chart or page: how could you insist a recorded event definitely started at some instant between sample points? Regarding the spike in the frequency domain would not help refine the instant of its occurrence either.
    I hope that is an accessible thought experiment.

    The claim that the practical time resolution of PCM records is finer than their sampling rate should not have been made. In data processing applications, analogue records (whose time resolution is affected primarily by jitter) must be digitised at a sampling rate greater than or equal to the desired time resolution of discrete details - such as spikes, shock fronts etc. (simple stuff). The temporal positioning of those things is in a way stretched by frequency analysis (theoretically to infinity, practically to the extents of their block), when they are instead encompassed by a collection of coincidental sinusoids of precise powers and phases. But that does head into theoretical jungle I don't expect either of us is fully conversant in. Basically the phases of frequency components do not detail any events in a PCM record, not even the beginning or end of that component's inclusion.

    I read that you go on to confirm the necessity of an adequate sample rate and mention alias rejection (aliasing is all about the ambiguity of the periods between samples).

    In those terms, you have to argue that all spikes have a predictable frequency spectrum. The Nyquist-Shannon theorem can suggest that spaces between points are ideally definable for the sample rate, but not actually identical to the source, because it has been bandlimited (removing energy). Nyquist-Shannon theory asserts that all frequencies below half the sample rate are perfectly maintained, but frequencies higher than that rate are what is required to render subsample discrete timing details - like the intersample position of pops and clicks etc.
    Sampling involves removing that level of detail. Just because you may still infer more detail in certain patterns does not mean accuracy to greater levels for unsynthesised artifacts is actually attainable.
    You shouldn't think of phase coefficients in block transformations of PCM data as 'delays'. If you filter and sample to whichever level, subsample (temporal) positioning will by definition be ambiguous.

    I don't expect I've communicated all of that convincingly, simply concerning this quote:
    The tested 'delay' in the demonstration was 'in' the phase (2pi cycle) of a long sinusoid. It will work less accurately on shorter, noisier sinusoid patterns, and won't work at all on non-sinusoidal artifacts such as ticks, pops and general attacks.

    That's at least my prediction, hope others can figure it out for themselves :angel:

    I don't expect it to be an audible issue, but the sampling period is the definitive limit of accurate placement of each detail linearly proceeding through time...

    Basically you can't tell if anything starts or ends half a sample earlier or later than its impression appears in the record.

    You'll have a tough job talking me out of that, so I may just have to agree to differ :thumbsup:
  20. Publius

    Publius Active Member

    Location:
    Austin, TX
    Wikipedia's entry on Nyquist-Shannon doesn't mention anything about errors in subsample detail. My undergrad-level textbook on the subject (Signal Processing & Linear Systems, B.P. Lathi) does not mention it. I have never heard this issue ever related before, in my life, except maybe in the context of sample-and-hold ADCs, which are definitely not being discussed here. More specifically, as an EE, I have never heard before that frequencies above the Nyquist frequency are required to accurately represent information about baseband signals that involve times shorter than the sampling period.

    Can you list any source that says "frequencies higher than that rate are what is required to render subsample discrete timing details"? Like, a textbook, or a peer-reviewed article? Or can you relate this to me in terms of a proof? And not just some random webpage or conjecture?
    Of course it's not going to work as well - I agree, the sine wave is going to be about as ideal a signal as we're going to find. However, I'm highly unconvinced that what you're saying is true. I think a subsample delay is going to be achievable in any 16/44 signal. Any signal at all.

    Tell you what. If you can give me a test - that can be evaluated by purely numeric means - that both of us believe supports your claim, I will implement it, describe the algorithms I'm using to implement it (hopefully to your satisfaction), and post the results. Or, you can describe your test, implement it and report the results yourself, and I will attempt to replicate it here. (Or, you can reference a proof as mentioned above, which would make looking for a counterexample unnecessary.)
  21. felimid

    felimid New Member

    Location:
    ulster
    I think that is because most of what is studied is in terms of frequencies, with all detail being treated as coinciding patterns of frequencies. People understand the theoretical world only in terms of frequency, and expect that the real world follows. But basically a 'click' is a 'click', not a symphony of theoretical tools. I know what I'm saying here (not so much how to say it) from first-hand experimentation and analysis at the low level, after which the circumstances of the case are bluntly apparent. I do have misgivings about a lot of the theory which is 'reported' around this subject. In this case laboured explanations are unnecessary though. The claim that the time resolution of PCM records is finer than their sampling rate is simply observable to be mistaken. Resolution of assumed patterns can be higher, but for instantaneous details - there is just no way.

    I know frequencies higher than half the sampling rate are required to render (generate) subsample detail because subsample detail is required to generate frequencies higher than half the sample rate. You can see - that's actually pretty basic Nyquist theorem.

    Implement this test by any means you prefer,

    Here is a sequence of levels,

    0, 0, 0, 0, 0, 0, 8, 0 ,0 ,0, 0, 0....

    What is the level of the signal half a sample before the 8, and half a sample after?
    Calculate a frequency spectrum of the blip if you like. Its frequency spectrum at higher sampling rates, which would explicitly state its level at the positions required, is unknown to us at this lower sampling rate.

    We can assume this signal is bandlimited to below the Nyquist frequency, for a cleaner situation. There is still no way to say (limited or not). If there was a way then we could store more information in a sequence of numbers than there are possible permutations of the record. You see, for your claim to be true (that subsample-accurate detail of waveforms is present and maintained in PCM), downsampling has to be a practically lossless operation.

    Again, basically, you have blurred the distinction between detecting differences in assumable patterns and specifying differences at precise instants.

    Easy for things to get blurred in this subject, I might even have made a couple of boobs myself here, but the overall case is pretty certain.

    Thanks for bearing with me.
  22. lukpac

    lukpac Forum Resident

    Location:
    Madison, WI
    I'm not sure exactly what you're getting at, but assuming you shifted the sampling by a sub-sample amount, all three levels (0,8,0) would be slightly different. Not sure why you think it couldn't be reconstructed...
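    For what it's worth, if the sequence is taken at face value as samples of an ideally band-limited signal, with nothing but zeros on either side, the half-sample levels can simply be computed with sinc interpolation. A small sketch under exactly those assumptions (`level` is a made-up helper):

```python
import math


def sinc(x):
    """Normalized sinc, the ideal reconstruction kernel."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def level(samples, t):
    """Band-limited value at time t (in samples), treating everything outside the list as zero."""
    return sum(s * sinc(t - n) for n, s in enumerate(samples))


record = [0, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0, 0]
before = level(record, 5.5)   # half a sample before the 8
after = level(record, 6.5)    # half a sample after the 8
print(before, after)
```

    Under those assumptions both values come out as 16/pi, a little over 5.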
  23. felimid

    felimid New Member

    Location:
    ulster
    Because the information simply isn't there.
    I've no idea how you think you might recover it... astrology?

    I did make a slight mistake in my description earlier, saying we could 'assume the signal is bandlimited'. In fact the only consistent interpretation of any PCM record is to assume the signal is band-limited. Whether a lowpass was done or not is a matter of downsampling quality: there is no way to distinguish in PCM between frequencies originating lower than half the sample rate and frequencies which were generated from higher ones which were unavoidably reflected downwards in the conversion. It is because they can't be distinguished from each other after the conversion that a lowpass is applied before/during it. (They can't be distinguished after the conversion because the necessary time resolution is not there to distinguish frequencies above half the sampling rate. A frequency needs over 2 samples for each of its cycles to be distinguished 'as apparent' at any sample rate.)

    So, examining 0,8,0 at a higher rate, it could be:

    0,2,8,1,0 or
    0,-3,8,9,0 or ....

    0,X,8,Y,0

    X and Y could be any configuration of the available record space.
    At half the rate X and Y are not stored and hence not recoverable.

    Lowpassing could complicate considerations, but does nothing about that fundamental circumstance.

    eg

    1,-1,8,8,-1 : after a lowpass might change to
    0,0,8,8,0 (approx)

    Samples X and Y are still not recoverable at half the sample rate.
    For 0,8,0 an ideal assumption at twice the rate would probably be simply: 0, (4), 8, (4), 0
    And at twice that, around: 0, {1}, (4), {7}, 8, {7}, (4), {1}, 0
    The values in parentheses and braces are all necessarily presumed by assuming there was no more ~oscillating-energy in the original signal. Although with some knowledge of the signal's type, it might often be more realistic to add some noise at higher frequencies - an aspect of dither.

    Hope things are getting a bit clearer now.
    Regards,
    fe
  24. lukpac

    lukpac Forum Resident

    Location:
    Madison, WI
    Umm...it seems like you're trying to capture an event that's faster than half the sampling rate. We already know you can't do that. Capturing the *position* of an event that isn't as fast as half the sampling rate isn't the same thing.
  25. felimid

    felimid New Member

    Location:
    ulster
    It is not possible to specify the position of an(y) event in a PCM record finer than the sampling rate describes. This is the core of my objection to claims made in this thread.

    Regard
    Also the -1,8,8,-1,1 after lowpassing would be:
    0,8,8,0,0

    at half the rate, with 0,8,0

    It can't be told authoritatively whether the form was 0,8,8,0,0 or 0,0,8,8,0, or what the values of x and y were.
    Can you give me your solution to this specific query then? In this case, what values of x and y *did* I remove in the downsample? (Values which detail the subsample *position* of the event (8).)
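    One way to put this particular pair to a numerical test is to run both forms through a low-pass filter and then keep every second sample, and see whether the two half-rate records come out the same. The sketch below does that; the Hann-windowed sinc filter, its length and the 64-sample record are assumptions for illustration, not anything specified in the thread. It says nothing about recovering out-of-band values for x and y, only whether the half-sample shift of the event leaves a trace in the downsampled record.

```python
import math


def lowpass_taps(half_len=32):
    """Hann-windowed sinc low-pass with cutoff at a quarter of the high sample rate."""
    taps = []
    for k in range(-half_len, half_len + 1):
        ideal = 0.5 if k == 0 else math.sin(math.pi * k / 2) / (math.pi * k)
        window = 0.5 + 0.5 * math.cos(math.pi * k / (half_len + 1))
        taps.append(ideal * window)
    return taps


def downsample_by_2(x, taps):
    """Low-pass filter x, then keep every second sample."""
    half = len(taps) // 2
    out = []
    for n in range(0, len(x), 2):
        acc = 0.0
        for j, h in enumerate(taps):
            i = n - (j - half)          # input index for tap offset j - half
            if 0 <= i < len(x):
                acc += h * x[i]
        out.append(acc)
    return out


def pulse(start, length=64):
    """Two adjacent samples of level 8 in silence, beginning at index `start`."""
    return [8.0 if start <= i <= start + 1 else 0.0 for i in range(length)]


taps = lowpass_taps()
early = downsample_by_2(pulse(30), taps)   # ... 0, 8, 8, 0, 0 ...
late = downsample_by_2(pulse(31), taps)    # ... 0, 0, 8, 8, 0 ...
print([round(v, 2) for v in early[13:19]])
print([round(v, 2) for v in late[13:19]])
```

    With this filter the two half-rate records are clearly different, and the larger of the two central values moves from one sample to the next when the pulse is shifted by one high-rate sample.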