Latency on the new HifiBerry stage DAC

walt · March 26, 2021, 10:05am

I’m thinking of buying the new kit with the sweet new DAC. I’m currently using a KOMPLETE Audio 1 USB interface, and at 256 samples I notice there is a discernable delay which impacts me when playing live (harder to keep rhythm). When running 128 samples and setBfree, I get an xrun every 600 milliseconds or so.

Can any users attest what usable sample sizes they can achieve with a Raspberry PI 4? And does anyone have a tip for increasing performance? Increasing the sample rate should lower delay, but this seems counter intuitive as it introduces more processing overhead increasing the sample size.

Thanks

riban · March 26, 2021, 11:14am

Hi @walt

This is a game we play to balance a variety of parameters and minimise the risk of xruns. xruns occur when there is too much or too little data available for jack to process on each processing cycle. Ultimately this comes down to the capacity of the processing power (CPU) and hence is influenced by CPU speed, CPU load, quantity of context switches, etc. It can also be impacted by the rate at which a process can access data, e.g. read or write to RAM / disk. Audio processing applications should be optimised to reduce slow resources access, e.g. to cache data files in memory. (It is faster to access RAM than disk.) JACK processes a block of data on each of its process cycles that is passed from / to all of its clients. A client may be the ALSA driver for an audio / MIDI input / output, a soft synth, audio effects plugin, zynthian module (like sequencer), etc. If any of these are unable to process the block of data within the allocated time then an xrun can occur.

In general audio processing is more demanding than MIDI but very simple audio processing (like gain adjustment) may be simpler than complex MIDI processing (like time critical sequencers, etc.). Each engine behaves differently and one must chose a combination that best matches their hardware. I did some benchmarking for a few engines which may help. This work needs to continue to give us this data for all modules across more hardware configurations. I key lesson is that some engines are conservative in their CPU usage, ramping up on demand whilst others consume all the CPU they will require continually, e.g. Pianoteq idles at a low CPU but can ramp up quite high whilst setBFree idles at its maximum usage (around 20% CPU ). This information allows us to know which engines can run idle without significant impact and hence which we can load but not use simultaneously.

The audio interface is significant. Some drivers are poorly written or have to interface with bad hardware and hence may cause significant latency or worse - variable latancy (jitter). This can trigger xruns unless buffers are set very high. Some cheap USB interfaces cab behave like this but some of the better USB interfaces can behave very well with low latency. If you have a USB2 or USB3 audio interface then use the USB2 or USB3 ports on the Zynthian to take advantage of any speed improvements. Plugging a slower device into the same hub may have detrimental effect on other devices so watch out for USB keyboards and USB sound modules plugged into ports on the same (internal or external) hub.

There is an option in the admin menu to enable Headphone output on the built-in Raspberry Pi audio jack. This significantly degrades the Zynthian’s JACK performance. (Samplerate conversion occurs to allow copying data from one ALSA interface to another.) Definitely disable this feature when not required.

If the CPU gets too hot then it can start to throttle its speed. This will have immediate detrimental effect on the audio. The throttling happens at quite a high temperature and Zynthian is designed to avoid this but check the temperature (shown statically in webconf- press refresh to update) to see if it is below 60 degrees Celcius.

Samplerate is important. Anything below 44100 samples per second is (in theory) observable. The typical human hearing range is 20Hz-20kHz and Nyquist describes how we must sample at over twice the highest frequency to avoid aliasing effects. (44100 was chosen as a standard in the music deistribution industry for compact discs because it was relatively simple to derive the frequency from a range of commonly used frequencies at the time - mostly influenced by TV. 48000 remains the defacto standard for most of the audio industry with higher rates (typically multiples of 48000) offering improved signal to noise ratio. The higher the rate then the more samples that need to be processed within each JACK period which adds to the processing but there may be a trade off if an application or interface is optimised for a particular sample rate. Some soundcards operate natively at 48000 and there are some plugins that only work at this rate or are optimised for it. I would recommend trying 48000, 44100, and 32000 to see if these provide different behaviour. 32000 will reduce the frequency response to 15kHz but most of us don’t hear much above that (especially us older gits) and there is often very little to be gained by operating the device at full bandwidth. Of course this depends on its use. As a high fidelity audio processing unit for mastering we may want full bandwidth but emulation of an old analogue synth may even benefit from reduced bandwidth.

Changing the buffer size (256, 128, 64, etc.) changes the amount of data to be processed in each cycle. This has to be a power of 2 and each time it is halved we give all processes half the time to complete their processing so one must test this to see how low it can go before the combination of engines you want to use simultaneously starts to fail (xrun). Lower buffer size reduces latency, e.g. the latency introduced by JACK running at 48000 with two 128 frame buffers is 5.3ms. (JACK needs at least 2 buffers which is its default (the -n parameter in the config) but should have 3 buffers for USB audio.) There will be a little extra latency introduced by the physical interface and ALSA driver but in general, with a reasonable quality audio interface this tends to be pretty low and immutable so we tend to ignore it to a point. It becomes very challenging to play against a latency (of yourself) of 30ms but long before this, playing becomes uncomfortable or difficult to keep rhythm as you have noted. It depends heavily on the virtuosity of the musician, the type of music they are playing and the instruments used to produce the music but a rule of thumb is that latency above about 10ms is starting to impact rhythm and latency below 5ms is difficult to perceive. The effect of latency can be mitigated to some degree by avoiding hearing the trigger, e.g. wearing headphones or using loudspeakers that mask the sound of the keys being pressed can help. Some people refer to the distance that sound travels (about 343 m/s) to point out that a musician playing an electronic instrument (like a guitar) expects to hear their audio about 10ms later than they triggered it if they are standing about 3m away from their amplifier but this does not help much because in a live performance that latency is cascaded with the latency of the audio processing.

Effects processing adds CPU load so adding effects to an instrument needs to be considered carefully. Some instruments have built-in effects which may be enabled by default, e.g. Fluidsynth enables its reverb and chorus by default which I think should be disabled.

Be aware that Pianoteq on Zynthian is configured to run at a lower internal samplerate to reduce CPU loading hence is band limited.

My experience is that a Raspberry Pi 4 in a Zynthian 4.1 will allow most engines to be played fairly agressively without xrun with the default settings of two 256 frame buffers running at 44100. Notable engines that use high CPU are Surge and Vitalium which can be used with judicial selection of preset and limit on polyphony. There seem to be xruns whenever a layer is added or removed so avoid playing and recalling snapshots. (ZS3 can assist with performance.) I would like to see more work done on benchmarking and publishing information on what does and doesn’t work but it is a large piece of work that will be challenging to present and maintain. As a minimum I think we should have metrics on the standard hardware and a selection of approved engines / configurations.

Sorry if there was too much detail here and worse if I didn’t actually answer any of your questions.

wyleu · March 26, 2021, 5:51pm

Could we move this response into a page on the wiki…?

I’ve added it to FAQ.

jofemodo · March 26, 2021, 7:32pm

Fantastic report, Mr @riban!

+1 to add It to the wiki

riban · March 28, 2021, 5:47pm

@walt I did some testing today and reducing buffers to 128 on a Zynthian 4.1 with Raspberry Pi 4 will trigger xruns on many engines, e.g. Pianoteq. With just Linux Sampler loaded with a piano sample you have to play quite hard to trigger an xrun but they do occasionally occur.

walt · March 28, 2021, 5:58pm

Thanks for the elaborate reply, a lot longer and more in-depth than I expected. With headphones connected to the the interface the lag is noticeably lower. My amplifier is digital and probably adds a couple of milliseconds to the delay.

I’ll also improve the cooling of my self made box as I noticed some throttling. It’s made of metal so it dissipates some heat but the Pi is fully enclosed with no gaps to get rid of hot air. That said, at 256 samples I never encountered an xrun with multiple engines running simultaneously, all drenched in effects. The only engine that ocassionaly triggers xruns is zynaddsubfx when pushed hard.

And you’re right, physical instruments usually have some delay as well. Hadn’t thought of that.

Thank you for the additional testing!

philippejadin · March 31, 2021, 6:45pm

Anything below 10 msec is probably not noticeable (a 1/100’th of a second !). Si if you have trouble it might indeed come from your amp.

riban · March 31, 2021, 6:59pm

That depends on what you are playing and your experience. Virtuoso musicians playing fairly fast music will notice latency lower than 10ms. It affects timing. Most of us will be affected by it but if you are playing fairly slow music or have effects like reverb, delay, distortion then this may mask the effect. When it gets below 10ms it is more about the feel. You don’t really hear the latency, you feel it. It kicks at your rhythm.

philippejadin · March 31, 2021, 7:04pm

I’d need to make more tests. In my experience, playing drums on a keyboard is the most problematic. If 10 msec is still too much, I’m quite sure 5 msec would be good enough.
Buffer size of 256 samples in a 44.1kHz project means latency is 256/44100, or 5.8ms

Really, it should be ok.

riban · March 31, 2021, 8:02pm

Not quite. The here are 2 (or 3 for USB) buffers required so the latency doubles (or triples).

MaxMaxis · May 1, 2021, 10:30am

I just ran across a latency comparison of 3 different boards.

Excerpt from results:

Device	Hz	FPP	Latency
Audiobox USB 96	48k	128	339 frames (7.0ms)
Pisound	48k	128	44.247frames (0.9ms)
HiFiBerry DAC+ ADC Pro	48k	128	51.100 frames (1.0ms)

The Pisound people seem to have created Patchbox OS, MODEP, and the Pisound app that remote controls synth programs to help market their hardware.
(Only the android app and it’s Pi server code are closed)

jofemodo · May 1, 2021, 2:44pm

We should expect the same metrics for the Stage, as it integrates almost the same ICs with very similar circuitry and same kernel driver.

Regards,

oortone · June 28, 2022, 2:48pm

Small remark, higher sample rate does not improve signal to noise ratio, it’s the bit depth (typically, 16, 24 or 32 bit) that does that. Higher sample rate extends the frequency range.

riban · June 28, 2022, 2:59pm

It does both. By over sampling you improve the noise floor by 6dB for each doubling of sample frequency. This is why you can do high quality (wide bandwidth, low noise) sampling with one bit.

oortone · August 3, 2022, 12:53pm

Yeah, ok in some respect true but it’s not as easy and linear as in the case of bit resulution and also 48k is not oversampling. The way you put the passage I quoted is misleading for those who don’t have the full understanding of sampling frequency and bit resolution.

It’s also not 6 dB per doubling, seems more like 3 dB but depends on signal type:

For instance, to implement a 24-bit converter, it is 
sufficient to use a 20-bit converter that can run at 256 
times the target sampling rate.

This averaging is only effective if the signal contains 
sufficient uncorrelated noise to be recorded by the 
ADC.[3] If not, in the case of a stationary input signal, all 2^{n} samples would have the same 
value and the resulting average would be identical to this 
value; so in this case, oversampling would have made no 
improvement. In similar cases where the ADC records no 
noise and the input signal is changing over time, 
oversampling improves the result, but to an inconsistent 
and unpredictable extent.