Latency on the new HifiBerry stage DAC

riban · March 26, 2021, 11:14am

This is a game we play to balance a variety of parameters and minimise the risk of xruns. xruns occur when there is too much or too little data available for jack to process on each processing cycle. Ultimately this comes down to the capacity of the processing power (CPU) and hence is influenced by CPU speed, CPU load, quantity of context switches, etc. It can also be impacted by the rate at which a process can access data, e.g. read or write to RAM / disk. Audio processing applications should be optimised to reduce slow resources access, e.g. to cache data files in memory. (It is faster to access RAM than disk.) JACK processes a block of data on each of its process cycles that is passed from / to all of its clients. A client may be the ALSA driver for an audio / MIDI input / output, a soft synth, audio effects plugin, zynthian module (like sequencer), etc. If any of these are unable to process the block of data within the allocated time then an xrun can occur.

In general audio processing is more demanding than MIDI but very simple audio processing (like gain adjustment) may be simpler than complex MIDI processing (like time critical sequencers, etc.). Each engine behaves differently and one must chose a combination that best matches their hardware. I did some benchmarking for a few engines which may help. This work needs to continue to give us this data for all modules across more hardware configurations. I key lesson is that some engines are conservative in their CPU usage, ramping up on demand whilst others consume all the CPU they will require continually, e.g. Pianoteq idles at a low CPU but can ramp up quite high whilst setBFree idles at its maximum usage (around 20% CPU ). This information allows us to know which engines can run idle without significant impact and hence which we can load but not use simultaneously.

The audio interface is significant. Some drivers are poorly written or have to interface with bad hardware and hence may cause significant latency or worse - variable latancy (jitter). This can trigger xruns unless buffers are set very high. Some cheap USB interfaces cab behave like this but some of the better USB interfaces can behave very well with low latency. If you have a USB2 or USB3 audio interface then use the USB2 or USB3 ports on the Zynthian to take advantage of any speed improvements. Plugging a slower device into the same hub may have detrimental effect on other devices so watch out for USB keyboards and USB sound modules plugged into ports on the same (internal or external) hub.

There is an option in the admin menu to enable Headphone output on the built-in Raspberry Pi audio jack. This significantly degrades the Zynthian’s JACK performance. (Samplerate conversion occurs to allow copying data from one ALSA interface to another.) Definitely disable this feature when not required.

If the CPU gets too hot then it can start to throttle its speed. This will have immediate detrimental effect on the audio. The throttling happens at quite a high temperature and Zynthian is designed to avoid this but check the temperature (shown statically in webconf- press refresh to update) to see if it is below 60 degrees Celcius.

Samplerate is important. Anything below 44100 samples per second is (in theory) observable. The typical human hearing range is 20Hz-20kHz and Nyquist describes how we must sample at over twice the highest frequency to avoid aliasing effects. (44100 was chosen as a standard in the music deistribution industry for compact discs because it was relatively simple to derive the frequency from a range of commonly used frequencies at the time - mostly influenced by TV. 48000 remains the defacto standard for most of the audio industry with higher rates (typically multiples of 48000) offering improved signal to noise ratio. The higher the rate then the more samples that need to be processed within each JACK period which adds to the processing but there may be a trade off if an application or interface is optimised for a particular sample rate. Some soundcards operate natively at 48000 and there are some plugins that only work at this rate or are optimised for it. I would recommend trying 48000, 44100, and 32000 to see if these provide different behaviour. 32000 will reduce the frequency response to 15kHz but most of us don’t hear much above that (especially us older gits) and there is often very little to be gained by operating the device at full bandwidth. Of course this depends on its use. As a high fidelity audio processing unit for mastering we may want full bandwidth but emulation of an old analogue synth may even benefit from reduced bandwidth.

Changing the buffer size (256, 128, 64, etc.) changes the amount of data to be processed in each cycle. This has to be a power of 2 and each time it is halved we give all processes half the time to complete their processing so one must test this to see how low it can go before the combination of engines you want to use simultaneously starts to fail (xrun). Lower buffer size reduces latency, e.g. the latency introduced by JACK running at 48000 with two 128 frame buffers is 5.3ms. (JACK needs at least 2 buffers which is its default (the -n parameter in the config) but should have 3 buffers for USB audio.) There will be a little extra latency introduced by the physical interface and ALSA driver but in general, with a reasonable quality audio interface this tends to be pretty low and immutable so we tend to ignore it to a point. It becomes very challenging to play against a latency (of yourself) of 30ms but long before this, playing becomes uncomfortable or difficult to keep rhythm as you have noted. It depends heavily on the virtuosity of the musician, the type of music they are playing and the instruments used to produce the music but a rule of thumb is that latency above about 10ms is starting to impact rhythm and latency below 5ms is difficult to perceive. The effect of latency can be mitigated to some degree by avoiding hearing the trigger, e.g. wearing headphones or using loudspeakers that mask the sound of the keys being pressed can help. Some people refer to the distance that sound travels (about 343 m/s) to point out that a musician playing an electronic instrument (like a guitar) expects to hear their audio about 10ms later than they triggered it if they are standing about 3m away from their amplifier but this does not help much because in a live performance that latency is cascaded with the latency of the audio processing.

Effects processing adds CPU load so adding effects to an instrument needs to be considered carefully. Some instruments have built-in effects which may be enabled by default, e.g. Fluidsynth enables its reverb and chorus by default which I think should be disabled.

Be aware that Pianoteq on Zynthian is configured to run at a lower internal samplerate to reduce CPU loading hence is band limited.

My experience is that a Raspberry Pi 4 in a Zynthian 4.1 will allow most engines to be played fairly agressively without xrun with the default settings of two 256 frame buffers running at 44100. Notable engines that use high CPU are Surge and Vitalium which can be used with judicial selection of preset and limit on polyphony. There seem to be xruns whenever a layer is added or removed so avoid playing and recalling snapshots. (ZS3 can assist with performance.) I would like to see more work done on benchmarking and publishing information on what does and doesn’t work but it is a large piece of work that will be challenging to present and maintain. As a minimum I think we should have metrics on the standard hardware and a selection of approved engines / configurations.

Sorry if there was too much detail here and worse if I didn’t actually answer any of your questions.