[solved] XRUN every ~110 seconds with USB audio interface

I’m attempting to switch from a HiFiBerry audio interface to an external Audient iD4 MKII. It mostly works with these jackd settings:
-P 70 -t 2000 -s -d alsa -d hw:iD4 -r 44100 -p 256 -n 2 --shorts -X raw

However, every 110 seconds or so there is an XRUN. Engine responses vary: pianoteq goes silent and never recovers, while zynaddsubfx drops out for about one second and then recovers.

Based on the timestamps of 30 events, the average is 111s, minimum 99s, maximum 123s. So it could be something which runs then sleeps for 100s.
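
A minimal sketch of how interval statistics like these could be computed from a list of XRUN timestamps. The timestamps below are hypothetical placeholders (in seconds), not the logged values, and `interval_stats` is an illustrative name.

```python
from statistics import mean

def interval_stats(timestamps):
    """Return (mean, min, max) of the gaps between consecutive timestamps."""
    ordered = sorted(timestamps)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    if not gaps:
        raise ValueError("need at least two timestamps")
    return mean(gaps), min(gaps), max(gaps)

# Hypothetical XRUN times in seconds since jackd started (placeholder data).
xrun_times = [100.0, 212.0, 318.0, 431.0, 540.0]
avg, lo, hi = interval_stats(xrun_times)
print(f"mean={avg:.1f}s min={lo:.1f}s max={hi:.1f}s")
```

A tight spread around the mean, as reported above, points towards something periodic rather than random load spikes.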

According to snoopy, no process is executed at these times, so this isn’t like Issue #412

I’m running an up-to-date stable zynthian. I’ll continue attempting to diagnose, but I wanted to raise the issue now just in case someone has an idea, or a suggestion for diagnosis approach.

I’ve solved this, my jackd settings are now:
-S -P 70 -t 2000 -s -d alsa -d hw:iD4 -r 44100 -p 256 -n 2 --softmode -X raw

The 2 main differences are:
  • --softmode: when I said Pianoteq wasn’t recovering from these XRUNs, what was actually happening was that jack removed Pianoteq from the graph because it was treated as a slow-responding client. So this option didn’t address the root cause, but it helped with recovery.
  • -S: a basically undocumented option to put jackd in synchronous mode. This completely eliminated the XRUNs for me. It is mentioned in a couple of places.

I have seen this before and rather assumed (actually did some tests before) that this statement is true, although your post makes me reassess my memory!

We need to validate what impact synchronous mode has on each jack client and plugins.

Thanks @steveb for this investigation. Let’s look to see what lessons can be learned for the wider community and other interfaces and modules.

[Edit] I have read elsewhere that the default mode of jackd2 is asynchronous, which adds an extra buffer period. This is probably why we no longer need to use 3 buffers for some USB audio devices. I haven’t found a description of synchronous mode but seem to remember reading about it previously. I think soft/hard are not relevant to sync mode, where every client must process and deliver all data within the prescribed period, whereas async allows early / late delivery. I wonder if that involves a degree of sample rate conversion or similar to allow for stream slippage. That would certainly explain the higher quantity of xruns.

Can you test -r 48000 with and without -S? It might be the thing about usb interfaces not liking fractional latencies.

I tried different options again with a little more discipline, and I’d like to retract -S as being the solution.

Using the following jackd as a baseline:

-P 70 -t 2000 -s -d alsa -d hw:iD4 -r 44100 -p 256 -n 2 -X raw

I tried the following differences:

-r 44100 -n 2 (XRUN, period ~110s)
-r 44100 -n 3 (XRUN, period ~210s)
-r 48000 -n 2 (XRUN, period ~110s)
-r 48000 -n 3 (XRUN, period ~210s)
-S ... -r 44100 -n 2 (XRUN, period ~110s)
-r 44100 -n 2 --softmode (NONE!)

Based on the testing this time, I’d conclude the following:

  • -r 48000 or -n 3 makes no difference except some correlation between -n and the XRUN period
  • -S on its own doesn’t help
  • --softmode avoids the XRUNs

I also tried rebuilding jack2 to upgrade from v1.9.14 to v1.9.20. It made no difference, but maybe testing should switch to the latest release; 2019 was a while ago now.

Hmmm, the only non-fractional latency you really tested was -p 256 -n 3 -r 48000.

Could you have a look at the link and try some of the other ones in bold?
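
For reference, a quick sketch of the arithmetic, assuming that "latency" here means the nominal buffer latency (frames per period × number of periods ÷ sample rate) and that "non-fractional" means a whole number of milliseconds. The helper name `buffer_latency_ms` is illustrative.

```python
from fractions import Fraction

def buffer_latency_ms(period_frames, periods, rate):
    """Nominal buffer latency in milliseconds, kept as an exact fraction."""
    return Fraction(period_frames * periods * 1000, rate)

# The -p / -n / -r combinations tested above.
for p, n, r in [(256, 2, 44100), (256, 3, 44100),
                (256, 2, 48000), (256, 3, 48000)]:
    ms = buffer_latency_ms(p, n, r)
    kind = "whole" if ms.denominator == 1 else "fractional"
    print(f"-p {p} -n {n} -r {r}: {float(ms):.3f} ms ({kind})")
```

Using `Fraction` avoids floating-point rounding when deciding whether a result is a whole number of milliseconds.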

You’re on a pi4?

I’ve tried 4 non-fractional latencies now, and the XRUNs occur regardless. I’ve plotted the XRUN period with the latency to show there is a clear correlation:

I’m running kernel 5.10.60-v7l+ on a pi4. Is there a way I can try a 5.16 kernel without building it myself?

Soft mode is basically masking the xruns. It allows clients to remain connected and for buffer overruns to just overwrite or truncate data, i.e. where an xrun would have occurred there is now a discontinuity. The impact may be less, i.e. there is not a partial period of silence (manifesting as an audible click) but a sample discontinuity which also clicks, maybe less noticeably. This can also lead to odd timing issues. Basically soft mode may have less impact on the audio but does not stop the problem. It masks the problem, which may make it more difficult to track, i.e. without indication of xruns you don’t know the system is overloaded / misconfigured. (I know that we all want the clicks to just stop but we need to know when they happen to track the cause!)

Are there any other usb things plugged in?

I have to disagree with @Baggypants on this. Time is an illusion. Lunchtime doubly so… or to be more accurate, time is a social construct. The idea that fractions of milliseconds should matter any more than milliseconds themselves being fractions of seconds seems absurd!

An xrun is an underrun or overrun (hence the ‘x’ acting as a wildcard in its name). These occur when there is insufficient or excessive data delivered to a software or hardware audio module.

An underrun occurs when the source is slow and delivers too little data in the prescribed time period. The destination must continue without the expected data which (usually) leaves a hole (silence) in the audio (although could leave previous (stale) data depending on the code) which manifests as a click.

An overrun occurs when the destination is slow and cannot receive / process the data before the source needs to reuse its output buffer. The data must be discarded, which results in a discontinuity and may even lead to a subsequent underrun.

Softmode allows overruns to write over the destination input buffer rather than discarding the whole buffer. The same sample slippage occurs but the audible effect may be less impactful. This may have a significant effect on the module if it depends heavily on the data, e.g. convolution.

The core aim is to allow each module to complete its process of receiving a block of data, processing it and delivering it to its output buffer within the period defined by jackd. This includes the ADC and DAC, which will run at their own clock rate(s). Jackd should use the ADC/DAC clock, hence you should use a samplerate that is native to the soundcard. Many soundcards allow this to be changed, e.g. 32000, 44100, 48000, 88200, 96000 samples per second. I would say this is the first thing to configure so that jackd is running at a rate supported by the soundcard. Consistently timed periodic xruns are an indication that this is wrong. The drift between the soundcard samplerate and the jackd (unlocked) rate will cause an xrun after a quantity of periods when the buffer has emptied / filled, hence the xrun occurs with a regular period.
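
To put rough numbers on the drift idea, here is a minimal sketch under a simplified model that is an assumption, not a description of jackd internals: two clocks differ by a fixed number of parts per million, and an xrun occurs each time the accumulated difference uses up a fixed number of frames of slack. The function names and the example figures (256 frames of slack at 44100 Hz) are illustrative.

```python
def seconds_between_xruns(slack_frames, rate, drift_ppm):
    """Time for a constant clock mismatch to consume the buffer slack."""
    drift_frames_per_second = rate * drift_ppm * 1e-6
    return slack_frames / drift_frames_per_second

def implied_drift_ppm(slack_frames, rate, xrun_period_s):
    """Inverse of the above: the mismatch that would give the observed period."""
    return slack_frames / (rate * xrun_period_s) * 1e6

# Illustrative figures: one 256-frame period of slack at 44100 Hz,
# with xruns about every 110 s as reported earlier in the thread.
ppm = implied_drift_ppm(256, 44100, 110)
print(f"implied drift: {ppm:.1f} ppm")
print(f"check: {seconds_between_xruns(256, 44100, ppm):.0f} s between xruns")
```

Under this model, doubling the slack doubles the time between xruns, which is at least in the same direction as the longer XRUN period reported with -n 3.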

Most modules have peaky load. The CPU (and other hardware / software resources) may be able to process a module’s average load okay but may fail occasionally due to a (possibly sustained) peak of high processing. This may result in the module failing to complete its processing in a period, triggering an xrun. Reducing the load can reduce the risk of this, e.g. using a less demanding configuration / preset. Increasing the available time to process each block of data also reduces this risk, e.g. increasing the quantity / size of buffers. This smooths out the load, giving more time during each processing period for those peaks to be processed. This allows modules to run without xruns if they have high transient peaks but a low average load. It doesn’t fix persistently high load. Increasing the time for processing increases the latency.
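
The effect of extra buffering on peaky load can be illustrated with a toy model. This is a sketch under stated assumptions rather than how jackd schedules work: work that overruns its period carries over as backlog, and an xrun is counted when the backlog exceeds the slack given by the extra buffered periods. `count_xruns` and the per-block costs are hypothetical.

```python
def count_xruns(block_costs_ms, period_ms, extra_periods):
    """Count deadline misses in a toy model of peaky processing load."""
    slack_ms = extra_periods * period_ms
    backlog_ms = 0.0
    xruns = 0
    for cost in block_costs_ms:
        # Unfinished work carries over; spare time pays the backlog down.
        backlog_ms = max(0.0, backlog_ms + cost - period_ms)
        if backlog_ms > slack_ms:
            xruns += 1
            backlog_ms = 0.0  # assume the stream restarts cleanly after an xrun
    return xruns

period_ms = 1000 * 256 / 44100  # about 5.8 ms per 256-frame period

# Hypothetical load: mostly 2 ms per block with one 9 ms peak.
peaky = [2.0] * 10 + [9.0] + [2.0] * 10
print(count_xruns(peaky, period_ms, extra_periods=0))  # the peak misses its deadline
print(count_xruns(peaky, period_ms, extra_periods=1))  # the peak is absorbed by the slack

# Hypothetical persistent overload: every block costs more than one period.
print(count_xruns([7.0] * 50, period_ms, extra_periods=1))  # still produces xruns
```

The last case shows the limit described above: extra buffering absorbs transient peaks but cannot rescue a load that is above the period budget on average.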

Jackd processes all nodes in its graph synchronously, which means that everything has the same latency. If you have a module that needs longer to process its data then all other modules must wait the same time. This means you can’t have lower latency where needed whilst allowing modules that may not benefit from low latency to run slower, e.g. synths with slow attack.

Pipewire does allow different latencies but this requires a greater overhead within the sound system. I am eager to investigate if / when Pipewire can provide comparable performance as jackd and whether such features may provide tangible benefits within Zynthian.

A MIDI controller and the touch screen for the HDMI display. There is no change to the XRUNs when they’re both unplugged.

I updated the kernel from 5.10.60-v7l+ to 5.15.32-v7l+ using rpi-update, and so far there have been zero periodic XRUNs while using -r 48000 -p 128 -n 3!

This kernel doesn’t have the commit originally referenced but it does have a couple of others which may be helping.

That sounds promising. How does it behave with -n 2? It should be possible to run with two explicitly defined buffers due to the extra buffering performed by the ALSA driver.

Good news that rpi-update worked. We will have to do some more testing but it might be a relatively simple migration.

AFAIK there’s no way to avoid xruns on usb. I’ve reverted to recording everything on the sd and then moving the files to another machine for editing.