To allow a computer to interface with audio, the computer must process all audio at the same rate, referred to as the samplerate. Even the smallest deviation from this rate leads to too many or too few frames being available to process. That causes over-runs (too much data, so some must be thrown away: click) and under-runs (too little data, so the gap must be padded: click). These are grouped together under the term xruns. This is true of any audio system, but we use JACK so I will refer to that.
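To make the over-run/under-run mechanism concrete, here is a minimal Python sketch (not JACK code) of a FIFO that is written by one clock and read by another. The buffer capacity, period size and rates are arbitrary illustrative values and `count_xruns` is a hypothetical helper; the point is only that any rate mismatch eventually forces data to be dropped or padded.

```python
def count_xruns(producer_hz: float, consumer_hz: float, seconds: float,
                capacity: int = 256, period: int = 64) -> tuple[int, int]:
    """Simulate a FIFO filled at producer_hz and drained `period` frames at a
    time at consumer_hz. Returns (overrun_events, underrun_events)."""
    fill = capacity // 2          # start half full for headroom in both directions
    carry = 0.0                   # fractional frames not yet delivered
    overruns = underruns = 0
    callbacks = int(seconds * consumer_hz / period)
    for _ in range(callbacks):
        # Frames the producer delivers while the consumer waits one period.
        carry += producer_hz * period / consumer_hz
        produced = int(carry)
        carry -= produced
        fill += produced
        if fill > capacity:       # over-run: no room, excess is discarded (click)
            overruns += 1
            fill = capacity
        if fill < period:         # under-run: not enough data, pad with silence (click)
            underruns += 1
            fill = 0
        else:
            fill -= period
    return overruns, underruns
```

With identical rates, `count_xruns(48000, 48000, 60)` reports no xruns; a producer only 10 fps fast or slow produces a steady stream of over-runs or under-runs within seconds.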
JACK connects to a single soundcard and uses that soundcard’s clock to drive its internal samplerate. The soundcard uses the same clock for input and output, so all data arrives at and leaves JACK at the same rate. No xruns. (xruns can occur within JACK due to other resource issues, such as a plugin being unable to process data within its allocated time, but we are just talking about physical audio interfaces here.)
If you add another soundcard, it uses its own clock, which will always differ slightly from the one that JACK is locked to. (They may both be configured to run at 48000 fps, but one will be slightly faster than the other and both will drift and jitter.) This leads to audio from the second soundcard glitching as JACK sees xruns. Some high-end soundcards have mechanisms to lock their clocks together, e.g. wordclock. These can run at the same rate and can be configured (in ALSA) to appear as a single soundcard, but few of us are willing to spend the large amount of money such cards cost. (It may be possible to run some lower-end soundcards in this way, e.g. I think some Behringer USB devices can be locked together with SPDIF.)
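A quick back-of-envelope sketch shows why "slightly faster" is still fatal. Clock accuracy is usually quoted in parts per million (ppm); the 50 ppm offset and 128 frames of headroom below are hypothetical numbers chosen for illustration, not measurements of any particular card.

```python
import math

def seconds_until_xrun(nominal_hz: float, ppm_offset: float, slack_frames: int) -> float:
    """Seconds until a clock running ppm_offset parts per million fast (or slow)
    relative to JACK's clock has used up slack_frames of buffer headroom."""
    drift_frames_per_second = nominal_hz * abs(ppm_offset) / 1_000_000
    if drift_frames_per_second == 0:
        return math.inf  # identical clocks never drift apart
    return slack_frames / drift_frames_per_second

# Hypothetical example: two "48000 fps" cards 50 ppm apart, 128 frames of headroom.
print(f"{seconds_until_xrun(48000, 50, 128):.1f} s")  # prints 53.3 s
```

Doubling the buffer only doubles the time before the first xrun; it never removes it, which is why the mismatch has to be corrected rather than buffered.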
A way to fix this is to samplerate convert the extra soundcard’s audio. There are software modules called alsa_in and alsa_out that do this: each connects to an ALSA device and presents its audio to JACK as a client, with the corresponding audio input or output samplerate converted and locked to the JACK clock. This reduces the risk of xruns but has significant overhead. It was this process that I was playing with. We already do this for the headphone output, which uses the Raspberry Pi onboard sound as an extra soundcard, but it is discouraged due to the extra resources used. My experiments last night were to see how much effort would be required to make it work. We would also need to benchmark the process to see how much it impacts performance, which I suspect will depend partly on the audio interface used. For example, last night I used a nasty cheap USB adapter that I know is quite resource heavy, whereas a more professional interface may be lighter.
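The core idea of such a bridge can be sketched in a few lines: watch how full the FIFO between the foreign card and JACK is, nudge the resampling ratio to hold it near a target, and interpolate. This is a minimal sketch assuming a plain proportional controller and linear interpolation; it is not how alsa_in/alsa_out are actually implemented, and `corrected_ratio`, `resample_linear` and the gain value are hypothetical.

```python
def corrected_ratio(nominal: float, fill: int, target: int, gain: float = 1e-5) -> float:
    """Proportional control: read input slightly faster when the FIFO is above
    target and slightly slower when it is below, so its level stops drifting."""
    return nominal * (1.0 + gain * (fill - target))


def resample_linear(samples: list[float], ratio: float, n_out: int,
                    pos: float = 0.0) -> tuple[list[float], float]:
    """Make n_out output samples by stepping `ratio` input samples at a time
    through `samples` and linearly interpolating between neighbours.
    The caller must supply enough input; returns (output, new read position)."""
    out = []
    for _ in range(n_out):
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, len(samples) - 1)]  # hold the last sample at the edge
        out.append(samples[i] + (nxt - samples[i]) * frac)
        pos += ratio
    return out, pos
```

Every output sample costs an interpolation step, per channel, per client, which is where the overhead comes from. A real bridge would do this in C with a higher-quality filter and a smoother control loop, so this only illustrates the shape of the work.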
PipeWire does this samplerate conversion at the edge of its graph, allowing any audio source at any samplerate to work. That is how it provides its simple user experience: it just works. But it comes at the expense of system resources. I have yet to see any data on the efficiency of PipeWire; each time I look, the best I find is that it is almost as good as JACK.
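"Any source at any samplerate" means each edge adapter needs a nominal conversion ratio plus the drift correction described above. The sketch below only models the nominal part; the 48000 graph rate and device names are hypothetical, and this is not PipeWire's code.

```python
from fractions import Fraction

GRAPH_RATE = 48000  # hypothetical fixed graph rate

def edge_adapters(device_rates: dict[str, int],
                  graph_rate: int = GRAPH_RATE) -> dict[str, Fraction]:
    """Nominal resampling ratio (graph frames per device frame) for an adapter
    at the edge of the graph. A device already at the graph rate still gets
    ratio 1, because its own clock will drift from the graph's."""
    return {name: Fraction(graph_rate, rate) for name, rate in device_rates.items()}
```

For example, a 44100 fps USB device feeding a 48000 fps graph needs an exact ratio of 160/147, while a "48000 fps" device gets 1 and relies entirely on drift correction.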
I may play further with this. It would be good to have a user-configurable option to allow hot-plugging of USB audio devices. Who knows, maybe we can make it just work.
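A hot-plug option first needs to notice cards arriving and leaving. As a dependency-free sketch, assuming the usual `/proc/asound/cards` layout (a ` N [id ]: driver - name` line per card followed by an indented description line), one could poll that file and diff the results. All function names here are hypothetical, and a production version would more likely react to udev events than poll.

```python
import re
import time

# Matches the first line of each card entry, e.g. " 1 [Device         ]: USB-Audio - ..."
CARD_LINE = re.compile(r"^\s*(\d+)\s+\[(\S+)\s*\]:\s*(.*)$")

def parse_cards(text: str) -> dict[int, str]:
    """Map ALSA card number -> card id from the contents of /proc/asound/cards."""
    cards = {}
    for line in text.splitlines():
        m = CARD_LINE.match(line)
        if m:
            cards[int(m.group(1))] = m.group(2)
    return cards

def diff_cards(old: dict[int, str], new: dict[int, str]):
    """Return (added, removed); a card number reused by a different id counts as both."""
    added = {n: c for n, c in new.items() if old.get(n) != c}
    removed = {n: c for n, c in old.items() if new.get(n) != c}
    return added, removed

def watch(path: str = "/proc/asound/cards", interval: float = 1.0) -> None:
    """Poll forever, reporting changes (a real option might start or stop a bridge here)."""
    known: dict[int, str] = {}
    while True:
        with open(path) as f:
            current = parse_cards(f.read())
        added, removed = diff_cards(known, current)
        for num, cid in added.items():
            print(f"card {num} ({cid}) added")
        for num, cid in removed.items():
            print(f"card {num} ({cid}) removed")
        known = current
        time.sleep(interval)
```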