RT-like operation - CPU isolation for special processes

C0d3man · September 29, 2017, 9:57am

Hi all,

here is a simple manual/recipe for some optimization of your Zynthian.
Please note: THIS IS BETA! DON’T USE IF YOU DON’T KNOW WHAT YOU ARE DOING!

What we are doing today is to try to divide processes into nrt (non-realtime) and rt (realtime). Why should we do this?
The standard preemptive Linux kernel is really good for audio processing - jack2 (our audio server) works very well even on heavily loaded systems. But we can go one step further and say: I want to use 2 or 3 of my cores only for audio, and the rest should handle the GUI, etc.

A very nice manual describes this, and through it I also found a good tool that makes everything easier: https://github.com/OpenEneaLinux/rt-tools.

Ok, let’s go: We want to divide the 4 cores of a Raspi-2/-3 (every step is in a script in https://github.com/dcoredump/zynthian-recipe/recipre/rt-tools.sh):

Add the following to /boot/cmdline.txt: cgroup_enable=cpuset isolcpus=1,2,3. The file must stay a single line, which should look like this:
dwc_otg.lpm_enable=0 console=tty1 elevator=noop root=/dev/mmcblk0p2 rootfstype=ext4 fsck.repair=yes cgroup_enable=cpuset isolcpus=1,2,3 rootwait
This tells the kernel to schedule ordinary processes only on CPU0 (and not on CPU1,2,3). It also enables cgroup cpusets - a feature for assigning processes to a set of CPUs.
Now clone, build and install https://github.com/OpenEneaLinux/rt-tools.git
Reboot
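After the reboot it can be worth checking that the isolcpus option really reached the kernel. The following is only a small sketch: the helper names are made up for this post, and the example string is the cmdline.txt line from above. On a running system you would pass the contents of /proc/cmdline instead.

```python
def parse_cpu_list(spec):
    """Expand a kernel CPU list such as "1,2,3" or "1-3" into a sorted list of ints."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        # Skip empty parts and flag words (newer kernels allow e.g. "domain").
        if not part or not part[0].isdigit():
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def isolated_cpus_from_cmdline(cmdline):
    """Return the CPUs named by isolcpus= on a kernel command line, or []."""
    for token in cmdline.split():
        if token.startswith("isolcpus="):
            return parse_cpu_list(token[len("isolcpus="):])
    return []


# The command line from this recipe, used here as sample input.
example = (
    "dwc_otg.lpm_enable=0 console=tty1 elevator=noop root=/dev/mmcblk0p2 "
    "rootfstype=ext4 fsck.repair=yes cgroup_enable=cpuset isolcpus=1,2,3 rootwait"
)
print(isolated_cpus_from_cmdline(example))  # [1, 2, 3]
```

An empty list means the option is missing, which usually points to a typo or a line break in cmdline.txt.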

If we now look at CPU usage (e.g. with htop), only CPU0 should be in use. This CPU is for our nrt-processes. The other 3 CPUs are for the rt-processes. Now we can use partrt to create the CPU-sets and assign processes to them.

First calculate a bitmask, where bit n stands for CPUn (counting from CPU0). The mask for CPU1,2,3 is 0xe (for CPU2,3 only: 0xc).
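If you do not want to work out the mask by hand, here is a tiny sketch. The function name cpu_mask is just something made up for this post; it assumes CPUs are numbered from 0, as the kernel does, so that bit n of the mask stands for CPUn.

```python
def cpu_mask(cpus):
    """Return the bitmask in which bit n is set for every CPU number n in cpus."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


print(hex(cpu_mask([1, 2, 3])))  # 0xe
print(hex(cpu_mask([2, 3])))     # 0xc
```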
Create a CPU-set for rt: partrt create 0xe
Now you can move a process (you have to know the PID) to the rt-CPU-set: partrt move <PID> rt or you can start a new process directly on the rt-CPU-set: partrt run -f 99 rt /usr/local/bin/jackd -R -P99 -d alsa -dhw:0 -r48000 -p256 -n2 -Xraw (here I have also used the option -f 99 to select the FIFO scheduler with priority 99).

I tuned my Zynthian for using 2 CPUs for rt and changed the systemd-scripts so they start jack2 and mod-host on the rt-CPU-set.

Note: Every newly started process (compiler, …) lands on the nrt-CPU-set by default and can no longer spread its threads over more CPUs than that set contains (the same applies to processes on the rt set)! So: Compiling on 4 CPUs is not possible anymore!

And a nice trick for getting some more CPU cycles for other things:

sysctl vm.dirty_writeback_centisecs=1500

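This avoids overly frequent writebacks to the SD card (if I have understood it correctly): the value is in hundredths of a second, so 1500 means every 15 seconds. To make it permanent, simply add the line vm.dirty_writeback_centisecs=1500 at the end of /etc/sysctl.conf.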
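If you script this step, it helps to make it repeatable so that running the recipe twice does not leave duplicate entries. A minimal sketch, assuming you read and write the file yourself (with root rights); the function name ensure_sysctl_setting is hypothetical and it works on the file contents as a plain string:

```python
def ensure_sysctl_setting(conf_text, key, value):
    """Return conf_text with exactly one `key=value` line; other lines are kept."""
    new_line = f"{key}={value}"
    out = []
    replaced = False
    for line in conf_text.splitlines():
        stripped = line.strip()
        # Comment lines in sysctl.conf start with '#' or ';'.
        is_setting = bool(stripped) and not stripped.startswith(("#", ";"))
        if is_setting and stripped.split("=", 1)[0].strip() == key:
            if not replaced:
                out.append(new_line)  # replace the first existing entry
                replaced = True
            continue                  # drop any later duplicates
        out.append(line)
    if not replaced:
        out.append(new_line)          # key was not present: append it
    return "\n".join(out) + "\n"


old = "# custom settings\nvm.dirty_writeback_centisecs = 500\n"
print(ensure_sysctl_setting(old, "vm.dirty_writeback_centisecs", 1500), end="")
```

The new value is only picked up at the next boot or after reloading the file with sysctl -p.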

Regards, Holger

jofemodo · September 29, 2017, 10:21am

This is great work, @C0d3man!!
I will test it ASAP.
Do you have any ideas about measuring the “performance” of these changes? I think it won’t be easy …

Regards,

C0d3man · September 29, 2017, 10:44am

For jack and mod-host it is simple to implement. I have made some service files for systemd:
https://github.com/dcoredump/zynthian-recipe/zynthian.stage/etc/systemd

For the engines you have to go into your Python scripts and change the start behaviour of the engines - that’s nothing for me… you know: my python-skills…
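To give an idea of what such a change could look like: the sketch below only builds the argument list for partrt run, in the same form as the jackd example from the first post. It is not how Zynthian actually starts its engines; the helper name partrt_run_argv and its parameters are invented for illustration.

```python
import shlex


def partrt_run_argv(command, partition="rt", fifo_priority=None):
    """Prefix command (a list of strings) with `partrt run [-f PRIO] PARTITION`."""
    argv = ["partrt", "run"]
    if fifo_priority is not None:
        # -f selects the FIFO scheduler with the given priority.
        argv += ["-f", str(fifo_priority)]
    argv.append(partition)
    argv += list(command)
    return argv


jack = shlex.split(
    "/usr/local/bin/jackd -R -P99 -d alsa -dhw:0 -r48000 -p256 -n2 -Xraw"
)
print(" ".join(partrt_run_argv(jack, fifo_priority=99)))
```

The resulting list could then be handed to subprocess.Popen in place of the plain engine command, provided partrt is installed and the rt CPU-set has already been created.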

On the mentioned web-page they also have some simple examples for tests (cyclictest).

Regards, Holger

silversubie · March 18, 2021, 3:43pm

Sorry for reviving this 3-year-old post… I followed these instructions to the letter, but for mpd and not jackd. When viewing top, CPU 2, 3 and 4 are still 100% idle. I was expecting at least 1 of the 3 CPUs to be utilized?

I used partrt move <PID> rt to move the existing process to rt.

Did I miss another step? Thanks.

riban · March 18, 2021, 5:25pm

@jofemodo did you test this, and what were your conclusions? I don’t think we use this technique in the current Zynthian release.

C0d3man · March 18, 2021, 5:37pm

It depends also on the software. If it is not optimized for multiprocessing you cannot get any advantage with multiple CPUs.

In fact, a serial audio chain (e.g. synth->effect->next effect->…) (IMHO) cannot get any advantage from multiple CPUs, because every item of the chain has to wait until the audio block has been calculated by the item before it.

CPU isolation may only make sense for holding one CPU free for UI/system calculations and the rest for multiple audio chains (if the host supports multiprocessing).

AFAIK only Pianoteq currently supports multiprocessing.

Regards, Holger

C0d3man · March 18, 2021, 5:40pm

I ran some tests three years ago, and I think that unless there is an urgent need (a slow UI or something like that), there is no reason to use it. Nice to know, but not necessary.

Regards, Holger