Factory soundfonts library

hannesmenzel · March 25, 2026, 3:10pm

Just rechecked: VSCO woodwinds, brass and strings together are 1.6G, flac’ed would be 800M then.

So, if we choose to FLAC-encode all the sample libraries, then populating our own repository would be good.

riban · March 25, 2026, 3:16pm

Use FLAC for encoding the audio, ideally at a consistent samplerate (48000 fps?). Any samplerate conversion should be done with attention to quality to minimise any artifacts.

jawn · March 25, 2026, 3:32pm

Is there much overhead in working with FLAC vs uncompressed? I can imagine the reduced IO of SD cards could result in a faster load time in some cases. Only a test can show of course.

hannesmenzel · March 25, 2026, 3:45pm

It’s roughly half the size. For some people this matters.

Aethermind · March 25, 2026, 3:47pm

Hi @hannesmenzel,

Thanks to you and the other thread participants for the discussion and efforts. I’ve been more of an onlooker on this topic but, as a soundfont user, I would obviously be very keen to be able to use a carefully crafted official Zynthian collection. My ideal setup, for a purely Z-based virtual orchestration station, would be a bank of four identical rack-mounted Zynthians, each receiving an average of 8-10 MIDI channels, thus safely and reliably affording 32-40 individual instrumental parts. Depending on the weight and complexity of the loaded soundfonts, the part count might be even higher, but I think that 32-40 MIDI channels is more than enough even for the most complex orchestral canvases and arrangements, and a safe bet for a reliably running configuration, with allowance for some occasional system overhead.

As for the SF library itself, I lean towards possibly sacrificing some sources and restricting the available instruments to the more sonically accomplished and professional-sounding ones, in favour of consistent expressive parameters, accurately matched volume, similar dynamic response, compatible spatial placement, homogeneous timbral presence and a standardised range of available articulations.

Cheers!

jawn · March 25, 2026, 3:47pm

That I understood and appreciate since I’m using SD cards. I was curious if using FLAC means extra CPU work when loading the soundfonts.

riban · March 25, 2026, 4:17pm

FLAC uses CPU to decode. It is significant but not substantial. I have done some benchmarks in the past (but don’t have those figures to hand) and decided that for extensive use (e.g. in the audio launcher with regular file opening) it can add to the resource load and become noticeable, e.g. reduce the total polyphony of the device. For most other uses I found it was okay. Soundfonts are often loaded into memory as linear audio, so the decoding is done when the file is loaded. This may increase the time to load a preset but should not then have a substantial impact on CPU. (This may differ for larger soundfonts or audio files that need to be streamed from disk.)

hannesmenzel · March 25, 2026, 4:22pm

I think at least with sfizz the samples are streamed from disk unless stated in the sfz file (apart from a tiny bit of “note head”).

jpetso · March 27, 2026, 12:33pm

You’d be surprised. On that note, would it make sense to consider Accurate-Salamander instead of vanilla Salamander Grand in the proposed soundfont list on the wiki?

jpetso · March 27, 2026, 12:42pm

One consideration is that, depending on the synth, soundfonts can be lighter on CPU usage and thus allow for more chains or polyphony in parallel. I don’t know if this is an issue in practice with the kind of soundfonts you have in mind for removals.

I imagine that the kind of sounds that are easy on computing resources in actual synths are also the kind of sounds that are easy on disk space in soundfonts.

jpetso · March 27, 2026, 1:06pm

I noticed that your list (which is also reflected in the wiki as of right now) has “Ensemble” sounds (Solo Ensemble, Chamber Ensemble, Orchestral Ensemble) for different orchestral sections - Strings, Brass, Woodwinds, and Reeds.

This organization of sounds does not allow for a blended “Orchestra” sound which includes samples of all of the different sections in a single soundfont. Not sure how commonly these are used in practice by musicians, but both my Dexibell sound module and some orchestral SF2 that I manually loaded onto it (probably a variation of SSO) ship with one such sound.

On a different note, I’m skeptical about reducing leads and pads to just one sound each, given the immense variety in this space. Many rompers and stage pianos have a separate category each for leads and pads. Although, as also pointed out earlier, such devices don’t necessarily ship with separate sound engines that will generate lead and pad sounds algorithmically and with preset collections distinct from this sound organization.

Aethermind · March 27, 2026, 1:45pm

I definitely agree, and for this reason alone I would strongly advocate for having generous synth sections in the coming official Z SF collection.

Conversely, this is not necessarily true, and sometimes the contrary holds. A limited number of mid-long samples might suffice for a complex patch with a strongly layered attack transient (à la D50), since what is really needed is to convey its recognisable timbral signature. A definitely higher number of multi-samples tends to be required to render the timbral richness and organicity of, say, a well-programmed analogue brass preset, like certain mighty orchestral sounds of my Rev2.

Aethermind · March 27, 2026, 2:09pm

Just speaking for myself, but I’ve never been a huge fan of those bread-and-butter “full orchestra on one key” presets, mostly good just for anthemic pseudo-epic cues in videogames and blockbuster movies. Definitely not the kind of symphonic writing that a serious composer would consider for a majestic moment, but I have nothing against this kind of sketching patch for quick orchestral drafts finding its place in the Zynthian SF collection.

cfausto · March 27, 2026, 2:18pm

I’d like to add my voice to these statements.

Considering my own use cases, I prioritise the availability of (good) soundfonts that allow me more polyphony, even if the actual audio is not as good as the corresponding synth, or the synth is not even usable due to system limitations, e.g. xruns. For me, this is particularly relevant for acoustic and 70s electric pianos, warm pads, saw pads and leads.

wyleu · March 27, 2026, 4:09pm

Action isn’t always the answer.

To leave as is, can on occasion, be the correct decision and shouldn’t be regarded as a failing.

In fact it can be indicative of a problem that isn’t of the moment, because the context around which it is built hasn’t coalesced yet.

It doesn’t in any way preclude people with a desire, a requirement, or just an itch pushing it forward.

Perhaps this is a function best handed off to a bit of disinterested silicon?

jlearman · March 27, 2026, 5:42pm

Regarding Accurate Salamander Grand: Sadly, they didn’t fix the variable latency! ARRGH. It’s obvious even to a ham-fisted player like me, so I just can’t fathom that they did all this refined work with so-called experts, while ignoring one of the most glaring flaws of the original sampleset. I’ll look into what to do about that. Also, the samples aren’t normalized. And release time is the same for all notes, ugh.

I thought about backwards compatibility. I think the best path would be to provide a script that processes zss files and replaces any legacy factory soundfont references with a reference to a new location. For example, soundfonts/legacy/oram/… . So, upgrading would be a one-time minor hassle, at the expense of probably less than 2G of space (assuming we convert to 16-bit FLAC).
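A minimal sketch of what such a migration script might look like. It assumes that zss snapshots are JSON documents and that soundfont references appear as plain string values containing the path; neither is confirmed in this thread. The function names and the `ExampleFont` prefix mapping are hypothetical, for illustration only.

```python
import json
from pathlib import Path

# Hypothetical mapping (illustration only): old path fragment -> new path fragment.
LEGACY_PREFIXES = {
    "soundfonts/sfz/ExampleFont": "soundfonts/legacy/ExampleFont",
}


def remap(value, prefixes):
    """Return a copy of a JSON value with legacy fragments replaced in all
    string values. Dictionary keys are left untouched."""
    if isinstance(value, str):
        for old, new in prefixes.items():
            value = value.replace(old, new)
        return value
    if isinstance(value, list):
        return [remap(item, prefixes) for item in value]
    if isinstance(value, dict):
        return {key: remap(item, prefixes) for key, item in value.items()}
    return value  # numbers, booleans and null pass through unchanged


def migrate_snapshot(path, prefixes=LEGACY_PREFIXES):
    """Rewrite one snapshot in place, keeping a .bak copy of the original.
    Returns True if the file was changed, False otherwise."""
    path = Path(path)
    raw = path.read_text(encoding="utf-8")
    original = json.loads(raw)
    updated = remap(original, prefixes)
    if updated == original:
        return False
    backup = path.with_name(path.name + ".bak")
    backup.write_text(raw, encoding="utf-8")
    path.write_text(json.dumps(updated, indent=2), encoding="utf-8")
    return True
```

Calling `migrate_snapshot()` on each snapshot file would be enough for a one-time upgrade. Running it a second time is harmless, because an already migrated file no longer contains the old fragment.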

Regarding formats: FLAC doesn’t reduce the size of 24-bit files very much. Google AI says FLAC reduces 16-bit files to 50-60% of their original size and 24-bit files to 50-70%, but my experience is that it’s better (about 45%) for 16-bit and worse (about 80%) for 24-bit. The reason for the poor results with 24-bit audio is that, by design, 24-bit IO files have the bottom 4 bits set randomly, and random bits compress very poorly.

However, caution is needed. Most samplesets have normalized sample files (often normalized per velocity layer rather than per file, since it makes the sfz way easier to code). But some samplesets do not normalize. In that case, 24-bit samplesets need to stay 24-bit. But for normalized files, 16 bits is really very good, and considerably better than a performance recorded in 16 bits. Studies show that nobody can tell a 16-bit recording from a 24-bit one. (This is despite the fact that tracks should be recorded in 24 bits whenever possible, for reasons outside the scope here.)
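To put the percentages above side by side, here is a rough back-of-envelope sketch. The 45% and 80% ratios are the anecdotal figures from this post, not measurements, and `flac_size_fraction` is a hypothetical helper name.

```python
def flac_size_fraction(src_bits, dst_bits, flac_ratio):
    """Estimated FLAC size as a fraction of the original PCM size.

    src_bits:   bit depth of the original samples
    dst_bits:   bit depth after (optional) conversion
    flac_ratio: assumed FLAC output size relative to its PCM input
    """
    return (dst_bits / src_bits) * flac_ratio


# Keep 24-bit samples and FLAC them (anecdotal ratio: about 80%).
keep_24 = flac_size_fraction(24, 24, 0.80)

# Convert 24-bit samples to 16-bit first, then FLAC (anecdotal ratio: about 45%).
to_16 = flac_size_fraction(24, 16, 0.45)

print(f"24-bit FLAC:          {keep_24:.0%} of the original size")
print(f"16-bit FLAC (from 24): {to_16:.0%} of the original size")
```

Under these assumptions, a 24-bit sampleset shrinks to roughly 80% if kept at 24 bits, but to roughly 30% if it can safely be reduced to 16 bits first, which is why it matters whether the files are normalized.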

PS: Here’s a histogram of ms of latency (in the Accurate Salamander Grand audio files)
msec | number of files with this latency
0 | 0
1 | 32
2 | 95
3 | 65
4 | 8
5 | 92
6 | 68
7 | 24
8 | 16
9 | 19
10 | 1
11 | 14
12 | 0
13 | 14
14 | 3
15 | 12
16 | 3
17 | 9
18 | 5
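The histogram can be summarised with a few lines of Python. This sketch only re-uses the counts listed above; how the latencies were measured is not stated in the post, and the variable names are illustrative.

```python
from statistics import median

# Latency in ms -> number of files, copied from the histogram above.
HISTOGRAM = {
    0: 0, 1: 32, 2: 95, 3: 65, 4: 8, 5: 92, 6: 68, 7: 24, 8: 16, 9: 19,
    10: 1, 11: 14, 12: 0, 13: 14, 14: 3, 15: 12, 16: 3, 17: 9, 18: 5,
}

total_files = sum(HISTOGRAM.values())

# Weighted mean: each latency value counts once per file that has it.
mean_ms = sum(ms * count for ms, count in HISTOGRAM.items()) / total_files

# Expand the histogram into one entry per file to take the median.
per_file = [ms for ms, count in HISTOGRAM.items() for _ in range(count)]
median_ms = median(per_file)

# Spread between the smallest and largest latency that actually occurs.
observed = [ms for ms, count in HISTOGRAM.items() if count > 0]
spread_ms = max(observed) - min(observed)

print(f"files: {total_files}, mean: {mean_ms:.2f} ms, "
      f"median: {median_ms} ms, spread: {spread_ms} ms")
```

The spread is the figure that matters for playability: a constant delay could be compensated with a single offset, whereas a delay that varies from file to file cannot.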

tunagenes · March 27, 2026, 6:15pm

Did you really mean “bottom 24 bits” or is that a typo? Could you point me to where you’re getting that info-idea from, please?

jlearman · March 27, 2026, 6:16pm

ACK! typo! Bottom 4 bits. Thanks! I’ll fix the post.

I’ve known it for decades, but after a bit of research, I suspect it’s true noise rather than intentionally injected noise. I may be conflating it with imaging. In the 90’s I did a little work in medical imaging, with hardware whose “12-bit” A/Ds were actually 10 bits plus two bits of intentional noise. The injected noise increased image quality dramatically by obscuring systematic quantum noise with random noise. If you know what “mach bands” are, you may know what I mean. However, that doesn’t translate directly to audio, because audio has reconstruction filters in the time domain, whereas the artifacts I’m talking about were in the space (up/down/left/right) domain, with no reconstruction filters.

The 24-Bit Delusion - Mojo Audio says

According to the experts who manufacture the finest DAC chips, resistors, and power regulators, there is theoretically no way to make electronics that are capable of discerning much greater than a 20-bit resolution (120dB dynamic range). Any company who claims 24-bit resolution from their DAC is simply full of shit. Oh they can decode 24-bits, because 24-bits does exist on the digital side, but the analog output stage in the world’s best DACs are not capable of resolving much more than 20-bits of dynamic range.

To do better requires supercooling, because the lower limit is temperature-dependent. So, the bottom 4 bits are noise. Whether intentionally injected or just ambient noise hardly matters.
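For reference, the bits-to-decibels figures in the quote follow from the usual rule of thumb of about 6 dB per bit. A small sketch, assuming an ideal linear quantizer and ignoring dither and the extra 1.76 dB sometimes added for a full-scale sine wave:

```python
import math


def dynamic_range_db(bits):
    """Theoretical dynamic range of an ideal linear PCM quantizer:
    the ratio of full scale to one quantization step, in decibels."""
    return 20 * math.log10(2 ** bits)


for bits in (16, 20, 24):
    print(f"{bits}-bit: {dynamic_range_db(bits):.1f} dB")
```

This gives roughly 96 dB for 16 bits, 120 dB for 20 bits and 144 dB for 24 bits, so the bottom 4 bits of a 24-bit file cover the range between about 120 dB and 144 dB below full scale.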

HansR · April 7, 2026, 12:27pm

In the wiki it says that Sfizz can play GIG files.
Afaik only LinuxSampler can play GIG files, Sfizz only plays SFZ.

riban · April 7, 2026, 12:34pm

Fixed!