API Proposal

Hi

There have been a few topics here discussing implementing an Application Programming Interface (API) but none have progressed too far. I have been working on this a bit lately and have a high-level proposal for the kind of approach and type of monitoring and control that we may want / need to implement. I have intentionally avoided considering the low-level technology to implement this because I don’t want the high-level design to be driven or constrained by preconceptions of specific technology.

The aim is to separate user interface from core functionality, providing a core accessible via the API and allowing different user interfaces to be designed. These may include the current GUI with 4 encoders, simple stomp boxes, web interfaces (like webconf), etc.

Here is the initial high-level proposal, which I publish here for comment but which should probably live somewhere more permanent (like GitHub). Please take a look if you are interested in using an API and comment on errors, omissions, etc. We will soon be looking at the low-level technology to implement this. The API may replace CUIA and other access methods, e.g. direct OSC monitoring and control of the mixer.


Zynthian API proposal

Purpose

Zynthian-core is (will be) the module that provides core functionality for Zynthian. This includes managing audio and MIDI processing engines, signal routing, monitoring, etc. The API will provide control and monitoring of zynthian-core allowing different user interfaces to be connected.

The current implementation of Zynthian (2109) does not separate UI from core functionality. This will be refactored to separate zynthian-core from zynthian-ui (and other elements of ZynthianOS). An interim API may be implemented by wrapping the 2109 zynthian-ui to present the API as if only zynthian-core is running. This wrapping adds overhead, which will reduce performance.

High Level Design

A model-view-controller approach is proposed. Zynthian-core will contain a model that holds the current configuration and data. This is equivalent to the current snapshot, containing all data needed to describe the state of the system. Zynthian-core will also contain much of the controller, allowing control of the model, although some controller elements may be implemented in the user interface modules. Interaction with users will be implemented by the view, which may be any user interface. The UI may also implement some controller elements to enhance functionality, e.g. add extra features not embedded within the controller / model. The API will provide Create/Read/Update/Delete (CRUD) type access to the model and functions of zynthian-core, as well as real-time functions that do not fit the CRUD access model. The API will be extensible to allow addition of data without breaking the API, and will use a major.minor version number to indicate breaking / non-breaking changes.

Create - Clients may request elements be added to the model.

Read - Clients may request (almost) any data within the model. This may be polled and/or accessed via a subscription + publish mechanism, allowing API clients to register for data to be dynamically published to them as it changes. (These may be considered the pull and push mechanisms; a client-side sketch follows the Delete item below.) These two approaches allow static or slowly changing data to be requested / polled, while more urgent or dynamic data is published in a timely manner for event-driven clients. Not all data will be available via the subscription + publish mechanism, to avoid excessive code and overhead where it is not required.

Update - Clients may request changes to data / states.

Delete - Clients may request elements be removed from the model.
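
As a purely illustrative sketch (the class, method and path names below are assumptions, not part of any existing Zynthian code), a client using such a CRUD-plus-subscription API might look like this:

```python
# Purely hypothetical sketch - class, method and path names are illustrative
# assumptions, not an existing zynthian-core interface.

class ZynthianApiClient:
    """Minimal illustration of CRUD plus subscription access to the core model."""

    def __init__(self, transport):
        self.transport = transport   # whatever low-level mechanism is chosen later
        self.callbacks = {}          # path -> callback invoked on published updates

    def create(self, path, data):
        return self.transport.request("CREATE", path, data)

    def read(self, path):
        # Pull: request / poll current value of static or slowly changing data
        return self.transport.request("READ", path)

    def update(self, path, data):
        return self.transport.request("UPDATE", path, data)

    def delete(self, path):
        return self.transport.request("DELETE", path)

    def subscribe(self, path, callback):
        # Push: core publishes changes to `path` back to this client as they occur
        self.callbacks[path] = callback
        return self.transport.request("SUBSCRIBE", path)


# Example usage (all paths invented for illustration):
#   api = ZynthianApiClient(transport)
#   api.update("/mixer/chain/1/fader", {"level": 0.8})
#   api.subscribe("/mixer/chain/1/dpm", lambda value: print("meter:", value))
```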

There may be a permissions scheme applied for clients to allow / restrict some operations. This is not high priority but may prove advantageous, e.g. allowing tightly bound modules fuller access than remote control devices.

Detailed Design

The specific technology will be chosen to suit requirements. This will consider overhead, latency, security, likely (forecast) clients, etc.

Client connections may be session or non-session based. It should be possible for a client to make a request without expecting a response. It should also be possible for a client to validate a request has succeeded. This may be via a session based connection, a data subscription or a poll.

Sessions - A session is a bi-directional connection between client and server. The client connects to the server and remains connected throughout the session. Requests made by the client can be responded to by the server over the connection. The server is aware of current sessions and can hence send dynamic data to connected clients.

Connectionless - Connectionless is a uni-directional transmission of data. A sender may send a request or data to a receiver. There may not be a concept of connection so any dynamically updated data may need a keep-alive and timeout mechanism to avoid dead clients being sent data. A client may subscribe for data then request a change and be notified of the change through the subscription publication. It will also be notified via the same mechanism for changes made by other clients or the core.

The choice between a session-based and a connectionless API will be considered against use cases and likely (forecast) clients. One or both may be adopted. The higher-level concepts (subscription + publish, etc.) will be implemented on top of whichever lower-level data transfer mechanism(s) are selected.
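
To illustrate the keep-alive / timeout idea for the connectionless case, a core-side subscription registry might be sketched roughly like this (all names and the timeout value are assumptions, not a design decision):

```python
import time

# Illustrative sketch only: a subscription table with keep-alive timeout for a
# connectionless transport. Names and the timeout value are assumptions.

SUBSCRIPTION_TIMEOUT = 10.0  # seconds without a keep-alive before a client is dropped


class SubscriptionRegistry:
    def __init__(self):
        self.subscribers = {}  # (client_address, path) -> last keep-alive timestamp

    def subscribe(self, client, path):
        self.subscribers[(client, path)] = time.monotonic()

    def keep_alive(self, client):
        # Refresh every subscription held by this client
        for key in self.subscribers:
            if key[0] == client:
                self.subscribers[key] = time.monotonic()

    def prune(self):
        # Drop subscriptions whose clients have stopped sending keep-alives
        now = time.monotonic()
        self.subscribers = {
            key: stamp for key, stamp in self.subscribers.items()
            if now - stamp < SUBSCRIPTION_TIMEOUT
        }

    def targets(self, path):
        # Clients that should be sent a publication when `path` changes
        return [client for (client, p) in self.subscribers if p == path]
```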

Specific Data

The following is a non-exhaustive list of data that may be implemented in the model / API access methods and real-time messages. The suffix in square brackets identifies how the data may be manipulated via the API [CRUD]. A rough sketch of a few of these entities follows the list.

  • Mixer - Multiple channels of audio mixed to a main stereo mix-bus

    • Fader level per chain [-RU-]
    • Balance/pan value per chain [-RU-]
    • Mute state per chain [-RU-]
    • Solo state per chain [-RU-]
    • Mono per chain [-RU-]
    • Peak programme meter level per chain per audio channel [-R--]
    • Peak programme meter peak hold per chain per audio channel [-R--]
    • Chain [CRUD] *Testing branch binds chains to mixer channels - may abstract this if more flexibility required otherwise may add:
      • Audio [-R--] Boolean: True if audio chain hence enable audio mixer channel - feels wrong but is pragmatic reduction in dataset
  • Chain - Group of interconnected engines

    • Quantity of chains [-R--]
    • Name [-RU-]
    • Index [-RU-] *This is currently inferred from MIDI channel - it may be advantageous to separate this
    • MIDI Channel [-RU-] *This is a base design constraint that requires clone to extend - should we instead allow parallel MIDI engines within a chain and assign MIDI channel to engine?
    • Note range [-RU-] *Should this be implemented at engine level?
    • Transpose [-RU-] *Should this be implemented at engine level?
    • Engines within chain [CRUD] *Need to represent the map of how engines are interconnected - currently done by routing lookup but may benefit from a logical map (note that some devices place constraints on such a map to simplify design / UI / UX)
  • Engine - Instance of an engine class

    • Engine class [-R--]
    • MIDI control mapping - MIDI learn [CRUD]
    • Selected preset [-RU-]
    • Preset modified [-R--] Boolean: True if any parameter differs from preset
    • Parameter values [-RU-]
    • Chain [-R--] *This is a reverse lookup or double linked data - might be implemented virtually rather than maintain redundant data
  • Engine Class - Types of audio / MIDI processors / generators

    • Available engine classes [-R--] *List of available engine classes
    • Name [-R--]
    • Type [-R--]
    • Nodes [CRUD] *List of inputs and outputs
    • Banks which are groups of presets [CRUD]
    • Presets which are engine configurations [CRUD] *Advantageous to present bank/preset relationship in both directions
    • Parameters which are configuration elements of engines [-R--]
      • Name
      • Type
      • Default value
      • Range / permissible values
      • Steps
      • Units
      • Group (allow grouping parameters, e.g. ADSR within envelope group)
      • Each instance also has associated MIDI control parameters:
        • Channel
        • Control
        • Range / mapped values
  • Routing Graph - graph of MIDI and audio routes

    • Nodes [CRUD] - List of nodes in graph
      • Type [-R--] Audio | MIDI
      • I/O [-R--] Source | Destination
    • Interconnects [CRUD] - List of connections between nodes
      • Type [-R--] Audio | MIDI
      • Source [-RU-] - source node
      • Destination [-RU-] - destination node
  • Presets - Saved configuration of an engine

    • Parameter values [-RU-] *May be a delta from the default or a full set of parameter values in which case [CRUD] - TBC
    • Name [-RU-]
    • Favourite [-RU-] Boolean: True if a favourite - may be read on bulk to get list of favourites, optionally filtered by engine, tag, etc.
  • Snapshots - Full model data

    • Available snapshots [CRUD] *List of snapshots but also methods to create, update and delete snapshots
    • Name [-RU-]
    • Data [-RU-] *Full state model
  • Physical UI - Switches, encoders, potentiometers, etc

    • Quantity of switches [-R--]
    • Quantity of encoders [-R--]
    • Quantity of potentiometers [-R--]
    • Switch value [-RU-] *Allow trigger of switch via API
    • Encoder parameters [-RU-] *Allows core to present encoder as a data element
      • Value
      • Minimum
      • Maximum
      • Step
    • Potentiometer parameters [-RU-] *Allows core to present potentiometer as a data element
      • Value
      • Minimum
      • Maximum
      • Scale
  • Step Sequencer - Manipulation of patterns, sequences, etc.

    • See zynseq.h for methods that may be exposed via the API
    • Pattern [CRUD]
    • Track [CRUD]
    • Sequence [CRUD]
    • Song [CRUD]
  • Real Time Messaging - Interface for low-latency, real-time messages *Some other messages may need reduced latency, e.g. mixer control and monitoring, though maybe not real-time, i.e. there may be different latency classes of message

    • Send MIDI [----] *Ability to send a MIDI message to a defined set of MIDI destinations
    • Register for MIDI events [CRUD] *Request MIDI messages are sent to client when received - filter on message type, value ranges, source, etc.
    • Transport [-RU-]
  • System - Monitor and control the overall system

    • Uptime [-R--]
    • Errors [-RU-] *xruns / voltage / temperature since reset
    • Restart core [--U-]
    • Shutdown [--U-]
    • Reboot [--U-]
    • Panic [--U-] *All notes / all sounds
    • Audio recording [CRUD]
    • MIDI recording [CRUD]
  • GUI - Zynthian specific monitoring and control of GUI, extensible / optional to allow system specific implementation

    • Reload MIDI config [--U-] *Currently loaded from file - may be replaced with direct control of config
    • Reload key binding [--U-] *Currently loaded from file - may be replaced with direct control of config
    • Last state action [-RU-]
    • Selected screen [-RU-]
    • Selected chain [-RU-]
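
To make the shape of the model above slightly more concrete, here is a rough sketch of a few of the entities as Python dataclasses; the field names and types are illustrative guesses, not a committed schema:

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Non-authoritative sketch of a few of the entities listed above. Field names
# and types are illustrative guesses, not a committed schema.

@dataclass
class Engine:
    engine_class: str                                    # [-R--] e.g. "Fluidsynth"
    preset: Optional[str] = None                         # [-RU-] selected preset
    parameters: dict = field(default_factory=dict)       # [-RU-] parameter values

@dataclass
class Chain:
    index: int                                           # [-RU-]
    name: str = ""                                       # [-RU-]
    midi_channel: Optional[int] = None                   # [-RU-]
    engines: List[Engine] = field(default_factory=list)  # [CRUD]

@dataclass
class MixerChannel:
    chain_index: int                                     # bound to an audio chain
    fader: float = 0.8                                   # [-RU-]
    balance: float = 0.0                                 # [-RU-]
    mute: bool = False                                   # [-RU-]
    solo: bool = False                                   # [-RU-]
    mono: bool = False                                   # [-RU-]
```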

[Edit] This text is updated to reflect discussion and decisions made below.


that’s a great high-level definition for such a complex system, good job!!
It will be interesting to see how this implementation might enable the community to better integrate their own new spin into things. Such a system reminds me of the community behind monome norns and how its (admittedly steep learning curve) more accessible approach to a code-based DSP environment has had user-developed content become part of the original project and device, making it all far bigger than the sum of its parts.

Here’s to zynthian’s bright future; with such great minds at the wheel, who knows where we’ll go.

Hi @riban !

Great work, mate! The time is coming … in fact it’s almost now. Let’s do it!

Thanks!

Just a clarification:
what you call “nodes” there corresponds pretty much to what in the old UI and UI code is called “layers”?

No, a node within this description is an engine. I have renamed “Layer” as “Chain”. (In fact, “Layer” is used ambiguously within current Zynthian code but what a user would think of as a Layer is now Chain - at least in this iteration of the API.)

[Edit] I may change “node” to “engine” or similar to release the word “node” for use as a point in the connection graph, e.g. audio output, etc.

I would say a node is “an engine instance” inside a “chain” ;-b
Of course, you can have a single “chain” with a single “engine” instance, for instance if you only want to use Pianoteq.

Best Regards,

Regarding the protocol, I would suggest json-rpc:

or gRPC:

Both are lightweight and easy to implement in almost any language.

gRPC is probably better, but I only have experience with json-rpc, so we should do deeper research.
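
For reference, a JSON-RPC 2.0 exchange for a hypothetical mixer fader method might look like this (the method name and parameters are invented for illustration):

```python
import json

# Shape of a JSON-RPC 2.0 exchange for a hypothetical "mixer.set_fader" method.
# The method name and parameters are invented for illustration.

request = {
    "jsonrpc": "2.0",
    "method": "mixer.set_fader",
    "params": {"chain": 1, "level": 0.8},
    "id": 42,
}

response = {
    "jsonrpc": "2.0",
    "result": True,
    "id": 42,
}

print(json.dumps(request))
print(json.dumps(response))
```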

Any other proposal?

Regards


I think we need a low-level interface that allows close binding so that core functionality such as the main UI can be integrated with minimal overhead and latency. I suggest C/C++ and Python bindings. We can then add higher level remote interfaces such as the two that @jofemodo mentions.
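
For the close-bound case, a Python binding over a C/C++ core could be as thin as a ctypes wrapper; the library name and functions below are hypothetical, just to show the shape of such a binding:

```python
from ctypes import CDLL, c_float, c_uint8

# Hypothetical ctypes binding - "libzyncore.so" and the function names are
# illustrative assumptions, not an existing library interface.

lib = CDLL("libzyncore.so")

lib.set_fader.argtypes = [c_uint8, c_float]   # chain index, level 0.0..1.0
lib.set_fader.restype = None
lib.get_fader.argtypes = [c_uint8]
lib.get_fader.restype = c_float


def set_fader(chain: int, level: float) -> None:
    lib.set_fader(chain, level)


def get_fader(chain: int) -> float:
    return lib.get_fader(chain)
```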

Hi,
On the “view” side, is it planned to use the current graphical toolkit?

If you have an engine with effects, are the effects then nodes along the chain?

@le51 The view is the GUI so reimplementing the current UI would put the tkinter code in the view. Other UIs may be implemented using different toolkits or even hardware, e.g. a foot pedal might be considered a view if it included some configuration element. (Switches themselves are part of the controller.)

@Baggypants Any engine is a node (or maybe an engine) whether that be a source like a synth or a processor like an effect. A chain is the combination of engines in a serial and/or parallel chain. The effects within an engine are not treated separately, e.g. if a synth has reverb then that is part of the synth engine/node.

I’ve looked at your code for the ZynMixer patch, it’s clear and commented :+1: but the “view” part is the trickiest.

Regarding the API

I’ve just found this; maybe it could be of some interest.

Mentat is a HUB / Conductor for OSC / MIDI capable software. It aims to centralize all controls in one place, manage their state and create routings.
Mentat is a module for Python 3.

There are engines, modules, routes…

Mentat doc
Github repo

I would like to maintain an OSC interface and expand / rationalise it to match the new API. We can also add other remote access interfaces. We could farm out each interface to different developers. We shouldn’t spread ourselves too thinly; we don’t want lots of half-finished interfaces.

I think we should start with C/C++ and Python bindings then add a remote interface, maybe OSC first as that is already partially implemented, simple and well understood by some of us.
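
As a sketch of what the OSC side might look like from a remote client (using the python-osc package; the port and addresses are made up for illustration, not the current zynmixer OSC schema):

```python
from pythonosc.udp_client import SimpleUDPClient

# Illustrative OSC client - the port and /mixer/... addresses are invented for
# this sketch and do not describe the existing zynmixer OSC interface.

client = SimpleUDPClient("zynthian.local", 1370)

client.send_message("/mixer/chain/1/fader", 0.8)   # set fader level
client.send_message("/mixer/chain/1/mute", 1)      # mute chain 1
```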

Of course. My proposal is for “remote API” access.

I like using “node” for an “engine instance” inside a “chain”. Please, don’t use “engine” alone for meaning a “node”, because it would create confusion. We could use “engine instance”, but “node” is shorter and cleaner :wink:

So let’s define what a chain should be. It could be like this:

  • Every chain is composed of “nodes”, that are audio/MIDI processing units (engine instances!). Nodes are interconnected inside the chain. We should discuss the details of this “interconnection”. FMPOV, this is one of the hottest points :wink:

  • Every chain is attached to a mixer channel (mixchan). This could be the chain’s ID, what do you think @riban? Or do you think it could be useful to assign several chains to the same mixer channel?

  • Every chain receives MIDI messages from a list of MIDI sources. Each midisrc is defined by a mididev and/or a set of midichans. Mididev or midichans could be empty, meaning that it applies to all devs or MIDI channels. Of course, the midisrc list could be empty, which means this chain doesn’t receive MIDI from any source. Please, note this doesn’t affect the MIDI learning mechanism!
    Of course, we have to rethink (and rename!!) the “stage-mode” with all this in mind. @riban suggested flagging input devices for including/excluding them in the “stage mode mechanism”. I think it’s a good idea.

  • Every chain has a standard MIDI input filter that allows:

    • limiting key range
    • transposing by octaves and semitones
    • velocity curves? others ???
  • Every chain sends MIDI messages to a list of MIDI destinations. Each mididest is defined by a mididev and/or a midichan. An empty mididev means it sends to all output devices. If midichan is set for a mididest, it means messages will be translated to this channel before sending to the output device.

I would like to extract some keywords for including into our “developers dictionary” and avoid further confusion while increasing efficiency on communication:

  • engine
  • node (engine instance)
  • chain
  • mixchan
  • midichan
  • mididev
  • midisrc
  • mididest

Ideas? Suggestions?

Enjoy!

A node (or vertex) is a point within a graph and an interconnect (or edge) is a connection between nodes. It makes sense for us to map the jack graph into the Zynthian data model, so I propose we reserve the word node for a node within the jack graph, i.e. an individual audio or MIDI input or output.
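
For illustration, mapping the jack graph into such nodes could start from something like the following (using the JACK-Client Python package; the node dictionary layout is just a guess, not a design decision):

```python
import jack

# Sketch: enumerate jack ports as candidate "nodes" (sources / destinations).
# The node dictionary layout is an illustrative assumption.

client = jack.Client("zynapi_graph")

nodes = []
for port in client.get_ports():
    nodes.append({
        "name": port.name,                              # e.g. "system:capture_1"
        "type": "audio" if port.is_audio else "midi",
        "io": "source" if port.is_output else "destination",
    })

for node in nodes:
    print(node)
```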

We have types or classes of engine from which we create instances. I would prefer to use engine class to describe the type of engine and engine to describe the instance. By analogy, a Land Rover Freelander may have a TD4 engine: we refer to the engine in each car, which is an instance of the TD4 class of engine.

Actually this (that every chain is attached to a mixer channel) may not be true. A purely MIDI chain will not connect to an audio mixer channel. There may be an argument for handling audio and MIDI chains separately.

I don’t think assigning several chains to the same mixer channel gives significant benefit, and it adds complexity which can lead to confusion. One mixer channel per audio chain.

This is similar to the graph in that a chain may receive MIDI messages from zero or more MIDI source nodes. We should consider how this is optimised (routed in jack / filtered within zynthian module). Should we limit to this or do we allow routing MIDI to any engine within a chain? If there is a constraint that a maximum of one engine in a chain benefits from MIDI input this would be fine, e.g. only one synth engine exists to be triggered by MIDI events or only the first MIDI effect receives MIDI input. But what if we want more complex configuration, e.g. a chain contains multiple manuals of an organ with each expecting a different MIDI controller input?

I don’t understand what a mididev refers to. @jofemodo please explain.

Yes but it should be bypassed by default and when the settings are neutral, i.e. if a user sets transpose to zero then there should be no processing performed. A similar thing could be put into audio inputs, e.g. a gate that is only instantiated if non-neutral setting selected. On that point - I think the audio inputs should default to being routed to mixer channels - probably mono, i.e. input A goes to channel 1, input B goes to channel 2.

Such a dictionary or glossary should include definitions too, e.g.

  • engineclass - A class or type of MIDI or audio generator or processor, e.g. Fluidsynth
  • engine - An instance of an engine class
  • node - A MIDI or audio source or destination within the jack graph
  • interconnect - A MIDI or audio connection between two nodes
  • chain - Group of nodes and interconnects
  • mixchan - Channel strip in audio mixer

I am not sure we need midisrc and mididest if we have nodes, which we might describe as, e.g., a MIDI source node. We need to distinguish between MIDI and audio nodes and between source and destination nodes. We either do so with modifiers like MIDI, audio, source, destination, or by having 4 different entries in our glossary, e.g. midisrc, mididest, audiosrc, audiodest, which are all nodes in the jack graph.

The API should avoid constraints although the implementation may have constraints which should be validated by the API, e.g. the audio mixer is currently 16 channels maximum but this may change.

[Edit] I have updated the initial post (high-level API proposal) with some of these suggestions - conversation can continue…

OK, this clarifies things a lot. So a chain is a chain of engines, where one engine would be the main synth and the others the effects…

Close enough! A chain of engines can be quite complex with audio generators (like synths) and / or effects and / or MIDI processing. For synth chains we tend to think of the synth as the root engine, with a chain of MIDI processing engines feeding its input and a chain of audio processing from its output feeding the mixer. The chain can have serial and parallel paths.

A couple of things about snapshots:
For the functionality we need, we also had to add a “partial” snapshot that encompasses just a subset of layers (chains?) - we save a layer and everything MIDI-cloned to it.

This is to share a particular sound online with all its settings.

we would need that kind of functionality as well

Yep! The interface (which is the next task to define) will allow for loading parts of a snapshot, expanding on the current functionality. This will allow, amongst other things, the ability to load / overwrite / merge a filtered selection, e.g. just chain 1. I think the high-level description, which says snapshots can be created, read, updated and deleted, allows for this. I noticed last night that the section on Snapshots is short, but it kind of has to be; otherwise it would restate everything else, as it is the whole data model.
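
A partial load could then be little more than filtering the snapshot's data before merging it into the live model; here is a rough sketch, with an invented snapshot layout:

```python
# Illustrative sketch of loading just part of a snapshot, e.g. only chain 1.
# The snapshot / model structure used here is an assumption for the example.

def load_partial_snapshot(model, snapshot, chain_indices):
    """Merge only the selected chains (and their mixer settings) into the live model."""
    for index in chain_indices:
        if index in snapshot.get("chains", {}):
            model["chains"][index] = snapshot["chains"][index]
        if index in snapshot.get("mixer", {}):
            model["mixer"][index] = snapshot["mixer"][index]
    return model


# Example: restore only chain 1 from a saved snapshot
#   model = load_partial_snapshot(model, snapshot, chain_indices=[1])
```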