Latest Oram glitch - turned out to be a bad mcp23017

stojos · June 14, 2024, 5:01pm

Recent Oram updates introduced an issue where certain buttons (opt/admin, mix/level, ctrl/preset, zs3/zss, pad/step) suddenly stop working. All other buttons continue working, including encoders and their switches, so this is not a hardware issue. I think they stop working when you accidentally long-press or double-press the pad button and somehow end up on the arranger screen. All screens are still available by navigating to them via some other path or via the touch screen, but the buttons listed above can only be reactivated by a restart.

If no one has experienced this I will troubleshoot more to try to replicate the bug.

riban · June 14, 2024, 5:33pm

It sounds like a couple of things:

There is a hardware bug in the MCP I2C GPI expanders which can sometimes cause them to lock up and stop responding to some events. @jofemodo can describe it in more detail, but it is also mentioned elsewhere in this forum (fairly recently).
Button mapping might get corrupted but this seems most unlikely.

stojos · June 14, 2024, 6:06pm

Thanks @riban.

I know about the mcp i2c bug, but I don’t think it is the issue, because other switches on the same mcp chip and on the same interrupt are still working without a problem.

I don’t know if this issue was introduced by moving to rpi5, by moving to hifiberry dac2 adc pro, or simply because I have a complex sequence that got corrupted by the many Oram updates.

I will investigate further, I just wanted to check if somebody else has experienced something similar.

stojos · June 15, 2024, 12:27pm

@riban it looks like you were right. This was the MCP bug, triggered by an unstable / not debounced switch. I re-soldered all switches and capacitors with a more powerful soldering iron. I remembered that the first time I was not happy with the iron temperature, specifically when soldering GND pins, because they are directly connected to a big copper GND plane that takes longer to heat up.

I must have had a cold solder joint on some of the switches or capacitors.

Now I no longer lose any button functions.

stojos · June 20, 2024, 2:13pm

This problem is not yet resolved. Mini v2 can still lose the same 8 buttons on rpi5.

Rpi4 works fine - no issues at all.

@riban told me that rpi5 has a different i2c chip. I am curious whether the new v5 on rpi5 has the same issue with the mcp23017, where the known chip bug resurfaces because of the different i2c hardware on rpi5.

The wiring of mini v2 is exactly the same as on v5, except that v5 uses rubber buttons with leds and v5 debounce capacitors are 47nF while mini v2 debounce capacitors are 100nF.

jofemodo · June 21, 2024, 9:41am

Hi @stojos!

I can’t reproduce your issues regarding MCP23017 + RPi5 with V5 hardware. It works flawlessly.
Does the issue trigger while moving encoders or while pushing buttons? How random is it? Could you describe a “kind of pattern” for triggering the issue?

The mechanical switches you use will certainly have faster bounces than the V5 silicone ones. Indeed, they should be quite similar to the encoder switches. Using 100nF seems OK to me, but you could try other values too. Regarding the encoders, try to place the 10nF as close as possible to the encoder.

Also:

Did you add extra pullup resistors to the I2C lines? The V5 control board has 10K pullup resistors (R1 & R2) for the I2C lines. Note that the position of these resistors is important. Adding extra pullup resistors to the I2C bus will increase power consumption a little bit, but they will also increase bus stability. You should add extra pullups in strategic places, especially if you run the I2C bus across several boards. Also, take care of impedance matching, track distance matching, line cross-talk, etc… The I2C bus is quite slow and it would be rare to have this kind of issue, but it’s always better not to forget about them.
It’s also very important, in order to get the MCP working as stably as possible, to place the power decoupling capacitor (1uF) as close as possible to the IC. Use thick and straight tracks to connect it to power and IC, good grounding, etc.

The MCP23017 will run much more smoothly and the issue will almost vanish if you take care of all these things. To be clear: It shouldn’t be like this. It should be much more resilient and robust, but it’s not, so you have to give it all your love and care

Regards,

riban · June 21, 2024, 11:33am

If you can’t then make bends at shallow angles, e.g. 45°.

stojos · June 22, 2024, 6:47am

@jofemodo, thanks for the very detailed explanation.

I followed most of what you say, apart from:

encoder a and b line capacitors are closer to the IC than to the encoders (encoder switch capacitors are close to the encoders)
IC power decoupling capacitor is 0.1uF instead of 1uF.

What is strange is that I never lose encoders or their switches. I always lose the same set of 8 buttons, all linked to the same interrupt. Also this never happens on rpi4.

I can’t reproduce what is causing it. They sometimes stop working when one of these 8 is pressed, but sometimes not. Sometimes I can feel that something went wrong because a button behaved like a long press instead of a short one. But sometimes not.

Encoders, encoder switches and the other 22 buttons always behave properly without any glitches.

Zynthian is still very usable, it is just not perfect.

I will increase the IC capacitor to 1uF and move the encoder a and b line caps close to the encoders to check if this makes a difference.

stojos · June 22, 2024, 6:51am

Forgot to say that there is no need to restart the whole raspberry to get them working again. A simple zynthian ui restart gets the buttons working again.

riban · June 22, 2024, 8:21am

What is happening is that some activity on the MCP is causing it to lock up. After that, there are no more signals received by Zynthian, so if the last activity detected before the failure was a switch press, zynthian will never get the switch release and will interpret it as a long press. This is why you will sometimes see a long press event at the point of failure.
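To illustrate why a lost release looks like a long press, here is a minimal sketch. It is not Zynthian's code: the function name `classify_press` and the 2 second threshold are hypothetical, chosen only for the example.

```python
LONG_PRESS_SECONDS = 2.0  # hypothetical threshold, not Zynthian's actual value


def classify_press(press_time, release_time, now):
    """Classify a switch press from its timestamps (all in seconds).

    release_time is None when no release event was ever received,
    e.g. because the expander stopped raising interrupts.
    """
    end = now if release_time is None else release_time
    held_for = end - press_time
    return "long" if held_for >= LONG_PRESS_SECONDS else "short"


# Normal case: the release arrives 0.1 s after the press.
print(classify_press(press_time=10.0, release_time=10.1, now=10.2))  # short
# Lock-up case: the release is lost, so the apparent hold time keeps growing.
print(classify_press(press_time=10.0, release_time=None, now=13.0))  # long
```

If the lock-up happens while no switch is held, there is no pending press to time out, which would explain why the long press only shows up some of the time.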

The trigger for the problem is likely to be excessive change of state of the GPI pins. Each change of state raises the interrupt flag (asserts the interrupt pin), which triggers code in zynthian to check the state of the MCP. Zynthian avoids interpreting excessive changes of state by debouncing switches in code (using a filter algorithm), hence you get the expected behaviour even when the pins are bouncing all over the place.

This means that the hardware can be misbehaving all the time, up to the point of failure, whilst zynthian continues to operate normally. Indeed - without the interrupt driven switch detection, zynthian’s encoder and switch algorithms work really well. The capacitors form a low-pass filter that reduces the frequency of changes of state, acting as a debounce circuit and reducing the false triggers. Zynthian can handle false triggers but the MCP is more likely to break (enter this failure condition).
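Zynthian's actual filter algorithm is not shown in this thread. As a rough illustration of the general idea only, a common software debounce accepts a new state only after it has been read several times in a row. The class name `Debouncer` and the `stable_samples` parameter are made up for this sketch.

```python
class Debouncer:
    """Counter-based debounce: accept a new state only after it has been
    seen for `stable_samples` consecutive reads (illustrative sketch)."""

    def __init__(self, stable_samples=4, initial=False):
        self.stable_samples = stable_samples
        self.state = initial
        self._count = 0  # consecutive reads that differ from self.state

    def update(self, raw):
        """Feed one raw pin reading and return the debounced state."""
        if raw == self.state:
            self._count = 0
        else:
            self._count += 1
            if self._count >= self.stable_samples:
                self.state = raw
                self._count = 0
        return self.state


# A bouncy press: the raw input flips several times before it settles high.
raw_samples = [True, False, True, False, True, True, True, True, False, True, True]
debouncer = Debouncer(stable_samples=4)
filtered = [debouncer.update(sample) for sample in raw_samples]
print(filtered)
```

The filtered output changes state once, although the raw input has many edges. A filter like this only protects the software side: every raw edge can still raise an interrupt on the expander.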
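For a feel of how much the capacitor value matters, the time constant and cutoff frequency of a simple RC low-pass can be estimated as below. The capacitor values are the ones mentioned in this thread, but the 10 kΩ resistance is purely an assumption for the example, so substitute the pull-up value actually used in your circuit.

```python
import math


def rc_filter(resistance_ohm, capacitance_farad):
    """Return (time constant in s, -3 dB cutoff in Hz) of a first-order RC low-pass."""
    tau = resistance_ohm * capacitance_farad
    cutoff_hz = 1.0 / (2.0 * math.pi * tau)
    return tau, cutoff_hz


ASSUMED_PULLUP_OHM = 10_000  # assumption: replace with the resistance in your circuit

for label, cap in [("10nF", 10e-9), ("47nF", 47e-9), ("100nF", 100e-9)]:
    tau, cutoff = rc_filter(ASSUMED_PULLUP_OHM, cap)
    print(f"{label}: tau = {tau * 1e3:.2f} ms, cutoff = {cutoff:.0f} Hz")
```

A larger capacitor gives a longer time constant, so short bounces are smoothed more strongly, at the cost of a slower edge at the MCP input.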

The MCP can be reset by software to resolve this state and zynthian does this reset at startup, hence restarting the zynthian code will clear the fault condition. We are looking at how we might use that to periodically reset the MCP during runtime to reduce the risk of this issue - but you are experiencing it more than the official device, suggesting something is awry with your configuration.
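The periodic reset idea could look roughly like this. It is only a sketch: `PeriodicResetter` and `reset_fn` are hypothetical names, and how the MCP reset itself is performed is not described in this thread, so it is passed in as a plain callable.

```python
import time


class PeriodicResetter:
    """Call `reset_fn` whenever at least `interval_s` seconds have passed
    since the last reset. Illustrative only; not Zynthian's implementation."""

    def __init__(self, reset_fn, interval_s=5.0, clock=time.monotonic):
        self._reset_fn = reset_fn  # hypothetical: whatever re-initialises the MCP
        self._interval_s = interval_s
        self._clock = clock
        self._last_reset = clock()

    def poll(self):
        """Call regularly from the main loop; returns True if a reset was done."""
        now = self._clock()
        if now - self._last_reset >= self._interval_s:
            self._reset_fn()
            self._last_reset = now
            return True
        return False


# Demo with a fake clock and a stand-in reset function.
fake_now = [0.0]
resets = []
resetter = PeriodicResetter(lambda: resets.append(fake_now[0]),
                            interval_s=5.0, clock=lambda: fake_now[0])
for t in (1.0, 4.9, 5.0, 7.0, 10.0):
    fake_now[0] = t
    resetter.poll()
print(resets)  # [5.0, 10.0]
```

A reset like this would shorten a lock-up rather than prevent it, and any switch event pending at the moment of the reset could be lost, so it reduces the risk without curing the cause.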

Have you tried a different MCP? It is plausible that you have one from a bad batch that is more susceptible. You can also try all the suggestions that @jofemodo has offered. Zynthian did not magically appear, fully formed - @jofemodo spent many hours designing and testing until it became an optimised hardware design.

PSU decoupling around ICs is generally considered as 2 issues, HF & LF, which require 2 solutions to filter the potential high frequency oscillation and low frequency ripple:

Ripple will occur due to supply rails sagging. As each component draws different current, there will be a potential difference across the resistance of the wiring / PCB tracks. The larger (1uF) capacitors, positioned around the board, should soak up that ripple, acting as small reservoirs - filling with charge when they can and providing it back into the circuit on demand. You want a short distance between them so that they can supply sufficient current quickly enough to all components and keep their supply voltages close to level/even. You place them close to the components drawing more current or more susceptible to ripple, e.g. an audio chip might show an observable adverse effect if its supply is rippling.
Oscillation happens when the circuit becomes unstable due to positive feedback loops. Small, hf current can pass through all sorts of components and is difficult to micro-design out so we tend to place an abundance of small (e.g. 100nF) capacitors distributed across the board, very close to each IC's supply pins. This is a pragmatic way to reduce the risk of hf oscillations building.

Typically, each chip working at high frequencies will have “one big and one small” capacitor across its supply pins. Where there are multiple supply pins (like on an STM32 microcontroller) you may do this for each pair of supply pins.

Here endeth the lesson.

jofemodo · June 22, 2024, 9:57am

Try this first

stojos · June 28, 2024, 8:30am

Just to report back that all these problems were due to a faulty mcp23017 chip. It is strange because it is from the same batch that I used for my last few builds, and I never had this problem before.

I have replaced it and now all buttons work on rpi 5 too. It has been working flawlessly for a few days.

tunagenes · June 28, 2024, 12:18pm

Be sure to Widlarize the offending mcp23017!