Bio-inspired multimodal learning with organic neuromorphic electronics for behavioral conditioning in robotics

June 4, 2024



The robotic system is based on the Arduino Braccio Kit (Fig. 1a), with five degrees of freedom and an additional movement option for opening and closing a gripper. The gripper acts as the hand of the robotic manipulator and is equipped with four sensors that continuously collect multimodal sensory stimuli of pressure, distance, temperature, and color tone when manipulating objects (Figs. 1a and 1b). A custom gripper setup is realized to accommodate the collection of multimodal sensory signals in a hand-like shape (Fig. S1 and “Methods” section). Different cups (dark/hot, white/cold) are placed sequentially near the robotic system so that it is able to either pick them up or refuse them. Each movement of the robot follows an autonomic sequence of specified moves that provides a behavioral baseline for any action taken. The movements vary between a pick-up action with a grab or no-grab option in the end, a drop action that concludes a successful grab, and a pull-back action to avoid the cup that functions as a no-grab. These actions are driven via an Arduino Uno that operates the motors of the robotic setup. The motor commands are continuously modulated by sensory stimuli from the environment, i.e., a detection of a cup in close proximity with the hand or a pressure applied due to a successful grab, creating a real-time response of the robot to its surroundings (that is, the object of interest). Without any prior external influence, the robot is in an explorative state in which it incidentally picks a cup or not with the grab or no-grab actions initially taken randomly (Fig. 1b). Whenever a cup is discovered (grabbed) by chance, it inherently leads to new sensory sensations. An analog trainable neuromorphic circuit (Figs. 1a and 1c) interacts locally with the sensory signals and allows learning via adaptive associative connections necessary for behavioral conditioning (Fig. 1b, right). 
The organic neuromorphic circuit comprises organic electrochemical devices, namely organic electrochemical transistors (OECTs) and electrochemical random-access memories (ECRAMs), which are volatile and non-volatile, respectively (Fig. 1d). The output voltage \(\sum V\) of the organic neuromorphic circuit depends on the conductance state of each organic electrochemical device and reflects the sensory signals in an event-driven manner. \(\sum V\) merges the input branches of the electrical circuitry, similar to the dendritic summation of multiple neurons via the synapses (Fig. 1b, right).

**Fig. 1: Robotic manipulator with an organic neuromorphic circuit using bio-inspired learning.**

a A robotic manipulator with a custom-made gripper is equipped with four multimodal sensors. The sensory stimuli are processed adaptively via specialized hardware and condition the grasp behavior of the robotic system. b The robot employs the following bio-inspired principles for learning: an exploration of its environment through random movement, collection of multimodal sensory inputs and adaptive processing leading to behavioral conditioning. c The robotic system is connected to a local organic neuromorphic circuit that emulates neuronal processing, such as short-term and long-term synaptic plasticity and dendritic summation. The neuromorphic circuit consists of organic electrochemical devices. d Schematic architecture of an organic electrochemical device based on the semiconducting polymer p(g2T-TT) and a solid-state electrolyte based on the ionic liquid EMIM:TFSI. The device is defined by three electrodes (gray): source (left), drain (right) and gate (top). The polymer is distributed between the source and drain terminals (blue) and exhibits mixed electronic-ionic conduction. Anions (dark blue) from the electrolyte can penetrate into the polymer bulk leading to the formation of holes (white) along the polymer backbone and changing its conductivity. The drawing of the full robotic arm in Fig. 1a is based on the Arduino® Braccio Kit image by Arduino under the CC BY-SA license.

The organic neuromorphic circuit consists of four micrometer-scale organic electrochemical devices (Fig. S2 and “Methods” section), mimicking synaptic plasticity and, therefore, exhibiting neuro-emulating functionality. Two of these devices function as OECTs and operate in a volatile, short-term manner (indicated as ST). The other two devices operate in a non-volatile manner as ECRAMs with long-term effects (referenced as LT, Fig. 2a). The four devices are arranged in two branches (+ and -) that each contain a volatile and a non-volatile element in series. The combined output voltage is the sum over both branches: \(\sum V={V}_{+}+{V}_{-}\). This closely resembles the dendritic summation of multiple presynaptic signals at the synapses of a postsynaptic neuron (Fig. 2b). Each branch also displays an intrinsic associative adaptation due to the interplay of OECT and ECRAM. If loaded with an (adaptive) resistive load, the OECT changes its operating regime and thus its transconductance (Figs. S3 and S4). The transconductance represents a tunable sensitivity towards the sensory stimuli that can be strengthened or weakened via the ECRAM, leading to an inherent association between the two stimuli at OECT and ECRAM.

**Fig. 2: Characterization of the organic neuromorphic circuit.**

The output voltage \(\sum V\) is translated into a motor action through an activation function that relates the signal to a behavioral probability (Fig. 2c). The activation function is sigmoidal and proportional to the widely used hyperbolic tangent (tanh) activation function, converging around 1. It is executed on the Arduino Uno and, while this is part of the processing, it only provides a static, fixed translation of an analog output voltage such as \(\sum V\) into a behavioral movement pattern. The output voltage is also interpreted in terms of probability, which means it only determines the probability of a certain action, not necessarily the action itself. The non-deterministic and fail-prone behavior in biological systems, which causes new sensations, is one of the reasons for their remarkable adaptability in unknown situations⁴⁰. While the Arduino Uno relays signals from the organic neuromorphic circuit to the robotic setup, it operates solely as a translator/mediator and has no agency over the behavior of the robotic agent. In order to react to the environment, the neuromorphic circuit handles optical, thermal, and mechanical stimuli. A color sensor and a proximity sensor gather information on objects (i.e., a cup) from afar, without contact, and drive the gates and thus the (trans-)conductance of the volatile devices, \({G}_{{ST}+}\) and \({G}_{{ST}-}\). A pressure sensor and a temperature sensor feed a signal on contact to the non-volatile gates of the neuromorphic circuit, \({G}_{{LT}+}\) and \({G}_{{LT}-}\), providing the necessary impulses for learning and conditioning. Via the series connection in the circuit layout, the (+)-branch then combines the sensory input of pressure and proximity in a single information stream leading to the output voltage \({V}_{+}\). This functionality is mirrored in the (-)-branch, which couples temperature and color and results in the signal stream \({V}_{-}\).
We employ off-the-shelf sensors for collecting sensory input which provides lifelike, noise-containing data (Fig. S5, see sensor section in “Methods”). The sensory signals undergo basic pretreatment through an additional analog hardware unit to align with the low operating voltages (≤1.0 V) of the neuromorphic devices (Fig. S6, see sensor section in “Methods”).
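The probabilistic translation from \(\sum V\) to an action can be sketched in a few lines of Python. This is only an illustration of the idea described above, not the firmware that runs on the Arduino Uno: the function names, the gain value, and the clipping of negative inputs to zero are assumptions made for the example.

```python
import math
import random

GAIN = 2.0  # hypothetical scaling in 1/V; not a value taken from the paper


def grab_probability(sum_v, gain=GAIN):
    """Map the summed branch voltage (volts) to a grab probability in [0, 1).

    Uses a tanh-shaped curve that saturates towards 1; negative inputs are
    clipped to 0 (an assumption made for this sketch).
    """
    return max(0.0, math.tanh(gain * sum_v))


def choose_action(sum_v, rng=random):
    """Sample an action: the voltage sets the odds, not the outcome itself."""
    return "grab" if rng.random() < grab_probability(sum_v) else "no-grab"


if __name__ == "__main__":
    for v in (0.0, 0.2, 0.6):
        print(f"sum_v = {v:.1f} V -> p(grab) = {grab_probability(v):.2f}")
```

Because the action is sampled rather than thresholded, two identical voltage readings can still lead to different movements, which is the non-deterministic behavior the text refers to.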

The robotic system follows its movement patterns remaining in an explorative state until it starts interacting with the environment and receives new sensory stimuli. These stimuli change the output voltage \(\sum V\) momentarily or permanently leading to an event-driven and adaptive behavior.

The neuromorphic circuit consists of volatile (OECTs) and non-volatile (ECRAMs) organic electrochemical devices. These devices utilize the semiconducting polymer poly(2-(3,3′-bis(2-(2-(2-methoxyethoxy)ethoxy)ethoxy)-[2,2′-bithiophen]-5-yl)thieno[3,2-b]thiophene) [p(g2T-TT)] as the channel material and are controlled through an electrolyte. The modulation of the electronic current within the channel, specifically the conductance state, is achieved through the application of an ionic gate current⁴¹. The polymer p(g2T-TT) displays mixed ionic-electronic conduction by supporting the transport of both holes and ions. This polymer serves as a versatile platform for various functionalities and is suitable for both short- and long-term devices depending on the probing conditions³⁴,⁴². Hence, the organic neuromorphic circuit allows for monolithic integration of both volatile and non-volatile functionalities with the same polymer as the channel material of the transistors. It exhibits a wide range of well-defined conductance states (with an on/off ratio >100), high linearity, sensitivity to gate pulses (ranging from μS to mS), and stability (>10⁹ write-read operations)⁴²,⁴³. The low-voltage operation (within ±1 V) and compatibility with solution-based processing methods contribute to high energy efficiency and cost-effectiveness. While short-term (volatile) and long-term (non-volatile) synaptic devices share a similar device architecture, their primary distinction lies in the device configuration. For the short-term effect, the gates are directly linked to the sensor signal. Conversely, in non-volatile devices, a switch with a current-limiting resistance of \(100\,\mathrm{M}\Omega\) is connected in series to the gate, inducing an open-circuit potential when no sensor signal is applied (see Methods). This produces a lasting change in conductance and thereby long-term (non-volatile) synaptic memory phenomena.
We adopt a side-gate device architecture with a solid-state electrolyte composed of the ionic liquid 1-ethyl-3-methylimidazolium bis(trifluoromethylsulfonyl)imide (EMIM:TFSI) embedded in a polyvinylidene fluoride-co-hexafluoropropylene (PVDF-HFP) polymer matrix (see Methods).

The device characteristics of the neuromorphic circuit are shown for the volatile (Figs. 2d and 2e) and non-volatile (Fig. 2f) synaptic devices. We attain low-voltage operation for all components of the organic neuromorphic circuit, with write currents <5 nA and conductance values <100 nS for the ECRAM (Fig. S7), indicating low energy demands of the circuit³⁰. We achieve stable performance with minimal hysteresis for the volatile synaptic device, as shown in the output (\({I}_{D}\) over \({V}_{D}\)) and transfer (\({I}_{D}\) over \({V}_{G}\)) characteristics (Figs. 2d and 2e, respectively). The transconductance \({g}_{m}\) (Fig. 2e), also described as the device sensitivity, depends on the gate voltage but can also be influenced via the drain voltage. An OECT connected in series with a resistive load \({R}_{L}\) moves its operation from linear to saturation depending on \({R}_{L}\), as detailed previously⁴⁴. The ratio of resistances between load and OECT is critical, and a substantial ratio change (\(\frac{{R}_{{OECT}}}{{R}_{L}}=1\to 50\)) is necessary to achieve a significant change in the output voltage (\({V}_{{OUT}}=\,\frac{{V}_{{SUPP}}}{2}\,\to 0\,\mathrm{V}\)) and in the amplification of the gate voltage through the transconductance (Fig. S3). An additional measurement of the voltage output for an OECT loaded with different resistances is provided in Fig. S4. Replacing the resistive load \({R}_{L}\) with the non-volatile synaptic device (LT), as in our circuit topology, prompts similar changes in voltage level for the branch voltages \({V}_{+}\) and \({V}_{-}\) and in the transconductance of the OECTs. This change in the transconductance of the OECT, and therefore in the output voltage, creates an inherent link between the two gate stimuli, a form of associative learning. Figure 2f shows the programming characteristics of the non-volatile synaptic device, which displays a high on-off ratio across orders of magnitude with linear switching behavior and stable state retention (zoom-ins) for long-term plasticity at very low programming voltage (\(\left|V\right|\le 0.2\,\mathrm{V}\)). The conductance states are adjusted reversibly by applying gate pulses of opposite polarity. These long-term conductance changes in the artificial synapses create the memory effect needed for learning and adaptive behavior.
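The quoted relation between resistance ratio and output voltage is what an ideal series voltage divider gives when the output is taken across the load. The sketch below makes that arithmetic explicit. Treating the OECT channel as a plain resistor, the function name, and the 1 V supply are assumptions for illustration; a real OECT also shifts its operating regime, which this model ignores.

```python
def divider_output(v_supply, r_oect, r_load):
    """Voltage across the load of an ideal series divider (OECT channel + load)."""
    return v_supply * r_load / (r_oect + r_load)


V_SUPP = 1.0  # hypothetical supply voltage in volts

if __name__ == "__main__":
    for ratio in (1, 5, 50):
        # Only the ratio R_OECT / R_L matters, so the load is fixed at 1 (arbitrary units).
        v_out = divider_output(V_SUPP, r_oect=float(ratio), r_load=1.0)
        print(f"R_OECT/R_L = {ratio:>2}: V_OUT = {v_out:.3f} V")
```

At a ratio of 1 the output is half the supply, and at a ratio of 50 it drops to 1/51 of the supply (about 2%), consistent with the \({V}_{{SUPP}}/2\to 0\) range stated above.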

Overall, the learning process of the robotic manipulator is shown in Fig. 3. The organic neuromorphic circuit combines the collection of multimodal sensory stimuli with neuronal processing, leading to associative connections and behavioral consequences. As a result, the robot learns to avoid potentially harmful objects like a hot cup. Initially, the robotic system is in an explorative state in which it experiments with different behaviors, in this case a grabbing or non-grabbing action (Fig. 3a). As a baseline behavior, the robotic system is already able to grab a cup, but this occurs at random and is unrelated to any external stimuli (i.e., the trait of a cup). It operates undirected; associative conditioning is latent and yet to be formed. Sensory cues are already present but lead to no change in behavior via the activation function. Initially, only standard (cold) cups are used as objects, which renders the temperature-sensitive (-)-branch (Figs. 2a and 2b, orange bolt) of the neuromorphic circuit inactive for now. An object (i.e., a cup) gets registered by the proximity sensor, causing a short-term peak of \({V}_{+}\) and subsequently of \(\sum V\) (Fig. 3a). A longer peak in this context means that the cup is picked up (checkmark ✓) and held until the follow-up drop action; a shorter peak indicates that the cup is detected but not grabbed (cross ✗) (Fig. 3a and Movie S1). To showcase the random behavior of the robotic agent over time without learning, the training signals are disconnected from the non-volatile synaptic device for this experiment to prevent any adaptation. With all sensor connections restored, the organic neuromorphic circuit adapts to the sensory cues from its environment. Whenever the robot successfully grabs a cup, the pressure sensor on the gripper directly forwards a signal to the non-volatile synaptic device (\({V}_{G,{LT}+}=\pm 0.5V\)).
This happens in addition to the peak shown before, which was provoked by a pulse from the proximity sensor at the gate of the OECT (\({V}_{G,{ST}+}=\,-0.25V\)). The activation leads to an increase in voltage \({V}_{+}\) (Fig. 3b). The probability of a grab therefore changes, represented by the background color (light to darker blue) in Fig. 3, and consequently the overall behavior shifts from random to systematic (Movie S2). A darker blue tone indicates a high probability of grabbing a cup. From Fig. 3b, it is apparent that certainty in behavior develops only for the simultaneous occurrence of long-term synaptic change (increase in the general voltage level of \({V}_{+}\)) and the short-term change during the detection of an object (peak in \({V}_{+}\)). In between peaks (that is, in between object detections) the probability declines again (lighter blue), so an inherent associative link between object proximity and the grabbing action (the training pressure signal) is formed, similar to biological associative learning or respondent conditioning (Pavlovian response). Complete adaptation is achieved after 14 training steps and the robotic manipulator consistently grabs the cup if it is close by (Fig. 3c, checkmarks and Movie S3). This behavior is also resistant to instabilities and imperfect sensor signals that can be caused by non-optimal grip and/or shifting and slipping of the object during grasping (seen in the last peak of the measurement, Fig. 3c at 90–95 s) and is maintained stably over time and under movement.
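The interplay between the long-term level and the short-term peak of \({V}_{+}\) can be illustrated with a toy model. This is a deliberately simplified sketch, not the measured device behavior: each successful grab is assumed to raise the baseline by a fixed step, the proximity peak is modeled as a fixed additive voltage, and all numbers (step, peak height, gain) are hypothetical. In the real circuit the ECRAM also changes the OECT transconductance, so the peak height itself would vary.

```python
import math


def grab_probability(v_plus, gain=2.0):
    """tanh-shaped probability; the gain is a hypothetical scaling factor."""
    return max(0.0, math.tanh(gain * v_plus))


def simulate_training(n_grabs, step=0.05, v_peak=0.25):
    """Return (p_idle, p_detect) after each successful grab.

    p_idle   -- probability between object detections (long-term level only)
    p_detect -- probability while an object is detected (level plus peak)
    """
    level = 0.0  # long-term contribution to V+, raised by each pressure pulse
    log = []
    for _ in range(n_grabs):
        level += step  # non-volatile update triggered by the pressure sensor
        log.append((grab_probability(level), grab_probability(level + v_peak)))
    return log


if __name__ == "__main__":
    for i, (p_idle, p_detect) in enumerate(simulate_training(14), start=1):
        print(f"grab {i:>2}: idle {p_idle:.2f}, detecting {p_detect:.2f}")
```

In this toy model the probability during a detection always exceeds the idle probability and grows with every training step, mirroring the qualitative picture in which high certainty requires both the learned long-term level and a momentary proximity peak.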

**Fig. 3: Behavioral change of the robotic manipulator upon adaptive processing of multimodal stimuli.**

Complex tasks can often be broken down into smaller components that are learned separately and incrementally. This technique is called chaining and is well known in research fields like behavioral psychology and deep learning⁴⁵,⁴⁶. Chaining involves teaching a series of behaviors in a specific sequence. Each behavior serves as a cue for the next one. After completing the first cycle of learning, a second behavioral change is built on top (chained), concluding in the fulfillment of a more complex task: the robotic system now faces cups of different temperatures (cold and hot), which are mirrored in their color: a cold cup is white, and a hot cup is dark. With this new thermal stimulus introduced, the (-)-branch connected to the related sensor signals (temperature and grayscale/color) is also active. In the initial state, the previously learned behavior is maintained (Fig. 3d and Movie S4). The (+)-branch (\({V}_{+}\) in blue) follows the adapted behavior from before. The (-)-branch yields a small voltage \({V}_{-}\) (in orange) and a peak reaction to the color of the dark (hot) cups. The probability output of the activation function is depicted as an orange hue in the background. Cold and hot cups are handed over alternately. Initially, the robot again grabs the cup every time it comes close, disregarding the temperature or color (Fig. 3d, checkmarks), as it has learned to do previously. However, the new thermal stimulus induces a gate voltage at the second non-volatile device (\({V}_{G,{LT}}=\,\pm 0.5V\)), causing a change in the voltage level \({V}_{-}\) and increasing the response in output voltage (peak height) towards a color stimulus. As in the first training process, an associative link between temperature and color is formed (Fig. 3b and Movie S5). Color is thus coupled to temperature. After 4 training steps, the activation function with \(\sum V\) as input reaches a very high stimulus intensity (Fig. 2c, probability >100%), forcing a protective reaction of the robotic hand. It draws back and avoids the object. This overstimulation (noxious behavior) only occurs when a hot (and dark) cup is detected, highlighted in Figs. 3e and 3f in dark orange. This extends our initial adaptation from respondent/Pavlovian learning to the more complex behavior of operant conditioning by learning from positive (pressure) and negative (temperature) consequences of different stimuli. At the end of the whole training process, by including both branches (\({V}_{+}\), \({V}_{-}\) and \(\sum V\)), the robotic system is able to distinguish between two types of cups, essentially classifying dangerous and non-dangerous objects. More specifically, by following and adapting to the dynamic cues of the environment, the robot learns to avoid potentially harmful objects like a hot cup while actively engaging with other safe objects. Figure 3f and Movie S6 present the final output signals and behavior. Both the color and the temperature sensor are more sensitive to positioning (seen as noisy signals in the measurements); that the behavior is learned nonetheless demonstrates a high tolerance for stimulus variations in the learning scheme.