Friday Facts 16: UpDog (Unity+ML)

Unity has a Machine Learning component. Can I teach it to roll over and stand up?

My robots always start as a simulation.  Instead of hand-rolling a physics engine and a machine learning system I decided this time to try to teach a quadruped bot to a gait strategy (aka get up and walk) using Unity mlagents which use Tensorflow.  My scene and all related material is here:

What is Machine Learning?

Machine Learning is a magic box. On one side you feed it some inputs and then give/withold rewards until the other side produces the result you want. The first trick is to design the inputs well. The second is to design the reward well. There may be other tricks after that but I haven’t found them yet. Part of writing this is document what I’m trying. Maybe you have ideas about how to do it better? Talk with me.

The quadruped model

The root Environment lets me put everything in a Prefab which I can then instance to run many bots in parallel. Dog contains Torso which is the root of the skeleton. Each bone of the skeleton is an ArticulatedBody because they most closely resemble the behavior I’d expect to feed to my servos – set desired angle value, cross fingers. 

Each joint has one degree of rotational freedom and some reasonable limits.  I have no idea what joint stiffness would mirror real life.

Each joint contains one Model that has the mesh of the bot, the material, etc.

On the end of each leg is a foot which is ignored by physics. I paint them red when I detect contact with the terrain.

Behavior Parameters set the inputs and outputs of the network. Space Size is the number of inputs. Stacked Vectors stacks recent inputs so that the network has a kind of memory. Actions are the outputs, and Continuous Actions are decimal numbers usually in the range -1…1. Behavior Type is where you can set drive-by-ML or heuristic (aka drive it yourself).

Dog Controller is the script that is collects the Observations – it feeds the ML with sensory input (more on that later). Dog Controller also receives instructions from the network and then applies them – in this case, it sets the Drive Targets of each ArticulatedBody, same as I would do with a servo on a quadruped. Lastly, Dog Controller usually also gives a Reward to the network to teach it.

Decision Requester sets how often the network should make a decision. The value here is in Academy Steps, which isn’t Unity Update ticks, it isn’t frames, it isn’t FixedUpdate physics steps, it’s… something else.


At the start of each episode (learning cycle) I have to reset the position of the quadruped. I’ve found the easiest way is to make a preserve the original Torso and instead use a completely new clone of the entire Torso hierarchy.

Calculating Rewards

I have now experimented with many automatic styles of reward. Mostly my thinking is a variation on these themes:

  • The reward function first gives points for “uprightness” (read torso.transform.up.y approaching 1).
  • Add to that height off ground (max height 3.6, so height/3.6 will never be more than 1).
  • Add to that horizontal movement (maximum XZ speed of 10, then divided by 10, never more than 1).

These three elements are further scaled by a values that can be tweaked from Dog Controller.

I have also tried a manual voting style where the user (me) can click on bots that appear to be doing well to feed them a reward. I also added a button to end all episodes and start over, in case they’re all terrible. Machine Learning Eugenics :S

Results so far

nine bots running in parallel while my CPU gently weeps

So far it has managed to roll over a few times but never figured out how to stand up. It doesn’t even stay rolled over.

Things I know I don’t know


I have no idea if my network is shaped/configured well. Currently I’m using three layers with 128 neurons each.

I have no idea when rewards are applied to the network, so I’m not sure when is best to manually feed them. I don’t know if I should reward continuously while it is running or do as in the codemonkey example (reward and immediately reset).

ArticulatedBody is not great to work with.  GetJointPositions is in radians and SetJointTargets is in degrees? Physically I have no idea what stiffness to set for ArticulatedBody joints. Too high and they can launch the quad off the ground, too low and they don’t have the strength to roll the dog over.

Setting a good Joint Speed in the Dog Controller is also part of the problem. High stiffness and low joint speed (under 0.5 degree per step)

Why do I have to restart tensorboard results graphing script every time I restart the mlagent-learn training tool? my parameters haven’t changed, only the results that are being graphed.

Final thoughts

I’d really like to talk to some passionate and experienced people about training this successfully. You can find me on Discord or Github all the time. If you have specific comments or questions about this article, let me know and I’ll try to address them in an edit.


Friday Facts 15: Unity 2021.3 tips

Unity game development engine can take some getting used to, no matter your background. Here are some tips and tricks I’ve learned recently that might help you. Documenting them will help me later when I google them and find my own article!

Separate UI and game world mouse clicks

I’ll describe the problem and then the solution. I have UI elements built with UIToolkit on top of the things in my Scene and I don’t want a click on the UI to also click through to whatever is behind. I would be weird to click “sign peace treaty” and unintentionally order troops to attack the town in the same move, right?

Here’s a sample UI in a 2D world. I’m making a game based on Populous, RimWorld and Prison Architect. Little people moving around doing their best and I sometimes nudge them.

In the UIToolkit elements are placed with the same rules and webpages with CSS. Here’s the UIToolkit view of my HUD. I’ve highlighted the #Background element, which says all child elements are aligned to the bottom – that’s how I get the buttons way down there.

Elsewhere, with no obvious connection in the system, there’s PlayerControls, an Input Action Asset – a map between human input devices and actions in game. In has an event called Click mapped to things like the game controller X button and the mouse left button. Still further away in my GameController I have the same PlayerControls as a Component where the Click event calls GameController.OnLeftClick.

The code for GameController.OnLeftClick is a stub for now.

// only get here if the click is NOT on the UI, please.
public void OnLeftClick() {

After many hours of searching I found a way to detect when the cursor is over a UI element. It is EventSystem.current.IsPointerOverGameObject(). Unfortunately you can’t put this in an InputAction event or you’ll get a nasty warning message.

Calling IsPointerOverGameObject() from within event processing (such as from InputAction callbacks) will not work as expected; it will query UI state from the last frame
UnityEngine.EventSystems.EventSystem:IsPointerOverGameObject ()

This fix is to put the code is in GameController.Update() and use it to disable the InputEvents when appropriate.

// Update is called once per frame
void Update() {

private void DoNotClickUIAndGameAtSameTime() {
    PlayerInput inputSystem = GetComponent<PlayerInput>();
    if (EventSystem.current.IsPointerOverGameObject()) {
    } else {

I’m a big fan of small methods that do one job each. I hope you are too! It’s much easier to debug.

Ok, so this completely disables all game input when a cursor is over the UI elements. New problem! Remember that #Background element? It fills the entire screen. Everything is UI! Fortunately the fix is easy.

Setting the background elements to Picking Mode: Ignore will let your mouse pass through and poke at your game. Nice!

Oof, this is already so long I’ll save my next tip for a future post.


Painting a Program (instead of writing)

Flow-based programming is a way of writing programs by drawing pictures of graphs. It’s a very intuitive and friendly way of working. Today I’d like to talk about my adventures exploring this space. I hope that you’ve had similar ideas and would like to join me on this journey.

If you’ve learned to code a little bit, you’re probably familiar with code that looks like this, a method that adds two numbers together:

class Math {
   * Returns a+b
   * @returns a+b
  double add(double a, double b) {
    return a+b;

If you play a lot of video games you might have seen designs like this assembler combining two inputs to produce an output in Factorio:

Adding two things together in Factorio

And people born after 2000 are probably familiar with the Scratch programming language from MIT:

Adding two things together in Scratch 3

I contend that there is a level of abstraction where these are all the same thing. There is a magic box that does a job, some connection to inputs, some connection where to send the results, and visual indicators showing the state of the various elements. In Flow-Based Programming (FBP), Packages of data flow through Nodes. You may have also seen similar models in Unity, Blender, Nuke, NodeRed, and other editing programs. The idea has been kicking around since the 1960s.

Philosophically, I feel that a good use of a maker’s time is to make making easier. The benefits compound! One of the ways FBP makes programming easier is that there are no syntax errors, only logical errors. I can’t forget a pair of braces or a semi-colon. Also values are visible all the time, so no digging into the debugger to find the element I want… the list goes on. Strangely, we still write code as long form text in drab color documents with weird ideas like ‘linking’ and ‘package management’ that should have died in the Cambrian era. Point being I’m highly motivated to use a FBP system.

The system has to be like all modern editors: there has to be a way to extend or plug in or mod new features. It must be easy for everyone to extend in a collaborative way – not just jailed to Unity or Blender, but some where all nodes of all types thrive together. It’s got to be as beautiful and intuitive (and fun) as Factorio. To this end I started an editor called Donatello.

(1) The very first image of Donatello

I built a simple Graph that has Nodes linked through their ReceivingDock and ShippingDock by Connections. All data in the system is stored in the Docks. When a ShippingDock is changed it is flagged at dirty. After all nodes are processed the Graph looks for dirty ShippingDocks and copies their values to connected ReceivingDocks, regardless of whatever may be there. When a ReceivingDock is dirty it causes the Node to run on the next cycle. Dirty Docks are marked with red text. After a few days it started to look not awful:

(2) A machine that counts up forever. It can be done with only one Node!

To repeat: Graphs have Connections between Nodes. Nodes have AbstractDocks, some of which are ShippingDocks and some are ReceivingDocks. When a node runs, it might poke the ShippingDock. AbstractDocks might also have one ConnectionPoint each that make it easier to add and remove Connections.

When you want to make a new Node, you have to pick a name, declare the Docks, and write the update() method to read the inputs, do a thing, and probably poke an output.

Part of the motivation comes from making line art for plotter robots like Makelangelo 5. Having written several picture-to-art projects I started to notice similarities. I thought “Wouldn’t it be great if these similar bits were LEGO blocks and anyone could play with them?”

So far I had

  • A core NodeGraphCore library defining Graph/Node/Dock/Connection and a Service that plugins can use to advertise their Nodes.
  • A GUI built using NodeGraphCore called Donatello. It uses Java ServiceLoader to find valid plugins and load the Nodes.
  • A set of Nodes in Makelangelo-software to give me access to all the line art manipulation tools. I put Makelangelo-software in the ~/Donatello/extensions/ folder and then Donatello can do stuff with lines and art.

I put them all together to make some art and test the practical limits of big graphs. Note in the video (3) I am tweaking the program while it is running, tho I could pause it at any time.

(3) Making art in Donatello with nodes from Makelangelo-software

What worked

Making new nodes is so easy, as is registering them with a NodeFactory so they show up in the GUI. Using the mouse to move items around, draw and erase connections, and alter behavior was also very intuitive. It was easy to save graphs, load graphs, and step through the program. I have built unit tests without visuals. I can copy/paste/delete/edit nodes and their connections; I can select two nodes far apart and run a tool that selects everything between them in the shortest down-stream path. Undo/Redo work great. I can graph a group of Nodes and Fold them down to a single Node. Put another way, a Node might represent an entire Graph. Folded Nodes can be Unfolded and should have the ability to be saved to a separate file, but it hasn’t been a priority.

What needs work

My first way of drawing connections was straight lines. The second implementation of connections uses Bezier curves. It’s okay…? but now I dream of a third way with right angle lines like I’m used to seeing in a circuit diagram or Factorio.

(4) KiCAD circuit diagram

Also like both circuits and Factorio it would be great if I could rotate Nodes for more intuitive graph layouts.

There’s little visual indication of what connection point can legally attach to another connection point. At least there is some checking! In circuits and Factorio you can connect anything to anything and no one will stop you. Will it run? No. but it also won’t complain, either.

I cannot yet easily make blueprints/templates of popular design patterns as in fig 5. I bet as program grow a lot of patterns will emerge and it would be great to have those ready-to-go.

There’s no reason Nodes need to be drawn in 2D. They could be done in 3D just as easily. This is neither good nor bad and I’m sure there’s interesting tradeoffs. I’m not sure I’d want to work with a VR helmet on 8h a day.

Updates to NodeGraphCore mean Makelangelo-software HAS to be rebuilt or the provided Nodes will not load in updated Donatello. Put another way, the version of all three has to match all the time. This is kinda sucky but there’s more! When a plug-in is out of date Java does not give a clear warning – it silently refuses to load and the user is left wondering what went wrong.

What doesn’t work

In classical FBP connections are more like the Factorio conveyor belts: packets of data queue on the connection until the FBP Node (assembling machine) is ready to process them. The Nodes don’t hold on to anything, ever. Resetting the program is as easy as cleaning out all the connections. This is hard in Donatello because there is no distinction between a Node’s current value and its default value. I get around it by saving the state of the graph in the starting state and ONLY the starting state. Yes, I have saved the graph in the wrong state and had to clean it up.

In Donatello nodes contain all data and immediately transmit the results down-stream when they finish a calculation. Also once any input to a node has changed that node will run on the next cycle. This was easy to implement but it has drawbacks later. In a big graph some parts run faster than others. So if a group of nodes A runs twice as fast as some other group B then Add might run A1+0, A1+B1, A2+B1, A2+B2, and so on. You can see it below (8) when TransformTurtle and AddTurtle create unintended noise in the results.

Partial solutions

If the system is set up to run like Factorio or FBP then there has to be one input on each Connection before the TransformTurtle will run. Another way would be an activation trigger – most inputs are optional and nothing runs until data arrives on the trigger input.

I’ve tried twice now to rebuild the Nodes and Connections to work more like classic FBP. There are no more isDirty flags, which means every node has its own “should I run now” test at the start of update() and every Node has to be called every step. This lets me have activation triggers.

The real problem starts when I want to initialize the graph. I need a way to put at least a number or a string of text into Connections so that they flow into the system and make everything run. I can hard-code that kind of thing easily but how do I put it in the GUI? It can’t be a Node because Nodes don’t hold data any more! It’s some kind of new class or interface … maybe Emitter or Entrance or Source. In the model above (7) I had nodes called LoadString and LoadNumber that served well for setting an initial value.

(10) Folding with Loaders

Do all Loaders become ReceivingDock and all… Unloaders? become ShippingDock when Nodes are Folded? Will there be Loaders that should not become ReceivingDock – internal constants?

If a single Loader is supposed to deliver to many Nodes, what then? I’m not going to make many identical Loaders! But I also run into problems when a single Loader is connected to many ReceivingDocks. Data types in Java cannot be cloned (copied) easily. Data types can’t be serialized easy, either. (another way to copy – save and load it into a second instance.) One workaround is to put the same instance of Packet into every Connection. As long as the Nodes treat the Packet as read-only then everything will work…

I feel inspired by Simon Lague’s Digital Logic Sim to put a fence around the entire Graph. The fence is the Node that contains this Graph. Only Loaders and Unloaders that touch the fence are exposed when Folded.

Final thoughts

I’m far from done thinking about this and I’d love to hear your ideas and see your solutions. Find me on Discord and let’s talk about it.