This robot can navigate while keeping its balance

Isaac Sim Robotics

Design and simulation of a two-wheeled inverted pendulum trained using reinforcement learning in NVIDIA Isaac Lab to maintain balance and navigate. This project is a collaboration with a master's candidate from the Institute of Smart Systems and Artificial Intelligence, Nazarbayev University.


The Problem

Mohammad Shoaib Babar is a master's candidate conducting research on enabling a two-wheeled inverted pendulum robot to balance and perform simple movements (such as moving forward and backward) using Isaac Sim and Isaac Lab. He needed guidance in building the training pipeline necessary for his project and graduation. Fortunately, I was able to significantly speed up his progress by helping him set up all the scripts related to simulation, training, and testing.

Since this work is part of an ongoing research project (at the time of writing this blog), I’m presenting a spin-off of our collaboration, showcasing results with my own two-wheeled inverted pendulum robot.

Adobe Substance Painter render of the robot

Using plain Deep Reinforcement Learning

The robot must learn to balance itself, orient toward a goal, and move without falling. Deep Reinforcement Learning (DRL) enables this by letting the robot learn from trial and error—similar to how a child learns through experiences.
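To make "rewards and punishments" concrete: a balancing reward might grant a small bonus for every step the robot stays upright and subtract penalties for tilting or oscillating. The weights and thresholds below are my own illustrative choices, not taken from any particular Isaac Lab task:

```python
def balance_reward(pitch, pitch_rate, upright_limit=0.5):
    """Toy reward for a two-wheeled inverted pendulum.

    pitch       -- body tilt from vertical, in radians
    pitch_rate  -- angular velocity of the tilt, in rad/s
    """
    if abs(pitch) > upright_limit:       # fell over: large punishment
        return -10.0
    alive = 1.0                          # bonus for every step spent upright
    tilt_penalty = 2.0 * pitch**2        # quadratic penalty on tilt
    rate_penalty = 0.05 * pitch_rate**2  # discourage fast oscillation
    return alive - tilt_penalty - rate_penalty
```

Summed over an episode, a reward like this pushes the policy toward staying upright and still, because any fall ends the stream of "alive" bonuses.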

For instance, imagine a child who runs away to avoid eating lunch, only to get hit by a sandal thrown by their mom—a humorous but illustrative negative reward. The next time, the child eats the food and receives no punishment. This learning through rewards and punishments is how DRL functions.

Mother punishing a child

Initially, I attempted to train the robot to do everything—balance, navigate, and stop at the goal—in a single session. However, this failed due to the limited learning capacity of the robot’s neural network.


Curriculum Learning

The solution was to divide the problem into simpler tasks and train the robot step by step—a technique called curriculum learning. The idea is to use the results of the first stage of training (e.g., balancing) to help with the next stage (e.g., navigating).
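The control flow of a curriculum can be sketched as a loop over stages, where each stage warm-starts from the previous stage's policy. The stage names, thresholds, and `train_stage` body below are illustrative stand-ins for a real RL run (e.g. an Isaac Lab training script), not the project's actual code:

```python
def train_stage(task, policy, reward_threshold):
    """Stand-in trainer: pretend to train and return the updated policy."""
    policy = dict(policy or {})      # warm-start: copy the previous weights
    policy[task] = reward_threshold  # record that this behavior was learned
    return policy

stages = [
    ("balance", 0.9),       # stage 1: just stay upright
    ("navigate", 0.8),      # stage 2: drive toward a goal while balanced
    ("stop_at_goal", 0.8),  # stage 3: brake and hold position at the goal
]

policy = None  # stage 1 starts from scratch
for task, threshold in stages:
    policy = train_stage(task, policy, threshold)
```

The key point is the warm start: each new stage does not erase what came before, it builds on it.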


The robot’s sensors play a key role in this process. They provide feedback about the consequences of its actions, allowing us to guide its learning more effectively.
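For a robot like this, the policy input could stack the IMU tilt estimate, the wheel velocities, and the direction to the current goal. The exact observation space is project-specific; this is only an illustrative sketch:

```python
import numpy as np

def build_observation(pitch, pitch_rate, wheel_vels, goal_vec):
    """Stack sensor readings into one policy input vector.

    pitch, pitch_rate -- IMU estimate of body tilt (rad) and its rate (rad/s)
    wheel_vels        -- (left, right) wheel angular velocities
    goal_vec          -- 2D vector from the robot to the goal, in the robot frame
    """
    return np.concatenate(
        [[pitch, pitch_rate], wheel_vels, goal_vec]
    ).astype(np.float32)

obs = build_observation(0.02, -0.10, [1.5, 1.4], [2.0, 0.5])
```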


Each stage builds on the previous one. This lets the neural network weigh competing behaviors against each other, gradually solidifying the most useful ones.


Navigation

With the robot trained to move, the next challenge was navigating in realistic environments. For this, I created a virtual house using Unreal Engine 5, which integrates smoothly with Isaac Sim.


Isaac Sim provides tools for generating occupancy maps of environments. These maps are then used with the A* algorithm to generate paths.
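A minimal grid-based A* looks like the following—a generic, 4-connected sketch on a binary occupancy grid, not the exact code used in the project:

```python
import heapq

def astar(grid, start, goal):
    """A* on a 2D occupancy grid (0 = free, 1 = occupied).

    `grid` is a list of rows; `start`/`goal` are (row, col) tuples.
    Uses 4-connected moves and a Manhattan-distance heuristic.
    Returns the path as a list of cells, or None if unreachable.
    """
    rows, cols = len(grid), len(grid[0])
    h = lambda c: abs(c[0] - goal[0]) + abs(c[1] - goal[1])
    open_set = [(h(start), 0, start)]   # (f = g + h, g, cell)
    came_from, g = {}, {start: 0}
    while open_set:
        _, cost, cell = heapq.heappop(open_set)
        if cell == goal:                 # reconstruct path back to start
            path = [cell]
            while cell in came_from:
                cell = came_from[cell]
                path.append(cell)
            return path[::-1]
        if cost > g.get(cell, float("inf")):
            continue                     # stale queue entry, skip it
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = cost + 1
                if ng < g.get((nr, nc), float("inf")):
                    g[(nr, nc)] = ng
                    came_from[(nr, nc)] = cell
                    heapq.heappush(open_set, (ng + h((nr, nc)), ng, (nr, nc)))
    return None

# Example: route around a wall in a tiny 3x3 map.
grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
path = astar(grid, (0, 0), (2, 0))
```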


The paths from A* are smoothed using Bézier curves and then damped to produce goal points the robot can follow naturally, enabling it to move fluidly through complex environments.
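One simple way to round the corners of an A* polyline is to place a quadratic Bézier at each interior waypoint, using the midpoints of the adjacent segments as the curve's endpoints. This sketch makes no claim to match the project's exact smoothing or damping step:

```python
import numpy as np

def bezier(p0, p1, p2, n=10):
    """Sample `n` points on a quadratic Bézier with control points p0, p1, p2."""
    t = np.linspace(0.0, 1.0, n)[:, None]
    return (1 - t) ** 2 * p0 + 2 * (1 - t) * t * p1 + t**2 * p2

def smooth_path(waypoints, n=10):
    """Round each interior corner of a polyline with a quadratic Bézier.

    For every corner b between segments a->b and b->c, the curve runs from
    the midpoint of a->b to the midpoint of b->c, with b as the control point.
    """
    pts = [np.asarray(p, dtype=float) for p in waypoints]
    if len(pts) < 3:
        return np.array(pts)
    out = [pts[0]]
    for a, b, c in zip(pts, pts[1:], pts[2:]):
        out.extend(bezier((a + b) / 2, b, (b + c) / 2, n))
    out.append(pts[-1])
    return np.array(out)

# Example: smooth a right-angle turn from the grid planner.
smoothed = smooth_path([(0, 0), (1, 0), (1, 1)], n=5)
```

Because the curve never passes through the sharp corner itself, the resulting goal points trace a gentler turn that a balancing robot can track without abrupt velocity changes.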


Results

The robot can successfully move from one point to another without falling. More impressively, it can follow predefined paths through a house, navigating around furniture and tight corners.


This blog marks the first phase of a larger project. The next stage will involve transferring everything from simulation to a real robot—a challenging but exciting Sim2Real transition.