Testing vision-based off-road navigation with geographical hints

Summary

Company name Milrem Robotics
Project Manager Meelis Leib
Systems Architect Erik Ilbis
Company name Autonomous Driving Lab, Institute of Computer Science, University of Tartu
Team lead Tambet Matiisen
Data collection Kertu Toompea
Model training Romet Aidla
Robot integration Anish Shrestha
Map preparation Edgar Sepp

Objectives of the Demonstration Project

The goal of the project is to collect and validate a dataset for vision-based off-road navigation with geographical hints.

The Milrem UGV needs to be able to navigate:

  • in unstructured environments (no buildings, roads or other landmarks),
  • with passive sensors (using only a camera and GNSS, since active sensors make the UGV discoverable to the enemy),
  • with no prior map or with an outdated map,
  • with unreliable satellite positioning signals.

A system that satisfies the above goals was proposed in the ViKiNG paper by Dhruv Shah and Sergey Levine from the University of California, Berkeley. The paper demonstrated vision-based kilometer-scale navigation with geographical hints in semi-structured urban environments, including parks. The goal of this project was to extend the ViKiNG solution to unstructured off-road environments such as forests.

Examples of desired environment:

forest1 forest2 forest3

Activities and results of demonstration project

Challenge addressed

The goal of using passive sensors means that the camera is the primary sensor. The currently best known way to make sense of camera images is to use artificial neural networks. These networks need a lot of training data to work well. Therefore the main goal of this project was to collect and validate the data to train artificial neural networks for vision-based navigation.

We set ourselves the goal of collecting 50 hours of data covering 150 km of trajectories. This was inspired by the ViKiNG paper, which used 42 hours of training data. The time goal was achieved; in terms of distance, 104 km was collected.

In addition to collecting the data, we wanted to validate whether it is usable for training neural networks. We went further than that: besides training the networks, we also implemented a proof-of-concept navigation system on a Jackal robot.

Jackal UGV

Data sources

The data was collected from April 12th till October 6th, 2023 from 27 orienteering events and 20 self-guided sessions around Tartu, Estonia. Details of the places and weather conditions can be found in this table.

Data collection was performed with a golf trolley fitted with the following sensors:

trolley1 trolley2 trolley3 trolley4

Four different types of data were collected:

  1. camera images,
  2. visual odometry (trajectories derived from camera movement),
  3. GPS trajectories,
  4. georeferenced maps.

The following types of maps were acquired and georeferenced:

Map type Example image
orienteering maps (usually from organizers, sometimes from Estonian O-Map) orienteering map
Estonian base map (from Estonian Land Board) Estonian base map
Estonian base map with elevation (from Estonian Land Board) Estonian base map with elevation
Estonian orthophoto (from Estonian Land Board) Estonian orthophoto
Google satellite photo (from Google Maps Static API) Google satellite photo
Google road map (from Google Maps Static API) Google road map
Google hybrid map (from Google Maps Static API) Google hybrid map

Further cleaning was applied to the data, with the following sections removed:

  • Missing odometry data
  • Big change in position: >1.0m per timestep
  • Low velocity: <0.05 m/s
  • High velocity: >2.5 m/s
  • Model prediction errors were analyzed
  • Bad trajectories
  • Missing or bad camera images
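The numeric rules in the list above can be sketched as a simple per-step filter. This is only an illustration, not the project's actual cleaning code: the function name `keep_mask`, the 2-D position format and the constant sampling interval `dt` are assumptions made for the example. Only the three thresholds come from the list.

```python
import math

# Thresholds taken from the cleaning rules listed above.
MAX_STEP_M = 1.0       # "big change in position" per timestep
MIN_SPEED_MPS = 0.05   # "low velocity"
MAX_SPEED_MPS = 2.5    # "high velocity"


def keep_mask(positions, dt):
    """Return one boolean per step between consecutive odometry samples.

    positions: list of (x, y) in metres, or None where odometry is missing.
    dt: time between samples in seconds (assumed constant for this sketch).
    """
    mask = []
    for prev, curr in zip(positions, positions[1:]):
        if prev is None or curr is None:
            mask.append(False)  # missing odometry data
            continue
        step = math.dist(prev, curr)  # displacement in metres
        speed = step / dt             # metres per second
        ok = step <= MAX_STEP_M and MIN_SPEED_MPS <= speed <= MAX_SPEED_MPS
        mask.append(ok)
    return mask
```

The remaining rules (bad trajectories, bad camera images, sections flagged during analysis of model prediction errors) require inspection of the data and are not covered by this sketch.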

Altogether this resulted in 94.4 km of trajectories used for training.

In addition, the dataset for the local planner was combined with the RECON dataset of 40 hours of autonomously collected trajectories.

Description of AI technology

The system makes use of two neural networks: a local planner and a global planner.

The local planner takes a camera image and predicts the next waypoints to which the robot can drive without hitting obstacles.

Inputs to the model:

  • Current camera image
  • Past 5 camera images for context
  • Goal image

Outputs of the model:

  • Trajectory of 5 waypoints
  • Temporal distance to the goal

The local planner is trained using camera images and visual odometry. The goal image is an image taken a fixed number of timesteps into the future. The temporal distance to the goal is the number of timesteps between the current image and the goal image.
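How one training sample could be assembled from a recorded session is sketched below. This is a minimal illustration under stated assumptions, not the project's training pipeline: the function `make_sample`, the pose format `(x, y, yaw)`, the `goal_offset` parameter and the choice to express waypoints in the robot's own frame are all hypothetical. The context length of 5 and the 5 waypoints come from the description above.

```python
import math

CONTEXT = 5       # past camera images given as context (from the text)
N_WAYPOINTS = 5   # waypoints in the predicted trajectory (from the text)


def make_sample(poses, t, goal_offset):
    """Build frame indices and labels for one training sample at timestep t.

    poses: list of (x, y, yaw) from visual odometry, one per camera frame.
    goal_offset: timesteps into the future at which the goal image is taken
                 (hypothetical parameter; the text does not give its value).
    """
    if t < CONTEXT or t + max(goal_offset, N_WAYPOINTS) >= len(poses):
        raise IndexError("not enough frames around t")
    x0, y0, yaw0 = poses[t]
    cos_y, sin_y = math.cos(yaw0), math.sin(yaw0)
    waypoints = []
    for k in range(1, N_WAYPOINTS + 1):
        dx, dy = poses[t + k][0] - x0, poses[t + k][1] - y0
        # Rotate the world-frame offset into the robot frame at time t.
        waypoints.append((cos_y * dx + sin_y * dy, -sin_y * dx + cos_y * dy))
    return {
        "context_frames": list(range(t - CONTEXT, t)),
        "current_frame": t,
        "goal_frame": t + goal_offset,
        "waypoints": waypoints,
        "temporal_distance": goal_offset,
    }
```

With this layout the two model outputs (waypoint trajectory and temporal distance) are both derived from the recording itself, so no manual labeling is needed.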

Local planner

The global planner takes the waypoints proposed by the local planner and estimates which of them are likely to be on the path to the final goal.

Inputs to the model:

  • Overhead map
  • Current location
  • Goal location

Outputs of the model:

  • Probabilities whether each map pixel is on the path from current location to goal

The global planner is trained using georeferenced maps and GPS trajectories: given two points on a trajectory, all points in between are marked as high-probability points.
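The target construction described above can be sketched as follows. This is an assumption-laden illustration: the function name `path_target`, the binary 0/1 labels, the inclusion of both endpoints and the pixel-coordinate input are choices made for the example, and the conversion from GPS coordinates to map pixels is assumed to have been done already using the map's georeferencing.

```python
def path_target(height, width, track_px, i, j):
    """Return a height x width grid of 0/1 labels for one training pair.

    track_px: GPS trajectory already converted to (row, col) map pixels.
    i, j: indices of the two chosen trajectory points, with i <= j.
    """
    target = [[0] * width for _ in range(height)]
    for row, col in track_px[i:j + 1]:
        # Ignore trajectory points that fall outside the map crop.
        if 0 <= row < height and 0 <= col < width:
            target[row][col] = 1  # pixel lies on the driven path
    return target
```

Point `i` plays the role of the current location and point `j` the goal location, so many training pairs can be drawn from a single recorded trajectory.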

Global planner

These two models work in coordination to handle outdated maps and inaccurate GPS:

  • as long as the local planner proposes valid waypoints the robot never collides with obstacles,
  • as the global planner picks waypoints which are on the path to the final destination, it tends to move towards the final goal, even if the GPS positioning is wrong or the map is outdated.

Results of validation

Local planner

For the local planner, the following network architectures were considered:

| Model | Pretrained weights | Trained or finetuned | On-policy tested | Generative | Waypoint proposal method |
|---|---|---|---|---|---|
| VAE | - | + | + | + | Sampling from latent representation |
| GNM | + | + | + | - | Cropping the current observation |
| ViNT | + | - | + | + | Goal image diffusion |
| NoMaD | + | - | - | + | Trajectory diffusion |

The VAE model was trained from scratch; all other models were used with pre-trained weights from the Berkeley group. The GNM model was additionally fine-tuned on our own dataset.

The models were tested both off-policy and on-policy. Off-policy means that the model was applied to recorded data: the model's actions were only visualized, not actuated. On-policy means that the model's actions were actually executed on the robot.

For on-policy testing we recorded a fixed route, took goal images at fixed intervals and measured the success rate in navigating to every goal image along the route. This shows how well the model understands the direction of the goal image and how well it can detect whether the goal was reached. The operator intervened when the robot was going completely off the path and guided it back to the track. Sometimes the robot failed to detect a goal but kept driving in the right direction and successfully recognized the subsequent goal. In that case the missed goal was not marked as achieved, but no intervention was necessary.
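The success rate can be read as the share of goal images that the robot recognized as reached. A minimal sketch of that arithmetic is given below; the function name and the exact definition (goals reached divided by the number of goal images) are assumptions, since the text does not spell out the formula.

```python
def success_rate(goals_reached, goal_images):
    """Percentage of goal images reached, rounded to two decimals."""
    if goal_images <= 0:
        raise ValueError("goal_images must be positive")
    return round(100.0 * goals_reached / goal_images, 2)
```

Under this definition, reaching 27 of 30 goal images gives 90.0 and reaching 28 of 30 gives 93.33. Interventions are counted separately and do not enter the formula.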

Off-policy results

The videos below show the models applied to pre-recorded data. In the videos, the green trajectory represents the ground truth, the red trajectory represents the goal-conditioned predicted trajectory (several in the case of NoMaD), and the blue trajectories represent sampled possible trajectories (in the case of VAE).

Model Video
VAE VAE
GNM finetuned GNM finetuned
ViNT ViNT
NoMaD with goal images at fixed intervals NoMaD goal
NoMaD with one fixed goal (exploratory mode) NoMaD explore
NoMaD orienteering NoMaD orienteering

On-policy results indoors

We recorded a fixed route in Delta office with goal images every 1 or 2 meters and measured the goal success rate for each interval.

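| Model | Goal interval | Number of goal images | Number of interventions | Success rate (%) | Video |
|---|---|---|---|---|---|
| GNM | 1m | 30 | 0 | 90.00 | video |
| GNM finetuned | 1m | 30 | 1 | 93.33 | video |
| ViNT | 1m | 30 | 2 | 96.67 | video |
| GNM | 2m | 15 | 0 | 86.67 | video |
| GNM finetuned | 2m | 15 | 0 | 93.33 | video |
| ViNT | 2m | 15 | 0 | 93.33 | video |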

Example video of top-performing model (GNM-finetuned) at 4X speed:

GNM finetuned indoors

On-policy results outdoors

We recorded a fixed route in Delta park with goal images every 2, 5 or 10 meters and measured the goal success rate for each interval.

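| Model | Goal interval | Number of goal images | Number of interventions | Success rate (%) | Video |
|---|---|---|---|---|---|
| GNM | 2m | 38 | 1 | 86.84 | video |
| GNM finetuned | 2m | 38 | 0 | 81.58 | video |
| GNM finetuned | 5m | 17 | 7 | 100 | video |
| ViNT | 5m | 17 | 7 | 100 | video |
| ViNT | 10m | 8 | 9 | 100 | video |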

Example video of top-performing model (GNM-finetuned) at 4X speed:

GNM finetuned outdoors

Global planner

For the global planner, the following network architectures were considered:

As the U-Net approach worked much better, the contrastive approach was abandoned. Most of the experimentation was done with the base map with elevation.

The following videos show an on-policy simulation in which the robot proposes a number of random waypoints and then moves towards the one with the highest probability. The blue dot shows the robot's current location and the yellow dot shows the goal location.

Location Video
Ihaste Ihaste
Kärgandi, sticking to the road Kärgandi
Annelinn, avoidance of houses Annelinn
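The selection step used in this simulation (propose random waypoints, then move towards the most probable one) can be sketched as follows. The sampling scheme is an assumption made for illustration: the function name `pick_waypoint`, the circle of fixed `radius` around the robot and the number of candidates are hypothetical, and the probability map is taken to be the global planner's per-pixel output.

```python
import math
import random


def pick_waypoint(prob_map, current, n_candidates=8, radius=5.0, rng=None):
    """Sample candidates on a circle around `current`, return the most probable.

    prob_map: 2-D list, prob_map[row][col] = probability of being on the path.
    current: (row, col) of the robot in map pixels.
    Returns ((row, col), probability), or (None, -1.0) if every candidate
    fell outside the map.
    """
    rng = rng or random.Random()
    height, width = len(prob_map), len(prob_map[0])
    best, best_p = None, -1.0
    for _ in range(n_candidates):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        row = round(current[0] + radius * math.sin(angle))
        col = round(current[1] + radius * math.cos(angle))
        if not (0 <= row < height and 0 <= col < width):
            continue  # candidate fell off the map
        p = prob_map[row][col]
        if p > best_p:
            best, best_p = (row, col), p
    return best, best_p
```

In the full system the candidates would come from the local planner rather than from random sampling, so that only waypoints judged drivable from the camera image are scored against the map.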

The following videos show how the behavior differs between map modalities.

Location Video
Base map - sticking to the road Base map
Road map - going straight (not enough context) Road map
Orthophoto - mostly sticking to the road Orthophoto

Putting it all together

The following video shows an off-policy evaluation of the whole system on a recorded session. The colored trajectories are produced by using crops of the original camera image as goals, as shown in the video. The white trajectory comes from the final goal.

Delta park off-policy final

On-policy evaluation of the whole system was not possible, for two reasons: technical difficulties with the GNSS sensor, and the arrival of winter, which made testing pointless because the models were trained mainly on summer data.

Technical architecture

For the local planner, the following network architectures were tried:

For the global planner, the following network architectures were tried:

Potential areas of use

A working solution could be used in any area that needs navigation in unstructured environments with poor GPS signal and outdated maps, for example:

  • military,
  • agriculture,
  • forestry,
  • rescue.

The dataset collected in this project can also be used to create a visual navigation benchmark and an international robot orienteering competition. Such a competition would make novel solutions and international talent accessible to Milrem Robotics.

Lessons learned

For training the local planner, the dataset seemed either too small or too simple in its trajectories (mostly moving forward). Even after combining our data with the RECON dataset or fine-tuning existing models, the results were inconclusive: sometimes the fine-tuned model performed better than the original, sometimes worse. The original general navigation models were also unreliable and were not always able to avoid obstacles. More work is needed to make visual navigation reliable.

Alternative model outputs could be considered, e.g. predicting free space instead of trajectories and proposing waypoints from that free space. Collecting more exploratory data directly with the robot might also be necessary: the ViKiNG paper used mainly automatically collected exploratory data (30 hours) and relatively few expert trajectories (12 hours), whereas all of our data consisted of expert trajectories.

The global planner trained much better and was able to estimate the recommended path between two points reasonably well. We also observed different behavior for different map modalities, e.g. base map and road map. More work is needed to reduce the artifacts produced by the fully convolutional network, and some map modalities might need further tuning.

Final takeaways:

  • Training neural networks in 2023 is still hard.
  • Dataset curation is non-trivial and less documented than model training.
  • Pre-trained models should be used (or fine-tuned) whenever available.
  • Off-policy performance (on recordings) does not match on-policy performance (on robot).

Description of User Interface

Delta park off-policy final

  • The screen shows the current camera image and the proposed trajectories. The white trajectory is the one induced by the goal image at the top right.
  • The bottom right shows the probability map (the path from the current position to the goal) and the original map. Waypoint colors match the trajectory colors.
  • The left pane shows the robot command.