[3D Pose Estimation AI #3] synthetic dataset
Blog Post Author: Kilian Mehringer
Job Description: 3D Artist, Software Engineer
Co Authors: Sebastian Lack
[3D Pose Estimation AI #3] Using Blender3D to generate synthetic machine learning datasets.
Blender3D for Deep Learning:
If we want to train a deep neural network to predict or classify something, we need to train the model on extremely large datasets. These datasets can contain numerical data or images collected over time.
Even a simple classifier needs a huge number of labeled images. Otherwise the trained model has no chance of finding consistent patterns that it can recognize when it sees a new image.
For our 3D pose estimation problem, it is not just the amount of data that makes the dataset so hard to get. We need far more information about each image than a class label. We need images that contain the object whose pose we want to estimate, plus a lot of additional information: the exact position of the center of the object's 3D bounding box relative to the camera, the 2D bounding box in pixel coordinates that outlines the object in the image, and the rotation of the object in camera coordinates. All of these parameters need to be consistent throughout the whole dataset.
To solve this problem we used Blender3D and its powerful path tracing engine, Cycles, to generate our images and the corresponding information we need to train our model.
Blender offers a Python API that allowed me to write a script that automatically generates the data we want. The result is a set of images with the matching position and rotation of the object relative to the camera, and a bounding box that shows where the object is located in the 2D pixel image.
The blender python script:
The script should generate images of an object with random backgrounds, random positions and rotations, and, if possible, random lighting effects. The idea is to get images that look so photorealistic that a model trained on them would not perform much worse on real images. (SPOILER: This part of the project is way harder than we thought.)
How does the script work? First it generates random positions within a given range for the camera and for the object itself. There is no need to randomize both of them, but why not. After that the scene needs a random background. I used a set of preselected HDRI maps for this. By selecting one of them and applying random scaling and rotation to it, we get some pretty good random backgrounds. The huge benefit of HDRI maps is that we can use them for lighting as well, so the illumination and reflections on our object are tied to the appearance of the background. This might help our model perform nearly equally well on real images after training.
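The randomization step can be sketched in plain Python, independent of Blender. This is only an illustration and not the original script: the HDRI file names, the value ranges and the function names are all assumptions made up for the example.

```python
import math
import random

# Hypothetical HDRI file names; the post does not list the maps that were used.
HDRI_MAPS = ["studio.hdr", "street.hdr", "forest.hdr"]


def random_position(rng, lo, hi):
    """Sample an (x, y, z) position uniformly inside an axis-aligned box."""
    return tuple(rng.uniform(a, b) for a, b in zip(lo, hi))


def random_scene(rng):
    """Draw one set of scene parameters (all ranges are made-up examples)."""
    return {
        "camera_location": random_position(rng, (-1, -1, 5), (1, 1, 10)),
        "object_location": random_position(rng, (-2, -2, -2), (2, 2, 2)),
        # Euler angles in radians, one per axis.
        "object_rotation": tuple(rng.uniform(0.0, 2.0 * math.pi) for _ in range(3)),
        "hdri": rng.choice(HDRI_MAPS),
        "hdri_rotation": rng.uniform(0.0, 2.0 * math.pi),
        "hdri_scale": rng.uniform(0.8, 1.2),
    }


rng = random.Random(42)  # fixed seed so the same dataset can be regenerated
scene = random_scene(rng)
```

Inside Blender, a dictionary like this would then be applied to the camera, the object and the world background before each render.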
With these three randomized parts of the scene we have everything we need to render our image. But before the script starts rendering, it calculates all the data we need to train the model against: the position of the object in camera coordinates and its rotation relative to the camera as a quaternion. Additionally, it calculates the 2D pixel bounding box.
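To make "position and rotation in relation to the camera" concrete, here is a small pure-Python sketch of the underlying math. It assumes unit quaternions in (w, x, y, z) order and world-space poses for both camera and object. The function names are hypothetical and the original script may well have used Blender's own matrix types instead.

```python
def q_conj(q):
    """Conjugate of a quaternion; for unit quaternions this is the inverse."""
    w, x, y, z = q
    return (w, -x, -y, -z)


def q_mul(a, b):
    """Hamilton product a * b of two quaternions in (w, x, y, z) order."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def q_rotate(q, v):
    """Rotate the 3D vector v by the unit quaternion q."""
    _, x, y, z = q_mul(q_mul(q, (0.0, *v)), q_conj(q))
    return (x, y, z)


def pose_in_camera(obj_loc, obj_quat, cam_loc, cam_quat):
    """Express an object's world-space pose in the camera's local frame."""
    cam_inv = q_conj(cam_quat)
    offset = tuple(o - c for o, c in zip(obj_loc, cam_loc))
    return q_rotate(cam_inv, offset), q_mul(cam_inv, obj_quat)


# Camera 10 units above the origin with identity rotation, object at the origin.
loc, quat = pose_in_camera((0, 0, 0), (1, 0, 0, 0), (0, 0, 10), (1, 0, 0, 0))
```

With an unrotated camera that looks down its local -Z axis, an object in front of the camera ends up with a negative z value, which matches the sign of the z component in the sample line further down.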
Here is one of the images that the script generated:
The data belonging to this image is saved in a simple text file. Each line of that file corresponds to one image, and the image's file name identifies its line.
The line for this image looks like this:
59843;l[2.797018527984619, -0.1947733461856842, -46.41618728637695];q[0.29057079553604126, 0.354412704706192, -0.1901087462902069, -0.868227481842041];b[203, 96, 64, 85]
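A line in this format can be read back with a few lines of Python. This is a minimal sketch: the function name `parse_line` is hypothetical, and reading `l`, `q` and `b` as location, quaternion and bounding box is my interpretation of the surrounding text. The component order of the quaternion and of the bounding box is not stated in this post, so the parser leaves the values in file order.

```python
import re


def parse_line(line):
    """Split one record into its id and the bracketed number lists."""
    image_id, *fields = line.strip().split(";")
    record = {"id": image_id}
    for field in fields:
        match = re.fullmatch(r"([a-z])\[(.*)\]", field.strip())
        if match is None:
            raise ValueError(f"unexpected field: {field!r}")
        key, body = match.groups()
        # Bounding box values are whole pixels, everything else is a float.
        convert = int if key == "b" else float
        record[key] = [convert(v) for v in body.split(",")]
    return record


sample = (
    "59843;l[2.797018527984619, -0.1947733461856842, -46.41618728637695];"
    "q[0.29057079553604126, 0.354412704706192, -0.1901087462902069, "
    "-0.868227481842041];b[203, 96, 64, 85]"
)
record = parse_line(sample)
```

A quick sanity check when loading the file is that the four `q` values have a length very close to 1, as they should for a rotation quaternion.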
Our final dataset contains 60,000 images of the PS4 controller model. The numerical data for these images was written into a data.txt file.
At the time I'm writing this post we have already finished our project and discovered some problems with this dataset. The HDRI maps we used to generate our backgrounds are real images with camera noise, lower resolution and some other artifacts. The synthetic image of the 3D model rendered on top of them, on the other hand, is completely noise free and high resolution, and it has artifacts of its own from Blender3D's denoising function. The problem is that our trained model overfits on this difference in noise and artifact structure between the object and the background area of the image. The only way to reduce this issue was to add noise and blur to all images after rendering. This roughly equalizes the noise and artifact patterns of background and object.
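The idea of blurring and then adding noise can be shown on a tiny grayscale image stored as a list of rows. This is a sketch in plain Python for illustration only: the function names, the 3x3 box blur and the noise strength are assumptions, and the actual processing we used is not reproduced here. A real pipeline would run on full color images with an image library.

```python
import random


def box_blur(img):
    """3x3 box blur on a 2D list of grey values, averaging in-bounds neighbours."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [
                img[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            ]
            out[y][x] = sum(vals) / len(vals)
    return out


def add_gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise and clamp the result to the 0..255 range."""
    return [
        [min(255.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in row]
        for row in img
    ]


def degrade(img, sigma=4.0, seed=0):
    """Blur first, then add noise, so object and background share one pattern."""
    rng = random.Random(seed)
    return add_gaussian_noise(box_blur(img), sigma, rng)


# A hard vertical edge, like a clean render in front of a background.
edge = [[0, 0, 255, 255] for _ in range(4)]
result = degrade(edge)
```

Because the same blur and noise are applied to every pixel, the clean rendered object and the noisy photographic background end up with a more similar texture.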
We also had some problems with input sizes and other image processing that had to be applied to the entire dataset, so we decided to write a new Python module that can be used to edit our dataset.
I will try to find a way to upload the dataset together with this Python module as a free download for everyone, and post it in a new blog post with additional information.
Working with blender and OpenGL:
Generating these images and calculating all the data that belongs to them was not the big problem. The Blender API is extremely nice to work with in this regard. Calculating model-view matrices and getting quaternions out of them was fairly easy. The only thing I had to write completely on my own, without finding any Blender API function that could help, was the calculation of the 2D bounding box that the object falls into on the final image plane.
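One common way to get such a 2D bounding box is to project the object's points into the image and take the minimum and maximum pixel coordinates. The sketch below shows that approach with a simple pinhole camera model. It is not the code from the original script: the function names, the focal length in pixels and the image size are assumptions chosen for the example, and the points are expected in camera coordinates with the camera looking down -Z.

```python
def project_point(p, focal_px, width, height):
    """Pinhole projection of a camera-space point (camera looks down -Z, +Y up)."""
    x, y, z = p
    if z >= 0:
        raise ValueError("point is behind the camera")
    u = width / 2 + focal_px * x / -z
    v = height / 2 - focal_px * y / -z  # image rows grow downwards
    return u, v


def clamp(value, upper):
    """Keep a pixel coordinate inside the range 0..upper."""
    return max(0, min(upper, value))


def bounding_box_2d(points, focal_px, width, height):
    """Return (x_min, y_min, x_max, y_max) in pixels, clamped to the image."""
    uv = [project_point(p, focal_px, width, height) for p in points]
    us, vs = zip(*uv)
    return (
        clamp(min(us), width - 1),
        clamp(min(vs), height - 1),
        clamp(max(us), width - 1),
        clamp(max(vs), height - 1),
    )


# Corners of a cube with edge length 2, centred 10 units in front of the camera.
cube = [(x, y, z) for x in (-1, 1) for y in (-1, 1) for z in (-9, -11)]
box = bounding_box_2d(cube, focal_px=500, width=640, height=480)
```

Projecting all mesh vertices instead of only the eight corners of the 3D bounding box gives a tighter 2D box, at the cost of more work per image.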
In the end the real problem was using the generated data in Blender as well as in OpenGL. Why? Blender uses a different coordinate system than OpenGL, so everything needs to be converted to the OpenGL coordinate system. I decided to export the data from Blender in OpenGL coordinates rather than converting it on the OpenGL side. This will allow everyone else who wants to use the dataset someday to work with it easily in OpenGL. The OpenGL renderer we wrote for this project will be part of another blog post.
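As an illustration of such a conversion, the sketch below maps Blender's Z-up world axes to the Y-up convention commonly used with OpenGL. Which axes were actually remapped in our export is not spelled out in this post, so treat the mapping and the function names as assumptions. Both systems are right-handed, so the change is a pure rotation and the vector part of a quaternion can be remapped the same way as a position.

```python
def blender_to_gl_vector(v):
    """Map a Blender Z-up vector (x, y, z) to a Y-up vector (x, z, -y)."""
    x, y, z = v
    return (x, z, -y)


def blender_to_gl_quaternion(q):
    """Remap a (w, x, y, z) quaternion; the vector part follows the axis change."""
    w, x, y, z = q
    return (w, x, z, -y)


# Blender's up axis (+Z) becomes the Y-up convention's up axis (+Y).
up = blender_to_gl_vector((0.0, 0.0, 1.0))

# A 90 degree turn about Blender's Z axis becomes a 90 degree turn about +Y.
c = 0.5 ** 0.5
turn = blender_to_gl_quaternion((c, 0.0, 0.0, c))
```

Doing this once at export time means the numbers in the data file can be fed to an OpenGL renderer without any further axis swapping.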
machine learning, blender3d, synthetic datasets, ML, tensorflow, computer vision, computer science, yolov3, software engineering, python, OpenCV, OpenGL, 6DOF, virtual reality, VR, robotics, robot vision