Image matting is an interesting problem in computer vision that deals with separating the foreground and background components of an image. While this may seem like a fairly easy task, even experienced graphics professionals sometimes find it difficult.
Let's look at the famous dandelion image as an example.
Our task is to separate the background (green) from the foreground (flower).
I am sure you have noticed the following:
- The edges of the flower are not geometric; you can't simply draw a line to divide the image.
- No matter how far you zoom in, you are more than likely to get the fine areas wrong.
- Due to the nature of light and of our vision, some edges cannot be divided entirely into foreground and background. They might be, say, 20% background and 80% foreground.
This is a very important problem because many Hollywood movies are shot against a green screen. During post-production, someone needs to remove the green background and replace it with something that matches the movie's theme.
Did you know there is a website, http://www.alphamatting.com/, where you can download a trial dataset and evaluate your matting algorithm? You will also find a number of papers and statistics on different approaches to matting there.
Typically, you take the input image and generate an alpha matte, which indicates the probability of each pixel being foreground. Here is an example picture from the alpha matting site.
The alpha matte for the above image would be
Our goal today is to try to generate a similar matte for our dandelion image.
Let's formulate the problem:
For each pixel in a given image, estimate the probability that the pixel is foreground or background.
Now, let's get back to figuring out a simple solution using machine learning. We will solve this problem first using linear regression and then using a neural network.
To get started, we will use something called a trimap. A trimap is a one-time input from the user that marks pixels that are definitely foreground and pixels that are definitely background. The markings don't need to be large; even 2-5% of the total pixels is usually enough to work out a solution.
For our dandelion example, the trimap would be
If you look at the image, you will see that definite foreground is marked in white and definite background in black.
Given this input, our problem statement evolves into:
For each pixel in a given image, estimate the probability that the pixel is foreground or background, on the basis of a small training set of mappings.
Mathematically,
- N = total number of pixels in the image
- M = number of pixels that have been marked foreground/background in the trimap
- H = height of the image
- W = width of the image
- I(row, column) = the RGB value (0-255) of the pixel at position (row, column)
- X = set of input pixels that have been marked as foreground/background (the size of X is M, and M << N)
- Y = background/foreground mapping, 0.0 <= Y <= 1.0, where 1 indicates definitely foreground and 0 indicates definitely background
Now, using our knowledge of linear regression, we can learn a set of weights w (lowercase, to avoid confusion with the image width W) such that
Y = X.w
Then,
w = pinv(X).Y
where pinv(X) is the Moore-Penrose pseudoinverse of X (X is not square, so it has no ordinary inverse). Later, when we have unmarked pixels X', we can use w to determine their foreground/background probability using
Y' = X'.w
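As a sanity check, the pseudoinverse solve is equivalent to solving the normal equations (XᵀX)w = XᵀY when X has full column rank. Here is a minimal sketch in plain Java arrays (not the Jblas code used later in the post; the class name, helper, and matrix values are made up for illustration):

```java
// Sketch: least-squares fit via the normal equations (X^T X) w = X^T Y,
// solved with Gauss-Jordan elimination. Toy data, not real pixel features.
public class LeastSquares {
    static double[] fit(double[][] X, double[] y) {
        int n = X.length, d = X[0].length;
        // Build the augmented system A = [X^T X | X^T y]
        double[][] A = new double[d][d + 1];
        for (int i = 0; i < d; i++) {
            for (int j = 0; j < d; j++)
                for (int k = 0; k < n; k++) A[i][j] += X[k][i] * X[k][j];
            for (int k = 0; k < n; k++) A[i][d] += X[k][i] * y[k];
        }
        // Gauss-Jordan elimination with partial pivoting
        for (int col = 0; col < d; col++) {
            int piv = col;
            for (int r = col + 1; r < d; r++)
                if (Math.abs(A[r][col]) > Math.abs(A[piv][col])) piv = r;
            double[] tmp = A[col]; A[col] = A[piv]; A[piv] = tmp;
            for (int r = 0; r < d; r++) {
                if (r == col) continue;
                double f = A[r][col] / A[col][col];
                for (int c = col; c <= d; c++) A[r][c] -= f * A[col][c];
            }
        }
        double[] w = new double[d];
        for (int i = 0; i < d; i++) w[i] = A[i][d] / A[i][i];
        return w;
    }

    public static void main(String[] args) {
        // 4 "pixels" with 2 features plus a bias column of 1s;
        // labels generated from y = 0.5*x1 + 0.25*x2 + 0.1
        double[][] X = {{1, 2, 1}, {2, 1, 1}, {3, 3, 1}, {4, 1, 1}};
        double[] y = new double[X.length];
        for (int i = 0; i < X.length; i++)
            y[i] = 0.5 * X[i][0] + 0.25 * X[i][1] + 0.1;
        double[] w = fit(X, y);
        // recovers the weights 0.5, 0.25 and the bias 0.1
        System.out.printf("%.3f %.3f %.3f%n", w[0], w[1], w[2]);
    }
}
```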
Doable, right? Let's give it a shot.
Let's examine what a row of the matrix X will look like. It will have 3 columns, one for each channel (red, green, blue) of the image, i.e.
[200, 100, 115]
Now, just these three values do not give us enough width to come up with an intelligent set of weights.
If we assume spatial coherence, i.e. that every pixel's value depends on the pixels surrounding it, then our input data can include the surrounding pixels as well.
We will include the RGB values of the pixels around the current pixel as part of the input data. For example, if the pixel intensities in the 3×3 neighborhood were
100,150,120
100,150,100
110,140,90
100,150,90
90,100,120
90,102,120
110,150,90
90,100,110
100,100,110
Then our input X will be
[100,150,120,100,150,100,110,140,90,100,150,90,90,100,120,90,102,120,110,150,90,90,100,110,100,100,110]
The actual order of these columns does not matter; however, it is customary to go from left to right and top to bottom.
We will also add another column with the fixed value 1, which will act as our bias.
This makes one row of the input X:
[100,150,120,100,150,100,110,140,90,100,150,90,90,100,120,90,102,120,110,150,90,90,100,110,100,100,110,1]
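The row above can be built mechanically. Here is a minimal sketch in plain Java (hypothetical helper name, edge pixels left unhandled for brevity) that flattens a pixel's 3×3 neighborhood into a 28-column feature row, bias included:

```java
// Sketch: build one feature row for the pixel at (row, col).
// image[row][col] holds {r, g, b}; border pixels are not handled here.
public class FeatureRow {
    static double[] featureRow(int[][][] image, int row, int col) {
        double[] x = new double[9 * 3 + 1]; // 27 channel values + bias
        int i = 0;
        for (int dr = -1; dr <= 1; dr++)      // top to bottom
            for (int dc = -1; dc <= 1; dc++)  // left to right
                for (int ch = 0; ch < 3; ch++)
                    x[i++] = image[row + dr][col + dc][ch];
        x[i] = 1.0; // fixed bias column
        return x;
    }

    public static void main(String[] args) {
        // the 3x3 neighborhood from the example above
        int[][][] img = {
            {{100, 150, 120}, {100, 150, 100}, {110, 140, 90}},
            {{100, 150, 90},  {90, 100, 120},  {90, 102, 120}},
            {{110, 150, 90},  {90, 100, 110},  {100, 100, 110}},
        };
        double[] x = featureRow(img, 1, 1); // centre pixel
        System.out.println(x.length); // 28
    }
}
```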
We will do this for the entire set of marked pixels and get a matrix X. Similarly, we will get Y, which is 1 for white markings in the trimap and 0 for black markings.
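Building Y from the trimap amounts to mapping marked colours to labels. A small sketch (the helper name and the colour conventions, pure white for foreground and pure black for background, are assumptions, not from the post):

```java
// Sketch: map a trimap pixel to a training label.
// White marks definite foreground, black definite background;
// any other colour is an unmarked pixel and is skipped during training.
public class TrimapLabel {
    static Double label(int r, int g, int b) {
        if (r == 255 && g == 255 && b == 255) return 1.0; // foreground
        if (r == 0 && g == 0 && b == 0) return 0.0;       // background
        return null; // not part of the training set
    }

    public static void main(String[] args) {
        System.out.println(label(255, 255, 255)); // 1.0
        System.out.println(label(0, 0, 0));       // 0.0
        System.out.println(label(128, 128, 128)); // null
    }
}
```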
Now it's easy: you can implement a program to do this in PHP, Java, or C.
I am using Java with Jblas and OpenCV. I will load the features into a DoubleMatrix X and the alpha values into a DoubleMatrix Y.
The weights are then:
DoubleMatrix weights = Solve.pinv(X).mmul(Y);
Now that we have the weights, we will go through the feature-building process once again, but this time we only collect X values (there are no Y values to collect). Note that this now needs to be done for the entire image, not just the trimap-marked pixels.
The predicted values then become:
DoubleMatrix YHat = XHat.mmul(weights);
That gives the alpha intensity for the entire image. Converting this matrix into an image, we get a very nice alpha matte.
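One practical detail when converting YHat into an image: a linear model can predict values slightly below 0 or above 1, so it helps to clamp before scaling to 8-bit grayscale. A hypothetical helper (not from the post's code):

```java
// Sketch: clamp a raw prediction to [0, 1] and scale it to an
// 8-bit grayscale value for the alpha matte image.
public class AlphaByte {
    static int toAlphaByte(double yHat) {
        double clamped = Math.max(0.0, Math.min(1.0, yHat));
        return (int) Math.round(clamped * 255);
    }

    public static void main(String[] args) {
        System.out.println(toAlphaByte(1.2));  // 255 (over-prediction clamped)
        System.out.println(toAlphaByte(-0.1)); // 0
        System.out.println(toAlphaByte(0.5));  // 128
    }
}
```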
Pretty impressive, right? If you look closely, some areas that are too difficult for a human to mark as foreground or background come out very well with machine learning.
Is this perfect? Not really. The primary reason is that our approach assumes the problem is linear, which in my opinion it is not. Since we use a linear approximation, the output is fairly good in most cases, but not perfect.
Now we will go deeper, i.e. we will solve this same problem with a neural network, which takes a non-linear approach.
While the details of a neural network are beyond the scope of this post, for this problem I used a custom-built feed-forward neural network with the following parameters:
| Parameter | Value |
| --- | --- |
| Number of layers | 2 |
| Neurons in each layer | width × height × channels × 2 (from the image) |
| Loss function | Squared loss |
| Activation function | tanh |
| Optimization | Gradient descent with Adam |
| Max iterations | 10000 |
| Regularization | 1e-2 |
| Step update | 1e-3 |
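For reference, the two mathematical ingredients named in the table are simple to write down. A sketch in plain Java (not the author's network code; the class and method names are made up):

```java
// Sketch: the tanh activation and the squared loss from the table above.
public class NetParts {
    static double tanhActivation(double z) {
        return Math.tanh(z); // squashes any input into (-1, 1)
    }

    static double squaredLoss(double yHat, double y) {
        double diff = yHat - y;
        return 0.5 * diff * diff; // penalises large regression errors
    }

    public static void main(String[] args) {
        System.out.println(tanhActivation(0.0));   // 0.0
        System.out.println(squaredLoss(1.0, 0.0)); // 0.5
    }
}
```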
You can use any open-source or custom neural network for this, as long as it supports regression and a squared loss.
The training took about a minute on my Mac. The result, however, was astonishing.
You will notice that a large number of strands that were not visible earlier now come out clearly.
This is just one of several approaches you will find on the alpha matting site.
Hope you liked it. Comments and suggestions are welcome.