Image Matting using Machine Learning

Image matting is an interesting problem in computer vision that deals with separating the foreground and background components of an image. While this may seem a fairly easy task, even experienced graphics professionals sometimes find it difficult.

Let’s look at the well-known example of the dandelion image.

Our task is to separate the background (green) and the foreground (flower).

I am sure you must have noticed these:

  1. The edges of the flower are not geometric. You can’t draw a straight line or a simple curve to separate them.
  2. No matter how much you zoom in, you are more than likely to make mistakes in the fine areas.
  3. Due to the nature of light and our vision, some edge pixels can’t be assigned completely to foreground or background. They are a mix, for example 20% background and 80% foreground.
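The mixed pixels in point 3 are usually described as a blend: observed colour = alpha * foreground + (1 - alpha) * background, where alpha is the fraction of foreground. Here is a minimal Java sketch of that blend. The class name, method names and the two colours are made up for illustration and are not from any library.

```java
// Minimal sketch of per-pixel alpha blending (all names here are illustrative).
import java.util.Arrays;

class AlphaBlend {
    // Blend one channel: alpha = 1.0 keeps the foreground, alpha = 0.0 keeps the background.
    static double blendChannel(double alpha, double foreground, double background) {
        return alpha * foreground + (1.0 - alpha) * background;
    }

    // Blend an RGB foreground pixel over an RGB background pixel, channel by channel.
    static int[] blend(double alpha, int[] fg, int[] bg) {
        int[] out = new int[3];
        for (int c = 0; c < 3; c++) {
            out[c] = (int) Math.round(blendChannel(alpha, fg[c], bg[c]));
        }
        return out;
    }

    public static void main(String[] args) {
        int[] flower = {240, 240, 230}; // hypothetical foreground colour
        int[] grass = {40, 160, 60};    // hypothetical background colour
        // An edge pixel that is 80% foreground and 20% background.
        System.out.println(Arrays.toString(blend(0.8, flower, grass)));
    }
}
```

Matting is the reverse of this: we only see the blended colour and want to recover alpha.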

This is a very important problem because many Hollywood movie scenes are shot against a green background. During post-production, someone needs to remove the green background and replace it with something that fits the movie’s theme.

Did you know there is a website, http://www.alphamatting.com/, where you can download a trial dataset and evaluate your matting algorithm? You will also find a number of papers and statistics on different approaches to matting.

Typically you would take the input image and generate an alpha matte, which gives, for each pixel, the probability of it being foreground. An example from the alpha matting site is this picture.

 

The alpha matte for the above image would be

 

 

Our goal today is to try and generate a similar matte for our dandelion image.

Let’s formulate the problem:

For each pixel in a given image, identify the probability that the pixel is foreground (as opposed to background).

Now, let’s get back to figuring out a simple solution using machine learning. We will solve this problem using linear regression first and then using a neural network.

To get started, we will use something called a trimap. A trimap is a one-time input from the user that marks pixels that are definitely foreground and pixels that are definitely background. The marked regions don’t need to be large; even 2-5% of the total pixels is enough to work out a solution.

For our Dandelion example, the Trimap would be

If you look at the image, you will see that we have marked definite foreground with white and definite background with black.

Now, given this input, our problem statement evolves to:

For each pixel in a given image, identify the probability that the pixel is foreground, on the basis of the small training set of pixels marked in the trimap.

 

Mathematically,

N = total number of pixels in the image

M = number of pixels that have been marked foreground/background in the trimap

H = height of Image

W = width of the image

I(row, column) – the RGB value (0 – 255) of the pixel at position (row, column)

 

X = inputs for the pixels that have been marked as foreground/background (X has M rows, and M << N)

Y = foreground/background labels for those pixels, 0.0 <= Y <= 1.0, where 1 indicates definitely foreground and 0 indicates definitely background

 

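Now, using our knowledge of linear regression, we can define a set of weights W (not to be confused with the image width above) such that

Y = X.W

X is generally not a square matrix, so it has no ordinary inverse. The least-squares solution uses the pseudo-inverse instead:

W = pinv(X).Y

Later, when we have the unmarked pixels X’, we can use W to determine their foreground/background probability using

Y’ = X’.W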

 

Doable, right? Let’s give it a shot.

Let’s examine what a row in the matrix X will be. It will have 3 columns, one for each channel (red, green, blue) of the pixel, i.e.

[200, 100, 115]

Now, just these three values do not give us enough features to come up with an intelligent set of weights.

If we assume spatial coherence, i.e. that every pixel’s value depends on the pixels surrounding it, then our input data can include the surrounding pixels as well.

We will include the RGB values of the pixels around the current pixel as part of the input data. For example, if the 3x3 neighborhood, with the current pixel at the center, were

(100,150,120)   (100,150,100)   (110,140,90)

(100,150,90)    (90,100,120)    (90,102,120)

(110,150,90)    (90,100,110)    (100,100,110)

 

 

Then our input X will be

[100,150,120,100,150,100,110,140,90,100,150,90,90,100,120,90,102,120,110,150,90,90,100,110,100,100,110]

 

The actual order of these columns does not matter, as long as it is the same for every row. It is customary to go from left to right and top to bottom.

We will also add another column with a fixed value “1” which will act as our bias.

This makes one row of input X as

[100,150,120,100,150,100,110,140,90,100,150,90,90,100,120,90,102,120,110,150,90,90,100,110,100,100,110,1]

We will do this for every marked pixel and get the matrix X. Similarly, we will get Y, which has 1 for pixels marked white in the trimap and 0 for pixels marked black.
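Here is a rough sketch of how X and Y could be assembled in plain Java. Several things here are my assumptions and not stated above: the image is an int[row][column][channel] array, neighbours that fall outside the image are replaced by the nearest edge pixel, and the trimap stores 255 for white, 0 for black and anything else for unmarked. All class and method names are made up.

```java
// Sketch: build one feature row per trimap-marked pixel from its 3x3 RGB neighbourhood.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PatchFeatures {
    static final int FEATURES = 3 * 3 * 3 + 1; // 27 colour values plus the bias column

    // image[row][col][channel] holds RGB values in 0..255.
    // Assumption: neighbours outside the image are replaced by the nearest edge pixel.
    static double[] featureRow(int[][][] image, int row, int col) {
        int h = image.length, w = image[0].length;
        double[] x = new double[FEATURES];
        int k = 0;
        for (int dr = -1; dr <= 1; dr++) {         // top to bottom
            for (int dc = -1; dc <= 1; dc++) {     // left to right
                int r = Math.min(Math.max(row + dr, 0), h - 1);
                int c = Math.min(Math.max(col + dc, 0), w - 1);
                for (int ch = 0; ch < 3; ch++) {
                    x[k++] = image[r][c][ch];
                }
            }
        }
        x[k] = 1.0; // fixed bias column
        return x;
    }

    // Assumption: trimap uses 255 = foreground, 0 = background, anything else = unmarked.
    // Returns {X, Y}, where Y is a single-column matrix of 1.0 / 0.0 labels.
    static double[][][] trainingSet(int[][][] image, int[][] trimap) {
        List<double[]> xs = new ArrayList<>();
        List<double[]> ys = new ArrayList<>();
        for (int r = 0; r < trimap.length; r++) {
            for (int c = 0; c < trimap[0].length; c++) {
                if (trimap[r][c] == 255 || trimap[r][c] == 0) {
                    xs.add(featureRow(image, r, c));
                    ys.add(new double[] {trimap[r][c] == 255 ? 1.0 : 0.0});
                }
            }
        }
        return new double[][][] {xs.toArray(new double[0][]), ys.toArray(new double[0][])};
    }

    public static void main(String[] args) {
        // The 3x3 example neighbourhood from the text; the centre pixel is (1, 1).
        int[][][] patch = {
            {{100, 150, 120}, {100, 150, 100}, {110, 140, 90}},
            {{100, 150, 90}, {90, 100, 120}, {90, 102, 120}},
            {{110, 150, 90}, {90, 100, 110}, {100, 100, 110}}
        };
        System.out.println(Arrays.toString(featureRow(patch, 1, 1)));
    }
}
```

Calling featureRow for every pixel, marked or not, gives the matrix needed later for prediction.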

Now, it’s easy. You can implement a program to do this in PHP, Java or C.

I am using Java with jblas and OpenCV. I will load the image into a DoubleMatrix X and the alpha values into a DoubleMatrix Y.

The weights would then be

DoubleMatrix weights = Solve.pinv(X).mmul(Y);
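If you don’t have jblas at hand, the same least-squares idea can be sketched in plain Java. Note that this is not what Solve.pinv does internally. The sketch solves the normal equations (X^T X + lambda*I) w = X^T y with Gaussian elimination, which gives the same weights as the pseudo-inverse when lambda is 0 and the columns of X are independent. The class name and the small lambda term (to keep the system solvable when columns are nearly dependent) are my additions.

```java
// Sketch: least-squares weights via the normal equations (plain Java, no jblas).
import java.util.Arrays;

class LeastSquares {
    // Solves (X^T X + lambda*I) w = X^T y by Gaussian elimination with partial pivoting.
    // lambda is a hypothetical small ridge term; pass 0.0 for ordinary least squares.
    static double[] fit(double[][] x, double[] y, double lambda) {
        int m = x.length, n = x[0].length;
        double[][] a = new double[n][n + 1]; // augmented matrix [X^T X + lambda*I | X^T y]
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                double s = 0.0;
                for (int r = 0; r < m; r++) s += x[r][i] * x[r][j];
                a[i][j] = s + (i == j ? lambda : 0.0);
            }
            double rhs = 0.0;
            for (int r = 0; r < m; r++) rhs += x[r][i] * y[r];
            a[i][n] = rhs;
        }
        // Forward elimination.
        for (int col = 0; col < n; col++) {
            int pivot = col;
            for (int r = col + 1; r < n; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) pivot = r;
            }
            double[] tmp = a[col];
            a[col] = a[pivot];
            a[pivot] = tmp;
            if (Math.abs(a[col][col]) < 1e-12) {
                throw new ArithmeticException("system is singular; try a larger lambda");
            }
            for (int r = col + 1; r < n; r++) {
                double f = a[r][col] / a[col][col];
                for (int c = col; c <= n; c++) a[r][c] -= f * a[col][c];
            }
        }
        // Back substitution.
        double[] w = new double[n];
        for (int i = n - 1; i >= 0; i--) {
            double s = a[i][n];
            for (int j = i + 1; j < n; j++) s -= a[i][j] * w[j];
            w[i] = s / a[i][i];
        }
        return w;
    }

    // Prediction for one input row: the dot product of the row with the weights.
    static double predict(double[] w, double[] row) {
        double s = 0.0;
        for (int i = 0; i < w.length; i++) s += w[i] * row[i];
        return s;
    }

    public static void main(String[] args) {
        // Toy data generated from y = 0.5 * x + 0.25; the second column is the bias.
        double[][] x = {{0, 1}, {1, 1}, {2, 1}};
        double[] y = {0.25, 0.75, 1.25};
        System.out.println(Arrays.toString(fit(x, y, 0.0)));
    }
}
```

For the matting problem, x would be the rows built from the marked pixels and y the 0/1 labels from the trimap.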

Now that we have the weights, we build the input matrix once again, this time collecting only X values and ignoring Y. Note that this must now be done for every pixel in the image, not just the ones marked in the trimap. Call the result XHat.

The predicted values now become

DoubleMatrix YHat = XHat.mmul(weights);

That gives the predicted intensity for every pixel in the image. Converting this matrix into an image, we get a very nice alpha matte.
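A linear model is not limited to the range 0 to 1, so some predictions can land a little below 0 or above 1. Here is a small sketch of the conversion from predictions to grey levels. It assumes the predictions are stored row by row and that out-of-range values are simply clamped. Neither detail is spelled out above, and the names are made up.

```java
// Sketch: turn a flat vector of predicted alpha values into an 8-bit grey image.
import java.util.Arrays;

class MatteImage {
    // Clamp a predicted alpha to [0, 1] and scale it to a grey level in 0..255.
    static int toGray(double alpha) {
        double clamped = Math.max(0.0, Math.min(1.0, alpha));
        return (int) Math.round(clamped * 255.0);
    }

    // Assumption: yHat is in row-major order (left to right, then top to bottom).
    static int[][] toImage(double[] yHat, int height, int width) {
        if (yHat.length != height * width) {
            throw new IllegalArgumentException("expected " + (height * width) + " predictions");
        }
        int[][] img = new int[height][width];
        for (int i = 0; i < yHat.length; i++) {
            img[i / width][i % width] = toGray(yHat[i]);
        }
        return img;
    }

    public static void main(String[] args) {
        double[] yHat = {-0.1, 1.0, 0.5, 1.2}; // made-up predictions for a 2x2 image
        System.out.println(Arrays.deepToString(toImage(yHat, 2, 2)));
    }
}
```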

 

 

Pretty impressive, right? If you look closely, some areas that are too difficult for a human to mark as foreground or background come out very well with machine learning.

Is this perfect? Not really. The primary reason is that our approach assumes the problem is linear, which in my opinion it is not. Because we use a linear approximation, the output is fairly good in most cases but not perfect.

Now we will go deeper, i.e. we will implement the same algorithm using a neural network, which takes a non-linear approach to the problem.

While the details of neural networks are beyond the scope of this post, for the given problem I used a custom-made feed-forward neural network with the following parameters:

No of layers: 2

Neurons in each layer: width * height * channels * 2 (from the image)

Loss function: squared loss

Activation function: tanh

Optimization: gradient descent with Adam

Max iterations: 10000

Regularization: 1e-2

Step update: 1e-3

 

You can use any open source or custom neural network for this as long as it supports regression and squared loss.
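To give a feel for what such a network computes, here is a tiny sketch of a forward pass through two tanh layers plus the squared loss. This is not my actual network. The layer sizes and weights are toy values, the use of tanh on the output layer is an assumption, and training (the Adam updates and regularization) is left out completely.

```java
// Sketch: forward pass of a two-layer tanh network and the squared loss (no training).
class TinyNet {
    // One dense layer followed by tanh: out[j] = tanh(b[j] + sum_i w[j][i] * in[i]).
    static double[] tanhLayer(double[] in, double[][] w, double[] b) {
        double[] out = new double[w.length];
        for (int j = 0; j < w.length; j++) {
            double s = b[j];
            for (int i = 0; i < in.length; i++) s += w[j][i] * in[i];
            out[j] = Math.tanh(s);
        }
        return out;
    }

    // Hidden layer, then a single output neuron (assumed to use tanh as well).
    static double forward(double[] x, double[][] w1, double[] b1, double[][] w2, double[] b2) {
        double[] hidden = tanhLayer(x, w1, b1);
        return tanhLayer(hidden, w2, b2)[0];
    }

    // Squared loss for one training example.
    static double squaredLoss(double predicted, double target) {
        double d = predicted - target;
        return 0.5 * d * d;
    }

    public static void main(String[] args) {
        // Toy network: 2 inputs (assumed scaled to 0..1), 2 hidden neurons, 1 output.
        double[][] w1 = {{0.5, -0.3}, {0.1, 0.8}};
        double[] b1 = {0.0, 0.1};
        double[][] w2 = {{1.2, -0.7}};
        double[] b2 = {0.05};
        double out = forward(new double[] {0.4, 0.9}, w1, b1, w2, b2);
        System.out.println("prediction = " + out + ", loss vs 1.0 = " + squaredLoss(out, 1.0));
    }
}
```

Training adjusts the weights so that this loss shrinks over the marked pixels.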

The training took about a minute or so on my Mac. The result, however, was astonishing:

 

 

You will notice that a large number of strands that were not visible earlier now come out clearly.

This is just one of the several approaches you will find on the alpha matting site.

Hope you liked it. Comments and suggestions are welcome.