Image Segmentation Integrating Generative and Discriminative Methods

In this paper we present a Bayesian framework for segmenting images into their constituent visual patterns. The segmentation algorithm optimizes the posterior probability and outputs a scene representation in the form of a hierarchical graph, in a spirit similar to stochastic grammars in natural language. This computational framework integrates two popular inference approaches: generative (top-down) methods and discriminative (bottom-up) methods. The former formulates the posterior probability in terms of generative models for images, defined by likelihood functions and priors; the latter computes discriminative probabilities based on a sequence of bottom-up tests/filters. The final results are validated in the Bayesian framework. Our experiments illustrate the advantages and importance of combining bottom-up and top-down models when performing segmentation. The work can serve as a basis for designing robust and effective computer vision systems for applications such as assisting the blind and visually impaired, content-based image retrieval, and many others.


Ⅰ. Introduction
Image segmentation is a long-standing problem in computer vision, and it remains difficult and challenging for two reasons.
The first challenge is the difficulty of modeling the vast number of visual patterns that appear in generic images. The second challenge is the intrinsic ambiguity of image perception, especially when there is no specific task to guide attention. Furthermore, an image often exhibits details at multiple scales.
Therefore, a segmentation algorithm should not be expected to output only one result.
It should dynamically output multiple distinct solutions, so that the solutions "best preserve" the intrinsic ambiguity. In our opinion, image segmentation should be considered a computing process, not a vision task. In a Bayesian framework, we make the inference about the world representation W from the image I over a solution space Ω.
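As a toy illustration of this inference (the candidate interpretations and the probability tables below are invented for the example, not taken from the paper), the most probable interpretation W* maximizes p(W|I) ∝ p(I|W) p(W):

```python
# Toy MAP inference over a tiny discrete solution space Omega.
# The likelihood p(I|W) and prior p(W) tables are illustrative only.
likelihood = {"sky+grass": 0.50, "sky only": 0.30, "grass only": 0.20}   # p(I | W)
prior      = {"sky+grass": 0.40, "sky only": 0.35, "grass only": 0.25}   # p(W)

def map_solution(likelihood, prior):
    """Return the interpretation W* maximizing p(W|I), proportional to p(I|W) p(W)."""
    return max(likelihood, key=lambda w: likelihood[w] * prior[w])

w_star = map_solution(likelihood, prior)
```

In a real segmentation problem Ω is a union of subspaces of varying dimensions rather than a small finite set, so the maximization must be carried out by a stochastic search rather than enumeration.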
These tests are selected from a dictionary of discriminative features Ψ_dis. In correspondence to the generative dictionary Δ_gen in eqn. (2.7), we denote it by

Ψ_dis = { F_j(I) : j = 1, 2, ... }.

The bottom-up tests generate two types of hypotheses. For a pair of elements s, t, a test yields q(e = off | F(I)), the probability that the two elements s and t do not belong to the same pattern. Equivalently, it is the probability ratio q(e = on | F(I)) / q(e = off | F(I)).
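A minimal sketch of such a bottom-up test follows; the Gaussian affinity, the bandwidth sigma, and the feature vectors are illustrative assumptions, not the paper's choices:

```python
import math

def edge_ratio(feat_s, feat_t, sigma=0.5):
    """Toy 'what-goes-with-what' test for an edge e = <s, t>.

    Maps a dissimilarity between two element features to a probability
    ratio q(e = on | F) / q(e = off | F).  The Gaussian affinity and the
    bandwidth sigma are illustrative choices.
    """
    d2 = sum((a - b) ** 2 for a, b in zip(feat_s, feat_t))  # squared distance
    q_on = math.exp(-d2 / (2 * sigma ** 2))                 # affinity in (0, 1]
    q_off = 1.0 - q_on + 1e-12                              # complement, guarded
    return q_on / q_off

# Similar elements give a ratio > 1 (favor 'on'); dissimilar give < 1.
r_same = edge_ratio([0.9, 0.1], [0.88, 0.12])
r_diff = edge_ratio([0.9, 0.1], [0.1, 0.9])
```

A ratio far from 1 in either direction makes a confident bottom-up proposal; ratios near 1 leave the decision to the top-down models.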

Motivated by the above two observations, we present a stochastic computing method for image segmentation. We define image partition as the task of decomposing an image I into its constituent visual patterns; the output is represented by a hierarchical graph. Firstly, we formulate the problem as Bayesian inference, and the solution space is decomposed into a union of many subspaces of varying dimensions; the goal is to optimize the Bayesian posterior probability. Secondly, top-down generative models describe how objects and generic region models (e.g. texture and shading) generate the image intensities; the goal of image partition is to invert this process and represent an input image by the parameters of the generative models that best describe it, together with the boundaries of the regions and objects. Thirdly, in order to estimate these parameters we use bottom-up proposals, based on low-level cues, to guide the search through the parameter space. We test the algorithm on a wide variety of grey-level and color images, and some results are shown in the paper.

Ⅱ. The Bayesian formulation for segmentation
Let Λ be an image lattice and I_Λ an image defined on Λ. The problem of image segmentation refers to partitioning the lattice into an unknown number K of disjoint regions.

Λ = R_1 ∪ R_2 ∪ … ∪ R_K,  with R_i ∩ R_j = ∅ for all i ≠ j.
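The partition constraint, that the regions are pairwise disjoint and cover the lattice, can be sketched as a small check; the grid, the regions, and the helper name below are illustrative:

```python
def is_partition(lattice, regions):
    """Check that `regions` are pairwise disjoint and together cover `lattice`."""
    seen = set()
    for r in regions:
        if seen & r:          # overlap with an earlier region -> not disjoint
            return False
        seen |= r
    return seen == lattice    # must cover every lattice site

# A tiny 2x2 lattice of pixel coordinates split into two regions.
lattice = {(x, y) for x in range(2) for y in range(2)}
regions = [{(0, 0), (0, 1)}, {(1, 0), (1, 1)}]
```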
The likelihood p(I|W) specifies the image generating process from W to I, and the prior probability p(W) represents our prior knowledge of the world. The goal is to estimate the most probable interpretation of an input image I. This requires computing the W* that maximizes the posterior probability over Ω, the solution space of W:

W* = arg max_{W ∈ Ω} p(W|I) = arg max_{W ∈ Ω} p(I|W) p(W).

Stochastic grammar of images. One fundamental difficulty we encounter in vision is representing the enormous amount of visual knowledge needed for making robust inference from real-world images. The origin of image grammar is that certain elements of an image tend to occur together more frequently than by chance. These elements are then composed recursively to form increasingly larger units, which can share some "reusable" parts. Our production rules are graph operators, and thus the image grammar is an attributed graph grammar. The graph grammar can be embedded in an And-Or graph representation, where each Or-node points to alternative choices of sub-configuration and each And-node is decomposed into a number of parts. Each non-terminal node generates child nodes, starting with the scene label at the root, proceeding to objects and object parts, and ending with pixels as the leaves (terminal nodes). This hierarchical representation includes a dictionary Δ_gen of generative image features used in the generative models. A particular choice at the Or-nodes produces a configuration. The virtue of the grammar lies in its expressive power: it generates a very large set of configurations from a relatively small vocabulary.
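The multiplicative growth of configurations from Or-node choices can be sketched on a toy And-Or graph; the scene grammar below, with two Or-nodes of two alternatives each, is invented for illustration:

```python
from itertools import product

# A tiny And-Or graph: an And-node lists parts (all required); an Or-node
# lists alternatives (pick exactly one).  Symbols absent from the table
# are terminals (leaves).  The grammar itself is illustrative only.
graph = {
    "scene":  ("AND", ["region", "object"]),
    "region": ("OR",  ["texture", "shading"]),
    "object": ("OR",  ["face", "car"]),
}

def configurations(node):
    """Enumerate every terminal configuration the node can generate."""
    if node not in graph:                 # terminal node
        return [[node]]
    kind, children = graph[node]
    if kind == "OR":                      # one alternative per configuration
        return [c for ch in children for c in configurations(ch)]
    # AND: combine one configuration from every part
    return [sum(combo, []) for combo in product(*(configurations(ch) for ch in children))]

configs = configurations("scene")
```

Even this two-choice grammar yields 2 × 2 = 4 configurations; with deeper recursion and more Or-alternatives, the configuration set grows combinatorially from a small vocabulary, which is exactly the expressive power the text describes.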

Figure 1 shows the grammar graph for an input image. Each node of the graph has an attribute variable for its label and model parameters. The top node 0 is the scene label, and the nodes at the bottom are the image pixels. Three types of objects with different entropies are shown at nodes 1, 2, and 3. To formulate this representation, we denote:

Figure 1. The stochastic grammar for an input image.

On each non-terminal node we compute: (i) the what-is-what hypothesis for a node v in the partition graph W, using discriminative features (in this paper we use AdaBoost); (ii) the what-goes-with-what hypothesis for a horizontal edge e = <s, t> in the partition graph, given by the posterior probabilities q(e = on | F(I)) and q(e = off | F(I))

based on features measuring the dissimilarity between s and t. It has been shown that, with a sufficient number of tests, these discriminative probabilities approach the true posteriors. The discriminative probabilities are then composed on-the-fly to generate hypotheses, which are represented by importance proposal kernels Q(W → W' | Tst(I)). The solution is explored by a set A of reversible jumps, such as death-birth, split-merge, and model switching; these jumps construct the partition graph and in combination simulate an ergodic Markov chain in the solution space of W. Each type of jump a ∈ A is "informed" by proposal kernels computed by the discriminative methods and is realized by the Metropolis-Hastings method, with acceptance probability

α(W → W') = min( 1, [ Q(W' → W | Tst(I)) · p(W'|I) ] / [ Q(W → W' | Tst(I)) · p(W|I) ] ).

The Metropolis-Hastings step compares the discriminative proposal ratio with the true Bayesian posterior probability ratio, and can be considered a probabilistic version of hypothesize-and-test.
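A minimal sketch of the Metropolis-Hastings step on a toy discrete solution space follows; the three states, the posterior table, and the uniform proposal kernel are illustrative assumptions, whereas the paper's kernels are data-driven and the jumps change dimension:

```python
import random

# Minimal Metropolis-Hastings over a discrete solution space.
# `post` stands in for the (unnormalized) posterior p(W|I); `Q` stands in
# for a proposal kernel.  Both are illustrative only.
post = {"W1": 0.1, "W2": 0.6, "W3": 0.3}
states = list(post)

def Q(w_from, w_to):
    """Symmetric toy proposal: jump uniformly to one of the other states."""
    return 0.0 if w_to == w_from else 1.0 / (len(states) - 1)

def mh_step(w, rng):
    w_new = rng.choice([s for s in states if s != w])
    # Acceptance ratio: min(1, Q(W'->W) p(W'|I) / (Q(W->W') p(W|I)))
    a = min(1.0, (Q(w_new, w) * post[w_new]) / (Q(w, w_new) * post[w]))
    return w_new if rng.random() < a else w

rng = random.Random(0)
w = "W1"
visits = {s: 0 for s in states}
for _ in range(20000):
    w = mh_step(w, rng)
    visits[w] += 1
```

Over many steps the chain visits each state in proportion to its posterior mass, so the highest-posterior interpretation W2 dominates; this is the sense in which the stochastic search samples, and thereby optimizes, p(W|I).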
Ⅴ. Experiments
The image segmentation algorithm is applied to a number of outdoor and indoor images. Its speed on PCs is comparable to segmentation methods such as normalized cuts, typically running in 10-20 minutes; the main portion of the computing time is spent segmenting the generic patterns and on boundary diffusion.

Figures 3 and 4 show some examples. We present the results in two parts: one shows the segmentation boundaries for generic regions and objects, and the other shows the label map for generic regions and objects to indicate object recognition. From the segmentation results we can see that high-level knowledge helps segmentation overcome the problem of over-segmentation.