Information-Theoretic Segmentation by Inpainting Error Maximization

Pedro Savarese¹   Sunnie S. Y. Kim²   Michael Maire³   Greg Shakhnarovich¹   David McAllester¹
¹Toyota Technological Institute at Chicago     ²Princeton University    ³University of Chicago

Illustration of our Inpainting Error Maximization (IEM) framework for completely unsupervised segmentation, applied to flowers, birds, and cars. Segmentation masks maximize the error of inpainting foreground given background and vice-versa.




We study image segmentation from an information-theoretic perspective, proposing a novel adversarial method that performs unsupervised segmentation by partitioning images into maximally independent sets. More specifically, we group image pixels into foreground and background, with the goal of minimizing predictability of one set from the other. An easily computed loss drives a greedy search process to maximize inpainting error over these partitions. Our method does not involve training deep networks, is computationally cheap, class-agnostic, and even applicable in isolation to a single unlabeled image. Experiments demonstrate that it achieves a new state-of-the-art in unsupervised segmentation quality, while being substantially faster and more general than competing approaches.

Information-theoretic segmentation

Our Inpainting Error Maximization (IEM) framework is motivated by the intuition that a segmentation into objects minimizes the mutual information between the pixels in the segments and hence makes inpainting of one segment given the others difficult. This gives a natural adversarial objective where a segmenter tries to maximize, while an inpainter tries to minimize, inpainting error. However, rather than adopt an adversarial training objective, we found it more effective to fix a basic inpainter and directly maximize inpainting error through a form of gradient descent on the segmentation. IEM is learning-free and can be applied directly to any image in any domain.
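As a rough illustration of fixing a simple inpainter and searching over the segmentation, the sketch below runs a greedy pixel-flipping search on a synthetic grayscale image. Everything in it is an assumption made for illustration: the mean-fill inpainter, the function names (`inpaint_error`, `iem_loss`, `greedy_ascent`), and the coordinate-wise search, which stands in for the gradient-based update mentioned above rather than reproducing it.

```python
import numpy as np

def inpaint_error(image, target):
    """Mean absolute error when `target` pixels are filled with the mean of the rest.

    The mean-fill inpainter is a hypothetical stand-in for a 'basic inpainter'.
    """
    visible = 1.0 - target
    if target.sum() == 0 or visible.sum() == 0:
        return 0.0  # degenerate partition: nothing to inpaint, or nothing to inpaint from
    fill = (image * visible).sum() / visible.sum()
    return float((np.abs(image - fill) * target).sum() / target.sum())

def iem_loss(image, mask):
    """Sum of foreground-from-background and background-from-foreground errors."""
    return inpaint_error(image, mask) + inpaint_error(image, 1.0 - mask)

def greedy_ascent(image, mask, sweeps=2):
    """Flip one pixel at a time and keep the flip only if the loss increases."""
    mask = mask.copy()
    best = iem_loss(image, mask)
    for _ in range(sweeps):
        for idx in np.ndindex(mask.shape):
            mask[idx] = 1.0 - mask[idx]          # tentative flip
            trial = iem_loss(image, mask)
            if trial > best:
                best = trial                     # accept the flip
            else:
                mask[idx] = 1.0 - mask[idx]      # revert the flip
    return mask, best

# Toy image: a bright 4x4 square on a dark 8x8 background.
gt = np.zeros((8, 8))
gt[2:6, 2:6] = 1.0
image = gt.copy()

# Start from a mask shifted by one pixel in each direction.
init = np.zeros((8, 8))
init[3:7, 3:7] = 1.0

refined, best = greedy_ascent(image, init)
print("initial loss:", iem_loss(image, init))
print("refined loss:", best)
print("recovered square:", np.array_equal(refined, gt))
```

On this synthetic two-tone image the search moves the shifted starting mask onto the bright square. Real images are far less clean, so the example only illustrates the direction in which the objective pushes the mask.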
Given an unlabeled image X, a mask generator module first produces segmentation masks (e.g., foreground M and background M̄). Each mask selects a subset of pixels from the original image by performing an element-wise product between the mask and the image, hence partitioning the image into regions. Inpainting modules try to reconstruct each region given all others in the partition, and the IEM loss is defined by a weighted sum of inpainting errors.
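For the two-region case, one way to write this loss down is sketched below. The notation is introduced here for illustration and is not taken from the text above: F denotes the fixed inpainter, \bar{M} = 1 - M the background mask, \odot the element-wise product, and \lambda_{fg}, \lambda_{bg} the weights of the sum. The choice of norm is left open.

```latex
\[
\mathcal{L}_{\mathrm{IEM}}(M)
  = \lambda_{\mathrm{fg}} \,\bigl\| M \odot \bigl( X - F(\bar{M} \odot X) \bigr) \bigr\|
  + \lambda_{\mathrm{bg}} \,\bigl\| \bar{M} \odot \bigl( X - F(M \odot X) \bigr) \bigr\|,
\qquad \bar{M} = 1 - M .
\]
```

The first term measures how poorly the foreground is reconstructed from the background alone, and the second term measures the reverse. Segmentation then amounts to choosing M so that this quantity is as large as possible.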
Foreground and background inpainting results using ground-truth, eroded (smaller), and dilated (bigger) masks. We see that the ground-truth mask incurs high inpainting error for both the foreground and the background, while the eroded mask allows reasonable inpainting of the foreground and the dilated mask allows reasonable inpainting of the background. Hence we expect IEM, which maximizes the inpainting error of each partition given the others, to yield a segmentation mask close to the ground-truth.
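The comparison in this figure can be mimicked numerically on a toy image. The sketch below is not the authors' code: the mean-color inpainter, the 4-neighbor `erode` and `dilate` helpers, the per-pixel normalization, and the unit weights `w_fg` and `w_bg` are all assumptions chosen to keep the example short.

```python
import numpy as np

def neighborhood(mask):
    """Stack the mask with its four axis-aligned shifts (zero padded at the border)."""
    h, w = mask.shape
    p = np.pad(mask, 1)
    return np.stack([p[1:h + 1, 1:w + 1], p[:h, 1:w + 1], p[2:, 1:w + 1],
                     p[1:h + 1, :w], p[1:h + 1, 2:]])

def erode(mask):
    """Keep a pixel only if it and its four neighbors are all set."""
    return neighborhood(mask).min(axis=0)

def dilate(mask):
    """Set a pixel if it or any of its four neighbors is set."""
    return neighborhood(mask).max(axis=0)

def inpaint_error(image, target):
    """Per-pixel error of filling `target` with the mean color of the other region."""
    visible = 1.0 - target
    if target.sum() == 0 or visible.sum() == 0:
        return 0.0
    fill = (image * visible[..., None]).sum(axis=(0, 1)) / visible.sum()  # shape (C,)
    err = np.abs(image - fill) * target[..., None]
    return float(err.sum() / (target.sum() * image.shape[-1]))

def iem_loss(image, mask, w_fg=1.0, w_bg=1.0):
    """Weighted sum of foreground and background inpainting errors."""
    return w_fg * inpaint_error(image, mask) + w_bg * inpaint_error(image, 1.0 - mask)

# Toy RGB image: a white 4x4 square on a black 8x8 background.
image = np.zeros((8, 8, 3))
image[2:6, 2:6] = 1.0
gt = np.zeros((8, 8))
gt[2:6, 2:6] = 1.0

masks = {"ground truth": gt, "eroded": erode(gt), "dilated": dilate(gt)}
losses = {name: iem_loss(image, m) for name, m in masks.items()}
for name, value in losses.items():
    print(f"{name}: {value:.2f}")
```

In this toy setting the ground-truth mask scores 2.00, the eroded mask 1.60, and the dilated mask 1.00, which matches the qualitative ordering described in the caption.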


Unsupervised segmentation results on Flowers, CUB-200-2011, and LSUN Car. Segmentation masks used for evaluation are publicly available ground-truth (Flowers, CUB-200-2011) or automatically generated with Mask R-CNN (LSUN Car).

Related Work

GrabCut - Interactive Foreground Extraction using Iterated Graph Cuts. Carsten Rother, Vladimir Kolmogorov, and Andrew Blake. SIGGRAPH 2004.
Comment: A classic weakly-supervised segmentation algorithm. We compare our IEM results to results from this method.
Unsupervised Object Segmentation by Redrawing. Mickaël Chen, Thierry Artières, and Ludovic Denoyer. NeurIPS 2019.
Comment: A GAN-based unsupervised segmentation method. We compare our IEM results to results from this method.
Emergence of Object Segmentation in Perturbed Generative Models. Adam Bielski and Paolo Favaro. NeurIPS 2019.
Comment: A GAN-based unsupervised segmentation method. We compare our IEM results to results from this method.
Big GANs Are Watching You: Towards Unsupervised Object Segmentation with Off-the-Shelf Generative Models. Andrey Voynov, Stanislav Morozov, and Artem Babenko. arXiv 2020.
Comment: This work examines the latent space of an off-the-shelf GAN (BigBiGAN), obtains saliency masks of synthetic images via latent space manipulations, and uses these masks to train a segmentation model with supervision. We don't compare IEM to it, as it involves manual examination of latent space manipulation directions.
OneGAN: Simultaneous Unsupervised Learning of Conditional Image Generation, Foreground Segmentation, and Fine-Grained Clustering. Yaniv Benny and Lior Wolf. ECCV 2020.
Comment: OneGAN, composed of multiple encoders, generators, and discriminators, solves several tasks simultaneously, including foreground segmentation. However, we don't compare IEM to it as it utilizes weak supervision from class labels and clean background images.
Time-Supervised Primary Object Segmentation. Yanchao Yang, Brian Lai, and Stefano Soatto. arXiv 2020.
Comment: This work segments objects in video by minimizing the mutual information between motion field partitions, which they approximate with an adversarial inpainting network. We likewise focus on inpainting objectives, but in a manner not anchored to trained adversaries and not reliant on video dynamics.
Inpainting Networks Learn to Separate Cells in Microscopy Images. Steffen Wolf, Fred Hamprecht and Jan Funke. BMVC 2020.
Comment: This work segments cells in microscopy images by minimizing a measure of information gain between partitions. While philosophically aligned in terms of objective, our optimization strategy and algorithm differ from this work.


Pedro Savarese (