Semantic Hierarchy Emerges in Deep Generative Representations for Scene Synthesis

Ceyuan Yang*, Yujun Shen*, Bolei Zhou

The Chinese University of Hong Kong

[Paper] [Code]



Overview

In this work, we show that a highly structured semantic hierarchy emerges from the generative representations as the variation factors for scene synthesis. By probing the layer-wise representations with a broad set of visual concepts at different abstraction levels, we can quantify the causality between the layer-wise activations and the semantics occurring in the output image. The qualitative and quantitative results suggest that the generative representations learned by a GAN specialize in synthesizing different hierarchical semantics: the early layers determine the spatial layout and configuration, the middle layers control the categorical objects, and the later layers render the scene attributes and the color scheme.
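The probing idea above can be sketched numerically. The following is a minimal, self-contained illustration (NumPy only; the sizes, the linear `semantic_score` probe, and all names are stand-ins we introduce for illustration, not the paper's actual generator or classifiers): a concept direction is applied to the code of one layer at a time, and the resulting change in a semantic score serves as a per-layer causality measure.

```python
import numpy as np

rng = np.random.default_rng(0)
num_layers, dim = 8, 64  # toy sizes; a real generator is far larger

# Stand-in per-layer generative representations (codes) for one sample.
codes = [rng.standard_normal(dim) for _ in range(num_layers)]

# Stand-in semantic scorer: in the paper's setting this role is played by an
# off-the-shelf classifier applied to the synthesized image; here it is a
# fixed linear probe acting directly on the code.
probe = rng.standard_normal(dim)

def semantic_score(code):
    return float(probe @ code)

def causality(codes, layer, direction, strength=2.0):
    """Score change when only `layer`'s code is pushed along `direction`."""
    before = semantic_score(codes[layer])
    after = semantic_score(codes[layer] + strength * direction)
    return after - before

# Unit direction for the concept (e.g. from a linear boundary in latent space).
direction = probe / np.linalg.norm(probe)
per_layer = [causality(codes, l, direction) for l in range(num_layers)]
# Layers with larger |score change| are the ones causally tied to the concept.
```

With a real generator and classifier, the layers whose perturbation changes the score most are the ones identified as controlling that concept.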


Results

Identifying such a set of manipulable latent variation factors enables semantic scene manipulation.
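As a concrete (hypothetical) illustration of such manipulation, the sketch below edits a StyleGAN-style per-layer latent code only at selected layers, following the early/middle/late division described above. The layer ranges, code dimensions, and function names are assumptions made for illustration, not the exact configuration used in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
num_layers, dim = 14, 512  # assumed StyleGAN-like generator shape

# One latent code broadcast to every layer; per-layer copies can then diverge.
w = np.tile(rng.standard_normal(dim), (num_layers, 1))

# Unit direction for a target concept, e.g. obtained from a linear boundary.
direction = rng.standard_normal(dim)
direction /= np.linalg.norm(direction)

def edit_layers(w, direction, layers, strength):
    """Move the code along `direction` only at `layers`, leaving others intact."""
    w_edit = w.copy()
    w_edit[list(layers)] += strength * direction
    return w_edit

# Early layers: layout; middle layers: objects; late layers: attributes/color.
w_layout = edit_layers(w, direction, range(0, 4), strength=3.0)
w_object = edit_layers(w, direction, range(4, 8), strength=3.0)
w_attrib = edit_layers(w, direction, range(8, 14), strength=3.0)
```

Feeding each edited code to the generator would then change only the corresponding level of the semantic hierarchy, leaving the other levels largely intact.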


More results on various scenes are shown in the following video.



Reference

@article{yang2019semantic,
    title = {Semantic Hierarchy Emerges in Deep Generative Representations for Scene Synthesis},
    author = {Yang, Ceyuan and Shen, Yujun and Zhou, Bolei},
    journal = {arXiv preprint arXiv:1911.09267},
    year = {2019}
}

Other related work:

B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba. Object Detectors Emerge in Deep Scene CNNs. ICLR, 2015.
Comment: Employs AMT workers to observe emergent interpretable object detectors inside a CNN trained for scene classification.

D. Bau, J.-Y. Zhu, H. Strobelt, B. Zhou, J. B. Tenenbaum, W. T. Freeman, and A. Torralba. GAN Dissection: Visualizing and Understanding Generative Adversarial Networks. ICLR, 2019.
Comment: Investigates the internals of a GAN, and shows how neurons can be directly manipulated to change the behavior of a generator.

D. Bau, H. Strobelt, W. Peebles, J. Wulff, B. Zhou, J.-Y. Zhu, and A. Torralba. Semantic Photo Manipulation with a Generative Image Prior. SIGGRAPH, 2019.
Comment: Applies GAN dissection to the manipulation of user-provided real photographs.

L. Goetschalckx, A. Andonian, A. Oliva, and P. Isola. GANalyze: Toward Visual Definitions of Cognitive Image Properties. ICCV, 2019.
Comment: Navigates the manifold in the latent space to make images more or less memorable.

Y. Shen, J. Gu, X. Tang, and B. Zhou. Interpreting the Latent Space of GANs for Semantic Face Editing. CVPR, 2020.
Comment: Proposes a technique for semantic face editing in the latent space.

A. Jahanian, L. Chai, and P. Isola. On the "Steerability" of Generative Adversarial Networks. ICLR, 2020.
Comment: Shifts the distribution by "steering" the latent code to change camera motion and image color tone.