KCCV 2019 Tutorial

Richard Hartley

Tutorial Title: Topics in Data Analytics
I will consider some techniques in data analytics, topics lying on the boundary of machine learning and algorithmic data analysis.  These will (I hope) give some fresh views on subjects relating to Support Vector Machines, manifold learning, spectral clustering, and dimensionality reduction.  The tutorial will be a quick survey of selected topics from a 12-hour course in data analytics, covered in 120 pages of notes, which can be distributed.

The talk will cover recent research on mathematical aspects of non-learning computer vision, including work on event cameras, as well as some recent thoughts on geometric neural networks.


Michael S. Brown

Tutorial Title:  Understanding Color for Computer Vision

Color is a topic that is often not well understood in computer vision. We often assume that the red, green, and blue (i.e., RGB) values in an image represent the same information even when captured by different cameras and sensors. Many cameras now support multiple image formats, for example, Adobe RGB, standard RGB, and raw RGB.  In addition, computer vision researchers often use alternative color spaces such as L*a*b*, YUV, and HSV without considering what these color spaces actually represent.  This tutorial provides an overview of color for computer vision and aims to clarify how color is processed onboard a camera and the differences among the various color spaces commonly used in computer vision algorithms.

Invited Talk Title: Rethinking the Camera Pipeline to Improve Photographic and Scientific Applications 

The in-camera processing pipeline used in modern consumer cameras (e.g., DSLR and smartphone cameras) is essentially the same design that was used in the early digital consumer cameras of the 1990s.  While this original design has stood the test of time, there are many areas in which the current pipeline can be improved.  Moreover, with the integration of cameras into our smartphones, there are increasing demands to have our cameras operate as scientific imaging devices rather than purely photographic devices.  This talk will describe several recent works aimed at addressing limitations in the current camera pipeline, as well as ideas for designing a dual-purpose pipeline suitable for both photographic and scientific applications.


Yejin Choi

Tutorial Title: From Recognition to Cognition with Machine Commonsense

With one glance at an image, people can easily grasp the situation captured in the image and reason about the context beyond what is immediately visible. While easy for humans, this task is tremendously difficult for today’s vision systems, as the task requires moving from the recognition-level understanding of images to the cognition-level, which in turn requires commonsense reasoning about the world.

In the first part of the talk, I will introduce Visual Commonsense Reasoning (VCR), a new task and dataset that consists of 290k QA problems requiring visual and textual understanding of a situation together with commonsense reasoning: e.g., what may be the goals and mental states of people in the scene, and what might happen before and after the scene. I will then present a new reasoning engine, Recognition to Cognition Networks (R2C), that models the layered inferences necessary for VCR: grounding, contextualization, and reasoning.

In the second part of the talk, I will introduce ATOMIC: an atlas of machine commonsense organized as a graph that consists of nearly 900k pieces of if-then knowledge described in natural language. I will then present COMET: a generative model of machine commonsense. By combining representation learning of language and knowledge, COMET can learn to reason about previously unseen events and make rich commonsense predictions. I will conclude the talk by discussing future research directions that aim to further bridge the gap between computer vision and NLP research by addressing the shared challenge of modeling machine commonsense.