Seil Na
Graduate researcher @ Vision & Learning Lab, Seoul National University


About me

Graduate researcher @ Vision & Learning Lab, Seoul National University

Publications

A Read-Write Memory Network for Movie Story Understanding
  • Seil Na, Sangho Lee, Jisung Kim (SKT), Gunhee Kim.
    Accepted at ICCV 2017
  • We propose a novel memory network model named Read-Write Memory Network (RWMN) to perform question answering tasks for large-scale, multimodal movie story understanding. The key focus of our RWMN model is the design of the read network and the write network, which consist of multiple convolutional layers that give memory read and write operations high capacity and flexibility. While existing memory-augmented network models treat each memory slot as an independent block, our use of multi-layered CNNs allows the model to read and write sequential memory cells as chunks, which is a more natural way to represent a sequential story because adjacent memory blocks often have strong correlations. For evaluation, we apply our model to all six tasks of the MovieQA benchmark and achieve the best accuracies on several tasks, especially on the visual QA task. Our model shows potential to better understand not only the content of the story, but also more abstract information, such as relationships between characters and the reasons for their actions. A minimal sketch of the convolutional read/write idea is given below.

    PDF Project (Soon)
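The following is a minimal PyTorch sketch of the convolutional read/write idea described above, not the paper's actual architecture: the single-layer 1-D convolutions, kernel sizes, and concatenation-based question fusion are simplifying assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvWriteNetwork(nn.Module):
    """Writes T segment embeddings (T x D) into memory by convolving over the
    slot axis, so each memory cell summarizes a chunk of adjacent segments."""
    def __init__(self, dim, kernel=3, stride=2):
        super().__init__()
        # 1-D convolution over the slot (time) axis; the feature dim is the channel dim.
        self.conv = nn.Conv1d(dim, dim, kernel_size=kernel, stride=stride)

    def forward(self, embeddings):            # embeddings: (T, D)
        x = embeddings.t().unsqueeze(0)       # -> (1, D, T) for Conv1d
        memory = F.relu(self.conv(x))         # -> (1, D, T')
        return memory.squeeze(0).t()          # -> (T', D) memory slots


class ConvReadNetwork(nn.Module):
    """Fuses each memory slot with the question, convolves the fused slots into
    chunk representations, and attends over them to produce one read vector."""
    def __init__(self, dim, kernel=3):
        super().__init__()
        self.fuse = nn.Linear(2 * dim, dim)   # assumption: concatenation-based fusion
        self.conv = nn.Conv1d(dim, dim, kernel_size=kernel, padding=kernel // 2)

    def forward(self, memory, question):      # memory: (T', D), question: (D,)
        q = question.expand_as(memory)        # broadcast the question to every slot
        fused = torch.tanh(self.fuse(torch.cat([memory, q], dim=-1)))
        chunks = F.relu(self.conv(fused.t().unsqueeze(0))).squeeze(0).t()
        attn = torch.softmax(chunks @ question, dim=0)    # attention over chunks
        return (attn.unsqueeze(-1) * chunks).sum(dim=0)   # (D,) read vector


# Toy usage: 40 movie segments with 300-d multimodal embeddings.
T, D = 40, 300
segments, question = torch.randn(T, D), torch.randn(D)
memory = ConvWriteNetwork(D)(segments)        # fewer, chunked memory slots
read_vector = ConvReadNetwork(D)(memory, question)
print(memory.shape, read_vector.shape)
```

Because the write and read operations are convolutions over the slot axis, neighboring memory cells are composed together rather than treated as independent blocks, which is the property the abstract emphasizes for sequential stories.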
Encoding Video and Label Priors for Multi-label Video Classification on YouTube-8M dataset
  • Seil Na, Youngjae Yu, Sangho Lee, Jisung Kim (SKT), Gunhee Kim.
    Accepted at CVPR 2017 Workshop on YouTube-8M Large-Scale Video Understanding (Oral)
  • YouTube-8M is the largest video dataset for multi-label video classification. To tackle multi-label classification on this challenging dataset, it is necessary to address several issues such as temporal modeling of videos, label imbalance, and correlations between labels. We develop a deep neural network model that consists of four components: a frame encoder, a classification layer, a label processing layer, and a loss function. We introduce our newly proposed methods and discuss how existing models behave on the YouTube-8M classification task, what insights they offer, and why they succeed (or fail) to achieve good performance. Most of our proposed models achieve much higher performance than the baseline models, and our ensemble of these models ranked 8th in the Kaggle competition. A minimal sketch of the four-component pipeline is given below.

    PDF Project
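Below is a minimal PyTorch sketch of the four-component pipeline named in the abstract (frame encoder, classification layer, label processing layer, loss function). The concrete choices here, a GRU frame encoder, a linear label-refinement step, and plain binary cross-entropy, are illustrative assumptions rather than the configurations evaluated in the paper.

```python
import torch
import torch.nn as nn

class YT8MMultiLabelModel(nn.Module):
    """Illustrative pipeline: frame encoder -> classification layer ->
    label processing layer (label correlations) -> multi-label loss."""
    def __init__(self, frame_dim, num_labels, hidden=1024):
        super().__init__()
        # (1) Frame encoder: GRU over per-frame features; the final hidden
        #     state serves as the video-level representation.
        self.encoder = nn.GRU(frame_dim, hidden, batch_first=True)
        # (2) Classification layer: video representation -> per-label logits.
        self.classifier = nn.Linear(hidden, num_labels)
        # (3) Label processing layer: a learned label-to-label transform that
        #     refines logits using correlations between labels.
        self.label_refine = nn.Linear(num_labels, num_labels)
        # (4) Loss function: binary cross-entropy over all labels (class
        #     weighting could be added here to counter label imbalance).
        self.loss_fn = nn.BCEWithLogitsLoss()

    def forward(self, frames, targets=None):    # frames: (B, T, frame_dim)
        _, h = self.encoder(frames)              # h: (1, B, hidden)
        logits = self.classifier(h.squeeze(0))   # (B, num_labels)
        logits = logits + self.label_refine(torch.sigmoid(logits))
        if targets is None:
            return torch.sigmoid(logits)         # per-label probabilities
        return self.loss_fn(logits, targets)


# Toy usage: 2 videos, 30 frames each, 1024-d frame features, 4716 labels.
model = YT8MMultiLabelModel(frame_dim=1024, num_labels=4716)
frames = torch.randn(2, 30, 1024)
targets = torch.randint(0, 2, (2, 4716)).float()
print(float(model(frames, targets)))
```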