Data


MOSCATO dataset: Predicting Multiple Object State Change through Actions

  • MOSCATO is a new benchmark for predicting the evolving states of multiple objects through long videos consisting of multiple actions. In each video, multiple objects change state, and the state of each object may change several times depending on the actions performed (a minimal annotation sketch follows this entry).

  • The dataset can be downloaded from the link below. When using the dataset in your work, you should cite the following paper:

    P. Zameni, Y. Shen and E. Elhamifar, MOSCATO: Predicting Multiple Object State Change through Actions,
    International Conference on Computer Vision (ICCV), 2025.

    Dataset Page
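  • To make the annotation structure concrete, here is a minimal Python sketch of one way to represent per-object state changes and query an object's state at a given time. The object names, states, timestamps, and the overall format are invented for illustration; the actual MOSCATO schema is documented on the dataset page.

    from bisect import bisect_right

    # Hypothetical annotation: for each object, a time-ordered list of
    # (time_sec, new_state) pairs, each induced by an action in the video.
    annotations = {
        "egg": [(3.0, "whole"), (12.5, "cracked"), (40.0, "cooked")],
        "pan": [(8.0, "empty"), (35.0, "holding-egg")],
    }

    def state_at(obj, t):
        """Return the annotated state of `obj` at time t (seconds),
        or None before its first annotated state."""
        changes = annotations[obj]
        times = [time for time, _ in changes]
        i = bisect_right(times, t)  # number of state changes at or before t
        return changes[i - 1][1] if i else None

    print(state_at("egg", 20.0))  # -> "cracked"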

Multi-task Egocentric Kitchen Activities (MEKA) dataset

  • MEKA is an egocentric dataset for multi-task activity understanding and multi-task temporal action segmentation (MT-TAS). Built upon the EgoPER dataset, it contains multi-task videos, where each video consists of interleaved actions/steps from several tasks (illustrated in the sketch below).

  • The dataset can be downloaded from the link below. When using the dataset in your work, you should cite the following paper:

    Y. Shen and E. Elhamifar, Error Detection in Egocentric Procedural Task Videos,
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2025.

    Dataset Page
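  • As an illustration of what "interleaved" means here, the following Python sketch groups a stream of hypothetical segment annotations back into per-task timelines, which is the grouping an MT-TAS model must implicitly perform. The segment format and the task/step names are invented; consult the dataset page for the actual schema.

    from collections import defaultdict

    # Hypothetical segments: (start_sec, end_sec, task, step), interleaved
    # across tasks within a single video.
    segments = [
        (0.0, 14.0, "make coffee", "grind beans"),
        (14.0, 30.0, "make oatmeal", "boil water"),
        (30.0, 48.0, "make coffee", "brew"),
        (48.0, 60.0, "make oatmeal", "add oats"),
    ]

    # Recover one timeline per task from the interleaved stream.
    per_task = defaultdict(list)
    for start, end, task, step in segments:
        per_task[task].append((start, end, step))

    for task, steps in sorted(per_task.items()):
        print(task, steps)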

EgoPER Dataset for Procedural Error Understanding

  • EgoPER is an egocentric dataset for error understanding in procedural videos. It contains multimodal data (RGB, depth, audio, gaze, and hands) along with annotations of steps and bounding boxes of objects and active objects. The dataset contains both normal and error videos from 5 different cooking tasks (a sample loading sketch follows this entry).

  • The dataset can be downloaded from the link below. When using the dataset in your work, you should cite the following paper:

    S. Lee, Z. Lu, Z. Zhang, M. Hoai and E. Elhamifar, Error Detection in Egocentric Procedural Task Videos,
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024.

    Dataset Page
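  • The Python sketch below shows one plausible way such step and object-box annotations could be loaded and the erroneous steps filtered out. The file names and JSON schema are assumptions made for illustration only; the actual EgoPER layout is described on the dataset page.

    import json
    from pathlib import Path

    def load_annotations(video_dir):
        """Load hypothetical step and object-box annotations for one video."""
        video_dir = Path(video_dir)
        with open(video_dir / "steps.json") as f:
            # e.g. [{"step": ..., "start": ..., "end": ..., "is_error": ...}]
            steps = json.load(f)
        with open(video_dir / "boxes.json") as f:
            # e.g. {frame: [{"label": ..., "active": ..., "bbox": [...]}]}
            boxes = json.load(f)
        return steps, boxes

    def error_steps(steps):
        """Return only the steps annotated as erroneous."""
        return [s for s in steps if s.get("is_error")]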

ProceL Dataset for Learning from Instructional Videos

  • ProceL is a multimodal procedural learning dataset for research on instructional video understanding.

  • The dataset consists of 47.3 hours of annotated video across 720 videos spanning 12 diverse tasks. For every task, an instruction grammar is built, and each video is annotated with the beginning and ending times of every key-step in the grammar (see the sketch at the end of this entry). The dataset can be downloaded from the link below. When using the dataset in your work, you should cite the following paper:

    E. Elhamifar and Z. Naing, Unsupervised Procedure Learning via Joint Dynamic Summarization,
    International Conference on Computer Vision (ICCV), 2019.

    Dataset Page
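  • To illustrate how key-step annotations relate to the task grammar, the short Python sketch below aggregates annotated time per key-step and reports which grammar steps a given video skips. The grammar and segments are invented stand-ins; the real annotation files are described on the dataset page.

    from collections import defaultdict

    # Hypothetical key-step grammar for one task, and one video's annotated
    # (begin_sec, end_sec, key_step) segments.
    grammar = ["position jack", "raise car", "remove old tire", "mount spare"]
    video_segments = [
        (12.0, 30.5, "position jack"),
        (30.5, 55.0, "raise car"),
        (60.0, 95.0, "remove old tire"),
    ]

    # Total annotated time per key-step, and grammar steps never observed.
    duration = defaultdict(float)
    for begin, end, step in video_segments:
        duration[step] += end - begin

    missing = [s for s in grammar if s not in duration]
    print(dict(duration), missing)  # "mount spare" is skipped in this video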