Error Detection in Egocentric Procedural Task Videos

Egocentric Procedural ERror (EgoPER) Dataset

Shih-Po Lee¹

Zijia Lu¹

Zekun Zhang²

Minh Hoai²

Ehsan Elhamifar¹

¹Northeastern University

²Stony Brook University

What is EgoPER

The dataset contains egocentric procedural task videos, along with other modalities such as audio, depth, and hand tracking, for five cooking tasks: pinwheels, coffee, quesadilla, tea, and oatmeal. Besides correct (normal) videos, the EgoPER dataset contains erroneous (abnormal) videos covering five error categories: slip, correction, modification, addition, and omission.

Characteristics

28 Hours of Recording
5 Cooking Tasks
Erroneous Videos
5 Error Types
Multiple Modalities
Frame-wise Step Labels
Object Bounding Boxes
Active Object Labels

Challenges

Temporal Action Segmentation
Procedural Error Detection
Action Recognition
Active Object Detection
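
Frame-wise step labels of the kind listed above are often collapsed into contiguous segments before they are used for temporal action segmentation. The following is a minimal sketch of that conversion, assuming the labels arrive as a plain Python list with one entry per frame. The function name `frames_to_segments` and the step names are hypothetical and are not taken from the EgoPER annotation files or code.

```python
from itertools import groupby


def frames_to_segments(frame_labels):
    """Collapse a per-frame label sequence into (label, start, end) runs.

    `start` is inclusive and `end` is exclusive, both as frame indices.
    """
    segments = []
    start = 0
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)  # number of consecutive frames with this label
        segments.append((label, start, start + length))
        start += length
    return segments


# Hypothetical labels for a six-frame clip; the step names are illustrative only.
labels = ["background", "background", "pour water", "pour water", "pour water", "stir"]
segments = frames_to_segments(labels)
```

Here `segments` holds one tuple per run of identical labels, so a step that is repeated later in the video yields a separate segment rather than being merged with its earlier occurrence.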

Error Taxonomy

1) Step Omission

corresponds to skipping one or more steps, e.g., not checking the water temperature in the kettle, or not putting bananas on the tortilla.

2) Step Addition

corresponds to having unnecessary extra steps that are not in the task graph, e.g., pouring sugar onto the tortilla or sprinkling cinnamon into the mug.

3) Step Modification

corresponds to performing a step in a different way than the one specified by the recipe, e.g., scooping nut butter with a spoon or pouring water without a circular motion. This does not necessarily change the outcome of the step.

4) Step Slip

corresponds to executing a step in a way that fails to achieve the goal of the step, e.g., adding water to a different bowl from the one containing oats, or dropping the tortilla on the floor.

5) Step Correction

corresponds to performing an action to mitigate the effect of a slip error, e.g., transferring water from the second bowl to the one containing oats, or discarding the tortilla on the floor and picking a new one.
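
The five categories above can be represented as a small enumeration. The sketch below is illustrative only: it is not the dataset's annotation format or the authors' detection method. It assumes a flat list of required steps in place of a task graph, and the names `ErrorType` and `omissions_and_additions` and the step strings are hypothetical. Only omission and addition are checked, since modification, slip, and correction depend on how a step is performed and cannot be read from step names alone.

```python
from enum import Enum


class ErrorType(Enum):
    """The five error categories described in the taxonomy."""
    OMISSION = "omission"
    ADDITION = "addition"
    MODIFICATION = "modification"
    SLIP = "slip"
    CORRECTION = "correction"


def omissions_and_additions(required_steps, observed_steps):
    """Flag required steps that never occur and observed steps that are not required.

    Assumption: the recipe is a flat list of step names, so step order is ignored.
    """
    required = set(required_steps)
    observed = set(observed_steps)
    flagged = [(ErrorType.OMISSION, s) for s in required_steps if s not in observed]
    # dict.fromkeys removes repeated observations while preserving first-seen order.
    flagged += [(ErrorType.ADDITION, s) for s in dict.fromkeys(observed_steps) if s not in required]
    return flagged


# Hypothetical step names, loosely based on the examples in the taxonomy.
required = ["place tortilla", "spread nut butter", "put bananas on tortilla", "roll tortilla"]
observed = ["place tortilla", "spread nut butter", "pour sugar onto tortilla", "roll tortilla"]
errors = omissions_and_additions(required, observed)
```

In this example `errors` contains one omission (the skipped banana step) and one addition (the sugar step). A check against an actual task graph would also need to account for step ordering and optional steps, which this sketch leaves out.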

Active Object Detection

Dataset Statistics

Download & Code

Please access the dataset, annotations, and code from our GitHub repository.

Citation

Cite our CVPR paper

@InProceedings{Lee_2024_CVPR,
    author    = {Lee, Shih-Po and Lu, Zijia and Zhang, Zekun and Hoai, Minh and Elhamifar, Ehsan},
    title     = {Error Detection in Egocentric Procedural Task Videos},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {18655-18666}
}