Stories are more expressive in style, language and content, involving imaginary concepts not explicit in the images.An ideal deep learning system should learn and develop cohesive, meaningful, and causal stories.Unfortunately, most existing storytelling methods are trained and evaluated on a single dataset, i.e.
, the VIsual STorytelling (VIST) dataset.Multiple datasets are essential to test the generalization ability of algorithms.We bridge the gap and present a new dataset for expressive and coherent story creation.We present the Sequential Storytelling Image Dataset (SSID,
org/documents/sequential-storytelling-image-dataset-ssid
Moreover, our dataset achieves lower mean average scores across all metrics, meaning that the ground truth stories of our dataset are more diverse.Finally, we train and evaluate existing state-of-the-art rhetorical storytelling methods on both datasets and show that our dataset is more challenging and requires sophisticated techniques to accurately detect a significant variety of events.