Datasets that are collected for research often contain millions of records
and may carry hidden pitfalls that are hard to detect. This work demonstrates
how visual analytics can be used to identify problems in the spatial
distribution of crawled photographic data in four datasets (Picasa Web
Albums, Panoramio, Flickr and Geograph) chosen as potential data sources
for ongoing doctoral research. This poster summary describes several
problems found in the datasets using visual analytics and suggests that
greater attention should be paid to assessing the quality of data gathered
from user-generated photographic content. This work is the first part of a
three-year PhD project aimed at producing a pedestrian-routing system that
can suggest attractive pathways extracted from user-generated photographic
content.