The visual system can make highly efficient aggregate judgements about a set of objects, with speed roughly independent of the number of objects considered. While there is a rich literature on these mechanisms and their ramifications for visual summarization tasks, this prior work rarely considers more complex tasks requiring multiple judgements over long periods of time, and has not considered certain critical aggregation types, such as the localization of the mean value of a set of points. In this paper, we explore these questions using a common visualization task as a case study: relative mean value judgements within multi-class scatterplots. We describe how the perception literature provides a set of expected constraints on the task, and evaluate these predictions with a large-scale perceptual study with crowd-sourced participants. Judgements are no harder when each set contains more points, redundant and conflicting encodings, as well as additional sets, do not strongly affect performance, and judgements are harder when using less salient encodings. These results have concrete ramifications for the design of scatterplots.