2024 VIS Area Curation Committee Executive Summary

Introduction

This report summarizes the findings, recommendations, and process of the VIS Area Curation Committee (ACC) regarding the areas and keywords used for paper submissions to IEEE VIS 2024. It builds on the 2021, 2022, and 2023 ACC reports, updated with the 2024 data. According to the Charter, the goal of this committee is to analyze and report how submissions made use of the areas and keywords to describe their contribution. It is important to understand when these descriptors no longer adequately cover the breadth of research presented at VIS.

We use submission and bidding information from VIS 2024 to analyze recent trends following the move to an area model. We also conducted a survey among authors who submitted to VIS about their satisfaction with the area model.

The full data and source code to rebuild this project are available here.

  • Committee members 2024: Jean-Daniel Fekete (co-chair), Alexander Lex (co-chair), Helwig Hauser, Ingrid Hotz, David Laidlaw, Torsten Möller, Michael Papka, Danielle Szafir, Yingcai Wu.
  • Committee members 2023: Steven Drucker (chair), Jean-Daniel Fekete, Ingrid Hotz, David Laidlaw, Alexander Lex, Torsten Möller, Michael Papka, Hendrik Strobelt, Shigeo Takahashi.
  • Committee members 2022: Steven Drucker (chair), Ingrid Hotz, David Laidlaw, Heike Leitte, Torsten Möller, Carlos Scheidegger, Hendrik Strobelt, Shigeo Takahashi, Penny Rheingans.
  • Committee members 2021: Alex Endert (chair), Steven Drucker (next chair), Issei Fujishiro, Christoph Garth, Heidi Lam, Heike Leitte, Carlos Scheidegger, Hendrik Strobelt, Penny Rheingans.

Last edited: 2024-10-16.

Executive Summary

  • Overall, the area model appears to be successful.
  • There is substantial growth in the Applications area, and we therefore recommend taking action to address the workload of its APCs.
  • Acceptance rates declined this year, driven by lower first-round acceptance rates and more second-round rejections than usual (some of which are likely one-time effects).
  • Variability of acceptance rates between areas is substantial.

Otherwise, our analysis suggests that submissions are relatively balanced across areas, keywords are (with a small exception) well distributed, and the unified PC appears to provide broad and overlapping coverage.

Recommendations

The main issue that needs addressing for 2026 is the increasing size of the Applications area. The target area size is approximately 100 submissions, with a lower bound of 50 and an upper bound of 150. After several years of gradual growth (2021: 100; 2023: 123), the Applications area reached 154 papers in 2024. The Systems & Rendering area is only slightly above the lower bound with 51 papers, as is the Data Transformations area with 53 papers.
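The band check described above can be sketched in a few lines. This is a hypothetical helper, not part of the committee's tooling; the counts are the 2024 figures quoted above, and the remaining three areas are omitted because their exact 2024 sizes are not listed here.

```python
# Classify an area's submission count against the stated band of 50-150 papers.
LOW, HIGH = 50, 150

def band_status(n, low=LOW, high=HIGH):
    """Return 'over', 'under', or 'ok' for a given submission count."""
    if n > high:
        return 'over'
    if n < low:
        return 'under'
    return 'ok'

# 2024 counts quoted in the text above (other areas omitted).
counts_2024 = {
    'Applications': 154,
    'Systems & Rendering': 51,
    'Data Transformations': 53,
}
flags = {area: band_status(n) for area, n in counts_2024.items()}
# Applications is flagged 'over'; the other two are 'ok' but sit near the lower bound.
```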

We see three possible remedies for this uneven workload:

Option 1: Area Split

The most obvious choice would be to split the areas that have grown too large. While this would be sensible for areas with a “natural” fault line, like Theoretical and Empirical, it seems more fraught for Applications. It is unclear how a sensible split of Applications could be executed that is not confusing to authors. One possibility would be to split by application domain (e.g., Applications in AI/ML and all other Applications), but this would lead to ambiguity for a number of papers.

Option 2: Additional APCs for Large Areas

The main motivation for keeping areas to similar sizes is to manage workload for APCs. An alternative approach to a split would be to add additional APCs for large areas (e.g., a third APC for applications). While it would reduce workload slightly, there are several downsides to this approach:

  • It would necessitate recruiting another senior researcher as an APC, and recruiting tends to be difficult.
  • It would grow the overall size of the APC while the number of papers is not growing substantially (it is still lower than in the outlier year 2020).
  • It might make decision-making more complicated.

Option 3: Introducing Secondary Areas and Moving Papers

An alternative approach would be to introduce a secondary area for all paper submissions, and then balance papers after the initial submission. Author feedback shows that a second-choice area would be desirable.

The advantages of this approach:

  • It could be used not only to alleviate issues in large areas, but also to move papers to areas that do not receive enough submissions, creating a more balanced load overall.
  • It would be a relatively minor change from the authors’ perspective.

The downsides of this approach:

  • It adds a step to the review process. While papers are currently moved between areas on rare occasions, this would make moving papers a standard part of the process.
  • Authors could be dissatisfied if their paper is not reviewed in their primary area.

The ACC currently favors Option 3 as it seems most promising in balancing areas. However, we recommend eliciting feedback from APCs and OPCs on these options.
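The rebalancing step in Option 3 could work along the following lines. This is a hypothetical greedy sketch for illustration only; the `papers` structure (paper ID, primary area, secondary area) and the target of 100 papers per area are assumptions, not an agreed-upon procedure.

```python
# Greedy sketch of Option 3: move papers out of oversized areas into their
# authors' secondary area, as long as that area has spare capacity.
from collections import Counter

TARGET = 100  # assumed per-area target, per the ~100-submission goal above

def rebalance(papers, target=TARGET):
    """papers: list of (paper_id, primary_area, secondary_area) tuples.
    Returns {paper_id: area} after greedily moving papers from areas
    above `target` into secondary areas below `target`."""
    assignment = {pid: prim for pid, prim, _ in papers}
    load = Counter(assignment.values())
    for pid, prim, seco in papers:
        if load[prim] > target and load[seco] < target:
            assignment[pid] = seco
            load[prim] -= 1
            load[seco] += 1
    return assignment
```

A real procedure would likely also weigh reviewer expertise and keyword fit before moving a paper, rather than order of submission as this greedy pass does.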

Survey Results

Regarding authors’ satisfaction with the areas, from the IEEE VIS Areas Feedback Survey 2024:

  • We received 103 responses to our survey. This is not a particularly high return rate, but it is a good absolute number.
  • Most responses came from authors who submitted to areas 1 (theory…), 2 (applications), and 4 (representations…), and fewer from areas 3 (systems), 5 (data transformations), and 6 (analytics).
  • Very few (6) felt that they could not find a suitable area; a fair share of the authors (25 of 103), however, stated that they could also have submitted to another area.
  • Eight authors who submitted to Applications would also have submitted to another area.
  • Area 6 came up most often as a possible alternative area.
  • By and large, most authors do not see an immediate need to change the areas.
  • Quite a few authors (18 of 103) thought that area 4 needs improvement.
  • Also quite a few (12) thought that area 6 should be improved.
  • When asked about merging or splitting areas, “splitting area 2 (applications)” came up most often.
  • Authors sent a strong signal that the option to indicate a second-choice area would be welcome.

Quantitative Analysis

Code
import itertools

import pandas as pd
import numpy as np

# Import the necessary plotting libraries
import plotly.offline as pio
import plotly.graph_objs as go
import plotly.express as px
# [jdf] no need to specify the renderer but, for interactive use, init_notebook should be called
# pio.renderers.default = "jupyterlab"
# Set notebook mode to work in offline
# pio.init_notebook_mode()
# pio.init_notebook_mode(connected=True)
width = 750

import sqlite3

#### Data Preparation

# static data – codes -> names etc.
staticdata = dict(
    decision = { 
        'C': 'Confer vs. cond Accept', # note: the 2020 and 2021 data use this code with a different meaning
        'A': 'Accept', # for the 2020 data
        'A2': 'Accept', # after the second round, should be 120 in 2022
        'R': 'Reject', # reject after the first round -- should be 322 in 2022
        'R2': 'Reject in round 2', # reject after the second round -- should be 2 in 2022
        'R-2nd': 'Reject in round 2', 
        'DR-S': 'Desk Reject (Scope)', # should be 7 in 2022
        'DR-P': 'Desk Reject (Plagiarism)', # should be 4 in 2022
        'AR-P': 'Admin Reject (Plagiarism)', # should be 1 in 2022
        'DR-F': 'Desk Reject (Format)', # should be 4 in 2022
        'R-Strong': 'Reject Strong', # cannot resubmit to TVCG for a year
        'T': 'Reject TVCG fasttrack', # Explicitly invited to resubmit to TVCG, status in major revision
    },
    FinalDecision = { # Just flatten to Accept and Reject
        'C': 'Accept', 
        'A': 'Accept', # for the 2020 data
        'A2': 'Accept', # after the second round, should be 120 in 2022
        'R': 'Reject', # reject after the first round -- should be 322 in 2022
        'R2': 'Reject', # reject after the second round -- should be 2 in 2022
        'R-2nd': 'Reject', 
        'DR-S': 'Reject', # should be 7 in 2022
        'DR-P': 'Reject', # should be 4 in 2022
        'AR-P': 'Reject', # should be 1 in 2022
        'DR-F': 'Reject', # should be 4 in 2022
        'R-Strong': 'Reject',
        'T': 'Reject',
    },
    area = {
        'T&E': 'Theoretical & Empirical',
        'App': 'Applications',
        'S&R': 'Systems & Rendering',
        'R&I': 'Representations & Interaction',
        'DTr': 'Data Transformations',
        'A&D': 'Analytics & Decisions',
    },
    bid = { 
        0: 'no bid',
        1: 'want',
        2: 'willing',
        3: 'reluctant',
        4: 'conflict'
    },
    stat = {
        'Prim': 'Primary', 
        'Seco': 'Secondary'
    },
    keywords = pd.read_csv("../data/2021/keywords.csv", sep=';'), # 2021 is correct as there was no new keywords file in 2022
    colnames = {
        'confsubid': 'Paper ID',
        'rid': 'Reviewer',
        'decision': 'Decision',
        'area': 'Area',
        'stat': 'Role',
        'bid': 'Bid'
    }
)

dbcon = sqlite3.connect('../data/vis-area-chair.db') #[jdf] assume data is in ..

submissions_raw20 = pd.read_sql_query('SELECT * from submissions WHERE year = 2020', dbcon, 'sid')
submissions_raw21 = pd.read_sql_query('SELECT * from submissions WHERE year = 2021', dbcon, 'sid')
submissions_raw22 = pd.read_sql_query('SELECT * from submissions WHERE year = 2022', dbcon, 'sid')
submissions_raw23 = pd.read_sql_query('SELECT * from submissions WHERE year = 2023', dbcon, 'sid')
submissions_raw24 = pd.read_sql_query('SELECT * from submissions WHERE year = 2024', dbcon, 'sid')
submissions_raw = pd.read_sql_query('SELECT * from submissions', dbcon, 'sid')
#print(submissions_raw24)

submissions = (submissions_raw
    .join(
        pd.read_sql_query('SELECT * from areas', dbcon, 'aid'), 
        on='aid'
    )
    .assign(Keywords = lambda df: (pd
        .read_sql_query('SELECT * FROM submissionkeywords', dbcon, 'sid')
        .loc[df.index]
        .join(
            pd.read_sql_query('SELECT * FROM keywords', dbcon, 'kid'), 
            on='kid'
        )
        .keyword
        .groupby('sid')
            .apply(list)
    ))
    .assign(**{'# Keywords': lambda df: df.Keywords.apply(len)})
    .assign(**{'FinalDecision': lambda df: df['decision']})
    .replace(staticdata)
    .rename(columns = staticdata['colnames'])
    .drop(columns = ['legacy', 'aid'])
#    .set_index('sid')
#    .set_index('Paper ID')
# note -- I changed the index, since 'Paper ID' was not unique for multiple years.
# By not setting the index to 'Paper ID' the index remains with 'sid'.
# However, 'sid' is used as a unique index in the creation of the database anyways.
)

# replace the old 'Paper ID' with a unique identifier, so that the code from 2021 will work
submissions = submissions.rename(columns = {'Paper ID':'Old Paper ID'})
submissions.reset_index(inplace=True)
submissions['Paper ID'] = submissions['sid']
submissions = submissions.set_index('Paper ID')
#submissions colums: (index), sid (unique id), Paper ID (unique), Old Paper ID, Decision, year, Area, Keywords (as a list), # Keywords

all_years = submissions['year'].unique()

#rates_decision computes the acceptance rates (and total number of papers) per year
#rates_decision: (index), Decision, year, count, Percentage
rates_decision = (submissions
    .value_counts(['Decision', 'year'])
    .reset_index()
    # .rename(columns = {0: 'count'})
)
rates_decision['Percentage'] = rates_decision.groupby(['year'])['count'].transform(lambda x: x/x.sum()*100)
rates_decision = rates_decision.round({'Percentage': 1})
#rates_decision_final computes the flattened (accept/reject) rates per year
#rates_decision_final: (index), FinalDecision, year, count, Percentage
rates_decision_final = (submissions
    .value_counts(['FinalDecision', 'year'])
    .reset_index()
    # .rename(columns = {0: 'count'})
)
rates_decision_final['Percentage'] = rates_decision_final.groupby(['year'])['count'].transform(lambda x: x/x.sum()*100)
rates_decision_final = rates_decision_final.round({'Percentage': 1})
#submissions
#bids_raw: (index), Reviewer ID, sid (unique paper identifier over mult years), match score, bid of the reviewer, role of the reviewer, Paper ID
bids_raw = (pd
    .read_sql_query('SELECT * from reviewerbids', dbcon)
    .merge(submissions_raw['confsubid'], on='sid')
    .replace(staticdata)
    .rename(columns = staticdata['colnames'])
)
#bids_raw

## Renaming Paper ID to Old Paper ID, setting Paper ID to sid, keeping all 3 for now...
bids_raw = bids_raw.rename(columns = {'Paper ID':'Old Paper ID'})
bids_raw['Paper ID'] = bids_raw['sid']
# bids = Reviewer, sid, Bid (how the reviewer bid on this paper)
#      doesn't include review/sid that were not bid for [.query('Bid != "no bid"')]
bids = (bids_raw
    .query('Bid != "no bid"')
# Paper ID is not unique over multiple years!
#    .drop(columns = ['sid'])
#    [['Reviewer','Paper ID', 'Bid']]
    [['Reviewer','sid', 'Paper ID', 'Bid']]
    .reset_index(drop = True)
)

# matchscores: a Reviewer x Paper ID table of match scores.
# Many entries will be NaN since multiple years are combined.
# Note: 'Paper ID' was reset above to the unique 'sid', so it is safe as an index,
# but we still need to check whether reviewer IDs remain unique across the years!
matchscores = (bids_raw
    [['Reviewer', 'sid', 'Paper ID', 'match']]
    .set_index(['Reviewer', 'Paper ID'])
    .match
    .unstack(level=1)
)

# assignments = Reviewer, sid, Role (primary, secondary)
#      doesn't include review/sid that were not assigned [.query('Role != ""')]
assignments = (bids_raw
    .query('Role != ""')
# Paper ID is not unique over multiple years!
#    [['Reviewer', 'Paper ID', 'Role']]
    [['Reviewer', 'sid', 'Paper ID', 'Role']]
    .reset_index(drop = True)
)

del dbcon

#### Plot Defaults

acc_template = go.layout.Template()

acc_template.layout = dict(
    font = dict( 
        family='Fira Sans',
        color = 'black',
        size = 13
    ),
    title_font_size = 14,
    plot_bgcolor = 'rgba(255,255,255,0)',
    paper_bgcolor = 'rgba(255,255,255,0)',
    margin = dict(pad=10),
    xaxis = dict(
        title = dict( 
            font = dict( family='Fira Sans Medium', size=13 ),
            standoff = 10
        ),
        gridcolor='lightgray',
        gridwidth=1,
        automargin = True,
        fixedrange = True,
    ),
    yaxis = dict(
        title = dict( 
            font = dict( family='Fira Sans Medium', size=13 ),
            standoff = 10,
        ),
        gridcolor='lightgray',
        gridwidth=1,
        automargin = True,
        fixedrange = True,
    ),
    legend=dict(
        title_font_family="Fira Sans Medium",
    ),
    colorway = px.colors.qualitative.T10,
    hovermode = 'closest',
    hoverlabel=dict(
        bgcolor="white",
        bordercolor='lightgray',
        font_color = 'black',
        font_family = 'Fira Sans'
    ),
)

acc_template.data.bar = [dict(
    textposition = 'inside',
    insidetextanchor='middle',
    textfont_size = 12,
)]

px.defaults.template = acc_template

px.defaults.category_orders = {
    'Decision': list(staticdata['decision'].values()),
    'FinalDecision':  list(staticdata['FinalDecision'].values()),
    'Area': list(staticdata['area'].values()),
    'Short Name': staticdata['keywords']['Short Name'].tolist(),
}

config = dict(
    displayModeBar = False,
    scrollZoom = False,
    responsive = False
)