Data Visualization

Written and curated by Emily Potts, MS, Vivian Zhang, MS, and Cheng-Shiun Leu, PhD.

Overview

In data analysis, visualizations help you explore relationships within your dataset and communicate your results to others. Visualization tools include graphs, charts, plots, maps, and interactive dashboards, each best suited to particular settings. When designing visualizations, researchers should carefully consider the types and number of variables, the intended audience (e.g., technical vs. non-technical), and the purpose of the figure to ensure clarity and accuracy. Following visual design best practices, such as using minimal chart elements, maintaining proportional scaling, and selecting color schemes accessible to those with color blindness, supports readability and inclusivity. Data visualization is a crucial skill for clinical researchers because it enables a thorough understanding of their data and allows for the clear communication of complex relationships.

Videos

  • Data Visualization (12 minutes) 
    • Broadly discusses the importance of data visualization and how to choose the right type for various combinations of numeric and categorical variables, regardless of the statistical package being used. 
  • Revealing Best Practices in Visual Exploratory Data Analysis (35 minutes) - University of California Berkeley 
    • Describes the role and process of visually examining data, emphasizing interactively and iteratively inspecting the data.

Websites

  • Data-to-Viz | Archive 
    • Useful tool to learn the various graph types, select the most appropriate graph for your variable types, and learn common caveats to avoid. 
  • Types of Graphs - JMP Statistical Software | Archive 
    • Simple explanations for a variety of graph types (histograms, grouped bar charts, mosaic plots, and scatter plot matrices), each with an example, the number of variables the graph uses, and a description of its purpose. 
  • Data Visualizations, Charts, and Graphs - Harvard University | Archive 
    • Highlights the importance of accessibility in crafting effective visualizations, including color scheme, labeling, and supplemental formats.

Readings

R Annotated Code

R includes built-in base graphics capabilities. However, many R users extend this functionality with R packages. The most popular is ggplot2 by Hadley Wickham, which itself has a number of extension packages. 
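
As a minimal sketch of the two approaches, the example below draws the same scatter plot first with base graphics and then with ggplot2. It assumes the ggplot2 package is installed and uses R's built-in mtcars dataset purely for illustration; the variable choices and labels are not taken from any of the resources listed here.

```r
# Base R graphics: scatter plot of car weight vs. fuel efficiency
# (mtcars ships with R, so no data loading is needed)
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon",
     main = "Base R scatter plot")

# ggplot2: the same plot, built by adding layers to a plot object
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)",
       y = "Miles per gallon",
       title = "ggplot2 scatter plot")

print(p)
```

Base graphics draw directly to the graphics device, while ggplot2 stores the plot as an object (here `p`) that can be modified further before it is printed.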

  • Data Visualization with Base R | Archive 
    • Describes the five main plot types in base R (barplot, pie, hist, boxplot, and plot) and advantages and disadvantages of each. Includes detailed documentation on how to customize their text, points, lines, legends, etc.  
  • R Base Graphs | Archive 
    • Resource page containing links for generating a variety of graph types in base R, including scatter plot matrices, strip charts, histograms, and density plots.
  • Top 50 ggplot2 Visualizations - The Master List (With Full R Code) | Archive 
    • Comprehensive list of visualization types, separated by statistical topic, including correlation, distribution, change over time, and spatial pattern.
  • Data visualization with ggplot2 :: Cheat Sheet | Archive 
    • Guide to using ggplot2, organized by number and type of variables, such as two variables (one discrete, one continuous).
    • PDF available for download.
  • Visualization with ggplot2 Part 1 and Part 2 - Jeff Goldsmith at Columbia University 
    • Beginner lab tutorial on ggplot2 that starts from a basic plot and progresses to customizing aesthetics, labels, scales, etc. | Archived Part 1 & Part 2 
  • Visualizing categorical data and numerical data - Andrew Bray at Reed College 
    • Walk-through lab to understand the unique visualization needs of categorical and numerical data, with exploration of these features in the ggplot2 package.
  • R for Data Science (2e) - Hadley Wickham | Archive 
    • A more advanced text; Chapters 9-11 discuss the layered grammar of graphics in ggplot2, conducting EDA, and revising exploratory graphics for presentation.

SAS Annotated Code

SAS can produce graphical representations through procedures such as PROC SGPLOT, PROC SGSCATTER, and PROC GCHART. Users specify their data and customization options within the procedure's syntax and can export their graphs using ODS (Output Delivery System) statements, which allow the output to be saved in formats such as PDF, PNG, and HTML. Alternatively, SAS Viya / Visual Analytics offers a user-friendly, drag-and-drop interface for building interactive visualizations, suited to users without extensive coding experience. 

  • SAS Data Visualization | Archive 
    • Describes the basic SAS graphical data representations (histograms, bar charts, pie charts, scatter plots, and box plots), with explanations of syntax and variable types, common customizations, and examples.
  • Gallery of Plots and Charts - SAS 
    • Provides a comprehensive list of possible data visualizations, with an example plot, code syntax, and options for customization. 
  • Intro to SAS Notes - The Output Delivery System and Graphics - Robert Parker at the University of Florida | Archive 
    • Details the ODS that SAS uses to export graphics, as well as the plot options used to customize and control graph appearance. 
  • Explore and Visualize Data with SAS Visual Analytics - SAS (14 minutes) 
    • Describes how to access and prepare data, create interactive charts, reports, and maps, and build a preliminary predictive model in SAS Viya. 

Stata Annotated Code

Data visualizations can be produced in Stata in two ways. Users can work through the Graphics drop-down menu, selecting the graph type and customizing options in a dialog box. Alternatively, users can generate and customize plots by typing graph commands directly into the command line. 

Courses and Resources

Related Topics