Data Visualization
Written and curated by Emily Potts, MS, Vivian Zhang, MS, and Cheng-Shiun Leu, PhD.
Overview
In data analysis, visualizations are useful for exploring relationships within a dataset and for communicating results to others. Visualization tools include graphs, charts, plots, maps, and interactive dashboards, each suited to particular settings. When designing visualizations, researchers should carefully consider the types and number of variables, the intended audience (e.g., technical vs. non-technical), and the purpose of the figure to ensure clarity and accuracy. Following visual design best practices, such as using minimal chart elements, maintaining proportional scaling, and selecting color schemes accessible to those with color blindness, makes figures easier to read and more inclusive. Data visualization is a crucial skill for clinical researchers because it supports a thorough understanding of their data and allows for the clear communication of complex relationships.
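As a rough illustration of how the types and number of variables can guide chart choice, the sketch below maps one or two variables to a commonly used default plot. The function name `suggest_chart` and the mapping itself are assumptions made for this example, not rules taken from any of the linked resources; audience and purpose should still drive the final decision.

```python
# Illustrative rule-of-thumb lookup. The function name and the mapping are
# assumptions for this sketch, not an authoritative standard.

def suggest_chart(variable_types):
    """Suggest a common default chart for a list of variable types.

    variable_types: list of strings, each "numeric" or "categorical".
    """
    allowed = {"numeric", "categorical"}
    if not variable_types or any(v not in allowed for v in variable_types):
        raise ValueError('each variable type must be "numeric" or "categorical"')

    # Count how many variables of each kind were supplied.
    n_numeric = variable_types.count("numeric")
    n_categorical = variable_types.count("categorical")

    # Keys are (number of numeric variables, number of categorical variables).
    suggestions = {
        (1, 0): "histogram",
        (0, 1): "bar chart",
        (2, 0): "scatter plot",
        (1, 1): "box plot by group",
        (0, 2): "grouped bar chart or mosaic plot",
    }

    # Three or more variables have no single default in this sketch.
    return suggestions.get(
        (n_numeric, n_categorical),
        "no single default; consider faceted panels or a scatter plot matrix",
    )


print(suggest_chart(["numeric", "categorical"]))  # box plot by group
```

The lookup is deliberately small: it covers only the one- and two-variable cases and returns a generic prompt for anything larger, since those choices depend heavily on the research question.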
Videos
- Data Visualization (12 minutes)
- Broadly discusses the importance of data visualization and how to choose the right type for various combinations of numeric and categorical variables, regardless of the statistical package being used.
- Revealing Best Practices in Visual Exploratory Data Analysis (35 minutes) - University of California Berkeley
- Describes the role and process of visually examining data, emphasizing interactive and iterative inspection of the data.
Websites
- Data-to-Viz | Archive
- Useful tool to learn the various graph types, select the most appropriate graph for your variable types, and learn common caveats to avoid.
- Types of Graphs - JMP Statistical Software | Archive
- Simple explanations for a variety of graph types (histograms, grouped bar charts, mosaic plots, and scatter plot matrices), each with an example graph, the number of variables it uses, and a description of its purpose.
- Data Visualizations, Charts, and Graphs - Harvard University | Archive
- Highlights the importance of accessibility in crafting effective visualizations, including color scheme, labeling, and supplemental formats.
Readings
- Rougier, N. P., Droettboom, M., & Bourne, P. E. (2014). Ten simple rules for better figures. PLOS Computational Biology, 10(9), e1003833. https://doi.org/10.1371/journal.pcbi.1003833
- Aims to provide a basic set of rules to improve figure design and to explain some of the common pitfalls.
- Fundamentals of Data Visualization by Claus O. Wilke | Archive
- Guide to making visualizations that accurately reflect the data, tell a story, and look professional.
- Introduction to Modern Statistics, First Edition by Mine Çetinkaya-Rundel and Johanna Hardin | Archive
- Part II covers EDA and data visualization, with chapter 4 on categorical data and chapter 5 on numerical, continuous data.
R Annotated Code
R comes with built-in capability for base graphics. However, many R users extend this functionality with R packages. The most popular is ggplot2 by Hadley Wickham, which is itself supported by a number of extension packages.
- Data Visualization with Base R | Archive
- Describes the five main plot types in base R (barplot, pie, hist, boxplot, and plot) and the advantages and disadvantages of each. Includes detailed documentation on how to customize their text, points, lines, legends, etc.
- R Base Graphs | Archive
- Resource page containing links for generating a variety of graph types in base R, including scatter plot matrices, strip charts, histograms, and density plots.
- Top 50 ggplot2 Visualizations - The Master List (With Full R Code) | Archive
- Comprehensive list of visualization types, separated by statistical topic, including correlation, distribution, change over time, and spatial pattern.
- Data visualization with ggplot2 :: Cheat Sheet | Archive
- Reference for using ggplot2, organized by the number and type of variables, such as two variables (one discrete, one continuous).
- PDF available for download.
- Visualization with ggplot2 Part 1 and Part 2 - Jeff Goldsmith at Columbia University
- Visualizing categorical data and numerical data - Andrew Bray at Reed College
- Walk-through lab to understand the unique visualization needs of categorical and numerical data, with exploration of these features in the ggplot2 package.
- R for Data Science (2e) - Hadley Wickham | Archive
- A more advanced text, with Chapters 9-11 discussing the layered grammar of graphics in ggplot2, conducting EDA, and revising exploratory graphics for presentation.
SAS Annotated Code
SAS can produce graphical representations through procedures such as PROC SGPLOT, PROC SGSCATTER, and PROC GCHART. Users specify their data and customization options within the procedure's syntax and can export their graphs using ODS (Output Delivery System) statements, which allow the output to be saved in formats such as PDF, PNG, and HTML. Alternatively, SAS Viya / Visual Analytics offers a user-friendly, drag-and-drop interface for building interactive visualizations without extensive coding.
- SAS Data Visualization | Archive
- Describes the basic SAS graphical data representations (histograms, bar charts, pie charts, scatter plots, and box plots), with explanations of syntax and variable types, common customizations, and examples.
- Gallery of Plots and Charts - SAS
- Provides a comprehensive list of possible data visualizations, with an example plot, code syntax, and options for customization.
- Intro to SAS Notes - The Output Delivery System and Graphics - Robert Parker at the University of Florida | Archive
- Details how SAS uses ODS to export graphics and how plot options can be used to customize and control graph appearance.
- Explore and Visualize Data with SAS Visual Analytics - SAS (14 minutes)
- Describes how to access and prepare data, create interactive charts, reports, and maps, and build a preliminary predictive model in SAS Viya.
Stata Annotated Code
Data visualizations can be produced in Stata in two ways. Users can work through the Graphics drop-down menu, selecting the graph type and customizing options in a dialog box, or they can generate and customize plots by typing graph commands directly at the command line.
- Top 25 Stata Visualizations — With Full Code | Archive
- Fully replicable code tutorial that helps you choose the right type of chart for your objectives and shows how to code each one in Stata.
- Stata Cheatsheets | Archive
- Last two pages cover data visualization, including basic plot syntax, commands and options for graphs of one to three variables, and visual customization.
- Creating graphs in Stata® (Playlist with videos 4-6 minutes each) - StataCorp LLC
- Stata Graphics Reference Manual Release 18 | Archive
- Comprehensive documentation on all graphical commands and relevant options.
- Stata Graph Gallery | Archive
- A useful resource for less common graphical tasks, such as generating a survival curve, producing multiple graphs at once, or making maps.
- Data Visualization Handout – Some Basic Graphs | Archive
- Intuitive walk-through on the basic syntax and use of a Stata graph command, with annotated code for single variable and multiple variable graphs.
Courses and Resources
- Visualization Apps | Irving Institute for Clinical and Translational Research
- Free web-based apps that allow users to upload data and create publication- and presentation-quality graphics with flexible export options.
- Data Visualization: Tools - University at Buffalo
- Curated resource list of visualization tools (outside of programming software such as R, SAS, and Stata), many of which are free and web-based.