A scatterplot matrix, or SPLOM, displays pairwise relationships among multiple variables in a grid of scatterplots. Each variable is combined with every other variable across rows and columns, allowing correlations, patterns, clusters, and outliers to be explored at once.
Historical Background
Scatterplot matrices developed in the context of exploratory data analysis, especially the work of John W. Tukey. They became standard in statistical software such as S and later R, and are now available in Python libraries, Tableau, Power BI, and other tools.
Data Structure
The data is a table with observations as rows and numeric variables as columns. A matrix is generated by plotting each variable against every other variable.
Purpose
The purpose is to examine many pairwise relationships quickly. It is especially useful at the beginning of analysis, before deciding which relationships deserve more detailed modeling.
Design Notes
- Use it for a moderate number of variables.
- Add color for categories when helpful.
- Consider histograms or density plots on the diagonal.
- Use brushing or interaction for dense datasets.
- Avoid overplotting through transparency or sampling.
Summary
Scatterplot matrices are a basic tool for multivariate exploration. They help analysts see relationships and outliers across many variables, but they become difficult to read when the number of variables or observations is too large.