Column Distributions
Learn how to work with column distributions. Analyze and clean data via auto-updating column distributions.
About Distributions
Coco Alemana automatically generates a set of distributions whenever you load a frame. These distributions update automatically as the underlying data changes. They are interactive, allowing you to clean and modify data without writing code.
For each column, Coco Alemana provides a value counts list and, where the column type supports it, a numeric distribution.
Viewing Distributions
You can open the distributions view by toggling the distribution view icon in the top-right corner of the frame. It is also available via the shortcut ⇧ ⌘ 1.
Working with Value Counts
Each column comes with an auto-calculated value counts table, which shows each value and its associated count in the frame. You can sort by value or by count.
Because this is generally an expensive operation, unique values are limited to 50,000 by default; the limit is adjustable up to 1 million unique values.
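For readers who think in code, the idea behind a capped value counts table can be sketched in plain Python. This is only an illustration, not how Coco Alemana computes it: the function name `value_counts`, the `max_unique` and `sort_by` parameters, and the choice to skip values once the cap is reached are all assumptions made for the example.

```python
from collections import Counter

def value_counts(values, max_unique=50_000, sort_by="count"):
    """Count occurrences of each value, tracking at most `max_unique` distinct values."""
    counts = Counter()
    for v in values:
        # Assumption: once the cap is reached, unseen values are simply skipped.
        if v not in counts and len(counts) >= max_unique:
            continue
        counts[v] += 1
    if sort_by == "count":
        # Highest count first; ties are broken by value for a stable order.
        return sorted(counts.items(), key=lambda kv: (-kv[1], str(kv[0])))
    # Otherwise sort alphabetically by value.
    return sorted(counts.items(), key=lambda kv: str(kv[0]))

cities = ["Paris", "paris", "Lyon", "Paris", "Lyon", "Paris"]
by_count = value_counts(cities)                   # sorted by count
by_value = value_counts(cities, sort_by="value")  # sorted by value
```

The cap bounds memory and time on columns with very many distinct values, which is why counting uniques is treated as an expensive operation.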
Search values
The value counts section of the distributions view is interactive. If you have a value you’d like to find, press ⌘ F to pull up the search panel. Search matches any value that contains your search term.
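A "contains" search is a substring match against each unique value. The sketch below shows the idea in plain Python; `search_values` is a hypothetical helper, and the `ignore_case` option is an assumption, since the text above does not say whether the search panel is case-sensitive.

```python
def search_values(values, term, ignore_case=True):
    """Return the values whose text contains `term` as a substring."""
    needle = term.lower() if ignore_case else term
    matches = []
    for v in values:
        text = str(v)
        haystack = text.lower() if ignore_case else text
        if needle in haystack:
            matches.append(v)
    return matches

unique_values = ["New York", "new york city", "Newark", "Boston"]
hits = search_values(unique_values, "york")
```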
Merge multiple values into one
You can select multiple values, similar to how you would select in Finder, and merge them into a single value. This is particularly useful when unifying data during the cleaning process.
To do this, select the values you’d like to merge, right-click, and select “Merge…”.
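Conceptually, merging replaces every occurrence of the selected values with one chosen value. A minimal sketch in plain Python, where `merge_values` and its parameters are hypothetical names used only for illustration:

```python
def merge_values(column, to_merge, merged_value):
    """Replace every value listed in `to_merge` with `merged_value`."""
    targets = set(to_merge)
    return [merged_value if v in targets else v for v in column]

column = ["NY", "N.Y.", "New York", "Boston", "NY"]
cleaned = merge_values(column, ["NY", "N.Y."], "New York")
```

After the merge, the three spellings of the same city collapse into a single entry in the value counts table.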
Renaming values
Just as you can select multiple values to merge them into one, you can also rename individual ones.
With a single value selected, press [Enter] on your keyboard to start renaming. You can also double-click the value to start editing.
Converting values to NULL
You can select any set of values and press [Delete] on your keyboard to convert them to NULL.
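In code terms, this is the same as replacing placeholder strings with a missing-value marker. The sketch below uses Python's `None` to stand in for NULL; `nullify_values` is a hypothetical helper, not part of Coco Alemana.

```python
def nullify_values(column, to_null):
    """Replace every value listed in `to_null` with None (standing in for NULL)."""
    targets = set(to_null)
    return [None if v in targets else v for v in column]

column = ["42", "N/A", "unknown", "17", "N/A"]
cleaned = nullify_values(column, ["N/A", "unknown"])
```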
Working with Binned Distributions
For continuous data, such as float or integer types, Coco Alemana generates a binned distribution. This distribution is interactive and shows statistical information that may be helpful during an analysis.
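A binned distribution groups numeric values into ranges and counts how many fall into each. The sketch below uses equal-width bins in plain Python. The binning strategy, the default bin count, and the name `binned_distribution` are assumptions for illustration; the text above does not specify how Coco Alemana chooses its bins.

```python
def binned_distribution(values, bins=4):
    """Return (bin edges, counts per bin) using equal-width bins; None values are ignored."""
    data = [v for v in values if v is not None]
    if not data:
        return [], []
    lo, hi = min(data), max(data)
    # Fall back to a width of 1 when all values are equal, to avoid dividing by zero.
    width = (hi - lo) / bins or 1
    counts = [0] * bins
    for v in data:
        # The maximum value lands exactly on the last edge, so clamp it into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts

ages = [18, 21, 25, 30, 34, 41, 58, None]
edges, counts = binned_distribution(ages, bins=4)
```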