Version 0.4.0

Coco Alemana releases version 0.4.0, with native support for interactive, column-level profiling charts.

You can download the newest version here

Column profiling for different types

Visualizing the distribution of different columns in your dataset is a crucial step in any data analysis. Traditional programming environments make this process manual, and largely static.

We’ve provided an automatic, constantly updating view of your data’s distribution. Coco Alemana also allows you to manipulate the distribution to merge, remove or keep particular subsets.

This requires zero code from you, and works on any data - including remote sources like Athena and BigQuery.

Viewing the Charts

In order to access these charts, you can navigate to the distributions view of any frame. Click on any graph to expand it. You can start interacting with the graph instantly.

Accessing distributions view

Interactivity

Continuous value types have the most interactivity. They contain bins which span from the smallest value to the largest, usually with a bin count of around 30.

Continuous value charts, as well as date/time charts, support range selection. You can select a range, and right click to see a set of options, where you can “Merge”, “Convert to NULL”, “Exclude” or “Keep Only”. Keep Only or Exclude are equivalent to applying a filter to the entire frame.

Actions for continuous values

Statistics

By default, we generate useful statistical information each time the chart updates. You can access these values by hitting the control button on the bottom left of the chart. Values are copyable by clicking the copy icon to the left of the numerical value.

Statistics for continuous values

Guide Lines

We provide statistical value guide lines to help you more precisely drag ranges. You can drag from the Average to Q3, to Max, or another bin offset of your choosing. If you’re using a track pad, it clicks, too!

You can toggle guide lines per chart in the chart options panel.

Using guidelines

Other Chart Types

We also support charts for String (categorical) and Boolean values. These have significantly less interactivity.

Limitations

Statistical Approximates

Certain statistical calculations are approximates. This is due to the computational complexity involved in calculating percentiles. The list of statistics which are approximate are as follows:

Q1
Median
Q3
IQR (based on approx. Q1 and Q3)
Range based selections “snapped” to guide lines will show approximate row counts.

Bin Sizing

We currently do not support custom number of bins, or specifying a bin width. The normal amount is around 30, and may be less if the number of unique values is less than 30. This may change in the future.

Date Precision

The maximum granularity we support for dates & timestamps is “hour”-level precision. This is likely to change in the future for more granular types. The standalone TIME type is not supported.

Strings

String columns are provided with a maximum of 30 values. This is not changeable. For best results with strings, continue to use the value counts list below the chart.

Bug Fixes in this Release

While most of our changes here are new features, we did fix several bugs along the way.

Reduces load times for frames
Fixes issues with cache-related issues
Fixes login issues
Resolves certain pre-fetch inaccuracies.
Adds approx_quantile function into CocoSQL

Authors

Ryan Melehan