The Science of Data Visualization Ep-1

HOW TO VISUALIZE DATA

Yugant Hadiyal
4 min read · Nov 26, 2018

Visualization depends on the dimensions of the data and the nature of its features. The data can contain timestamps, strings, numbers, binary values and the like.

Importance of storytelling

It is unusual to get a problem where you have to plot single-dimensional data. You can draw a line and put points on it, and it will work. Still, anything created by nature can be optimized, and even nature itself evolves with time.

For example, suppose you have only one feature and a thousand entries for it. If you put a mark on the line for each entry, repeated values land on the same spot, so the plot cannot show the redundancy in the data.
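The point about hidden redundancy can be checked with a small sketch. The data below is hypothetical (a thousand entries with only four distinct values), and `text_histogram` is an illustrative helper, not part of any library. It compares what marks on a line can show with what a simple count per value reveals.

```python
from collections import Counter

# Hypothetical single-feature data: 1000 entries, only four distinct values.
values = [1] * 500 + [2] * 300 + [3] * 150 + [4] * 50

# Marks on a line: identical values land on the same spot,
# so only the distinct positions remain visible.
visible_marks = sorted(set(values))

# Counting the entries per value recovers the redundancy the line hides.
counts = Counter(values)


def text_histogram(counts, width=40):
    """Return one text bar per distinct value, scaled to `width` characters."""
    peak = max(counts.values())
    return [
        f"{value:>3} | {'#' * round(width * n / peak)} {n}"
        for value, n in sorted(counts.items())
    ]


print(len(values), "entries, but only", len(visible_marks), "visible marks")
print("\n".join(text_histogram(counts)))
```

The same one-dimensional data tells a different story once the second representation is used, which is the reason the choice of chart matters.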

From the example above, we can conclude that there are various ways to represent one type of data. It is necessary to decide which one is better for storytelling.

There is a whole body of mathematics and science behind generating such visualizations. One well-known example is the use of the golden ratio. Anything whose dimensions are in this ratio is said to be beautiful, or a golden section. The value of this ratio is approximately 1.618.

The following image shows its presence in the human body.

Moreover, it is also found when the Fibonacci series is visualized. At higher indices in the series, the ratio of the chart's height to its width approaches the golden ratio with high precision.
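A quick numerical check of this claim, as a minimal sketch: the ratio of consecutive Fibonacci numbers is compared with the closed form of the golden ratio, (1 + √5) / 2. The function name `fibonacci_ratios` is only illustrative.

```python
import math

GOLDEN_RATIO = (1 + math.sqrt(5)) / 2  # approximately 1.618


def fibonacci_ratios(count):
    """Return F(n+1) / F(n) for the first `count` consecutive Fibonacci pairs."""
    ratios = []
    a, b = 1, 1
    for _ in range(count):
        ratios.append(b / a)
        a, b = b, a + b
    return ratios


ratios = fibonacci_ratios(30)
for index in (0, 1, 2, 5, 10, 29):
    error = abs(ratios[index] - GOLDEN_RATIO)
    print(f"pair {index:>2}: ratio = {ratios[index]:.10f}, error = {error:.2e}")
```

The error shrinks quickly as the index grows, which is what "with high precision" means here.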

The golden ratio examples above give an idea of the science of design. Similarly, there is a whole branch of data science called data visualization. The major problem of the domain is how to fit a large amount of data into a single chart that communicates clearly to humans. The list below gives the problem statements this field has to offer.

  1. Plotting a large amount of data in a single visual.
  2. Methods to visualize multidimensional data.
  3. Methods to generate dynamic visuals for data that contains time as one of its features, or when the storytelling requires a time flow.
  4. Visualization of a live stream of data.
  5. Making user-friendly and customizable visuals.
  6. Design aspects of the visuals.

Each challenge has its own sub-problems, and many of them are yet to be solved. It is a field of research, innovation and design. All of these problems are described further in this series.

Plotting a large amount of data in a single visual

Handling of huge data:

Normally, we choose to use as little data as possible in analytics, because a small dataset is fast to transfer and needs little storage.

For this problem, we have cloud servers running 24x7 with run-time scalability. AWS, the biggest of them, already serves many companies and developers around the globe. Other service providers include Google Cloud, MS Azure, Alibaba Cloud, IBM Cloud, Salesforce, Oracle Cloud and many more. Live data streaming is a less-used service in some parts of the world.

Fitting the data in a single visual:

It is crucial to select a suitable chart type or media format to present the insights from the collected data.

The solution to this problem is normalization. Normalization is the process of scaling the data down to fit within the chart's range. It works like a compression of the data that fits it onto the chart. Normalization can affect the results, though: you might lose precision, but you will definitely get the overall data in one view.
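One common form of normalization is min-max scaling, sketched below. The article does not say which variant it has in mind, so treat this choice as an assumption. The readings and the 300-pixel chart height are made-up values for illustration.

```python
def min_max_normalize(values, low=0.0, high=1.0):
    """Linearly rescale `values` so the smallest maps to `low` and the largest to `high`."""
    smallest, largest = min(values), max(values)
    if largest == smallest:
        # All values are identical: place them in the middle of the range.
        return [(low + high) / 2 for _ in values]
    scale = (high - low) / (largest - smallest)
    return [low + (value - smallest) * scale for value in values]


# Hypothetical readings spanning several orders of magnitude.
readings = [120, 4500, 98000, 1250000]
chart_height = 300  # assumed drawing area, in pixels

pixels = [round(y) for y in min_max_normalize(readings, 0, chart_height)]
for reading, pixel in zip(readings, pixels):
    print(f"{reading:>8} -> {pixel:>3} px")
```

Every reading now fits on the chart, but the two smallest ones end up within a pixel or so of each other. That is the loss of precision described above.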

Methods to visualize multidimensional data

Relative positions of multiple axes:

We use a 2D screen, so we have to render all the data in a 2D plane. Imagine plotting a simple bar chart in MS Excel. That is an example of the classical plotting method, which is the one used most often. You might be thinking that we can plot in 3D on our computer screen. That is a fair thought, but how will you render the data when its number of dimensions increases further?

The problem arises because our coordinate system has all of its axes perpendicular to each other. Data scientists came up with the idea of parallel dimensions, where the axes are drawn parallel to each other so that a larger number of them can fit.

The following image shows one of the methods to implement a parallel coordinate system.

Another example of the parallel coordinate system.

Circular parallel coordinates
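The geometry behind a basic (non-circular) parallel coordinate plot can be sketched as follows: each feature gets its own vertical axis, and each record becomes a polyline with one vertex per axis. The records and feature names are hypothetical, and scaling every axis to the range 0..1 is an assumption made for this sketch.

```python
def parallel_coordinates(records, features):
    """Map each record to a polyline: one (axis_index, height) vertex per feature."""
    # Per-feature minimum and maximum, so every axis gets its own 0..1 scale.
    bounds = {
        feature: (min(r[feature] for r in records), max(r[feature] for r in records))
        for feature in features
    }
    polylines = []
    for record in records:
        line = []
        for axis_index, feature in enumerate(features):
            smallest, largest = bounds[feature]
            if largest == smallest:
                height = 0.5  # constant feature: draw it mid-axis
            else:
                height = (record[feature] - smallest) / (largest - smallest)
            line.append((axis_index, height))
        polylines.append(line)
    return polylines


# Hypothetical four-dimensional records.
records = [
    {"length": 5.1, "width": 3.5, "weight": 120, "age": 2},
    {"length": 7.0, "width": 3.2, "weight": 300, "age": 5},
    {"length": 6.3, "width": 2.9, "weight": 210, "age": 9},
]
features = ["length", "width", "weight", "age"]

for line in parallel_coordinates(records, features):
    print([(axis, round(height, 2)) for axis, height in line])
```

Adding a fifth or sixth feature only adds another axis to the right, which is why this layout scales to more dimensions than perpendicular axes do.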

Rendering the features:

Different features have different data types, which can be anything from numbers, booleans and characters to strings or categories.

For example, the image below shows a bubble plot. Each of its points carries extra features beyond its position, such as the size of the triangle marker and its colour, where different colours can show different categories. So it is already plotting 5-dimensional data in a single view.
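The idea of assigning features to visual channels can be sketched without any plotting library. The example below maps four features of some hypothetical rows to x position, y position, marker area and colour. The column names, area range and colour palette are all assumptions made for illustration.

```python
def bubble_encoding(rows, x, y, size, category, min_area=20.0, max_area=400.0):
    """Turn data rows into marker descriptions with x, y, area and colour."""
    sizes = [row[size] for row in rows]
    smallest, largest = min(sizes), max(sizes)

    # One colour per category, reused cyclically if there are many categories.
    palette = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728"]
    categories = sorted({row[category] for row in rows})
    colour_of = {c: palette[i % len(palette)] for i, c in enumerate(categories)}

    marks = []
    for row in rows:
        # Numeric feature -> marker area, scaled linearly between the two limits.
        t = 0.5 if largest == smallest else (row[size] - smallest) / (largest - smallest)
        marks.append({
            "x": row[x],
            "y": row[y],
            "area": min_area + t * (max_area - min_area),
            "colour": colour_of[row[category]],
        })
    return marks


# Hypothetical rows: two numeric axes, one numeric size, one category.
rows = [
    {"income": 12, "life": 68, "population": 50, "region": "A"},
    {"income": 40, "life": 80, "population": 10, "region": "B"},
    {"income": 25, "life": 74, "population": 130, "region": "A"},
]

for mark in bubble_encoding(rows, "income", "life", "population", "region"):
    print(mark)
```

Numeric features suit continuous channels such as position and size, while categories suit discrete channels such as colour.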

That’s all for now. Publishing the second episode soon!
