Julian Valkieser shares his thoughts with us about “R language” in this blog post for our Emerging Fellows program. The views expressed are those of the author and not necessarily those of the APF or its other members.
The topic of “Big Data” has already come up a few times on the Profuturist blog, and we all know roughly what it involves. Our activity on the Internet keeps growing, and with it we produce data, massive amounts of data. Worldwide, 3 billion people are already online, and we spend much of our time there. The amount of data created is projected to rise to a stunning 107,958 petabytes per month by 2018. That corresponds to more than 100 million hard drives with a capacity of 1 terabyte each, more capacity per drive than most of us would ever use.
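For readers who want to check the hard-drive comparison, here is a minimal sketch of the arithmetic. It assumes decimal units (1 petabyte = 1,000 terabytes); the function and constant names are my own and serve only as illustration.

```python
# Convert the monthly data volume quoted above into 1 TB drives.
# Assumption: decimal (SI) units, i.e. 1 petabyte = 1,000 terabytes.
PETABYTES_PER_MONTH = 107_958   # figure quoted in the text
TERABYTES_PER_PETABYTE = 1_000  # SI convention (assumption)
DRIVE_CAPACITY_TB = 1           # one drive holds 1 terabyte


def drives_needed(petabytes: int, drive_tb: int = DRIVE_CAPACITY_TB) -> int:
    """Number of drives needed to hold `petabytes` of data."""
    total_tb = petabytes * TERABYTES_PER_PETABYTE
    # Ceiling division, so a partially filled drive still counts.
    return -(-total_tb // drive_tb)


monthly_drives = drives_needed(PETABYTES_PER_MONTH)
print(f"{monthly_drives:,} drives per month")  # 107,958,000 drives per month
```

So the figure works out to roughly 108 million one-terabyte drives every month, which matches the "more than 100 million" in the text.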
Companies like Google work with this data. Of course, they are not focused solely on this one business model, and Google is expanding in many different directions. But a focus can be seen: Google is also expanding more and more into the offline world. Why?
The data created online is relatively negligible compared to the data that can still be gathered from the physical world. Online behavior patterns are certainly interesting, e.g. for the field of e-commerce, but behavior and properties offline are much more interesting. The greatest benefit would come from, first, analyzing all the information that can be obtained and, second, being able to deduce something from it. Exciting!
Here I want to present an example aimed specifically at research-intensive areas: the start-up “Mapegy” from Berlin, Germany.
By its own definition, Mapegy is the compass for the high-tech world. Let’s imagine one possible application.
Suppose I am interested in a specific topic and would like to evaluate it. This is where Big Data comes into play. Let’s take the example of a patent analysis. With tools like Mapegy I could easily figure out who is an important stakeholder in a particular technology development, how they are related to others, and what influence they have. One method of representation is a map: stakeholders and technological developments are illustrated as a kind of landscape. The larger the island, the more stakeholders gather around a particular development. The higher the mountain, the more patents a stakeholder has applied for. The closer the islands are to each other, the stronger their relation to one another. With this kind of visual analytics it is quite easy to illustrate how a certain subject area is connected to others.
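To make the map metaphor a little more concrete, here is a minimal sketch of how the three visual properties could be derived from patent records. This is not Mapegy's method, which I do not know in detail. The records are invented toy data, all names are hypothetical, and counting shared stakeholders as the measure of "closeness" is my own simplifying assumption.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records: (stakeholder, technology, patents filed).
# Names and counts are invented for illustration only.
PATENTS = [
    ("Company A", "batteries", 12),
    ("Company B", "batteries", 5),
    ("Company C", "batteries", 3),
    ("Company A", "solar cells", 7),
    ("Company D", "solar cells", 2),
    ("Company E", "drones", 4),
]


def stakeholders_by_technology(records):
    """Group the distinct stakeholders around each technology."""
    groups = defaultdict(set)
    for who, tech, _ in records:
        groups[tech].add(who)
    return groups


def island_sizes(records):
    """Island size: number of distinct stakeholders per technology."""
    groups = stakeholders_by_technology(records)
    return {tech: len(names) for tech, names in groups.items()}


def mountain_heights(records):
    """Mountain height: total patents filed per stakeholder."""
    heights = defaultdict(int)
    for who, _, count in records:
        heights[who] += count
    return dict(heights)


def island_closeness(records):
    """Closeness (assumption): stakeholders shared by two technologies."""
    groups = stakeholders_by_technology(records)
    return {
        (a, b): len(groups[a] & groups[b])
        for a, b in combinations(sorted(groups), 2)
    }


sizes = island_sizes(PATENTS)
heights = mountain_heights(PATENTS)
closeness = island_closeness(PATENTS)
```

In this toy data, "batteries" would be the largest island, "Company A" the highest mountain, and "batteries" and "solar cells" would sit closest together because they share one stakeholder.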
And that is the key point. A lot of data is already available, but only the correct processing and representation make this data useful.
AT THIS POINT I WANT TO MENTION “R”.
“R is a free software programming language and software environment for statistical computing and graphics. The R language is widely used among statisticians and data miners for developing statistical software and data analysis. Polls and surveys of data miners are showing R’s popularity has increased substantially in recent years.” (Wikipedia)
Someone who can program in “R” is well paid, even at the upper end of the pay scale, and not without reason. Being able to understand a context and deduce recommendations for action, not only in business but also in science and research, such as biotechnology and of course pharmaceuticals, is an important goal in business and decision processes.
If you already understand some small connections, you can use them to build a network and may even be able to explain the behavior of systems. In this specific example, that would be human behavior. Of course, the influencing factors are still too complex to allow reliable predictions from the available data collections. But the more powerful computational resources become, the closer we get to being able to analyze all factors.
Mapegy is an example of visualizing relationships and influencing factors via big data analysis. The cost of genetic testing, for example, is an indicator of how quickly data analysis will change in the coming years: in recent years these costs have fallen faster than the price of computer chips under Moore’s Law. In my next article I will go further into the development of big data analysis with “R”.