In recent years we have been hearing more and more about big data. As a concept, big data is on everyone's lips, but whether everyone understands the same thing by it is somewhat debatable. What do we really know about big data? Can we get ahead of the competition by using big data? Or will we be reading the success stories of others a few years from now?
Why are we talking about big data? Because almost everything around us now produces data. This includes not only the business applications we use every day to issue invoices and create customer records, but also devices that share their location and status via GPS signals, log files that automatically record every transaction made on a computer, web sites, home electronics such as televisions and refrigerators, passenger and freight vehicles, and so on. It seems that the number of 'things' capable of generating data will keep increasing in the near future.
But how does this benefit us?
Used correctly, this data can give us a serious idea of what is going on around us and what might happen in the future. Companies can develop and implement strategies that increase their profitability through data-driven analysis. Institutions can use this data to improve service quality.
A new field in the IT industry
What exactly do we mean by big data? The data accumulated in enterprises can be divided roughly into two categories. The first category can be called structured data, that is, data generated by the software used in companies or institutions. Think of ERP systems, sales, back office applications, finance, loans, credit cards, inventory, accounting: the modules used by each business unit, some running in separate systems, some as software packages for different purposes, some all gathered in a single ERP package, are typical examples. This data is structurally easy to match up. We have a customer number, and we can track the transactions the customer has made with it. This data is also under control: after a certain period it is deleted or archived by the system administrators.
The second type is the data generated by all kinds of 'things' other than the conventional applications mentioned above, and it is this data that has popularized the concept of big data in recent years. In this category we are talking about a wide range of sources, from wearable technologies to motion sensors. The range is wide and the volume is very large. At the same time, this data is not much under our control, and because it grows very fast it can rapidly fill up storage capacity and databases. Unlike structured data, it is not so easy to relate it to other available data. We should not enrich the database vendors simply by storing such data. Perhaps calling this category 'very large data' can be useful in highlighting the difference between the two categories.
Is it necessary to evaluate the two categories separately? First of all, we need to carefully balance the investment to be made against the benefit to be obtained. It is worth noting that open-system-based technologies have been created specifically for storing and processing second-category data at low cost. In other words, very large data has in fact led to the birth of a new field in the IT industry. For the first category, data warehouses are generally built to obtain results from the data generated by business software.
Benefits of the data warehouse
You collect the data generated in the databases of different software in one database and clean it, turning it into an integrated pool of information that produces a single version of the truth. You also determine the historical depth of this data pool without being restricted by your existing software. For example, even when data in the ERP software is deleted or archived at the end of the year, you can keep as much historical data as you want in your data warehouse, in line with your business needs. This gives you the opportunity to review the past transactions of your customers or service providers in as much detail as you desire.
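The idea of pooling data from separate systems around a shared key can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two source systems (`billing_rows`, `card_rows`), their field names, and the figures are all hypothetical, and a real data warehouse would also involve cleaning and a proper database rather than in-memory dictionaries.

```python
from collections import defaultdict

# Hypothetical extracts from two separate systems, both keyed by customer number.
# All names and figures are invented for illustration.
billing_rows = [
    {"customer_no": 1001, "year": 2014, "amount": 250.0},
    {"customer_no": 1001, "year": 2015, "amount": 300.0},
    {"customer_no": 1002, "year": 2015, "amount": 120.0},
]
card_rows = [
    {"customer_no": 1001, "year": 2015, "amount": 80.0},
    {"customer_no": 1003, "year": 2013, "amount": 45.0},
]


def build_history(*sources):
    """Merge rows from all (name, rows) sources into one history per customer."""
    history = defaultdict(list)
    for name, rows in sources:
        for row in rows:
            history[row["customer_no"]].append(
                {"source": name, "year": row["year"], "amount": row["amount"]}
            )
    # Sort each customer's transactions by year so the history reads in order.
    return {c: sorted(tx, key=lambda t: t["year"]) for c, tx in history.items()}


warehouse = build_history(("billing", billing_rows), ("cards", card_rows))
print(warehouse[1001])
```

Because the merged history lives outside the source systems, the 2013 and 2014 rows remain available even if the originating software later deletes or archives them.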
It is important to know that analyzing detailed data matters a great deal in understanding trends and customer behavior. Consumers are constantly sending messages through their behavior. Our job is to use the data warehouse to translate these behaviors into rational business decisions.
Let's take an example from retail. The list of best-selling products generally does not change much in retail. But there may be dramatic periodic differences. I remember one example from a project we did back when Tekel had not yet been privatized. A raki brand was in 7th or 8th place on the best-seller list. However, as the month of Ramadan approached, sales of this product began to fall a week before Ramadan, and by the second week of Ramadan the product was barely making the top 100. This is a normal and predictable development. The unpredictable part is this: sales of the same product rose to 2nd or 3rd place with a striking increase in the third and fourth weeks of Ramadan. It is not possible to predict this behavior by looking at monthly average sales, nor can it be detected by a survey. People usually give the idealized answer in surveys. When you look at the average figures, there is no change in sales, but serious conclusions emerge when you look at the details.
When consumers come to the cash register and pay, they are in a sense telling us the truth. Of course, this example comes from one store, with a particular time period and a particular customer portfolio. But the data warehouse gives us details about human behavior that we cannot find anywhere else. It is our job to use these details to decide on our stock levels, our staffing, and our marketing strategies.
Where do we stand with the data warehouse?
Of course, there are companies that manage customer relationships very successfully by evaluating data warehouse results. We can call retail, telecommunications and banking the pioneers. There is still a lot of work to be done in many different sectors, because some mistakes are being made.
First of all, such projects need to be evaluated according to business needs. If the focus is on the solutions of technology companies rather than on the needs, it can be difficult to reach the desired results. Detailed data is often ignored because such projects are viewed as static reporting projects, and the data is kept only in summarized form. However, such projects create analysis environments, and reporting is only one part of that. As in the raki sales example above, if you look at the monthly average, there is no change in sales, because the two-week decline is offset by a two-week rise. If you do not look at the details, you will not see it. But one of the biggest mistakes is probably trying to do this integration with reporting tools instead of building a data warehouse.
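The way a summary can hide the raki pattern is easy to show with a small Python sketch. The weekly figures below are invented purely for illustration and are not taken from the project; the function name `weekly_deviation` is likewise hypothetical.

```python
from statistics import mean

# Hypothetical weekly unit sales for one product. The numbers are invented
# to illustrate the effect and do not come from the project described above.
baseline_weekly = [100, 100, 100, 100]   # an ordinary month
ramadan_weekly = [60, 40, 140, 160]      # two weak weeks, then two strong weeks


def weekly_deviation(weekly, baseline):
    """Return each week's percentage deviation from the baseline weekly average."""
    base = mean(baseline)
    return [round((w - base) / base * 100) for w in weekly]


monthly_avg_normal = mean(baseline_weekly)
monthly_avg_ramadan = mean(ramadan_weekly)
deviations = weekly_deviation(ramadan_weekly, baseline_weekly)

# The two monthly averages are identical, so a summary report shows "no change".
print(monthly_avg_normal, monthly_avg_ramadan)
# The weekly detail shows a deep dip followed by a sharp rise.
print(deviations)
```

If only the monthly figure were stored, the dip and the rebound could never be reconstructed afterwards, which is the argument for keeping detailed data in the warehouse.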
Small but valuable information from uniform data …
Is every big data project a data warehouse project?
No. The data we described as the second category, which falls outside structured data, needs to be treated differently.
Take sensor data, for example: sensors that measure the temperature of refrigerators at regular intervals in a grocery store chain and send the readings to a central database. If you consider the number of stores and how short the measurement interval is, the data coming from all the refrigerators in all the stores will reach a huge size.
One typical characteristic of such data is its size. Another, at least as dominant as size, is that most of this data is uniform and tells us nothing. Unless something unusual happens, the temperature is always under control. Only the extreme values are worth investigating, such as a temperature drop (risk of spending too much energy) or a rise (risk of the products spoiling). If you try to analyze such data using a commercially sold relational database, you will have to bear huge costs. This is where the concept of big data, or very large data, comes into play: it aims to process the small but valuable amount of data contained within a very large data set in a cost-effective way.
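A minimal Python sketch of the refrigerator case: stream over the readings and keep only the ones outside an acceptable band. The `Reading` fields, the store and refrigerator identifiers, and the thresholds `LOW_C` and `HIGH_C` are all assumptions made for the example, not values from any real system.

```python
from typing import Iterable, Iterator, NamedTuple


class Reading(NamedTuple):
    store_id: str
    fridge_id: str
    temperature_c: float


# Assumed acceptable band for a refrigerator, in degrees Celsius (hypothetical).
LOW_C = 1.0
HIGH_C = 6.0


def extreme_readings(readings: Iterable[Reading],
                     low: float = LOW_C,
                     high: float = HIGH_C) -> Iterator[Reading]:
    """Yield only the readings outside [low, high] and skip the uniform rest."""
    for r in readings:
        if r.temperature_c < low or r.temperature_c > high:
            yield r


# Invented sample data: most readings are uniform, two are worth a look.
sample = [
    Reading("S1", "F1", 4.0),
    Reading("S1", "F2", 4.1),
    Reading("S2", "F1", 8.5),   # too warm: products may spoil
    Reading("S2", "F2", 3.9),
    Reading("S3", "F1", -0.5),  # too cold: energy is being wasted
]

alerts = list(extreme_readings(sample))
for a in alerts:
    print(a)
```

Because the function is a generator, it never needs to hold the full data set in memory, which is the property that matters when the input is far larger than this sample.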
Data warehouse technology in particular has been developing very fast in recent years. There are new-generation solutions that are both cost-effective and high-performance. These next-generation databases use in-memory technology while remaining scalable to standards, and they can deliver very high performance at a very reasonable cost.
So are data warehouse and big data projects in competition with each other?
No, they are not! We need to think of them as complementary. These two environments can feed each other from time to time. It is quite possible for some data in a big data environment to be moved into a faster query environment, or for transaction data that does not need to be kept in the data warehouse to be shifted to more cost-effective open-system-based (Hadoop) solutions. Of course, while doing all this you should always keep the balance between cost and price/performance in mind.
How do you get started?
Basically, you need to have a good idea of what problem you are going to solve. Such projects are costly, difficult and risky. It is therefore sensible to carry out a scoping study before starting. This study should produce a roadmap that sets out which business needs the project is based on, in which steps and over what period these needs can be met, and what kind of technological infrastructure will be used. Of course, the technological components in this roadmap should be compatible with IT strategies and should include alternatives.
There is a principle that we apply in data warehouse projects: keep each project step rich enough in content to deliver a tangible result to the user, yet short enough that interest does not fade. If a project step does not produce meaningful results for users, support for the project will weaken. And if the project takes too long, interest may diminish and attention may turn to other topics. You need to find a good balance between the two.
Conclusion:
Used correctly, our data is our most powerful weapon for understanding what is going on. We need to use this weapon well to increase productivity. Yes, data is growing in size and variety and becoming harder to manage, but today there are technologies and next-generation approaches that can overcome these problems. The goal, therefore, is to produce high-performance solutions at reasonable cost by using the right technology in the right place.
@Turkishtime, July 2016 / Ertan Erışık, General Manager of Kara Consultancy