
The Role of the Data Scientist in 2022

 

As mentioned in previous articles, data analysis involves processing and manipulating data, that is, changing or restructuring it so that it is organized and easier to understand. The four types of analysis have been briefly covered, but to apply them you need programming knowledge. Because of the enormous volume of structured and unstructured data, advanced technology is needed to process it and derive meaningful information. Programming languages and BI tools are the most important means of carrying out data analysis, so every data scientist or analyst needs to know them.



What do people assume about the data scientist role?

One of the analysis types is predictive analysis, which uses machine learning algorithms to predict or classify data. When it comes to the role description, what is advertised and what a data scientist actually needs to do are two different things, and I will describe them briefly below.

A good, clever and hard-working employee in a data scientist position often thinks that they will work only on machine learning tasks, processing data with programming languages or clever BI tools. Graduates of this field in particular hold the motto "I will only research machine learning algorithms and create prototypes for my work". However, this is not exactly true. It is correct that a proportion of your time will be allocated to coding and testing your ML models, that you will probably have lots of data for testing and validation, and that you will receive the necessary equipment to run them.

An experienced data scientist can tell you that everything mentioned above is about 50% of the job.

Let's see an example to understand it better. If you are assigned a project, the assumption is that in the most common scenario you select the data properties that are needed, then test them with a few algorithms chosen by looking at the distribution of the data and some other measurements, and pick the best-fitting algorithm. Finally, the last step is to evaluate the results using evaluation metrics.

What is the reality of the data scientist role?

In reality, your company will give you the "big picture" of the project, and you have to identify by yourself what is needed. This is linked to the data analysis types. The reason is that the rest of the technical departments think in a more structured way, like 1+1=2, whereas machine learning calculates results without being given the rule. Most algorithms are based on statistics and probabilities, so there is no explicit rule in machine learning. What do you need to do? The first step is to identify the problem and visualize the flow of the machine learning task.

The second step is to identify the features to be used. You need to focus on them, because the prediction of the ML algorithm is driven by those features. For example, say you need to estimate the fraud probability of a transaction. You would not use the daily weather forecast just because, by accident, the data suggested that on rainy days people tended to open fraudulent accounts or launder money. Feature engineering is therefore a crucial procedure for a data scientist: it identifies the most appropriate properties to use in the modelling part.

The next step is to understand the whole idea in detail. You need a clear understanding of the project before you start it. You need to think outside the box to identify indicators that can be derived from other data and added as features to your dataset. For example, you can derive explainable information such as aggregations of your data, which can improve on the results obtained from the raw dataset alone. As you can see, most of the time the data scientist is the person responsible for analyzing and implementing the above examples.
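To make the idea of aggregation features concrete, here is a minimal sketch that continues the fraud example. The transaction rows, the account ids and the feature names (account_txn_count, account_mean_amount, amount_to_mean_ratio) are all hypothetical and chosen only for illustration; a real project would use its own data and a library such as pandas.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw transactions: (account_id, amount). Illustrative values only.
transactions = [
    ("acc_1", 20.0),
    ("acc_1", 35.0),
    ("acc_1", 900.0),
    ("acc_2", 50.0),
    ("acc_2", 55.0),
]


def add_aggregate_features(rows):
    """Attach per-account aggregates to every transaction row."""
    # Group all amounts by account so we can aggregate them.
    amounts_by_account = defaultdict(list)
    for account_id, amount in rows:
        amounts_by_account[account_id].append(amount)

    enriched = []
    for account_id, amount in rows:
        history = amounts_by_account[account_id]
        account_mean = mean(history)
        enriched.append({
            "account_id": account_id,
            "amount": amount,
            "account_txn_count": len(history),
            "account_mean_amount": account_mean,
            # How large this transaction is relative to the account average.
            # Falls back to 0.0 if the average is zero.
            "amount_to_mean_ratio": amount / account_mean if account_mean else 0.0,
        })
    return enriched


features = add_aggregate_features(transactions)
for row in features:
    print(row)
```

Note that this sketch aggregates over all transactions of an account, including later ones. In a real fraud model you would probably aggregate only over transactions that happened before the one being scored, so that the feature does not leak future information.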

After that, the next step is to process the data efficiently and effectively. Previous articles described these two fundamental objectives with examples, so you can refer to them from the links above. Keep your data processing engine under control, so that it executes what is needed without wasting the computational power and memory of the computer or server on useless extra procedures. For example, it is not necessary to have three different ways of calculating the feature importance of the trained model just to reassure yourself that the procedure is correct. Use the most trusted procedure in your official code; of course, you can comment out the rest in case you need it in the future. Remember that the machine learning task is just one "task", not the whole software. The software or platform may need that power to run other services that do not involve any machine learning.
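As an illustration of keeping a single feature importance procedure, here is a minimal sketch of permutation importance in plain Python. Permutation importance is my choice for the example, not necessarily the procedure you should trust in your own project. The toy data, the toy_model function (a stand-in for a trained model) and the feature layout are assumptions.

```python
import random


def accuracy(model, rows, labels):
    """Share of rows for which the model predicts the true label."""
    correct = sum(1 for row, label in zip(rows, labels) if model(row) == label)
    return correct / len(labels)


def permutation_importance(model, rows, labels, n_features, seed=0):
    """Accuracy drop when each feature column is shuffled on its own."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(n_features):
        # Shuffle only column j, leaving the other columns untouched.
        column = [row[j] for row in rows]
        rng.shuffle(column)
        shuffled_rows = [
            row[:j] + [value] + row[j + 1:]
            for row, value in zip(rows, column)
        ]
        importances.append(baseline - accuracy(model, shuffled_rows, labels))
    return importances


# Toy data: feature 0 is a transaction amount, feature 1 is pure noise.
data_rng = random.Random(42)
rows = [[data_rng.uniform(0, 1000), data_rng.uniform(0, 1)] for _ in range(200)]
labels = [1 if amount > 500 else 0 for amount, _ in rows]


def toy_model(row):
    """Hypothetical stand-in for a trained classifier."""
    return 1 if row[0] > 500 else 0


importances = permutation_importance(toy_model, rows, labels, n_features=2)
print(importances)
```

The noise feature gets an importance of zero because the toy model never looks at it, while shuffling the amount column clearly hurts accuracy. One procedure like this, run once, is usually enough for the official code.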

The fifth step is to create some prototypes and continue with the rest of the usual tasks of a data scientist. At this step it is important to mention that a data scientist needs to explain in simple "English" how the algorithm or the model works, and what results are expected for a specific observation given for prediction. Many people in the other departments have no idea how it works or how these results appear on the screen. They do not realize that multiple combinations were repeated millions of times to reach a specific outcome with statistical justification. Most of the time they expect something like "if the condition is true then do this, otherwise do that". ML does not work this way. Not even close. So remember to explain your work simply, without many mathematical or statistical terms: simple things with simple examples that everyone can understand.
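One possible way to prepare such a simple explanation is to break a single prediction down into per-feature contributions. The sketch below assumes a logistic-regression-style fraud model; the weights, the bias and the feature names are invented for illustration and do not come from any real model.

```python
import math

# Hypothetical weights of an already trained logistic model (assumed values).
WEIGHTS = {
    "amount_to_mean_ratio": 1.2,
    "new_device": 0.8,
    "account_age_years": -0.5,
}
BIAS = -2.0


def explain(observation):
    """Return the predicted probability and a plain-language breakdown."""
    # Each feature contributes weight * value to the risk score.
    contributions = {name: WEIGHTS[name] * observation[name] for name in WEIGHTS}
    score = BIAS + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-score))

    # List the features with the largest effect first.
    ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
    lines = [f"Estimated fraud probability: {probability:.0%}"]
    for name, value in ranked:
        direction = "raises" if value > 0 else "lowers"
        lines.append(f"- {name} {direction} the risk score by {abs(value):.2f}")
    return probability, lines


probability, lines = explain({
    "amount_to_mean_ratio": 3.0,
    "new_device": 1,
    "account_age_years": 2.0,
})
print("\n".join(lines))
```

A breakdown like this lets you tell a colleague "the amount is three times the usual for this account, and that is the main reason the model is worried" instead of showing formulas. For models that are not linear, the contributions would have to come from a dedicated explanation technique rather than from the weights directly.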

The final step for a data scientist is to promote and provide trusted models to the buyers or to those who are going to sell them. It is not enough just to build a model; you need to sell it correctly, so that the end user understands the basic idea and how the results are derived from the trained model. Sometimes you will need to explain how the model's learning improves, how it works, and which algorithms and technology it uses to reach its results. So be prepared for this part, and know your project and everything that has been used in it.

Conclusion

Of course, somewhere in between all the above steps, improving your model and continuously checking the correctness of every change against your business needs is another huge procedure that a data scientist has to carry out in their daily tasks.

These are the actual reasons why most employers pay their data scientists a much higher salary compared to other departments. A data scientist can easily link the business department with the development department without needing extra knowledge. Data scientists can develop the ML engine from scratch if needed, and can even connect it with the rest of the platforms. Their AI knowledge also makes them creative in finding new, innovative ideas and solutions.

However, the most significant point is that most businesses have a structured environment for their services rather than an autonomous learning environment, so businesses need to adopt this technology in order to stay competitive in the market and sell their innovative solutions.
