Is AI part of data science?

"All of the systems that surround us are based on math and data science."

Required reading time: 8 minutes

In May 2018, Joachim gave us an introduction to data science and the Python programming language as part of a 6-week course at the CODE University of Applied Sciences Berlin. I met him for an interview a few days ago.

Hello Joachim, how would you explain data science to an eight-year-old child?

Joachim Krois: Data science is a vague term. I don't even know whether there is a clear definition for it at all, or whether the term belongs in the buzzword category. Data science is a new field that has developed over the past 10 years. If I were to explain to a child what data science is, I would probably try to describe what part of this work is. Only part, not the entire spectrum, because it is a very broad spectrum that can be seen as a cross-sectional task. One speaks of a trio of skills: basic scientific, mathematical and statistical knowledge; coding experience (the use of script-based programming languages for analyses); and domain knowledge, i.e. specialist knowledge of the area currently being dealt with.

I would explain the term data science to an 8-year-old like this: Imagine I want to find out something about you, but I don't want to ask you about it. Also, imagine that I go into your room, take some of your toys and clothes, and put those things in a box. As a data scientist, I now look into the box and use these things to try to make statements about you. For example, whether you are a boy or a girl, how old you are, what your favorite comic is, and which things you enjoy and which you don't.

"Data acquisition and predictive work are the two major areas of responsibility of data scientists."

This is part of a data scientist's job: describing data and extracting added value from the data obtained. Besides this descriptive, exploratory side of a data scientist's work, there is another big field, and that is the predictive side. The idea behind it is that I don't just take toys and clothes from one child, but from many children. So I go to several children's rooms, always grab the same kinds of things, and ask each child about their age, their favorite food, their favorite color, their favorite comic, etc. Once I've visited enough children's rooms, at some point I can go into any child's room without knowing the child. I look at their stuff and can then say with a certain degree of certainty how old they are, what gender they are, what their favorite comic is, etc.
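The predictive idea in the children's-room picture can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the item counts, the ages, and the names `visited_rooms` and `predict_age` are invented for the example and do not come from the interview, and averaging over the most similar known rooms (a nearest-neighbour approach) is just one simple way to make such a prediction.

```python
# Hypothetical data: each visited room is summarised by counts of items in the box
# (picture books, building-brick sets, video games), paired with the age the child reported.
visited_rooms = [
    ((8, 1, 0), 4),
    ((6, 3, 0), 5),
    ((2, 6, 1), 8),
    ((1, 5, 3), 9),
    ((0, 2, 6), 12),
    ((0, 1, 7), 13),
]


def distance(a, b):
    """Squared Euclidean distance between two item-count tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_age(box, rooms, k=3):
    """Estimate an age as the mean age of the k most similar known rooms."""
    nearest = sorted(rooms, key=lambda room: distance(box, room[0]))[:k]
    return sum(age for _, age in nearest) / len(nearest)


# A box from a room whose child we have never asked anything.
new_box = (1, 6, 2)
print(predict_age(new_box, visited_rooms))
```

The more rooms have been visited, the more likely it is that a new box resembles something already seen, which is why the estimate only becomes useful after enough data has been collected.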

These are the two major areas of responsibility of data scientists. You have these boxes that you fill - that is the data acquisition - and then you try to work either exploratively or predictively based on this data. For the predictive work you usually need several of these data acquisition campaigns.

Our readers are primarily interested in Artificial Intelligence (AI). How are data science and AI related?

Joachim Krois: Artificial intelligence is another buzzword. People understand very different things by it. I looked it up again because I, too, understand different things by it, depending on the situation. One of the definitions I found describes AI as teaching a machine to solve intellectual tasks the way a human would solve them.

Not so long ago, a machine capable of defeating a chess grandmaster could have been called AI. That has now been achieved. Would the machine be called AI today? No! Not even close. A few years ago, a machine also mastered Go, a far more complex game. If you look at the algorithms, that's a pretty impressive feat. Many people did not believe it was possible. It happened, but did it make the machine an AI? Not that either!

There are segmented or domain-specific services that are not necessarily what you know from comics or movies and would call AI or general AI. It's a huge field and sometimes just as elusive as data science. But what the two areas have in common, and where one of the intersections lies, is machine learning. It enables us to write programs and build frameworks in such a way that machines no longer have to be explicitly programmed in order to act. If you dictate the rules and the machine acts accordingly, you speak of symbolic AI. With machine learning, you feed in data and the machine learns the rules itself.
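The difference between dictating a rule and letting the machine learn it can be sketched as follows. Everything here is an assumption made for illustration: the jacket-and-temperature scenario, the example data, and the names `handwritten_rule`, `learn_threshold` and `learned_rule` are hypothetical, and the one-feature threshold search stands in for far more capable learning algorithms.

```python
# Symbolic approach: a human writes the rule down explicitly.
def handwritten_rule(temperature_c):
    return temperature_c < 15  # True means "wear a jacket"


# Machine learning approach: only labelled examples are given
# (temperature in degrees Celsius, whether a jacket was worn).
examples = [
    (2, True), (8, True), (12, True), (14, True),
    (17, False), (21, False), (26, False),
]


def learn_threshold(data):
    """Return the cut-off that misclassifies the fewest examples."""
    values = sorted(x for x, _ in data)
    # Candidate cut-offs lie midway between neighbouring observed values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(cut):
        return sum((x < cut) != label for x, label in data)

    return min(candidates, key=errors)


# The rule is derived from the data instead of being typed in by hand.
threshold = learn_threshold(examples)


def learned_rule(temperature_c):
    return temperature_c < threshold
```

Both functions end up answering the same question, but only the first one required a person to know the cut-off in advance; the second one would adapt if the examples changed.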

"Machine learning is one of the techniques used in data science to work predictively."

Exactly this aspect is relevant for AI: that programs or systems are able to react to data or input from the world, to learn from it, and to carry out knowledge-guided, clever and comprehensible actions accordingly. Then one could speak of AI. Machine learning is one of the techniques used in data science, primarily to work predictively, i.e. to perform predictive analysis. I have a dataset and, based on it, try to predict what the outcome will be for new instances. And here, too, you use many of the algorithms that machine learning brings with it.

Both data science and AI use algorithms that enable machine learning for their respective goals. Data science also has other facets. Questions such as how to get data, how to load it and how to transform it also play a major role.

Where do we encounter data science in our everyday life? Do you know a few exciting examples?

Joachim Krois: The question is perhaps more where you don't encounter data science. As I said, data science is a buzzword and means very different things to different people. The question can be rephrased as: Where do we encounter things that are not based on statistics or mathematics? All the systems that surround us are based on math and statistics. Cars would not be possible without mathematical and statistical principles. Neither would street intersections and traffic light systems. The question is more what we want to call it. In the end, these systems and phenomena are part of our cultural and technological development, i.e. we see them everywhere. They are particularly noticeable now because digital services such as automatic translation or product recommendations have developed around these technologies.

"I find it difficult to see aspects where statistics and data science do not play a role."

Anyone who still knows libraries knows that you need guidance to find your way around one. For this purpose there used to be card catalogs in which keywords could be looked up. These catalogs were very large, and just putting them together was a mammoth task. There are now technical systems that simplify this. We enter a keyword into the search engine and either the book or related books are shown immediately.

From a technological point of view, I find it difficult to see aspects where statistics and data science do not play a role. All sensory experiences, like the sound of the sea or a sunset, are of course completely free of it. But as soon as we enter the technical sphere, we find mathematical and statistical approaches everywhere - they just go by a different name now.

What did you study and how did you discover data science for yourself?

Joachim Krois: I studied geological sciences at the Free University of Berlin. Believe it or not, geoscientists are also constantly confronted with data. This is data from and about the earth, from and about the environment and interactions between human and natural systems. You cannot avoid statistics and data analysis if you pursue this degree.

"A data scientist is a statistician who understands more about programming than the classic statistician and at the same time someone who has less knowledge of programming than a programmer."

To this day, I probably wouldn't call myself a data scientist because I don't know exactly what that designation actually means. If I had to commit myself, I'd rather see myself as a data analyst. In my current position I work as a geo-statistician. I like to quote Anthony Goldbloom, the CEO of Kaggle, who recently spoke on the Data Framed podcast on the subject of "Kaggle and the Future of Data Science". In this conversation he explained what makes a data scientist, and I recognized myself in this definition. In his opinion, a data scientist is a statistician who knows more about programming than the classic statistician and at the same time someone who has less knowledge of programming than a programmer. I feel very comfortable in this in-between space: better at coding than a statistician and not as good as a full-stack developer.

What would you advise laypeople interested in data science to get started with?

Joachim Krois: Since non-fiction books and textbooks on the subject are only slowly coming onto the market, in my opinion the Internet is the best way to approach this field. I also recommend looking for blogs on the topic. They have the advantage of being more accessible and easier to understand than scientific papers, which mostly provide the basis or the methodology for the blogs. If you come across exciting buzzwords while reading such blogs, you can research them independently on the Internet and thus deepen your knowledge. There are also fantastic online resources such as Codecademy to help overcome “I can't code” hurdles.

Image: Johnstocker, EyeEm

Joachim Krois (PhD) has been working as a geo-statistician in the Department of Dental Conservation and Preventive Dentistry at Charité Universitätsmedizin Berlin since September 2017. The foundation of his professional career is his studies in geological sciences at the Free University of Berlin. During his studies, the native Austrian dealt intensively with hydrological phenomena and data analysis. He regularly publishes his research results in scientific papers.