What does a data scientist do in the context of artificial intelligence and machine learning work? Many professionals who deal with these projects every day would say the question is hard to answer simply. A better question might be: What do data scientists NOT do?
A data scientist is integral to an AI or ML process, in the sense that all of these projects depend on big data or complex inputs. The data scientist is the essential specialist who knows how to work with data to produce results.
Still, there are some ways to talk about what a data scientist does, what qualifications he or she needs, and what role he or she plays in the process.
Varied Definitions, Varied Duties
Many experts who describe the work of a data scientist speak about it in broad terms.
“At small companies or when working in a new market, the role of a data scientist is to convert relatively novel (but obvious) sources of data into stuff that solves a problem for an end user, which would not have been possible, previously, where the technologies employed didn’t exist,” says Antonio Hicks, an Account Manager at Mercury Global Partners. “The ideal candidate is someone who is part mathematician, part software engineer, and part entrepreneur.”
Others echo this basic idea, mentioning what data scientists need to tackle modeling projects.
“The most important attribute a data scientist needs is a deep curiosity about the world around them – whether they’re answering questions or building models, a desire to understand the problem in front of them is key,” says Erin Akinci, Data Scientist Manager at Asana. “From there, most people will require skills in math and programming to find solutions, but the specific kinds of math and programming vary widely depending on the area of expertise within data science.”
“Excellent scientific work has more to do with the way a scientist thinks about a problem, than the tools they use to solve it,” adds Charlie Burgoyne, Founder and CEO of Valkyrie Intelligence. Valkyrie is an applied science consulting company whose projects include the Mark I, a dedicated network appliance that boosts neural network training and testing, improving on what’s possible with earlier cloud-based machine learning platforms.
“The market demands scientists who are proficient in Python development, neural network design and the ability to reshape a data repository into the latest database architecture,” Burgoyne says. “Those capabilities, however, are table-stakes for a talented scientist. What is less obvious is a scientist’s aptitude for intrepid curiosity, aggressive ingenuity and an adherence to the scientific method.”
The Skills of a Data Scientist
In terms of practical skill sets, data scientists need a degree of creativity and savvy when it comes to modeling. They can also benefit greatly from “hard skills” such as experience coding in Python, C++ or other languages commonly used in ML projects.
“Python and C++ are essential and being able to combine coding skills with data analysis and processing and statistics are core skills that will make a data scientist stand out as a strong candidate or employee,” says Val Streif at Pramp, an online mock interview platform for software engineers, developers and data scientists. “While some of the programming skills could be taken care of by pairing a data scientist with a developer, it’s much easier if you have both skills combined in one, from the perspective of a company.”
Other experts add R, Hadoop, Spark, SAS and Java to the list, as well as tools like Tableau, Hive and MATLAB.
All of these make for an impressive resume, but some people with experience recruiting data scientists say the “human” side matters, too.
“Traditionally, individuals with a diverse liberal arts education make excellent data scientists,” Burgoyne says, making a distinction between engineers, who are on the building side, and data scientists, whose work can be much more conceptual. He continues:
Expertise in a traditional STEM field with a complementary focus in the humanities, arts or business domains yields those qualities which make an excellent industry-oriented scientist. It must be said that it is just as important for the organization’s ability to harness those qualities and to shape their fervor and methods in a productive manner. I’ve observed that when a data science initiative is unsuccessful, the organization is as likely to be as culpable as the scientists. Scientists are not engineers. They are not driven to execute and build. They are driven to discover and understand. Organizations that grasp this difference are well rewarded for the cultivation of both fields.
What data scientists typically apply themselves to depends on the core goals of the company. Some firms are chasing a decentralized internet; others are experimenting with IoT or SaaS. Still others are trying to pioneer “user-friendly” or “ethical” or “transparent” AI.
In any case, data scientists are likely to be bridging the divide between the hard metrics of the data they use, in whatever technology stack is in play, and the freewheeling work of conceptualizing AI/ML functionality.
“We hire data scientists to manage data collection and cleaning, as well as translating that data into meaningful information,” says Michael Hupp, Manager of Data Science and Analytics at G2 Crowd. He elaborates:
Typically that means managing any important algorithms driving a company’s data engine and being fluent in key analytics tools and languages, but in recent years has also included emerging fields like natural language processing, machine learning, other forms of AI-enabled analysis. The most successful data scientists are those who combine their hard skills with an ability to learn quickly, and the ability to effectively communicate the insights they uncover so they'll be meaningful to their business.
With these types of insights, it’s easier for young professionals or students to figure out whether data science would be a good fit for them, and how to acquire the necessary skills. STEM learning is becoming more accessible in schools around the country, but there’s no substitute for a passion for coding and technology, and the ability to learn on the fly.