Becoming a data scientist --- or at least learning some data skills --- is an enticing proposition in a world where jobs increasingly demand that applicants be "data-driven" and companies are pushing to increase productivity through big data and AI.
But how can you learn the skills you need to get where you'd like to go? That's a more difficult question. Working in data science requires technical skills and specialized knowledge, and it's easy to get lost, distracted, or frustrated during the learning process.
If you want to be successful, taking some time at the outset to plan the right approach works wonders. So how can you become a data scientist? Here's how to get started:
Know what you're getting into
What is the job?
Data science has been called the sexiest job of the 21st century, but the reality of life as a data scientist is a bit more mundane than that suggests. While a data scientist does spend time tinkering with cool machine learning algorithms, they typically spend a lot more time cleaning and preparing data.
Day to day, a lot of your time will be spent writing code to do things like acquire and format your data and to handle missing values and other inconsistencies within it. You'll also spend a fair amount of time doing things like writing reports and (often) trying to convince other members of your team to act based on the patterns you've found in the data.
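To make that concrete, here is a minimal sketch of what routine cleaning code can look like. It assumes Python with the pandas library installed. The table, its column names (`city`, `sales`), and the `clean_sales` helper are all made up for illustration, and dropping rows with missing values is only one of several reasonable choices.

```python
import pandas as pd

# Hypothetical raw data with common problems: stray whitespace,
# inconsistent capitalization, a missing value, and a duplicate row.
raw = pd.DataFrame(
    {
        "city": [" boston", "Boston ", "chicago", "Chicago", "chicago"],
        "sales": ["100", "250", None, "300", "300"],
    }
)


def clean_sales(df):
    """Return a cleaned copy of the hypothetical sales table."""
    out = df.copy()
    # Normalize text so " boston" and "Boston " count as the same city.
    out["city"] = out["city"].str.strip().str.title()
    # Convert sales to numbers; anything unparseable becomes NaN.
    out["sales"] = pd.to_numeric(out["sales"], errors="coerce")
    # Drop rows with no sales figure rather than guessing a value.
    out = out.dropna(subset=["sales"])
    # Remove exact duplicate rows left after normalization.
    return out.drop_duplicates().reset_index(drop=True)


cleaned = clean_sales(raw)
print(cleaned)
```

In a real project you would decide case by case whether missing values should be dropped, filled in, or flagged for follow-up.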
Being a data scientist pays well, of course, and so do other jobs in the data science field like data analyst and data engineer. Before you commit, though, it's worth spending some time experimenting with simple data analysis in a free tool like Google Sheets, or taking a free Python course, to be sure you actually enjoy working with data.
What do you need to learn?
How much you need to learn depends on what your aspirations are. Do you want to become a full-on data scientist? Or work as a data analyst? Or just add some data skills to your resume? Generally speaking, though, you'll need to learn the following skills:
1. The fundamentals of a programming language (either Python or R). This includes both the "base" language and popular packages/libraries for data work. For example, if you choose Python, you'd need to be familiar with the basics of Python and with how to use pandas, the most popular Python library for working with data. If you only want to add some data skills to your resume, this step alone will enable you to do some very cool things with data.
2. The fundamentals of SQL. SQL is decades old at this point, but so much of today's data is stored in SQL databases that being able to query data with SQL is an absolute necessity for anyone looking for any kind of professional role in data science.
3. Statistics. If you don't have a math background, you'll also need to brush up on your probability and statistics knowledge to ensure that your analyses are statistically valid.
4. Machine learning. Data analysts don't need this skill, but if you want to work as a data scientist, you'll need to have experience building and evaluating machine learning models. You'll also need to be familiar with the popular machine learning tools used in your language of choice (so if you choose Python, you'll need to be comfortable using scikit-learn, for example).
5. Domain knowledge. Working in data science means answering business questions, and to do that effectively, you have to understand the business. You may already have this knowledge, and it can sometimes be developed on-the-job, but if you know you want to work in a particular industry, learning about the common problems in that industry and how data science can be effectively applied to resolve them is key.
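As a rough illustration of item 2, the sketch below runs a basic SQL query from Python using the standard library's `sqlite3` module. The `orders` table and its rows are hypothetical, and a real job would typically point at an existing company database rather than an in-memory one.

```python
import sqlite3

# In-memory database holding a hypothetical "orders" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("ana", 20.0), ("ben", 35.5), ("ana", 14.5), ("cy", 8.0)],
)

# Total spend per customer, largest first.
query = """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY total DESC
"""
totals = conn.execute(query).fetchall()
conn.close()

for customer, total in totals:
    print(customer, total)
```

`SELECT`, `GROUP BY`, and `ORDER BY` cover a surprising share of everyday questions, which is part of why SQL fundamentals are worth learning early.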
Learn data science the right way
Now that you know what you need to learn, the next step is to start learning it! There are a variety of ways to do that, ranging from a university degree (time-consuming and very expensive) to bootcamps (quick but expensive) to self-paced online learning. Within the online realm, there are traditional lecture-based options like Coursera and interactive learning platforms like Dataquest that teach coding by having you actually write code.
There's no single "right" option for everyone, and most data scientists have learned (and continue to learn) from a variety of different resources. What's best for you will depend on personal factors like your budget, how much free time you have, and how you prefer to learn.
That said, there are a few key factors to keep in mind that can help make your learning more efficient:
1. Learn by doing
When it comes to learning by watching versus learning by doing, the evidence strongly favors doing. One meta-analysis of more than 200 studies of STEM learning, for example, found that students who learned passively were 1.5 times as likely to fail as students who learned actively.
That means that however you choose to learn data science, it's critically important to regularly apply what you've learned by actually doing data science. Some platforms take care of this for you by asking you to write code in your browser. If you're taking a video-based course, though, remember to open up your IDE or notebook software of choice after each lesson and write real code to practice what you've learned.
2. Don't skip the "boring" stuff
Some parts of data science are more alluring than others --- who doesn't want to learn to build cool "AI" machine learning models? --- but be sure that you aren't overly focused on those things to the detriment of more fundamental skills.
This is important because "boring" stuff like data cleaning is the foundation upon which all of the "sexier" data science work rests. Even a well-coded machine learning model will perform poorly (or not at all) if it's fed dirty, poor-quality data.
This is something to keep an eye out for on whatever learning platform you choose as well. Some bootcamps and other programs focus a lot on machine learning because that's what attracts students, but fail to effectively cover less exciting but still totally necessary skills like SQL.
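To see how dirty data can quietly wreck a result, consider this small sketch using Python's standard `statistics` module. The readings are invented, and the use of `-999` as a placeholder for "missing" is an assumption made for this example (though sentinel values like it do turn up in real datasets).

```python
import statistics

# Hypothetical sensor readings where -999 was recorded
# as a placeholder whenever a measurement was missing.
MISSING = -999
readings = [21.5, 22.0, MISSING, 21.0, 22.5, MISSING, 21.0]

# Averaging without cleaning treats the placeholders as real values.
naive_mean = statistics.mean(readings)

# Filtering the placeholders out first gives a sensible average.
valid = [r for r in readings if r != MISSING]
clean_mean = statistics.mean(valid)

print(naive_mean)  # -270.0
print(clean_mean)  # 21.6
```

Any model trained on the uncleaned column would inherit the same distortion, however well the model itself is coded.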
3. Build projects as you learn
Along with working through exercises to practice specific concepts by writing code as you learn, you'll want to start trying to build real data projects as early as possible in your studies. You'll probably want to pick guided projects or tutorials to work through at first, but as you learn more, you can branch out into totally original projects.
Building these sorts of projects will help you develop problem-solving skills, and you'll quickly learn how critical resources like Google and Stack Overflow are for data science professionals. You'll also be practicing and applying the programming and statistics skills you're learning while developing a portfolio of work that you'll eventually be able to use on job applications.
How to get a job in data
When you've acquired the skills you need, it's time to start applying for jobs! This process varies a lot based on your country of residence, the types of jobs you're interested in, and your own personal background. But again, there are a few important principles that will help ensure you're successful in your job hunt:
1. Prove your skills with projects.
The reality of any job search is that employers aren't going to pay you to do something you've never done before. And without prior working experience in data science, there's really only one way for a new data scientist to prove their skills: a project portfolio.
At a bare minimum, this means an active-looking GitHub profile with data science projects that employers can look at. The more relevant these projects are to the jobs you're applying for, the more convincing they're likely to be. So if you're looking for a data job in marketing, you'll definitely need at least a couple of projects that use marketing data or that address common data questions in marketing.
2. Quality over quantity.
Applying to jobs can often feel like a numbers game, but generally, you'll get better results if you focus on the jobs that look like the best fit for your skills and put more time into those key applications. At a minimum, this means that your resume should be arranged with that specific job in mind --- highlight the skills they want to see first and foremost, and showcase projects that are relevant to their industry.
But the more time and effort you put into an application, the better your chances of being rewarded. For example, researching a company's data problems and then reaching out to a real person with a customized resume and cover letter is more likely to produce a result than clicking "Easy Apply" on their LinkedIn job posting.
And of course, while applying and making connections online can be great, there's really no substitute for putting your face in front of people who might be interested in hiring you. Attending conferences, talks, and meetups can pay dividends, although it's often a long-term investment.
3. Focus on business results.
In all of your application materials and in your interviews, remember that the people hiring you (generally) aren't data scientists or programmers. What's likely to be important to them is whether or not your work can provide them with meaningful, actionable business insights, so you should frame your projects with that in mind.
Focus on communicating your results clearly (visualizing data in nice-looking charts is always better than sharing raw numbers) and explaining what the results mean in a business context. The CEO probably doesn't care about how your recommendation model works, for example, but she probably does care about what impact it would be likely to have on sales.
Becoming a data scientist isn't easy, but it's also not as challenging as it's sometimes made out to be, particularly if you take the right approach. Hopefully, this big-picture guide has given you an idea of how to get started on your path towards becoming a data scientist or simply adding some data skills to your skill set.
This is a guest post written by Dataquest, an online data science education platform that teaches data science from scratch using real data and challenges you to write and run real code right in your browser.