Becoming a data scientist - or at least learning some data skills - is an enticing proposition in a world where jobs increasingly demand that applicants be "data-driven" and companies are pushing to increase productivity through big data and AI.
But how can you learn the skills you need to get where you'd like to go? That's a more difficult question. Working in data science requires technical skills and specialized knowledge, and it's easy to get lost, distracted, or frustrated during the learning process.
If you want to be successful, taking some time at the outset to plan the right approach works wonders. So how can you become a data scientist? Here's how to get started.
Know what you're getting into
What is the job?
Data science has been called the sexiest job of the 21st century, but the reality of life as a data scientist is a bit more mundane than that suggests. While data scientists spend time tinkering with cool machine-learning algorithms, they typically spend a lot more time cleaning and preparing data.
Daily, much of your time will be spent writing code to acquire and format your data and handling missing values and other inconsistencies. You'll also spend a fair amount of time writing reports and (often) trying to convince other team members to act based on the patterns you've found in the data.
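To make that concrete, here is a minimal sketch of what a small cleaning task can look like. The data, the column names, and the choice to fill missing ages with the median are all made up for illustration, and it uses only Python's standard library so it runs anywhere. In practice, many people would reach for a library like Pandas instead.

```python
import csv
import io
import statistics

# Hypothetical raw export: inconsistent casing, stray whitespace, missing ages.
RAW_CSV = """name,city,age
Alice, new york ,34
bob,Chicago,
CAROL,chicago,29
Dave,New York,N/A
"""

# Spellings we choose to treat as "missing" (an assumption for this example).
MISSING = {"", "n/a", "na", "null"}


def clean_rows(raw_text):
    """Return rows with tidy text fields and missing ages filled with the median."""
    rows = list(csv.DictReader(io.StringIO(raw_text)))
    for row in rows:
        # Normalize whitespace and capitalization in the text columns.
        row["name"] = row["name"].strip().title()
        row["city"] = row["city"].strip().title()
        # Convert age to an int, or mark it as missing.
        age = row["age"].strip().lower()
        row["age"] = None if age in MISSING else int(age)
    # Fill missing ages with the median of the known ones.
    known_ages = [row["age"] for row in rows if row["age"] is not None]
    median_age = statistics.median(known_ages)
    for row in rows:
        if row["age"] is None:
            row["age"] = median_age
    return rows


cleaned = clean_rows(RAW_CSV)
for row in cleaned:
    print(row)
```

Filling with the median is only one option. Depending on the question you're answering, dropping incomplete rows or flagging them for review may be the better choice.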
Being a data scientist pays well, of course, and so do other jobs in the data science field, like data analyst and data engineer. But it's worth spending some time experimenting with simple data analysis in a free tool like Google Sheets, or taking a free Python course, to be sure you enjoy working with data.
What do you need to learn?
How much you need to learn depends on what your aspirations are. Do you want to become a full-on data scientist? Or work as a data analyst? Or just add some data skills to your resume? Generally speaking, you'll need to learn the following skills:
- The fundamentals of a programming language (Python or R).
This includes the "base" language and popular packages/libraries for data work. For example, if you choose Python, you'd need to be familiar with Pandas, the most popular Python library for working with data. If you're just looking to add some data skills to your resume, just learning this will enable you to do some cool things with data.
- The fundamentals of SQL.
SQL is decades old at this point, but so much of today's data is stored in SQL databases that being able to query data with SQL is an absolute necessity for anyone looking for any kind of professional role in data science.
- Statistics.
If you don't have a math background, you'll also need to brush up on your knowledge of probability and statistics to ensure that your analyses are statistically valid.
- Machine learning.
Data analysts don't need this skill, but if you want to work as a data scientist, you'll need to have experience building and evaluating machine learning models. You'll also need to be familiar with the popular machine learning tools used in your language of choice (so if you choose Python, you'll need to be comfortable using scikit-learn, for example).
- Domain knowledge.
Working in data science means answering business questions; to do that effectively, you must understand the business. You may already have this knowledge, and it can sometimes be developed on the job. But if you want to work in a particular industry, learning about its common problems and how data science can be applied to resolve them is key.
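As a small taste of the SQL skill from the list above, here is a sketch that runs a typical "summarize by group" query from Python using the built-in sqlite3 module. The orders table and its rows are hypothetical, invented purely for this example.

```python
import sqlite3

# In-memory database with a hypothetical "orders" table (made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("Ana", "East", 120.0),
        ("Ben", "West", 80.0),
        ("Ana", "East", 60.0),
        ("Cy", "West", 200.0),
        ("Dee", "East", 40.0),
    ],
)

# Order count, total, and average order value per region, largest total first.
query = """
    SELECT region, COUNT(*) AS n_orders, SUM(amount) AS total, AVG(amount) AS average
    FROM orders
    GROUP BY region
    ORDER BY total DESC
"""
results = conn.execute(query).fetchall()
for region, n_orders, total, average in results:
    print(f"{region}: {n_orders} orders, total {total:.2f}, average {average:.2f}")

conn.close()
```

The same SELECT, GROUP BY, and ORDER BY pattern carries over to most SQL databases, though each one has its own dialect details.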
Learn data science the right way
Now that you know what you need to learn, the next step is to start learning it! There are various ways to do that, ranging from a university degree (time-consuming and costly) to boot camps (quick but expensive) to self-paced online learning. Within the online realm, there are traditional lecture-based options like Coursera and interactive learning platforms like Dataquest, which teach coding by having you actually write code.
There's no single "right" option for everyone, and most data scientists have learned (and continue to learn) from a variety of different resources. What's best for you depends on personal factors like your budget, your free time, and how you prefer to learn.
That said, there are a few key factors to keep in mind that can help make your learning more efficient:
Learn by doing
When it comes to learning by watching versus learning by doing, the science has been settled for decades. For example, one meta-study of over 200 different studies of STEM education found that STEM students who learned passively were 1.5 times as likely to fail.
That means that however you choose to learn data science, it's critically important to regularly apply what you've learned by actually doing data science. Some platforms will take care of this by asking you to write code in your browser. But if you're taking a video-based course (for example), it's important to remember to open up your IDE or notebook software and write real code to practice what you've learned after each lesson.
Don't skip the "boring" stuff
Some parts of data science are more alluring than others --- who doesn't want to learn to build cool "AI" machine learning models? --- but be sure that you aren't overly focused on those things to the detriment of more fundamental skills.
This is important because "boring" stuff like data cleaning is the foundation upon which all of the "sexier" data science work rests. Even a well-coded machine learning model will perform poorly (or not at all) if it's fed dirty, poor-quality data.
This is something to keep an eye out for on whatever learning platform you choose. Some boot camps and other programs focus a lot on machine learning because that attracts students, but fail to effectively cover less exciting but still necessary skills like SQL.
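To see how much dirty data can hurt even a simple model, consider this sketch. It fits a straight line (ordinary least squares, written out by hand) to a tiny made-up dataset twice: once with a placeholder value of -999 left in, and once after dropping it. The numbers, the variable names, and the -999 sentinel are assumptions chosen for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up data that follows y = 2x + 1 exactly, except one reading was
# recorded as the placeholder -999 (a common "missing value" sentinel).
hours = [1, 2, 3, 4, 5, 6]
sales = [3, 5, 7, -999, 11, 13]

# Fit on the raw data, sentinel included.
dirty_slope, dirty_intercept = fit_line(hours, sales)

# Cleaning step: drop rows where the sentinel appears, then fit again.
pairs = [(x, y) for x, y in zip(hours, sales) if y != -999]
clean_hours, clean_sales = zip(*pairs)
clean_slope, clean_intercept = fit_line(clean_hours, clean_sales)

print(f"dirty fit: slope={dirty_slope:.2f}, intercept={dirty_intercept:.2f}")
print(f"clean fit: slope={clean_slope:.2f}, intercept={clean_intercept:.2f}")
```

With the sentinel left in, the fitted slope comes out negative even though every real reading rises with hours. After the one-line cleaning step, the fit recovers the underlying trend. The modeling code is identical in both cases; only the data changed.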
Build projects as you learn
Along with working through exercises to practice specific concepts by writing code as you learn, you'll want to start trying to build real data projects as early as possible in your studies. You'll probably want to pick guided projects or tutorials to work through at first, but as you learn more, you can branch out into totally original projects.
Building these sorts of projects will help you develop problem-solving skills --- you'll learn quickly how critical resources like Google and StackOverflow are for data science professionals. You'll also be practicing and applying the programming and statistics skills you're learning and, at the same time, developing a portfolio of work that you'll eventually be able to use on job applications.
How to get a job in data
When you've acquired the skills you need, it's time to start applying for jobs! This process varies greatly based on your country of residence, the types of jobs you're interested in, and your background. But again, there are a few important principles that will help ensure you're successful in your job hunt:
Prove your skills with projects
The reality of any job search is that employers aren't going to pay you to do something you've never done before. And without prior working experience in data science, a project portfolio is the only way for new data scientists to prove their skills.
At a bare minimum, this means an active-looking GitHub with data science projects that employers can look at. The more relevant these projects are to the jobs you're applying for, the more convincing they're likely to be. So if you're looking for a data job in marketing, you'll definitely need at least a few projects that use marketing data or address common problems in the field of marketing.
Quality over quantity
Applying for a job can often feel like a numbers game, but you'll generally get better results if you focus on the jobs that look like the best fit for your skills and put more time into those key applications. At a minimum, this means that your resume should be tailored to that specific job --- highlight the skills they want to see first and foremost and showcase projects relevant to their industry.
But the more time and effort you put into an application, the higher your chances of being rewarded are. For example, researching a company's data problems and reaching out to a real person with a customized resume and cover letter is more likely to produce a result than clicking "Easy Apply" on their LinkedIn job posting.
And, of course, while applying and making connections online can be great, there's no substitute for putting your face in front of people who might be interested in hiring you. Attending conferences, talks, and meetups can pay dividends, although it's often a long-term investment.
Focus on business results
In all your application materials and interviews, remember that the people hiring you (generally) aren't data scientists or programmers. What's likely to be important to them is whether or not your work can provide them with meaningful, actionable business insights, so you should frame your projects with that in mind.
Focus on communicating your results clearly (visualizing data in nice-looking charts is always better than sharing raw numbers) and explaining what the results mean in a business context. The CEO probably doesn't care about how your recommendation model works, for example, but she probably does care about what impact it would likely have on sales.
Becoming a data scientist isn't easy, but it's also not as challenging as it's sometimes made out to be, particularly if you take the right approach. Hopefully, this big-picture guide has given you an idea of how to start your path toward becoming a data scientist or simply adding some data skills to your skill set.
This is a guest post written by Dataquest, an online data science education platform that teaches data science from scratch using real data and challenging you to write and run real code from your browser.