This post continues from Part 1, where I began sharing my thoughts on how to get into a career in data science, starting with what might work for a graduate in computer science or any other branch of engineering.
I’ve met data science aspirants from these disciplines who are able to code in Python, R or one of the other popular languages used in this field, as well as those who are not able to code. As I’ve mentioned earlier, being able to code is the first step for them.
But what next? If you go online, there’s a bewildering range of courses of all kinds available, many of them seeming to be good ways to get introduced to data science. Some begin with machine learning theory, some begin with machine learning using a programming language, and some begin with the basics of the kind of mathematics underlying data science techniques. It can get quite confusing for the uninitiated, so here’s my two cents.
Remember that at this stage what you want to do is get on a path towards becoming a data scientist, not try to become one overnight. The theory behind machine learning, as well as the mathematics used in it, can get to pretty advanced levels, and you may not really need to know all of it to reach your first goal, which is to get an entry-level job in data science.
For young aspirants, that first entry-level job may be as a machine learning developer or as a data analyst, or perhaps a role that combines tasks from both. If you are already an accomplished Python (or R, etc.) programmer, then the next step is to learn how to code machine learning algorithms and programs using that language.
Fortunately, a lot of the commonly used machine learning data pre-processing techniques and computation algorithms have already been made available in open source libraries such as scikit-learn. There are also libraries for a range of other areas such as NLP and image processing. These implement the theory and algorithms very well, so most of the time you will not need to write the code for the most commonly used machine learning algorithms yourself. You just need to learn how to use these libraries. This is why, in an entry-level job, you don’t really need to learn a whole lot of machine learning theory just yet. The decisions about which functions to use, what argument values to pass, and so on will very likely be made by a technical lead or a lead data scientist, so there is no need to learn all the theory now unless you are sure you can cope with it at this stage.
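To give a feel for what "learning how to use these libraries" looks like, here is a minimal sketch in Python using scikit-learn. The dataset (the small iris sample that ships with the library), the choice of feature scaling plus logistic regression, and the argument values are my own assumptions, picked purely for illustration. On a real project those choices would typically come from your technical lead.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load a small example dataset bundled with scikit-learn (illustrative choice).
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the rows so the model is checked on data it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Chain a pre-processing step (feature scaling) with a ready-made classifier.
# Neither algorithm is written by hand; the library supplies both.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))
model.fit(X_train, y_train)

# score() reports the fraction of held-out rows that were classified correctly.
accuracy = model.score(X_test, y_test)
print(f"Accuracy on held-out data: {accuracy:.2f}")
```

Notice that the whole thing is a handful of library calls. The programming skill lies in loading the data, wiring the steps together and checking the result, not in implementing the mathematics.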
There are several courses that you could take to learn how to use a language you already know to produce machine learning programs without having to learn everything about the science behind them.
One of the most popular is Machine Learning A-Z™: Hands-On Python & R in Data Science. This course covers the implementation of all the commonly used machine learning algorithms in both Python and R, and it’s entirely up to you if you want to learn both.
If you’re happy to just learn how to implement those programs in Python, there’s Python for Data Science and Machine Learning Bootcamp.
You could even do it the good old-fashioned way and just buy a book that covers the same content.
There are similar courses available online that cover equivalent ground for other languages as well, so just choose the one for the language you are comfortable with. It would be a good idea, of course, to do a quick browse through some online job ads in your desired work location to find out what language most employers are using for their data science work.
That’s it for this post. In my next one I’ll talk about what comes next. Later on, I’ll also cover what I recommend as good paths for graduates in statistics/mathematics, and for those who come from non-STEM (science, technology, engineering and mathematics) backgrounds.