If you're a fresher who has heard about data science, and particularly that people with these skills are in high demand, you may be wondering whether you can get into it, and how. I’ll offer you some insights into this area in a series of posts, of which this is the first.
The starting point is to form a broad plan that suits your current qualifications and skills.
Broadly speaking, the practice of data science relies on the application of three skill areas:
1. Information technology (particularly software development)
2. Linear algebra and statistics. The elements of probability theory would also help.
3. A deep understanding of business processes, methods and perspectives in at least one area of business, such as retail marketing or risk analysis. I've mentioned these two only to give you an idea of the granularity at which the expertise would be helpful; there is almost no limit to the functional areas, in any industry sector, in which data science is required.
Depending on which of these three areas your degree comes closest to, the most practical thing to do would be to select that area as your launch pad, strengthen it, and then systematically acquire the additional skills in the remaining areas over time.
So if you have a degree in computer science or engineering, and have already had an introduction to computing and at least one programming language, that’s what you can build on. Among the most commonly used programming languages in data science are Python and Java, although there are certain areas where C++, C, Julia and Scala are also common. Another of the most popular languages is R, which was designed for statistical computing and graphics and is widely used for analytics, machine learning and visualization, but which may not be in the university curriculum of students in information technology or other branches of engineering.
If you have graduated in science or technology, but have not had any exposure to programming, perhaps the easiest and most beneficial languages to learn would be Python or R, or both. The process of learning R will naturally lead you into learning exercises and applications in data science. Python, however, is a general-purpose language, so after learning how to program in Python you will need to learn specifically how to apply it in data science.
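To give a feel for what "applying Python in data science" can mean at the very beginning, here is a minimal sketch that fits a straight line to a handful of numbers using only Python's standard library. The data (`ad_spend`, `sales`) and the helper `fit_line` are made up for illustration; in practice you would more likely reach for dedicated data science libraries, but the underlying statistics are the same.

```python
import statistics

# Hypothetical, made-up sample: advertising spend and resulting sales.
ad_spend = [10.0, 20.0, 30.0, 40.0, 50.0]
sales = [25.0, 45.0, 65.0, 85.0, 105.0]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(ad_spend, sales)
print(f"mean sales: {statistics.mean(sales):.1f}")
print(f"sales ~ {slope:.2f} * ad_spend + {intercept:.2f}")
```

Even this small example touches all three skill areas listed earlier: a little programming, a little statistics, and a business question (how sales respond to advertising).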
Data science often requires the sourcing, handling, processing and storage of very large volumes of data, so additional courses in Big Data may also be useful. You need to be aware, however, that Big Data technology is a large area of specialised expertise and can become a career option in itself, so you must keep your focus on the goal of eventually becoming a data scientist. A basic ability to work with large data sets is adequate for this goal. At work, the specialists in Big Data technology do the additional work of designing and maintaining big data architectures and environments, and managing their evolution over the life cycle.
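One habit that helps with larger data sets is processing records one at a time rather than loading everything into memory first. The sketch below is only an illustration of that idea, using Python's standard `csv` module. The helper `total_by_key`, the column names `region` and `amount`, and the tiny in-memory sample are all made up for the example; a real file would be opened with `open(...)` instead of `io.StringIO`.

```python
import csv
import io
from collections import defaultdict


def total_by_key(rows, key_field, value_field):
    """Sum value_field per key_field, reading one row at a time."""
    totals = defaultdict(float)
    for row in rows:  # rows can be any iterable of dicts, e.g. csv.DictReader
        totals[row[key_field]] += float(row[value_field])
    return dict(totals)


# Hypothetical sample standing in for a much larger CSV file.
sample = io.StringIO("region,amount\nnorth,10.5\nsouth,4.0\nnorth,2.5\n")
totals = total_by_key(csv.DictReader(sample), "region", "amount")
print(totals)
```

Because the function only ever holds the running totals, not the rows themselves, the same code works whether the file has three lines or many millions.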
To learn a language or technology, you could either sign up for a classroom course or take one of the many courses available on online learning platforms such as Udemy, Coursera, Codecademy, Udacity, or others (in no particular order).
In my next posts, I’ll continue this discussion for science and technology graduates, and then describe what graduates in statistics, mathematics and other disciplines can do to get into data science, so keep watching this space.
Continue Reading: How can a Fresher get into a Data Science career? (Part 2)