on 02 July 21
With apologies for a bit of a delay in getting around to this next post, this time I'll share my ideas on how graduates in the mathematics or statistics disciplines can get on the road towards a career in data science.
In all probability (pun unintended), graduates in statistics, mathematics or other quantitative subjects such as optimization or operations research may already have many of the knowledge competencies required to become a data scientist.
Data science requires an understanding of at least the fundamentals of linear algebra, graphs, distributions and probability theory. Graduates in statistics, mathematics and related areas usually go a lot deeper into these subjects than is actually required, covering topics such as the derivation of various equations, integral calculus and so on. However, it is precisely this depth that (hopefully) gives these graduates a very sound understanding of what these subjects mean as descriptions of real life and events: a feel for mathematics that goes beyond numbers towards applied meaning. An example is what the normal distribution represents, what a standard deviation means, and the implications of a tall, narrow normal distribution (a small standard deviation, with values clustered tightly around the mean) versus a flatter, broader one (a large standard deviation, with values more spread out).
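To make the "tall and narrow versus flat and broad" idea concrete, here is a minimal sketch using Python's standard-library `statistics.NormalDist`. The two standard deviations (0.5 and 2.0) and the one-unit window around the mean are arbitrary illustrative choices, not values from this post.

```python
from statistics import NormalDist
from typing import Tuple


def describe(mu: float, sigma: float, window: float = 1.0) -> Tuple[float, float]:
    """Return (peak density, probability of landing within +/- window of the mean)."""
    dist = NormalDist(mu, sigma)
    # Height of the bell curve at its centre, which equals 1 / (sigma * sqrt(2 * pi)).
    peak = dist.pdf(mu)
    # Share of all outcomes that fall inside the window around the mean.
    within = dist.cdf(mu + window) - dist.cdf(mu - window)
    return peak, within


# Illustrative example: same mean, two different standard deviations.
narrow = describe(mu=0.0, sigma=0.5)  # tall and narrow
broad = describe(mu=0.0, sigma=2.0)   # flat and broad

for label, (peak, within) in [("narrow (sigma=0.5)", narrow), ("broad (sigma=2.0)", broad)]:
    print(f"{label}: peak density {peak:.3f}, P(|x - mean| < 1) = {within:.3f}")
```

Because the peak height is inversely proportional to the standard deviation, the narrow curve stands four times taller. It also keeps roughly 95% of its values within one unit of the mean, compared with roughly 38% for the broad curve.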
Along with these, a statistician would also understand sample sizes and how much data is needed to draw sound conclusions about regressions, correlations and the other patterns that data scientists are usually looking for. They would also appreciate the techniques used in, and the amount of work that goes into, data preparation, standardization and normalization. They may also know a lot about data visualization using packages or high-level statistical analysis languages such as R.
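As a small illustration of standardization versus normalization, the sketch below uses the two meanings most commonly attached to these terms: z-score standardization (mean 0, standard deviation 1) and min-max normalization (rescaling into [0, 1]). That these are the meanings intended above is an assumption, and the sample numbers are hypothetical. It uses the population standard deviation (`pstdev`); a sample standard deviation (`stdev`) is an equally valid choice in many settings.

```python
from statistics import mean, pstdev
from typing import List


def standardize(values: List[float]) -> List[float]:
    """Z-score standardization: shift to mean 0 and scale to (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant column")
    return [(v - mu) / sigma for v in values]


def min_max_normalize(values: List[float]) -> List[float]:
    """Min-max normalization: rescale linearly so the smallest value is 0 and the largest is 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant column")
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical sample data, e.g. monthly order counts for five customers.
orders = [12.0, 15.0, 9.0, 30.0, 14.0]
print("standardized:", [round(z, 2) for z in standardize(orders)])
print("normalized:  ", [round(x, 2) for x in min_max_normalize(orders)])
```

Standardization keeps the shape of the data but puts different columns on a comparable scale. Min-max normalization bounds every value to a fixed range, which makes it more sensitive to outliers such as the 30 in this sample.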
So then, what's left for them to learn? That's where we go back to the other two components that combine with the first (statistics) to produce a data scientist: the ability to write code in at least one programming language, and knowledge of at least one business domain. But why is coding ability beyond the kind used in Matlab, SAS or R required? Because, more often than not, working with data to produce business results involves moving through multiple phases such as data extraction and cleaning, manipulation, processing or crunching, analysis and a lot more. All of this is about transforming data into information through a series of manipulations and calculations, and many of these steps have nothing to do with statistics. They are about more mundane tasks such as extracting data, parsing it, populating databases, or performing any of a wide range of linguistic or computational operations on data. A general-purpose language such as Python or Java also offers a wide variety of ready-made functions for data transformation, packaged in open source libraries, and that by itself is a strong reason to learn one of these languages.
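To give a feel for the "mundane" non-statistical work described above, here is a minimal extract, clean and load sketch using only Python's standard library (`csv`, `io`, `sqlite3`). The raw data, the column names and the cleaning rules (trim whitespace, normalize case, drop rows with a missing or non-numeric amount) are all hypothetical choices for illustration; real pipelines will differ.

```python
import csv
import io
import sqlite3

# Hypothetical raw export with inconsistent casing, stray whitespace and a missing value.
RAW = """region,product,amount
 North ,widget,120.50
south,Widget, 80
North,gadget,
SOUTH , gadget ,45.25
"""


def extract(text):
    """Parse CSV text into a list of dicts, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Trim whitespace, normalize case, and drop rows whose amount is missing or not numeric."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except (TypeError, ValueError):
            continue  # skip rows we cannot interpret rather than guessing a value
        cleaned.append((row["region"].strip().title(), row["product"].strip().lower(), amount))
    return cleaned


def load(conn, records):
    """Populate a database table with the cleaned records."""
    conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)


conn = sqlite3.connect(":memory:")
load(conn, clean(extract(RAW)))
query = "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
for region, total in conn.execute(query):
    print(region, total)
```

None of these steps involves any statistics, yet without them the regional totals at the end would be wrong: " North " and "North", or "south" and "SOUTH ", would otherwise be counted as different regions.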
There are a number of online courses available, or one could simply learn a language from a book and practice the exercises on a computer. The level to start at depends on one's existing competence, and that is perhaps where the range of online courses offers more options.
When it comes to learning the business, the best place to do it is at work, by teaming up with business domain experts and listening to their interpretations of data while you apply all your statistical, mathematical and computing knowledge to crunch numbers for them. Of the three components, building a solid understanding of at least one business domain or process area usually takes the longest, and real depth of understanding comes from sticking to one domain and working within it for years. But, fortunately, you won't have to wait that long to be called a data scientist!
So once you have some new computer skills under your belt, it's time to start applying for that first entry-level job: perhaps one that requires you to extract and prepare the data a data scientist will need, or one that requires you to help prepare data visualizations for others. Or you might be introduced to a path that strengthens your big data technology capabilities by having you develop big data pre-processors. The starting avenues are many, and once you're in, you just have to remember to keep combining all your capabilities. That's what will make you a better and better data scientist over time.