Welcome to Managing Big Data on Google's Cloud Platform. This is the second course in a series designed to help you attain the coveted Google Certified Data Engineer certification.
Additionally, the series of courses is going to show you the role of the data engineer on the Google Cloud Platform.
At this juncture, the Google Certified Data Engineer is the only real-world certification for data and machine learning engineers.
NOTE: This is NOT a course on Big Data. This is a course on a specific cloud service called Google Cloud Dataproc. The course was designed to be part of a series for those who want to become data engineers on Google's Cloud Platform.
This course is all about Google's Cloud and migrating on-premises Hadoop jobs to GCP. In reality, Big Data is simply about unstructured data. There are two core types of data in the real world. The first is structured data, the kind found in a relational database. The second is unstructured data, such as a file sitting on a file system. Approximately 90% of all data in the enterprise is unstructured, and our job is to give it structure.
Why do we want to give it structure? We want to give it structure so we can analyze it. Recall that 99% of all applied machine learning is supervised learning. That simply means we have a data set and we point our machine learning models at that data set in order to gain insight into that data.
In the course we will spend much of the time working in Cloud Dataproc. This is Google’s managed Hadoop and Spark platform.
Recall that the end goal of big data is to get the data into a state where it can be analyzed and modeled. Therefore, we are also going to cover how to work on machine learning projects with big data at scale.
Please keep in mind this course alone will not give you the knowledge and skills to pass the exam. The course will provide you with the big data knowledge you need for working with Cloud Dataproc and for moving existing projects to the Google Cloud Platform.
*Five Reasons to take this Course.*
1) The Top Job in the World
The data engineer role is the single most needed role in the world. Many believe that it's the data scientist, but several studies have broken down the job descriptions, and the most needed position is that of the data engineer.
2) Google's the World Leader in Data
Amazon's AWS is the most used cloud and Azure has the best UI, but no cloud vendor in the world understands data like Google. They are the world leader in open source artificial intelligence. You can't be the leader in AI without being the leader in data.
3) 90% of all Organizational Data is Unstructured
The study of big data is the study of unstructured data. As the data in companies grows, most will need to scale to unprecedented levels. Without the cloud, this won't be possible unless they make a significant investment in infrastructure and talent.
4) The Data Revolution is Now
We are in a data revolution. Data used to be viewed as a simple necessity and lower on the totem pole. Now it is more widely recognized as the source of truth. As we move into more complex systems of data management, the role of the data engineer becomes extremely important as a bridge between the DBA and the data consumer. Having moved beyond the ubiquitous spreadsheet and graduated from the RDBMS (which will always have a place in the data stack), we now work with NoSQL and Big Data technologies.
5) Data is the Foundation
Data engineers are the plumbers building a data pipeline, while data scientists are the painters and storytellers giving meaning to an otherwise static entity. Simply put, data engineers clean, prepare and optimize data for consumption. Once the data becomes useful, data scientists can perform a variety of analyses and visualization techniques to truly understand the data, and eventually, tell a story from the data.
Thank you for your interest in Managing Big Data on Google's Cloud Platform and we will see you in the course!!