Immediate need: PySpark (with Hadoop) Developer @ OH (Remote) on C2C/W2

Location Columbus, United States
Posted 09-September-2021


5+ years of experience handling data warehousing and business intelligence projects in the banking, finance, credit card, and insurance industries.
Designed and developed real-time streaming pipelines for sourcing data from IoT devices, and defined strategy for data lakes, data flow, retention, aggregation, and summarization to optimize the performance of analytics products.
Extensive experience in data analytics.
Good knowledge of Hadoop architecture and its ecosystem.
Extensive knowledge of Hadoop technology, with experience in storage, writing queries, and processing and analysis of data.
Experience migrating on-premises ETL processes to the cloud.
Experience working with various Hadoop file formats.
Experience in data warehousing applications, responsible for the extraction, transformation, and loading (ETL) of data from multiple sources into the data warehouse.
Experience optimizing Hive SQL queries and DataStage and Spark jobs.
Implemented various frameworks such as Data Quality Analysis, Data Governance, Data Trending, Data Validation, and Data Profiling using Spark, Python, and DB2.
Experience creating technical documentation: functional requirements, impact analysis, technical design documents, and data flow diagrams (with MS Visio).
Experience delivering highly complex projects using Agile and Scrum methodologies.
Quick learner who stays up to date with industry trends; excellent written and oral communication, analytical, and problem-solving skills; good team player who is well organized and able to work independently.


Design and develop ETL integration patterns using Python on Spark.
Develop a framework for converting existing DataStage mappings to PySpark (Python and Spark) jobs.
Create PySpark DataFrames to bring data from DB2.
Translate business requirements into maintainable software components and understand their technical and business impact.
Provide guidance to the development team working on PySpark as an ETL platform.
Optimize the PySpark jobs to run on a Kubernetes cluster for faster data processing.
Provide workload estimates to the client.
Migrate on-premises ETL processes to AWS cloud and Snowflake.
Implement a CI/CD (Continuous Integration and Continuous Delivery) pipeline for code deployment.
Review components developed by team members.

Required Skills : PySpark, Hadoop, DataStage/SSIS, DB2
Basic Qualification :
Additional Skills :
Background Check :Yes
Drug Screen :Yes
Notes :Can sit 100% remote, please send me anyone close!
Selling points for candidate :
Project Verification Info :
Candidate must be your W2 Employee :No
Exclusive to Apex :No
Face to face interview required :No
Candidate must be local :No
Candidate must be authorized to work without sponsorship :No
Interview times set :No
Type of project :Development/Engineering
Master Job Title :DBA: Other
Branch Code :Columbus
