Requirements:
- Experience building and optimizing data pipelines using PySpark.
- Experience with big data tools: Hadoop, HDFS, PySpark, Hive, Kafka, YARN.
- Good understanding of Python programming.
- Good understanding of Spark's internal architecture.
- Working knowledge of at least one scheduler tool (Airflow, Oozie, TWS, or Autosys).
- Experience performing root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement.
- Strong analytical skills for working with structured and unstructured datasets.
- Experience building processes that support data transformation, data structures, metadata, dependency management, and workload management.
- A successful history of manipulating, processing, and extracting value from large, disconnected datasets.
- Advanced working knowledge of SQL and relational databases, including query authoring and familiarity with a variety of database systems.
- Working knowledge of message queuing, stream processing, and highly scalable big data stores.
- Strong project management and organizational skills.
- Experience supporting and working with cross-functional teams in a fast-paced, dynamic environment.