Implementing and managing ETL processes
Monitoring performance and advising any necessary infrastructure changes
Skills
Proficient understanding of distributed computing principles and data analytics
Understanding of GCP and related big data services
Proficiency with Hadoop, MapReduce, HDFS
Experience designing, implementing, and optimizing end-to-end ETL data pipelines
Good knowledge of big data querying tools, such as Pig, Hive, and Spark SQL
Experience with Spark, PySpark, and Python
Experience integrating data from multiple sources
Experience with NoSQL databases, such as HBase, Cassandra, MongoDB
Experience with Cloudera/MapR