Data Engineer with AWS Glue ETL / Apache Spark
Universal Music Publishing Group is the world’s leading music company and the home for music’s greatest artists, innovators and entrepreneurs. It owns and operates a broad array of businesses engaged in recorded music, music publishing, merchandising, and audiovisual content in more than 60 countries.
UMPG is also a global leader in Classical, Christian/Gospel, and Production Music. The company plays a major role in film and TV, providing creative and synch licensing services, and administration for such companies as Warner Bros. Entertainment, Universal Pictures, HBO, DreamWorks Animation, NBC Universal TV, and Sesame Workshop, among others.
The Data Engineer will be part of a cross-disciplinary team, typically working with large, complex data sets.
Here is some of what you’ll need:
Strong hands-on experience with data modelling and designing ETL pipelines and solutions.
Hands-on working experience with the AWS Glue ETL service, plus good knowledge of at least one other ETL tool (Informatica, Talend, etc.)
Working experience with SQL and databases (e.g. MySQL, Postgres, Redshift), implementing data models.
Expertise with the Python language, Apache Spark, and Pandas.
Understanding and technical knowledge of AWS services such as EC2 and S3, with prior use of these technologies in delivered projects.
Thorough understanding of the concepts of Continuous Integration, Delivery, and Deployment (CI/CD).
Ability to look across multiple systems, understand the purpose of each, and define data requirements per system.
Identify downstream implications of data loads/migrations (e.g. data quality, regulatory, etc.)
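To give a concrete flavour of the ETL pipeline work described above, here is a minimal extract-transform-load sketch in plain Python. It uses the stdlib sqlite3 module as a stand-in for a warehouse such as Postgres or Redshift, and all table, column, and feed names are hypothetical; a production job would typically run in AWS Glue against data in S3.

```python
import csv
import io
import sqlite3

# Hypothetical raw feed; in a real pipeline this would be read from S3 by a Glue job.
RAW_CSV = """track_id,title,streams
1,Song A,1200
2,Song B,950
2,Song B,950
3,Song C,430
"""

def extract(raw):
    """Parse the raw CSV feed into a list of dicts."""
    return list(csv.DictReader(io.StringIO(raw)))

def transform(rows):
    """Cast types and drop exact duplicate records."""
    seen, out = set(), []
    for r in rows:
        key = (r["track_id"], r["title"], r["streams"])
        if key in seen:
            continue  # de-duplicate identical rows
        seen.add(key)
        out.append({"track_id": int(r["track_id"]),
                    "title": r["title"],
                    "streams": int(r["streams"])})
    return out

def load(rows, conn):
    """Load the cleaned rows into a (stand-in) warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS track_streams "
                 "(track_id INTEGER PRIMARY KEY, title TEXT, streams INTEGER)")
    conn.executemany("INSERT OR REPLACE INTO track_streams "
                     "VALUES (:track_id, :title, :streams)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
count = conn.execute("SELECT COUNT(*) FROM track_streams").fetchone()[0]
print(count)  # the duplicate row is dropped, leaving 3 rows
```

The same extract/transform/load split carries over directly to Glue, where the extract and load steps become DynamicFrame reads and writes and the transform step runs on Spark.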
Here is a glimpse of what you’ll do:
Understand, update, and maintain the data model and data architecture
Research and develop strategies to extract added value from existing data sets
Implement data analytics modules, design data models, and identify patterns
Automate tasks end to end with full respect for ETL monitoring and data governance
Build and maintain data quality and data traceability
Implement data extraction and transformation using MPP (Massively Parallel Processing)
Prepare data marts and load data into databases (SQL and NoSQL) or the Business Intelligence (BI) layer
Work with high volumes of data, applying de-duplication approaches and incremental delta updates
Build and maintain data pipelines
Manage cluster resources (CPU, RAM, network I/O, etc.)
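One responsibility above, incremental delta updates with de-duplication, can be sketched in plain Python as a dict-based merge keyed on a primary key and a last-updated timestamp. All record fields here are hypothetical; a real pipeline would express the same upsert semantics as a SQL MERGE or a Spark write.

```python
def apply_delta(base, delta):
    """Merge a batch of delta records into the base table.

    base  -- dict mapping primary key -> record
    delta -- list of incoming records, possibly containing
             replayed duplicates and stale versions

    A record only overwrites the existing one if its updated_at
    timestamp is newer, so replayed or out-of-order deltas are
    effectively de-duplicated (upsert semantics).
    """
    merged = dict(base)
    for rec in delta:
        current = merged.get(rec["id"])
        if current is None or rec["updated_at"] > current["updated_at"]:
            merged[rec["id"]] = rec
    return merged

# Hypothetical base table and incoming delta batch.
base = {
    1: {"id": 1, "title": "Song A", "updated_at": "2024-01-01"},
    2: {"id": 2, "title": "Song B", "updated_at": "2024-01-01"},
}
delta = [
    {"id": 2, "title": "Song B (remaster)", "updated_at": "2024-02-01"},
    {"id": 2, "title": "Song B (remaster)", "updated_at": "2024-02-01"},  # replayed duplicate
    {"id": 3, "title": "Song C", "updated_at": "2024-02-01"},             # new record
    {"id": 1, "title": "Song A (stale)", "updated_at": "2023-12-01"},     # older version, ignored
]
result = apply_delta(base, delta)
```

Comparing ISO-8601 date strings lexicographically is safe here because they sort chronologically; a production job would use proper timestamp types.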