Support R&D and testing teams in meeting their specific data requirements, collaborating with our data acquisition team to source and create datasets
Research, review and catalogue datasets suitable for training, validation, and testing machine learning solutions (audio and imaging)
Implement or support the development of data acquisition procedures, systems and tools for specific project needs (capture/record, synthesize, crawl)
Oversee the annotation and verification of datasets, working with internal annotation teams and/or external partners
Curate, process and convert datasets using automated tools to ensure compliance with data format and structure requirements; create tools to support this activity
Organize data into meaningful collections following standard procedures and in compliance with project requirements
Test ML solutions and evaluate the relevance and suitability of specific datasets
Create tools to automate dataset management, including organizing datasets and storing or uploading them to data storage systems
Job Requirements:
BSc or MSc degree in a relevant field and at least 3 years of relevant industry experience
Solid Python programming skills and experience with version control systems (Git, SVN)
Experience sourcing or creating datasets for machine learning training or testing
Experience in processing and converting imaging and audio data and associated metadata
Experience with data labelling/annotation and working with various metadata formats (text, CSV, JSON, XML)
Self-motivated, showing initiative, thoroughness and superior attention to detail
Excellent communication skills in both spoken and written English
Ability to work effectively and independently, managing multiple priorities and meeting deliverable deadlines
Ability to work well in globally distributed teams
Additional Skills (would be an added advantage):
Experience in project management and development methodologies such as Agile/Scrum/Kanban
Knowledge of imaging or audio data formats and digital cameras or audio recording systems
Interest in audio and imaging, data acquisition, processing and analytics
Experience with cloud platforms
Knowledge or experience in the creation of synthetic datasets
Experience using machine learning development tools (PyTorch, TensorFlow, Caffe)