Principal Data Engineer
Here at the world-famous Wellcome Sanger Institute, we have an exciting opportunity for a Principal Data Engineer to join the Surveillance Operations team in support of the COVID-19 and MalariaGEN programmes. If you want to be part of something that is actively contributing towards solving real global challenges, and you can adapt quickly to fast-paced environments, then this may be a once-in-a-lifetime opportunity, and we are looking for someone like you!
About the Role:
As Principal Data Engineer, you will be expected to:
- Engage with Surveillance Operations project teams and internal stakeholders, refine user stories, and assist with the definition and implementation of the Surveillance Operations data and technology strategy
- Work with other principals and seniors to harmonise data infrastructure to support products
- Develop, recruit and integrate with flexible scrum teams that can work across our product portfolios
- Implement and maintain a data architecture that is interoperable, accessible, and resilient
- Collaborate with CoreIT and the Data Architect to assess the existing platforms used to handle genomic data and analysis-ready data sets, with a view to improving release turnaround times to partners and the public and to exploring the implementation of new platforms
- Create a scrum team that supports Surveillance Operations digital products, ensuring that all necessary data points are captured and that data quality, currency and provenance are maintained to a high standard
- Line manage a team of senior and junior data engineers and their deliverables
About the Tech:
We currently run our pipelines in Apache NiFi and use NiFi Registry for deployment. This registry is backed up in GitLab. We are, however, in the process of migrating from NiFi to Apache Spark, which will provide a more testable, scalable and capable codebase. We use Elasticsearch/Kibana for monitoring logs and Python scripts for processing that cannot be implemented in NiFi. We use Kafka as a message broker. We also use black and flake8 for code standards, and GitLab CI/CD for test pipelines and deployment, along with Terraform. We are always looking for ways to improve how we develop.
About You:
- A natural leader when it comes to data management technology and people.
- Able to relate to scrum team members and motivate them to achieve great results.
- Passionate about data engineering and using technology in this stack to deliver according to organisation requirements.
- Experienced in using data engineering tools to achieve strategic objectives, having worked with teams to implement these tools successfully.
- Flexible in your approach, role-modelling service-oriented behaviour in all you do.
- Able to interpret the needs of an organisation and translate them into actionable plans, collaborating closely with your fellow principals as well as your colleagues in the data architecture, data engineering, and software development space.
The Surveillance Operations team is part of a scientific network that connects researchers, clinicians and public health agencies across the globe with cutting-edge DNA sequencing technologies and genomic research. Through a number of multi-centre projects, we provide a framework for generating, integrating and sharing genetic and genomic data, and for investigating key questions about COVID-19, malaria biology and epidemiology.
Data at Sanger is generated from organic samples that are processed through sequencing informatics pipelines and combined with multiple heterogeneous data sources. The objective is to provide an end-to-end view of the samples’ lifecycle while delivering a range of highly available, scalable, and near real-time data products to third-party organisations or internal teams.
It is the responsibility of the Data Engineering team to ensure that all necessary data points are consolidated from multiple internal/external systems, analytics outputs, and partners using the most appropriate data architecture. Our goal is for our product portfolio to contain datasets that are interoperable with the receiving organisation’s existing systems and processes.
- Demonstrated experience of leading and managing multiple data engineering team members based in multi-disciplinary scrum teams, driving productivity, delivering success and managing technical debt
- Ability to collaborate with scrum masters, product managers, operations managers and other teams to ensure that we have the technologies and processes in place to allow our teams to work efficiently
- Experience of big data engineering and Agile principles
- Demonstrable understanding of big data engineering tools and how they can be used strategically (e.g. Spark, NiFi, Hive, Hadoop, Dask)
- Experience in defining and operating systems that integrate data from multiple sources in an environment where data provenance is essential
- Knowledge and experience of modern software development practices, including version control, continuous integration and workflow management tools such as Jira and GitLab
- Python development experience
- Familiarity with SQL and databases
- Previous Linux (Ubuntu) system administration experience
- Experience of cloud-based technologies and management (e.g. OpenStack, AWS, GCP)
- Experience implementing messaging technologies (e.g. Kafka)
- Accredited by FedIP (Desirable)
Competencies and Behaviours:
- Experience of managing a diverse range of people
- Ability to quickly understand technical and process challenges and break down complex problems into actionable steps
- Good interpersonal skills
- Excellent written and spoken communication with the ability to explain technical challenges to many types of partners and stakeholders
- Ability to build collaborative working relationships with internal and external stakeholders at all levels
We have adopted a flexible hybrid model to enable a balance of remote and office working. You can find out more about our inspiring campus here.
Please apply with your CV and cover letter outlining your suitability for the role.