Site Reliability Engineer (Azure)

Permanent - Leeds, London and Uppingham

At MMT Digital, our Site Reliability Engineers bridge development and operations by applying a software engineering mindset to system administration problems. This role lets us bring these practices to our clients and provide services such as managed SRE.

When you join our Systems team, you’ll be: 

  • Directly responsible for uptime of client infrastructure and applications – including availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning 

  • Owning end-to-end availability and performance of key services, and building automation to prevent problem recurrence

  • Automating infrastructure management and alert-handling processes that are currently manual

  • Assisting in the roll-out and deployment of new product features and installations 

  • Finding scalability bottlenecks and areas for performance improvements 

  • Creating and maintaining sufficient levels of documentation for the solutions produced 

  • Assisting project teams in enhancing commercial opportunities and mitigating risks 

  • Triaging and solving service issues and outages 

  • Performing tasks related to Cloud Systems Engineering and other infrastructure/pipeline work as needed to support the overarching Systems team effort 

You’ll be working closely with clients to ensure safe, accurate delivery of services whilst positively representing MMT Digital and our values through these interactions. Working with the support of your Development team peers and Technical Architects, you’ll make sure development work is delivered on time, on budget and in line with the technical vision for the project. You’ll take accountability for the success of the project as a whole, including offering input and insight into areas beyond development, and taking responsibility for the decisions that need to be made.

What you'll bring

In order to flourish in this role, you’ll need the following:  

  • Experience with running production systems, triaging and solving outages 

  • Experience of working in Azure, with AWS experience as a bonus 

  • Strong written and verbal communication skills 

  • Experience with modern monitoring systems such as Azure Monitor, Datadog, Application Insights or New Relic

  • Solid understanding of infrastructure as code 

  • A working knowledge of IaaS, PaaS, Containers and Serverless technologies 

  • Strong understanding of System Architecture  

  • Experience of using Terraform 

  • Strong troubleshooting skills, able to drive out root causes of complex technical problems 

It would also be great (but not essential) if you have:  

  • Experience of working in a Serverless environment 

  • Azure DevOps experience 

  • Hands-on software engineering experience with one or more programming languages (e.g. Python, Go, Bash, PowerShell, JavaScript, Java, C#)

  • A general understanding of development processes and practices

Our offices in London, Leeds and Uppingham have reopened fully, but should you prefer to work from home or combine remote and office work, you are welcome to! We just ask that you’re willing to travel to the office or client sites as required. You will be able to confirm your preferences in the application form.