Platform Operations Lead – Cloud Platform Team
The Role
The Platform Operations Lead is responsible for leading a team of engineers focused on the operational management, reliability, and continuous improvement of the organisation’s cloud platform. The role ensures that the platform is delivered quickly, securely, and at best value, and that application teams can consume it through automated self-service offerings delivered via Infrastructure as Code (IaC).
Working closely with the Platform Engineering, Platform Design, and Platform Tooling teams, you will ensure that cloud platform services are delivered and operated effectively across environments. You will lead the operational lifecycle of platform capabilities, including incident response, automation, reliability engineering, and continuous optimisation.
This is a hands-on leadership role requiring strong expertise in cloud infrastructure, automation, operational excellence, and DevOps practices. You will champion a platform-as-a-product mindset, ensuring the platform remains resilient, cost-effective, and aligned with organisational cloud strategy.
Key Responsibilities
Platform Operations & Reliability
Lead the operational management of the organisation’s cloud platform, ensuring high availability, performance, and reliability of platform services.
Manage platform incidents, problems, and service requests, ensuring timely resolution and continuous service improvement.
Establish and drive Site Reliability Engineering (SRE) practices, including service-level objectives (SLOs), monitoring, and automated remediation.
Ensure operational readiness of new platform capabilities delivered by the Platform Engineering team.
Maintain operational runbooks, documentation, and standard operating procedures.
Lead root cause analysis and post-incident reviews to identify systemic improvements.
Leadership & Delivery
Lead and mentor a squad of platform engineers responsible for operating and improving the cloud platform.
Manage the team backlog, prioritising operational improvements, reliability enhancements, and platform automation.
Collaborate with Platform Engineering and Platform Design teams to ensure new capabilities are designed with operability, scalability, and supportability in mind.
Partner with the Platform Tooling team to ensure adherence to security, compliance, and FinOps standards.
Support platform lifecycle management including upgrades, patching, and capacity planning.
Automation & Platform Improvement
Drive automation of operational processes using Infrastructure as Code and configuration management tools.
Continuously improve the platform through observability, monitoring, and performance optimisation.
Standardise operational practices across environments to ensure consistency and efficiency.
Enable self-service consumption of platform services through automated provisioning and reusable patterns.
Contribute to the evolution of the Platform-as-a-Product operating model.
Agile & Continuous Improvement
Champion Agile and DevOps ways of working within the platform operations team.
Use platform metrics, operational insights, and reliability data to drive improvements.
Promote a culture of automation, learning, and operational excellence.
Contribute to internal knowledge sharing, documentation, and platform enablement initiatives.
The Person
Experience
10+ years of experience managing IT teams, including 5+ years managing cloud infrastructure, platform engineering, or DevOps operations teams.
Proven experience leading engineering or operations teams in cloud environments.
Strong hands-on experience with Infrastructure as Code (Terraform, Bicep, CloudFormation, Ansible, etc.).
Experience operating platforms in AWS (preferred) and/or Azure environments.
Experience with CI/CD pipelines (GitHub Actions, Azure DevOps, Jenkins, GitLab, AWS CodePipeline).
Demonstrated experience implementing observability, monitoring, and operational automation.
Experience with incident management, problem management, and service reliability practices.
Understanding of security, governance, and FinOps principles in cloud environments.
Skills & Attributes
Strong leadership and coaching skills with the ability to build high-performing platform teams.
Deep understanding of cloud platform operations and reliability engineering.
Strong communication and stakeholder engagement skills.
Delivery-focused, with a strong commitment to automation and operational excellence.
Collaborative mindset with the ability to work across design, engineering, and operational teams.
Agile and DevOps mindset with a passion for continuous improvement.
Preference will be given to candidates who have managed a team through the transition from a traditional IT (data centre) delivery model to an Agile cloud (DevOps) delivery model.
Department: Technology
Locations: BoyleSports HQ, Dundalk
Remote status: Hybrid
Employment type: Full-time
Employment level: Professionals