Site Reliability Engineer

As a Site Reliability Engineer, you will help us achieve our goals by continuously improving our SaaS offering’s features and robustness. You will participate in designing, developing, deploying, monitoring, supporting, documenting, and troubleshooting our SaaS solution.
This is an exciting opportunity to collaborate closely with the Cloud Operations team, the wider organization, and external vendors and customers.
This is a hybrid role based in our Cambridge or London office, so you will ideally be comfortable coming into the office once or twice a week. If you’re interested in the role but require more flexibility, please speak to us!
Key Responsibilities:

Deploying, maintaining, monitoring, and upgrading production deployments of our SaaS solutions
Building software and systems to manage platform infrastructure and applications
Continually evaluating and improving our technology and processes to increase quality, decrease costs, and improve time-to-market
Periodically testing the service with predictable and unpredictable failures
Providing 2nd-line operational support for our SaaS customers
Gathering data and generating reports on the service performance
Developing and documenting internal processes
Working with engineering/data science to drive and develop new capabilities
Providing out-of-hours support for critical service issues as part of our on-call engineer rota

Preferred Skills/Experience:
While not all are essential, ideally you will have experience with the following:

Administering cloud infrastructure or developing cloud applications (preferably in AWS)
Configuration management, including Infrastructure as Code
Linux, shell-scripting, and command-line tools
Programming in one or more high-level programming languages (e.g. Python)
Networking (e.g. DNS, routing, firewalls)
Source-control management (e.g. Git)
Continuous Integration / Continuous Deployment (CI/CD)
Monitoring, metrics, and alerting
Containerization (e.g. Docker)
Administering, developing applications for, or deploying applications to Kubernetes
Using or developing applications with service mesh (e.g. Istio)
Object-oriented programming and design
Operating production-grade services
Providing technical support
Building serverless or cloud-native applications
Writing technical documentation
Developing processes and procedures
Securing applications, services, and data (e.g. authentication, authorization, encryption, and TLS)
Any of the following tools: Terraform, SaltStack, MongoDB, Elasticsearch, Kafka, Prometheus, Grafana, HashiCorp Vault

We are looking for candidates who are passionate about technology, keen to keep learning, and excited to contribute to a dynamic team environment. If you have relevant skills and are looking for a challenging and rewarding role, we encourage you to apply.

Type: Permanent
Start Date: 19/06/2024
Contract Length: N/A
Job Reference: RF-3
Job ID: 221867181
