Senior Reliability Engineer
Job Reference: 31588-Senior reliability engineer
Company: RecruitArab - recruiter
Industry: Information Technology
Job Title: Senior Reliability Engineer
Location: Abu Dhabi, United Arab Emirates
Department: Information Technology
Job Type: Full-Time
Job Description:
We are seeking a highly skilled and experienced Senior Reliability Engineer to join our dynamic IT team in Abu Dhabi. In this role, you will be responsible for ensuring the reliability, availability, and performance of our IT systems and services. You will work closely with cross-functional teams to design, implement, and support robust systems that meet the needs of our business and clients.
Key Responsibilities:
- Reliability Engineering: Design, implement, and maintain systems and processes that enhance the reliability and availability of IT services. Utilize best practices in reliability engineering to proactively identify and mitigate risks.
- Incident Management: Lead the incident management process by investigating outages and incidents, performing root cause analysis, and implementing corrective actions to prevent recurrence.
- Monitoring and Metrics: Develop and maintain monitoring solutions to provide visibility into system performance. Analyze metrics and logs to identify trends and areas for improvement.
- Capacity Planning: Collaborate with engineering teams to ensure systems are appropriately scaled to meet current and future demand. Conduct capacity planning and performance testing to ensure optimal resource utilization.
- Automation: Drive automation initiatives to reduce manual interventions and increase reliability. Implement tools and scripts to automate repetitive tasks and enhance operational efficiency.
- Collaboration: Work with development, operations, and quality assurance teams to foster a culture of reliability and continuous improvement. Participate in design reviews and provide feedback on technical architecture decisions.
- Documentation: Create and maintain comprehensive documentation related to system designs, configurations, and processes. Ensure that knowledge is shared across teams.
- Compliance and Security: Ensure that IT systems comply with industry standards and regulations. Collaborate with security teams to implement best practices for system security and data protection.
- Mentorship: Provide guidance and mentorship to junior engineers, helping them to grow their technical skills and understanding of reliability engineering practices.
Qualifications:
- Bachelor’s degree in Computer Science, Information Technology, or a related field.
- A minimum of 5 years of experience in reliability engineering, site reliability engineering, or a similar role.
- Strong knowledge of system architecture, networking, and cloud infrastructure.
- Experience with monitoring tools (e.g., Prometheus, Grafana, Nagios) and incident management platforms (e.g., PagerDuty, ServiceNow).
- Proficiency in programming and scripting languages such as Python, Bash, or Go.
- Familiarity with containerization and orchestration technologies (e.g., Docker, Kubernetes).
- Excellent problem-solving skills and the ability to work under pressure.
- Strong communication and collaboration skills to work effectively across teams.
Benefits:
- Competitive salary and performance-based bonuses.
- Comprehensive health insurance.
- Opportunities for professional development and training.
- A vibrant and inclusive work environment.
Application Process:
If you are passionate about reliability engineering and meet the qualifications above, we would love to hear from you. Please submit your application, including your resume and a cover letter, to apply@emiratesrecruiter.com by 2024-09-06.
We are an equal opportunity employer and value diversity in our workforce. We encourage all qualified candidates to apply.
Please include this job reference [31588-Senior reliability engineer] in the email when you send your application.