Site Reliability Engineer

Brooksource

Remote Aug 7
*Site Reliability Engineer (SRE)*

*Contract to Hire*

*Remote (EST Time Zone)*

Our Fortune 15 health care client is seeking a Site Reliability Engineer (SRE) to play a critical role in ensuring the reliability, scalability, and performance of their systems and applications. You will work closely with cross-functional teams to design, implement, and maintain robust infrastructure and automation solutions. Your expertise in release management and automation will be instrumental in streamlining their software delivery processes and enhancing their overall operational efficiency.

*Responsibilities:*

· Manage and optimize Linux-based systems and servers: Ensure high availability, performance, and security of critical services by configuring, monitoring, and maintaining Linux environments.

· Implement monitoring, alerting, and logging solutions to proactively identify and mitigate potential issues. Help the team stabilize monitoring and improve observability.

· Enhance platform monitoring with tools such as Dynatrace and Splunk, specifically for applications built on Ruby on Rails.

· Implement and manage Azure cloud monitoring to gain comprehensive visibility into infrastructure and application health, ensuring swift resolution of issues.

· Design, implement, and maintain highly available and scalable infrastructure solutions to support applications and services on the Azure cloud platform.

· Collaborate with software engineering teams to define and implement reliable deployment pipelines and release processes using GitHub and Azure Pipelines for CI/CD.

· Develop automation scripts and tools using PowerShell and other languages to automate repetitive tasks and streamline operational workflows.

· Lead disaster recovery planning and testing efforts to ensure business continuity and minimize downtime in case of system failures or disasters.

· Perform capacity planning and resource optimization to ensure optimal performance and cost-effectiveness of infrastructure.

· Participate in incident response and resolution, including root cause analysis and post-incident reviews.

*Qualifications:*

· Bachelor's degree in Computer Science, Engineering, or a related field.

· 5+ years of experience in site reliability engineering, DevOps, or a similar role.

· Extensive experience in Linux environments.

· Extensive application monitoring experience using platforms such as Azure Monitor, Dynatrace, or Splunk.

· Proficiency in scripting and programming languages such as PowerShell, Ruby (including Ruby on Rails), Python, Bash, or Go.

· Hands-on experience with Azure cloud services and technologies.

· Experience with GitHub and Azure Pipelines for CI/CD.

· Strong understanding of containerization technologies and orchestration frameworks like Kubernetes.

· Experience with infrastructure-as-code and configuration management tools such as Terraform, Ansible, or Puppet.

· Familiarity with release automation tools like release-please.

· Excellent problem-solving skills and the ability to troubleshoot complex issues in distributed systems.

· Strong communication and collaboration skills, with the ability to work effectively in a cross-functional team environment.

Job Types: Full-time, Contract

Pay: $55.00 - $60.00 per hour

Benefits:
* Dental insurance
* Health insurance
* Vision insurance

Schedule:
* Monday to Friday

Application Question(s):
* Will you now, or in the future, require sponsorship for employment visa status (e.g. H-1B visa status)?

Experience:
* SRE: 5 years (Required)
* Microsoft Azure: 5 years (Required)
* Linux: 5 years (Required)

Work Location: Remote