Centennial Technologies seeks experienced Site Reliability Engineers (SREs) to join our SRE Support team. As an SRE at Centennial, you will play a vital role in providing 24×7 monitoring and production support for critical systems. Our team is responsible for meeting service level agreements (SLAs) and for following SRE best practices that keep manual remediation ("toil") below 50% of your workload. Your primary focus will be building automated remediation capabilities that improve system reliability. You will collaborate with the customer, Cloud Architects, and DevOps Engineers to maintain and improve reliability.
Location: Hybrid (2 days onsite and 3 days remote) at one of the following locations: DC Metro Area (VA, DC, and MD); Kansas City, MO; Cincinnati, OH; or Raleigh-Cary, NC.
Key Responsibilities:
- Provide 24×7 monitoring and production support to ensure system availability.
- Meet defined SLAs and service levels in alignment with SRE best practices.
- Minimize manual remediation (“toil”) by developing and implementing automated remediation solutions.
- Collaborate with the appropriate teams, including application and cloud automation teams, when systems become overloaded.
- Administer and configure Splunk.
- Perform application monitoring, gradual change implementation, and automation to improve reliability.
- Contribute to Business Continuity and Disaster Recovery (DR) efforts, particularly in cloud-based business continuity.
- Assist in designing systems for Reliability, Availability, and Maintainability (RAM/ARM) through fault tolerance, redundancy, and distributed/parallel processing, targeting five 9s (99.999%) availability.
- Perform Business Continuity, Continuity of Operations (COOP), DR, and Readiness planning, exercises, and testing.
- Perform switchover and failover operations with cold, warm, or hot starts.
- Monitor and remediate system data synchronization processes.
- Administer the Splunk platform for performance and stability, including resource usage and infrastructure monitoring.
Requirements:
- Bachelor's degree in Computer Science, Information Technology, or a related field, and 6+ years of SRE experience.
- Proven experience as a Site Reliability Engineer (SRE).
- Strong knowledge of cloud-based Business Continuity, COOP, DR, and Readiness planning, exercises, and testing.
- Proficiency in Splunk administration and configuration.
- Ability to work collaboratively and efficiently in a team.
- Exceptional problem-solving and troubleshooting skills.
- Excellent communication and documentation skills.
About the Company:
Centennial Technologies Inc. (Centennial) is committed to a healthy work-life balance for our employees, and we have worked hard to foster an environment that enables employees to prioritize both their professional and personal responsibilities effectively. We make every effort to accommodate employees by providing flexible paid time off, a casual work atmosphere, frequent collaborative interaction, and the opportunity to continuously develop career skills.
Centennial offers a competitive benefits package, which includes Medical, Dental, Short-Term Disability, Long-Term Disability, Life Insurance, 401k, Mass Transit Benefits, Paid Time Off, and Federal Holidays.
Our culture includes:
- A supportive professional environment that promotes a healthy work-life balance.
- Performance management practices that reward our top performers.
- Employee surveys and discussions to inform Management’s decisions.
- Paid training on the latest technologies and business practices.
- An employee-focused model.
- A shared vision of client success through cultivating long-term relationships.
Equal Opportunity Employer
Centennial is an equal-opportunity employer and complies with all applicable federal, state, and local employment laws.