Site Reliability Engineer

Offer overview

Location
Remote
Employment type
Full-time
Net salary
22,000 zł - 32,000 zł per month
Date posted
2 years ago

Details

Offer ID
1952
Work mode
Remote
Company size
Over 200
Salary
Stated in the offer
Technologies used
SRE, Microservices
Contract type
B2B, employment contract
Recruitment
Online
Recruitment language
English
Benefits
Private medical care, Course funding, No dress code, Free coffee, International projects, Small teams, Flexible working hours
Experience level
Mid, Senior
Required
Other

Offer description

  • 5+ years of experience (Senior level) or 3+ years of experience (Middle level) across the full software delivery lifecycle

  • Solid understanding of Microservices and APIs

  • Versed in system management, monitoring, and analysis to identify opportunities to improve service health, manageability, and reliability

  • Proven ability to dig through metrics, logs, and available sources to triage and resolve an incident at any time

  • Eager to problem-solve and troubleshoot issues that may arise day-to-day

  • Ability to document solutions, SRE architectural patterns, and best practices to ensure that teams have guidance as needed

  • Experience and interest in working in an Agile environment

  • Effective communication and interpersonal skills

Nice to have:

  • Past enterprise-level experience in DevOps, Software, Infrastructure, or Site Reliability Engineering, with the ability to demonstrate understanding of high-level technical briefs, talks, and ideas

  • Experience leading teams in troubleshooting, issue resolution, or escalations

About the position / about the project

We're currently looking for a talented Site Reliability Engineer (SRE) who will be embedded within the product development team and be responsible for the overall reliability and availability of that team's applications. This person must have a passion for troubleshooting: getting to the root cause of any identified issue, resolving it, and owning the lifecycle of that feedback within the application teams.

Welcome bonus: $4,000 for Senior level / $3,000 for Middle level

About Our Customer:
The customer is an American company based in Chicago with more than 40 years of experience in the P&C insurance industry. The customer moved to the cloud in 2003, and it began using its first ML algorithms as early as 10 years ago. The company accelerates digital transformation for the insurance and automotive industries with AI, IoT, and workflow solutions. 
The core product is a comprehensive SaaS platform that consolidates about 30,000 stakeholders, namely insurance companies, repair facilities, auto manufacturers, lenders, fleets, and everyone involved in resolving critical moments following an accident.

Client team culture and development approach:

  • People who are flexible and ready to learn new technologies
  • Most importantly, a good understanding of Java and a willingness to learn new things in order to keep pace with a rapidly changing development environment where new technologies are constantly being introduced

  • Cross-functional team where developers are expected to deliver both back-end and front-end code

  • Don’t worry, we've got you covered! The customer provides education courses for front-end technologies, the team adjusts the tasks at the start to allow you to gradually pick up the front end, and we provide experienced mentors from our side. You’ll have all the support you need for your professional development

About Our Project:
Workflow is an extensive platform that unifies many web and mobile applications.
Each SRE will be responsible for 1 or 2 applications within the Workflow platform and will work with the corresponding development team.

Key Areas of Focus:

  • Reducing Technical Debt

  • Reducing Toil

  • Observability/System Monitoring

  • Incident Response throughout SDLC

  • Problem Management

Product/Project Tech Stack: 

  • Java, J2EE, RESTful services, JMS, Kafka, SQL, SOAP, ActiveMQ

  • JavaScript, Vue.js, jQuery, JSP, Struts

  • Oracle, MySQL, Postgres

  • Oracle WebLogic, Amazon, Kubernetes

  • Jenkins, Spinnaker, CI/CD Pipeline

  • SVN, Git/GitLab

  • Python

  • Ceph, S3

Responsibilities

First 6 months in the position:

  1. Cleanup work, bug fixing, and preparing the basis for future SRE work
  2. Automate any tasks or parts of the system that are currently performed manually
  3. Configure and maintain the monitoring tooling as it relates to the target application
  4. Monitor the application and infrastructure, and take steps to improve overall system software performance, availability, and reliability by incorporating changes through defined feedback loops within the software delivery lifecycle
  5. Document tribal knowledge as you acquire it by creating runbooks/playbooks, and ensure critical system information is readily available through dashboards to those who need it

After the first 6 months in the position:

  1. Work closely with software developers and testers to ensure the product meets non-functional requirements such as security, performance, and availability
  2. Resolve NOC escalations and help prevent recurrence of incidents by creating processes and automation
  3. Be a key part of our response to high-severity internal customer incidents, ensuring we meet all SLAs and SLOs
  4. Help build an SRE culture by sharing best practices, approaches, documentation, and code with other engineering teams across the organization
  5. Assist the product development team with managing their error budget
  6. Embrace failures and treat incidents as learning opportunities by conducting blameless postmortems
  7. Participate in product engineering stand-ups and related design activities
  8. Coach other team members to ensure systems are supported by following SRE best practices