Site Reliability Engineer

Kyiv, Ukraine
Full Time Employment
Senior Level
No. of Openings: 1
Posted 1 month+ ago
rinf.tech is a technology solutions company specializing in B2B custom software engineering and robotics. Founded in 2006, we’ve transformed from a Romania-based startup into an international brand of 450+ specialists with Delivery Centers and offices in Europe and the USA. At rinf.tech, we follow the Engineer of the Future philosophy, which means we put a special focus on ensuring our tech talent is future-ready. As such, we foster open-mindedness, flexibility, tenacity, and proactiveness, and take initiatives to encourage regular professional development and knowledge sharing within and among our teams.

What you will be working on

  • Deploy and manage innovative modern cloud technologies using infrastructure-as-code, self-healing, and security automation patterns;
  • Develop useful telemetry, alerts, and response procedures to reduce Mean Time To Repair (MTTR);
  • Collaborate and provide technical excellence within and across teams;
  • Consult on best practices and develop tools to enable smooth adoption of good service reliability practices and methods;
  • Identify areas of improvement in reliability, efficiency, and operations;
  • Build tools to help your SRE team quickly pinpoint, isolate and resolve issues related to infrastructure, platform services and applications;
  • Continuously refine monitoring processes, configurations, and thresholds;
  • Practice and promote sustainable incident response and blameless postmortems;
  • Develop runbooks and tools to streamline processes and shorten problem resolution time;
  • Write code that improves scalability, performance, maintainability, and security;
  • Add, tune and maintain alert configurations and documentation as needed;
  • Operate in a high-pressure environment and quickly troubleshoot complex issues across distributed applications while successfully handling multiple priorities;
  • Cultivate full-team participation in building high-quality, thoughtful software;
  • Develop and improve CI/CD processes to improve release cadence and success;
  • Use Chaos Engineering principles and methodologies to test what you build under real-world conditions;
  • Mentor SREs in technical and non-technical SRE responsibilities;
  • Take primary responsibility for large (multi-person) efforts, including planning, execution, and training.

What you offer us

  • Creative and innovative outside-the-box thinking;
  • 5-7 years of experience in SRE, DevOps, technical operations, systems engineering, software engineering, or a related discipline;
  • Proficient, collaborative, and experienced in building reliable, scalable enterprise systems;
  • Excellent communication skills, both verbal and written;
  • Passionate and curious about ways to leverage technology while continually learning;
  • Ability to identify root causes of instability in high-traffic, large-scale distributed systems;
  • Experience in designing, building, and operating large-scale production systems;
  • Skilled in using containers in enterprise production environments (e.g. Docker, Kubernetes, LXC, AWS ECS and EKS);
  • Configuration management and orchestration (e.g. Terraform, CloudFormation, Ansible);
  • Comfortable in one or more of the following languages (Python, Java, Scala, Go, Rust, Ruby, or similar);
  • Scripting languages like Ruby, Bash, PowerShell or Python;
  • Skilled in Cloud/PaaS/SaaS environments (e.g. AWS, Azure, Google Cloud);
  • Hands-on experience using source control (Git, GitHub) and feature branching strategies;
  • Experience with continuous integration tools (e.g. Jenkins, GitLab CI/CD, AWS CodeBuild, CodeDeploy, CodePipeline, Azure DevOps, Spinnaker);
  • Knowledge of best practices and IT operations in an always-up, always-available service;
  • Expertise in scalable testing, automation, continuous integration frameworks, and best practices;
  • Experience in SDLC, distributed systems, networking, hardware, logistics and operations or capacity planning;
  • UNIX/Linux administration, troubleshooting, performance tuning, and security.

Preferred Qualifications:

  • Experience with DevOps methodologies and/or SRE;
  • Experience with container orchestration systems, such as AWS ECS or Kubernetes;
  • Experience with monitoring and observability tooling such as Datadog, Prometheus, Grafana;
  • Experience with automating infrastructure, deployment, and testing using tools like CloudFormation, Ansible, or Terraform, and the ability to explain the Infrastructure as Code paradigm;
  • Experience with Service Level Objectives and Error Budgets;
  • Experience with configuration management, such as Puppet and Ansible;
  • Understanding of the principles and methodologies behind Chaos Engineering;
  • Experience with software development in Java, Scala, etc.;
  • BS Degree in Computer Science, Electrical & Computer Engineering or Mathematics; or equivalent experience.

What we offer you

  • An energetic and spirited team environment;
  • Training and ongoing development opportunities;
  • Private medical services;
  • Compensation: Competitive salary package, extra vacation days;
  • Ability to grow professionally;
  • Friendly atmosphere;
  • Comfortable office in Gulliver Business Center;
  • Paid vacation/sick leave;
  • Medical insurance;
  • Gym coverage.

Our recruitment process

HR Discussion
Introduction call
Final technical interview with Tech Lead
Offer

Meet us!