Site Reliability Engineer

Full Time Employment
Middle / Senior Level
No. of Openings: 2
Posted 1 month+ ago
Rinf.tech has grown from a Romanian start-up into a company with over 400 employees and has successfully opened 5 branches in Europe (Kiev, UK, France, Germany, Bulgaria).
We offer IT consulting and software services to partners who lack the required technical skill set in-house and need to extend their current teams. We work across a wide range of technologies and industries.
At rinf.tech you will find friendly people and a relaxed atmosphere every day. RINFers are eager to learn from each other and to explore and reinvent the world of technology. We offer an inspiring place to share ideas and build amazing things together.

What you will be working on

Do you want to be part of a team that makes streaming magic happen through one of the most reliable streaming services in the world? Our SREs provide expert engineering services in cloud automation and reliability engineering to all the services that power streaming for Disney+, ESPN+, Hulu, Star+ and more, home to 100 million+ subscribers and ESPN fight nights. We are passionate about keeping our services running with maximum uptime and minimum latency so that our subscribers get the best possible streaming experience across all our content.

The Mission:

As an SRE, you are someone your fellow team members look to as a trusted advisor on all things reliability; you have a clear understanding of SRE principles and best practices and can explain them in depth to any audience. To be successful in this role, you will continuously uphold and improve every relevant reliability aspect of our services, with an increased focus on service level indicators (SLIs) and service level objectives (SLOs), while raising the reliability of a variety of large-scale user-facing and internal services.
Teams are located in New York, San Francisco, Manchester (UK), Poland, Amsterdam, and more.

What you offer us

Responsibilities:

  • Deploy and manage modern cloud technologies using infrastructure-as-code, self-healing, and security automation patterns
  • Develop useful telemetry, alerts, and response processes to reduce Mean Time To Repair (MTTR)
  • Collaborate within and across teams and contribute technical excellence
  • Consult on best practices and develop tools that enable smooth adoption of good service reliability practices and methods
  • Identify areas of improvement in reliability, efficiency, and operations
  • Build tools to help your SRE team quickly pinpoint, isolate, and resolve issues related to infrastructure, platform services and applications
  • Continuously refine monitoring processes, configurations, and thresholds
  • Practice and promote sustainable incident response and blameless postmortems
  • Develop runbooks and tools to streamline processes and shorten problem resolution time
  • Write code that improves scalability, performance, maintainability, and security
  • Add, tune, and maintain alert configurations and documentation as needed
  • Operate in a high-pressure environment and quickly troubleshoot complex issues across distributed applications while successfully handling multiple priorities
  • Cultivate full-team participation in building high-quality, thoughtful software
  • Develop and refine CI/CD processes to increase release cadence and success rate
  • Use Chaos Engineering principles and methodologies to test what you build under real-world conditions

Qualifications:

  • Creative, innovative, outside-the-box thinking
  • 5-7 years of experience in SRE, DevOps, technical operations, systems engineering, software engineering, or a related discipline
  • Proficient, collaborative, & experienced in building reliable, scalable, enterprise systems
  • Excellent communication skills, both verbal and written
  • Passionate and curious about ways to leverage technology while continually learning
  • Ability to identify root causes of instability in high-traffic, large-scale distributed systems
  • Experience in building and operating large-scale production systems
  • Highly skilled in using containers in enterprise production environments (e.g., Docker, Kubernetes, LXC, AWS ECS and EKS)
  • Experience with configuration management and orchestration tools (e.g., Terraform, CloudFormation, Ansible)
  • Comfortable in one or more of the following languages (Python, Java, Scala, Go, Rust, Ruby, or similar)
  • Experience with scripting languages such as Ruby, Bash, PowerShell, or Python
  • Skilled in Cloud/PaaS/SaaS environments (e.g., AWS, Azure, Google Cloud)
  • Hands-on experience using source control (Git, GitHub) and feature branching strategies
  • Experience with continuous integration tools (e.g., Jenkins, GitLab CI/CD, AWS CodeBuild, CodeDeploy, CodePipeline, Azure DevOps, Spinnaker)
  • Knowledge of best practices and IT operations in an always-up, always-available service
  • Expertise in scalable testing, automation, and continuous integration frameworks and best practices
  • Experience in SDLC, distributed systems, networking, hardware, logistics and operations or capacity planning
  • UNIX/Linux administration, troubleshooting, performance tuning, and security

What we offer you

  • Flexible working environment
  • Learning budget and platforms
  • Wide variety of projects you could be part of
  • Bonding and drinking events
  • Medical subscription
  • An HR representative to guide your professional career development
  • Flexible benefits platform
  • Bookster

Our recruitment process

  • HR Discussion
  • Technical interview
  • Offer

Meet us!

If you are still unsure, you are welcome to come by anytime for a tour of our office, with no commitment.
*All applications are strictly confidential. We will not disclose any private information without your approval.