Please note: We are aware of fraudulent job offers circulating under our own brand name. Please be advised that any Northzone recruitment will always involve in-person interviews and that during our recruitment/joining process, we will never ask for any fees/payments or for individuals to pay for their own equipment or software.

Senior Site Reliability Engineer

Nivoda

This job is no longer accepting applications

Software Engineering

Pakistan · India · Remote

Posted on Jan 9, 2025

About the Role

Our mission is to build a world-class online marketplace, delivering unparalleled reliability, performance, and value. In doing this, we directly help thousands of other businesses to live brilliantly. We achieve this through a commitment to engineering excellence, relentless innovation, strong collaboration and complete ownership of our responsibilities. Together, we strive to not only meet but to exceed the expectations of our customers and stakeholders. Our focus on observability reflects our commitment to continuous improvement and as part of the SRE team, you will play a key role in ensuring we offer the best possible service to our customers.

What You'll Do:

We're looking for an enthusiastic and detail-oriented Senior Site Reliability Engineer to join our team! As a Senior SRE, you'll work closely with the rest of the site reliability engineering team to design, implement, and maintain robust system architectures, monitoring, and incident management processes. This role requires a balance of software development and operations expertise, with a focus on keeping systems available, scalable, and efficient.

Key Responsibilities:

Assist in System Design and Architecture: Collaborate with senior engineers to design, develop, and deploy system architectures that meet performance, security, and scalability requirements.
Monitoring: Assist in developing and implementing monitoring strategies, incident management processes, and automation tools to ensure timely detection and resolution of issues.
Incident Management: Participate in the on-call rotation to ensure timely incident response. Work on root cause analysis and contribute to post-mortem documentation for incidents.
Optimise performance: Analyse and improve system performance across the infrastructure, including cloud services, databases and networking.
Capacity planning: Assist in capacity planning and scalability efforts, ensuring that infrastructure can handle increased demand over time.
Infrastructure as code: Use tools like Terraform, Ansible or similar to manage infrastructure as code, enabling consistent, repeatable infrastructure deployment and management.
Service-level objectives (SLOs): Help define and measure SLOs and service-level indicators (SLIs) to track the health and performance of services.
Automate Tasks: Implement automated tasks using scripting languages (e.g., Python, Bash) to improve system efficiency and reduce manual errors.
Security and compliance: Collaborate with security teams to ensure the infrastructure follows security best practices and complies with necessary regulations.
Collaboration and Communication: Work closely with internal stakeholders, including development teams, product managers, and other engineering leaders to ensure effective communication and alignment on site reliability goals.
Documentation: Create and maintain clear documentation of systems, processes and incident management procedures. Assist with the adoption of these processes across the business.

Your qualifications and experience:

2+ years of experience in a Site Reliability, DevOps or related role
Previous experience with AWS
Strong experience with Linux system administration, cloud infrastructure (AWS) and container orchestration tools (Kubernetes, Docker)
Strong technical background in cloud computing, distributed systems, and software engineering
Familiarity with automation and IaC tools (e.g., Ansible, Terraform) and continuous integration/continuous deployment (CI/CD) pipelines
Excellent problem-solving skills and ability to learn quickly
Familiarity with monitoring solutions (e.g., Datadog, New Relic, Grafana)
Familiarity with a programming language (e.g. Java, Python, Bash)

Nice to have:

AWS certifications
Database management experience (SQL/NoSQL)
Knowledge of security best practices
Knowledge of system performance monitoring and tuning

About Nivoda

Nivoda is the industry-leading B2B diamond and gemstone marketplace. It connects jewellery retailers to gemstone supplies, saving them time and money while giving them access to a global diamond supply at the best prices, all with zero inventory risk.

With a team of over 400 dedicated employees around the world and a wealth of experience in the industry, Nivoda has developed an award-winning solution that enables jewellery businesses of any size, in any location, to buy and sell diamonds in the most profitable, efficient and hassle-free manner.

Over the course of the last six years, Nivoda has evolved into a global platform recognised for its innovation, customer service and ability to deliver a seamless, reliable and efficient experience.

We offer

Dynamic working environment in an extremely fast-growing company
Intellectually challenging work, with a major role in Nivoda’s success and scalability
Connect with peers globally in a truly distributed team with no central location
Collaborative and supportive work environment with very little hierarchy
Flexible working hours
