Infrastructure Engineer
SolveAI
Other Engineering
Location: London, UK
Employment Type: Full time
Location Type: On-site
Department: Engineering
About SolveAI
SolveAI empowers enterprise business users to solve their most critical problems by enabling them to build custom full-stack applications through natural language conversations. Our product is designed for non-technical business users, those closest to the problems, allowing them to turn their insights into production-ready, enterprise-grade solutions.
We integrate seamlessly with existing enterprise tech stacks while adhering to strict IT security and compliance standards, giving organizations the freedom to innovate without compromising system integrity, maintainability, or data protection.
Our approach is rooted in an enterprise-first philosophy and a problem-solving mindset, ensuring that we create scalable, compliant, and high-impact solutions tailored to each customer's environment.
We are building a future where creating enterprise software is as natural as having a conversation.
What We Offer
High-impact environment: Join a company shaping the future of enterprise software, working with a team that's redefining how AI meets real-world business needs.
Ownership and visibility: In a small, high-performing team, the impact of your work isn't theoretical; it actively shapes the company's direction.
Empowerment through innovation: Use cutting-edge AI tools internally. You'll be working with the same technology we deliver to enterprises.
About the role
This is a high-leverage, high-ownership role: you'll set technical direction across reliability, observability, and customer deployability.
What makes it unusual is that you'll be the person enterprise security and infrastructure teams talk to when they have hard questions. As much as this is an internal platform role, it's also a customer-facing one. You'll be on calls with customer architects, leading on-prem rollouts, and navigating the compliance requirements of some demanding environments.
What you'll do
Own our AWS footprint, EKS clusters, and Terraform codebase: design, evolve, harden
Build and operate the observability stack (Datadog) so we catch problems before customers do: metrics, traces, logs, alerting, SLOs
Design for multi-tenancy: isolation, performance, cost attribution, noisy-neighbour mitigation
Lead our customer deployment story: make it straightforward for enterprises to run SolveAI in their own AWS / Azure / GCP accounts, or fully on-prem
Be the technical lead on customer security reviews, architecture deep-dives, and on-prem rollouts
Own CI/CD, secrets management, and the developer experience that lets engineers ship safely and fast
Own production reliability: incident response, post-mortems, capacity planning
Help shape our compliance posture (SOC 2, ISO 27001, financial-services requirements) as we grow
What we're looking for
Beyond technical depth, we're looking for:
Ownership. You treat the platform as yours. You don't wait to be told something is broken and you don't hand problems off.
Customer instinct. You're comfortable talking to enterprise security and infra teams. You can take a hard technical question and give a clear, honest answer.
Pragmatism. You know the difference between good enough to ship and good enough to last. You make that call correctly.
On the technical side:
Deep AWS expertise: VPC, IAM, EKS, networking
Strong Kubernetes knowledge: operator level, not just kubectl
Terraform fluency: module design, state management, multi-account patterns
Strong observability instincts: alerting that doesn't burn people out, symptoms vs causes
Experience supporting enterprise customers running your software in their own environments (BYOC or on-prem), or a strong appetite to figure it out
Comfortable enough in Rust and Python to trace a problem from infrastructure all the way into application code
Strong written communication: you'll be talking to customer architects as much as our own engineers
Nice to have
Experience with air-gapped or on-prem distributions of cloud-native software
Helm, ArgoCD, or similar GitOps tooling
Background with high-throughput / low-latency systems
Cost optimisation experience at AWS scale
Why SolveAI
At SolveAI, you'll work alongside a team that's spent decades solving the hardest enterprise challenges, from operational inefficiencies to data fragmentation.
If you want to see real change in the world, care deeply about delivering value, and thrive when given ownership in unfamiliar environments, we'd love to hear from you.