Exa is building a search engine for AI applications from the ground up. The company develops large-scale web-crawling infrastructure, trains state-of-the-art embedding models, and builds high-performance vector databases in Rust. Its hardware investment includes a $5M H200 GPU cluster, and it runs operations across thousands of machines.
The Infrastructure Team builds the foundational tooling and infrastructure that powers all of Exa's systems. The team is hiring infrastructure engineers to build systems such as GPU cluster orchestration on Kubernetes, map-reduce batch jobs on Ray, and world-class observability tooling.
The role combines cutting-edge technology with large scale. Projects include building Kubernetes orchestration for multi-million-dollar GPU clusters, scaling AWS batch job systems, and optimizing GPU scheduling for maximum efficiency. Candidates should have extensive experience with large-scale infrastructure and a meticulous approach to system reliability and optimization.
The position is based in San Francisco, with a salary range of $150K–$300K plus equity. Exa sponsors visas for international candidates (STEM OPT, OPT, H-1B, O-1, E-3). The role suits experienced infrastructure engineers who want to work on challenging problems at the intersection of AI and large-scale systems.