CentML is on a mission to revolutionize AI accessibility by drastically reducing the cost of developing and deploying ML models. As a Software Engineer specializing in LLM Inference, you'll be at the forefront of making large language models more efficient, scalable, and accessible.
In this role, you'll architect and implement cutting-edge inference stacks for LLMs. You'll collaborate with teams focused on resource orchestration, distributed systems, inference engine optimization, and high-performance GPU kernel development. Your responsibilities will include writing high-quality code, benchmarking and profiling, and ensuring the scalability of our core backend software.
The ideal candidate has a strong background in computer science or a related field and at least 2 years of industry experience. You should be proficient in Python and C/C++ and have a passion for machine learning and performance engineering. Experience with LLMs, GPU programming, or distributed systems is a plus.
Join CentML to contribute to the democratization of Machine Learning and be part of a team that values diversity, offers competitive benefits, and provides opportunities for professional growth. If you're ready to make a significant impact in the world of AI, this is your chance to shine!