21

Suggestions for good open-source AI projects I can contribute to?

Entry-Level Software Engineer at Other
3 months ago

I'm trying to break into AI as a Machine Learning Engineer. I want to demonstrate expertise in AI/ML topics and capture a "Wow" factor by contributing to a well-known open source AI project.

I'm looking for suggestions spanning CV, NLP, tabular data and collaborative filtering -- including but not limited to the latest GenAI stuff.

Some context: I have an MS in CS, did CV research, and completed Jeremy Howard's "Practical Deep Learning with fastai and PyTorch".

The fastai library itself is a popular target for a first contribution. See:

  1. How to Make Open Source Contributions to fastai (Hamel Husain)

  2. fast.ai Discord server (if the link doesn't work, it's also discoverable: search "fast.ai"). #fastai-dev is the contributors' channel.

One concern is identifying something that's non-trivial but tractable for a relative newbie. I realize there's tension between this and achieving that "Wow" factor at least in the beginning. I'm not sure a contribution to, say, PyTorch or scikit-learn is achievable from where I stand, but I could be wrong. Perspective on how to spot/scope opportunities would be appreciated too.
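
To make the spotting/scoping question concrete, here is a minimal sketch of one possible approach, assuming the project tags starter work with a "good first issue" label on GitHub (a common convention, but not a universal one). The helper names, the `max_comments` cutoff, and the sample data are all hypothetical; only the issue fields (`number`, `title`, `comments`, `assignee`, `pull_request`) come from GitHub's REST API.

```python
from urllib.parse import quote, urlencode

GITHUB_API = "https://api.github.com"


def starter_issues_url(owner, repo, label="good first issue"):
    """Build the REST API URL that lists a repo's open issues with one label."""
    params = {"labels": label, "state": "open", "per_page": 50}
    query = urlencode(params, quote_via=quote)  # encode spaces as %20
    return f"{GITHUB_API}/repos/{owner}/{repo}/issues?{query}"


def tractable_issues(issues, max_comments=5):
    """Filter a decoded issues response down to plausible first contributions.

    The max_comments cutoff is an arbitrary assumption: a long thread often
    hints at hidden complexity or a stalled design discussion.
    """
    picked = []
    for issue in issues:
        if "pull_request" in issue:   # the issues endpoint also returns PRs
            continue
        if issue.get("assignee"):     # someone is already working on it
            continue
        if issue.get("comments", 0) > max_comments:
            continue
        picked.append((issue["number"], issue["title"]))
    return picked


# Made-up sample shaped like the API response, so this runs offline.
sample = [
    {"number": 101, "title": "Fix typo in docs", "comments": 2, "assignee": None},
    {"number": 102, "title": "Add feature", "comments": 0, "assignee": None,
     "pull_request": {}},
    {"number": 103, "title": "Refactor loader", "comments": 1,
     "assignee": {"login": "someone"}},
    {"number": 104, "title": "Redesign API", "comments": 12, "assignee": None},
]

print(starter_issues_url("fastai", "fastai"))
print(tractable_issues(sample))  # [(101, 'Fix typo in docs')]
```

Fetching the URL (with `urllib.request` or similar) and passing the decoded JSON list to `tractable_issues` would give a shortlist to read by hand. The filter only narrows the search; whether an issue is really tractable still takes reading the thread and the code around it.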

Thanks very much!

Discussion

(9 comments)
  • 23
    Entry-Level Software Engineer at Seed Startup
    3 months ago

    I was in exactly the same position a year ago, so this hits really close to home. It was so hard to get interviews that even with spam applying, I only got 2 callbacks in a month. Then I started contributing to open-source, and things started looking up. Here are some pointers:

    Note: for those who are later in their career, I think these principles still apply. Doing this earlier in your career has more of a 0-to-1 effect than the 1-to-5 effect someone later in their career would see, IMO.

    Pick A Niche

    So there's a deluge of MSCS graduates who want to work in AI. You need to focus on a single direction. There are several routes you can take:

    • Model Serving
    • Classic ML model development (collaborative filtering, etc)
    • Large ML models (LLMs, stable diffusion, etc)
    • Fine-tuning
    • Generative AI tools
    • ML infra tools like Flyte/Airflow

    Note that classic ML model dev and large model work typically require top conference papers to be in the top 1%.

    Before you dive into any route, make sure you do some due diligence so that you know roughly what it takes to be the top 1% in anything. Then decide if you're willing to put in the work.

    Pick The Project Size

    Then there's the size of the open source project. Small libraries/frameworks are too unknown to get you any real attention from recruiters. Large libraries/frameworks will take 1-2 years to get on the PMC (project management committee).

    Small projects are things like util libraries. Large projects are established frameworks like PyTorch or Kubernetes.

    The "goldilocks zone" IMO is medium-size projects that are rapidly gaining traction. They have enough things on their roadmap that'll add immediate value and they should be the most welcoming of new contributors.

    Making Core Contributions

    Start with something small, like a cherry-pick commit or an integration. No sane person would ever let a stranger add changes to their core codebase.

    Then it's time to align your contributions to the actual product roadmap. Attend contributor meetings. Talk to other contributors on Slack/Discord. Get involved.

    As you contribute, you'll see yourself go up the "top contributors" list. You'll see yourself writing better PRs, writing cleaner code, and working on more impactful features.

    Spinning Open-source Into Opportunities

    Getting to the top 10 or top 5 contributors for a medium-size library doesn't take that long. Even in the span of 1-2 months, you can get to Top 5 if you put in full-time effort.

    Now you have some awesome résumé bullet points:

    • Built core features: X, Y, Z, etc
    • Top 5 contributor for GitHub project with 20k stars and 500K downloads/month

    Now go! Find a project and contribute. You'll learn a lot and help out the community. I think that you can only rewrite your résumé so many times before you need to make a fundamental change. Making open-source contributions can help you make that fundamental change to your profile.

    • 1
      Tech Lead @ Robinhood, Meta, Course Hero
      3 months ago

      Wow, this is literally one of the best things I have ever read - Thank you so much Elliot for sharing your wisdom!

    • 2
      Entry-Level Software Engineer [OP]
      Other
      3 months ago

      Thanks for sharing, Elliot. It's encouraging to hear from someone who's been in my shoes and made it to the other side. I'll put your suggestions to work. I appreciate you.

    • 0
      Thoughtful Tarodactyl
      Taro Community
      3 months ago

      This is so helpful! Elliot, would you be open to sharing more details on how this helped you get more interviews? Like, how many more interviews did you get? Did recruiters notice it or seem impressed by it? Did you get reachouts?

      And generally, any other thoughts on how open source helped with getting/passing interviews?

    • 1
      Thoughtful Tarodactyl
      Taro Community
      3 months ago

      Related reading for anyone interested: https://huyenchip.com/2024/03/14/ai-oss.html

      Chip talks about the current state of open source AI repos, which ones are good, and how to think about them.

    • 1
      Thoughtful Tarodactyl
      Taro Community
      3 months ago

      Here's a list of repos she analyzed and categorized: https://huyenchip.com/llama-police

    • 0
      Thoughtful Tarodactyl
      Taro Community
      3 months ago

      OK, update: after digging through Chip's list I found an awesome repo: https://github.com/unslothai/unsloth

      It's from a YC-backed startup (https://www.ycombinator.com/companies/unsloth-ai) that's building an open source platform to make fine-tuning LLMs faster.

      It's really cool because you can actually run the training yourself on Colab because these models are small. You get the experience of training/fine-tuning LLMs, which is not something you find in a lot of open source repos.

      It's a super active repo with a very active maintainer. Lots of opportunity to learn how these models work under the hood

      Not too crowded, still time to make good contributions.

      It's pretty medium-sized, too.

    • 0
      Entry-Level Software Engineer at Seed Startup
      3 months ago

      Hm, not a bad choice

  • 12
    Tech Lead @ Robinhood, Meta, Course Hero
    3 months ago

    TensorFlow is open-source and widely used: https://github.com/tensorflow/tensorflow

    The problem is that the repo is huge (2.8k pull requests with 185k GitHub stars), so getting something merged in is probably super hard 😥

    The "How to start contributing to open source?" discussion is particularly helpful for high-level advice. Above all, I recommend working on a repo you use yourself.

    We also recently shipped an open-source contribution course with the former Director of Open Source Engineering at Facebook. You can watch it here: [Course] Become An Open Source Master