
Looking for advice on fine-tuning LLMs as a side project

Entry-Level Data Scientist at Flatiron Health
3 months ago

I'm a Data Scientist looking to switch companies and move to a role closer to ML/LLMs. My plan is to build a side project fine-tuning LLMs to familiarize myself with the field and to put that experience on my resume. I was wondering if anyone here has built similar projects or gone through a similar learning process. It would be very helpful to get some insights on acquiring skills and finding a job in this area. Here are some examples of the advice I'm looking for, but please feel free to share other aspects as well. Anything will be greatly appreciated:

  1. What are some good resources to learn about building LLMs? (Currently I'm mostly learning from HF, Reddit, and googling.)
  2. What's the best tech stack for building personal fine-tuned LLM projects? (I'm planning to use Runpod or a similar service like Vast for training and inference, but was wondering if there are better options.)
  3. I'm looking to get into an early-stage company in this field. What kind of project should I build to maximize my chances of getting into such a company? My plan right now is to fine-tune a model on literary works (novels, poems, prose, etc.), since training data is relatively abundant and it's aligned with my interests. Are there more impactful use cases (for job hunting) out there?
  4. What should I keep in mind when producing deliverables to better showcase my technical and learning abilities? I'm planning to write a series of blog/social media posts documenting my experience building this project. Is there anything specific that would draw companies' attention?

Thanks in advance and please feel free to share your thoughts!


Discussion

(3 comments)
  • 5
    ML Engineer
    3 months ago

    I would treat this like any other ML project. I would build an end-to-end app where you:

    1. Run a Prefect (or other orchestration) job to fetch data from some source (Reddit/Twitter/...).

    2. Put it in S3.

    3. Read from S3 and write a processing job to fine-tune an LLM in SageMaker.

    4. Deploy it on a SageMaker inference endpoint.

    5. Wrap it with a Flask app.
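
To make the data half of the steps above concrete, here is a minimal, standard-library-only sketch of steps 1 to 3 (fetch, store, prepare training data). Everything in it is an assumption for illustration: `fetch_posts`, `store_raw`, `build_training_file`, and `run_pipeline` are hypothetical names, a local folder stands in for the S3 bucket, the sample posts are made up, and the prompt/completion JSONL layout is just one common way to shape fine-tuning data. The fine-tuning job, the inference endpoint, and the Flask wrapper are deliberately left out.

```python
import json
import tempfile
from pathlib import Path


def fetch_posts():
    """Step 1 stand-in (hypothetical): a real job would call a Reddit/Twitter client here."""
    return [
        {"id": "a1", "title": "What is overfitting?", "body": "When a model memorizes noise."},
        {"id": "a2", "title": "Why use a validation set?", "body": "To estimate generalization."},
    ]


def store_raw(posts, bucket_dir):
    """Step 2 stand-in: write raw records to a local folder that plays the role of a bucket."""
    raw_path = Path(bucket_dir) / "raw" / "posts.jsonl"
    raw_path.parent.mkdir(parents=True, exist_ok=True)
    with raw_path.open("w", encoding="utf-8") as f:
        for post in posts:
            f.write(json.dumps(post) + "\n")
    return raw_path


def build_training_file(raw_path, bucket_dir):
    """Step 3 (data half only): turn raw records into prompt/completion pairs."""
    train_path = Path(bucket_dir) / "train" / "examples.jsonl"
    train_path.parent.mkdir(parents=True, exist_ok=True)
    with raw_path.open(encoding="utf-8") as src, train_path.open("w", encoding="utf-8") as dst:
        for line in src:
            post = json.loads(line)
            example = {"prompt": post["title"], "completion": post["body"]}
            dst.write(json.dumps(example) + "\n")
    return train_path


def run_pipeline(bucket_dir):
    """Chain the steps; an orchestrator would schedule each one as its own task."""
    raw_path = store_raw(fetch_posts(), bucket_dir)
    return build_training_file(raw_path, bucket_dir)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as bucket:
        print(run_pipeline(bucket).read_text(encoding="utf-8"))
```

In a real build, each function would presumably become an orchestrated task and the local paths would become S3 keys; keeping the steps as separate functions with file handoffs makes that swap easier.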

    Here are my general thoughts on ML roles for LLM since I see a lot of people interested in this.

    In my opinion, from an ML/data science perspective, an LLM is just like any other model. It's similar to how you would use BERT for something, except that it's like BERT on steroids.

    I would also suggest reevaluating your motivation for pursuing an LLM-driven field. This masterclass on choosing a good company/team advises that things like product, technical, level, and comp are generally overrated. At the end of the day, an LLM is just another model from a company perspective.

    Again, in my opinion, the reason there is a lot of hype right now is that the barrier to entry for LLMs is pretty low. Most (>75%) of the applications, which tend to be low-hanging fruit, are being grabbed. These can be built with the ChatGPT API, with out-of-the-box tools to fine-tune and deploy LLMs, or with another set of tools for RAG. All of this is software-heavy rather than ML-expertise-heavy. AWS also has a lot of built-in tools that simplify deploying LLMs, since they want to streamline and productize the process as much as possible.

    The only place where you truly need someone with data science skills is at companies building novel/SOTA LLMs (e.g. phind.ai, magic.ai), and those usually need PhDs.

    Also, 90% of the hard part of fine-tuning an LLM is the "getting the data to the LLM" part, i.e. the data engineering.
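
As a rough illustration of what that data engineering can look like for a text corpus, here is a small standard-library sketch that normalizes whitespace, drops very short snippets, removes duplicates, and makes a stable train/validation split. The function names (`normalize`, `clean_corpus`, `split_train_val`) and the thresholds (`min_chars=20`, `val_percent=10`) are assumptions chosen for the example, not recommendations from this thread; a real corpus would likely need more (near-duplicate detection, licensing checks, and so on).

```python
import hashlib
import re


def normalize(text):
    """Collapse runs of whitespace and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def clean_corpus(texts, min_chars=20):
    """Normalize, drop short snippets, and remove exact duplicates (case-insensitive)."""
    seen = set()
    cleaned = []
    for text in texts:
        norm = normalize(text)
        if len(norm) < min_chars:
            continue  # too short to be a useful training example (assumed threshold)
        key = hashlib.sha256(norm.lower().encode("utf-8")).hexdigest()
        if key in seen:
            continue  # same text already kept, ignoring case and spacing
        seen.add(key)
        cleaned.append(norm)
    return cleaned


def split_train_val(texts, val_percent=10):
    """Assign each text to train or val by hashing it, so the split is stable across runs."""
    train, val = [], []
    for text in texts:
        bucket = int(hashlib.sha256(text.encode("utf-8")).hexdigest(), 16) % 100
        (val if bucket < val_percent else train).append(text)
    return train, val
```

Hashing the text to pick the split, rather than shuffling, means a passage stays on the same side of the split when the corpus is rebuilt, which helps keep validation numbers comparable between runs.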

    There is also a ton of competition in this field, and it is incredibly saturated. There really aren't a ton of jobs where they want someone to fine-tune LLMs (source: https://youtu.be/5p248yoa3oE?si=FmANIzPN-_zMtXvB&t=758). Think about how many companies need AI/ML in the first place: far fewer than need SWE. Then the number of companies that need LLMs is a fraction of that. It is important to keep in mind that most data in companies is tabular, and supervised learning still accounts for the vast majority of the ML expertise needed.

    Finally, the AI/LLM field is moving really fast. A lot of the LLM-specific expertise needed today is going to be obsolete in 6-12 months.

    All this is to say: focus on the fundamentals of understanding how to frame, design, and solve problems, and less on trying to find the right space/tools/expertise.

    I know this might have sounded a bit rough and might not exactly answer your question! If LLMs are something you're deeply passionate about, then by all means go for it! I just wanted to paint a more complete and realistic picture of the LLM landscape, since I've seen a lot of people trying to chase the hype or sell the hype.

  • 3
    Tech Lead/Manager at Meta, Pinterest, Kosei
    3 months ago

    I don't have much to add to Sai's great answer, but one thing I recommend is to find a partner to work with. The likelihood of completing the project is way higher if you have someone to learn from and keep you accountable.

  • 0
    Entry-Level Data Scientist [OP]
    Flatiron Health
    2 months ago

    Thanks Sai, that was extremely helpful! I definitely agree that knowing how to frame, design, and solve problems is more important, and that's what I'm aiming for. What would be a good way to show companies that I'm good at solving problems in general, rather than just being good at one tool? Would writing and publishing something that documents my thought process help? Or are there better ways to achieve this? Thanks again!
