
Ideas for a side project in python / ML?

Machine Learning Engineer at Ratepay
a month ago

I'm looking for ideas for a side project to solidify and showcase my skills in Python, specifically ML.

The tech stack I'd like to use and show off is FastAPI, Pydantic, pytest, and some ML tooling (depending on the project, this could be NumPy, pandas, or TF/Keras/PyTorch).

I also have solid experience with Elasticsearch, so if search is relevant it would be nice to include. A newer vector DB with kNN / ANN search could also be interesting.

I have no idea what to do though. Areas of interest are finance, investing, health, anything related to women... Also open to other ideas.

It doesn't need to make money, but it would be nice if it benefited some people.



  • 2
    Data Engineer @ CI Financial
    a month ago

    I don't have any ideas for a good project, but as a baby step I'd recommend finding someone who has done a project with the stack you're interested in and walking through or copying their project to get the project-building juices flowing. These should be easy enough to find on YouTube/Google/LinkedIn.

    It's better to deviate from the exact project you're following and add your own twist to things.

    But if you're having trouble getting started or finding an idea, seeing other people's ideas in action could point you in the right direction.

    For ideas, I'd recommend asking friends and family about things they find interesting or problems they want solved, so that you have a user or user base right off the bat.

    I'd also recommend doing some brainstorming sessions with ChatGPT/Bard.

    Hope that helps!

  • 4
    Founding ML Engineer @ Lancey (YC S22)
    a month ago

    Hey OP, here is a blueprint for a project.

    It's great you've listed some interests of yours. I suggest going to Kaggle and finding a dataset you're interested in. The bigger the better (think 50 to 100+ GB), and if it updates every day, even better.

    1. Run a Prefect/Airflow (or other orchestration) job to fetch data from some source (Reddit/Twitter/Kaggle...) and put it in S3.
    2. Read from S3 and write a SageMaker processing job to train a model.
    3. Deploy the model on a SageMaker inference endpoint.
    4. Wrap the endpoint with a Flask app.
    5. Run continuous training and monitoring. As new data comes in, how does your model's performance change? Are you able to easily retrain a model after time passes?
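
    To make step 5 a bit more concrete, here is a minimal sketch of a retraining check in plain Python. Everything in it is an assumption for illustration: the names `BatchReport`, `accuracy`, and `should_retrain`, the use of accuracy as the metric, and the default threshold and window. It is not tied to SageMaker or any monitoring library. The idea is to score the deployed model on each new batch of labeled data and flag a retrain once the recent average drops too far below the score measured at deploy time.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class BatchReport:
    """Accuracy of the deployed model on one batch of newly labeled data."""
    batch_id: str
    accuracy: float


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("cannot score an empty batch")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


def should_retrain(
    reports: List[BatchReport],
    baseline: float,
    max_drop: float = 0.05,  # hypothetical tolerance, tune per project
    window: int = 3,         # hypothetical number of recent batches to average
) -> bool:
    """Return True when mean accuracy over the last `window` batches has
    fallen more than `max_drop` below the deploy-time `baseline`."""
    if len(reports) < window:
        return False  # not enough evidence yet
    recent = reports[-window:]
    mean_recent = sum(r.accuracy for r in recent) / window
    return (baseline - mean_recent) > max_drop


# Example: accuracy at deploy time was 0.90, then slowly degrades.
history = [
    BatchReport("day-1", 0.90),
    BatchReport("day-2", 0.88),
    BatchReport("day-3", 0.80),
    BatchReport("day-4", 0.79),
    BatchReport("day-5", 0.78),
]
needs_retrain = should_retrain(history, baseline=0.90)
```

    In a real pipeline the orchestration job from step 1 could run a check like this on a schedule and kick off the training job from step 2 when it returns True. Averaging over a window rather than reacting to one bad batch is a design choice to avoid retraining on noise.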

    I'm always happy to chat over Slack DMs (@Sai B) or LinkedIn if you have more questions.

  • 2
    Tech Lead/Manager at Meta, Pinterest, Kosei
    a month ago

    Instead of building something on your own (although nothing against that!), I wonder if you could join an existing effort where you have clear collaborators?

    The first thing that comes to mind is the Vesuvius Challenge, an initiative from Nat Friedman (former CEO of GitHub) to read ancient scrolls using ML/AI: https://scrollprize.org/

    You lose some control over the tech stack, but you meet really cool people who will motivate you.