How to prepare and annotate data for use in ML model fine tuning with Label Studio

Practical AI through Prototypes

214 views since Nov 26, 2023

This video discusses why you might want to fine tune a model (vs using RAG), the steps in model training, preparing data for annotation, and how to use Label Studio (open source) for that annotation.

It also covers using an LLM to write code that converts your data into an annotation-friendly format, and building a simple web UI for one-click data conversion.
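For readers who want a feel for the conversion step, here is a minimal sketch in Python. It is not the code from the video: it assumes the input is a standard SRT subtitle file and that the target is Label Studio's JSON import format (a list of tasks, each with a "data" object). The function name srt_to_tasks and the field names "start", "end", and "text" are hypothetical choices made for this example.

```python
import json
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,000".
TIMING = re.compile(
    r"(\d{2}:\d{2}:\d{2}[,.]\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2}[,.]\d{3})"
)


def srt_to_tasks(srt_text):
    """Convert SRT text into a list of task dicts, one per subtitle cue.

    The "data" wrapper follows Label Studio's JSON import format; the keys
    inside it ("start", "end", "text") are assumptions for this sketch.
    """
    tasks = []
    normalized = srt_text.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = [line.strip() for line in block.split("\n") if line.strip()]
        # Locate the timing line; skip blocks that do not contain one.
        for index, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                break
        else:
            continue
        # Everything after the timing line is the spoken text of the cue.
        text = " ".join(lines[index + 1:])
        if not text:
            continue
        tasks.append(
            {"data": {"start": match.group(1), "end": match.group(2), "text": text}}
        )
    return tasks


if __name__ == "__main__":
    example = (
        "1\n00:00:01,000 --> 00:00:04,000\nWelcome to the show.\n\n"
        "2\n00:00:04,500 --> 00:00:07,000\nToday we talk about fine tuning.\n"
    )
    # Print the tasks as JSON, ready to save to a file for import.
    print(json.dumps(srt_to_tasks(example), indent=2))
```

The printed JSON can be saved to a file and imported into a Label Studio project whose labeling interface reads the same field names.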

- - -

Links:

Label Studio: https://labelstud.io/
Label Studio GitHub Repo: https://github.com/HumanSignal/label-...
SRT to JSON transcript converter: https://huggingface.co/spaces/mikemoz...
Retrieval Augmented Generation (an alternative to fine tuning): Better AI Responses with RAG Fusion
Autonomous AI agents for data scraping: Autonomous AI agents and scraping for...
OpenAI Whisper for Speech to Text: https://platform.openai.com/docs/guid...
ChatGPT Transcript to generate data conversion code: https://chat.openai.com/share/7eabb71...
Gradio Docs: https://www.gradio.app/docs/interface

- - -

Timestamps:

0:00 - Overview
0:29 - Advantages of RAG vs Fine Tuning
1:31 - Steps in Fine Tuning
2:13 - Why I'm Fine Tuning: Better AI Understanding of Podcast Transcripts
3:50 - Writing code for data conversion using ChatGPT
6:38 - Taking ChatGPT step by step through a complex coding task
10:53 - Building a web-based tool for data conversion using Gradio
17:46 - Installing Label Studio for data labeling and annotation
20:00 - Label Studio feature overview
21:53 - Setting up a Label Studio interface with code (from ChatGPT)
26:09 - Importing data into Label Studio
27:33 - Next step: training the model!