Advanced Inference Repo: https://trelis.com/enterprise-server-...
Affiliate Links (support the channel):
- Vast AI - https://cloud.vast.ai/?ref_id=98762
- Runpod - https://tinyurl.com/4b6ecbbn
Newsletter: Trelis.Substack.com
Chapters:
0:00 Serving a model for 100 customers
0:25 Video Overview
1:08 Choosing a server
7:45 Choosing software to serve an API
11:26 One-click templates
12:13 Tips on GPU selection
17:34 Using quantisation to fit in a cheaper GPU
21:31 Vast.ai setup
22:25 Serve Mistral with vLLM and AWQ, incl. concurrent requests
35:22 Serving a function calling model
45:00 API speed tests, including concurrent
49:56 Video Recap
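
The chapters above cover serving a Mistral model with vLLM and AWQ, then testing concurrent requests. A minimal sketch of that setup is below; the model ID, port, and request payload are assumptions for illustration, not taken from the video:

```shell
# Launch an OpenAI-compatible API server with vLLM, loading an
# AWQ-quantised Mistral model (model ID and port are assumptions).
pip install vllm

python -m vllm.entrypoints.openai.api_server \
  --model TheBloke/Mistral-7B-Instruct-v0.2-AWQ \
  --quantization awq \
  --port 8000

# In another terminal: fire two completion requests concurrently
# to check that the server batches them rather than queueing.
for i in 1 2; do
  curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "TheBloke/Mistral-7B-Instruct-v0.2-AWQ",
         "prompt": "Hello", "max_tokens": 16}' &
done
wait
```

vLLM's continuous batching means the two background `curl` requests should complete in roughly the time of one, which is the effect the concurrent speed tests in the video measure.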