Serve a Custom LLM for Over 100 Customers

Trelis Research

Advanced Inference Repo: https://trelis.com/enterprise-server-...

Affiliate Links (support the channel):
- Vast AI - https://cloud.vast.ai/?ref_id=98762
- Runpod - https://tinyurl.com/4b6ecbbn

Newsletter: Trelis.Substack.com

Chapters:
0:00 Serving a model for 100 customers
0:25 Video Overview
1:08 Choosing a server
7:45 Choosing software to serve an API
11:26 One-click templates
12:13 Tips on GPU selection
17:34 Using quantisation to fit in a cheaper GPU (memory arithmetic sketched after this list)
21:31 Vast.ai setup
22:25 Serve Mistral with vLLM and AWQ, incl. concurrent requests (serving sketch after this list)
35:22 Serving a function calling model
45:00 API speed tests, including concurrent (concurrency test sketch after this list)
49:56 Video Recap
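
The quantisation chapter (17:34) comes down to bytes per parameter. A rough sketch of the arithmetic in Python, using an approximate 7.2B parameter count for Mistral 7B; the exact figures quoted in the video may differ:

```python
# Rough weight-memory arithmetic (illustrative parameter count, not from the video).
def weight_memory_gb(n_params_billion: float, bits_per_param: float) -> float:
    """Approximate weight footprint in GB; ignores KV cache and runtime overhead."""
    return n_params_billion * 1e9 * bits_per_param / 8 / 1e9

# Mistral 7B (~7.2B params) in fp16 vs. 4-bit AWQ
# (AWQ keeps some tensors unquantised, so real usage is slightly higher).
print(f"fp16 weights:      {weight_memory_gb(7.2, 16):.1f} GB")  # ~14.4 GB
print(f"4-bit AWQ weights: {weight_memory_gb(7.2, 4):.1f} GB")   # ~3.6 GB
```

At 4 bits the weights alone drop from roughly 14 GB to under 4 GB, which is why, once you leave room for the KV cache, the model fits on a much cheaper GPU.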
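For the vLLM + AWQ chapter (22:25), here is a minimal client-side sketch. It assumes a vLLM OpenAI-compatible server is already running on the pod; the model repo, port, and prompt are illustrative assumptions, not details confirmed by the video:

```python
# Assumes a vLLM OpenAI-compatible server is already running, e.g.:
#   python -m vllm.entrypoints.openai.api_server \
#       --model TheBloke/Mistral-7B-Instruct-v0.1-AWQ --quantization awq
# The model repo, port, and prompt below are illustrative assumptions.
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",  # vLLM's default port
    json={
        "model": "TheBloke/Mistral-7B-Instruct-v0.1-AWQ",
        "prompt": "[INST] Name three planets. [/INST]",  # Mistral instruct format
        "max_tokens": 50,
        "temperature": 0,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```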
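For the speed-test chapter (45:00), a minimal concurrency test along the same lines, again assuming the server above is running; the request count and prompt are placeholders:

```python
# A sketch of a concurrent load test against the same endpoint.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:8000/v1/completions"
PAYLOAD = {
    "model": "TheBloke/Mistral-7B-Instruct-v0.1-AWQ",
    "prompt": "[INST] Write one sentence about GPUs. [/INST]",
    "max_tokens": 64,
    "temperature": 0,
}

def one_request(_):
    """Send one completion request and return its wall-clock latency."""
    start = time.perf_counter()
    r = requests.post(URL, json=PAYLOAD, timeout=120)
    r.raise_for_status()
    return time.perf_counter() - start

n = 16  # number of simultaneous "customers"
t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=n) as pool:
    latencies = list(pool.map(one_request, range(n)))
total = time.perf_counter() - t0
print(f"{n} concurrent requests in {total:.1f}s; "
      f"mean latency {sum(latencies) / n:.1f}s")
```

Because vLLM batches concurrent requests with continuous batching, total time for n requests should grow far more slowly than n times the single-request latency.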
