Cost-Effective LLM Routing

The price of flagship LLMs models can easily be 15 to 50 times the price of their smaller and weaker siblings. As a reference, gpt-4o costs $5 / 1 M input tokens meanwhile gpt-4o-mini costs $0.150 / 1 M input tokens¹. We can ask the following question: Can we route between the two models on demand to minimize costs while maintaining performance? In other words, when our application is given easier tasks, we want to use the cheaper model and when given complex reasoning tasks we want to use the bigger model. We can understand this by looking at the next figure, which represents the performance on MT Bench² of a router that randomly routes between two models.

This is where RouteLLM³: An Open-Source Framework for Cost-Effective LLM Routing⁴ developed by LLMSys Org together with AnⁱyScale comes into play where researchers trained different types of LLM Classifiers for routing and showed that you can get 95% of GPT-4o performance while reducing the costs by 85%.

To use it in our project:

1. Install it from PyPI: pip install “routellm[serve,eval]”

2. Replace your existing OpenAI client with RouteLLM Controller as shown in the example, here we are routing using matrix factorization method which recommended by the authors.

As simple as that get 85% costs reduction.

Cost-Effective LLM Routing

Don’t just take our word for it.

Explore

Resource

Follow Us in...

Cost-Effective LLM Routing

Don’t just take our word for it.

Explore

Resource

Follow Us in...

Get a Complimentary Planning Session