AI Tool Review - Langfuse + LiteLLM
In this post, I’ll show you how to use LiteLLM to query 100+ different LLMs using the same unified interface, then track all of these calls automatically in a locally hosted Langfuse dashboard.
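To give a taste of that unified interface, here's a minimal sketch (assuming you've already set the relevant provider API keys, e.g. OPENAI_API_KEY and ANTHROPIC_API_KEY; the model names are just illustrative):

import litellm

messages = [{"role": "user", "content": "What is 1+1?"}]

# Same completion() call, different providers -- only the model string changes.
openai_response = litellm.completion(model="gpt-3.5-turbo", messages=messages)
anthropic_response = litellm.completion(model="claude-3-haiku-20240307", messages=messages)

print(openai_response["choices"][0]["message"]["content"])
print(anthropic_response["choices"][0]["message"]["content"])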
Langfuse logs the following info for each query (aka “trace”):
- Total tokens (prompt and completion)
- Total cost ($)
- Latency (s)
- Input and output
- Model
- and more!
End result of this tutorial
Setup
First, we need to install LiteLLM and Langfuse. We’ll use the self-hosted Docker version of Langfuse.
LiteLLM
pip install litellm
Langfuse
First, you’ll need to install Docker:
brew install docker
brew install docker-compose
(Note: on macOS these formulas install only the CLIs; if you don’t already have a Docker daemon running, install Docker Desktop instead with brew install --cask docker, which bundles Compose.)
Second, the Python library:
pip install "langfuse>=2.0.0"
Third, clone the repo for the self-hosted Docker option:
git clone https://github.com/langfuse/langfuse.git
cd langfuse
How to Use
I’ll walk through how to set up your local Langfuse server, connect it to LiteLLM, and run a logged query using Python.
1. Start Langfuse server (local)
First, run the Langfuse server by cd-ing into the langfuse/ directory we cloned previously, and running:
docker compose up
This will start a server at http://localhost:3000.
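If you want to confirm the server is ready before moving on, a quick sanity check from Python (my own addition, using the requests library; not part of Langfuse) is to poll that URL:

import time

import requests

# Poll the local Langfuse server until it responds (or we give up).
for _ in range(30):
    try:
        if requests.get("http://localhost:3000").ok:
            print("Langfuse is up!")
            break
    except requests.ConnectionError:
        pass
    time.sleep(2)  # server may still be starting; try again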
2. Get Langfuse API keys (local)
Go to http://localhost:3000 in your browser.
Click “Sign up” to create a local account:
Create your account. (Note: this isn’t an official Langfuse account, just a locally hosted one. I’m not sure why this step is necessary for self-hosting.)
You should now see a page that looks like this. This is where your LLM calls will be logged:
Retrieve your API keys. Click on “Settings”, then scroll down to “API keys”. Click “Create new API keys” to create your new public/secret key pair.
Copy down your public/secret keys. We’ll use them for LiteLLM in the step below.
3. LiteLLM
Here is a minimal working example of using LiteLLM to query GPT-3.5 and log the results in Langfuse:
from typing import Dict, List
import os

import litellm

# Set environment variables + connect Langfuse <> LiteLLM
os.environ["LANGFUSE_HOST"] = "http://localhost:3000"
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-..."  # from Step 2
os.environ["LANGFUSE_SECRET_KEY"] = "sk-..."  # from Step 2

# Log every successful LLM call to Langfuse
litellm.success_callback = ["langfuse"]

messages: List[Dict[str, str]] = [{
    "role": "user",
    "content": "What is 1+1?",
}]

response = litellm.completion(model="gpt-3.5-turbo", messages=messages)
print(response["choices"][0]["message"]["content"])
If you go to http://localhost:3000, you should now see this query logged as a “trace” in the Langfuse Dashboard:
And clicking on this trace should reveal a lot of details about it:
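LiteLLM can also enrich these traces. As a sketch based on LiteLLM's Langfuse integration (the exact metadata keys may vary by version, so check the LiteLLM docs), you can pass a metadata dict to completion() to name generations and tag traces:

import litellm

litellm.success_callback = ["langfuse"]

# Optional metadata picked up by LiteLLM's Langfuse callback
# (field names per the LiteLLM docs at the time of writing).
response = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "What is 2+2?"}],
    metadata={
        "generation_name": "math-question",  # name shown on the trace
        "trace_user_id": "user-123",         # attribute the trace to an end user
        "tags": ["tutorial"],                # filterable in the dashboard
    },
)

This makes the dashboard much easier to filter once you have more than a handful of traces.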
Takeaways
I only scratched the surface of what these two libraries can do, but even just querying GPT-3.5 and logging the calls is extremely useful.
Strengths
- Easy setup: Getting Langfuse running locally was super simple.
- Pretty UI: The Langfuse dashboard is super clean. It makes it very easy to track everything you’d want to know about your LLM queries.
- In-depth tracking: Langfuse logs pretty much everything about your LLM queries, so I haven’t found any need yet for additional logging.
Weaknesses
- None yet