How to Trace LLM Calls with Grafana

Tobi A

Project Manager

Jun 25, 2024

Introduction

Understanding how to effectively trace and observe LLM calls is crucial for optimizing performance, managing costs, and ensuring the reliability of your AI-powered applications. This blog post will guide you through the process of tracing LLM calls using Grafana Cloud and langtrace.ai.

Before we dive in, make sure you have the following prerequisites:

1. A Grafana Cloud account

2. A langtrace.ai account (optional; you do not need a Langtrace API key to send traces to Grafana)

Step 1: Create your environment

Create a Python environment:

# Create a new python environment
python3 -m venv .venv
source .venv/bin/activate

Step 2: Set Up Grafana Cloud

1. Log in to your Grafana Cloud account.

2. Generate a Grafana API token by following this guide.

3. Note down the values for:

- OTEL_EXPORTER_OTLP_ENDPOINT

- OTEL_EXPORTER_OTLP_HEADERS
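As a rough sketch of what these values look like for Grafana Cloud (the zone, instance ID, and token below are placeholders, not real values):

# The endpoint is the OTLP gateway for your Grafana Cloud zone, for example:
#   https://otlp-gateway-<zone>.grafana.net/otlp
# The headers value uses basic auth, where the credential is the base64
# encoding of "<instance_id>:<api_token>":
echo -n "<instance_id>:<api_token>" | base64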

Step 3: Install Required Libraries

Install the latest release of the OpenTelemetry Python package:

pip install "opentelemetry-distro[otlp]"
opentelemetry-bootstrap -a install

Additionally, install the latest release of Langtrace:

pip install langtrace-python-sdk

Step 4: Configure Your Environment

Set the following environment variables, replacing the placeholders with your actual values:

export OTEL_EXPORTER_OTLP_PROTOCOL="http/protobuf"
export OTEL_EXPORTER_OTLP_ENDPOINT="https://<your_otlp_endpoint>"
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Basic%20<your_encoded_api_token>"
export OTEL_RESOURCE_ATTRIBUTES="service.name=your-app-name,service.instance.id=your-app-instance"

Note: For the Python SDK, the space after "Basic" must be percent-encoded as "%20" in OTEL_EXPORTER_OTLP_HEADERS, as shown above.
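If you would rather build the header value programmatically, here is a minimal Python sketch; the instance ID and token are hypothetical placeholders:

import base64

# Hypothetical credentials -- substitute your Grafana Cloud instance ID and token.
credentials = base64.b64encode(b"123456:glc_example_token").decode()

# "%20" stands in for the space after "Basic", as the note above requires.
print(f"OTEL_EXPORTER_OTLP_HEADERS=Authorization=Basic%20{credentials}")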

Step 5: Initialize the langtrace.ai SDK

You already installed the Langtrace SDK in Step 3. Import and initialize it in your code:

from langtrace_python_sdk import langtrace
langtrace.init() # no API key: traces go to the OTLP endpoint configured in Step 4

Note: To send traces to Grafana rather than to Langtrace Cloud, do not pass your Langtrace API key to init().
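Putting the pieces together, here is a minimal sketch of a traced application. It assumes you are calling OpenAI via the openai package with OPENAI_API_KEY set in your environment; the file name and model name are placeholders:

# llm_app.py
from langtrace_python_sdk import langtrace

# Initialize before importing the LLM client library so Langtrace can
# instrument it. No API key: spans go to the OTLP endpoint from Step 4.
langtrace.init()

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

completion = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(completion.choices[0].message.content)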

Step 6: Run Your Application

Use the OpenTelemetry Python automatic instrumentation tool to run your application:

opentelemetry-instrument python YOUR_APPLICATION.py

Step 7: Verify Traces in Grafana

After running your application, you should see traces appearing in Grafana Tempo. You can now visualize and analyze the incoming requests and outgoing LLM calls made by your application.
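To confirm the traces are arriving, open Grafana's Explore view with your Tempo data source and filter by the service name you set in Step 4, for example with this TraceQL query:

{ resource.service.name = "your-app-name" }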

Troubleshooting

If you encounter issues, try the following:

1. Traces not visible: Double-check all environment variables.

2. Missing libraries: Reinstall required packages:

pip install "opentelemetry-distro[otlp]"
opentelemetry-bootstrap -a install

3. OTEL_EXPORTER_OTLP_HEADERS formatting:

Ensure the space is replaced with "%20".

export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Basic%20<your_encoded_api_token>"

Use Case Examples: How-tos with Grafana

1. How to Debug LLM Response Latency Using Grafana:

  • Visualize latency trends using time series graphs.

  • Set up alerts for response times exceeding thresholds.

2. How to Optimize LLM Token Usage Using Grafana:

  • Track token consumption per request.

  • Identify high-token-consuming requests through query analysis.

3. How to Monitor LLM Error Rates with Grafana:

  • Create a panel to track different types of LLM errors (e.g., timeouts, API errors).

  • Use heatmaps to visualize error frequency across time periods.

  • Implement annotations to mark deployments or config changes for context.

4. How to Analyze LLM Prompt Effectiveness Using Grafana:

  • Correlate prompt variations with response quality metrics.

  • Compare different prompt strategies over time.

  • Use variables to easily switch between prompt categories.

5. How to Track LLM Cost Efficiency with Grafana:

  • Combine token usage with pricing data.

  • Calculate cost per successful interaction using math functions.

  • Create projections and alerts for budget thresholds.

6. How to Optimize LLM Caching Strategies Using Grafana:

  • Compare response times for cached vs. non-cached LLM calls.

  • Show cache hit rates using gauge panels.

  • Analyze which types of queries benefit most from caching.

These examples demonstrate how Grafana can be used to gain insights, improve performance, and optimize costs in LLM applications; the query sketches below show what a few of them might look like in practice.
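Here are some hedged sketches for the latency, token, and error-rate use cases above. They assume Tempo's metrics-generator (span metrics) is enabled for your stack and that Langtrace records token counts as span attributes; the exact attribute names vary by SDK version, so treat gen_ai.usage.total_tokens as an assumption:

# p95 LLM call latency from Tempo-generated span metrics (PromQL):
histogram_quantile(0.95,
  sum by (le) (rate(traces_spanmetrics_latency_bucket{service="your-app-name"}[5m])))

# High-token-consuming requests (TraceQL):
{ span.gen_ai.usage.total_tokens > 1000 }

# Error rate as a fraction of all calls (PromQL):
sum(rate(traces_spanmetrics_calls_total{service="your-app-name", status_code="STATUS_CODE_ERROR"}[5m]))
  /
sum(rate(traces_spanmetrics_calls_total{service="your-app-name"}[5m]))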

Additional Resources

For more information on integrating langtrace.ai with Grafana, check out the official langtrace.ai documentation.

To get started quickly, we've created a simple demo program that showcases the integration.

Visit our GitHub repository for more open-source LLM observability tooling.

Stay up to date with Langtrace’s latest tooling updates via our LinkedIn and Twitter.

We’d love to hear from you!

We’d love to hear your feedback on Langtrace! We invite you to join our community on Discord or reach out at support@langtrace.ai to share your experiences, insights, and suggestions. Together, we can continue to set new standards of observability in LLM development.

Happy tracing!
