# 🤖 Large language models (LLMs)

## Overview

Embedchain comes with built-in support for various popular large language models. We handle the complexity of integrating these models for you, allowing you to easily customize your language model interactions through a user-friendly interface.

<CardGroup cols={4}>
  <Card title="OpenAI" href="#openai" />

  <Card title="Google AI" href="#google-ai" />

  <Card title="Azure OpenAI" href="#azure-openai" />

  <Card title="Anthropic" href="#anthropic" />

  <Card title="Cohere" href="#cohere" />

  <Card title="Together" href="#together" />

  <Card title="Ollama" href="#ollama" />

  <Card title="vLLM" href="#vllm" />

  <Card title="Clarifai" href="#clarifai" />

  <Card title="GPT4All" href="#gpt4all" />

  <Card title="JinaChat" href="#jinachat" />

  <Card title="Hugging Face" href="#hugging-face" />

  <Card title="Llama2" href="#llama2" />

  <Card title="Vertex AI" href="#vertex-ai" />

  <Card title="Mistral AI" href="#mistral-ai" />

  <Card title="AWS Bedrock" href="#aws-bedrock" />

  <Card title="Groq" href="#groq" />

  <Card title="NVIDIA AI" href="#nvidia-ai" />
</CardGroup>

## OpenAI

To use OpenAI models, set the `OPENAI_API_KEY` environment variable. You can obtain the key from the [OpenAI Platform](https://platform.openai.com/account/api-keys).

Once you have obtained the key, you can use it like this:

```python theme={null}
import os
from embedchain import App

os.environ['OPENAI_API_KEY'] = 'xxx'

app = App()
app.add("https://en.wikipedia.org/wiki/OpenAI")
app.query("What is OpenAI?")
```

If you are looking to configure the different parameters of the LLM, you can do so by loading the app using a [yaml config](https://github.com/embedchain/embedchain/blob/main/configs/chroma.yaml) file.

<CodeGroup>
  ```python main.py theme={null}
  import os
  from embedchain import App

  os.environ['OPENAI_API_KEY'] = 'xxx'

  # load llm configuration from config.yaml file
  app = App.from_config(config_path="config.yaml")
  ```

  ```yaml config.yaml theme={null}
  llm:
    provider: openai
    config:
      model: 'gpt-4o-mini'
      temperature: 0.5
      max_tokens: 1000
      top_p: 1
      stream: false
  ```
</CodeGroup>
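The same settings can also be written as a Python dictionary and passed through the `config` argument of `App.from_config`, the form used by the Hugging Face, Groq, and NVIDIA examples on this page. A minimal sketch that only builds the dictionary; the `App.from_config` call is left as a comment so the snippet runs without an API key:

```python
# Dictionary equivalent of the config.yaml shown above.
config = {
    "llm": {
        "provider": "openai",
        "config": {
            "model": "gpt-4o-mini",
            "temperature": 0.5,
            "max_tokens": 1000,
            "top_p": 1,
            "stream": False,
        },
    },
}

# With OPENAI_API_KEY set, the app would be created like this:
# from embedchain import App
# app = App.from_config(config=config)
```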

### Function Calling

Embedchain supports OpenAI [Function calling](https://platform.openai.com/docs/guides/function-calling) with a single function. It accepts inputs in accordance with the [LangChain interface](https://python.langchain.com/docs/modules/model_io/chat/function_calling#legacy-args-functions-and-function_call).

<Accordion title="Pydantic Model">
  ```python theme={null}
  from pydantic import BaseModel, Field

  class multiply(BaseModel):
      """Multiply two integers together."""

      a: int = Field(..., description="First integer")
      b: int = Field(..., description="Second integer")
  ```
</Accordion>

<Accordion title="Python function">
  ```python theme={null}
  def multiply(a: int, b: int) -> int:
      """Multiply two integers together.

      Args:
          a: First integer
          b: Second integer
      """
      return a * b
  ```
</Accordion>

<Accordion title="OpenAI tool dictionary">
  ```python theme={null}
  multiply = {
    "type": "function",
    "function": {
      "name": "multiply",
      "description": "Multiply two integers together.",
      "parameters": {
        "type": "object",
        "properties": {
          "a": {
            "description": "First integer",
            "type": "integer"
          },
          "b": {
            "description": "Second integer",
            "type": "integer"
          }
        },
        "required": [
          "a",
          "b"
        ]
      }
    }
  }
  ```
</Accordion>

With any of the previous inputs, the OpenAI LLM can be queried to provide the appropriate arguments for the function.

```python theme={null}
import os
from embedchain import App
from embedchain.llm.openai import OpenAILlm

os.environ["OPENAI_API_KEY"] = "sk-xxx"

llm = OpenAILlm(tools=multiply)
app = App(llm=llm)

result = app.query("What is the result of 125 multiplied by fifteen?")
```
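The model is asked for the function's arguments, so your own code still has to run the function. A minimal sketch of that last step, assuming the arguments arrive as a JSON string; the exact shape of `result` returned by `app.query` is not shown on this page, so `arguments_json` below is a hypothetical stand-in:

```python
import json


def multiply(a: int, b: int) -> int:
    """Multiply two integers together."""
    return a * b


# Hypothetical stand-in for the arguments produced by the model.
arguments_json = '{"a": 125, "b": 15}'

# Decode the JSON object and pass its fields as keyword arguments.
arguments = json.loads(arguments_json)
product = multiply(**arguments)
print(product)  # 1875
```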

## Google AI

To use Google AI models, set the `GOOGLE_API_KEY` environment variable. You can obtain the key from [Google Maker Suite](https://makersuite.google.com/app/apikey).

<CodeGroup>
  ```python main.py theme={null}
  import os
  from embedchain import App

  os.environ["GOOGLE_API_KEY"] = "xxx"

  app = App.from_config(config_path="config.yaml")

  app.add("https://www.forbes.com/profile/elon-musk")

  response = app.query("What is the net worth of Elon Musk?")
  if app.llm.config.stream: # if stream is enabled, response is a generator
      for chunk in response:
          print(chunk)
  else:
      print(response)
  ```

  ```yaml config.yaml theme={null}
  llm:
    provider: google
    config:
      model: gemini-pro
      max_tokens: 1000
      temperature: 0.5
      top_p: 1
      stream: false

  embedder:
    provider: google
    config:
      model: 'models/embedding-001'
      task_type: "retrieval_document"
      title: "Embeddings for Embedchain"
  ```
</CodeGroup>
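The `if`/`else` in `main.py` is needed because a streamed response is consumed chunk by chunk, while a non-streamed one is a plain string. One way to hide that difference is a small helper. `collect_response` is a hypothetical name, not part of Embedchain, and the sketch assumes each chunk is a string:

```python
from typing import Iterable, Union


def collect_response(response: Union[str, Iterable[str]]) -> str:
    """Return the full answer whether or not streaming was enabled."""
    if isinstance(response, str):
        # Non-streaming: the response is already the complete text.
        return response
    # Streaming: join the chunks in the order they arrive.
    return "".join(response)


# Stand-ins for the two kinds of response.
print(collect_response("A complete answer."))
print(collect_response(chunk for chunk in ["A ", "streamed ", "answer."]))
```

Collecting the chunks gives up the benefit of showing partial output early, so this suits batch scripts more than interactive use.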

## Azure OpenAI

To use Azure OpenAI models, set the Azure OpenAI environment variables shown in the code block below:

<CodeGroup>
  ```python main.py theme={null}
  import os
  from embedchain import App

  os.environ["OPENAI_API_TYPE"] = "azure"
  os.environ["AZURE_OPENAI_ENDPOINT"] = "https://xxx.openai.azure.com/"
  os.environ["AZURE_OPENAI_KEY"] = "xxx"
  os.environ["OPENAI_API_VERSION"] = "xxx"

  app = App.from_config(config_path="config.yaml")
  ```

  ```yaml config.yaml theme={null}
  llm:
    provider: azure_openai
    config:
      model: gpt-4o-mini
      deployment_name: your_llm_deployment_name
      temperature: 0.5
      max_tokens: 1000
      top_p: 1
      stream: false

  embedder:
    provider: azure_openai
    config:
      model: text-embedding-ada-002
      deployment_name: your_embedding_model_deployment_name
  ```
</CodeGroup>

You can find the list of models and deployment name on the [Azure OpenAI Platform](https://oai.azure.com/portal).
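Because this provider relies on four separate variables, it can help to confirm they are all set before loading the app. A minimal sketch; `missing_env_vars` is a hypothetical helper, and the variable names are the four used in the example above:

```python
import os
from typing import Iterable, List, Mapping

AZURE_OPENAI_VARS = (
    "OPENAI_API_TYPE",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_KEY",
    "OPENAI_API_VERSION",
)


def missing_env_vars(
    names: Iterable[str], env: Mapping[str, str] = os.environ
) -> List[str]:
    """Return the names that are unset or empty in `env`."""
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(AZURE_OPENAI_VARS)
if missing:
    print("Set these variables first:", ", ".join(missing))
```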

## Anthropic

To use Anthropic's models, set the `ANTHROPIC_API_KEY` environment variable. You can find the key on their [Account Settings Page](https://console.anthropic.com/account/keys).

<CodeGroup>
  ```python main.py theme={null}
  import os
  from embedchain import App

  os.environ["ANTHROPIC_API_KEY"] = "xxx"

  # load llm configuration from config.yaml file
  app = App.from_config(config_path="config.yaml")
  ```

  ```yaml config.yaml theme={null}
  llm:
    provider: anthropic
    config:
      model: 'claude-instant-1'
      temperature: 0.5
      max_tokens: 1000
      top_p: 1
      stream: false
  ```
</CodeGroup>

## Cohere

Install related dependencies using the following command:

```bash theme={null}
pip install --upgrade 'embedchain[cohere]'
```

Set the `COHERE_API_KEY` environment variable. You can find the key on their [Account settings page](https://dashboard.cohere.com/api-keys).

Once you have the API key, you are all set to use it with Embedchain.

<CodeGroup>
  ```python main.py theme={null}
  import os
  from embedchain import App

  os.environ["COHERE_API_KEY"] = "xxx"

  # load llm configuration from config.yaml file
  app = App.from_config(config_path="config.yaml")
  ```

  ```yaml config.yaml theme={null}
  llm:
    provider: cohere
    config:
      model: large
      temperature: 0.5
      max_tokens: 1000
      top_p: 1
  ```
</CodeGroup>

## Together

Install related dependencies using the following command:

```bash theme={null}
pip install --upgrade 'embedchain[together]'
```

Set the `TOGETHER_API_KEY` environment variable. You can find the key on their [Account settings page](https://api.together.xyz/settings/api-keys).

Once you have the API key, you are all set to use it with Embedchain.

<CodeGroup>
  ```python main.py theme={null}
  import os
  from embedchain import App

  os.environ["TOGETHER_API_KEY"] = "xxx"

  # load llm configuration from config.yaml file
  app = App.from_config(config_path="config.yaml")
  ```

  ```yaml config.yaml theme={null}
  llm:
    provider: together
    config:
      model: togethercomputer/RedPajama-INCITE-7B-Base
      temperature: 0.5
      max_tokens: 1000
      top_p: 1
  ```
</CodeGroup>

## Ollama

Set up Ollama by following the instructions at [https://github.com/jmorganca/ollama](https://github.com/jmorganca/ollama).

<CodeGroup>
  ```python main.py theme={null}
  import os
  os.environ["OLLAMA_HOST"] = "http://127.0.0.1:11434"
  from embedchain import App

  # load llm configuration from config.yaml file
  app = App.from_config(config_path="config.yaml")
  ```

  ```yaml config.yaml theme={null}
  llm:
    provider: ollama
    config:
      model: 'llama2'
      temperature: 0.5
      top_p: 1
      stream: true
      base_url: 'http://localhost:11434'
  embedder:
    provider: ollama
    config:
      model: znbang/bge:small-en-v1.5-q8_0
      base_url: http://localhost:11434

  ```
</CodeGroup>

## vLLM

Set up vLLM by following the instructions in [their docs](https://docs.vllm.ai/en/latest/getting_started/installation.html).

<CodeGroup>
  ```python main.py theme={null}
  import os
  from embedchain import App

  # load llm configuration from config.yaml file
  app = App.from_config(config_path="config.yaml")
  ```

  ```yaml config.yaml theme={null}
  llm:
    provider: vllm
    config:
      model: 'meta-llama/Llama-2-70b-hf'
      temperature: 0.5
      top_p: 1
      top_k: 10
      stream: true
      trust_remote_code: true
  ```
</CodeGroup>

## Clarifai

Install related dependencies using the following command:

```bash theme={null}
pip install --upgrade 'embedchain[clarifai]'
```

Set the `CLARIFAI_PAT` environment variable. You can find the token on the [security page](https://clarifai.com/settings/security). Optionally, you can also pass the PAT as a parameter to the LLM/Embedder class.

Now you are all set with exploring Embedchain.

<CodeGroup>
  ```python main.py theme={null}
  import os
  from embedchain import App

  os.environ["CLARIFAI_PAT"] = "XXX"

  # load llm configuration from config.yaml file
  app = App.from_config(config_path="config.yaml")

  # Now let's add some data.
  app.add("https://www.forbes.com/profile/elon-musk")

  # Query the app
  response = app.query("what college degrees does elon musk have?")
  ```

  Head to the [Clarifai Platform](https://clarifai.com/explore/models?page=1\&perPage=24\&filterData=%5B%7B%22field%22%3A%22use_cases%22%2C%22value%22%3A%5B%22llm%22%5D%7D%5D) to browse state-of-the-art LLMs for your use case.
  To pass model inference parameters, use the `model_kwargs` argument in the config file. You can also use the `api_key` argument to pass `CLARIFAI_PAT` in the config.

  ```yaml config.yaml theme={null}
  llm:
   provider: clarifai
   config:
     model: "https://clarifai.com/mistralai/completion/models/mistral-7B-Instruct"
     model_kwargs:
       temperature: 0.5
       max_tokens: 1000  
  embedder:
   provider: clarifai
   config:
     model: "https://clarifai.com/clarifai/main/models/BAAI-bge-base-en-v15"
  ```
</CodeGroup>

## GPT4All

Install related dependencies using the following command:

```bash theme={null}
pip install --upgrade 'embedchain[opensource]'
```

GPT4All is a free-to-use, locally running, privacy-aware chatbot. No GPU or internet connection is required. You can use it with Embedchain using the following code:

<CodeGroup>
  ```python main.py theme={null}
  from embedchain import App

  # load llm configuration from config.yaml file
  app = App.from_config(config_path="config.yaml")
  ```

  ```yaml config.yaml theme={null}
  llm:
    provider: gpt4all
    config:
      model: 'orca-mini-3b-gguf2-q4_0.gguf'
      temperature: 0.5
      max_tokens: 1000
      top_p: 1
      stream: false

  embedder:
    provider: gpt4all
  ```
</CodeGroup>

## JinaChat

First, set the `JINACHAT_API_KEY` environment variable. You can obtain the key from [their platform](https://chat.jina.ai/api).

Once you have the key, load the app using the config yaml file:

<CodeGroup>
  ```python main.py theme={null}
  import os
  from embedchain import App

  os.environ["JINACHAT_API_KEY"] = "xxx"
  # load llm configuration from config.yaml file
  app = App.from_config(config_path="config.yaml")
  ```

  ```yaml config.yaml theme={null}
  llm:
    provider: jina
    config:
      temperature: 0.5
      max_tokens: 1000
      top_p: 1
      stream: false
  ```
</CodeGroup>

## Hugging Face

Install related dependencies using the following command:

```bash theme={null}
pip install --upgrade 'embedchain[huggingface-hub]'
```

First, set the `HUGGINGFACE_ACCESS_TOKEN` environment variable. You can obtain the token from [their platform](https://huggingface.co/settings/tokens).

You can load LLMs from Hugging Face in three ways:

* [Hugging Face Hub](#hugging-face-hub)
* [Hugging Face Local Pipelines](#hugging-face-local-pipelines)
* [Hugging Face Inference Endpoint](#hugging-face-inference-endpoint)

### Hugging Face Hub

To load the model from Hugging Face Hub, use the following code:

<CodeGroup>
  ```python main.py theme={null}
  import os
  from embedchain import App

  os.environ["HUGGINGFACE_ACCESS_TOKEN"] = "xxx"

  config = {
    "app": {"config": {"id": "my-app"}},
    "llm": {
        "provider": "huggingface",
        "config": {
            "model": "bigscience/bloom-1b7",
            "top_p": 0.5,
            "max_length": 200,
            "temperature": 0.1,
        },
    },
  }

  app = App.from_config(config=config)
  ```
</CodeGroup>

### Hugging Face Local Pipelines

If you want to load the locally downloaded model from Hugging Face, you can do so by following the code provided below:

<CodeGroup>
  ```python main.py theme={null}
  from embedchain import App

  config = {
    "app": {"config": {"id": "my-app"}},
    "llm": {
        "provider": "huggingface",
        "config": {
            "model": "Trendyol/Trendyol-LLM-7b-chat-v0.1",
            "local": True,  # Necessary if you want to run model locally
            "top_p": 0.5,
            "max_tokens": 1000,
            "temperature": 0.1,
        },
    }
  }
  app = App.from_config(config=config)
  ```
</CodeGroup>

### Hugging Face Inference Endpoint

You can also use [Hugging Face Inference Endpoints](https://huggingface.co/docs/inference-endpoints/index#-inference-endpoints) to access custom endpoints. First, set the `HUGGINGFACE_ACCESS_TOKEN` as above.

Then, load the app using a config dictionary:

<CodeGroup>
  ```python main.py theme={null}
  from embedchain import App

  config = {
    "app": {"config": {"id": "my-app"}},
    "llm": {
        "provider": "huggingface",
        "config": {
          "endpoint": "https://api-inference.huggingface.co/models/gpt2",
          "model_params": {"temperature": 0.1, "max_new_tokens": 100}
        },
    },
  }
  app = App.from_config(config=config)

  ```
</CodeGroup>

Only the `text-generation` and `text2text-generation` tasks are currently supported \[[ref](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.huggingface_endpoint.HuggingFaceEndpoint.html?highlight=huggingfaceendpoint#)].

See LangChain's [Hugging Face endpoint](https://python.langchain.com/docs/integrations/chat/huggingface#huggingfaceendpoint) documentation for more information.
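In the three examples above, the loading method is selected by the keys in the `llm` config: `local: True` for a local pipeline, `endpoint` for an inference endpoint, and neither for the Hub. The sketch below classifies a config by those keys. `huggingface_mode` is a hypothetical helper that mirrors the examples only, and the precedence when both keys are present is an assumption rather than documented Embedchain behavior:

```python
def huggingface_mode(llm_config: dict) -> str:
    """Classify a Hugging Face `llm` config dict by the keys it sets."""
    if llm_config.get("local"):
        return "local pipeline"
    if llm_config.get("endpoint"):
        return "inference endpoint"
    return "hub"


print(huggingface_mode({"model": "bigscience/bloom-1b7"}))
print(huggingface_mode({"model": "Trendyol/Trendyol-LLM-7b-chat-v0.1", "local": True}))
print(huggingface_mode({"endpoint": "https://api-inference.huggingface.co/models/gpt2"}))
```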

## Llama2

Llama2 is integrated through [Replicate](https://replicate.com/). Set the `REPLICATE_API_TOKEN` environment variable. You can obtain the token from [their platform](https://replicate.com/account/api-tokens).

Once you have the token, load the app using the config yaml file:

<CodeGroup>
  ```python main.py theme={null}
  import os
  from embedchain import App

  os.environ["REPLICATE_API_TOKEN"] = "xxx"

  # load llm configuration from config.yaml file
  app = App.from_config(config_path="config.yaml")
  ```

  ```yaml config.yaml theme={null}
  llm:
    provider: llama2
    config:
      model: 'a16z-infra/llama13b-v2-chat:df7690f1994d94e96ad9d568eac121aecf50684a0b0963b25a41cc40061269e5'
      temperature: 0.5
      max_tokens: 1000
      top_p: 0.5
      stream: false
  ```
</CodeGroup>

## Vertex AI

Set up Google Cloud Platform application credentials by following the instructions on [GCP](https://cloud.google.com/docs/authentication/external/set-up-adc). Once setup is done, use the following code to create an app with Vertex AI as the provider:

<CodeGroup>
  ```python main.py theme={null}
  from embedchain import App

  # load llm configuration from config.yaml file
  app = App.from_config(config_path="config.yaml")
  ```

  ```yaml config.yaml theme={null}
  llm:
    provider: vertexai
    config:
      model: 'chat-bison'
      temperature: 0.5
      top_p: 0.5
  ```
</CodeGroup>

## Mistral AI

Obtain the Mistral AI API key from their [console](https://console.mistral.ai/).

<CodeGroup>
  ```python main.py theme={null}
  import os
  from embedchain import App

  os.environ["MISTRAL_API_KEY"] = "xxx"

  app = App.from_config(config_path="config.yaml")

  app.add("https://www.forbes.com/profile/elon-musk")

  response = app.query("what is the net worth of Elon Musk?")
  # As of January 16, 2024, Elon Musk's net worth is $225.4 billion.

  response = app.chat("which companies does elon own?")
  # Elon Musk owns Tesla, SpaceX, Boring Company, Twitter, and X.

  response = app.chat("what question did I ask you already?")
  # You have asked me several times already which companies Elon Musk owns, specifically Tesla, SpaceX, Boring Company, Twitter, and X.
  ```

  ```yaml config.yaml theme={null}
  llm:
    provider: mistralai
    config:
      model: mistral-tiny
      temperature: 0.5
      max_tokens: 1000
      top_p: 1
  embedder:
    provider: mistralai
    config:
      model: mistral-embed
  ```
</CodeGroup>

## AWS Bedrock

### Setup

* Before using the AWS Bedrock LLM, make sure you have the appropriate model access from [Bedrock Console](https://us-east-1.console.aws.amazon.com/bedrock/home?region=us-east-1#/modelaccess).
* You will also need to authenticate the `boto3` client using one of the methods in the [AWS documentation](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html#configuring-credentials).
* You can optionally export the `AWS_REGION` environment variable.

### Usage

<CodeGroup>
  ```python main.py theme={null}
  import os
  from embedchain import App

  os.environ["AWS_REGION"] = "us-west-2"

  app = App.from_config(config_path="config.yaml")
  ```

  ```yaml config.yaml theme={null}
  llm:
    provider: aws_bedrock
    config:
      model: amazon.titan-text-express-v1
      # check notes below for model_kwargs
      model_kwargs:
        temperature: 0.5
        topP: 1
        maxTokenCount: 1000
  ```
</CodeGroup>

<br />

<Note>
  The model arguments differ for each provider. Please refer to the [AWS Bedrock Documentation](https://us-east-1.console.aws.amazon.com/bedrock/home?region=us-east-1#/providers) to find the appropriate arguments for your model.
</Note>

<br />

## Groq

[Groq](https://groq.com/) is the creator of the world's first Language Processing Unit (LPU), providing exceptional speed performance for AI workloads running on their LPU Inference Engine.

### Usage

In order to use LLMs from Groq, go to their [platform](https://console.groq.com/keys) and get the API key.

Set the API key as the `GROQ_API_KEY` environment variable, or pass it in your app configuration as shown in the example below.

<CodeGroup>
  ```python main.py theme={null}
  import os
  from embedchain import App

  # Set your API key here or pass as the environment variable
  groq_api_key = "gsk_xxxx"

  config = {
      "llm": {
          "provider": "groq",
          "config": {
              "model": "mixtral-8x7b-32768",
              "api_key": groq_api_key,
              "stream": True
          }
      }
  }

  app = App.from_config(config=config)
  # Add your data source here
  app.add("https://docs.embedchain.ai/sitemap.xml", data_type="sitemap")
  app.query("Write a poem about Embedchain")

  # In the realm of data, vast and wide,
  # Embedchain stands with knowledge as its guide.
  # A platform open, for all to try,
  # Building bots that can truly fly.

  # With REST API, data in reach,
  # Deployment a breeze, as easy as a speech.
  # Updating data sources, anytime, anyday,
  # Embedchain's power, never sway.

  # A knowledge base, an assistant so grand,
  # Connecting to platforms, near and far.
  # Discord, WhatsApp, Slack, and more,
  # Embedchain's potential, never a bore.
  ```
</CodeGroup>
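To keep the key out of source code, the same config can read it from the environment instead. A minimal sketch; `build_groq_config` is a hypothetical helper, and it only builds the dictionary that would be passed to `App.from_config(config=...)`:

```python
import os
from typing import Mapping


def build_groq_config(env: Mapping[str, str] = os.environ) -> dict:
    """Build the Groq app config, taking the API key from `env`."""
    api_key = env.get("GROQ_API_KEY")
    if not api_key:
        raise RuntimeError("GROQ_API_KEY is not set")
    return {
        "llm": {
            "provider": "groq",
            "config": {
                "model": "mixtral-8x7b-32768",
                "api_key": api_key,
                "stream": True,
            },
        }
    }


# Example with an explicit mapping instead of the real environment.
config = build_groq_config({"GROQ_API_KEY": "gsk_xxxx"})
print(config["llm"]["provider"])
```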

## NVIDIA AI

[NVIDIA AI Foundation Endpoints](https://www.nvidia.com/en-us/ai-data-science/foundation-models/) let you quickly use NVIDIA's AI models, such as Mixtral 8x7B and Llama 2, through NVIDIA's API. These models are available in the [NVIDIA NGC catalog](https://catalog.ngc.nvidia.com/ai-foundation-models), fully optimized and ready to use on NVIDIA's AI platform. They are designed for high speed and easy customization, ensuring smooth performance on any accelerated setup.

### Usage

In order to use LLMs from NVIDIA AI, create an account on [NVIDIA NGC Service](https://catalog.ngc.nvidia.com/).

Generate an API key from their dashboard. Set the API key as `NVIDIA_API_KEY` environment variable. Note that the `NVIDIA_API_KEY` will start with `nvapi-`.

Below is an example of how to use an LLM and an embedding model from NVIDIA AI:

<CodeGroup>
  ```python main.py theme={null}
  import os
  from embedchain import App

  os.environ['NVIDIA_API_KEY'] = 'nvapi-xxxx'

  config = {
      "app": {
          "config": {
              "id": "my-app",
          },
      },
      "llm": {
          "provider": "nvidia",
          "config": {
              "model": "nemotron_steerlm_8b",
          },
      },
      "embedder": {
          "provider": "nvidia",
          "config": {
              "model": "nvolveqa_40k",
              "vector_dimension": 1024,
          },
      },
  }

  app = App.from_config(config=config)

  app.add("https://www.forbes.com/profile/elon-musk")
  answer = app.query("What is the net worth of Elon Musk today?")
  # Answer: The net worth of Elon Musk is subject to fluctuations based on the market value of his holdings in various companies.
  # As of March 1, 2024, his net worth is estimated to be approximately $210 billion. However, this figure can change rapidly due to stock market fluctuations and other factors.
  # Additionally, his net worth may include other assets such as real estate and art, which are not reflected in his stock portfolio.
  ```
</CodeGroup>

## Token Usage

You can get the cost of a query by setting `token_usage` to `True` in the config file. The response then includes the token details: `prompt_tokens`, `completion_tokens`, `total_tokens`, `total_cost`, `cost_currency`.
The paid LLMs that support token usage are:

* OpenAI
* Vertex AI
* Anthropic
* Cohere
* Together
* Groq
* Mistral AI
* NVIDIA AI

Here is an example of how to use token usage:

<CodeGroup>
  ```python main.py theme={null}
  import os
  from embedchain import App

  os.environ["OPENAI_API_KEY"] = "xxx"

  app = App.from_config(config_path="config.yaml")

  app.add("https://www.forbes.com/profile/elon-musk")

  response = app.query("what is the net worth of Elon Musk?")
  # {'answer': "Elon Musk's net worth is $209.9 billion as of 6/9/24.",
  #   'usage': {'prompt_tokens': 1228,
  #   'completion_tokens': 21, 
  #   'total_tokens': 1249, 
  #   'total_cost': 0.001884, 
  #   'cost_currency': 'USD'}
  # }


  response = app.chat("Which companies did Elon Musk found?")
  # {'answer': 'Elon Musk founded six companies, including Tesla, which is an electric car maker, SpaceX, a rocket producer, and the Boring Company, a tunneling startup.',
  #   'usage': {'prompt_tokens': 1616,
  #   'completion_tokens': 34,
  #   'total_tokens': 1650,
  #   'total_cost': 0.002492,
  #   'cost_currency': 'USD'}
  # }
  ```

  ```yaml config.yaml theme={null}
  llm:
    provider: openai
    config:
      model: gpt-4o-mini
      temperature: 0.5
      max_tokens: 1000
      token_usage: true
  ```
</CodeGroup>

If a model is missing and you'd like to add it to `model_prices_and_context_window.json`, please feel free to open a PR.
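When `token_usage` is enabled, the `usage` dictionaries can be summed to track spend across several calls. A minimal sketch using the two usage records from the example comments as literal data; `total_usage` is a hypothetical helper, not an Embedchain function, and it assumes every response reports its cost in the same currency:

```python
from typing import Dict, Iterable


def total_usage(responses: Iterable[dict]) -> Dict[str, float]:
    """Sum token counts and cost over responses that carry a `usage` dict."""
    totals = {
        "prompt_tokens": 0,
        "completion_tokens": 0,
        "total_tokens": 0,
        "total_cost": 0.0,
    }
    for response in responses:
        usage = response["usage"]
        for key in totals:
            totals[key] += usage[key]
    return totals


# Usage records copied from the example output above (answers elided).
responses = [
    {"answer": "(elided)", "usage": {"prompt_tokens": 1228, "completion_tokens": 21,
                                     "total_tokens": 1249, "total_cost": 0.001884,
                                     "cost_currency": "USD"}},
    {"answer": "(elided)", "usage": {"prompt_tokens": 1616, "completion_tokens": 34,
                                     "total_tokens": 1650, "total_cost": 0.002492,
                                     "cost_currency": "USD"}},
]

totals = total_usage(responses)
print(totals["total_tokens"])          # 2899
print(round(totals["total_cost"], 6))  # 0.004376
```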

<br />

<Snippet file="missing-llm-tip.mdx" />