📊 add - Embedchain

The add() method loads data from different data sources into a RAG pipeline. You can find the signature below:

Parameters

source

str

The data to embed. Depending on the data type, it can be a URL, a local file, or raw content. The full list of supported data sources is available in the Embedchain documentation.

data_type

str

Type of the data source. Embedchain can detect it automatically, but you can set this parameter to force a specific data type.

metadata

dict

Any metadata that you want to store with the data source. Metadata lets you filter on top of semantic search, which can make searches faster and results more relevant.

all_references

bool

This parameter instructs Embedchain to retrieve all the context and information from the specified link, as well as from any reference links on the page.
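To illustrate why the metadata parameter can help, here is a toy sketch of metadata filtering on top of semantic search. It is not Embedchain's implementation: the chunks list, the two-dimensional vectors, and the filtered_search helper are all hypothetical, chosen only to show the idea of narrowing candidates by metadata before ranking them by similarity.

```python
import math
from typing import Any, Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def filtered_search(
    chunks: List[Dict[str, Any]],
    query_vec: Sequence[float],
    where: Dict[str, Any],
    top_k: int = 1,
) -> List[str]:
    """Hypothetical helper: filter by metadata first, then rank by similarity."""
    # Step 1: cheap exact-match filter on metadata shrinks the candidate set.
    candidates = [
        c for c in chunks
        if all(c["metadata"].get(key) == value for key, value in where.items())
    ]
    # Step 2: similarity is computed only for the chunks that survived the filter.
    ranked = sorted(candidates, key=lambda c: cosine(c["vector"], query_vec), reverse=True)
    return [c["text"] for c in ranked[:top_k]]


# Made-up chunks; in a real pipeline the vectors come from an embedding model.
chunks = [
    {"text": "Q3 revenue grew", "vector": [0.9, 0.1], "metadata": {"category": "finance"}},
    {"text": "Team offsite photos", "vector": [0.8, 0.2], "metadata": {"category": "social"}},
    {"text": "Q3 costs fell", "vector": [0.1, 0.9], "metadata": {"category": "finance"}},
]

print(filtered_search(chunks, [1.0, 0.0], {"category": "finance"}))
# ['Q3 revenue grew']
```

In Embedchain itself, the equivalent would be to pass something like metadata={"category": "finance"} to add() and then filter at query time. The exact query-side filter argument depends on your Embedchain version, so check its query documentation.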

Usage

Load data from webpage

Code example

from embedchain import App

app = App()
app.add("https://www.forbes.com/profile/elon-musk")
# Inserting batches in chromadb: 100%|███████████████| 1/1 [00:00<00:00,  1.19it/s]
# Successfully saved https://www.forbes.com/profile/elon-musk (DataType.WEB_PAGE). New chunks count: 4

Load data from sitemap

Code example

from embedchain import App

app = App()
app.add("https://python.langchain.com/sitemap.xml", data_type="sitemap")
# Loading pages: 100%|█████████████| 1108/1108 [00:47<00:00, 23.17it/s]
# Inserting batches in chromadb: 100%|█████████| 111/111 [04:41<00:00,  2.54s/it]
# Successfully saved https://python.langchain.com/sitemap.xml (DataType.SITEMAP). New chunks count: 11024
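
Combine optional parameters

The optional parameters from the Parameters section can be combined with either kind of source. The sketch below is one possible way to do that, not part of Embedchain: add_source is a hypothetical helper that forwards only the options you actually set, and RecordingApp is a stand-in for App so the example runs without network access or API keys. The metadata key and value are made up.

```python
from typing import Any, Dict, List, Optional, Tuple


def add_source(
    app: Any,
    source: str,
    data_type: Optional[str] = None,
    metadata: Optional[Dict[str, Any]] = None,
    all_references: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: call app.add() with only the options that were set."""
    kwargs: Dict[str, Any] = {}
    if data_type is not None:
        kwargs["data_type"] = data_type  # force the type instead of auto-detecting
    if metadata:
        kwargs["metadata"] = metadata  # stored alongside the data source
    if all_references:
        kwargs["all_references"] = True  # also follow reference links on the page
    app.add(source, **kwargs)
    return kwargs


class RecordingApp:
    """Stand-in for embedchain.App (an assumption for this sketch): records calls."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Dict[str, Any]]] = []

    def add(self, source: str, **kwargs: Any) -> None:
        self.calls.append((source, kwargs))


# For real use, replace this with: from embedchain import App; app = App()
app = RecordingApp()
add_source(
    app,
    "https://python.langchain.com/sitemap.xml",
    data_type="sitemap",
    metadata={"project": "langchain"},
)
add_source(app, "https://www.forbes.com/profile/elon-musk", all_references=True)
```

Forwarding only the options that were set leaves data type detection to Embedchain whenever data_type is omitted.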

The complete list of supported data sources is available in the Embedchain documentation.