App methods
add
The add() method loads data from different data sources into a RAG pipeline. Its parameters are listed below.
Parameters
source (str): The data to embed. Depending on the data type, this can be a URL, a local file, or raw content. See the supported data sources documentation for the full list.
data_type (str, optional): The type of the data source. It is detected automatically, but you can set it explicitly to force a specific data type.
metadata (dict, optional): Any metadata you want to store with the data source. Metadata lets you filter results on top of semantic search, which can make searches faster and results more relevant.
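The toy sketch below makes "metadata filtering on top of semantic search" concrete. It is not embedchain's implementation. The chunk records, the hand-written three-dimensional embedding vectors, and the `search` helper are all hypothetical. A real pipeline gets its embeddings from an embedding model and hands the filter to the vector database (for example, Chroma's `where` filter).

```python
import math

# Hypothetical chunks: text, a toy embedding vector, and the metadata
# stored alongside the source when it was added.
CHUNKS = [
    {"text": "Elon Musk profile", "embedding": [0.9, 0.1, 0.0],
     "metadata": {"source_type": "profile", "year": 2023}},
    {"text": "Tesla earnings report", "embedding": [0.6, 0.5, 0.1],
     "metadata": {"source_type": "news", "year": 2023}},
    {"text": "SpaceX launch recap", "embedding": [0.7, 0.2, 0.6],
     "metadata": {"source_type": "news", "year": 2022}},
]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_embedding, where=None, top_k=2):
    """Keep chunks whose metadata matches `where` exactly, then rank them by similarity."""
    where = where or {}
    # Step 1: the metadata filter shrinks the candidate set before any vector math.
    candidates = [
        c for c in CHUNKS
        if all(c["metadata"].get(key) == value for key, value in where.items())
    ]
    # Step 2: semantic ranking runs only over the remaining candidates.
    ranked = sorted(
        candidates,
        key=lambda c: cosine_similarity(query_embedding, c["embedding"]),
        reverse=True,
    )
    return [c["text"] for c in ranked[:top_k]]


if __name__ == "__main__":
    query = [0.85, 0.2, 0.05]
    print(search(query))                                 # ['Elon Musk profile', 'Tesla earnings report']
    print(search(query, where={"source_type": "news"}))  # ['Tesla earnings report', 'SpaceX launch recap']
```

Filtering first means the similarity computation only runs over matching chunks. This is where the speed benefit comes from when the filter is selective.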
Usage
Load data from a web page
Code example
from embedchain import App
app = App()
app.add("https://www.forbes.com/profile/elon-musk")
# Inserting batches in chromadb: 100%|███████████████| 1/1 [00:00<00:00, 1.19it/s]
# Successfully saved https://www.forbes.com/profile/elon-musk (DataType.WEB_PAGE). New chunks count: 4
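The example above does not pass data_type, yet the log reports DataType.WEB_PAGE, so the type was inferred from the source string. The sketch below shows the general idea with a deliberately simplified, hypothetical heuristic. `detect_data_type`, `resolve_data_type`, and the returned strings are assumptions for illustration. embedchain's real detection covers many more data types and may use different rules.

```python
import os
from urllib.parse import urlparse


def detect_data_type(source: str) -> str:
    """Guess a data type from a source string (toy heuristic, not embedchain's logic)."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        # Hypothetical rule: a URL path ending in sitemap.xml is treated as a sitemap.
        if parsed.path.lower().endswith("sitemap.xml"):
            return "sitemap"
        return "web_page"
    if os.path.isfile(source):
        # Hypothetical label for any existing local path.
        return "file"
    # Anything else is treated as raw text content.
    return "text"


def resolve_data_type(source: str, data_type=None) -> str:
    """An explicit data_type always wins; otherwise fall back to detection."""
    return data_type if data_type is not None else detect_data_type(source)


if __name__ == "__main__":
    print(resolve_data_type("https://www.forbes.com/profile/elon-musk"))
    print(resolve_data_type("https://python.langchain.com/sitemap.xml", data_type="sitemap"))
```

Passing data_type explicitly, as the sitemap example does, skips the guess entirely. This is the safer choice when a source could plausibly match more than one type.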
Load data from a sitemap
Code example
from embedchain import App
app = App()
app.add("https://python.langchain.com/sitemap.xml", data_type="sitemap")
# Loading pages: 100%|█████████████| 1108/1108 [00:47<00:00, 23.17it/s]
# Inserting batches in chromadb: 100%|█████████| 111/111 [04:41<00:00, 2.54s/it]
# Successfully saved https://python.langchain.com/sitemap.xml (DataType.SITEMAP). New chunks count: 11024
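The log shows two stages. First, every page listed in the sitemap is loaded (1108 pages). Then, the resulting chunks are written to the vector database in batches. The 111 batches for 11024 chunks are consistent with a batch size of 100, since ceil(11024 / 100) = 111. That batch size is inferred from the log, not a documented setting.

The sketch below reproduces both steps using only the standard library. `urls_from_sitemap` and `batched` are hypothetical helpers, not embedchain functions, and the example.com URLs are placeholders. A real loader would also fetch, parse, and chunk each listed page.

```python
import math
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol for <urlset> documents.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text: str) -> list[str]:
    """Return the page URLs listed in the <url><loc> entries of a sitemap."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


def batched(items: list, batch_size: int = 100) -> list[list]:
    """Split items into consecutive batches of at most batch_size elements."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


# Placeholder sitemap with two hypothetical pages.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/docs/intro</loc></url>
  <url><loc>https://example.com/docs/quickstart</loc></url>
</urlset>"""

if __name__ == "__main__":
    print(urls_from_sitemap(SAMPLE_SITEMAP))
    # 11024 chunks in batches of 100 yields 111 batches, matching the log above.
    chunks = [f"chunk-{i}" for i in range(11024)]
    print(len(batched(chunks)), math.ceil(11024 / 100))
```

With the same assumed batch size, the 4 chunks from the web page example fit in a single batch, which matches the 1/1 progress bar in that log.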
See the supported data sources documentation for the complete list.