We support evaluation on the following metrics and return the result for each:

- Context relevancy
- Groundedness
- Answer relevancy

We use the `gpt-4` model as the default LLM for automatic evaluation. Hence, we require you to set `OPENAI_API_KEY` as an environment variable.
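For example, you can set the key from Python before running the evaluation (the key value below is a placeholder):

```python
import os

# Used by the default gpt-4 evaluator; replace with your own key
os.environ["OPENAI_API_KEY"] = "sk-xxx"
```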
The dataset for evaluation consists of `questions`, `contexts`, and `answer`. Here is an example of how to create a dataset for evaluation:
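A minimal sketch of building such a dataset, assuming an `EvalData` helper class (the class name, import path, and exact field names are assumptions; the context and answer strings are placeholders):

```python
from embedchain.utils.eval import EvalData  # import path is an assumption

data = [
    {
        "question": "What is the net worth of Elon Musk?",
        "contexts": [
            "<retrieved context 1>",
            "<retrieved context 2>",
        ],
        "answer": "<answer generated by your app>",
    },
]

# Each data point bundles the question, the retrieved contexts, and the generated answer
dataset = [EvalData(**row) for row in data]
```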
Context relevancy measures how relevant the retrieved context is to the question. We use the `gpt-4` model to determine the relevancy of the context. We achieve this by prompting the model with the question and the context and asking it to return the relevant sentences from the context. We then use the following formula to determine the score:
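Following that description, the score is the fraction of context sentences the model marks as relevant; a minimal sketch (the function and variable names are illustrative):

```python
def context_relevancy_score(num_relevant_sentences: int, num_total_sentences: int) -> float:
    # relevant sentences returned by the model / total sentences in the context
    return num_relevant_sentences / num_total_sentences
```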
You can customize the evaluation by passing a `ContextRelevanceConfig` class. Here is a more advanced example of how to pass a custom evaluation config for evaluating on the context relevance metric:
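A sketch of what that could look like, assuming a `ContextRelevance` metric class under the `embedchain.evaluation.metrics` path mentioned later in this section (the class name, import paths, and `evaluate` signature are assumptions; the config parameters are documented in the table below):

```python
from embedchain.config.evaluation.base import ContextRelevanceConfig
from embedchain.evaluation.metrics import ContextRelevance  # class name is an assumption

# Override the defaults documented below; api_key falls back to OPENAI_API_KEY if omitted
eval_config = ContextRelevanceConfig(
    model="gpt-4",
    api_key="sk-xxx",
    language="en",
)

metric = ContextRelevance(config=eval_config)
score = metric.evaluate(dataset)  # `dataset` built as shown earlier
```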
`ContextRelevanceConfig` accepts the following parameters:

| Parameter | Description |
|---|---|
| `model` | The model to use for evaluation. Defaults to `gpt-4`. We only support OpenAI's models for now. |
| `api_key` | The OpenAI API key. Defaults to `None`. If not provided, we will use the `OPENAI_API_KEY` environment variable. |
| `language` | The language of the dataset being evaluated. Defaults to `en`. |
| `prompt` | The prompt used to extract the relevant sentences from the context. Defaults to `CONTEXT_RELEVANCY_PROMPT`, which can be found at the `embedchain.config.evaluation.base` path. |

Answer relevancy measures how relevant the generated answer is to the question. You can customize the evaluation by passing an `AnswerRelevanceConfig` class. Here is a more advanced example where you can provide your own evaluation config:
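A sketch under the same assumptions as the previous example (the `AnswerRelevance` class name and import paths are assumptions; the parameters are documented in the table below):

```python
from embedchain.config.evaluation.base import AnswerRelevanceConfig
from embedchain.evaluation.metrics import AnswerRelevance  # class name is an assumption

eval_config = AnswerRelevanceConfig(
    model="gpt-4",
    embedder="text-embedding-ada-002",
    api_key="sk-xxx",
    num_gen_questions=3,
)

metric = AnswerRelevance(config=eval_config)
score = metric.evaluate(dataset)
```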
`AnswerRelevanceConfig` accepts the following parameters:

| Parameter | Description |
|---|---|
| `model` | The model to use for evaluation. Defaults to `gpt-4`. We only support OpenAI's models for now. |
| `embedder` | The embedder model to use. Defaults to `text-embedding-ada-002`. We only support OpenAI's embedders for now. |
| `api_key` | The OpenAI API key. Defaults to `None`. If not provided, we will use the `OPENAI_API_KEY` environment variable. |
| `num_gen_questions` | The number of questions to generate from the provided answer. Defaults to `1`. |
| `prompt` | The prompt used to generate the `num_gen_questions` number of questions from the provided answer. Defaults to `ANSWER_RELEVANCY_PROMPT`, which can be found at the `embedchain.config.evaluation.base` path. |

Groundedness measures how grounded the answer is in the provided context. We use the `gpt-4` model to determine the groundedness of the answer. We achieve this by prompting the model with the answer and asking it to generate claims from the answer. We then prompt the model again with the context and the generated claims to determine the verdict on each claim. We then use the following formula to determine the score:
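Following that description, the score is the fraction of claims that the model judges to be supported by the context; a minimal sketch (the function and variable names are illustrative):

```python
def groundedness_score(claim_verdicts: list[int]) -> float:
    # Each verdict is 1 if the claim is supported by the context, else 0
    return sum(claim_verdicts) / len(claim_verdicts)
```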
You can customize the evaluation by passing a `GroundednessConfig` class. Here is a more advanced example where you can configure the evaluation config:
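A sketch under the same assumptions as the previous examples (the `Groundedness` class name and import paths are assumptions; the parameters are documented in the table below):

```python
from embedchain.config.evaluation.base import GroundednessConfig
from embedchain.evaluation.metrics import Groundedness  # class name is an assumption

eval_config = GroundednessConfig(
    model="gpt-4",
    api_key="sk-xxx",
)

metric = Groundedness(config=eval_config)
score = metric.evaluate(dataset)
```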
`GroundednessConfig` accepts the following parameters:

| Parameter | Description |
|---|---|
| `model` | The model to use for evaluation. Defaults to `gpt-4`. We only support OpenAI's models for now. |
| `api_key` | The OpenAI API key. Defaults to `None`. If not provided, we will use the `OPENAI_API_KEY` environment variable. |
| `answer_claims_prompt` | The prompt used to extract claims from the provided answer. Defaults to `GROUNDEDNESS_ANSWER_CLAIMS_PROMPT`, which can be found at the `embedchain.config.evaluation.base` path. |
| `claims_inference_prompt` | The prompt used to get verdicts on the claims from the provided context. Defaults to `GROUNDEDNESS_CLAIMS_INFERENCE_PROMPT`, which can be found at the `embedchain.config.evaluation.base` path. |

You can also create your own evaluation metric by extending the `BaseMetric` class. You can find the source code for the existing metrics at the `embedchain.evaluation.metrics` path.
You must provide the `name` of your custom metric in the `__init__` method of your class. This name will be used to identify your metric in the evaluation report.
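A minimal sketch of a custom metric, assuming `BaseMetric` can be imported as shown, that the name is passed to the base class constructor, that the metric exposes an `evaluate` method receiving the dataset, and that each data point has an `answer` field (all of these are assumptions):

```python
from embedchain.evaluation.base import BaseMetric  # import path is an assumption


class AnswerLengthMetric(BaseMetric):
    def __init__(self):
        # The name identifies this metric in the evaluation report
        super().__init__(name="answer_length")

    def evaluate(self, dataset):
        # Toy logic: average answer length in characters across the dataset
        return sum(len(item.answer) for item in dataset) / len(dataset)
```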