Rubric Development

Create a rubric to evaluate your best prompts

Farsight uses the Prometheus prompting methodology to evaluate system prompts automatically.

To begin creating your rubric, we suggest:

1) Synthesizing a description of your use case. For example:

To develop a secure and efficient internal HR chatbot for a financial
institution that assists employees with HR-related queries while ensuring the 
protection of private personal information.

2) Using ChatGPT to generate your evaluation rubric by inserting your use case into the Prometheus prompt. To do so, send the following prompt:

Given this use case: To develop a secure and efficient internal HR chatbot for a financial
institution that assists employees with HR-related queries while ensuring the 
protection of private personal information.

I would like to create an evaluation rubric to effectively evaluate chatbot 
responses. Can you provide an example {instruction} and an example {reference_answer}, 
and fill in the {criteria_description} and the five {score_descriptions} for my 
use case? Please keep the rest of the format exactly the same. Please create one 
evaluation rubric from 1 to 5 with no subcategories.

### Reference Answer (Score 5):
{reference_answer}

### Score Rubric:
[{criteria_description}]
Score 1: {score1_description}
Score 2: {score2_description}
Score 3: {score3_description}
Score 4: {score4_description}
Score 5: {score5_description}

Please provide a single, consolidated rubric for evaluating these criteria.
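If you generate rubrics for several use cases, it can help to build this prompt programmatically. The following is a minimal sketch, not part of Farsight: the names `RUBRIC_REQUEST_TEMPLATE` and `build_rubric_request` are hypothetical, and the template text is a copy of the prompt from this step. It uses plain string replacement so that the `{...}` placeholders ChatGPT is meant to fill in are left untouched.

```python
# Hypothetical helper for illustration only; not part of the Farsight library.
# The {…} placeholders are intentionally left as-is for ChatGPT to fill in.
RUBRIC_REQUEST_TEMPLATE = """Given this use case: <USE_CASE>

I would like to create an evaluation rubric to effectively evaluate chatbot
responses. Can you provide an example {instruction} and an example {reference_answer},
and fill in the {criteria_description} and the five {score_descriptions} for my
use case? Please keep the rest of the format exactly the same. Please create one
evaluation rubric from 1 to 5 with no subcategories.

### Reference Answer (Score 5):
{reference_answer}

### Score Rubric:
[{criteria_description}]
Score 1: {score1_description}
Score 2: {score2_description}
Score 3: {score3_description}
Score 4: {score4_description}
Score 5: {score5_description}

Please provide a single, consolidated rubric for evaluating these criteria.
"""


def build_rubric_request(use_case: str) -> str:
    """Insert a use-case description into the rubric request prompt."""
    # Collapse line breaks and repeated spaces so the use case reads as one sentence.
    normalized = " ".join(use_case.split())
    if not normalized:
        raise ValueError("use_case must not be empty")
    return RUBRIC_REQUEST_TEMPLATE.replace("<USE_CASE>", normalized)


if __name__ == "__main__":
    print(build_rubric_request(
        "To develop a secure and efficient internal HR chatbot for a financial "
        "institution that assists employees with HR-related queries while "
        "ensuring the protection of private personal information."
    ))
```

The printed text can be pasted into ChatGPT as-is.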

Example Response:

Here is an example response from ChatGPT. To evaluate, pass the reference answer and score rubric from the response into our get_best_prompt function, along with a few different prompts.

Certainly! Here's an example evaluation rubric for the given use case:

### Reference Answer (Score 5):
The chatbot response addresses the HR-related query accurately, provides relevant 
information, and ensures the protection of private personal information by following 
security protocols and compliance standards.

### Score Rubric:
**[{criteria_description}]**
Score 1: The response does not address the HR-related query accurately and fails to 
provide relevant information. It does not ensure the protection of private personal 
information.

Score 2: The response partially addresses the HR-related query but lacks accuracy and 
relevant information. There are concerns about the protection of private personal 
information.

Score 3: The response adequately addresses the HR-related query with some accuracy 
and relevant information. It demonstrates basic efforts to protect private personal 
information.

Score 4: The response accurately addresses the HR-related query, provides relevant 
information, and demonstrates a good level of protection for private personal 
information.

Score 5: The response precisely addresses the HR-related query, offers highly 
relevant information, and ensures an exemplary level of protection for private 
personal information. It adheres to the highest security standards and compliance 
requirements.

This rubric aims to assess the accuracy and relevance of the chatbot's responses to 
HR-related queries while emphasizing the importance of ensuring the security of 
private personal information. The scores progressively reflect the chatbot's 
performance, with a score of 5 representing an exemplary response that meets the 
highest standards of accuracy and security.
