Tutorial

Improve and evaluate your LLM applications with a few simple steps

Stop spending time on manual prompt iteration! Follow a few simple steps to efficiently arrive at an optimized system prompt.

Want to integrate even quicker? Try out Farsight AI on a Colab notebook here.

Note: While you can evaluate the outputs of any large language model (LLM), Farsight AI uses OpenAI for its evaluation functions. To use our package, you must have an OpenAI API key.
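If you prefer not to hard-code the key in your scripts, one common approach (not specific to Farsight AI) is to export it as an environment variable and read it with Python's os module; the variable name OPENAI_API_KEY below is just a conventional choice.

export OPENAI_API_KEY="<openai_key>"

Then, in Python:

import os

OPEN_AI_KEY = os.environ["OPENAI_API_KEY"]  # read the key exported above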

Set Up a Python Environment

Go to the root directory of your project and create a virtual environment (if you don't already have one). In the CLI, run:

python3 -m venv env
source env/bin/activate
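If you are on Windows, the activation command differs slightly. Assuming the environment is named env as above:

env\Scripts\activate.bat        (Command Prompt)
.\env\Scripts\Activate.ps1      (PowerShell)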

Installation

Install our library by running:

pip install farsightai
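To confirm the installation succeeded, you can try importing the package from the command line (a generic Python sanity check, not a Farsight-specific command):

python -c "import farsightai; print('farsightai installed')"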

Suggested Starter Workflow

We suggest you start by generating a few system prompts via our generate_prompts function, then evaluate the outputs using standard Farsight metrics. Follow the steps below:

  1. Generate several system prompts using our prompt generation functionality (we recommend starting with 5).

  2. Generate outputs using your preferred language model (e.g., Mistral, ChatGPT, Llama).

  3. Evaluate the results using our prompt evaluation function or any of our additional metrics.

We've provided an example of this suggested generation and evaluation process below.

from farsightai import FarsightAI
from openai import OpenAI

# Replace this with your OpenAI credentials
OPEN_AI_KEY = "<openai_key>"

# create the Farsight client with your OpenAI key
farsight = FarsightAI(openai_key=OPEN_AI_KEY)

# specify your use case parameters
num_prompts = 5
task = "You are a chatbot answering scientific questions"
context = "You have knowledge on all scientific concepts"
guidelines = ["Use limited jargon."]

# generate a few system prompts to evaluate
generated_prompts = farsight.generate_prompts(num_prompts, task, context, guidelines)
print("generated_prompts: ", generated_prompts)

# generate the outputs to evaluate
client = OpenAI(api_key=OPEN_AI_KEY)
input = "Can you describe the carbon cycle"
outputs = []
for system_prompt in generated_prompts:
    chatCompletion = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": input},
        ],
    )
    output = chatCompletion.choices[0].message.content
    outputs.append(output)

# find your best prompt
criteria_description = """Can the model's response be understood by a non-expert
in the subject?"""

rubric = """
Score 1: The response is filled with jargon and complex language, making it incomprehensible for a non-expert.
Score 2: The response includes some explanations, but still relies heavily on jargon and complex language.
Score 3: The response is somewhat clear, but could still be challenging for a non-expert to fully understand.
Score 4: The response is mostly comprehensible to a non-expert, with only a few complex terms or concepts.
Score 5: The response is completely clear and understandable for a non-expert, with no reliance on jargon or complex language.
"""

reference_answer = """Photosynthesis is the process by which plants make their own
food using sunlight. In simple terms, they take in carbon dioxide from the air and
water from the soil, and with the help of sunlight, they transform these into sugars,
which the plant uses as energy. In the process, oxygen is released into the air,
benefiting the environment. So, photosynthesis is like the plant's way of cooking up
its own food using sunlight and a few basic ingredients."""


# Call the best_prompt function
best_prompt = farsight.best_prompt(
    criteria_description, rubric, reference_answer, generated_prompts, input, outputs
)
print(best_prompt)
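
Once you have identified the winning prompt, you can reuse it as the system message in your regular model calls. The snippet below is a minimal sketch that assumes best_prompt returns the winning system prompt as a string (as printed above) and reuses the OpenAI client from earlier; the user question is an arbitrary example.

# reuse the winning system prompt in a regular chat completion
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": best_prompt},
        {"role": "user", "content": "How do vaccines work?"},
    ],
)
print(response.choices[0].message.content)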
