# Getting Started

Start evaluating now! Follow a few simple steps to improve your LLMs.

{% hint style="info" %}
Want to integrate even quicker? Try out Farsight AI on a Colab notebook [here](https://colab.research.google.com/drive/1_lXoxfu9fSfUhsJMhlxmMMNRuZpmKDqd?usp=sharing).
{% endhint %}

*<mark style="color:red;">Note:</mark> While you have the flexibility to assess the outputs of any large language model (LLM), Farsight AI's evaluation functions are powered by OpenAI. To use our package, you must have access to an* [*OpenAI API Key*](https://platform.openai.com/account/api-keys)*.*
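The snippets below hardcode the key for brevity. If you'd rather keep it out of your source, a common pattern is to read it from an environment variable; the variable name `OPENAI_API_KEY` here is a convention, not a Farsight requirement:

```python
import os

# Fall back to a placeholder so the snippet stays runnable; replace the
# placeholder with your real key, or export OPENAI_API_KEY in your shell.
OPEN_AI_KEY = os.environ.get("OPENAI_API_KEY", "<openai_key>")
```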

### Set Up a Python Environment <a href="#setup-a-python-environement" id="setup-a-python-environement"></a>

Go to the root directory of your project and create a virtual environment (if you don't already have one). In the CLI, run:

```shell
python3 -m venv venv
source venv/bin/activate
```

### Installation <a href="#installation" id="installation"></a>

Install our library by running:

```shell
pip install farsightai
```

### Evaluate Your First Metric <a href="#evaluate-your-first-metric" id="evaluate-your-first-metric"></a>

Utilize our evaluation suite to score your LLM outputs. Below is an example of evaluating a query and output pair using the Farsight quality metric.

```python
from farsightai import FarsightAI

# Replace with your openAI credentials
OPEN_AI_KEY = "<openai_key>"

query = "Who is the president of the United States"
farsight = FarsightAI(openai_key=OPEN_AI_KEY)

# Replace this with the actual output of your LLM application
output = "As of my last knowledge update in January 2022, Joe Biden is the President of the United States. However, keep in mind that my information might be outdated as my training data goes up to that time, and I do not have browsing capabilities to check for the most current information. Please verify with up-to-date sources."

quality_score = farsight.quality_score(query, output)
print("score: ", quality_score)
# score: 4
```

### Create Your First Custom Metric <a href="#create-your-first-custom-metric" id="create-your-first-custom-metric"></a>

Generate a custom metric to evaluate your LLM outputs. A custom metric returns `True` if the provided constraint is violated, and `False` if the output complies with it.

Below is an example of defining and measuring a custom metric that measures two independent constraints.

```python
from farsightai import FarsightAI

# Replace with your openAI credentials
OPEN_AI_KEY = "<openai_key>"

query = "Who is the president of the United States"
farsight = FarsightAI(openai_key=OPEN_AI_KEY)

# Replace this with the actual output of your LLM application
output = "As of my last knowledge update in January 2022, Joe Biden is the President of the United States. However, keep in mind that my information might be outdated as my training data goes up to that time, and I do not have browsing capabilities to check for the most current information. Please verify with up-to-date sources."
# Replace this with the actual constraints you want to check your LLM output for
constraints = ["do not mention Joe Biden", "do not talk about alcohol"]

custom_metric = farsight.custom_metrics(constraints, output)

print("score: ", custom_metric)
# score:  [True, False]
```
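Because each boolean lines up with the constraint at the same index, you can map the results back to the constraints they flag. A minimal sketch, using the example result above in place of a live API call:

```python
constraints = ["do not mention Joe Biden", "do not talk about alcohol"]
custom_metric = [True, False]  # example result from the call above

# True means the constraint at that index was violated
violated = [c for c, flagged in zip(constraints, custom_metric) if flagged]
print("violated constraints: ", violated)
# violated constraints:  ['do not mention Joe Biden']
```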

### Auto-Generate High-Quality Potential System Prompts <a href="#auto-generate-high-quality-potential-system-prompts" id="auto-generate-high-quality-potential-system-prompts"></a>

Don't want to waste time trying out different system prompts? Farsight automatically generates great candidates and allows you to quantitatively compare them using standard and custom metrics with minimal effort. Simply describe the use case of your application and seamlessly generate multiple system prompts to evaluate.

```python
from farsightai import FarsightAI

# Replace this with your openAI credentials
OPEN_AI_KEY = "<openai_key>"

# Replace this with your use case details
num_prompts = 2
task = 'You are a conversational chatbot'
context = 'The year is 2012'
guidelines = ["Don't answer questions about Britney Spears"]
farsight = FarsightAI(openai_key=OPEN_AI_KEY)

generated_prompts = farsight.generate_prompts(num_prompts, task, context, guidelines)
print("prompts: ", generated_prompts)

# prompts: [
#    "You are a conversational chatbot from the year 2012. Your goal is to answer questions 
#    based on your knowledge but without answering questions about Britney Spears. I'm here to help! Please 
#    provide me with a question and the specific knowledge you want me to use for 
#    answering it.",
#    "As a conversational chatbot in the year 2012, your goal is to answer questions 
#    accurately and concisely. Your guidelines are to not answer any questions about Britney Spears. Please provide the necessary information for me to 
#    generate a response."
# ]
```

### Prompt Optimization

For prompt optimization, we offer two distinct approaches, one with manual oversight and one with full automation. Choose the one that aligns best with your use case, workflow, and anticipated functionality:

1. [**Step by Step Approach**](/sdk/step-by-step-prompt-optimization/introduction.md)**:** Generate multiple system prompts for evaluation and testing purposes. Tailor them based on context and optional system guidelines.
2. [**Fully Automated Approach**](/sdk/fully-automated-prompt-optimization/introduction.md)**:** Leverage our comprehensive automated prompt optimization function. This feature not only generates prompts but also evaluates and iteratively improves them. It operates based on your shadow traffic, evaluation rubric, and optional ground truth outputs.

{% hint style="info" %}
Want to integrate quickly? Try out Farsight AI on a Colab notebook for automated prompt optimization [here](https://colab.research.google.com/drive/16iSiMl6ngqHKL7SuOnHhTkTDbe1EKRUA#scrollTo=c1X_fG-fn_XH) and for the step-by-step approach [here](https://colab.research.google.com/drive/1GvsRYFmKZZDc9U8ZyqbCD1dKPazt6uQw#scrollTo=VZCvUf0BC1s6).
{% endhint %}

We've included an example of the fully automated approach below:

```python
from farsightai import FarsightAI

# Replace this with your openAI credentials
OPEN_AI_KEY = "<openai_key>"

shadow_traffic = [
    "What are the current job openings in the company?",
    "How can I apply for a specific position?",
    "What is the status of my job application?",
    "Can you provide information about the company's benefits and perks?",
    "What is the company's policy on remote work or flexible schedules?",
    "How do I update my personal information in the HR system?",
    "Can you explain the process for employee onboarding?",
    "What training and development opportunities are available for employees?",
    "How is performance evaluation conducted in the company?",
    "Can you assist with information about employee assistance programs or wellness initiatives?",
]

farsight = FarsightAI(openai_key=OPEN_AI_KEY)
farsight.get_best_system_prompts(shadow_traffic, gpt_optimized=True)
# Result:

# [
#     PromptEvaluation(
#         score=4.666666666666667,
#         system_prompt="<SYS> Thank you for reaching out to our HR chatbot. How can I assist you with your HR-related queries while ensuring the protection ...",
#         test_results=[
#               TestResult(
#                     score=5,
#                     input="What is the company's policy on remote work or flexible schedules?",
#                     output="Our company recognizes the importance of work-life balance and understands that remote work or flexible schedules can contribute ...",
#               ),
#               TestResult( ... ),
#               TestResult( ... ),
#           ]),
#     PromptEvaluation( ... ),
#     PromptEvaluation( ... )
# ]
```
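To pick the winning prompt from this result, select the evaluation with the highest `score`. The sketch below uses a stand-in namedtuple whose field names are assumed from the printed output above; the real `PromptEvaluation` objects returned by Farsight would be used the same way:

```python
from collections import namedtuple

# Stand-in for the PromptEvaluation objects shown above (field names are
# assumed from the printed result; the real class comes from farsightai)
PromptEvaluation = namedtuple("PromptEvaluation", ["score", "system_prompt"])

evaluations = [
    PromptEvaluation(score=4.67, system_prompt="<SYS> Thank you for reaching out ..."),
    PromptEvaluation(score=3.33, system_prompt="<SYS> Welcome to HR support ..."),
]

# The evaluation with the highest average score wins
best = max(evaluations, key=lambda e: e.score)
print("best prompt: ", best.system_prompt)
# best prompt:  <SYS> Thank you for reaching out ...
```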

### Suggested Starter Workflow

We suggest you start by generating a few system prompts via our [generate prompts functionality](/sdk/step-by-step-prompt-optimization/prompt-generation.md), then start evaluating outputs using standard Farsight metrics. Follow the steps below:

(a) Generate a reasonable number of system prompts (we suggest 5 to start)

(b) Start to generate outputs using an LLM of your choice (Mistral, OpenAI, Anthropic)

(c) Finally, evaluate using our [metrics suite](/sdk/step-by-step-prompt-optimization/prompt-evaluation.md). We suggest starting with the standard Farsight metrics, and implementing custom metrics as needed for your evaluation.

**We've provided an example of this suggested generation and evaluation process below.**

```python
from farsightai import FarsightAI
from openai import OpenAI

# Replace this with your openAI credentials
OPEN_AI_KEY = "<openai_key>"

# specify your use case parameters
num_prompts = 2
task = 'You are a financial chatbot'
context = 'The year is 2008'
guidelines = ["Don't answer questions about the housing market."]
farsight = FarsightAI(openai_key=OPEN_AI_KEY)

# generate a few system prompts to evaluate
generated_prompts = farsight.generate_prompts(num_prompts, task, context, guidelines)
print("generated_prompts: ", generated_prompts)

client = OpenAI(api_key=OPEN_AI_KEY)
# test a specific input
input = "What happened to the market in 2012"
for system_prompt in generated_prompts:
    # for each system prompt generate an output 
    chatCompletion = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": input},
        ],
    )
    output = chatCompletion.choices[0].message.content
    knowledge = None
    print("input: ", input)
    print("output: ", output)
    print("---------------metrics---------------")

    # evaluate the output 
    factuality_score = farsight.factuality_score(input, output, knowledge)
    print("factuality_score: ", factuality_score)
    # factuality_score: True
    consistency_score = farsight.consistency_score(input, output)
    print("consistency_score: ", consistency_score)
    # consistency_score: 1.0
    quality_score = farsight.quality_score(input, output)
    print("quality_score: ", quality_score)
    # quality_score: 3
    conciseness_score = farsight.conciseness_score(input, output)
    print("conciseness_score: ", conciseness_score)
    # conciseness_score: 4
```

