Generate Optimized Prompts
opro.generate_optimized_prompts(dataset, test_dataset=None, num_iterations=None, num_prompts_per_iteration=None, eval_function=None, sample_evaluations=None)
This function generates, evaluates, and iteratively refines system prompts. Each candidate prompt is scored against your dataset, and the process repeats so that the final output is a system prompt optimized for your specific task.
Evaluation
To evaluate outputs against targets, generate_optimized_prompts by default determines the score by prompting GPT-3.5-Turbo as follows:
To replace this evaluation method, use the custom_eval_function parameter; see the example below.
Make sure you have an OpenAI API key before you begin.
dataset
list[dict]
Required. A list of dicts, each pairing an "input" (the query or request) with a "target" (the desired output). We recommend at least 50 samples for robust results; a minimum of 3 samples is required.
See Dataset Configuration for more details.
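A minimal sketch of a dataset in the expected shape, plus a hypothetical check_dataset helper (not part of opro) that enforces the stated minimum of 3 samples and warns below the recommended 50. The sample questions are placeholders.

```python
import warnings

# Each sample pairs an "input" (the query) with a "target" (the desired output).
dataset = [
    {"input": "What is 2 + 2?", "target": "4"},
    {"input": "What is the capital of France?", "target": "Paris"},
    {"input": "Translate 'hola' to English.", "target": "hello"},
]


def check_dataset(samples, minimum=3, recommended=50):
    """Hypothetical helper: validate dataset shape and size before optimizing."""
    if len(samples) < minimum:
        raise ValueError(f"need at least {minimum} samples, got {len(samples)}")
    for i, sample in enumerate(samples):
        # Every sample must carry both keys described above.
        missing = {"input", "target"} - set(sample)
        if missing:
            raise ValueError(f"sample {i} is missing keys: {sorted(missing)}")
    if len(samples) < recommended:
        warnings.warn(f"only {len(samples)} samples; {recommended}+ recommended")
    return len(samples)


check_dataset(dataset)  # passes, but warns that fewer than 50 samples were given
```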
test_dataset
list[dict]
Same format as dataset. These samples are used only to test the efficacy of the generated prompts (the validation step).
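Because test_dataset uses the same format, one way to build it is to hold out part of your labeled samples. The split_dataset helper below is hypothetical (not part of opro); it shuffles a copy with a fixed seed so the split is reproducible and no sample lands in both sets.

```python
import random


def split_dataset(samples, test_fraction=0.2, seed=0):
    """Hypothetical helper: hold out a fraction of samples as a test set.

    Shuffles a copy so the caller's list is untouched; the seed makes
    the split reproducible across runs.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


samples = [{"input": f"question {i}", "target": f"answer {i}"} for i in range(10)]
train, test = split_dataset(samples, test_fraction=0.2)
print(len(train), len(test))  # 8 2
```

The two halves could then be passed as dataset=train and test_dataset=test.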
num_iterations
int
Total number of learning iterations for prompt optimization. Default is 40.
num_prompts_per_iteration
int
The number of different prompts generated per learning iteration. Default is 8.
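For a rough sense of how much scoring a run may involve, the iteration settings can be multiplied together. This back-of-the-envelope sketch assumes each generated prompt is scored once on every dataset sample (consistent with "score" being an average across the dataset) and ignores prompt-generation and test-set calls; estimate_scoring_calls is a hypothetical helper.

```python
def estimate_scoring_calls(num_samples, num_iterations=40, num_prompts_per_iteration=8):
    """Hypothetical estimate: assumes every generated prompt is scored on every sample."""
    total_prompts = num_iterations * num_prompts_per_iteration
    return total_prompts * num_samples


# Defaults (40 iterations x 8 prompts = 320 prompts) with the recommended 50 samples:
print(estimate_scoring_calls(50))  # 16000
```

Under that assumption, lowering num_iterations or num_prompts_per_iteration reduces the number of evaluations proportionally.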
sample_evaluations
boolean
If True, includes the results for each sample evaluated in the iterations. Default is False.
custom_score_function
function
custom_score_function(input, output, expected_output): a custom scoring function that evaluates the accuracy of the output and returns a score between 0 and 1, where 1 is entirely accurate and 0 is entirely inaccurate.
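Any function with this shape can act as a scorer. As one illustration (not a scorer the library ships), the sketch below gives partial credit using token-overlap F1 between output and expected_output. It assumes the arguments arrive positionally in the documented order; input is accepted only to match that signature.

```python
from collections import Counter


def custom_score_function(input, output, expected_output):
    """Illustrative scorer: token-overlap F1, always in [0, 1].

    `input` mirrors the documented signature and is ignored here.
    """
    predicted = output.lower().split()
    expected = expected_output.lower().split()
    if not predicted or not expected:
        # Two empty strings count as a match; one empty side scores 0.
        return float(predicted == expected)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(predicted) & Counter(expected)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(expected)
    return 2 * precision * recall / (precision + recall)


print(custom_score_function("Who wrote Hamlet?", "William Shakespeare", "Shakespeare"))  # about 0.667
```

Partial credit like this can help when outputs are free-form text rather than a single exact answer.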
Returns
list[dict]
At most the top 20 system prompts, depending on the iteration settings. Each entry contains:
"prompt": always included. The generated system prompt.
"score": always included. The average performance score of the prompt across the dataset.
"test_score": included only if test_dataset is provided. Reflects the prompt's efficacy on the test set.
"sample_evals": included only if sample_evaluations is set to True. Contains the evaluation results as a list[dict], each with "sample", "target", "output", and the corresponding "score".
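A sketch of consuming the returned list. The results value is a hand-written placeholder with the documented keys, not real output, and the exact contents of "sample" are not specified here, so a plain string is assumed. best_prompt and failed_samples are hypothetical helpers.

```python
def best_prompt(results):
    """Hypothetical helper: return the entry with the highest available score.

    Prefers "test_score" (present only when a test_dataset was given) and
    falls back to the training "score" otherwise.
    """
    return max(results, key=lambda entry: entry.get("test_score", entry["score"]))


def failed_samples(entry, threshold=1.0):
    """Hypothetical helper: per-sample results scoring below `threshold`.

    Returns an empty list when "sample_evals" is absent (sample_evaluations=False).
    """
    return [ev for ev in entry.get("sample_evals", []) if ev["score"] < threshold]


# Placeholder shaped like the documented return value; the values are made up.
results = [
    {
        "prompt": "Answer concisely.",
        "score": 0.5,
        "sample_evals": [
            {"sample": "What is 2 + 2?", "target": "4", "output": "4", "score": 1.0},
            {"sample": "What is the capital of France?", "target": "Paris", "output": "Lyon", "score": 0.0},
        ],
    },
    {"prompt": "Think step by step, then answer.", "score": 0.8, "sample_evals": []},
]

print(best_prompt(results)["prompt"])  # Think step by step, then answer.
print(failed_samples(results[0]))      # the "Paris" sample, scored 0.0
```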
Custom Score Function Example:
The following example illustrates a simple way to check whether a particular string appears in the output. For instance, in a dataset of multiple-choice questions and answers, the target might be "(a)" while the output could be "(a) Joe Biden". To score the output, we simply check whether the target string is present in the output.
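The substring check described above might be sketched as follows. This is a minimal version assuming positional arguments in the documented order (input, output, expected_output); input is not needed for this check.

```python
def custom_score_function(input, output, expected_output):
    """Score 1.0 if the target string appears anywhere in the output, else 0.0.

    `input` mirrors the documented signature and is unused here.
    """
    return 1.0 if expected_output in output else 0.0


print(custom_score_function("<multiple-choice question>", "(a) Joe Biden", "(a)"))  # 1.0
print(custom_score_function("<multiple-choice question>", "(c)", "(a)"))  # 0.0
```

The match is case-sensitive, so "(A) Joe Biden" would score 0.0; lowercasing both strings first is a small change if that matters for your data.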
We support custom scoring functions because validating accuracy is often more complex and specific to your requirements.