add note to all promptfoo lectures

Colt Steele MacBook
2024-09-04 18:46:32 -06:00
parent 93c65084f7
commit a24229134a
4 changed files with 13 additions and 0 deletions

View File

@@ -6,6 +6,9 @@
   "source": [
    "# Promptfoo: classification evaluations\n",
    "\n",
+   "**Note: This lesson lives in a folder that contains relevant code files. Download the entire folder if you want to follow along and run the evaluation yourself**\n",
+   "\n",
+   "\n",
    "In an earlier lesson, we evaluated prompts to classify customer complaints like: \n",
    "\n",
    "> Whenever I open your app, my phone gets really slow\n",

View File

@@ -6,6 +6,9 @@
   "source": [
    "# Promptfoo: custom code graders\n",
    "\n",
+   "**Note: This lesson lives in a folder that contains relevant code files. Download the entire folder if you want to follow along and run the evaluation yourself**\n",
+   "\n",
+   "\n",
    "So far we've seen how to use some of the built-in promptfoo graders like `exact-match` and `contains-all`. Those are often useful features, but promptfoo also gives us the ability to write custom grading logic for more specific grading tasks. \n",
    "\n",
    "To demonstrate this, we'll use a very simple prompt template:\n",

View File

@@ -6,6 +6,9 @@
   "source": [
    "# Model-graded evaluations with promptfoo\n",
    "\n",
+   "**Note: This lesson lives in a folder that contains relevant code files. Download the entire folder if you want to follow along and run the evaluation yourself**\n",
+   "\n",
+   "\n",
    "So far, we've only written code-graded evaluations. Whenever possible, code-graded evaluations are the simplest and least-expensive evaluations to run. They offer clear-cut, objective assessments based on predefined criteria, making them ideal for tasks with straightforward, quantifiable outcomes. The trouble is that code-graded evaluations can only grade certain types of outputs, primarily those that can be reduced to exact matches, numerical comparisons, or other programmable logic.\n",
    "\n",
    "However, many real-world applications of language models require more nuanced evaluation. Suppose we wanted to build a chatbot to be used in middle-school classrooms. We might want to evaluate the outputs to make sure they use age-appropriate language, maintain an educational tone, avoid answering non-academic questions, or provide explanations at a suitable complexity level for middle schoolers. These criteria are subjective and context-dependent, making them challenging to assess with traditional code-based methods. This is where model-graded evaluations can help!\n",

View File

@@ -5,6 +5,10 @@
   "metadata": {},
   "source": [
    "# Custom model-graded evals \n",
+   "\n",
+   "**Note: This lesson lives in a folder that contains relevant code files. Download the entire folder if you want to follow along and run the evaluation yourself**\n",
+   "\n",
+   "\n",
    "In this lesson, we'll see how we can write custom model-graded evaluations using promptfoo. We'll start with a simple prompting goal: we want to write a prompt that can turn long, technically complex Wikipedia articles into short summaries appropriate for a grade school audience.\n",
    "\n",
    "For example, given the entire [Wikipedia entry on convolutional neural networks](https://en.wikipedia.org/wiki/Convolutional_neural_network), we want a simple output summary like this one:\n",