|
112 | 112 | "source": [
|
113 | 113 | "## Project Lifecycle\n",
|
114 | 114 | "\n",
|
| 115 | + "Not every project will proceed in the same way, but projects generally have some \n", |
| 116 | + "important components in common.\n", |
| 117 | + "\n", |
| 118 | + "\n", |
| 119 | + "\n", |
| 120 | + "The solid arrows show the primary progressions or steps, while the dotted line \n", |
| 121 | + "represents the ongoing nature of problem understanding - uncovering more about\n", |
| 122 | + "the customer domain will influence every step of the process. We will examine \n", |
| 123 | + "several of these iterative cycles of refinement in detail below. \n", |
115 | 124 | "Not every project will proceed in the same way, but projects generally have some common\n",
|
116 | 125 | "important components.\n",
|
117 | 126 | "\n",
|
|
133 | 142 | "It's very rare that a real-world project will start with all the data necessary to get\n",
|
134 | 143 | "to a satisfactory solution, much less to establish confidence.\n",
|
135 | 144 | "\n",
|
| 145 | + "In our case, we're going to assume that we have a decent sample of system *inputs*, \n", |
| 146 | + "in the form of receipt images, but start without any fully annotated data. We find \n", |
| 147 | + "this is a not-unusual situation when automating an existing process. Instead, \n", |
| 148 | + "we'll walk through the process of building that out as we go along by collaborating with\n", |
| 149 | + "domain experts, and make our evals progressively more comprehensive.\n", |
136 | 150 | "In our case, we're going to assume that we have a decent sample of system *inputs*\n",
|
137 | 151 | "(here, photographs of receipts), but start without any fully annotated data. We'll walk\n",
|
138 | 152 | "through the process of incrementally expanding our test and training sets as we go along\n",
|
|
498 | 512 | "### Action Decision\n",
|
499 | 513 | "\n",
|
500 | 514 | "Next, we need to close the loop and get to an actual decision based on receipts. This\n",
|
| 515 | + "looks pretty similar, so we'll present the code without comment.\n", |
| 516 | + "\n", |
| 517 | + "Ordinarily one would start with the most capable model - `o3`, at this time - for a \n", |
| 518 | + "first pass, and then once correctness is established experiment with different models\n", |
| 519 | + "to analyze any tradeoffs for their business impact, and potentially consider whether \n", |
| 520 | + "they are remediable with iteration. A client may be willing to take a certain accuracy \n", |
| 521 | + "hit for lower latency or cost, or it may be more effective to change the architecture\n", |
| 522 | + "to hit cost, latency, and accuracy goals. We'll get into how to make these tradeoffs\n", |
| 523 | + "explicitly and objectively later on. \n", |
| 524 | + "\n", |
| 525 | + "For this cookbook, `o3` might be too good. We'll use `o4-mini` for our first pass, so \n", |
| 526 | + "that we get a few reasoning errors we can use to illustrate the means of addressing\n", |
| 527 | + "them when they occur.\n", |
| 528 | + "\n", |
| 529 | + "Next, we need to close the loop and get to an actual decision based on receipts. This\n", |
501 | 530 | "looks pretty similar, so we'll present the code without comment."
|
502 | 531 | ]
|
503 | 532 | },
|
|
887 | 916 | "metadata": {},
|
888 | 917 | "source": [
|
889 | 918 | "After you run that eval you'll be able to view it in the UI, and should see something\n",
|
| 919 | + "like the below. \n", |
| 920 | + "\n", |
| 921 | + "(Note, if you have a Zero-Data-Retention agreement, this data is not stored\n", |
| 922 | + "by OpenAI, so it will not be available in this interface.)\n", |
890 | 923 | "like:\n",
|
891 | 924 | "\n",
|
892 | 925 | "\n",
|
|
1617 | 1650 | "ARE NOT TRAVEL-RELATED, THEN IT MUST BE AUDITED.\n",
|
1618 | 1651 | "```\n",
|
1619 | 1652 | "\n",
|
| 1653 | + "4. We added three examples, JSON input/output pairs wrapped in XML tags.\n", |
1620 | 1654 | "3. We added three examples, JSON input/output pairs wrapped in XML tags.\n",
|
1621 | 1655 | "\n",
|
1622 | 1656 | "With our prompt revisions, we'll regenerate the data to evaluate and re-run the same\n",
|
|