Commit ded18e4

Minor updates and images addition to cookbook (#1882)
Co-authored-by: Shyamal H Anadkat <[email protected]>
1 parent f92933b commit ded18e4

6 files changed: +39 -0 lines changed

examples/partners/eval_driven_system_design/receipt_inspection.ipynb

Lines changed: 34 additions & 0 deletions
@@ -112,6 +112,15 @@
  "source": [
  "## Project Lifecycle\n",
  "\n",
+ "Not every project will proceed in the same way, but projects generally have some \n",
+ "important components in common.\n",
+ "\n",
+ "![Project Lifecycle](../../../images/partner_project_lifecycle.png)\n",
+ "\n",
+ "The solid arrows show the primary progressions or steps, while the dotted line \n",
+ "represents the ongoing nature of problem understanding - uncovering more about\n",
+ "the customer domain will influence every step of the process. We will examine \n",
+ "several of these iterative cycles of refinement in detail below. \n",
  "Not every project will proceed in the same way, but projects generally have some common\n",
  "important components.\n",
  "\n",
@@ -133,6 +142,11 @@
  "It's very rare that a real-world project will start with all the data necessary to get\n",
  "to a satisfactory solution, much less to establish confidence.\n",
  "\n",
+ "In our case, we're going to assume that we have a decent sample of system *inputs*, \n",
+ "in the form of receipt images, but start without any fully annotated data. We find \n",
+ "this is a not-unusual situation when automating an existing process. Instead, \n",
+ "we'll walk through the process of building that out as we go along by collaborating with\n",
+ "domain experts, and make our evals progressively more comprehensive.\n",
  "In our case, we're going to assume that we have a decent sample of system *inputs*\n",
  "(here, photographs of receipts), but start without any fully annotated data. We'll walk\n",
  "through the process of incrementally expanding our test and training sets as we go along\n",
@@ -498,6 +512,21 @@
  "### Action Decision\n",
  "\n",
  "Next, we need to close the loop and get to an actual decision based on receipts. This\n",
+ "looks pretty similar, so we'll present the code without comment.\n",
+ "\n",
+ "Ordinarily one would start with the most capable model - `o3`, at this time - for a \n",
+ "first pass, and then once correctness is established experiment with different models\n",
+ "to analyze any tradeoffs for their business impact, and potentially consider whether \n",
+ "they are remediable with iteration. A client may be willing to take a certain accuracy \n",
+ "hit for lower latency or cost, or it may be more effective to change the architecture\n",
+ "to hit cost, latency, and accuracy goals. We'll get into how to make these tradeoffs\n",
+ "explicitly and objectively later on. \n",
+ "\n",
+ "For this cookbook, `o3` might be too good. We'll use `o4-mini` for our first pass, so \n",
+ "that we get a few reasoning errors we can use to illustrate the means of addressing\n",
+ "them when they occur.\n",
+ "\n",
+ "Next, we need to close the loop and get to an actual decision based on receipts. This\n",
  "looks pretty similar, so we'll present the code without comment."
  ]
  },
@@ -887,6 +916,10 @@
  "metadata": {},
  "source": [
  "After you run that eval you'll be able to view it in the UI, and should see something\n",
+ "like the below. \n",
+ "\n",
+ "(Note, if you have a Zero-Data-Retention agreement, this data is not stored\n",
+ "by OpenAI, so will not be available in this interface.)\n",
  "like:\n",
  "\n",
  "![Summary UI](../../../images/partner_summary_ui.png)\n",
@@ -1617,6 +1650,7 @@
  "ARE NOT TRAVEL-RELATED, THEN IT MUST BE AUDITED.\n",
  "```\n",
  "\n",
+ "4. We added three examples, JSON input/output pairs wrapped in XML tags.\n",
  "3. We added three examples, JSON input/output pairs wrapped in XML tags.\n",
  "\n",
  "With our prompt revisions, we'll regenerate the data to evaluate and re-run the same\n",
-126 KB
-317 KB

images/partner_process_flowchart.png

7.02 KB

images/partner_project_lifecycle.png

161 KB

registry.yaml

Lines changed: 5 additions & 0 deletions
@@ -9,8 +9,13 @@
  date: 2025-06-01
  authors:
  - shikhar-cyber
+ - moredatarequired
+ - tooluser
+ - eddiesiegel
  tags:
  - evals
+ - API Flywheel
+ - completions
  - responses
  - functions
  - tracing
