Commit 7917a93

Deployed aaf90f7 with MkDocs version: 1.6.0
1 parent 1716109 commit 7917a93

2 files changed: +2 -1 lines changed

index.html

Lines changed: 1 addition & 0 deletions
@@ -804,6 +804,7 @@ <h2 id="benchmark-statistics">Benchmark Statistics</h2>
804 804
<p><a class="glightbox" href="figures/SciCode_chart.png" data-type="image" data-width="auto" data-height="auto" data-desc-position="bottom"><img alt="Image Title" src="figures/SciCode_chart.png" /></a></p>
805 805
<p style="text-align: center;">Left: Distribution of Main Problems. Right: Distribution of Subproblems.</p>
806 806

807 +
<p>We include several research problems that are built upon or reproduce methods used in Nobel Prize-winning studies to highlight current trends in scientific research: the self-consistent field (SCF) method for density functional theory (DFT) calculations (<strong>The Nobel Prize in Chemistry 1998</strong>), the PMNS matrix for neutrino oscillation in matter (<strong>The Nobel Prize in Physics 2015</strong>), the Haldane model for the anomalous quantum Hall effect (<strong>The Nobel Prize in Physics 2016</strong>), optical tweezer simulations for microscopic thermodynamics (<strong>The Nobel Prize in Physics 2018</strong>), and the replica method for spin glasses (<strong>The Nobel Prize in Physics 2021</strong>).</p>
807 808
<h2 id="experiment-results">Experiment Results</h2>
808 809
<p>We evaluate models using zero-shot prompts. We keep the prompts general and vary them across evaluation setups only as needed to inform the model about the task. Prompts are identical across models and fields, and contain the main problem and subproblem instructions together with the code for previous subproblems. In the standard setup, the model is tested without background knowledge and carries over its own generated solutions to previous subproblems. The scientists' annotated background provides the knowledge and reasoning steps needed to solve the problems, shifting the evaluation’s focus toward the models’ coding and instruction-following capabilities.
809 810
<a class="glightbox" href="figures/Standard_Setup.png" data-type="image" data-width="auto" data-height="auto" data-desc-position="bottom"><img alt="Image Title" src="figures/Standard_Setup.png" /></a>
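
The prompt layout described in the Experiment Results paragraph can be illustrated with a minimal sketch. This is not the benchmark's actual prompt template: the `Subproblem` class, the `build_prompt` function, the `with_background` flag, and the section labels are all hypothetical names chosen for illustration. It only shows the idea that a prompt carries the main problem instruction, the code generated for earlier subproblems, and the current subproblem instruction, with the scientist-annotated background added only in the with-background setup.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Subproblem:
    """Hypothetical container for one subproblem of a main problem."""
    instruction: str
    background: str = ""  # scientist-annotated background; may be empty


def build_prompt(
    main_instruction: str,
    subproblems: Sequence[Subproblem],
    index: int,
    previous_code: Sequence[str],
    with_background: bool = False,
) -> str:
    """Assemble a zero-shot prompt for subproblem `index` (0-based).

    `previous_code` holds the model's own generated solutions to the
    earlier subproblems, which are carried over into the prompt.
    """
    if len(previous_code) < index:
        raise ValueError("need generated code for every earlier subproblem")

    parts: List[str] = ["Main problem:\n" + main_instruction]

    # Carry over generated solutions to previous subproblems.
    for i in range(index):
        parts.append(f"Code for subproblem {i + 1}:\n{previous_code[i]}")

    current = subproblems[index]
    # Background is included only in the with-background setup.
    if with_background and current.background:
        parts.append("Background:\n" + current.background)

    parts.append(f"Subproblem {index + 1}:\n{current.instruction}")
    return "\n\n".join(parts)
```

Calling `build_prompt(..., with_background=False)` corresponds to the standard setup in this sketch, while `with_background=True` adds the annotated background just before the current subproblem instruction.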
