Commit b7bd3d6: Update index.md

Parent: 575be4d

1 file changed: docs/index.md (8 additions, 1 deletion)

```diff
@@ -24,11 +24,18 @@
 </p>
 
 ## Introduction
-SciCode is a newly developed benchmark designed to evaluate the capabilities of language models (LMs) in generating code for solving realistic scientific research problems. It has a diverse coverage of **6** domains: Physics, Math, Material Science, Biology, and Chemistry. They span 16 diverse natural science sub-fields. Unlike previous benchmarks that consist of question-answer pairs, SciCode problems naturally factorize into multiple subproblems, each involving knowledge recall, reasoning, and code synthesis. In total, SciCode contains **338** subproblems decomposed from **80** challenging main problems, and it offers optional descriptions specifying useful scientific background information and scientist-annotated gold-standard solutions and test cases for evaluation.
+SciCode is a newly developed benchmark designed to evaluate the capabilities of language models (LMs) in generating code for solving realistic scientific research problems. It has a diverse coverage of **6** domains: Physics, Math, Material Science, Biology, and Chemistry. They span 16 diverse natural science sub-fields. Unlike previous benchmarks that consist of question-answer pairs, SciCode problems naturally factorize into multiple subproblems, each involving knowledge recall, reasoning, and code synthesis. In total, SciCode contains **338** subproblems decomposed from **80** challenging main problems, and it offers optional descriptions specifying useful scientific background information and scientist-annotated gold-standard solutions and test cases for evaluation. Claude3.5-Sonnet, the best-performing model among those tested, can solve only **4.6%** of the problems in the most realistic setting.
 
 
 ## Overview
 
+| **Fields** | **Subfields** |
+|----------------------|---------------------------------------------------------------------------------------------------------------|
+| **Mathematics** | Numerical Linear Algebra (7), Computational Mechanics (6), Computational Finance (1) |
+| **Physics** | Condensed Matter Physics (13), Optics (10), Quantum Information/Computing (6), Computational Physics (5), Astrophysics (2), Particle Physics (1) |
+| **Chemistry** | Quantum Chemistry (5), Computational Chemistry (3) |
+| **Biology** | Ecology (6), Biochemistry (1), Genetics (1) |
+| **Material Science** | Semiconductor Materials (7), Molecular Modeling (6) |
 
 
 <div class="grid cards" markdown>
```
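The Overview table added in this commit can be cross-checked against the totals stated in the Introduction. What follows is a minimal Python sketch, assuming each parenthesized number is the count of main problems in that subfield (the diff does not state this explicitly). The names `OVERVIEW` and `summarize` are hypothetical and introduced only for illustration.

```python
# Subfield counts transcribed from the Overview table in docs/index.md.
# Assumption: each number counts main problems in that subfield.
OVERVIEW = {
    "Mathematics": {
        "Numerical Linear Algebra": 7,
        "Computational Mechanics": 6,
        "Computational Finance": 1,
    },
    "Physics": {
        "Condensed Matter Physics": 13,
        "Optics": 10,
        "Quantum Information/Computing": 6,
        "Computational Physics": 5,
        "Astrophysics": 2,
        "Particle Physics": 1,
    },
    "Chemistry": {
        "Quantum Chemistry": 5,
        "Computational Chemistry": 3,
    },
    "Biology": {
        "Ecology": 6,
        "Biochemistry": 1,
        "Genetics": 1,
    },
    "Material Science": {
        "Semiconductor Materials": 7,
        "Molecular Modeling": 6,
    },
}


def summarize(overview):
    """Return (number of fields, number of subfields, total count, per-field totals)."""
    # Sum the subfield counts within each field.
    per_field = {field: sum(subs.values()) for field, subs in overview.items()}
    # Count distinct subfield entries across all fields.
    n_subfields = sum(len(subs) for subs in overview.values())
    return len(overview), n_subfields, sum(per_field.values()), per_field


if __name__ == "__main__":
    n_fields, n_subfields, total, per_field = summarize(OVERVIEW)
    for field, count in per_field.items():
        print(f"{field}: {count}")
    print(f"fields={n_fields} subfields={n_subfields} main problems={total}")
```

Under that reading, the per-field totals are 14 (Mathematics), 37 (Physics), 8 (Chemistry), 8 (Biology), and 13 (Material Science), which sum to 80 across 16 subfields, matching the Introduction's figures; the table itself has five field rows.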