Survey of the Math Capabilities of Open-Source Large Language Models
Model | Params | GSM8K | MATH | Links |
---|---|---|---|---|
GPT-4 | closed | 92.0 | 42.5 | https://arxiv.org/pdf/2309.16609.pdf |
GPT-3.5 | closed | 80.8 | 34.1 | https://arxiv.org/pdf/2309.16609.pdf |
QWEN-CHAT | 7B | 50.3 | 6.8 | https://arxiv.org/pdf/2309.16609.pdf |
QWEN-CHAT | 14B | 60.1 | 18.4 | https://arxiv.org/pdf/2309.16609.pdf |
Qwen1.5 | 72B | 79.5 | 34.1 | https://qwenlm.github.io/blog/qwen1.5/ |
ChatGLM3-Base | 6B | 72.3 (0-shot CoT) | 25.7 | https://github.com/THUDM/ChatGLM3 |
BiLLa (ChatGLM upgrade) | 7B | - | - | https://github.com/Neutralzz/BiLLa |
Mixtral-8x7B | 45B | 74.4 | 28.4 | https://mistral.ai/news/mixtral-of-experts/ |
WizardMath | 7B | 54.9 | 10.7 | https://arxiv.org/pdf/2308.09583.pdf |
WizardMath | 13B | 63.9 | 14.0 | https://arxiv.org/pdf/2308.09583.pdf |
WizardMath | 70B | 81.6 | 22.7 | https://arxiv.org/pdf/2308.09583.pdf |

Model | Params | Sample problem 1 | Release date |
---|---|---|---|
ChatGLM3 | 6B | no | |
Qwen | 72B | no | |
Qwen | 14B | no (0.36) | |
Qwen1.5 | 72B | no (0.78294); gave a lengthy answer | 2024.02.05 |
BiLLa | 7B | - | - |