Question 1: Which model has the highest total score in your test results? What do you think are its advantages and disadvantages compared to the other?
Model B had the higher total score in my test results. I think Model B scored better because of its overall strength in math. On every other aspect, Models A and B were virtually the same, giving identical or very similar answers.
Question 2: Which model do you think is more suitable for you? Why?
I think Model B is more suitable for me because it matched Model A on every other test subject and did even better than Model A on math.
Question 3: What new understanding did this test experiment give you about artificial intelligence large language models?
This test showed me that AI language models (LLMs) are good at language tasks and simple logic, among many other things. LLMs can also have trouble with hard math and with staying consistent when questions are worded differently, as test 4 showed. It also taught me that testing AI with different kinds of questions, or even with different models, helps in understanding AI's strengths and weaknesses.
Attachment: 大语言模型测试实验表--评分简化版 (1).xlsx