Benchmark Model - Search News

Researchers find you don’t need a ton of data to train LLMs for reasoning tasks

With a few hundred well-curated examples, an LLM can be trained for complex reasoning tasks that previously required thousands of instances.

8d on MSN

These researchers used NPR Sunday Puzzle questions to benchmark AI ‘reasoning’ models

Researchers used questions from the NPR Sunday Puzzle challenge to build a benchmark to test AI 'reasoning' models.

Decrypt · 1d

New Open Source AI Model Rivals DeepSeek's Performance—With Far Less Training Data

OpenThinker-32B achieved benchmark-beating results using just 14% of the data its Chinese competitor needed, marking a win ...

Techopedia · 4d

Diffbot’s AI Model Suggests “Smaller Is Better” for LLMs

Learn whether Diffbot’s smaller AI model, built with an innovative GraphRAG training technique, can solve AI hallucinations for ...

HackerRank Introduces New Benchmark to Assess Advanced AI Models

Industry Leader Known for Software Development Skills Expertise Introduces Real-World Benchmark of AI Software Development Capabilities. CUPERTINO, Calif., Feb. 11, 2025 (GLOBE NEWSWIRE) -- HackerRank, ...

Techopedia · 1d

Kimi AI 1.5: New Chinese AI Model Beats ChatGPT & DeepSeek

Just days after DeepSeek R1 made headlines, Moonshot AI introduced Kimi AI 1.5, a model already touted as superior to OpenAI’s ...

Is Perplexity's Sonar really more 'factual' than its AI rivals? See for yourself

The company claims its newly upgraded model is number one in user satisfaction and speed - but its methodology is unclear.

Fintel on MSN · 1d

Benchmark Initiates Coverage of Tesla (TSLA) with Buy Recommendation

Fintel reports that on February 12, 2025, Benchmark initiated coverage of Tesla (NasdaqGS:TSLA) with a Buy recommendation.

15h

Which AI agent is the best? This new leaderboard can tell you

On Wednesday, Galileo launched an Agent Leaderboard on Hugging Face, an open-source AI platform where users can build, train, access, and deploy AI models. The leaderboard is meant to help people ...

10d

I just tested ChatGPT's new o3-mini model with 7 prompts to rate its problem-solving and reasoning capabilities — and it blew me away

I went hands-on with 7 prompts to test the reasoning capabilities of the o3-mini, the newest ChatGPT model available in the ...

2d on MSN

OpenAI’s DeepResearch can complete 26% of ‘Humanity’s Last Exam’ — a benchmark for the frontier of human knowledge

OpenAI’s o1 and DeepSeek’s R1 models, which previously sat atop the leaderboard, could only get through roughly 9% of the ...

Jordan News Agency (Petra) on MSN · 1d

Jordan-Qatar Labor Ties Set Benchmark For Arab Cooperation - Minister

Minister of Labor Khaled Bakkar highlighted Qatar-Jordan relations as an exemplary model for Arab economic cooperation, reflecting deep-rooted bilateral ties and the historic relationship between both ...
