Benchmark Model - Search News

End of an Era? CPU Performance Drops in 2025—What’s Really Happening?

According to new data from PassMark Software, which has offered PC benchmark testing tools since 1998, average CPU ...

Researchers find you don’t need a ton of data to train LLMs for reasoning tasks

With a few hundred well-curated examples, an LLM can be trained for complex reasoning tasks that previously required thousands of instances.

Decrypt · 1d

New Open Source AI Model Rivals DeepSeek's Performance—With Far Less Training Data

OpenThinker-32B achieved benchmark-beating results using just 14% of the data its Chinese competitor needed, marking a win ...

Institutional Investor · 1d

Is There a Better Alternative to the Endowment Model? Top CIOs Weigh In.

In the new paper, the institute examines various alternatives to the current endowment model, including the Canadian model, ...

Techopedia · 1d

Kimi AI 1.5: New Chinese AI Model Beats ChatGPT & DeepSeek

Just days after DeepSeek R1 made headlines, Moonshot AI introduced Kimi AI 1.5, a model already touted as superior to OpenAI’s ...

Less supervision, better results: Study shows AI models generalize more effectively on their own

Training LLMs and VLMs through reinforcement learning delivers better results than using hand-crafted examples.

Legion Go S Review: A Blueprint For Handheld Gaming Perfection

With so many meaningful improvements, the Legion Go S is a gorgeous and comfortable handheld. But does its high price and ...

Is Perplexity's Sonar really more 'factual' than its AI rivals? See for yourself

The company claims its newly upgraded model is number one in user satisfaction and speed - but its methodology is unclear.

Too Old to Operate · 2d

Advanced ICU Length of Stay Prediction Model for Improved Benchmarking

The following is a summary of “Prediction of Intensive Care Length of Stay for Surviving and Nonsurviving Patients Using Deep ...

OpenAI’s DeepResearch can complete 26% of ‘Humanity’s Last Exam’ — a benchmark for the frontier of human knowledge

OpenAI’s o1 and DeepSeek’s R1 models, which previously sat atop the leaderboard, could only get through roughly 9% of the ...

Diginomica · 2d

AI and energy use - why a new way to measure energy consumption of AI models and award a star rating could prove invaluable

Salesforce argues that the tool establishes a clear and trusted benchmark for AI model sustainability, comparing it to the ...

HackerRank Introduces New Benchmark to Assess Advanced AI Models

The ASTRA Benchmark consists of multi-file, project-based problems designed to mimic real-world coding tasks. The intent of the HackerRank ASTRA Benchmark is to determine the correctness and ...
