Model Inference API - Search News

OpenAI Halves Inference Costs With Software Alone: GPUs Drop to Hundreds

OpenAI's inference cost reduction cut the GPUs serving ChatGPT guest traffic from tens of thousands of Nvidia GPUs to just a couple hundred, ...

iAfrica

Kenya’s Fikra API brings AI inference to African developers, with M-Pesa built in

Kenya's Fikra API has launched an AI inference API built specifically for African developers, startups and businesses.

Tech Times

Compile Once, Run Offline: New AI Method Matches 32B Models With a 23MB File

Waterloo's PAW compiles task specs into 23MB LoRA adapters that a 600M-parameter model runs entirely offline.

Local AI inference at 32B-parameter quality, no cloud API required: University of Waterloo researchers released PAW on July 2, 2026, a system that compiles any natural-language task spec into a 23MB ...

Runware launches developer API access for Google DeepMind’s Gemini Omni Flash

Generate and edit video from any input (text, image, video, or audio) through Runware, the lowest-cost API on the ...

DeepSeek open sources DSpark, a new framework to speed up LLM inference by up to 85%

DSpark can make decoding faster, but acceptance quality still determines how much speed the system actually realizes.

Center for Strategic and International Studies

What to Know About Chinese AI Models

Chinese AI models are rapidly closing the gap with U.S. frontier systems. This analysis examines what their growing ...

Tech Bytes: OpenAI and Broadcom unveil Jalapeño inference chip to power next wave of LLMs

The chip has been designed specifically for large language model inference — the stage where trained AI models generate ...

OpenAI reportedly reduced inference costs by more than half

According to a media report, OpenAI engineers have found optimizations that reduce the cost of operating existing AI models ...

Best Seedance 2.0 API Platforms In 2026: Full Guide

Seedance 2.0 is ByteDance's flagship video generation model, released in 2026. It produces cinematic video up to 1080p natively, with synchronized audio, accurate lip-sync, and 4K available through ...

Meituan open sources LongCat-2.0, the 1.6T, near-frontier agentic coding model that's been leading OpenRouter — trained entirely on Chinese chips

By registering the LongCat-2.0 repository under the open-source MIT License, Meituan positions the architecture with maximum ...

DIGITIMES

DeepSeek V4 introduces utility-style AI pricing in shift beyond China's LLM price war

DeepSeek will launch the official version of its V4 large language model (LLM) in mid-July alongside peak and off-peak API ...