Technology Stocks : Semi Equipment Analysis
From: Julius Wong, 8/27/2024 2:14:18 PM
Cerebras launches Inference, which runs 20 times faster than Nvidia GPUs

Aug. 27, 2024 1:16 PM ET
By: Brandon Evans, SA News Editor


Cerebras, an artificial intelligence startup based in Sunnyvale, Calif., launched Cerebras Inference today, which it said is the fastest AI inference solution in the world.

"Cerebras Inference delivers 1,800 tokens per second for Llama3.1 8B and 450 tokens per second for Llama3.1 70B, which is 20x faster than NVIDIA (NASDAQ: NVDA) GPU-based hyperscale clouds," the company said in a blog post.

Cerebras Inference is powered by the company's third-generation Wafer Scale Engine. Cerebras claims the solution runs at one-fifth the price of GPU-based competitors while attaining higher speeds by eliminating the memory bandwidth bottleneck.

"Cerebras solves the memory bandwidth bottleneck by building the largest chip in the world and storing the entire model on-chip," the company said. "With our unique wafer-scale design, we are able to integrate 44GB of SRAM on a single chip – eliminating the need for external memory and for the slow lanes linking external memory to compute."
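The bandwidth argument can be illustrated with a back-of-envelope estimate: in single-stream inference, each generated token requires reading essentially all model weights, so memory bandwidth divided by model size gives a rough upper bound on token rate. The figures below are illustrative assumptions (a ~3,350 GB/s HBM figure typical of a current high-end GPU, 2 bytes per parameter for 16-bit weights), not numbers from Cerebras' announcement:

```python
def tokens_per_second(bandwidth_gb_s: float, params_billion: float,
                      bytes_per_param: float = 2.0) -> float:
    """Rough upper bound on single-stream token rate when each token
    requires streaming all model weights from memory."""
    model_gb = params_billion * bytes_per_param
    return bandwidth_gb_s / model_gb

# Hypothetical GPU with ~3,350 GB/s of HBM bandwidth, serving an
# 8B-parameter model at 16-bit precision (16 GB of weights):
gpu_rate = tokens_per_second(3_350, 8)
print(f"{gpu_rate:.0f} tokens/s")  # roughly 209 tokens/s
```

Under these assumptions, weight reads alone cap an 8B model at a few hundred tokens per second per stream on HBM-class bandwidth, which is why keeping the whole model in on-chip SRAM (with far higher aggregate bandwidth) can push rates into the thousands.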

"Cerebras has taken the lead in Artificial Analysis' AI inference benchmarks," said Micah Hill-Smith, co-founder and CEO of Artificial Analysis. "Cerebras is delivering speeds an order of magnitude faster than GPU-based solutions for Meta’s Llama 3.1 8B and 70B AI models. We are measuring speeds above 1,800 output tokens per second on Llama 3.1 8B, and above 446 output tokens per second on Llama 3.1 70B – a new record in these benchmarks."

Cerebras filed for an initial public offering earlier this month, and it is expected to go public during the second half of this year. It also recently appointed two new board members: Glenda Dorchak, a former executive at IBM (IBM), Intel (INTC) and Spansion; and Paul Auvil, former chief financial officer at VMware and Proofpoint.

The startup also took another important step toward going public with the hiring of Bob Komin earlier this month as its chief financial officer. Komin previously served as chief financial officer at Sunrun (RUN), where he led its IPO process. He also served as CFO at Flurry, which was acquired by Yahoo, and Tellme Networks, which was acquired by Microsoft (MSFT).

"Bob has been a key operating leader throughout his career, serving as an entrepreneurial executive in several companies that invented significant technical and business model innovations and rapidly grew to become leaders in their industry," said Andrew Feldman, Cerebras CEO and co-founder. "His broad experience in financial leadership at growth stage and public companies will be invaluable to Cerebras."