The Chinese quant fund-turned-AI pioneer
One of the pack of Chinese AI hopefuls trying to take on the likes of OpenAI comes from an unusual source: a quant fund dominating the country’s financial sector.
High-Flyer Capital Management, a Chinese quantitative hedge fund, has grown into a roughly Rmb60bn ($8bn) asset manager since its launch in 2015, partly using AI and algorithms to identify patterns or variables that could affect stock prices.
Now it has parlayed that knowledge and infrastructure into a powerful, publicly released AI model that experts say is on a par with leading western efforts. The model, DeepSeek-V2, can answer questions, write code and reason.
DeepSeek-V2 costs significantly less to use than its rivals, at about Rmb2 for every million output tokens (the words or word fragments returned in response to a query). The low price has sparked a price war among Chinese artificial intelligence providers.
A week after its launch in May, technology giant ByteDance cut prices to as low as Rmb0.60 per million output tokens. Rival Alibaba then cut usage prices for some of its models by as much as 97 per cent and Baidu made two of its Ernie models free.
The rollout of the new model, which has quickly attracted thousands of Chinese developers, highlights how, even with early leads in generative AI, tech giants such as Baidu and Alibaba face fierce competition from nimbler upstarts. It has also put a spotlight on China's intensely competitive generative AI race.
“The gap between the US and China isn’t as big as everyone thinks,” Liu Qingfeng, founder of Chinese AI group iFlytek, told a recent tech gathering in Macau. “In a lot of verticals our [models] are better than theirs.”
DeepSeek's development is funded by its sister hedge fund, High-Flyer, whose funds have returned 151 per cent since 2017, or 13 per cent annualised, despite China's battered domestic stock market. Over the same period, the country's benchmark CSI 300 index, which tracks China's top 300 stocks, has risen 8 per cent, according to research provider Simu Paipai.
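As a rough consistency check on High-Flyer's 151 per cent cumulative and 13 per cent annualised returns, assume annual compounding (an assumption here; the source does not say how the annualised figure was calculated). Solving for the number of years n gives a holding period of about seven and a half years, in line with a period starting in 2017:

```latex
(1 + r)^{n} = 1 + 1.51 = 2.51, \qquad r = 0.13
\;\Rightarrow\;
n = \frac{\ln 2.51}{\ln 1.13} \approx \frac{0.920}{0.122} \approx 7.5 \text{ years}
```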
In February, Beijing cracked down on quant funds, blaming a stock market sell-off at the start of the year on their high-speed algorithmic trading. Since then, High-Flyer’s funds have trailed the CSI 300 by four percentage points.
High-Flyer and DeepSeek did not respond to requests for comment.
The quant fund got its start in a Chengdu apartment, where founder Liang Wenfeng, a computer science graduate of Zhejiang University, experimented with automated stock trading, according to local media reports. His profile on the registry of the Asset Management Association of China says he was a freelancer until 2013, when he incorporated his first investment firm.
By 2021, all of High-Flyer's strategies were using AI, according to manager Cai Liyu, with approaches similar to those pioneered by the hugely profitable hedge fund Renaissance Technologies. "AI helps to extract valuable data from massive data sets which can be useful for predicting stock prices and making investment decisions," he said during an online roadshow that year.
Cai said the company's first computing cluster had cost nearly Rmb200mn and that High-Flyer was investing about Rmb1bn to build a second supercomputing cluster, which would stretch across an area roughly the size of a football pitch. Most of the company's profits went back into its AI infrastructure, he added.
The second cluster, now complete, connects more than 10,000 of Nvidia’s cutting-edge processors with servers and storage, giving DeepSeek the computing power to train a large model, according to archived versions of the company’s website. The group acquired the Nvidia A100 chips before Washington restricted their delivery to China in mid-2022.
“We always wanted to carry out larger-scale experiments, so we’ve always aimed to deploy as much computational power as possible,” founder Liang told Chinese tech site 36Kr last year. “We wanted to find a paradigm that can fully describe the entire financial market.”
According to Guosheng Securities, the company is one of only six Chinese groups with more than 10,000 A100 processors, a number commonly regarded as the computational threshold for training large models in-house. The other five are all Chinese tech giants, though their collective computing power pales in comparison with that of US companies. Meta has said it will have computing power equal to nearly 600,000 of Nvidia's more advanced H100 chips by the end of the year.
Tests run by research groups rank DeepSeek-V2 among the top large language models in the world. Researchers at the University of Waterloo in Canada placed it in the top 10, behind OpenAI's GPT-4, Anthropic's Claude and Chinese rival 01.AI.
DeepSeek’s model is also open source, allowing AI researchers to inspect its structure and copy it.
“The model’s architecture is very unique,” said Andrew Carr, chief scientist at Cartwheel, an AI animation start-up based in the US. “DeepSeek has taken this idea called mixture of experts, where you split up a model into smaller chunks, to the extreme, with hundreds of small experts.”
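To illustrate the general mixture-of-experts idea Carr describes, here is a minimal sketch of a generic top-k mixture-of-experts layer in plain Python. It is not DeepSeek's code or its actual architecture: the class name `TopKMoE`, the use of random linear maps as "experts", the count of 128 experts and the choice of four active experts per input are all hypothetical values picked for illustration. A small router scores every expert, only the highest-scoring few are run, and their outputs are blended with softmax weights.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: turns raw scores into weights summing to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def make_expert(dim: int, rng: random.Random) -> Callable[[Vector], Vector]:
    """Hypothetical 'small expert': a single random dim x dim linear map
    (real models use small feed-forward networks instead)."""
    weights = [[rng.gauss(0.0, 1.0 / math.sqrt(dim)) for _ in range(dim)]
               for _ in range(dim)]

    def expert(x: Vector) -> Vector:
        return [dot(row, x) for row in weights]

    return expert


class TopKMoE:
    """Generic top-k mixture-of-experts layer (illustrative, not DeepSeek's design)."""

    def __init__(self, dim: int, num_experts: int, k: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.k = k
        self.experts = [make_expert(dim, rng) for _ in range(num_experts)]
        # The router holds one scoring vector per expert.
        self.router = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                       for _ in range(num_experts)]

    def forward(self, x: Vector) -> Tuple[Vector, List[int]]:
        # 1. Score every expert for this input (cheap: one dot product each).
        scores = [dot(row, x) for row in self.router]
        # 2. Keep only the k highest-scoring experts.
        chosen = sorted(range(len(scores)), key=lambda i: scores[i],
                        reverse=True)[: self.k]
        # 3. Turn the chosen experts' scores into mixing weights.
        gates = softmax([scores[i] for i in chosen])
        # 4. Run only the chosen experts and blend their outputs.
        out = [0.0] * len(x)
        for gate, i in zip(gates, chosen):
            y = self.experts[i](x)
            out = [o + gate * yi for o, yi in zip(out, y)]
        return out, chosen


# Usage: 128 small experts, but only 4 of them run for any given input.
layer = TopKMoE(dim=8, num_experts=128, k=4)
rng = random.Random(1)
x = [rng.uniform(-1.0, 1.0) for _ in range(8)]
out, chosen = layer.forward(x)
print("experts used:", chosen)
print("output size:", len(out))
```

Because only `k` experts run per input, splitting a model into many small experts can raise its total capacity without a matching rise in the computation spent on each query, which is one plausible reason the design is associated with cheaper inference.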
Carr said the model came close to matching Meta's latest Llama 3 model but at a lower price. Its price is about one-hundredth of the cost of OpenAI's GPT-4 and a fifth of that of Anthropic's Claude 3 Haiku.
Tiezhen Wang, an engineer at New York-based AI research hub Hugging Face, said DeepSeek’s team had reduced what the model needed to remember while allowing it to “handle more tasks at the same time without slowing down”.
Inside China, the pricing strategy has helped sign up developers. Wang Zixu, a programmer based in northern China, said he had switched from using OpenAI’s GPT-4 for coding help to DeepSeek because of the lower prices.
Despite its cost advantage, some industry experts said DeepSeek could be losing money at such low prices. Its computing power may also fall further behind that of rivals as Nvidia releases new chips that are barred from export to China.
Still, High-Flyer’s AI offshoot is aiming to be the first to achieve artificial general intelligence, the point at which machines have greater cognitive capabilities than humans.
“We believe AGI is the violent beauty of model x data x computing power,” said one job recruitment advertisement for DeepSeek. “Embark on a ‘deep quest’ with us on the journey towards AGI!”
Additional reporting by Nian Liu in Beijing