Baidu's ERNIE Bot: True Competition?
Created in Partnership with Baidu
For those of us who live in the technology space, you can’t swing a cat without someone mentioning machine learning, artificial intelligence, natural language processing, and tools like ChatGPT. If anything, I’m one of the biggest culprits, as I almost exclusively talk about the semiconductor hardware underlying all of these new innovations.
I’m also a user - I’ve put ChatGPT to use in my research, as it often provides a friendlier interface into a rough topic than Wikipedia ever could. It has provided entry paragraphs for upcoming presentations I’m giving. But I also use it to spitball ideas for video titles on my YouTube channel, or to iterate on them with feedback. For example, last week I spent most of a train journey from Stuttgart to Zurich trying to get it to suggest a title for an upcoming video I’m working on about quantum-safe cryptography. I wanted a title simple enough, yet expressive enough, to be exciting to new subscribers. I’ve described using ChatGPT in this way as cutting down what could be 6+ hours of thinking of a title to a good 30 minutes of prompt engineering: generating a thousand ideas, one of which might be good, rather than spending hours to come up with ten ideas myself. Recently I’ve also looked into audio emulation tools, ones that can take a recording of my voice and then present it back to me saying different words. Like ChatGPT, that model is 80% of the way there, and I’m already using some of it in my content as basic voice-over.
The point is that AI tools aren’t going anywhere, and based on where I am and what language I speak, the English tools are at my fingertips. OpenAI recently launched GPT-4, the next evolution in these models, and despite an international focus, it’s quite clear that a good chunk of the big data used to train it is in English. One of the big use cases and demos for these models is often translation, or asking similar questions in different languages, but there is always a question of whether they’re culturally neutral, or subconsciously skewed towards whoever made them and the data used to train them.
So enter ERNIE, or ERNIE Bot, from Baidu in China. Baidu makes it clear that while ERNIE Bot is proficient in multiple languages, the goal of the model is to serve the Chinese market, and as a result it is culturally optimised (my words) to give a Chinese context to its output. Baidu is China’s big search engine, akin to Google in the West, and the goal with ERNIE Bot is to be the major natural language processing model for that market. Baidu specifically points to GPT-4, LaMDA (Google), LLaMA (Meta), and others as good, but not China-focused.
ERNIE Bot is over a decade in the making, but today marks the first major public release of the interface, built on both Baidu’s ERNIE and PLATO models. Baidu’s goal here isn’t simply to create a conversational interface (which looks a lot like ChatGPT’s), but to enable a full overarching foundational model set useful for other industries.
At the launch event, we saw the demo (not sure if it was a live demo, but taking it at face value) respond specifically to Chinese idioms, provide context for why they are what they are in relation to society today, and also create poetry from a given prompt. What was a step beyond what I’ve seen in ChatGPT, however, is that ERNIE Bot will also do image creation (similar to Midjourney, it seems) live in the chat window, provide context on the image it just created, and then export spoken audio in different Chinese dialects. On top of all this, the interface can also create an AI-generated video.
Baidu is calling its interface a multi-modal design, which as a content creator I can get behind fully. However, as with any image/video generation tools, it will have to be put through some stringent paces to see how it was trained, especially if it starts to reproduce potentially copyrighted content. The news that Stable Diffusion was trained on copyrighted images, or that code generation tools like GitHub’s Copilot sometimes return code under license, has made every content creator I speak to a little uneasy about using these tools. I suspect that a lot of these NLP environments, just like ChatGPT and Bing’s tool, will get put through a metaphorical mangle over time to prove their validity. That also means being accurate with facts, and being accurate with mathematics - things that are inherently rigid but might get distorted due to the fuzzy nature of machine learning.
Baidu’s goal here is to have a number of big models for each industry, not simply natural language processing. ERNIE is the big language model, and that is going to be filtered into health, finance, code, search, ERNIE Bot, and different variants for each market that Baidu plays in, as well as for Baidu’s partners and customers. There is also a computer vision model, and science models, all under the same machine learning/deep learning banner with a unified API.
CEO Robin Li had a slide on stage which is actually a really important hook for Baidu. He stated that modern IT infrastructure is built on three layers: hardware, operating system, and software. He sees the future extending to four layers, with a specific AI/model layer in there as well. You could argue where it fits, but Baidu’s explanation is that it is the only player with deep fingers in all of the segments:
For hardware, Baidu has its Kunlun 2 chips for its cloud infrastructure, although they are focused on inference and we don’t have too many details (I wish I could go into architectural detail, if anyone from that team is reading).
For the operating system, Baidu launched a conversational AI-optimized OS in 2017, although new information on that has been scarce. Baidu has lots of experience in creating operating systems, with a quick Google search showcasing a half-dozen, although I think what Baidu is referring to here is more an optimized cloud infrastructure for its hardware and AI workloads.
For models, there’s ERNIE and others. Baidu has its own AI framework called PaddlePaddle, a competitor to PyTorch, TensorFlow, and others (a minimal sketch of what PaddlePaddle code looks like follows this list).
For software, there’s the API stack.
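For readers who haven’t touched PaddlePaddle, here’s a minimal sketch of what defining and training a tiny model looks like in its 2.x API - the structure will feel familiar to anyone who has used PyTorch. The layer sizes and dummy data are purely illustrative, and this is obviously nothing like ERNIE itself.

```python
# Minimal PaddlePaddle 2.x sketch - illustrative only, not Baidu's ERNIE code.
import paddle
import paddle.nn as nn

class TinyNet(nn.Layer):              # paddle.nn.Layer is the analogue of torch.nn.Module
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(16, 1)    # a single linear layer, sizes chosen arbitrarily

    def forward(self, x):
        return self.fc(x)

model = TinyNet()
opt = paddle.optimizer.Adam(learning_rate=1e-3, parameters=model.parameters())

x = paddle.randn([8, 16])             # dummy batch of 8 samples
loss = paddle.mean((model(x) - 1.0) ** 2)
loss.backward()                       # autograd, then one optimizer step
opt.step()
opt.clear_grad()
```

The point of including it is simply that the framework layer Baidu keeps referencing is a real, general-purpose deep learning stack, not just an internal tool for ERNIE.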
In order for conversational AI to work at the scale of search (Baidu has 70-80% of the Chinese search market), the argument is that you need a full-stack solution, end-to-end, for it to work economically. A fully AI-based search engine ecosystem, because of the energy cost of each conversational AI interaction, doesn’t scale to hundreds of millions of searches, and as a result a highly optimized stack is required. This is why we’re seeing Microsoft offer its Bing chatbot to small groups to start, and why Google would have to leverage a mountain of its TPUs to enable it for Google Search.
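To make that concrete, here’s a rough back-of-envelope sketch. Every number in it is an assumption picked purely for illustration - per-query inference costs aren’t public for any of these services - but the shape of the problem is clear: even a modest per-query cost multiplied by search-scale volume becomes an enormous daily bill.

```python
# Back-of-envelope sketch: why LLM inference cost matters at search scale.
# Every number below is an assumption chosen purely for illustration.
traditional_cost_per_query = 0.0002        # assumed $/query for classic keyword search
llm_cost_per_query         = 0.01          # assumed $/query for a conversational LLM answer
queries_per_day            = 500_000_000   # assumed daily query volume for a large engine

for label, cost in [("traditional search", traditional_cost_per_query),
                    ("conversational AI", llm_cost_per_query)]:
    daily = cost * queries_per_day
    print(f"{label:>20}: ${daily:>12,.0f} per day, ${daily * 365:>15,.0f} per year")

# With these assumed numbers the LLM path is ~50x more expensive per query.
# That gap is what a vertically optimized hardware/framework/model stack
# would need to close for AI-first search to be economically viable.
```

Swap in whatever numbers you believe; the multiplier is the argument Baidu is making for owning all four layers.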
Conversational bots consume huge compute power - they are big models, use massive amounts of data, and are quite expensive. If the four layers are coordinated harmoniously, the synergy between them can be optimized to make the system more efficient than others, especially between the model layer and the framework layer, and the past months have shown the benefits of that synergy.
Globally speaking, companies that are leading in all four layers are rare - Baidu may be unique. When we officially announced the launch of ERNIE Bot on Feb 7th, more than 650 companies said that they will join the ERNIE Bot ecosystem. Many companies understand this is a big opportunity. Conversational AI bots represent a new technology paradigm, with an explosive growth of new opportunities. In terms of the positioning of ERNIE Bot, there is a general foundation model, and there will be a big model for energy, a big model for transport, a big model for media, and industry-specific big models for each industry. This will influence not only search and IT, but each and every company in the world.
ERNIE Bot was built with super powerful NLP expression and reasoning capabilities, which enables companies to be closer to their customers. That means any company can use its powerful capabilities to create a better user experience, retain users, and be more competitive. For these companies, or for any company, ERNIE Bot would be a very good opportunity. By 2030, workers could have their output improved 4x due to AI.
CTO Wang Haifeng (taken from live translator)
Obviously this all matters within context - Baidu is going to be focused on the Chinese market, and while it is improving the model’s AI capabilities in English, it’s going to be very home-market focused. Use of China-based AI engines in the West is going to be limited for a number of reasons - the same reasons why the tools I use today likely aren’t used in China. Baidu is going to focus not only on its own services, but on the services it can help others create - that means getting local developers on board to use these tools, and creating machine learning ‘as a service’ in as many corners as it can.
I go back to the question in the headline - is Baidu’s ERNIE Bot true competition for the tools I use today? The answer is both yes and no, but also maybe. I like the fact that, if it works as demonstrated, the ability to combine not only conversational AI but also image creation and voice response in the same tool is moving towards a ‘One True Interface’. The fact that it’s China-focused, as a monoglot (I only speak one language), means I’m unlikely to interact with it. But therein lies the competition - if Baidu is a walled garden inside China, and ChatGPT is a walled garden outside of China, are they really competitors?
As a final thought, I’d be really interested to see what Baidu’s calculations are for replacing its search with conversational AI, like ERNIE Bot, and how close they could get it to the cost of search today. I suspect that to get there, we need super-optimized hardware, and if there’s anything you should take away from this newsletter, it’s that I like talking about hardware. If anyone at Baidu is reading this, let me know if you want to discuss Kunlun, or anything else you’re working on!
Interesting write up. Tks Dr. I 👍🏾
How can Baidu or anyone else in China continue to advance their large language model work without access to <14nm semiconductor technology? Or anything more powerful than an Nvidia A100?