How to enhance GenAI’s problem-solving
In 2024, enterprise software companies are betting on generative AI in a quest to enhance productivity. OpenAI has recently released GPT-4o, which adds interpretation and generation of voice and vision. Ravi Sawhney discusses how to incorporate this technology into end-user workplace tools and introduces the concept of multi-agent workflows, an approach that could allow organisations to imitate entire knowledge teams.
Back in 2021 I wrote a piece for LSE Business Review in which I demonstrated the power of OpenAI’s GPT-3 in interpreting human language by converting it into code. At the time, the technology was in its infancy and didn’t generate the spark that ChatGPT did when it was released to the public in November 2022; that moment truly ignited the generative AI (GenAI) boom. Here I offer some personal thoughts on why GenAI matters and the challenges of using it for work. I also introduce the concept of multi-agent workflows as a way to extend what this technology can achieve.
It’s all about productivity
In 2024, nearly all enterprise software companies are making bets on GenAI, which has perhaps taken some of the limelight away from established machine learning approaches, such as supervised and unsupervised learning, that remain crucial parts of any complete AI framework. The reason organisations are doing this ties back to what got me interested in the technology in the first place: productivity.
In its most basic form, GenAI can be thought of as the most powerful autocomplete technology we have ever seen. The ability of large language models (LLMs) to predict the next word is so good that they can step in and perform knowledge-worker tasks such as classification, editing, summarisation and question answering, as well as content creation.
Additionally, variations of this technology can operate across modalities, much like human senses, to include interpretation and generation across voice and vision. In fact, in 2024 the nomenclature is shifting from LLMs to large multimodal models (LMMs), and the recent release of GPT-4o from OpenAI is evidence of this. Whether the step-in process is advisory, with a human in the loop, or full-blown automated decision-making, it is not hard to see how GenAI has the potential to deliver a transformational boost to labour productivity across the knowledge-working sector. A recent paper on this very topic estimated that, when used to drive task automation, GenAI could boost labour productivity by 3.3 percentage points annually, adding $4.4 trillion of value to global GDP.
The productivity benefits perhaps take us closer to the aspiration Keynes expressed when he wrote Economic Possibilities for our Grandchildren in 1930, in which he forecast that in a hundred years, thanks to technological advancements improving the standard of living, we could all be working 15-hour weeks. This sentiment was echoed by Nobel Prize-winning economist Sir Christopher Pissarides, who said ChatGPT could herald a four-day work week.
So, if the potential to meaningfully transform how we work is right in front of us and is being developed at breakneck speed, then how do we bridge the gap to make this possibility a reality?
Trust and tooling
Two typical challenges need to be considered when incorporating this technology into end-user workplace tools. The largest, arguably, is managing trust. By default, LLMs do not have access to your private information, so asking one about a very specific support issue with your own product will often produce a confident but inaccurate response, commonly referred to as a “hallucination”. Fine-tuning an LLM on your own data is one option, albeit an expensive one given the hardware requirements. A much more approachable method, now commonplace in the community, is retrieval-augmented generation (RAG). Here, your private data is brought into the prompt by using embeddings to look up the documents most relevant to a given query. The response is then synthesised from this data along with the LLM’s existing knowledge, resulting in something that could be considered genuinely useful, albeit with some appropriate user guidance.
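To make the mechanics concrete, here is a minimal sketch of the RAG pattern in Python. The `embed` function is a toy stand-in (a real system would call a trained embedding model), and the “X100” support documents are invented purely for illustration.

```python
import numpy as np

# Toy stand-in for a real embedding model: hashes words into a fixed-size
# unit vector. A production system would call a trained embedding model.
def embed(text: str, dim: int = 256) -> np.ndarray:
    vec = np.zeros(dim)
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# Private documents the base LLM has never seen (invented examples).
documents = [
    "Resetting the X100 router requires holding the recessed button for 10s.",
    "The X100 warranty covers hardware faults for 24 months from purchase.",
]
doc_vectors = [embed(d) for d in documents]

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k documents most similar to the query (cosine similarity)."""
    q = embed(query)
    scores = [float(q @ v) for v in doc_vectors]
    ranked = sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)
    return [documents[i] for i in ranked[:k]]

def build_prompt(query: str) -> str:
    """Ground the LLM's answer in the retrieved private data."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("How do I reset my X100 router?"))
```

The final prompt, context plus question, is what gets sent to the LLM, which is why the model can answer accurately about data it was never trained on.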
The second challenge is maths. While an LLM, with some careful prompting, can create a unique, compelling and (importantly) convincing story from scratch, it will struggle with basic to intermediate maths, depending on the foundation model you are using. Here the community has introduced the concept of tool use, sometimes referred to as agents. In this paradigm, the LLM categorises the query being asked and, rather than trying to answer it directly, calls the appropriate ‘tool’ for the job. For example, if asked about the weather outside, it might call a weather API service. If asked to perform arithmetic, it would route the query to a calculator. And if it needs to retrieve information from a database, it might convert the request to SQL or Pandas code, execute it in a sandbox environment and return the result to the user, who might be none the wiser about what is going on under the hood.
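A toy illustration of this routing step might look like the following, where a simple keyword check stands in for the LLM’s own tool-selection (in practice done via function calling) and the weather tool is a stub rather than a real API:

```python
import ast
import operator

# Safe arithmetic evaluator: the 'calculator' tool.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def calculator(expression: str) -> float:
    """Evaluate a basic arithmetic expression without using eval()."""
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.Constant):
            return node.value
        raise ValueError("unsupported expression")
    return ev(ast.parse(expression, mode="eval"))

def weather(city: str) -> str:
    # Stub: a real tool would call a weather API service here.
    return f"Forecast for {city}: 18C, light rain."

def route(query: str) -> str:
    """Stand-in for the LLM's tool-selection step."""
    if any(ch.isdigit() for ch in query):
        return str(calculator(query))
    if "weather" in query.lower():
        return weather("London")
    return "No tool needed; the LLM answers directly."

print(route("12 * (3 + 4)"))          # routed to the calculator tool -> 84.0
print(route("What is the weather?"))  # routed to the weather tool
```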
The potential of multi-agent workflows
Agent frameworks with tooling are expanding the possibilities of how LLMs can be used to solve real-world problems today. However, they still largely fall short of performing complex knowledge work, owing to limitations such as a lack of memory, planning and reasoning capabilities. Multi-agent frameworks present an opportunity to tackle some of these challenges. A good way to understand how they could work is to draw an analogy with System 1 and System 2 thinking, popularised by Daniel Kahneman.
Think of System 1 as your gut instinct: fast, automatic and intuitive. In the world of LLMs, that’s like the model’s ability to generate human-like responses based on its vast training data. In contrast, System 2 thinking is slower, more deliberate, and logical, representing the model’s capacity for structured, step-by-step problem-solving and reasoning.
To fully unleash the potential of LLMs, we need to develop techniques that leverage both System 1 and System 2 capabilities. By breaking down complex tasks into smaller, manageable steps, we can guide LLMs to perform more structured and reliable problem-solving, akin to how humans would solve challenges.
Consider a team of agents, each assigned a specific role through prompt engineering, working together towards a single goal. That’s essentially what agent workflows, sometimes called agentic workflows, do for LLMs. Each agent is responsible for a specific subtask, and the agents communicate with each other, passing information and results back and forth until the overall task is complete. By designing prompts that encourage logical reasoning, step-by-step problem-solving and collaboration, we can create a system that mimics the deliberate, rational thinking associated with System 2.
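As a rough sketch, assuming a hypothetical `call_llm` function in place of a real chat-completion API, a two-agent workflow might look like this:

```python
from dataclasses import dataclass

def call_llm(prompt: str) -> str:
    # Stub: a real implementation would call a chat-completion API here.
    return f"<model response to: {prompt[:60]}...>"

@dataclass
class Agent:
    name: str
    role_prompt: str  # the specialism assigned via prompt engineering

    def act(self, task: str, history: list[str]) -> str:
        # Each agent sees its role, the shared transcript, and the task.
        prompt = "\n".join([self.role_prompt, *history, f"Task: {task}"])
        reply = call_llm(prompt)
        history.append(f"{self.name}: {reply}")  # shared transcript = memory
        return reply

analyst = Agent("Analyst", "You break problems into concrete, ordered steps.")
reviewer = Agent("Reviewer", "You check the Analyst's plan for errors and gaps.")

history: list[str] = []  # the conversation passed between agents
task = "Estimate next quarter's support-ticket volume."
plan = analyst.act(task, history)
reviewer.act(f"Review this plan: {plan}", history)
print("\n".join(history))
```

Real frameworks add loops, termination conditions and richer message schemas, but the core pattern, specialised prompts sharing a transcript, is exactly this.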
Here is where this gets exciting: agent workflows could allow us to imitate entire knowledge teams. Imagine a virtual team of AI agents, each with its own specialism, collaborating to solve problems and make decisions just like a human team would. This could revolutionise the way we work, allowing us to tackle more complex challenges with zero or minimal human-in-the-loop supervision. It also opens up the possibility of simulating how teams would react to events in a sandbox environment, with every team member modelled as an agent in the workflow. The conversational outputs could even be saved for later retrieval, serving as long-term memory.
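Continuing the sketch above, persisting that shared transcript for later retrieval is a small addition; the filename and JSON-lines format are arbitrary choices, and the saved turns could then be embedded and looked up RAG-style in future sessions:

```python
import json

# Append each turn of the agents' transcript to a simple JSON-lines file.
# `history` is the list built up in the previous sketch.
with open("team_memory.jsonl", "a", encoding="utf-8") as f:
    for turn in history:
        f.write(json.dumps({"turn": turn}) + "\n")
```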
By combining the raw power of System 1 thinking with the structured reasoning of System 2, we can create AI systems that not only generate human-like responses but also work through complex, multi-step problems. The future of work is here, and it’s powered by the symbiosis of human ingenuity and artificial intelligence.
- Author’s disclaimer: All views expressed are my own.
- This blog post represents the views of the author(s), not the position of LSE Business Review or the London School of Economics and Political Science.