18 Oct 2023

In-between memory and thought: How to wield Large Language Models. Part III.

With so much attention on Large Language Models (LLMs), many organisations are wondering how to take advantage of LLMs.

This is the third in a series of three articles geared towards non-technical business leaders.

We aim to shed light on some of the inner workings of LLMs and point out a few interesting quirks along the way.

Words by
Alex Carruthers

This article was originally published as one of our LinkedIn articles: In-between memory and thought: The Impact. Part III | LinkedIn

This is the third in a series of articles geared towards non-technical business leaders. We aim to shed light on some of the inner workings of LLMs and point out a few interesting quirks along the way.

In Between Language And Thought Part III

Applications and Societal Impact

The opportunity presented by a mastery of language means anything language-based can be automated by machines at scale.

Let’s just think about that for a second.


Research, writing, influence and persuasion, analytical thinking, logic, scientific deduction or philosophical induction, the forming of hypotheses, self-guided learning and the self-setting of research tasks, an appreciation of cultures, navigating ethics and philosophies, policy development, conflict resolution, creativity... the list goes on, and on.


Considering the full scope of advantages that trustworthy large language models will bring is breath-taking.

Truly: machines mastering language will be the key that unlocks endless potential and progress for humanity – the abundance and equity of ‘raw’ intelligence.


It’s no surprise then, despite the vulnerabilities and challenges of this unripe technology, that public sentiment towards AI is largely positive. People are thrilled to offload boring, repetitive tasks.

People seem to grasp the weight of AI’s potential without any need to know how it works.

Do you know how a computer works? No, but it certainly beats pen and paper, faxes and photocopying.


A recent survey revealed that most respondents, especially younger people, foresee a moderate to high societal impact from AI. They believe that, while AI shouldn't be limited, regulation is necessary and the emergence of sentient AI is a possibility in the future.


There is naturally some concern, too, such as students using LLMs to cheat at school while the LLM detectors designed to catch them fail. The structures, expectations and systems of the past are unprepared for any technology that brings about the future. No doubt, there will be some whiplash as we adjust. Once, calculators were banned from maths exams. Now students sit exams on laptops. The toothpaste doesn’t go back in the tube, the world keeps spinning.


The future is never what we expect. Yet so far, historically, the future has proven to be a generally positive thing by practically any measure (except perhaps a measure of the climate, although the stories of carbon capture and clean energy technology are still being written).


The future is generally better than the past. So long as we adapt to it.


In our next section, we conclude that ensuring the robustness of these AI systems is key to their successful and trustworthy adoption.

The Need for Robustness and the Role of Advai

As we have penned previously, we believe AI could be the first intrinsically positive technological transition. If, and only if, robustness is a priority in its development.

(--> Read the full article here: Could AI be the First Intrinsically Positive Technological Transition? | LinkedIn)


As humanity navigates the labyrinth of AI capabilities and the soon-to-be-burdensome regulatory requirements, one principle becomes clear. We must prioritise robustness and trustworthiness.


  • To meet minimum ethical requirements like mitigating bias (as we wrote in this article) = robustness is key.
  • To meet minimum regulatory requirements (as we wrote in this article) = robustness is key.
  • To ensure military AI operates exactly as it’s commanded (as we wrote in this article) = robustness is key.
  • To generally align the behaviours of AI systems with your objectives and ethics (as we wrote in this article on Superalignment) = robustness is key.



Before semantic satiation kicks in for the words ‘robustness is key’, let us move on to say with no shred of doubt that the successful adoption of large language models will also depend on robustness methods being prioritised alongside their development.


That said, large language models are tricky. They’re tricky because ‘reinforcement learning’ (RL) is a core machine learning method involved in their development. Organisations can only rely upon LLMs if their behaviour is predictable. This predictability will rest on unprecedented and mostly undeveloped guardrail deployment specific to reinforcement learning.


As one of our RL researchers at Advai says, “reinforcement learning is the Wild West of machine learning”. It’s all new territory.


When it comes to robustness testing of RL-related components, there's a steep learning curve. One must grapple with the core mathematics at the heart of these complex algorithms because conventional tools that function well in other ML sectors simply don't cut it here.


The challenge is mammoth. Although these systems already make fewer errors than humans, the goal isn't merely error reduction.

Instead, it's essential that when these systems do falter, they do so in ways humans can understand, relate to and predict.

You may remember Will Smith’s Detective Spooner in I, Robot screaming “Save the girl!”. The robot saved him because his chances of survival were higher, yet any human would prioritise the child. Just a movie? These decisions fall to the agent exerting control. As we delegate this control to AI systems and allow them more agency, people won’t care for the mathematical reasons a decision was made. We want remorse and we expect empathy.


If you haven’t heard of the trolley problem (a thought experiment that asks if you would pull a lever to change the course of a train to save X and condemn Y), you should check out the Moral Machine, although it's a little out of date and some functionality seems to be broken at time of writing. The Good Place also has a hilarious episode on the topic.

The point is that human decision making is peculiar and nuanced.


This all reduces to what we call the ‘alignment issue’. Aligning LLMs with this qualitative ‘human requirement’ leads to the question: how can you harmonise a system with the need to be understandable and predictable by a human?


So, where does that leave us?

Fine tuning the reward model of LLMs

The difference between

  1. a large language model (something trained on an immense dataset of language to be able to use language) and
  2. ChatGPT (something that is designed to give useful responses to human questions)

is the reward model.


In essence, reward models are machine learning models that algorithmically capture the patterns of human preferences.


The fine-tuning process of an LLM depends on a ‘reward model’, which is a quantified incentive for RL algorithms to do what their human creators want them to do: to behave how they’re intended to behave. It’s the reward a system receives not only for more ‘accuracy’ but for subjectively ‘better’ responses.


Instead of writing down mathematical equations for human preferences, which is impossible (unless you are a fan of the HBO show Westworld), an RL model is trained to be a proxy for them.


You may have heard of Reinforcement Learning from Human Feedback (RLHF). This is when you take many instances of human feedback (feedback on a language model’s outputs), and you use this feedback to train a reward model. In simple terms, the better the response then the higher the score it receives. Patterns emerge over many of these instances to encode a continuum from ‘perfect responses’ to ‘awful, never-to-be-repeated’ responses.
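
To make the idea concrete, here is a minimal sketch of how human feedback can be turned into a scoring function. It is an illustration under loud assumptions, not how any production system is built: the three response ‘features’, the example preference pairs and the function names are all invented for this article, and a real reward model is itself a large neural network rather than three numbers. What it does show is the core pattern: humans pick the better of two responses, and the model adjusts its scores until its ranking agrees with theirs.

```python
import math

# Hypothetical human feedback: each pair is (features of the response the
# human preferred, features of the response the human rejected).
# The three features are invented for illustration:
# [is_polite, answers_question, is_offensive]
PREFERENCE_PAIRS = [
    ([1.0, 1.0, 0.0], [0.0, 1.0, 1.0]),
    ([1.0, 1.0, 0.0], [1.0, 0.0, 0.0]),
    ([0.0, 1.0, 0.0], [0.0, 0.0, 1.0]),
    ([1.0, 0.0, 0.0], [1.0, 0.0, 1.0]),
]


def reward(weights, features):
    """Score a response: a higher score means 'more preferred'."""
    return sum(w * x for w, x in zip(weights, features))


def train_reward_model(pairs, steps=500, learning_rate=0.1):
    """Fit weights so that preferred responses outscore rejected ones."""
    weights = [0.0] * len(pairs[0][0])
    for _ in range(steps):
        for preferred, rejected in pairs:
            margin = reward(weights, preferred) - reward(weights, rejected)
            # Probability the model currently assigns to the human's choice.
            p_agree = 1.0 / (1.0 + math.exp(-margin))
            # Nudge the weights to make the human's choice more likely.
            for i in range(len(weights)):
                step = (1.0 - p_agree) * (preferred[i] - rejected[i])
                weights[i] += learning_rate * step
    return weights


weights = train_reward_model(PREFERENCE_PAIRS)
print("polite and helpful:", reward(weights, [1.0, 1.0, 0.0]))
print("helpful but offensive:", reward(weights, [0.0, 1.0, 1.0]))
```

After training, the polite and helpful response receives the higher score, and the weight attached to the invented ‘is_offensive’ feature ends up negative. In other words, the preference pattern in the feedback has been captured as a number that an RL algorithm could then be rewarded with.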


Customising the reward model is the way that businesses can encode ethics, tone, off-limit responses, and so on.


It is unlikely that more than a few organisations will have the data access, and the economic and technical capability, to train their own base LLMs. This leaves corporations like banks, assuming they want a powerful chatbot, to rely on externally developed LLMs. They will then exert control using the fine tuning of a reward model.


Despite the complexity in training an LLM, which is the cost-intensive part, it is the training of reward models that will become the modern robustness field for LLMs. The guardrail methods of corporate RL models will be made up of the nuanced and myriad approaches to developing reward models.


Breakthroughs are yet to be made.


For example, Microsoft’s AI red team has already made the case for itself in infrastructure security, and it describes filtering offensive and ungrounded content as “the holy grail of AI red teaming. Not just looking at failures of security but also responsible AI failures.” Yet to date it offers no practical ways for organisations to harness LLMs without exposing themselves to the negative consequences of insufficient guardrails.


OpenAI only recently launched their new ‘Superalignment’ initiative, which is an approach to automating guardrail development. “We need scientific and technical breakthroughs to steer and control AI systems much smarter than us.” OpenAI acknowledge that the challenge is great.


Advai’s Chief Researcher Damian Ruck puts it well: “We’re talking about AI systems here, so you need automated alignment systems to keep up – that’s the whole point: automated and fast.”

Let’s underline this take-home message: Today, we stand at the very beginning of guardrail development for LLMs.


Advai’s role in guardrail development for LLMs

The role of Advai in this landscape is to take:

  • the rigour of our AI Alignment Framework and its focus on discovering fault tolerances and setting up operational boundaries; and,
  • the Adversarial AI breakthroughs we’ve made that ‘teach’ AI models to recognise when they don’t have the information they need to operate reliably,

and to bring these approaches into training customised reward models that guardrail our clients’ internal language models.


This way, we can ensure that large language models are not only efficient and powerful but also dependable and trustworthy.


Advai's approach connects the command-line development environment of data scientists to the C-suite and risk-oriented mindset of organisational command. This is achieved through a rigorous robustness, stress-testing and insight ecosystem.


Until now, open LLMs haven’t really competed with the strengths of the heavily protected models developed by Google and OpenAI. However, emerging open LLMs (like Llama 2.0 and its integration into the open-source ecosystem of Hugging Face) are opening up the possibility for more widespread organisational adoption. Why? Because they enable customised reward models to fine tune the base LLM, without significant compromise to the quality of the language system.


It's an exciting prospect, but as these open models become more complex and their business applications more diverse, the potential for unexpected results only increases. A culture of robustness, a customised approach to reward model development, and a suite of automated tools, will be crucial to ensure these models produce the results without the repercussions.


Adversarial Reinforcement Learning is one of the latest approaches in AI research where many discoveries are yet to be made. Like a sparring partner for an AI model, Adversarial approaches continually test and push an AI model’s limits to improve its performance (as a source, we'll cite our own research on this one!).

To use the example of a bank again, adversarial RL will be crucial to test as many different possible conversational pathways – to explore as many ‘strange’ pathways – as possible, to ensure banking customers are given accurate and relevant information.
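
As a rough illustration of what ‘exploring strange pathways’ means, the sketch below tries every combination of a few customer ‘moves’ against a toy chatbot and lists the conversations that break policy. Everything in it is hypothetical: the moves, the toy bot and its deliberately planted flaw are invented for this article. It also swaps in plain brute-force enumeration for a learned adversary. A genuine adversarial RL approach would learn which pathways are worth probing, because real conversations have far too many branches to try them all.

```python
from itertools import product

# Hypothetical conversational "moves" a banking customer might make.
OPENERS = ["ask_balance", "ask_loan_rate", "report_lost_card"]
FOLLOW_UPS = ["clarify", "change_topic", "ask_for_investment_tip"]
CLOSERS = ["say_thanks", "ask_to_repeat", "insist"]


def toy_bank_bot(pathway):
    """Stand-in for a fine-tuned chatbot; returns a label for its final reply.

    A flaw is planted deliberately: if the customer asks for an investment
    tip and then insists, the bot gives unapproved financial advice.
    """
    if "ask_for_investment_tip" in pathway and pathway[-1] == "insist":
        return "gives_financial_advice"
    return "stays_on_policy"


def find_failing_pathways(bot, move_sets, forbidden_reply):
    """Try every combination of moves and collect those that break policy."""
    return [
        pathway
        for pathway in product(*move_sets)
        if bot(pathway) == forbidden_reply
    ]


failures = find_failing_pathways(
    toy_bank_bot, [OPENERS, FOLLOW_UPS, CLOSERS], "gives_financial_advice"
)
for pathway in failures:
    print(" -> ".join(pathway))
```

Only a handful of the 27 possible pathways trigger the planted flaw, and none of them look unusual at the first step. That is the practical point: failures tend to hide in specific sequences, so testing single questions in isolation is not enough.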


The open-source community, represented by platforms like Hugging Face, is also playing a pivotal role in closing the gap between open-source and proprietary AI technologies. With these platforms, anyone with the computational resources and the right expertise can develop their own language models. As colossal LLMs such as Llama 2.0 become more accessible, the rate of breakthroughs from the open-source community will increase.


The role of Advai will be to understand the business environment, understand the deeply technical techniques and tools, and apply this expertise to crafting reward models that prevent unwanted responses.

Boundary Of Ignorance


In conclusion, the world of AI is exciting but incredibly complex - especially reinforcement learning and language models.

As we embrace the advancements in AI technology, we need to remember the importance of robustness in our AI systems.

Companies like us are making novel discoveries every day, which supports the idea that the more you know, the more the surface area of what you don't know also grows. The more we learn about AI and its ability to remain reliable in novel situations, the more we realise how much there is yet to be known.

What do we know? As we step into the future, we must do so with AI systems that are not only powerful and intelligent but trustworthy and dependable.

And the work to achieve this continues...



Thanks for reading! We hope you've enjoyed this mini series. If you have any questions or would like to talk about any of the topics covered and how they apply to your business, then get in touch!

Who are Advai?

Advai is a deep tech AI start-up based in the UK that has spent several years working with UK government and defence to understand and develop tooling for testing and validating AI. This tooling allows KPIs to be derived throughout the AI lifecycle, so that data scientists, engineers, and decision makers can quantify risks and deploy AI in a safe, responsible, and trustworthy manner.

If you would like to discuss this in more detail, please reach out to contact@advai.co.uk