31 Jul 2023

Aligning MLOps with Regulatory Principles

A recap of Chris Jefferson's talk centred on MLOps and impending AI regulations. He highlighted challenges in deploying ML systems, focusing on compliance, risk, and harm prevention. Key discussions included security risks in ML models, the importance of understanding the impact of use cases, and managing failure modes. Chris also emphasised the societal impact of AI and the need to bridge the gap between public perception and technological realities. He advised on streamlining MLOps processes for efficiency and efficacy, and aligning development with the principles underlying regulation.  

Words by
Alex Carruthers

This article is the blog format of a talk given by Chris Jefferson (Advai CTO) at MLOps LDN, which you can watch here: Advai_Aligning MLOps with Regulatory Principles (vimeo.com)

Find out more about the original event here: MLOps LDN: Talks on AI Regulation & Production-Grade Pipelines (seldon.io)

Applications and Societal Impact

The first talk at the meet-up, delivered by Chris Jefferson, presented a future-proofing approach for AI regulation. This method involves aligning MLOps with business and regulatory principles while achieving innovation targets and tracking relevant metrics. 

In addition to model accuracy, it is crucial for data scientists and ML engineers to consider regulatory risks, robustness risks, and public backlash risks. Each stakeholder within the organization has specific concerns, with data and tech teams overseeing model capabilities, authorisation, and compliance, risk and compliance managers assessing risk management, and C-suite employees focusing on business needs and KPIs.

AI systems that fail to align with regulatory principles are halted in their tracks – a recent case study being the banning of ChatGPT in Italy over GDPR issues. The underlying principles of ‘Responsible AI’ development include risk management (judged through impact-based assessment), reliability (trustworthiness, understandability, interpretability), accountability (traceability, documentation, compliance), security (mitigation of security threats), and human centricity (consistent data quality and ethics for everyone affected).

In accordance with these principles, nations have started to build a regulatory landscape based on their own standards: horizontal legislation (EU), agile and domain-based regulation (UK), and vertical, state-based legislation (US).

Chris and his team at Advai research where models go wrong instead of simply aiming to maximise accuracy. Each regulatory measure is viewed through the lenses of compliance, risk, and harm, so that each aspect of the AI development process can be sectioned appropriately. This framework captures risk and delegates it to stakeholders throughout the AI lifecycle, identifies KPIs and metrics for audit and approval, and enables successful deployments. The MLOps lifecycle can further help overcome friction points, creating fertile ground for the emergence of AI standards and hubs.

Some essential tips to ensure this approach functions smoothly:

  • Understand the use-case and context to determine human impact
  • Assign a level of risk to gauge the potential impact of the completed AI model and its use-case
  • Track metrics aligned to the use-case throughout deployment – the process of taking a trained machine learning model and making it available in a production environment, which includes creating the necessary infrastructure, integrating the model into the target system or application, and ensuring its reliability, scalability, and performance in real-world scenarios
  • Incorporate robustness and resilience to out-of-distribution inputs from model inception
  • Design for security, incorporate cyber practices
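
The tips above can be sketched as a lightweight record that travels with the model through the MLOps lifecycle. Everything here – the `UseCaseRecord` class, the `RiskLevel` tiers, and the metric names – is a hypothetical illustration of the idea, not Advai's actual tooling:

```python
from dataclasses import dataclass, field
from enum import Enum

class RiskLevel(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

@dataclass
class UseCaseRecord:
    """Hypothetical record tying a model to its use-case, risk level and metrics."""
    name: str
    human_impact: str          # who is affected by the model's decisions, and how
    risk_level: RiskLevel
    tracked_metrics: dict = field(default_factory=dict)

    def log_metric(self, metric: str, value: float) -> None:
        # Append to a history so metrics can be audited over time
        self.tracked_metrics.setdefault(metric, []).append(value)

    def requires_review(self) -> bool:
        # High-risk use-cases pass through a manual approval gate
        return self.risk_level is RiskLevel.HIGH

record = UseCaseRecord(
    name="loan-default-scoring",
    human_impact="affects applicants' access to credit",
    risk_level=RiskLevel.HIGH,
)
record.log_metric("accuracy", 0.91)
record.log_metric("ood_error_rate", 0.07)
print(record.requires_review())  # True: high-risk, needs sign-off before deployment
```

The point of the sketch is that risk level and human impact live next to the technical metrics, so approval decisions can be made from one artefact.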

Here's the talk, reformatted for the blog:

Tonight's talk is about regulation. We'll look at upcoming regulations and ways to prepare for them in MLOps.

A lot of today's discussion is about supporting decision-makers in building, designing, and deploying ML systems. There's a significant amount of technical work involved. Common questions include whether we're meeting innovation targets by integrating AI into our systems and if the projects align with our business objectives. Is the AI we're using better than existing solutions? Are we using the latest technology, like advanced algorithms or deep learning?

An important aspect we're seeing is adherence to basic principles. There are existing principles to consider, but a gap lies in linking to risks associated with deployment, such as failure risks. This has become a significant blocker for AI system adoption, especially at higher risk levels.

As for my background, I'm Chris, one of the co-founders of Advai. We've been around for three years, focusing on understanding what causes AI to fail and how to measure and mitigate it. We started with adversarial AI – using it to break systems – and we've learned this is just one part of understanding AI system failure.

In our work, we've solved many technical questions using accuracy measurements and other criteria. As we move through deployment stages, other risks emerge, like regulatory risks. With upcoming AI regulation, there's more focus on compliance. Questions arise about accountability for system performance in the wild. Understanding this risk is challenging.

Then there's robustness risk: how the system performs with unknown data, handles edge cases, and deals with unknown circumstances. Understanding system limits and these risks is vital, as it affects adoption and public trust. There's backlash against some systems, either from well-informed or misinformed public opinion, causing concerns about unintended societal harms.
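
One rudimentary way to surface the robustness risk described here is to flag inputs the model is not confident about. The sketch below uses the maximum-softmax-probability baseline; the 0.6 threshold is an illustrative value that would in practice be calibrated on held-out data, and the logits are invented:

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of raw model scores
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def is_out_of_distribution(logits, threshold=0.6):
    """Flag an input as possibly out-of-distribution when top-class confidence is low.

    This is the maximum-softmax-probability baseline: a crude but widely
    used first pass, not a complete OOD detector.
    """
    return max(softmax(logits)) < threshold

print(is_out_of_distribution([4.0, 0.5, 0.2]))  # confident prediction -> False
print(is_out_of_distribution([1.1, 1.0, 0.9]))  # near-uniform scores -> True
```

Flagged inputs can be routed to a human or logged as a tracked failure mode rather than silently answered.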

So, we're focusing on three themes: compliance, risk, and harm. Understanding stakeholders in the MLOps lifecycle is key. Data scientists and engineers often feel frustrated when great systems they develop hit approval roadblocks. Regulatory users and higher-ups, like managers and directors, face challenges in understanding and approving AI systems due to a lack of information. The public, as another stakeholder, has concerns about how AI affects them and whether they can trust these systems.

Regarding a recent case study, Italy banned ChatGPT, not due to upcoming regulations but because of GDPR challenges. Concerns included the accuracy of ChatGPT's information, data usage and storage, and protecting young users. In response, OpenAI launched its Superalignment project to address these concerns, aiming to build trust and a more robust system.

Legislation worldwide shares common themes in AI regulation, focusing on risk, reliability, accountability, security, and human-centricity. The EU AI Act, the UK white paper, and the U.S. Bill of Rights are examples of this trend.

Looking ahead, the EU AI Act and UK white paper are progressing. We expect specific regulatory guidelines and governmental functions to emerge. The U.S. Bill of Rights is a bit further out. By 2025-26, AI regulation will likely be business as usual, especially in regulated sectors and high-risk use cases.

Preparing for this landscape means focusing on compliance, risk, and harm. At Advai, we've developed methods to stress test AI systems and understand their failure limits. This understanding is crucial for robust system design.

Key takeaways include ensuring an assurance process, understanding use cases, tracking failure modes, involving stakeholders, and generating evidence of mitigating challenges.

In summary, aligning MLOps with principles of accountability, reliability, risk management, and protection against threats is crucial. Understanding stakeholders, from data scientists to the public, and their challenges in AI deployment is essential. Documenting evidence of addressing risks and failures is vital for auditability.

Finally, applying these principles in practice involves understanding use cases, tracking appropriate metrics, accounting for failure modes, ensuring robust design, securing endpoints, and documenting all processes for auditability.

Q&As at the talk

Question 1: Could we get an example of where the ML model has a security risk? Could this be the risk of having code leaked outside of the company?

Answer 1: Yes, a common risk involves using prompt engineering with large language models (LLMs) to recreate specific sequences of text. This can be dangerous as it potentially allows extraction of not just the code, but also insights into how the model was trained. A simpler technique, where endpoints return excessive performance measures, can also pose a risk. Attackers could use this information to reverse engineer the model's structure and training data, facilitating further attacks.
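The "excessive performance measures" risk can be illustrated with a hedged sketch: instead of returning the model's full probability vector, a hardened endpoint returns only the top label and a coarsened confidence. The function names and payload shape here are hypothetical:

```python
def full_prediction(scores):
    # Internal view: per-class probabilities (hypothetical model output).
    # Returning this verbatim from a public endpoint leaks information.
    return {"label": max(scores, key=scores.get), "probabilities": scores}

def hardened_response(scores, round_to=1):
    """Return only the top label and a coarsened confidence.

    Exposing full, high-precision probability vectors lets an attacker
    probe decision boundaries and reverse engineer the model; coarsening
    the output raises the cost of such model-extraction attacks.
    """
    label = max(scores, key=scores.get)
    return {"label": label, "confidence": round(scores[label], round_to)}

scores = {"approve": 0.8731, "refer": 0.1033, "decline": 0.0236}
print(hardened_response(scores))  # {'label': 'approve', 'confidence': 0.9}
```

Rate limiting and query auditing would complement this; output coarsening alone does not stop a determined attacker.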

Question 2: I was wondering about the slide with the 'magic roundabout' diagram. Do you have any processes or pipelines to help streamline those roundabouts and keep development time low while adhering to all those different processes?

Answer 2: The 'magic roundabout' diagram illustrates divisions of responsibility in model building. For model builders, focusing on experimentation and model optimisation is key. Tracking a few additional metrics for various scenarios can enhance robustness. If handling data collection, simple tracking of data sources and metadata changes can be beneficial. Using model cards, data cards, and use case cards helps align data collection with the model's use case, defining its operational limits and facilitating a smoother process.
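As a rough illustration of how a model card can define operational limits, the snippet below checks a scoring request against the card's declared constraints. All field names, data sources, and limits are invented for the example:

```python
# Hypothetical, minimal model card: metadata that travels with the model
MODEL_CARD = {
    "model": "churn-classifier-v2",
    "intended_use": "predicting churn for UK retail customers",
    "data_sources": ["crm_export_2023q1"],
    "operational_limits": {
        "max_missing_fields": 1,
        "supported_regions": {"UK"},
    },
}

def within_operational_limits(request, card=MODEL_CARD):
    """Check a scoring request against the card's declared limits."""
    limits = card["operational_limits"]
    missing = sum(1 for v in request.values() if v is None)
    region_ok = request.get("region") in limits["supported_regions"]
    return region_ok and missing <= limits["max_missing_fields"]

print(within_operational_limits({"region": "UK", "tenure": 14, "spend": None}))  # True
print(within_operational_limits({"region": "FR", "tenure": 14, "spend": 120}))   # False
```

Encoding limits this way means requests outside the model's stated scope can be rejected or escalated automatically, rather than relying on every caller to have read the documentation.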

Question 3: How do you separate the concepts of answer veracity, accuracy of responses, versus failure modes? Do you define them separately?

Answer 3: Yes, we separate these concepts. Accuracy and responsiveness are categorized as performance failure modes. We broadly classify failure modes as performance, adversarial, bias, and ethical. The first few categories are highly technical and easier to measure, whereas ethics is more complex. Understanding ethical limitations allows us to create rudimentary measures for them. We consider various failure modes throughout the AI lifecycle.

Question 4: Is there a vulnerability assessment tool for LLMs that gives a score indicating risk vulnerability or bias from public repository models?

Answer 4: You can use some fun techniques with LLMs for vulnerability assessment. For example, two open-source LLMs can validate each other, using LangChain to build context windows and act as verifiers. Implementing guardrails for output control and using secondary procedures for validation can provide a rudimentary score. For inputs, checks for threatening or hateful content are already feasible. However, LLMs are a new space, and these methods aren't perfect.
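
The two-model verifier pattern in this answer might look roughly like the following. The LLM calls are stubbed with plain functions (a real system would call actual models, e.g. wired together with LangChain), and the blocklist is a deliberately crude stand-in for an input guardrail:

```python
import re

# Stand-ins for two independent LLMs; in a real system these would be
# API or open-source model calls, not hard-coded functions.
def generator_llm(prompt):
    return "The capital of France is Paris."

def verifier_llm(question, answer):
    # A real verifier would be prompted to judge the answer against the
    # question; here we stub it to return a score in [0, 1].
    return 0.9 if "Paris" in answer else 0.2

# Crude input guardrail: reject obviously threatening or hateful prompts
BLOCKLIST = re.compile(r"\b(threat|hate)\w*\b", re.IGNORECASE)

def guarded_answer(question, min_score=0.5):
    """Input guardrail plus second-model verification, yielding a rough score."""
    if BLOCKLIST.search(question):
        return {"answer": None, "reason": "input rejected by guardrail"}
    answer = generator_llm(question)
    score = verifier_llm(question, answer)
    if score < min_score:
        return {"answer": None, "reason": f"verifier score {score} too low"}
    return {"answer": answer, "score": score}

print(guarded_answer("What is the capital of France?"))
```

As the answer notes, none of this is watertight; the value is in producing a score that can be logged and audited rather than a bare, unchecked generation.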

Question 5: What do you think about including the societal effects of AI systems in AI principles, considering not everyone is on equal footing with AI understanding and impact?

Answer 5: The narrative around AI is crucial, and the gap in public perception of AI among non-practitioners and technical users needs consideration. Human-centricity is key, understanding miscommunications about AI's usage and impact on life. Addressing how AI changes industries and affects employment is important. Communicating these aspects and engaging stakeholders helps bridge understanding gaps, translating technical measures to high-level principles for better comprehension.


Who are Advai?

Advai is a deep tech AI start-up based in the UK that has spent several years working with UK government and defence to understand AI failure and develop tooling for testing and validating AI. This tooling allows KPIs to be derived throughout the AI lifecycle, so that data scientists, engineers, and decision makers can quantify risks and deploy AI in a safe, responsible, and trustworthy manner.

If you would like to discuss this in more detail, please reach out to contact@advai.co.uk