Forum Systems, a leader in LLM and API technologies, today announced from the Gartner® Data & Analytics Summit 2024 the public release of two language models fine-tuned to optimize their productivity-risk profile. The research is discussed at length in two recent articles, “Framework for LLM Selection by Balancing Model Risk with Workforce Productivity” and “Improving Productivity-Risk Profile through LLM Fine-tuning.” This groundbreaking work presents a framework for balancing productivity and risk in GenAI deployments, an urgent question among business leaders today.

“LLM security has its tradeoffs. More restrictive models will be safer but may hamper productivity. If enterprises aren’t measuring the productivity-risk balance of their models, they are in the dark about whether they’ve achieved an optimal tradeoff,” remarked Mamoon Yunus, CEO of Forum Systems. He continued, “Classic machine learning metrics like precision and recall can serve as proxies for productivity and risk. Fine-tuning on extensive manual multi-vote labeled data, our LLMs show superior performance compared to base models.”

The fine-tuned models, Mistral QS-Sentry and Llama 3 QS-Sentry, are based on Mistral-7B-Instruct-v0.2 and Meta-Llama-3-8B-Instruct, respectively.

In the first article, Forum Systems developed a framework for balancing risk and productivity and assessed the productivity-risk profile of Mistral and Llama 3 before fine-tuning. It found that, when asked to classify prompts as either safe or unsafe, Mistral was more precise and thus aligned with higher productivity, while Llama 3 was more restrictive and thus aligned with lower risk.
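To make the precision and recall proxies concrete, the following is a minimal illustrative sketch, not Forum Systems' evaluation code. It assumes "unsafe" is treated as the positive class, so that precision reflects how rarely safe prompts are wrongly blocked (productivity) and recall reflects how many unsafe prompts are caught (risk). The function name, labels, and both sets of example predictions are hypothetical and are not results from the articles.

```python
def precision_recall(y_true, y_pred, positive="unsafe"):
    """Return (precision, recall) for the given positive class.

    Assumption: "unsafe" is the positive class, so
      precision = share of flagged prompts that were truly unsafe
      recall    = share of truly unsafe prompts that were flagged
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical ground-truth labels for six prompts.
labels = ["unsafe", "unsafe", "safe", "safe", "safe", "unsafe"]

# A hypothetical permissive classifier: flags little, rarely blocks safe prompts.
permissive = ["unsafe", "safe", "safe", "safe", "safe", "safe"]

# A hypothetical restrictive classifier: flags a lot, catches every unsafe prompt.
restrictive = ["unsafe", "unsafe", "unsafe", "safe", "unsafe", "unsafe"]

permissive_scores = precision_recall(labels, permissive)
restrictive_scores = precision_recall(labels, restrictive)
print("permissive  (precision, recall):", permissive_scores)
print("restrictive (precision, recall):", restrictive_scores)
```

In this toy data the permissive classifier scores higher on precision and lower on recall, while the restrictive one does the reverse, which mirrors the kind of trade-off the framework is meant to measure.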

The second article analyzed the models after they were fine-tuned on a hand-curated dataset of about 20,000 prompts. The study showed that the productivity-risk profile of both models can improve meaningfully after fine-tuning. Forum Systems has released both fine-tuned models on Hugging Face to contribute to the broader community of researchers and those working in AI governance and AI alignment. The company believes its framework for analyzing the trade-offs between productivity and risk will also prove valuable to business leaders deploying safe and effective GenAI offerings.

As Gartner analyst Arun Chandrasekaran recently stated, “Generative AI (GenAI) has the potential to transform businesses across industries. Most business and technology leaders believe that the benefits of GenAI far outweigh its risks.” He recommends, “Put responsible AI at the heart of your generative efforts. Promote harmonious interaction among humans and machines with design thinking and by incorporating human feedback into GenAI applications.” Forum Systems agrees with Chandrasekaran’s analysis and recommendation. Its work in optimizing productivity-risk profiles of small models demonstrates that enterprise-class responsible AI deployments are within reach through fine-tuning.

Source: Gartner, 10 Best Practices for Scaling Generative AI Across the Enterprise, Arun Chandrasekaran, Leinar Ramos, Alberto Pietrobon, 10 January 2024.

GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and is used herein with permission. All rights reserved.