Privacy in the Age of ChatGPT: Understanding LLM Data Protection

Published on

November 16, 2023

Authors

Matt Kidd

Lead Senior Data Scientist, Deeper Insights

Artificial Intelligence (AI) tools and Large Language Model (LLM) services like ChatGPT have revolutionary applications. However, overlooking critical issues related to data privacy or intellectual property (IP) ownership can be a significant misstep.

Our experience includes assisting numerous clients with highly sensitive and private data. Drawing on that experience, this article sets out four ways to achieve both quality and privacy in LLM applications:

Use Enterprise versions of ChatGPT
[Optionally] Request compliance for Health and Patient Data
Request Zero Data Retention
Take control with a custom solution

Unpacking Privacy in ChatGPT's Basic and Enterprise Services

Different applications require different levels of privacy and security - there is no one-size-fits-all answer. If you have any privacy concerns, do not use vanilla, out-of-the-box ChatGPT, as it has few privacy safeguards. Anything you send to or receive from the system can be used and retained by OpenAI - by design - as this is one way that the system learns.

Any project requiring extra privacy will at the very least need to use the special services offered by OpenAI and Microsoft - ChatGPT Enterprise or Azure OpenAI.

Enterprise Solutions - Understanding the Fine Print

Despite these Enterprise solutions promising enhanced control, the service agreement reveals significant issues.

Some clauses will be untenable for sensitive projects. Consider how you, your clients, or your stakeholders would feel about this clause in a contract:

OpenAI employees will only ever access your data for the purposes of resolving incidents or where required by applicable law.

Wording like “resolving incidents” may be too loose for the most sensitive of projects, and “applicable law” may cause headaches for Government projects or other industries.

This may mean that your use of OpenAI needs to be reinvestigated and a custom approach taken instead. We have a decade of experience implementing state-of-the-art AI solutions. Let us help you find the right solution together.

If this is not a concern, just bear in mind that OpenAI can still access your information for purposes including:

Internal audits
Compliance checks
Development of new features

Compliance and Regulations

Depending on the industry, compliance is not merely a guideline but a stringent requirement. Regulatory frameworks are in place to ensure that AI services adhere to established laws and ethical standards - here we outline just two overlooked options.

Business Associate Agreements (BAA): Compliance with Health Regs

The Health Insurance Portability and Accountability Act, commonly known as HIPAA, is a U.S. law designed to provide privacy standards to protect patients' medical records and other health information, including when that information is handled by AI systems. Non-compliance can result in severe penalties, including hefty fines and legal repercussions.

If you are working with health data, then you MUST ensure you take appropriate steps to comply with HIPAA.

OpenAI offers Business Associate Agreements to support compliance with established laws such as HIPAA. However, this requires a bespoke agreement with OpenAI and may bring extra costs and changes in quality of service. Deeper Insights can help secure the right deal for you.

Zero Data Retention Policies (ZDR): sensitive applications

Zero Data Retention (ZDR) is another service offered by OpenAI, particularly for use-cases that demand the highest levels of data privacy. ZDR policies ensure that data is not stored beyond the necessary timeframe, thereby reducing the risk of unauthorised access or data breaches.

ZDR is a highly bespoke arrangement with OpenAI: it will likely cost more, may change the quality of service, and is by no means guaranteed.

Organisations must meet specific criteria and undergo rigorous assessments to be eligible for ZDR. Let us help you work out if you qualify and how we can get you access.

Tailored AI Models Enhancing Data Privacy

Securing specialised agreements for privacy, such as Business Associate Agreements (BAAs) or Zero Data Retention (ZDR) policies, is a complex, costly, and uncertain process. And even with these in place, privacy is not guaranteed due to clauses that allow for data access under certain conditions, such as legal compliance or incident resolution.

For absolute privacy, the most reliable solution is to self-host AI models, ensuring complete control over data. True data protection may require measures beyond the standard offerings.

Harnessing Local AI for Maximum Privacy

While leading AI models offer expansive capabilities, the essential aspect of privacy in AI applications hinges not on the sophistication of the models but on their deployment. The security of AI-driven operations is significantly enhanced when specialised AI models are run on local servers with private datasets, prioritising secure data handling and privacy by design.

If your data and your model live in your own infrastructure, then you have full control over data access and compliance.
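
One small way to enforce this in practice is to refuse to send data to any model endpoint that is not inside your own network. The sketch below is illustrative only: the function name is_local_endpoint and the example URLs are hypothetical, and the check simply assumes that "your infrastructure" means localhost or a private or loopback IP address. It does no DNS lookups, so any other hostname is treated as external.

```python
import ipaddress
from urllib.parse import urlparse


def is_local_endpoint(url: str) -> bool:
    """Return True only if the URL points at localhost or a private/loopback IP.

    Hypothetical helper: a real deployment would likely use an allow-list of
    internal hostnames instead of this simple address check.
    """
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal. Without a DNS lookup we cannot tell where it
        # resolves, so treat it as external to be safe.
        return False
    return address.is_private or address.is_loopback


def check_before_sending(url: str) -> None:
    """Raise before any sensitive data leaves the local network."""
    if not is_local_endpoint(url):
        raise PermissionError(f"Refusing to send data to external endpoint: {url}")
```

A guard like this would sit in front of every call to the model, so that a misconfigured URL fails loudly instead of quietly sending private data to a third party.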

Customisation: The Key to Private AI

Customisation stands at the forefront of this privacy-centric approach. By operating AI models locally and fine-tuning them to the specific needs of an organisation, there's an assurance of data control that cloud-based services may not provide. This localised approach allows organisations to benefit from AI advancements while maintaining stringent data privacy and security standards.

Hosting your own AI system gives you increased flexibility and access to the latest technology.

Outperforming Generalists with Specialised Data Strategies

The performance of these specialised models is a subject of ongoing research. That research suggests that, with a robust and relevant data strategy, these models can exceed the capabilities of more generalised counterparts such as GPT-4, even with modest amounts of training data.

There is also cost to consider. OpenAI and similar providers charge per request, so your costs grow linearly - the more you use the service, the more you pay. Custom infrastructure can deliver large savings because its costs plateau once the system is sized to match your usage.
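
To make the comparison concrete, here is a minimal sketch of the arithmetic. Every figure in it is hypothetical and chosen only to keep the numbers easy to follow: it is not OpenAI's pricing or the cost of any real deployment. It assumes a flat monthly cost for a self-hosted system sized for a fixed capacity, and it ignores build and staffing costs.

```python
# All figures are hypothetical, for illustration only.
PRICE_PER_1K_REQUESTS = 10        # hosted API: currency units per 1,000 requests
SELF_HOSTED_MONTHLY_COST = 2000   # self-hosted: flat currency units per month
SELF_HOSTED_CAPACITY_K = 1000     # capacity the system is sized for (thousands of requests)


def api_cost(requests_k: int) -> int:
    """Pay-per-request pricing: cost grows linearly with usage."""
    return requests_k * PRICE_PER_1K_REQUESTS


def self_hosted_cost(requests_k: int) -> int:
    """Flat cost while usage stays within the capacity the system was sized for."""
    if requests_k > SELF_HOSTED_CAPACITY_K:
        raise ValueError("usage exceeds the capacity this system was sized for")
    return SELF_HOSTED_MONTHLY_COST


def break_even_k() -> int:
    """Monthly volume (in thousands of requests) at which the API costs at least as much."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-SELF_HOSTED_MONTHLY_COST // PRICE_PER_1K_REQUESTS)


# Print monthly volume (thousands), API cost, and self-hosted cost side by side.
for requests_k in (50, 200, 500, 1000):
    print(requests_k, api_cost(requests_k), self_hosted_cost(requests_k))
```

With these made-up figures the two options cost the same at 200,000 requests a month. Below that volume the hosted API is cheaper, and above it the self-hosted cost stays flat while the API bill keeps rising.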

Your custom system can outperform OpenAI in cost and accuracy if you plan your product effectively with specialist knowledge.

Conclusion

In summary, while services like ChatGPT offer varied privacy levels, genuine data protection often entails extra costs and complexities with no guarantee of success. The evaluation of agreements and privacy policies demonstrates that optimal privacy in AI is not a default feature but a customisable choice, especially critical in sectors with strict data confidentiality requirements.

The most secure strategy for organisations prioritising privacy is the local deployment of AI models. This approach guarantees the highest level of data control, circumventing the uncertainties of third-party agreements. Research also suggests that costs can be lower and accuracy higher once you start to build a tailored solution away from generalist models.

As AI technology evolves, the necessity for local solutions becomes clear: they are the key to leveraging AI's capabilities while maintaining unwavering data privacy. For those for whom privacy is non-negotiable, the investment in localised AI infrastructure is not just prudent, it is imperative.

To discover how to develop your own AI or LLM solutions, we invite you to explore our Accelerated AI Innovation (AAII) programme. This programme is expertly crafted to facilitate the initiation of your project with minimised risk.