Long before search engines became the norm, libraries, subject matter experts and academic journals functioned as reliable sources of information. Back then, things were a lot more personalized, transparent and authoritative.
Fast-forward to 2023: the world has taken a leap into the future, with AI chatbots becoming more and more common.
From OpenAI’s large language model ChatGPT to Google’s Bard and Meta’s LLaMA, artificial intelligence and machine learning are upending how people search for information.
At times, it feels a tad surreal; especially when you have access to a seemingly never-ending source of knowledge on anything under the sun.
The relentless prompts we throw at these large language models inevitably invite comparisons to Spike Jonze’s masterpiece Her, a film that painstakingly depicts the potential downside of AI augmenting human lives.
Undeniably, ChatGPT and other AI tools open up exciting possibilities for librarians looking to enhance the experience of their patrons. On the other hand, data security is a legitimate concern that will need to be addressed.
In this post, we mull over the potential threats of ChatGPT for public and academic libraries, and ways to mitigate them.
ChatGPT, as the name suggests, stands for “Generative Pre-trained Transformer.”
In other words, it's a large language model, trained on copious amounts of text data and capable of producing a “coherent” and “relevant” response to a given prompt.
An AI tool like ChatGPT holds a lot of potential uses for library professionals, especially when it comes to streamlining regular tasks. Functions such as answering reference questions, recommending books for further study and assisting students with information access can be automated and, therefore, sped up significantly.
For traditional libraries, it’s a win-win situation as it frees up staff to focus on complex patron interactions.
Here’s a rundown of key areas where ChatGPT packs a punch:
Virtual reference services around the clock: A human librarian can’t be on call around the clock, but patrons can pose questions to a chatbot and receive real-time responses 24/7. A large language model such as ChatGPT can field everything from basic questions about library services, policies and collections to more complex research assistance.
Extensive catalog search: With ChatGPT, patrons can effectively search the library catalog for books, peer-reviewed articles and other resources. Credit goes to the natural language processing that makes conversational queries possible.
Personalized reading recommendations: By referring to the user's past reading choices (preferences, authors, genres), ChatGPT can recommend books and study materials for patrons. It can also run the numbers on book circulation and analyze popular trends to suggest ways for the library to expand its collection.
Removing language barriers, offering tutorials and answering FAQs: ChatGPT works pretty neatly as a translation tool, helping patrons communicate in their native language. It can also be used to offer interactive tutorials on library resources, like guiding patrons in accessing electronic resources such as eBooks or databases. Frequently asked questions about policies, services and hours of operation can also be taken care of.
Accessibility for patrons with disabilities: ChatGPT can bolster accessibility services for patrons with disabilities, for instance by generating transcripts for video content.
Promotion, outreach and engagement: Libraries can make use of ChatGPT to engage with patrons across social media via direct messaging and comments, as well as promote library programs. Alternatively, patrons can interact with ChatGPT to learn more about upcoming events.
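To make the first of these concrete, here is a minimal sketch of what a 24/7 virtual reference assistant could look like under the hood. It assumes OpenAI’s Python SDK (openai>=1.0) and an API key in the environment; the system prompt, the model choice and the answer_patron() helper are illustrative assumptions on our part, not any library vendor’s product.

```python
# A minimal sketch of a virtual reference assistant, assuming OpenAI's
# Python SDK (openai>=1.0) and an OPENAI_API_KEY environment variable.
# The prompt and helper below are illustrative, not a real product.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You are a virtual reference librarian. Answer questions about the "
    "library's hours, policies and collections. If you are unsure of an "
    "answer, say so and refer the patron to a human librarian."
)

def answer_patron(question: str) -> str:
    """Send a patron's question to the model and return its reply."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
        temperature=0.2,  # keep reference answers conservative
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(answer_patron("How do I access e-books from home?"))
```

A real deployment would add conversation history, grounding in the library’s actual catalog and policy documents, and a handoff path to a human librarian.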
What could possibly go wrong with ChatGPT for public or academic libraries? Well, it turns out that OpenAI's large language model has a few chinks in its armor.
Lack of critical thinking: For librarians and patrons alike, using ChatGPT can lead to less reliance on one's own critical thinking skills. This is where brushing up on your information literacy is crucial. Large language models are not infallible, and their responses may contain inaccurate or misleading information. (Hence the disclaimer at the bottom of your screen when you pull up chat.openai.com.)
100% dependency on third-party services: ChatGPT relies heavily on external services and knowledge sources to answer user queries, and those services may suffer downtime or API changes that affect operability. Imagine what that could mean for library patrons who depend on the AI model for vital research.
Limitations in customization: ChatGPT’s external services are known to limit the extent to which data can be tailored or customized for specific use cases. This stops the AI model from offering optimal responses to certain queries.
Budget: There are a good number of external libraries and services working with ChatGPT that require a customer to pay a fee. While this may not be a constraint for larger institutions, it can limit the ability of a smaller library to use ChatGPT at its fullest.
Plagiarism: While using ChatGPT to generate text can make life easier for researchers, there’s a good chance the AI-powered writing borders on plagiarism. ChatGPT’s paraphrasing capabilities have drawn criticism on several occasions when AI-generated text turned up in scholarly articles submitted to reputable publications.
While a lot has been said about AI disrupting a multitude of sectors — art, commerce and law, for example — we are only just beginning to hear about its implications for user privacy.
One recent high-profile example came after a March 20 data breach, when Italy’s data protection authority announced a temporary ban on ChatGPT over privacy and age-verification concerns.
The point is, AI models are trained on data scraped from the internet. And guess what forms the lion’s share of that data? Our personal information. Feeling concerned already? Well, you should be: if you’ve written or supplied personal information online at any point in time, it may well have made its way into ChatGPT’s training data.
Here’s an exact account of what goes into “collected data”, according to OpenAI’s own privacy policy:
Login data: IP address, browser details, internet settings and the date and time of your login
Usage data: Usage patterns for ChatGPT, time zone, country, software version, type of device and connection information
Device information: OpenAI gathers information about the operating system you use to access ChatGPT, along with cookies for tracking and analytical purposes.
User content: Any information that you upload or enter into ChatGPT is stored by OpenAI.
Communication information: If you have expressed interest in OpenAI’s products, contacted OpenAI support or signed up for its newsletters, any personal info or messages you share will be stored.
Social media information: If you engage with OpenAI using your social media account, your profile info (including phone number and email) is collected and stored.
Account information: The details you generally provide while opening an account, such as your name, contact and payment info, are all stored by OpenAI.
You may be thinking, Okay, that’s not much different from what other websites do. So, why make a villain out of ChatGPT? The key point is that OpenAI’s data collection policy lists something called “User content”, and that is where the real problem lies.
Try searching for “red velvet cake recipe” (or you can do chocolate or anything else you prefer!), both on Google and on ChatGPT, and you will see how the results differ. Contrary to what some people might like to believe, ChatGPT was never designed to function like a typical search engine. Instead, it’s a tool that is driven by interaction.
And as such, it works well enough to give people a false sense of security. Users are tempted to share private information more easily than they would ever do with Google or any other search engine.
Samsung has already had a bitter taste of data leakage: its employees reportedly fed the chatbot notes from company meetings and proprietary source code.
No wonder OpenAI has garnered a negative reputation owing to its collection of user content. Private or not, the data collected from users never gets deleted. Going from concerning to outright scary is only a matter of time.
Users still don’t have the option to download ChatGPT as a standalone app, but instead must rely on a web browser. So, a potential breach means unauthorized and unrestricted access to all conversation logs and other sensitive user info. This could lead to a series of unfavorable outcomes, including:
Identity theft, where cybercriminals use your personal information for fraudulent activities leading to financial losses
Misuse of data, where the user’s information is shared or sold with malicious intent, whether for targeted advertising or disinformation campaigns
Despite OpenAI embracing a string of cybersecurity measures, its vulnerabilities are more often triggered by human error than by technical glitches.
If your patrons enter sensitive information, such as passwords or credit card info, into ChatGPT, there is a possibility it could be intercepted by malicious actors. The best way to mitigate this is to replicate what several forward-thinking organizations have already done — embracing a comprehensive policy regarding generative AI technology.
For instance, Walmart and Amazon have reportedly instructed their workers to refrain from sharing confidential information with AI systems.
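Libraries can back such a policy with a technical guardrail by routing patron prompts through a redaction filter before they ever reach the model. The sketch below is a minimal example under stated assumptions: the scrub() helper and its regex patterns are our own invention, and a production filter would need far broader PII coverage.

```python
import re

# A minimal sketch of a pre-submission redaction filter. The patterns and
# the scrub() helper are illustrative assumptions; a real deployment would
# cover far more PII (names, addresses, library card numbers, etc.).
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "CARD": re.compile(r"\b(?:\d[ -]?){13,16}\b"),  # rough credit-card shape
    "PHONE": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
}

def scrub(prompt: str) -> str:
    """Replace likely PII with placeholders before the prompt leaves the library."""
    for label, pattern in PATTERNS.items():
        prompt = pattern.sub(f"[{label} REDACTED]", prompt)
    return prompt

print(scrub("My card 4111 1111 1111 1111 and email jane@example.org were rejected"))
# -> My card [CARD REDACTED] and email [EMAIL REDACTED] were rejected
```

Ordering matters in filters like this: redacting card numbers before phone numbers avoids the risk of the looser phone pattern partially matching a card number first.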
Extensive datasets used to train AI models may unintentionally produce responses with false information or reflect biases.
Such outcomes can have negative consequences for libraries that rely on AI-generated content for key decisions or patron communication. Users must therefore critically evaluate ChatGPT’s output to catch misinformation and prevent the dissemination of biased content.
The absence of specific regulations directly governing ChatGPT and similar AI tools adds fuel to the fire. However, AI technologies, including ChatGPT, are subject to existing data protection and privacy regulations.
General Data Protection Regulation (GDPR): A comprehensive regulation for organizations operating within the European Union (EU) handling the personal data of EU residents. It chiefly focuses on data protection, privacy and personal data rights.
California Consumer Privacy Act (CCPA): A data privacy regulation in California that grants specific rights to consumers regarding their personal information. It requires businesses to disclose their data collection and sharing practices, enabling consumers to opt out of sharing their personal information.
Other regional regulations: Other countries and regions have also implemented data protection and privacy laws that apply to AI systems like ChatGPT, such as the Personal Data Protection Act (PDPA) in Singapore and the Lei Geral de Proteção de Dados (LGPD) in Brazil. In Canada, the provincial authorities of Alberta, British Columbia and Quebec have joined an investigation into OpenAI launched by the Office of the Privacy Commissioner of Canada in April 2023.
The draft AI Act passed by European Union lawmakers might bring about a radical change. In all probability, the bill would require AI model developers to disclose copyrighted content used during training. The proposed legislation would also classify AI tools by risk level: minimal, limited, high and unacceptable.
Other concerns addressed by the AI Act include biometric surveillance, misinformation and the use of discriminatory language. While high-risk tools will not be prohibited, their usage will require significant transparency.
If the AI Act is approved, it will become the world’s first comprehensive regulation of artificial intelligence.
However, until such rules become a reality, libraries and other academic institutions will have to bear sole responsibility for safeguarding user privacy when using ChatGPT.
Despite OpenAI’s safety measures, the protection of user data remains an open issue. A significant burden thus rests on information professionals, as well as patrons, to adopt a handful of best practices that minimize risk:
Limiting sensitive information: Users should refrain from sharing personal or sensitive data across conversations with ChatGPT.
Reviewing privacy policies: Before using an OpenAI language model, users should carefully review the privacy policy and data handling practices to understand how conversations are stored and used.
Using anonymous or pseudonymous accounts: Where possible, interacting with ChatGPT or similar AI models through an anonymous or pseudonymous account is a wise call.
Monitoring data retention policies: Patrons must familiarize themselves with the data retention policies of ChatGPT and similar platforms to gain a better understanding of how long their conversations are stored before they are deleted or anonymized.
Staying informed: Patrons must keep themselves up to date with any changes to OpenAI’s security measures or privacy policies.
Artificial intelligence technologies such as machine learning, and tools like ChatGPT, hold great potential for libraries. But given their limitations, and especially their data-security implications, both the developers of these technologies and their end users must share the burden of ensuring they are used responsibly.
How is your library navigating the brave new world of artificial intelligence and chatbots? Click on the button below and let us know.