Generative AI tools like ChatGPT are revolutionizing how we create content and interact with enterprise applications. These Large Language Models (LLMs) are trained on massive amounts of data in order to understand and respond to natural language instructions called prompts. (Try this one: “Write a Rolling Stones song about LLMs.”)
The rise of AI is not unlike that of the automobile in the last century. Though the first cars were dangerous, unreliable, and largely unregulated, those negatives were outweighed by the potential economic and social benefits. To stay competitive, it’s imperative that we face the challenges and define appropriate guidelines.
Now, in the landmark “Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence” issued by the White House on October 30, 2023, the Office of Management and Budget has been tasked with developing guidelines for government employees who purchase AI services from private companies. Since many of our government clients are considering the use of LLMs, the GovWebworks AI Lab has been tracking the benefits, risks, and emergent Federal and State guidelines.
To facilitate the decision-making process around LLM adoption, we’ve compiled the following primers:
- Understanding LLMs: How LLMs work (see this post)
- Risks of Adoption: What to know (see this post)
- 9 Gov Tech Use Cases for LLMs: Tools agencies can safely leverage (see link)
Understanding LLMs
LLMs represent a revolution, not an evolution. Like the radio, combustion engine, telephone, and internet, generative AI opens new and unforeseen areas for products, services, and economies by drastically changing not just how we interact with applications, content, and data, but by introducing assisted automation.
On one hand, LLMs have the potential to drastically improve and democratize how we:
- Create and refine content
- Build and interact with software solutions
- Find and consume information
On the other hand, LLMs bring risks around:
- Privacy and security
- Bias, hallucinations, and prompt leaking
- Uncertain legal landscape
What we’re seeing currently with LLMs is a massively accelerated timeline of adoption and improvement. While AI has been around for decades, Generative AI based on LLMs has only been widely available since ChatGPT launched in November 2022. Within a short time, the trial and adoption rate has reached over 15% of the population, and some reports, such as this one by Predibase, point to 58% of businesses already experimenting with LLMs.
Now, when we think of LLMs, we automatically think of tools like ChatGPT and Bard which can generate all kinds of text, or image generation tools like DALL-E, Stable Diffusion, and Midjourney. However, applications of LLMs take many forms beyond Generative AI. Businesses and governments are already leveraging LLMs for a wide variety of applications such as semantic search, machine translation, classification, and text augmentation. See 9 Gov Tech Use Cases for LLMs.
How LLMs work
At the heart of LLMs is the idea that, much like the human brain, they identify relationships between pieces of data such as words, and in doing so capture context and meaning. This is what empowers tools like ChatGPT and other solutions built on top of LLMs to take natural language inputs (prompts) and “understand” the intent.
LLMs (and other AI) are prediction machines. An LLM in many ways is a statistical model that determines what it should “say” next. An LLM takes an input (a prompt) and based on the instructions, makes a stream of predictions for what it should output as a response. This could be the next word to say when generating text, or the next piece of code to write for an application. How it makes these predictions can be influenced by a variety of factors and settings.
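To make that prediction loop concrete, here is a toy sketch (not how production LLMs are implemented): a pretend model assigns made-up probabilities to candidate next words and repeatedly samples one. The function names and probabilities are purely illustrative.

```python
import random

# Toy "language model": for a given context, assign probabilities to
# candidate next words. Real LLMs compute these probabilities with
# billions of learned parameters; the numbers here are made up.
def next_word_probabilities(context):
    if context.endswith("pitcher to the"):
        return {"cup": 0.85, "glass": 0.10, "floor": 0.05}
    return {"the": 0.4, "a": 0.3, "water": 0.3}

def generate(context, steps=3):
    for _ in range(steps):
        probs = next_word_probabilities(context)
        words = list(probs.keys())
        weights = list(probs.values())
        # Sample the next word in proportion to its predicted probability
        context += " " + random.choices(words, weights=weights)[0]
    return context

print(generate("She poured water from the pitcher to the", steps=1))
```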
Language Models Defined
“A language model is a probabilistic model of a natural language that can generate probabilities of a series of words, based on text corpora in one or multiple languages it was trained on. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation, optical character recognition, handwriting recognition, grammar induction, and information retrieval. A large language model is a type of language model notable for its ability to achieve general-purpose language understanding and generation. LLMs acquire these abilities by using massive amounts of data to learn billions of parameters during training and consuming large computational resources during their training and operation.” –Wikipedia
How LLMs learn
The current crop of LLMs is based on a type of machine learning model called the transformer. Transformers allow the model to understand context and relationships between things using a concept known as attention, first described in the 2017 paper “Attention Is All You Need” by Ashish Vaswani et al.
In the following example from Nvidia, we see how changing just a single word can change the relationship, meaning, and where the attention is focused:
In the sentence, She poured water from the pitcher to the cup until it was full, we know “it” refers to the cup, while in the sentence, She poured water from the pitcher to the cup until it was empty, we know “it” refers to the pitcher.
“Meaning is a result of relationships between things, and self-attention is a general way of learning relationships,” Vaswani notes. In other words, the attention mechanism is what allows the model to focus on specific parts of a sequence in order to make a prediction about what will come next.
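The sketch below gives a feel for that idea with a stripped-down, toy version of dot-product attention. The word vectors are made-up numbers, and real transformers add learned query, key, and value projections that are omitted here; the point is only that “it” ends up weighting “cup” more heavily than “pitcher.”

```python
import numpy as np

# Toy attention over a three-word "sentence". Each word is a 4-dimensional
# vector with invented values; "it" is deliberately made to resemble "cup".
words = ["pitcher", "cup", "it"]
X = np.array([
    [0.9, 0.1, 0.0, 0.2],   # pitcher
    [0.1, 0.8, 0.1, 0.3],   # cup
    [0.2, 0.7, 0.1, 0.3],   # it
])

# How strongly each word attends to every other word (scaled dot products),
# then normalized to probabilities with a softmax per row.
scores = X @ X.T / np.sqrt(X.shape[1])
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)

for word, row in zip(words, weights):
    print(word, np.round(row, 2))  # "it" puts more weight on "cup" than "pitcher"
```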
LLMs can also learn as you use them. In-context learning is a technique that allows large language models to learn new tasks from additional data provided in the prompt itself. In-context learning can be achieved via what are known as one-shot and few-shot learning.
One-Shot Learning
As part of prompt engineering, one-shot learning means giving the LLM a single example of the task you want it to perform: “Take some text, and assign it a category.” For instance, let’s say you wanted the system to categorize emails submitted via a support form so they can be routed and triaged appropriately.
A basic prompt for an LLM might be:
Classify the following text into one of the following categories: Road Hazards, Sidewalk Maintenance, Other
Then, to help the model better predict the appropriate classification, you can add an example of how you want it to complete the task:
There’s a large pothole on 5th and main. Cars are getting damaged by it. Category: Road Hazards
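In code, a one-shot prompt is simply assembled as text before being handed to whichever LLM service you use. The sketch below only builds the prompt string; the category names and the worked example come from the scenario above, and the new message is hypothetical.

```python
# Assemble a one-shot classification prompt as plain text.
instruction = (
    "Classify the following text into one of the following categories: "
    "Road Hazards, Sidewalk Maintenance, Other"
)

example = (
    "There's a large pothole on 5th and main. Cars are getting damaged by it.\n"
    "Category: Road Hazards"
)

def build_one_shot_prompt(new_message):
    # The model sees the instruction, one worked example, and the new text,
    # and is expected to complete the final "Category:" line.
    return f"{instruction}\n\n{example}\n\n{new_message}\nCategory:"

print(build_one_shot_prompt("A tree branch is blocking the bike lane on Elm Street."))
```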
Few-Shot Learning
Often, you can improve the results even further by providing a few examples, also known as few-shot learning. To extend the above example, you might provide the following prompt:
Classify the following text into one of the following categories: Road Hazards, Sidewalk Maintenance, Other
Then add the following examples:
There’s a large pothole on 5th and main. Cars are getting damaged by it. Category: Road Hazards

The path going by my house is covered in snow. Someone needs to plow it. Category: Sidewalk Maintenance

There’s bird poop all over the cars in the parking lot. Can someone do something about this? Category: Other
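A few-shot version of the same prompt simply carries a list of examples. The sketch below joins them and then, assuming the OpenAI Python SDK’s chat-completions interface (swap in your own provider; the model name and new message are illustrative), sends the result:

```python
from openai import OpenAI  # assumes the OpenAI Python SDK (v1+) is installed

instruction = (
    "Classify the following text into one of the following categories: "
    "Road Hazards, Sidewalk Maintenance, Other"
)

# The few-shot examples from above, one per category.
examples = [
    "There's a large pothole on 5th and main. Cars are getting damaged by it.\n"
    "Category: Road Hazards",
    "The path going by my house is covered in snow. Someone needs to plow it.\n"
    "Category: Sidewalk Maintenance",
    "There's bird poop all over the cars in the parking lot. "
    "Can someone do something about this?\nCategory: Other",
]

new_message = "A streetlight on Oak Avenue has been out for a week."
prompt = instruction + "\n\n" + "\n\n".join(examples) + f"\n\n{new_message}\nCategory:"

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # model name is illustrative
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content.strip())
```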
With in-context learning, the examples you include in the prompt teach the model what you want it to do within that prompt’s context. This ability to essentially “train” LLMs on the fly is part of what makes them so powerful.
Risks of Adoption
Despite the benefits, successfully incorporating LLMs into products and services comes with potential technical, cultural, legal, and marketing challenges. Concerns about security, privacy, bias, and unintended consequences are driving the discussions and the introduction of laws regarding the implementation, use, transparency, and oversight of AI and LLM based technologies. A variety of strategies and tools that can be leveraged to help overcome or mitigate some of these issues will be discussed in the next post on 9 Gov Tech Use Cases for LLMs.
Some risks to be aware of include:
- Bias, non-deterministic results, and hallucinations
- Security, privacy, and costs
- Legal landscape
Bias
One of the primary concerns around LLMs is bias. Machine learning models like ChatGPT and Bard are trained on massive datasets of human-generated content sourced from the internet, which may contain the biases present in human language, behavior, and society. Bias can also stem from uneven training data and policy decisions. Because the models keep changing, harmful bias in generated language and images can recur if they lack proper oversight, review, moderation, and feedback mechanisms.
Non-deterministic results
Generative AI results vary each time you run a prompt. Unlike traditional code, where a function defines the exact output it will produce each time, an LLM uses a neural network and predictions to decide what to output, so the same question can yield different responses and cause confusion about which response is most correct.
Hallucinations
LLMs can be a source of misinformation. Because LLMs are trained to give you a response, when you ask who the author of a particular book is, the model knows it is supposed to respond with the name of a person. Without careful guidance, it could simply make up a name, thereby, as far as it is concerned, fulfilling the request successfully. When LLMs invent information like this, it is called “hallucinating.”
Hallucinations can be mitigated through careful prompt engineering. For example: “You are a librarian. Your job is to provide information about books, such as giving the author when given the title of a book. If you do not know the name of the author, reply only that you do not know the name of the author.”
Additionally, LLMs have a setting called temperature. The higher the temperature, the more creative the generated text can be. It can take some experimentation to find the appropriate temperature for the type of results you want, but generally, when you want factual information returned, it helps to use a lower temperature.
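Combining those two levers, a guardrail instruction plus a low temperature, might look like the following sketch. It assumes the OpenAI Python SDK’s chat-completions interface; the model name and the user question are illustrative placeholders.

```python
from openai import OpenAI  # assumes the OpenAI Python SDK (v1+)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The "librarian" guardrail prompt from above goes in the system message;
# a low temperature nudges the model toward its most likely answer
# rather than a creative one.
response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # illustrative model name
    temperature=0,
    messages=[
        {
            "role": "system",
            "content": (
                "You are a librarian. When given the title of a book, provide "
                "the author. If you do not know the author, reply only that "
                "you do not know the name of the author."
            ),
        },
        {"role": "user", "content": "Who wrote 'A Wrinkle in Time'?"},
    ],
)
print(response.choices[0].message.content)
```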
Security
There are a variety of security challenges that are unique to LLMs and Generative AI. These include:
- Prompt Leaking: Leaking can occur when users attempt to retrieve the prompt used in a particular LLM application. Having access to the prompt can potentially expose proprietary or application specific information that is not intended for public consumption.
- Prompt Poisoning: Hackers can insert prompts into LLMs telling them to act in ways that may be offensive or contrary to the desires of an agency and cause risks to security and brand identity.
- API/Integration: Frequently LLMs will be integrated with, or allowed to communicate with, other services. Extra care needs to be taken to ensure the proper access controls are in place so that someone looking to exploit a prompt can’t gain access to services or capabilities that should be restricted. For example, imagine an application that lets users query a database with a natural language prompt such as, “Show me all orders placed in the last week.” If the integration is not read-only, someone could give the prompt, “Delete all orders.” (See the sketch after this list.)
- PII and Proprietary Information: With natural language interfaces, users may inadvertently provide Personally Identifiable Information. Safeguards need to be put in place to protect this content as needed.
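On the integration point above, one simple safeguard is to give the LLM-facing code a database connection that physically cannot write. The sketch below uses SQLite’s read-only mode; the database file, table, and queries are hypothetical stand-ins for whatever SQL your LLM layer generates, and generated SQL should still be validated before execution.

```python
import sqlite3

# Open the database in read-only mode (assumes orders.db already exists),
# so that even if a prompt tricks the LLM into generating a destructive
# statement, the write is rejected at the connection level.
conn = sqlite3.connect("file:orders.db?mode=ro", uri=True)

def run_llm_generated_query(sql):
    # sql is assumed to come from an LLM translating a natural-language prompt.
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError as err:
        return f"Query rejected: {err}"

print(run_llm_generated_query(
    "SELECT * FROM orders WHERE placed_at >= date('now', '-7 days')"
))
print(run_llm_generated_query("DELETE FROM orders"))  # rejected: readonly database
```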
Privacy
LLMs like ChatGPT are trained on large amounts of data about people, which means predictions can be made about a user based on the questions being asked. “ChatGPT can infer a lot of sensitive information about the people they chat with, even if the conversation is utterly mundane,” according to a recent article in Wired. The article notes that data can be harvested from unsuspecting users in order to profile them for advertising or other purposes to which they have not consented.
Another potential privacy concern is identifying what gets stored by third-party LLM services. For example, do they store users’ prompt histories? What data do they use to train new iterations of the LLM? It’s important to know the answers to these questions.
Costs
Costs for LLMs can vary widely, and what may seem like small fees can add up to large sums. For example, LLMs as a service, such as OpenAI’s ChatGPT, charge by the token; a token is roughly equivalent to a short word or part of a longer one. For applications or sites processing large quantities of text, this can represent a non-trivial cost.
If utilizing an LLM service/API such as OpenAI’s, costs can differ dramatically based on which model you use. GPT-4 can be 10 to 20 times more expensive than GPT-3.5 Turbo, and depending on the use case, the less expensive model may perform equally well.
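A quick back-of-the-envelope calculation makes the difference concrete. The per-token prices, request volume, and model names below are illustrative placeholders, not current vendor pricing; plug in the published rates for whichever models you are comparing.

```python
# Rough monthly cost estimate for a text-processing workload.
# Prices are illustrative placeholders per 1,000 tokens, NOT real vendor rates.
PRICE_PER_1K_TOKENS = {
    "premium-model": 0.03,    # e.g., a GPT-4-class model
    "standard-model": 0.002,  # e.g., a GPT-3.5-class model
}

requests_per_month = 50_000
tokens_per_request = 1_500  # prompt plus response, combined

for model, price in PRICE_PER_1K_TOKENS.items():
    monthly_cost = requests_per_month * tokens_per_request / 1000 * price
    print(f"{model}: ~${monthly_cost:,.2f} per month")
```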
When a foundation model does not cover the specific domain requirements for your application, a method known as fine-tuning can be used: you take an existing model and train it further on additional data. Fine-tuning an existing foundation model on the entire works of Shakespeare costs only about $40 in compute, but that does not include the time spent identifying, reviewing, and cleaning up the training material. Many companies are training their own models from scratch, and those costs can easily reach millions of dollars.
An alternative to LLM as a service is to host your own LLM. Costs here would include hosting on a service such as Amazon SageMaker and related fees, plus the staffing and hours to develop it.
Some cost-related questions to consider include:
- Will usage be mostly upfront, one-time costs, or ongoing?
- If processing a large set of documents for document understanding, how many new documents per month would be added? How large are those documents?
- Will custom or fine-tuned LLM models be needed?
Legal landscape
To address the risks noted above, artificial intelligence bills are being developed at State and Federal levels. Federally, according to the recent White House executive order on AI, “the Biden-Harris Administration, through the Office of Management and Budget, is releasing for public comment its first-ever draft policy guidance on the use of AI by the U.S. government. This draft policy builds on prior leadership—including the Blueprint for an AI Bill of Rights and the National Institute of Standards and Technology (NIST) AI Risk Management Framework—and outlines concrete steps to advance responsible AI innovation in government, increase transparency and accountability, protect federal workers, and manage risks from sensitive uses of AI.”
At the State level, the National Conference of State Legislatures reports that 25 states have introduced artificial intelligence bills as of October 2023, with 15 states adopting resolutions or enacting legislation, and other states at various stages of decision making.
In one example, “Connecticut required the State Department of Administrative Services to conduct an inventory of all systems that employ artificial intelligence and are in use by any state agency and, beginning Feb. 1, 2024, perform ongoing assessments of systems that employ AI and are in use by state agencies to ensure that no such system shall result in unlawful discrimination or disparate impact. Further, the legislation requires the Office of Policy and Management to establish policies and procedures concerning the development, procurement, implementation, utilization and ongoing assessment of systems that employ AI and are in use by state agencies.”
In our home state of Maine, the proposed Data Privacy and Protection Act would limit the amount of personal data companies can collect online about a person. Other regulations under consideration include a proposal that would require “covered entities using covered algorithms (broadly defined, including machine learning, AI, and natural language processing tools) to collect, process, or transfer data ‘in a manner that poses a consequential risk of harm’ complete an impact assessment of the algorithm…submitted to the Attorney General’s office within 30 days of finishing it.”
In June 2023, Maine Information Technology (MaineIT) issued a six-month moratorium on the use of Generative AI (ChatGPT and any other software that generates images, music, computer code, voice simulation and art) for all executive branch state agencies on any device connected to the state’s network.
Expect these guidelines to be a moving target as laws are passed and updated. See GovTech’s webinar on Generative AI for State and Local Government for further discussion of these issues.
In summary
The AI industry is still in its early stages, with products and services still maturing and new ones coming out at a steady pace. As with the advent of the automobile, and any new technology, the risks are part of the challenge. We can expect some consolidation to happen, with many smaller players and products either getting absorbed by larger players or shutting down. Some level of safety can be had by sticking with established players such as Amazon, Google, Microsoft, and IBM. However, the LLM products themselves are evolving quickly, so unlike managed cloud services such as databases, where the technology is stable and changes infrequently, keeping up with changes around LLMs will require more ongoing resources. Stay alert for updates.
Use cases and risk mitigation
In the next post, 9 Gov Tech Use Cases for LLMs, we’ll address some relatively stable recommended uses for LLMs in the government space.
Schedule a demo
For further exploration of these topics, the GovWebworks AI Lab can help your agency identify the right value-based solutions that leverage artificial intelligence. Whether it’s developing a pilot program or a large-scale integration, we plan and implement solutions that meet business objectives. Contact GovWebworks for a free demo.
Learn more
- Contact GovWebworks to learn more and schedule a free demo
- Subscribe to the GovWebworks AI Lab Newsletter, edited by Adam Kempler
- 9 Gov Tech Use Cases for LLMs: AI Lab’s top picks for Large Language Model applications for government agencies, by Adam Kempler