Introduction to Gen AI

This quick-guide introduces you to generative AI (gen AI): how it works, how it’s trained, its limitations and abilities, and the benefits and disadvantages of using it for writing assignments. 

What is gen AI?

Gen AI is a technology that produces text, images, music, and videos in response to user prompts, based on patterns it detects in the large datasets it is trained on. Most people interact with gen AI through a chatbot. Platforms that give users access to gen AI chatbots include OpenAI’s ChatGPT, Google Gemini, Microsoft Copilot, Anthropic’s Claude, Perplexity, and more. 

How does gen AI work?

Gen AI that produces text is powered by large language models (LLMs), a kind of AI that predicts sequences of words. An LLM is trained on massive amounts of textual data, and it detects patterns in those texts. Given a certain string of words, the LLM predicts what the next word will be, based on the patterns in its training data. When a user submits a prompt, the LLM generates a response based on how words and concepts have been used and combined in its training data. 

How is gen AI trained?

Programmers input massive amounts of text, which we call “training data,” into the LLM, and the LLM discerns patterns of language in that data. The LLM’s output reflects those patterns. For example, if many texts in the training data follow the phrase “on the other” with “hand,” the LLM’s output is likely to do the same. If the training data frequently associates the word or concept “playful” with “puppy,” the LLM’s output is likely to associate those two words and concepts as well. 
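The pattern-counting idea described above can be sketched with a toy “bigram” model: count which word most often follows each word in a small corpus, then predict accordingly. This is a drastically simplified illustration, not how real LLMs work (they use neural networks trained over enormous datasets and condition on long stretches of context, not just one preceding word); the corpus and function names here are invented for the example.

```python
from collections import Counter, defaultdict

# A tiny "training corpus" (real LLMs train on billions of words).
corpus = (
    "on the other hand the playful puppy ran . "
    "on the other hand we saw a playful puppy . "
    "the playful puppy chased the ball ."
).split()

# Count how often each word follows each preceding word.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the word seen most often after `word` in the corpus."""
    return following[word].most_common(1)[0][0]

print(predict_next("other"))    # prints "hand": it always follows "other" here
print(predict_next("playful"))  # prints "puppy": it always follows "playful" here
```

Because “hand” follows “other” and “puppy” follows “playful” every time they appear in this toy corpus, the model reproduces those associations, just as the guide describes for full-scale LLMs.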
 
Most LLMs have been trained on publicly available texts—that is, the text available on the “scrapable” internet. These include posts from Wikipedia, Reddit, and other forums; news sites; open-access books and articles; social media; and works in the public domain. 

What information does a chatbot access when it responds to a user’s prompts? 

Whether a chatbot’s output is current (up to date) depends on the data it can access. In addition to drawing on their internal models, some chatbots browse the internet in real time to answer a prompt. If a chatbot was trained on the scrapable internet up to June 2025, and it cannot access the internet in real time, its responses reflect patterns and information up to June 2025, and not after.  
 
If a chatbot was not trained on or does not have access to current texts, some of the information in its output will not be current. Similarly, it may not have access to material behind paywalls or scholarly articles in library databases. Additionally, if texts in its training data provide unreliable or inaccurate information, the LLM’s output can pass that information on, presenting it as reliable or accurate.  

What are gen AI’s limitations and harms? 

  • Inaccuracy – Gen AI chatbots frequently make up (“fabricate”) sources, source links, and quotations from sources. They also fabricate facts and data. They present this inaccurate information in an authoritative, confident tone. For these reasons, it is crucial to verify the accuracy of a chatbot’s output.  
  • Bias – Gen AI output reproduces the biases of its training data, including cultural, racial, and gender biases and other biases rooted in stereotypes. For example, in one LLM’s training data, mental illness was frequently associated with gun violence, homelessness, and drug addiction (Hutchinson et al., 2020). This pattern would be expected to show up in the LLM’s output. Bias can also be more subtle; for example, an AI-generated summary of a scholarly article shifted authority from chronically ill people to their caregivers (Glazko et al., 2023). It is important to critically evaluate a chatbot’s output for bias.
  • Environmental damage – Training LLMs requires massive amounts of electricity, increasing CO2 emissions and putting pressure on the power grid. In addition, cooling the hardware that runs LLMs demands large amounts of water, depleting water supplies in localities where data centers are located (Zewe, 2025).  
  • Labor – The labeling and “cleaning” of LLM training data may be carried out by vulnerable, low-wage workers who view and flag violent and traumatic content.
  • Non-consensual use of training material – Texts in the public domain were used to train LLMs, but authors and owners of these texts did not give their consent or receive payment for this use. 

Mason and AI policy

Mason’s guidelines for AI use by students provide a set of principles and a decision-making framework for using AI. Students are responsible for their learning and for AI outputs they use. 
 
Mason requires instructors to state their AI use policy on their syllabi. Instructor policies vary from forbidding AI use entirely to allowing AI use for all assignments, with many variations in between. Make sure you are aware of the AI policy for each course you take, and follow that policy. Violating an instructor’s AI policy can constitute an Academic Standards violation, such as unauthorized use, fabrication, or plagiarism. 

Mason's AI Tools

Mason has released a suite of AI tools based on GPT-4o. These tools protect the data you submit to them (your queries and the materials you ask them to use or comment on) by storing it on Mason servers rather than sending it to OpenAI.  

 

References 

Gerlich, M. (2025). AI tools in society: Impacts on cognitive offloading and the future of critical thinking. Societies, 15(1), Article 6. https://doi.org/10.3390/soc15010006

Glazko, K., Yamagami, M., Desai, A., Mack, K., Potluri, V., Xu, X., & Mankoff, J. (2023). An autoethnographic case study of generative artificial intelligence’s utility for accessibility. ASSETS ’23: Proceedings of the 25th International ACM SIGACCESS Conference on Computers and Accessibility, Article 99, 1–8. https://doi.org/10.1145/3597638.3614548

Hutchinson, B., Prabhakaran, V., Denton, E., Webster, K., Zhong, Y., & Denuyl, S. (2020). Social biases in NLP models as barriers for persons with disabilities. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 5491–5501. https://doi.org/10.18653/v1/2020.acl-main.487

Zewe, A. (2025, January 17). Explained: Generative AI’s environmental impact. MIT News. https://news.mit.edu/2025/explained-generative-ai-environmental-impact-0117 

 

updated 11/1/2025