AI training data: How much do you need?

If advancing your company’s adoption of generative artificial intelligence (genAI) is one of your primary areas of focus for 2025, you’re not alone: 75 percent of executives consider AI/genAI one of their top three strategic priorities this year, according to the Boston Consulting Group (BCG) AI Radar global survey.

Many organizations have also moved away from the “plug and play” approach that was common when ChatGPT first debuted and toward deploying tailored genAI models, according to TechTarget. For example, Vivek Mishra, a senior Institute of Electrical and Electronics Engineers (IEEE) member, told TechTarget that a financial firm might use company data to train a large language model (LLM) to help clients select investments.

However, you need a substantial amount of high-quality training data to customize an LLM in that manner. If you’re a business leader thinking about implementing a bespoke genAI tool for your company, here’s what you should know about AI training data before you move forward with the project.

What is AI training data?

AI training data is the information used to teach AI systems and machine learning models, according to Uniphore. To create an effective, unbiased genAI tool that can make well-informed decisions and adapt, you must train it on a significant amount of high-quality data.

Uniphore notes that AI training data can take several forms, including but not limited to the following:

  • Videos
  • Images
  • Text
  • Audio recordings
  • Sensor readings

How much AI training data do you need to create a customized model?

The amount of AI training data required increases with model size and complexity, according to IBM. The Harvard Business Review (HBR) article “How to Train Generative AI Using Your Company’s Data” notes that building an LLM from scratch requires not only massive quantities of high-quality data but also considerable data science expertise and significant computing power. One example of a home-grown LLM is BloombergGPT, which Bloomberg data scientists built using 700 billion tokens (approximately 350 billion words), 50 billion parameters and 1.3 million hours of graphics processing unit (GPU) time.

“Few companies have those resources available,” the HBR article notes.
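To put the BloombergGPT figures in perspective, the short Python calculation below divides them out. It uses only the numbers quoted above; the 365-day year is a simplifying assumption, and the results are rough orders of magnitude, not a sizing formula for your own project.

```python
# BloombergGPT figures as quoted in the HBR article.
TOKENS = 700e9       # training tokens
PARAMETERS = 50e9    # model parameters
GPU_HOURS = 1.3e6    # graphics processing unit time, in hours

# Simplifying assumption: 365-day years, no leap days.
HOURS_PER_YEAR = 24 * 365


def tokens_per_parameter(tokens: float, parameters: float) -> float:
    """Training tokens available per model parameter."""
    return tokens / parameters


def gpu_years(gpu_hours: float) -> float:
    """Convert GPU hours into years of a single GPU running nonstop."""
    return gpu_hours / HOURS_PER_YEAR


if __name__ == "__main__":
    print(f"Tokens per parameter: {tokens_per_parameter(TOKENS, PARAMETERS):.0f}")
    print(f"Single-GPU years: {gpu_years(GPU_HOURS):.0f}")
```

Run as a script, it reports about 14 training tokens per parameter and roughly 148 years of compute for a single GPU running nonstop, which helps explain why few organizations attempt this.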

Options for organizations with limited AI training data

If you want to create a customized LLM based on your organization’s data but don’t have enough data to train one from scratch, HBR describes a few other options. One is fine-tuning: adjusting some of the parameters of an existing base model, which usually requires only hundreds or thousands of documents instead of millions or billions. Another is prompt tuning, which leaves the existing LLM unchanged and supplies company- or industry-specific knowledge through prompts so the model can answer questions about your business.
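The prompt-based option can be illustrated with a minimal sketch, assuming a small set of company documents is available as plain text. It covers only the prompt-assembly step: the model itself is never modified; relevant company information is simply placed ahead of the user's question. The document set, the keyword-overlap ranking (a crude stand-in for whatever search a production system would use) and the function names are all hypothetical, and sending the prompt to an LLM is omitted because that depends on your vendor's API.

```python
import re

# Hypothetical company documents; a real deployment would draw on your own knowledge base.
DOCUMENTS = {
    "returns-policy": "Clients may return unused hardware within 30 days for a full refund.",
    "support-hours": "Our help desk is staffed 24/7 for managed IT service clients.",
    "onboarding": "New clients receive a network assessment during the first week.",
}


def words(text: str) -> set[str]:
    """Lowercase word set used for a crude keyword-overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_documents(question: str, docs: dict[str, str], k: int = 2) -> list[str]:
    """Return the k documents that share the most words with the question."""
    q = words(question)
    ranked = sorted(docs.values(), key=lambda d: len(q & words(d)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, docs: dict[str, str], k: int = 2) -> str:
    """Put company-specific context ahead of the question; the base model is unchanged."""
    context = "\n".join(f"- {d}" for d in top_documents(question, docs, k))
    return (
        "Answer using only the company information below.\n"
        f"Company information:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    print(build_prompt("When is the help desk staffed?", DOCUMENTS, k=1))
```

For the sample question, the help desk document shares the most words with the question, so it is the one placed in the prompt. Fine-tuning, by contrast, would change the model's parameters and needs a curated training set instead of a prompt template.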

You can also partner with an AI solution supplier or managed IT service provider that has already built models for your target workflows or industry using data from its client base. This lets you benefit from customized AI models even if you lack the resources to create one internally.

Our trusted technology advisors can help you craft an AI strategy that aligns with your organization’s unique needs and objectives. We maintain partnerships with leading suppliers of genAI tools and can assist you in streamlining operations across various departments, including your contact center, customer service, sales and marketing, IT support, facilities and human resources.

Additionally, we can grant you access to an AI success kit from one of our partners with tools to kickstart your journey, in-depth analyst reports, a genAI use policy template, and other valuable resources.

Start today by calling 877-599-3999 or emailing sales@stratospherenetworks.com.
