Monday 24 July 2023

Comparing Commercial and Open-Source Options: ChatGPT vs. Build for Enterprise

Consideration 2: Cost Comparison between ChatGPT and Open-source LLMs

Option 1: ChatGPT API Costs

The ChatGPT API costs $0.002 per 1,000 tokens, where a token is roughly three-quarters of a word. To estimate the cost of processing customer queries with ChatGPT, assume a question-and-answer pair is around 500 words, or roughly 666 tokens. If a company answers 5,000 customer queries per day, the cost would be ($0.002 / 1,000) × 666 × 5,000 ≈ $6.7 per day, or about $200 a month.

However, if each customer requires 4-5 prompts to get the right answer, the cost can skyrocket. For example, if a contact center for a major brand receives 200,000 queries per day, the cost would amount to around $500,000 per year. This makes ChatGPT quite expensive for enterprise businesses, which explains why venture capitalists are investing heavily in “ChatGPT for X” ideas.
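These back-of-the-envelope figures are easy to reproduce. A minimal sketch using the per-token price and query size from the text (the 4.5 prompts-per-query figure is an assumed midpoint of the 4-5 range):

```python
PRICE_PER_1K_TOKENS = 0.002   # gpt-3.5-turbo rate cited above
TOKENS_PER_QUERY = 666        # ~500-word question-and-answer pair

def api_cost_per_day(queries_per_day: int, prompts_per_query: float = 1.0) -> float:
    """Daily ChatGPT API cost in dollars for a given query volume."""
    tokens = queries_per_day * prompts_per_query * TOKENS_PER_QUERY
    return tokens / 1000 * PRICE_PER_1K_TOKENS

print(api_cost_per_day(5_000))               # ≈ $6.7/day for the small-business case
print(api_cost_per_day(200_000, 4.5) * 365)  # ≈ $440k/year for the contact-center case
```

The annualized contact-center figure lands in the same ballpark as the ~$500,000 cited above.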

Option 2: Open-source Large Language Models

2a: Open-source LLM Costs: Factors and Model Dependency

For enterprises looking into open-source LLMs, there are several options available. Meta (Facebook) developed LLaMA, a family of models ranging from 7 to 65 billion parameters. LLaMA's 13-billion-parameter model outperformed the much larger 175-billion-parameter GPT-3 on most NLP benchmarks. Stanford's Alpaca model, an instruction-tuned version of the 7B LLaMA, has also been reported to outperform GPT-3.

While the open-source models themselves are free to use, the infrastructure required to host and deploy them is not. Recent LLMs like LLaMA are far more resource-intensive than earlier models like BERT, and their computational cost scales with both parameter count and token count. Total compute can be estimated by multiplying the number of tokens processed by the number of model parameters and a small constant: roughly 2 FLOPs per parameter per token for inference (a forward pass) and roughly 6 for training (forward plus backward pass).
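The factor-of-6 training estimate can be turned into a rough dollar figure. This is a sketch under stated assumptions: A100-class peak throughput, 30% hardware utilization, and $2 per GPU-hour are illustrative numbers, not quoted prices.

```python
def training_flops(params: float, tokens: float) -> float:
    # ~6 FLOPs per parameter per training token (forward + backward pass)
    return 6 * params * tokens

def training_cost_usd(params: float, tokens: float,
                      peak_flops: float = 312e12,    # assumed A100 bf16 peak
                      utilization: float = 0.30,     # assumed hardware utilization
                      usd_per_gpu_hour: float = 2.0) -> float:
    gpu_seconds = training_flops(params, tokens) / (peak_flops * utilization)
    return gpu_seconds / 3600 * usd_per_gpu_hour

# GPT-3 scale: 175B parameters, ~300B training tokens
print(round(training_cost_usd(175e9, 300e9)))  # on the order of $2M
```

Under these assumptions the GPT-3-scale estimate lands inside the $0.5-5 million range the article cites.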

Transformers, the architecture behind LLMs like GPT-3 and BERT, have predictable compute requirements for inference and training. A forward pass takes approximately 2 × n × p floating-point operations (FLOPs), where n is the combined length of the input and output sequences in tokens and p is the number of parameters; that is, about 2p FLOPs per token. Memory requirements likewise vary with model size.
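The forward-pass estimate can be sketched the same way, here for the 666-token Q&A pair from the earlier pricing example run through a hypothetical 13-billion-parameter model:

```python
def forward_pass_flops(seq_len_tokens: int, params: float) -> float:
    # ~2 FLOPs per parameter per token in a transformer forward pass
    return 2 * seq_len_tokens * params

print(forward_pass_flops(666, 13e9))  # ≈ 1.7e13 FLOPs per request
```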

Based on these calculations, the cost of training and inference for BERT and GPT-3 can range from $0.5–5 million. This significant investment makes these models affordable only for large corporations or well-funded startups.

2b: Architecture for Deploying Open-Source Models

To host and deploy open-source LLMs, enterprises can leverage cloud providers like AWS, Google Cloud, and Azure, or smaller providers such as Lambda Labs. Many companies already have existing relationships with these providers, making them an attractive option. Using AWS as an example, hosting an open-source model and serving it as an API involves four steps:

1. A customer request passes through Amazon API Gateway.
2. API Gateway triggers an AWS Lambda function, which forwards the request to an Amazon SageMaker endpoint.
3. The model hosted at the SageMaker endpoint is invoked and returns a response.
4. SageMaker costs depend on the type of computing instance required; LLMs typically need large instances.
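The request path above can be sketched as a minimal Lambda handler. This is an illustrative sketch, not a production setup: the endpoint name, event shape, and payload schema are assumptions (the JSON body shown matches typical Hugging Face text-generation containers, which may differ from a given deployment).

```python
import json

def build_payload(prompt: str, max_new_tokens: int = 256) -> bytes:
    # Request body schema assumed for a text-generation container
    return json.dumps({
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
    }).encode("utf-8")

def lambda_handler(event, context):
    # boto3 is imported lazily so the sketch can be read without AWS access
    import boto3
    client = boto3.client("sagemaker-runtime")
    response = client.invoke_endpoint(
        EndpointName="flan-ul2-endpoint",  # hypothetical endpoint name
        ContentType="application/json",
        Body=build_payload(event["prompt"]),
    )
    return {"statusCode": 200, "body": response["Body"].read().decode("utf-8")}
```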

For instance, deploying the 20-billion-parameter model Flan UL2 on an ml.g5.4xlarge instance costs approximately $5-6 per hour. On top of that come Lambda and API Gateway charges, at roughly $10 and $1 per million requests, respectively. Ultimately, hosting an open-source LLM like Flan UL2 on AWS works out to roughly $150 per day at 1,000 requests per day, or about $160 per day at 1 million requests per day on a single instance; since one instance cannot realistically serve that volume, scaling out pushes annual costs toward the same ~$500,000 figure seen with the ChatGPT API.
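With these round numbers, the hosting bill is dominated by the always-on instance rather than the per-request charges. A minimal sketch, assuming a single instance and the approximate rates from the text:

```python
INSTANCE_USD_PER_HOUR = 5.5   # ml.g5.4xlarge, midpoint of the $5-6/hr estimate
LAMBDA_USD_PER_M_REQ = 10.0   # approximate Lambda cost per million requests
APIGW_USD_PER_M_REQ = 1.0     # approximate API Gateway cost per million requests

def hosting_cost_per_day(requests_per_day: int, instances: int = 1) -> float:
    compute = instances * INSTANCE_USD_PER_HOUR * 24
    per_request = requests_per_day / 1e6 * (LAMBDA_USD_PER_M_REQ + APIGW_USD_PER_M_REQ)
    return compute + per_request

print(hosting_cost_per_day(1_000))      # ≈ $132/day: nearly all instance cost
print(hosting_cost_per_day(1_000_000))  # ≈ $143/day on one (overloaded) instance
```

Note that going from 1,000 to 1,000,000 requests barely moves the single-instance bill; it is the additional instances needed to actually serve that traffic that drive costs up.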

Option 3: Costs for Smaller Open-Source Language Models

For simpler tasks, smaller language models like BERT, with hundreds of millions of parameters, are often sufficient. These can be trained using cheaper instances like ml.m5.xlarge, which costs around $0.23 per hour, or about $5.50 per day. While they lack the general-purpose breadth of ChatGPT or GPT-4, they excel at narrower applications.

Consideration 3: Using Quantized Fine-Tuning Methods Such as QLoRA

A new method called QLoRA allows LLMs to be trained and fine-tuned on consumer GPUs. QLoRA uses 4-bit quantization to compress a pre-trained language model, reducing memory usage without sacrificing performance. It backpropagates gradients through the frozen, quantized pre-trained model into Low-Rank Adapters (LoRA) added during fine-tuning. This enables fine-tuning a 33-billion-parameter model on a single 24GB GPU, or a 65-billion-parameter model on a single 48GB GPU, reducing costs significantly compared to training and hosting larger models.
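The memory savings from 4-bit quantization are easy to sanity-check. A sketch that counts only the weight storage, ignoring activations, quantization constants, and the LoRA adapters themselves:

```python
def weight_memory_gb(params: float, bits_per_weight: int) -> float:
    # Memory for the model weights alone, in gigabytes (1 GB = 1e9 bytes here)
    return params * bits_per_weight / 8 / 1e9

print(weight_memory_gb(33e9, 16))  # 66.0 GB in fp16: far too big for a 24GB card
print(weight_memory_gb(33e9, 4))   # 16.5 GB at 4-bit: fits on a single 24GB GPU
```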

In conclusion, when considering the cost comparison between ChatGPT and open-source LLMs, it is important to analyze factors such as API costs, training and inference complexities, hosting infrastructure, and the specific requirements of the task at hand. Each option has its own advantages and disadvantages, and enterprises must evaluate which best fits their needs and budget.

Editor Notes:

Overall, this article provides valuable insights into the cost comparison between ChatGPT and open-source LLMs. It highlights the importance of considering various factors, including API costs, model complexity, and hosting infrastructure. The author provides clear explanations and offers practical examples to support their points. However, it would be beneficial to include more information on the potential benefits and drawbacks of each option to help readers make informed decisions.

To explore more AI-related topics and stay updated on the latest developments, visit GPT News Room.

