
SOLAR-10.7B Fine-Tuned Model Tutorial

A complete guide to using the SOLAR-10.7B fine-tuned model for instruction-based tasks in a real-world scenario.
Updated Jan 2024  · 13 min read

The SOLAR-10.7B project represents a significant leap forward in the development of large language models, introducing a new approach to scaling these models in an effective and efficient manner.

This article starts by explaining what the SOLAR-10.7B model is before highlighting its performance against other large language models and diving into the process of using its specialized fine-tuned version. Lastly, the reader will understand the potential applications of the fine-tuned SOLAR-10.7B-Instruct model and its limitations.

What is SOLAR-10.7B?

SOLAR-10.7B is a 10.7-billion-parameter model developed by a team at Upstage AI in South Korea.

Based on the Llama-2 architecture, this model outperforms other large language models of up to 30 billion parameters, including the Mixtral 8x7B model.

To learn more about Llama-2, our article Fine-Tuning LLaMA 2: A Step-by-Step Guide to Customizing the Large Language Model provides a step-by-step guide to fine-tune Llama-2, using new approaches to overcome memory and computing limitations for better access to open-source large language models.

Furthermore, building upon the robust foundation of SOLAR-10.7B, the SOLAR-10.7B-Instruct model is fine-tuned with an emphasis on following complex instructions. This variant demonstrates enhanced performance, showcasing the model's adaptability and the effectiveness of fine-tuning in achieving specialized objectives.

Finally, SOLAR-10.7B introduces a method called Depth Up-Scaling, which we explore further in the following section.

The Depth Up-Scaling method

This innovative method allows for the expansion of the model's neural network depth without requiring a corresponding increase in computational resources. Such a strategy enhances both the efficiency and the overall performance of the model.

Essential elements of Depth Up-Scaling

Depth Up-Scaling is based on three main components: (1) the Mistral 7B weights, (2) the Llama 2 framework, and (3) continued pretraining.

Depth up-scaling for the case with n = 32, s = 48, and m = 8. Depth up-scaling is achieved through a dual-stage process of depthwise scaling followed by continued pretraining. (Source)

Base Model:

  • Utilizes a 32-layer transformer architecture, specifically the Llama 2 model, initialized with pre-trained weights from Mistral 7B.
  • Chosen for its compatibility and performance, aiming to leverage community resources and introduce novel modifications for enhanced capabilities.
  • Serves as the foundation for depthwise scaling and further pretraining to scale up efficiently.

Depthwise Scaling:

  • Scales the base model by setting a target layer count for the scaled model, considering hardware capabilities.
  • Involves duplicating the base model, removing the final m layers from the original and the initial m layers from the duplicate, then concatenating the two to form a model with s layers.
  • This yields a scaled model sized to fit between 7 and 13 billion parameters: starting from a base of n=32 layers and removing m=8 layers from each copy gives s = 2 × (32 − 8) = 48 layers (the arithmetic is sketched in code after this list).
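To make the layer arithmetic concrete, here is a minimal sketch of the depthwise-scaling step applied to a list of layer indices. It is purely illustrative: the real operation acts on transformer blocks and their weights, not integers.

n = 32  # number of layers in the base model
m = 8   # number of layers trimmed at the seam

base = list(range(n))   # layer indices 0..31 of the base model
top = base[: n - m]     # original copy without its final m layers (24 layers)
bottom = base[m:]       # duplicate copy without its initial m layers (24 layers)

scaled = top + bottom   # concatenated, depthwise-scaled stack
print(len(scaled))      # s = 2 * (n - m) = 48

Continued pretraining then repairs the mismatch introduced at the seam where the two copies meet.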

Continued Pretraining:

  • Addresses the initial performance drop after depthwise scaling by continuing to pre-train the scaled model.
  • Rapid performance recovery was observed during continued pre-training, which was attributed to reducing heterogeneity and discrepancies in the model.
  • Continued pre-training is crucial for regaining and potentially surpassing the base model's performance, leveraging the depthwise scaled model's architecture for effective learning.

These summaries highlight the key strategies and outcomes of the Depth Up-Scaling approach, focusing on leveraging existing models, scaling through depth adjustment, and enhancing performance through continued pretraining.

Through this multifaceted approach, SOLAR-10.7B achieves and, in many cases, exceeds the capabilities of much larger models. This efficiency makes it a prime choice for a range of specific applications, showcasing its strength and flexibility.

How Does the SOLAR-10.7B-Instruct Model Work?

SOLAR-10.7B-Instruct excels at interpreting and executing complex instructions, making it incredibly valuable in scenarios where precise understanding of and responsiveness to human commands are crucial. This capability is essential for developing more intuitive and interactive AI systems.


  • SOLAR-10.7B-Instruct results from fine-tuning the original SOLAR-10.7B model to follow instructions in QA format (an illustrative example follows this list).
  • The fine-tuning mostly uses open-source datasets along with synthesized math QA datasets to enhance the model’s mathematical skills.
  • The underlying SOLAR-10.7B base model integrates the Mistral 7B weights to strengthen its learning capabilities for effective and efficient information processing.
  • The backbone of SOLAR-10.7B is the Llama 2 architecture, which provides a blend of speed and accuracy.
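For illustration, an instruction-following QA training pair has roughly the shape shown below. This is a hypothetical example written for this tutorial, not a record drawn from the actual fine-tuning datasets:

# Hypothetical instruction/response pair in the QA style used for instruction tuning.
qa_example = {
    "instruction": "What is the derivative of x**2 with respect to x?",
    "response": "The derivative of x**2 with respect to x is 2*x.",
}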

Overall, the fine-tuned SOLAR-10.7B model's importance lies in its enhanced performance, adaptability, and potential for widespread application, driving forward the fields of natural language processing and artificial intelligence.

Potential Applications of the Fine-Tuned SOLAR-10.7B Model

Before diving into the technical implementation, let’s explore some of the potential applications of a fine-tuned SOLAR-10.7B model.

Below are some examples of personalized education and tutoring, enhanced customer support, and automated content creation.

  • Personalized education and tutoring: SOLAR-10.7B-Instruct can revolutionize the educational sector by providing personalized learning experiences. It can understand complex student queries, offering tailored explanations, resources, and exercises. This capability makes it an ideal tool for developing intelligent tutoring systems that adapt to individual learning styles and needs, enhancing student engagement and outcomes.

  • Enhanced customer support: SOLAR-10.7B-Instruct can power advanced chatbots and virtual assistants capable of understanding and resolving complex customer inquiries with high accuracy. This application not only improves customer experience by providing timely and relevant support but also reduces the workload on human customer service representatives by automating routine queries.

  • Automated content creation and summarization: For media and content creators, SOLAR-10.7B-Instruct offers the ability to automate the generation of written content, such as news articles, reports, and creative writing. Additionally, it can summarize extensive documents into concise, easy-to-understand formats, making it invaluable for journalists, researchers, and professionals who need to quickly assimilate and report on large volumes of information.


These examples underscore the versatility and potential of SOLAR-10.7B-Instruct to impact and improve efficiency, accessibility, and user experience across a broad spectrum of industries.

A Step-By-Step Guide to Using SOLAR-10.7B-Instruct

We have enough background about the SOLAR-10.7B model, and it is time to get our hands dirty.

This section aims to provide all the instructions needed to run the SOLAR-10.7B-Instruct v1.0 model from Upstage.

The code is inspired by the official documentation on Hugging Face. The main steps are defined below:

  • Install and import the necessary libraries
  • Define the SOLAR-10.7B model to use from Hugging Face
  • Run the model inference
  • Generate the result from the user's request

Libraries Installation

The main libraries used are transformers and accelerate.

  • The transformers library provides access to pre-trained models, and the version specified here is 4.35.2.
  • The accelerate library is designed to simplify running machine learning models on different hardware (CPUs, GPUs) without having to deeply understand the hardware specifics.
%%bash
pip -q install transformers==4.35.2
pip -q install accelerate
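Once the installation completes, a quick sanity check confirms that the pinned versions were picked up:

import transformers
import accelerate

print(transformers.__version__)  # expected: 4.35.2
print(accelerate.__version__)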

Import Libraries

Now that the installation is completed, we proceed by importing the following necessary libraries:

  • torch is the PyTorch library, a popular open-source machine learning library used for applications like computer vision and NLP.
  • AutoModelForCausalLM is used to load a pre-trained model for causal language modeling, and AutoTokenizer is responsible for converting text into a format that the model can understand (tokenization).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

GPU Configuration

The model being used is version 1 of the SOLAR-10.7B-Instruct model from Hugging Face.

A GPU resource is necessary for speeding up the model loading and inference process.

Access to a GPU on Google Colab is illustrated below:

  • From the Runtime menu, select Change runtime type
  • Then, choose T4 GPU from the Hardware accelerator section and save the changes

This will switch the default runtime to T4:

GPU access from Google Colab

We can check the properties of the runtime by running the following command from the Colab notebook.

!nvidia-smi
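We can also confirm from Python that PyTorch detects the GPU:

import torch

print(torch.cuda.is_available())           # True when a CUDA GPU is visible

if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))   # e.g. "Tesla T4" on the Colab runtime used here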

GPU properties

Model definition

Everything is set up; we can now proceed with loading the model as follows:

model_ID = "Upstage/SOLAR-10.7B-Instruct-v1.0"

tokenizer = AutoTokenizer.from_pretrained(model_ID)
model = AutoModelForCausalLM.from_pretrained(
	model_ID,
	device_map="auto",
	torch_dtype=torch.float16,
)
  • model_ID is a string that uniquely identifies the pre-trained model we want to use. In this case, "Upstage/SOLAR-10.7B-Instruct-v1.0" is specified.
  • AutoTokenizer.from_pretrained(model_ID) loads a tokenizer pre-trained on the model_ID specified, preparing it for processing text input.
  • AutoModelForCausalLM.from_pretrained() loads the causal language model itself, with device_map="auto" to automatically use the best available hardware (the GPU we have set up), and torch_dtype=torch.float16 for using 16-bit floating-point numbers to save memory and potentially speed up computations.
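As an optional sanity check, we can inspect the loaded model using standard transformers config and model attributes. The sketch below assumes the model loaded above; the layer count reflects the depth-up-scaled architecture:

# The depth-up-scaled architecture should report s = 48 transformer layers.
print(model.config.num_hidden_layers)

# Approximate memory used by the fp16 weights, in gigabytes.
print(round(model.get_memory_footprint() / 1e9, 2), "GB")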

Model inference

Before generating a response, the input text (user's request) is formatted and tokenized.

  • user_request contains the question or input for the model.
  • conversation formats the input as part of a conversation, tagging it with a role (e.g., 'user').
  • apply_chat_template applies a conversational template to the input, preparing it for the model in a format it understands.
  • tokenizer(prompt, return_tensors="pt") converts the prompt into tokens and specifies the tensor type ("pt" for PyTorch tensors), and .to(model.device) ensures the input is on the same device (CPU or GPU) as the model.
user_request = "What is the square root of 24?"

conversation = [ {'role': 'user', 'content': user_request} ]

prompt = tokenizer.apply_chat_template(conversation, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
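It can be helpful to print the formatted prompt and the tokenized shape before generation; the exact wrapper markers depend on the chat template bundled with the tokenizer:

print(prompt)                     # the user message wrapped in the model's chat template
print(inputs["input_ids"].shape)  # torch.Size([1, number_of_prompt_tokens])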

Result generation

The final section uses the model to generate a response to the input question and then decodes and prints the generated text.

  • model.generate() generates text based on the provided inputs, with use_cache=True to speed up generation by reusing previously computed results. max_length=4096 limits the maximum length of the generated text.
  • tokenizer.decode(outputs[0]) converts the generated tokens back into human-readable text.
  • print statement displays the generated answer to the user's question.
outputs = model.generate(**inputs, use_cache=True, max_length=4096)
output_text = tokenizer.decode(outputs[0])
print(output_text)

The successful execution of the code above generates the model's answer to the question.

By replacing the user request with the following text, the model generates a new response:

user_request = "Tell me a story about the universe"
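To make it easier to try out different prompts, the generation steps above can be wrapped in a small helper function. This is a sketch that reuses the model and tokenizer loaded earlier; the helper name generate_response is ours, and skip_special_tokens simply strips template tokens from the decoded output:

def generate_response(user_request, max_length=4096):
    """Format a single-turn conversation, run generation, and return the decoded text."""
    conversation = [{"role": "user", "content": user_request}]
    prompt = tokenizer.apply_chat_template(
        conversation, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, use_cache=True, max_length=max_length)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)


print(generate_response("Tell me a story about the universe"))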

Limitations of the SOLAR-10.7B model

Despite all the benefits of the SOLAR-10.7B, it has its own limitations like any other large language model, and the main ones are highlighted below:

  • Limited hyperparameter exploration: a more thorough exploration of hyperparameters during Depth Up-Scaling (DUS) was not possible; for instance, 8 layers were removed from each copy of the base model largely because of hardware constraints rather than an exhaustive search.
  • Computationally demanding: the model requires significant computational resources, which limits its use by individuals and organizations with lower computational capabilities.
  • Vulnerability to bias: potential bias in the training data could impact the performance of the model for some use cases.
  • Environmental concerns: training and running inference with the model require significant energy, which can raise environmental concerns.

Conclusion

This article has explored the SOLAR-10.7B model, highlighting its contribution to artificial intelligence through the depth up-scaling approach. It has outlined the model’s operation and potential applications, and provided a practical guide for its use, from installation to generating results.

Despite its capabilities, the article also addressed the SOLAR-10.7B model's limitations, ensuring a well-rounded perspective for users. As AI continues to evolve, SOLAR-10.7B exemplifies the strides made toward more accessible and versatile AI tools.

For those looking to delve deeper into AI's potential, our tutorial, FLAN-T5 Tutorial: Guide and Fine-Tuning, offers a complete guide to fine-tuning a FLAN-T5 model for a question-answering task using transformers library and running optimized inference on a real-world scenario. You can also find our Fine-Tuning GPT-3.5 tutorial and our code-along on fine-tuning your own LlaMA 2 model.


Author
Zoumana Keita

Zoumana develops LLM AI tools to help companies conduct sustainability due diligence and risk assessments. He previously worked as a data scientist and machine learning engineer at Axionable and IBM. Zoumana is the founder of the peer learning education technology platform ETP4Africa. He has written over 20 tutorials for DataCamp.
