Welcome, and feel free to star ⭐ the project. 🚀 A one-stop solution for creating digital avatars from chat records. 💡 Fine-tune a large language model on your WeChat chat history to give the model “that flavor” of you, then bind it to a chatbot to bring your digital avatar to life. Keywords: digital cloning / digital avatar / digital immortality / voice cloning / LLM / large language model / WeChat chatbot / LoRA
✨ Core Features
- Visit the official website: https://github.com/xming521/WeClone
- 💫 Covers the full pipeline for creating a digital avatar, including chat data export, preprocessing, model training, and deployment
- 💬 Fine-tune an LLM on WeChat chat records to give the model “that flavor”
- 🔗 Bind it to WeChat, QQ, Telegram, WeCom (Qiwei), or Lark bots to bring your digital avatar to life
- 🛡️ Privacy filtering and fully local fine-tuning and deployment keep your data secure and under your control
📋 Features and description
Version 0.2.1 adds command-line tools. You need to re-run `uv pip install -e .` before use.
Version 0.2.0 was completely refactored; the dataset directory and script paths have all changed. After pulling the new code, place the `csv` folder under `dataset` and reinstall the dependencies.
- WeClone is still in a rapid iteration phase; current results do not represent the final quality.
- The effect of fine-tuning an LLM depends largely on the model size and the amount and quality of the chat data. In theory, the bigger the model and the more data, the better the result.
- The Windows environment has not been strictly tested; WSL can be used as the runtime environment.
Hardware requirements
The project uses the Qwen2.5-7B-Instruct model by default, fine-tuned with LoRA in the SFT stage, which requires about 16GB of VRAM. Other models and methods supported by LLaMA Factory can also be used.
Estimated VRAM required:
| Method | Precision | 7B | 14B | 30B | 70B | x B |
|---|---|---|---|---|---|---|
| Full (bf16 or fp16) | 32 | 120GB | 240GB | 600GB | 1200GB | 18x GB |
| Full (pure_bf16) | 16 | 60GB | 120GB | 300GB | 600GB | 8x GB |
| Freeze/LoRA/GaLore/APOLLO/BAdam | 16 | 16GB | 32GB | 64GB | 160GB | 2x GB |
| QLoRA | 8 | 10GB | 20GB | 40GB | 80GB | x GB |
| QLoRA | 4 | 6GB | 12GB | 24GB | 48GB | x/2 GB |
| QLoRA | 2 | 4GB | 8GB | 16GB | 24GB | x/4 GB |
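For a quick back-of-envelope check, the last column of the table can be read as GB of VRAM per billion parameters. A minimal sketch (the constants come straight from the table; actual usage also depends on sequence length, batch size, and optimizer state):

```python
# GB of VRAM per billion parameters, taken from the table's "x B" column.
# Treat the result as a rough estimate, not a guarantee.
GB_PER_B_PARAMS = {
    "full_amp_32": 18.0,     # Full (bf16 or fp16)
    "full_pure_bf16": 8.0,   # Full (pure_bf16)
    "freeze_lora_16": 2.0,   # Freeze/LoRA/GaLore/APOLLO/BAdam
    "qlora_8": 1.0,
    "qlora_4": 0.5,
    "qlora_2": 0.25,
}

def estimate_vram_gb(params_in_billions: float, method: str) -> float:
    """Estimate fine-tuning VRAM in GB from the table above."""
    return params_in_billions * GB_PER_B_PARAMS[method]

# Default setup: Qwen2.5-7B-Instruct with LoRA -> 14GB, close to the ~16GB noted above.
print(estimate_vram_gb(7, "freeze_lora_16"))
```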
Environment Setup
1. CUDA installation (skip if already installed; version 12.4 or above required): see LLaMA Factory.
2. It is recommended to use uv, a very fast Python environment manager, to install the dependencies. After installing uv, create a new Python environment and install the dependencies with the following commands. Note that this does not include the dependencies for the voice-cloning feature:

```bash
git clone https://github.com/xming521/WeClone.git
cd WeClone
uv venv .venv --python=3.10
source .venv/bin/activate  # on Windows: .venv\Scripts\activate
uv pip install --group main -e .
```
If you want to fine-tune with the latest models, manually install the latest version of LLaMA Factory with `uv pip install --upgrade git+https://github.com/hiyouga/LLaMA-Factory.git`; other dependency versions, such as vllm, pytorch, and transformers, may also need to be updated.
3. Copy the configuration file template and rename it to `settings.jsonc`. All subsequent configuration changes are made in this file:

```bash
cp settings.template.json settings.jsonc
```
Note
Training- and inference-related configurations are unified in the file `settings.jsonc`.
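Since `settings.jsonc` may contain `//` comments, Python's standard `json` module cannot parse it directly. If you want to script against the config, here is a minimal loader sketch (it assumes no trailing commas; a third-party parser such as `json5` handles full JSONC):

```python
import json

def load_jsonc(path: str) -> dict:
    """Parse a JSONC file by dropping // and /* */ comments outside strings."""
    text = open(path, encoding="utf-8").read()
    out, i, in_str = [], 0, False
    while i < len(text):
        c = text[i]
        if in_str:
            out.append(c)
            if c == "\\" and i + 1 < len(text):  # keep escape sequences intact
                out.append(text[i + 1])
                i += 1
            elif c == '"':
                in_str = False
        elif c == '"':
            in_str = True
            out.append(c)
        elif text.startswith("//", i):           # line comment: skip to newline
            nl = text.find("\n", i)
            i = len(text) if nl == -1 else nl
            continue
        elif text.startswith("/*", i):           # block comment: skip past */
            end = text.find("*/", i)
            i = len(text) if end == -1 else end + 2
            continue
        else:
            out.append(c)
        i += 1
    return json.loads("".join(out))

config = load_jsonc("settings.jsonc")
print(list(config.keys()))
```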
4. Use the following command to test whether the CUDA environment is correctly configured and recognized by PyTorch (not needed on Mac):

```bash
python -c "import torch; print('CUDA available:', torch.cuda.is_available());"
```
5. (Optional) Install FlashAttention to accelerate training and inference: `uv pip install flash-attn --no-build-isolation`
Data preparation
Please use PyWxDump to extract the WeChat chat records. You can first migrate (back up) your phone's chat records to the computer, which yields more data. After downloading the software and decrypting the database, click Chat Backup and set the export type to CSV. You can export multiple contacts (group chat records are not recommended). Then move the `csv` folder from the exported `wxdump_tmp/export` directory into `./dataset`, i.e. put the chat-record folders of different people together under `./dataset/csv`.
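As a quick sanity check before preprocessing, you can count what actually landed in `./dataset/csv`. A small sketch (PyWxDump's CSV column layout varies by version, so this only counts rows per file):

```python
# Count exported messages per contact under ./dataset/csv.
import csv
from pathlib import Path

for csv_file in sorted(Path("./dataset/csv").rglob("*.csv")):
    with open(csv_file, encoding="utf-8", newline="") as f:
        rows = max(sum(1 for _ in csv.reader(f)) - 1, 0)  # minus the header row
    print(f"{csv_file}: {rows} rows")
```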
Data preprocessing
- By default, the project removes mobile phone numbers, ID numbers, email addresses, and URLs from the data. It also provides a blocked-word list `blocked_words` in `settings.jsonc`, to which you can add words and sentences that need to be filtered (by default, the entire sentence containing a blocked word is removed).
🚨 Please be sure to protect your personal privacy and do not disclose personal information!
- Execute the following command to process the data. You can modify `make_dataset_args` in `settings.jsonc` to match your own chat style.

```bash
weclone-cli make-dataset
```
- Currently, only the time-window strategy is supported: `single_combine_time_window` joins consecutive messages from the same person into one sentence with commas, and `qa_match_time_window` matches question-answer pairs (see the sketch after this list).
- You can enable `enable_clean` in `clean_dataset` to clean the data for better results. Currently an LLM judge is used to score the chat records, with vLLM doing offline inference. After obtaining the LLM score distribution, adjust `accept_score` to select an acceptable score threshold, and then appropriately reduce the `lora_dropout` parameter in `train_sft_args` to improve fitting.
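To make the two window parameters concrete, here is a self-contained sketch of the strategy described above. The field names (`sender`, `ts`) and the window defaults are illustrative, not WeClone's actual internals:

```python
from dataclasses import dataclass

@dataclass
class Msg:
    sender: str   # "me" or "other" (illustrative labels)
    text: str
    ts: float     # unix timestamp, seconds

BLOCKED_WORDS = ["example-secret"]  # plays the role of blocked_words in settings.jsonc

def preprocess(msgs: list[Msg], single_window: float = 60, qa_window: float = 300) -> list[dict]:
    """Illustrative pipeline: filter blocked words -> merge runs -> match QA pairs."""
    # 1. Drop the whole message if it contains a blocked word (the default behavior).
    msgs = [m for m in msgs if not any(w in m.text for w in BLOCKED_WORDS)]

    # 2. single_combine_time_window: join consecutive messages from the same
    #    sender that arrive within the window into one comma-separated sentence.
    merged: list[Msg] = []
    for m in msgs:
        if merged and merged[-1].sender == m.sender and m.ts - merged[-1].ts <= single_window:
            merged[-1] = Msg(m.sender, merged[-1].text + "，" + m.text, m.ts)
        else:
            merged.append(m)

    # 3. qa_match_time_window: treat an "other" message followed by my reply
    #    within the window as one question-answer training pair.
    return [
        {"instruction": q.text, "output": a.text}
        for q, a in zip(merged, merged[1:])
        if q.sender == "other" and a.sender == "me" and a.ts - q.ts <= qa_window
    ]
```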
Model Download

```bash
git lfs install
git clone https://www.modelscope.cn/Qwen/Qwen2.5-7B-Instruct.git
```
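Alternatively, if you prefer the ModelScope Python SDK over git-lfs, something like this should fetch the same model (assumes `pip install modelscope`; the returned path is what `model_name_or_path` should point to):

```python
# Optional alternative to the git clone above: download via ModelScope's SDK.
from modelscope import snapshot_download

model_dir = snapshot_download("Qwen/Qwen2.5-7B-Instruct")
print(model_dir)  # use this local path as model_name_or_path in settings.jsonc
```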
Configure parameters and fine-tune the model
- (Optional) Modify `model_name_or_path` and `template` in `settings.jsonc` to select another locally downloaded model.
- Modify `per_device_train_batch_size` and `gradient_accumulation_steps` to adjust VRAM usage (their product is the effective batch size: for example, a per-device batch of 2 with 8 accumulation steps trains on 16 samples per optimizer step).
- You can modify parameters in `train_sft_args` such as `num_train_epochs`, `lora_rank`, and `lora_dropout` according to the quantity and quality of your own dataset.
Single-GPU training

```bash
weclone-cli train-sft
```

To train on one GPU in a multi-GPU environment, first execute `export CUDA_VISIBLE_DEVICES=0`.
Multi-GPU training

Uncomment the `deepspeed` line in `settings.jsonc`, then:

```bash
uv pip install deepspeed
deepspeed --num_gpus=<number of GPUs> weclone/train/train_sft.py
```
Simple inference using the browser demo

You can test suitable temperature and top_p values in this step and modify `infer_args` in `settings.jsonc` for use in subsequent inference.

```bash
weclone-cli webchat-demo
```
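If it helps to see what these two knobs actually change, here is a small illustration of temperature scaling and nucleus (top_p) truncation on a toy next-token distribution; it is independent of WeClone's own inference code:

```python
import numpy as np

def next_token_probs(logits: np.ndarray, temperature: float = 1.0, top_p: float = 1.0) -> np.ndarray:
    """Temperature-scale logits, then keep the smallest token set covering top_p mass."""
    scaled = logits / temperature
    probs = np.exp(scaled - np.max(scaled))  # subtract max for numerical stability
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]          # most likely first
    cum = np.cumsum(probs[order])
    keep = np.empty_like(cum, dtype=bool)
    keep[0] = True                           # always keep the most likely token
    keep[1:] = cum[:-1] < top_p              # keep tokens until mass reaches top_p
    mask = np.zeros(len(probs), dtype=bool)
    mask[order[keep]] = True
    probs = np.where(mask, probs, 0.0)
    return probs / probs.sum()

logits = np.array([3.0, 1.5, 1.0, 0.2])
print(next_token_probs(logits, temperature=0.7, top_p=0.9))  # sharper and truncated
print(next_token_probs(logits, temperature=1.5, top_p=1.0))  # flatter, nothing dropped
```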
Inference via the API service

```bash
weclone-cli server
```
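The AstrBot section below configures this service as an OpenAI-type provider, so any OpenAI-compatible client should be able to call it directly. A sketch under that assumption (the port follows that section's example; `gpt-3.5-turbo` is just the placeholder model name):

```python
# Query the API started by `weclone-cli server` with the OpenAI Python SDK.
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8005/v1", api_key="any-value")
resp = client.chat.completions.create(
    model="gpt-3.5-turbo",  # placeholder name, as in the AstrBot provider step
    messages=[{"role": "user", "content": "晚上吃什么？"}],  # "What's for dinner?"
    temperature=0.7,        # keep consistent with infer_args in settings.jsonc
)
print(resp.choices[0].message.content)
```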
Test with common chat questions

Does not include questions asking for personal information, just daily chat. The test results are in test_result-my.txt.

```bash
weclone-cli server
weclone-cli test-model
```
🖼️ Fine-tuning effect
Using the Qwen2.5-14B-Instruct model with approximately 30,000 processed, valid entries, the loss was reduced to around 3.5.
🤖 Deploy to chatbot
AstrBot is an easy-to-use multi-platform LLM chatbot and development framework, supporting QQ, QQ Channels, Telegram, WeChat, WeCom (Qiwei), and Feishu (Lark).
Steps:
- Deploy AstrBot
- Deploy the messaging platform in AstrBot
- Execute `weclone-cli server` to start the API service
- Add a new service provider in AstrBot: select OpenAI as the type, fill in the API Base URL according to how AstrBot is deployed (for a Docker deployment it may be http://172.17.0.1:8005/v1), enter gpt-3.5-turbo as the model, and enter any value as the API Key
- Tool calls are not supported after fine-tuning, so first turn off the default tools by sending the command `/tool off all` on the messaging platform; otherwise the fine-tuned behavior will not show.
- Set the system prompt in AstrBot according to the `default_system` used during fine-tuning.
Check the api_service logs and try to ensure that the request parameters sent to the large-model service are consistent with those used during fine-tuning, and that tool plugin capabilities are turned off.
- Adjust sampling parameters such as temperature, top_p, and top_k in the custom model parameter configuration.
📌 Roadmap
- Richer context: including conversation context, chat partner information, time, etc. + Thinking
- Memory Support
- Support multi-modality
- Data Augmentation
- GUI support
Troubleshooting
- Fine-tuning issues: see LLaMA-Factory | FAQs