Welcome, and feel free to star ⭐ the project. 🚀 A one-stop solution for creating digital avatars from chat records. 💡 Fine-tune a large language model on your WeChat chat history to give the model “that flavor” of you, then bind it to a chatbot to bring your digital avatar to life. Keywords: digital cloning / digital avatar / digital immortality / voice cloning / LLM / large language model / WeChat chatbot / LoRA
✨ Core Features
- Visit the official website: https://github.com/xming521/WeClone
- 💫 Covers the full pipeline for creating a digital avatar, including chat data export, preprocessing, model training, and deployment
- 💬 Fine-tune an LLM on WeChat chat records to give the model “that flavor”
- 🔗 Bind it to WeChat, QQ, Telegram, WeCom (Qiwei), or Lark bots to bring your digital avatar to life
- 🛡️ Privacy filtering and fully local fine-tuning and deployment keep your data secure and under your control
📋 Features and description
Version 0.2.1 adds command-line tools. You need to re-run `uv pip install -e .` before use.
Version 0.2.0 was completely refactored; the dataset directory and script paths have all changed. After pulling the new code, place the `csv` folder under `dataset` and reinstall the dependencies.
- WeClone is still in a rapid iteration phase; current results do not represent the final quality.
- The effect of fine-tuning an LLM depends largely on the model size and the amount and quality of the chat data. In theory, the bigger the model and the more data, the better the result.
- The Windows environment has not been strictly tested; WSL can be used as the runtime environment.
Hardware requirements
The project uses the Qwen2.5-7B-Instruct model by default, fine-tuned with LoRA in the SFT stage, which requires about 16GB of VRAM. Other models and methods supported by LLaMA Factory can also be used.
Estimated VRAM required:
| Method | Precision | 7B | 14B | 30B | 70B | x B |
|---|---|---|---|---|---|---|
| Full (bf16 or fp16) | 32 | 120GB | 240GB | 600GB | 1200GB | 18x GB |
| Full (pure_bf16) | 16 | 60GB | 120GB | 300GB | 600GB | 8x GB |
| Freeze/LoRA/GaLore/APOLLO/BAdam | 16 | 16GB | 32GB | 64GB | 160GB | 2x GB |
| QLoRA | 8 | 10GB | 20GB | 40GB | 80GB | x GB |
| QLoRA | 4 | 6GB | 12GB | 24GB | 48GB | x/2 GB |
| QLoRA | 2 | 4GB | 8GB | 16GB | 24GB | x/4 GB |
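For a quick back-of-envelope check, the last column of the table can be read as GB of VRAM per billion parameters. A minimal sketch (the constants come straight from the table; actual usage also depends on sequence length, batch size, and optimizer state):

```python
# GB of VRAM per billion parameters, taken from the table's "x B" column.
# Treat the result as a rough estimate, not a guarantee.
GB_PER_B_PARAMS = {
    "full_amp_32": 18.0,     # Full (bf16 or fp16)
    "full_pure_bf16": 8.0,   # Full (pure_bf16)
    "freeze_lora_16": 2.0,   # Freeze/LoRA/GaLore/APOLLO/BAdam
    "qlora_8": 1.0,
    "qlora_4": 0.5,
    "qlora_2": 0.25,
}

def estimate_vram_gb(params_in_billions: float, method: str) -> float:
    """Estimate fine-tuning VRAM in GB from the table above."""
    return params_in_billions * GB_PER_B_PARAMS[method]

# Default setup: Qwen2.5-7B-Instruct with LoRA -> 14GB, close to the ~16GB noted above.
print(estimate_vram_gb(7, "freeze_lora_16"))
```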
Environment Setup
1. CUDA installation (skip if already installed; version 12.4 or above required): see LLaMA Factory.
2. It is recommended to use uv, a very fast Python environment manager, to install the dependencies. After installing uv, create a new Python environment and install the dependencies with the following commands. Note that this does not include the dependencies for the voice-cloning feature:

```bash
git clone https://github.com/xming521/WeClone.git
cd WeClone
uv venv .venv --python=3.10
source .venv/bin/activate  # on Windows: .venv\Scripts\activate
uv pip install --group main -e .
```
If you want to fine-tune with the latest models, manually install the latest version of LLaMA Factory with `uv pip install --upgrade git+https://github.com/hiyouga/LLaMA-Factory.git`; other dependency versions, such as vllm, pytorch, and transformers, may also need to be updated.
3. Copy the configuration file template and rename it to `settings.jsonc`. All subsequent configuration changes are made in this file:

```bash
cp settings.template.json settings.jsonc
```
Note
Training- and inference-related configurations are unified in the file `settings.jsonc`.
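Since `settings.jsonc` may contain `//` comments, Python's standard `json` module cannot parse it directly. If you want to script against the config, here is a minimal loader sketch (it assumes no trailing commas; a third-party parser such as `json5` handles full JSONC):

```python
import json

def load_jsonc(path: str) -> dict:
    """Parse a JSONC file by dropping // and /* */ comments outside strings."""
    text = open(path, encoding="utf-8").read()
    out, i, in_str = [], 0, False
    while i < len(text):
        c = text[i]
        if in_str:
            out.append(c)
            if c == "\\" and i + 1 < len(text):  # keep escape sequences intact
                out.append(text[i + 1])
                i += 1
            elif c == '"':
                in_str = False
        elif c == '"':
            in_str = True
            out.append(c)
        elif text.startswith("//", i):           # line comment: skip to newline
            nl = text.find("\n", i)
            i = len(text) if nl == -1 else nl
            continue
        elif text.startswith("/*", i):           # block comment: skip past */
            end = text.find("*/", i)
            i = len(text) if end == -1 else end + 2
            continue
        else:
            out.append(c)
        i += 1
    return json.loads("".join(out))

config = load_jsonc("settings.jsonc")
print(list(config.keys()))
```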
4. Use the following command to test whether the CUDA environment is correctly configured and recognized by PyTorch (not needed on Mac):

```bash
python -c "import torch; print('CUDA available:', torch.cuda.is_available());"
```
5. (Optional) Install FlashAttention to accelerate training and inference: `uv pip install flash-attn --no-build-isolation`
Data preparation
Please use PyWxDump to extract the WeChat chat records. You can first migrate (back up) your phone's chat records to the computer, which yields more data. After downloading the software and decrypting the database, click Chat Backup and set the export type to CSV. You can export multiple contacts (group chat records are not recommended). Then move the `csv` folder from the exported `wxdump_tmp/export` directory into `./dataset`, i.e. put the chat-record folders of different people together under `./dataset/csv`.
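As a quick sanity check before preprocessing, you can count what actually landed in `./dataset/csv`. A small sketch (PyWxDump's CSV column layout varies by version, so this only counts rows per file):

```python
# Count exported messages per contact under ./dataset/csv.
import csv
from pathlib import Path

for csv_file in sorted(Path("./dataset/csv").rglob("*.csv")):
    with open(csv_file, encoding="utf-8", newline="") as f:
        rows = max(sum(1 for _ in csv.reader(f)) - 1, 0)  # minus the header row
    print(f"{csv_file}: {rows} rows")
```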
Data preprocessing
- By default, the project removes mobile phone numbers, ID numbers, email addresses, and URLs from the data. It also provides a blocked-word list `blocked_words` in `settings.jsonc`, to which you can add words and sentences that need to be filtered (by default, the entire sentence containing a blocked word is removed).
🚨 Please be sure to protect your personal privacy and do not disclose personal information!
- Execute the following command to process the data. You can modify `make_dataset_args` in `settings.jsonc` to match your own chat style.

```bash
weclone-cli make-dataset
```
- Currently, only the time-window strategy is supported: `single_combine_time_window` joins consecutive messages from the same person into one sentence with commas, and `qa_match_time_window` matches question-answer pairs (see the sketch after this list).
- You can enable `enable_clean` in `clean_dataset` to clean the data for better results. Currently an LLM judge is used to score the chat records, with vLLM doing offline inference. After obtaining the LLM score distribution, adjust `accept_score` to select an acceptable score threshold, and then appropriately reduce the `lora_dropout` parameter in `train_sft_args` to improve fitting.
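To make the two window parameters concrete, here is a self-contained sketch of the strategy described above. The field names (`sender`, `ts`) and the window defaults are illustrative, not WeClone's actual internals:

```python
from dataclasses import dataclass

@dataclass
class Msg:
    sender: str   # "me" or "other" (illustrative labels)
    text: str
    ts: float     # unix timestamp, seconds

BLOCKED_WORDS = ["example-secret"]  # plays the role of blocked_words in settings.jsonc

def preprocess(msgs: list[Msg], single_window: float = 60, qa_window: float = 300) -> list[dict]:
    """Illustrative pipeline: filter blocked words -> merge runs -> match QA pairs."""
    # 1. Drop the whole message if it contains a blocked word (the default behavior).
    msgs = [m for m in msgs if not any(w in m.text for w in BLOCKED_WORDS)]

    # 2. single_combine_time_window: join consecutive messages from the same
    #    sender that arrive within the window into one comma-separated sentence.
    merged: list[Msg] = []
    for m in msgs:
        if merged and merged[-1].sender == m.sender and m.ts - merged[-1].ts <= single_window:
            merged[-1] = Msg(m.sender, merged[-1].text + "，" + m.text, m.ts)
        else:
            merged.append(m)

    # 3. qa_match_time_window: treat an "other" message followed by my reply
    #    within the window as one question-answer training pair.
    return [
        {"instruction": q.text, "output": a.text}
        for q, a in zip(merged, merged[1:])
        if q.sender == "other" and a.sender == "me" and a.ts - q.ts <= qa_window
    ]
```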
Model Download

```bash
git lfs install
git clone https://www.modelscope.cn/Qwen/Qwen2.5-7B-Instruct.git
```
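Alternatively, if you prefer the ModelScope Python SDK over git-lfs, something like this should fetch the same model (assumes `pip install modelscope`; the returned path is what `model_name_or_path` should point to):

```python
# Optional alternative to the git clone above: download via ModelScope's SDK.
from modelscope import snapshot_download

model_dir = snapshot_download("Qwen/Qwen2.5-7B-Instruct")
print(model_dir)  # use this local path as model_name_or_path in settings.jsonc
```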
Configure parameters and fine-tune the model
- (Optional) Modify `model_name_or_path` and `template` in `settings.jsonc` to select another locally downloaded model.
- Modify `per_device_train_batch_size` and `gradient_accumulation_steps` to adjust VRAM usage (their product is the effective batch size: for example, a per-device batch of 2 with 8 accumulation steps trains on 16 samples per optimizer step).
- You can modify parameters in `train_sft_args` such as `num_train_epochs`, `lora_rank`, and `lora_dropout` according to the quantity and quality of your own dataset.
Single-GPU training

```bash
weclone-cli train-sft
```

To train on one GPU in a multi-GPU environment, first execute `export CUDA_VISIBLE_DEVICES=0`.
Multi-GPU training

Uncomment the `deepspeed` line in `settings.jsonc`, then:

```bash
uv pip install deepspeed
deepspeed --num_gpus=<number of GPUs> weclone/train/train_sft.py
```
Simple inference using the browser demo

You can test suitable temperature and top_p values in this step and modify `infer_args` in `settings.jsonc` for use in subsequent inference.

```bash
weclone-cli webchat-demo
```
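If it helps to see what these two knobs actually change, here is a small illustration of temperature scaling and nucleus (top_p) truncation on a toy next-token distribution; it is independent of WeClone's own inference code:

```python
import numpy as np

def next_token_probs(logits: np.ndarray, temperature: float = 1.0, top_p: float = 1.0) -> np.ndarray:
    """Temperature-scale logits, then keep the smallest token set covering top_p mass."""
    scaled = logits / temperature
    probs = np.exp(scaled - np.max(scaled))  # subtract max for numerical stability
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]          # most likely first
    cum = np.cumsum(probs[order])
    keep = np.empty_like(cum, dtype=bool)
    keep[0] = True                           # always keep the most likely token
    keep[1:] = cum[:-1] < top_p              # keep tokens until mass reaches top_p
    mask = np.zeros(len(probs), dtype=bool)
    mask[order[keep]] = True
    probs = np.where(mask, probs, 0.0)
    return probs / probs.sum()

logits = np.array([3.0, 1.5, 1.0, 0.2])
print(next_token_probs(logits, temperature=0.7, top_p=0.9))  # sharper and truncated
print(next_token_probs(logits, temperature=1.5, top_p=1.0))  # flatter, nothing dropped
```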
Inference via the API service

```bash
weclone-cli server
```
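The AstrBot section below configures this service as an OpenAI-type provider, so any OpenAI-compatible client should be able to call it directly. A sketch under that assumption (the port follows that section's example; `gpt-3.5-turbo` is just the placeholder model name):

```python
# Query the API started by `weclone-cli server` with the OpenAI Python SDK.
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8005/v1", api_key="any-value")
resp = client.chat.completions.create(
    model="gpt-3.5-turbo",  # placeholder name, as in the AstrBot provider step
    messages=[{"role": "user", "content": "晚上吃什么？"}],  # "What's for dinner?"
    temperature=0.7,        # keep consistent with infer_args in settings.jsonc
)
print(resp.choices[0].message.content)
```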
Test with common chat questions

Does not include questions asking for personal information, just daily chat. The test results are in test_result-my.txt.

```bash
weclone-cli server
weclone-cli test-model
```
🖼️ Fine-tuning effect
Using the Qwen2.5-14B-Instruct model with approximately 30,000 processed, valid entries, the loss was reduced to around 3.5.
🤖 Deploy to chatbot
AstrBot is an easy-to-use multi-platform LLM chatbot and development framework, supporting QQ, QQ Channels, Telegram, WeChat, WeCom (Qiwei), and Feishu (Lark).
Steps:
- Deploy AstrBot
- Deploy the messaging platform in AstrBot
- Execute `weclone-cli server` to start the API service
- Add a new service provider in AstrBot: select OpenAI as the type, fill in the API Base URL according to how AstrBot is deployed (for a Docker deployment it may be http://172.17.0.1:8005/v1), enter gpt-3.5-turbo as the model, and enter any value as the API Key
- Tool calls are not supported after fine-tuning, so first turn off the default tools by sending the command `/tool off all` on the messaging platform; otherwise the fine-tuned behavior will not show.
- Set the system prompt in AstrBot according to the `default_system` used during fine-tuning.
Check the api_service logs and try to ensure that the request parameters sent to the large-model service are consistent with those used during fine-tuning, and that tool plugin capabilities are turned off.
- Adjust sampling parameters such as temperature, top_p, and top_k in the custom model parameter configuration.
📌 Roadmap
- Richer context: including conversation context, chat partner information, time, etc. + Thinking
- Memory Support
- Support multi-modality
- Data Augmentation
- GUI support
Troubleshooting
- Fine-tuning issues: see LLaMA-Factory | FAQs