
Llama Factory fine-tuning error #807

Open

mkygogo opened this issue Feb 3, 2025 · 8 comments

Comments

mkygogo commented Feb 3, 2025

Following the fine-tuning steps in the README, fine-tuning fails with:
Setting num_proc from 16 back to 1 for the train split to disable multiprocessing as it only contains one shard.
Generating train split: 6 examples [00:00, 644.04 examples/s]
num_proc must be <= 6. Reducing num_proc to 6 for dataset of size 6.
Converting format of dataset (num_proc=6): 100%|██████████| 6/6 [00:00<00:00, 44.81 examples/s]
num_proc must be <= 6. Reducing num_proc to 6 for dataset of size 6.
Running tokenizer on dataset (num_proc=6): 0%| | 0/6 [00:01<?, ? examples/s]
multiprocess.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.10/site-packages/multiprocess/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/root/miniconda3/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 678, in _write_generator_to_queue
    for i, result in enumerate(func(**kwargs)):
  File "/root/miniconda3/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 3476, in _map_single
    batch = apply_function_on_filtered_inputs(
  File "/root/miniconda3/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 3338, in apply_function_on_filtered_inputs
    processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
  File "/root/autodl-tmp/llamafactory/LLaMA-Factory-main/src/llamafactory/data/processors/supervised.py", line 107, in preprocess_supervised_dataset
    input_ids, labels = _encode_supervised_example(
  File "/root/autodl-tmp/llamafactory/LLaMA-Factory-main/src/llamafactory/data/processors/supervised.py", line 48, in _encode_supervised_example
    messages = template.mm_plugin.process_messages(prompt + response, images, videos, processor)
  File "/root/autodl-tmp/llamafactory/LLaMA-Factory-main/src/llamafactory/data/mm_plugin.py", line 433, in process_messages
    image_processor: "BaseImageProcessor" = getattr(processor, "image_processor")
AttributeError: 'NoneType' object has no attribute 'image_processor'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/root/miniconda3/bin/llamafactory-cli", line 8, in <module>
    sys.exit(main())
  File "/root/autodl-tmp/llamafactory/LLaMA-Factory-main/src/llamafactory/cli.py", line 112, in main
    run_exp()
  File "/root/autodl-tmp/llamafactory/LLaMA-Factory-main/src/llamafactory/train/tuner.py", line 92, in run_exp
    _training_function(config={"args": args, "callbacks": callbacks})
  File "/root/autodl-tmp/llamafactory/LLaMA-Factory-main/src/llamafactory/train/tuner.py", line 66, in _training_function
    run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/root/autodl-tmp/llamafactory/LLaMA-Factory-main/src/llamafactory/train/sft/workflow.py", line 51, in run_sft
    dataset_module = get_dataset(template, model_args, data_args, training_args, stage="sft", **tokenizer_module)
  File "/root/autodl-tmp/llamafactory/LLaMA-Factory-main/src/llamafactory/data/loader.py", line 269, in get_dataset
    dataset = _get_preprocessed_dataset(
  File "/root/autodl-tmp/llamafactory/LLaMA-Factory-main/src/llamafactory/data/loader.py", line 204, in _get_preprocessed_dataset
    dataset = dataset.map(
  File "/root/miniconda3/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 560, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 3165, in map
    for rank, done, content in iflatmap_unordered(
  File "/root/miniconda3/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 718, in iflatmap_unordered
    [async_result.get(timeout=0.05) for async_result in async_results]
  File "/root/miniconda3/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 718, in <listcomp>
    [async_result.get(timeout=0.05) for async_result in async_results]
  File "/root/miniconda3/lib/python3.10/site-packages/multiprocess/pool.py", line 774, in get
    raise self._value
AttributeError: 'NoneType' object has no attribute 'image_processor'
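
The failing call is getattr(processor, "image_processor") with processor being None, i.e. the multimodal processor was never constructed. As a quick sanity check, one can try loading the processor directly in the same environment; the model path below is a placeholder for whatever checkpoint is being fine-tuned:

python -c "from transformers import AutoProcessor; print(AutoProcessor.from_pretrained('<your_model_path>', trust_remote_code=True))"
# if this raises an ImportError or similar, a processor dependency is missing,
# and the dataset itself is not the problem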

@BUAADreamer (Contributor)

Some libraries probably failed to install. For now we recommend transformers==4.45.0, which reliably runs both fine-tuning and inference.

[Important] Install the latest llamafactory and the corresponding libraries as follows:

git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch,metrics,deepspeed,minicpm_v]"
pip3 install transformers==4.45.0
pip3 install huggingface_hub==0.25.0
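
To confirm the pins took effect in the environment that actually runs the training (one way among many; adjust to your setup):

python -m pip show transformers huggingface_hub | grep -E "^(Name|Version)"
# expect transformers 4.45.0 and huggingface_hub 0.25.0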


mkygogo commented Feb 3, 2025

I installed the versions of transformers and huggingface_hub you listed, but I still get the same error.

@BUAADreamer (Contributor)

> I installed the versions of transformers and huggingface_hub you listed, but I still get the same error.

Please post the output of pip list.


mkygogo commented Feb 3, 2025

pip-list.txt


BUAADreamer commented Feb 3, 2025

After pulling the latest code, run this line again; some audio dependencies were not installed:

pip install -e ".[torch,metrics,deepspeed,minicpm_v]"


BUAADreamer commented Feb 3, 2025

@mkygogo Is it working now?


mkygogo commented Feb 3, 2025

No luck. I pulled the latest llamafactory code and the problem is the same. I had originally installed with pip install -e ".[torch,metrics,deepspeed,minicpm_v]" exactly as in your docs, and redoing everything from scratch still fails. There seemed to be a warning during installation saying minicpm-v is not supported. Maybe I need to wait for a llamafactory update.
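
On that warning: pip prints "does not provide the extra 'minicpm_v'" when the extra name is missing from the metadata of the package being installed. Assuming a source checkout that declares its extras in setup.py, one quick check from the repo root:

grep -n "minicpm_v" setup.py
# no matches would mean the checkout predates this extra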


BUAADreamer commented Feb 3, 2025

Plenty of people have already fine-tuned successfully. Your pip list has no torchaudio, which means the install did not succeed; you may have nested environments. Try installing the libraries below with python -m pip install, confirm with pip show torchaudio that they are in place, and then try again (see the combined command after the list). @mkygogo

soundfile
torchvision
torchaudio
vector_quantize_pytorch
vocos
msgpack
referencing
jsonschema_specifications
librosa
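
One way to run the suggested installs and the check in a single pass (package list taken verbatim from the comment above):

python -m pip install soundfile torchvision torchaudio vector_quantize_pytorch \
    vocos msgpack referencing jsonschema_specifications librosa
python -m pip show torchaudio   # prints package metadata once the install succeeded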
