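2. After downloading the feature data, I followed the get() method in minigpt4/datasets/datasets/first_face.py and changed ann_path to the directory containing the feature data, but got the following error: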
de 3 train.py --cfg-path train_configs/Emotion-LLaMA_finetune.yaml
/root/miniconda3/envs/llama/lib/python3.9/site-packages/accelerate/utils/torch_xla.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
import pkg_resources
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
| distributed init (rank 0, world 3): env://
| distributed init (rank 1, world 3): env://
| distributed init (rank 2, world 3): env://
2026-01-10 18:27:15,683 [INFO]
===== Running Parameters =====
2026-01-10 18:27:15,684 [INFO] {
"amp": true,
"device": "cuda",
"dist_backend": "nccl",
"dist_url": "env://",
"distributed": true,
"evaluate": false,
"gpu": 0,
"init_lr": 1e-05,
"iters_per_epoch": 1000,
"job_name": "minigptv2_finetune",
"lr_sched": "linear_warmup_cosine_lr",
"max_epoch": 30,
"min_lr": 1e-06,
"num_workers": 6,
"output_dir": "/root/autodl-tmp/Emotion-LLaMA/checkpoints/save_checkpoint",
"rank": 0,
"resume_ckpt_path": null,
"seed": 42,
"task": "image_text_pretrain",
"train_splits": [
"train"
],
"wandb_log": false,
"warmup_lr": 1e-06,
"warmup_steps": 1000,
"weight_decay": 0.05,
"world_size": 3
}
2026-01-10 18:27:15,684 [INFO]
====== Dataset Attributes ======
2026-01-10 18:27:15,684 [INFO]
======== feature_face_caption =======
2026-01-10 18:27:15,685 [INFO] {
"batch_size": 1,
"build_info": {
"ann_path": "/root/autodl-tmp/Emotion-LLaMA/dataset",
"annotation_file": "/root/autodl-tmp/Emotion-LLaMA/dataset/MERR_coarse_grained.txt",
"image_path": "/root/autodl-tmp/Emotion-LLaMA/dataset/MER2023/mer2023train_unzip/train"
},
"data_type": "images",
"sample_ratio": 30,
"text_processor": {
"train": {
"name": "blip_caption"
}
},
"vis_processor": {
"train": {
"image_size": 448,
"name": "blip2_image_train"
}
}
}
2026-01-10 18:27:15,685 [INFO]
====== Model Attributes ======
2026-01-10 18:27:15,685 [INFO] {
"arch": "minigpt_v2",
"chat_template": true,
"ckpt": "/root/autodl-tmp/Emotion-LLaMA/checkpoints/minigptv2_checkpoint.pth",
"drop_path_rate": 0,
"end_sym": "</s>",
"freeze_vit": true,
"image_size": 448,
"llama_model": "/root/autodl-tmp/Emotion-LLaMA/checkpoints/Llama-2-7b-chat-hf",
"lora_alpha": 16,
"lora_r": 64,
"max_txt_len": 1024,
"model_type": "pretrain",
"prompt": "",
"use_grad_checkpoint": true,
"vit_precision": "fp16"
}
BaseDatasetBuilder data type: images
2026-01-10 18:27:15,685 [INFO] Building datasets...
ann_path: /root/autodl-tmp/Emotion-LLaMA/dataset
Traceback (most recent call last):
File "/root/autodl-tmp/Emotion-LLaMA/train.py", line 97, in <module>
main()
File "/root/autodl-tmp/Emotion-LLaMA/train.py", line 87, in main
datasets = task.build_datasets(cfg)
File "/root/autodl-tmp/Emotion-LLaMA/minigpt4/tasks/base_task.py", line 59, in build_datasets
dataset = builder.build_datasets()
File "/root/autodl-tmp/Emotion-LLaMA/minigpt4/datasets/builders/base_dataset_builder.py", line 58, in build_datasets
datasets = self.build() # dataset['train'/'val'/'test']
File "/root/autodl-tmp/Emotion-LLaMA/minigpt4/datasets/builders/image_text_pair_builder.py", line 36, in build
datasets[split] = dataset_cls(
File "/root/autodl-tmp/Emotion-LLaMA/minigpt4/datasets/datasets/first_face.py", line 74, in __init__
self.tmp = [x.strip().split(' ') for x in open(ann_path)]
IsADirectoryError: [Errno 21] Is a directory: '/root/autodl-tmp/Emotion-LLaMA/dataset'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1872) of binary: /root/miniconda3/envs/llama/bin/python
Traceback (most recent call last):
File "/root/miniconda3/envs/llama/bin/torchrun", line 7, in <module>
sys.exit(main())
File "/root/miniconda3/envs/llama/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
File "/root/miniconda3/envs/llama/lib/python3.9/site-packages/torch/distributed/run.py", line 794, in main
run(args)
File "/root/miniconda3/envs/llama/lib/python3.9/site-packages/torch/distributed/run.py", line 785, in run
elastic_launch(
File "/root/miniconda3/envs/llama/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/root/miniconda3/envs/llama/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
train.py FAILED```
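In particular, the frame `File "/root/autodl-tmp/Emotion-LLaMA/minigpt4/datasets/datasets/first_face.py", line 74, in __init__` with `self.tmp = [x.strip().split(' ') for x in open(ann_path)]` tells me that ann_path cannot be a directory and must be a file. But in the get() method, ann_path is the path to the feature data. I don't understand why this happens.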
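A minimal sketch of what I understand the failing line to do, assuming only what the traceback shows: `open(ann_path)` is iterated line by line, so it needs a text file, and passing a directory raises `IsADirectoryError`. The helper name `load_annotations`, the file name `annotations.txt`, and the sample annotation line are hypothetical and not taken from the repository.

```python
import os
import tempfile


def load_annotations(ann_path):
    """Split each line of a text file on spaces, as line 74 of first_face.py does.

    Hypothetical helper: it adds an explicit directory check so the failure
    mode is obvious before open() is called.
    """
    if os.path.isdir(ann_path):
        raise IsADirectoryError(f"ann_path must be a file, got a directory: {ann_path}")
    with open(ann_path) as f:
        return [line.strip().split(' ') for line in f]


# Demonstrate both cases inside a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    ann_file = os.path.join(root, "annotations.txt")  # made-up file name
    with open(ann_file, "w") as f:
        f.write("sample_00000001 12 happy\n")  # made-up annotation line

    rows = load_annotations(ann_file)  # a file path works

    try:
        load_annotations(root)  # a directory path fails
        dir_error = None
    except IsADirectoryError as exc:
        dir_error = exc
```

If this reading is right, the path opened in `__init__` and the directory that get() reads features from would need to be two separate settings, though I have not confirmed how the repository intends them to be configured.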
3.
<img width="1504" height="670" alt="Image" src="https://github.com/user-attachments/assets/6c8b97f2-c109-444b-8ef2-977f2a0cd3ac" />
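Comparing against the reference dataset layout, I found that besides the video files, I am also missing transcription_en_all.csv. How can I obtain this file?
Thank you very much for taking the time to read my question. I would be very grateful for any help.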
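Dear author, I am a first-year master's student. I ran into some problems while loading the dataset and would like to ask for your help.
1. I downloaded the MER2023 dataset from Hugging Face. After extracting it, the files are as follows: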