In Python, there are several common ways to split a dataset:
1. Use the `train_test_split` function from the `sklearn.model_selection` module:
```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```
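Here `X` is the feature matrix and `y` is the target vector. `test_size` sets the fraction of samples held out for the test set (0.2 means 20%), and `random_state` seeds the shuffle so the split is reproducible.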
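To show what the call does, here is a stdlib-only sketch of the same shuffle-then-slice idea. `simple_train_test_split` is a hypothetical helper and not scikit-learn's implementation. It shuffles the row indices with a seeded `random.Random`, rounds the test size up, and returns plain lists so that each label stays aligned with its features.

```python
import math
import random

def simple_train_test_split(X, y, test_size=0.2, random_state=None):
    """Shuffle row indices with a seeded RNG, then slice off a test set.

    Stdlib-only sketch of the idea behind train_test_split; it is not
    scikit-learn's implementation and returns plain lists.
    """
    if len(X) != len(y):
        raise ValueError("X and y must have the same length")
    indices = list(range(len(X)))
    random.Random(random_state).shuffle(indices)  # same seed -> same split
    n_test = math.ceil(len(X) * test_size)        # round the test size up
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    # Index X and y with the same positions so features and labels stay paired
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test

X = [[i, i * 2] for i in range(10)]
y = [i % 2 for i in range(10)]
X_train, X_test, y_train, y_test = simple_train_test_split(X, y, test_size=0.2, random_state=42)
print(len(X_train), len(X_test))  # 8 2
```

Shuffling indices instead of the data itself is what keeps `X` and `y` aligned. This is also why the real function accepts several arrays at once.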
2. Split the dataset manually by moving a fraction of the files into another directory:
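```python
import os
import random
import shutil

def move_files(train_img_dir, test_img_dir, test_size=0.2, seed=42):
    """Move a random `test_size` fraction of files from train_img_dir to test_img_dir."""
    os.makedirs(test_img_dir, exist_ok=True)
    # os.listdir order is arbitrary: sort, then shuffle with a seed for a reproducible random split
    img_names = sorted(os.listdir(train_img_dir))
    random.Random(seed).shuffle(img_names)
    split_index = int(len(img_names) * test_size)
    for name in img_names[:split_index]:
        shutil.move(os.path.join(train_img_dir, name),
                    os.path.join(test_img_dir, name))
```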
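The original parameter name `train_mask_dir` suggests a segmentation dataset with image/mask pairs. If so (an assumption), each mask must move together with its image, or the pairs fall out of sync. The minimal sketch below assumes that every mask has the same file name as its image. `move_image_mask_pairs` and the demo folder names are hypothetical.

```python
import os
import random
import shutil
import tempfile

def move_image_mask_pairs(img_dir, mask_dir, test_img_dir, test_mask_dir,
                          test_size=0.2, seed=42):
    """Move a random fraction of images, plus their same-named masks, to test folders."""
    os.makedirs(test_img_dir, exist_ok=True)
    os.makedirs(test_mask_dir, exist_ok=True)
    names = sorted(os.listdir(img_dir))  # sort first so the seed gives a reproducible order
    random.Random(seed).shuffle(names)
    moved = names[:int(len(names) * test_size)]
    for name in moved:
        # Assumption: the mask for img_dir/<name> is mask_dir/<name>
        shutil.move(os.path.join(img_dir, name), os.path.join(test_img_dir, name))
        shutil.move(os.path.join(mask_dir, name), os.path.join(test_mask_dir, name))
    return moved

# Demo on a throwaway directory tree with 10 empty image/mask pairs
root = tempfile.mkdtemp()
img_dir, mask_dir = os.path.join(root, "images"), os.path.join(root, "masks")
for d in (img_dir, mask_dir):
    os.makedirs(d)
    for i in range(10):
        open(os.path.join(d, f"{i}.png"), "w").close()
test_img_dir = os.path.join(root, "test_images")
test_mask_dir = os.path.join(root, "test_masks")
moved = move_image_mask_pairs(img_dir, mask_dir, test_img_dir, test_mask_dir)
print(len(moved), len(os.listdir(img_dir)))  # 2 8
```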
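3. Use `train_test_split` from the legacy `cross_validation` module. This module was deprecated in scikit-learn 0.18 and removed in 0.20, so the import below fails on current versions. Import from `sklearn.model_selection` instead, as in method 1:

```python
# Only works on scikit-learn < 0.20
from sklearn.cross_validation import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
```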
4. Split the dataset into training, validation, and test sets, with one subfolder per class:
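```python
import os

def mkfile(path):
    # Create the directory (and any missing parents) if it does not exist yet
    os.makedirs(path, exist_ok=True)

def split_data(file_path, new_file_path, split_rate):
    # Each subdirectory of file_path is treated as one class
    data_class = [cla for cla in os.listdir(file_path)
                  if os.path.isdir(os.path.join(file_path, cla))]
    train_path = os.path.join(new_file_path, 'train')
    val_path = os.path.join(new_file_path, 'val')
    test_path = os.path.join(new_file_path, 'test')
    # Mirror the class folders under train/, val/ and test/
    for cla in data_class:
        mkfile(os.path.join(train_path, cla))
        mkfile(os.path.join(val_path, cla))
        mkfile(os.path.join(test_path, cla))
    # The snippet stops here: it builds the folder layout but does not yet
    # copy any images or use split_rate.
```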
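The snippet above only builds the `train/`, `val/` and `test/` folder layout. One way to fill that layout is to shuffle each class's images and copy them across. The sketch below is hypothetical. It assumes the `file_path/<class>/<image>` layout implied by the code, and it replaces the single `split_rate` with two fractions, `val_rate` and `test_rate`, because the original does not say how `split_rate` applies to three sets. Splitting each class separately keeps the class proportions similar across the three sets.

```python
import os
import random
import shutil
import tempfile

def copy_class_split(file_path, new_file_path, val_rate=0.1, test_rate=0.1, seed=0):
    """Copy each class's images into new_file_path/{train,val,test}/<class>/.

    Hypothetical helper: assumes file_path/<class>/<image>; val_rate and
    test_rate are fractions of each class. Source files are left in place.
    """
    rng = random.Random(seed)
    counts = {}
    for cla in sorted(os.listdir(file_path)):
        class_dir = os.path.join(file_path, cla)
        if not os.path.isdir(class_dir):
            continue
        images = sorted(os.listdir(class_dir))
        rng.shuffle(images)
        n_val = int(len(images) * val_rate)
        n_test = int(len(images) * test_rate)
        parts = {
            'val': images[:n_val],
            'test': images[n_val:n_val + n_test],
            'train': images[n_val + n_test:],
        }
        for split, names in parts.items():
            dst_dir = os.path.join(new_file_path, split, cla)
            os.makedirs(dst_dir, exist_ok=True)
            for name in names:
                shutil.copy2(os.path.join(class_dir, name), os.path.join(dst_dir, name))
        counts[cla] = {split: len(names) for split, names in parts.items()}
    return counts

# Demo: two classes with 10 empty "images" each
src, dst = tempfile.mkdtemp(), tempfile.mkdtemp()
for cla in ("cat", "dog"):
    os.makedirs(os.path.join(src, cla))
    for i in range(10):
        open(os.path.join(src, cla, f"{i}.jpg"), "w").close()
counts = copy_class_split(src, dst, val_rate=0.2, test_rate=0.1)
print(counts["cat"])  # {'val': 2, 'test': 1, 'train': 7}
```

Copying with `shutil.copy2` instead of moving leaves the original folder intact, so a bad split can simply be deleted and rerun.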
5. Split manually with the `random` module:
```python
import glob
import os
import random

def create_image_lists(testing_percentage, validation_percentage):
    # Both arguments are fractions in [0, 1], e.g. 0.1 for 10%
    result = {}
    # Get the list of all images; assumes one subfolder per class
    # ('path_to_image_folder' is a placeholder)
    all_files = glob.glob(os.path.join('path_to_image_folder', '*', '*'))
    # Shuffle, then cut the list into train / val / test
    random.shuffle(all_files)
    n = len(all_files)
    split_index = int(n * (1 - testing_percentage - validation_percentage))
    val_end = split_index + int(n * validation_percentage)
    splits = {
        'train': all_files[:split_index],
        'val': all_files[split_index:val_end],
        'test': all_files[val_end:],
    }
    # Store the result in a dict keyed by class, then by split
    for split_name, files in splits.items():
        for file in files:
            # The class is the name of the file's parent folder
            class_name = os.path.basename(os.path.dirname(file))
            # setdefault avoids a KeyError for classes with no training files
            result.setdefault(class_name, {'train': [], 'val': [], 'test': []})
            result[class_name][split_name].append(file)
    return result
```
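Because the list is shuffled globally, a small class can end up with no validation or test images at all. A per-class (stratified) variant avoids this. `stratified_image_lists` is a hypothetical name. It takes the file list as an argument instead of calling `glob`, so it can be tried without a real folder, and it uses a seeded `random.Random` so the split is reproducible.

```python
import os
import random
from collections import defaultdict

def stratified_image_lists(all_files, testing_percentage, validation_percentage, seed=0):
    """Split each class separately so every class keeps roughly the same ratios.

    Hypothetical variant: expects paths like '<root>/<class>/<file>' and
    fractions (0.1 = 10%); returns {class: {'test': [...], 'val': [...], 'train': [...]}}.
    """
    by_class = defaultdict(list)
    for path in all_files:
        by_class[os.path.basename(os.path.dirname(path))].append(path)
    rng = random.Random(seed)
    result = {}
    for class_name, files in sorted(by_class.items()):
        files = sorted(files)  # fixed starting order so the seed fully determines the split
        rng.shuffle(files)
        n_test = int(len(files) * testing_percentage)
        n_val = int(len(files) * validation_percentage)
        result[class_name] = {
            'test': files[:n_test],
            'val': files[n_test:n_test + n_val],
            'train': files[n_test + n_val:],
        }
    return result

files = [f"data/{c}/{i}.jpg" for c in ("cat", "dog") for i in range(10)]
lists = stratified_image_lists(files, 0.2, 0.1)
print({c: {k: len(v) for k, v in s.items()} for c, s in lists.items()})
# {'cat': {'test': 2, 'val': 1, 'train': 7}, 'dog': {'test': 2, 'val': 1, 'train': 7}}
```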
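Choose the method that fits your data. `train_test_split` works on in-memory arrays, while the file-based approaches (methods 2, 4 and 5) suit image datasets stored as files on disk.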
