How to split image dataset into train, validation and test set?

Aravinda 加阳
1 min readApr 23, 2023

Splitting image data into train, validation, and test sets is a crucial step in machine learning model development. It helps to prevent over-fitting, evaluate model performance, and ensure that the model generalizes well to new, unseen data.

It’s useful to have code that can quickly separate image data into training, validation, and testing datasets.

The folder structure would look something like this:

data/
train/
img_00.jpg
...
img_69.jpg
val/
img_70.jpg
...
img_85.jpg
test/
img_86.jpg
...
img_100.jpg
  1. Obtain the image data Retrieve all the images located in the designated folder.

2. Split the data Randomly separate the image data into three sets: 70% for training, 15% for validation, and 15% for testing.

3. Copy the images into their respective folders Create separate folders for each set, including the training, evaluation, and testing sets. Copy the images into their respective folders based on the random split.

Full code :

“ Why did the machine learning model go to therapy?
Because it couldn’t decide between the train, validation, and test sets — it just kept overthinking! :) ”

--

--