How to split image dataset into train, validation and test set?
Splitting image data into train, validation, and test sets is a crucial step in machine learning model development. It helps to prevent over-fitting, evaluate model performance, and ensure that the model generalizes well to new, unseen data.
It’s useful to have code that can quickly separate image data into training, validation, and testing datasets.
The folder structure would look something like this:
data/
train/
img_00.jpg
...
img_69.jpg
val/
img_70.jpg
...
img_85.jpg
test/
img_86.jpg
...
img_100.jpg
- Obtain the image data Retrieve all the images located in the designated folder.
2. Split the data Randomly separate the image data into three sets: 70% for training, 15% for validation, and 15% for testing.
3. Copy the images into their respective folders Create separate folders for each set, including the training, evaluation, and testing sets. Copy the images into their respective folders based on the random split.
“ Why did the machine learning model go to therapy?
Because it couldn’t decide between the train, validation, and test sets — it just kept overthinking! :) ”