Warning: I spent months training my image model on way too small a dataset

I was trying to get a model to spot cracks in concrete for a local inspection project. For three months, I used only about 200 pictures, all from one job site. The results were always a bit off, but I blamed the model itself. The tip-off came when a contractor in Denver asked to see my training data and just laughed, saying, 'You need at least a few thousand varied images, man, or it'll never learn the real patterns.' I've been scraping more photos since then. Has anyone else had to massively increase their dataset size to get a simple classifier to actually work right?
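Since the original 200 photos all came from one job site, one way to tell whether newly scraped photos actually help is to test on sites the model never trained on. Below is a minimal sketch, assuming each photo is tracked as a dict with a hypothetical `site` key; `split_by_site` and `leave_one_site_out` are illustrative names, not from any library.

```python
def split_by_site(records, test_sites):
    """Put every photo from a held-out site in the test set, the rest in train.

    Assumes each record is a dict with a hypothetical "site" key.
    """
    train, test = [], []
    for rec in records:
        (test if rec["site"] in test_sites else train).append(rec)
    return train, test


def leave_one_site_out(records):
    """Yield (site, train, test) with each site held out once."""
    for site in sorted({rec["site"] for rec in records}):
        train, test = split_by_site(records, {site})
        yield site, train, test


records = [
    {"path": "a.jpg", "site": "site_A", "label": "crack"},
    {"path": "b.jpg", "site": "site_A", "label": "no_crack"},
    {"path": "c.jpg", "site": "site_B", "label": "crack"},
]
train, test = split_by_site(records, {"site_B"})
print(len(train), len(test))  # prints: 2 1
```

Holding out whole sites, rather than random photos, keeps near-duplicate shots from the same slab out of both train and test at once, so the score says something about new locations.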

3 comments


sanchez.pat · 1mo ago

Hold up, I gotta disagree! More data isn't always the magic fix. If those 200 pictures were really solid and covered all the types of cracks you needed, that should be enough for a simple job. The problem might be your model setup or how you labeled things, not the number of photos. I've seen people waste time collecting thousands of images when tweaking what they already had would've worked better.
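Whether 200 photos "covered all the types of cracks you needed" is something you can check before collecting more. Here's a rough sketch, assuming each photo's metadata is a dict; the `label` and `lighting` keys, the crack-type names, and the `min_count` threshold are all hypothetical.

```python
from collections import Counter


def coverage_report(records, key, min_count):
    """Count photos per value of `key` and list values with fewer than `min_count`."""
    counts = Counter(rec[key] for rec in records)
    sparse = sorted(value for value, n in counts.items() if n < min_count)
    return counts, sparse


records = [
    {"label": "hairline", "lighting": "sun"},
    {"label": "hairline", "lighting": "shade"},
    {"label": "map", "lighting": "sun"},
]
counts, sparse = coverage_report(records, "label", min_count=2)
print(sparse)  # prints: ['map']
```

Running the same report on other keys (lighting, surface type, site) shows whether the set is thin in a specific way, which is different from simply being too small.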

chen.adam · 1mo ago

Wait, you used only 200 pics for months? That's wild. No wonder it was acting up.

paul286 · 19d ago · Most Upvoted

Gotta push back a bit on the idea that 200 solid photos would cut it for crack detection, even for a simple job. Concrete cracks look different under different lighting, from different angles, and on different types of concrete, so a model trained on one site won't generalize well. @chen.adam, it's not that wild when you consider how easy it is to underestimate the variety needed; I did the same thing once on a bridge-joint project. Start with at least 500 well-labeled images from different surfaces and conditions, then see if your model actually improves before grabbing thousands.
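"See if your model actually improves" can be made concrete with a learning curve: train on growing subsets and score each one on a fixed test set from other sites. Here's a minimal sketch, assuming every photo has already been reduced to a numeric feature vector. `fit_centroids` is a toy nearest-centroid classifier standing in for the real crack model, and all function names are hypothetical.

```python
import random
from math import dist


def fit_centroids(samples):
    """Toy stand-in model: average feature vector per label."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def accuracy(centroids, samples):
    """Fraction of samples whose nearest centroid has the right label."""
    correct = 0
    for features, label in samples:
        pred = min(centroids, key=lambda lab: dist(features, centroids[lab]))
        correct += pred == label
    return correct / len(samples)


def learning_curve(train, test, sizes, fit=fit_centroids, score=accuracy, seed=0):
    """Score models trained on the first n shuffled samples, for each n in sizes (n >= 1)."""
    pool = list(train)
    random.Random(seed).shuffle(pool)  # fixed seed so runs are comparable
    return [(n, score(fit(pool[:n]), test)) for n in sizes]
```

If the score on the other-site test set is still climbing between the last two sizes, more varied photos will probably help; if it has flattened out, labels or model setup may be the bottleneck instead.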