Andrew Kanieski

Software Architect
Passionate Programmer
Loving Husband & Father of Three




Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer’s view in any way.

Searching for large files in TFVC

Posted on April 22, 2021 | 2 minute read

Has your Azure DevOps Server collection database grown so large it’s become unmanageable? Tired of your database backups running long? Do you suspect your users have been storing all their favorite cat videos in their Team Project’s TFVC repository?

It’s all too common for developers to mistakenly store large binary files in source control, thinking it’s a good place for them. Although there are scenarios where this might make sense, generally speaking it isn’t a great idea: it bloats your collection databases, slows down backups, makes database replication seeding slower, and eats up drive space when there are better long-term storage options for such data, like S3 storage.

Fine examples of the large binary objects I’ve seen sprinkled throughout Azure DevOps databases are the Java SDK installer, the MS VC++ Redistributable installers, and the like. Developers may inadvertently upload these to Azure DevOps for safe keeping. Little do they know that a dozen other developers in the organization have all uploaded the exact same installer to the database, duplicating hundreds of gigabytes of data for no reason. If left unchecked, your database can grow wildly out of control.

How to find the largest files in your TFVC repos

The script below can easily be run against your collection database to locate the largest files, the project they reside in, the changeset they were committed in and when, and the brilliant minds behind uploading such files into your precious Azure DevOps data. :-)
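A minimal sketch of such a query is shown here. The TFVC tables in the collection database are undocumented, and the exact table and column names used below (tbl_Version, tbl_File, tbl_ChangeSet, FileLength, VersionFrom, OwnerId) are assumptions that can differ between Azure DevOps Server / TFS versions, so verify them against your own collection database before trusting the results. The team project can be read from the first segment of the item path, and the committer’s display name can be resolved by joining the owner id against the identity tables in the configuration database.

    -- Sketch: the largest files checked into TFVC, with the changeset that
    -- introduced them and when it was committed.
    -- NOTE: table and column names are assumptions about an undocumented schema;
    -- adjust them to match your server version.
    SELECT TOP (100)
        v.FullPath               AS ItemPath,     -- team project = first path segment
        f.FileLength / 1048576.0 AS SizeMB,
        v.VersionFrom            AS ChangeSetId,
        cs.CreationDate          AS CommittedOn,
        cs.OwnerId               AS CommitterId   -- map to a display name via the identity tables
    FROM tbl_Version AS v
        JOIN tbl_File AS f ON f.FileId = v.FileId
        JOIN tbl_ChangeSet AS cs ON cs.ChangeSetId = v.VersionFrom
    ORDER BY f.FileLength DESC;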

The only thing left is to actually destroy the files. Head over to the latest docs to read about the tf destroy command. Don’t forget that your database will not remove the file contents immediately after you destroy them. There is an automatic cleanup job that runs within a week to clean up the file metadata from your database. To force this cleanup, use the /startcleanup flag when you destroy the file in question.
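For example, with a placeholder item path (substitute the paths surfaced by the query above), a destroy followed by an immediate cleanup looks roughly like this:

    rem Preview what would be destroyed first; no changes are made.
    tf destroy "$/MyProject/Installers/jdk-installer.exe" /preview

    rem Destroy the item and kick off the content cleanup job right away.
    tf destroy "$/MyProject/Installers/jdk-installer.exe" /startcleanup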

After you’ve done a solid round of cleanup, the database size on disk won’t necessarily shrink, since the database files retain their overall size after the data is destroyed. You will need to run DBCC SHRINKDATABASE (or DBCC SHRINKFILE) to ensure the database reduces its footprint on disk.
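For example, assuming a collection database named Tfs_DefaultCollection (substitute your own collection’s database name), the shrink might look like this:

    -- Reclaim free space once the cleanup job has removed the destroyed content.
    -- The database name is a placeholder; note that shrinking fragments indexes,
    -- so consider rebuilding them afterwards.
    DBCC SHRINKDATABASE (N'Tfs_DefaultCollection');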

Enjoy!

Andrew


Tags: azure devops, devops, tfvc, cleanup

