
SHAREPOINT - MANAGING THE PRESERVATION HOLD LIBRARY (2)

  • Writer: Jonathan Stuckey
  • Jan 14
  • 7 min read

Audience: SharePoint Solution Designer, IT Operations

Author: Jonathan Stuckey


A Practical Guide for IT Professionals (cont'd)


This is the second of a 2-part article on administration of the Preservation Hold Library (PHL). The first covered reporting on the Preservation Hold Library for a SharePoint site (or sites); in this article I use the file reporting to target items and trim version history to recover the site's storage capacity.


[Image: a blindfolded monkey holding a hammer, at a desk of screens full of code]
I'm sure this will do the job...

The management of items in SharePoint Admin is pretty crude, with only basic reporting, and completely missing any finesse when you need to apply wide-ranging changes or updates. You need to generate the reporting to be able to filter and target items that need treatment, and to apply that treatment you need PowerShell to ensure you're targeting items in the right way.


NOTE: Purview's Priority Cleanup policy is too broad without careful use of regex in its targeting rules, so it is not covered here.



Remove document versions of items in a Library using PowerShell

CAUTION: The following will result in permanent loss of data from documents processed using the PowerShell option(s) below.

This sounds scary, but actually it's exactly what we need to do to recover the capacity tied up by retention processing on a SharePoint site.


Because a Retention policy (or label) in effect 'blocks' most standard file-delete commands, you have to plan for one of two strategies:


  1. Releasing site from Retention Policy (using Exclude)

  2. Updating the Retention Label to allow for change


Remember that this action needs to be applied, and kept in place, for the duration of the update and clean-up.
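For strategy 1, a minimal sketch of the exclusion step using Security & Compliance PowerShell (from the ExchangeOnlineManagement module) might look like the following. The policy name and site URL here are placeholders - substitute your own:

```powershell
# Sketch only: exclude a site from a retention policy, then re-include it later.
# Requires the ExchangeOnlineManagement module and a Compliance admin role.
Connect-IPPSSession

# Strategy 1: release the site from the policy using an exclusion
Set-RetentionCompliancePolicy -Identity "Corporate Retention Policy" `
    -AddSharePointLocationException "https://organisation.sharepoint.com/sites/<site-name>"

# ...run the clean-up, then remove the exclusion once finished
Set-RetentionCompliancePolicy -Identity "Corporate Retention Policy" `
    -RemoveSharePointLocationException "https://organisation.sharepoint.com/sites/<site-name>"
```

Remember the earlier point: the exclusion has to stay in place for the whole clean-up window.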


Pre-requisites


For PowerShell, the preferred baseline is a PowerShell 7.x command shell with the following module imported:


  • PnP.Powershell (min. version 3.1.0)


Support for PowerShell 5.x requires a combination of modules, and you are better off trying the underlying commands:


  • Microsoft.Online.SharePoint.PowerShell (min. version 16.0.26413.12010)

  • Microsoft Graph


Access and permissions


You will need administrative privileges on the site (or sites) to operate the script and generate the reports.


  • Minimum for running scripts is delegated role: SharePoint Administrator


Note: there are specific delegated roles which can be applied using a custom role group if your organisation's risk threshold does not allow use of Data Compliance Admin. Ping me to find out how...


Module registration


To run the PowerShell script commands you will need the Application (client) ID from your PnP.PowerShell app registration.


Note: Application IDs look something like a0123456-789a-bc01-de23-4567f8901a23


Getting Started: Environment

First, we'll need to establish a usable working environment for running commands:


  1. Launch PowerShell v7 command prompt.

  2. Import and load the PnP.Powershell module.

  3. Ensure you have the appropriate login credentials for authentication.

  4. Set your PnP module app registration Application (client) ID variable for running.


Replace my examples with details relevant to your tenancy and site.

#Set PowerShell variables
$AppId = "0a123456-78ab-9012-ab34-567a8901b234"
$siteUrl = "https://organisation.sharepoint.com/sites/<site-name>"
$libraryName = "PreservationHoldLibrary"

Setting up logging


Always ensure you have multiple levels of reporting and logging enabled - both internally in scripts, and via screen capture.


Crude but effective screen-logging of tests etc. can be as simple as exporting all screen output to a log file in the base directory before running commands or a script, e.g.

Start-Transcript -Path ".\JobRef_xxxxx_-log-[date]-[time].txt"

where you update the file-name elements with the relevant info, i.e. job number, time-stamp etc.

Just remember to close it off again when you've finished processing, with:

Stop-Transcript
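Put together, a minimal transcript wrapper might look like this - the job reference is a made-up example, and the time-stamp is generated rather than typed by hand:

```powershell
# Sketch: time-stamped transcript around a block of work (job ref is illustrative)
$stamp = Get-Date -Format "yyyyMMdd-HHmm"
Start-Transcript -Path ".\JobRef_12345_-log-$stamp.txt"

# ...run your reporting / clean-up commands here...

Stop-Transcript
```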

Connect to your environment

Remember, depending on the scope of your scripting, you'll need to connect for each of the modules you use. In this example:

Connect-PnPOnline -ClientId $AppId -Url $siteUrl -Interactive

Current state reporting

Get current information from the target library to understand the scope of the required clean-up, and have it automatically loaded into a site location, e.g. Shared Documents:

New-PnPLibraryFileVersionExpirationReportJob -Identity $libraryName -ReportUrl "$siteUrl/Shared Documents/FileVersionReport.csv"
NOTE: Report file population can take between an hour and a couple of days, depending on the size of the library being scanned.

Changing the file retention labels

Assuming you know enough about Purview retention labels to have created administrative or automation-focused labels that can be used as a substitute on a document, in this example I replace the existing label with another that allows content to be modified. The alternative is to remove the label from the items in the PHL and restore it afterwards.


You can use Get-PnPRetentionLabel to list the available labels and their related properties, but if you have the name from the Purview > Retention labels > File plan UI, I would use that in the command variable.


Set the Retention label name variables:

#Set the new label as a variable - where the new label (display) name is for a label that does not trigger an action on expiry - as per the examples below:

$oldLabelName = "HR - Personnelfiles"
$newLabelName = "Admin - automation processing"

Capture the list of items in the library into an array...

#Get the list of items (documents) in the library that match the original label

$items = Get-PnPListItem -List $libraryName -PageSize 2000 |
    Where-Object { $_.FieldValues["_ComplianceTag"] -eq $oldLabelName }

and feed the IDs of the listed documents into the label update:

#Update the label on the items

$ItemIds = $items | ForEach-Object { $_.Id }
Set-PnPRetentionLabel -List $libraryName -ItemIds $ItemIds -Label $newLabelName

Once the trimming process is complete, you can use the 'Set-PnPRetentionLabel' command again to re-apply $oldLabelName to the files.


Deleting the file versions


Running the task (Blunt instrument)

In the event you need to cull Everything, Everywhere, All At Once on the site, you have the massive blunt stick that is the batch job - you create the clean-up jobs as follows:

# limit item versions in a specific library to the latest 30 major versions (keeping minors on the latest 10)

New-PnPLibraryFileVersionBatchDeleteJob -Identity $libraryName -MajorVersionLimit 30 -MajorWithMinorVersionsLimit 10

or thin out all libraries on the site in the same way:

# apply job across all libraries in a specific site

New-PnPSiteFileVersionBatchDeleteJob -MajorVersionLimit 30 -MajorWithMinorVersionsLimit 10

The 'blunderbuss' approach with this command is to create a job which just removes all document versions that are over 30 days old:


# biff all versions of files that are older than 30-days

New-PnPSiteFileVersionBatchDeleteJob -DeleteBeforeDays 30
NOTE: this option sets the batch job to continually process items on your site as they reach the date deadline.

Running the task (Scalpel)


In the event you want to sort and remove versions only on files that meet specific criteria - e.g. files with more than 50 versions, files of a specific type/format, or files over a set size - you take the FileVersionReport.csv data and feed the (filtered) set of file names through scripted processing using:

# Surgically remove file versions (-All for all previous versions, or -Identity <version id> for one specific version)

Remove-PnPFileVersion -Url "$libraryName/$file" -All -Force

This can be done by taking the CSV, parsing for file names where the version count is greater than 5, and then piping them into a loop that removes versions until the desired size is reached. So the logic looks like: capture → sort → filter → pipe → loop the deletion.


[Flowchart: the "File Version Deletion Process" - Capture → Iterative Removal → Refresh Version List]
Getting from listing files to deleting versions
  1. Capture

    • read the library item version report generated with New-PnPLibraryFileVersionExpirationReportJob, which creates a CSV at -ReportUrl

    • use Get-PnPFile to pull that report locally, then Import-Csv to load it into the script

  2. Normalize & parse

    • Convert absolute URLs to server-relative paths, because Remove-PnPFileVersion -Url expects server-relative/site-relative paths.

    • ..and cast the version count to an integer.

  3. Sort & filter

    • Push the list of item URLs through Where-Object { [int]$_.VersionCount -gt 50 } - only checking high-version files,

    • and Sort-Object VersionCount -Descending - to tackle the worst offenders first

  4. Pipe through iteration

    • establish an internal function, Trim-FileVersionsToTarget, then pipe the targets through it:

    • $targets | ForEach-Object { Trim-FileVersionsToTarget ... }

  5. Iterative removal until the target (5) is reached

    • In the internal script function (Trim-FileVersionsToTarget), use a while ($versions.Count -gt 5) loop:

      • Choose the oldest version (excluding current)

      • Delete it using Remove-PnPFileVersion with that version's Id

      • Refresh the version list and repeat the action until the count == 5


Now, I could put the entire script here, but to be honest, ask Claude.ai or GitHub Copilot to generate something using the above command references and scope and you'll get a (mostly) workable output.
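As a starting point, here is a minimal sketch of the capture → filter → sort → pipe → loop logic. The CSV column names (FileUrl, VersionCount) are assumptions - check them against your actual report output before running anything:

```powershell
# Sketch only: trim each high-version file's history down to a target count.
# Assumes an active Connect-PnPOnline session and a local copy of the report CSV.
function Trim-FileVersionsToTarget {
    param(
        [string]$FileUrl,       # server-relative URL of the file
        [int]$TargetCount = 5   # number of versions to keep
    )
    $versions = Get-PnPFileVersion -Url $FileUrl
    while ($versions.Count -gt $TargetCount) {
        # the versions collection excludes the current version; delete the oldest entry
        Remove-PnPFileVersion -Url $FileUrl -Identity $versions[0].Id -Force
        # refresh the version list and repeat until the target is reached
        $versions = Get-PnPFileVersion -Url $FileUrl
    }
}

# capture -> filter (>50 versions) -> sort (worst offenders first) -> pipe -> loop
$targets = Import-Csv ".\FileVersionReport.csv" |
    Where-Object { [int]$_.VersionCount -gt 50 } |
    Sort-Object { [int]$_.VersionCount } -Descending

$targets | ForEach-Object { Trim-FileVersionsToTarget -FileUrl $_.FileUrl }
```

Re-fetching the version list on every pass is slow but safe; it keeps the loop honest if another process touches the file mid-run.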


Monitoring and clean-up

Finally, when you run these commands on large libraries (PHL or otherwise), the queued jobs can take an age. You can check progress with:

Get-PnPLibraryFileVersionExpirationReportJobStatus -Identity $libraryName -ReportUrl "$siteUrl/Shared Documents/FileVersionReport.csv"

...which will provide you some visibility of the job's progress.


Alternative options (SharePoint Module)

If you're using the standard SharePoint module commands, the equivalent set of actions requires refactoring, because the commands are structured slightly differently.


It is worth noting that the PnP Batch commands identified in this article call into the underlying SharePoint Batch command for processing.

INFO: the SharePoint Online equivalent cmdlets offer a far simpler option to create, test and run targeted trimming.

See SP command references in the Resources.
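For illustration, a hedged sketch of the SharePoint Online management-shell equivalent (module: Microsoft.Online.SharePoint.PowerShell) - the admin and site URLs are placeholders:

```powershell
# Sketch: site-wide version trimming via the SharePoint Online management shell
Connect-SPOService -Url "https://organisation-admin.sharepoint.com"

# create a site-wide version batch-delete job, trimming versions older than 30 days
New-SPOSiteFileVersionBatchDeleteJob -Identity "https://organisation.sharepoint.com/sites/<site-name>" -DeleteBeforeDays 30

# check on the job's progress
Get-SPOSiteFileVersionBatchDeleteJobProgress -Identity "https://organisation.sharepoint.com/sites/<site-name>"
```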


The Essentials

The key takeaways:


Preparation is everything. Ensure you have a full picture of what you have, what needs to be updated, and how you will do it, all laid out before proceeding.


Recovery. In the event of a failure during execution you have to be able to either:

  1. restore from backup of the site or content, or

  2. accurately identify all successful actions and the point from which to reset or recommence.


Current state. When moving a site covered by a retention policy to exclusion, it is critical to create detailed file reporting from the Recycle Bin and Preservation Hold Library on the site (or sites) prior to updating the policy.


Testing. Running an extensive review of the actions on a dummy site, set up in the same way, should be an obligatory step before setting up the job.


Change window needs to be long - and over weekends. You are making changes to Purview policy (exclusions) and to labels, and these are picked up by batch jobs which can take upwards of 48 hours, and sometimes require the weekend index run to complete.


Verification is tedious, but critical for validating successful outcomes and confirming reporting necessary for compliance or regulatory obligations. Reporting on before and after execution requires detailed understanding of legal obligations.


Resources

Use the GitHub PnP reference pages for PnP PowerShell cmdlet definitions.


Claude.ai is excellent for auto-generating comprehensive scripts for key outcomes (especially if you need lots of reporting and logging).



Disclaimer

All content was created by the author, based on released information from Microsoft and the community, after step-by-step testing and verification before committing it to this article.


Generative AI has been used in creating the article image, and Napkin.ai to create script process illustration. No other Generative AI was used for content creation.


Any errors or issues with the content in this article are entirely the author's responsibility.


About the author: Jonathan Stuckey
