I recently came across an interesting side effect of the Azure DevOps Cache task when its settings are not correctly configured, one that left me somewhat confused until I realised what had occurred.
The Problem
I had a working pipeline that, as part of its build process, ran the OWASP Dependency Check task. This can be slow to run as it has to download the current vulnerability database. To try to speed up my builds, I have been using the Cache task to cache the vulnerability database downloaded by the current pipeline run, so on the next run the vast majority of the database is already present.
The pipeline YAML is as follows:
steps:
- task: PowerShell@2
  displayName: Find the NVD DB path to cache
  inputs:
    targetType: inline
    script: |
      $nvdcachepath = $(get-childitem "$(Agent.WorkFolder)\_tasks\dependency-check-build-task*\*.*.*\dependency-check\data").FullName
      echo "##vso[task.setvariable variable=nvdcachepath;]$nvdcachepath"
- task: Cache@2
  displayName: Cache NVD data
  inputs:
    key: '"NVDCache" | "$(Agent.OS)"'
    path: $(nvdcachepath)
    restoreKeys: |
      NVDCache | "$(Agent.OS)"
      NVDCache
# other build tasks
- task: JavaToolInstaller@0
  displayName: Install Java needed for the Dependency Check
  inputs:
    versionSpec: '11'
    jdkArchitectureOption: 'x64'
    jdkSourceOption: 'PreInstalled'
- task: dependency-check.dependencycheck.dependency-check-build-task.dependency-check-build-task@6
  displayName: Dependency Check
  inputs:
    projectName: Identity Server
    scanPath: CCC.Web.IdentityAndSSO.IdentityServer
    format: HTML,XML
    additionalArguments: --nvdApiKey $(nvdapikey)
This has all been working well, but the YAML has a potential flaw. Can you spot it?
The issue appeared when the dependency check task was temporarily disabled because the server that provides the OWASP vulnerability database was unavailable.
This meant that the dependency check task was not downloaded to the agent, so the PowerShell script that finds the vulnerability database folder returned an empty string, as it could find no matching folder.
This is important as it is this string that controls what is cached. With no folder passed into the Cache task, the whole of the pipeline working folder is cached. In effect, restoring the cache checks out the source as it was on the last successful build on the branch. So you are not building the commit you think you are, but an older commit: the source code that was cached.
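One way to make this failure mode visible in the build log is to have the lookup step report when it finds nothing. The following is only a sketch based on the step above: the lookup line is unchanged, and the check and the warning text are my additions, not part of the original pipeline. It uses the standard `task.logissue` logging command to raise a warning.

```yaml
steps:
- task: PowerShell@2
  displayName: Find the NVD DB path to cache
  inputs:
    targetType: inline
    script: |
      # Same lookup as before; yields nothing if the dependency check task folder is absent
      $nvdcachepath = $(get-childitem "$(Agent.WorkFolder)\_tasks\dependency-check-build-task*\*.*.*\dependency-check\data").FullName
      if ([string]::IsNullOrEmpty($nvdcachepath)) {
        # Added check (assumption): flag the empty path rather than passing it on silently
        Write-Host "##vso[task.logissue type=warning]NVD data folder not found, nvdcachepath is empty"
      }
      echo "##vso[task.setvariable variable=nvdcachepath;]$nvdcachepath"
```

The warning does not stop the Cache task from running with an empty path; it only makes the cause easier to spot when a build behaves oddly.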
The Solution
There are a number of options to avoid this problem:
- Don’t use the Cache task
- Don’t disable any task that produces folders your Cache task is meant to cache
- If you disable a task whose data you cache, also disable the Cache task
- Put some logic around the Cache task, e.g.
- task: Cache@2
  condition: and(succeeded(), ne(variables['nvdcachepath'], ''))
All are valid for different scenarios; the choice is down to your use case.
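As an illustration of keeping the dependency check and its cache in step with each other, the sketch below gates all three related steps on a single pipeline variable. The variable name `runDependencyCheck` is hypothetical and not part of my original pipeline; the task inputs are copied from the YAML above. Treat it as a starting point under those assumptions, not a tested recipe.

```yaml
variables:
  # Hypothetical switch: set to 'false' to skip the check and its cache together
  runDependencyCheck: 'true'

steps:
- task: PowerShell@2
  displayName: Find the NVD DB path to cache
  condition: and(succeeded(), eq(variables['runDependencyCheck'], 'true'))
  inputs:
    targetType: inline
    script: |
      $nvdcachepath = $(get-childitem "$(Agent.WorkFolder)\_tasks\dependency-check-build-task*\*.*.*\dependency-check\data").FullName
      echo "##vso[task.setvariable variable=nvdcachepath;]$nvdcachepath"
- task: Cache@2
  displayName: Cache NVD data
  # Skipped when the switch is off or when no data folder was found
  condition: and(succeeded(), eq(variables['runDependencyCheck'], 'true'), ne(variables['nvdcachepath'], ''))
  inputs:
    key: '"NVDCache" | "$(Agent.OS)"'
    path: $(nvdcachepath)
    restoreKeys: |
      NVDCache | "$(Agent.OS)"
      NVDCache
# other build tasks
- task: dependency-check.dependencycheck.dependency-check-build-task.dependency-check-build-task@6
  displayName: Dependency Check
  condition: and(succeeded(), eq(variables['runDependencyCheck'], 'true'))
  inputs:
    projectName: Identity Server
    scanPath: CCC.Web.IdentityAndSSO.IdentityServer
    format: HTML,XML
    additionalArguments: --nvdApiKey $(nvdapikey)
```

With one switch there is no way to turn off the dependency check and forget the Cache task, and the extra `ne(...)` test still covers the case where the folder is missing for some other reason.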
The overall point is to make sure you are caching what you think you are caching. If not constrained, the Cache task will cache everything it can.