Convert DOC to DOCX using PowerShell

July 6, 2012

I was tasked with taking a large number of .DOC and .RTF files and converting them to .DOCX. The files were then going to be imported into a SharePoint site. So I went out on the web looking for PowerShell scripts to accomplish this. There are plenty to choose from.

All the examples on the web were the same with some minor modifications. Most of them followed this pattern:

$word = new-object -comobject word.application
$word.Visible = $False
$saveFormat = [Enum]::Parse([Microsoft.Office.Interop.Word.WdSaveFormat], "wdFormatDocumentDefault");

#Get the files
$folderpath = "c:\doclocation\*"
$fileType = "*doc"

Get-ChildItem -path $folderpath -include $fileType | foreach-object {
    $opendoc = $word.documents.open($_.FullName)
    # Strip the extension; Word appends .docx when saving in the default format
    $savename = ($_.fullname).substring(0,($_.FullName).lastindexOf("."))
    $opendoc.saveas([ref]"$savename", [ref]$saveFormat);
    $opendoc.close();
}

#Clean up
$word.quit()
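
The substring/lastIndexOf expression in the loop only strips the extension from the full path. A minimal sketch of the same idea using .NET's [System.IO.Path]::ChangeExtension follows. The function name Get-DocxPath and the sample path are hypothetical, chosen for illustration, and are not part of the scripts found on the web.

```powershell
# Hypothetical helper (not from the original scripts): derive the .docx
# path for a source file by swapping its extension.
function Get-DocxPath {
    param([Parameter(Mandatory = $true)][string]$Path)
    # ChangeExtension replaces everything after the last dot in the file name.
    [System.IO.Path]::ChangeExtension($Path, '.docx')
}

# Example call with a made-up path
Get-DocxPath 'c:\doclocation\report.doc'
```

Passing an explicit .docx name to SaveAs would avoid relying on Word to append the extension, at the cost of one extra helper.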

After trying out several, I started converting some test documents. All went well until the files were uploaded to SharePoint. The .RTF files were fine, but even though the .DOC files were now .DOCX files, they did not allow all the functionality of .DOCX to be used.

After investigating a little further, it turns out that when a .DOC is converted to .DOCX this way, the file is left in compatibility mode. The files are smaller, but they don’t allow for things like co-authoring.
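
One way to confirm this is to read the document's CompatibilityMode property, which Word 2010 and later expose on the Document object. The sketch below is only an illustration: the function names and the example path are hypothetical, and it assumes Word is installed on the machine running the script.

```powershell
# Map Word's numeric CompatibilityMode values to readable names.
function Get-CompatibilityModeName {
    param([int]$Mode)
    switch ($Mode) {
        11      { 'Word 2003' }
        12      { 'Word 2007' }
        14      { 'Word 2010' }
        15      { 'Word 2013' }
        default { "Unknown ($Mode)" }
    }
}

# Hypothetical helper: open a document read-only and report its mode.
# Assumes Word 2010 or later is installed.
function Get-WordCompatibilityMode {
    param([Parameter(Mandatory = $true)][string]$Path)
    $word = New-Object -ComObject Word.Application
    $word.Visible = $false
    try {
        # Arguments: file name, ConfirmConversions, ReadOnly
        $doc = $word.Documents.Open($Path, $false, $true)
        $name = Get-CompatibilityModeName $doc.CompatibilityMode
        $doc.Close()
        "{0}: {1}" -f $Path, $name
    }
    finally {
        # Always shut Word down, even if opening the file failed.
        $word.Quit()
    }
}
```

Calling Get-WordCompatibilityMode 'c:\doclocation\example.docx' (a made-up path) on a file produced by a plain SaveAs should report an older mode if the file is still in compatibility mode.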

So it was back to the drawing board and the web, where I found a way to turn compatibility mode off. The problem was that it required more steps, including saving and reopening the files. To use this method I had to add a compatibility mode object:

$CompatMode = [Enum]::Parse([Microsoft.Office.Interop.Word.WdCompatibilityMode], "wdWord2010")

And then change the code inside the {} from above to:

{
    $opendoc = $word.documents.open($_.FullName)
    $savename = ($_.fullname).substring(0,($_.FullName).lastindexOf("."))
    $opendoc.saveas([ref]"$savename", [ref]$saveFormat);
    $opendoc.close();
    # Reopen the saved file (Word added the .docx extension) and turn compatibility mode off
    $converteddoc = get-childitem "$savename.docx"
    $opendoc = $word.documents.open($converteddoc.FullName)
    $opendoc.SetCompatibilityMode($compatMode);
    $opendoc.save()
    $opendoc.close()
}

It worked, but I didn’t like it. So back to the web again, and this time I stumbled across the real way to do it: use the Convert method. No one else seems to have used this in any of the examples, but it is a much cleaner way to do it than the compatibility mode setting. This is how I changed my code, and now all the files come into SharePoint as true .DOCX files.

$word = new-object -comobject word.application
$word.Visible = $False
$saveFormat = [Enum]::Parse([Microsoft.Office.Interop.Word.WdSaveFormat], "wdFormatDocumentDefault");

#Get the files
$folderpath = "c:\doclocation\*"
$fileType = "*doc"

Get-ChildItem -path $folderpath -include $fileType | foreach-object {
    $opendoc = $word.documents.open($_.FullName)
    $savename = ($_.fullname).substring(0,($_.FullName).lastindexOf("."))
    # Convert is a method of the document, not the Word application object;
    # it upgrades the open document to the newest file format
    $opendoc.Convert()
    $opendoc.saveas([ref]"$savename", [ref]$saveFormat);
    $opendoc.close();
}

#Clean up
$word.quit()