
Exporting comments from Orchard CMS to import them into Disqus - Part 2

In my previous post we took a look at exporting comments from the Orchard CMS database when the export functionality is out of order. It used some nifty tricks to extract XML from an SDF[1] file and resulted in an XML file looking like this:

<?xml version="1.0" encoding="utf-8"?>
<ArrayOfComment xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
  <Comment>
    <Author>Mikael Lundin</Author>
    <CommentDateUtc>2013-06-08 12:43:42</CommentDateUtc>
    <Email i:nil="true" />
    <Id>16</Id>
    <PostLink>http://blog.mikaellundin.name/2013/06/08/test-post.html</PostLink>
    <PostPublishUtc>2013-06-08 12:43:23</PostPublishUtc>
    <PostSlug>test-post</PostSlug>
    <PostText>&lt;![CDATA[&lt;p&gt;Test&lt;/p&gt;]]&gt;</PostText>
    <PostTitle>Test post</PostTitle>
    <SiteName i:nil="true" />
    <Text>&lt;![CDATA[Test comment]]&gt;</Text>
  </Comment>
</ArrayOfComment>

Creating this XML was much easier than trying to produce the WXR format that Disqus accepts in one step. Instead we can quite easily turn this XML into WXR with another of my favorite tools, XSLT.

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl"
      xmlns:content="http://purl.org/rss/1.0/modules/content/"
      xmlns:dsq="http://www.disqus.com/"
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:wp="http://wordpress.org/export/1.0/">
    <xsl:output method="xml" indent="yes"/>

    <xsl:template match="/">
        <rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dsq="http://www.disqus.com/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:wp="http://wordpress.org/export/1.0/">
            <channel>
                <xsl:apply-templates select="ArrayOfComment/Comment" />
            </channel>
        </rss>
    </xsl:template>

    <xsl:template match="Comment">
        <item>
            <!-- title of article -->
            <title><xsl:value-of select="PostTitle"/></title>
            <!-- absolute URI to article -->
            <link><xsl:value-of select="PostLink"/></link>
            <!-- body of the page or post; use cdata; html allowed (though will be formatted to DISQUS specs) -->
            <content:encoded><xsl:value-of select="PostText" disable-output-escaping="yes"/></content:encoded>
            <!-- value used within disqusidentifier; usually internal identifier of article -->
            <dsq:threadidentifier><xsl:value-of select="PostSlug"/></dsq:threadidentifier>
            <!-- creation date of thread (article), in GMT. Must be YYYY-MM-DD HH:MM:SS 24-hour format. -->
            <wp:postdategmt><xsl:value-of select="PostPublishUtc"/></wp:postdategmt>
            <!-- open/closed values are acceptable -->
            <wp:commentstatus>open</wp:commentstatus>
            <wp:comment>
                <!-- internal id of comment -->
                <wp:commentid><xsl:value-of select="Id"/></wp:commentid>
                <!-- author display name -->
                <wp:commentauthor><xsl:value-of select="Author"/></wp:commentauthor>
                <!-- author email address -->
                <wp:commentauthoremail><xsl:value-of select="Email"/></wp:commentauthoremail>
                <!-- author url, optional -->
                <wp:commentauthorurl><xsl:value-of select="SiteName"/></wp:commentauthorurl>
                <!-- author ip address -->
                <wp:commentauthorIP></wp:commentauthorIP>
                <!-- comment datetime, in GMT. Must be YYYY-MM-DD HH:MM:SS 24-hour format. -->
                <wp:commentdategmt><xsl:value-of select="CommentDateUtc"/></wp:commentdategmt>
                <!-- comment body; use cdata; html allowed (though will be formatted to DISQUS specs) -->
                <wp:commentcontent><xsl:value-of select="Text" disable-output-escaping="yes"/></wp:commentcontent>
                <!-- is this comment approved? 0/1 -->
                <wp:commentapproved>1</wp:commentapproved>
                <!-- parent id (match up with wp:commentid) -->
                <wp:commentparent>0</wp:commentparent>
            </wp:comment>
        </item>
    </xsl:template>
</xsl:stylesheet>

This is a very simple XSL transformation. The root template outputs the rss and channel wrapper elements, and the Comment template outputs one WXR item per comment, taking the values directly from the elements in databaseComments.xml. Since we've already preformatted the dates and CDATA fields, there is very little data conversion going on.
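To make "preformatted" concrete: the WXR comments ask for dates in YYYY-MM-DD HH:MM:SS 24-hour format and for text wrapped in CDATA. The following is only a sketch of what that preparation could look like in F#, assuming the values are available as a System.DateTime in UTC and a plain string. The helper names formatWxrDate and wrapCData are hypothetical and are not taken from the previous post.

```fsharp
open System
open System.Globalization

// hypothetical helper: format a UTC timestamp as YYYY-MM-DD HH:MM:SS (24-hour clock)
// formatWxrDate: DateTime -> string
let formatWxrDate (date : DateTime) =
  date.ToString("yyyy-MM-dd HH:mm:ss", CultureInfo.InvariantCulture)

// hypothetical helper: wrap text in a CDATA section, splitting any "]]>"
// inside the text so it cannot terminate the section early
// wrapCData: string -> string
let wrapCData (text : string) =
  "<![CDATA[" + text.Replace("]]>", "]]]]><![CDATA[>") + "]]>"

// the comment date and comment text from the sample XML above
let exampleDate = formatWxrDate (DateTime(2013, 6, 8, 12, 43, 42, DateTimeKind.Utc))
let exampleText = wrapCData "Test comment"
```

With values prepared like this, the stylesheet only has to copy them into the right elements.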

// transform input xml to output xml using the transformation xsl file
// transform: string -> string -> string -> unit
let transform (inputFile : string) (transformFile : string) (outputFile : string) =
  // make sure output is indented, human readable
  let writerSettings = new System.Xml.XmlWriterSettings(Indent = true)
  // create a transformer with debug option enabled
  let transformer = System.Xml.Xsl.XslCompiledTransform(enableDebug = true)
  // settings of transformation will disallow the document() function and scripts
  let settings = System.Xml.Xsl.XsltSettings(enableDocumentFunction = false, enableScript = false)
  // load the xsl stylesheet; `use` disposes the reader when it goes out of scope
  use stylesheetReader = System.Xml.XmlReader.Create(transformFile)
  transformer.Load(stylesheetReader, settings, null)
  // open the input xml and create the output xml file
  use inputReader = System.Xml.XmlReader.Create(inputFile)
  use outputWriter = System.Xml.XmlWriter.Create(outputFile, writerSettings)
  // transform input file and write result to output file;
  // disposing the writer flushes the result to disk
  transformer.Transform(inputReader, outputWriter)

These few lines of F# code turn our input XML into WXR by transforming it with our XSL.

transform "databaseOutput.xml" "transform.xsl" "disqusImport.xml"

Before running the import I verified that all the URLs in disqusImport.xml were working, and then I just pushed the file to Disqus with great success.
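The URL check can be automated as well. The sketch below is one possible way to do it, not necessarily how the check was done for this post. It assumes a runtime where System.Xml.Linq and System.Net.Http are available (an older F# script may need to reference those assemblies explicitly), and the function names extractLinks and checkLinks are made up for the example.

```fsharp
open System.Net.Http
open System.Xml.Linq

// collect the distinct <link> values from the generated import file
// extractLinks: string -> string list
let extractLinks (importFile : string) =
  XDocument.Load(importFile).Descendants(XName.Get "link")
  |> Seq.map (fun element -> element.Value)
  |> Seq.distinct
  |> Seq.toList

// request every url and pair it with whether the response was a success (2xx);
// any exception (dns failure, timeout, invalid url) counts as not working
// checkLinks: string list -> (string * bool) list
let checkLinks (urls : string list) =
  use client = new HttpClient()
  urls
  |> List.map (fun url ->
      try
        use response = client.GetAsync(url).Result
        url, response.IsSuccessStatusCode
      with _ -> url, false)
```

Running `extractLinks "disqusImport.xml" |> checkLinks |> List.filter (snd >> not)` would then list only the URLs that did not respond successfully.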

<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dsq="http://www.disqus.com/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:wp="http://wordpress.org/export/1.0/">
  <channel>
    <item>
      <title>Test post</title>
      <link>http://blog.mikaellundin.name/2013/06/08/test-post.html</link>
      <content:encoded><![CDATA[<p>Test</p>]]></content:encoded>
      <dsq:threadidentifier>test-post</dsq:threadidentifier>
      <wp:postdategmt>2013-06-08 12:43:23</wp:postdategmt>
      <wp:commentstatus>open</wp:commentstatus>
      <wp:comment>
        <wp:commentid>16</wp:commentid>
        <wp:commentauthor>Mikael Lundin</wp:commentauthor>
        <wp:commentauthoremail></wp:commentauthoremail>
        <wp:commentauthorurl></wp:commentauthorurl>
        <wp:commentauthorIP />
        <wp:commentdategmt>2013-06-08 12:43:42</wp:commentdategmt>
        <wp:commentcontent><![CDATA[Test comment]]></wp:commentcontent>
        <wp:commentapproved>1</wp:commentapproved>
        <wp:commentparent>0</wp:commentparent>
      </wp:comment>
    </item>
  </channel>
</rss>

Importing WXR files to Disqus


Footnotes


  1. SQL Server Compact 4 
