Importing to RebelMouse: Technical Guidelines

In order to ingest a client's content into RebelMouse, we must be able to fully parse the site's content. RebelMouse accepts several input formats for content ingestion.


Requirements for Your Feed

To import your website's content into RebelMouse, you must provide an export file/feed of all the entries and authors of your website with the following fields:

Required Authors Fields to Import

  • To import authors, you should provide us with a "main" list of authors, separate from the list of entries.
  • Each author should contain:
    • Name: full name of the author
    • Email: full email address
    • ID: unique identifier for each author

Required Entry Fields to Import

  • Pubdate: publication date for the article following the date and time specifications of RFC 822
  • Content: full content of your article as semantic HTML
  • Headline: string as a title of the article
  • Images: list of featured images that represent your article, each with a URL and a description
  • URL: full public URL of your article
  • Authors: list of author IDs for the given entry, as specified in the "main" authors list
  • Status: "published" or "draft"
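
As a minimal sketch of producing the Pubdate value, the Python standard library can format a datetime in the RFC 822 style used in the examples in this guide. The helper name to_rfc822 is hypothetical, and treating naive datetimes as UTC is an assumption, not a RebelMouse requirement.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def to_rfc822(moment: datetime) -> str:
    """Format a datetime as an RFC 822-style date string."""
    if moment.tzinfo is None:
        # Assumption: naive datetimes are interpreted as UTC.
        moment = moment.replace(tzinfo=timezone.utc)
    return format_datetime(moment)


published = datetime(2015, 7, 30, 15, 0, 45, tzinfo=timezone.utc)
pub_date = to_rfc822(published)
print(pub_date)  # Thu, 30 Jul 2015 15:00:45 +0000
```

Parsing the string back with parsedate_to_datetime is a quick way to confirm that a value round-trips before you export it.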

Optional Entry Fields to Import

  • basename: string with the desired basename to follow in the new RebelMouse URL (no "/" — just alphanumeric characters and "-")
  • social_image: image to be used on social networks
  • social_headline: headline to be used on social networks
  • social_description: description text to be used on social networks
  • listicle: If your article contains more than text — such as slides or pagination — using the listicle option might make more sense. A listicle is a list of items, where each item contains the following fields:
    • headline: headline for this particular item/slide
    • body: content of your item/slide as HTML
    • media_html: any representative image or embed code as an HTML embed code
    • credit: credits for this particular item/slide
    • caption: small description of this item/slide's media
    • numeration (optional): stringified version of the slide number
      Each article can contain just one listicle. Optionally, you can set the following attributes that control the way the listicle will be rendered:
    • use_pagination: Boolean 0 or 1; default 0. Turns each listicle's items into a page of the given post.
    • use_numeration: Boolean 0 or 1; default 0. Turns the listicle into a numerated list.
    • numeration_sort: ASC or DESC; default ASC.
    • body_text_above: Boolean 0 or 1; default 0. Tells whether each item's body text should be placed above or below the item's media_html.
  • tags: list of strings where each string is a tag to be applied to your article
  • media_url: URL of a video/embed which you want to highlight as representative media for your entry. This URL must be usable as the "src" attribute of an <iframe> tag.
  • subheadline: a string for a second-level headline for your article
  • sections: A list of strings, where each string is the name of the RebelMouse section you want the current entry to be a part of. If no sections are set, the entry will go directly to your site's front page. If you want the entry to go to the front page and other sections, then you should name "front page" explicitly and include the names of the other sections. Section names must be lowercase and use only the letters a–z, with spaces replaced by underscores (_). (E.g., "front_page", "business", "healthy_living".)
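
The naming rules for basename and sections above can be sketched in Python as follows. Both helper names are hypothetical, and lowercasing the basename is an assumption on our part, since the rule above only asks for alphanumeric characters and "-".

```python
import re


def to_basename(text: str) -> str:
    """Reduce text to alphanumeric characters and "-" (no "/")."""
    slug = re.sub(r"[^A-Za-z0-9]+", "-", text)
    # Assumption: lowercase the result; the rule above does not require it.
    return slug.strip("-").lower()


def to_section_name(name: str) -> str:
    """Lowercase a section name, turn spaces into "_", and keep only a-z and "_"."""
    lowered = name.strip().lower()
    underscored = re.sub(r"\s+", "_", lowered)
    return re.sub(r"[^a-z_]", "", underscored)


print(to_basename("Your Article: Part 2/3"))  # your-article-part-2-3
print(to_section_name("Healthy Living"))      # healthy_living
```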

To ingest your content into RebelMouse, you must use one of the following formats: JSON, XML, or RSS 2.0.

JSON Format

To provide RebelMouse with a JSON output of your content, provide a list of authors and entries as specified in the instructions above, where the items of each list are simple dictionaries defining the given author or entry.

Here's an example of two authors + two entries with just required fields:

{
  "authors": [
   {"name": "John Smith", "email": "smith@domain.com", "id": "123-456"},
   {"name": "John Doe", "email": "doe@domain.com", "id": "789-abc"}
  ],
  "entries": [
   {
     "headline": "This Is a Headline",
     "content": "<p>This is an <i>entry</i> with content full of <em>HTML</em> semantics.</p>",
     "pub_date": "Thu, 30 Jul 2015 15:00:45 +0000",
     "url": "http://www.yourdomain.com/path/to/your/article",
     "images": [{"url": "http://yourcdn.com/path/to/your/image.jpg", "description": "some text"}, ...],
     "authors": ["123-456"],
     "status": "published"
   },
   {
     "headline": "This Is Another Headline",
     "content": "<p>I can contain: </p><ul> <li>any</li> <li>semantic</li> </ul>",
     "pub_date": "Thu, 30 Jul 2015 16:35:00 +0000",
     "url": "http://www.yourdomain.com/path/to/another/article",
     "images": [{"url": "http://yourcdn.com/path/to/another/image.jpg", "description": "some text"}, ...],
     "authors": ["123-456", "789-abc"],
     "status": "published"
   }
  ]
}
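
Before handing over a JSON export, it can help to check it against the field lists above. The following is a minimal sketch of such a check, not RebelMouse's own validation. The function name validate_feed is hypothetical, and the key names are taken from the example above.

```python
import json

REQUIRED_AUTHOR_FIELDS = {"name", "email", "id"}
REQUIRED_ENTRY_FIELDS = {"headline", "content", "pub_date", "url", "images", "authors", "status"}


def validate_feed(feed: dict) -> list:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    known_ids = set()
    for index, author in enumerate(feed.get("authors", [])):
        missing = sorted(REQUIRED_AUTHOR_FIELDS - set(author))
        if missing:
            problems.append(f"author {index}: missing {missing}")
        author_id = author.get("id")
        if author_id in known_ids:
            problems.append(f"author {index}: duplicate id {author_id!r}")
        known_ids.add(author_id)
    for index, entry in enumerate(feed.get("entries", [])):
        missing = sorted(REQUIRED_ENTRY_FIELDS - set(entry))
        if missing:
            problems.append(f"entry {index}: missing {missing}")
        if entry.get("status") not in ("published", "draft"):
            problems.append(f"entry {index}: status must be 'published' or 'draft'")
        # Every author ID on an entry must exist in the "main" authors list.
        for author_id in entry.get("authors", []):
            if author_id not in known_ids:
                problems.append(f"entry {index}: unknown author id {author_id!r}")
    return problems


feed = json.loads("""
{
  "authors": [{"name": "John Smith", "email": "smith@domain.com", "id": "123-456"}],
  "entries": [{
    "headline": "This Is a Headline",
    "content": "<p>Some content.</p>",
    "pub_date": "Thu, 30 Jul 2015 15:00:45 +0000",
    "url": "http://www.yourdomain.com/path/to/your/article",
    "images": [{"url": "http://yourcdn.com/path/to/your/image.jpg", "description": "some text"}],
    "authors": ["123-456", "789-abc"],
    "status": "published"
  }]
}
""")
problems = validate_feed(feed)
print(problems)  # ["entry 0: unknown author id '789-abc'"]
```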

Here's an example of a single entry with required and optional fields, among them a listicle:

{
     "headline": "This Is a Headline",
     "content": "<p>This is a <i>listicle</i> full of content with <em>HTML</em> semantics.</p>",
     "pub_date": "Thu, 30 Jul 2015 15:00:45 +0000",
     "url": "http://www.yourdomain.com/path/to/your-article",
     "images": [{"url": "http://yourcdn.com/path/to/your/image.jpg", "description": "some text"}, ...],
     "author": "Homer Simpson",
     "basename": "your-article",
     "social_headline": "You won't believe this is a headline.",
     "social_description": "OMG!",
     "authors": ["123-456"],
     "media_url": "https://www.youtube.com/watch?v=-WXAAAdGJ7o",
     "tags": ["simpsons", "tv show", "homer"],
     "sections": ["frontpage", "the_simpsons"],
     "listicle": [
         {"headline": "item 1 headline", "body": "<p>item 1 content</p>", "media_html": "<img src=\"http://path.to/image.jpg\" />"},
         {"headline": "item 2 headline", "body": "<p>item 2 content</p>", "media_html": "<iframe src=\"http://path.to/video\"></iframe>"}
     ],
     "listicle_settings": {
         "use_numeration": 1,
         "numeration_order": "DESC"
     },
     "status": "draft"
}

XML Format

To provide RebelMouse with an XML output of your content, simply provide a list of entry and author entities, like the examples below:

Here's an example of two authors + two entries with just required fields:

<?xml version="1.0"?>
  <authors>
     <author>
        <name><![CDATA[John Smith]]></name>
        <email>smith@domain.com</email>
        <id>123-456</id>
     </author>
     <author>
        <name><![CDATA[John Doe]]></name>
        <email>doe@domain.com</email>
        <id>789-abc</id>
     </author>
  </authors>
  <entries>
    <entry>
      <headline><![CDATA[This Is a Headline]]></headline>
      <content>
          <![CDATA[<p>This is an <i>entry</i> full of content with <em>HTML</em> semantics.</p>]]>
      </content>
      <pub_date>Thu, 30 Jul 2015 15:00:45 +0000</pub_date>
      <url>http://www.yourdomain.com/path/to/your/article</url>
      <authors>
         <author>123-456</author>
      </authors>
      <images>
        <image>
            <url>http://yourcdn.com/path/to/another/image.jpg</url>
            <description>Some text</description>
        </image>
      </images>
      <status>published</status>
    </entry>
    <entry>
      <headline><![CDATA[This Is Another Headline]]></headline>
      <content>
          <![CDATA[<p>I can contain: </p><ul> <li>any</li> <li>semantic</li> </ul>]]>
      </content>
      <pub_date>Thu, 30 Jul 2015 16:35:00 +0000</pub_date>
      <url>http://www.yourdomain.com/path/to/another/article</url>
      <authors>
         <author>123-456</author>
         <author>789-abc</author>
      </authors>
      <images>
        <image>
            <url>http://yourcdn.com/path/to/another/image.jpg</url>
            <description>Some text</description>
        </image>
      </images>
      <status>published</status>
    </entry>
  </entries>

Here's an example of a single entry with required and optional fields:

<entry>
      <headline><![CDATA[This Is a Headline]]></headline>
      <content>
          <![CDATA[<p>This is an <i>entry</i> full of content with <em>HTML</em> semantics.</p>]]>
      </content>
      <pub_date>Thu, 30 Jul 2015 15:00:45 +0000</pub_date>
      <url>http://www.yourdomain.com/path/to/your-article</url>
      <basename>your-article</basename>
      <authors>
         <author>123-456</author>
      </authors>
      <images>
        <image>
           <url>http://yourcdn.com/path/to/another/image.jpg</url>
            <description>Some text</description>
        </image>
      </images>
      <media_url>https://www.youtube.com/watch?v=-WXAAAdGJ7o</media_url> 
      <social_headline><![CDATA[You won't believe this is a headline.]]></social_headline>
      <social_description><![CDATA[OMG!]]></social_description>
      <tags>
        <tag>simpsons</tag>
        <tag>tv show</tag>
        <tag>homer</tag>
      </tags>
      <sections>
        <section>frontpage</section>
        <section>the_simpsons</section>
      </sections>
      <listicle use_numeration="1" numeration_order="DESC">
         <item>
            <headline>item 1 headline</headline>
            <body><![CDATA[<p>item 1 content</p>]]></body>
            <media_html><![CDATA[<img src="http://path.to/image.jpg" />]]></media_html>
         </item>
         <item>
            <headline>item 2 headline</headline>
            <body><![CDATA[<p>item 2 content</p>]]></body>
            <media_html><![CDATA[<iframe src="http://path.to/video"></iframe>]]></media_html>
         </item>
      </listicle>
      <status>draft</status>
   </entry>
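
If you generate the XML programmatically, a short sketch like the one below can produce an entry with CDATA-wrapped fields using Python's standard library. The function build_entry_xml and its parameters are hypothetical. The element names follow the examples above, and only the required fields are covered.

```python
from xml.dom import minidom
import xml.etree.ElementTree as ET


def build_entry_xml(headline, content_html, pub_date, url, author_ids, images, status):
    """Build one <entry> element as a string, wrapping free text in CDATA."""
    doc = minidom.Document()
    entry = doc.createElement("entry")
    doc.appendChild(entry)

    def add(parent, tag, text=None, cdata=False):
        node = doc.createElement(tag)
        if text is not None:
            # Note: minidom refuses to serialize CDATA text that contains "]]>".
            child = doc.createCDATASection(text) if cdata else doc.createTextNode(text)
            node.appendChild(child)
        parent.appendChild(node)
        return node

    add(entry, "headline", headline, cdata=True)
    add(entry, "content", content_html, cdata=True)
    add(entry, "pub_date", pub_date)
    add(entry, "url", url)
    authors = add(entry, "authors")
    for author_id in author_ids:
        add(authors, "author", author_id)
    images_node = add(entry, "images")
    for image_url, description in images:
        image = add(images_node, "image")
        add(image, "url", image_url)
        add(image, "description", description)
    add(entry, "status", status)
    return entry.toxml()


entry_xml = build_entry_xml(
    headline="This Is a Headline",
    content_html="<p>This is an <i>entry</i>.</p>",
    pub_date="Thu, 30 Jul 2015 15:00:45 +0000",
    url="http://www.yourdomain.com/path/to/your/article",
    author_ids=["123-456"],
    images=[("http://yourcdn.com/path/to/your/image.jpg", "Some text")],
    status="published",
)
# Parsing the result back is a cheap well-formedness check.
parsed = ET.fromstring(entry_xml)
print(parsed.findtext("headline"))
```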

RSS 2.0 Format

To provide RebelMouse with an RSS Feed of your content, simply provide a list of entry and author entities, like the examples below:

<rss xmlns:rm="http://www.rebelmouse.com/NS/" version="2.0">
   <channel>
      <link>http://homesite.com</link>
      <rm:authors>
         <rm:author>
            <rm:name>author1</rm:name>
            <rm:email>author1@homesite.com</rm:email>
            <rm:id>1</rm:id>
         </rm:author>
         <rm:author>
            <rm:name>author2</rm:name>
            <rm:email>author2@homesite.com</rm:email>
            <rm:id>2</rm:id>
         </rm:author>
      </rm:authors>
      <item>
         <title><![CDATA[This Is a Headline]]></title>
         <description> 
            <![CDATA[<p>This is an <i>entry</i> full of content with <em>HTML</em> semantics.</p>]]>
         </description>
         <pubDate>Thu, 30 Jul 2015 15:00:45 +0000</pubDate>
         <link>http://www.yourdomain.com/path/to/your-article</link>
         <guid>post_id1</guid>
         <rm:images>
            <rm:image>
               <rm:url></rm:url>
               <rm:description></rm:description>
            </rm:image>
         </rm:images>
         <rm:authors>
            <rm:author>1</rm:author>
            <rm:author>2</rm:author>
         </rm:authors>
         <rm:status>published</rm:status>
      </item>
      <item>
          ….
          ...
      </item>
   </channel>
</rss>

Here's an example of a single entry with required and optional fields:

   <item>
      <title><![CDATA[This Is a Headline]]></title>
      <description>
          <![CDATA[<p>This is an <i>entry</i> full of content with <em>HTML</em> semantics.</p>]]>
      </description>
      <pubDate>Thu, 30 Jul 2015 15:00:45 +0000</pubDate>
      <link>http://www.yourdomain.com/path/to/your-article</link>
      <rm:basename>your-article</rm:basename>
      <rm:authors>
         <rm:author>123-456</rm:author>
      </rm:authors>
      <rm:images>
        <rm:image>
           <rm:url>http://yourcdn.com/path/to/another/image.jpg</rm:url>
           <rm:description>Some text</rm:description>
        </rm:image>
      </rm:images>
      <rm:social_headline><![CDATA[You won't believe this is a headline.]]></rm:social_headline>
      <rm:social_description><![CDATA[OMG!]]></rm:social_description>
      <rm:tags>
        <rm:tag>simpsons</rm:tag>
        <rm:tag>tv show</rm:tag>
        <rm:tag>homer</rm:tag>
      </rm:tags>
      <rm:sections>
        <rm:section>frontpage</rm:section>
        <rm:section>the_simpsons</rm:section>
      </rm:sections>
      <rm:media_url>https://www.youtube.com/watch?v=-WXAAAdGJ7o</rm:media_url> 
      <rm:listicle use_numeration="1" numeration_order="DESC">
         <rm:item>
            <rm:headline>item 1 headline</rm:headline>
            <rm:body><![CDATA[<p>item 1 content</p>]]></rm:body>
            <rm:media_html><![CDATA[<img src="http://path.to/image.jpg" />]]></rm:media_html>
            <rm:credit><![CDATA[John Nash]]></rm:credit>
            <rm:caption><![CDATA[a description for that image]]></rm:caption>
         </rm:item>
         <rm:item>
            <rm:headline>item 2 headline</rm:headline>
            <rm:body><![CDATA[<p>item 2 content</p>]]></rm:body>
            <rm:media_html><![CDATA[<iframe src="http://path.to/video"></iframe>]]></rm:media_html>
            <rm:credit><![CDATA[John Brown]]></rm:credit>
            <rm:caption><![CDATA[a description for that video]]></rm:caption>
         </rm:item>
      </rm:listicle>
      <rm:status>draft</rm:status>
   </item>
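
Because the RebelMouse-specific elements live in the rm namespace, a feed is easy to check with a namespace-aware parser before you submit it. The sketch below is an illustration, not part of RebelMouse's tooling. The function name unknown_author_ids and the trimmed sample feed are hypothetical. It lists items that reference author IDs missing from the channel's authors list, and parsing also fails loudly on unclosed tags.

```python
import xml.etree.ElementTree as ET

# Namespace URI as declared on the <rss> element in the example above.
RM = {"rm": "http://www.rebelmouse.com/NS/"}

SAMPLE = """<rss xmlns:rm="http://www.rebelmouse.com/NS/" version="2.0">
  <channel>
    <rm:authors>
      <rm:author>
        <rm:name>author1</rm:name>
        <rm:email>author1@homesite.com</rm:email>
        <rm:id>1</rm:id>
      </rm:author>
    </rm:authors>
    <item>
      <title><![CDATA[This Is a Headline]]></title>
      <rm:authors>
        <rm:author>1</rm:author>
        <rm:author>2</rm:author>
      </rm:authors>
      <rm:status>published</rm:status>
    </item>
  </channel>
</rss>"""


def unknown_author_ids(rss_text: str) -> dict:
    """Map each item title to the author IDs it uses that the channel does not define."""
    channel = ET.fromstring(rss_text).find("channel")
    known = {node.text for node in channel.findall("rm:authors/rm:author/rm:id", RM)}
    missing = {}
    for item in channel.findall("item"):
        ids = [node.text for node in item.findall("rm:authors/rm:author", RM)]
        unknown = [author_id for author_id in ids if author_id not in known]
        if unknown:
            missing[item.findtext("title")] = unknown
    return missing


result = unknown_author_ids(SAMPLE)
print(result)  # {'This Is a Headline': ['2']}
```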

URL Patterns and Pagination

In order for RebelMouse to be able to ingest your content, you must expose it through a standardized pattern of URLs and pagination.

Depending on your format, your API endpoint should end with the corresponding file extension (for example, .json for the JSON format).

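You should order your items from newest to oldest and paginate through the page GET parameter, starting from 0. Using the JSON format as an example, the first page would be served at /path/to/your/api.json?page=0 and the second at /path/to/your/api.json?page=1.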

Each page should have exactly 10 items.
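
On the serving side, the ordering and pagination rules above might be sketched as follows. The function get_page and the published_at key are hypothetical names for illustration. How a final page with fewer than 10 items should be handled is not specified here, so this sketch simply returns whatever items remain.

```python
from datetime import datetime, timedelta, timezone

PAGE_SIZE = 10  # the guideline above: 10 items per page


def get_page(entries: list, page: int) -> list:
    """Return one page of entries, newest first, with pages numbered from 0."""
    ordered = sorted(entries, key=lambda entry: entry["published_at"], reverse=True)
    start = page * PAGE_SIZE
    return ordered[start:start + PAGE_SIZE]


# 25 sample entries, one hour apart; "Post 24" is the newest.
base = datetime(2015, 7, 30, 15, 0, tzinfo=timezone.utc)
entries = [
    {"headline": f"Post {n}", "published_at": base + timedelta(hours=n)}
    for n in range(25)
]
first_page = get_page(entries, 0)
print(first_page[0]["headline"], len(first_page))  # Post 24 10
```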

We recommend that you protect your API with HTTP authentication and HTTPS so RebelMouse can use your API this way:

https://user:password@example.com/path/to/your/api.json?page=N
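
To test your endpoint the way a client would call it, you can build the page URL and an HTTP Basic authentication header yourself. This is a minimal sketch under stated assumptions: the function page_request is hypothetical, and it is not a description of how RebelMouse's importer issues its requests.

```python
import base64
from urllib.parse import urlencode, urlsplit, urlunsplit


def page_request(api_url: str, user: str, password: str, page: int):
    """Return (url, headers) for fetching one page with HTTP Basic authentication."""
    parts = urlsplit(api_url)
    query = urlencode({"page": page})
    url = urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))
    # Basic auth sends "user:password" base64-encoded, so only use it over HTTPS.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return url, {"Authorization": f"Basic {token}"}


url, headers = page_request("https://example.com/path/to/your/api.json", "user", "password", 3)
print(url)  # https://example.com/path/to/your/api.json?page=3
```

The returned pair can be passed to any HTTP client, for example urllib.request.Request(url, headers=headers).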

WordPress Sites

Alternatively, if you have a WordPress site, RebelMouse's import tool is already able to ingest all your content, keeping most of the configuration you already have in your WordPress setup.

Automated WordPress Import

We have an automated WordPress ingestion feature that you can try out. There are two instances where you can import your WordPress articles:

1. During signup when you're creating your new Rebel Runner. (Scroll down to step two if this is not your use case.) The import option appears at the end of the signup process.

If you click on the "Import your WordPress" option, you will be prompted for:

  • Your WordPress username and password
  • Your XML-RPC endpoint URL, which is where we'll be pulling all of your content from

For more information, or if you're unsure of how to get that endpoint URL, please reference this article.

That being said, the endpoint URL you need will most likely be the root URL of your WordPress site + /xmlrpc.php. (E.g., http://YourWordPressSite.com/xmlrpc.php)
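
As a small illustration of the rule above, the likely endpoint can be derived from the site's root URL. The helper xmlrpc_endpoint is hypothetical, and the result is only the most likely location, since a site can be configured differently.

```python
from urllib.parse import urljoin


def xmlrpc_endpoint(site_root: str) -> str:
    """Append xmlrpc.php to the root URL of a WordPress site."""
    # A trailing slash makes urljoin treat the last path segment as a directory.
    return urljoin(site_root.rstrip("/") + "/", "xmlrpc.php")


print(xmlrpc_endpoint("http://example.com"))        # http://example.com/xmlrpc.php
print(xmlrpc_endpoint("http://example.com/blog/"))  # http://example.com/blog/xmlrpc.php
```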

If you click on the "Advanced Options" link, you will be given the option to install a plugin to get access to a couple more custom features for your import.

With the plugin, we'll be able to:

  • Import your WordPress authors and create RebelMouse users for them.
  • Give you status details on how the import process is going.

You don't have to wait for the import to be finished to go to your RebelMouse Dashboard. You can leave it running in the background.

2. If you already have a RebelMouse site created, you can start your WordPress import by going to the Content Feeds dashboard.

There you'll find all of your feed information and content for review. Scroll down until you find the WordPress importer.

You will be prompted for:

  • Your WordPress username and password
  • Your XML-RPC endpoint URL, which is where we'll be pulling all your content from

For more information, or if you're unsure of how to get that endpoint URL, please reference this article.

That being said, the endpoint URL you need will most likely be the root URL of your WordPress site + /xmlrpc.php. (E.g., http://YourWordPressSite.com/xmlrpc.php)

And — as also explained in step one — if you click on the "Advanced Options" link, you will be given the option to install a plugin to get access to a couple more custom features for your import.

With the plugin, we'll be able to:

  • Import your WordPress authors and create RebelMouse users for them.
  • Give you status details on how the import process is going.

As content starts to flow in, you'll see your home page automatically populated with more and more of your articles.

Custom WordPress Import

To take advantage of RebelMouse's ability to import a WordPress site, simply provide your RebelMouse contact with your WXR file. This can be found in your WordPress administration tools under "Tools" > "Export."

This is the best option to use when you have several custom shortcodes or plugins that you want to migrate over to RebelMouse.

RebelMouse is able to respect the following WordPress configurations:

  • It keeps your private posts private by storing them as RebelMouse drafts.
  • It keeps your sticky posts sticky by storing them as RebelMouse frozen posts.
  • WordPress categories are kept as RebelMouse sections.
  • If you have featured images or videos, they are kept as featured as well.

FAQ

What are your export requirements?

We need a file in any of the three formats described in the tutorial above, including the full text of each article. The article text should include the semantic HTML you want to be used on your RebelMouse website. You do not need to remove any CSS formatting, but it will be automatically removed by our import tool.

We will keep JavaScript code that might be included in your posts, mainly to support as much of your previously embedded media as possible. However, popular embed providers might be subject to some automatic processing by our import tool in order to turn them into RebelMouse shortcodes. You can find more details in the "How do you work with embedded media?" section below.

Your posts might contain images (using the <img> tag) and they will be kept. However, they'll also be processed by our import tool in order to turn them into RebelMouse shortcodes. You can find more details in the "How do you work with images?" section below.

Will my semantic HTML be kept?

Yes, we will keep your semantic HTML*, but we won't keep your CSS formatting. This means that we remove all CSS classes so that they don't interfere with the new RebelMouse theme that will be used with your new site.

*In certain cases, some further processing of your semantic HTML may be necessary. If so, it will be modified during the import process to better fit the RebelMouse platform.
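
If you want to preview how an article reads without its CSS hooks, the sketch below strips class attributes from a fragment of HTML. It is only an approximation for your own testing, not the import tool's actual logic. The names CssStripper and strip_css are hypothetical, dropping inline style attributes as well is an assumption, and comments are discarded.

```python
from html import escape
from html.parser import HTMLParser


class CssStripper(HTMLParser):
    """Re-emit HTML while dropping class and style attributes."""

    DROPPED = {"class", "style"}  # assumption: inline styles are removed too

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.parts = []

    def _open_tag(self, tag, attrs, closing):
        kept = "".join(
            f" {name}" if value is None else f' {name}="{escape(value, quote=True)}"'
            for name, value in attrs
            if name not in self.DROPPED
        )
        return f"<{tag}{kept}{closing}"

    def handle_starttag(self, tag, attrs):
        self.parts.append(self._open_tag(tag, attrs, ">"))

    def handle_startendtag(self, tag, attrs):
        self.parts.append(self._open_tag(tag, attrs, " />"))

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")


def strip_css(html: str) -> str:
    parser = CssStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


cleaned = strip_css('<p class="lead" style="color:red">Hi &amp; <em class="x">there</em></p>')
print(cleaned)  # <p>Hi &amp; <em>there</em></p>
```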

How do you work with images?

The RebelMouse import tool keeps all images found inside your posts, but turns them into RebelMouse shortcodes. This means your images are downloaded from your server and then uploaded to ours. At the same time, we calculate several different sizes for each image. As a result, all images are ultimately stored and hosted by RebelMouse.

RebelMouse must have complete access to your website. This means that network requests from our servers must not be blocked by any measure, since the import process makes automatic network requests to your servers to fetch resources such as images or even complete posts.

How do you work with embedded media?

Since we try to keep all of your semantic HTML, we will usually keep all embedded media you may have in your posts*, including embeds based on JavaScript.

However, for certain cases listed below, your embedded media won't be kept as is, but instead will be automatically processed by our import tool to turn each instance into a RebelMouse shortcode. This mostly occurs with iframe-based embeds.

The following is a non-exhaustive list of embedded media that is processed into a RebelMouse shortcode:

  • YouTube
  • Vimeo
  • Dailymotion
  • SoundCloud
  • Vine
  • Twitch
  • Tout
  • Ustream
  • Livestream
  • TED Talks

*As long as your embeds are iframe-based, and not JavaScript-powered, there's a high chance that we can support your embed. However, it's important to note that, although we put great effort into supporting all kinds of embedded media, there are some restrictions which are important to understand:

  • If your site runs on HTTPS, then we will only support HTTPS embeds. That's because of browser restrictions on loading insecure media over secure websites. If, on the other hand, your site runs on HTTP, we will also support HTTP media.
  • We might support Flash-based embedded media. However, due to mobile browser restrictions (no Flash support), that media won't work on your mobile RebelMouse site either.
  • If you host your embedded media on the same server as your current website, then you will have to ask RebelMouse for a custom solution in order to keep that media on your new RebelMouse site.
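
To find embeds affected by the HTTPS restriction above before you export, you could scan each article for iframes whose source is plain HTTP. This is a minimal sketch for your own auditing. The names IframeCollector and insecure_iframes are hypothetical.

```python
from html.parser import HTMLParser


class IframeCollector(HTMLParser):
    """Collect the src attribute of every <iframe> in a piece of HTML."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def insecure_iframes(html: str) -> list:
    """Return iframe sources that would be blocked on an HTTPS site."""
    collector = IframeCollector()
    collector.feed(html)
    return [src for src in collector.sources if src.lower().startswith("http://")]


article = (
    '<p>Intro</p>'
    '<iframe src="http://path.to/video"></iframe>'
    '<iframe src="https://path.to/secure-video"></iframe>'
)
flagged = insecure_iframes(article)
print(flagged)  # ['http://path.to/video']
```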

What will happen with my SEO?

When you port your site over to RebelMouse, the URLs of your existing articles will change. However, our import tool keeps track of your previous URLs so it can catch them in the future and automatically redirect readers to the new, RebelMouse-powered URLs, thus keeping all SEO benefits intact.