URL Crawling & Caching
Twitter’s crawler will respect robots.txt when scanning URLs. If a page with card markup is blocked, no card will be shown. If an image URL is blocked, no thumbnail or photo will be shown.
Twitter uses the User-Agent of Twitterbot (with version, such as Twitterbot/1.0), which can be used to create an exception in your robots.txt file.
For example, here is a robots.txt file which disallows crawling for all robots except Twitter’s fetcher:
User-agent: Twitterbot
Disallow:

User-agent: *
Disallow: /
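To see how these rules behave in practice, here is a minimal sketch (not Twitter's actual implementation) that evaluates the robots.txt above with Python's standard urllib.robotparser; the rules are inlined for illustration, and the crawler's real parser may differ in edge cases:

```python
# Sketch: evaluate the example robots.txt with the standard library.
# The rules allow Twitterbot everywhere and block all other robots.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: Twitterbot
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

print(parser.can_fetch("Twitterbot/1.0", "/index.html"))  # True
print(parser.can_fetch("Googlebot/2.1", "/index.html"))   # False
```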
Here is another example, which specifies the directories Twitterbot is allowed to crawl (in this case, disallowing everything except the images and archives directories):
User-agent: Twitterbot
Disallow: *
Allow: /images
Allow: /archives
The robots.txt file must be saved as plain text with ASCII character encoding. To verify this, you can run the following command:
$ file -I robots.txt
robots.txt: text/plain; charset=us-ascii
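If the file command is not available, the same check can be approximated in a few lines of Python (a sketch; is_ascii_file is a hypothetical helper, not part of any Twitter tooling):

```python
# Sketch: confirm a file is pure ASCII by reading its raw bytes and
# checking that every byte falls in the ASCII range (0-127).
# `is_ascii_file` is a hypothetical helper for illustration only.
from pathlib import Path

def is_ascii_file(path):
    """Return True if every byte in the file is valid ASCII."""
    return all(byte < 128 for byte in Path(path).read_bytes())
```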
Your content is cached by Twitter for 7 days after a link to your page with card markup has been published in a tweet.
The example below uses a mix of Twitter and Open Graph tags to define a summary card with large image:
<meta name="twitter:card" content="summary_large_image" />
<meta name="twitter:site" content="@nytimesbits" />
<meta name="twitter:creator" content="@nickbilton" />
<meta property="og:url" content="website url" />
<meta property="og:title" content="A Twitter for My Sister" />
<meta property="og:description" content="In the early days, Twitter grew so quickly that it was almost impossible to add new features because engineers spent their time trying to keep the rocket ship from stalling." />
<meta property="og:image" content="image location url" />
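To illustrate how such markup is read, here is a minimal sketch (an assumption about crawler behavior, not Twitter's actual code) that collects twitter: and og: properties using the standard library's html.parser. Note that Twitter-specific tags use the name attribute while Open Graph tags use property:

```python
# Sketch: collect card-related <meta> tags from HTML markup.
# `CardTagParser` is a hypothetical class for illustration only.
from html.parser import HTMLParser

class CardTagParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        # Twitter tags use `name`; Open Graph tags use `property`.
        key = attrs.get("name") or attrs.get("property")
        if key and (key.startswith("twitter:") or key.startswith("og:")):
            self.tags[key] = attrs.get("content")

parser = CardTagParser()
parser.feed('<meta name="twitter:card" content="summary_large_image" />'
            '<meta property="og:title" content="A Twitter for My Sister" />')
print(parser.tags["twitter:card"])  # summary_large_image
```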