Search on Pages Sites
It’s easy to add search functionality to your Pages site.
We recommend using Search.gov, a free site search and search analytics service for federal websites. You will need to register with Search.gov and follow their instructions to integrate the service with your Pages site. For full details, visit Search.gov.
If you’d prefer another solution, you can configure a tool like lunr.js, which builds a search index that runs entirely in the client browser. An example of this approach is the 18F blog. This avoids any dependency on another service, but the search results are not as robust. A rough sketch of the client-side approach follows.
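This sketch assumes your build step emits a `/search-index.json` file of `{ url, title, body }` records and that lunr.js is loaded via a `<script>` tag; the file name and record shape are illustrative, not part of Pages:

```js
// Build a lunr index in the browser from a prebuilt JSON file.
async function buildIndex() {
  const docs = await (await fetch("/search-index.json")).json();
  return lunr(function () {
    this.ref("url");     // value returned for each search hit
    this.field("title"); // fields to index
    this.field("body");
    docs.forEach((doc) => this.add(doc));
  });
}

buildIndex().then((idx) => {
  // Results look like [{ ref: "/some-page/", score: 1.23, ... }, ...]
  console.log(idx.search("accessibility"));
});
```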
Crawl/Index Pages sites
Pages automatically handles search engine visibility for preview URLs via the Pages proxy. For traffic served through a preview site, the Pages proxy automatically serves the appropriate HTTP robots header, `robots: none`; preview URLs are not crawlable or indexable by design. Only webpages on the production domain are served with the `robots: all` directive, indicating to crawlers and bots such as Search.gov that they should index the site and enable search capabilities.
| Priority | Method to manage robot behavior | How to prevent indexing/crawling | How to allow indexing/crawling |
|---|---|---|---|
| 1 | `robots.txt` in your Pages site | `User-agent: *` <br> `Disallow: /directory` | N/A, crawling is allowed by default |
| 2 | `X-Robots-Tag` HTTP header (served by Pages via the Pages proxy) | `robots: none` (this is automatically served to visitors of all Pages preview builds) | `robots: all` (this is automatically served to visitors of custom/production domains) |
| 3 | `<meta name="robots">` in your Pages site webpage HTML | `content="noindex, nofollow"` | N/A, indexing is allowed by default |
If you want to disable crawling and indexing for specific pages of your production site, you can include the `noindex, nofollow` meta tag in the head of those pages, or list those folders in your `robots.txt`, if your site generates one.
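For example, a `robots.txt` served from the root of your production site could exclude specific folders from crawling; the folder paths below are illustrative:

```
User-agent: *
# Keep crawlers out of these folders; everything else remains crawlable.
Disallow: /drafts/
Disallow: /internal/
```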
Conditionally set robots - Eleventy (11ty)
Take advantage of Pages-provided environment variables to enable environment-specific functionality. Hardcode the condition and meta tags to check the branch from the `process.env` environment variable. This differs from how it is handled on a Jekyll site: with Eleventy you can add specificity by checking `process.env.BRANCH`.
You can use this code sample:

```html
<meta name="robots" content="noindex, nofollow">
```
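As a minimal sketch of the Eleventy approach, a global data file can compute the robots directive from `process.env.BRANCH`, and a layout can emit the matching meta tag. The file name `_data/robots.js` and the production branch name `main` are assumptions for illustration, not Pages requirements:

```js
// _data/robots.js — Eleventy exposes this value to all templates as `robots`.
// Pages sets process.env.BRANCH during each build.
module.exports = () => {
  const isProduction = process.env.BRANCH === "main"; // assumed production branch
  // Allow indexing only on production builds; block it everywhere else.
  return isProduction ? "all" : "noindex, nofollow";
};
```

Then in your layout’s `<head>` (Nunjucks or Liquid):

```html
<meta name="robots" content="{{ robots }}">
```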
See additional documentation on build environment variables.