How to Add Web Pages as Knowledge Sources in BoldDesk AI 2.0
Adding your website to BoldDesk AI 2.0 allows the platform to use your content as a reliable knowledge source. Once indexed, this content helps generate accurate, context-aware responses in tickets and customer interactions.
This guide walks you through prerequisites, supported methods, and step-by-step instructions for adding web pages.
Prerequisites
Before adding web pages, ensure the following conditions are met:
- The website must be publicly accessible (no login or authentication required).
- Only HTTP or HTTPS URLs are supported.
- The website must allow web crawling (allowlist the BoldDesk crawler if needed).
- Agents must have AI Agent Builder Full Access. For details, see How to Configure AI Agent Builder Access Permission.
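
Two of these prerequisites, the URL scheme and crawler permission, can be checked before you submit a URL. The following is a minimal sketch and not part of BoldDesk. The helper names and the `ExampleCrawler` user agent are placeholders, because this guide does not state the user agent string the BoldDesk crawler sends. The sketch does not test whether a page sits behind a login.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


def has_supported_scheme(url: str) -> bool:
    """Return True if the URL uses HTTP or HTTPS and names a host."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that blocks /private/ for every crawler.
SAMPLE_ROBOTS = """User-agent: *
Disallow: /private/
"""

# "ExampleCrawler" is a placeholder, not the real BoldDesk user agent.
docs_ok = allowed_by_robots(SAMPLE_ROBOTS, "ExampleCrawler", "https://example.com/docs")
private_ok = allowed_by_robots(SAMPLE_ROBOTS, "ExampleCrawler", "https://example.com/private/page")
```

To run the same check against a live site, pass the contents of the site's own `/robots.txt` file as `robots_txt`.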
How to Add Web Pages
Follow these steps to add web pages to BoldDesk AI 2.0:
- Log in to the BoldDesk Agent Portal.
- Navigate to AI > Knowledge Sources to open the Libraries page in BoldDesk.
- Do one of the following:
  - To use an existing library, select the required library from the list, click the More options icon, and then click Edit.
  - To create a new library, click Create Library, enter the Library Name and optional Description, and then click Create.
- In the selected library, click the Website tab.
- Click Add website.
- Choose an indexing method:
  - Site Map
  - Web Crawl
  - Individual Site
- Enter the required details and save.
Website Indexing Methods
BoldDesk AI 2.0 offers three ways to index your website content. The right method depends on your website structure, the volume of content, and how selective you want the indexing to be: use a sitemap for structured ingestion, a web crawl for automated discovery, or individual pages for precise targeting.
1. Site Map
Use this method when your website has a structured sitemap.
Key Fields:
| SITE MAP FIELDS | DESCRIPTION |
|---|---|
| URL | Enter the base URL (e.g., https://example.com) |
| Include Only These Paths (optional) | Restrict crawling to specific sections, such as /docs or /support. |
| Exclude Paths (optional) | Prevent crawling of areas like /blog or /legal. |
| Max URLs | Maximum number of URLs to crawl (1-100,000) |
| Number of Chunks | Maximum number of tokens in each split segment of text. Large chunks keep more context together but are harder to process. |
| Schedule Resync | When enabled, the website resyncs automatically according to the schedule fields below. |
| Frequency Type | Select how often the web source should automatically resync: daily, monthly, or yearly. |
| Frequency interval | Specify how often the resync should repeat. Example: 2 with daily means every 2 days. |
| Time zone | Select the time zone for scheduled resync time and date values. |
| Fire time | Choose the time of day when the resync should occur. |
| Start On | Choose the local date and time when the resync schedule should start. |
| End On | Choose the local date and time when the resync schedule should end. |
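
The schedule fields combine as follows: Frequency Type sets the unit, Frequency interval multiplies it, and Start On and End On bound the series. The sketch below illustrates that arithmetic only. It is not BoldDesk's scheduler; the function names are hypothetical, it uses naive datetimes (no Time zone handling), and clamping to the last day of a shorter month is an assumption, since this guide does not say how such dates are handled.

```python
import calendar
from datetime import datetime, timedelta


def add_months(moment: datetime, months: int) -> datetime:
    """Shift a datetime by whole months, clamping the day to the month's end."""
    index = moment.year * 12 + (moment.month - 1) + months
    year, month = divmod(index, 12)
    month += 1
    last_day = calendar.monthrange(year, month)[1]
    return moment.replace(year=year, month=month, day=min(moment.day, last_day))


def resync_times(start, end, frequency_type, interval, limit=5):
    """List up to `limit` resync times from start through end (inclusive)."""
    if interval < 1:
        raise ValueError("interval must be at least 1")
    months_per_step = {"monthly": 1, "yearly": 12}
    runs = []
    step = 0
    while len(runs) < limit:
        if frequency_type == "daily":
            moment = start + timedelta(days=step * interval)
        elif frequency_type in months_per_step:
            # Always offset from the start date so clamped days do not drift.
            moment = add_months(start, step * interval * months_per_step[frequency_type])
        else:
            raise ValueError(f"unsupported frequency type: {frequency_type}")
        if moment > end:
            break
        runs.append(moment)
        step += 1
    return runs


# Frequency Type "daily" with Frequency interval 2: one resync every 2 days.
every_two_days = resync_times(
    datetime(2025, 1, 1, 9, 0), datetime(2025, 1, 6, 9, 0), "daily", 2
)
```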
2. Web Crawl
Use this method to automatically discover pages starting from a root URL.
Key Fields:
| WEB CRAWL FIELDS | DESCRIPTION |
|---|---|
| URL | Enter the base URL (e.g., https://example.com) |
| Include Only These Paths (optional) | Restrict crawling to specific sections, such as /docs or /support. |
| Exclude Paths (optional) | Prevent crawling of areas like /blog or /legal. |
| Crawl depth | How deep the crawler should go through website links (1-10) |
| Max URLs | Maximum number of URLs to crawl (1-100,000) |
| Number of Chunks | Maximum number of tokens in each split segment of text. Large chunks keep more context together but are harder to process. |
| Schedule Resync | When enabled, the website resyncs automatically on a schedule. |
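
To see how Crawl depth, Max URLs, and the path filters interact, consider the following sketch. It walks a small in-memory link graph breadth-first instead of fetching real pages. It is an illustration under stated assumptions, not BoldDesk's crawler: the function names are hypothetical, the root URL counts as depth 0, path filters are treated as simple prefixes, and a page that fails the filters is still followed for links but not indexed.

```python
from collections import deque
from urllib.parse import urlparse


def path_allowed(url, include_paths, exclude_paths):
    """Apply the Include Only These Paths and Exclude Paths filters to a URL."""
    path = urlparse(url).path or "/"
    if any(path.startswith(prefix) for prefix in exclude_paths):
        return False
    return not include_paths or any(path.startswith(p) for p in include_paths)


def crawl(root, links, crawl_depth, max_urls, include_paths=(), exclude_paths=()):
    """Breadth-first walk over `links`, a dict mapping each URL to the URLs it links to."""
    root_host = urlparse(root).netloc
    seen = {root}
    queue = deque([(root, 0)])
    indexed = []
    while queue and len(indexed) < max_urls:
        url, depth = queue.popleft()
        if path_allowed(url, include_paths, exclude_paths):
            indexed.append(url)
        if depth >= crawl_depth:
            continue  # depth limit reached: do not follow this page's links
        for target in links.get(url, []):
            # Only internal links (same host as the root URL) are queued.
            if target not in seen and urlparse(target).netloc == root_host:
                seen.add(target)
                queue.append((target, depth + 1))
    return indexed


# Hypothetical site structure used only for illustration.
SITE = {
    "https://example.com/": [
        "https://example.com/docs",
        "https://example.com/blog",
        "https://other.com/x",
    ],
    "https://example.com/docs": ["https://example.com/docs/a"],
    "https://example.com/docs/a": ["https://example.com/docs/a/b"],
}

docs_only = crawl("https://example.com/", SITE, crawl_depth=2, max_urls=100,
                  include_paths=("/docs",), exclude_paths=("/blog",))
```

A low crawl depth keeps the crawl close to the root URL, while Max URLs caps the total regardless of depth.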
3. Individual Site
Use this method when you want to add only specific pages.
Key Fields:
| INDIVIDUAL SITE FIELDS | DESCRIPTION |
|---|---|
| URL | Enter the URL of the specific page to add (e.g., https://example.com/docs/getting-started) |
| Number of Chunks | Maximum number of tokens in each split segment of text. Large chunks keep more context together but are harder to process. |
| Schedule Resync | When enabled, the website resyncs automatically on a schedule. |
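
The Number of Chunks field, described in the tables above as the maximum number of tokens per segment, controls how page text is split. The sketch below shows the general idea of token-limited splitting. It is an approximation, not BoldDesk's implementation: the function name is hypothetical, and whitespace-separated words stand in for tokens, whereas real tokenizers split text differently.

```python
def split_into_chunks(text: str, max_tokens: int) -> list[str]:
    """Split text into segments of at most max_tokens whitespace-separated tokens."""
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    tokens = text.split()
    return [
        " ".join(tokens[i:i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]


sample = "Reset your password from the account settings page"
small_chunks = split_into_chunks(sample, 3)   # more segments, less context in each
large_chunks = split_into_chunks(sample, 10)  # one segment holding the full sentence
```

With a larger limit, related sentences stay in the same segment, which is the trade-off the field description refers to.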
URL Scope Rules
Understanding URL scope helps control what content is indexed:
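- A root domain URL (for example, https://example.com) includes all pages within that domain.
- A URL with a specific path (for example, https://example.com/docs) includes only the pages nested under that path.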
- Only internal links are crawled.
- External links are ignored.
- Sitemap URLs are supported when using the sitemap method.
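
The scope rules above can be expressed as a small check. This is a sketch of the rules as written in this guide, not BoldDesk's actual matching logic. The function name is hypothetical, and two details are assumptions: a subdomain is treated as a different host, and a path scope matches whole path segments, so `/docs` does not include `/docs-old`.

```python
from urllib.parse import urlparse


def in_scope(scope_url: str, candidate_url: str) -> bool:
    """Return True if candidate_url falls inside the scope defined by scope_url."""
    scope = urlparse(scope_url)
    candidate = urlparse(candidate_url)
    if candidate.netloc.lower() != scope.netloc.lower():
        return False  # external link: ignored
    scope_path = scope.path.rstrip("/")
    if not scope_path:
        return True  # root domain: every page on the host is in scope
    candidate_path = candidate.path.rstrip("/")
    # A specific path covers itself and anything nested under it.
    return candidate_path == scope_path or candidate_path.startswith(scope_path + "/")
```

For example, with the scope `https://example.com/docs`, the page `https://example.com/docs/setup` is in scope, while `https://example.com/blog` and `https://other.com/docs` are not.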
Managing Web Pages in AI Knowledge Sources
Once web pages are added, you can manage them from a single location using Preview, Resync, and Delete options.
- Navigate to AI Module > Knowledge Sources.
- Open the relevant library.
- Go to the Website tab.
- Locate the added website or indexed URLs.
From here, you can perform the following actions:
- Preview
  Click Preview to review the indexed URLs and their content. This helps ensure the correct pages and information are captured before being used by BoldDesk AI.
- Resync
  Click Resync to manually refresh the indexed data. Use this when your website content has been updated and you want the latest information reflected in AI responses.
- Delete
  Click Delete to remove unwanted URLs or website entries. Once removed, BoldDesk AI 2.0 will no longer reference this content.
Frequently Asked Questions
- Why is my website not being indexed?
  Your site may require login access, block crawlers, or have restricted permissions. Ensure it is publicly accessible and crawlable.
- Which indexing method should I use?
  - Use Site Map for structured websites.
  - Use Web Crawl for automatic discovery.
  - Use Individual Site for specific pages.
- What does “Number of Chunks” mean?
  It determines how content is split into smaller segments for processing. Larger chunks retain more context but may be harder to process.
- Can BoldDesk crawl external links?
  No. Only links within the same domain are processed.
- How often should I schedule resync?
  It depends on how frequently your content changes. Daily sync is recommended for dynamic websites.
- Can I limit which pages are indexed?
  Yes. Use the Include Only These Paths and Exclude Paths fields.