Spider

Aug 14, 2017

You can use the stand-alone directed spider independently of AppDNA to capture your web applications’ runtime HTML pages so that you can import them into AppDNA.

To start the stand-alone directed spider:

  • From the Windows Start menu, choose Citrix AppDNA > Web Application Capture.

Note: If this option is not available, check that you have installed the stand-alone web capture tools. See Installing the Stand-alone Web Capture Tools for more information.

Overview

You enter the URLs of the web applications that you want to capture at the top of the screen. Below the list of URLs, there are three tabs. The first tab shows a log of the spider’s activity. You use the other two tabs to enter settings and options, which are documented under separate headings below.

The options on the main part of the screen are as follows:

URL. Specify the web application’s URL here and then click Add URL to add it to the list. The URL must be valid and reachable from the computer on which you are running the stand-alone tool. You can add multiple URLs. This is useful, for example, if you want to run a series of web captures.
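If you prepare a long list of URLs in advance, a quick syntax check can catch obvious mistakes before you add them. The following is a minimal sketch only, not part of AppDNA: the function name `looks_like_capturable_url` is hypothetical, and it assumes that a usable entry has an `http` or `https` scheme and a host name. It does not test whether the URL is actually reachable from your computer.

```python
from urllib.parse import urlsplit


def looks_like_capturable_url(url):
    """Return True if url has an http(s) scheme and a host name.

    Hypothetical helper for pre-checking a URL list. This is a syntax
    check only and makes no network request.
    """
    try:
        parts = urlsplit(url.strip())
    except ValueError:
        # urlsplit raises ValueError for some malformed hosts.
        return False
    return parts.scheme in ("http", "https") and bool(parts.hostname)


if __name__ == "__main__":
    for candidate in ["http://myserver/myWebApp", "myserver/myWebApp"]:
        print(candidate, looks_like_capturable_url(candidate))
```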

Remove URL. Removes a URL from the list. Select the URL to be removed before you click this button.

Go automatic. When you are using the Manual capture option, you can use this button to change to automatic mode. The spider then follows links automatically and stops only when it encounters an input form or dialog box, depending on the settings chosen.

Manual capture. Select this check box if you want to use manual mode. In this mode you manually walk through the web application, following the links that are relevant. Use manual mode for web applications that make significant use of JavaScript and related technologies (such as AJAX) to modify pages after they are loaded or if there is a complex single sign-on (SSO) scenario. You can optionally switch to automatic mode after capturing the SSO and AJAX pages, for example. Clear this check box (the default) if you want the spider to run in automatic mode, stopping only when it encounters an input form or dialog box, depending on the configuration options chosen.

Import CSV. Import a CSV file that lists the URLs that need to be captured.

Export CSV. Export a CSV file listing the URLs that have been captured.

Start capture. Click to begin capturing the list of URLs from the top.

Cancel all. Click to stop the spider.

Skip site. Click to skip the current web site.

General Settings tab

The General Settings tab provides options that control the directed spider’s behavior.

Generate MSI. Select this check box if you want to generate an MSI for import into AppDNA. Typically you do this when you are capturing a web application by using the spider only. Clear this check box if you want to combine the captured pages with source files for more comprehensive analysis. You then need to combine the output of the spider with the web application’s source files and run the Stand-Alone Web Application Source to MSI Converter over the combined files.

Capture results output directory. Set where you want the output files to be stored. This is where you can find the generated MSI files and the captured webpages.

Site traversal depth. Specify the link depth that you want the spider to follow. For example, if you specify a depth of 1, the spider starts on the site’s index page and visits each link that page contains. If the depth is 2 or more, the spider also follows the links on those linked pages, and so on up to the specified depth. The default is 25.
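To illustrate how the depth value limits which pages are reached, here is a minimal sketch of a depth-limited traversal. It is not the spider's actual implementation: the function `pages_within_depth` and the in-memory `links` dictionary are hypothetical stand-ins for a real site, and the visiting order of the real spider may differ.

```python
from collections import deque


def pages_within_depth(links, start, max_depth):
    """Return the pages reachable from start by following at most
    max_depth links, in breadth-first order.

    links is a hypothetical mapping of page -> list of linked pages.
    """
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        if depth == max_depth:
            # Depth limit reached: do not follow links from this page.
            continue
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append((target, depth + 1))
    return order


if __name__ == "__main__":
    site = {"index": ["a", "b"], "a": ["c"], "c": ["d"]}
    print(pages_within_depth(site, "index", 1))  # index and its direct links
    print(pages_within_depth(site, "index", 2))  # also the links found on "a"
```

In this sketch, a depth of 1 yields the index page plus the pages it links to, and a depth of 2 additionally yields the pages linked from those.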

Form user interaction. Select this check box if you want the spider (when running in automatic mode) to stop on every page that has a form and prompt you to fill it in. This is particularly useful when the web application has pages that require the user to log in. When this option is selected and the spider detects a form on a webpage, it opens a dialog box and highlights the form input boxes in yellow. See Web Capture Processing for more information.

Browser timeout (sec). Specify the length of time in seconds that you want the spider to wait for a page to load before ignoring it and moving on to the next page (when running the spider in automatic mode). When you run the spider in manual mode, this setting is used for the first page only. The default is 15 seconds.

Delay timeout start by (sec). Specify an additional timeout period in seconds for use on older versions of Internet Explorer to cause a delay before the Browser timeout (entered above) starts. This is necessary because older versions of Internet Explorer, particularly when running on older versions of Windows, can take some time to move to the next URL. The default value is 1 second.

Delay between capturing (msec). Select this check box if you want the spider to wait for a specified period between the capture of each page. This is useful if your enterprise’s firewall would otherwise block the spider from running in automatic mode. This setting is not used when you run the spider in manual mode. Enter the wait period in milliseconds.

Spider Settings tab

The Spider Settings tab provides further options that control the directed spider.

URL inclusions. By default, the AppDNA spider does not follow links to external domains. However, you can create a list of external domains to which you want the spider to follow links.

Domain. Specify the external domain here and click Add to add it to the list of allowed external domains. If the web application redirects to a different domain, enter that domain here. Similarly, if the web application uses an external authentication server that is in a different domain, enter that domain here.

Include sub-domains. Select this check box if you want the spider to follow links to sub-domains of the web application’s main domain (for example, http://staging.dev.myserver/myWebApp). Make sure you select this check box if the web application redirects to a sub-domain of the main domain. Clear this check box if you want the spider to ignore links to sub-domains.

Restrict web app to its virtual directory. Select this check box if you want the spider to ignore any links outside of the web application’s virtual directory (for example, http://myserver/myWebApp). This is useful when there are multiple web applications on the same server and each one is accessed by a different part of the URL. Clear this check box if you want the spider to follow links outside of the virtual directory.
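The way the URL inclusions, Include sub-domains, and Restrict web app to its virtual directory settings interact can be sketched as a single link filter. This is an illustration under stated assumptions, not the spider's actual logic: the function `should_follow` and its parameters are hypothetical, links are assumed to be absolute URLs, and the virtual directory restriction is assumed to apply only to links on the web application's own host.

```python
from urllib.parse import urlsplit


def should_follow(link, app_url, allowed_domains=(),
                  include_subdomains=False, restrict_to_virtual_dir=False):
    """Decide whether a link would be followed (hypothetical model).

    allowed_domains models the list of allowed external domains and is
    assumed to contain lower-case host names.
    """
    app = urlsplit(app_url)
    target = urlsplit(link)
    app_host = (app.hostname or "").lower()
    host = (target.hostname or "").lower()

    if host == app_host:
        if not restrict_to_virtual_dir:
            return True
        # Same host: the link must stay under the application's path.
        vdir = app.path.rstrip("/")
        path = target.path or "/"
        return path == vdir or path.startswith(vdir + "/")
    if host.endswith("." + app_host):
        # Sub-domain of the application's main domain.
        return include_subdomains
    # Any other host is external and must be explicitly allowed.
    return host in allowed_domains


if __name__ == "__main__":
    app = "http://myserver/myWebApp"
    print(should_follow("http://myserver/otherApp/page", app,
                        restrict_to_virtual_dir=True))
    print(should_follow("http://staging.myserver/myWebApp", app,
                        include_subdomains=True))
```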

Automatically close dialog boxes and popups. Select this check box if you want the spider to automatically close dialog boxes that it encounters when running in automatic mode. This is useful, for example, if you want to leave the import running unattended. However, note that the spider is unable to close JavaScript-initiated pop-ups. Clear this check box if you want the spider to wait for you to close dialog boxes manually.

Allow Proxy Authentication Prompt. Select this check box if your LAN is configured to use a proxy server and you have selected the Automatically close dialog boxes and popups check box. This means that the spider waits for you to fill in your login information in the authentication dialog box. Clear this check box if your LAN is not configured to use a proxy server.

Duplicates. This setting affects the spider when running in manual mode only. Select this check box if you want the spider to capture the same page more than once if the page changes. This is useful when capturing web applications that make use of JavaScript and related technologies (such as AJAX) to modify pages after they are loaded. After you select this check box, configure the option with the following:

  • Maximum number of duplicates for URL. Enter the maximum number of times you want the spider to capture a page.
  • Page content difference value to capture. Enter the percentage by which the page must change in order for it to be captured again.
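The two Duplicates options can be pictured as a simple decision made each time a page changes. The sketch below is illustrative only: how AppDNA actually measures the page content difference is not described here, so the use of Python's `difflib.SequenceMatcher` as the difference measure is an assumption, as are the function names and the choice to treat a difference equal to the threshold as sufficient.

```python
from difflib import SequenceMatcher


def difference_percent(previous_html, current_html):
    """Return how much two page snapshots differ, from 0.0 to 100.0.

    Assumption: difference is 100 minus a character-level similarity
    ratio. The real tool may measure change differently.
    """
    similarity = SequenceMatcher(None, previous_html, current_html).ratio()
    return (1.0 - similarity) * 100.0


def should_capture_again(previous_html, current_html, captures_so_far,
                         max_duplicates, min_difference_percent):
    """Model the two Duplicates options as one decision (hypothetical)."""
    if captures_so_far >= max_duplicates:
        # Maximum number of duplicates for this URL already reached.
        return False
    changed_by = difference_percent(previous_html, current_html)
    return changed_by >= min_difference_percent


if __name__ == "__main__":
    before = "<ul><li>Item 1</li></ul>"
    after = "<ul><li>Item 1</li><li>Item 2</li></ul>"
    print(round(difference_percent(before, after), 1))
```

A lower difference value makes the spider more sensitive to small page updates, while the maximum number of duplicates caps how many captures a frequently changing page can produce.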