Authentication


This tab allows you to enter login and password details so that the spider can index a secure site that requires authentication.

There are two common implementations of authentication on websites:

(1) HTTP authentication
(2) Cookie/session based authentication

You will need to identify which type of authentication your website uses.

HTTP authentication

HTTP authentication usually appears as a special login window when you access the page in your browser. It is a standardised method of authenticating over HTTP and is implemented by the web server itself.
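Behind the browser's login window, the exchange works roughly like this: the server answers an unauthenticated request with a 401 status and a WWW-Authenticate header, and the client repeats the request with an Authorization header carrying the credentials. The sketch below assumes the common Basic scheme (RFC 7617), where the header value is the base64 encoding of login:password. Other schemes such as Digest work differently, and this illustrates the protocol only, not Zoom's own code; the function name basic_auth_header is hypothetical.

```python
import base64


def basic_auth_header(login: str, password: str) -> str:
    """Build an HTTP Basic Authorization header value (assumes the Basic scheme)."""
    # Basic auth joins the login and password with a colon, then base64-encodes the result.
    credentials = f"{login}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


if __name__ == "__main__":
    # Example credentials from RFC 7617, not a real account.
    print(basic_auth_header("Aladdin", "open sesame"))
    # prints: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```

Because base64 is an encoding rather than encryption, Basic credentials can be read by anyone who can observe the traffic, so serving protected areas over HTTPS is advisable.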

Example 1. A typical website with HTTP authentication accessed via Internet Explorer

 
If your website uses HTTP authentication, you can simply enter your login information into Zoom (under the "Authentication" tab of the Configuration window), and the spider will automatically log in when required and index the protected parts of your website.


Note: Authentication information is saved in the ZCFG file when you select Save configuration from the File menu. The password is obfuscated, but not strongly encrypted. For sites with security-sensitive information, we recommend creating a special user account for indexing on the web server where possible, so that you can disable this account after indexing.

Cookie or session based authentication

Cookie-based authentication, however, usually appears as a form on a page and is implemented by server-side scripts (such as PHP, ASP or ColdFusion).

Example 2. A typical website with cookie-based (or session-based) authentication

 
Zoom provides a way to log in to such pages automatically. To do so, you will need to provide the following information and settings:

Use cookies from Windows and IE: This option enables cookie support in Zoom. You will need to check this option to access websites that use cookie-based authentication. Note that Zoom uses Windows' internal cookie cache (as part of WinInet), which means that it shares cookies with Internet Explorer.
Automatic login on following page (URL): Here, you should specify the URL of the page containing the login form. Using the example above (Example 2 screenshot), this would be "http://www.mysite.com/secure/login.php". On this page, the HTML for the form may look like the following:
 
<form action="?op=login" method="POST">
Login: <input name="username" size="15"><br>
Password: <input type="password" name="pass" size="8"><br>
<input type="hidden" name="secret" value="handshake">
<input type="submit" value="Login">
</form>
 
It is important to look at the HTML of the login form, because you will need the names of the login variable and the password variable in the next steps.
Login variable name: This is the name of the login input text box. That is, it is the part after "name=" for the input tag where you will enter your login. In the above HTML example, this would be "username".
Your login: This is the actual login you would normally type into the text box. For example, if your user name is "bob", you would enter "bob" here.
Password variable name: This is the name of the password input text box. It would be the part after "name=" for the input tag where you enter your password. In the above HTML example, this would be "pass".
Your password: This is the actual password you would be typing into the text box.
Additional parameters (if required): Some websites require more than just a user name and password to be submitted in order to log in. For example, if the form has multiple "submit" buttons (e.g. "Login", "Sign up", "Recover lost password"), the site may need the name of the button that was clicked. In such cases, you would need to specify the additional parameters required in this field. Parameters should be specified in the format of HTTP GET parameters, that is: parameterName=parameterValue, with the ampersand character (&) joining multiple parameters. For example,
submitButton=Login&hiddenParam=1
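The Additional parameters field uses the same encoding as a URL query string. Assuming Zoom decodes the field as a standard query string, a value that contains spaces or a literal & has to be percent-encoded so that it is not mistaken for a separator. As a sketch, Python's standard urllib.parse.urlencode produces a correctly escaped string from name/value pairs; the helper name additional_params and the parameter names are hypothetical.

```python
from urllib.parse import urlencode


def additional_params(pairs: list[tuple[str, str]]) -> str:
    """Encode (name, value) pairs in HTTP GET query-string format."""
    # urlencode escapes reserved characters (space -> "+", "&" -> "%26")
    # and joins the pairs with "&".
    return urlencode(pairs)


if __name__ == "__main__":
    print(additional_params([("submitButton", "Login"), ("hiddenParam", "1")]))
    # prints: submitButton=Login&hiddenParam=1
    print(additional_params([("action", "Log in & stay")]))
    # prints: action=Log+in+%26+stay
```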

Note that the automatic login process will submit these values to the URL given in the form's action= attribute. It will also pass along any hidden variables within that form, as they are often also required by the login process.
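To make the submission step concrete, the sketch below parses a login page, resolves the form's action URL against the page URL, collects the hidden inputs, and adds the login, password and additional parameters to a form-encoded POST body. It only mirrors the behaviour described in this section and is not Zoom's implementation; it assumes the login form is the first form on the page, and the names LoginFormParser and build_login_request are hypothetical. It builds the request but does not send it.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qsl, urlencode, urljoin


class LoginFormParser(HTMLParser):
    """Collects the action URL and hidden inputs of the first <form> on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.action = None   # value of the form's action attribute
        self.hidden = []     # (name, value) pairs of hidden inputs
        self._in_form = False
        self._done = False   # set once the first form has been closed

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        a = dict(attrs)
        if tag == "form":
            self._in_form = True
            self.action = a.get("action") or ""
        elif tag == "input" and self._in_form:
            # Only hidden fields are copied; visible fields are supplied by the user.
            if (a.get("type") or "").lower() == "hidden" and a.get("name"):
                self.hidden.append((a["name"], a.get("value") or ""))

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def build_login_request(page_url: str, html: str, login_var: str, login: str,
                        pass_var: str, password: str, extra: str = "") -> tuple[str, str]:
    """Return (target URL, form-encoded POST body) for the page's login form."""
    parser = LoginFormParser()
    parser.feed(html)
    parser.close()
    # A relative action such as "?op=login" is resolved against the login page URL.
    target = urljoin(page_url, parser.action or "")
    fields = list(parser.hidden)
    fields.append((login_var, login))
    fields.append((pass_var, password))
    # Additional parameters arrive in GET format ("a=1&b=2") and are merged in.
    fields.extend(parse_qsl(extra, keep_blank_values=True))
    return target, urlencode(fields)


if __name__ == "__main__":
    page = '<form action="?op=login" method="POST"><input name="username">' \
           '<input type="hidden" name="secret" value="handshake"></form>'
    print(build_login_request("http://www.mysite.com/secure/login.php", page,
                              "username", "bob", "pass", "s3cret"))
```

Resolving the action attribute with urljoin matters for forms like the example above, where action="?op=login" is relative to the login page, so the request goes to http://www.mysite.com/secure/login.php?op=login.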


Remember: If you are using one of the above methods to allow the spider to log in to your cookie or session-based authenticated site, you need to make sure that the spider does not follow a link to the "logout" page, which would log it out of your website. You can prevent this by specifying the logout page in the "Skip pages and folder list" (in the Configuration window, under the "Skip options" tab), e.g. "logout.asp" or "&logout=1".
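Assuming the skip list works by plain substring matching (a URL is skipped if it contains any entry), the check can be sketched as follows. The matching rule is an assumption for illustration, not a description of Zoom's internals, and the function name is_skipped is hypothetical.

```python
def is_skipped(url: str, skip_list: list[str]) -> bool:
    """Return True if any non-empty skip-list entry occurs anywhere in the URL."""
    # Assumed rule: simple case-sensitive substring match against the full URL.
    return any(entry and entry in url for entry in skip_list)


if __name__ == "__main__":
    skips = ["logout.asp", "&logout=1"]
    print(is_skipped("http://www.mysite.com/secure/logout.asp", skips))        # prints: True
    print(is_skipped("http://www.mysite.com/secure/index.php?p=2", skips))     # prints: False
```

Under this assumption, a specific entry such as "logout.asp" is safer than a bare "logout", which would also skip unrelated pages such as "logout-help.html".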

When automatic login will not work

Automatic login may not work on some sites or forums with anti-spider/anti-bot mechanisms that are designed to prevent exactly this type of automatic login (they are usually put in place to stop spam bots). In such cases, you will need to log in manually with Internet Explorer, making sure that "Use cookies from Windows and IE" is enabled in Zoom. With this feature enabled, you can log in to your website using IE and then, without logging out first, start indexing in spider mode. The spider will then be authenticated with the required cookie, because it shares Internet Explorer's cookie cache.