
WPSS Validation Tool User Guide

Don edited this page Jan 30, 2014 · 15 revisions

Version 4.0, January 8, 2014

Contents

Introduction

Using the WPSS Validation Tool

Command Line Interface

Configuration Files

Troubleshooting


Introduction

The PWGSC WPSS Validation Tool provides web developers and quality assurance testers the ability to perform a number of web site and web page validation tasks at one time. The WPSS Validation Tool crawls a site to find all of the documents, then analyses each one with a number of validation tools. The analysis includes:

  • HTML validation,
  • CSS validation,
  • robots.txt validation,
  • link violation checking,
  • metadata checking,
  • Technical Quality Assurance (TQA) checking,
  • HTML document feature reporting, for example, forms,
  • Open Data checking using a separate application tool.

Details of these analysis tools are available in the document, Web-WPSS Standalone Validation Tool Testcases.

Tool Limitations

There are limitations within the tool that may affect the validation results.

  • The tools do not support JavaScript. The Validation Tool may not accurately crawl or analyse sites that rely on JavaScript.
  • The tools do not use a standard Web browser User Agent name. The Validation Tool may not accurately crawl or analyse sites that rely on the User Agent name.
  • The tools’ default behaviour is to respect robots directives. If a site has a robots.txt to prohibit crawlers, the WPSS Validation Tool will not validate the site’s documents. You can configure the WPSS Validation Tool to ignore robots.txt directives. For more information, see the section, “Configuration Tab”.
  • Some of the output of the individual validation tools may be in English only. The tool uses third-party software components, and their messages are available only in the language in which they were authored.

Tool Risks

The validation tool includes a crawler that follows links and retrieves web documents from sites. Care should be taken with this tool to:

  • Ensure entry page URLs for the sites are accurate so the crawler does not go beyond the site being analysed.
  • Ensure that document retrieval from the site being analysed, and any links to other sites, do not impose excessive loads on web servers and the network.


What’s New

Version 4.0 includes a number of enhancements. These include:

  • An Open Data tool is now included to check dataset, resource and dictionary files for compliance. For information on using the Open Data Tool, see the Open Data Tool User Guide.
  • Case insensitive checks on titles to find matching HTML and PDF versions of a document.
  • Allows binary data in CSV files, for example, a new-line character in cell content.
  • Report interoperability failure for web feeds that do not parse properly. (SWI_B)
  • Check for table headers that reference undefined headers or headers outside the current table (WCAG_2.0-H43)
  • Check for all language markers to determine if a page is archived or not. It also handles the case where the wrong language message is used.
  • Encode text that is written to results tabs, eliminating garbled French characters.
  • Check that the Python version is earlier than 3.0. The feed validator does not work with Python 3.0 or later.
  • Report unknown mime-type documents as non-HTML primary format.
  • Accept the enter key in the URL List tab to move to the next input line.
  • Added additional French archived web page notice.
  • Check for very long (over 500 characters) title and heading text (WCAG_2.0-H25, WCAG_2.0-H42).
  • Do not report zoom failures for fixed size fonts (WCAG_2.0-G142).

Installation

The WPSS Validation Tool requires a Perl distribution installed on the workstation, with .pl files associated with the Perl interpreter. The WPSS Validation Tool has been tested with Strawberry Perl 5.18.1 and ActivePerl version 5.14. Other versions of Strawberry Perl or ActivePerl and other Perl installations may not work as expected or may be missing required modules.

System Requirements

To use the WPSS Validation Tool, you need:

  • Windows XP or Windows 7,
  • Java runtime environment 1.6.0 (other versions may not work),
  • Python version 2.7.3 or 2.7.6,
  • Strawberry Perl 5.18.1 (32 bit) or ActiveState Perl 5.14 (does not support 5.16),
  • Only one installation of Perl on the system. Multiple installations may cause problems.

If you do not have Perl or Python installed, you will need to install them manually.

For the Perl and Python installs, accept the default settings during the installation process.


Remove Existing .pl File Associations

Before installing the WPSS Tool, remove any existing .pl file type associations. The installation of Strawberry Perl or ActivePerl will create a new association to ensure the proper execution of Perl applications.

To remove the .pl file association:

  1. Go to Start > Settings > Control Panel.
  2. Click Folder Options.
  3. Click the File Types tab.
  4. In the Registered file types list, locate and click the .pl entry.
  5. Click Delete.
  6. Click OK.

Installing the WPSS Tool

To install the WPSS Validation Tool, double-click the WPSS_Install.exe file and follow the instructions on the screen.


The default installation folder for the WPSS Validation Tool is C:\Program Files\WPSS_Tool.

Uninstall

To remove the WPSS Validation Tool from a workstation, run the uninstall script.

To uninstall the WPSS Validation Tool:

Go to Start > Programs > WPSS_Tool > Uninstall.


Uninstalling Perl

To remove the Perl installation:

  1. Go to Start > Settings > Control Panel.
  2. Click Add or Remove Programs.
  3. Locate the Strawberry or ActivePerl installation and click Remove.


Uninstalling Python

To remove the Python installation:

  1. Go to Start > Settings > Control Panel.
  2. Click Add or Remove Programs.
  3. Locate the Python installation and click Remove.

Using the WPSS Validation Tool

To start the PWGSC WPSS Validation Tool:

  1. Go to Start > Programs > WPSS_Tool.
  2. Click the WPSS_Tool icon.

Alternatively, using Windows Explorer, navigate to the C:\Program Files\WPSS_Tool folder and double-click the wpss_tool.pl file.

The main window consists of five tabs:

  • Site Details is for entering URL information for crawling a site.
  • Login/Logout is for entering site or application login/logout information.
  • Direct HTML Input is for pasting in HTML code.
  • URL List is for entering a list of URLs.
  • Configuration is for configuring the WPSS Validation Tool options.

Site Details Tab

To crawl and analyse the documents on a site, enter the site’s directory URLs and entry page names on the Site Details tab. You need to enter the following details to properly define a site for crawling.


Entering URLs

Enter the English and French URLs as well as the entry page names for the site.

English Directory – The English URL to the directory containing all of the site’s documents (HTML documents, image files, CSS files, etc.).

English Entry Page – The directory and file name of the English entry page. This is not the full URL; only the directory, file name and arguments portion. To find the entry page name, go to the site’s splash page and note the page referenced by the English language button.

French Directory – The French URL to the directory containing all of the site’s documents (HTML documents, image files, CSS files, etc.). If the French Directory is the same as the English Directory, leave the field empty.

French Entry Page – The directory and file name of the French entry page. This is not the full URL; only the directory, file name and arguments portion. To find the entry page name, go to the site’s splash page and note the page referenced by the French language button.

Crawl Limit – The maximum number of URLs the WPSS Validation Tool will retrieve and analyse from the site. The URLs include all file types: HTML, PDF, images, CSS files, etc. To crawl the entire site, enter 0 (zero).
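The way these fields work together can be pictured with a minimal Python sketch of a breadth-first crawl. It is an illustration, not the tool's own code (the tool is written in Perl). It assumes the entry page is resolved relative to the directory URL and that only URLs beginning with the directory belong to the site; the `crawl` function and the in-memory `links` dictionary, which stands in for real HTTP retrieval, are hypothetical.

```python
from collections import deque
from urllib.parse import urljoin


def crawl(directory: str, entry_page: str, links: dict, crawl_limit: int = 0) -> list:
    """Breadth-first crawl over `links` (url -> list of hrefs), a stand-in for HTTP fetches.

    Returns (url, referrer) pairs in the order the URLs were retrieved.
    """
    if not directory.endswith("/"):
        directory += "/"  # so urljoin appends the entry page instead of replacing the last segment
    start = urljoin(directory, entry_page)
    queue = deque([(start, None)])  # (url, referrer); the entry page has no referrer
    seen = {start}
    crawled = []
    while queue:
        if crawl_limit and len(crawled) >= crawl_limit:
            break  # Crawl Limit reached; a limit of 0 means "no limit"
        url, referrer = queue.popleft()
        crawled.append((url, referrer))
        for href in links.get(url, []):
            target = urljoin(url, href)
            # Assumption: a simple prefix test keeps the crawler inside the site directory
            if target.startswith(directory) and target not in seen:
                seen.add(target)
                queue.append((target, url))
    return crawled
```

Each (URL, referrer) pair mirrors what the Crawled URLs tab reports, and a Crawl Limit of 0 lets the loop run until no new in-site links remain.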


Saving Site Details

You can save the site details in a configuration file for sharing or easy access if you want to use the WPSS Validation Tool on the site again. Note that the Tool only saves the information from the Site Details tab.

To save the site configuration:

  1. Go to File > Save Site Config.
  2. Select a folder and file name for the configuration file.
  3. Select OK.

To best manage site configuration files, it is suggested that you save the configuration files in the C:\Program Files\WPSS_Tool\profiles folder.

Loading Site Details

You can load a previously saved site configuration. Loading a saved configuration file loads the Site Details tab fields. Once loaded, you can modify the information if required.

To open a saved site configuration:

  1. Go to File > Load Site Config.
  2. Locate the folder and file.
  3. Select OK.

Login/Logout Tab

The WPSS Validation Tool can crawl application sites that have a simple login page consisting of a single form for login credentials. If the site has a login, then you must include additional data items. If the site does not have a login, these fields can remain blank.


Login/Logout URLs

English Login Page – The directory and file name of the English login page. This is not the full URL; only the directory, file name and arguments portion.

English Logout Page – The directory and file name of the English logout page. This is not the full URL; only the directory, file name and arguments portion.

French Login Page – The directory and file name of the French login page. This is not the full URL; only the directory, file name and arguments portion.

French Logout Page – The directory and file name of the French logout page. This is not the full URL; only the directory, file name and arguments portion.

Login Form Name – If the login page contains more than one form, for example a search form along with the login form, you must specify the name of the login form. The name is the value of the form’s name or id attribute. If there is only one form on the login page, you can leave the field blank.
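A page with several forms is why Login Form Name exists. The following Python sketch, using only the standard `html.parser` module, shows one way to pick the login form by its name or id attribute and list its input fields. The class and function names are hypothetical, and the tool's own Perl logic may differ.

```python
from html.parser import HTMLParser


class FormCollector(HTMLParser):
    """Collects each <form>'s name/id and the names of the <input> fields inside it."""

    def __init__(self):
        super().__init__()
        self.forms = []      # one dict per form: {"name", "id", "inputs"}
        self._current = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "form":
            self._current = {"name": attributes.get("name"),
                             "id": attributes.get("id"),
                             "inputs": []}
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None and attributes.get("name"):
            self._current["inputs"].append(attributes["name"])

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def find_login_form(html: str, form_name: str = "") -> dict:
    """Return the form matching form_name (by name or id), or the only form on the page."""
    collector = FormCollector()
    collector.feed(html)
    if not form_name:
        # A blank Login Form Name only works when the page has exactly one form
        if len(collector.forms) != 1:
            raise ValueError("several forms found; a Login Form Name is required")
        return collector.forms[0]
    for form in collector.forms:
        if form_name in (form["name"], form["id"]):
            return form
    raise LookupError(f"no form named {form_name!r}")
```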

Crawling a Site and Analysing Documents

Once you have completed the configuration details for the site you want to crawl, click Check Site on the Site Details tab.

The WPSS Validation Tool begins to crawl the site and analyses each of its documents. The results appear in the Results Window.

Direct HTML Input Tab

You can analyse a single HTML page or a selection of HTML if you need to verify one particular page or chunk of HTML code. To analyse specific HTML, select the Direct HTML Input tab and paste your HTML code into the text area. Click Check to begin the analysis. Results appear in the Results Window.


You can also configure how the WPSS Validation Tool analyses the HTML by going to the Configuration tab. For more information on the configuration options, see the section, Configuration Tab.


URL List Tab

The URL List tab enables you to enter a list of URLs for analysis. You can either type the URLs directly into the text area or load them from a file.

To begin the analysis, click Check URL List. The results appear in the Results Window.


Configuration Tab

The Configuration tab enables you to select options for the analysis tools.


401 status handling – Controls the behaviour of the WPSS Validation Tool when it encounters an HTTP 401 – Not Authorized status code.

Options include:

  • Ignore
  • Prompt for credentials

ACC Testcase Profile – Select the desired accessibility test case profile.

Options include:

  • WCAG 2.0
  • TBS WCAG 2.0 Quick Check (WAAT Tool profile)
  • None

CLF Testcase Profile – Select the desired look and feel test case profile.

Options include:

  • SWU (Standard on Web Usability)
  • PWGSC Intranet
  • TBS SWU Intranet
  • CLF 2.0
  • None

Department Check Testcase Profile – Select the appropriate content check test case to validate.

Options include:

  • PWGSC SWU
  • PWGSC Common
  • Common
  • PWGSC Intranet
  • None

Department Features Profile – Select the appropriate feature profile for the report.

Options include:

  • All – Report all HTML features
  • None

Interoperability Testcase Profile – Select the appropriate interoperability test case profile.

Options include:

  • SWI – Standard on Web Interoperability
  • None

Link Check Profile – Select the appropriate link check test case profile.

Options include:

  • All
  • Errors – report broken links only
  • IPV4 – report IP addresses used as domain names
  • None

Metadata Profile – Select the appropriate metadata profile to use to validate.

Options include:

  • PWGSC SWU – PWGSC Standard on Web Usability
  • TBS SWU – TBS Standard on Web Usability
  • PWGSC – The PWGSC CLF 2.0 Metadata profile.
  • TBS CLF 2.0 – The TBS CLF 2.0 profile.
  • None – No Metadata required.

PDF Property Profile – Select the appropriate PDF property profile to use to validate.

Options include:

  • PWGSC – The PWGSC profile.
  • None – No properties required.

Web Analytics Profile – Select the appropriate Web Analytics profile to use to validate.

Options include:

  • TBS Web Analytics
  • None

Robots.txt handling – Controls the behaviour of the WPSS Validation Tool when it encounters a 403 – Forbidden by robots status, that is, when the site’s robots.txt file prohibits crawling.

Options include:

  • Ignore robots.txt
  • Respect robots.txt
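The two Robots.txt handling options can be illustrated with Python's standard `urllib.robotparser` module. This sketch only shows what "Respect robots.txt" means for a crawler; it is not the tool's code, and the user agent name `WPSS_Tool` is a placeholder rather than the tool's actual User Agent string.

```python
from urllib.robotparser import RobotFileParser


def may_fetch(robots_lines, url, respect_robots=True, user_agent="WPSS_Tool"):
    """Return True if the crawler may retrieve url under the given robots.txt lines."""
    if not respect_robots:
        # "Ignore robots.txt": every URL is retrieved
        return True
    parser = RobotFileParser()
    parser.parse(robots_lines)  # lines of an already-downloaded robots.txt file
    return parser.can_fetch(user_agent, url)
```

With a robots.txt of `User-agent: *` and `Disallow: /`, every page is refused unless robots.txt is ignored, which matches the "Forbidden by robots.txt" case described under Troubleshooting.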


Results Window

The Results Window contains a number of tabs, each containing the output of an individual analysis from the WPSS Validation Tool. The output in each tab includes a header that lists the English and French site directory URLs along with the time and date when the analysis started.


Crawled URLs tab – Provides the list of URLs the WPSS Validation Tool analysed. It lists the referrer page to indicate how the crawler reached a particular page. Use this tab to monitor the WPSS Validation Tool to ensure it is actively crawling and analysing a site’s pages.

Validation tab – Contains the output of the HTML/XHTML, CSS and robots.txt validation tools. The WPSS Tool does NOT perform HTML5 mark-up validation.

Link tab – Contains the output of the link check.

Metadata tab – Contains the output of the metadata check.

ACC tab – Contains the output of the accessibility check.

CLF tab – Contains the output of the look and feel check.

Interop tab – Contains the output of the interoperability check.

Department tab – Contains the output of the department check.

Document Features tab – Contains a list of documents that have HTML features such as forms, tables, etc.

Document List tab – The WPSS Validation Tool writes information to this tab after completing the site analysis. It contains the sorted list of all documents found on the site, including HTML/XHTML, PDF, images, CSS, etc.

The WPSS Validation Tool includes the time and date at the end of the report in each tab.

Saving Analysis Results

To save the analysis results, in the Results Window go to File > Save As. Select the file name and folder path in the file chooser dialog.

The results are stored in a number of files, one for each result tab. Each file name contains a suffix identifying the report type. For example, if you save the results in the file www_corp.txt, the actual results files are:

  • www_corp_acc.txt
  • www_corp_clf.txt
  • www_corp_crawl.txt
  • www_corp_dept.txt
  • www_corp_feat.txt
  • www_corp_int.txt
  • www_corp_link.txt
  • www_corp_meta.txt
  • www_corp_urls.txt
  • www_corp_val.txt

In addition to the above results, some HTML reports are also generated and saved with the result files. For example:

  • www_corp_img.html – Image Details report
  • www_corp_h.html – Headings Outline report

It is suggested that you use the same base name for the results as the site profile name, and save the results in the C:\Program Files\WPSS_Tool\results folder.
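If you script around the saved results, the naming rule can be expressed as a small Python helper. This is a sketch based only on the examples in this guide: it assumes every per-tab report keeps the extension of the chosen file name (defaulting to .txt), and the helper name is hypothetical.

```python
from pathlib import PurePath
from typing import List

# Report suffixes as listed in this guide; the HTML reports (for example
# the Image Details and Headings Outline reports) are not covered here.
REPORT_SUFFIXES = ["acc", "clf", "crawl", "dept", "feat", "int",
                   "link", "meta", "urls", "val"]


def result_file_names(saved_as: str) -> List[str]:
    """Derive per-tab result file names from the name chosen in File > Save As."""
    path = PurePath(saved_as)
    extension = path.suffix or ".txt"  # assumption: reports default to .txt
    return [str(path.with_name(f"{path.stem}_{suffix}{extension}"))
            for suffix in REPORT_SUFFIXES]
```

This can help confirm, for example, that all expected reports were written before archiving them.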

Stopping Crawler and Analysis

If you need to stop the analysis while it is running, in the Results Window, go to Options > Stop Crawl. This stops the WPSS Validation Tool after it finishes processing the current document. The results include a note at the bottom of each output tab indicating that the analysis was aborted.


Application Login

If the site being analysed is an application with a login page specified in the Login/Logout tab, the WPSS Validation Tool must be able to log in to the application to view pages behind the login. When the WPSS Validation Tool reaches the English login page, a Login window appears so that you can provide application credentials. All of the input fields in the login form are presented in the Login window. The order of the fields may differ from the order on the web page.


After entering the login credentials, click Login. The WPSS Validation Tool attempts to log in to the application site. You are not prompted for login credentials for the French login page; the English credentials are reused.

Web Server Protected Sites

If the site being analysed contains directories that are password protected by the web server using Basic Authentication, the WPSS Validation Tool must be able to provide the credentials to view the documents. When the WPSS Validation Tool reaches a document that requires authentication, a Login window appears to provide the credentials.


After entering the login credentials, click OK. The WPSS Validation Tool attempts to access the document(s).
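For background, Basic Authentication sends the user name and password in an `Authorization` header with each request to the protected directory. The value is only base64-encoded, not encrypted, so protected sites should be reached over HTTPS. The following Python sketch shows how that header value is formed; it illustrates the HTTP standard, not the tool's internal code.

```python
import base64


def basic_auth_header(user: str, password: str) -> dict:
    """Build the Authorization header for HTTP Basic Authentication (RFC 7617)."""
    credentials = f"{user}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```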

Reporting Passes and Fails

The default behaviour of the analysis tools is to report only URLs that fail checks. You can view results for both passes and fails. To see both, in the WPSS Validation Tool window, go to Options > Report Fails and Passes. The URLs of documents that pass checks are then also recorded in the results output.

To see only failed pages, go to Options > Report Fails Only.

Capture HTML Content on Errors

Sometimes having the URL of a document with errors is not sufficient, especially if the content is generated by an application on the fly. To capture the HTML content of a document with errors, it is necessary to stop the analysis process when the error occurs.

To stop an analysis when an error in an application occurs, in the WPSS Validation Tool window, go to Options > Stop on Error.

When this option is selected, the crawler and analysis tools stop when they detect an error. Once stopped, you have the option to save the HTML content or to continue with the next document. The saved content includes HTML comments at the top of the file giving the full URL of the document along with the WPSS Validation Tool error message.

To turn off this option, in the WPSS Validation Tool window go to Options > Continue on Error.


Command Line Interface

The WPSS Validation Tool is available both from the GUI and from the command prompt.

To access the command line version:

  1. Go to Start > Programs > Accessories > Command Prompt.

  2. Change to the C:\Program Files\WPSS_Tool directory.

  3. Run the program wpss_tool.pl.

Crawl a Site

To crawl a site and validate documents, use the command…

    wpss_tool.pl -cli -c <crawl file>

…where <crawl file> is the path to a file containing the site details and WPSS Validation Tool configuration. The configuration file is a plain text file containing the following variable/value pairs:

site_url_eng – The URL of the English entry page.

site_url_fra – The URL of the French entry page. If the French entry page is the same as the English entry page, you do not need to include this variable.

crawllimit – The maximum number of URLs the WPSS Validation Tool retrieves and analyses from the site. The URLs include all file types: HTML, PDF, images, CSS files, etc. To crawl the entire site, use a value of 0 (zero).

output_file – The folder and file name where the analysis results are written.

httpproxy – Sets the proxy server for HTTP traffic. Only use this variable when the user’s workstation requires a proxy to reach the Internet.

Example file contents:

    crawllimit 100
    site_url_eng http://webdev02.tpsgc-pwgsc.gc.ca/comm/index-eng.html
    site_url_fra http://webdev02.tpsgc-pwgsc.gc.ca/comm/index-fra.html
    output_file results/webdev02_comm
    httpproxy
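Before running a batch of crawls, it can help to check crawl files for typos. The following is a minimal Python sketch of such a checker for the format above. It assumes one variable per line, separated from its value by whitespace, and it treats site_url_eng, crawllimit and output_file as required, which is this sketch's own assumption rather than a documented rule of the tool.

```python
REQUIRED = ("site_url_eng", "crawllimit", "output_file")  # assumption, not a documented rule


def parse_crawl_file(text: str) -> dict:
    """Parse 'variable value' lines; a variable with no value (e.g. httpproxy) maps to ''."""
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        name, *rest = line.split(None, 1)
        config[name] = rest[0].strip() if rest else ""
    missing = [name for name in REQUIRED if name not in config]
    if missing:
        raise ValueError(f"missing variables: {', '.join(missing)}")
    # A crawl limit of 0 means "crawl the entire site"
    config["crawllimit"] = int(config["crawllimit"])
    return config
```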

Crawl a site with a login

To crawl a site or application that has a login, you need to provide the WPSS Validation Tool with additional configuration items. All of the configuration items for crawling a site must be provided along with the following:

loginpagee – The URL of the English login page.

logoutpagee – The URL of the English logout page.

loginpagef – The URL of the French login page.

logoutpagef – The URL of the French logout page.

loginformname – If the login page contains more than one form, such as a search form along with the login form, you need to specify the name of the login form. The name is the value of the form’s name or id attribute. If there is only one form on the login page, you do not need to include this variable.

Login credentials must also be provided in a credentials file. The credentials file is a text file containing name/value pairs for the login form’s text and password fields. The variable names match the name attribute of the various "input" fields.

Sample credentials file:

    username admin
    password abc123

To crawl a site or application that has a login, use the command:

    wpss_tool.pl -cli -c <crawl file> -login <credentials file>
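The credentials file pairs each input field's name attribute with a value. The Python sketch below, with hypothetical function names, reads such a file and checks that every name matches an input field of the login form before URL-encoding the pairs the way an HTML form submission would. How the tool itself submits the form is not described in this guide.

```python
from urllib.parse import urlencode


def load_credentials(text: str) -> dict:
    """Read 'name value' lines from a credentials file."""
    credentials = {}
    for raw in text.splitlines():
        parts = raw.strip().split(None, 1)
        if parts:
            credentials[parts[0]] = parts[1] if len(parts) > 1 else ""
    return credentials


def login_form_body(credentials: dict, form_inputs: list) -> str:
    """URL-encode the credentials after checking them against the form's input names."""
    unknown = sorted(set(credentials) - set(form_inputs))
    if unknown:
        raise KeyError(f"no input field named: {', '.join(unknown)}")
    return urlencode(credentials)
```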


Analyse a block of HTML code

To analyse a block of HTML mark-up, use the command…

    wpss_tool.pl -cli -h <html content file>

…where <html content file> is the path to a file containing the WPSS Validation Tool configuration. The configuration file is a plain text file containing the following variable/value pairs:

html_file – The path of the file containing the HTML mark-up to analyse.

output_file – The folder and file name where the analysis results are written.

httpproxy – Sets the proxy server for HTTP traffic. Only use this variable when the user’s workstation requires a proxy to reach the Internet.


Example file contents:

    html_file sample.html
    output_file results/webdev02_comm
    httpproxy

Analyse a list of URLs

To analyse a list of URLs, use the command…

    wpss_tool.pl -cli -u <url list file>

…where <url list file> is the path to a file containing the list of URLs and the WPSS Validation Tool configuration. The configuration file is a plain text file containing the following variable/value pairs:

output_file – The folder and file name where the analysis results are written.

httpproxy – Sets the proxy server for HTTP traffic. Only use this variable when the user’s workstation requires a proxy to reach the Internet.

All other lines of the file that begin with either http:// or https:// are the list of URLs to analyse.

Example file contents:

    output_file results/webdev02_comm
    httpproxy
    http://www.tpsgc-pwgsc.gc.ca/comm/index-eng.html
    http://www.tpsgc-pwgsc.gc.ca/comm/index-fra.html
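The rule that separates settings from URLs can be summarized in a short Python sketch. It follows the description above (lines beginning with http:// or https:// are URLs; other non-blank lines are variable/value pairs) and assumes whitespace separates a variable from its value; the function name is hypothetical.

```python
def parse_url_list_file(text: str):
    """Split a URL list file into (settings, urls)."""
    settings, urls = {}, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(("http://", "https://")):
            urls.append(line)  # every http(s) line is a URL to analyse
        else:
            name, *rest = line.split(None, 1)
            settings[name] = rest[0] if rest else ""
    return settings, urls
```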

Program Status and Progress

As the command line WPSS Validation Tool runs and analyses documents, the URLs of the documents are printed to the console. Use this to monitor the WPSS Validation Tool to ensure it is actively crawling and analysing a site’s pages.

Viewing Results

The results of an analysis are stored in a number of files. Each file name includes a suffix identifying the report type. For example, if the results are saved with the base name webdev02_comm, the results files include:

  • webdev02_comm_crawl.txt – list of URLs analysed by the WPSS Validation Tool.
  • webdev02_comm_acc.txt – details of WCAG 2.0 faults detected.

It is suggested that you use the same base name for the results as the site profile name, and save the results in the C:\Program Files\WPSS_Tool\results folder.

You can view the results files with any program that can display plain text files, such as WordPad.


Language Switching

You can toggle the language of the analysis results using a command line option.

-eng – Analysis results are provided in English.

-fra – Analysis results are provided in French.

If no language selection is made, the language is determined from the operating system.
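The precedence can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's code: it assumes an explicit -eng or -fra option always overrides the operating system setting, that the last option wins if both are given, and that any French locale selects French output. The function name is hypothetical.

```python
import locale


def report_language(args, os_locale=None):
    """Return 'eng' or 'fra' for the analysis results."""
    chosen = None
    for arg in args:
        if arg in ("-eng", "-fra"):
            chosen = arg[1:]  # assumption: the last language option given wins
    if chosen:
        return chosen
    if os_locale is None:
        os_locale = locale.getlocale()[0] or ""
    # Locale names look like "fr_CA" or, on Windows, "French_Canada"
    return "fra" if os_locale.lower().startswith("fr") else "eng"
```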

Configuration Files

There are a number of configuration files that modify the behaviour of the WPSS Validation Tool. Most of these files should remain untouched, with the exception of the wpss_tool.config file. This file contains the network scope designation for domains as well as the domain and domain alias mapping. It is a plain text file with simple name/value pairs for configuration parameters.

Domain Network Scope

This defines the network scope of a specific domain. The network scope is one of:

  • Internet
  • GC Intranet
  • PWGSC Intranet
  • Internet development
  • GC Intranet development
  • PWGSC Intranet development

Domain Alias Map

This provides a mapping of domain names and their aliases.
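One use of such a mapping is to treat a URL on an alias domain as the same document as one on the primary domain. The Python sketch below illustrates that idea only. The dictionary stands in for the entries in wpss_tool.config, whose exact syntax is not described here, and the alias pair, borrowed from the Entry Page Rewritten example under Troubleshooting, is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical alias map: alias domain -> primary domain
ALIASES = {"webdev02.pwgsc.gc.ca": "webdev02.tpsgc-pwgsc.gc.ca"}


def canonical_url(url: str, aliases: dict) -> str:
    """Replace an alias host name with its primary domain, keeping the rest of the URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # urlsplit lower-cases the host name
    primary = aliases.get(host, host)
    netloc = primary if parts.port is None else f"{primary}:{parts.port}"
    # Note: any user:password@ prefix in the original URL is dropped here
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```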

Troubleshooting

Entry Page Rewritten

Message – Entry page http://webdev02.pwgsc.gc.ca/comm/index-eng.html rewritten to http://webdev02.tpsgc-pwgsc.gc.ca/comm/index-eng.html
Cause – The entry page provided for the site configuration is redirected by the web server.
Correction – Provide the rewritten entry page URL in the site configuration.

Forbidden by robots.txt

Message – Failed to get URL http://ssi-iss.tpsgc-pwgsc.gc.ca/index-eng.html, error is 403 Forbidden by robots.txt.
Cause – The site has a robots.txt file that prohibits crawlers from accessing the site. By default, the tool honours robots.txt directives and is therefore unable to analyse the site.
Correction – Go to the Configuration tab and select Ignore robots.txt for the Robots.txt handling option.

User Agent String

The validation tools have their own User Agent string that is different from a user’s browser. Some sites look at the User Agent string and present different output depending upon its value, for example, checking for a minimum browser version.

Message – Varies.
Cause – The site being analysed does not recognize the validation tools’ User Agent string.
Correction – Change the site so that it does not depend on a specific User Agent string or browser.

Perl Command Line Interpreter Error

The WPSS Validation Tool uses a number of Perl modules. These modules may have errors that cause the program to fail. You may encounter the following message:


Message – Perl Command Line Interpreter has encountered a problem and needs to close.
Cause – An error in the supporting Perl modules.
Correction – Restart the WPSS Validation Tool. If the problem persists, send an email to [email protected].

500 Internal Server Error

When analysing some sites, the WPSS Validation Tool may report a 500 Internal Server Error. This may be due to a limitation of the web server rather than the WPSS Validation Tool: the web server may not handle the “Range” header in the HTTP GET request. The WPSS Validation Tool uses this header to set a size limit on GET requests and avoid retrieving extremely large documents.

To avoid this error, you can change a setting in the WPSS Validation Tool configuration file so that the “Range” header is not included in HTTP GET requests.

Using a simple text file editor, such as WordPad:

  1. Open the file c:\Program Files\wpss_tool\conf\wpss_tool.config.

  2. Locate the following lines in the file:

    # Max User Agent Size limits the size of files accepted in a GET
    # request. A value of 0 means we can accept documents of any size.
    # A value of 0 also removes the Range field from the HTTP header.
    #
    #User_Agent_Max_Size 0
    
  3. Remove the leading ‘#’ character from the line #User_Agent_Max_Size 0, and save the file.

This functionality is not available through the user interface, only by directly editing the configuration file.
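To see what the setting changes, the following Python sketch builds a GET request the way a size-limited client might: with a positive limit it asks for only the first bytes of the document through a Range header, and with 0 it omits the header. How the tool computes the range internally is an assumption here; the byte-range syntax itself comes from the HTTP specification.

```python
from urllib.request import Request


def build_get(url: str, max_size: int) -> Request:
    """Create a GET request, adding a Range header only when max_size > 0."""
    headers = {}
    if max_size > 0:
        # Byte ranges are inclusive, so the first max_size bytes are 0..max_size-1
        headers["Range"] = f"bytes=0-{max_size - 1}"
    return Request(url, headers=headers, method="GET")
```

A server that mishandles the Range header can fail such a request, which is the situation that setting User_Agent_Max_Size to 0 avoids.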

