Script to check links on a web page

Link Checker is a script that crawls every page of a site, and detects broken links and redirects.

Checking for broken links is essential to maintaining a site, for both users and search engines. Broken links must be removed and redirected links updated, whether they are internal or external to the site, both for the site's image and for a better ranking in search engines.

In addition, this open source program is easy to modify thanks to its Scriptol, PHP, or JavaScript source code. It works with virtually all operating systems.


Manual of the PHP version

If PHP is not installed, download the interpreter from php.net.

  1. Download the zip archive,
  2. unpack it,
  3. open a command line window, and
  4. run the program in the window according to the syntax given below.

From the command line, the script can be run with the Scriptol PHP compiler:

solp linche [options] page

or directly by the PHP 5 interpreter:

php linche.php [options] page

The page is a complete URL in the form:

http://www.example.com/index.php

The options:

-r  recursive. Follows internal links. By default only the given page is scanned.
-s  short list. Displays only broken links and links to pages redirected on the site.
-f  fast. Speeds up processing with a reduced timeout. A user-defined value may be added after -f.
-v  verbose. Displays all links with the HTTP header code found. By default only links with errors are displayed.
-q  quiet. No display.
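
To make the idea of an "internal" link behind the -r option concrete, here is a minimal sketch in JavaScript. It is not taken from the Link Checker source: the function name splitLinks and the rule "same host name means internal" are assumptions made for illustration only.

```javascript
// Hypothetical helper, not part of Link Checker.
// Assumption: a link is "internal" when its host name matches the page's.
function splitLinks(pageUrl, hrefs) {
  const base = new URL(pageUrl);
  const internal = [];
  const external = [];
  for (const href of hrefs) {
    let target;
    try {
      target = new URL(href, base); // resolves relative links against the page
    } catch {
      continue; // skip values that cannot be parsed as a URL
    }
    // Only HTTP(S) links can be checked; mailto:, javascript:, etc. are skipped.
    if (target.protocol !== "http:" && target.protocol !== "https:") continue;
    target.hash = ""; // a fragment does not change the page that is fetched
    (target.hostname === base.hostname ? internal : external).push(target.href);
  }
  return { internal, external };
}

const result = splitLinks("http://www.example.com/index.php", [
  "about.php",
  "/docs/#intro",
  "https://www.iana.org/",
  "mailto:someone@example.com",
]);
console.log(result.internal); // links a recursive scan would follow
console.log(result.external); // links that would only be checked
```

A recursive scan would then queue only the internal list for further crawling, while still checking the status of the external one.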

The program writes all the results to a file named links.log, which you may save under another name if you want to keep them.

The results provided by the script are not perfect. Sometimes a link is reported as "broken" although the page is accessible in a browser. This is caused by a response time that is too long, or by the server. In this case, simply ignore the result.
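
If you would rather double-check such a link than ignore it, a second request with a longer timeout is one option. The sketch below is an assumption-laden illustration, not the script's own code: the name recheck, the 30-second default, and the use of a HEAD request are all hypothetical choices, and it assumes Node.js 18 or later (global fetch and AbortSignal.timeout).

```javascript
// Hypothetical helper, not part of Link Checker.
// fetchImpl is injectable so the function can be exercised without a network.
async function recheck(url, timeoutMs = 30000, fetchImpl = fetch) {
  try {
    const response = await fetchImpl(url, {
      method: "HEAD",       // headers only; some servers reject HEAD requests
      redirect: "manual",   // report the redirect itself, do not follow it
      signal: AbortSignal.timeout(timeoutMs), // abort after timeoutMs
    });
    return { url, status: response.status };
  } catch (err) {
    // Timeouts and network failures land here; no HTTP status is available.
    return { url, status: null, error: err.name };
  }
}
```

Calling recheck("http://www.example.com/index.php") returns a promise that resolves to an object with the HTTP status, or with status null and an error name when the request timed out or failed.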

Error messages and actions to take

When the script tries to access a page, the server can return different codes depending on the status of the page. The page may be redirected by the .htaccess file or by a PHP script, or it may be missing.

Note that most of the time, the program displays a message in place of the HTTP code:

These codes, other than OK, can be ignored as long as the page is accessible, except for permanent redirects, whose links must be updated.

For the complete list of codes and their meaning, see the HTTP codes document.
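
As a rough illustration of the rule above, the following JavaScript sketch groups standard HTTP status codes by the action they call for. The function name classifyStatus and the labels it returns are hypothetical; they are not the messages that the program prints.

```javascript
// Hypothetical helper, not part of Link Checker.
// The returned labels are illustrative, not the program's own messages.
function classifyStatus(code) {
  if (code >= 200 && code < 300) return "ok";
  // 301 and 308 are permanent redirects: the link should be updated.
  if (code === 301 || code === 308) return "permanent-redirect";
  // 302, 303 and 307 are temporary: the original link stays valid.
  if (code === 302 || code === 303 || code === 307) return "temporary-redirect";
  // 404 (Not Found) and 410 (Gone): the page is missing.
  if (code === 404 || code === 410) return "missing";
  if (code >= 400) return "error"; // may still open in a browser; check by hand
  return "other";
}

for (const code of [200, 301, 302, 404, 503]) {
  console.log(code, classifyStatus(code));
}
```

Only the "permanent-redirect" and "missing" groups call for editing the page; the others can usually be left alone when the target opens normally in a browser.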

With the -s option, only these codes are taken into account:

Using the JavaScript version

Install Node.js, download the jlinks.zip archive, and extract its contents.

Go to the archive's directory and type:

node links.js c:/example.com -whttps://example.com/

Replace c:/example.com with the local directory containing your website's content and https://example.com/ with your website's URL.
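
The two arguments describe the same site twice: once as files on disk and once as public addresses. The sketch below shows one way such a correspondence could be computed. It is not taken from links.js; the function name localFileToUrl and the example file path are hypothetical.

```javascript
const path = require("path");

// Hypothetical helper, not part of links.js: maps a file under the local
// site directory to the address it would have on the live site.
function localFileToUrl(rootDir, siteUrl, filePath) {
  // path.posix keeps forward slashes, matching the c:/example.com form above.
  const relative = path.posix.relative(rootDir, filePath);
  return new URL(relative, siteUrl).href;
}

console.log(
  localFileToUrl(
    "c:/example.com",
    "https://example.com/",
    "c:/example.com/docs/index.html"
  )
); // prints "https://example.com/docs/index.html"
```

The trailing slash on the site URL matters in this sketch: without it, the URL constructor would resolve the relative path against the parent of the last path segment.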

License

This link checker is licensed under the GNU General Public License 2.0. Use it freely. If you distribute the archive, you must retain the copyright notice in both the Scriptol and PHP source code.
Any changes and improvements to the code must be made public and supplied as open source code, even if you use the modified program online.

By Denis Sureau. GNU GPL 2.0 license.

See also ...

Unredir
For a static site only. Tests for broken links and redirects; for a redirect, it replaces the URL with its new address.