PHP script to check links on a web page

Link Checker is a script that crawls every page of a site, and detects broken links and redirects.
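
To give an idea of the first step of such a crawl, here is a minimal sketch of how the links of a page could be collected. It is not taken from linche.php: the function name extractLinks is hypothetical, and the sketch assumes PHP 7.1 or later with the DOM extension enabled.

```php
<?php
// Hypothetical helper, not part of linche.php: collect the href values
// of all <a> elements found in an HTML string.
function extractLinks(string $html): array
{
    if ($html === '') {
        return [];
    }
    $doc = new DOMDocument();
    // Real-world HTML is often invalid; keep parser warnings internal
    // instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        // Skip empty targets and same-page fragments.
        if ($href !== '' && $href[0] !== '#') {
            $links[] = $href;
        }
    }
    // Remove duplicates while keeping the order of first appearance.
    return array_values(array_unique($links));
}
```

Each URL returned this way would then be requested in turn to find out whether it is broken or redirected.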

Checking for broken links is essential to maintaining a site, for both users and search engines. All broken links must be removed and all redirected links updated, whether they are external or internal to the site, both for the image of the site and for a better ranking on search engines.

This program is an alternative to Xenu. It has the advantage of grouping the links by page checked, which makes editing easier, and it is portable.
Compared to the W3C online validator, it has the advantage of not blocking on certain pages.

In addition, this open source program is easy to modify, since the source code is provided in both Scriptol and PHP. It runs under PHP 5 on virtually any operating system and any model of computer.

A version of Link Checker with a graphical user interface is also available. It is harder to install but easier to use.

Commands and usage

From the command line, the script can be executed by the Scriptol PHP compiler:

solp linche [options] page

or directly by the PHP 5 interpreter:

php linche.php [options] page

The page is a complete URL in the form:

http://www.example.com/index.php

The options:

-r  recursive, follows internal links. By default only the given page is scanned.
-s  short list. Displays only broken links and links to redirected pages on the site.
-f  fast. Speeds up processing by using a reduced timeout. A user-defined value may be added after -f.
-v  verbose. Displays all links with the HTTP header code found. By default, only links with errors are displayed.
-q  quiet, no display.

The program writes all the results to a file named links.log, which you can save under another name if you want to keep them.

The results provided by the script are not perfect. Sometimes a link is reported as "broken" although the page is accessible in a browser. This happens when the response time is too long, or because of the way the server answers. In this case, simply ignore the result.
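
When a slow server is suspected, a link reported as broken can be re-tested by hand with a longer timeout. The following is only a sketch under stated assumptions: the function names makeTimeoutContext and fetchHeaders are hypothetical and do not come from linche.php, and the context argument of get_headers requires PHP 7.1 or later.

```php
<?php
// Hypothetical helper: build an HTTP stream context whose requests give up
// after the given number of seconds.
function makeTimeoutContext(float $seconds)
{
    return stream_context_create([
        'http' => [
            'timeout' => $seconds, // read timeout, in seconds
        ],
    ]);
}

// Hypothetical helper: return the response header lines of a URL, or false
// when the server cannot be reached or does not answer in time.
function fetchHeaders(string $url, float $seconds)
{
    // The @ operator hides the warning PHP emits on failure; the false
    // return value is enough to detect it.
    return @get_headers($url, false, makeTimeoutContext($seconds));
}
```

For example, fetchHeaders('http://www.example.com/index.php', 30.0) waits up to 30 seconds. If headers come back this time, the earlier "broken" report was probably due to the response time.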

Error messages and actions to take

When the script tries to access a page, the server can return different codes depending on the status of the page. The page can be redirected by the .htaccess file or by a PHP script, or it may be missing.
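
As an illustration of where these codes come from, the sketch below reads them from the status lines of an HTTP response, such as the lines returned by PHP's get_headers function. The function name statusCodes is hypothetical, and this is not necessarily how linche.php proceeds. It assumes PHP 7 or later.

```php
<?php
// Hypothetical helper: extract every HTTP status code from a list of
// response header lines. When a redirect is followed, there is one status
// line per response, so the first code belongs to the requested URL and
// the last one to the page finally reached.
function statusCodes(array $headerLines): array
{
    $codes = [];
    foreach ($headerLines as $line) {
        // Status lines look like "HTTP/1.1 301 Moved Permanently" or "HTTP/2 200".
        if (is_string($line)
            && preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})\b#', $line, $m)) {
            $codes[] = (int) $m[1];
        }
    }
    return $codes;
}
```

A redirected page followed by a valid target yields two codes, for instance 301 and then 200, while a missing page typically yields a single 404.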

Note that most of the time, the program displays a message in place of the HTTP code:

Codes other than OK can be ignored as long as the page is accessible, except for permanent redirects: the links concerned must be updated.

For the complete list of codes and their meaning, see the HTTP codes document.
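
The rule stated above can be summed up in a small function. This is a sketch only: the function name actionForStatus and the returned labels are hypothetical, and the choice of codes is an assumption based on standard HTTP semantics (301 and 308 are the permanent redirects, 404 and 410 indicate a missing page), not the list used by linche.php.

```php
<?php
// Hypothetical helper: suggest what to do with a link, given the HTTP
// status code returned for it.
function actionForStatus(int $code): string
{
    if ($code >= 200 && $code < 300) {
        return 'ok';
    }
    if ($code === 301 || $code === 308) {
        // Permanent redirect: the link should point to the new address.
        return 'update the link';
    }
    if ($code === 404 || $code === 410) {
        // The page is missing.
        return 'remove the link';
    }
    // Any other code: harmless as long as the page opens in a browser.
    return 'check in a browser';
}
```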

With the -s option, only these codes are taken into account:

Versions

Getting the program and license

This link checker is licensed under the GNU General Public License 2.0. Use it freely. If you distribute the archive, you must retain the copyright notice in both the Scriptol and PHP source code.
Any changes or improvements to the code must be made public and supplied as open source code, even if you only use the modified program online.
If PHP is not installed, download version 5 of the interpreter from php.net.

  1. Download the archive in zip format.
  2. Unpack it.
  3. Open a command line window.
  4. Run the program in the window according to the syntax given above.

By Denis Sureau. GNU GPL 2.0 license.

See also ...

Unredir
For static sites only. Tests for broken links and redirects. When a link is redirected, it replaces the URL with its new address.