Website Rework

by djsharpe on October 25, 2011

Today I found myself needing to extract all the page links from my work’s website, so that when I’m done reworking the site I can check that all the old links are redirected to the new pages and there are no nasty 404s.

So here’s the “quick and dirty link extractor” I used in Cygwin to accomplish this:

MYSITE=''; wget -nv -r --spider "$MYSITE" 2>&1 | egrep ' URL:' | awk '{print $3}' | sed "s@URL:${MYSITE}@@g"

Of course, substitute the site URL you are extracting from for the empty quotes after MYSITE= and wait patiently. It will show output when the spidering is finished.
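If you want to reuse the filtering half of that pipeline, it can be wrapped in a small shell function. This is only a sketch: extract_paths is a made-up helper name, the sort -u step is my addition to drop duplicate links, and it assumes the site URL contains no "@" (the sed delimiter) and that wget's log lines carry the address in the third field as URL:<address>.

```shell
#!/bin/sh
# Sketch only: extract_paths is a hypothetical helper, not part of wget.
# It reads `wget -nv` log text on stdin and prints site-relative paths.
# $1 is the site URL prefix to strip (assumed to contain no "@").
extract_paths() {
    site=$1
    # 1. keep only the log lines that report a URL
    # 2. take the third whitespace-separated field (URL:<address>)
    # 3. drop the "URL:" tag together with the site prefix
    # 4. sort and remove duplicates (an addition, not in the one-liner)
    grep ' URL:' \
        | awk '{print $3}' \
        | sed "s@^URL:${site}@@" \
        | sort -u
}
```

It would then be used along the lines of: wget -nv -r --spider "$MYSITE" 2>&1 | extract_paths "$MYSITE" > old-links.txt (old-links.txt is just an example file name), which leaves a list to check against once the reworked site is live.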

That’s it. Nice & easy, slow & low.

