Robots.txt Sensitive Backdoor
USC.edu Stats: http://www.usc.edu/stats/prev/hscweb-monthly-server-stats.html
The URL above points to the stats page of the University of Southern California.
robots.txt files tell search engines what they shouldn’t index, but that’s no use if we forget that the robots.txt file itself is public and can be indexed too. Once it is, every path it lists under “Disallow:” is searchable, which works against the very thing the file was meant to do.
So, what do we do?
We could set up proper directory permissions on our server and disable ‘read’ for all external IPs but our own, or we could add a blank index.html and leave the URL out of the robots file altogether, because if Google can’t find it, Google can’t index it.
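The first option, allowing reads only from our own addresses, could be sketched roughly like this. It is only an illustration of the check itself: the network range and the function name `may_read` are made up for the example, and on a real server this rule would normally live in the web server's access configuration rather than in application code.

```python
import ipaddress

# Hypothetical allowlist: 203.0.113.0/24 is a documentation-only range,
# standing in for "our own" addresses.
ALLOWED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


def may_read(client_ip):
    """Return True only if client_ip falls inside one of our own networks."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in ALLOWED_NETWORKS)


if __name__ == "__main__":
    print(may_read("203.0.113.7"))   # one of ours
    print(may_read("198.51.100.9"))  # an external visitor
```

With a check like this in front of the folder, it no longer matters whether the path shows up in a search result, because outsiders can't read it anyway.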
Where the heck is the actual problem?
Ah! Well, it’s quite simple really. Run a search for the following string: “robots.txt” “Disallow:” “private” filetype:txt and you’ll see for yourself.
Need an example?
- OK, so here’s how I found the first URL in this post. I searched for: “robots.txt” “Disallow:” “private” filetype:txt
- First result: http://www.usc.edu/robots.txt
- The /stats folder is marked “private” there, so I tried http://www.usc.edu/stats/
- From there I reached http://www.usc.edu/stats/prev/hscweb-monthly-server-stats.html
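The same steps can be turned around to audit your own robots.txt before someone else does. Below is a minimal sketch, not a full robots.txt parser: it pulls out the Disallow paths and flags the ones that look interesting to a curious visitor. The keyword list, the function names, and the sample file are all invented for the example; the sample is not the real USC file.

```python
# Hypothetical keywords that make a Disallow entry look worth a visit.
SENSITIVE_HINTS = ("private", "stats", "admin", "backup")


def disallowed_paths(robots_txt):
    """Collect every non-empty Disallow path from robots.txt text."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


def flag_sensitive(paths, hints=SENSITIVE_HINTS):
    """Return the paths whose names contain any of the hint keywords."""
    return [p for p in paths if any(h in p.lower() for h in hints)]


# Made-up sample file, for illustration only.
SAMPLE = """\
User-agent: *
Disallow: /private/   # internal docs
Disallow: /stats/
Disallow: /images/
Disallow:
"""

if __name__ == "__main__":
    for path in flag_sensitive(disallowed_paths(SAMPLE)):
        print("Advertised to everyone:", path)
```

Anything this flags is a path you are announcing to the world, so it should either be protected on the server or dropped from the file.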
Bottom line
robots.txt may be good for telling search engines what NOT to index, but if you really wanna keep your data safe – don’t mention it there.