
0 length robot.txt

This is probably a coincidence, but I had one of my hosted sites (with no php code anywhere, and certainly no .php files) returning a script error on load instead of showing the non-php webpage:

[proxy_fcgi:error] [pid 88148] [client xx.xx.xx.xx:63137] AH01071: Got error 'Primary script unknown\n'

And it would display a blank page for a few seconds, then “File Not Found” would appear. There was no HTTP error code.

Other hosted sites, also not using php, didn’t have this problem, and sites that did use php were working fine.

All sites are configured to allow php, and each has an fcgi proxy line in its configuration:

DocumentRoot ${WEBROOT}
ProxyPassMatch ^/(.*\.php)$ fcgi://127.0.0.1:9000/${WEBROOT}$1
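
For background, the AH01071 text 'Primary script unknown' is PHP-FPM reporting that the SCRIPT_FILENAME it was handed does not point at a readable file; with a ProxyPassMatch like the above, every request ending in .php is forwarded to FPM whether or not such a file exists. The mod_proxy_fcgi documentation also describes an alternative for Apache 2.4.10 and later that lets httpd do the URL-to-file mapping first (a sketch, not what these vhosts actually use):

<FilesMatch "\.php$">
    # Hand anything that mapped to a .php file to PHP-FPM on port 9000
    SetHandler "proxy:fcgi://127.0.0.1:9000"
</FilesMatch>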

On the problem site, if I commented out the ProxyPassMatch line and reloaded apache, the site would load.

Confusing.

Not being sure what was causing this on one specific site, I started comparing the directory structure, .htaccess, and anything else I could look at to see what was different about this particular site, and I noticed that it had a zero-length robots.txt file in the webroot, which no other site had.

Removing that file made the site load properly.

I still get many script errors in the error log, but these mostly have a referer [sic] at the end and are obviously attempts to hack into the page:

[proxy_fcgi:error] [pid 42901] [client 178.137.92.187:61783] AH01071: Got error 'Primary script unknown\n', referer: /www/XXX/license.php

[proxy_fcgi:error] [pid 18168] [client 74.71.9.14:59624] AH01071: Got error 'Primary script unknown\n'
[core:info] [pid 43056] [client 74.71.9.14:59637] AH00128: File does not exist: /www/XXX/apple-touch-icon-120x120-precomposed.png

But I still get a few bare ones:

[Wed Oct 03 08:13:05.504129 2018] [proxy_fcgi:error] [pid 43364] [client 74.71.9.14:57753] AH01071: Got error 'Primary script unknown\n'
[Wed Oct 03 09:17:36.194394 2018] [proxy_fcgi:error] [pid 42840] [client 54.36.148.74:60192] AH01071: Got error 'Primary script unknown\n'
[Wed Oct 03 10:08:08.834583 2018] [proxy_fcgi:error] [pid 18168] [client 74.71.9.14:59624] AH01071: Got error 'Primary script unknown\n'
[Wed Oct 03 10:17:17.791282 2018] [proxy_fcgi:error] [pid 43056] [client 180.76.15.30:29494] AH01071: Got error 'Primary script unknown\n'
[Wed Oct 03 10:40:17.322634 2018] [proxy_fcgi:error] [pid 42840] [client 74.71.9.14:64211] AH01071: Got error 'Primary script unknown\n'
[Wed Oct 03 11:27:58.098639 2018] [proxy_fcgi:error] [pid 18168] [client 202.46.50.182:22728] AH01071: Got error 'Primary script unknown\n'
[Wed Oct 03 11:33:13.054967 2018] [proxy_fcgi:error] [pid 43056] [client 123.125.71.77:21018] AH01071: Got error 'Primary script unknown\n'

(The 74.71.9.14 IP has made more than 50,000 requests for non-existent files on this hosted domain. Sadly, it is a residential IP from RoadRunner, and they are worthless to deal with, regardless of how often they change their company name.)
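
For what it's worth, a minimal sketch of one way to shut out a single abusive client at the vhost level, assuming Apache 2.4's mod_authz_core and mod_authz_host (a negated Require has to live inside a RequireAll container):

<Directory "/www/XXX/">
    <RequireAll>
        # Keep the site open to everyone else
        Require all granted
        # Refuse this one client outright
        Require not ip 74.71.9.14
    </RequireAll>
</Directory>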

Why would a blank robots.txt cause this issue? Or is there something else going on here and this is just a weird coincidence?

Comments

Re: 0 length robot.txt

By LuKreme at 10/03/2018 - 13:59

On 03 Oct 2018, at 11:39, @lbutlr < ... at kreme dot com> wrote:
Well, it did for about 3 hours and 25 minutes, in fact.

Just after posting the message, the site went back to showing only “File Not Found”

I’m at a loss.

The only other issue I see is in the main http-error log, where there are repeated instances of:

[ssl:info] [pid 43234] (70014)End of file found: [client 106.45.1.92:48564] AH01991: SSL input filter read failed.

(From various client addresses)

The site in question gets an A+ grade from SSL Labs, and this error message appears to be somewhat spurious: Apache tries the default cert for the site before it receives the server name, then loads the correct cert. So I don’t think this is really an issue.
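
One way to sanity-check that SNI behavior from the outside is openssl's s_client; a sketch, with www.XXX.com standing in for the real name (on OpenSSL releases before 1.1.1, which were current at the time, s_client only sends SNI when -servername is given):

# With SNI: should print this vhost's own certificate subject
openssl s_client -connect www.XXX.com:443 -servername www.XXX.com </dev/null 2>/dev/null | openssl x509 -noout -subject

# Without SNI: prints the default certificate Apache falls back to
openssl s_client -connect www.XXX.com:443 </dev/null 2>/dev/null | openssl x509 -noout -subject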

Re: Re: 0 length robot.txt

By Filipe Cifali at 10/03/2018 - 14:07

Hi Kremels,

you can check which virtualhost is being served via apache2ctl like this:

$ apache2ctl -S

apache2ctl -h describes the flag:

-S : a synonym for -t -D DUMP_VHOSTS -D DUMP_RUN_CFG

After checking that the right vhost is being served, start removing the proxy logic and just make the txt work again, then slowly start adding the proxy config back to make the php work again.

If you can, post the full vhost here regarding the domain that misbehaves.

The important part is: Having a zeroed robots.txt doesn't break httpd.
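
That claim is easy to test in isolation; a sketch using the paths quoted in this thread, with curl assumed available:

# Recreate the zero-length file...
: > /www/XXX/robots.txt
# ...then see what httpd actually returns for it and for the page
curl -sI https://www.XXX.com/robots.txt
curl -sI https://www.XXX.com/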

Re: 0 length robot.txt

By LuKreme at 10/03/2018 - 14:27

On 03 Oct 2018, at 12:07, Filipe Cifali <cifali. ... at gmail dot com> wrote:
Yes that is all fine, and the site was loading perfectly for almost three and a half hours.

port 443 namevhost www.XXX.com (/usr/local/etc/apache24/users/XXX.conf:1)
alias XXX.com
port 80 namevhost www.XXX.com (/usr/local/etc/apache24/users/XXX.conf:26)
alias XXX.com

I do not have apache2ctl, just apachectl (Apache 2.4 on FreeBSD 11.2-RELEASE, compiled from ports).

There is exactly one line in the site configuration that, when commented out, makes the site work again. Though possibly only for a little while; I’ll have to check again in 3-4 hours. There is no other proxy logic at all.

Sure, but other than the host name, it is identical to all the other sites.

<VirtualHost *:443>
ServerName www.XXX
ServerAlias XXX
DocumentRoot /www/XXX/
#ProxyPassMatch ^/(.*\.php)$ fcgi://127.0.0.1:9000/www/XXX/$1
<Directory "/www/XXX/">
Options +Indexes +FollowSymLinks +MultiViews -SymLinksIfOwnerMatch
AllowOverride all
Require all granted
</Directory>
SSLEngine on
SSLCertificateFile /usr/local/etc/dehydrated/certs/XXX/cert.pem
SSLCertificateKeyFile /usr/local/etc/dehydrated/certs/XXX/privkey.pem
SSLCertificateChainFile /usr/local/etc/dehydrated/certs/XXX/chain.pem
SSLProtocol ALL -SSLv2 -SSLv3
SSLHonorCipherOrder on
SSLCipherSuite ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:ECDH+3DES:DH+3DES:RSA+AESGCM:RSA+AES:RSA+3DES:!aNULL:!MD5:!DSS
# 15638400 seconds is 181 days
# 63072000 seconds is 730 days
Header always set Strict-Transport-Security "max-age=15638400; includeSubdomains;"
Header always set X-Frame-Options DENY
ErrorLog /home/user1/logs/XXX.error_log
CustomLog /home/user1/logs/XXX.access_log combined
</VirtualHost>

Yeah, it didn’t seem likely, but then again it seemed to work for a bit…

And, just for kicks:
# apachectl -M
Loaded Modules:
core_module (static)
so_module (static)
http_module (static)
authn_file_module (shared)
mpm_prefork_module (shared)
authn_dbm_module (shared)
authn_core_module (shared)
authz_host_module (shared)
authz_groupfile_module (shared)
authz_user_module (shared)
authz_dbm_module (shared)
authz_core_module (shared)
access_compat_module (shared)
auth_basic_module (shared)
auth_digest_module (shared)
socache_shmcb_module (shared)
socache_dbm_module (shared)
reqtimeout_module (shared)
include_module (shared)
filter_module (shared)
mime_module (shared)
log_config_module (shared)
env_module (shared)
headers_module (shared)
setenvif_module (shared)
version_module (shared)
proxy_module (shared)
proxy_fcgi_module (shared)
ssl_module (shared)
unixd_module (shared)
dav_module (shared)
status_module (shared)
autoindex_module (shared)
cgi_module (shared)
dav_fs_module (shared)
vhost_alias_module (shared)
dir_module (shared)
userdir_module (shared)
alias_module (shared)
rewrite_module (shared)

# cat /www/XXX/.htaccess
Options +Includes +FollowSymLinks +MultiViews

Re: 0 length robot.txt

By LuKreme at 10/03/2018 - 19:11

On 03 Oct 2018, at 12:27, @lbutlr < ... at kreme dot com> wrote:
It’s been over 4 hours now (almost 5) and the site is still responding perfectly. I still have no idea what causes it to break when I uncomment the ProxyPass line, considering there is no php anywhere on the site other than a couple of hrefs to external sites.

Re: 0 length robot.txt

By LuKreme at 10/06/2018 - 19:51

On 03 Oct 2018, at 17:11, @lbutlr < ... at kreme dot com> wrote:
Well, I am more confused. I changed the log format from combined to debug and the site has been fine for days now.

- CustomLog /home/user/logs/XXX.access_log combined
+ CustomLog /home/user/logs/XXX.access_log debug

This was a mistake, as it simply logs “debug” now, so the logs are useless, but the site is up.

In httpd.conf:
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined

¯\_(ツ)_/¯

Re: Re: 0 length robot.txt

By Filipe Cifali at 10/06/2018 - 19:59

It's described in the CustomLog docs:
https://httpd.apache.org/docs/current/mod/mod_log_config.html#customlog

"The second argument specifies what will be written to the log file. It can specify either a *nickname* defined by a previous LogFormat directive, or it can be an explicit *format* string as described in the log formats section."

Either use it this way:

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
CustomLog /home/user/logs/XXX.access_log combined

Or this way:

CustomLog "/home/user/logs/XXX.access_log" "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\""

You see, "combined" is just a nickname for the LogFormat. You can define something like "my-site-special-log-format", and as long as you refer to it in the CustomLog it will work, because it's just an alias.
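
For instance, this works just as well (a sketch with a made-up nickname and a shortened format string):

LogFormat "%h %t \"%r\" %>s %b" my-site-special-log-format
CustomLog /home/user/logs/XXX.access_log my-site-special-log-format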

Re: 0 length robot.txt

By LuKreme at 10/06/2018 - 21:03

On 06 Oct 2018, at 17:59, Filipe Cifali <cifali. ... at gmail dot com> wrote:
Yes, I know this. The oddity is simply that changing it to what is essentially a nonsense setting has kept the site from breaking, exactly as disabling the proxy did.

Re: Re: 0 length robot.txt

By Filipe Cifali at 10/03/2018 - 20:27

Lewis,

you can, for example, turn the log level up to debug and access the site; tailing the logs should provide some information about what is breaking. Also, why do you have a ProxyPass on a virtualhost that doesn't run any PHP? Create a template without that config and use it.

Re: 0 length robot.txt

By LuKreme at 10/04/2018 - 13:45

On 03 Oct 2018, at 18:27, Filipe Cifali <cifali. ... at gmail dot com> wrote:
Is it possible to set the log level just for a virtual host? I thought that was a server-wide setting. I tried adding

LogLevel warn rewrite:trace8

to the virtual host and didn’t get an error on starting apache, but the http-error log for the site didn’t appear any different.
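
For what it's worth, LogLevel is accepted in virtualhost context in 2.4, and a per-module override like rewrite:trace8 only raises mod_rewrite's verbosity, so with no RewriteRules in play the log would look unchanged. Since the failing module here is proxy_fcgi, a sketch aimed at it instead (trace4 is an arbitrary depth), placed inside the problem <VirtualHost>:

LogLevel info proxy_fcgi:trace4 proxy:trace2

should make the FastCGI handoff much chattier in the error log.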

All the sites are set up for php so that I don’t have to get an email, go edit a file, and restart apache just because someone wants to put some php code in their page.

At least today it is failing immediately, so debugging should be easier.

Re: Re: 0 length robot.txt

By Filipe Cifali at 10/04/2018 - 13:50

You want to use a CustomLog in the virtualhost config to gather as much info as you can from the request:

https://httpd.apache.org/docs/current/mod/mod_log_config.html#customlog

Also, read the *Context* entry so you know where each directive can be used:

https://httpd.apache.org/docs/2.4/mod/core.html#LogLevel

Re: 0 length robot.txt

By LuKreme at 10/04/2018 - 14:54

On 04 Oct 2018, at 11:50, Filipe Cifali <cifali. ... at gmail dot com> wrote:
Ugh. That is a terrible bit of documentation written by and for people who don’t need documentation.

It would be nice if there was something that clearly explained all of this, especially considering how it’s changed since 2.2.

I’ve enabled the proxy and set CustomLog /path/log debug

Everything has been working for a bit now; this is annoying. :/

Re: Re: 0 length robot.txt

By Filipe Cifali at 10/04/2018 - 15:20

It's a bit strange to say that considering there is a page covering the changes from 2.2 to 2.4:

https://httpd.apache.org/docs/2.4/upgrading.html

And as for the docs: this project is open source; we can change (or rather, propose changes to) the documentation anytime we want.

Re: 0 length robot.txt

By LuKreme at 10/04/2018 - 16:14

On 04 Oct 2018, at 13:20, Filipe Cifali <cifali. ... at gmail dot com> wrote:
Sure, but first you have to figure out the multiple layers of complexity in the current docs.