DevHeads.net

Package segfaults when built with -O2 but not with -O0

One of my packages, pptp, suffers occasional segfaults as reported in
<a href="http://bugzilla.redhat.com/749455" title="http://bugzilla.redhat.com/749455">http://bugzilla.redhat.com/749455</a>. However, whilst investigating this,
it seems to be the case that simply rebuilding the package using no
optimization (-O0) as opposed to the default -O2 is enough to stop this
happening.

This raises two questions (at least!):

1. Is it reasonable for me to flout the packaging guidelines by
rebuilding with -O0 until this is resolved?

2. How to determine what the actual problem is, e.g. a problem with the
way the code is written leading to unsafe optimizations, or a gcc bug?

Paul.

Comments

Re: Package segfaults when built with -O2 but not with -O0

By Gregory Maxwell at 11/18/2011 - 11:36

On Fri, Nov 18, 2011 at 6:31 AM, Paul Howarth <paul@city-fan.org> wrote:
[Obviously Andrew's look at warnings advice is good but also…]

See if you can reproduce it when compiled with -O2 -fno-strict-aliasing
if not, then the package violates C rules for pointer aliasing and
-Wstrict-aliasing=n can help you find the bug.

Otherwise, reproduce the crash while running in valgrind and it's pretty
likely that you'll see the cause there. (e.g. some use-uninitilized which
turns out to be harmless in O0)

Between these two that covers a pretty significant portion of bugs that
vanish at O0.

Re: Package segfaults when built with -O2 but not with -O0

By Andrew Haley at 11/18/2011 - 11:28

On 11/18/2011 11:31 AM, Paul Howarth wrote:
You're just going to have to debug it.

To start with, build with warnings, and look at them all. If none of
that works and you're stuck, I'll have a look.

This is unlikely to be a gcc bug.

Does the upstream package segfault?

Andrew.

Re: Package segfaults when built with -O2 but not with -O0

By Paul Howarth at 11/18/2011 - 12:32

On Fri, 18 Nov 2011 15:28:27 +0000

It already builds with -Wall and there are no warnings:

<a href="http://koji.fedoraproject.org/koji/getfile?taskID=3507432&amp;name=build.log" title="http://koji.fedoraproject.org/koji/getfile?taskID=3507432&amp;name=build.log">http://koji.fedoraproject.org/koji/getfile?taskID=3507432&amp;name=build.log</a>

Upstream's Makefile uses -O0 and doesn't appear to segfault (probably
as a result).

Paul.

Re: Package segfaults when built with -O2 but not with -O0

By Ralf Corsepius at 11/18/2011 - 13:53

On 11/18/2011 05:32 PM, Paul Howarth wrote:
Adding a couple of more agressive options, this is what happens to pptp:

orckit_quirks.c:65:2: warning: missing initializer
[-Wmissing-field-initializers]
orckit_quirks.c:65:2: warning: (near initialization for
'fixed_packet.header.pptp_type') [-Wmissing-field-initializers]
pptp.c:147:21: warning: unused parameter 'sig' [-Wunused-parameter]
pptp.c:153:19: warning: unused parameter 'sig' [-Wunused-parameter]
pptp.c:459:33: warning: dereferencing type-punned pointer might break
strict-aliasing rules [-Wstrict-aliasing]
pptp.c:492:65: warning: unused parameter 'argc' [-Wunused-parameter]
pptp_callmgr.c:105:48: warning: unused parameter 'envp' [-Wunused-parameter]
pptp_callmgr.c:202:48: warning: dereferencing type-punned pointer might
break strict-aliasing rules [-Wstrict-aliasing]
pptp_callmgr.c:215:17: warning: ignoring return value of 'read',
declared with attribute warn_unused_result [-Wunused-result]
pptp_callmgr.c:216:17: warning: ignoring return value of 'read',
declared with attribute warn_unused_result [-Wunused-result]
pptp_callmgr.c:325:29: warning: dereferencing type-punned pointer might
break strict-aliasing rules [-Wstrict-aliasing]
pptp_callmgr.c:330:28: warning: dereferencing type-punned pointer might
break strict-aliasing rules [-Wstrict-aliasing]
pptp_callmgr.c:358:25: warning: dereferencing type-punned pointer might
break strict-aliasing rules [-Wstrict-aliasing]
pptp_callmgr.c:368:44: warning: unused parameter 'inetaddr'
[-Wunused-parameter]
pptp_callmgr.c:39:29: warning: unused parameter 'sig' [-Wunused-parameter]
pptp_callmgr.c:44:29: warning: unused parameter 'sig' [-Wunused-parameter]
pptp_callmgr.c:71:18: warning: ignoring return value of 'write',
declared with attribute warn_unused_result [-Wunused-result]
pptp_ctrl.c:1062:13: warning: function declaration isn't a prototype
[-Wstrict-prototypes]
pptp_ctrl.c:177:1: warning: function declaration isn't a prototype
[-Wstrict-prototypes]
pptp_ctrl.c:206:31: warning: comparison between signed and unsigned
integer expressions [-Wsign-compare]
pptp_ctrl.c:245:13: warning: comparison between signed and unsigned
integer expressions [-Wsign-compare]
pptp_ctrl.c:535:14: warning: comparison between signed and unsigned
integer expressions [-Wsign-compare]
pptp_gre.c:237:27: warning: comparison between signed and unsigned
integer expressions [-Wsign-compare]
pptp_gre.c:241:30: warning: comparison between signed and unsigned
integer expressions [-Wsign-compare]
pptp_gre.c:249:18: warning: comparison between signed and unsigned
integer expressions [-Wsign-compare]
pptp_gre.c:260:23: warning: comparison between signed and unsigned
integer expressions [-Wsign-compare]
pptp_gre.c:441:26: warning: comparison between signed and unsigned
integer expressions [-Wsign-compare]
pptp_gre.c:446:23: warning: comparison between signed and unsigned
integer expressions [-Wsign-compare]
pptp_gre.c:493:27: warning: comparison between signed and unsigned
integer expressions [-Wsign-compare]
pptp_gre.c:527:19: warning: comparison between signed and unsigned
integer expressions [-Wsign-compare]
pptp_gre.c:74:10: warning: function declaration isn't a prototype
[-Wstrict-prototypes]
pptp_gre.c:85:19: warning: nested extern declaration of 'localbind'
[-Wnested-externs]
pptp_gre.c:92:29: warning: dereferencing type-punned pointer might break
strict-aliasing rules [-Wstrict-aliasing]
pptp_gre.c:99:28: warning: dereferencing type-punned pointer might break
strict-aliasing rules [-Wstrict-aliasing]
pptp_quirks.c:33:5: warning: function declaration isn't a prototype
[-Wstrict-prototypes]
pptp_quirks.h:56:1: warning: function declaration isn't a prototype
[-Wstrict-prototypes]
pqueue.c:220:11: warning: function declaration isn't a prototype
[-Wstrict-prototypes]
pqueue.h:27:1: warning: function declaration isn't a prototype
[-Wstrict-prototypes]
routing.c:117:8: warning: ignoring return value of 'fgets', declared
with attribute warn_unused_result [-Wunused-result]
routing.c:125:6: warning: function declaration isn't a prototype
[-Wstrict-prototypes]
routing.c:155:6: warning: function declaration isn't a prototype
[-Wstrict-prototypes]
routing.h:2:1: warning: function declaration isn't a prototype
[-Wstrict-prototypes]
routing.h:3:1: warning: function declaration isn't a prototype
[-Wstrict-prototypes]
test.c:145:14: warning: comparison between signed and unsigned integer
expressions [-Wsign-compare]
test.c:174:27: warning: function declaration isn't a prototype
[-Wstrict-prototypes]
test.c:55:16: warning: comparison between signed and unsigned integer
expressions [-Wsign-compare]
test.c:99:14: warning: comparison between signed and unsigned integer
expressions [-Wsign-compare]
test.h:5:8: warning: function declaration isn't a prototype
[-Wstrict-prototypes]
util.c:121:8: warning: ignoring return value of 'write', declared with
attribute warn_unused_result [-Wunused-result]
util.c:136:5: warning: function declaration isn't a prototype
[-Wstrict-prototypes]
util.c:142:5: warning: function declaration isn't a prototype
[-Wstrict-prototypes]
util.c:145:7: warning: ignoring return value of 'read', declared with
attribute warn_unused_result [-Wunused-result]
util.c:149:6: warning: function declaration isn't a prototype
[-Wstrict-prototypes]
util.c:90:5: warning: function declaration isn't a prototype
[-Wstrict-prototypes]
util.h:38:1: warning: function declaration isn't a prototype
[-Wstrict-prototypes]
util.h:47:1: warning: function declaration isn't a prototype
[-Wstrict-prototypes]
util.h:50:1: warning: function declaration isn't a prototype
[-Wstrict-prototypes]
util.h:52:1: warning: function declaration isn't a prototype
[-Wstrict-prototypes]
vector.c:38:9: warning: function declaration isn't a prototype
[-Wstrict-prototypes]
vector.h:15:1: warning: function declaration isn't a prototype
[-Wstrict-prototypes]

I can't spot anything obvious, but each of these are potential candiates
for issues and should be inspected.

If being hostile, one would be tempted to consider this to be an attempt
to play down known issues of sloppy programming :-)

Ralf

Re: Package segfaults when built with -O2 but not with -O0

By Paul Howarth at 12/01/2011 - 06:23

Ralf,

On 11/18/2011 05:53 PM, Ralf Corsepius wrote:
What were the "couple of more agressive options" you used, and which
distro? I've tried "-Wall -Wextra" and still can't provoke the strict
aliasing warning locally.

Paul.

Re: Package segfaults when built with -O2 but not with -O0

By Ralf Corsepius at 12/01/2011 - 10:45

On 12/01/2011 11:23 AM, Paul Howarth wrote:
This is what I did (rebased against today's git):

diff --git a/pptp.spec b/pptp.spec
index 289dd08..78e36ca 100644
--- a/pptp.spec
+++ b/pptp.spec
@@ -86,7 +86,7 @@ tunnels.
perl -pi -e 's/install -o root -m 555 pptp/install -m 755 pptp/;' Makefile

%build
-make %{?_smp_mflags} CFLAGS="-Wall %{optflags}" IP=/sbin/ip
+make %{?_smp_mflags} CFLAGS="-Wall %{optflags} -Wextra
-Wstrict-aliasing -Wnested-externs -Wstrict-prototypes" IP=/sbin/ip

%install
rm -rf %{buildroot}

Local rawhide mock.

The result I posted was extracted from a "fedpkg mockbuild"'s build.log
in a local mock on fedora-16/x86_64.

Today's version looks like this:

# grep warning: results_pptp/1.7.2/13.fc17/build.log | grep punned
pptp_gre.c:92:29: warning: dereferencing type-punned pointer might break
strict-aliasing rules [-Wstrict-aliasing]
pptp_gre.c:99:28: warning: dereferencing type-punned pointer might break
strict-aliasing rules [-Wstrict-aliasing]
pptp_callmgr.c:202:48: warning: dereferencing type-punned pointer might
break strict-aliasing rules [-Wstrict-aliasing]
pptp_callmgr.c:325:29: warning: dereferencing type-punned pointer might
break strict-aliasing rules [-Wstrict-aliasing]
pptp_callmgr.c:330:28: warning: dereferencing type-punned pointer might
break strict-aliasing rules [-Wstrict-aliasing]
pptp_callmgr.c:358:25: warning: dereferencing type-punned pointer might
break strict-aliasing rules [-Wstrict-aliasing]
pptp.c:459:33: warning: dereferencing type-punned pointer might break
strict-aliasing rules [-Wstrict-aliasing]

Ralf

Re: Package segfaults when built with -O2 but not with -O0

By Paul Howarth at 12/01/2011 - 14:11

On Thu, 01 Dec 2011 15:45:48 +0100

This is weird: I can't reproduce these either locally in mock or in
koji:

<a href="http://koji.fedoraproject.org/koji/taskinfo?taskID=3555838" title="http://koji.fedoraproject.org/koji/taskinfo?taskID=3555838">http://koji.fedoraproject.org/koji/taskinfo?taskID=3555838</a>

Paul.

Re: Package segfaults when built with -O2 but not with -O0

By Ralf Corsepius at 12/02/2011 - 00:12

On 12/01/2011 07:11 PM, Paul Howarth wrote:
... mock appends build.log's from subsequent mock runs ...

... the warnings above stem from older builds.

I had not cleaned up the build.log when "git merging" your recent
changes into my git checkout.

Ralf

Re: Package segfaults when built with -O2 but not with -O0

By Paul Howarth at 12/02/2011 - 10:08

On 12/02/2011 04:12 AM, Ralf Corsepius wrote:
Hmm, that sort of explains it - except that even if I don't apply any of
the last few patches since the F14 version, I *still* can't provoke a
strict-aliasing warning.

Paul.

Re: Package segfaults when built with -O2 but not with -O0

By Paul Howarth at 12/06/2011 - 08:22

On 12/02/2011 02:08 PM, Paul Howarth wrote:
Got it: I had to use -Wstrict-aliasing=2

Paul.

Re: Package segfaults when built with -O2 but not with -O0

By Kevin Kofler at 12/01/2011 - 16:19

Paul Howarth wrote:
-Wstrict-aliasing only works if -fstrict-aliasing is enabled, so if you're
using -fno-strict-aliasing (or an optimization level below -O2 –
-fstrict-aliasing is only enabled by default at -O2, -O3 or -Os), you won't
get the warnings.

-fno-strict-aliasing is a workaround for production builds, but it also
sweeps the warnings under the carpet.

Kevin Kofler

Re: Package segfaults when built with -O2 but not with -O0

By Paul Howarth at 12/01/2011 - 16:38

On Thu, 01 Dec 2011 21:19:14 +0100

Understood... but I'm using -O2 and not -fno-strict-aliasing:

gcc -Wall -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
-fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic
-Wextra -Wstrict-aliasing -Wnested-externs -Wstrict-prototypes -c -o
pptp_gre.o pptp_gre.c

Paul.

Re: Package segfaults when built with -O2 but not with -O0

By Andrew Haley at 11/18/2011 - 17:01

On 11/18/2011 05:53 PM, Ralf Corsepius wrote:
Bingo! Bugs like this must be fixed.

Andrew.

Re: Package segfaults when built with -O2 but not with -O0

By Tom Lane at 11/18/2011 - 19:32

Andrew Haley < ... at redhat dot com> writes:
Sometimes that's easier said than done. -fno-strict-aliasing might be
your friend.

-fwrapv is another good tool for keeping the compiler from breaking
traditional understandings of C semantics.

regards, tom lane

Re: Package segfaults when built with -O2 but not with -O0

By Andrew Haley at 11/19/2011 - 08:23

On 11/18/2011 11:32 PM, Tom Lane wrote:
It's always easier said than done! Still, one could argue
that -fno-strict-aliasing is a fix, kinda sorta. My point is
that you've got to do *something*; this warning must not be
ignored.

Andrew.

Re: Package segfaults when built with -O2 but not with -O0

By Ralf Corsepius at 11/19/2011 - 09:08

On 11/19/2011 01:23 PM, Andrew Haley wrote:
The code in question is a classic of the GCC-aliasing cases:
sockaddr vs. sockaddr_{in,un,..} casts.

The standard work-around to the GCC warnings would be using unions,
containing fields of these types. Whether these spots actually are
affected by aliasing issues, is a different matter.

[I have seen cases of this sort, where aliasing actually had malicious
effects, but I've also seen cases it didn't.]

Ralf

Re: Package segfaults when built with -O2 but not with -O0

By Paul Howarth at 11/18/2011 - 14:58

On Fri, 18 Nov 2011 18:53:03 +0100

Hmm, thanks for that. I naively assumed that -Wall was "all warnings"!
I'll look into those.

Actually the comment that went with the original commit (a long time
ago) was that it was to make debugging easier.

Paul.

Re: Package segfaults when built with -O2 but not with -O0

By Kevin Kofler at 11/18/2011 - 20:31

Paul Howarth wrote:
Indeed, -Wall is not really all. :-) -Wall -Wextra is closer to all, but
there are still some things those won't warn about, e.g. -Wwrite-strings
catches places which use a string literal as a potentially writable char *
instead of a const char *.

Kevin Kofler

Re: Package segfaults when built with -O2 but not with -O0

By Jussi Lehtola at 11/18/2011 - 12:43

On Fri, 18 Nov 2011 16:32:23 +0000
Paul Howarth <paul@city-fan.org> wrote:
(clip)

.. but <a href="https://bugzilla.redhat.com/show_bug.cgi?id=749455" title="https://bugzilla.redhat.com/show_bug.cgi?id=749455">https://bugzilla.redhat.com/show_bug.cgi?id=749455</a> it's shown
that the upstream makefile uses "-Wall -O -Wuninitialized -g".
According to the gcc man page, -O is equal to -O1.

Re: Package segfaults when built with -O2 but not with -O0

By Paul Howarth at 11/18/2011 - 14:56

On Fri, 18 Nov 2011 18:43:30 +0200

Ah, I could have sworn it explicitly said "-O0".

More importantly, the bug reporter now tells me that the -O0 build has
now segfaulted - it just took longer to happen that way. So I'll go
back to more conventional debugging now. Thanks anyway.

Paul.