Signs of Triviality

Opinions, mostly my own, on the importance of being and other things.
[homepage]  [blog]  [jschauma@netmeister.org]  [@jschauma]  [RSS]

Uninitialized Stack Variables

November 24th, 2021

One of the benefits of using the C programming language is the significant amount of time spent on "Wtf?" only to ultimately arrive at "Oh, UB. Cool, cool." Having just spent some time debugging one of those cases and once again gone through another xkcd 979 experience, allow me to entertain you with this short tale:

While fiddling with IPC buffer sizes, I stumbled upon the seemingly bizarre scenario whereby on FreeBSD only I was unable to bind(2) a socket on ::1. Trying to reproduce the error, I arrived at the following PoC code snippet:

$ cat a.c
#include <arpa/inet.h>
#include <sys/socket.h>

#include <netinet/in.h>

#include <err.h>
#include <stdlib.h>

#define PORT 12345

int
main() {
	int sock;
	struct sockaddr_storage server;
	struct sockaddr_in6 *sin = (struct sockaddr_in6 *)&server;

	if ((sock = socket(PF_INET6, SOCK_STREAM, 0)) < 0) {
		err(EXIT_FAILURE, "socket");
		/* NOTREACHED */
	}

	if (inet_pton(PF_INET6, "::1", &(sin->sin6_addr)) != 1) {
		err(EXIT_FAILURE, "inet_pton");
		/* NOTREACHED */
	}

	sin->sin6_family = PF_INET6;
	sin->sin6_port = htons(PORT);

	if (bind(sock, (struct sockaddr *)sin, sizeof(*sin)) != 0) {
		err(EXIT_FAILURE, "bind");
		/* NOTREACHED */
	}

	return 0;
}
$ cc a.c
$ ./a.out
a.out: bind: Can't assign requested address
$ 

Failing with EADDRNOTAVAIL here surprised me, especially since using in6addr_any did work just fine (and did in fact allow connections on ::1). But things got even more confusing when, during some debugging, I ended up with a variation that did work:

$ diff -bu [ab].c
--- a.c	2021-11-24 15:28:05.269953000 +0000
+++ b.c	2021-11-24 15:27:51.813923000 +0000
@@ -11,6 +11,7 @@
 int
 main() {
 	int sock;
+	socklen_t length, s_size;
 	struct sockaddr_storage server;
 	struct sockaddr_in6 *sin = (struct sockaddr_in6 *)&server;
$ cc b.c
$ ./a.out
$ echo $?
0
$ 

So yeah... Wtf? What's different here? All we have is two extra variables, entirely unused. How does that change the behavior of the program? Well... to the debugger!

a.c:

$ cc -g a.c -o a
$ gdb -q ./a
Reading symbols from ./a...
(gdb) br main
Breakpoint 1 at 0x20197b: file a.c, line 15.
(gdb) run
Starting program: /usr/home/jschauma/a 

Breakpoint 1, main () at a.c:15
15		struct sockaddr_in6 *sin = (struct sockaddr_in6 *)&server;
(gdb) n
17		if ((sock = socket(PF_INET6, SOCK_STREAM, 0)) < 0) {
(gdb) p &server
$2 = (struct sockaddr_storage *) 0x7fffffffe9b8
(gdb) p &sin
$3 = (struct sockaddr_in6 **) 0x7fffffffe9b0
(gdb) 

b.c:

$ cc -g b.c -o b
$ gdb -q ./b
Reading symbols from ./b...
(gdb) br main
Breakpoint 1 at 0x20197b: file b.c, line 16.
(gdb) run
Starting program: /usr/home/jschauma/b 

Breakpoint 1, main () at b.c:16
16		struct sockaddr_in6 *sin = (struct sockaddr_in6 *)&server;
(gdb) n
18		if ((sock = socket(PF_INET6, SOCK_STREAM, 0)) < 0) {
(gdb) p &server
$1 = (struct sockaddr_storage *) 0x7fffffffe9b0
(gdb) p &sin
$2 = (struct sockaddr_in6 **) 0x7fffffffe9a8
(gdb) 

The only difference here is the location of server and sin, which makes sense: sizeof(socklen_t) is 4 here and we added two such variables, so both server and sin are pushed down the stack by 8 bytes. But why should that influence our program behavior?

Recall that per ISO/IEC 9899:201x, objects with automatic storage duration (i.e., non-static variables on the stack) that are not initialized explicitly have an indeterminate value, and using that value gets us into "undefined behavior" territory. Now what does a struct sockaddr_in6 look like and what are the respective member sizes?

Per /usr/include/netinet6/in6.h:

struct sockaddr_in6 {
        uint8_t         sin6_len;       /* length of this struct */
        sa_family_t     sin6_family;    /* AF_INET6 */
        in_port_t       sin6_port;      /* Transport layer port # */
        uint32_t        sin6_flowinfo;  /* IP6 flow information */
        struct in6_addr sin6_addr;      /* IP6 address */
        uint32_t        sin6_scope_id;  /* scope zone index */
};

(gdb) p sizeof(uint8_t)
$6 = 1
(gdb) p sizeof(sa_family_t)
$7 = 1
(gdb) p sizeof(in_port_t)
$8 = 2
(gdb) p sizeof(uint32_t)
$9 = 4
(gdb) p sizeof(struct in6_addr)
$10 = 16
(gdb) p sizeof(uint32_t)
$11 = 4
(gdb) 

With that in mind, let's take a look at what the indeterminate contents of *sin are in either case and how those bytes fill in the struct:

a.c:

(gdb) p sizeof(*sin)
$4 = 28
(gdb) x/28xb 0x7fffffffe9b8
0x7fffffffe9b8:	0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00
0x7fffffffe9c0:	0xc8	0xea	0xff	0xff	0xff	0x7f	0x00	0x00
0x7fffffffe9c8:	0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00
0x7fffffffe9d0:	0xb8	0xea	0xff	0xff
(gdb) p *sin
$5 = {sin6_len = 0 '\000', sin6_family = 0 '\000', sin6_port = 0,
  sin6_flowinfo = 0, sin6_addr = {__u6_addr = {
      __u6_addr8 = "\310\352\377\377\377\177\000\000\000\000\000\000\000\000\000",
      __u6_addr16 = {60104, 65535, 32767, 0, 0, 0, 0, 0}, __u6_addr32 = {
        4294961864, 32767, 0, 0}}}, sin6_scope_id = 4294961848}
(gdb) 

b.c:

(gdb) p sizeof(*sin)
$4 = 28
(gdb) x/28xb 0x7fffffffe9b0
0x7fffffffe9b0:	0x3f	0x40	0x00	0x00	0x00	0x00	0x00	0x00
0x7fffffffe9b8:	0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00
0x7fffffffe9c0:	0xc8	0xea	0xff	0xff	0xff	0x7f	0x00	0x00
0x7fffffffe9c8:	0x00	0x00	0x00	0x00
(gdb) p *sin
$5 = {sin6_len = 63 '?', sin6_family = 64 '@', sin6_port = 0,
  sin6_flowinfo = 0, sin6_addr = {__u6_addr = {
      __u6_addr8 = "\000\000\000\000\000\000\000\000\310\352\377\377\377\177\000",
      __u6_addr16 = {0, 0, 0, 0, 60104, 65535, 32767, 0}, __u6_addr32 = {0, 0,
        4294961864, 32767}}}, sin6_scope_id = 0}
(gdb) 

What we see here is how the location of the struct that sin points to influences how it is initialized -- or rather, what the indeterminate values happen to be. By moving the struct 8 bytes down on the stack, we are shifting the bytes 0xb8 0xea 0xff 0xff out of it, such that in b.c our sin->sin6_scope_id becomes 0x00 0x00 0x00 0x00 = 0, while in a.c it's (LSB first) 0xb8 0xea 0xff 0xff = 0xFFFFEAB8 = 4294961848.

In other words, since we don't explicitly set sin->sin6_scope_id in our code, bind(2) sees whatever happens to be there, here 4294961848. That means we are not trying to bind ::1, but the RFC 4007 ::1%<zoneid> value ::1%4294961848. And that is why our code then fails with EADDRNOTAVAIL, since our system has no such zone id.

It just so happens that when we had additional variables on the stack, we shifted the struct sockaddr_in6 such that the sin6_scope_id was accidentally initialized as zero. So the correct solution to this problem is to properly initialize our struct:

$ diff -bu a.c fixed.c 
--- a.c	2021-11-24 16:10:28.068271000 +0000
+++ fixed.c	2021-11-24 18:13:17.008860000 +0000
@@ -5,6 +5,7 @@
 
 #include <err.h>
 #include <stdlib.h>
+#include <string.h>
 
 #define PORT 12345
 
@@ -13,6 +14,8 @@
 	int sock;
 	struct sockaddr_storage server;
 	struct sockaddr_in6 *sin = (struct sockaddr_in6 *)&server;
+
+	memset(sin, 0, sizeof(*sin));
 
 	if ((sock = socket(PF_INET6, SOCK_STREAM, 0)) < 0) {
 		err(EXIT_FAILURE, "socket");
$ cc fixed.c 
$ ./a.out
$ echo $?
0
$ 

By the way, I normally compile all code with -Wall -Werror -Wextra, which includes -Wuninitialized. But that would not have warned us here: sin itself is initialized (it points to &server); it is server that is not.

If we had moved server outside of main, then, as a global variable with static storage duration, it would have been zero-initialized (and typically placed in the BSS segment), so sin would have pointed at all zero values and our program would have succeeded. (See this video lecture segment for more details about the memory layout of a process.) Similarly, initializing struct sockaddr_storage server = { 0 }; would accomplish the same result.

Finally, as we observe here once more, writing C leaves us (necessarily) at the whims of the compiler: FreeBSD 13.0-RELEASE happens to use clang, and gcc(1) would have failed in either of our two scenarios. So one question that arises is whether compilers should perhaps auto-initialize stack variables.

clang has a discussion around this, as does gcc, but there does not seem to be an agreed-upon conclusion. Considering the possible security implications, it does seem to me that it would be a Good Thing™ to at least move away from having uninitialized variables by default and instead require an explicit request from the programmer (say, by way of an attribute?) that a given stack variable not be initialized. But I honestly don't know what the performance impact of this would be.

Either way, I'm going to make it a habit to memset(3) my structs going forward...
