Talk:Ioctl



2007–2009

I think this is a fantastic article, but it may have to be rewritten more for a general audience. I'm sorry; I get it, but some may not.

I'm not married to it; what was there before was inaccurate. What do you think it needs? --- tqbf 03:37, 20 November 2007 (UTC)

Hi, please add some examples of ioctl function calls. It would be nice to see the inputs and outputs. Thanks! —Preceding unsigned comment added by 206.114.9.2 (talk) 15:28, 21 October 2009 (UTC)

Unfortunately, for experts it is really hard to determine which bits of knowledge a general audience would lack. As they are so natural to us, we can’t see them anymore. So: the better you can describe all the points where you got stuck and the questions left unanswered (or even where you didn’t know what to ask), the better we can improve the article. :) — 2A0A:A546:3214:1:13C4:FA28:421:E907 (talk) 14:17, 28 November 2024 (UTC)

References

User:Widefox recently tagged this article with {{refimprove}}. I've just added a list of references to the article, but they don't cover the whole article. Actually, I suspect that there may not be any WP:Reliable Sources for some of the claims, even though they are common knowledge amongst experienced Unix programmers.

I did not mention The Art of Unix Programming by Eric S. Raymond, which has a subsection titled "ioctl(2) and fcntl(2) Are an Embarrassment" (in chapter 20, online here). Should we use that in the article? If so, how? Does anyone have more sources? Cheers, CWC 11:34, 8 March 2010 (UTC)

If ioctls are an embarrassment, then netlink is an eyesore from hell. :) … Thing is: It’s all the result of refusing to just have a real microkernel and a clean, general, highly emergent interface like Plan 9 had, even though the speed problems were solved well a long time ago, as QNX proved. … I would like it if the article fully reflected that. There are too many people nowadays who grew up never having been taught any better, inventing various solutions that are all inferior to (and not even faster than) the ancient clean design. :) We would benefit the new kids of today by teaching that from the beginning. — 2A0A:A546:3214:1:13C4:FA28:421:E907 (talk) 14:25, 28 November 2024 (UTC)

Complexity of ioctl()

In the complexity paragraph it is said that one needs a "tangled mess of ioctl()s" to get an IP. That's BS. You only need one. Like so: ioctl(sockfd, SIOCGIFADDR, &req); where sockfd is an int socket descriptor and req is a struct ifreq. And voilà, ((struct sockaddr_in *)&req.ifr_addr)->sin_addr yields the IP. Seriously, ioctl()s aren't complex; they are very easy to use and understand. It's only that there are hundreds of request codes (see /usr/include/linux/sockios.h on a Linux machine for a reference).

Maybe some example program should be shown.

The program below will output the IPv4 address of the device 'lo' (local loopback on Linux, called lo0 on *BSD), which is of course 127.0.0.1. Substitute lo in the macro definition with any device whose IPv4 address you want. The program will compile cleanly and can be run without root rights on Linux (and probably every other UNIX-like OS).

#include <sys/ioctl.h>  /* ioctl() */
#include <sys/socket.h> /* socket types, address families */
#include <net/if.h>     /* struct ifreq */
#include <arpa/inet.h>  /* inet_ntop() */
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h> /* close() */

#define DEVICE "lo"

int main(void)
{
    int sockfd; /* file descriptor returned by socket(), it must be passed to ioctl() */
    char buf[INET_ADDRSTRLEN]; /* buffer for inet_ntop() */
    struct ifreq req = { 0 }; /* initialize to everything 0/NULL (better than memset()) */
    struct in_addr ip;

    strncpy(req.ifr_name, DEVICE, IF_NAMESIZE - 1); /* the device name must be passed to ioctl(); the last byte stays 0 so the name is always terminated */

    sockfd = socket(AF_INET, SOCK_DGRAM, 0); /* SOCK_DGRAM or SOCK_RAW .... doesn't matter */
    if (sockfd == -1) {
        perror("Error: socket()");
        exit(EXIT_FAILURE);
    }

    if (ioctl(sockfd, SIOCGIFADDR, &req) == -1) { /* SIOCGIFADDR: gets us the address(see netdevice(7) on Linux) */
        perror("Error: ioctl()");
        close(sockfd);
        exit(EXIT_FAILURE);
    }

    close(sockfd);

    ip = ((struct sockaddr_in *)&req.ifr_addr)->sin_addr; /* ifr_addr is just a generic sockaddr structure; must be cast to sockaddr_in to get IPV4 */

    printf("IP for device " DEVICE ": %s\n", inet_ntop(AF_INET, &ip, buf, INET_ADDRSTRLEN));

    return EXIT_SUCCESS;
}

Maybe that code, with (now added) commentary, could make it as an example of ioctl() usage?

80.226.24.13 (talk) 21:47, 25 January 2013 (UTC)

I fixed the paragraph. You are correct. I just looked into this for the past week, and: Nobody calls ioctls directly. At least no normal developer (except for driver developers) should. There’s pretty much always a nice library with very normal function calls that wrap them, the same way that libc wraps syscalls. I also added Mesa as a large (Linux) example. (There is no accelerated X or Wayland without Mesa. I literally just wrote my own “display server” using Mesa that requires neither X nor Wayland. Evidence can’t get much harder than that. :) — 2A0A:A546:3214:1:13C4:FA28:421:E907 (talk) 14:13, 28 November 2024 (UTC)

Request for sources…

Hi, I improved the “Security” section. But it implied that ioctls are less heavily audited than syscalls. I have kept that implication, as the given arguments are good, and removing it would imply it is false, which is just as problematic. But I feel uneasy with just having the statement “In practice, this is not the case.” based merely on arguments. It would be better if we had some actual sources. (E.g. auditing reports.)
Can somebody who is closer to the auditing industry add some? They should be easy to find if one deals with this on a professional basis.
— 2A0A:A546:3214:1:13C4:FA28:421:E907 (talk) 14:09, 28 November 2024 (UTC)

Some questions about the Security section

It says

In traditional design, kernels resided in ring 0, separated from device drivers in ring 1, and in microkernels, also from each other. This has largely been given up due adding the same overhead of transitioning between rings to driver/kernel interfaces, that syscalls impose on kernel/user space interfaces. This has led to the difficult-in-practice requirement that all drivers, which now reside in ring 0 as well, must uphold the same level of security as the kernel core.

Which "traditional design" is that?

The first OS to provide protection rings was Multics, which originally implemented them in software on the GE 645, and later used the ring-protection hardware feature added to the Honeywell 6180 and continued with its successors. This 1968 memorandum speaks of "a ring 1 procedure, such as those in user control or in the I/O system", and this 1985 memorandum speaks of "ring 1 TCB subsystems such as RCP", but also speaks of "ring 0 file system primitives". ("TCB" presumably refers to the trusted computing base in this context.)

I'm not sure what ended up running in ring 1 in the GE 645 Multics, as ring crossings were somewhat expensive, and ring 1 functions, called from a user ring such as ring 4, that required ring 0 calls would increase the number of ring crossings over the number required if the function were implemented in ring 0. More was done in ring 1 on the 6180, as ring crossings were cheaper (implemented in hardware rather than in traps to low-level system code), but I don't know whether device drivers were run in ring 1 or not.

Procedure calls through a segment marked as a "call gate" could transfer to a lower ring from a higher ring; the call gate segment would allow transfers only to selected addresses, avoiding the risk of jumping to an arbitrary location in lower-ring code. Returns from lower-ring code to higher-ring code were also permitted, but not returns from higher-ring code to lower-ring code. References to call gates in Multics on the GE 645 would cause a trap to lower-level code that would check permissions and switch segment tables so as to give the code being jumped to more permission than the calling code; similarly, a return from lower-ring code would trap to low-level code that would switch the segment tables back. The 6180 and successors performed the privilege checks in hardware.

Some other hardware provided multiple privilege levels, either in the form of numbered rings or multiple privilege modes. Some provided only two such modes, e.g. the IBM System/360 and its successors ("supervisor state" and "problem state"), the PDP-6/PDP-10 ("monitor mode" and "user mode"), the GE-600 series ("master mode" and "slave mode"), etc. The PDP-11/45 had three modes: "kernel mode", "supervisor mode", and "user mode"; some later models had all three modes, other models had only two (they omitted "supervisor mode"), and some low-end machines had only one fully-privileged mode. The VAX had four modes ("kernel mode", "executive mode", "supervisor mode", and "user mode"). The machines with multiple privileged modes generally transitioned from a less-privileged mode to a more-privileged mode with a trap instruction, and returned from a more-privileged mode to a less-privileged mode with a return-from-trap instruction.

The Data General MV series had 8 protection rings; the 4GB address space was divided into 8 segments, each of which belonged to a ring, and the upper 3 bits of the PC indicated in which ring code was running. A procedure call to an address with a lower value for the upper 3 bits caused a ring crossing; the call had to go through a "gate array" (unrelated to a hardware "gate array"), so only a selected set of addresses could be jumped to, and the gate array could also limit which rings could go through a particular gate. This is similar to how ring crossings worked in Multics. Subroutine returns could return to the same or a higher ring, but not to a lower ring.

The 80286 and beyond have four rings, and also use call gates to support calls from higher rings to lower rings. They require segmentation to do so.

Unix, on PDP-11s preceding the 11/45, ran on hardware that didn't support even two privilege levels. The PDP-11/45 version of Unix only used two levels, kernel and user; it didn't support supervisor mode. (MERT did; it ran a smaller kernel in kernel mode, with device drivers included. Supervisor mode ran higher-level operating-system code, such as code that provided a Unix-compatible interface, and application code ran in user mode.) That Unix was later ported to the PDP-11/40, which had only kernel and user mode, so not using supervisor mode was an advantage for that machine and later PDP-11s that had a memory management unit and support for both kernel and user mode - required for regular Unix - but lacked supervisor mode.

VAX/VMS used all four modes on the VAX. As far as I know, all drivers run in kernel mode; the Record Management Services (RMS, no relation to the other RMS) run in executive mode, and send QIOs to kernel-mode device and file-system drivers to perform device and file system operations. The command language interpreter runs in supervisor mode, with applications running in user mode.

VAX UNIXes use only kernel and user mode.

On x86 processors, neither Unixes nor Windows NT, as far as I know, use rings other than 0 and 3. For Unix, this stems from having originally been developed to use only two privilege modes, and later being developed as a portable operating system in a world in which many target machines had only two privilege modes, so it uses ring 0 as kernel mode and ring 3 as user mode. NT was similar, except that it started from Day One as a portable OS in a world in which many target machines had only two privilege modes. Perhaps OS/2 used ring 1 for drivers; did any other operating systems do so?

I.e., for whatever reason, the first Unix that supported running different code with different privilege levels used only "maximum privilege" and "minimum privilege" modes, which may or may not have had anything to do with the cost of privilege level transitions. To what extent that caused processor developers to build in only two privilege levels, and to what extent that was done for other reasons, reinforcing the Unix behavior, is another matter; somebody would have to provide some references for claims about that, so it's not just a matter of original research.

Note also that "the kernel core" (for both most Unixes and NT) includes file system code and networking code, so that code also needs to "uphold the same level of security as the kernel core." However, that code is entered through a common system call interface for multiple file systems and networking stacks, so some of the work is done in common code rather than having to be done correctly in each file system or networking stack.

A limited amount of that is done, on some Un*xes, at the ioctl system call level rather than at the device driver level. Systems that adopted the BSD scheme, in which ioctl codes include not only two bytes to specify the particular ioctl operation but also two more bytes indicating whether data must be copied from userland, to userland, or both, and how much data should be copied, can validate access to the raw bytes of the argument and copy them in common code on behalf of the ioctl handler. However, what might be copied is a structure containing further pointers, and the code to copy in the targets of those pointers must reside in the driver itself, so the generality of ioctl is a disadvantage there. In addition, not all Un*xes have adopted that scheme.

NT inherits much of its I/O system structure from earlier operating systems with Dave Cutler as a or the senior designer. In RSX-11M and VMS, all I/O is done by "QIO" calls ("queue I/O"); this includes file system operations such as open, create, rename, delete, read, write, etc. I'm not sure how much validation can be done by common code; at least in VMS, I think many of those QIO operations can only be done from executive mode (where RMS runs), so they may already be validated. In NT, I think most of those operations are done through system calls (such as NtCreateFile(), NtReadFile(), and NtWriteFile()), so validation can be done there; it might be harder for DeviceIoControl() if a buffer handed to it as input can contain pointers - it may be that doing so is considered Bad Form. Guy Harris (talk) 10:47, 26 December 2024 (UTC)