Thursday, July 5, 2012

wtmp timeline efforts


In DFIR work, timelines are often decisive for understanding what happened (lots of refs here). Luckily, Kristinn Gudjonsson gave the community the great log2timeline tool (here; l2t from now on) which, along with Brian Carrier's invaluable Sleuthkit, brings (temporal) order to chaos. However, l2t does not currently consider the valuable artifacts found in wtmp/btmp files on Linux systems.

wtmp (utmp? btmp!)


For a quick introduction to these files, here is what Wikipedia says about them: "utmp, wtmp, btmp and variants such as utmpx, wtmpx and btmpx are files on Unix-like systems that keeps track of all logins and logouts to the system. The utmp file keeps track of the current login state of each user. The wtmp file records all logins and logouts history. The btmp file records failed login attempts. The utmp, wtmp and btmp files were never a part of any official Unix standard, such as Single UNIX Specification, while utmpx and corresponding APIs are part of it". Here we are.

Including this information in the timeline of (for example) a compromised Linux server can help a lot in answering the "who" part of the canonical DFIR questions. Moreover, knowing who is currently logged in at every recorded login/logout is really useful. Indeed, it is preferable to have a fairly verbose timeline to prune and filter during analysis than to leave these logs out (and be forced to correlate the various outputs manually, once they are spotted).
 

format


wtmp, btmp and utmp (not considered here, but worth considering during live/memory analysis) share a common format. Since the target OS is Linux, the format can be found in the utmp.h include file or, more easily, in the utmp(5) man page. An excerpt follows:

           /* Values for ut_type field, below */

           #define EMPTY         0 /* Record does not contain valid info
                                      (formerly known as UT_UNKNOWN on Linux) */
           #define RUN_LVL       1 /* Change in system run-level (see
                                      init(8)) */
           #define BOOT_TIME     2 /* Time of system boot (in ut_tv) */
           #define NEW_TIME      3 /* Time after system clock change
                                      (in ut_tv) */
           #define OLD_TIME      4 /* Time before system clock change
                                      (in ut_tv) */
           #define INIT_PROCESS  5 /* Process spawned by init(8) */
           #define LOGIN_PROCESS 6 /* Session leader process for user login */
           #define USER_PROCESS  7 /* Normal process */
           #define DEAD_PROCESS  8 /* Terminated process */
           #define ACCOUNTING    9 /* Not implemented */

           #define UT_LINESIZE      32
           #define UT_NAMESIZE      32
           #define UT_HOSTSIZE     256

           struct exit_status {              /* Type for ut_exit, below */
               short int e_termination;      /* Process termination status */
               short int e_exit;             /* Process exit status */
           };

           struct utmp {
               short   ut_type;              /* Type of record */
               pid_t   ut_pid;               /* PID of login process */
               char    ut_line[UT_LINESIZE]; /* Device name of tty - "/dev/" */
               char    ut_id[4];             /* Terminal name suffix,
                                                or inittab(5) ID */
               char    ut_user[UT_NAMESIZE]; /* Username */
               char    ut_host[UT_HOSTSIZE]; /* Hostname for remote login, or
                                                kernel version for run-level
                                                messages */
               struct  exit_status ut_exit;  /* Exit status of a process
                                                marked as DEAD_PROCESS; not
                                                used by Linux init(8) */
               /* The ut_session and ut_tv fields must be the same size when
                  compiled 32- and 64-bit.  This allows data files and shared
                  memory to be shared between 32- and 64-bit applications. */

           #if __WORDSIZE == 64 && defined __WORDSIZE_COMPAT32
               int32_t ut_session;           /* Session ID (getsid(2)),
                                                used for windowing */
               struct {
                   int32_t tv_sec;           /* Seconds */
                   int32_t tv_usec;          /* Microseconds */
               } ut_tv;                      /* Time entry was made */
           #else
                long   ut_session;           /* Session ID */
                struct timeval ut_tv;        /* Time entry was made */
           #endif

               int32_t ut_addr_v6[4];        /* Internet address of remote
                                                host; IPv4 address uses
                                                just ut_addr_v6[0] */
               char __unused[20];            /* Reserved for future use */
           };


Semantics aside, it should be easy to parse the data from these files. But when you start delving into something, it often turns out that few things are straightforward...


Alignment oddity?


The following is an example of the first entry in a wtmp file, where an entry is an instance of struct utmp.


00000000  02 00 00 00 00 00 00 00  7e 00 00 00 00 00 00 00  |........~.......|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000020  00 00 00 00 00 00 00 00  7e 7e 00 00 72 65 62 6f  |........~~..rebo|
00000030  6f 74 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |ot..............|
00000040  00 00 00 00 00 00 00 00  00 00 00 00 32 2e 36 2e  |............2.6.|
00000050  6e 6f 73 79 20 3b 29 2e  78 38 36 5f 36 34 00 00  |36.fuffa.x86_64.|
00000060  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000070  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000090  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000000a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000000b0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000000c0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000000d0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000000e0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000000f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000100  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000110  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000120  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000130  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000140  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000150  00 00 00 00 85 a8 d6 4e  a9 f2 09 00 00 00 00 00  |.......N........|
00000160  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000170  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

Possibly something weird "happened", but when I summed the sizeof of every "struct utmp" field the result was 382 bytes, instead of the 384 that is easy to guess from the hex view. So there is some alignment at work that is not currently obvious to me. Recalling the first lines of the previous example

00000000  02 00 00 00 00 00 00 00  7e 00 00 00 00 00 00 00  |........~.......|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000020  00 00 00 00 00 00 00 00  7e 7e 00 00 72 65 62 6f  |........~~..rebo|

the ut_line string (char[32]) evidently starts at the ninth byte (offset 8), two bytes later than a 2-byte ut_type followed by a 4-byte ut_pid would suggest, so there are two "unknown" bytes before it. I have not resolved this issue yet (more testing is needed; I am relying on the community, or on more free time to come...), but I treated ut_type as 4 bytes long, so the first line becomes

00000000  02 00 00 00 00 00 00 00  7e 00 00 00 00 00 00 00  |........~.......|

with ut_type = 2 and ut_pid = 0. Be warned that this split could be wrong; in my cases, however, I had no problems reading the ut_type values.
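
A plausible explanation for the two extra bytes, offered as a sketch rather than a verified answer: C compilers normally place a 4-byte pid_t on a 4-byte boundary, so two padding bytes would follow the 2-byte ut_type. On a little-endian machine that reads the same as a 4-byte ut_type. The Python below (not the Perl of utmpp.pl) assumes little-endian data and the fixed-width layout where ut_session, tv_sec and tv_usec are int32; the format strings and the parse_record helper are illustrative names. It rebuilds the record from the hex dump above, leaving out the obfuscated ut_host bytes.

```python
import struct

# struct utmp with no padding at all ("<" = little-endian, no alignment).
PACKED = "<hi32s4s32s256shhiii4i20s"
# Same fields with 2 explicit pad bytes ("2x") after the short ut_type,
# which is what 4-byte alignment of pid_t would produce.
PADDED = "<h2xi32s4s32s256shhiii4i20s"


def cstr(raw):
    """Return a NUL-terminated C string field as text."""
    return raw.split(b"\x00", 1)[0].decode("ascii", "replace")


def parse_record(buf):
    """Decode one 384-byte record into a dict (hypothetical helper)."""
    f = struct.unpack(PADDED, buf)
    return {
        "type": f[0], "pid": f[1],
        "line": cstr(f[2]), "id": cstr(f[3]),
        "user": cstr(f[4]), "host": cstr(f[5]),
        "tv_sec": f[9], "tv_usec": f[10],
    }


# Rebuild the record from the hex dump: every byte not set here is zero.
record = bytearray(struct.calcsize(PADDED))
record[0x000:0x002] = bytes.fromhex("0200")              # ut_type = BOOT_TIME
record[0x008:0x009] = b"~"                               # ut_line
record[0x028:0x02a] = b"~~"                              # ut_id
record[0x02c:0x032] = b"reboot"                          # ut_user
record[0x154:0x15c] = bytes.fromhex("85a8d64ea9f20900")  # ut_tv

entry = parse_record(bytes(record))
print(struct.calcsize(PACKED), struct.calcsize(PADDED))  # 382 384
print(entry)
```

With the padded layout every field lands on the offset visible in the dump, including ut_tv at 0x154.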


going further


With all the fields parsed from the file, it was time to extract and summarize the login/logout entries. If you think that wtmp entries contain simple, (almost) human-readable information, you're wrong; or, probably better, it is simple once you know how. Getting there takes another read of the utmp man page, with the help of the implementation of the Linux last command (one here). The following C code belongs to last.c:


    for(i = listnr - 1; i >= 0; i--) {
        bp = utl+i;
        /*
         * if the terminal line is '~', the machine stopped.
         * see utmp(5) for more info.
         */
        if (!strncmp(bp->ut_line, "~", LMAX)) {
            /*
             * utmp(5) also mentions that the user
             * name should be 'shutdown' or 'reboot'.
             * Not checking the name causes e.g. runlevel
             * changes to be displayed as 'crash'. -thaele
             */
            if (!strncmp(bp->ut_user, "reboot", NMAX) ||
                !strncmp(bp->ut_user, "shutdown", NMAX)) {
                /* everybody just logged out */
                for (T = ttylist; T; T = T->next)
                    T->logout = -bp->ut_time;
            }

            currentout = -bp->ut_time;
            crmsg = (strncmp(bp->ut_name, "shutdown", NMAX)
                ? "crash" : "down ");
            if (!bp->ut_name[0])
                (void)strcpy(bp->ut_name, "reboot");
            if (want(bp, NO)) {
                ct = utmp_ctime(bp);
                if(bp->ut_type != LOGIN_PROCESS) {
                    print_partial_line(bp);
                    putchar('\n');
                }
                if (maxrec && !--maxrec)
                    return;
            }
            continue;
            ...



I did not follow last.c exactly, but I used it to cross-check my steps: I ran tests and made observations to write the code, then compared the results with the output of last. From my point of view this was more instructive than a straight port to another language. I wrote a switch statement on ut_type looking for: "User Process" (USER_PROCESS), a user login on ut_line from the specified ut_host; "End Process" (DEAD_PROCESS), a user logout if the corresponding ut_line had registered a login; "Run Level" (RUN_LVL), which could be a shutdown; and "Boot Time" (BOOT_TIME), which could be a reboot. wtmp may record a boot without a preceding shutdown (a crash?): in these cases the script PURGEs the users still logged in, so there is no guarantee that those users' session durations are right (the real ones should be shorter). btmp files are simpler, since they contain only one entry type, which represents a failed login.
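
The correlation described above can be sketched roughly as follows. This is not the code of utmpp.pl: it is a minimal Python illustration in which the correlate function, the tuple layout and the toy timestamps are all invented for the example, and in which a "Run Level" record counts as a shutdown only when its user field is "shutdown" (an assumption based on the last.c excerpt).

```python
# ut_type values from utmp(5)
RUN_LVL, BOOT_TIME, USER_PROCESS, DEAD_PROCESS = 1, 2, 7, 8


def correlate(records):
    """Turn (ut_type, line, user, host, tv_sec) tuples into timeline events.

    `active` maps a tty line to the (user, host, login time) of its session.
    """
    active = {}
    events = []
    for ut_type, line, user, host, ts in records:
        if ut_type == USER_PROCESS:
            active[line] = (user, host, ts)
            events.append((ts, "LOGIN", user, line))
        elif ut_type == DEAD_PROCESS:
            # A logout only counts if this line registered a login before.
            if line in active:
                who, _, _ = active.pop(line)
                events.append((ts, "LOGOUT", who, line))
        elif ut_type == RUN_LVL and user == "shutdown":
            events.append((ts, "SHUTDOWN", user, line))
            # Clean shutdown: everybody is logged out at a known time.
            for tty, (who, _, _) in sorted(active.items()):
                events.append((ts, "LOGOUT", who, tty))
            active.clear()
        elif ut_type == BOOT_TIME:
            events.append((ts, "BOOT", user, line))
            # Boot with sessions still open: no shutdown was recorded, so
            # their real end time is unknown and they are purged.
            for tty, (who, _, _) in sorted(active.items()):
                events.append((ts, "PURGED", who, tty))
            active.clear()
    return events


# Invented sample: root never logs out before the second boot.
sample = [
    (BOOT_TIME, "~", "reboot", "", 100),
    (USER_PROCESS, "pts/0", "root", "foo.it", 110),
    (USER_PROCESS, "pts/1", "gino", "foo.it", 120),
    (DEAD_PROCESS, "pts/1", "", "", 130),
    (BOOT_TIME, "~", "reboot", "", 200),
]
for event in correlate(sample):
    print(event)
```

Keeping the open sessions in a dictionary keyed by line is also what makes it cheap to report who is logged in at the time of each event.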


scripting... (and timeline)


I do not want to get stuck on technical details, since the Perl script utmpp.pl is open source and can be downloaded from hotoloti (download here). What sets this script apart from the last tool is that it can generate a timeline of logins and logouts in the Sleuthkit mactime v3.x format (the so-called body file), and that body file can be merged with other body files (such as the one produced by Sleuthkit's fls) to get a more complete timeline of events. Moreover, the script shows not only logins and logouts but also who is currently logged in at the time of each event; an example follows (not in mactime output format):



type = [0x0007] User Process
pid = 5192 [0x1448]
line = pts/7
user = root
host = foo.it
tv_sec  = 1335961904 (Wed May  2 12:31:44 2012)
tv_usec = 780918
ut_addr_v6 string = 157574295
ut_addr_v6 IPV4 = 192.168.200.10
NOTE = LOGIN  ( user=root@foo.it logged in on line=pts/7 now=root@foo.it_pts/6 gino@:1101.0_pts/0 root@foo.it_pts/7 root@:1100.0_pts/3 )
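
For reference, a body file line for an entry like the one above could be produced along these lines. This is only a sketch: the body_line helper and the description text are hypothetical, and putting the event time in all four timestamp fields is an assumption, not necessarily what utmpp.pl does. The field order is the Sleuthkit 3.x body format.

```python
def body_line(description, tv_sec):
    """Format one event as a Sleuthkit 3.x body file line.

    Field order:
    MD5|name|inode|mode_as_string|UID|GID|size|atime|mtime|ctime|crtime
    Only the name and the timestamps carry data in this sketch.
    """
    # The pipe is the field separator, so it must not appear in the name.
    name = description.replace("|", "/")
    fields = ["0", name, "0", "0", "0", "0", "0"] + [str(tv_sec)] * 4
    return "|".join(fields)


line = body_line("[wtmp] LOGIN user=root host=foo.it line=pts/7", 1335961904)
print(line)
```

Lines like this can be appended to the output of fls -m and then passed to mactime -b to get a single merged timeline.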


I also made a template for X-Ways Forensics ("a template is a dialog box that provides means for editing custom data structures in a more comfortable and error-preventing way than raw hex editing does", info here), even though this great DFIR (more-than-a) tool is already able to understand and parse wtmp files.


Log2Timeline (I forgot selinux)


I felt that the utmpp.pl script was too isolated and too unwieldy, so I wondered whether the effort could be expanded into something more "shareable" and useful. What better than l2t? I wrote to Kristinn Gudjonsson to ask whether including such a script would be useful: moments after my email, Kristinn replied with suggestions and instructions on what to do. The result: Log2Timeline has a new input module called utmp (a Fast&Furious collaboration)! The script is currently hosted in the experimental branch and is subject to testing and revision. Feel free to download it, test it and send feedback.

I forgot SELinux (the wiki helps here). SELinux creates audit logs such as "/var/log/audit/audit.log", which were not covered by the l2t input modules. They are quite simple to parse, so I wrote another l2t input module called (guess what?!) selinux: another useful source to include in timelines. Again, it is hosted in the experimental branch of Log2Timeline.
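
To give an idea of how simple these logs are, here is a minimal Python sketch (not the l2t module itself, which is written in Perl) that pulls the record type and timestamp out of an audit.log line. It assumes the usual "type=... msg=audit(epoch.millis:serial): ..." header; the sample line and the parse_audit_line name are invented for illustration.

```python
import re

# Record header: type=<TYPE> msg=audit(<epoch>.<millis>:<serial>): <rest>
AUDIT_RE = re.compile(r"^type=(\S+) msg=audit\((\d+)\.(\d+):(\d+)\): ?(.*)$")


def parse_audit_line(line):
    """Return a dict for one audit.log line, or None if it does not match."""
    m = AUDIT_RE.match(line.strip())
    if m is None:
        return None
    rec_type, sec, msec, serial, rest = m.groups()
    return {
        "type": rec_type,
        "tv_sec": int(sec),
        "msec": int(msec),
        "serial": int(serial),
        "message": rest,
    }


# Invented sample line, for illustration only.
sample = ('type=AVC msg=audit(1335961904.780:4242): avc:  denied  { read } '
          'for  pid=5192 comm="httpd"')
print(parse_audit_line(sample))
```

The epoch value in the header is what ends up as the event time in the timeline.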

thanks


This post and these scripts were born from a case involving a compromised Linux server with an EXT4 file system. During the analysis I experienced how many benefits can come from sharing in DFIR. Without Log2Timeline and the Sleuthkit it would have been much harder to get the job done. I also want to thank Simson Garfinkel and Kevin Fairbanks for the Sleuthkit version that includes EXT4 support (originally made by Willi Ballenthin, here) and for their prompt support when I ran into an fls body file issue (one field more than the mactime v3.x format; be sure to download the latest Ext4_Dev branch). Finally, I learned a lot from the SANS blog EXT4 series written by Hal Pomeranz, first part here. Following these great examples and trying to "repay", here is something hopefully useful to the community.

2 comments:

  1. pid_t in GNU libc is an int afaik, so 32 bits is to be expected (but I'm too lazy to check).
    in any case beware that assuming a bit size for a datatype is not always going to be correct.
    C90/C99 do not mandate sizes, only requiring sizeof(short)<=sizeof(int)<=sizeof(long) (C99 extends with long long).
    Also C++ only requires a data type to be able to contain numbers within certain ranges; for shorts the range is [-32768,+32767] but the underlying representation could very well be 32 bits.
    The only guarantee you have is that sizeof(char)==1.
    Bottom line: it all depends on the compiler :-)

    Replies
    1. hi bro, a long time...!
      Sorry for the lag, I had a connection break ;)
      Thanks for commenting and pointing out references. You're right to warn about assumptions: unfortunately in DFIR it's common to analyze files coming from different OS versions, without being able to "power on" the system the files came from. When a file format is not precisely defined, wrong parsing can yield wrong information. In these cases correlation is needed, for example by using different tools to corroborate the results.
