Check out the repo!
Project 1: Shell
Joey Maalouf, Apurva Raman, Sean Carter
Abstract
For this project, we aimed to create a functional shell that would be able to run on a Linux system and accept the typical commands. Our implementation allows for many of the basic features that a user might expect, such as an interactive terminal mode, a batch file mode, configuration files, and aliases.
Background
Most of our research and understanding comes from the bash shell that is used by default in most Linux distributions. While trying to identify essential features, we referred to the bash reference manual provided by gnu.org. We were also inspired by a number of assignments used at other colleges, such as this one from Purdue University and this one from the University of Wisconsin-Madison.
Implementation
Specifically, we wrote a program that can parse user input into tokens and search the filesystem for binaries matching the commands, running them with any supplied arguments and parameters. In between, it does things like check for custom "meta"-cases (like setting an alias or custom prompt), as well as un-alias any custom tokens into their original counterparts. It pulls its input from standard input in interactive mode, or a supplied file (or multiple) in batch mode.
Parsing
We identify commands by splitting input arguments into an array of character arrays, parsing the input one character at a time. Each character array is a single token (either the name of a command to run, or an argument for a previously-given command), so the array of arrays represents a complete command. Once the parser reaches a character that indicates the end of a command (e.g. ';'), it returns a flag that tells the rest of the code that it's time for execution (parsing.c L38).
One of the consequences of parsing a single character at a time is that we had to keep track of a lot of state information, such as where the next character would go in the args array, or whether the character was in a comment or quote block. To hold all of this state information, we used a struct:
typedef struct State {
bool in_comment;
bool in_quote;
bool in_whitespace;
int i_cmd;
int i_char;
} State;
This strategy makes it easier to add new text parsing features, should any arise; all of the parsing features have been simplified into logic based on their boolean flags, so new ones would just slightly change decisions about when to write into the array and increment the command pointer. This is a useful improvement from our first state tracker, which was a large collection of static variables at the start of the function.
Execution
Execution is relatively simple; after properly parsing the arguments and checking for special cases (shell.c L81), we fork a new process and execute the given array of strings as a command (shell.c L29).
Aliases
Aliases are created as a set of key-value pairs with the aliases as the keys and the original commands and any additional arguments as the values (alias.c L11). For each argument passed into the program, we check for any matching aliases (alias.c L41) and replace them with their corresponding values. This is a linear search through the alias array, so it would be inefficient for large numbers of aliases, but we do not expect a given user to have enough aliases defined to have a significant impact on the speed. If an alias maps to a value with multiple tokens, additional space is allocated (alias.c L49), all of the other arguments are shifted over in the array to make room for the additional tokens, and the alias is replaced (alias.c L61).
Memory
All of the memory for the various arrays and structs is dynamically allocated (memory.c L13) and freed (memory.c L26), with the exception of the initial input buffer. This allows us to minimize wasted space in memory, at the expense of some processing overhead. Initially, we actually allocated everything statically, for ease of use. However, once we were comfortable with using the malloc function family, we were able to refactor the project to allocate memory dynamically instead (but not without some difficulty). Debugging segfaults can be unusually frustrating, since there's no meaningful error message other than "something related to memory went wrong".
Results
The first of our videos demonstrates usage of the .shellrc file, interactive mode, and aliases.
Our second video shows how to use the batch file mode.