As beautiful as a shell🐚
This project was all about recreating your very own (mini)shell, taking bash as a reference. This was a pair-coding project, and I was honored to do it with @jnsh.
The shell we are implementing works as a command interpreter communicating with the OS kernel in a secure way, and allows us to perform tasks from a command line, such as controlling processes and files and starting and controlling other programs. The shell manages the interaction between you and the operating system by prompting you for input, interpreting that input for the operating system, and then handling any resulting output from the operating system.
As the challenge is to recreate a simple version of bash, a commonly used GNU shell, we started by learning how bash actually works with the help of the bash manual and other resources listed under References.
The general idea of Minishell is to read input from the user in a prompt using readline. The input is then processed in 4 steps: lexer → parser → expander → executor.
This is as simple as:
input = readline("prompt");
add_history(input);
The line readline returns is allocated with malloc(); you should free() the line when you are done with it. The line returned has the final newline removed, so only the text remains.
If readline encounters an EOF (Ctrl + D) while reading the line, and the line is empty at that point, then (char *)NULL is returned. Otherwise, the line is ended just as if a newline \n had been typed.
A zero-length input should not be added to history.
The history can be freed by calling void rl_clear_history(void).
In a shell, a lexer (short for "lexical analyzer") is a component of the shell's parser that breaks down a user's input into a series of tokens or lexemes.
The lexer takes the user's input, which is typically a string of characters, and breaks it down into a series of smaller pieces or tokens that are more easily processed by the shell's parser. For example, the lexer may break down the input "echo "hello $USER " > file | grep h | cat << eof | cat >> file | echo 'done $USER'" into the following tokens:
echo: a command that prints its arguments to the standard output.
"hello $USER ": a double-quoted string that contains the string "hello " and the value of the environment variable $USER.
>: a redirection operator that redirects the standard output of the previous command to a file named file.
|: a pipe operator that connects the standard output of the previous command to the standard input of the next command.
grep: a command that searches for a pattern in its input and prints the matching lines to its standard output.
h: a string that represents the pattern to search for.
|: a pipe operator that connects the standard output of the previous command to the standard input of the next command.
cat: a command that concatenates files and prints the result to its standard output.
<<: a here-document operator that allows the input of multiple lines until a delimiter (eof in this case) is encountered.
eof: the delimiter for the here-document.
|: a pipe operator that connects the standard output of the previous command to the standard input of the next command.
cat: a command that concatenates files and prints the result to its standard output.
>>: a redirection operator that appends the standard output of the previous command to the end of the file named file.
|: a pipe operator that connects the standard output of the previous command to the standard input of the next command.
echo: a command that prints its arguments to the standard output.
'done $USER': a single-quoted string that contains the string 'done $USER'; the environment variable $USER is not expanded within ''.
Each token corresponds to a distinct component of the user's command, such as the command name, arguments, and options. We split the input string into a linked list of tokens.
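The splitting described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual lexer: the names `tok_type`, `token`, `lex` and `free_tokens` are hypothetical, quotes are kept inside the word text (removing them is left to a later step), and an unclosed quote simply runs to the end of the input.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token kinds; the project's real names may differ. */
typedef enum { TOK_WORD, TOK_PIPE, TOK_IN, TOK_OUT, TOK_APPEND, TOK_HEREDOC } tok_type;

typedef struct token {
    tok_type      type;
    char         *text;   /* raw text of the token, quotes included */
    struct token *next;
} token;

void free_tokens(token *t)
{
    while (t) {
        token *next = t->next;
        free(t->text);
        free(t);
        t = next;
    }
}

/* Append a copy of s[0..len) after tail (or as the head of an empty list). */
static token *push_token(token **head, token *tail, tok_type type,
                         const char *s, size_t len)
{
    token *t = malloc(sizeof *t);

    if (!t)
        return NULL;
    t->text = malloc(len + 1);
    if (!t->text) {
        free(t);
        return NULL;
    }
    memcpy(t->text, s, len);
    t->text[len] = '\0';
    t->type = type;
    t->next = NULL;
    if (tail)
        tail->next = t;
    else
        *head = t;
    return t;
}

/* Length of the operator starting at s, or 0 if there is none. */
static size_t operator_at(const char *s, tok_type *type)
{
    if (s[0] == '|') { *type = TOK_PIPE; return 1; }
    if (s[0] == '<') { *type = (s[1] == '<') ? TOK_HEREDOC : TOK_IN; return (s[1] == '<') ? 2 : 1; }
    if (s[0] == '>') { *type = (s[1] == '>') ? TOK_APPEND : TOK_OUT; return (s[1] == '>') ? 2 : 1; }
    return 0;
}

/* Length of the word starting at s: it ends at unquoted whitespace or operator. */
static size_t word_length(const char *s)
{
    size_t   i = 0;
    char     quote = 0;
    tok_type ignored;

    while (s[i]) {
        if (quote) {
            if (s[i] == quote)
                quote = 0;
        } else if (s[i] == '\'' || s[i] == '"') {
            quote = s[i];
        } else if (isspace((unsigned char)s[i]) || operator_at(s + i, &ignored)) {
            break;
        }
        i++;
    }
    return i;
}

/* Split input into a linked list of tokens; returns NULL on allocation failure
   (and also for an input with no tokens at all). */
token *lex(const char *input)
{
    token *head = NULL, *tail = NULL;

    while (*input) {
        tok_type type = TOK_WORD;
        size_t   len;

        if (isspace((unsigned char)*input)) {
            input++;
            continue;
        }
        len = operator_at(input, &type);
        if (len == 0)
            len = word_length(input);
        assert(len > 0);   /* a non-space character always starts a token */
        tail = push_token(&head, tail, type, input, len);
        if (!tail) {
            free_tokens(head);
            return NULL;
        }
        input += len;
    }
    return head;
}
```

Calling `lex` on the example input yields `echo`, then the whole double-quoted string as one word, then `>`, `file`, `|`, and so on. The caller owns the list and releases it with `free_tokens`.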
The expander replaces environment variables, written as $ followed by the variable's name, with their values.
$? expands to the exit status of the most recently executed foreground pipeline.
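As a rough sketch of these two rules, the function below expands `$NAME` and `$?` in a string while leaving anything inside single quotes untouched. The name `expand_variables` and the `last_status` parameter are assumptions made for this example, quote characters are copied through unchanged, and unset variables expand to an empty string as they do in bash.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Append n bytes of src to a growing, NUL-terminated buffer. */
static int append(char **buf, size_t *len, const char *src, size_t n)
{
    char *grown = realloc(*buf, *len + n + 1);

    if (!grown)
        return -1;
    memcpy(grown + *len, src, n);
    *len += n;
    grown[*len] = '\0';
    *buf = grown;
    return 0;
}

/* Append the value of the variable named s[0..n), or nothing if it is unset. */
static int append_variable(char **buf, size_t *len, const char *s, size_t n)
{
    char       *name = malloc(n + 1);
    const char *value;

    if (!name)
        return -1;
    memcpy(name, s, n);
    name[n] = '\0';
    value = getenv(name);
    free(name);
    if (!value)
        value = "";
    return append(buf, len, value, strlen(value));
}

/* Return a newly allocated copy of s with $NAME and $? expanded,
   except inside single quotes. The caller frees the result. */
char *expand_variables(const char *s, int last_status)
{
    char  *out = NULL;
    size_t len = 0;
    int    in_single = 0, in_double = 0, rc;

    assert(s != NULL);
    if (append(&out, &len, "", 0) < 0)
        return NULL;
    while (*s) {
        if (*s == '\'' && !in_double)
            in_single = !in_single;
        else if (*s == '"' && !in_single)
            in_double = !in_double;
        if (*s == '$' && !in_single && s[1] == '?') {
            char num[16];
            int  n = snprintf(num, sizeof num, "%d", last_status);

            rc = append(&out, &len, num, (size_t)n);
            s += 2;
        } else if (*s == '$' && !in_single
                   && (isalpha((unsigned char)s[1]) || s[1] == '_')) {
            size_t n = 1;   /* length of the variable name after the $ */

            while (isalnum((unsigned char)s[1 + n]) || s[1 + n] == '_')
                n++;
            rc = append_variable(&out, &len, s + 1, n);
            s += 1 + n;
        } else {
            rc = append(&out, &len, s, 1);   /* ordinary character */
            s++;
        }
        if (rc < 0) {
            free(out);
            return NULL;
        }
    }
    return out;
}
```

A `$` that is not followed by `?` or a valid name start is copied literally, so `echo $` still prints a dollar sign.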
The parser is responsible for analyzing the command-line input and creating a data structure that represents the syntax of the input. For us, parsing is the process of turning the list of tokens into a linked list of command nodes. Each command node is a struct that holds the data to pass to the executor.
- Iterate through the token list
- Whenever a redirection is found, check the type of redirection and retrieve the file descriptor(s) containing the info we need as the infile and outfile. Heredoc <<, if encountered, is also handled at this step.
- If a pipe is found, set up the current process's read and write fd to the corresponding end of the pipe. Then, add a new node to the list of commands.
- When tokens of type word or type var are found, add them to the argument list (argv)
For the above example "echo "hello $USER " > file | grep h | cat << eof | cat >> file | echo 'done $USER'", the command linked list could look like:
cmds:
cmd 0:
pathname: echo
argv: {echo, hello expandedUser, NULL}
read_fd: STDIN_FILENO
write_fd: pipe0[1]
cmd 1:
pathname: grep
argv: {grep, h, NULL}
read_fd: pipe0[0] (read output of previous command)
write_fd: pipe1[1]
cmd 2:
pathname: cat
argv: {cat, NULL}
read_fd: pipe1[0]
write_fd: pipe2[1]
cmd 3:
pathname: cat
argv: {cat, NULL}
read_fd: pipe2[0]
write_fd: pipe3[1]
cmd 4:
pathname: echo
argv: {echo, done expandedUser, NULL}
read_fd: pipe3[0]
write_fd: STDOUT_FILENO
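A simplified sketch of that token-to-command pass is shown below. It is an illustration under several assumptions: the `token` and `cmd` structs and the `parse` function are hypothetical, redirection targets are stored as file names rather than opened into file descriptors, the argument list has an arbitrary fixed capacity, and syntax errors simply make `parse` return NULL.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { TOK_WORD, TOK_PIPE, TOK_IN, TOK_OUT, TOK_APPEND, TOK_HEREDOC } tok_type;

typedef struct token {
    tok_type      type;
    const char   *text;
    struct token *next;
} token;

#define MAX_ARGS 64   /* arbitrary cap to keep the sketch short */

typedef struct cmd {
    const char *argv[MAX_ARGS + 1];   /* NULL-terminated argument list */
    int         argc;
    const char *infile;               /* target of <, or NULL */
    const char *outfile;              /* target of > or >>, or NULL */
    const char *heredoc_delim;        /* delimiter of <<, or NULL */
    int         append;               /* 1 if outfile was given with >> */
    struct cmd *next;
} cmd;

void free_cmds(cmd *c)
{
    while (c) {
        cmd *next = c->next;
        free(c);
        c = next;
    }
}

static int is_empty(const cmd *c)
{
    return c->argc == 0 && !c->infile && !c->outfile && !c->heredoc_delim;
}

/* Turn a token list into a list of commands, one node per pipeline stage.
   Returns NULL on a syntax error or allocation failure. The nodes point
   into the token texts, so the tokens must outlive the commands. */
cmd *parse(const token *t)
{
    cmd *head = calloc(1, sizeof *head);
    cmd *cur = head;

    if (!head)
        return NULL;
    for (; t; t = t->next) {
        if (t->type == TOK_WORD) {
            if (cur->argc == MAX_ARGS)
                goto fail;
            cur->argv[cur->argc++] = t->text;
        } else if (t->type == TOK_PIPE) {
            /* A pipe needs a command on both sides. */
            if (is_empty(cur) || !t->next)
                goto fail;
            cur->next = calloc(1, sizeof *cur->next);
            if (!cur->next)
                goto fail;
            cur = cur->next;
        } else {
            /* Redirection: the next token must be its target word. */
            const token *target = t->next;

            if (!target || target->type != TOK_WORD)
                goto fail;
            if (t->type == TOK_IN)
                cur->infile = target->text;
            else if (t->type == TOK_HEREDOC)
                cur->heredoc_delim = target->text;
            else {
                cur->outfile = target->text;
                cur->append = (t->type == TOK_APPEND);
            }
            t = target;   /* skip the target word */
        }
        assert(cur->argv[cur->argc] == NULL);   /* argv stays NULL-terminated */
    }
    return head;
fail:
    free_cmds(head);
    return NULL;
}
```

Keeping the file names in the node and opening them later is only one possible design; resolving them to file descriptors during parsing, as described above, moves the error handling for missing files into the parser instead.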
With all our data properly stored in our structs, the executor has all the necessary information to execute commands. For this part, we run our builtins and other commands in separate child processes that redirect stdin and stdout, just like in pipex.
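The pipex-style execution mentioned above could look roughly like this. The sketch is not the project's executor: `run_pipeline` and `run_child` are hypothetical names, each command is a plain NULL-terminated argv, builtins and file redirections are left out, and every exec failure is reported as 127 (the status bash uses for "command not found").

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child side: wire up stdin/stdout, then replace the process image. */
static void run_child(char *const argv[], int in_fd, int out_fd, int unused_fd)
{
    if (in_fd != -1) {
        dup2(in_fd, STDIN_FILENO);
        close(in_fd);
    }
    if (out_fd != -1) {
        dup2(out_fd, STDOUT_FILENO);
        close(out_fd);
    }
    if (unused_fd != -1)
        close(unused_fd);   /* read end of the pipe this child writes to */
    execvp(argv[0], argv);
    perror(argv[0]);        /* only reached if execvp failed */
    _exit(127);
}

/* Run cmds[0] | cmds[1] | ... | cmds[n-1].
   Returns the exit code of the last command, or -1 if setup failed. */
int run_pipeline(char *const *cmds[], int n)
{
    int   prev_read = -1;   /* read end of the previous pipe */
    int   status = 0, result = -1, i;
    pid_t last = -1, pid;

    assert(n > 0);
    for (i = 0; i < n; i++) {
        int fd[2] = {-1, -1};

        if (i < n - 1 && pipe(fd) < 0)
            break;
        pid = fork();
        if (pid == 0)
            run_child(cmds[i], prev_read, fd[1], fd[0]);
        /* Parent: close the ends it no longer needs. */
        if (prev_read != -1)
            close(prev_read);
        if (fd[1] != -1)
            close(fd[1]);
        prev_read = fd[0];
        if (pid < 0)
            break;
        last = pid;
    }
    if (prev_read != -1)
        close(prev_read);
    /* Reap every child; only the last command decides the result. */
    while ((pid = wait(&status)) > 0) {
        if (pid == last && i == n) {
            if (WIFEXITED(status))
                result = WEXITSTATUS(status);
            else if (WIFSIGNALED(status))
                result = 128 + WTERMSIG(status);
        }
    }
    return result;
}
```

Closing the unused pipe ends in the parent matters: if a write end stays open, the reader on the other side never sees end-of-file and the pipeline hangs.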
In cases of error, we must return the correct exit code.
- echo: prints arguments followed by a newline; with option -n, prints arguments without a newline
- cd: changes the current directory to the relative or absolute path passed as an argument
- pwd: prints the full path of the current directory
- export: sets an environment variable passed as an argument; with no arguments it usually prints a list of exported variables, similar to env
- unset: removes the environment variable passed as an argument
- env: prints the list of exported environment variables
- exit: exits the shell; it accepts a number between 0 and 255 as an argument that specifies the exit status code
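To make the builtin list a little more concrete, here is a small sketch of how a shell might recognise a builtin by name and how `echo` with `-n` could behave. The functions `is_builtin` and `builtin_echo` are hypothetical, and the output stream is passed in as a parameter purely so the example is easy to try out; only a single leading `-n` is handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if name is one of the seven builtins listed above. */
int is_builtin(const char *name)
{
    static const char *const names[] = {
        "echo", "cd", "pwd", "export", "unset", "env", "exit"
    };
    size_t i;

    for (i = 0; i < sizeof names / sizeof names[0]; i++)
        if (strcmp(name, names[i]) == 0)
            return 1;
    return 0;
}

/* echo: print argv[1..] separated by single spaces.
   A leading -n suppresses the trailing newline. Returns the exit status. */
int builtin_echo(char *const argv[], FILE *out)
{
    int i = 1;
    int newline = 1;

    assert(argv && argv[0]);
    if (argv[i] && strcmp(argv[i], "-n") == 0) {
        newline = 0;
        i++;
    }
    for (; argv[i]; i++) {
        fputs(argv[i], out);
        if (argv[i + 1])
            fputc(' ', out);
    }
    if (newline)
        fputc('\n', out);
    return 0;
}
```

In a real shell the stream would normally be stdout, after any redirections have been applied.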
The Ctrl-C and Ctrl-\ key inputs can be detected by listening to the SIGINT and SIGQUIT signals from the terminal. Ctrl-D is detected by simply checking whether the input returned by readline is NULL.
Ctrl-C should display an empty prompt on a new line
Ctrl-D exits the shell
Ctrl-\ does nothing
Ctrl-C sends SIGINT to the subprocess, exit code should be 130
Ctrl-D sends EOF marker to the subprocess (there is no UNIX signal for this, but sending EOT character to the subprocess via STDIN seems to be enough: https://askubuntu.com/questions/724990/what-is-eof-and-how-to-trigger-it)
Ctrl-\ sends SIGQUIT to the subprocess, exit code should be 131
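The signal behaviour listed above can be sketched with `sigaction` and the `wait` status macros. Everything named here (`g_last_signal`, `setup_interactive_signals`, `restore_default_signals`, `wait_for_child`) is hypothetical, and the handler only records the signal; redrawing the prompt through readline is left out. The exit codes 130 and 131 follow from the usual convention of 128 plus the signal number, with SIGINT being 2 and SIGQUIT being 3.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Last signal caught at the prompt; sig_atomic_t is safe to set in a handler. */
static volatile sig_atomic_t g_last_signal = 0;

static void on_sigint(int sig)
{
    g_last_signal = sig;
}

static int set_handler(int signum, void (*handler)(int))
{
    struct sigaction sa;

    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sa.sa_handler = handler;
    return sigaction(signum, &sa, NULL);
}

/* Interactive mode: catch Ctrl-C, make Ctrl-\ do nothing. */
int setup_interactive_signals(void)
{
    if (set_handler(SIGINT, on_sigint) < 0)
        return -1;
    return set_handler(SIGQUIT, SIG_IGN);
}

/* In a child, before exec: ignored signals stay ignored across exec,
   so SIGQUIT must be reset explicitly for the command to be killable. */
int restore_default_signals(void)
{
    if (set_handler(SIGINT, SIG_DFL) < 0)
        return -1;
    return set_handler(SIGQUIT, SIG_DFL);
}

/* Turn a wait status into a shell exit code: 128 + signal number if killed. */
int exit_code_from_status(int status)
{
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return 1;
}

/* Wait for one child and return its shell exit code, or -1 on error. */
int wait_for_child(pid_t pid)
{
    int status;

    assert(pid > 0);
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return exit_code_from_status(status);
}
```

With this split, the shell itself survives Ctrl-C and Ctrl-\ at the prompt, while a running command receives the default behaviour and its termination is reported as 130 or 131.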
git clone https://github.com/linhtng/minishell.git
cd minishell
Note for macOS: to install a recent version of readline, you need to use Homebrew
brew install readline
Run minishell
make
./minishell
As this project is made to mimic bash, you can try any commands you normally would try in bash, except for the ones using ";" or "\".
This was my first pair-coding project, and it was an amazing experience for me. What I think we did well was dividing up the different parts of the project while also supporting each other when we ran into issues, and we documented our progress after each working day. I learned a lot about collaborating using GitHub and communicating with a teammate in general. IMO, the hardest part of this project was handling all the edge cases, which keep coming to you once you close your eyes, ready to go to bed 😂 It took us 2 retries to pass this project, but we certainly learned a lot while having fun and achieving shell-growth! I mean self-growth! Ok, I'll see myself out.

