The Minishell project is a comprehensive implementation of a custom command-line interpreter (shell), designed to emulate the core functionality and behavioral intricacies of established Unix shells such as Bash and Zsh.
Our goal was not merely to execute commands, but to engineer a pipeline capable of managing complex shell grammar, environment state, and process isolation with high precision.
Project made by:
David Diaz: @david-dbd / "davdiaz-" (42 login)
Mikel Garrido: @lordmikkel / "migarrid" (42 login)
The core design philosophy is centered on Architectural Clarity and Contextual Integrity.
This shell is built on the classic Read-Eval-Print Loop (REPL) structure, processing user input through distinct, heavily validated phases to ensure that execution logic is completely decoupled from parsing and interpretation logic.
- Command correction for builtins → `echa: did you mean 'echo'?`
- Continuation line for unbalanced prompts → `echo hello &&`
- Script execution
- Local, temporal, and exported assignments, kept in sync with the shell → `var=1 ls`, `var=1`, `export var=1`
- Wildcards → `*hello`, `hello*`, `h*ell*o`
- Commands accepted in both uppercase and lowercase → `ECHO`, `echo`, `EchO`, and `eChO` all work
- Subshells
- Operators `&` and `;`
- Polished user experience (UX)
- Redirections and heredoc
- Expansions: environment variables, tilde, tilde plus, tilde minus
- History
- Easy-to-read code and structure
- 🛡️ Phase 3: Expansion Phase – Dynamic Substitution and Contextual Integrity
- 🎯 Phase 8: Assignment Engine – Scope, Persistence, and State Management
- 🧬 Phase 10: Minishell Data Model – Core Structures and Types
The Initialization phase establishes the essential foundation for the shell's operation, ensuring proper resource management, environmental context, and responsiveness to system events.
The main function starts by declaring and initializing the t_shell data structure. This structure serves as the single source of truth for the entire program, holding links and data required across all execution stages:
- Execution Context: Stores the current state of the process, including file descriptors, exit codes (`data.exit_code`), and the execution status.
- Prompt/Token Link: Contains the `t_prompt` structure, which manages the user input and the dynamic array of tokens for the current cycle.
- Environment: Holds the parsed environment variables (copied from `envp`) in a manageable structure, enabling internal commands like `export` and `unset` to modify the shell's environment.
- AST Root: A pointer to the root of the Abstract Syntax Tree (`data.ast_root`), which is populated after tokenization and parsing.
The core shell logic is managed by an infinite Read-Eval-Print Loop (REPL) implemented in main.c:
```c
int main(int argc, char **argv, char **envp)
{
	t_shell	data;

	init_minishell(&data, argc, argv, envp);
	while (receive_input(&data, &data.prompt) != NULL)
	{
		if (!tokenizer(&data, &data.prompt, data.prompt.input))
			continue ;
		ast_builder(&data, data.prompt.tokens, data.prompt.n_tokens);
		executor_recursive(&data, data.ast_root, &data.exec, FATHER);
		clean_cycle(&data.exec, &data.prompt, &data.ast_root);
	}
	exit_succes(&data, MSG_GOODBYE, data.exit_code);
	return (data.exit_code);
}
```

- Input Acquisition: Instead of the standard `readline`, the custom library `iscoline` is employed for reading user input. This decision was made to avoid known memory leaks and potential malfunctions associated with certain `readline` implementations, ensuring robustness.
- Error Handling: If the `tokenizer` function detects a syntax error, it returns `SYNTAX_ERROR` (or `FAILURE`), causing the `if (!tokenizer(...))` condition to fail. The `continue` statement immediately skips the AST construction, execution, and cleanup for the current cycle, returning control to the start of the `while` loop to await new input.
- Cycle Cleanup: The `clean_cycle` function is crucial. It frees all memory associated with the processed command (tokens, AST nodes, execution variables), guaranteeing that the shell starts the next input cycle from a clean memory state, thus preventing incremental memory leaks.
Proper signal handling is paramount for a shell to behave reliably. This is managed using the signal() function setup during initialization and relies on a global array (global_arr[2]) to track state:
- `global_arr[0]` (Signal Number): Stores the last received signal (e.g., `SIGINT`, `SIGQUIT`).
- `global_arr[1]` (Mode): Indicates the current execution context (e.g., `HEREDOC`, `INTERACTIVE`).
Impact: By centralizing signal state management, the shell can execute different signal handlers based on the current mode (e.g., handling `SIGINT` differently while reading heredoc input).
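A minimal sketch of this two-slot global state is shown below. The handler and mode names here are assumptions for illustration, not the project's exact identifiers; the key idea is that the async-signal-safe handler only records the signal, and the mode slot decides how the main loop reacts.

```c
#include <assert.h>
#include <signal.h>

/* Hypothetical mode values; the real project uses its own constants. */
enum e_mode { MODE_INTERACTIVE, MODE_HEREDOC };

/* [0] = last received signal, [1] = current execution mode */
volatile sig_atomic_t	g_arr[2];

/* Async-signal-safe handler: it only records the signal number. */
void	record_signal(int signum)
{
	g_arr[0] = signum;
}

/* Polled by the main loop: a SIGINT received while reading a heredoc
** must abort the heredoc instead of just redrawing the prompt. */
int	pending_interrupt_aborts_heredoc(void)
{
	return (g_arr[0] == SIGINT && g_arr[1] == MODE_HEREDOC);
}
```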
The Tokenization phase transforms the raw input string into a structured, executable list of tokens.
Every logical unit recognized by the tokenizer is stored in a struct s_token. The design includes specific fields to maintain the integrity and traceability of data throughout the execution pipeline.
| Member | Type | Purpose and Impact |
|---|---|---|
| `id` | `int` | Dynamic Index: Serves as the current array index. This value is updated every time the token array is modified (e.g., token deletion, reorganization, or simplification) to ensure efficient array traversal. |
| `hash` | `int` | Permanent Identifier: A fixed, unchangeable value initially set to `id`. Its primary purpose is to act as a permanent key to link the token to its corresponding node in the Abstract Syntax Tree (AST) and to reference persistent data structures (like heredoc files), preventing connection loss during token array reconfigurations. |
| `type` | `t_type` | Semantic Category: Identifies the token's role (e.g., `COMMAND`, `BUILTIN`, `EXPANSION`, `PIPE`). This is the key information used by later phases (Expansion, AST Builder) to target specific tokens for processing. |
| `value` | `char *` | Raw Content: The string data extracted from the input using `ft_substr` (e.g., `"ls"`, `"USER"`, `""`). |
| `single_quoted` | `bool` | Indicates if the token was surrounded by single quotes (`' '`). Crucial for the Expansion Phase, as content inside single quotes is protected from variable expansion. |
| `double_quoted` | `bool` | Indicates if the token was surrounded by double quotes (`" "`). Crucial for the Expansion Phase, as variable expansion is permitted within double quotes, but wildcard expansion is not. |
| `expand` | `bool` | Expansibility Flag: A general flag indicating whether the token is eligible for environment variable expansion. Tokens like `WORD` and `EXPANSION` may have this set to `TRUE`. |
| `wildcard_info` | `t_wild *` | Pointer to metadata required for the Wildcard Expansion phase. |
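Assembled from the table above, the token structure looks roughly like the following sketch. Field order and the exact `t_type` enumerators are assumptions; only the members listed in the table are taken from the source.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical subset of the project's t_type enum. */
typedef enum e_type
{
	T_NONE, WORD, COMMAND, BUILTIN, EXPANSION, PIPE
}	t_type;

typedef struct s_wild	t_wild; /* defined by the wildcard phase */

typedef struct s_token
{
	int		id;             /* dynamic index, updated on reorganization */
	int		hash;           /* permanent key linking the token to its AST node */
	t_type	type;           /* semantic category */
	char	*value;         /* raw string content */
	bool	single_quoted;  /* protected from all expansion */
	bool	double_quoted;  /* variable expansion allowed, wildcards not */
	bool	expand;         /* eligible for variable expansion */
	t_wild	*wildcard_info; /* wildcard-phase metadata */
}	t_token;
```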
Tokens are stored in a dynamically allocated array (prompt->tokens). This provides cache efficiency while accommodating command lines of arbitrary length.
The check_buffer function ensures the array never overflows by dynamically resizing the underlying memory structure before a new token is added:
```c
void check_buffer(t_shell *d, t_prompt *p)
{
	size_t	new_capacity;
	t_token	*new_tokens;

	if (p->n_tokens >= p->n_alloc_tokens)
	{
		new_capacity = p->n_alloc_tokens * 2;
		// ... safety check for INT_MAX ...
		new_tokens = ft_realloc(p->tokens,
				p->n_alloc_tokens * sizeof(t_token),
				new_capacity * sizeof(t_token));
		// ... error handling ...
		p->tokens = new_tokens;
		p->n_alloc_tokens = new_capacity;
	}
}
```

- Strategy: The array capacity (`p->n_alloc_tokens`) is doubled when the token count (`p->n_tokens`) meets the current limit.
- Impact: This design provides amortized $O(1)$ performance for adding tokens, making the tokenizer highly efficient even for extremely long command lines.
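The doubling strategy can be demonstrated in isolation. This sketch grows a plain `int` array with standard `realloc` (the project resizes `t_token` entries via `ft_realloc`); the structure and function names are illustrative.

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal growable buffer demonstrating capacity doubling. */
typedef struct s_buf
{
	int		*items;
	size_t	count;
	size_t	capacity;
}	t_buf;

int	buf_push(t_buf *b, int value)
{
	size_t	new_cap;
	int		*tmp;

	if (b->count >= b->capacity)
	{
		new_cap = b->capacity ? b->capacity * 2 : 8;
		tmp = realloc(b->items, new_cap * sizeof(int));
		if (!tmp)
			return (0);
		b->items = tmp;
		b->capacity = new_cap;
	}
	b->items[b->count++] = value;
	return (1);
}
```

Because each resize copies at most as many elements as were appended since the previous resize, the total work over n pushes stays linear, which is exactly the amortized $O(1)$ claim above.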
The get_tokens function iterates through the input string, applying a strict order of precedence to identify tokens:
```c
void get_tokens(t_shell *data, t_prompt *prompt, char *input)
{
	int	i = 0;

	while (input[i] != '\0')
	{
		is_not_token(input, &i);  // Skip whitespace, comments
		is_and(...); is_or(...);  // Logical operators (high precedence)
		is_pipe(...); is_parenten(...);
		// ... other special/delimiter tokens ...
		is_single_quote(...);     // Quote handling (takes precedence over word)
		is_double_quote(...);
		is_wildcar(...);
		is_dolar(...);            // Expansion (takes precedence over generic word)
		is_word(...);             // Catch-all generic word
		// ...
	}
	// ... post-processing
}
```

Crucially, token creation functions do not just extract substrings; they actively clean and sanitize the content for later processing:

- `cleanner_word` / `cleanner_exp`: These functions (found, for example, in `is_double_quote.c` and `is_dolar.c`) eliminate characters that are necessary for token boundaries but undesirable in the final token value.
  - Example: For variable expansion syntax, `cleanner_exp` removes superfluous characters like the braces (`{` and `}`) from `${VAR}`, leaving only the variable name `VAR` as the token value.
- `is_ignore_token`: This utility is used to handle sequences that must be tokenized but which are meant to be ignored or treated specially, such as positional parameters (`$1`, `$2`), which are typically ignored by shells in an interactive context.
- Eager Command Classification: As detailed in the previous phase, `is_cmd` is called immediately on new `WORD` tokens to classify them as `COMMAND` (external executable) or `BUILTIN` (internal function), streamlining the subsequent execution process.
Immediately following token generation, the check_if_valid_tokens function performs a comprehensive sweep of the token array to enforce Bash syntax rules, ensuring the command structure is semantically viable.
```c
int check_if_valid_tokens(t_shell *data, t_prompt *prompt, t_token *tokens)
{
	// ... loop through all tokens ...
	// ... validation checks ...
	if (!check_parent_balance(data, prompt, tokens))
		return (SYNTAX_ERROR);
	return (SUCCESS);
}
```

- Delimiter Placement: Functions like `check_pipe`, `check_or_and`, and `check_redir_*` ensure that operators (e.g., `|`, `&&`, `<`) are correctly flanked by valid tokens (commands, words, filenames, parentheses). For example, `check_pipe` confirms tokens exist both before and after the `PIPE`.
- Parentheses Balance: `check_open_parent` and `check_close_parent` ensure not only that parentheses are balanced but also that their placement is logically correct (e.g., preventing `cmd(cmd)` or `( | )`). The final `check_parent_balance` verifies the overall count.
- Quote Closure: Functions like `check_double_balance` and `check_single_balance` verify that all quotes in the input are properly closed.
When a syntax error is detected (e.g., in `check_pipe`):

- A dedicated error function, such as `syntax_error(data, ERR_SYNTAX, EXIT_USE, tokens[i].value)`, is called.
- This error function prints the specific error message to the user, sets the correct exit code, and then triggers a cleanup sequence.
- The validation loop immediately returns `SYNTAX_ERROR`.
This robust process guarantees that, upon failure, all memory related to the faulty command line is freed, and the program flow returns directly to the main loop's continue statement, ready for the next user input without attempting to execute the invalid command.
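The delimiter-placement rule can be sketched as a standalone predicate. This is an illustrative reduction of `check_pipe` (simplified token types, hypothetical names), showing the "flanked by valid tokens" check described above.

```c
#include <assert.h>

/* Simplified token types for illustration only. */
enum { TK_NONE, TK_WORD, TK_PIPE };

/* Returns 1 if the token at index i is syntactically valid:
** a PIPE must have a non-pipe token on both sides. */
int	check_pipe(const int *types, int n_tokens, int i)
{
	if (types[i] != TK_PIPE)
		return (1);
	if (i == 0 || i == n_tokens - 1)
		return (0);
	if (types[i - 1] == TK_PIPE || types[i + 1] == TK_PIPE)
		return (0);
	return (1);
}
```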
The check_if_valid_tokens function runs immediately after token generation to enforce fundamental Bash syntax rules. This mechanism ensures the shell fails fast on invalid input, preventing resource waste in later, more complex stages.
- Validation Check: A function (e.g., `check_pipe`, `check_open_parent`) performs a semantic rule check on the current token.
- Error Trigger: If a violation is detected (e.g., `|` without a preceding command), the function immediately calls the central error handler: `syntax_error(data, ERR_SYNTAX, EXIT_USE, tokens[i].value);`
- Error Handling (`error.c` logic):
  - Reporting: The `syntax_error` function uses variadic arguments (`va_list`) to construct and print a detailed error message to `STDERR` (standard error).
  - Cleanup: It immediately calls `clean_prompt(&data->prompt)` to free all tokens and the input buffer memory associated with the faulty command line. This is a crucial step to avoid incremental memory leaks.
  - State Update: It sets the shell's global `data->exit_code` to the provided error code (e.g., `2` for `EXIT_USE`), allowing the shell to reflect the error status.
- Loop Termination: The validation check returns `SYNTAX_ERROR` (which maps to `FAILURE`), causing the main loop in `main.c` to trigger the `continue` statement, returning control to the start of the REPL to await the next user input.
This explicit protocol guarantees that all resources related to an invalid command are released before the next cycle begins.
The Expansion Phase is the engine of variable substitution, where raw token values (`$VAR`, `~`) are transformed into their environment-defined counterparts. This stage is engineered with a two-phase system to maintain contextual integrity, preventing the critical state-dependency flaws common in single-pass shell implementations.
The Minishell expansion is divided into two distinct phases to manage dependency and execution order, ensuring that variable substitution occurs only when the context is stable.
| Phase | Core Logic | Dependencies and Purpose |
|---|---|---|
| `INITIAL_PHASE` (`initial_expansion_process`) | Expands all tokens that can be safely expanded (i.e., those whose value is not being changed by a same-line assignment). | Pre-Processing & Simplification: Allows immediate token simplification and prepares the token list before AST construction begins. |
| `FINAL_PHASE` (`final_expansion_process`) | Expands all remaining tokens, particularly those that depend on a runtime context (i.e., those blocked by `dont_expand_this`). | Contextual Execution: Ensures variables modified in the current command line receive their correct, final value before execution. |
```shell
VAR=NEW_VALUE; echo $VAR
```

In a naive single-pass shell, this line would be processed as follows:

- Read/Expand: The shell would tokenize and expand `$VAR` first, substituting the old value of `VAR` from the environment.
- Execute: Only later, during the execution stage, would the assignment `VAR=NEW_VALUE` occur.
- Result: The `echo` command would execute with the stale, old value, violating the user's intent to use the `NEW_VALUE` set in the same command line.

The `dont_expand_this` mechanism was engineered specifically to detect this pending state change and defer the expansion, thereby ensuring contextual integrity between assignment and variable usage.
Consider this input: `USER=David; echo hello $USER`

- A simple, single-pass expansion would see `echo hello $USER`. If `USER` was previously set to `Alex` in the environment, the shell would expand `$USER` to `Alex` immediately during the tokenization stage.
- The assignment `USER=David` occurs later during execution.
- Resulting Flaw: The command executed would be `echo hello Alex`, even though the user intended the command to be `echo hello David`. The expansion failed to recognize the upcoming change in state.
The function `dont_expand_this` (within `initial_expansion_process`) acts as a crucial sentinel against this premature expansion:

- Assignment Recognition: The system first analyzes all tokens to correctly identify valid Assignment tokens (e.g., `VAR=Hello`) by checking their syntax and semantics, distinguishing them from ordinary `WORD` tokens.
- Value Comparison: For every Assignment token (`KEY=VALUE`), the function performs a lookup using `char *get_var_value(t_var *vars, const char *key);`
  - Case 1: No Change: If the value being assigned (`VALUE` = "David") is identical to the current value stored in the environment list (`vars` = "David"), the expansion is safe. Tokens attempting to expand `$KEY` can proceed in the `INITIAL_PHASE`.
  - Case 2: Change Detected (Block): If the assigned value ("David") differs from the current environment value ("Alex"), a change is pending.
- Blocking Expansion: To prevent the expansion flaw, the system iterates over the token array, finds all pending `EXPANSION` tokens matching the key (`$USER`), and sets a dedicated boolean flag: `expand = FALSE;`. This signal tells the main expansion routines to ignore these tokens in the `INITIAL_PHASE`, effectively blocking them until the `FINAL_PHASE`, where the state change will have been executed.
Impact: By deferring expansions that rely on a pending state change, this system ensures contextual integrity. The `EXPANSION` tokens are held in their raw `$VAR` form until the execution phase is ready to handle the new state.
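The deferral test at the heart of this mechanism reduces to a single comparison. This sketch is an assumption-laden reduction (the function name and signature are illustrative): an expansion of `KEY` must be deferred whenever the same-line assignment would change `KEY`'s current value, including the case where `KEY` is not yet set.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 when the expansion of a key must wait for FINAL_PHASE:
** i.e., when the same-line assignment changes the key's current value.
** env_value == NULL means the variable does not exist yet. */
int	must_defer_expansion(const char *assigned_value, const char *env_value)
{
	if (!env_value)
		return (1);
	return (strcmp(assigned_value, env_value) != 0);
}
```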
The core substitution logic (copy_key, find_key_in_list, copy_value) is called repeatedly by the main expansion function for every variable found in an eligible token.
- Extraction (`copy_key`): Scans the token's value string to find the first valid `$` or `~`, enforcing shell rules (alphanumeric keys, special tilde forms). The key name is extracted into a buffer (`key_to_find`).
- Value Retrieval (`find_key_in_list`): Searches the environment linked list (`d->env.vars`) for the matching key. It also handles special symbols (`$?`, `$$`) during the `FINAL_PHASE`.
Once the environment value is found, the `copy_value` function executes the in-place replacement:

- Length Calculation: `calculate_total_length` determines the exact size of the resultant string.
- Expansion Logic (`expand`): A new buffer is allocated. Using efficient pointer arithmetic and `ft_memcpy`, the function copies the string segments: the string before the `$`, then the value from the environment, then the string after the key. This is a high-performance approach, avoiding the multiple expensive allocation calls (like `strjoin`) typically required for string manipulation.
- Token Update: The original `token->value` is freed, and the token pointer is updated to the new, expanded buffer.
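The steps above can be sketched as a single helper. This is not the project's `copy_value` itself but a minimal stand-in using standard `memcpy` under the same strategy: compute the final length once, allocate once, then assemble prefix + value + suffix. `start` and `end` are assumed to delimit the `$KEY` span inside the original string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Replace str[start..end) (the "$KEY" span) with `value`,
** returning a freshly allocated result. */
char	*expand_span(const char *str, size_t start, size_t end,
			const char *value)
{
	size_t	val_len;
	size_t	tail_len;
	char	*res;

	val_len = strlen(value);
	tail_len = strlen(str + end);
	res = malloc(start + val_len + tail_len + 1);
	if (!res)
		return (NULL);
	memcpy(res, str, start);                      /* prefix before '$'   */
	memcpy(res + start, value, val_len);          /* environment value   */
	memcpy(res + start + val_len, str + end,
		tail_len + 1);                            /* suffix + NUL        */
	return (res);
}
```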
If a variable is not found in the environment, `expand_empty_str` ensures the token list remains syntactically correct:

- Unquoted & Standalone (`$UNSET`): The token is eliminated from the array.
- Quoted & Standalone (`"$UNSET"`): The token is replaced by a single space, preventing adjacent tokens from merging.
- Embedded (`text$UNSET`): The `$UNSET` substring is removed, but the remaining `text` is preserved.
The final_expansion_process is executed during the command setup phase and manages array instability introduced by expansion.
- Node-Specific Expansion: Expansion is deliberately limited to the tokens relevant to the current `t_node` (command arguments). This scoping prevents unintended interference between different commands in a complex pipe or list.
- Simplification: `simplify_tokens` is called to merge adjacent `WORD` tokens and clean up residual `NO_SPACE` tokens left by the tokenizer.
- Synchronization (Hashing): Because simplification, word splitting, and wildcard expansion physically rearrange the token array, the link to the AST is lost. `reconect_nodes_tokens` is called after every array-modifying step, using the tokens' `hash` (the permanent key) to re-establish the connection between the AST node and its corresponding token index (`id`).
- Word Splitting (`split_expansion_result`): If an unquoted expansion resulted in a value with whitespace, this function splits the expanded token into multiple new `WORD` tokens.
- Wildcard Expansion (`expand_wildcards`): The final step performs pattern matching.
- Argument Finalization: The resulting tokens are collected and assembled into the `char **args` array for the executor.
The prompt->before_tokens_type array is a temporary, crucial data structure used to save the original token types immediately before the expansion and simplification process begins.
Necessity for Contextual Splitting: After an `EXPANSION` token is substituted, its resulting type is often changed to `WORD`. The challenge lies in determining if this new `WORD` token should be subject to Word Splitting (splitting the token by internal whitespace) or if it was originally protected by quotes.
The Problem: In Bash, word splitting only applies to an expanded token if the original token was unquoted (e.g., $VAR). If the original token was double-quoted ("$VAR"), its expanded content must remain a single token, even if it contains spaces.
The Solution: The `split_expansion_result` function cannot rely on the token's current type or quoting flags alone. By referencing the saved `before_tokens_type` array, the system can definitively look back and confirm:

- What type was this token originally? (It must have been an `EXPANSION`.)
- What was the surrounding context? (Was it an unquoted `EXPANSION`?)
This look-back mechanism allows Minishell to correctly apply or suppress word splitting, ensuring strict adherence to the complex rules of shell expansion integrity.
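The look-back decision itself is small enough to state directly. This sketch uses simplified enumerators and a hypothetical function name; the rule it encodes is the one above: split only if the original token was an `EXPANSION` and neither quoting flag was set.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the saved before_tokens_type values. */
enum e_orig { O_WORD, O_EXPANSION };

/* Word splitting applies only to the result of an *unquoted* expansion. */
bool	should_word_split(int original_type, bool was_double_quoted,
			bool was_single_quoted)
{
	if (original_type != O_EXPANSION)
		return (false);
	return (!was_double_quoted && !was_single_quoted);
}
```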
The `prepare_simplify` function is a crucial pre-processing step within the `final_expansion_process`. Its sole purpose is to audit the token array for specific boundary conditions created by expansion results (such as a `$VAR` being eliminated) and override the default joining behavior of the upcoming `simplify_tokens` function.
The simplify_tokens function joins tokens that are separated by a NO_SPACE token. However, if an adjacent EXPANSION token results in an empty string and is eliminated, the NO_SPACE might shift and become adjacent to two tokens that should not be joined (e.g., a command and its argument), leading to an incorrect command structure.
`prepare_simplify` iterates through the tokens, focusing on indices where the original token type (`before_tokens_type`) was `EXPANSION` and the current adjacent token is a `NO_SPACE`. It then applies a strict set of contextual rules (`check_cases`) to stabilize the array:
- Protecting Commands: If a `NO_SPACE` is located between a Command/Built-in token and a subsequent token that is not another `NO_SPACE`, the central `NO_SPACE` token is converted to `DONT_ELIMINATE`.
  - Impact: This prevents the command token from being accidentally concatenated with the following word, preserving the command boundary.
- Protecting Literal Words: If a `NO_SPACE` is positioned between two `WORD` tokens, it is converted to `DONT_ELIMINATE`.
  - Impact: This maintains intended separation between the two distinct words, which is necessary if the `NO_SPACE` was part of a complex expression that resolved into two separate words that should not be fused.
- Eliminating Redundancy: If a `NO_SPACE` token is found immediately adjacent to another `NO_SPACE` token, the token at the current index is eliminated immediately.
  - Impact: This tidies the array, removing unnecessary duplication and ensuring clean boundaries. The array is immediately reconnected (`reconect_nodes_tokens`) after elimination to restore the AST link.
By strategically transforming the NO_SPACE token type to DONT_ELIMINATE, Minishell forces simplify_tokens to skip joining that particular segment, thus guaranteeing that the structural integrity of commands and arguments is maintained following dynamic expansion.
The Simplification Phase is the clean-up and consolidation stage of the pipeline. Its primary objective is to finalize word formation by concatenating adjacent, related tokens and removing non-semantic markers (like quotes and NO_SPACE tokens) to produce a dense, accurate sequence of final command arguments ready for the Abstract Syntax Tree (AST) builder.
The central function, simplify_tokens, iterates through the token array, searching for NO_SPACE tokens which act as flags indicating that adjacent content should be fused.
Simplification operates on a range defined by a `NO_SPACE` marker:

- Search: `get_no_space_range` finds the next `NO_SPACE` token in the stream.
- Boundary Definition: It calls `find_range_start` and `find_range_end` (`adjust_range_tokens.c`) to determine the exact index range $[R_{start}, R_{end}]$ of the tokens that need to be merged.
  - Complex Rule Set: The range functions incorporate complex heuristics to correctly identify tokens that belong together, including:
    - Quoted Segments: Tokens enclosed by opening and closing quotes (`SINGLE_QUOTE`, `DOUBLE_QUOTE`) are always included in the range to ensure the entire quoted segment is treated as one unit.
    - Consecutive Quotes: They handle complex scenarios where multiple tokens are involved in a single logical string (e.g., `cmd"word1""word2"`), ensuring the full sequence is captured for joining.
Once a valid range is found:

- Feasibility Check: `is_possible_simplify` ensures the range is viable (contains at least one `NO_SPACE` marker and does not contain unprocessed `EXPANSION` tokens).
- String Fusion: `join_tokens` iterates through the tokens in the identified range. It allocates a new, final string (`result`) and uses `ft_strjoin` repeatedly to concatenate the `value` strings of all tokens within the range.
- Efficiency: The concatenation process targets only tokens that are relevant (`is_needed_to_simplify`), ensuring efficiency by skipping purely structural tokens (like `NO_SPACE` itself).
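Stripped of the token bookkeeping, the fusion step amounts to concatenating every value in the range into one allocation. This sketch operates on a plain string array and omits the `is_needed_to_simplify` filtering; names are illustrative.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Concatenate values[start..end] into one freshly allocated string. */
char	*join_range(char **values, int start, int end)
{
	size_t	total;
	char	*res;
	int		i;

	total = 0;
	for (i = start; i <= end; i++)
		total += strlen(values[i]);
	res = malloc(total + 1);
	if (!res)
		return (NULL);
	res[0] = '\0';
	for (i = start; i <= end; i++)
		strcat(res, values[i]);
	return (res);
}
```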
The most complex task in this phase is managing the array after a fusion operation, as replacing multiple tokens with a single token creates a gap in the array.
The `reorganize_tokens` function manages the transition from the old, fragmented tokens to the new, simplified token:

- Preserve Hash: The `hash` of the token at the start of the range (`tokens[range[0]]`) is saved. This is critical because the new simplified token inherits the original hash, preserving the permanent link to the AST node (established during the Expansion Phase).
- Resource Cleanup: `free_tokens_in_range` is called to free the memory (`value` strings) of all tokens in the old, redundant range.
- Token Replacement: The newly created, concatenated string (`res`) is assigned to `tokens[range[0]].value`, and its type is set to `WORD`.
- Array Collapse: `ft_memmove` is used to shift all tokens following the range ($R_{end} + 1$) forward to fill the created gap.
  - Impact: This is a high-performance operation that physically collapses the array, maintaining contiguous memory.
- State Update: The total token count (`p->n_tokens`) is decreased by the number of tokens removed, and `void_tokens_at_the_end` initializes the memory at the end of the array to zero.
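The collapse step can be demonstrated on a plain `int` array. This sketch assumes, as above, that the slot at the start of the range survives as the fused token, and uses standard `memmove` in place of `ft_memmove`; the function name is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* After fusing [start, end] into slot `start`, shift the tail forward
** with a single memmove. Returns the new element count. */
int	collapse_range(int *arr, int n, int start, int end)
{
	int	removed;

	removed = end - start; /* slot `start` survives as the fused token */
	memmove(arr + start + 1, arr + end + 1,
		(size_t)(n - end - 1) * sizeof(int));
	return (n - removed);
}
```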
Before and during simplification, specific helper functions enforce boundary rules related to delimiters and line endings (reorganize_tokens.c):
- `no_space_at_delimiter`: If a `NO_SPACE` is found immediately preceding a delimiter (e.g., `|`, `&&`), the `NO_SPACE` is immediately eliminated. This prevents logic errors that could arise if a simplified range included a delimiter.
- `no_space_at_end`: If a `NO_SPACE` token is the very last token in the array (a common result of expansion failures), it is eliminated, tidying the final array state.
The final step in simplify_tokens is structural cleanup:
- `remove_quotes_tokens`: This function iterates through the entire array and physically removes all remaining quote tokens (`SINGLE_QUOTE`, `DOUBLE_QUOTE`).
  - Rationale: The quotes have served their purpose: they protected content during tokenization and expansion, and defined the boundaries for simplification. Since their content is now fully fused, the quote markers themselves are non-semantic and are removed using a two-pointer technique (read/write indices) to complete the array collapse.
- Final ID Update: The array size is updated, and `adjust_id` is called one last time to ensure the dynamic `id` of every token accurately reflects its final index in the array, making it ready for the AST builder.
```c
void simplify_tokens(t_shell *data, t_prompt *prompt, t_token *tokens)
{
	int	i;
	int	range[2];

	i = 0;
	while (i < prompt->n_tokens && tokens[i].type)
	{
		if (get_no_space_range(tokens, range, i, prompt->n_tokens))
		{
			if (is_possible_simplify(tokens, range))
			{
				if (no_space_at_end(data, prompt, tokens)
					|| no_space_at_delimiter(data, prompt, tokens))
					return ;
				join_tokens(data, prompt, tokens, range);
				i = range[0] + 1;
				continue ;
			}
		}
		i++;
	}
	remove_quotes_tokens(prompt, tokens);
	adjust_id(tokens, prompt->n_tokens);
}
```

The Transformation Phase is a complex, multi-pass refinement system that runs after simplification to finalize the semantic role of every token. The difficulty lies in the ambiguity of the command line, where a sequence like `VAR=value` can be either an argument (if preceded by a command) or an assignment (if at the start of the line).
The core engineering challenge in this phase is the precise classification of tokens that adhere to the assignment syntax (KEY=VALUE):
The logic employs a two-tiered validation system (transform_word_to_asignation and check_externs_syntax) to ensure a token only retains the ASIGNATION type if both its internal syntax and external context are valid.
- Phase: Initial pass (`INITIAL_PHASE`).
- Logic: The `check_asignation_syntax` function first verifies that the token's `value` string strictly follows assignment rules:
  - It must contain at least one unquoted `=` sign.
  - The variable name (text before the `=`) must begin with an alphabetic character or underscore (`_`) and contain only alphanumeric characters or underscores.
- Action: If the syntax is valid, the token is provisionally converted from `WORD`/`COMMAND` to `ASIGNATION`.
- Phase: Final pass (`FINAL_PHASE`).
- Logic: This is the critical step. `check_externs_syntax` validates the token's environment by checking its neighbors (tokens to the left and right). A token only retains the `ASIGNATION` type if:
  - It is at the start of the command line (`token->id == 0`), or
  - It is preceded by a delimiter (`|`, `&&`, `;`, `(`), or
  - It is preceded by an `EXPORT` built-in.
- Action: If the token fails this contextual check (e.g., if it is preceded by a regular `COMMAND` like `ls`), it is definitively reverted back to `WORD` using `transform_invalid_asig_to_word`.
Result: A token is only considered a definitive, executable assignment if it successfully passes both the internal syntax and external context filters.
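The internal (syntax) half of this filter can be sketched as a predicate over the raw token value. This is an illustrative stand-in for `check_asignation_syntax` (name and signature assumed); note that `+=` forms are handled separately by `transform_asig_to_asig_plus` and are deliberately not accepted here.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Returns 1 if s has the shape NAME=..., where NAME starts with a letter
** or '_' and continues with letters, digits, or '_'. */
int	is_valid_assignment(const char *s)
{
	const char	*eq;
	size_t		i;

	eq = strchr(s, '=');
	if (!eq || eq == s)
		return (0);
	if (!isalpha((unsigned char)s[0]) && s[0] != '_')
		return (0);
	i = 1;
	while (s + i < eq)
	{
		if (!isalnum((unsigned char)s[i]) && s[i] != '_')
			return (0);
		i++;
	}
	return (1);
}
```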
The transform_tokens_logic function executes a detailed, multi-step pipeline to handle all remaining ambiguities.
Once an ASIGNATION is confirmed, its type is further refined for execution:
- Concatenation (`transform_asig_to_asig_plus`): Assignments containing the `+=` operator (e.g., `VAR+=1`) are converted to the specialized type `PLUS_ASIGNATION`. This provides the executor with the precise instruction to append the value rather than replace it.
- Temporal Context (`transform_asig_to_temp`): The logic analyzes whether the assignment is meant for the parent shell or a child process.
  - If an assignment is immediately followed by a command or subshell (e.g., `VAR=1 ls -l` or `VAR=1 (cmd)`), it is converted to `TEMP_ASIGNATION` or `TEMP_PLUS_ASIGNATION`.
  - Impact: This ensures the variable is only applied to the child process environment, accurately emulating Bash's behavior.
Throughout the pipeline, the roles of generic tokens are repeatedly checked against their final context:
| Transformation | Condition & Impact |
|---|---|
| `transform_cmd_to_word` | Argument Reversion: Ensures tokens initially classified as `COMMAND` or `BUILTIN` are downgraded to `WORD` if they appear as arguments (i.e., immediately following another command/builtin). This ensures arguments are collected correctly by the AST builder. |
| `transform_word_to_file` | Redirection Context: Converts `WORD`, `EXPANSION`, or `WILDCARD` tokens that immediately follow a redirection operator (`<`, `>`) into the dedicated `FILENAME` type (or `DELIMITER` for `<<`). |
| `transform_word_to_wildcard` | Wildcard Typing: Checks for unquoted tokens containing the `*` character and converts them to the `WILDCARD` type, flagging them for pattern matching during the execution phase. |
| `transform_cmd_to_built_in` | Case Normalization: Performs a final check to confirm if a `COMMAND` is actually a `BUILTIN` and converts the value to lowercase for standardized matching. |
This rigorous multi-pass system is vital to stabilize the token array, ensuring that the AST Builder receives a syntactically and semantically unambiguous list of command arguments and operators.
```c
void	transform_tokens_logic(t_shell *data, t_prompt *prompt, t_token *tokens)
{
	transform_cmd_to_built_in(data, prompt, tokens);
	transform_cmd_to_word(data, tokens, INITIAL_PHASE);
	transform_word_to_asignation(data, tokens, INITIAL_PHASE);
	transform_word_to_asignation(data, tokens, FINAL_PHASE);
	transform_cmd_to_word(data, tokens, FINAL_PHASE);
	transform_invalid_asig_to_word(prompt, tokens);
	transform_asig_to_asig_plus(prompt, tokens);
	transform_asig_to_temp(prompt, tokens);
	transform_word_to_file(prompt, tokens);
	transform_word_to_wildcard(prompt, tokens);
	transform_command_built_lowercase(prompt, tokens);
	transform_cmd_to_built_in(data, prompt, tokens);
}
```

The Wildcard Expansion Phase is the final array-modifying step in the token processing pipeline. Its role is to take a single `WILDCARD` token (containing the `*` pattern) and replace it with a sorted list of matching filenames from the current directory, if any exist.
The process is executed in both the `INITIAL_PHASE` (to check basic validity) and the `FINAL_PHASE` (to perform the actual expansion and array modification).
The expansion of a wildcard token is an intensive, multi-step process that uses a temporary structure, `t_wild`, to store pattern-matching details.
```c
int	process_wildcard(t_shell *data, t_token *token)
{
	char	**new_tokens;
	int		n_dirs;

	new_tokens = NULL;
	n_dirs = 0;
	if (token->double_quoted)
		return (FAILURE);
	init_wildcard(data, &token->wildcard_info);
	if (!extract(data, token, token->wildcard_info))
		return (FAILURE);
	if (!matches(data, token, token->wildcard_info, &n_dirs))
		return (FAILURE);
	new_tokens = find_matches(data, token->wildcard_info, n_dirs);
	if (!new_tokens)
	{
		free_wildcard(&token->wildcard_info);
		return (FAILURE);
	}
	free_wildcard(&token->wildcard_info);
	rebuild_tokens(data, token, new_tokens, n_dirs);
	ft_free_str_array(&new_tokens);
	return (SUCCESS);
}
```

- Initialization: A temporary `t_wild` structure is initialized on the token.
- Extraction (`extract_wildcard.c`): The token's raw value (e.g., `*.c` or `temp*file`) is analyzed by `extract_wildcard` to determine the pattern type:
  - `ALL` (`*`): Matches all files.
  - `BEGINING` (`*file`): Matches files ending with `file`.
  - `END` (`file*`): Matches files beginning with `file`.
  - `COMPLEX` (`file*part*ext`): Contains multiple internal wildcards.
- Dotfiles Rule (`should_ignore_file`): The `t_wild` structure tracks whether the pattern explicitly begins with a dot (`.`). This adheres to the Bash rule: if the pattern doesn't start with a dot, ignore files that start with a dot.
- Count Matches (`count_matches.c`): The system opens the current directory (`opendir(".")`) and iterates through its entries (`readdir`).
  - Matching Logic: For each file, specialized logic checks for a match based on the pattern type (`if_theres_match` for simple patterns; `handle_complex_case` for complex patterns).
  - Ambiguity Check: Before array modification, a critical syntax safety check is performed: if the wildcard is preceded by a redirection operator (e.g., `> *.txt`) and results in more than one match (`n_dirs > 1`), an `ERR_AMBIGUOUS_REDIR` syntax error is raised. This is standard Bash behavior.
- Find Matches (`find_matches.c`): If the count is valid, `find_matches` performs a second directory traversal to allocate and populate a `char **dirs` array with the names of all matching files.
If matches are found, the single `WILDCARD` token must be replaced by the array of matching filenames.
- Reorder Tokens (`reorder_tokens.c`): This is the memory-intensive operation that replaces the token:
  - It creates a larger, temporary array (`t_token *tmp`).
  - Copies the original tokens up to the wildcard's position.
  - Copies the new tokens: inserts the matched filenames (`dirs`) as new `WORD` tokens into the array.
  - Copies the remaining tokens: shifts and copies the tokens that followed the original wildcard.
  - Hash Integrity: When inserting new tokens, a unique `hash` is generated for each using `create_hash` to guarantee no collisions occur with existing tokens, which is essential for AST integrity.
  - The old token array is freed, and the prompt's pointer is updated to the new array.
- Synchronization: After the array is rebuilt, the tokens' dynamic indices are adjusted (`adjust_id`), and `reconect_nodes_tokens` is called to update the pointers in the AST, restoring the execution context.
- No Match Scenario: If `n_dirs` is zero, the original `WILDCARD` token is left untouched in the array, preserving its literal value for the next phase.
For complex patterns containing multiple internal wildcards (e.g., `*part1*part2*`), a more involved matching algorithm is used:
- Pattern Splitting: The complex wildcard string is split into an array of required literal substrings (e.g., `part1`, `part2`).
- Sequential Match: The algorithm verifies that all literal substrings exist within the target filename and appear in the correct sequential order.
- Pointer Comparison: It uses `ft_intstr_match` to find the starting index of each literal part within the filename and stores these indices in an array (`result`).
- Order Validation: A final check (`compare_match_order`) ensures that the indices stored in `result` are strictly ascending.
  - Impact: This guarantees that the matched file (e.g., `a_part1_b_part2_c`) adheres to the strict ordering enforced by Bash for complex wildcard expressions.
The AST Construction phase translates the simplified, validated token stream into a hierarchical structure that models the logical flow and dependencies of the user's command line. This is achieved using a Recursive Descent Parser based on operator precedence.
The parser functions are structured to reflect the standard Unix shell precedence rules. The ast_builder function initiates the process by calling the function with the lowest binding strength (parse_sequence), ensuring the correct tree structure is built from the top-down.
| Precedence Level | Function | Operators Handled | Role in the Tree |
|---|---|---|---|
| Lowest (1) | `parse_sequence` | `;` (SEMICOLON) | Groups commands for sequential, unconditional execution. |
| 2 | `parse_and_or` | `&&`, \|\| (AND, OR) | Groups pipelines based on the logical success/failure of the left side. |
| 3 | `parse_pipes` | \| (PIPE) | Groups commands for chained execution, directing output to the next command's input. |
| Highest (4) | `parse_subshell` | `(`, `)` | Handles subshells, real assignments, and delegates to the final command parser. |
| Leaf Node | `parse_cmd` | `COMMAND`, `BUILTIN`, `WILDCARD` | Creates the final executable nodes. |
The parser functions follow a standard pattern: they attempt to parse a lower-precedence component as the left child, and if they find their operator (e.g., PIPE), they create a central node with that operator type and recursively parse the right child.
These functions are responsible for creating the executable nodes that form the leaves (or internal wrappers) of the AST.
The `parse_cmd` function is versatile, handling not just standard commands but also special cases and implicit actions:
- Special Cases (`special_cases`): This helper detects command lines that consist solely of redirections or temporary assignments (e.g., `VAR=1 > file.txt`). If found, it creates a placeholder command node (a `true` node) specifically to execute the redirections and assignments without an explicit executable.
- Token Inclusion: It creates the central node for `COMMAND`, `BUILTIN`, `WILDCARD`, or local assignments (`is_asignation_type`).
- Wildcards: Tokens classified as `WILDCARD` or `EXPANSION` are correctly parsed as command nodes. This design choice handles the edge case where an unexpanded wildcard (e.g., `*`) acts as the command itself, deferring the final expansion and error check to the executor.
```c
t_node	*parse_cmd(t_shell *data, t_token *tokens, int *i, int n_tokens)
{
	t_node	*central;

	central = NULL;
	central = special_cases(data, tokens, i, n_tokens);
	if (central)
		return (central);
	if (*i < n_tokens && tokens[*i].type
		&& (is_cmd_builtin_type(tokens[*i].type)
			|| is_real_assignation_type(tokens[*i].type)
			|| is_redir_type(tokens[*i].type) || tokens[*i].type == WILDCARD
			|| tokens[*i].type == EXPANSION))
	{
		index_redir_input(tokens[*i].type, i, n_tokens);
		central = create_node(data, &tokens[*i], tokens[*i].type);
		if (!central)
			return (NULL);
		if (is_asignation_type(tokens[*i].type))
			return (safe_index_plus(i, data->prompt.n_tokens), central);
		get_information(data, tokens, i, central);
		if (data->error_state == TRUE || check_signal_node_heredoc(central))
			return (clean_node(&central), NULL);
		return (central);
	}
	return (central);
}
```

- Subshells: If `parse_subshell` encounters a `PAREN_OPEN` token, it creates a `SUBSHELL` node. It recursively calls `parse_sequence` to build the entire command structure inside the parentheses as its left child.
- Assignments: If the current token is a real assignment (`ASIGNATION`, `PLUS_ASIGNATION`), `parse_assignations` is called. This function groups multiple consecutive assignments by chaining them together with `;` (`SEMICOLON`) nodes.
  - Example: `VAR=1 VAR+=2` is converted into an AST structure equivalent to `VAR=1 ; VAR+=2`, ensuring they are executed sequentially.
After the central command node is created, `get_information` gathers all peripheral metadata required for execution.
- Context Scoping: Temporary assignments (e.g., `VAR=1 ls -l`) only apply to the command they precede.
- Logic: `get_temp_asignations` searches backward from the command's index to locate and extract tokens flagged as `TEMP_ASIGNATION`. It builds a `char **assig_tmp` array on the command node.
- Impact: This metadata is crucial for the executor, which must apply these environment variables only to the child process executing the current command.
- Redirection List: `get_redirs` iterates through the command's token range, identifies all redirection operators (`<`, `>`, `>>`, `<<`), and builds a linked list of `t_redir` structures on the node's `redir` member.
- Heredoc Management: If a `REDIR_HEREDOC` (`<<`) is found:
  - `get_heredoc` is called to enter the prompt loop (`loop_heredoc`).
  - The user's input lines are collected and stored in a linked list (`t_list *heredoc_lines`) within the `t_redir` structure.
  - Expansion Rule: `expand_heredoc` checks whether the delimiter itself was quoted. If the delimiter was unquoted, the `redir->expand` flag is set to `TRUE`, indicating that variables within the collected lines must be expanded later during execution.
- Binary Arguments (`get_args_for_binary.c`): This function iterates forward from the command token to collect all subsequent `WORD` tokens (excluding redirections, temporary assignments, and delimiters). It builds the standard `char **args` array for the executor (e.g., `{"ls", "-l", NULL}`).
- Argument Types (`get_arg_types.c`): For specialized built-ins like `export`, the executor needs to know the context of each argument. `get_arg_types` creates an auxiliary array (`int *arg_types`) that stores the original token `id` for each argument.
  - Impact: This preserves the original semantic type (e.g., `ASIGNATION`, `PLUS_ASIGNATION`) for `export` arguments, allowing the built-in logic to correctly process and update the environment.
- Background Operator: The parser checks whether the final token after all arguments and redirections is a `BACKGROUND` operator (`&`). If present, it sets the node's `background` flag to `TRUE` and advances the index past the operator.
The Execution Phase is the runtime core of Minishell. It traverses the Abstract Syntax Tree (AST) constructed by the parser, interprets the logical structure (pipes, AND/OR, subshells), and manages process creation (fork), I/O redirection, and signal handling to execute commands and built-ins.
The main function, executor_recursive, traverses the AST starting from the root node. It delegates the execution based on the node's type, strictly following the precedence defined by the AST structure.
```c
void	executor_recursive(t_shell *data, t_node *node, t_exec *exec, int mode)
{
	// ... node->executed = true; ...
	if (node->type == SEMICOLON)
		exec_semicolon(data, node, exec, mode);
	else if (node->type == PIPE)
		exec_pipe(data, node, exec, mode);
	// ... other node types ...
}
```

Execution for logical and sequence nodes is straightforward, using the exit code (`data->exit_code`) to control flow:
| Node Type | Execution Function | Logic |
|---|---|---|
| `SEMICOLON` (`;`) | `exec_semicolon` | Executes the left child, then the right child, regardless of the exit code. (Sequential execution) |
| `AND` (`&&`) | `exec_and` | Executes the left child. Only executes the right child if `data->exit_code == 0` (success). |
| `OR` (\|\|) | `exec_or` | Executes the left child. Only executes the right child if `data->exit_code != 0` (failure). |
Minishell uses a precise system to determine the execution context based on the current mode (`FATHER`, `CHILD`, `SUBSHELL`).
| Execution Mode | Context | Process Creation (fork) |
|---|---|---|
| `FATHER` | Top-level command or command after a sequence operator (`&&`, \|\|, `;`). | External commands are forked via `execute_cmd_from_father`; built-ins run directly in the parent. |
| `CHILD` | Command is part of a pipe (\|). | Required: each side of the pipe runs in its own forked child. |
| `SUBSHELL` | Command is part of a `SUBSHELL` node. | Required: a new child process is forked by `exec_subshell`. |
Before any command or built-in is executed (in the parent or child process), `apply_properties` is called to set up the necessary environment:
- Temporary Assignments: If the node has `assig_tmp` (temporary assignments like `VAR=1 ls`), `apply_temp_asig` adds them to the child's environment variables.
- Redirections: If the node has `redir` structures, `apply_redirs` handles file opening (or pipe duplication) and redirects standard file descriptors (`STDIN`, `STDOUT`, `STDERR`) using `dup2`.
This is the path for external executables found in `PATH` (e.g., `/bin/ls`) or commands that require a full search.
- Final Transformation: The crucial `final_expansion_process` is called first. This executes all deferred expansions, word splitting, and wildcard expansion specifically for the current command's arguments.
- Process Control: Based on `mode`, either `execute_cmd_from_father` (if `mode == FATHER`) or `execute_cmd_from_child` (if `mode == CHILD`) is called.
  - Child Process Logic: The child process 1) calls `apply_properties`, 2) finds the executable path (`get_path`), 3) updates the `_` environment variable, and 4) executes the command using `execve`.
- Background Control (`wait_cmd_background`):
  - If `node->background` is `FALSE`, the parent process calls `waitpid` to wait for the child's termination and retrieves the final exit status.
  - If `node->background` is `TRUE` (`&` operator), the parent prints the PID and does not wait, returning immediately to the interactive shell.
Built-in commands (like `cd`, `export`, `exit`) are handled separately because they must modify the state of the parent shell process.
- Foreground Execution: If the node is not in the background, the built-in is executed directly in the current process (`FATHER` or `SUBSHELL` process) without a `fork`. This ensures changes to the environment (`export`, `unset`) or shell state (`cd`, `exit`) persist.
- Background Execution: If `node->background` is `TRUE`, a `fork()` is required (`hanlde_background_exec`). The built-in is executed in the child process, and the parent prints the PID and continues. This sacrifices persistence (e.g., `cd` in the background won't affect the parent) but preserves process control.
- Built-in Selection: The function `which_builtin` delegates control to the correct internal function (e.g., `my_export`, `my_cd`, `my_echo`) based on the token's value.
Pipe nodes (`|`) execute commands concurrently using standard Unix pipelining:
- Pipe Creation: `pipe(pipefd)` creates the pipe.
- Forking: Two child processes are forked, one for the left command and one for the right command.
- I/O Duplication (`handle_child`):
  - Left Child: Duplicates the write end of the pipe (`pipefd[1]`) to `STDOUT_FILENO`.
  - Right Child: Duplicates the read end of the pipe (`pipefd[0]`) to `STDIN_FILENO`.
- Waiting: The parent process closes the pipe descriptors and uses `waitpid` to wait for both children, setting the final exit code based on the rightmost command's status.
Subshells (`(cmd1 | cmd2)`) force the enclosed command structure to run in a separate process:
- Forking: A single child process is forked.
- Execution: The child process applies redirections and recursively calls `executor_recursive` on the subshell's content (`node->left`).
- Exit: The child process terminates using `exit_succes` with the exit code resulting from the sub-command execution.
- Waiting: The parent waits for the subshell process, retrieves the final status, and handles any signal termination.
The shell's I/O management system implements the behavior of all redirection operators via the `apply_redirs` function, which is called before any command execution takes place.
The `apply_redirs` function iterates through the linked list of `t_redir` structures attached to the current command node. For each redirection, it performs the following:
- Ambiguity Check: It calls `check_ambiguous_redir` to ensure the filename (often resulting from expansion) does not resolve to multiple files, which would constitute an ambiguous-redirect error.
- File Descriptor Handling: Based on the redirection type (`type`), it calls the appropriate handler function to open the target file and duplicate the relevant file descriptor (`dup2`).
- Error Propagation: If opening a file fails (due to permissions, `EACCES`, or file not found, `ENOENT`), the function prints the specific error message and returns `FAILURE`. If this occurs in a child process (`mode == CHILD`), the child terminates immediately via `exit_error`.
| Type | Function | `open` Flags | Action |
|---|---|---|---|
| `REDIR_OUTPUT` (`>`) | `handle_redir_output` | `O_WRONLY \| O_CREAT \| O_TRUNC` | Creates or truncates the file and redirects standard output to it. |
| `REDIR_APPEND` (`>>`) | `handle_redir_append` | `O_WRONLY \| O_CREAT \| O_APPEND` | Opens the file for appending and redirects standard output to it. |
| `REDIR_INPUT` (`<`) | `handle_redir_input` | `O_RDONLY` | Opens the file for reading. Redirects to `fd_redir` (default `STDIN_FILENO`). |
The heredoc operator requires dynamic I/O creation to pass multi-line user input to the command.
- Pipe Creation: A temporary pipe (`pipe(pipe_fd)`) is created. This pipe serves as the virtual file that will hold the heredoc content.
- Content Expansion and Writing: The function iterates through the lines previously collected from the user (`redir->heredoc_lines`).
  - Conditional Expansion: If the original delimiter was unquoted (`redir->expand == TRUE`), the line is sent to `expand_line_heredoc` for environment-variable substitution before being written.
  - Each line is written sequentially to the write end of the pipe (`pipe_fd[1]`).
- I/O Redirection:
  - The write end (`pipe_fd[1]`) is closed after all content is written.
  - The read end (`pipe_fd[0]`) is duplicated onto `STDIN_FILENO` using `dup2`.
- Cleanup: The temporary read end of the pipe is closed. The command process now reads its input directly from the pipe, which contains the buffered heredoc text.
This function performs variable substitution specifically on the lines collected within the heredoc:
- The input line is treated as a temporary token (`t_token`).
- It uses a simplified version of the main expansion logic to find and substitute `$` variables (e.g., `$USER`) and special variables (e.g., `$?`) within the line.
- The expanded line is returned and subsequently written to the pipe. Unlike the main shell expansion, heredoc expansion is a single-step, line-by-line substitution without complex features like word splitting or array reorganization.
The Assignment Engine is the subsystem responsible for parsing assignment tokens, validating their syntax and context, determining their scope (local, exported, or temporary), and managing how their values interact with the shell's environment (the `t_var` linked list).
The central function, `asignation(t_shell *data, t_token *token, int type)`, orchestrates the process of converting a token into a persistent environment variable.
- Memory Allocation: Dedicated memory buffers are allocated for the `key` (variable name) and `value` (variable content).
- Extraction: Utility functions (`aux_key_asig`, `aux_value_asig`) carefully extract the key and value from the raw token string (`token->value`):
  - Key Extraction: Stops at the first unquoted `=` sign, while accounting for the `+` in `+=`.
  - Value Extraction: Collects all characters after the `=` sign.
  - Impact: This step separates the semantic components needed for environment manipulation.
Before creating a new variable, the system must check whether the variable already exists using `verify_if_already_set`.
- Lookup: The environment linked list (`data->env.vars`) is traversed using `ft_strcmp` on the `key`.
- Update Logic (`handle_existing_value`): If the variable is found, its value and type are updated based on the assignment mode (`t`):
  - Standard Assignment (`LOCAL`, `ENV`, etc.): The old value is freed, and the new value is stored.
  - Concatenation (`PLUS_ASIGNATION`): `handle_plus_assignation` concatenates the new value onto the existing value using `ft_strjoin`.
  - Type Promotion: If a variable currently marked as `LOCAL` is assigned via an `EXPORT` command, its type is promoted to `ENV` (`update_variable_type`), ensuring it persists in child processes.
- Result: If the variable exists and is updated, the function returns `TRUE`. If not found, it returns `FALSE`, triggering the creation of a new variable.
If the variable does not exist, `add_var_and_envp` is called to create a new `t_var` node and append it to the environment list.
- Type Determination: If the original token type was `PLUS_ASIGNATION` and the variable is new, `is_it_env_or_local` performs a backward search to determine whether the assignment belongs to an `EXPORT` context (`ENV`) or a general command-line context (`LOCAL`). This resolves the ambiguous nature of a new `+=` assignment.
Minishell meticulously tracks variable scope to emulate Bash's environment persistence rules.
| Type | Context | Persistence | Example |
|---|---|---|---|
| `LOCAL` | `VAR=value` at the start of the line or after a delimiter, without `export`. | Parent shell only. Used for internal shell variables; not passed to `execve`. | `MYVAR=1; echo $MYVAR` |
| `ENV` | `export VAR=value` | Exported. Passed to child processes and visible in `env`. | `export MYVAR=1` |
| `TEMP_ASIGNATION` | `VAR=value cmd` | Temporary. Applied only to the environment of the immediate child process executing `cmd`, then cleaned up in the parent. | `TEMP=1 ls` |
| `PLUS_ASIGNATION` | `VAR+=value` (concatenation) | Handled by `handle_plus_assignation` to append values. | `VAR=a; VAR+=b` |
The `transform_asig_to_temp` logic handles the transition to temporary assignments, which is crucial for managing the environment stack during execution:
- If an assignment is followed by a command, a built-in (other than `export`), or a subshell, its type is converted to `TEMP_ASIGNATION`.
- During execution, the `exec_command` and `exec_builtin` functions use `clean_temp_variables` to remove these variables from the parent shell's environment immediately after the child process finishes, ensuring the parent's environment remains unchanged.
Before an assignment is processed, its structure must be validated to ensure it adheres to shell naming conventions.
| Rule | Function | Description |
|---|---|---|
| Valid Key Name | `check_invalid_char` | The variable key must start with a letter or `_`, and may only contain alphanumeric characters or `_`. |
| Presence of `=` | `count_syntax` | Ensures the token contains at least one `=` and that there is text preceding it. |
| Export Syntax | `check_invalid_char_exp` | A simplified check for the format `export VAR` (type `EXP`), ensuring no illegal characters are present anywhere in the string. |
```c
int	asignation(t_shell *data, t_token *token, int type)
{
	char	*key;
	char	*value;
	int		result;
	int		i;

	key = NULL;
	value = NULL;
	i = 0;
	if (aux_mem_alloc_asignation(&key, &value, ft_strlen(token->value)) == ERROR)
		exit_error(data, ERR_MALLOC, EXIT_FAILURE);
	aux_key_asig(token, &key, &i);
	aux_value_asig(token, &value, &i);
	result = verify_if_already_set(data, key, &value, type);
	if (result == TRUE || result == IGNORE)
	{
		free(key);
		free(value);
	}
	else if (result == FALSE)
	{
		if (type == PLUS_ASIGNATION)
			is_it_env_or_local(data, &type, token->id);
		add_var_and_envp(data, key, value, type);
	}
	return (0);
}
```

The Assignment Engine is arguably the hardest extra feature of the Minishell project because it requires managing the shell's persistent global state while simultaneously respecting transient, local process environments. Unlike simple command execution (which relies on `fork`/`execve`), assignments force the shell to become a data manager with complex scoping rules.
- Scope Interception: The shell must intercept a token (`VAR=value`) and decide its destiny before it is ever seen as a command argument. This requires two-tiered validation:
  - Syntactic Complexity: Determining whether a token like `VAR+=1` or `VAR=1` is correctly formatted (`check_asignation_syntax`).
  - Contextual Dependency: Determining whether the token's neighbors permit it to be an assignment (e.g., it must not follow an unexported command), which is handled by the `check_externs_syntax` logic.
- Environment State Integrity: Changes made by built-ins (`export`, `unset`) must modify the parent process's state. Failing to execute these directly in the parent would prevent the changes from persisting.
- The Temporary Assignment Challenge: This is the highest hurdle. The shell must distinguish between:
  - `export VAR=1`: Persists everywhere.
  - `VAR=1`: Persists only in the parent shell (`LOCAL`).
  - `VAR=1 ls`: Persists only for the `ls` child process (`TEMP_ASIGNATION`).
The solution, which involves flagging variables as TEMP and ensuring they are cleaned up immediately after the child process finishes, is essential for correct emulation of Bash's environment inheritance.
```c
int	verify_if_already_set(t_shell *data, char *key, char **value, int t)
{
	t_var	*var;
	int		result;

	result = 0;
	var = data->env.vars;
	while (var)
	{
		if (ft_strcmp(var->key, key) == 0)
		{
			result = handle_existing_value(data, var, value, t);
			if (result == IGNORE)
				return (IGNORE);
			else if (result == ERROR)
			{
				free(key);
				free(*value);
				exit_error(data, ERR_MALLOC, EXIT_FAILURE);
			}
			update_variable_type(var, t);
			return (TRUE);
		}
		var = var->next;
	}
	return (FALSE);
}
```

Minishell uses seven distinct variable types (including two reserved for expansion purposes) to accurately track variable state, scope, and modification intent within the environment linked list.
| Type | Context/Purpose | Persistence Scope | Modification |
|---|---|---|---|
| `ENV` | Created by `export VAR=value`. | Persistent. Visible to the parent shell and inherited by all child processes via `execve`. | Standard assignment (`=`). |
| `LOCAL` | `VAR=value` at the start of the line, outside of `export`. | Parent shell only. Used by the shell internally; not automatically passed to children via `execve`. | Standard assignment (`=`). |
| `EXP` | Created by `export VAR` (without a value). | Exported, but the value is `NULL`. Marks the variable for export, but it holds no value until assigned. | Used for initial setup in the `export` built-in. |
| `PLUS_ASIGNATION` | Used internally by the engine to flag the intent to concatenate (`+=`). | Not a storage type; a behavioral flag used by `verify_if_already_set` to trigger string appending (e.g., `VAR=a; VAR+=b` results in `VAR=ab`). | Concatenation (`+=`). |
| `TEMP_ASIGNATION` | Identified as `VAR=value cmd` (precedes an external command). | Transient/child-specific. Added to the child's environment before `execve` and removed immediately upon return to the parent shell. | Standard assignment (`=`). |
| `TEMP_PLUS_ASIGNATION` | Identified as `VAR+=value cmd`. | Transient/child-specific. Concatenates the value and applies it only to the child's environment. | Concatenation (`+=`). |
- Promoting Variables: If a variable is currently `LOCAL` but is subsequently assigned using `export`, its type is automatically promoted to `ENV` (`update_variable_type`), reflecting its new persistent scope.
- Cleaning Temporary State: The `exec_command` and `exec_builtin` functions are responsible for calling `clean_temp_variables` after a child process exits. This cleanup step is vital: it iterates through the environment list and removes all variables flagged as `TEMP_ASIGNATION` or `TEMP_PLUS_ASIGNATION`, restoring the parent environment's original state.
Minishell's built-in commands are vital for core shell functionality and environment management. Unlike external commands, they are executed directly within the running shell process (`exec_builtin`), ensuring their effects (e.g., changing the directory or environment variables) persist.
The `my_export` function orchestrates argument processing, scope promotion, and conditional variable listing.
- Conditional Listing: If the command is called without arguments (i.e., `node->arg_types` is `NULL`), `my_export` immediately calls `print_env_variables` to iterate over the entire environment list (`t_env`) and output only variables flagged as `ENV` or `EXP`, in the `declare -x` format.
- Argument Processing Loop: If arguments are present, the function loops through the `node->arg_types` array, which contains the indices of the arguments in the main token array.
- Assignment Delegation: For each argument index, it calls `asignation_type` to delegate variable creation or update based on the token's classification.
- Context Boundary Check: The loop includes a critical `check_for_valid_args` check to ensure processing stops immediately if it encounters a delimiter (`PIPE`, `AND`, `OR`, `PAREN_OPEN`). This respects the AST's logical boundaries.
The export flow determines the variable's ultimate persistence and value mechanism via the central `asignation` function, delegating through the `type` parameter:
| Token Type | Assignment Delegation | Persistence Flow |
|---|---|---|
| `ASIGNATION` (`VAR=value`) | Type `ENV` | Promotion: The variable is added/updated in the environment and flagged as `ENV`, ensuring it is passed to all future child processes via `execve`. |
| `WORD` (`VAR`) | Type `EXP` | Exported but unset: The variable is added to the environment list and flagged as `EXP`. It holds a `NULL` value until a standard assignment occurs later, but its status as an exported variable is secured. |
| `PLUS_ASIGNATION` (`VAR+=value`) | Type `PLUS_ASIGNATION` | Concatenation: The assignment engine uses the `handle_plus_assignation` logic in `verify_if_already_set` to append the new value to the existing variable's content. The final variable type is promoted to `ENV`. |
The `node->arg_types` array is an engineering solution for managing arguments in the AST.
- Function: It is an array of integers that stores the dynamic index (`id`) of each argument token relative to the main token array.
- Necessity: For `export`, arguments are not just simple words; they must be `ASIGNATION` or `WORD` tokens whose original syntax and value must be retrieved. By storing the index, `my_export` can efficiently look back into the primary token array to access the exact `token->value` (e.g., the string `"MYVAR=10"` or `"MYVAR"`) and its semantic type.
Wildcard tokens within `export` arguments require specialized, in-place processing to ensure shell rules are followed:
- Pre-check: If a token is detected as a `WILDCARD` (e.g., `export *VAR=value`), the function calls `expand_wildcards` on that specific token.
- In-Place Expansion: `expand_wildcards` replaces the single wildcard token with a list of zero or more matching `WORD` tokens (if any matches are found).
- Post-Expansion Check: The resulting tokens (which may be a list of filenames or the original unexpanded string) are then immediately re-checked for valid assignment syntax using `check_asignation_syntax`.
- Error Handling: If the expanded result fails the assignment syntax check, an `ERR_EXPORT` is printed, but processing continues for the other arguments. This flow ensures the `export` command handles dynamically generated strings correctly.
The cd (change directory) command is one of the most state-intensive built-ins, requiring careful management of path and environment variables.
- Argument Validation: `my_cd` first checks that it receives exactly one valid argument (or zero, for HOME).
- Path Resolution:
  - If called without arguments, it defaults to the path stored in the `HOME` environment variable.
  - If called as `cd -`, it attempts to move to the previous directory stored in `OLDPWD`.
- Directory Validation (`validate_and_move`): Before calling the system function `chdir()`, it performs rigorous checks:
  - Existence: `stat()` confirms the target path exists.
  - Type: `S_ISDIR()` confirms the path points to a directory, not a file.
  - Permissions: `access()` confirms the shell has execute permission (`X_OK`).
- State Update (Critical): After a successful `chdir()`:
  - The value of the existing `PWD` variable is copied to `OLDPWD` (using the path before the move).
  - The current working directory is retrieved using `getcwd()` and used to update the new value of `PWD`. This ensures the shell's internal environment accurately reflects the actual filesystem location.
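The directory-validation sequence can be sketched as follows (the function name and error codes here are illustrative, not the project's actual `validate_and_move`):

```c
#include <sys/stat.h>
#include <unistd.h>

/* Returns 0 if `path` exists, is a directory, and is searchable;
 * a negative code otherwise. Mirrors the stat/S_ISDIR/access checks
 * described above. */
int check_target_dir(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)      /* existence */
        return (-1);
    if (!S_ISDIR(st.st_mode))      /* must be a directory, not a file */
        return (-2);
    if (access(path, X_OK) != 0)   /* execute (search) permission */
        return (-3);
    return (0);
}
```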
The unset built-in permanently removes variables from the shell's environment.
- Argument Validation: It checks that argument names adhere to variable naming rules (letters/underscore followed by alphanumeric/underscore).
- Deletion: For each valid argument name, it calls `delete_var` on the environment linked list (`t_env`). This function safely removes the corresponding `t_var` node, updates the `prev` and `next` pointers, and frees the associated memory (key and value).
- Impact: Deleting the node removes the variable from the parent shell's memory, ensuring it is no longer visible to subsequent commands or inherited by future child processes.
The following sections take a deeper look at the core logic of the export built-in and the crucial environment cleanup utility, my_clean_unset, highlighting how they manage the shell's state and memory.
The my_clean_unset function is a non-user-facing, internal utility critical for enforcing process isolation and the Transient Scope rules of temporary assignments. It ensures that temporary variables do not pollute the parent shell's environment.
When the shell executes a temporary assignment (e.g., `VAR=1 ls`):
- `VAR=1` is correctly added to the child's environment before `ls` runs.
- The parent shell, however, must ensure that `VAR=1` is removed from its own environment immediately after the `ls` command finishes. If it didn't, `VAR` would persist as a "leak."
The function operates on the list of temporary assignments collected on the command node (e.g., `node->assig_tmp`) and works in conjunction with the my_unset logic.
- Iteration over Temp Tokens: `my_clean_unset` iterates through all tokens that were identified as `TEMP_ASIGNATION` or `TEMP_PLUS_ASIGNATION` for the recently executed command.
- Key Extraction: For each temporary token, it carefully extracts the variable key (the name before the `=` or `+` sign) by searching for the delimiter within the token's value and copying the substring before it.
- Runtime Deletion: It calls the core deletion function `delete_var` on the parent shell's global environment list using the extracted key.
- Impact: This action enforces the transient nature of the variable. By calling `delete_var` after the child process (or built-in) finishes, the shell guarantees that the variable only existed for the duration of that single command, keeping the parent shell's environment clean and stable.
This entire mechanism guarantees that variables flagged as TEMP will not affect the global state, ensuring a clean and logical flow that respects the parent shell's environment integrity.
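The key-extraction step described above (copying the substring before the `=` or `+` delimiter) can be sketched as a small helper; this is an illustrative function, not the project's exact code:

```c
#include <stdlib.h>
#include <string.h>

/* Copies the variable name that precedes the first '=' or '+' in a
 * temporary-assignment token such as "VAR=1" or "VAR+=x". Returns a
 * heap-allocated key the caller must free. */
char *extract_key(const char *token_value)
{
    size_t len = 0;

    while (token_value[len] && token_value[len] != '='
        && token_value[len] != '+')
        len++;
    char *key = malloc(len + 1);
    if (!key)
        return (NULL);
    memcpy(key, token_value, len);
    key[len] = '\0';
    return (key);
}
```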
The remaining built-in commands handle simpler tasks related to I/O and shell termination:
| Built-in | Function | Core Logic |
|---|---|---|
| `echo` | `my_echo` | Prints arguments to STDOUT. Implements special logic to detect and handle the `-n` flag and its variations (e.g., `-nnnn`), suppressing the final newline character if found. |
| `pwd` | `my_pwd` | Prints the current working directory. Primarily uses the system call `getcwd()`. Includes fallback logic to use the `PWD` environment variable if `getcwd()` fails (e.g., if the current directory was deleted). |
| `env` | `my_env` | Prints the environment variables. Iterates through the `t_var` list and prints only variables marked with the `ENV` type in `KEY=VALUE` format. Fails if any arguments are provided. |
| `exit` | `my_exit` | Terminates the shell. Validates the number of arguments (must be 0 or 1). If one argument is provided, it verifies the argument is a valid numeric value and uses it as the exit status (modulo 256). Sets the shell's exit status and calls `exit_succes` for final termination. |
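The numeric-validation and modulo-256 rule in the `exit` row can be sketched as follows (a hedged stand-in, not the project's `my_exit`, which also prints its own error messages):

```c
#include <stdbool.h>
#include <stdlib.h>

/* Parses an exit argument as described above: if `arg` is a valid
 * (optionally signed) decimal number, stores its value reduced modulo
 * 256 in *status and returns true; otherwise returns false. */
bool parse_exit_status(const char *arg, unsigned char *status)
{
    char *end;
    long  val;

    val = strtol(arg, &end, 10);
    if (end == arg || *end != '\0')
        return (false);               /* not a valid numeric value */
    *status = (unsigned char)(val % 256);
    return (true);
}
```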
```c
static void asignations(t_shell *data, t_token *token)
{
	if (token->type == ASIGNATION)
		data->exit_code = asignation(data, token, LOCAL);
	else if (token->type == PLUS_ASIGNATION)
		data->exit_code = asignation(data, token, PLUS_ASIGNATION);
}

static void env_cmds(t_shell *data, t_env *env, t_token *token, t_node *node)
{
	if (ft_strcmp(token->value, BUILTIN_EXPORT) == 0)
		data->exit_code = my_export(data, data->prompt.tokens, env, node);
	else if (ft_strcmp(token->value, BUILTIN_UNSET) == 0)
		data->exit_code = my_unset(data, env, node->args);
	else if (ft_strcmp(token->value, BUILTIN_ENV) == 0)
		data->exit_code = my_env(env->vars, node->args);
}

static void basic_builtins(t_shell *data, t_token *token, t_node *node)
{
	if (ft_strcmp(token->value, BUILTIN_ECHO) == 0)
		data->exit_code = my_echo(node->args);
	else if (ft_strcmp(token->value, BUILTIN_PWD) == 0)
		data->exit_code = my_pwd(data);
	else if (ft_strcmp(token->value, BUILTIN_EXIT) == 0)
		my_exit(data, node->args);
	else if (ft_strcmp(token->value, BUILTIN_CD) == 0)
		data->exit_code = my_cd(data, node->args);
}

void which_builtin(t_shell *data, t_token *token, t_node *node)
{
	asignations(data, token);
	env_cmds(data, &data->env, token, node);
	basic_builtins(data, token, node);
}
```

This final file, minishell_structs.h, provides the definitions for all the structures and enumerations used throughout the Minishell project. This information is crucial for understanding the data model that underpins the entire execution flow.
Here is the detailed documentation for the core data structures, organized by their function in the Minishell pipeline.
The minishell_structs.h file defines the persistent and transient data structures that manage the shell's state, input processing, and execution context.
The t_type enumeration is the backbone of the shell's semantic system. Every token, variable, and AST node is categorized by one of these types, guiding the parser and executor logic.
- Logical/Sequence: `SEMICOLON`, `AND`, `OR`, `PIPE`.
- Grouping: `PAREN_OPEN`, `PAREN_CLOSE`, `SUBSHELL`.
- Redirection: `REDIR_INPUT`, `REDIR_OUTPUT`, `REDIR_APPEND`, `REDIR_HEREDOC`.
- Metacharacters: `BACKGROUND`, `WILDCARD`, `EXPANSION`.
- Commands: `COMMAND`, `BUILT_IN`, `SCRIPT_ARG`.
- General Content: `WORD`, `FILENAME`, `DELIMITER` (heredoc delimiter).
- Local/Exported: `ASIGNATION`, `LOCAL`, `ENV`, `EXP` (exported, unset).
- Concatenation: `PLUS_ASIGNATION`.
- Temporary/Transient: `TEMP_ASIGNATION`, `TEMP_PLUS_ASIGNATION`.
- `NO_SPACE`: Used by the Tokenizer to flag mandatory token concatenation.
- `DONT_ELIMINATE`: Used by the Simplifier to override concatenation rules.
- `INDIFERENT`, `DELETE`, `NEW_TOKEN_TO_ORGANIZE`: Internal markers for cleanup and array reorganization.
These structures manage the data flow from input to final execution.
The atomic unit of the parser.
- `id` (int): Dynamic index. The current index in the token array, updated frequently during array modification (simplification, expansion).
- `hash` (int): Permanent identifier. A fixed value used to maintain the link to the corresponding AST node (`t_node`) throughout array reorganization.
- `type` (t_type): The final semantic role of the token (e.g., `COMMAND`, `WILDCARD`).
- `value` (char *): The final, cleaned string content.
- `single_quoted` / `double_quoted` (bool): Flags indicating the original quoting context, crucial for deferred expansion rules.
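Assembled as a header sketch (the field order and any extra fields in the real minishell_structs.h may differ, and the enum is abbreviated):

```c
#include <stdbool.h>

/* Abbreviated sketch of the semantic type enum. */
typedef enum e_type
{
    WORD,
    COMMAND,
    BUILT_IN,
    WILDCARD
    /* ... remaining t_type values omitted for brevity ... */
}   t_type;

/* Sketch of the token structure described above. */
typedef struct s_token
{
    int     id;             /* dynamic index in the token array           */
    int     hash;           /* permanent id linking the token to its node */
    t_type  type;           /* final semantic role                        */
    char    *value;         /* final, cleaned string content              */
    bool    single_quoted;  /* original quoting context, used to defer    */
    bool    double_quoted;  /* expansion decisions                        */
}   t_token;
```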
Manages the input and the dynamic token array.
- `input` (char *): The raw command line string read from the user.
- `tokens` (t_token *): The dynamically allocated array holding all generated tokens.
- `n_tokens` / `n_alloc_tokens` (int): Counters for the current number of tokens and the allocated capacity (used for dynamic resizing).
- `before_tokens_type` (int *): Array used to save the original `t_type` of tokens before expansion, necessary for conditional word splitting (refer to Section 3.5).
The node structure for the environment linked list.
- `key` / `value` (char *): The variable name and its assigned value.
- `type` (t_type): The persistence scope of the variable (`ENV`, `LOCAL`, `EXP`).
- `next` / `prev` (t_var *): Pointers for the doubly linked list, enabling efficient insertion and deletion by `export` and `unset`.
These structures define the command structure and the shell's global execution state.
The building block of the Abstract Syntax Tree (AST).
- `type` (t_type): The operator or command type (e.g., `PIPE`, `COMMAND`, `SUBSHELL`).
- `left` / `right` (t_node *): Pointers defining the hierarchical relationship of the AST.
- `token` (t_token *): A pointer to the primary token (e.g., the command name or the pipe symbol) that the node represents.
- `token_hash` (int): Stores the permanent hash of the token for link restoration after array modification.
- `args` (char **): The final argument vector (argv) passed to `execve`.
- `assig_tmp` (char **): Array of transient assignments (`TEMP_ASIGNATION`) applied only to this node's execution context.
- `redir` (t_redir *): Linked list of all I/O redirections associated with this command.
- `background` (bool): Flag set if the command should run in the background (`&`).
Details for each I/O redirection operation.
- `type` (t_type): The operator type (`<`, `>`, `<<`, etc.).
- `filename` (char *): The target file path.
- `fd_redir` (int): The file descriptor to be redirected (e.g., 0 for stdin, 1 for stdout, or an explicit number like `2>`).
- `heredoc_lines` (t_list *): A linked list storing the collected input lines for `REDIR_HEREDOC`.
- `expand` (bool): Flag to indicate if variables in the heredoc content should be expanded.
The master structure containing all global state.
- `env` (t_env): The structure managing the environment variables list and `envp` array.
- `prompt` (t_prompt): The structure managing the current input and tokens.
- `ast_root` (t_node *): Pointer to the root of the active AST.
- `exec` (t_exec): Contains the original standard file descriptors (stdin, stdout) for restoration after redirection.
- `exit_code` (int): Stores the exit status of the last executed command (`$?`).
- `error_state` (bool): A flag used internally to signal critical errors that should stop the execution flow.
A core robustness feature is the shell's ability to recognize incomplete commands (unbalanced quotes, unclosed parentheses, etc.) and seamlessly prompt the user for continuation until the command is logically complete.
📹 Demo: `Grabacion.de.pantalla.2025-11-25.a.las.18.23.00.mov`
- Initial Check: The function `read_until_balanced` takes the initial input line and passes it to the `check_global_balance` function.
- Global Balance Audit: `check_global_balance` delegates the task to several sub-functions that check specific syntactic components:
  - Quotes: Checks for unmatched `SINGLE_QUOTE` and `DOUBLE_QUOTE` tokens.
  - Pipes/Logic: Checks for operators (`PIPE`, `AND`, `OR`) that lack a necessary operand (e.g., `ls |` requires continuation).
  - Parentheses: `get_paren_balance` checks the overall balance of `PAREN_OPEN` and `PAREN_CLOSE` tokens.
- Continuation Loop (`join_lines_until_balanced`):
  - If `check_global_balance` returns `KEEP_TRYING` (unclosed quotes/pipes) or an unbalance count greater than zero (unclosed parentheses), the shell enters a loop.
  - The user is prompted with `>`.
  - The new line is concatenated with the previous lines, separated by a space (`ft_strjoin_multi`).
  - The process repeats until `BALANCE` is achieved or the user sends an EOF signal.
- Syntax Error Handling: If `check_global_balance` returns `CANT_CONTINUE` (e.g., a closing parenthesis without an opening one, or a similarly severe error), the process breaks and the error is flagged, preventing AST construction.
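The parenthesis part of this audit boils down to a single counter. The sketch below works on a raw line and ignores quoting for brevity (the project's `get_paren_balance` is token-based, and `-1` stands in for its `CANT_CONTINUE` outcome):

```c
/* Returns the number of still-open '(' when non-negative, or -1 the
 * moment a ')' appears with no matching '(' -- the unrecoverable case
 * described above. */
int paren_balance(const char *line)
{
    int open = 0;

    while (*line)
    {
        if (*line == '(')
            open++;
        else if (*line == ')')
        {
            if (open == 0)
                return (-1);   /* severe error: cannot continue */
            open--;
        }
        line++;
    }
    return (open);             /* > 0 means "keep prompting with >" */
}
```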
This feature is a sophisticated form of proactive error handling that prevents the frustrating "command not found" error by suggesting corrections for misspelled built-in commands. It is deliberately engineered for intelligence and minimal disruption.
📹 Demo: `Grabacion.de.pantalla.2025-11-25.a.las.18.21.29.mov`
The core of this feature is the function `find_match`, which employs an efficient heuristic to detect typographical errors with precision:

- Length Tolerance: The function strictly compares the length of the user's input against the list of known built-ins. A match is only considered if the length difference is at most ±1 character. This immediately eliminates distant or irrelevant suggestions.
- Character Alignment: It uses a modified character-by-character alignment to tolerate single common mistakes:
  - Deletion: (e.g., typing `eho` instead of `echo`).
  - Insertion: (e.g., typing `echop` instead of `echo`).
  - Substitution/Transposition: (e.g., typing `exho` instead of `echo`).
- Annoyance Prevention (UX Focus): The logic is tuned to avoid being annoying:
  - Inputs consisting of a single space or containing symbols are ignored.
  - Single-character inputs are ignored unless they are highly ambiguous for a core command (e.g., `cord` is still processed as a possible typo for `cd`).
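The length-tolerance and single-edit alignment can be sketched as one predicate. This is a hedged illustration of the heuristic described above, not the project's exact `find_match` (which adds the extra UX filters):

```c
#include <stdbool.h>
#include <string.h>

/* Returns true when `input` is at most one insertion, deletion, or
 * substitution away from `name`, after the +/-1 length pre-check. */
bool one_edit_apart(const char *input, const char *name)
{
    size_t la = strlen(input);
    size_t lb = strlen(name);

    if (la > lb)                      /* make `input` the shorter one */
        return (one_edit_apart(name, input));
    if (lb - la > 1)
        return (false);               /* length tolerance exceeded */
    size_t i = 0, j = 0;
    int edits = 0;
    while (i < la && j < lb)
    {
        if (input[i] == name[j])
        {
            i++;
            j++;
            continue;
        }
        if (++edits > 1)
            return (false);
        if (la == lb)                 /* substitution: advance both */
            i++;
        j++;                          /* else skip the extra char in name */
    }
    edits += (int)((la - i) + (lb - j));  /* trailing leftover character */
    return (edits <= 1);
}
```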
The system is designed to be highly interactive via the `ask_confirmation` utility:

- Suggestion Prompt: When a probable typo is found, the user is immediately prompted with a clear, color-coded question: `"Did you mean %s? y/n"`.
- Input Loop: The process waits for the user to confirm (`y`/`yes`) or deny (`n`/`no`) the suggestion, looping until a valid response is given.
- In-Place Correction:
  - If the user confirms, the misspelled token's `value` string is freed and replaced with the correct built-in name.
  - The corrected token then continues through the pipeline, where subsequent passes of `transform_tokens_logic` automatically recognize the fixed built-in name and correctly flag the token type as `BUILT_IN`.
- Error Handling on Denial: If the user declines the suggestion, the `cmd_correction` function returns a `FAILURE` flag, forcing the shell to clean up the prompt and gracefully return to the main loop, preventing the command from proceeding to execution.
This interactive approach turns a potential hard failure into a soft, informative recovery, significantly enhancing the user's workflow.
The shell initializes with a customized banner and welcome sequence:
📹 Demo: `Grabacion.de.pantalla.2025-11-25.a.las.18.11.24.mov`
- Banner: `print_minishell_title` displays an ASCII art banner using color-coded macros (T1–T5).
- User Identity: The `find_user` function attempts to retrieve the user's name from the `USER` environment variable. If unsuccessful, it falls back to prompting the user for their login.
- Time-of-Day Greeting: `print_time_of_day` determines the local time and displays a specialized greeting based on the hour (e.g., `Good morning`, `Burning the midnight oil?`).
- Start Time: The session start time is recorded and printed upon initialization.
- End Time: `print_session_end` is called when the shell exits. It calculates the elapsed duration in minutes and seconds and prints a farewell message.
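The minutes-and-seconds breakdown is simple integer arithmetic on the elapsed wall-clock time. A minimal sketch (the `"Xm Ys"` format and helper name are assumptions, not the project's actual farewell string):

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Formats the elapsed duration between two timestamps as "Xm Ys",
 * the kind of breakdown print_session_end performs. */
void format_elapsed(time_t start, time_t end, char *buf, size_t bufsize)
{
    long total = (long)difftime(end, start);  /* whole seconds elapsed */

    snprintf(buf, bufsize, "%ldm %lds", total / 60, total % 60);
}
```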
The shell fully implements the structural control flow operators required for complex scripting:
- Subshells: Encapsulated commands within parentheses (`(cmd1 | cmd2)`) are executed in a new, isolated process using `exec_subshell`, preserving I/O state and managing exit status propagation.
- Sequencing: Commands separated by semicolons (`;`) are executed unconditionally, one after the other, via `exec_semicolon`.
- Background Execution: Commands ending with the background operator (`&`) have their `background` flag set. These are executed by a child process, but the parent process immediately prints the PID and does not wait (`waitpid` is skipped), allowing the user to continue interacting with the shell immediately.
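The fork-without-wait pattern behind background execution can be sketched as below. This is a minimal illustration (the callback parameter and `[bg]` prefix are invented for the sketch; the real executor runs the command in the child and reaps it later):

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Forks; the child runs `work` (if any) and exits, while the parent
 * prints the child's PID and returns WITHOUT calling waitpid, so the
 * prompt comes back immediately. */
pid_t launch_background(void (*work)(void))
{
    pid_t pid = fork();

    if (pid == 0)            /* child: do the job and exit */
    {
        if (work)
            work();
        _exit(0);
    }
    if (pid > 0)             /* parent: report PID, do not wait */
        printf("[bg] %d\n", (int)pid);
    return (pid);
}
```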
┌───────────────────────────┐
│ MAIN LOOP │
│───────────────────────────│
│ signals() // PARENT │
│ init(&data) │
└─────────────┬─────────────┘
│
▼
┌──────────────────────┐
│ prompt(&input) │
│(waits for user input)│
└──────────┬───────────┘
(SIGINT) │ [EOF/exit]
PARENT │ clean & exit
cleans │
prompt ▼
┌────────────────────────────────┐
│ 1) TOKENIZER │
│ - Divides the tokens │
│ - Identifies pipes, redirs │
│ - Manages quotes │
└───────────────┬────────────────┘
▼
┌────────────────────────────────┐
│ 2) EXPANSION │
│ - Substitution $VAR, $?, tildes│
│ - Expands wildcards │
│ - Respects quotes │
└───────────────┬────────────────┘
▼
┌───────────────────────────────┐
│ 3) AST BUILDER │
│ - Creates the AST │
│ - Gathers cmds/args │
│ - Orders pipes/redirs │
└──────────────┬────────────────┘
▼
┌───────────────────────────────┐
│ 4) EXECUTOR │
│ - Iterates over AST │
│ - fork() for commands │
│ - Redirections │
│ - Pipes │
└──────────────┬────────────────┘
│
┌────────────┴─────────────┐
▼ ▼
┌─────────────────────┐ ┌─────────────────────┐
│ PARENT │ │ CHILD │
│──────────────────── │ │──────────────────── │
│ - Waits with │ │ - Restores Signals │
│ waitpid() │ │ SIG_DFL │
│ - Manages Signals │ │ - Executes CMDS │
│ (Ctrl+C) │ │ - If signal → Dies │
└─────────────────────┘ └─────────────────────┘
│ │
│ (exit/signal) │
└───────────┬───────────────┘
▼
┌─────────────────────────────┐
│ clean_data(data) │
│ Goes back to the MAIN LOOP │
└─────────────────────────────┘