Add more lex tests #36

Merged
merged 11 commits into from
Jun 26, 2024
59 changes: 51 additions & 8 deletions src/lexer/lex.c
@@ -113,7 +113,7 @@ int lexer_ungetchar(Lexer *l) {
int real_lex(Lexer*, Token*);

/**
- * This produces a list of tokens after having been processed by the
+ * This produces a list of tokens after having been processed by the
* preprocessor. For example, if the code is
* #define MAX_ARRAY 5
* int arr[MAX_ARRAY];
@@ -125,7 +125,7 @@ int real_lex(Lexer*, Token*);
* ]
* ;
*/
-int lex(Lexer* l, Token* t) {
+int lex(Lexer *l, Token *t) {
// For now, all we need to do is skip newlines
for (;;) {
real_lex(l, t);
@@ -319,6 +319,8 @@ int skip_to_token(Lexer *l) {
return -1; // EOF was reached
}

// This is a function for parsing single char tokens
// Now handles all cases of single char tokens
TokenType ttype_one_char(char c) {
switch (c) {
case '(':
@@ -372,11 +374,15 @@ TokenType ttype_one_char(char c) {
case '?':
return TT_QMARK;
default:
-PRINT_ERROR("Token type for token '%c' not recognized", c);
-return TT_NO_TOKEN;
+if (isdigit(c)) {
+return TT_LITERAL;
+} else {
+return TT_IDENTIFIER;
Collaborator

Do we know that every character not listed here is an identifier?

Member Author

Great question. We probably need to check for one-character symbols that are neither tokens nor identifiers. The only example I can think of is the @ symbol, so we might want to add error handling for that later.
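
One possible shape for that later error handling, as a minimal sketch rather than the project's code: the default branch could accept only digits, letters, and underscore, and report everything else as TT_NO_TOKEN. The enum below is a hypothetical stand-in with just the three values the sketch needs, and classify_default_char is an illustrative name, not a function in lex.c.

```c
#include <ctype.h>

/* Hypothetical stand-in for the project's TokenType; only the values
 * this sketch needs are listed. */
typedef enum {
  TT_NO_TOKEN,
  TT_LITERAL,
  TT_IDENTIFIER
} TokenType;

/* Illustrative helper (not in lex.c): classify a character that matched
 * none of the punctuation cases in the switch. */
TokenType classify_default_char(char c) {
  /* The <ctype.h> functions require a value representable as unsigned char. */
  unsigned char uc = (unsigned char)c;

  if (isdigit(uc)) {
    return TT_LITERAL;
  }
  if (isalpha(uc) || c == '_') {
    return TT_IDENTIFIER;
  }
  /* Anything else, such as '@' or a control character, is neither a
   * token nor an identifier. */
  return TT_NO_TOKEN;
}
```

With this approach, ttype_from_string would need to handle a TT_NO_TOKEN result again instead of returning it unconditionally.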

+}
}
}

// This is a function for parsing exclusively tokens with more than one char
TokenType ttype_many_chars(const char *contents) {
if (STREQ(contents, "auto")) {
return TT_AUTO;
@@ -546,6 +552,7 @@ TokenType ttype_many_chars(const char *contents) {
return TT_IDENTIFIER;
}

// This is the function for parsing all tokens from strings
TokenType ttype_from_string(const char *contents) {
int len;

@@ -554,10 +561,7 @@ TokenType ttype_from_string(const char *contents) {
// Single character contents
if (len == 1) {
TokenType token = ttype_one_char(contents[0]);
-
-if (token != TT_NO_TOKEN) {
-return token;
-}
+return token;
}

return ttype_many_chars(contents);
@@ -654,9 +658,48 @@ static const char *ttype_names[] = {

const char *ttype_name(TokenType tt) { return ttype_names[tt]; }

int test_ttype_many_chars() {
testing_func_setup();

tassert(ttype_many_chars("foo") == TT_IDENTIFIER);
tassert(ttype_many_chars("struct") == TT_STRUCT);
tassert(ttype_many_chars("while") == TT_WHILE);

return 0;
}

int test_ttype_one_char() {
Contributor

Would it be possible to add some test coverage for failure paths, e.g. invalid tokens?

Member Author

What do you mean?

Contributor

I mean testing that it fails gracefully, rather than segfaulting, if we call ttype_from_string("/* This is not a token. */") or similar.

Member Author

Yeah, the testing code doesn't segfault; it just prints the assert that failed.

Contributor

I don't mean that the testing code itself might be broken. I mean testing whether ttype_from_string fails gracefully on bad input or just crashes. This is worth testing because our system should be robust.

Member Author

Oh, I see. Yes, we should definitely add that. Same PR or later?

Contributor

Later is probably fine.
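
For the follow-up, a failure-path test could look roughly like the sketch below. It is self-contained for illustration only: toy_ttype_from_string is a hypothetical stand-in classifier, not the real ttype_from_string, and the enum lists just the values needed here. The point is the test shape: feed inputs that are not tokens and check that each one yields a well-defined error value instead of crashing.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the project's TokenType. */
typedef enum { TT_NO_TOKEN, TT_LITERAL, TT_IDENTIFIER, TT_PLUS } TokenType;

/* Toy classifier standing in for ttype_from_string (assumption: the real
 * function may classify these inputs differently). */
TokenType toy_ttype_from_string(const char *contents) {
  size_t i;

  if (contents == NULL || contents[0] == '\0') {
    return TT_NO_TOKEN;
  }
  if (strcmp(contents, "+") == 0) {
    return TT_PLUS;
  }
  if (isdigit((unsigned char)contents[0])) {
    for (i = 1; contents[i] != '\0'; i++) {
      if (!isdigit((unsigned char)contents[i])) return TT_NO_TOKEN;
    }
    return TT_LITERAL;
  }
  if (isalpha((unsigned char)contents[0]) || contents[0] == '_') {
    for (i = 1; contents[i] != '\0'; i++) {
      if (!isalnum((unsigned char)contents[i]) && contents[i] != '_') {
        return TT_NO_TOKEN;
      }
    }
    return TT_IDENTIFIER;
  }
  return TT_NO_TOKEN;
}

/* Returns the number of bad inputs that were NOT rejected. */
int check_failure_paths(void) {
  static const char *bad_inputs[] = {
    "", "@", "/* This is not a token. */", "foo bar"
  };
  size_t n = sizeof(bad_inputs) / sizeof(bad_inputs[0]);
  size_t i;
  int failures = 0;

  for (i = 0; i < n; i++) {
    if (toy_ttype_from_string(bad_inputs[i]) != TT_NO_TOKEN) {
      failures++;
    }
  }
  /* NULL should be rejected too, not dereferenced. */
  if (toy_ttype_from_string(NULL) != TT_NO_TOKEN) {
    failures++;
  }
  return failures;
}
```

In the project's test harness, the same checks would presumably be written as tassert calls against ttype_from_string, once the intended result for invalid input is decided.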

testing_func_setup();

// Use ttype_from_string
tassert(ttype_one_char('a') == TT_IDENTIFIER);
tassert(ttype_one_char('1') == TT_LITERAL);

tassert(ttype_one_char('+') == TT_PLUS);
tassert(ttype_one_char('-') == TT_MINUS);
tassert(ttype_one_char('>') == TT_GREATER);
tassert(ttype_one_char('~') == TT_BNOT);

return 0;
}

int test_ttype_name() {
testing_func_setup();

tassert(strcmp(ttype_name(TT_LITERAL), "literal") == 0);
tassert(strcmp(ttype_name(TT_PLUS), "+") == 0);
tassert(strcmp(ttype_name(TT_SIZEOF), "sizeof") == 0);
tassert(strcmp(ttype_name(TT_WHILE), "while") == 0);

Collaborator

Testing this function is great! I am worried, though, that as a lookup table with no information about the ordering of the enum, it is a bit fragile.

Member Author

That's true. This test is a good canary in the coal mine for the names getting out of sync with the enum: if it fails, any code that relies on ttype_name is broken too. It's a general problem with naming enum values in C, so we should be careful about how we use the enum together with the name function.
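
One common way to reduce this fragility is an X-macro, where the enum and the name table are generated from a single list so they cannot drift apart. The sketch below is only an illustration under assumptions: it redefines TokenType with four entries (names taken from the assertions above), TOKEN_TYPES and TT_COUNT are hypothetical, and it needs C11 for _Static_assert.

```c
#include <string.h>

/* Single source of truth: each entry gives the enum value and its name.
 * TOKEN_TYPES is a hypothetical macro, not part of lex.h. */
#define TOKEN_TYPES(X)      \
  X(TT_LITERAL, "literal")  \
  X(TT_PLUS, "+")           \
  X(TT_SIZEOF, "sizeof")    \
  X(TT_WHILE, "while")

/* Generate the enum from the list. */
typedef enum {
#define X(name, str) name,
  TOKEN_TYPES(X)
#undef X
  TT_COUNT /* number of token types; hypothetical sentinel */
} TokenType;

/* Generate the name table from the same list. Designated initializers
 * tie each string to its enum value regardless of ordering. */
static const char *const ttype_names[] = {
#define X(name, str) [name] = str,
  TOKEN_TYPES(X)
#undef X
};

/* Compile-time check that the table covers every enum value. */
_Static_assert(sizeof(ttype_names) / sizeof(ttype_names[0]) == TT_COUNT,
               "ttype_names must have one entry per TokenType");

/* Bounds-checked lookup; "unknown" is an illustrative fallback. */
const char *ttype_name(TokenType tt) {
  if ((int)tt < 0 || (int)tt >= (int)TT_COUNT) {
    return "unknown";
  }
  return ttype_names[tt];
}
```

Adding a token then means adding one line to the list, and the existing test_ttype_name assertions keep working as a regression check.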

return 0;
}

int test_ttype_from_string() {
testing_func_setup();

tassert(ttype_from_string("+") == TT_PLUS);
tassert(ttype_from_string("=") == TT_ASSIGN);

tassert(ttype_from_string("1") == TT_LITERAL);
tassert(ttype_from_string("1.2") == TT_LITERAL);

6 changes: 6 additions & 0 deletions src/lexer/lex.h
@@ -53,3 +53,9 @@ const char *ttype_name(TokenType tt);

// Test for ttype_from_string
int test_ttype_from_string();

int test_ttype_many_chars();

int test_ttype_one_char();

int test_ttype_name();
3 changes: 3 additions & 0 deletions src/lexer/test_lexer.c
@@ -8,7 +8,10 @@
int test_lexer() {
testing_module_setup();

test_ttype_name();
test_ttype_from_string();
test_ttype_many_chars();
test_ttype_one_char();

testing_module_cleanup();
return 0;