Ask your own question, for FREE!
Computer Science 15 Online
OpenStudy (turingtest):

C programming: loading a dictionary into a trie data structure. Why am I segfaulting?

OpenStudy (turingtest):

here is the code as I wrote it to unload the dictionary (all lowercase letters, words separated by '\n' newline character: /**************************************************************************** * dictionary.c * * * Implements a dictionary's functionality. ***************************************************************************/ #include <stdio.h> #include <stdbool.h> #include <stdlib.h> #include "dictionary.h" typedef struct node { bool isWord; struct node* children[27]; } node; /** * Loads dictionary into memory. Returns true if successful else false. */ bool load(const char* dictionary) { // set index pointer to root node* root = malloc(sizeof(node)); node* index; // open dictionary FILE* dictptr = fopen(dictionary, "r"); if (dictptr == NULL) return false; // until end of file is reached get chars for (int c = fgetc(dictptr); c != EOF; c = fgetc(dictptr)) { // set index to root index = root; // for each line (word) in the dict while(c != '\n') { // if letter is not in trie, malloc new node if (index->children[c % 27] == NULL) index->children[c % 27] = malloc(sizeof(node)); // check that malloc worked if (index->children[c % 27] == NULL) { printf("Could not allocate memory.\n"); return 1; } // update the index pointer to the new node index = index->children[c % 27]; // get next letter c = fgetc(dictptr); } // newline, so the letter we are pointing at ends a word index->isWord = true; } return true; }

OpenStudy (turingtest):

when I run gdb on this I see that it seems to start loading the words correctly, but at about 1650 words I am malloc-ing address 0x50, which gdb says it cannot access the contents of. Am I using up too much memory somehow?

OpenStudy (asnaseer):

At the end of your inner-most while loop you do "c = fgetc(dictptr);" but that inner while loop is not checking for EOF condition.

OpenStudy (turingtest):

Thanks @asnaseer (long time no see btw), but as per your advice I changed the while loop to: while ( c != '\n' && c != EOF) and still I segfault. I wouldn't be expecting the EOF condition anyway, since it segfaults at about the 1650th word out of a dictionary of about 100k. Did I misunderstand your advice, or are we missing something still?

OpenStudy (anonymous):

``` if (index->children[c % 27] == NULL) index->children[c % 27] = malloc(sizeof(node)); ``` This is unsafe. You never initialized these pointers to be `NULL` so technically it is not safe to assume they are. Now I know many compilers will do this initialization, but it is not guaranteed to don't behave as if it is. It's possible you are getting pages of memory with all bytes set to `0` ( `NULL` ), and eventually you hit something else, it tries to deference it, and then segfault.

OpenStudy (anonymous):

What really bothers me about the code though is how you have a loop within a loop. You really should just keep it at a single loop: ``` // initialize root code c = fgetc(dictptr); while (c != EOF) { if (c != '\n') { // traverse tree code } else { // finish word code } c = fgetc(dictprt); } // finish word code ```

OpenStudy (asnaseer):

I believe @wio has spotted the issue. malloc is only guaranteed to allocate an area of memory. If you want this area initialised to zeros then use calloc instead.

OpenStudy (turingtest):

Thank you, @wio , that makes perfect sense. Using gdb, I saw that you were, in fact, correct about the fact that arrays do not get not initialize each element to null with my gcc compiler (I actually already knew that from inspecting my initialized arrays with gdb, but I figured that was a problem I could fix later) , but it never occurred to me that in this case it could lead me to dereferencing a null pointer. I think really should have seen that myself, but hindsight is always 20/20. @asnaseer using calloc fixed all my problems, so thank you as well. I'll now reorganize my code as per @wio 's recommendation. This program is part of a spell-checker. I think this was the hardest part to implement, but I may be back lol. Again, thanks a lot, guys!

OpenStudy (turingtest):

my new version: #include <stdio.h> #include <stdbool.h> #include <stdlib.h> #include "dictionary.h" typedef struct node { bool isWord; struct node* children[27]; } node; /** * Loads dictionary into memory. Returns true if successful else false. */ bool load(const char* dictionary) { // set index pointer to root node* root = calloc(27, sizeof(node)); node* index; // open dictionary FILE* dictptr = fopen(dictionary, "r"); if (dictptr == NULL) return false; // until end of file is reached get chars int c = fgetc(dictptr); // for each line (word) in the dict while(c != EOF) { // set index to root index = root; if(c != '\n') { // if letter is not in trie, malloc new node if (index->children[c % 27] == NULL) index->children[c % 27] = calloc(27, sizeof(node)); // check that calloc worked if (index->children[c % 27] == NULL) { printf("Could not allocate memory.\n"); return 1; } // update the index pointer to the new node index = index->children[c % 27]; } // newline, so the letter we are pointing at ends a word else index->isWord = true; // get next letter c = fgetc(dictptr); } return true; } Looks much cleaner, and seems to work like a charm :) Thanks again!

OpenStudy (turingtest):

my new version: #include <stdio.h> #include <stdbool.h> #include <stdlib.h> #include "dictionary.h" typedef struct node { bool isWord; struct node* children[27]; } node; /** * Loads dictionary into memory. Returns true if successful else false. */ bool load(const char* dictionary) { // set index pointer to root node* root = calloc(27, sizeof(node)); node* index; // open dictionary FILE* dictptr = fopen(dictionary, "r"); if (dictptr == NULL) return false; // until end of file is reached get chars int c = fgetc(dictptr); // until the end of the file is reached while(c != EOF) { // set index to root index = root; if(c != '\n') { // if letter is not in trie, malloc new node if (index->children[c % 27] == NULL) index->children[c % 27] = calloc(27, sizeof(node)); // check that calloc worked if (index->children[c % 27] == NULL) { printf("Could not allocate memory.\n"); return 1; } // update the index pointer to the new node index = index->children[c % 27]; } // newline, so the letter we are pointing at ends a word else index->isWord = true; // get next letter c = fgetc(dictptr); } return true; } Looks much cleaner, and seems to work like a charm :) Thanks again!

Can't find your answer? Make a FREE account and ask your own questions, OR help others and earn volunteer hours!

Join our real-time social learning platform and learn together with your friends!
Can't find your answer? Make a FREE account and ask your own questions, OR help others and earn volunteer hours!

Join our real-time social learning platform and learn together with your friends!