Ask your own question, for FREE!
Mathematics 24 Online
OpenStudy (anonymous):

How many unique substrings are there in the word "christmas"?

OpenStudy (anonymous):

Define unique sub-string.

OpenStudy (anonymous):

a subset of the characters in the original word, in their original order

OpenStudy (anonymous):

Letters would be repeated ?

OpenStudy (anonymous):

think of it this way, take the original word, cross out as many letters as you want, combine those that remain. how many unique words can you make this way?

OpenStudy (anonymous):

\( 2^n-1 \) ? For all different letters.

OpenStudy (anonymous):

no it's not!

OpenStudy (anonymous):

but there's a repeated letter S ... chris is just one word but can be made two ways so shouldn't be counted twice

OpenStudy (anonymous):

Then you have to compute the sum individually.

OpenStudy (anonymous):

Chosing 1 letter , 2 letter , 3 letters, ....

OpenStudy (anonymous):

I can solve it using generating functions.

OpenStudy (anonymous):

It seems like that might be one of the ways to tackle this. Doing cases for how many letters are in the substring.

OpenStudy (anonymous):

Or maybe you can do three cases, when there are no S's, one S and two S's. That might be a little easier to manage.

OpenStudy (anonymous):

The brute force way is just to enumerate all 2^9 substrings and eliminate duplicates; I was wondering if there was something more elegant.

OpenStudy (anonymous):

I still believe that generating function would be good way to solve this.

OpenStudy (anonymous):

oh you want to generate them ? Then you need suffix tree ( http://en.wikipedia.org/wiki/Suffix_tree)

OpenStudy (anonymous):

It's in linear time.

OpenStudy (anonymous):

When you say unique substring, order is important as well right? Like Chris and hcirs are different?

OpenStudy (anonymous):

we're just talking about 2^9 so generating them won't take long, i was just wondering if there was a clever analysis.

OpenStudy (anonymous):

the way I've defined it, hcirs is not a valid substring. I am not allowing permutations.

OpenStudy (anonymous):

unique sub-string means the orders in the actual string is preserved...

OpenStudy (anonymous):

so, hcirs = Chris

OpenStudy (anonymous):

oh oh oh oh. this is no where near as bad as I was thinking then. Forgive me for not knowing the jargon.

OpenStudy (anonymous):

Just choose .. 1, 2,3,4,5,6,7,8,9 and sum them up.

OpenStudy (anonymous):

well, yes, that's the brute force way, enumerate all 2^9 possible substrings and then eliminate duplicates. I'm wondering if there's a way to analyze the number that will result without enumerating them by brute force.

OpenStudy (anonymous):

no that's not brue force way.

OpenStudy (anonymous):

sure it is.

OpenStudy (anonymous):

do you know : if a things of one kind, b things of second and c things of third kind and so on, then how many ways of choosing r objects out of these ? combinatorically not bruteforce.

OpenStudy (anonymous):

I'm not sure what you mean, a*b*c doesn't preserve order. I'm saying the guaranteed-correct, but not-very-smart way to solve this problem is just to enumerate every possible substring (there are only 511 of them) and count how many unique ones there are in the list

OpenStudy (anonymous):

a is a variable.

OpenStudy (anonymous):

what i'm wondering is if there's a more clever way to solve it other than enumerating every possibility

OpenStudy (anonymous):

There is a clever way and I already told you ..

OpenStudy (anonymous):

how?

OpenStudy (anonymous):

do you know how are we get \( 2^n \) subset of a set having exactly \( n \) elements? The combinatoric prove.

OpenStudy (anonymous):

getting*

OpenStudy (anonymous):

there are repeated letters so that doesn't give the correct answer

OpenStudy (anonymous):

Why don't you listen to me ? you don't know the basics even.

OpenStudy (anonymous):

if the letters were all unique then yes, there would be 2^9-1 possible unique substrings. but in this case, the letter 'S' appears twice so that reduces the number of unique substrings. my question is, by how many?

OpenStudy (anonymous):

you are not making any sense

OpenStudy (anonymous):

I am making complete sense.

OpenStudy (anonymous):

let's try with a smaller word so you can understand what i'm saying -- how many unique substrings are there of the word "sas"?

OpenStudy (anonymous):

okay .. that number 2^n do you know we are getting it ?

OpenStudy (anonymous):

yes

OpenStudy (anonymous):

tell me ..

OpenStudy (anonymous):

i understand, for the 3rd time, that 2^n gives you the number of substrings if every string were unique. i'm trying to explain to you why that analysis fails if the letters are not all unique. tell me, what does your method say are the number of unique substrings in the string "sas"?

OpenStudy (anonymous):

Oh nooooo I know this analysis fails.. but the process of developing the analysis works here too.

OpenStudy (anonymous):

So, do you know how we get 2^n combinatorically ?

OpenStudy (anonymous):

yes obviously i know that

OpenStudy (anonymous):

tell me ..

OpenStudy (anonymous):

if you had an n-letter word where each letter were unique, you could consider each letter one bit of an n-bit number, where 0 indicates the letter is absent and 1 indicates it is present. each n-bit number (from 0..2^n-1) is a unique substring. what i keep trying to explain is that this analysis does not work if a letter appears more than once!

OpenStudy (anonymous):

Oh you don't know the combinatorial proof.

OpenStudy (anonymous):

explain it then

OpenStudy (anonymous):

If you have n letter word then the number of subset of it given by \( \sum \limits_{k=0}^n \binom{n}{k} = 2^n\) since we ignore the empty subset for some analysis we ignore k =1 and we write \( \sum \limits_{k=1}^n \binom{n}{k} = 2^n -1 \)

OpenStudy (anonymous):

The same analysis works here, but choosing r items from n non-distinct items requires generating functions.

OpenStudy (anonymous):

i think you are missing a subtlety to the problem, which is what i keep trying to explain: the distance between the repeated letters matters. the number of unique substrings for "abcdee" is different than "eabcde" even though they have the same letters.

OpenStudy (anonymous):

"abcdee" has a large number of non-unique strings because any string that ends with 'e' can just as easily end with the first e or the second, and the two are indistinguishable

OpenStudy (anonymous):

but that is not true of the case where the repeated letters are at the beginning and the end

OpenStudy (anonymous):

just choosing the letters to be included and excluded doesn't take into account these differences

OpenStudy (anonymous):

Alright, fair enough. Then there is only O(n+L) algorithm that I can do using suffix tree.

OpenStudy (anonymous):

for example "sas" can generate s, sa, sas, ss, and as. but "ssa" can only generate s, sa, ss, and ssa. 4 words vs 5 even though the letters are the same.

OpenStudy (anonymous):

I am sure no closed form is possible with that constraints.

OpenStudy (anonymous):

i agree there is no closed form generally. but a friend and i on the whiteboard today thought the answer might be 2^9 - 2^4 - 1 specifically for the word "christmas", because the two s's generate repeats only when "tma" is excluded, leaving all combinations of "chri" (2^4) as repeats. i was curious to see if someone here might also come up with the same number.

OpenStudy (anonymous):

in any case, thanks for the discussion.

OpenStudy (anonymous):

you are welcome.

Can't find your answer? Make a FREE account and ask your own questions, OR help others and earn volunteer hours!

Join our real-time social learning platform and learn together with your friends!
Can't find your answer? Make a FREE account and ask your own questions, OR help others and earn volunteer hours!

Join our real-time social learning platform and learn together with your friends!