Compare similarity of terms/expressions using NLTK?

I’m trying to compare terms/expressions that may or may not be semantically related. These are not full sentences, and not necessarily single words. For example:

‘Social networking service’ and ‘Social network’ are clearly strongly related, but how do I quantify this using NLTK?

Clearly I’m missing something, as even the code:

from nltk.corpus import wordnet
w1 = wordnet.synsets('social network')

returns an empty list.

Any advice on how to tackle this?

Best answer

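There are some measures of semantic relatedness or similarity, but they’re better defined for single words or single expressions in WordNet’s lexicon, not for compounds of WordNet’s lexical entries, as far as I know. That would also explain the empty list: a lookup only succeeds when the whole expression is itself an entry in the lexicon.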
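
If you still want a number for two multi-word terms, one rough workaround is to score the words individually and aggregate. The sketch below is only a heuristic of my own, not a standard WordNet measure: for each word in one term it takes the best single-word score against the other term, averages those, and does the same in the other direction. The names phrase_similarity and wordnet_word_sim are made up for illustration; the only NLTK calls used are wordnet.synsets and Synset.path_similarity, and the WordNet corpus is assumed to be downloaded already.

```python
from itertools import product


def wordnet_word_sim(w1, w2):
    """Best path similarity over all synset pairs of two single words (0.0 if none)."""
    # Imported lazily so the aggregation below can be tried without NLTK.
    # Assumes the corpus is available, e.g. after nltk.download('wordnet').
    from nltk.corpus import wordnet as wn

    scores = [
        s1.path_similarity(s2)
        for s1, s2 in product(wn.synsets(w1), wn.synsets(w2))
    ]
    # path_similarity may return None when no path connects the two synsets.
    scores = [s for s in scores if s is not None]
    return max(scores, default=0.0)


def phrase_similarity(a, b, word_sim=wordnet_word_sim):
    """Symmetric average of each word's best match in the other phrase."""
    tokens_a, tokens_b = a.lower().split(), b.lower().split()
    if not tokens_a or not tokens_b:
        return 0.0

    def directed(src, dst):
        # For every source word, keep its highest score against the target words.
        return sum(max(word_sim(x, y) for y in dst) for x in src) / len(src)

    return (directed(tokens_a, tokens_b) + directed(tokens_b, tokens_a)) / 2
```

Calling phrase_similarity('Social networking service', 'Social network') would then give a score between 0 and 1. The word_sim argument is pluggable, so you can swap in another single-word measure. Keep in mind that this ignores word order and treats the compound as a bag of words, which is exactly the limitation described above.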

This is a nice web implementation of many WordNet-based similarity measures.

Some further reading on interpreting compounds using WordNet similarity (although not on evaluating the similarity of compounds), if you’re interested:

  • CiteSeerX (citations are clearer)
  • Same article, PDF