Explicit Semantic Analysis (ESA)
I often receive requests for an implementation of Explicit Semantic Analysis, so I decided to put together a Web page about it :)
At the moment, we are not releasing the actual source code, as we do not have the resources to properly maintain it. Most of the algorithms are described in our papers. Below I list a few publicly available ESA implementations in the hope that they will be useful to you.
By the way, my Wikipedia preprocessing code (WikiPrep) is available here.
- Wikiprep-ESA (by Cagatay Calli). The exact settings needed to replicate our IJCAI'07 results are listed here. If I understand correctly, this code is suitable for the current (as of 2010) versions of Wikipedia. If you would like to process old dumps (specifically, the 2005 dump I used), you will probably want to use this version, which you can download from this commit tree.
- Research-ESA Web service + code (by Philipp Sorg). This implementation is based on the Terrier IR Platform.
- WikipediaESA demo + code (by Henning Jacobs)
- ESAlib (by Lukas Zilka)
- ESA Semantic Relatedness (by Joao Gabriel Oliveira)
- EasyESA (by Calli et al.)
- Wikipedia-based Explicit Semantic Analysis (by Philip van Oosten). Open-sourced under the AGPLv3 license. Implemented in Java, using Lucene for indexing. The author notes that it may not be a 100% faithful implementation of the original ESA paper, but it is close enough for practical applications.
- If you are aware of additional publicly available ESA implementations (or have developed your own :), please drop me a line and I'll be happy to list it here.
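Purely as an illustration of the core idea (this is not code from any of the implementations listed above), here is a toy Python sketch. It assumes a tiny hand-written set of "concepts" standing in for Wikipedia articles, builds a weighted inverted index from words to concepts using a simplified TF-IDF scheme, represents a text as the frequency-weighted sum of its words' concept vectors, and scores semantic relatedness as the cosine between two such vectors. The function names (`build_index`, `interpret`, `relatedness`), the naive tokenizer, and the exact weighting are hypothetical simplifications; the published method involves further steps (for example, filtering out small articles and normalizing vectors) that are described in the papers.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on non-letters (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(concepts):
    """Build a weighted inverted index: word -> {concept: weight}.

    `concepts` maps a concept name (standing in for a Wikipedia article title)
    to its text. Weights use a simplified TF-IDF: (1 + log tf) * log(N / df).
    Words that occur in every concept get idf = 0 and are left out.
    """
    n = len(concepts)
    tfs = {name: Counter(tokenize(text)) for name, text in concepts.items()}
    df = Counter()
    for tf in tfs.values():
        df.update(tf.keys())
    index = defaultdict(dict)
    for name, tf in tfs.items():
        for word, count in tf.items():
            idf = math.log(n / df[word])
            if idf > 0:
                index[word][name] = (1 + math.log(count)) * idf
    return index


def interpret(text, index):
    """Map a text to a concept vector by summing the concept weights of its
    words, each scaled by how often the word occurs in the text."""
    vector = defaultdict(float)
    for word, count in Counter(tokenize(text)).items():
        for concept, weight in index.get(word, {}).items():
            vector[concept] += count * weight
    return vector


def relatedness(a, b, index):
    """Cosine similarity between the concept vectors of two texts."""
    va, vb = interpret(a, index), interpret(b, index)
    dot = sum(w * vb.get(c, 0.0) for c, w in va.items())
    na = math.sqrt(sum(w * w for w in va.values()))
    nb = math.sqrt(sum(w * w for w in vb.values()))
    return dot / (na * nb) if na and nb else 0.0


# Hypothetical three-"article" knowledge base, for demonstration only.
concepts = {
    "Cat": "the cat is a small domesticated feline mammal kept as a pet",
    "Dog": "the dog is a domesticated mammal kept as a pet and bred from wolves",
    "Stock market": "a stock market lets investors buy and sell shares of companies",
}
index = build_index(concepts)
print(relatedness("feline pet", "dog pet", index))           # about 0.50
print(relatedness("feline pet", "shares investors", index))  # 0.0
```

Note that "feline pet" and "dog pet" share no words apart from "pet", yet both map onto the Cat and Dog concepts, which is the kind of concept-level overlap ESA is designed to capture.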
Our papers on ESA:
- Evgeniy Gabrilovich and Shaul Markovitch. "Wikipedia-based Semantic Interpretation for Natural Language Processing." Journal of Artificial Intelligence Research, 34:443-498, 2009. [PDF]
- Evgeniy Gabrilovich and Shaul Markovitch. "Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis." Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), Hyderabad, India, January 2007. [Abstract / PDF]
- Evgeniy Gabrilovich and Shaul Markovitch. "Overcoming the Brittleness Bottleneck using Wikipedia: Enhancing Text Categorization with Encyclopedic Knowledge." Proceedings of the 21st National Conference on Artificial Intelligence (AAAI), pp. 1301-1306, Boston, July 2006. [Abstract / PDF]
- Evgeniy Gabrilovich. "Feature Generation for Textual Information Retrieval Using World Knowledge." PhD Thesis, Technion - Israel Institute of Technology, Haifa, Israel, December 2006. [Abstract / PDF]