Gensim is a Python library for topic modelling, document indexing and similarity
retrieval with large corpora. Its target audience is the natural language
processing (NLP) and information retrieval (IR) communities.

Features:
* All algorithms are memory-independent with respect to the corpus size: they
  can process input larger than RAM (streamed, out-of-core).
* Intuitive interfaces:
  * easy to plug in your own input corpus or data stream (trivial streaming
    API)
  * easy to extend with other vector space algorithms (trivial transformation
    API)
* Efficient multicore implementations of popular algorithms, such as online
  Latent Semantic Analysis (LSA/LSI/SVD), Latent Dirichlet Allocation (LDA),
  Random Projections (RP), Hierarchical Dirichlet Process (HDP) or word2vec deep
  learning.
* Distributed computing: can run Latent Semantic Analysis and Latent Dirichlet
  Allocation on a cluster of computers.
* Extensive documentation and Jupyter Notebook tutorials.
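
The streaming idea behind the memory-independence claim can be sketched in plain
Python. This is a minimal illustration, assuming gensim's convention that a
corpus is any re-iterable object yielding each document as a sparse list of
(token_id, count) pairs. The names tokenize, build_vocab and StreamedCorpus are
hypothetical and not part of gensim; in gensim itself the vocabulary is
normally handled by gensim.corpora.Dictionary and its doc2bow method.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# A bag-of-words document: sparse (token_id, count) pairs.
BowDoc = List[Tuple[int, int]]


def tokenize(line: str) -> List[str]:
    # Hypothetical, deliberately naive tokenizer: lowercase + whitespace split.
    return line.lower().split()


def build_vocab(lines: Iterable[str]) -> Dict[str, int]:
    """One streaming pass that assigns the next integer id to each new token."""
    token2id: Dict[str, int] = {}
    for line in lines:
        for token in tokenize(line):
            token2id.setdefault(token, len(token2id))
    return token2id


class StreamedCorpus:
    """Hypothetical re-iterable corpus yielding one sparse vector per document.

    `open_lines` is called afresh on every iteration, so documents are read
    lazily, one at a time, and the whole corpus is never held in memory.
    """

    def __init__(self, open_lines: Callable[[], Iterable[str]],
                 token2id: Dict[str, int]):
        self.open_lines = open_lines
        self.token2id = token2id

    def __iter__(self) -> Iterator[BowDoc]:
        for line in self.open_lines():
            # Count only tokens known to the vocabulary; unknown ones are dropped.
            counts = Counter(t for t in tokenize(line) if t in self.token2id)
            yield sorted((self.token2id[t], n) for t, n in counts.items())


if __name__ == "__main__":
    docs = ["Human machine interface", "machine learning for human text"]
    vocab = build_vocab(docs)
    corpus = StreamedCorpus(lambda: iter(docs), vocab)
    for vec in corpus:
        print(vec)
```

For data on disk, open_lines could be something like
lambda: open("mycorpus.txt", encoding="utf-8") (a placeholder file name), so
each pass re-reads the file instead of keeping it in RAM. Because the corpus is
re-iterable rather than a one-shot generator, algorithms that need several
passes over the data can iterate it repeatedly.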

WWW: https://radimrehurek.com/gensim/