MUSE (Multilingual Unsupervised and Supervised Embeddings) is a Python library that makes it faster and easier to develop and evaluate cross-lingual word embeddings for natural language processing. It helps researchers and developers bring their AI technologies to new languages faster.

Multilingual embeddings for scale

MUSE takes a novel approach to natural language processing. Rather than relying on language-specific training or intermediary translation to classify text, it uses multilingual word embeddings so that training can span many languages, which helps developers scale to new ones.
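
To make the idea of a shared embedding space concrete, here is a minimal sketch of cross-lingual nearest-neighbor lookup. The three-dimensional vectors are made-up toy values, not real MUSE output, and the helper names `cosine` and `nearest` are hypothetical. It assumes the English and Spanish vectors have already been aligned into one space.

```python
import math

# Toy vectors standing in for aligned embeddings (values are invented).
EN = {"cat": [0.9, 0.1, 0.0], "dog": [0.1, 0.9, 0.0], "house": [0.0, 0.1, 0.9]}
ES = {"gato": [0.8, 0.2, 0.1], "perro": [0.2, 0.8, 0.1], "casa": [0.1, 0.0, 0.9]}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(vector, vocabulary):
    """Return the word in `vocabulary` whose vector is most similar to `vector`."""
    return max(vocabulary, key=lambda word: cosine(vector, vocabulary[word]))


# Because both languages live in one space, an English vector can be
# compared directly against Spanish vectors, with no translation step.
translations = {word: nearest(vec, ES) for word, vec in EN.items()}
print(translations)
```

The same comparison is what lets a classifier trained on vectors from one language be fed vectors from another, provided the alignment is good enough.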

MUSE is compatible with fastText and offers large-scale, high-quality bilingual dictionaries for training and evaluation. It runs on CPU or GPU, in Python 2 or 3.
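
fastText's text embedding files (`.vec`) start with a header line giving the word count and the vector dimension, followed by one word and its values per line. The sketch below reads that layout in plain Python; `read_vec` and its `max_words` parameter are hypothetical names used for illustration and are not part of MUSE or fastText.

```python
import io


def read_vec(handle, max_words=None):
    """Read fastText text format: a 'count dim' header, then 'word v1 ... vdim' lines."""
    _count, dim = map(int, handle.readline().split())
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        word, values = parts[0], parts[1:]
        if len(values) != dim:
            continue  # skip lines that do not match the declared dimension
        vectors[word] = [float(x) for x in values]
        if max_words is not None and len(vectors) >= max_words:
            break  # the full files are large, so allow an early stop
    return dim, vectors


# A tiny in-memory sample in the same layout as a real .vec file.
sample = io.StringIO("3 2\nthe 0.1 0.2\nof -0.3 0.4\nand 0.5 -0.6\n")
dim, vectors = read_vec(sample, max_words=2)
print(dim, vectors)
```

For a real file, pass an open handle instead of the sample, for example `open("data/wiki.en.vec", encoding="utf-8")`.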


Get Started

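  • Clone MUSE and download the monolingual and cross-lingual word embedding evaluation datasets.

    # Clone the repository (the facebookresearch/MUSE project on GitHub)
    git clone https://github.com/facebookresearch/MUSE.git
    cd ./MUSE/
    ./data/get_evaluation.sh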

  • Download monolingual word embeddings.

    # English fastText Wikipedia embeddings
    curl -Lo data/wiki.en.vec https://s3-us-west-1.amazonaws.com/fasttext-vectors/wiki.en.vec
    # Spanish fastText Wikipedia embeddings
    curl -Lo data/wiki.es.vec https://s3-us-west-1.amazonaws.com/fasttext-vectors/wiki.es.vec
              

  • Review the documentation to familiarize yourself with MUSE dictionaries and word embeddings.

  • Experiment with supervised and unsupervised training.
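
As background for the supervised setting, one standard way to align two embedding spaces from a bilingual dictionary is orthogonal Procrustes: find the orthogonal matrix that best maps source vectors onto their dictionary translations. The following is a minimal NumPy sketch of that technique on synthetic data. It is not MUSE's own code, and the function name `procrustes` is hypothetical.

```python
import numpy as np


def procrustes(src, tgt):
    """Return the orthogonal matrix W minimizing ||src @ W.T - tgt||_F.

    `src` and `tgt` are (n_pairs, dim) arrays; row i of each holds the
    vectors of the i-th dictionary word pair.
    """
    u, _, vt = np.linalg.svd(tgt.T @ src)
    return u @ vt


# Synthetic check: build a target space that is an exact rotation of the
# source space, then confirm the rotation is recovered.
rng = np.random.default_rng(0)
true_rotation, _ = np.linalg.qr(rng.normal(size=(4, 4)))
src = rng.normal(size=(20, 4))
tgt = src @ true_rotation.T

w = procrustes(src, tgt)
mapped = src @ w.T  # source vectors expressed in the target space
print(np.allclose(mapped, tgt))
```

Real embeddings are never an exact rotation of one another, so in practice the mapped vectors only approximate their translations, and quality is measured against a held-out dictionary.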
