Francisco (Paco) Guzmán

Paco Guzmán is a Research Scientist working on translation. His research has focused on several aspects of machine translation, including low-resource translation, translation mining, evaluation, and quality estimation.
Over the years, Paco has been a speaker and panelist at several events dedicated to increasing diversity in AI. He co-founded the Facebook-Georgia Tech co-teaching program. Before joining Facebook in 2016, Paco was a Research Scientist at the Qatar Computing Research Institute from 2012 to 2016. He obtained his PhD from ITESM (Monterrey Tech) in Mexico in 2011. From 2008 to 2009, Paco was a visitor at Carnegie Mellon University, where he worked at the Language Technologies Institute.

Paco's Publications

December 04, 2020

NLP

Massively Multilingual Document Alignment with Cross-lingual Sentence-Mover's Distance

Document alignment aims to identify pairs of documents in two distinct languages that are of comparable content or translations of each other. Such aligned data can be used for a variety of NLP tasks from training cross-lingual representations to mining parallel data for machine translation.…

Ahmed El-Kishky, Francisco Guzmán

November 16, 2020

NLP

RESEARCH

CCAligned: A Massive Collection of Cross-Lingual Web-Document Pairs

Cross-lingual document alignment aims to identify pairs of documents in two distinct languages that are of comparable content or translations of each other. In this paper, we exploit the signals embedded in URLs to label web documents at scale with an average precision of 94.5% across different language pairs.…

Ahmed El-Kishky, Vishrav Chaudhary, Francisco Guzmán, Philipp Koehn

August 31, 2020

RESEARCH

NLP

Unsupervised Quality Estimation for Neural Machine Translation

Quality Estimation (QE) is an important component in making Machine Translation (MT) useful in real-world applications, as it aims to inform the user about the quality of the MT output at test time. Existing approaches…

Marina Fomicheva, Shuo Sun, Lisa Yankovskaya, Frédéric Blain, Francisco Guzmán, Mark Fishel, Nikolaos Aletras, Vishrav Chaudhary, Lucia Specia

July 06, 2020

RESEARCH

NLP

Are We Estimating or Guesstimating Translation Quality?

Recent advances in pre-trained multilingual language models have led to state-of-the-art results on the task of quality estimation (QE) for machine translation. A carefully…

Shuo Sun, Francisco Guzmán, Lucia Specia

October 31, 2019

RESEARCH

NLP

The FLORES Evaluation Datasets for Low-Resource Machine Translation: Nepali–English and Sinhala–English

For machine translation, a vast majority of language pairs in the world are considered low-resource because they have little parallel data available.…

Francisco Guzmán, Peng-Jen Chen, Myle Ott, Juan Pino, Guillaume Lample, Philipp Koehn, Vishrav Chaudhary, Marc'Aurelio Ranzato

August 02, 2019

RESEARCH

NLP

Low-Resource Corpus Filtering using Multilingual Sentence Embeddings

In this paper, we describe our submission to the WMT19 low-resource parallel corpus filtering shared task…

Vishrav Chaudhary, Yuqing Tang, Francisco Guzmán, Holger Schwenk, Philipp Koehn

August 02, 2019

RESEARCH

NLP

Findings of the WMT 2019 Shared Task on Parallel Corpus Filtering for Low-Resource Conditions

Following the WMT 2018 Shared Task on Parallel Corpus Filtering (Koehn et al., 2018), we posed the challenge of assigning sentence-level quality scores for very noisy corpora of sentence pairs crawled from the web…

Philipp Koehn, Francisco Guzmán, Vishrav Chaudhary, Juan Pino

May 04, 2019

NLP

SPEECH & AUDIO

Design and Evaluation of a Social Media Writing Support Tool for People with Dyslexia

People with dyslexia face challenges expressing themselves in writing on social networking sites (SNSs)…

Shaomei Wu, Lindsay Reynolds, Xian Li, Francisco Guzmán