Pierre Colombo

👨‍🏫 Associate Professor · 🧪 Chief Science Officer

I lead NLP research at MICS, 🎓 CentraleSupélec (Paris-Saclay), and at ⚖️ Equall.ai. My work focuses on making large language models 🤖 reliable, 📏 measurable, and 🚀 deployable in high-stakes domains where precision matters, such as law, healthcare, and enterprise systems.


🏛️ Equall.ai — AI for Law

Legal AI demands precision and domain expertise. We built SaulLM, the first LLM for law, and it powers M&A due diligence automation now deployed at Big Law firms.


🔬 Research

I believe AI research should be open, reproducible, and useful. I build models that anyone can use, develop evaluation methods that reveal true capabilities, and focus on problems that matter in the real world. Good research creates tools that others can build on—not just papers.

Open, multilingual, and multimodal models

SOTA document retrieval with vision-language
ICLR
Multilinguality in vision-language models
arXiv
Smaller visual document retrievers
arXiv
GPT-4 translation quality with 7B params
COLM
First bilingual French-English LLM
TMLR
Multilingual LLMs for European languages
NeurIPS
Scaling multilingual encoders for Europe
COLM
Cross-tokenizer distillation for LLMs
TMLR
176B open multilingual model
JMLR

Metrics that measure what matters

Fine-grained translation evaluation
TACL
Information-theoretic generation metrics
🏆 Best Paper AAAI
Detection in neural machine translation
TACL
Beyond Mahalanobis for textual OOD
NeurIPS
Fair model comparison methods
NeurIPS

Fair and safe AI systems

Fair generation via mutual information
ACL
Fair classification via differential entropy
ICML
Robust text classification
NeurIPS
Stronger textual adversarial detectors
EMNLP
Robustness in text classification
AAAI

Full publication list

100+ papers · 5000+ citations · Google Scholar

ColPali, SaulLM-141B, EuroLLM, TowerVision...
15 papers
xCOMET, CroissantLLM, Hallucinations...
20 papers
InfoLM, BLOOM, OOD Detection...
18 papers

Teaching & academic service

Teaching
Introduction to Deep Learning
CentraleSupélec
Natural Language Processing
Master M2
Service
Area Chair
ACL EMNLP NeurIPS
Reviewer
ICLR ICML AAAI

Let's collaborate. 🤝 I work with CMU, IST Lisbon, Edinburgh, INRIA, Telecom Paris, Sorbonne Université, and industry partners, and I am open to new collaborations.