Associate Professor and Co-founder of Equall.ai

My Research…

I am an Associate Professor at CentraleSupelec and the Chief Science Officer of Equall.ai, a legal-tech startup that I co-founded. My work centers on AI, with a particular focus on Natural Language Processing (NLP).

At Equall.ai, we build AI-driven products for the legal sector and design workflows that streamline lawyers' tasks so they can work faster and more efficiently.

At MICS CentraleSupelec - Université Paris-Saclay, my research focuses on making AI usable in industrial NLP systems. We publish in top-tier NLP (ACL, NAACL, EMNLP, AACL, TACL) and general AI (NeurIPS, ICML, TMLR) conferences and journals. I currently focus on:

I am always happy to discuss new projects and collaborations.

Publications

  2024 (17)
xCOMET: Transparent Machine Translation Evaluation through Fine-grained Error Detection. Guerreiro, N. M.; Rei, R.; van Stigt, D.; Coheur, L.; Colombo, P.; and Martins, A. F. T. Trans. Assoc. Comput. Linguistics, 12: 979–995. 2024.
A Pseudo-Metric between Probability Distributions based on Depth-Trimmed Regions. Staerman, G.; Mozharovskyi, P.; Colombo, P.; Clémençon, S.; and d'Alché-Buc, F. Trans. Mach. Learn. Res., 2024. 2024.
Unsupervised Layer-Wise Score Aggregation for Textual OOD Detection. Darrin, M.; Staerman, G.; Gomes, E. D. C.; Cheung, J. C. K.; Piantanida, P.; and Colombo, P. In Wooldridge, M. J.; Dy, J. G.; and Natarajan, S., editor(s), Thirty-Eighth AAAI Conference on Artificial Intelligence, AAAI 2024, Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence, IAAI 2024, Fourteenth Symposium on Educational Advances in Artificial Intelligence, EAAI 2024, February 20-27, 2024, Vancouver, Canada, pages 17880–17888, 2024. AAAI Press
Towards More Robust NLP System Evaluation: Handling Missing Scores in Benchmarks. Himmi, A.; Irurozki, E.; Noiry, N.; Clémençon, S.; and Colombo, P. In Al-Onaizan, Y.; Bansal, M.; and Chen, Y., editor(s), Findings of the Association for Computational Linguistics: EMNLP 2024, Miami, Florida, USA, November 12-16, 2024, pages 11759–11785, 2024. Association for Computational Linguistics
Enhanced Hallucination Detection in Neural Machine Translation through Simple Detector Aggregation. Himmi, A.; Staerman, G.; Picot, M.; Colombo, P.; and Guerreiro, N. In Al-Onaizan, Y.; Bansal, M.; and Chen, Y., editor(s), Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024, Miami, FL, USA, November 12-16, 2024, pages 18573–18583, 2024. Association for Computational Linguistics
SaulLM-54B & SaulLM-141B: Scaling Up Domain Adaptation for the Legal Domain. Colombo, P.; Pires, T. P.; Boudiaf, M.; Melo, R.; Hautreux, G.; Malaboeuf, E.; Charpentier, J.; Culver, D.; and Desa, M. In Globersons, A.; Mackey, L.; Belgrave, D.; Fan, A.; Paquet, U.; Tomczak, J. M.; and Zhang, C., editor(s), Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, NeurIPS 2024, Vancouver, BC, Canada, December 10 - 15, 2024, 2024.
Is Preference Alignment Always the Best Option to Enhance LLM-Based Translation? An Empirical Analysis. Gisserot-Boukhlef, H.; Rei, R.; Malherbe, E.; Hudelot, C.; Colombo, P.; and Guerreiro, N. M. In Haddow, B.; Kocmi, T.; Koehn, P.; and Monz, C., editor(s), Proceedings of the Ninth Conference on Machine Translation, WMT 2024, Miami, FL, USA, November 15-16, 2024, pages 1373–1392, 2024. Association for Computational Linguistics
CroissantLLM: A Truly Bilingual French-English Language Model. Faysse, M.; Fernandes, P.; Guerreiro, N. M.; Loison, A.; Alves, D. M.; Corro, C. F.; Boizard, N.; Alves, J.; Rei, R.; Martins, P. H.; Casademunt, A. B.; Yvon, F.; Martins, A. F. T.; Viaud, G.; Hudelot, C.; and Colombo, P. CoRR, abs/2402.00786. 2024.
Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs. Boizard, N.; Haddad, K. E.; Hudelot, C.; and Colombo, P. CoRR, abs/2402.12030. 2024.
Towards Trustworthy Reranking: A Simple yet Effective Abstention Mechanism. Gisserot-Boukhlef, H.; Faysse, M.; Malherbe, E.; Hudelot, C.; and Colombo, P. CoRR, abs/2402.12997. 2024.
Enhanced Hallucination Detection in Neural Machine Translation through Simple Detector Aggregation. Himmi, A.; Staerman, G.; Picot, M.; Colombo, P.; and Guerreiro, N. M. CoRR, abs/2402.13331. 2024.
Tower: An Open Multilingual Large Language Model for Translation-Related Tasks. Alves, D. M.; Pombal, J.; Guerreiro, N. M.; Martins, P. H.; Alves, J.; Farajian, M. A.; Peters, B.; Rei, R.; Fernandes, P.; Agrawal, S.; Colombo, P.; de Souza, J. G. C.; and Martins, A. F. T. CoRR, abs/2402.17733. 2024.
SaulLM-7B: A pioneering Large Language Model for Law. Colombo, P.; Pires, T. P.; Boudiaf, M.; Culver, D.; Melo, R.; Corro, C.; Martins, A. F. T.; Esposito, F.; Raposo, V. L.; Morgado, S.; and Desa, M. CoRR, abs/2403.03883. 2024.
ColPali: Efficient Document Retrieval with Vision Language Models. Faysse, M.; Sibille, H.; Wu, T.; Omrani, B.; Viaud, G.; Hudelot, C.; and Colombo, P. CoRR, abs/2407.01449. 2024.
SaulLM-54B & SaulLM-141B: Scaling Up Domain Adaptation for the Legal Domain. Colombo, P.; Pires, T.; Boudiaf, M.; Melo, R.; Culver, D.; Morgado, S.; Malaboeuf, E.; Hautreux, G.; Charpentier, J.; and Desa, M. CoRR, abs/2407.19584. 2024.
EuroLLM: Multilingual Language Models for Europe. Martins, P. H.; Fernandes, P.; Alves, J.; Guerreiro, N. M.; Rei, R.; Alves, D. M.; Pombal, J.; Farajian, M. A.; Faysse, M.; Klimaszewski, M.; Colombo, P.; Haddow, B.; de Souza, J. G. C.; Birch, A.; and Martins, A. F. T. CoRR, abs/2409.16235. 2024.
Is Preference Alignment Always the Best Option to Enhance LLM-Based Translation? An Empirical Analysis. Gisserot-Boukhlef, H.; Rei, R.; Malherbe, E.; Hudelot, C.; Colombo, P.; and Guerreiro, N. M. CoRR, abs/2409.20059. 2024.
  2023 (21)
Hallucinations in Large Multilingual Translation Models. Guerreiro, N. M.; Alves, D. M.; Waldendorf, J.; Haddow, B.; Birch, A.; Colombo, P.; and Martins, A. F. T. Trans. Assoc. Comput. Linguistics, 11: 1500–1517. 2023.
A Halfspace-Mass Depth-Based Method for Adversarial Attack Detection. Picot, M.; Granese, F.; Staerman, G.; Romanelli, M.; Messina, F.; Piantanida, P.; and Colombo, P. Trans. Mach. Learn. Res., 2023. 2023.
Optimal Transport for Unsupervised Hallucination Detection in Neural Machine Translation. Guerreiro, N. M.; Colombo, P.; Piantanida, P.; and Martins, A. F. T. In Rogers, A.; Boyd-Graber, J. L.; and Okazaki, N., editor(s), Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2023, Toronto, Canada, July 9-14, 2023, pages 13766–13784, 2023. Association for Computational Linguistics
Toward Stronger Textual Attack Detectors. Colombo, P.; Picot, M.; Noiry, N.; Staerman, G.; and Piantanida, P. In Bouamor, H.; Pino, J.; and Bali, K., editor(s), Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, December 6-10, 2023, pages 484–505, 2023. Association for Computational Linguistics
Transductive Learning for Textual Few-Shot Classification in API-based Embedding Models. Colombo, P.; Pellegrain, V.; Boudiaf, M.; Tami, M.; Storchan, V.; Ayed, I. B.; and Piantanida, P. In Bouamor, H.; Pino, J.; and Bali, K., editor(s), Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, Singapore, December 6-10, 2023, pages 4214–4231, 2023. Association for Computational Linguistics
RainProof: An Umbrella to Shield Text Generator from Out-Of-Distribution Data. Darrin, M.; Piantanida, P.; and Colombo, P. In Bouamor, H.; Pino, J.; and Bali, K., editor(s), Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, Singapore, December 6-10, 2023, pages 5831–5857, 2023. Association for Computational Linguistics
Revisiting Instruction Fine-tuned Model Evaluation to Guide Industrial Applications. Faysse, M.; Viaud, G.; Hudelot, C.; and Colombo, P. In Bouamor, H.; Pino, J.; and Bali, K., editor(s), Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, Singapore, December 6-10, 2023, pages 9033–9048, 2023. Association for Computational Linguistics
Steering Large Language Models for Machine Translation with Finetuning and In-Context Learning. Alves, D. M.; Guerreiro, N. M.; Alves, J.; Pombal, J.; Rei, R.; de Souza, J. G. C.; Colombo, P.; and Martins, A. F. T. In Bouamor, H.; Pino, J.; and Bali, K., editor(s), Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, December 6-10, 2023, pages 11127–11148, 2023. Association for Computational Linguistics
The Glass Ceiling of Automatic Evaluation in Natural Language Generation. Colombo, P.; Peyrard, M.; Noiry, N.; West, R.; and Piantanida, P. In Park, J. C.; Arase, Y.; Hu, B.; Lu, W.; Wijaya, D.; Purwarianti, A.; and Krisnadhi, A. A., editor(s), Findings of the Association for Computational Linguistics: IJCNLP-AACL 2023 - Findings, Nusa Dua, Bali, November 1-4, 2023, pages 178–183, 2023. Association for Computational Linguistics
A Novel Information Theoretic Objective to Disentangle Representations for Fair Classification. Colombo, P.; Noiry, N.; Staerman, G.; and Piantanida, P. In Park, J. C.; Arase, Y.; Hu, B.; Lu, W.; Wijaya, D.; Purwarianti, A.; and Krisnadhi, A. A., editor(s), Findings of the Association for Computational Linguistics: IJCNLP-AACL 2023 - Findings, Nusa Dua, Bali, November 1-4, 2023, pages 184–198, 2023. Association for Computational Linguistics
Unsupervised Layer-wise Score Aggregation for Textual OOD Detection. Darrin, M.; Staerman, G.; Gomes, E. D. C.; Cheung, J. C. K.; Piantanida, P.; and Colombo, P. CoRR, abs/2302.09852. 2023.
The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset. Laurençon, H.; Saulnier, L.; Wang, T.; Akiki, C.; del Moral, A. V.; Scao, T. L.; von Werra, L.; Mou, C.; Ponferrada, E. G.; Nguyen, H.; Frohberg, J.; Sasko, M.; Lhoest, Q.; McMillan-Major, A.; Dupont, G.; Biderman, S.; Rogers, A.; Allal, L. B.; Toni, F. D.; Pistilli, G.; Nguyen, O.; Nikpoor, S.; Masoud, M.; Colombo, P.; de la Rosa, J.; Villegas, P.; Thrush, T.; Longpre, S.; Nagel, S.; Weber, L.; Muñoz, M.; Zhu, J.; van Strien, D.; Alyafeai, Z.; Almubarak, K.; Vu, M. C.; Gonzalez-Dios, I.; Soroa, A.; Lo, K.; Dey, M.; Suarez, P. O.; Gokaslan, A.; Bose, S.; Adelani, D. I.; Phan, L.; Tran, H.; Yu, I.; Pai, S.; Chim, J.; Lepercq, V.; Ilic, S.; Mitchell, M.; Luccioni, S.; and Jernite, Y. CoRR, abs/2303.03915. 2023.
Hallucinations in Large Multilingual Translation Models. Guerreiro, N. M.; Alves, D. M.; Waldendorf, J.; Haddow, B.; Birch, A.; Colombo, P.; and Martins, A. F. T. CoRR, abs/2303.16104. 2023.
Towards More Robust NLP System Evaluation: Handling Missing Scores in Benchmarks. Himmi, A.; Irurozki, E.; Noiry, N.; Clémençon, S.; and Colombo, P. CoRR, abs/2305.10284. 2023.
A Functional Data Perspective and Baseline On Multi-Layer Out-of-Distribution Detection. Gomes, E. D. C.; Colombo, P.; Staerman, G.; Noiry, N.; and Piantanida, P. CoRR, abs/2306.03522. 2023.
xCOMET: Transparent Machine Translation Evaluation through Fine-grained Error Detection. Guerreiro, N. M.; Rei, R.; van Stigt, D.; Coheur, L.; Colombo, P.; and Martins, A. F. T. CoRR, abs/2310.10482. 2023.
Steering Large Language Models for Machine Translation with Finetuning and In-Context Learning. Alves, D. M.; Guerreiro, N. M.; Alves, J.; Pombal, J.; Rei, R.; de Souza, J. G. C.; Colombo, P.; and Martins, A. F. T. CoRR, abs/2310.13448. 2023.
A Novel Information-Theoretic Objective to Disentangle Representations for Fair Classification. Colombo, P.; Noiry, N.; Staerman, G.; and Piantanida, P. CoRR, abs/2310.13990. 2023.
Transductive Learning for Textual Few-Shot Classification in API-based Embedding Models. Colombo, P.; Pellegrain, V.; Boudiaf, M.; Storchan, V.; Tami, M.; Ayed, I. B.; Hudelot, C.; and Piantanida, P. CoRR, abs/2310.13998. 2023.
Toward Stronger Textual Attack Detectors. Colombo, P.; Picot, M.; Noiry, N.; Staerman, G.; and Piantanida, P. CoRR, abs/2310.14001. 2023.
Revisiting Instruction Fine-tuned Model Evaluation to Guide Industrial Applications. Faysse, M.; Viaud, G.; Hudelot, C.; and Colombo, P. CoRR, abs/2310.14103. 2023.
  2022 (15)
InfoLM: A New Metric to Evaluate Summarization & Data2Text Generation. Colombo, P. J. A.; Clavel, C.; and Piantanida, P. In Thirty-Sixth AAAI Conference on Artificial Intelligence, AAAI 2022, Thirty-Fourth Conference on Innovative Applications of Artificial Intelligence, IAAI 2022, The Twelfth Symposium on Educational Advances in Artificial Intelligence, EAAI 2022, Virtual Event, February 22 - March 1, 2022, pages 10554–10562, 2022. AAAI Press
Learning Disentangled Textual Representations via Statistical Measures of Similarity. Colombo, P.; Staerman, G.; Noiry, N.; and Piantanida, P. In Muresan, S.; Nakov, P.; and Villavicencio, A., editor(s), Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2022, Dublin, Ireland, May 22-27, 2022, pages 2614–2630, 2022. Association for Computational Linguistics
Of Human Criteria and Automatic Metrics: A Benchmark of the Evaluation of Story Generation. Chhun, C.; Colombo, P.; Suchanek, F. M.; and Clavel, C. In Calzolari, N.; Huang, C.; Kim, H.; Pustejovsky, J.; Wanner, L.; Choi, K.; Ryu, P.; Chen, H.; Donatelli, L.; Ji, H.; Kurohashi, S.; Paggio, P.; Xue, N.; Kim, S.; Hahm, Y.; He, Z.; Lee, T. K.; Santus, E.; Bond, F.; and Na, S., editor(s), Proceedings of the 29th International Conference on Computational Linguistics, COLING 2022, Gyeongju, Republic of Korea, October 12-17, 2022, pages 5794–5836, 2022. International Committee on Computational Linguistics
A Differential Entropy Estimator for Training Neural Networks. Pichler, G.; Colombo, P. J. A.; Boudiaf, M.; Koliander, G.; and Piantanida, P. In Chaudhuri, K.; Jegelka, S.; Song, L.; Szepesvári, C.; Niu, G.; and Sabato, S., editor(s), International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA, volume 162, of Proceedings of Machine Learning Research, pages 17691–17715, 2022. PMLR
Beyond Mahalanobis Distance for Textual OOD Detection. Colombo, P.; Gomes, E. D. C.; Staerman, G.; Noiry, N.; and Piantanida, P. In Koyejo, S.; Mohamed, S.; Agarwal, A.; Belgrave, D.; Cho, K.; and Oh, A., editor(s), Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022, 2022.
What are the best Systems? New Perspectives on NLP Benchmarking. Colombo, P.; Noiry, N.; Irurozki, E.; and Clémençon, S. In Koyejo, S.; Mohamed, S.; Agarwal, A.; Belgrave, D.; Cho, K.; and Oh, A., editor(s), Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022, 2022.
The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset. Laurençon, H.; Saulnier, L.; Wang, T.; Akiki, C.; del Moral, A. V.; Scao, T. L.; von Werra, L.; Mou, C.; Ponferrada, E. G.; Nguyen, H.; Frohberg, J.; Sasko, M.; Lhoest, Q.; McMillan-Major, A.; Dupont, G.; Biderman, S.; Rogers, A.; Allal, L. B.; Toni, F. D.; Pistilli, G.; Nguyen, O.; Nikpoor, S.; Masoud, M.; Colombo, P.; de la Rosa, J.; Villegas, P.; Thrush, T.; Longpre, S.; Nagel, S.; Weber, L.; Muñoz, M.; Zhu, J.; van Strien, D.; Alyafeai, Z.; Almubarak, K.; Vu, M. C.; Gonzalez-Dios, I.; Soroa, A.; Lo, K.; Dey, M.; Suarez, P. O.; Gokaslan, A.; Bose, S.; Adelani, D. I.; Phan, L.; Tran, H.; Yu, I.; Pai, S.; Chim, J.; Lepercq, V.; Ilic, S.; Mitchell, M.; Luccioni, A. S.; and Jernite, Y. In Koyejo, S.; Mohamed, S.; Agarwal, A.; Belgrave, D.; Cho, K.; and Oh, A., editor(s), Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022, 2022.
What are the best systems? New perspectives on NLP Benchmarking. Colombo, P.; Noiry, N.; Irurozki, E.; and Clémençon, S. CoRR, abs/2202.03799. 2022.
KNIFE: Kernelized-Neural Differential Entropy Estimation. Pichler, G.; Colombo, P.; Boudiaf, M.; Koliander, G.; and Piantanida, P. CoRR, abs/2202.06618. 2022.
Learning Disentangled Textual Representations via Statistical Measures of Similarity. Colombo, P.; Staerman, G.; Noiry, N.; and Piantanida, P. CoRR, abs/2205.03589. 2022.
Of Human Criteria and Automatic Metrics: A Benchmark of the Evaluation of Story Generation. Chhun, C.; Colombo, P.; Clavel, C.; and Suchanek, F. M. CoRR, abs/2208.11646. 2022.
The Glass Ceiling of Automatic Evaluation in Natural Language Generation. Colombo, P.; Peyrard, M.; Noiry, N.; West, R.; and Piantanida, P. CoRR, abs/2208.14585. 2022.
Beyond Mahalanobis-Based Scores for Textual OOD Detection. Colombo, P.; Gomes, E. D. C.; Staerman, G.; Noiry, N.; and Piantanida, P. CoRR, abs/2211.13527. 2022.
Rainproof: An Umbrella To Shield Text Generators From Out-Of-Distribution Data. Darrin, M.; Piantanida, P.; and Colombo, P. CoRR, abs/2212.09171. 2022.
Optimal Transport for Unsupervised Hallucination Detection in Neural Machine Translation. Guerreiro, N. M.; Colombo, P.; Piantanida, P.; and Martins, A. F. T. CoRR, abs/2212.09631. 2022.
  2021 (13)
Apprendre à représenter et à générer du texte en utilisant des mesures d'information. (Learning to represent and generate text using information measures). Colombo, P. Ph.D. Thesis, Polytechnic Institute of Paris, France, 2021.
A Novel Estimator of Mutual Information for Learning to Disentangle Textual Representations. Colombo, P.; Piantanida, P.; and Clavel, C. In Zong, C.; Xia, F.; Li, W.; and Navigli, R., editor(s), Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL/IJCNLP 2021, (Volume 1: Long Papers), Virtual Event, August 1-6, 2021, pages 6539–6550, 2021. Association for Computational Linguistics
Improving Multimodal fusion via Mutual Dependency Maximisation. Colombo, P.; Chapuis, E.; Labeau, M.; and Clavel, C. In Moens, M.; Huang, X.; Specia, L.; and Yih, S. W., editor(s), Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 7-11 November, 2021, pages 231–245, 2021. Association for Computational Linguistics
Code-switched inspired losses for spoken dialog representations. Colombo, P.; Chapuis, E.; Labeau, M.; and Clavel, C. In Moens, M.; Huang, X.; Specia, L.; and Yih, S. W., editor(s), Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 7-11 November, 2021, pages 8320–8337, 2021. Association for Computational Linguistics
Automatic Text Evaluation through the Lens of Wasserstein Barycenters. Colombo, P.; Staerman, G.; Clavel, C.; and Piantanida, P. In Moens, M.; Huang, X.; Specia, L.; and Yih, S. W., editor(s), Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 7-11 November, 2021, pages 10450–10466, 2021. Association for Computational Linguistics
Beam Search with Bidirectional Strategies for Neural Response Generation. Colombo, P.; Clavel, C.; Yang, C.; and Varni, G. In Abbas, M.; and Freihat, A. A., editor(s), 4th International Conference on Natural Language and Speech Processing, Trento, Italy, November 12-13, 2021, pages 287–294, 2021. Association for Computational Linguistics
A Novel Estimator of Mutual Information for Learning to Disentangle Textual Representations. Colombo, P.; Clavel, C.; and Piantanida, P. CoRR, abs/2105.02685. 2021.
Automatic Text Evaluation through the Lens of Wasserstein Barycenters. Colombo, P.; Staerman, G.; Clavel, C.; and Piantanida, P. CoRR, abs/2108.12463. 2021.
Code-switched inspired losses for generic spoken dialog representations. Chapuis, E.; Colombo, P.; Labeau, M.; and Clavel, C. CoRR, abs/2108.12465. 2021.
Improving Multimodal fusion via Mutual Dependency Maximisation. Colombo, P.; Chapuis, E.; Labeau, M.; and Clavel, C. CoRR, abs/2109.00922. 2021.
Beam Search with Bidirectional Strategies for Neural Response Generation. Colombo, P.; Yang, C.; Varni, G.; and Clavel, C. CoRR, abs/2110.03389. 2021.
InfoLM: A New Metric to Evaluate Summarization & Data2Text Generation. Colombo, P.; Clavel, C.; and Piantanida, P. CoRR, abs/2112.01589. 2021.
NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation. Dhole, K. D.; Gangal, V.; Gehrmann, S.; Gupta, A.; Li, Z.; Mahamood, S.; Mahendiran, A.; Mille, S.; Srivastava, A.; Tan, S.; Wu, T.; Sohl-Dickstein, J.; Choi, J. D.; Hovy, E. H.; Dusek, O.; Ruder, S.; Anand, S.; Aneja, N.; Banjade, R.; Barthe, L.; Behnke, H.; Berlot-Attwell, I.; Boyle, C.; Brun, C.; Cabezudo, M. A. S.; Cahyawijaya, S.; Chapuis, E.; Che, W.; Choudhary, M.; Clauss, C.; Colombo, P.; Cornell, F.; Dagan, G.; Das, M.; Dixit, T.; Dopierre, T.; Dray, P.; Dubey, S.; Ekeinhor, T.; Giovanni, M. D.; Goyal, T.; Gupta, R.; Hamla, L.; Han, S.; Harel-Canada, F.; Honore, A.; Jindal, I.; Joniak, P. K.; Kleyko, D.; Kovatchev, V.; Krishna, K.; Kumar, A.; Langer, S.; Lee, S. R.; Levinson, C. J.; Liang, H.; Liang, K.; Liu, Z.; Lukyanenko, A.; Marivate, V.; de Melo, G.; Meoni, S.; Meyer, M.; Mir, A.; Moosavi, N. S.; Muennighoff, N.; Mun, T. S. H.; Murray, K.; Namysl, M.; Obedkova, M.; Oli, P.; Pasricha, N.; Pfister, J.; Plant, R.; Prabhu, V.; Pais, V.; Qin, L.; Raji, S.; Rajpoot, P. K.; Raunak, V.; Rinberg, R.; Roberts, N.; Rodriguez, J. D.; Roux, C.; Vasconcellos, P. H. S.; Sai, A. B.; Schmidt, R. M.; Scialom, T.; Sefara, T.; Shamsi, S.; Shen, X.; Shi, Y.; Shi, H.; Shvets, A.; Siegel, N.; Sileo, D.; Simon, J.; Singh, C.; Sitelew, R.; Soni, P.; Sorensen, T.; Soto, W.; Srivastava, A.; Srivatsa, K. V. A.; Sun, T.; T., M. V.; Tabassum, A.; Tan, F. A.; Teehan, R.; Tiwari, M.; Tolkiehn, M.; Wang, A.; Wang, Z.; Wang, Z. J.; Wang, G.; Wei, F.; Wilie, B.; Winata, G. I.; Wu, X.; Wydmanski, W.; Xie, T.; Yaseen, U.; Yee, M. A.; Zhang, J.; and Zhang, Y. CoRR, abs/2112.02721. 2021.
  2020 (9)
Guiding Attention in Sequence-to-Sequence Models for Dialogue Act Prediction. Colombo, P.; Chapuis, E.; Manica, M.; Vignon, E.; Varni, G.; and Clavel, C. In The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7-12, 2020, pages 7594–7601, 2020. AAAI Press
Hierarchical Pre-training for Sequence Labelling in Spoken Dialog. Chapuis, E.; Colombo, P.; Manica, M.; Labeau, M.; and Clavel, C. In Cohn, T.; He, Y.; and Liu, Y., editor(s), Findings of the Association for Computational Linguistics: EMNLP 2020, Online Event, 16-20 November 2020, volume EMNLP 2020, of Findings of ACL, pages 2636–2648, 2020. Association for Computational Linguistics
The importance of fillers for text representations of speech transcripts. Dinkar, T.; Colombo, P.; Labeau, M.; and Clavel, C. In Webber, B.; Cohn, T.; He, Y.; and Liu, Y., editor(s), Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online, November 16-20, 2020, pages 7985–7993, 2020. Association for Computational Linguistics
Heavy-tailed Representations, Text Polarity Classification & Data Augmentation. Jalalzai, H.; Colombo, P.; Clavel, C.; Gaussier, É.; Varni, G.; Vignon, E.; and Sabourin, A. In Larochelle, H.; Ranzato, M.; Hadsell, R.; Balcan, M.; and Lin, H., editor(s), Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020.
Guiding attention in Sequence-to-sequence models for Dialogue Act prediction. Colombo, P.; Chapuis, E.; Manica, M.; Vignon, E.; Varni, G.; and Clavel, C. CoRR, abs/2002.08801. 2020.
Guider l'attention dans les modeles de sequence a sequence pour la prediction des actes de dialogue. Colombo, P.; Chapuis, E.; Manica, M.; Vignon, E.; Varni, G.; and Clavel, C. CoRR, abs/2002.09419. 2020.
Heavy-tailed Representations, Text Polarity Classification & Data Augmentation. Jalalzai, H.; Colombo, P.; Clavel, C.; Gaussier, É.; Varni, G.; Vignon, E.; and Sabourin, A. CoRR, abs/2003.11593. 2020.
Hierarchical Pre-training for Sequence Labelling in Spoken Dialog. Chapuis, E.; Colombo, P.; Manica, M.; Labeau, M.; and Clavel, C. CoRR, abs/2009.11152. 2020.
The importance of fillers for text representations of speech transcripts. Dinkar, T.; Colombo, P.; Labeau, M.; and Clavel, C. CoRR, abs/2009.11340. 2020.
  2019 (4)
From the Token to the Review: A Hierarchical Multimodal approach to Opinion Mining. Garcia, A.; Colombo, P.; d'Alché-Buc, F.; Essid, S.; and Clavel, C. In Inui, K.; Jiang, J.; Ng, V.; and Wan, X., editor(s), Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, November 3-7, 2019, pages 5538–5547, 2019. Association for Computational Linguistics
Affect-Driven Dialog Generation. Colombo, P.; Witon, W.; Modi, A.; Kennedy, J.; and Kapadia, M. In Burstein, J.; Doran, C.; and Solorio, T., editor(s), Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pages 3734–3743, 2019. Association for Computational Linguistics
Affect-Driven Dialog Generation. Colombo, P.; Witon, W.; Modi, A.; Kennedy, J.; and Kapadia, M. CoRR, abs/1904.02793. 2019.
From the Token to the Review: A Hierarchical Multimodal approach to Opinion Mining. Garcia, A.; Colombo, P.; Essid, S.; d'Alché-Buc, F.; and Clavel, C. CoRR, abs/1908.11216. 2019.
  2018 (1)
Disney at IEST 2018: Predicting Emotions using an Ensemble. Witon, W.; Colombo, P.; Modi, A.; and Kapadia, M. In Balahur, A.; Mohammad, S. M.; Hoste, V.; and Klinger, R., editor(s), Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, WASSA@EMNLP 2018, Brussels, Belgium, October 31, 2018, pages 248–253, 2018. Association for Computational Linguistics