We ask Google to pinpoint the nearest café all the time, but how are language and computers integrated?
UBC Master of Data Science in Computational Linguistics director Bryan Gick and assistant professor of Computational Linguistics and Information Science Muhammad Abdul-Mageed, both Language Sciences members, explain just what computational linguistics is, and what the future holds for this fast-growing field.
What exactly is computational linguistics? What does this program teach students?
BG: Language makes up most of the content of the internet, and is our primary means of using it, through our devices. We generate a massive amount of text, which can be used for all sorts of purposes, from marketing to public health tracking.
Computational linguistics is about using machine learning techniques to analyze large amounts of language data. When you get to a large enough scale, traditional statistical and computational methods don’t really apply and you need to use machine learning that can learn how to navigate that volume of data, and detect patterns that a human could never see.
MAM: Our Master of Data Science in Computational Linguistics takes students who already have some background outside computer science, plus a basic level of mathematics and computing such as programming, and equips them to work with language data: to build models from human-language data that let us find specific patterns in the data and make predictions.
The program consists of both lectures and labs and features an opportunity for students to work with people from the industry on specific types of projects in the form of a co-op.
Why is UBC offering this program now?
BG: There’s an overwhelming demand coming from industry, and it spans every field. We have a global marketplace, and the amount of data available is not processable using traditional techniques, so you need to learn how to use these computational techniques. In addition, if you have a language that’s structurally similar to another, it becomes possible to bootstrap from a language that has lots of data to a language that has less, which is where some of the need for linguistics comes in.
That’s something that UBC is in a good position to contribute to, because we look at lots of different languages here, and we’re working across disciplines to generate a new population of people who can navigate that interface. The program is accepting students now.
How did this program come about?
BG: Around 2014, the Department of Linguistics realized our students had a lot of interest in computation and language and some were already being hired by technology companies.
I met with colleagues from Computer Science and we decided to put together a professional master’s degree in computational linguistics, which aligned with the Faculty of Science’s then-new MDS program. Linguistics, Computer Science and Statistics worked together to retool computational linguistics to become the first of several options under the MDS program umbrella. Our first cohort started this term, with 27 students from all over the world.
What kinds of jobs can you get with this degree?
BG: Four of the five most valuable companies in the world right now are working at this interface. Google, Microsoft, Amazon, and Facebook are all companies whose bread and butter is working with large amounts of language data.
MAM: Data science and computing with language are extremely pervasive right now, meaning there’s a host of positions that students could take when they graduate. These include: a natural language processing engineer, where you would build machines or models that improve computers’ ability to understand and generate human language, for application in things like Siri; a data analyst for institutions that need people to work with unstructured language data, such as large restaurant brands wanting to summarize reviews, financial institutions needing to work with complaints data, and so on; and positions involving machine translation, automatic text classification, and detecting people’s sentiment in online data.