Bayesian Modelling of Spatial Typology

Key Information

Emmy-Noether Research Group funded by the Deutsche Forschungsgemeinschaft (DFG)

Start: December 01, 2022

Duration: 6 years

Head: Matías Guzmán Naranjo

Project Team: I-Ying Lin and Marvin Martiny

Research assistants: Sarah Neitzel and Miriam Schiele

Location: Department of General Linguistics, University of Freiburg

The Project

There is currently a considerable amount of work in spatial typology (dialectology, areal typology and language diffusion) covering topics such as the emergence of linguistic areas, the need to control for contact effects, sociolinguistic factors that can play a role in contact, and how geographic features affect contact between languages. However, attempts at building computational models of these issues have remained disconnected, and for the most part have relied on suboptimal or incomplete assumptions and heavily simplified data. In order to better understand spatial phenomena, we need to take these issues seriously and develop more realistic, generative models for spatial typology using more complete and detailed datasets. The present project aims to improve the state of the art by building Bayesian generative models of spatial phenomena (both for induction and as typological controls) using more realistic assumptions and better-quality spatial data. We will work both on modeling to control for spatially induced confounds and on understanding spatial effects in their own right.

Better data:

Realistic distances between languages: Most work in spatial typology assumes that the distance between linguistic communities is either Euclidean or geodesic. Both of these metrics ignore social and geographic features like mountain ranges, rivers, trade routes, etc. The first aim of the project is to design and calculate better metrics of spatial separation.
Realistic representations of linguistic areas: Most work on spatial typology assumes that languages can be represented as points in space, which is a heavy simplification. Here the aim of the project is to produce more realistic polygon representations of language areas.
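
The contrast between straight-line and terrain-aware distances can be illustrated with a minimal sketch. It is not the project's method: the function names, the toy cost grid and the cost values (1 for easy terrain, 9 for a mountain ridge) are assumptions made purely for illustration. The first function computes a great-circle distance on a sphere (a common approximation of geodesic distance), the second a least-cost path distance over a grid using Dijkstra's algorithm.

```python
import heapq
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometres on a sphere of the given radius."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def least_cost(grid, start, goal):
    """Dijkstra over a grid of per-cell traversal costs (4-neighbour moves).

    The cost of a move is the cost of the cell being entered.
    """
    rows, cols = len(grid), len(grid[0])
    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        cost, (r, c) = heapq.heappop(queue)
        if (r, c) == goal:
            return cost
        if cost > best.get((r, c), math.inf):
            continue  # stale queue entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                new_cost = cost + grid[nr][nc]
                if new_cost < best.get((nr, nc), math.inf):
                    best[(nr, nc)] = new_cost
                    heapq.heappush(queue, (new_cost, (nr, nc)))
    return math.inf


# Hypothetical terrain: 1 = easy terrain, 9 = mountain ridge (made-up values).
terrain = [
    [1, 1, 9, 1, 1],
    [1, 1, 9, 1, 1],
    [1, 1, 1, 1, 1],
]
flat = [[1] * 5 for _ in range(3)]

site_a, site_b = (0, 0), (0, 4)
flat_cost = least_cost(flat, site_a, site_b)        # barrier ignored
barrier_cost = least_cost(terrain, site_a, site_b)  # path detours around the ridge
```

In this toy grid the two sites are equally far apart as the crow flies in both scenarios, but the cost-aware distance doubles once the ridge is taken into account. A real application would derive the cost surface from geographic data rather than from hand-picked values.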

Better models:

In the project we will build Bayesian models that take into account the following:

Barriers and pathways: Contact between linguistic communities is hindered by natural barriers (mountains, fast-flowing rivers, oceans) and socio-political barriers (borders, religion), and facilitated by pathways (roads, trade routes, slow-flowing rivers). The PI will work on developing Bayesian models of language contact and diffusion that can take into account both barriers and pathways.
Asymmetric language contact: Language contact does not have to be, and in fact often isn't, symmetric. Larger languages tend to have a greater impact on smaller languages than the other way around. Lingua francas and imperial languages can have a much greater reach than languages spoken by marginalized communities. Marvin Martiny will work on developing Bayesian models of asymmetric language contact.
Models using polygon data: Different languages are spoken in larger or smaller areas and, as a result, can influence more or fewer neighbors. I-Ying Lin will work on developing Bayesian models of language contact that can use polygon representations of linguistic areas.
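
To make the notion of asymmetric contact concrete, here is a minimal sketch of a non-symmetric influence matrix. It is an illustrative assumption, not one of the project's models: the weighting formula (relative speaker numbers multiplied by an exponential distance decay), the function name, the `scale` parameter and the example figures are all hypothetical.

```python
import math


def influence_weights(populations, distances, scale=100.0):
    """Return w, where w[i][j] is a hypothetical influence of language j on i.

    Assumed form: (share of j's speakers in the pair) * exp(-distance / scale).
    Because the share depends on the direction, w[i][j] != w[j][i] in general.
    """
    n = len(populations)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                share = populations[j] / (populations[i] + populations[j])
                w[i][j] = share * math.exp(-distances[i][j] / scale)
    return w


# Made-up example: a small language (index 0) next to a large one (index 1).
populations = [1_000, 100_000]
distances = [[0.0, 50.0],
             [50.0, 0.0]]  # symmetric distances, e.g. in km

w = influence_weights(populations, distances)
large_on_small = w[0][1]
small_on_large = w[1][0]
```

Even though the distance between the two languages is the same in both directions, the influence of the large language on the small one is far greater than the reverse. In a Bayesian model, quantities like `scale` would be parameters to estimate rather than fixed constants.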

Project-related publications and conference talks

Publications

Guzmán Naranjo, Matías and Miri Mertner. In press. Estimating areal effects in typology: A case study of African phoneme inventories. Linguistic Typology.

Conference talks

Guzmán Naranjo, Matías and Gerhard Jäger. December 2022. Euclide, the crow, the wolf and the pedestrian. ALT.
Martiny, Marvin. June 2023. Grammaticalization in Western Amazonia: a study on areal effects. Amazonicas IX.