This thread introduces ChoiceModels, a new library that is coming to the UrbanSim platform and will soon be released in the Urban Data Science Toolkit. It leverages the great work of Timothy Brathwaite and Feras El Zarwi, both completing their PhDs in Civil Engineering at the University of California, Berkeley, and of Sam Maurer, completing his PhD in City and Regional Planning at UC Berkeley, and it draws on the research of Angelo Guevara. So what is ChoiceModels about?
Historically, UrbanSim has used the Multinomial Logit Model (MNL) as its workhorse methodology to model the choices that individual households and firms make, such as choosing a residential location. The MNL is a simple model with some very attractive properties. One is that it can be estimated with a random sample of alternatives when the potential set of alternatives is very large, as in residential location choice, while still yielding consistent coefficient estimates (loosely speaking, estimates that are unbiased in large samples). It is also very computationally efficient: it is easy to program and runs very fast. Sounds great, you say? Well it is, but there is a catch. The MNL assumes that any alternative added to or dropped from the choice set competes equally with all other alternatives, drawing market share from each of them in proportion to its existing share. This is known as the Independence of Irrelevant Alternatives (IIA) assumption, and it is what makes the nice properties of MNL possible. But a lot of real-world situations do not align well with this assumption. In the context of residential choice, for example, you might anticipate that housing in the same neighborhood is a closer substitute than housing in a neighborhood at the other end of a metropolitan region. If housing is added in that neighborhood, it should compete more closely with nearby alternatives than with those farther away, which are weaker substitutes. The simple assumptions behind MNL models have other side effects too, including a failure to recognize the heterogeneity of preferences within and across groups of choosers.
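To make the IIA property concrete, here is a minimal sketch using only the Python standard library. The utilities are made-up numbers and `mnl_probabilities` is a hypothetical helper written for this post, not part of ChoiceModels. It shows that when a new alternative enters the choice set, the ratio of probabilities between any two existing alternatives does not change, and every existing alternative keeps the same fraction of its previous share.

```python
import math

def mnl_probabilities(utilities):
    """Return MNL choice probabilities for a list of systematic utilities."""
    top = max(utilities)  # subtract the max for numerical stability
    expu = [math.exp(u - top) for u in utilities]
    total = sum(expu)
    return [e / total for e in expu]

# Hypothetical utilities for three housing alternatives.
before = mnl_probabilities([1.0, 0.5, 0.0])
# A fourth alternative is added; the first three utilities are unchanged.
after = mnl_probabilities([1.0, 0.5, 0.0, 1.0])

# IIA: the odds between two existing alternatives ignore the newcomer.
ratio_before = before[0] / before[1]
ratio_after = after[0] / after[1]

# Each existing alternative retains the same fraction of its old share,
# regardless of how similar it is to the new alternative.
retained = [a / b for a, b in zip(after, before)]

print("odds of alt 0 vs alt 1:", ratio_before, ratio_after)
print("fraction of share retained:", retained)
```

In a residential setting this is the unrealistic part: the new building takes the same fraction of demand from a unit across the region as from the unit next door.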
For these reasons, researchers working on modeling people's choice behavior have increasingly turned to more sophisticated models, such as Nested Logit (NL), Mixed Logit (MXL), and Latent Class Models (LCM). These and other models relax the IIA and other assumptions, which lets the models become more behaviorally realistic. Wonderful, so why aren't we already using these in UrbanSim, you might be wondering? The answer is that until very recent research by scholars like Angelo Guevara, we did not have good theoretical and econometric evidence that a researcher could use these more advanced discrete choice models on problems with large choice sets that require sampling of alternatives, like residential location choice, and still obtain consistent (loosely speaking, unbiased in large samples) estimates of the model parameters. It turns out that this is, indeed, possible, provided you use a correction to account for the sampling process.
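As a rough illustration of what a sampling correction looks like, the sketch below uses the well-known MNL case, where the log of the probability of drawing the sampled set is added to each alternative's utility. This is only the simplest instance: the corrections for Nested, Mixed, and Latent Class models are different and are not shown here. The function name and all numbers are hypothetical, and this is not the ChoiceModels API.

```python
import math

def sampled_mnl_probabilities(utilities, log_sampling_corrections):
    """MNL probabilities over a sampled choice set D.

    utilities[j] is the systematic utility V_j of sampled alternative j.
    log_sampling_corrections[j] is ln pi(D | j): the log probability of
    drawing the set D given that j is the chosen alternative.
    """
    adjusted = [v + c for v, c in zip(utilities, log_sampling_corrections)]
    top = max(adjusted)  # subtract the max for numerical stability
    expu = [math.exp(a - top) for a in adjusted]
    total = sum(expu)
    return [e / total for e in expu]

# Hypothetical utilities for three sampled alternatives.
utilities = [0.2, -0.1, 0.4]

# No correction at all.
plain = sampled_mnl_probabilities(utilities, [0.0] * 3)

# Simple random sampling: the correction is the same for every alternative,
# so it cancels and the probabilities match the uncorrected ones.
uniform = sampled_mnl_probabilities(utilities, [math.log(0.05)] * 3)

# Unequal sampling probabilities (for example importance sampling):
# the correction no longer cancels and the probabilities shift.
uneven = sampled_mnl_probabilities(
    utilities, [math.log(0.10), math.log(0.05), math.log(0.02)]
)

print(plain)
print(uniform)
print(uneven)
```

The cancellation under simple random sampling is why plain MNL can ignore the correction in that case, while the other sampling strategies need it.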
The ChoiceModels library contains wrappers for all of these model types in a unified interface, building on Timothy Brathwaite's PyLogit and Feras El Zarwi's LCCM Python libraries. Sam Maurer has been adding support for sampling of alternatives and doing tests to verify that the sampling corrections are indeed working correctly. We should be able to support random sampling, stratified sampling, importance sampling, capacity-based sampling, and choice-based sampling, as we flesh out this library.
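For readers new to sampling of alternatives, here is a minimal sketch of the simplest strategy, random sampling, using only the Python standard library. It builds a "long" table with one row per chooser and sampled alternative, which is the usual input shape for estimating choice models. The function names, column names, and data are hypothetical and are not the ChoiceModels interface.

```python
import random

def sample_alternatives(chosen, all_alternatives, sample_size, rng):
    """Return the chosen alternative plus randomly drawn non-chosen ones."""
    others = [a for a in all_alternatives if a != chosen]
    # Draw without replacement so no alternative appears twice.
    drawn = rng.sample(others, sample_size - 1)
    return [chosen] + drawn

def build_long_table(choices, all_alternatives, sample_size, seed=0):
    """Build one row per (chooser, sampled alternative) pair."""
    rng = random.Random(seed)
    rows = []
    for chooser_id, chosen in choices.items():
        sampled = sample_alternatives(chosen, all_alternatives, sample_size, rng)
        for alt in sampled:
            rows.append(
                {"chooser_id": chooser_id, "alt_id": alt, "chosen": int(alt == chosen)}
            )
    return rows

# Hypothetical data: three households, each observed in one of 100 buildings.
choices = {"hh1": 17, "hh2": 3, "hh3": 42}
table = build_long_table(choices, list(range(100)), sample_size=5)

for row in table[:5]:
    print(row)
```

Each household ends up with 5 rows instead of 100, which is what keeps estimation tractable when the full choice set is very large.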
We are also collaborating with Angelo Guevara to incorporate corrections for endogeneity bias. This bias arises when the model omits variables that are correlated with both price and the utility (attractiveness) of alternatives, such as the quality of finishes in a building, or views, that the modeler may not have data to measure. If not corrected, endogeneity bias tends to produce price coefficients that are biased upwards. The most extreme case is a positive price coefficient, which is wrong theoretically and flies in the face of common sense: who would prefer to pay more for an alternative that they could buy for less with the same attributes?
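The small simulation below sketches how this bias appears. Everything in it is an assumption made for illustration: the true coefficients (-1.0 on price, 1.5 on quality), the sample size, the way price depends on quality, and the hand-written Newton-Raphson logit fit. It shows only the bias from omitting quality, not the correction itself.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_logit(rows, outcomes, iterations=25):
    """Fit a binary logit by Newton-Raphson; each row is a list of regressors."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for x, y in zip(rows, outcomes):
            z = sum(b * xi for b, xi in zip(beta, x))
            p = 1.0 / (1.0 + math.exp(-z))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (y - p) * x[i]
                for j in range(k):
                    hess[i][j] += w * x[i] * x[j]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

rng = random.Random(0)
rows_full, rows_omit, outcomes = [], [], []
for _ in range(5000):
    quality = rng.gauss(0, 1)                        # unobserved by the modeler
    price = 1.0 + 0.8 * quality + rng.gauss(0, 0.5)  # better units cost more
    utility = -1.0 * price + 1.5 * quality           # assumed true preferences
    chosen = 1 if rng.random() < 1.0 / (1.0 + math.exp(-utility)) else 0
    rows_full.append([1.0, price, quality])          # intercept, price, quality
    rows_omit.append([1.0, price])                   # quality left out
    outcomes.append(chosen)

beta_full = fit_logit(rows_full, outcomes)
beta_omit = fit_logit(rows_omit, outcomes)
print("price coefficient with quality observed:", beta_full[1])
print("price coefficient with quality omitted: ", beta_omit[1])
```

Because price partly stands in for the missing quality variable, the price coefficient in the second model is pulled upward, and with these assumed numbers it comes out with the wrong sign.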
So those are the goals of the ChoiceModels project: to bring these advances in discrete choice modeling to UrbanSim, and thereby make UrbanSim models more behaviorally realistic and, by addressing model biases, more accurate. We will provide updates here, so stay tuned. If you have questions about this project, this is the right place to ask them.