Yes, you might understandably be wondering: is this related to Rick and Morty?

Well, unfortunately, no. **But** you should really keep reading, because multi-dimensional topic modeling is really cool.

In this short piece, we will explore the fundamental idea behind multi-dimensional topic modeling and list some open-source implementations, so stay tuned.

## The What?

### One Dimension

The aforementioned factors are called Latent variables (hidden factors that

LDA [1] is one of the oldest and most successful topic models. It assumes we have a set of Z latent components (usually called “topics”), and that each data point (document) has a discrete distribution over these topics.

This set of latent components usually corresponds to a single latent variable: LDA tends to learn distributions that correspond to semantic topics (such as SPORTS or ECONOMICS), which dominate the choice of words in a document, rather than syntax, perspective, or other aspects of document content.
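To make the "discrete distribution over topics" concrete, here is a minimal sketch of the generative story LDA assumes, using only the Python standard library. The tiny vocabulary, the two hand-written topics, and the helper names (`sample_dirichlet`, `generate_document`) are illustrative assumptions, not part of any LDA library, and the sketch only generates data; it does not perform inference.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topic_word, vocab, doc_length, alpha, rng):
    """Generate one document following LDA's generative assumptions."""
    num_topics = len(topic_word)
    # Each document gets its own discrete distribution over the Z topics.
    theta = sample_dirichlet([alpha] * num_topics, rng)
    words = []
    for _ in range(doc_length):
        # Pick a topic for this word position, then a word from that topic.
        z = rng.choices(range(num_topics), weights=theta)[0]
        words.append(rng.choices(vocab, weights=topic_word[z])[0])
    return theta, words

# Hypothetical toy data: two topics over a six-word vocabulary.
vocab = ["goal", "match", "team", "market", "tax", "trade"]
topic_word = [
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],  # a SPORTS-like topic
    [0.0, 0.0, 0.0, 0.4, 0.3, 0.3],  # an ECONOMICS-like topic
]

rng = random.Random(0)
theta, words = generate_document(topic_word, vocab, doc_length=8, alpha=0.5, rng=rng)
print(theta)
print(words)
```

Fitting LDA to real text runs this story in reverse: given only the words, it estimates the per-document topic proportions and the per-topic word distributions.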

### Two Dimensions

Better modelling can be achieved by using more than a single set of latent components.

Imagine that instead of a one-dimensional vector of Z topics, we have a two-dimensional matrix with Z1 components along one dimension (rows) and Z2 components along the other (columns).

This structure makes sense if your data is shaped by two different factors. The two dimensions might correspond to news topic and political perspective (if we are modelling newspaper editorials), or research topic and discipline (if we are modelling scientific papers). Individual cells of the matrix would represent pairs such as (ECONOMICS, CONSERVATIVE) or (GRAMMAR, LINGUISTICS). This is the idea behind two-dimensional models like TAM [2] and SAGE.
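The matrix-of-cells idea can be sketched in a few lines of Python. This is a simplified illustration, not the actual TAM or SAGE model: the word weights, the row and column weights, and the `sample_word` helper are all made-up assumptions. Each word is generated by picking a row (topic), picking a column (perspective), and then drawing from the word distribution stored in that cell.

```python
import random

topics = ["ECONOMICS", "SPORTS"]             # Z1 = 2 rows
perspectives = ["CONSERVATIVE", "LIBERAL"]   # Z2 = 2 columns
vocab = ["tax", "cuts", "welfare", "team", "tradition", "equality"]

# cell_word[i][j] holds (unnormalized, invented) word weights
# for the pair (topics[i], perspectives[j]).
cell_word = [
    [[4, 4, 1, 0, 1, 0], [4, 1, 4, 0, 0, 1]],  # ECONOMICS row
    [[0, 0, 0, 6, 3, 1], [0, 0, 0, 6, 1, 3]],  # SPORTS row
]

def sample_word(row_weights, col_weights, rng):
    """Pick a cell (row, column), then a word from that cell's distribution."""
    i = rng.choices(range(len(topics)), weights=row_weights)[0]
    j = rng.choices(range(len(perspectives)), weights=col_weights)[0]
    word = rng.choices(vocab, weights=cell_word[i][j])[0]
    return (topics[i], perspectives[j]), word

rng = random.Random(1)
# A hypothetical editorial: mostly about economics, leaning liberal.
samples = [sample_word([0.8, 0.2], [0.3, 0.7], rng) for _ in range(5)]
for cell, word in samples:
    print(cell, word)
```

Note how the two cells in a row share most of their weight on topic words and differ mainly on perspective words, which is the kind of structure a two-dimensional model is meant to capture.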

### A Ton of Dimensions

We can expand the idea even further by assuming K factors modeled with a K-dimensional array, where each cell of the array has a pointer to a word distribution corresponding to that particular K-tuple.

For example, in addition to topic and perspective, we might want to model a third factor of the author’s gender in newspaper editorials, yielding triples such as (ECONOMICS, CONSERVATIVE, MALE).

Conceptually, each K-tuple functions as a topic in the original LDA (with an associated word distribution), except that K-tuples imply a structure: for example, the pairs (ECONOMICS, CONSERVATIVE) and (ECONOMICS, LIBERAL) are related because they share a component.
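A small sketch can show how the K-tuples are enumerated and why they carry structure. The factor names and values below are illustrative assumptions, and `neighbours` is a hypothetical helper: it lists the cells that differ from a given cell in exactly one factor, which is one simple way to express that two tuples are "related".

```python
from itertools import product

# Hypothetical factors: K = 3 dimensions with two values each.
factors = {
    "topic": ["ECONOMICS", "SPORTS"],
    "perspective": ["CONSERVATIVE", "LIBERAL"],
    "gender": ["MALE", "FEMALE"],
}

# Every K-tuple is one cell of the K-dimensional array.
cells = list(product(*factors.values()))

def shared_components(a, b):
    """Count the positions at which two K-tuples agree."""
    return sum(x == y for x, y in zip(a, b))

def neighbours(cell):
    """Cells that differ from `cell` in exactly one factor."""
    return [c for c in cells if shared_components(c, cell) == len(cell) - 1]

print(len(cells))  # 2 * 2 * 2 = 8 cells
print(neighbours(("ECONOMICS", "CONSERVATIVE", "MALE")))
```

In a flat LDA model with eight topics, nothing would tie these cells together; in a multi-dimensional model, cells that share components can share statistical strength.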

### Related algorithms

Other related approaches include the **Contrastive Opinion Summarization** task. The goal of this task is to extract sentences from positive and negative sets of opinions on a topic and generate a comparative summary containing sentence pairs that are both contrastive to each other and representative of the given sets of opinions.

The method reported in [3] models the summarization task as an optimization problem. It uses Natural Language Processing (NLP) and optimization techniques to generate a representative and comparative summary from customer reviews about a topic, product, or service.
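To give a feel for what a "contrastive pair" is, here is a toy sketch. It is not the method from [3]: it simply pairs a positive and a negative sentence that talk about the same thing, measured by Jaccard overlap of their words after dropping a small, hand-picked list of sentiment words. It also ignores the representativeness requirement entirely. The sentiment list, the example sentences, and all function names are assumptions made for illustration.

```python
# Hypothetical, hand-picked sentiment words that are ignored when comparing content.
SENTIMENT_WORDS = {"great", "good", "excellent", "poor", "bad", "terrible"}

def content_words(sentence):
    """Lowercased words of a sentence, minus the sentiment words."""
    return {w for w in sentence.lower().split() if w not in SENTIMENT_WORDS}

def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def best_contrastive_pair(positive, negative):
    """Return (score, positive sentence, negative sentence) with the highest content overlap."""
    pairs = [
        (jaccard(content_words(p), content_words(n)), p, n)
        for p in positive
        for n in negative
    ]
    return max(pairs, key=lambda pair: pair[0])

positive = ["the battery life is great", "the screen is excellent"]
negative = ["the battery life is poor", "the price is bad"]

score, pos, neg = best_contrastive_pair(positive, negative)
print(score, "|", pos, "|", neg)
```

A real system would have to trade this pairwise score off against how well the chosen sentences represent each opinion set, which is where the optimization formulation comes in.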

## The How

The following table lists some of the available implementations of the aforementioned algorithms:

| Name | Programming Language | Description |
| --- | --- | --- |
| Factorial LDA code | Java | Implementation of factorial LDA |
| ccLDA and TAM code | Java | Implementation of TAM [2] and ccLDA |
| VODUM | Java | Implementation of the Viewpoint and Opinion Discovery Unification Model [4] |
| Contrastive Summarization | Python | Implementation of the contrastive summarization model from [3] |
| SeaNMF | Python | Implementation of the SeaNMF non-negative matrix factorization model from [5] |
| STTM | Java | A library for short-text topic modeling |

## Conclusion

In this article, we explored the idea of expanding topic modeling to cover various aspects, and by now you might be thinking …

I told you, didn’t I?

But seriously, how am I going to make a million dollars from this knowledge? Well, if you want to see a cool application, check out this piece to find out how multi-dimensional topic modelling can be used to discover different political views in opinionated text.

**Did you know that we use all this and other AI technologies in our app?** See what you’re reading about applied in action. Try our Almeta News app. You can download it from Google Play or Apple’s App Store.

## Further reading

[1] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet allocation,” *J. Mach. Learn. Res.*, vol. 3, no. Jan, pp. 993–1022, 2003.

[2] M. Paul and R. Girju, “A two-dimensional topic-aspect model for discovering multi-faceted topics,” in *Twenty-Fourth AAAI Conference on Artificial Intelligence*, 2010.

[3] H. D. Kim and C. Zhai, “Generating comparative summaries of contradictory opinions in text,” in *Proceedings of the 18th ACM conference on Information and knowledge management*, 2009, pp. 385–394.

[4] T. Thonet, G. Cabanac, M. Boughanem, and K. Pinel-Sauvagnat, “Vodum: A topic model unifying viewpoint, topic and opinion discovery,” in *European Conference on Information Retrieval*, 2016, pp. 533–545.

[5] T. Shi, K. Kang, J. Choo, and C. K. Reddy, “Short-text topic modeling via non-negative matrix factorization enriched with local word-context correlations,” in *Proceedings of the 2018 World Wide Web Conference*, 2018, pp. 1105–1114.