CSCI 8970 – Colloquium Series – Fall 2010 – Third Event
Probabilistic Models for Matrix Analysis
Monday, September 27, 2010
Presenter: Arindam Banerjee
Dr. Banerjee’s presentation dealt with the new ways in which researchers are analyzing matrices, in particular the use of Bayesian network analysis methods. Earlier matrix methods could handle single-label classification and posed clustering as an optimization problem that groups similar values, yet the goal of Dr. Banerjee’s process is not simply to fit the data to the single best model but to analyze all possible straight lines. After the different linear relationships are evaluated, the lines gain or lose weight according to how much they are used. In gene expression analysis, for example, this makes it possible to find valuable information in the matrix. By placing a Dirichlet distribution over all possible mixed memberships, and describing the model with plate diagrams, the Bayesian co-clustering process permits an accurate and rapid evaluation of data.
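The mixed-membership idea can be illustrated with a small simulation. The following is only a minimal sketch and not the model as presented in the talk: it assumes Gaussian-distributed entries, symmetric Dirichlet parameters of 1.0, and randomly chosen co-cluster means, and the names `sample_dirichlet` and `generate_matrix` are hypothetical. Each row and each column receives its own membership vector drawn from a Dirichlet distribution, and every matrix entry picks a row cluster and a column cluster from those vectors before its value is drawn.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_matrix(n_rows, n_cols, k_row, k_col, seed=0):
    """Simulate a matrix from a mixed-membership co-clustering process.

    Assumptions (not from the talk): Gaussian entries with standard
    deviation 0.5, symmetric Dirichlet priors, co-cluster means in [1, 5].
    """
    rng = random.Random(seed)
    alpha_row = [1.0] * k_row  # assumed Dirichlet parameters for rows
    alpha_col = [1.0] * k_col  # assumed Dirichlet parameters for columns

    # One mean per (row cluster, column cluster) pair.
    means = [[rng.uniform(1.0, 5.0) for _ in range(k_col)]
             for _ in range(k_row)]

    # Each row and each column gets its own mixed-membership vector.
    row_membership = [sample_dirichlet(alpha_row, rng) for _ in range(n_rows)]
    col_membership = [sample_dirichlet(alpha_col, rng) for _ in range(n_cols)]

    matrix = []
    for i in range(n_rows):
        row = []
        for j in range(n_cols):
            # Every entry chooses its own row cluster and column cluster.
            z_row = rng.choices(range(k_row), weights=row_membership[i])[0]
            z_col = rng.choices(range(k_col), weights=col_membership[j])[0]
            row.append(rng.gauss(means[z_row][z_col], 0.5))
        matrix.append(row)
    return matrix, row_membership, col_membership


matrix, rows, cols = generate_matrix(4, 3, 2, 2, seed=1)
```

Because memberships are vectors rather than single labels, a row can belong partly to several clusters at once, which is what distinguishes this setup from a hard partition of the matrix.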
Through the use of variational inference, Dr. Banerjee demonstrated the results of the process when analyzing MovieLens and Foodmart data. The datasets allowed for the comparison of RBC and RBC-FF to other matrix methods, including SVD. While SVD can be more accurate, RBC obtained similar results at a faster rate. Dr. Banerjee’s team competed in the one-million-dollar Netflix challenge, and their approach obtained significant praise. The tests showed that some kinds of information were better than others at supporting taste predictions: with genre and cast data the matches were more helpful, while the use of plot data was detrimental. Other applications of this method of analysis include text classification (discrimination between topics), cluster ensembles (combining multiple clusterings of a dataset), and Bayesian kernel methods (nonlinear covariance among rows and columns).
In conclusion, unlike the other two presentations, this one was somewhat harder to follow, perhaps due to its mathematical complexity. The study was very interesting and demonstrated how important it is to constantly find different ways to look at data and to determine which evaluation methods will be most helpful. Another application that Dr. Banerjee shared with the audience was multi-label classification and its use in evaluating NASA aviation safety reports. Other applications of their matrix analysis include covariance models (nonlinear, high-dimensional, online learning and tracking covariances over rows), as well as applications in climate science, finance and health care. Apart from the staff at the University of Minnesota, the project works in collaboration with the NSA and NASA.