The KAUST Research Conference: Computational and Statistical Interface to Big Data brought together leading computer science researchers and statistical experts to discuss data science on campus from March 19 to 21. Photo by Andrea Bachofen-Echt.
-By David Murphy, KAUST News
The recent KAUST Research Conference: Computational and Statistical Interface to Big Data brought together leading computer science researchers and statistical experts to discuss the current state and future of data science. Held on campus from March 19 to 21, the conference covered such data science topics as succinct data representation and storage; big data visualization; parallel and distributed algorithms for inference and optimization; and analysis of large graphs and networks.
The event also covered spatial and temporal statistics; large-scale machine learning; differential privacy in big data; and technology and engineering for the big data generation. Visiting institutions included the American University of Sharjah, Aalborg University, the University of Chicago, Princeton University, the University of Technology Sydney and the University of Pennsylvania, among others.
The conference aimed to foster the exchange of ideas and research findings, to encourage collaboration that promotes big data research at KAUST, and to highlight how big data can benefit numerous scientific fields in the Kingdom and globally.
Professor Xuefeng Cui, Tsinghua University, and Xin Gao, KAUST associate professor of computer science and the conference chair, during the recent Computational and Statistical Interface to Big Data research conference. Photo by Andrea Bachofen-Echt.
In his welcoming remarks, Xin Gao, KAUST associate professor of computer science and the conference chair, welcomed guests to the campus and highlighted the importance of big data research in the coming years.
"This conference is very important to us to promote the Big Data conversation at KAUST and worldwide. I feel it is timely needed to create such a dialogue for computer scientists, statisticians, as well as big data generators to interact and exchange their ideas and findings. At KAUST, we want to promote collaboration and create active links internationally," Gao emphasized.
"We are now in the fourth paradigm of science—data science. The massive amount of structured and unstructured data we have now poses new challenges and opportunities to the fields of computer science and statistics," he added.
Hernando Ombao, KAUST professor of statistics, speaks during the recent Computational and Statistical Interface to Big Data research conference. Photo by Andrea Bachofen-Echt.
In his keynote address entitled "Challenges in the Analysis of High Dimensional Brain Signals," Hernando Ombao, KAUST professor of statistics, explained how advances in imaging technology have given neuroscientists unparalleled access to how the brain "works" and outlined the challenges that remain in analyzing brain data.
"There is not one way to easily characterize the health of an individual. One of the challenges for data scientists is to characterize brain activity, and in particular the connectivity between brain regions," Ombao said. "Complex structure in the data, high dimensionality and large data sets are among the other challenges."
Tony Cai, vice dean and the Dorothy Silberberg professor of statistics at the Wharton School of the University of Pennsylvania, spoke on the interplay between statistical accuracy and computational efficiency in two specific problems. Photo by Andrea Bachofen-Echt.
In the opening keynote address of the conference's second day, Tony Cai, vice dean and the Dorothy Silberberg professor of statistics at the Wharton School of the University of Pennsylvania, spoke on the interplay between statistical accuracy and computational efficiency in two specific problems. The problems Cai discussed were submatrix localization and sparse matrix detection based on a noisy observation of a large matrix.
"My talk concerns the intersection between computation and statistics. Most of the problems in machine learning center around estimation and recovery. A wide range of problems can be described with the 'signal+noise' model. In particular, the problem of sparse signal detection arises frequently in a wide range of fields," Cai stated.
"The optimal statistical performance can always be achieved by computationally efficient methods. Achieving the optimal estimation/localization rate does not present any computational difficulty," he added.
Robert Hoehndorf, an assistant professor in computer science at KAUST, discussed the role symbolic artificial intelligence plays in computational biology in his talk "Symbolic AI in Computational Biology."
"At KAUST, we work on knowledge representation and reasoning. We work on ontologies and knowledge graphs. We also work on data and knowledge integration and semantic technologies and integration—we want our knowledge to influence our decisions," Hoehndorf expressed.
"Symbols are physical entities that can encode our knowledge of the world. We use neuro-symbolic feature learning and knowledge graphs to infer information in our research," he noted.
An audience of KAUST faculty members, students, postdoctoral fellows and visiting researchers listens to a presentation during the recent on-campus Computational and Statistical Interface to Big Data research conference. Photo by Andrea Bachofen-Echt.
On the conference's third and final day, Christian Jensen, professor of computer science at Aalborg University, Denmark, delivered the event's final keynote address, expanding on his research, which focuses on data management and data-intensive systems. Jensen touched on the continued growth of society-wide digitalization and how our daily lives are increasingly being captured through the digital sphere.
"We are instrumenting reality at a rapid pace. Just look at our smartphones, for example. There is an ever-growing volume of data these days and we need new ways to capture it. To be competitive, society and businesses must be able to create value from data. Decisions based on good data beat decisions based on feelings or opinions," he said.
"There is now an unprecedented level of data worldwide and we use path weights to achieve more accurate results and data. In the future, we will have to deal with a much greater volume of data," Jensen added.
In her presentation entitled "Causal Modeling with Generative Neural Networks," Michèle Sebag, a professor from the Centre National de la Recherche Scientifique, France, warned of the perception that big data is the all-encompassing solution in data-driven science.
"We don't know the extent that Big Data can pave the way for 'Big Brother' in the future. With Big Data, we can run the risk of exporting our biases into the future. There is a sense that Big Data cures everything and can do everything," Sebag pronounced. "We want to create an AI with a common decency and we want to create an AI with no biases."
Xin Gao, KAUST associate professor of computer science and the conference chair, during the recent Computational and Statistical Interface to Big Data research conference. Photo by Andrea Bachofen-Echt.
Professor Xin Gao, conference chair, was pleased with the outcome and felt that the conference met its objectives while also opening a route for future global collaborations.
"We are grateful to all the guests for having delivered those inspiring and exciting talks. We are also proud to showcase the top-notch research done by KAUST researchers to the leading experts in the field. A number of collaborations and initiatives have been conceived among the participants of the conference, and we look forward to hearing about their successful stories soon," Gao emphasized.
The conference was co-sponsored by the KAUST Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division, and additional support was provided by the KAUST Industry Collaboration Program (KICP), Industry Partnerships Office, with financial support from the KAUST Office of Sponsored Research.