Knowledge Discovery in Massive Data

Since its commercial use began in the United States in the early 1990s, the Internet has spread rapidly around the world. It has reshaped society, and an enormous amount of data is now available online. The Internet is no longer merely a means of communication; it is now the largest database that humans have ever assembled. However, it is a truly massive collection of unstructured data, and finding the information one needs is not trivial.

In recent years, the amount of data produced by sensors has also been increasing. This includes, for example, GPS data from cars and cellphones, log data from transportation systems, and lifelog data from wearable devices. Our laboratory researches software techniques (information retrieval and knowledge discovery) for obtaining useful information and knowledge from such massive data. More concretely, we research topics such as the following:

  • Techniques for information retrieval and string processing needed to retrieve data related to a search request (e.g., a keyword).
  • Techniques for efficiently indexing large amounts of data for fast searching and processing.
  • Compression techniques to allow efficient storage, transmission, and reuse of large data.
  • Data mining techniques to extract useful rules from massive amounts of data.
  • Artificial intelligence techniques that automatically learn from known data to classify unknown data and make predictions regarding the future.
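
As a small illustration of the first two topics (keyword retrieval and indexing), the following is a minimal sketch of an inverted index, a standard structure for keyword search. It is not taken from the laboratory's own work: the function names `build_inverted_index` and `search`, the whitespace tokenization, and the three sample documents are all assumptions made for this example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        # Assumption: words are separated by whitespace; no stemming or punctuation handling.
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    # .get avoids inserting unknown words into the defaultdict.
    postings = [index.get(word, set()) for word in words]
    return sorted(set.intersection(*postings))


# Hypothetical sample documents, used only for illustration.
docs = [
    "GPS data from cars",
    "log data from transportation systems",
    "life log data from wearable devices",
]
index = build_inverted_index(docs)
print(search(index, "log data"))
```

Building the index takes one pass over the data; afterwards a query only touches the posting sets of its own words instead of scanning every document, which is the reason indexing matters for fast search over large collections.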

Our research on these topics includes theoretical work on the underlying algorithms, complexity theory, and machine learning theory. We also research applications to multimedia (sound, images, video), viruses, human genetics, and medicine, including diagnosis and pharmaceuticals.


Hiroki Arimura
Specialized fields
Algorithm design and analysis, artificial intelligence, information retrieval, data mining