Visualisation Methods of Hierarchical Biological Data: A Survey and Review

  • Irina Kuznetsova
  • Artur Lugmayr
  • Andreas Holzinger

Abstract

The sheer volume of high-dimensional biomedical data requires machine learning and advanced data visualization techniques to make the data understandable to human experts. Most biomedical data today resides in arbitrarily high-dimensional spaces and is not directly accessible to the human expert for visual and interactive analysis. To cope with this challenge, machine learning and knowledge extraction methods are indispensable throughout the entire data analysis workflow. Nevertheless, human experts still need to understand and interpret the data and the experimental results. Such understanding is typically supported by visualizing the results adequately, which is not a simple task. Consequently, data visualization is one of the most crucial steps in conveying biomedical results, and it should be treated as a critical part of the analysis pipeline. Even today, 2D representations dominate, and human perception limits us to such low-dimensional views when interpreting data. This makes visualizing results in an understandable and comprehensive manner a grand challenge.


This paper reviews the current state of visualization methods in a biomedical context. It focuses on hierarchical biological data as a source for visualization, and gives a comprehensive

Published
2018-01-11
How to Cite
KUZNETSOVA, Irina; LUGMAYR, Artur; HOLZINGER, Andreas. Visualisation Methods of Hierarchical Biological Data: A Survey and Review. International SERIES on Information Systems and Management in Creative eMedia (CreMedia), [S.l.], n. 2017/2, p. 32-39, jan. 2018. ISSN 2341-5576. Available at: <http://www.ambientmediaassociation.org/Journal/index.php/series/article/view/283>. Date accessed: 16 july 2018.