Dynamic model generation and semantic search for open source projects using big data analytics, Sravani Punyamurthula

Label
Dynamic model generation and semantic search for open source projects using big data analytics
Title
Dynamic model generation and semantic search for open source projects using big data analytics
Statement of responsibility
Sravani Punyamurthula
Creator
Contributor
Author
Degree supervisor
Subject
Genre
Language
eng
Summary
Open source software is ubiquitous and caters to most common software needs developers come across. Many open source projects are considered better than their commercial equivalents because a larger pool of developers constantly improves them. However, one of the challenges of using open source software is manually analyzing the code and understanding its dependencies, which is very time consuming, especially for larger projects. Hence, there is strong demand for an automated process that can analyze the code and build an accurate model of the open source software system. The objective of this thesis is to address this problem by building a framework that extracts features, identifies components and connectors in an open source project, and gives the user a way to search for functionality. The first step of this process is to extract the metadata and dependency information from the source code using a call graph. A call graph is a directed graph that represents the execution logic of the program and helps with analyzing the relationships between classes. The extracted data is then transformed using natural language processing (NLP) [15] techniques such as lemmatization. In the second step, the transformed data is semantically analyzed: features are extracted using Term Frequency-Inverse Document Frequency (TF-IDF), synonyms are detected using Word2Vec [3], and components are detected dynamically using machine learning. The dependency information extracted from the call graph is then used to identify the connectors between the detected components. The dependency information is also used to build a class dependency matrix, which is in turn used to identify dependency-based components. In the final step, an ontology is used to represent the features, components, connectors, and classes discovered in the previous step and the relationships between them.
The generated ontology can be queried to search for functionality using the SPARQL [5] query language. Protégé [4] is used to visualize the generated ontology. The proposed solution is built on Spark, a parallel processing framework, and provides a fully automated and scalable model for representing the software. In this thesis, we analyze two open source projects, Apache Solr and Apache Lucene, as a case study. Apache Solr is built on the Apache Lucene core library. The results of the Apache Solr analysis are compared with a manual evaluation of the software architecture by experts. We observed that 90% of the features identified in the manual analysis are recovered by the automated approach, and many new features are also discovered. This thesis also analyzes the dependencies between the components detected for the Apache Solr and Apache Lucene projects. From this analysis of the two systems, we observed that Apache Solr is highly dependent on Apache Lucene.
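As a rough illustration of two steps named in the summary, the sketch below scores terms per class with TF-IDF and builds a class dependency matrix from call-graph edges. It is not the thesis implementation, which runs on Spark and is not reproduced in this record. The function names (`tf_idf`, `dependency_matrix`), the sample class names, and the specific weighting (term count divided by document length, times `log(N / df)` with no smoothing) are assumptions made for the example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Score terms per class.

    docs: dict mapping a class name to its list of (already lemmatized) tokens.
    Assumed weighting: (count / doc length) * log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of classes whose token list contains the term.
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    scores = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores[name] = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return scores


def dependency_matrix(call_edges):
    """Build a class dependency matrix from call-graph edges.

    call_edges: iterable of (caller_class, callee_class) pairs.
    Returns the sorted class list and a square matrix where entry [i][j]
    counts calls from class i to class j (self-calls are ignored).
    """
    call_edges = list(call_edges)
    classes = sorted({c for edge in call_edges for c in edge})
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for caller, callee in call_edges:
        if caller != callee:
            matrix[index[caller]][index[callee]] += 1
    return classes, matrix


if __name__ == "__main__":
    # Hypothetical tokens and call edges, not data from the thesis.
    docs = {
        "QueryParser": ["query", "parse", "query"],
        "IndexWriter": ["index", "write", "query"],
    }
    print(tf_idf(docs))
    print(dependency_matrix([("QueryParser", "IndexWriter")]))
```

A term that appears in every class, such as `query` in the sample data, scores zero under this weighting, which is why TF-IDF tends to surface terms that distinguish one class from another.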
Cataloging source
UMK
http://library.link/vocab/creatorName
Punyamurthula, Sravani
Degree
M.S.
Dissertation note
School of Computing and Engineering.
Dissertation year
2015.
Granting institution
University of Missouri-Kansas City,
Illustrations
illustrations
Index
no index present
Literary form
non fiction
Nature of contents
  • dictionaries
  • bibliography
  • theses
http://library.link/vocab/relatedWorkOrContributorDate
1960-
http://library.link/vocab/relatedWorkOrContributorName
  • Lee, Yugyung
  • Zheng, Yongjie
http://library.link/vocab/subjectName
  • Open source software
  • Big data
  • Machine learning
  • Software architecture
Label
Dynamic model generation and semantic search for open source projects using big data analytics, Sravani Punyamurthula
Instantiates
Publication
Note
  • "A thesis in Computer Science."
  • Advisors: Yugyung Lee and Yongjie Zheng
  • Vita
Antecedent source
not applicable
Bibliography note
Includes bibliographical references (pages 86-87)
Carrier category
online resource
Carrier category code
cr
Carrier MARC source
rdacarrier
Color
black and white
Content category
text
Content type code
txt
Content type MARC source
rdacontent
Contents
Introduction -- Background and related work -- Proposed framework -- Results and evaluation -- Conclusion and future work
Control code
945455733
Dimensions
unknown
Extent
1 online resource (88 pages)
File format
one file format
Form of item
online
Level of compression
mixed
Media category
computer
Media MARC source
rdamedia
Media type code
c
Other physical details
illustrations.
Quality assurance targets
not applicable
Specific material designation
remote
System control number
(OCoLC)945455733
System details
  • The full text of the thesis is available as an Adobe Acrobat .pdf file; Adobe Acrobat Reader required to view the file
  • Mode of access: World Wide Web

Library Locations

  • St. Louis Mercantile Library
    1 University Blvd, St. Louis, MO, 63121, US
    38.710138 -90.311107
  • Thomas Jefferson Library
    1 University Blvd, St. Louis, MO, 63121, US
    38.710138 -90.311107
  • University Archives
    703 Lewis Hall, Columbia, MO, 65211, US
  • University of Missouri-St. Louis Libraries Depository
    2908 Lemone Blvd, Columbia, MO, 65201, US
    38.919360 -92.291620
  • Ward E Barnes Education Library
    8001 Natural Bridge Rd, St. Louis, MO, 63121, US
    38.707079 -90.311355