Feature-based analysis for open source using big data analytics, Malathy Krishnan

Label
Feature-based analysis for open source using big data analytics
Title
Feature-based analysis for open source using big data analytics
Statement of responsibility
Malathy Krishnan
Creator
Contributor
Author
Degree supervisor
Subject
Genre
Language
eng
Summary
The open source code base has grown enormously, and understanding the functionality of open source projects has therefore become extremely difficult. Existing approaches to feature discovery, which aim to identify functionality, are typically semi-automatic and often require human intervention. In this thesis, an innovative framework is proposed that uses machine learning to automatically and dynamically discover the features of any open source project, together with the components that implement them. The overall goal of the approach is an automated and scalable model that produces accurate results. The first step extracts the metadata and performs pre-processing. The next step dynamically discovers topics using Latent Dirichlet Allocation (LDA) and forms components optimally using K-Means. The final step discovers the features implemented in the components using the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm. The framework is implemented in Spark, a fast, parallel processing engine for big data analytics. The ArchStudio tool is used to visualize the feature-to-class mapping. As case studies, Apache Solr and Apache Hadoop HDFS are used to illustrate the automatic discovery of components and features. We demonstrated the scalability and accuracy of the proposed model, using a manual evaluation by software architecture experts as the baseline. Compared with the manual evaluation of Apache Solr, the accuracy is 85%. In addition, the automated framework discovered many new features in both case studies.
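To make the final TF-IDF step more concrete, here is a minimal plain-Python sketch. It is not the thesis implementation, which runs on Spark. It assumes, hypothetically, that each component is represented by the class names it contains, split into lowercase terms. The functions `split_identifier`, `tf_idf`, and `top_terms` and the sample components are illustrative only. The sketch uses one common TF-IDF variant: tf = count / component length and idf = log(N / df), where N is the number of components. Terms that occur in every component, such as `config` below, score zero. The top-ranked terms are therefore the ones that distinguish a component from the others, which makes them candidate feature labels.

```python
import math
import re
from collections import Counter


def split_identifier(name):
    """Split a class name such as 'HDFSClient' or 'QueryParser' into lowercase terms."""
    # Separate an acronym from a following capitalized word: "HDFSClient" -> "HDFS Client".
    name = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1 \2", name)
    # Separate a lowercase letter or digit from a following capital: "QueryParser" -> "Query Parser".
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)
    # Keep alphabetic runs only; digits and punctuation are dropped.
    return re.findall(r"[a-z]+", name.lower())


def tf_idf(components):
    """Score each term of each component as (count / component length) * log(N / df)."""
    n_components = len(components)
    # Bag of terms per component, built from the (hypothetical) class names it contains.
    bags = {
        comp: [term for cls in classes for term in split_identifier(cls)]
        for comp, classes in components.items()
    }
    # Document frequency: the number of components in which each term occurs.
    df = Counter()
    for terms in bags.values():
        df.update(set(terms))
    scores = {}
    for comp, terms in bags.items():
        counts = Counter(terms)
        total = len(terms)
        scores[comp] = {
            term: (count / total) * math.log(n_components / df[term])
            for term, count in counts.items()
        }
    return scores


def top_terms(scores, k):
    """Return the k highest-scoring terms per component, breaking ties alphabetically."""
    return {
        comp: [t for t, _ in sorted(s.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
        for comp, s in scores.items()
    }


if __name__ == "__main__":
    # Hypothetical components; the class names are illustrative, not taken from Solr or HDFS.
    components = {
        "search": ["QueryParser", "QueryComponent", "SearchHandler", "ConfigUtils"],
        "storage": ["BlockReader", "BlockSender", "DataNode", "ConfigUtils"],
    }
    scores = tf_idf(components)
    print(top_terms(scores, 2))
    # prints {'search': ['query', 'component'], 'storage': ['block', 'data']}
```

A Spark version would distribute the same counting across the cluster. Spark MLlib's `IDF` uses a smoothed formula, log((N + 1) / (df + 1)), so its scores would differ slightly from those in this sketch.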
Cataloging source
UMK
http://library.link/vocab/creatorName
Krishnan, Malathy
Degree
M.S.
Dissertation note
School of Computing and Engineering.
Dissertation year
2015.
Granting institution
University of Missouri-Kansas City,
Illustrations
illustrations
Index
no index present
Literary form
non fiction
Nature of contents
  • dictionaries
  • bibliography
  • theses
http://library.link/vocab/relatedWorkOrContributorDate
1960-
http://library.link/vocab/relatedWorkOrContributorName
  • Lee, Yugyung
  • Zheng, Yongjie
http://library.link/vocab/subjectName
  • Machine learning
  • Big data
  • Open source software
Label
Feature-based analysis for open source using big data analytics, Malathy Krishnan
Instantiates
Publication
Note
  • "A thesis in Computer Science."
  • Advisors: Yugyung Lee and Yongjie Zheng
  • Vita
Antecedent source
not applicable
Bibliography note
Includes bibliographical references (pages 75-76)
Carrier category
online resource
Carrier category code
cr
Carrier MARC source
rdacarrier
Color
black and white
Content category
text
Content type code
txt
Content type MARC source
rdacontent
Contents
Introduction -- Background and related work -- Framework of feature-based analysis -- Component identification and feature discovery -- Implementation -- Results and evaluation -- Conclusion and future work
Control code
945436996
Dimensions
unknown
Extent
1 online resource (78 pages)
File format
one file format
Form of item
online
Level of compression
mixed
Media category
computer
Media MARC source
rdamedia
Media type code
c
Other physical details
illustrations.
Quality assurance targets
not applicable
Specific material designation
remote
System control number
(OCoLC)945436996
System details
  • The full text of the thesis is available as an Adobe Acrobat .pdf file; Adobe Acrobat Reader required to view the file
  • Mode of access: World Wide Web

Library Locations

  • St. Louis Mercantile Library
    1 University Blvd, St. Louis, MO, 63121, US
    38.710138 -90.311107
  • Thomas Jefferson Library
    1 University Blvd, St. Louis, MO, 63121, US
    38.710138 -90.311107
  • University Archives
    703 Lewis Hall, Columbia, MO, 65211, US
  • University of Missouri-St. Louis Libraries Depository
    2908 Lemone Blvd, Columbia, MO, 65201, US
    38.919360 -92.291620
  • Ward E Barnes Education Library
    8001 Natural Bridge Rd, St. Louis, MO, 63121, US
    38.707079 -90.311355