Monday, August 30, 2021

Big Data: Challenges and Solutions

Big Data: A very large amount of data, information, or related statistics collected by big organizations. Because such data is too large to evaluate manually, dedicated software and data storage systems have been developed to handle it. Big Data is used to find patterns and trends that support decisions concerning human behavior and interactive technology.

Applications of Big Data

1. Banking and Financial Services

Credit card companies, retail banks, private wealth management services, insurance companies, and institutional investment houses all use big data analysis for their financial services. The problem they share is a massive amount of multi-structured data stored in multiple systems, which big data techniques can process quickly enough to support decisions. Big data is used in many ways, such as:

• Customer analytics

• Compliance analytics

• Fraud analytics

• Operational analytics

2. Big Data in Telecommunications

Gaining new subscribers, retaining existing customers, and expanding within the current customer base are top priorities for telecommunications companies. The solution to these challenges lies in the ability to collate and analyze the customer-generated and/or machine-generated data that is created every day.

3. Big Data for Retail Marketing

Whether a company is an online retailer or an offline business, it wants to understand customer demand and how customer needs change. Meeting this need requires analyzing all the different data sources (data marts) that companies deal with every day, including customer transaction data, weblogs, social media, credit card data, and reward/coupon program data.

Big Data Challenges and Solutions

1. Lack of understanding of Big Data

Many organizations fail in their Big Data initiatives due to a lack of understanding. Employees may not know what data is, how it is stored, what operations can be performed on it, why it matters, or where it comes from. Data professionals may know what needs to be done, but others may not have a clear view.

For example, if an employee does not understand the significance of data storage, they may not keep a backup of confidential or sensitive data, or they may not use database systems properly for storage. As a result, when this data is needed, it cannot be retrieved easily.

Solution:

Big Data workshops and hands-on practice sessions must be conducted for everyone. Basic training programs must be conducted for all employees who handle data daily or as part of Big Data projects. Every organization must instill a basic understanding of Big Data concepts.

2. Data growth issues

One of the most complex challenges of Big Data is storing all this voluminous data properly. The amount of data stored in companies' data marts and databases is growing rapidly.

As this data keeps growing, it will become increasingly difficult to handle. Much of it is unstructured and comes from documents, audio, video, text files, and other sources, which means it cannot easily be searched with conventional database queries.

Solution:

To maintain these large data sets, companies are turning to current techniques such as compression, tiering (level-wise storage), and de-duplication. Compression reduces redundancy in the data, shrinking its overall size to some extent without changing its meaning. De-duplication is the process of eliminating duplicate and unwanted data from a dataset. Data tiering allows companies to store data in different storage tiers so that each piece of data resides in the most appropriate storage space. Data tiers can be private cloud, public cloud, and flash storage, depending on the data size and significance.
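The three techniques can be sketched with Python's standard library. This is a minimal illustration, not a production storage system: `zlib` stands in for whatever compression a storage platform actually uses, de-duplication keys each record on its SHA-256 digest, and the `choose_tier` rule with its two flags is a hypothetical policy invented for this example.

```python
import hashlib
import zlib


def compress(data: bytes) -> bytes:
    """Lossless compression: repeated byte patterns are encoded more compactly."""
    return zlib.compress(data, level=9)


def decompress(blob: bytes) -> bytes:
    """Restore the exact original bytes, so the meaning of the data is unchanged."""
    return zlib.decompress(blob)


def deduplicate(records):
    """Keep only the first copy of each record, identified by its SHA-256 digest."""
    seen = set()
    unique = []
    for record in records:
        digest = hashlib.sha256(record).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(record)
    return unique


def choose_tier(frequently_accessed: bool, sensitive: bool) -> str:
    """Hypothetical tiering rule; real policies are defined per organization."""
    if frequently_accessed:
        return "flash storage"  # hot data goes on the fastest tier
    if sensitive:
        return "private cloud"  # significant or confidential data stays in-house
    return "public cloud"  # everything else goes to cheaper, elastic storage


if __name__ == "__main__":
    records = [b"order-1,alice,20", b"order-2,bob,35", b"order-1,alice,20"]
    unique = deduplicate(records)
    print(len(records), "->", len(unique), "records after de-duplication")

    raw = b"\n".join(unique) * 100  # highly repetitive data compresses well
    packed = compress(raw)
    assert decompress(packed) == raw  # round trip is lossless
    print(len(raw), "->", len(packed), "bytes after compression")

    print(choose_tier(frequently_accessed=False, sensitive=True))
```

Hashing records instead of comparing them byte by byte keeps the "seen" set small even when individual records are large; real systems often de-duplicate at the block level rather than per record.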

3. Confusion in selecting a Big Data tool

Companies sometimes get confused when selecting the best tool for Big Data analysis and storage. Many questions arise, such as:

Is HBase or Cassandra the best technology for storage?

Is Hadoop MapReduce good enough, or would Spark be a better choice for data analytics?

These questions trouble companies, and often they cannot find the answers. They end up making poor decisions and selecting an unsuitable technology, wasting money, time, and effort.

Solution:

The best approach is to seek professional assistance. You can either hire experienced Big Data professionals who know the tools well, or engage a Big Data consultancy for proper advice. Consultants will recommend the most suitable tools based on the company's scenario. Based on their advice, you can build a strategy and then select the best tool for the company.

4. Lack of data professionals

To use these new technologies and Big Data tools, companies need skilled data professionals. These include data scientists, data analysts, and data engineers who are experienced in working with data handling tools and making sense of voluminous data sets. Companies currently face a shortage of Big Data professionals. This is because data handling tools have evolved rapidly, but in many cases the skills of data professionals have not kept pace.

Solution:

Companies are investing more and more money in hiring skilled professionals. They also have to offer free training programs to existing staff to get the most out of them.

Another significant step is to purchase data analytics software/tools powered by artificial intelligence and/or machine learning. These tools can be used by professionals who are not data science experts but have basic knowledge.

5. Securing the data

Securing huge volumes of data is one of the most challenging tasks in Big Data. Many big companies are so busy collecting, understanding, storing, and analyzing data that they leave data security for later stages. This is not a good move, as unprotected data repositories can become breeding grounds for hackers. Companies can lose both their data and their revenue.

Solution:

Companies should recruit cyber-security professionals to protect the data. Other steps for securing data include:

• Data encryption

• Data segregation

• Identity and access control

• Implementation of endpoint security

• Real-time security monitoring

• Big Data security tools
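Of the measures above, identity and access control is the easiest to illustrate in a few lines. The Python sketch below is a minimal role-based check under stated assumptions: the roles, permission strings, and the `User`, `is_allowed`, and `read_dataset` names are hypothetical, and a real deployment would rely on an identity provider and dedicated security tools rather than an in-memory dictionary. It denies access by default and records every attempt in an audit log, the kind of event stream that real-time security monitoring would consume.

```python
from dataclasses import dataclass, field

# Hypothetical role-to-permission mapping; real systems load this from an identity provider.
ROLE_PERMISSIONS = {
    "analyst": {"read:sales"},
    "engineer": {"read:sales", "write:sales"},
    "admin": {"read:sales", "write:sales", "read:customers_pii"},
}


@dataclass
class User:
    name: str
    roles: set = field(default_factory=set)


def is_allowed(user: User, permission: str) -> bool:
    """Deny by default: grant only if one of the user's roles carries the permission."""
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in user.roles)


def read_dataset(user: User, dataset: str, audit_log: list) -> str:
    """Check access, log the attempt, and return placeholder contents if allowed."""
    permission = f"read:{dataset}"
    allowed = is_allowed(user, permission)
    audit_log.append((user.name, permission, allowed))  # every attempt is recorded
    if not allowed:
        raise PermissionError(f"{user.name} may not {permission}")
    return f"contents of {dataset}"  # stand-in for a real data fetch


if __name__ == "__main__":
    log = []
    analyst = User("asha", {"analyst"})
    print(read_dataset(analyst, "sales", log))
    try:
        read_dataset(analyst, "customers_pii", log)
    except PermissionError as err:
        print("denied:", err)
    print(log)
```

Keeping sensitive datasets behind separate permissions (here `read:customers_pii`) is also a simple form of data segregation.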

6. Integrating data from various sources

Data in a company comes from a variety of sources or data marts, such as social media pages, ERP applications, MIS applications, customer logs, financial reports, e-mails, presentations, and data reports created by employees. Combining all these types of data into a single report is a challenging task, and it is an area firms often neglect. However, data integration is important for analysis, reporting, and business intelligence, so it has to be worked out.

Solution:

Companies have to resolve data integration problems by buying the right data handling tools. A few of them are listed below:

• Talend Data Integration

• Centerprise Data Integrator

• ArcESB

• IBM InfoSphere

• Xplenty

• Informatica PowerCenter

• CloverDX

• Microsoft SQL

• QlikView

• Oracle Data Service Integrator
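The core job these tools automate is joining records from different sources on a shared key and producing one consolidated view. The Python sketch below shows that step at toy scale; the three sources (a CRM export as CSV, ERP order records, and social media mention counts), the `customer_id` key, and the `integrate` function are all hypothetical and are not taken from any of the tools listed.

```python
import csv
import io

# Hypothetical extracts from three sources, keyed on a shared customer_id.
CRM_CSV = """customer_id,name
C1,Alice
C2,Bob
"""
ERP_ORDERS = [
    {"customer_id": "C1", "amount": 120.0},
    {"customer_id": "C1", "amount": 30.0},
    {"customer_id": "C2", "amount": 75.5},
]
SOCIAL_MENTIONS = {"C1": 4, "C3": 2}  # C3 is unknown to the CRM


def _blank_entry():
    return {"name": None, "total_spent": 0.0, "mentions": 0}


def integrate(crm_csv, orders, mentions):
    """Merge all sources into one report row per customer_id."""
    report = {}
    for row in csv.DictReader(io.StringIO(crm_csv)):
        entry = report.setdefault(row["customer_id"], _blank_entry())
        entry["name"] = row["name"]
    for order in orders:
        entry = report.setdefault(order["customer_id"], _blank_entry())
        entry["total_spent"] += order["amount"]
    for customer_id, count in mentions.items():
        # Customers missing from the CRM still appear, with name None, so gaps are visible.
        entry = report.setdefault(customer_id, _blank_entry())
        entry["mentions"] += count
    return report


if __name__ == "__main__":
    for customer_id, row in sorted(integrate(CRM_CSV, ERP_ORDERS, SOCIAL_MENTIONS).items()):
        print(customer_id, row)
```

Keeping unmatched keys (such as `C3`) in the report, rather than dropping them, makes data-quality gaps between sources visible, which is one reason integration is worth doing deliberately.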

Types of open access publishing

There are two types of open access:

a. Gold open access: As per Elsevier

• An author can publish in an open access journal or in a hybrid journal that supports open access.

• The published article is freely available to everyone, and the author can choose a commercial or non-commercial Creative Commons license.

• If there is a fee, the publishing cost is covered by the author, the institution, or a funding agency.

• Open access journals can be found in the Directory of Open Access Journals (DOAJ) (https://doaj.org).

• At the time of writing, DOAJ lists 16,665 journals in 80 languages from 126 countries, containing 6,376,560 articles.

b. Green open access: As per Elsevier

• The article is published under a subscription model, and the author can self-archive it and link to it for sharing, as per the publisher's policy.

• Accepted manuscripts should carry a Creative Commons user license, so the author retains the right to use the article further.

• The author pays no publication fee, as the cost is covered by the subscription.

• The self-archiving policies of publishers can be found in the SHERPA/RoMEO database.

• SHERPA/RoMEO is an open access project that supports free self-archiving.

• It supports sharing research information freely in the form of articles/papers.

• Self-archiving in an institutional repository is also known as "Green Open Access".

• The SHERPA project was started in 2002 by JISC and CURL.

• RoMEO covers publishers' copyright and archiving policies.

• This project is backed by UNESCO.

Open Access Publishing

1. Many publishing houses own the rights to the articles published in their journals and charge a fee to anyone who wants to read them.

2. Therefore, the idea of open access publishing was introduced, under which authors can publish and readers can access articles freely.

3. In research, open access is a set of rules and regulations by which a research scholar can access resources in the relevant field for further research.

4. Open access publishes research articles for free and gives other researchers free access for further study.

5. The main principle of open access publishing is to make peer-reviewed research literature freely available and to remove barriers to copying and reuse through open access licensing.

6. Open access publications are applicable to all types of publications like peer-reviewed journals, research papers, theses, books, etc.

7. Open access publishing is backed by UNESCO to provide free and unrestricted online resources for research purposes.

8. As per UNESCO, "the content is universally and freely accessible, at no cost to the reader, via the Internet or otherwise." 

9. As per UNESCO, the author or copyright owner irrevocably grants to all users, for an unlimited period, "the right to use, copy, or distribute the article, on condition that proper attribution is given."

10. As per UNESCO, "it is deposited, immediately, in full and in a suitable electronic form, in at least one widely and internationally recognized open access repository committed to open access."

11. All digital content, such as audio, video, text, graphs, images, software, data, executable code, articles, and theses, is free for the author to publish and free for anyone to access and use ethically.

12. Advantages of open access publications are:

• Free for all authors and readers who want to use it ethically

• Increased access to resources

• Free access to the researchers in all the countries

13. Disadvantages of open access publications are:

• Long-term sustainability concerns for publishing houses

• Lower quality of written material as quantity increases

• Without fee income, many publishing houses may shut down for lack of funds

Turnitin similarity check

1. Turnitin is online software delivered as SaaS (Software as a Service) that checks content similarity to detect plagiarism.

2. Turnitin is not open source, so institutions and universities have to pay to use it.

3. Turnitin was developed by John Barrie and Christian Storm and launched in 1997 for the education industry.

4. Turnitin is a privately owned company that provides online similarity-checking services on a paid basis.

5. It checks content for similarity and produces a similarity report, also known as a plagiarism report.

6. Research scholars and scientists can check their similarity score on Turnitin by submitting their content.

7. In Turnitin, the similarity score is shown in the report as the percentage of the submitted content that matches other sources.

8. A high similarity score does not always mean that the content is plagiarized, because some sentences use common, general phrasing that many authors write the same way.

9. Turnitin makes the report easier to interpret through color coding, various filters, and the list of matched sources for comparison.

10. It compares content against its database repositories, which include material from various industries and academic institutions.

11. It helps students and researchers use paraphrasing, citation, quotation, and sources appropriately.

12. It protects institutions and researchers from research misconduct such as plagiarism, salami slicing, and contract cheating, and helps them follow research ethics when writing theses and/or research articles.

13. A similarity score of 10% or below is generally considered acceptable for submitting research papers and Ph.D. theses.

14. After checking the similarity score, the researcher has to remove the uploaded file from Turnitin, so that the stored copy does not match against the revised version, and then make the required changes to bring the similarity score down to 10% or below.
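Turnitin's matching algorithm is proprietary, so the following is not how it works internally. It is a minimal Python sketch, assuming a simple word-trigram overlap measure, of how a similarity percentage like the one in point 7 can be computed: the share of the submission's three-word sequences that also appear in any source. The function names and the choice of n = 3 are illustrative assumptions. The example also shows why point 8 holds: a shared common phrase produces a match even without copying.

```python
import re


def word_ngrams(text: str, n: int = 3):
    """Lower-case the text, split it into words, and return consecutive n-word tuples."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def similarity_percent(submission: str, sources, n: int = 3) -> float:
    """Percentage of the submission's word n-grams that also occur in any source."""
    grams = word_ngrams(submission, n)
    if not grams:
        return 0.0  # too short to compare
    source_grams = set()
    for source in sources:
        source_grams.update(word_ngrams(source, n))
    matched = sum(1 for gram in grams if gram in source_grams)
    return 100.0 * matched / len(grams)


if __name__ == "__main__":
    score = similarity_percent(
        "big data needs careful storage planning",
        ["careful storage planning matters"],
    )
    print(f"similarity: {score:.1f}%")  # prints "similarity: 25.0%"
    print("within 10% limit" if score <= 10.0 else "revise and re-check")
```

Here one of the four trigrams ("careful storage planning") matches, giving 25%, even though the shared phrase is ordinary wording rather than copied text.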
