Showing posts with label Data security. Show all posts

Monday, August 30, 2021

Data Science: Challenges and solutions


Data Science: Data Science is a domain that involves working with huge amounts of data and using it to develop predictive, descriptive and prescriptive models for analysis. It covers digging information out of data: capturing the data (creating the model), analyzing it (validating the analytical model) and utilizing it (implementing the best model). It is a blend of Computer Science, Business and Statistics, and sits at the intersection of data and computation.

Applications of Data Science

1. Search on Internet

Search engines use various data science algorithms to return the best results for a search query within seconds.

2. Advertisement on digital platform

Digital marketing uses data science methods everywhere, from display banners to digital hoardings. This is a significant reason why digital advertising platforms have higher click-through rates than traditional advertising platforms.

3. Systems for recommendation

Recommendation systems not only make it easy to find related products among millions of available products, but also improve the user experience. Many companies use such systems to promote products and make suggestions in line with the customer's demand and related information. The recommendations are based on the user's previous demand.

Data Science Challenges and solutions:

1. Identifying the problem

One of the major tasks in analyzing a problem and designing a solution is to define the problem properly and state each aspect of it. Data scientists often take an ad hoc approach and start working with data and tools without a clear understanding of the business problem or the client's demand.

Solution:

There should be a well-defined workflow in place before the actual data analysis starts. The first step is to identify the problem, the next is to design a solution, and the last is to analyze the results.

2. Access to the right data

For correct analysis, it is important to get hold of the right type of data. Acquiring access to data in the most appropriate form is a difficult and time-consuming task. The issues can range from concealed data to an insufficient volume or variety of data. Data may also be spread unevenly across different lines of business, so getting access to it can be a challenge as well.

Solution:

Data scientists have to be proficient with data management systems and data integration tools, such as stream analytics software, which help with filtering and classifying data. Many data integration tools also allow connections to external data sources and their seamless inclusion in the workflow.

3. Data Cleansing

Working with data that is full of inconsistencies and anomalies is every data scientist's nightmare. Dirty or invalid data leads to vague results. Data scientists work with terabytes to exabytes of data, and they often have to spend most of their time just cleaning it before the analysis can start.

Solution:

Data scientists should use data governance tools to ensure the overall accuracy, consistency and formatting of data. Maintaining data quality should be the main aim, because business operations across the enterprise benefit from good-quality data. Departments should also employ dedicated data quality managers.
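
As a rough illustration of what accuracy, consistency and formatting checks can look like in code, here is a minimal sketch in plain Python. The records, the field names and the three rules (unique id, well-formed email, plausible age) are assumptions made up for this example, not part of any particular governance tool.

```python
import re

# Hypothetical raw records; the field names and rules below are illustrative assumptions.
RAW_RECORDS = [
    {"id": "1", "email": " Alice@Example.com ", "age": "34"},
    {"id": "2", "email": "bob[at]example.com", "age": "29"},
    {"id": "3", "email": "carol@example.com", "age": "-5"},
    {"id": "1", "email": "alice@example.com", "age": "34"},  # repeats id 1
]

# Deliberately simple pattern: something@something.something, no spaces.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def clean(records):
    """Normalise formatting, then split rows into (valid, rejected)."""
    valid, rejected, seen_ids = [], [], set()
    for record in records:
        # Formatting: trim whitespace and lower-case the email.
        row = {
            "id": record["id"].strip(),
            "email": record["email"].strip().lower(),
            "age": record["age"].strip(),
        }
        problems = []
        # Consistency: an id may appear only once among accepted rows.
        if row["id"] in seen_ids:
            problems.append("duplicate id")
        # Accuracy: the email must look like an address, the age must be plausible.
        if not EMAIL_PATTERN.match(row["email"]):
            problems.append("malformed email")
        if not row["age"].isdigit() or not 0 < int(row["age"]) < 130:
            problems.append("age out of range")
        if problems:
            rejected.append((row, problems))
        else:
            seen_ids.add(row["id"])
            row["age"] = int(row["age"])
            valid.append(row)
    return valid, rejected


valid_rows, rejected_rows = clean(RAW_RECORDS)
```

Keeping the rejected rows together with the reason for rejection, rather than dropping them silently, gives a data quality manager something concrete to review.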

4. Lack of domain expert

Being good at high-end tools and techniques is only one of the crucial challenges for data scientists. They also need good domain knowledge and subject matter expertise. The biggest task for data scientists is to apply domain knowledge to business solutions. Data scientists are a bridge between top management and the IT department, and domain expertise is needed to convey the needs of top management to the IT department and vice versa.

Solution:

Data scientists have to work on understanding the business scenario and the real problem, and then on analyzing and modeling effective solutions. Along with mastering statistical and technical tools, they also need to concentrate on the business requirements.

5. Data security issues

Nowadays, data security is one of the biggest issues. Since data is acquired and retrieved through many channels, such as social media, it is increasingly vulnerable to attack by hackers. Because the data is confidential, data scientists face problems in data extraction, utilization and algorithm development. The procedure of obtaining consent from users causes major delays and adds cost.

Solution:

There are no shortcuts for this issue. One has to follow the established global data protection rules. Additional security checks should be put in place, and cloud platforms should be used for data storage. Organizations also need to actively adopt advanced solutions that involve Machine Learning to guard against cyber-crime and fraudulent practices.

Tuesday, September 22, 2020

Database Systems MCQ

 Q.1 Which of the following relational algebraic operations is not from set theory?

(A) Union

(B) Intersection

(C) Cartesian Product

(D) Select

Ans: (D)

 

Q.2 Which of the following ensures the atomicity of the transaction?

(A) Transaction management component of DBMS

(B) Application Programmer

(C) Concurrency control component of DBMS

(D) Recovery management component of DBMS

Ans: (A)

 

Q.3 If both the functional dependencies : X->Y and Y->X hold for two attributes X and Y then the relationship between X and Y is

(A) M:N

(B) M:1

(C) 1:1

(D) 1:M

Ans: (C)
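
To see why a pair of mutual functional dependencies means a 1:1 relationship, here is a small sketch in Python that tests whether a dependency holds in a set of rows. The `employees` relation and its column names are hypothetical; `holds` is a helper written for this illustration only.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        # setdefault remembers the first rhs value seen for each lhs value;
        # any later row with the same lhs but a different rhs breaks the FD.
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True


# Hypothetical relation: every employee has one badge and every badge one employee.
employees = [
    {"emp_id": 1, "badge": "B-17", "dept": "Sales"},
    {"emp_id": 2, "badge": "B-42", "dept": "Sales"},
    {"emp_id": 3, "badge": "B-08", "dept": "IT"},
]

# X -> Y and Y -> X both hold: 1:1.
one_to_one = holds(employees, "emp_id", "badge") and holds(employees, "badge", "emp_id")

# X -> Y holds but Y -> X does not: many employees map to one department (M:1).
many_to_one = holds(employees, "emp_id", "dept") and not holds(employees, "dept", "emp_id")
```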

 

Q.4 What will be the number of columns and rows respectively obtained for the operation A − B, if A and B are union compatible and all the rows of A are common to B? Assume A has 4 columns and 10 rows, and B has 4 columns and 15 rows.

(A) 4,0

(B) 0,0

(C) 4,5

(D) 8,5

Ans: (A)
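
Assuming the intended operation is the set difference A − B, the result can be checked with Python sets standing in for relations. The column names and row values below are made up; only the sizes (4 columns, 10 and 15 rows, every row of A also in B) come from the question.

```python
# Four hypothetical column names shared by both relations (union compatible).
COLUMNS = ("c1", "c2", "c3", "c4")

# B has 15 distinct rows; A is the 10 of them whose first value is below 10.
B = {(i, i * 2, i * 3, i * 4) for i in range(15)}
A = {row for row in B if row[0] < 10}

# Set difference keeps the schema and removes every row of A that is also in B.
a_minus_b = A - B
b_minus_a = B - A

shape_a_minus_b = (len(COLUMNS), len(a_minus_b))
shape_b_minus_a = (len(COLUMNS), len(b_minus_a))
```

The difference keeps all 4 columns but no rows. Reversing the operands gives 4 columns and 5 rows, which is why that combination appears among the distractors.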

 

Q.5 For correct behaviour during recovery, undo and redo operations must be

(A) Commutative

(B) Associative

(C) Idempotent

(D) Distributive

Ans: (C)
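
A short sketch of why idempotence matters: if the system crashes during recovery, the same log record may be replayed more than once. The log record layout and the function names here are assumptions for illustration, not a real DBMS API.

```python
def redo(db, record):
    """Reapply a write by installing the logged new value."""
    db[record["item"]] = record["new"]


def undo(db, record):
    """Roll a write back by installing the logged old value."""
    db[record["item"]] = record["old"]


def redo_increment(db, item, delta):
    """Counter-example: replaying a logged increment is NOT idempotent."""
    db[item] += delta


record = {"item": "balance", "old": 100, "new": 150}

# Replaying the same redo twice leaves the database in the same state.
db = {"balance": 100}
redo(db, record)
after_one_redo = dict(db)
redo(db, record)
after_two_redos = dict(db)

# The same is true for undo.
undo(db, record)
undo(db, record)
after_two_undos = dict(db)

# Replaying an increment twice applies it twice.
counter = {"balance": 100}
redo_increment(counter, "balance", 50)
redo_increment(counter, "balance", 50)
```

Logging old and new values, rather than the operation itself, is what makes the repeated replay harmless in this sketch.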

 

Q.6 Which of the following is not a consequence of a non-normalized database?

(A) Update Anomaly

(B) Insertion Anomaly

(C) Redundancy

(D) Lost update problem

Ans: (D)

 

Q.7 Which of the following is true for relational calculus?

(A) ∀x(P(x)) ≡ ¬(∃x)(¬P(x))

(B) ∀x(P(x)) ≡ ¬(∃x)(P(x))

(C) ∀x(P(x)) ≡ (∃x)(¬P(x))

(D) ∀x(P(x)) ≡ (∃x)(P(x))

Ans: (A)


Q.8 The part of a database management system which ensures that the data remains in a consistent state is

(A) authorization and integrity manager

(B) buffer manager

(C) transaction manager

(D) file manager

Ans: (C)

 

Q.9 Relationships among relationships can be represented in an E-R model using

(A) Aggregation

(B) Association

(C) Weak entity sets

(D) Weak relationship sets

Ans: (A)

 

Q.10 In tuple relational calculus P1 AND P2 is equivalent to

(A) (¬P1 OR ¬P2)

(B) ¬(P1 OR ¬P2)

(C) ¬(¬P1 OR P2)

(D) ¬(¬P1 OR ¬P2)

Ans: (D)
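
The equivalence is De Morgan's law, and it can be confirmed by brute force over the four truth assignments. The sketch below encodes the four options as Python functions (the dictionary keys simply mirror the option letters) and keeps the ones that agree with P1 AND P2 everywhere.

```python
from itertools import product

# Each option from the question, written as a function of two booleans.
options = {
    "A": lambda p1, p2: (not p1) or (not p2),
    "B": lambda p1, p2: not (p1 or (not p2)),
    "C": lambda p1, p2: not ((not p1) or p2),
    "D": lambda p1, p2: not ((not p1) or (not p2)),
}

# All four combinations of truth values for (P1, P2).
assignments = list(product([False, True], repeat=2))

# Keep the options that match P1 AND P2 on every assignment.
equivalent_to_and = [
    name
    for name, formula in options.items()
    if all(formula(p1, p2) == (p1 and p2) for p1, p2 in assignments)
]
```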

 

Q.11 If α->β holds then so does

(A) γα->γβ

(B) α->γβ

(C) both (A) and (B)

(D) None of the above

Ans: (A)

 

Q.12 Cascading rollback is avoided in all protocols except

(A) strict two-phase locking protocol.

(B) tree locking protocol

(C) two-phase locking protocol

(D) validation based protocol.

Ans: (C)

 

Q.13 Wait-for graph is used for

(A) detecting view serializability.

(B) detecting conflict serializability.

(C) deadlock prevention

(D) deadlock detection

Ans: (D)
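
A wait-for graph has one node per transaction and an edge Ti -> Tj when Ti is waiting for a lock held by Tj; a deadlock exists exactly when the graph contains a cycle. Here is a minimal sketch of that cycle check using depth-first search. The dictionary representation and the function name `has_deadlock` are choices made for this example.

```python
def has_deadlock(wait_for):
    """Return True if the wait-for graph {txn: set of txns it waits for} has a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    colour = {}

    def visit(txn):
        colour[txn] = GREY
        for other in wait_for.get(txn, ()):
            state = colour.get(other, WHITE)
            if state == GREY:
                # Reached a transaction already on the current path: a cycle.
                return True
            if state == WHITE and visit(other):
                return True
        colour[txn] = BLACK
        return False

    return any(colour.get(txn, WHITE) == WHITE and visit(txn) for txn in list(wait_for))


# T1 waits for T2, T2 for T3, and T3 for T1: a deadlock.
deadlocked = has_deadlock({"T1": {"T2"}, "T2": {"T3"}, "T3": {"T1"}})

# The same chain without the last edge has no cycle.
not_deadlocked = has_deadlock({"T1": {"T2"}, "T2": {"T3"}})
```

Once a cycle is found, a real system would pick a victim transaction on the cycle and roll it back; that step is outside this sketch.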

 

Q.14 The expression σ_θ1(E1 ⋈_θ2 E2) is the same as

(A) E1 ⋈_(θ1 ∧ θ2) E2

(B) σ_θ1 E1 ∧ σ_θ2 E2

(C) E1 ⋈_(θ1 ∨ θ2) E2

(D) None of the above

Ans: (A)

 

Q.15 The clause alter table in SQL can be used to

(A) add an attribute

(B) delete an attribute

(C) alter the default values of an attribute

(D) all of the above

Ans: (D)

 

Q.16 The data models defined by ANSI/SPARC architecture are

(A) Conceptual, physical and internal

(B) Conceptual, view and external

(C) Logical, physical and internal

(D) Logical, physical and view

Ans: (D)

 

Q.17 Whenever two independent one-to-many relationships are mixed in the same relation, a _______ arises.

(A) Functional dependency

(B) Multi-valued dependency

(C) Transitive dependency

(D) Partial dependency

Ans: (B)

 

Q.18 A table can have only one

(A) Secondary key

(B) Alternate key

(C) Unique key

(D) Primary key

Ans: (D)

 

Q.19 Dependency preservation is not guaranteed in

(A) BCNF

(B) 3NF

(C) PJNF

(D) DKNF

Ans: (A)

 

Q.20 Which is the best file organization when data is frequently added or deleted from a file?

(A) Sequential

(B) Direct

(C) Index sequential

(D) None of the above

Ans: (B)

 

Q.21 Which of the following constitutes a basic set of operations for manipulating relational data?

(A) Predicate calculus

(B) Relational calculus

(C) Relational algebra

(D) SQL

Ans: (C)

 

Q.22 An advantage of views is

(A) Data security

(B) Derived columns

(C) Hiding of complex queries

(D) All of the above

Ans: (D)
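
A rough sketch of what the three listed properties look like in practice, using Python's built-in sqlite3 module with an in-memory database. The tables, the view name `employee_directory` and the salary threshold are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER, monthly_salary REAL
    );
    INSERT INTO department VALUES (1, 'Sales'), (2, 'IT');
    INSERT INTO employee VALUES (1, 'Asha', 1, 4000), (2, 'Ravi', 2, 5000);

    -- The view hides monthly_salary (security), exposes a derived column
    -- (senior_band) and wraps the join so callers never have to write it.
    CREATE VIEW employee_directory AS
        SELECT e.name AS employee,
               d.name AS department,
               e.monthly_salary * 12 >= 50000 AS senior_band
        FROM employee AS e
        JOIN department AS d ON d.id = e.dept_id;
""")

cursor = conn.execute("SELECT * FROM employee_directory ORDER BY employee")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()
```

A user who is granted access only to the view can see who works where and which band they fall in, but never the salary figure itself.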

 

Q.23 Which of the following is not a recovery technique?

(A) Deferred update

(B) Immediate update

(C) Two-phase commit

(D) Shadow paging

Ans: (C)

 

Q.24 Isolation of the transactions is ensured by

(A) Transaction management

(B) Application programmer

(C) Concurrency control

(D) Recovery management

Ans: (C)

 

Q.25 _______ operator is used to compare a value to a specified list of literal values.

(A) Like

(B) COMPARE

(C) BETWEEN

(D) IN

Ans: (D)
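
The difference between IN and LIKE is easy to see side by side. The sketch below uses Python's built-in sqlite3 module; the `student` table and its rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?)",
    [("Asha", "Pune"), ("Ravi", "Delhi"), ("Meera", "Mumbai"), ("John", "Punalur")],
)

# IN compares the column value against an explicit list of literals.
in_matches = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM student WHERE city IN ('Pune', 'Mumbai') ORDER BY name"
    )
]

# LIKE matches the column value against a single pattern with wildcards.
like_matches = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM student WHERE city LIKE 'Pun%' ORDER BY name"
    )
]
```

IN returns the students from exactly the listed cities, while LIKE returns every student whose city starts with the given prefix.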
