Wednesday, April 20, 2022

Data Structures in Data Science using Python


What is a Data Structure?

A data structure is a way of organizing and storing data so that it can be accessed and manipulated efficiently.


Types of Data Structures:

1. Vector- A vector is one of the most basic data structures and is homogeneous: it contains only elements of the same data type. The elements can be numeric, integer, character, complex, or logical.

How to Create a Vector in Python:
In Python, use the np.array( ) function from the NumPy package to create a vector.
import numpy as np
# Vector as row
vec_row = np.array([1, 2, 3])
vec_row
# Vector as column
vec_column = np.array([[1],
                       [2],
                       [3]])
vec_column
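
As a minimal sketch of the homogeneity rule, the snippet below checks the shape of both vectors and the data type NumPy picks when an integer and a float are mixed. The variable name mixed is introduced here only for illustration.

```python
import numpy as np

# A one-dimensional vector and a column vector holding the same values
vec_row = np.array([1, 2, 3])
vec_column = np.array([[1], [2], [3]])

print(vec_row.shape)     # (3,)
print(vec_column.shape)  # (3, 1)

# Mixing an integer with a float yields one common floating-point type
mixed = np.array([1, 2.5])
print(mixed.dtype)       # float64
```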


2. Matrix- A matrix is a two-dimensional, homogeneous data structure. This means that it holds only items of the same data type. When elements of different data types are passed in, they are coerced to a single common type.

How to Create a Matrix in Python:
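Older NumPy versions use the np.mat( ) function to create a matrix. That alias was removed in NumPy 2.0, so the example below uses np.asmatrix( ), which behaves the same way. A two-dimensional np.array( ) is the more common choice today.
import numpy as np
matrix = np.asmatrix([[1, 2],
                      [1, 2],
                      [1, 2]])
matrix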
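
The coercion described above can be observed directly. This is a small sketch using a plain two-dimensional np.array( ); the names m and s are illustrative and not part of the original example.

```python
import numpy as np

# Integers and a float in one matrix: every element becomes a float
m = np.array([[1, 2], [3, 4.5]])
print(m.dtype)       # float64
print(m.shape)       # (2, 2)

# Numbers mixed with strings: every element becomes a string
s = np.array([[1, "a"], [2, "b"]])
print(s.dtype.kind)  # U (Unicode string)
```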

 
3. Array- Arrays are data structures that can have several dimensions. In a three-dimensional array, data is stored as a stack of matrices, each with rows and columns. An element is accessed using the matrix level, row index, and column index.

How to Create an Array in Python:
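In Python, square brackets create a list, which serves as a simple one-dimensional array. For arrays with several dimensions, pass nested lists to np.array( ).
cars = ["Tata", "Maruti", "Mahindra"]
cars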
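
To illustrate access by matrix level, row index, and column index, here is a hedged sketch of a three-dimensional NumPy array. The values and the name arr are made up for the example.

```python
import numpy as np

# Two 2x3 matrices stacked into one three-dimensional array
arr = np.array([[[1, 2, 3],
                 [4, 5, 6]],
                [[7, 8, 9],
                 [10, 11, 12]]])

print(arr.shape)     # (2, 2, 3): matrix levels, rows, columns
print(arr[1, 0, 2])  # 9: second matrix, first row, third column
```

Indexes start at 0, so arr[1, 0, 2] reads from the second matrix.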

 
4. Series- In Python, a series is provided by the Pandas package. It's a one-dimensional labeled array that can hold any data (integer, string, float, Python objects, etc.). The axis labels are referred to as the 'index'.

How to Create a Series in Python:
First create an array using the np.array( ) function. Then pass the array to the pd.Series( ) function to create the series.
import numpy as np
import pandas as pd
a = np.array(['H', 'a', 'n', 'u', 'm', 'a', 'n'])
s = pd.Series(a)
s
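
The index labels do not have to be the default 0, 1, 2, and so on. As a small sketch, the labels and values below are invented to show label-based access:

```python
import pandas as pd

# A series with custom index labels instead of the default integers
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

print(s["b"])         # 20
print(list(s.index))  # ['a', 'b', 'c']
```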

 
5. Data Frame- A data frame is a two-dimensional, table-like data structure. Each column contains the values of one variable, and each row contains one value from each column. A data frame can contain numeric, categorical (factor), or character data, and different columns can have different types. The number of data items in each column should be the same.

How to Create a Data Frame in Python:
In Python, a data frame is a collection of series. It is created with the pandas package, either by passing data to the DataFrame( ) function or by reading a file with read_csv( ), which already returns a data frame.
import pandas as pd
# Dataframe
df = pd.read_csv("C:/cars.csv")
df
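
Because a data frame is a collection of series, it can also be built without a file. The following is a sketch with hypothetical data; the column names brand and seats and their values are assumptions made for the example.

```python
import pandas as pd

# Hypothetical example data: two series of equal length
brand = pd.Series(["Tata", "Maruti", "Mahindra"])
seats = pd.Series([5, 5, 7])

# Each series becomes one column of the data frame
df = pd.DataFrame({"brand": brand, "seats": seats})

print(df.shape)           # (3, 2): three rows, two columns
print(df["seats"].sum())  # 17
```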


6. List- Lists can contain elements of various types, such as numbers, strings, vectors, and other lists. A matrix or a function can also be a member of a list. A list is an ordered and changeable (mutable) collection, and it may contain duplicate values.

How to Create a List in Python:
Assign a variable to a pair of square brackets containing the desired values, separated by commas.
n = ["Red", "Radha", (21,32,11), True, 51.23]
n
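
A short sketch of what "ordered, changeable, and allows duplicates" means in practice, reusing the list above. The replacement value "Blue" is only illustrative.

```python
n = ["Red", "Radha", (21, 32, 11), True, 51.23]

n[0] = "Blue"      # lists are changeable: replace the first element
n.append("Radha")  # duplicates are allowed

print(n[0])              # Blue
print(n.count("Radha"))  # 2
print(len(n))            # 6
```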


7. Dictionary- It's also known as a hash map, and it maps keys to arbitrary values. Keys must be hashable: numbers, strings, and tuples can be used as keys, but lists cannot. It's a changeable collection indexed by key, and since Python 3.7 it keeps its items in insertion order. It can't have duplicate keys, although different keys may hold the same value.

How to Create a Dictionary in Python:
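Open a curly bracket and enter the items as key: value pairs, separated by commas. Avoid naming the variable dict, because that hides Python's built-in dict type.
info = {1: [1, 2, 3, 4], 'Name': 'Krishna'}
info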
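
As a minimal sketch of how keys behave, the snippet below looks up a value, overwrites an existing key, and shows that a list is rejected as a key. The variable name info and the key "Alias" are illustrative.

```python
info = {1: [1, 2, 3, 4], "Name": "Krishna"}

print(info["Name"])      # Krishna

info["Name"] = "Radha"   # same key again: the old value is overwritten
info["Alias"] = "Radha"  # the same value under a different key is fine
print(len(info))         # 3

# A list is not hashable, so it cannot be used as a key
try:
    info[[1, 2]] = "x"
except TypeError:
    print("lists cannot be dictionary keys")
```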


8. Tuple- A tuple is an ordered and unchangeable (immutable) collection of elements. It can contain any number of items of various types (integer, float, list, string, etc.). Duplicate members are allowed.

How to Create a Tuple in Python:
Make a variable, open a parenthesis, and fill in the values.
tuple1 = ("Banana", 1, False)
print(tuple1)
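
The difference from a list shows up when trying to change an element. This is a small sketch; the extra "Banana" is added only to demonstrate duplicates.

```python
tuple1 = ("Banana", 1, False, "Banana")

print(tuple1[0])               # Banana
print(tuple1.count("Banana"))  # 2: duplicates are allowed

# Assigning to an element fails because tuples cannot be changed
try:
    tuple1[1] = 2
except TypeError:
    print("tuples cannot be modified")
```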


References:
1. https://medium.com/@vinitasilaparasetty/data-structures-in-data-science-4f47d9c4ab94
