Pandas is an open-source library for data manipulation in Python,
originally developed by Wes McKinney. It is built on top of NumPy, so
Pandas needs NumPy to operate. Pandas provides an easy way to create,
manipulate, and wrangle data, and it is also an elegant solution for
time series data. It contains data structures and data manipulation
tools designed to make data cleaning and analysis fast and easy in
Python.
Why use Pandas?
Data scientists use Pandas for the following advantages:
- It easily handles missing data.
- It uses Series for one-dimensional data and DataFrame for two-dimensional (tabular) data.
- It provides an efficient way to slice the data.
- It provides a flexible way to merge, concatenate or reshape the data.
- It includes a powerful time series tool to work with.
- Inserting and deleting columns in data structures.
- Merging and joining data sets.
- Reshaping and pivoting data sets.
- Aligning data and dealing with missing data.
- Manipulating data using integrated indexing for DataFrame objects.
- Performing split-apply-combine on data sets using the groupby engine.
- Manipulating high-dimensional data in a data structure with a lower dimension using hierarchical axis indexing.
- Subsetting, fancy indexing, and label-based slicing of large data sets.
- Generating date ranges, converting frequency, date shifting, lagging, and other time-series functionality.
- Reading from files in CSV, XLSX, and TXT, among other formats.
- Sorting data in ascending or descending order.
- Filtering data around a condition.
- Analyzing time series.
- Iterating over a data set.
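A few of the operations listed above can be sketched in a handful of lines. This is a minimal illustration, not a complete tour: the `sales` and `managers` tables, their column names, and their values are all invented for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical sales data, invented for illustration only.
sales = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "units": [10.0, np.nan, 30.0, 20.0],
})

# Handle missing data: replace the missing value in "units" with 0.
filled = sales.fillna({"units": 0.0})

# Filter rows on a condition.
large = filled[filled["units"] >= 20]

# Split-apply-combine: total units per region.
totals = filled.groupby("region")["units"].sum()

# Sort the totals in descending order.
ranked = totals.sort_values(ascending=False)

# Merge with a second (also hypothetical) table on a shared key.
managers = pd.DataFrame({
    "region": ["north", "south"],
    "manager": ["Ana", "Ben"],
})
merged = filled.merge(managers, on="region", how="left")

print(ranked)
print(merged)
```

Each step returns a new object rather than modifying `sales`, so the intermediate results can be inspected one at a time.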
To get started with pandas, you will need to get comfortable with its two workhorse data structures: Series and DataFrame.
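As a rough sketch of how the two structures relate, the example below builds one of each. The labels and values are made up for illustration.

```python
import pandas as pd

# A Series is a one-dimensional labeled array.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# A DataFrame is a two-dimensional table; each column is a Series.
df = pd.DataFrame(
    {"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]},
    index=["a", "b", "c"],
)

print(s["b"])       # label-based access to a single element
print(df["y"])      # selecting a column returns a Series
print(df.loc["a"])  # selecting a row by label also returns a Series
```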
When the data is a nested list (array-like)
# Create a Series from a nested list
import pandas as pd

data = [[2, 3, 4], [5, 6, 7]]  # two inner lists of three values each
# Each inner list becomes one element of the Series (dtype: object)
snd = pd.Series(data)
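The snippet above passes a nested Python list, so the resulting Series holds two list objects. With an actual two-dimensional NumPy array, `pd.Series` does not accept the input directly because a Series is one-dimensional. Two possible ways around this are sketched below; the column names `a`, `b`, and `c` are assumptions added for the example.

```python
import numpy as np
import pandas as pd

arr = np.array([[2, 3, 4], [5, 6, 7]])  # 2-D ndarray with shape (2, 3)

# Option 1: flatten to one dimension and build a Series of six integers.
flat = pd.Series(arr.ravel())

# Option 2: keep the 2-D shape by building a DataFrame instead.
# The column names are hypothetical.
table = pd.DataFrame(arr, columns=["a", "b", "c"])

print(flat)
print(table)
```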
# Dictionary mapping column names to lists of values
# (named brics_dict to avoid shadowing the built-in dict)
brics_dict = {"country": ["Brazil", "Russia", "India", "China", "South Africa"],
              "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
              "area": [8.516, 17.10, 3.286, 9.597, 1.221],
              "population": [200.4, 143.5, 1252, 1357, 52.98]}
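A dictionary of equal-length lists like this one maps naturally onto a DataFrame: each key becomes a column name and each list becomes that column's values. The sketch below repeats the data so that it runs on its own. The variable names `brics_data`, `brics`, and `by_country` are chosen for the example and are not part of pandas.

```python
import pandas as pd

brics_data = {
    "country": ["Brazil", "Russia", "India", "China", "South Africa"],
    "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
    "area": [8.516, 17.10, 3.286, 9.597, 1.221],
    "population": [200.4, 143.5, 1252, 1357, 52.98],
}

# Keys become column names; lists become column values.
brics = pd.DataFrame(brics_data)

# Use the country name as the row label for label-based lookups.
by_country = brics.set_index("country")

# Sort by area, largest first, and take the first row label.
largest = by_country.sort_values("area", ascending=False).index[0]

# Filter on a condition: countries whose population value exceeds 1000.
big = brics[brics["population"] > 1000]["country"].tolist()

print(brics)
print(largest, big)
```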