Data Scientists Activities

tags

Data_Science

links

Preparing or Cleansing Data (38%)
- During the [[Exploratory Data Analysis (EDA)]] process on a new data set, analyze its:
  - Contents
  - Formats
  - Patterns
- Scheduled [[Data Pipeline]] (data engineer's job): a sequence of software tasks that pulls from multiple data sources, then reformats the data or removes its errors, so that it can be used for downstream tasks such as:
  - Visualizations
  - Reports
  - Models
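
The pipeline idea above can be sketched as a short sequence of tasks. This is a minimal illustration, not a production pipeline: the two in-memory sources, the field names `customer` and `amount`, and the function names are all assumptions made for the example, and scheduling (for instance with cron or an orchestrator) is left out.

```python
import csv
import io
import json


def extract_csv(text):
    """Pull records from a CSV source (passed in as text for this sketch)."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_json(text):
    """Pull records from a JSON source that holds a list of objects."""
    return json.loads(text)


def clean(records):
    """Reformat fields and drop records that contain errors."""
    cleaned = []
    for record in records:
        try:
            cleaned.append({
                "customer": str(record["customer"]).strip().title(),
                "amount": float(record["amount"]),
            })
        except (KeyError, TypeError, ValueError):
            # Missing or non-numeric fields count as errors and are removed.
            continue
    return cleaned


def run_pipeline(csv_text, json_text):
    """Run the tasks in sequence: extract from each source, then clean."""
    return clean(extract_csv(csv_text) + extract_json(json_text))


# Hypothetical sample sources, each containing one bad record.
CSV_SOURCE = "customer,amount\n alice ,10.5\nbob,not-a-number\n"
JSON_SOURCE = '[{"customer": "CAROL", "amount": "7"}, {"customer": "dave"}]'

rows = run_pipeline(CSV_SOURCE, JSON_SOURCE)
print(rows)
```

Only the Alice and Carol records survive cleaning, in a single consistent format that a visualization, report, or model could consume.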
Creating Reports, Presentations, Data Visualization (29%)
- Extract insights from data $\xrightarrow{\text{results in}}$ the organization can monitor its operations and make better decisions
- Analytics (data analyst's job)
- Data scientists make API calls to provide data for a variety of analytics products (Metabase, Power BI, etc.)
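
As a rough sketch of the API-call step above: fetch records from a JSON API and flatten them into CSV that a dashboard tool could import. The payload shape (a list of objects), the column names, and both function names are assumptions for illustration; `fetch_records` is defined but not called here because no real endpoint is given.

```python
import csv
import io
import json
import urllib.request


def fetch_records(url):
    """Call a JSON API and return the decoded payload (assumed: a list of objects)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def records_to_csv(records, columns):
    """Flatten API records into CSV text, keeping only the requested columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for record in records:
        # Missing fields become empty cells; extra fields are dropped.
        writer.writerow({column: record.get(column, "") for column in columns})
    return buffer.getvalue()


# Hypothetical payload standing in for what fetch_records(url) would return.
sample = [
    {"region": "north", "orders": 12, "internal_id": 1},
    {"region": "south", "orders": 8},
]
print(records_to_csv(sample, ["region", "orders"]))
```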
Selecting, Training, Deploying Models (27%)
- Use [[Machine Learning]], [[Mathematical Model]] $\xrightarrow{\text{so}}$ make predictions / cluster data into groups / perform natural language processing / etc.
- Two kinds of API usage in this part:
  - API consumers
    - Use an API as an input source for an ML model
  - API producers (ML engineers)
    - Deploy their models as APIs for others to use
      - Internal consumer $\xrightarrow{\text{so}}$ Host the API inside their own network
      - External consumer $\xrightarrow{\text{so}}$ Host the API over the internet
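
The producer side can be sketched with Python's standard `http.server`. This is a toy under stated assumptions: `predict` is a stand-in with hypothetical fixed weights rather than a trained model, the request shape `{"features": [...]}` is invented for the example, and a real deployment would normally use a proper web framework plus authentication and TLS, especially for external consumers.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Stand-in for a trained model: a fixed linear score (hypothetical weights)."""
    weights = [0.5, -0.25]
    return sum(w * x for w, x in zip(weights, features))


def handle_predict(body):
    """Turn a raw JSON request body into (status code, JSON response bytes)."""
    try:
        features = [float(x) for x in json.loads(body)["features"]]
    except (KeyError, TypeError, ValueError):
        error = {"error": "expected a JSON object with a numeric 'features' list"}
        return 400, json.dumps(error).encode()
    return 200, json.dumps({"prediction": predict(features)}).encode()


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes the client declared.
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_predict(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(host="127.0.0.1", port=8000):
    """Start the API; this call blocks until the process is stopped."""
    HTTPServer((host, port), PredictHandler).serve_forever()


# The request handling can be exercised without starting the server.
print(handle_predict(b'{"features": [4, 2]}'))
```

Calling `serve()` with the default `127.0.0.1` keeps the API reachable only from the same machine, while binding to a network interface (for example `0.0.0.0`) makes it reachable from other hosts. Whether that means an internal network or the internet depends on firewalls and routing, which this sketch does not cover.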
