Location: N/A
Job Summary:
Job Duties
- Own data collection, cleaning, and preprocessing for HTML-based datasets.
- Use web analysis tools to extract data from DOM environments.
- Collaborate with ML Engineers on feature engineering experiments.
- Use LLMs to generate synthetic datasets for model training.
- Analyze data with dimensionality reduction techniques (t-SNE, PCA, UMAP).
- Automate data workflows.
- Document data workflows and methodologies.
- Build data validation and data quality systems.
Required Skills (Keywords)
- Python (Pandas, NumPy)
- Web analysis tools (Selenium, BeautifulSoup)
- HTML, DOM structures
- NLP techniques (tokenization, stop word removal)
- Synthetic datasets, LLMs
- Problem-solving
- Data quality and governance
- Cloud platforms (AWS, GCP, Azure)
Required Experiences (Topics)
- 2+ years as a Data Analyst (preferably in cybersecurity or ML)
- Bachelor’s degree in Data Science, Statistics, Computer Science, or related field
- Collaboration with technical teams
- Must be a US Person (required for GovCloud work)
Job URLs: