Location: N/A
Job Duties:
- Own data collection, cleaning, and preprocessing for HTML-based datasets.
- Use web analysis tools to extract data from the DOM of web pages.
- Collaborate with ML Engineers on feature engineering experiments.
- Generate and augment synthetic datasets using LLMs.
- Analyze data with dimensionality reduction techniques (t-SNE, PCA, UMAP).
- Automate data processing and transformation workflows.
- Maintain documentation for data workflows and methodologies.
- Create validation systems for data quality and integrity.
Required Skills:
- Python (Pandas, NumPy)
- Web analysis tools (Selenium, BeautifulSoup)
- HTML & DOM structures
- Natural Language Processing (NLP)
- Data quality & governance
- Cloud platforms (AWS, GCP, Azure)
Required Experience:
- 2+ years as a Data Analyst (cybersecurity/ML preferred)
- Experience in synthetic dataset generation
- Experience collaborating with technical teams
- Bachelor's degree in Data Science, Statistics, or related field