Location: Burnsville, NC
Job Duties
- Manage end-to-end data collection, cleaning, and preprocessing for HTML-based datasets.
- Use web analysis tools (e.g., Selenium, BeautifulSoup) to extract and structure data.
- Collaborate with ML Engineers for feature engineering and dataset production.
- Generate and augment synthetic datasets using LLMs.
- Analyze data using dimensionality reduction techniques.
- Automate data workflows for efficient processing.
- Maintain comprehensive documentation of data workflows.
- Create validation systems for data integrity.
Required Skills (Keywords)
- Python (Pandas, NumPy)
- Web analysis tools (Selenium, BeautifulSoup)
- HTML, DOM structures
- NLP techniques
- Synthetic dataset generation
- Cloud platforms (AWS, GCP, Azure)
- Problem-solving
- Data quality governance
Required Experience (Topics)
- 2+ years as a Data Analyst
- Background in cybersecurity or ML
- Experience in data manipulation and automation
- Collaboration with technical teams
- Relevant educational background (Data Science, Statistics, etc.)
Job URLs: