Data Scientist
SGF Global Inc
Temporary | Full-time | Contract
Houston, TX
Job description
Job Title: Data Scientist
Location: Houston, TX
Salary: $130,000 – $210,000
Employment Type: W2 Contract (12‑month duration)
Work Model: Onsite/Hybrid as required
Position Overview
We are seeking a highly skilled Data Scientist to build, train, and deploy large‑scale self‑supervised “foundation” models for time‑series, sensor, multimodal, and industrial and scientific data. The role focuses on developing deep learning architectures that learn rich representations from high‑dimensional sequential signals and are later fine‑tuned for tasks such as:
- Anomaly/event detection
- Predictive maintenance
- Forecasting
- Classification
- Multi‑sensor fusion
- Industrial/scientific modeling
This is a high‑impact, research‑driven role working with large datasets, complex sensor modalities, and distributed training infrastructure.
Key Responsibilities
1. Foundation Model Development
- Build and train self‑supervised and semi-supervised foundation models for time‑series and multimodal data
- Fine‑tune large models for domain‑specific tasks
- Apply contrastive learning, masked modeling, temporal predictive coding, multimodal alignment, etc.
- Develop transfer learning, adapter, and prompt‑based strategies for rapid downstream adaptation
2. Data & Signal Processing
- Process, augment, and engineer features for univariate/multivariate time‑series datasets
- Analyze IoT sensor streams, industrial vibration/temperature data, audio, imagery, etc.
- Perform sampling, synchronization, denoising, artifact removal, and sensor quality checks
- Integrate time series with images, structured data, audio, and text
3. Advanced Machine Learning & Architectures
- Build models using:
  - RNNs / GRUs / LSTMs
  - TCNs
  - 1D/2D/3D CNNs
  - Transformers (BERT, ViT, TimeSformer)
  - Graph Neural Networks
  - Diffusion / generative architectures
  - Multi‑modal encoders and fusion models
- Evaluate model performance using:
  - MSE, RMSE, R²
  - F1, AUC, Precision/Recall
  - DTW, correlation, similarity metrics
  - IoU and event‑based segmentation metrics
4. Software Engineering & Infrastructure
- Build production‑ready pipelines for ingesting, cleaning, segmenting, and aligning large‑scale multi‑sensor datasets
- Develop in:
  - Python (NumPy, Pandas, SciPy)
  - PyTorch (Lightning, Distributed)
  - TensorFlow/Keras
  - JAX/Flax
  - C++/CUDA for custom kernels
- Train models using:
  - Multi‑GPU and multi‑node clusters
  - Mixed‑precision training
  - Distributed optimization (ZeRO, DDP, etc.)
5. Mathematical & Algorithmic Foundations
- Apply a strong background in:
  - Linear algebra, probability, and statistics
  - Signal processing (Fourier, wavelets, Kalman filters, noise modeling)
  - Optimization (stochastic, convex, non‑convex)
  - Numerical methods, ODE/PDE modeling, regularization techniques
6. Collaboration & Communication
- Partner with scientists, engineers, domain experts, and product teams
- Present model behavior insights, attention maps, and uncertainty quantification
- Communicate findings clearly to both technical and non‑technical audiences
Required Qualifications
- MS or PhD in Computer Science, Data Science, AI, Engineering, or related fields
- 3+ years of experience in Data Science, Machine Learning, or AI
- Strong experience building and training deep learning models
- Experience working with time‑series or sensor data
- Proficiency in Python, deep learning frameworks, and ML engineering best practices
Preferred Qualifications
- Experience with multimodal learning
- Experience with large‑scale distributed training
- Background in industrial, scientific, or sensor‑driven AI
Pay: $130,000.00 – $210,000.00 per year
Location:
- Houston, TX 77077 (Required)
Work Location: In person