# camshowrecordings/model/sam_samantha/5
```python
device = torch.device(cfg.get("device", "cpu"))
model.to(device)
```
```bash
python video_segment.py recordings/2024-03-15.mp4 recordings/2024-03-15_segmented.mp4 --stride 3
```

Adjust `--stride` to balance speed vs. temporal resolution.

| Symptom | Likely Cause | Fix |
|---------|--------------|-----|
| `RuntimeError: CUDA out of memory` | Batch size > 1 or image size too large for the GPU. | Reduce `stride`, lower `image_size` in `config.yaml`, or switch to CPU (`device: cpu`). |
| `FileNotFoundError` on `model.ckpt` | Wrong relative path or missing checkpoint. | Verify the file exists in `camshowrecordings/model/sam_samantha/5/`. If you cloned a shallow repo, run `git lfs pull`. |
| `ImportError: cannot import name 'SamSamantha'` | Model class location changed. | Look inside the repo's `camshowrecordings/models/` folder for the exact class name; update the import accordingly. |
| `torch.cuda.is_available() == False` even though a GPU is present | Missing or mismatched CUDA toolkit / driver. | Install the correct NVIDIA driver and matching CUDA version, then reinstall PyTorch with the appropriate `--index-url` flag. |
| Segmentation masks are all black | Model not switched to evaluation mode, or preprocessing mismatch. | Call `model.eval()` before inference and make sure the preprocessing (resize, mean/std) matches `config.yaml`. |
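To reason about the stride trade-off concretely, the helpers below (hypothetical names, not part of the repo) show which frame indices actually run inference for a given stride and the resulting effective inference rate:

```python
# Sketch only: `inference_frames` and `effective_fps` are illustrative helpers,
# not functions from video_segment.py.

def inference_frames(total_frames: int, stride: int) -> list[int]:
    """Indices where the segmentation model actually runs (every N-th frame)."""
    return [i for i in range(total_frames) if i % stride == 0]

def effective_fps(video_fps: float, stride: int) -> float:
    """How many frames per second receive a fresh mask."""
    return video_fps / stride
```

For a 30 fps recording, `--stride 3` means masks are refreshed 10 times per second; the in-between frames are written unchanged.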
```yaml
model:
  name: sam_samantha
  version: 5
  backbone: vit_h
  image_size: 1024
  num_classes: 1   # usually 1 for segmentation → binary mask
preprocess:
  normalize: true
  mean: [0.485, 0.456, 0.406]
  std: [0.229, 0.224, 0.225]
device: cuda
```

Below is a minimal, self-contained script that loads the model and runs a single inference on a video frame.

```python
device = torch.device(cfg.get("device", "cpu"))
```
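Before running anything, it is worth sanity-checking the config fields the scripts rely on. The sketch below mirrors `config.yaml` as a plain dict (in practice you would load it with `yaml.safe_load`); `validate_cfg` is a hypothetical helper, not part of the repo:

```python
# Mirrors config.yaml as a dict; load the real file with yaml.safe_load.
cfg = {
    "model": {"name": "sam_samantha", "version": 5, "backbone": "vit_h",
              "image_size": 1024, "num_classes": 1},
    "preprocess": {"normalize": True,
                   "mean": [0.485, 0.456, 0.406],
                   "std": [0.229, 0.224, 0.225]},
    "device": "cuda",
}

def validate_cfg(cfg: dict) -> list[str]:
    """Return a list of problems; an empty list means the config looks sane."""
    problems = []
    # ViT backbones split the image into 16x16 patches, so the input size
    # should divide evenly by the patch size.
    if cfg["model"]["image_size"] % 16 != 0:
        problems.append("image_size should be a multiple of 16 (ViT patch size)")
    if len(cfg["preprocess"]["mean"]) != 3 or len(cfg["preprocess"]["std"]) != 3:
        problems.append("mean/std must have 3 channel values")
    return problems
```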
```python
if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("input_video", type=Path)
    parser.add_argument("output_video", type=Path)
    parser.add_argument("--stride", type=int, default=5,
                        help="Run inference every N frames (default=5)")
    args = parser.parse_args()

    process_video(args.input_video, args.output_video, args.stride)
```
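The same parser can be exercised programmatically to see how the arguments come out: positional paths become `Path` objects and `--stride` falls back to its default when omitted. A small sketch using sample argv values (`in.mp4`/`out.mp4` are placeholders):

```python
import argparse
from pathlib import Path

# Same flags as the entry point above, fed a sample argv instead of sys.argv.
parser = argparse.ArgumentParser()
parser.add_argument("input_video", type=Path)
parser.add_argument("output_video", type=Path)
parser.add_argument("--stride", type=int, default=5,
                    help="Run inference every N frames (default=5)")

args = parser.parse_args(["in.mp4", "out.mp4", "--stride", "3"])
```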
Open config.yaml to verify things like:
```python
if frame_idx % stride == 0:
    mask = infer(frame)  # binary mask (0/255)
    overlay = cv2.addWeighted(frame, 0.7,
                              cv2.cvtColor(mask, cv2.COLOR_GRAY2BGR), 0.3, 0)
    out.write(overlay)
else:
    out.write(frame)  # write the raw frame for non-processed indices
```
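For reference, the blend that `cv2.addWeighted` computes here can be reproduced in plain NumPy, which makes the 0.7/0.3 weighting easy to inspect without OpenCV. This is a sketch; `overlay_mask` is an illustrative helper, not part of the repo:

```python
import numpy as np

def overlay_mask(frame: np.ndarray, mask: np.ndarray, alpha: float = 0.7) -> np.ndarray:
    """Blend a BGR frame with a single-channel 0/255 mask: alpha*frame + (1-alpha)*mask."""
    mask_bgr = np.repeat(mask[:, :, None], 3, axis=2)  # GRAY2BGR equivalent
    blended = alpha * frame.astype(np.float64) + (1 - alpha) * mask_bgr
    return blended.astype(np.uint8)
```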
```python
# ------------------------------------------------------------------
# 4️⃣ Pre-process a single frame (example uses OpenCV)
# ------------------------------------------------------------------
def preprocess(img: np.ndarray, cfg) -> torch.Tensor:
    # Resize to the square model input size (aspect ratio is not preserved here)
    target_sz = cfg["model"]["image_size"]
    img_resized = cv2.resize(img, (target_sz, target_sz))
    # Normalize with the mean/std from config.yaml and return a (1, C, H, W) tensor
    mean = np.array(cfg["preprocess"]["mean"], dtype=np.float32)
    std = np.array(cfg["preprocess"]["std"], dtype=np.float32)
    img_norm = (img_resized.astype(np.float32) / 255.0 - mean) / std
    return torch.from_numpy(img_norm).permute(2, 0, 1).unsqueeze(0)
```
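The complementary step turns raw model output back into the 0/255 mask the overlay loop expects. This is a sketch only: `postprocess` and the `(1, 1, H, W)` logits layout are assumptions, not confirmed by the repo.

```python
import numpy as np

def postprocess(logits: np.ndarray) -> np.ndarray:
    """Convert (1, 1, H, W) logits into a uint8 0/255 mask.

    Thresholding logits at 0.0 is equivalent to thresholding
    sigmoid probabilities at 0.5.
    """
    mask = (logits[0, 0] > 0.0).astype(np.uint8) * 255
    return mask  # resize back to the original frame size with cv2.resize if needed
```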
