Trained a Swin-T from scratch on NWPU-RESISC45: no pretrained weights, no fine-tuning. Every component hand-coded in PyTorch: window partitioning, shifted window attention with relative positional bias, patch merging across 4 stages, ~28M parameters.

Architecture:
- embed_dim=96, window_size=7, depths=[2, 2, 6, 2]
- heads=[3, 6, 12, 24] across stages
- Patch embed via Conv2d (4×4, stride 4) → 56×56 feature map
- PatchMerging downsamples by concatenating 2×2 neighbors + linear projection
- Global average pooling → linear classifier

Training:
- AdamW (lr=3e-4, weight_decay=0.05)
- Cosine annealing with 3-epoch linear warmup over 20 epochs
- Mixed precision (autocast + GradScaler)
- Gradient clipping (max_norm=1.0)
- Label smoothing (0.1)
- ImageNet normalization, batch size 32
- 80/20 train/test split, seed=42

Result: 82% test accuracy on 45 land-use categories, 31,500 images.

🔗 Sathya77/swin-transformer-satellite

What accuracy do you think is achievable on NWPU-RESISC45 with Swin-T trained from scratch, without any pretraining?
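For readers who haven't hand-coded window partitioning before, here is a minimal sketch of the partition/reverse pair the post mentions. This is not the repo's exact code, just the standard reshape-and-permute formulation; the shapes match the post's stage-1 setup (56×56 map, window_size=7, embed_dim=96).

```python
import torch

def window_partition(x, window_size=7):
    """Split a (B, H, W, C) feature map into non-overlapping windows.

    Returns (num_windows * B, window_size, window_size, C).
    """
    B, H, W, C = x.shape
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
    windows = x.permute(0, 1, 3, 2, 4, 5).contiguous()
    return windows.view(-1, window_size, window_size, C)

def window_reverse(windows, window_size, H, W):
    """Inverse of window_partition: stitch windows back into (B, H, W, C)."""
    B = windows.shape[0] // ((H // window_size) * (W // window_size))
    x = windows.view(B, H // window_size, W // window_size,
                     window_size, window_size, -1)
    x = x.permute(0, 1, 3, 2, 4, 5).contiguous()
    return x.view(B, H, W, -1)

# On a 56x56 stage-1 map with window_size=7: 8*8 = 64 windows per image.
feat = torch.randn(2, 56, 56, 96)
wins = window_partition(feat, 7)            # (2*64, 7, 7, 96)
restored = window_reverse(wins, 7, 56, 56)  # back to (2, 56, 56, 96)
```

Shifted window attention then becomes a `torch.roll` of the map before partitioning, plus a mask for the wrapped-around windows.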
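The PatchMerging step the post describes (concatenate 2×2 neighbors, then linearly project) can be sketched like this. Again a generic formulation, not the author's exact module; the 4C → 2C projection and pre-norm follow the original Swin design.

```python
import torch
import torch.nn as nn

class PatchMerging(nn.Module):
    """Downsample a (B, H, W, C) map to (B, H/2, W/2, 2C):
    concatenate each 2x2 neighborhood (4C channels), normalize,
    then project 4C -> 2C with a linear layer."""

    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(4 * dim)
        self.reduction = nn.Linear(4 * dim, 2 * dim, bias=False)

    def forward(self, x):  # x: (B, H, W, C), H and W even
        x0 = x[:, 0::2, 0::2, :]  # top-left of each 2x2 block
        x1 = x[:, 1::2, 0::2, :]  # bottom-left
        x2 = x[:, 0::2, 1::2, :]  # top-right
        x3 = x[:, 1::2, 1::2, :]  # bottom-right
        x = torch.cat([x0, x1, x2, x3], dim=-1)  # (B, H/2, W/2, 4C)
        return self.reduction(self.norm(x))      # (B, H/2, W/2, 2C)

merge = PatchMerging(96)
out = merge(torch.randn(2, 56, 56, 96))  # (2, 28, 28, 192)
```

Applied after each of the first three stages, this yields the 96 → 192 → 384 → 768 channel progression implied by depths=[2, 2, 6, 2].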
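The training recipe (AdamW, warmup + cosine schedule, AMP, clipping, label smoothing) can be wired together as below. This is a minimal sketch under stated assumptions: the `nn.Linear` stand-in and the random batch are placeholders for the real Swin-T and RESISC45 loader, and one scheduler step per epoch is assumed.

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(96, 45).to(device)  # placeholder for the ~28M-param Swin-T
opt = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.05)
# 3-epoch linear warmup, then cosine annealing over the remaining 17 of 20 epochs
sched = torch.optim.lr_scheduler.SequentialLR(
    opt,
    [torch.optim.lr_scheduler.LinearLR(opt, start_factor=0.01, total_iters=3),
     torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=17)],
    milestones=[3],
)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)  # smoothing 0.1

for epoch in range(20):
    # placeholder batch; a real DataLoader would yield (image, label) pairs
    x = torch.randn(32, 96, device=device)
    y = torch.randint(0, 45, (32,), device=device)
    opt.zero_grad(set_to_none=True)
    with torch.autocast(device_type=device, enabled=(device == "cuda")):
        loss = criterion(model(x), y)
    scaler.scale(loss).backward()
    scaler.unscale_(opt)  # so clipping sees true gradient magnitudes
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    scaler.step(opt)
    scaler.update()
    sched.step()
```

Unscaling before `clip_grad_norm_` matters: otherwise the clip threshold would be compared against loss-scaled gradients.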