---
library_name: transformers
license: apache-2.0
language:
- en
- zh
tags:
- remote-sensing
- mllm
- multimodal
- earth-observation
- satellite-imagery
pipeline_tag: image-text-to-text
---

# 🌍 TerraSense-Base

A Multimodal Large Language Model for Remote Sensing.

## 📖 Documentation

For usage instructions, examples, and detailed documentation, please visit:

👉 **[GitHub Repository](https://github.com/TerraSense-CASM/terrasense)**

## 🚀 Quick Start

```python
from transformers import AutoModelForVision2Seq, AutoProcessor
from qwen_vl_utils import process_vision_info
import torch

# Load the model in bfloat16 and shard it across available devices
model = AutoModelForVision2Seq.from_pretrained(
    "TerraSense-CASM/TerraSense-Base",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
processor = AutoProcessor.from_pretrained(
    "TerraSense-CASM/TerraSense-Base", trust_remote_code=True
)

# Build a single-turn chat message with one image and one text prompt
messages = [{"role": "user", "content": [
    {"type": "image", "image": "path/to/image.jpg"},
    {"type": "text", "text": "Describe this remote sensing image."},
]}]

# Render the chat template, extract the image inputs, and move tensors
# to the same device the model was loaded onto
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, _ = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, padding=True, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=512)
print(processor.batch_decode(output, skip_special_tokens=True)[0])
```

## 📜 License

[Apache 2.0](https://github.com/TerraSense-CASM/terrasense/blob/main/LICENSE)
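As a setup note for the Quick Start: the example imports `qwen_vl_utils`, which is a separate package from `transformers`, and `device_map="auto"` relies on `accelerate`. Assuming the model follows the Qwen2-VL stack (inferred from the imports, not stated explicitly in this card), the dependencies can likely be installed with:

```shell
# Install the inference dependencies used in the Quick Start example
pip install transformers accelerate qwen-vl-utils
```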