Delete microsoft

Files changed (11)

microsoft/git-base/README.md +0 -66
microsoft/git-base/config.json +0 -106
microsoft/git-base/generation_config.json +0 -7
microsoft/git-base/gitattributes +0 -34
microsoft/git-base/model.safetensors +0 -3
microsoft/git-base/preprocessor_config.json +0 -28
microsoft/git-base/pytorch_model.bin +0 -3
microsoft/git-base/special_tokens_map.json +0 -7
microsoft/git-base/tokenizer.json +0 -0
microsoft/git-base/tokenizer_config.json +0 -19
microsoft/git-base/vocab.txt +0 -0

microsoft/git-base/README.md DELETED

@@ -1,66 +0,0 @@
----
-language: en
-license: mit
-tags:
-- vision
-- image-to-text
-- image-captioning
-model_name: microsoft/git-base
-pipeline_tag: image-to-text
----
-# GIT (GenerativeImage2Text), base-sized
-GIT (short for GenerativeImage2Text) model, base-sized version. It was introduced in the paper [GIT: A Generative Image-to-text Transformer for Vision and Language](https://arxiv.org/abs/2205.14100) by Wang et al. and first released in [this repository](https://github.com/microsoft/GenerativeImage2Text).
-Disclaimer: The team releasing GIT did not write a model card for this model so this model card has been written by the Hugging Face team.
-## Model description
-GIT is a Transformer decoder conditioned on both CLIP image tokens and text tokens. The model is trained using "teacher forcing" on many (image, text) pairs.
-The goal for the model is simply to predict the next text token, given the image tokens and previous text tokens.
-The model has full access to (i.e. a bidirectional attention mask is used for) the image patch tokens, but only has access to the previous text tokens (i.e. a causal attention mask is used for the text tokens) when predicting the next text token.
-![GIT architecture](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/git_architecture.jpg)
-This allows the model to be used for tasks like:
-- image and video captioning
-- visual question answering (VQA) on images and videos
-- even image classification (by simply conditioning the model on the image and asking it to generate a class for it in text).
-## Intended uses & limitations
-You can use the raw model for image captioning. See the [model hub](https://huggingface.co/models?search=microsoft/git) to look for
-fine-tuned versions on a task that interests you.
-### How to use
-For code examples, we refer to the [documentation](https://huggingface.co/docs/transformers/main/model_doc/git#transformers.GitForCausalLM.forward.example).
-## Training data
-From the paper:
-> We collect 0.8B image-text pairs for pre-training, which include COCO (Lin et al., 2014), Conceptual Captions
-(CC3M) (Sharma et al., 2018), SBU (Ordonez et al., 2011), Visual Genome (VG) (Krishna et al., 2016),
-Conceptual Captions (CC12M) (Changpinyo et al., 2021), ALT200M (Hu et al., 2021a), and an extra 0.6B
-data following a similar collection procedure in Hu et al. (2021a).
-However, this describes the model referred to as "GIT" in the paper, which is not open-sourced.
-This checkpoint is "GIT-base", which is a smaller variant of GIT trained on 10 million image-text pairs.
-See table 11 in the [paper](https://arxiv.org/abs/2205.14100) for more details.
-### Preprocessing
-We refer to the original repo regarding details for preprocessing during training.
-During validation, one resizes the shorter edge of each image, after which center cropping is performed to a fixed-size resolution. Next, frames are normalized across the RGB channels with the ImageNet mean and standard deviation.
-## Evaluation results
-For evaluation results, we refer readers to the [paper](https://arxiv.org/abs/2205.14100).
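
Reviewer note: the deleted model card describes a mixed attention pattern (bidirectional over image tokens, causal over text tokens). The sketch below builds such a mask in plain Python so the pattern is easy to inspect. The function name `git_attention_mask` is hypothetical, and two details are assumptions not stated in the card: image queries are blocked from text keys, and each text token may attend to itself.

```python
from typing import List


def git_attention_mask(num_image_tokens: int, num_text_tokens: int) -> List[List[int]]:
    """Return mask[q][k] == 1 where query position q may attend to key position k.

    Positions 0..num_image_tokens-1 are image tokens; the rest are text tokens.
    """
    total = num_image_tokens + num_text_tokens
    mask = [[0] * total for _ in range(total)]
    for q in range(total):
        for k in range(total):
            if k < num_image_tokens:
                # Image keys are visible to every query (bidirectional over the image).
                mask[q][k] = 1
            elif q >= num_image_tokens and k <= q:
                # Text keys are visible only to text queries at the same or a later
                # position (causal over the text). Assumption: image queries never
                # see text keys.
                mask[q][k] = 1
    return mask


# Tiny example: 2 image tokens followed by 3 text tokens.
mask = git_attention_mask(num_image_tokens=2, num_text_tokens=3)
for row in mask:
    print(row)
```

Each printed row is one query position; the lower-right block is lower-triangular, which is the causal part.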

microsoft/git-base/config.json DELETED

@@ -1,106 +0,0 @@
-{
-  "_commit_hash": null,
-  "architectures": [
-    "GitForCausalLM"
-  ],
-  "attention_probs_dropout_prob": 0.1,
-  "bos_token_id": 101,
-  "classifier_dropout": null,
-  "eos_token_id": 102,
-  "hidden_act": "gelu",
-  "hidden_dropout_prob": 0.1,
-  "hidden_size": 768,
-  "initializer_range": 0.02,
-  "intermediate_size": 3072,
-  "layer_norm_eps": 1e-12,
-  "max_position_embeddings": 1024,
-  "model_type": "git",
-  "num_attention_heads": 12,
-  "num_hidden_layers": 6,
-  "num_image_with_embedding": null,
-  "pad_token_id": 0,
-  "position_embedding_type": "absolute",
-  "tie_word_embeddings": false,
-  "torch_dtype": "float32",
-  "transformers_version": null,
-  "use_cache": true,
-  "vision_config": {
-    "_name_or_path": "",
-    "add_cross_attention": false,
-    "architectures": null,
-    "attention_dropout": 0.0,
-    "bad_words_ids": null,
-    "begin_suppress_tokens": null,
-    "bos_token_id": null,
-    "chunk_size_feed_forward": 0,
-    "cross_attention_hidden_size": null,
-    "decoder_start_token_id": null,
-    "diversity_penalty": 0.0,
-    "do_sample": false,
-    "dropout": 0.0,
-    "early_stopping": false,
-    "encoder_no_repeat_ngram_size": 0,
-    "eos_token_id": null,
-    "exponential_decay_length_penalty": null,
-    "finetuning_task": null,
-    "forced_bos_token_id": null,
-    "forced_eos_token_id": null,
-    "hidden_act": "quick_gelu",
-    "hidden_size": 768,
-    "id2label": {
-      "0": "LABEL_0",
-      "1": "LABEL_1"
-    },
-    "image_size": 224,
-    "initializer_factor": 1.0,
-    "initializer_range": 0.02,
-    "intermediate_size": 3072,
-    "is_decoder": false,
-    "is_encoder_decoder": false,
-    "label2id": {
-      "LABEL_0": 0,
-      "LABEL_1": 1
-    },
-    "layer_norm_eps": 1e-05,
-    "length_penalty": 1.0,
-    "max_length": 20,
-    "min_length": 0,
-    "model_type": "git_vision_model",
-    "no_repeat_ngram_size": 0,
-    "num_attention_heads": 12,
-    "num_beam_groups": 1,
-    "num_beams": 1,
-    "num_channels": 3,
-    "num_hidden_layers": 12,
-    "num_return_sequences": 1,
-    "output_attentions": false,
-    "output_hidden_states": false,
-    "output_scores": false,
-    "pad_token_id": null,
-    "patch_size": 16,
-    "prefix": null,
-    "problem_type": null,
-    "projection_dim": 512,
-    "pruned_heads": {},
-    "remove_invalid_values": false,
-    "repetition_penalty": 1.0,
-    "return_dict": true,
-    "return_dict_in_generate": false,
-    "sep_token_id": null,
-    "suppress_tokens": null,
-    "task_specific_params": null,
-    "temperature": 1.0,
-    "tf_legacy_loss": false,
-    "tie_encoder_decoder": false,
-    "tie_word_embeddings": true,
-    "tokenizer_class": null,
-    "top_k": 50,
-    "top_p": 1.0,
-    "torch_dtype": null,
-    "torchscript": false,
-    "transformers_version": "4.26.0.dev0",
-    "typical_p": 1.0,
-    "use_bfloat16": false
-  },
-  "vocab_size": 30522
-}
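
Reviewer note: a few shape quantities follow directly from the values in the deleted config.json. The sketch below derives them with plain arithmetic; the helper names are hypothetical, and the dictionaries copy only the fields needed here.

```python
# Values copied from the deleted config.json (text decoder and vision_config).
text_config = {"hidden_size": 768, "num_attention_heads": 12, "num_hidden_layers": 6}
vision_config = {
    "hidden_size": 768,
    "num_attention_heads": 12,
    "num_hidden_layers": 12,
    "image_size": 224,
    "patch_size": 16,
}


def head_dim(cfg: dict) -> int:
    """Per-head width: the hidden size split evenly across attention heads."""
    return cfg["hidden_size"] // cfg["num_attention_heads"]


def num_patches(cfg: dict) -> int:
    """Number of non-overlapping square patches in one square input image."""
    side = cfg["image_size"] // cfg["patch_size"]
    return side * side


text_head_dim = head_dim(text_config)
vision_head_dim = head_dim(vision_config)
patches_per_image = num_patches(vision_config)
print(text_head_dim, vision_head_dim, patches_per_image)
```

The patch count covers image patches only; any extra class token added by the vision encoder is not included in this figure.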

microsoft/git-base/generation_config.json DELETED

@@ -1,7 +0,0 @@
-{
-  "_from_model_config": true,
-  "bos_token_id": 101,
-  "eos_token_id": 102,
-  "pad_token_id": 0,
-  "transformers_version": "4.27.0.dev0"
-}
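
Reviewer note: the three ids in the deleted generation_config.json mark the start, end, and padding of a caption. The sketch below shows one way they could frame a token sequence. It assumes the standard bert-base-uncased vocabulary, where ids 101, 102, and 0 correspond to `[CLS]`, `[SEP]`, and `[PAD]` (vocab.txt is not rendered in this diff). The function `frame_caption` and the sample token ids are hypothetical.

```python
BOS_TOKEN_ID = 101  # from generation_config.json; assumed to be [CLS]
EOS_TOKEN_ID = 102  # from generation_config.json; assumed to be [SEP]
PAD_TOKEN_ID = 0    # from generation_config.json; assumed to be [PAD]


def frame_caption(token_ids, max_length):
    """Wrap caption ids in BOS/EOS, then right-pad to max_length.

    Returns (input_ids, attention_mask); the mask is 0 on padding positions.
    """
    framed = [BOS_TOKEN_ID] + list(token_ids) + [EOS_TOKEN_ID]
    if len(framed) > max_length:
        raise ValueError("caption does not fit in max_length")
    padding = max_length - len(framed)
    input_ids = framed + [PAD_TOKEN_ID] * padding
    attention_mask = [1] * len(framed) + [0] * padding
    return input_ids, attention_mask


# Placeholder ids standing in for a tokenized caption (not real vocabulary lookups).
input_ids, attention_mask = frame_caption([2000, 2001, 2002], max_length=8)
print(input_ids)
print(attention_mask)
```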

microsoft/git-base/gitattributes DELETED

@@ -1,34 +0,0 @@
-*.7z filter=lfs diff=lfs merge=lfs -text
-*.arrow filter=lfs diff=lfs merge=lfs -text
-*.bin filter=lfs diff=lfs merge=lfs -text
-*.bz2 filter=lfs diff=lfs merge=lfs -text
-*.ckpt filter=lfs diff=lfs merge=lfs -text
-*.ftz filter=lfs diff=lfs merge=lfs -text
-*.gz filter=lfs diff=lfs merge=lfs -text
-*.h5 filter=lfs diff=lfs merge=lfs -text
-*.joblib filter=lfs diff=lfs merge=lfs -text
-*.lfs.* filter=lfs diff=lfs merge=lfs -text
-*.mlmodel filter=lfs diff=lfs merge=lfs -text
-*.model filter=lfs diff=lfs merge=lfs -text
-*.msgpack filter=lfs diff=lfs merge=lfs -text
-*.npy filter=lfs diff=lfs merge=lfs -text
-*.npz filter=lfs diff=lfs merge=lfs -text
-*.onnx filter=lfs diff=lfs merge=lfs -text
-*.ot filter=lfs diff=lfs merge=lfs -text
-*.parquet filter=lfs diff=lfs merge=lfs -text
-*.pb filter=lfs diff=lfs merge=lfs -text
-*.pickle filter=lfs diff=lfs merge=lfs -text
-*.pkl filter=lfs diff=lfs merge=lfs -text
-*.pt filter=lfs diff=lfs merge=lfs -text
-*.pth filter=lfs diff=lfs merge=lfs -text
-*.rar filter=lfs diff=lfs merge=lfs -text
-*.safetensors filter=lfs diff=lfs merge=lfs -text
-saved_model/**/* filter=lfs diff=lfs merge=lfs -text
-*.tar.* filter=lfs diff=lfs merge=lfs -text
-*.tflite filter=lfs diff=lfs merge=lfs -text
-*.tgz filter=lfs diff=lfs merge=lfs -text
-*.wasm filter=lfs diff=lfs merge=lfs -text
-*.xz filter=lfs diff=lfs merge=lfs -text
-*.zip filter=lfs diff=lfs merge=lfs -text
-*.zst filter=lfs diff=lfs merge=lfs -text
-*tfevents* filter=lfs diff=lfs merge=lfs -text
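
Reviewer note: the patterns above decide which files are stored through Git LFS. The sketch below checks the file names from this change against a subset of those patterns. It uses Python's `fnmatch` as an approximation; real gitattributes matching has additional rules (for example for `**` and directory patterns), so treat it as illustrative only. The helper name `tracked_by_lfs` is hypothetical.

```python
from fnmatch import fnmatchcase

# Subset of the patterns from the deleted gitattributes file.
LFS_PATTERNS = ["*.bin", "*.safetensors", "*.h5", "*.onnx", "*.pt", "*.msgpack"]

# File names taken from the "Files changed" list of this commit.
CHANGED_FILES = [
    "README.md",
    "config.json",
    "model.safetensors",
    "pytorch_model.bin",
    "tokenizer.json",
    "vocab.txt",
]


def tracked_by_lfs(filename: str) -> bool:
    """Return True if the file name matches any of the LFS patterns."""
    return any(fnmatchcase(filename, pattern) for pattern in LFS_PATTERNS)


lfs_files = [name for name in CHANGED_FILES if tracked_by_lfs(name)]
print(lfs_files)
```

This agrees with the diff itself: only the two weight files appear as three-line LFS pointers.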

microsoft/git-base/model.safetensors DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:48c6af04ebdcc18bb43c1dfa8eefc606f04fddf8f6e8d649e4b3ad6881ee7d8c
-size 706526022
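
Reviewer note: the three deleted lines are a Git LFS pointer, not the weights themselves. Each line is a key followed by a space and a value. The sketch below parses the pointer shown in this hunk; `parse_lfs_pointer` is a hypothetical helper, not part of any Git LFS tooling.

```python
# Pointer contents copied from the hunk above (without the leading "-").
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:48c6af04ebdcc18bb43c1dfa8eefc606f04fddf8f6e8d649e4b3ad6881ee7d8c
size 706526022
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line, then break the oid into algorithm and digest."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line.strip())
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["hash_algorithm"], pointer["size_bytes"])
print(f"{pointer['size_bytes'] / 2**20:.1f} MiB")
```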

microsoft/git-base/preprocessor_config.json DELETED

@@ -1,28 +0,0 @@
-{
-  "crop_size": {
-    "height": 224,
-    "width": 224
-  },
-  "do_center_crop": true,
-  "do_convert_rgb": true,
-  "do_normalize": true,
-  "do_rescale": true,
-  "do_resize": true,
-  "image_mean": [
-    0.48145466,
-    0.4578275,
-    0.40821073
-  ],
-  "image_processor_type": "CLIPImageProcessor",
-  "image_std": [
-    0.26862954,
-    0.26130258,
-    0.27577711
-  ],
-  "processor_class": "GitProcessor",
-  "resample": 3,
-  "rescale_factor": 0.00392156862745098,
-  "size": {
-    "shortest_edge": 224
-  }
-}
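
Reviewer note: the deleted preprocessor_config.json describes a resize, center crop, rescale, and normalize pipeline. The sketch below reproduces only the arithmetic, with no image library, using the values from the config. The helper names are hypothetical, and truncating the resized long edge with `int()` is an assumption about rounding.

```python
# Values copied from the deleted preprocessor_config.json.
IMAGE_MEAN = [0.48145466, 0.4578275, 0.40821073]
IMAGE_STD = [0.26862954, 0.26130258, 0.27577711]
RESCALE_FACTOR = 0.00392156862745098  # equals 1 / 255
SHORTEST_EDGE = 224
CROP_SIZE = 224


def resized_size(width, height, shortest_edge=SHORTEST_EDGE):
    """Scale so the shorter side equals shortest_edge, keeping the aspect ratio."""
    short, long = (width, height) if width <= height else (height, width)
    new_long = int(shortest_edge * long / short)  # assumption: truncate, not round
    return (shortest_edge, new_long) if width <= height else (new_long, shortest_edge)


def center_crop_box(width, height, crop=CROP_SIZE):
    """Return (left, top, right, bottom) of a centered crop x crop window."""
    left = (width - crop) // 2
    top = (height - crop) // 2
    return left, top, left + crop, top + crop


def normalize_pixel(rgb):
    """Rescale 0..255 channel values to 0..1, then standardize per channel."""
    return [(v * RESCALE_FACTOR - m) / s for v, m, s in zip(rgb, IMAGE_MEAN, IMAGE_STD)]


size = resized_size(640, 480)
box = center_crop_box(*size)
white = normalize_pixel([255, 255, 255])
print(size, box, white)
```

The `"resample": 3` entry corresponds to bicubic interpolation in Pillow's numbering; the sketch does not resample any pixels.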

microsoft/git-base/pytorch_model.bin DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:b504cfb25b73203725e8e1b750288da5653d2fef5e585baf2e7b08f8f2d95ebf
-size 706587995

microsoft/git-base/special_tokens_map.json DELETED

@@ -1,7 +0,0 @@
-{
-  "cls_token": "[CLS]",
-  "mask_token": "[MASK]",
-  "pad_token": "[PAD]",
-  "sep_token": "[SEP]",
-  "unk_token": "[UNK]"
-}

microsoft/git-base/tokenizer.json DELETED

The diff for this file is too large to render. See raw diff

microsoft/git-base/tokenizer_config.json DELETED

@@ -1,19 +0,0 @@
-{
-  "cls_token": "[CLS]",
-  "do_lower_case": true,
-  "mask_token": "[MASK]",
-  "model_input_names": [
-    "input_ids",
-    "attention_mask"
-  ],
-  "model_max_length": 512,
-  "name_or_path": "bert-base-uncased",
-  "pad_token": "[PAD]",
-  "processor_class": "GitProcessor",
-  "sep_token": "[SEP]",
-  "special_tokens_map_file": null,
-  "strip_accents": null,
-  "tokenize_chinese_chars": true,
-  "tokenizer_class": "BertTokenizer",
-  "unk_token": "[UNK]"
-}

microsoft/git-base/vocab.txt DELETED

The diff for this file is too large to render. See raw diff