Seriki committed on
Commit 4fa5c86 · verified · 1 parent: 698506e

Update README.md

Files changed (1):
  1. README.md +234 −2
README.md CHANGED
@@ -1,5 +1,5 @@
  ---
- title: Aura
  emoji: 🏆
  colorFrom: gray
  colorTo: indigo
@@ -9,4 +9,236 @@ app_file: app.py
  pinned: false
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
  ---
+ title: paperweb
  emoji: 🏆
  colorFrom: gray
  colorTo: indigo

  pinned: false
  ---
[fastht.ml](https://fastht.ml) · [Aura_Full_Project.xlsl](https://github.com/Web4application/Aura_Full_Project.xlsl)

`fastht.ml` accesses the local environment and the repository, gathers data, and populates the paperwebht.ml site with docs.

[Include diagrams like this](https://arxiv.org/pdf/2405.01535)

# paperweb documentation

Reference this documentation at [huggingface.co/QUBUHUB/Paperweb](https://github.com/Web4application/Brain).

## [Configure paperwebht.ml](https://huggingface.co/docs/hub/spaces-config-reference)
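The configuration reference above describes the YAML front matter a Space reads from the top of README.md. A minimal sketch for this Space, reusing the values already present in this file (the `sdk` line is an assumption, not something this commit confirms):

```yaml
---
title: paperweb
emoji: 🏆
colorFrom: gray
colorTo: indigo
app_file: app.py   # from the diff's hunk context above
pinned: false
# sdk: gradio      # assumption: set to gradio, streamlit, docker, or static
---
```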

- Audit Logs
- Storage Regions
- Data Studio for Private datasets
- Resource Groups
- Advanced Security
- Tokens Management
- Network Security
- Rate Limits

**Repositories**

- Introduction
- Getting Started
- Repository Settings
- Storage Limits
- Storage Backend (Xet)
- Pull requests and Discussions
- Notifications
- Collections
- Webhooks
- Next Steps
- Licenses

**Models**

- Introduction
- The Model Hub
- Model Cards
- Gated Models
- Uploading Models
- Downloading Models
- Libraries
- Tasks
- Widgets
- Inference API
- Download Stats

**Datasets**

- Introduction
- Datasets Overview
- Dataset Cards
- Gated Datasets
- Uploading Datasets
- Downloading Datasets
- Streaming Datasets
- Editing Datasets
- Libraries
- Dataset Viewer
- Download Stats
- Data files Configuration

**Spaces**

- Introduction
- Spaces Overview
- Gradio Spaces
- Static HTML Spaces
- Docker Spaces
- ZeroGPU Spaces
- Embed your Space
- Run with Docker
- Reference
- Advanced Topics
- Sign in with HF

**Jobs**

- Introduction
- Jobs Overview
- Quickstart
- Pricing
- Manage Jobs
- Jobs Configuration
- Popular images
- Schedule Jobs
- Webhooks Automation
- Reference

**Agents**

- Introduction
- Agents on the Hub
- Hugging Face MCP Server
- Hugging Face Agent Skills
- Agents and the hf CLI
- Building agents with the SDK

**Other**

- Organizations
- Billing
- Security
- Moderation
- Paper Pages
- Search
- Digital Object Identifier (DOI)
- Hub API Endpoints
- Sign in with HF
- Contributor Code of Conduct
- Content Guidelines

## What's the paperweb platform and auraxlsl?

Defined according to [fastht.ml](https://fastht.ml) and [Aura_Full_Project.xlsl](https://github.com/Web4application/Aura_Full_Project.xlsl).

We are helping the community work together towards the goal of advancing Machine Learning 🔥.

The Hugging Face Hub is a platform with over 2M models, 500k datasets, and 1M demos in which people can easily collaborate in their ML workflows. The Hub works as a central place where anyone can share, explore, discover, and experiment with open-source Machine Learning.

No single company, including the Tech Titans, will be able to “solve AI” by themselves – the only way we'll achieve this is by sharing knowledge and resources in a community-centric approach. We are building the largest open-source collection of models, datasets, and demos on the Hugging Face Hub to democratize and advance ML for everyone 🚀.

We encourage you to read the [Code of Conduct](https://huggingface.co/code-of-conduct) and the [Content Guidelines](https://huggingface.co/content-guidelines) to familiarize yourself with the values that we expect our community members to uphold 🤗.

## What can you find on their base?

The paperweb hosts web-based papers and scientific workbooks, which are version-controlled buckets that can contain all your files. 💾

Example: [arxiv.org/pdf/2405.01535](https://arxiv.org/pdf/2405.01535)

On it, you'll be able to upload and discover...

- Models: _hosting the latest state-of-the-art models for LLM, text, vision, and audio tasks_
- Datasets: _featuring a wide variety of data for different domains and modalities_
- Spaces: _interactive apps for demonstrating ML models directly in your browser_

The Hub offers **versioning, commit history, diffs, branches, and over a dozen library integrations**!
All repositories build on [Xet](./xet/index), a new technology to efficiently store large files inside Git, intelligently splitting files into unique chunks and accelerating uploads and downloads.
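The chunk-splitting idea can be illustrated with a toy content-defined chunker: cut points depend on the bytes themselves rather than fixed offsets, so identical content tends to produce identical, deduplicatable chunks. This is only an illustrative sketch, not Xet's actual algorithm; the window size and boundary mask below are arbitrary.

```python
# Toy content-defined chunking sketch (NOT Xet's real chunker):
# roll a cheap hash over the bytes and cut whenever its low bits are
# zero, so boundaries follow content, not fixed offsets.

def chunk(data: bytes, window: int = 16, mask: int = 0x3F) -> list[bytes]:
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) ^ b) & 0xFFFFFFFF        # cheap rolling-style hash
        if i - start + 1 >= window and (h & mask) == 0:
            chunks.append(data[start:i + 1])    # boundary hit: cut here
            start, h = i + 1, 0                 # reset for the next chunk
    if start < len(data):
        chunks.append(data[start:])             # trailing remainder
    return chunks

blob = bytes(range(256)) * 8
parts = chunk(blob)
assert b"".join(parts) == blob                  # chunks reassemble losslessly
```

Because chunking is deterministic, re-uploading unchanged content reproduces the same chunks, which is what makes chunk-level deduplication possible.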

[paperweb](https://huggingface.co/docs/dataset-viewer/analyze_data)

You can learn more about the features that all repositories share in the [**Repositories documentation**](./repositories).

## Models

You can discover and use tens of thousands of open-source ML models shared by the community. To promote responsible model usage and development, model repos are equipped with [Model Cards](./model-cards) to inform users of each model's limitations and biases. Additional [metadata](./model-cards#model-card-metadata) about info such as their tasks, languages, and evaluation results can be included, with training metrics charts even added if the repository contains [TensorBoard traces](./tensorboard). It's also easy to add an [**inference widget**](./models-widgets) to your model, allowing anyone to play with the model directly in the browser! For programmatic access, a serverless API is provided by [**Inference Providers**](./models-inference).

To upload models to the Hub, or download models and integrate them into your work, explore the [**Models documentation**](./models). You can also choose from [**over a dozen libraries**](./models-libraries) such as 🤗 Transformers, Asteroid, and ESPnet that support the Hub.
159
+
160
+ ## Datasets
161
+
162
+ The Hub is home to over 500k public datasets in more than 8k languages that can be used for a broad range of tasks across NLP, Computer Vision, and Audio. The Hub makes it simple to find, download, and upload datasets. Datasets are accompanied by extensive documentation in the form of [**Dataset Cards**](./datasets-cards) and [**Data Studio**](./datasets-viewer) to let you explore the data directly in your browser. While many datasets are public, [**organizations**](./organizations) and individuals can create private datasets to comply with licensing or privacy issues. You can learn more about [**Datasets here on the Hugging Face Hub documentation**](./datasets-overview).
163
+
164
+ The [🤗 `datasets`](https://huggingface.co/docs/datasets/index) library allows you to programmatically interact with the datasets, so you can easily use datasets from the Hub in your projects. With a single line of code, you can access the datasets; even if they are so large they don't fit in your computer, you can use streaming to efficiently access the data.

## Spaces

[Spaces](https://huggingface.co/spaces) is a simple way to host ML demo apps on the Hub. They allow you to build your ML portfolio, showcase your projects at conferences or to stakeholders, and work collaboratively with other people in the ML ecosystem.

We currently support two awesome Python SDKs (**[Gradio](https://gradio.app/)** and **[Streamlit](./spaces-sdks-streamlit)**) that let you build cool apps in a matter of minutes. Users can also create static Spaces, which are simple HTML/CSS/JavaScript pages, or deploy any Docker-based application.

If you need GPU power for your demos, try [**ZeroGPU**](./spaces-zerogpu): it dynamically provides NVIDIA H200 GPUs, in real-time, only when needed.

After you've explored a few Spaces (take a look at our [Space of the Week!](https://huggingface.co/spaces)), dive into the [**Spaces documentation**](./spaces-overview) to learn all about how you can create your own Space. You'll also be able to upgrade your Space to run on a GPU or other accelerated hardware. ⚡️

## Organizations

Companies, universities and non-profits are an essential part of the Hugging Face community! The Hub offers [**Organizations**](./organizations), which can be used to group accounts and manage datasets, models, and Spaces. Educators can also create collaborative organizations for students using [Hugging Face for Classrooms](https://huggingface.co/classrooms). An organization's repositories will be featured on the organization’s page and every member of the organization will have the ability to contribute to the repository. In addition to conveniently grouping all of an organization's work, the Hub allows admins to set roles to [**control access to repositories**](./organizations-security), and manage their organization's [payment method and billing info](https://huggingface.co/pricing). Machine Learning is more fun when collaborating! 🔥

[Explore existing organizations](https://huggingface.co/organizations), create a new organization [here](https://huggingface.co/organizations/new), and then visit the [**Organizations documentation**](./organizations) to learn more.

## Security

The Hugging Face Hub supports security and access control features to give you the peace of mind that your code, models, and data are safe. Visit the [**Security**](./security) section in these docs to learn about:

- User Access Tokens
- Access Control for Organizations
- Signing commits with GPG
- Malware scanning

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

## Analyze a dataset on the Hub

In the Quickstart, you were introduced to various endpoints for interacting with datasets on the Hub. One of the most useful ones is the /parquet endpoint, which allows you to get a dataset stored on the Hub and analyze it. This is a great way to explore the dataset and get a better understanding of its contents.

To demonstrate, this guide will show you an end-to-end example of how to retrieve a dataset from the Hub and do some basic data analysis with the Pandas library.

### Get a dataset

The Hub is home to more than 200,000 datasets across a wide variety of tasks, sizes, and languages. For this example, you'll use the codeparrot/codecomplex dataset, but feel free to explore and find another dataset that interests you! The dataset contains Java code from programming competitions, and the time complexity of the code is labeled by a group of algorithm experts.

Let's say you're interested in the average length of the submitted code as it relates to the time complexity. Here's how you can get started.

Use the /parquet endpoint to convert the dataset to a Parquet file and return the URL to it:

```python
import requests

API_URL = "https://datasets-server.huggingface.co/parquet?dataset=codeparrot/codecomplex"

def query():
    response = requests.get(API_URL)
    return response.json()

data = query()
```

The response contains a list of Parquet files:

```json
{
  "parquet_files": [
    {
      "dataset": "codeparrot/codecomplex",
      "config": "default",
      "split": "train",
      "url": "https://huggingface.co/datasets/codeparrot/codecomplex/resolve/refs%2Fconvert%2Fparquet/default/train/0000.parquet",
      "filename": "0000.parquet",
      "size": 4115908
    }
  ],
  "pending": [],
  "failed": [],
  "partial": false
}
```
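To pull the file URL out of a payload shaped like the one above, index into `parquet_files`. This sketch works against the sample response shown here; a live response may list more files (one per shard and split):

```python
# Extract Parquet file URLs from a /parquet response payload.
# `data` mirrors the sample response shown above.
data = {
    "parquet_files": [
        {
            "dataset": "codeparrot/codecomplex",
            "config": "default",
            "split": "train",
            "url": "https://huggingface.co/datasets/codeparrot/codecomplex/resolve/refs%2Fconvert%2Fparquet/default/train/0000.parquet",
            "filename": "0000.parquet",
            "size": 4115908,
        }
    ],
    "pending": [],
    "failed": [],
    "partial": False,
}

# Keep only the training-split URLs.
train_urls = [f["url"] for f in data["parquet_files"] if f["split"] == "train"]
print(train_urls[0])  # this is the URL passed to pandas in the next step
```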

### Read dataset with Pandas

With the URL, you can read the Parquet file into a Pandas DataFrame:

```python
import pandas as pd

url = "https://huggingface.co/datasets/codeparrot/codecomplex/resolve/refs%2Fconvert%2Fparquet/default/train/0000.parquet"
df = pd.read_parquet(url)
df.head(5)
```

| src | complexity | problem | from |
|-----|------------|---------|------|
| `import java.io.*;\nimport java.math.BigInteger…` | quadratic | 1179_B. Tolik and His Uncle | CODEFORCES |
| `import java.util.Scanner;\n \npublic class pil…` | linear | 1197_B. Pillars | CODEFORCES |
| `import java.io.BufferedReader;\nimport java.io…` | linear | 1059_C. Sequence Transformation | CODEFORCES |
| `import java.util.*;\n\nimport java.io.*;\npubl…` | linear | 1011_A. Stages | CODEFORCES |
| `import java.io.OutputStream;\nimport java.io.I…` | linear | 1190_C. Tokitsukaze and Duel | CODEFORCES |

### Calculate mean code length by time complexity

Pandas is a powerful library for data analysis; group the dataset by time complexity, apply a function to calculate the average length of the code snippet, and plot the results:

```python
df.groupby('complexity')['src'].apply(lambda x: x.str.len().mean()).sort_values(ascending=False).plot.barh(color="orange")
```
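The same groupby pattern can be checked offline on a tiny synthetic frame (hypothetical stand-in data with the same columns, not the real codecomplex rows):

```python
import pandas as pd

# Hypothetical stand-in for the codecomplex frame: same columns, tiny data.
df = pd.DataFrame({
    "src": ["a" * 100, "b" * 300, "c" * 200, "d" * 400],
    "complexity": ["linear", "quadratic", "linear", "quadratic"],
})

# Group by time complexity and average the snippet length, longest first.
mean_len = (
    df.groupby("complexity")["src"]
      .apply(lambda x: x.str.len().mean())
      .sort_values(ascending=False)
)
print(mean_len)
# quadratic rows average 350 characters, linear rows 150
```

Swap in the real DataFrame from `pd.read_parquet(url)` above and the same chain produces the per-complexity means that `.plot.barh()` visualizes.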