[
{
"file": "E://data science tool//GA1//first.py",
"question": "Install and run Visual Studio Code. In your Terminal (or Command Prompt), type code -s and press Enter. Copy and paste the entire output below.\n\nWhat is the output of code -s?"
},
{
"file": "E://data science tool//GA1//second.py",
"question": "Running uv run --with httpie -- https [URL] installs the Python package httpie and sends a HTTPS request to the URL.\n\nSend a HTTPS request to https://httpbin.org/get with the URL encoded parameter email set to 24f2006438@ds.study.iitm.ac.in\n\nWhat is the JSON output of the command? (Paste only the JSON body, not the headers)"
},
{
"file": "E://data science tool//GA1//third.py",
"question": "Let's make sure you know how to use npx and prettier.\n\nDownload README.md. In the directory where you downloaded it, make sure it is called README.md, and run npx -y prettier@3.4.2 README.md | sha256sum.\n\nWhat is the output of the command?"
},
{
"file": "E://data science tool//GA1//fourth.py",
"question": "Let's make sure you can write formulas in Google Sheets. Type this formula into Google Sheets. (It won't work in Excel)\n\n=SUM(ARRAY_CONSTRAIN(SEQUENCE(100, 100, 12, 10), 1, 10))\nWhat is the result?"
},
{
"file": "E://data science tool//GA1//fifth.py",
"question": "'Let's make sure you can write formulas in Excel. Type this formula into Excel.\n\nNote: This will ONLY work in Office 365.\n\n=SUM(TAKE(SORTBY({14,1,2,9,10,12,9,4,3,3,7,2,5,0,3,0}, {10,9,13,2,11,8,16,14,7,15,5,4,6,1,3,12}), 1, 7))\nWhat is the result?"
},
{
"file": "E://data science tool//GA1//sixth.py",
"question": "Just above this paragraph, there's a hidden input with a secret value.\n\nWhat is the value in the hidden input?"
},
{
"file": "E://data science tool//GA1//seventh.py",
"question": "How many Wednesdays are there in the date range 1981-03-03 to 2012-12-30?"
},
{
"file": "E://data science tool//GA1//eighth.py",
"question": "file name is q-extract-csv-zip.zip and unzip file which has a single extract.csv file inside."
},
{
"file": "E://data science tool//GA1//ninth.py",
"question": "Let's make sure you know how to use JSON. Sort this JSON array of objects by the value of the age field. In case of a tie, sort by the name field. Paste the resulting JSON below without any spaces or newlines.\n\n# [{\"name\":\"Alice\",\"age\":0},{\"name\":\"Bob\",\"age\":16},{\"name\":\"Charlie\",\"age\":23},{\"name\":\"David\",\"age\":32},{\"name\":\"Emma\",\"age\":95},{\"name\":\"Frank\",\"age\":25},{\"name\":\"Grace\",\"age\":36},{\"name\":\"Henry\",\"age\":71},{\"name\":\"Ivy\",\"age\":15},{\"name\":\"Jack\",\"age\":55},{\"name\":\"Karen\",\"age\":9},{\"name\":\"Liam\",\"age\":53},{\"name\":\"Mary\",\"age\":43},{\"name\":\"Nora\",\"age\":11},{\"name\":\"Oscar\",\"age\":40},{\"name\":\"Paul\",\"age\":73}]"
},
{
"file": "E://data science tool//GA1//tenth.py",
"question": "+Download q-mutli-cursor-json.txt and use multi-cursors and convert it into a single JSON object, where key=value pairs are converted into {key: value, key: value, ...}.\nWhat's the result when you paste the JSON at tools-in-data-science.pages.dev/jsonhash and click the Hash button?"
},
{
"file": "E://data science tool//GA1//eleventh.py",
"question": "Let's make sure you know how to select elements using CSS selectors. Find all <div>s having a foo class in the hidden element below. What's the sum of their data-value attributes?\n\nSum of data-value attributes:"
},
{
"file": "E://data science tool//GA1//twelfth.py",
"question": "Download q-unicode-data.zip and process the files in which contains three files with different encodings:\n\ndata1.csv: CSV file encoded in CP-1252\ndata2.csv: CSV file encoded in UTF-8\ndata3.txt: Tab-separated file encoded in UTF-16\nEach file has 2 columns: symbol and value. Sum up all the values where the symbol matches \u0153 OR \u017d OR \u0178 across all three files.\n\nWhat is the sum of all values associated with these symbols?"
},
{
"file": "E://data science tool//GA1//thirteenth.py",
"question": "Let's make sure you know how to use GitHub. Create a GitHub account if you don't have one. Create a new public repository. Commit a single JSON file called email.json with the value {\"email\": \"24f2006438@ds.study.iitm.ac.in\"} and push it.\n\nEnter the raw Github URL of email.json so we can verify it. (It might look like https://raw.githubusercontent.com/[GITHUB ID]/[REPO NAME]/main/email.json.)"
},
{
"file": "E://data science tool//GA1//fourteenth.py",
"question": "Download q-replace-across-files.zip and unzip it into a new folder, then replace all \"IITM\" (in upper, lower, or mixed case) with \"IIT Madras\" in all files. Leave everything as-is - don't change the line endings.\n\nWhat does running cat * | sha256sum in that folder show in bash?"
},
{
"file": "E://data science tool//GA1//fifteenth.py",
"question": "Download and extract it. Use ls with options to list all files in the folder along with their date and file size.\n\nWhat's the total size of all files at least 4675 bytes large and modified on or after Sun, 31 Oct, 2010, 9:43 am IST?"
},
{
"file": "E://data science tool//GA1//sixteenth.py",
"question": "Download and extract it. Use mv to move all files under folders into an empty folder. Then rename all files replacing each digit with the next. 1 becomes 2, 9 becomes 0, a1b9c.txt becomes a2b0c.txt.\n\nWhat does running grep . * | LC_ALL=C sort | sha256sum in bash on that folder show?"
},
{
"file": "E://data science tool//GA1//seventeenth.py",
"question": "Download q-compare-files.zip and extract it. It has 2 nearly identical files, a.txt and b.txt, with the same number of lines."
},
{
"file": "E://data science tool//GA1//eighteenth.py",
"question": "There is a tickets table in a SQLite database that has columns type, units, and price. Each row is a customer bid for a concert ticket.\n\ntype\tunits\tprice\nbronze\t297\t0.6\nBronze\t673\t1.62\nSilver\t105\t1.26\nSilver\t82\t0.79\nSILVER\t121\t0.84\n...\nWhat is the total sales of all the items in the \"Gold\" ticket type? Write SQL to calculate it."
},
{
"file": "E://data science tool//GA2//first.py",
"question": "Write documentation in Markdown for an **imaginary** analysis of the number of steps you walked each day for a week, comparing over time and with friends. The Markdown must include:\n\nTop-Level Heading: At least 1 heading at level 1, e.g., # Introduction\nSubheadings: At least 1 heading at level 2, e.g., ## Methodology\nBold Text: At least 1 instance of bold text, e.g., **important**\nItalic Text: At least 1 instance of italic text, e.g., *note*\nInline Code: At least 1 instance of inline code, e.g., sample_code\nCode Block: At least 1 instance of a fenced code block, e.g.\n\nprint(\"Hello World\")\nBulleted List: At least 1 instance of a bulleted list, e.g., - Item\nNumbered List: At least 1 instance of a numbered list, e.g., 1. Step One\nTable: At least 1 instance of a table, e.g., | Column A | Column B |\nHyperlink: At least 1 instance of a hyperlink, e.g., [Text](https://example.com)\nImage: At least 1 instance of an image, e.g., \nBlockquote: At least 1 instance of a blockquote, e.g., > This is a quote"
},
{
"file": "E://data science tool//GA2//second.py",
"question": "Download the image below and compress it losslessly to an image that is less than 1,500 bytes.\nvicky.png\nBy losslessly, we mean that every pixel in the new image should be identical to the original image.\n\nUpload your losslessly compressed image (less than 1,500 bytes)"
},
{
"file": "E://data science tool//GA2//third.py",
"question": "Publish a page using GitHub Pages that showcases your work. Ensure that your email address 24f2006438@ds.study.iitm.ac.in is in the page's HTML.\n\nGitHub pages are served via CloudFlare which obfuscates emails. So, wrap your email address inside a:\n\n<!--email_off-->24f2006438@ds.study.iitm.ac.in<!--/email_off-->\nWhat is the GitHub Pages URL? It might look like: https://[USER].github.io/[REPO]/"
},
{
"file": "E://data science tool//GA2//fourth.py",
"question": "Let's make sure you can access Google Colab. Run this program on Google Colab, allowing all required access to your email ID: 24f2006438@ds.study.iitm.ac.in.\n\nimport hashlib\nimport requests\nfrom google.colab import auth\nfrom oauth2client.client import GoogleCredentials\n\nauth.authenticate_user()\ncreds = GoogleCredentials.get_application_default()\ntoken = creds.get_access_token().access_token\nresponse = requests.get(\n \"https://www.googleapis.com/oauth2/v1/userinfo\",\n params={\"alt\": \"json\"},\n headers={\"Authorization\": f\"Bearer {token}\"}\n)\nemail = response.json()[\"email\"]\nhashlib.sha256(f\"{email} {creds.token_expiry.year}\".encode()).hexdigest()[-5:]\nWhat is the result? (It should be a 5-character string)"
},
{
"file": "E://data science tool//GA2//fifth.py",
"question": "Download 'E:\\data science tool\\GA2\\lenna.webp' this image. Create a new Google Colab notebook and run this code (after fixing a mistake in it) to calculate the number of pixels with a certain minimum brightness:\n\nimport numpy as np\nfrom PIL import Image\nfrom google.colab import files\nimport colorsys\n\n# There is a mistake in the line below. Fix it\nimage = Image.open(list(files.upload().keys)[0])\n\nrgb = np.array(image) / 255.0\nlightness = np.apply_along_axis(lambda x: colorsys.rgb_to_hls(*x)[1], 2, rgb)\nlight_pixels = np.sum(lightness > 0.718)\nprint(f'Number of pixels with lightness > 0.718: {light_pixels}')\nWhat is the result? (It should be a number)"
},
{
"file": "E://data science tool//GA2//sixth.py",
"question": "Download this ql-python.json which has the marks of 100 imaginary students.\n\nCreate and deploy a Python app to Vercel. Expose an API so that when a request like https://your-app.vercel.app/api?name=X&name=Y is made, it returns a JSON response with the marks of the names X and Y in the same order, like this:\n\n{ \"marks\": [10, 20] }\nMake sure you enable CORS to allow GET requests from any origin.\n\nWhat is the Vercel URL? It should look like: https://your-app.vercel.app/api"
},
{
"file": "E://data science tool//GA2//seventh.py",
"question": "Create a GitHub action on one of your GitHub repositories. Make sure one of the steps in the action has a name that contains your email address 24f2006438@ds.study.iitm.ac.in. For example:\n\n\njobs:\n test:\n steps:\n - name: 24f2006438@ds.study.iitm.ac.in\n run: echo \"Hello, world!\"\n \nTrigger the action and make sure it is the most recent action.\n\nWhat is your repository URL? It will look like: https://github.com/USER/REPO"
},
{
"file": "E://data science tool//GA2//eighth.py",
"question": "Create and push an image to Docker Hub. Add a tag named 24f2006438 to the image.\n\nWhat is the Docker image URL? It should look like: https://hub.docker.com/repository/docker/$USER/$REPO/general"
},
{
"file": "E://data science tool//GA2//ninth.py",
"question": "Download q-fastapi.csv. This file has 2-columns:\n\nstudentId: A unique identifier for each student, e.g. 1, 2, 3, ...\nclass: The class (including section) of the student, e.g. 1A, 1B, ... 12A, 12B, ... 12Z\nWrite a FastAPI server that serves this data. For example, /api should return all students data (in the same row and column order as the CSV file) as a JSON like this:\n\n{\n \"students\": [\n {\n \"studentId\": 1,\n \"class\": \"1A\"\n },\n {\n \"studentId\": 2,\n \"class\": \"1B\"\n }, ...\n ]\n}\nIf the URL has a query parameter class, it should return only students in those classes. For example, /api?class=1A should return only students in class 1A. /api?class=1A&class=1B should return only students in class 1A and 1B. There may be any number of classes specified. Return students in the same order as they appear in the CSV file (not the order of the classes).\n\nMake sure you enable CORS to allow GET requests from any origin.\n\nWhat is the API URL endpoint for FastAPI? It might look like: http://127.0.0.1:8000/api"
},
{
"file": "E://data science tool//GA2//tenth.py",
"question": "Download Llamafile. Run the Llama-3.2-1B-Instruct.Q6_K.llamafile model with it.\n\nCreate a tunnel to the Llamafile server using ngrok.\n\nWhat is the ngrok URL? It might look like: https://[random].ngrok-free.app/"
},
{
"file": "E://data science tool//GA3//first.py",
"question": "Write a Python program that uses httpx to send a POST request to OpenAI's API to analyze the sentiment of this (meaningless) text into GOOD, BAD or NEUTRAL. Specifically:\n\nMake sure you pass an Authorization header with dummy API key.\nUse gpt-4o-mini as the model.\nThe first message must be a system message asking the LLM to analyze the sentiment of the text. Make sure you mention GOOD, BAD, or NEUTRAL as the categories.\nThe second message must be exactly the text contained above.\nThis test is crucial for DataSentinel Inc. as it validates both the API integration and the correctness of message formatting in a controlled environment. Once verified, the same mechanism will be used to process genuine customer feedback, ensuring that the sentiment analysis module reliably categorizes data as GOOD, BAD, or NEUTRAL. This reliability is essential for maintaining high operational standards and swift response times in real-world applications.\n\nNote: This uses a dummy httpx library, not the real one. You can only use:\n\nresponse = httpx.get(url, **kwargs)\nresponse = httpx.post(url, json=None, **kwargs)\nresponse.raise_for_status()\nresponse.json()\nCode"
},
{
"file": "E://data science tool//GA3//second.py",
"question": "LexiSolve Inc. is a startup that delivers a conversational AI platform to enterprise clients. The system leverages OpenAI\u2019s language models to power a variety of customer service, sentiment analysis, and data extraction features. Because pricing for these models is based on the number of tokens processed\u2014and strict token limits apply\u2014accurate token accounting is critical for managing costs and ensuring system stability.\n\nTo optimize operational costs and prevent unexpected API overages, the engineering team at LexiSolve has developed an internal diagnostic tool that simulates and measures token usage for typical prompts sent to the language model.\n\nOne specific test case an understanding of text tokenization. Your task is to generate data for that test case.\n\nSpecifically, when you make a request to OpenAI's GPT-4o-Mini with just this user message:\n\n\nList only the valid English words from these: 67llI, W56, 857xUSfYl, wnYpo5, 6LsYLB, c, TkAW, mlsmBx, 9MrIPTn4vj, BF2gKyz3, 6zE, lC6j, peoq, cj4, pgYVG, 2EPp, yXnG9jVa5, glUMfxVUV, pyF4if, WlxxTdMs9A, CF5Sr, A0hkI, 3ldO4One, rx, J78ThyyGD, w2JP, 1Xt, OQKOXlQsA, d9zdH, IrJUGta, hfbG3, 45w, vnAlhZ, CKWsdaifG, OIwf1FHxPD, Z7ugFzvZ, r504, BbWREDk, FLe2, decONFmc, DJ31Bku, CQ, OMr, I4ZYVo1eR, OHgG, cwpP4euE3t, 721Ftz69, H, m8, ROilvXH7Ku, N7vjgD, bZplYIAY, wcnE, Gl, cUbAg, 6v, VMVCho, 6yZDX8U, oZeZgWQ, D0nV8WoCL, mTOzo7h, JolBEfg, uw43axlZGT, nS3, wPZ8, JY9L4UCf8r, bp52PyX, Pf\n... how many input tokens does it use up?\n\nNumber of tokens:"
},
{
"file": "E://data science tool//GA3//third.py",
"question": "RapidRoute Solutions is a logistics and delivery company that relies on accurate and standardized address data to optimize package routing. Recently, they encountered challenges with manually collecting and verifying new addresses for testing their planning software. To overcome this, the company decided to create an automated address generator using a language model, which would provide realistic, standardized U.S. addresses that could be directly integrated into their system.\n\nThe engineering team at RapidRoute is tasked with designing a service that uses OpenAI's GPT-4o-Mini model to generate fake but plausible address data. The addresses must follow a strict format, which is critical for downstream processes such as geocoding, routing, and verification against customer databases. For consistency and validation, the development team requires that the addresses be returned as structured JSON data with no additional properties that could confuse their parsers.\n\nAs part of the integration process, you need to write the body of the request to an OpenAI chat completion call that:\n\nUses model gpt-4o-mini\nHas a system message: Respond in JSON\nHas a user message: Generate 10 random addresses in the US\nUses structured outputs to respond with an object addresses which is an array of objects with required fields: zip (number) state (string) latitude (number) .\nSets additionalProperties to false to prevent additional properties.\nNote that you don't need to run the request or use an API key; your task is simply to write the correct JSON body.\n\nWhat is the JSON body we should send to https://api.openai.com/v1/chat/completions for this? (No need to run it or to use an API key. Just write the body of the request below.)"
},
{
"file": "E://data science tool//GA3//fourth.py",
"question": "Write just the JSON body (not the URL, nor headers) for the POST request that sends these two pieces of content (text and image URL) to the OpenAI API endpoint.\n\nUse gpt-4o-mini as the model.\nSend a single user message to the model that has a text and an image_url content (in that order).\nThe text content should be Extract text from this image.\nSend the image_url as a base64 URL of the image above. CAREFUL: Do not modify the image.\nWrite your JSON body here:"
},
{
"file": "E://data science tool//GA3//fifth.py",
"question": "SecurePay, a leading fintech startup, has implemented an innovative feature to detect and prevent fraudulent activities in real time. As part of its security suite, the system analyzes personalized transaction messages by converting them into embeddings. These embeddings are compared against known patterns of legitimate and fraudulent messages to flag unusual activity.\n\nImagine you are working on the SecurePay team as a junior developer tasked with integrating the text embeddings feature into the fraud detection module. When a user initiates a transaction, the system sends a personalized verification message to the user's registered email address. This message includes the user's email address and a unique transaction code (a randomly generated number). Here are 2 verification messages:\n\nDear user, please verify your transaction code 36352 sent to 24f2006438@ds.study.iitm.ac.in\nDear user, please verify your transaction code 61536 sent to 24f2006438@ds.study.iitm.ac.in\nThe goal is to capture this message, convert it into a meaningful embedding using OpenAI's text-embedding-3-small model, and subsequently use the embedding in a machine learning model to detect anomalies.\n\nYour task is to write the JSON body for a POST request that will be sent to the OpenAI API endpoint to obtain the text embedding for the 2 given personalized transaction verification messages above. This will be sent to the endpoint https://api.openai.com/v1/embeddings.\n\nWrite your JSON body here:"
},
{
"file": "E://data science tool//GA3//sixth.py",
"question": "ShopSmart is an online retail platform that places a high value on customer feedback. Each month, the company receives hundreds of comments from shoppers regarding product quality, delivery speed, customer service, and more. To automatically understand and cluster this feedback, ShopSmart's data science team uses text embeddings to capture the semantic meaning behind each comment.\n\nAs part of a pilot project, ShopSmart has curated a collection of 25 feedback phrases that represent a variety of customer sentiments. Examples of these phrases include comments like \u201cFast shipping and great service,\u201d \u201cProduct quality could be improved,\u201d \u201cExcellent packaging,\u201d and so on. Due to limited processing capacity during initial testing, you have been tasked with determine which pair(s) of 5 of these phrases are most similar to each other. This similarity analysis will help in grouping similar feedback to enhance the company\u2019s understanding of recurring customer issues.\n\nShopSmart has written a Python program that has the 5 phrases and their embeddings as an array of floats. It looks like this:\n\nembeddings = {\"Fast shipping and great service.\":[-0.1079404279589653,0.020684150978922844,-0.30074435472488403,0.11729881167411804,0.13952496647834778,-0.018052106723189354,-0.21843314170837402,0.13527116179466248,-0.09257353842258453,-0.09384968131780624,0.11293865740299225,-0.03900212049484253,-0.059287477284669876,-0.1008152961730957,-0.019155437126755714,-0.007078605704009533,-0.02967032417654991,0.03711449354887009,-0.18302017450332642,0.20056714117527008,0.09076566994190216,0.02584189549088478,0.0943814069032669,-0.03799184039235115,-0.25246360898017883,-0.1235731765627861,0.028952494263648987,-0.309251993894577,0.021375395357608795,-0.22204887866973877,0.2159872055053711,-0.11921302229166031,0.21928390860557556,-0.11432114243507385,0.017453914508223534,0.10065577924251556,-0.04200637340545654,0.17493793368339539,0.1322934925556183,0.17025874555110931,-0.15271177887916565,0.004682514350861311,0.2531017065048218,0.11580997705459595,0.014688937924802303,-0.11176885664463043,-0.292662113904953,-0.0397731214761734,0.13729171454906464,0.027570005506277084],\"I found it hard to navigate the website.\":[0.05301663279533386,-0.21206653118133545,-0.3240986168384552,-0.03143302723765373,0.12086819857358932,-0.12435400485992432,-0.1547534465789795,-0.07344505935907364,-0.16026587784290314,0.12265162914991379,-0.12467826157808304,-0.12411080300807953,-0.04150537773966789,0.026143522933125496,0.12581317126750946,0.0643252283334732,-0.0636361762881279,-0.08297022432088852,-0.2712441384792328,0.0668787807226181,0.23184643685817719,-0.03439190611243248,0.02334677428007126,0.07883589714765549,-0.07770098745822906,0.026042193174362183,-0.007098270580172539,0.09103620797395706,0.17801915109157562,0.051192667335271835,0.051760122179985046,-0.17737063765525818,0.16164399683475494,0.016608230769634247,-0.06947287172079086,-0.20606771111488342,0.13554099202156067,0.22228075563907623,0.19893397390842438,0.0876314714550972,0.03603347763419151,0.3054536283016205,0.34631049633026123,0.008765174075961113,-0.053057167679071426,0.09346816688776016,-0.18855763971805573,-0.05759681761264801,-0.03198021650314331,0.061325814574956894],\"The item arrived 
damaged.\":[0.04743589088320732,0.3924431800842285,-0.19287808239459991,0.0009346450679004192,-0.02529826946556568,0.007183298002928495,-0.12663501501083374,-0.1648762822151184,-0.09184173494577408,0.021719681099057198,-0.016338737681508064,0.1440839022397995,0.015228591859340668,-0.13091887533664703,-0.027949560433626175,0.14481529593467712,0.1035439744591713,-0.026539022102952003,-0.29924315214157104,0.04913375899195671,0.01723991520702839,0.14533771574497223,0.036674004048109055,-0.19653503596782684,-0.05490652099251747,-0.04375281557440758,0.25682249665260315,-0.1878628432750702,0.11273860186338425,0.08703545480966568,0.229447603225708,-0.07084038108587265,0.25891217589378357,-0.030300457030534744,0.018637394532561302,0.19883368909358978,-0.0997825413942337,0.2977803647518158,0.005384208634495735,0.03330438211560249,-0.07449733465909958,-0.022646980360150337,-0.07622132450342178,0.25598663091659546,-0.10782783478498459,0.12287358194589615,-0.02471054531633854,0.16644354164600372,-0.05433185398578644,-0.04077501222491264],\"Product quality could be improved.\":[0.02994030900299549,0.0700574517250061,-0.09608972817659378,0.0757998675107956,0.05681799724698067,-0.12199439853429794,0.1026616021990776,0.34097179770469666,0.10221496969461441,-0.022985607385635376,0.00909215584397316,-0.12154776602983475,-0.33331525325775146,-0.03502872586250305,0.09934376925230026,-0.07471518963575363,0.232376366853714,-0.1896272748708725,-0.17048589885234833,0.0928356945514679,0.21285215020179749,0.060550566762685776,0.17584548890590668,0.05365967005491257,0.0439932718873024,0.0900282934308052,0.18656465411186218,-0.18146029114723206,-0.006986604072153568,-0.11421024054288864,0.14624014496803284,-0.19919796288013458,0.14802667498588562,-0.062432803213596344,-0.26695844531059265,0.0347416065633297,0.3560296893119812,0.1255674511194229,0.022554926574230194,-0.060359153896570206,-0.0147787407040596,0.09608972817659378,0.043897565454244614,0.11484828591346741,0.15619367361068726,-0.04826818034052849,0.020592935383319855,-0.09813147783279419,0.06405982375144958,-0.08907122164964676],\"Great selection, but the size options were limited.\":[0.11335355788469315,-0.06627686321735382,-0.05730358883738518,-0.1772221475839615,-0.190682053565979,-0.14000946283340454,-0.03737764060497284,0.0863017737865448,-0.22301223874092102,0.06462736427783966,-0.09197605401277542,-0.31960687041282654,-0.15175388753414154,0.0831347405910492,0.049550943076610565,0.012775368057191372,0.0678933709859848,-0.05585202947258949,-0.21390700340270996,0.144364133477211,0.024148661643266678,0.023455873131752014,0.00280002411454916,-0.10734938085079193,0.09131625294685364,-0.033814724534749985,-0.006305208895355463,0.012156805954873562,0.2611486613750458,0.13492900133132935,0.015051675960421562,-0.15597660839557648,-0.06363766640424728,-0.26695486903190613,-0.37318259477615356,0.018375417217612267,0.1467394083738327,0.13473105430603027,0.1976759284734726,0.14555177092552185,0.13235577940940857,-0.006663974840193987,0.15043428540229797,0.08029760420322418,0.20229452848434448,0.0745573416352272,-0.00456498796120286,-0.08656569570302963,-0.25006401538848877,-0.022977517917752266]}\nYour task is to write a Python function most_similar(embeddings) that will calculate the cosine similarity between each pair of these embeddings and return the pair that has the highest similarity. The result should be a tuple of the two phrases that are most similar.\n\nWrite your Python code here:"
},
{
"file": "E://data science tool//GA3//seventh.py",
"question": "InfoCore Solutions is a technology consulting firm that maintains an extensive internal knowledge base of technical documents, project reports, and case studies. Employees frequently search through these documents to answer client questions quickly or gain insights for ongoing projects. However, due to the sheer volume of documentation, traditional keyword-based search often returns too many irrelevant results.\n\nTo address this issue, InfoCore's data science team decides to integrate a semantic search feature into their internal portal. This feature uses text embeddings to capture the contextual meaning of both the documents and the user's query. The documents are pre-embedded, and when an employee submits a search query, the system computes the similarity between the query's embedding and those of the documents. The API then returns a ranked list of document identifiers based on similarity.\n\nImagine you are an engineer on the InfoCore team. Your task is to build a FastAPI POST endpoint that accepts an array of docs and query string via a JSON body. The endpoint is structured as follows:\n\nPOST /similarity\n\n{\n \"docs\": [\"Contents of document 1\", \"Contents of document 2\", \"Contents of document 3\", ...],\n \"query\": \"Your query string\"\n}\nService Flow:\n\nRequest Payload: The client sends a POST request with a JSON body containing:\ndocs: An array of document texts from the internal knowledge base.\nquery: A string representing the user's search query.\nEmbedding Generation: For each document in the docs array and for the query string, the API computes a text embedding using text-embedding-3-small.\nSimilarity Computation: The API then calculates the cosine similarity between the query embedding and each document embedding. This allows the service to determine which documents best match the intent of the query.\nResponse Structure: After ranking the documents by their similarity scores, the API returns the identifiers (or positions) of the three most similar documents. The JSON response might look like this:\n\n{\n \"matches\": [\"Contents of document 3\", \"Contents of document 1\", \"Contents of document 2\"]\n}\nHere, \"Contents of document 3\" is considered the closest match, followed by \"Contents of document 1\", then \"Contents of document 2\".\n\nMake sure you enable CORS to allow OPTIONS and POST methods, perhaps allowing all origins and headers.\n\nWhat is the API URL endpoint for your implementation? It might look like: http://127.0.0.1:8000/similarity\nWe'll check by sending a POST request to this URL with a JSON body containing random docs and query."
},
{
"file": "E://data science tool//GA3//eighth.py",
"question": "Develop a FastAPI application that:\n\nExposes a GET endpoint /execute?q=... where the query parameter q contains one of the pre-templatized questions.\nAnalyzes the q parameter to identify which function should be called.\nExtracts the parameters from the question text.\nReturns a response in the following JSON format:\n\n{ \"name\": \"function_name\", \"arguments\": \"{ ...JSON encoded parameters... }\" }\nFor example, the query \"What is the status of ticket 83742?\" should return:\n\n{\n \"name\": \"get_ticket_status\",\n \"arguments\": \"{\"ticket_id\": 83742}\"\n}\nMake sure you enable CORS to allow GET requests from any origin.\n\nWhat is the API URL endpoint for your implementation? It might look like: http://127.0.0.1:8000/execute"
},
{
"file": "E://data science tool//GA3//ninth.py",
"question": "Prompt to Yes"
},
{
"file": "E://data science tool//GA4//first.py",
"question": "ESPN Cricinfo has ODI batting stats for each batsman. The result is paginated across multiple pages. Count the number of ducks in page number 22.\n\nUnderstanding the Data Source: ESPN Cricinfo's ODI batting statistics are spread across multiple pages, each containing a table of player data. Go to page number 22.\nSetting Up Google Sheets: Utilize Google Sheets' IMPORTHTML function to import table data from the URL for page number 22.\nData Extraction and Analysis: Pull the relevant table from the assigned page into Google Sheets. Locate the column that represents the number of ducks for each player. (It is titled \"0\".) Sum the values in the \"0\" column to determine the total number of ducks on that page.\nImpact\nBy automating the extraction and analysis of cricket batting statistics, CricketPro Insights can:\n\nEnhance Analytical Efficiency: Reduce the time and effort required to manually gather and process player performance data.\nProvide Timely Insights: Deliver up-to-date statistical analyses that aid teams and coaches in making informed decisions.\nScalability: Easily handle large volumes of data across multiple pages, ensuring comprehensive coverage of player performances.\nData-Driven Strategies: Enable the development of data-driven strategies for player selection, training focus areas, and game planning.\nClient Satisfaction: Improve service offerings by providing accurate and insightful analytics that meet the specific needs of clients in the cricketing world.\nWhat is the total number of ducks across players on page number 22 of ESPN Cricinfo's ODI batting stats?"
},
{
"file": "E://data science tool//GA4//second.py",
"question": "Source: Utilize IMDb's advanced web search at https://www.imdb.com/search/title/ to access movie data.\nFilter: Filter all titles with a rating between 5 and 7.\nFormat: For up to the first 25 titles, extract the necessary details: ID, title, year, and rating. The ID of the movie is the part of the URL after tt in the href attribute. For example, tt10078772. Organize the data into a JSON structure as follows:\n\n[\n { \"id\": \"tt1234567\", \"title\": \"Movie 1\", \"year\": \"2021\", \"rating\": \"5.8\" },\n { \"id\": \"tt7654321\", \"title\": \"Movie 2\", \"year\": \"2019\", \"rating\": \"6.2\" },\n // ... more titles\n]\nSubmit: Submit the JSON data in the text box below.\nImpact\nBy completing this assignment, you'll simulate a key component of a streaming service's content acquisition strategy. Your work will enable StreamFlix to make informed decisions about which titles to license, ensuring that their catalog remains both diverse and aligned with subscriber preferences. This, in turn, contributes to improved customer satisfaction and retention, driving the company's growth and success in a competitive market.\n\nWhat is the JSON data?"
},
{
"file": "E://data science tool//GA4//third.py",
"question": "Write a web application that exposes an API with a single query parameter: ?country=. It should fetch the Wikipedia page of the country, extracts all headings (H1 to H6), and create a Markdown outline for the country. The outline should look like this:\n\n\n## Contents\n\n# Vanuatu\n\n## Etymology\n\n## History\n\n### Prehistory\n\n...\nAPI Development: Choose any web framework (e.g., FastAPI) to develop the web application. Create an API endpoint (e.g., /api/outline) that accepts a country query parameter.\nFetching Wikipedia Content: Find out the Wikipedia URL of the country and fetch the page's HTML.\nExtracting Headings: Use an HTML parsing library (e.g., BeautifulSoup, lxml) to parse the fetched Wikipedia page. Extract all headings (H1 to H6) from the page, maintaining order.\nGenerating Markdown Outline: Convert the extracted headings into a Markdown-formatted outline. Headings should begin with #.\nEnabling CORS: Configure the web application to include appropriate CORS headers, allowing GET requests from any origin.\nWhat is the URL of your API endpoint?\nWe'll check by sending a request to this URL with ?country=... passing different countries."
},
{
"file": "E://data science tool//GA4//fourth.py",
"question": "As part of this initiative, you are tasked with developing a system that automates the following:\n\nAPI Integration and Data Retrieval: Use the BBC Weather API to fetch the weather forecast for Kathmandu. Send a GET request to the locator service to obtain the city's locationId. Include necessary query parameters such as API key, locale, filters, and search term (city).\nWeather Data Extraction: Retrieve the weather forecast data using the obtained locationId. Send a GET request to the weather broker API endpoint with the locationId.\nData Transformation: Extract the localDate and enhancedWeatherDescription from each day's forecast. Iterate through the forecasts array in the API response and map each localDate to its corresponding enhancedWeatherDescription. Create a JSON object where each key is the localDate and the value is the enhancedWeatherDescription.\nThe output would look like this:\n\n{\n \"2025-01-01\": \"Sunny with scattered clouds\",\n \"2025-01-02\": \"Partly cloudy with a chance of rain\",\n \"2025-01-03\": \"Overcast skies\",\n // ... additional days\n}\nWhat is the JSON weather forecast description for Kathmandu?"
},
{
"file": "E://data science tool//GA4//fifth.py",
"question": "By automating the extraction and processing of bounding box data, UrbanRide can:\n\nOptimize Routing: Enhance route planning algorithms with precise geographical boundaries, reducing delivery times and operational costs.\nImprove Fleet Allocation: Allocate vehicles more effectively across defined service zones based on accurate city extents.\nEnhance Market Analysis: Gain deeper insights into regional performance, enabling targeted marketing and service improvements.\nScale Operations: Seamlessly integrate new cities into their service network with minimal manual intervention, ensuring consistent data quality.\nWhat is the minimum latitude of the bounding box of the city Bangalore in the country India on the Nominatim API? Value of the minimum latitude"
},
{
"file": "E://data science tool//GA4//sixth.py",
"question": "Search using the Hacker News RSS API for the latest Hacker News post mentioning Text Editor and having a minimum of 77 points. What is the link that it points to?\n\nAutomate Data Retrieval: Utilize the HNRSS API to fetch the latest Hacker News posts. Use the URL relevant to fetching the latest posts, searching for topics and filtering by a minimum number of points.\nExtract and Present Data: Extract the most recent <item> from this result. Get the <link> tag inside it.\nShare the result: Type in just the URL in the answer.\nWhat is the link to the latest Hacker News post mentioning Text Editor having at least 77 points?"
},
{
"file": "E://data science tool//GA4//seventh.py",
"question": "By automating this data retrieval and filtering process, CodeConnect gains several strategic advantages:\n\nTargeted Recruitment: Quickly identify new, promising talent in key regions, allowing for more focused and timely recruitment campaigns.\nCompetitive Intelligence: Stay updated on emerging trends within local developer communities and adjust talent acquisition strategies accordingly.\nEfficiency: Automating repetitive data collection tasks frees up time for recruiters to focus on engagement and relationship-building.\nData-Driven Decisions: Leverage standardized and reliable data to support strategic business decisions in recruitment and market research.\nEnter the date (ISO 8601, e.g. \"2024-01-01T00:00:00Z\") when the newest user joined GitHub.\nSearch using location: and followers: filters, sort by joined descending, fetch the first url, and enter the created_at field. Ignore ultra-new users who JUST joined, i.e. after 3/25/2025, 6:58:39 PM."
},
{
"file": "E://data science tool//GA4//eighth.py",
"question": "Create a scheduled GitHub action that runs daily and adds a commit to your repository. The workflow should:\n\nUse schedule with cron syntax to run once per day (must use specific hours/minutes, not wildcards)\nInclude a step with your email 24f2006438@ds.study.iitm.ac.in in its name\nCreate a commit in each run\nBe located in .github/workflows/ directory\nAfter creating the workflow:\n\nTrigger the workflow and wait for it to complete\nEnsure it appears as the most recent action in your repository\nVerify that it creates a commit during or within 5 minutes of the workflow run\nEnter your repository URL (format: https://github.com/USER/REPO):"
},
{
"file": "E://data science tool//GA4//ninth.py",
"question": "This file '''E://data science tool//GA4/q-extract-tables-from-pdf.pdf''', contains a table of student marks in Maths, Physics, English, Economics, and Biology.\n\nCalculate the total Physics marks of students who scored 69 or more marks in Maths in groups 1-25 (including both groups).\n\nData Extraction:: Retrieve the PDF file containing the student marks table and use PDF parsing libraries (e.g., Tabula, Camelot, or PyPDF2) to accurately extract the table data into a workable format (e.g., CSV, Excel, or a DataFrame).\nData Cleaning and Preparation: Convert marks to numerical data types to facilitate accurate calculations.\nData Filtering: Identify students who have scored marks between 69 and Maths in groups 1-25 (including both groups).\nCalculation: Sum the marks of the filtered students to obtain the total marks for this specific cohort.\nBy automating the extraction and analysis of student marks, EduAnalytics empowers Greenwood High School to make informed decisions swiftly. This capability enables the school to:\n\nIdentify Performance Trends: Quickly spot areas where students excel or need additional support.\nAllocate Resources Effectively: Direct teaching resources and interventions to groups and subjects that require attention.\nEnhance Reporting Efficiency: Reduce the time and effort spent on manual data processing, allowing educators to focus more on teaching and student engagement.\nSupport Data-Driven Strategies: Use accurate and timely data to shape educational strategies and improve overall student outcomes.\nWhat is the total Physics marks of students who scored 69 or more marks in Maths in groups 1-25 (including both groups)?"
},
{
"file": "E://data science tool//GA4//tenth.py",
"question": "As part of the Documentation Transformation Project, you are a junior developer at EduDocs tasked with developing a streamlined workflow for converting PDF files to Markdown and ensuring their consistent formatting. This project is critical for supporting EduDocs' commitment to delivering high-quality, accessible educational resources to its clients.\n\n q-pdf-to-markdown.pdf has the contents of a sample document.\n\nConvert the PDF to Markdown: Extract the content from the PDF file. Accurately convert the extracted content into Markdown format, preserving the structure and formatting as much as possible.\nFormat the Markdown: Use Prettier version 3.4.2 to format the converted Markdown file.\nSubmit the Formatted Markdown: Provide the final, formatted Markdown file as your submission.\nImpact\nBy completing this exercise, you will contribute to EduDocs Inc.'s mission of providing high-quality, accessible educational resources. Automating the PDF to Markdown conversion and ensuring consistent formatting:\n\nEnhances Productivity: Reduces the time and effort required to prepare documentation for clients.\nImproves Quality: Ensures all documents adhere to standardized formatting, enhancing readability and professionalism.\nSupports Scalability: Enables EduDocs to handle larger volumes of documentation without compromising on quality.\nFacilitates Integration: Makes it easier to integrate Markdown-formatted documents into various digital platforms and content management systems.\nWhat is the markdown content of the PDF, formatted with prettier@3.4.2?"
},
{
"file": "E://data science tool//GA5//first.py",
"question": "You need to clean this Excel data and calculate the total margin for all transactions that satisfy the following criteria:\n\nTime Filter: Sales that occurred up to and including a specified date (Mon Jan 03 2022 05:23:44 GMT+0530 (India Standard Time)).\nProduct Filter: Transactions for a specific product (Zeta). (Use only the product name before the slash.)\nCountry Filter: Transactions from a specific country (IN), after standardizing the country names.\nThe total margin is defined as:\n\nTotal Margin\n=\nTotal Sales\n\u2212\nTotal Cost\nTotal Sales\n\nYour solution should address the following challenges:\n\nTrim and Normalize Strings: Remove extra spaces from the Customer Name and Country fields. Map inconsistent country names (e.g., \"USA\", \"U.S.A\", \"US\") to a standardized format.\nStandardize Date Formats: Detect and convert dates from \"MM-DD-YYYY\" and \"YYYY/MM/DD\" into a consistent date format (e.g., ISO 8601).\nExtract the Product Name: From the Product field, extract the portion before the slash (e.g., extract \"Theta\" from \"Theta/5x01vd\").\nClean and Convert Sales and Cost: Remove the \"USD\" text and extra spaces from the Sales and Cost fields. Convert these fields to numerical values. Handle missing Cost values appropriately (50% of Sales).\nFilter the Data: Include only transactions up to and including Mon Jan 03 2022 05:23:44 GMT+0530 (India Standard Time), matching product Zeta, and country IN.\nCalculate the Margin: Sum the Sales and Cost for the filtered transactions. Compute the overall margin using the formula provided.\nBy cleaning the data and calculating accurate margins, RetailWise Inc. can:\n\nImprove Decision Making: Provide clients with reliable margin analyses to optimize pricing and inventory.\nEnhance Reporting: Ensure historical data is consistent and accurate, boosting stakeholder confidence.\nStreamline Operations: Reduce the manual effort needed to clean data from legacy sources.\nDownload the Sales Excel file: 'E://data science tool//GA5//q-clean-up-excel-sales-data.xlsx'\n\nWhat is the total margin for transactions before Mon Jan 03 2022 05:23:44 GMT+0530 (India Standard Time) for Zeta sold in IN (which may be spelt in different ways)?"
},
{
"file": "E://data science tool//GA5//second.py",
"question": "As a data analyst at EduTrack Systems, your task is to process this text file and determine the number of unique students based on their student IDs. This deduplication is essential to:\n\nEnsure Accurate Reporting: Avoid inflated counts in enrollment and performance reports.\nImprove Data Quality: Clean the dataset for further analytics, such as tracking academic progress or resource allocation.\nOptimize Administrative Processes: Provide administrators with reliable data to support decision-making.\nYou need to do the following:\n\nData Extraction: Read the text file line by line. Parse each line to extract the student ID.\nDeduplication: Remove duplicates from the student ID list.\nReporting: Count the number of unique student IDs present in the file.\nBy accurately identifying the number of unique students, EduTrack Systems will:\n\nEnhance Data Integrity: Ensure that subsequent analyses and reports reflect the true number of individual students.\nReduce Administrative Errors: Minimize the risk of misinformed decisions that can arise from duplicate entries.\nStreamline Resource Allocation: Provide accurate student counts for budgeting, staffing, and planning academic programs.\nImprove Compliance Reporting: Ensure adherence to regulatory requirements by maintaining precise student records.\nDownload the text file with student marks \n'E://data science tool//GA5//q-clean-up-student-marks.txt'\nHow many unique students are there in the file?"
},
{
"file": "E://data science tool//GA5//third.py",
"question": "As a data analyst, you are tasked with determining how many successful GET requests for pages under kannada were made on Sunday between 5 and 14 during May 2024. This metric will help:\n\nScale Resources: Ensure that servers can handle the peak load during these critical hours.\nContent Planning: Determine the popularity of regional content to decide on future content investments.\nMarketing Insights: Tailor promotional strategies for peak usage times.\nThis GZipped Apache log file (61MB) 'E:\\data science tool\\GA5\\s-anand.net-May-2024.gz' has 258,074 rows. Each row is an Apache web log entry for the site s-anand.net in May 2024.\n\nEach row has these fields:\n\nIP: The IP address of the visitor\nRemote logname: The remote logname of the visitor. Typically \"-\"\nRemote user: The remote user of the visitor. Typically \"-\"\nTime: The time of the visit. E.g. [01/May/2024:00:00:00 +0000]. Not that this is not quoted and you need to handle this.\nRequest: The request made by the visitor. E.g. GET /blog/ HTTP/1.1. It has 3 space-separated parts, namely (a) Method: The HTTP method. E.g. GET (b) URL: The URL visited. E.g. /blog/ (c) Protocol: The HTTP protocol. E.g. HTTP/1.1\nStatus: The HTTP status code. If 200 <= Status < 300 it is a successful request\nSize: The size of the response in bytes. E.g. 1234\nReferer: The referer URL. E.g. 'https://s-anand.net/'\nUser agent: The browser used. This will contain spaces and might have escaped quotes.\nVhost: The virtual host. E.g. s-anand.net\nServer: The IP address of the server.\nThe fields are separated by spaces and quoted by double quotes (\"). Unlike CSV files, quoted fields are escaped via \" and not \"\". (This impacts 41 rows.)\n\nAll data is in the GMT-0500 timezone and the questions are based in this same timezone.\n\nBy determining the number of successful GET requests under the defined conditions, we'll be able to:\n\nOptimize Infrastructure: Scale server resources effectively during peak traffic times, reducing downtime and improving user experience.\nStrategize Content Delivery: Identify popular content segments and adjust digital content strategies to better serve the audience.\nImprove Marketing Efforts: Focus marketing initiatives on peak usage windows to maximize engagement and conversion.\nWhat is the number of successful GET requests for pages under /kannada/ from 5:00 until before 14:00 on Sundays?"
},
{
"file": "E://data science tool//GA5//fourth.py",
"question": "This GZipped Apache log file (61MB) \"E:\\data science tool\\GA5\\s-anand.net-May-2024.gz\" has 258,074 rows. Each row is an Apache web log entry for the site s-anand.net in May 2024.\n\nEach row has these fields:\n\nIP: The IP address of the visitor\nRemote logname: The remote logname of the visitor. Typically \"-\"\nRemote user: The remote user of the visitor. Typically \"-\"\nTime: The time of the visit. E.g. [01/May/2024:00:00:00 +0000]. Not that this is not quoted and you need to handle this.\nRequest: The request made by the visitor. E.g. GET /blog/ HTTP/1.1. It has 3 space-separated parts, namely (a) Method: The HTTP method. E.g. GET (b) URL: The URL visited. E.g. /blog/ (c) Protocol: The HTTP protocol. E.g. HTTP/1.1\nStatus: The HTTP status code. If 200 <= Status < 300 it is a successful request\nSize: The size of the response in bytes. E.g. 1234\nReferer: The referer URL. E.g. https://s-anand.net/\nUser agent: The browser used. This will contain spaces and might have escaped quotes.\nVhost: The virtual host. E.g. s-anand.net\nServer: The IP address of the server.\nThe fields are separated by spaces and quoted by double quotes (\"). Unlike CSV files, quoted fields are escaped via \" and not \"\". (This impacts 41 rows.)\n\nAll data is in the GMT-0500 timezone and the questions are based in this same timezone.\n\nFilter the Log Entries: Extract only the requests where the URL starts with /carnatic/. Include only those requests made on the specified 2024-05-09.\nAggregate Data by IP: Sum the \"Size\" field for each unique IP address from the filtered entries.\nIdentify the Top Data Consumer: Determine the IP address that has the highest total downloaded bytes. Reports the total number of bytes that this IP address downloaded.\nAcross all requests under carnatic/ on 2024-05-09, how many bytes did the top IP address (by volume of downloads) download?"
},
{
"file": "E://data science tool//GA5//fifth.py",
"question": "As a data analyst at GlobalRetail Insights, you are tasked with extracting meaningful insights from this dataset. Specifically, you need to:\n\nGroup Mis-spelt City Names: Use phonetic clustering algorithms to group together entries that refer to the same city despite variations in spelling. For instance, cluster \"Tokyo\" and \"Tokio\" as one.\nFilter Sales Entries: Select all entries where:\nThe product sold is Bacon.\nThe number of units sold is at least 28.\nAggregate Sales by City: After clustering city names, group the filtered sales entries by city and calculate the total units sold for each city.\nBy performing this analysis, GlobalRetail Insights will be able to:\n\nImprove Data Accuracy: Correct mis-spellings and inconsistencies in the dataset, leading to more reliable insights.\nTarget Marketing Efforts: Identify high-performing regions for the specific product, enabling targeted promotional strategies.\nOptimize Inventory Management: Ensure that inventory allocations reflect the true demand in each region, reducing wastage and stockouts.\nDrive Strategic Decision-Making: Provide actionable intelligence to clients that supports strategic planning and competitive advantage in the market.\nHow many units of Bacon were sold in Beijing on transactions with at least 28 units? 'E:\\data science tool\\GA5\\q-clean-up-sales-data.json'"
},
{
"file": "E://data science tool//GA5//sixth.py",
"question": "As a data recovery analyst at ReceiptRevive Analytics, your task is to develop a program that will:\n\nParse the Sales Data:\nRead the provided JSON file containing 100 rows of sales data. Despite the truncated data (specifically the missing id), you must accurately extract the sales figures from each row.\nData Validation and Cleanup:\nEnsure that the data is properly handled even if some fields are incomplete. Since the id is missing for some entries, your focus will be solely on the sales values.\nCalculate Total Sales:\nSum the sales values across all 100 rows to provide a single aggregate figure that represents the total sales recorded.\nBy successfully recovering and aggregating the sales data, ReceiptRevive Analytics will enable RetailFlow Inc. to:\n\nReconstruct Historical Sales Data: Gain insights into past sales performance even when original receipts are damaged.\nInform Business Decisions: Use the recovered data to understand sales trends, adjust inventory, and plan future promotions.\nEnhance Data Recovery Processes: Improve methods for handling imperfect OCR data, reducing future data loss and increasing data accuracy.\nBuild Client Trust: Demonstrate the ability to extract valuable insights from challenging datasets, thereby reinforcing client confidence in ReceiptRevive's services.\nDownload the data from 'E:\\data science tool\\GA5\\GA5\\q-parse-partial-json.jsonl'.\n\nWhat is the total sales value?"
},
{
"file": "E://data science tool//GA5//seventh.py",
"question": "As a data analyst at DataSure Technologies, you have been tasked with developing a script that processes a large JSON log file and counts the number of times a specific key, represented by the placeholder XF, appears in the JSON structure. Your solution must:\n\nParse the Large, Nested JSON: Efficiently traverse the JSON structure regardless of its complexity.\nCount Key Occurrences: Increment a count only when XF is used as a key in the JSON object (ignoring occurrences of XF as a value).\nReturn the Count: Output the total number of occurrences, which will be used by the operations team to assess the prevalence of particular system events or errors.\nBy accurately counting the occurrences of a specific key in the log files, DataSure Technologies can:\n\nDiagnose Issues: Quickly determine the frequency of error events or specific system flags that may indicate recurring problems.\nPrioritize Maintenance: Focus resources on addressing the most frequent issues as identified by the key count.\nEnhance Monitoring: Improve automated monitoring systems by correlating key occurrence data with system performance metrics.\nInform Decision-Making: Provide data-driven insights that support strategic planning for system upgrades and operational improvements.\nDownload the data from 'E:\\data science tool\\GA5\\q-extract-nested-json-keys.json'\n\nHow many times does XF appear as a key?"
},
{
"file": "E://data science tool//GA5//eighth.py",
"question": "Your task as a data analyst at EngageMetrics is to write a query that performs the following:\n\nFilter Posts by Date: Consider only posts with a timestamp greater than or equal to a specified minimum time (2025-02-06T03:01:57.854Z), ensuring that the analysis focuses on recent posts.\nEvaluate Comment Quality: From these recent posts, identify posts where at least one comment has received more than a given number of useful stars (5). This criterion filters out posts with low or mediocre engagement.\nExtract and Sort Post IDs: Finally, extract all the post_id values of the posts that meet these criteria and sort them in ascending order.\nBy accurately extracting these high-impact post IDs, EngageMetrics can:\n\nEnhance Reporting: Provide clients with focused insights on posts that are currently engaging audiences effectively.\nTarget Content Strategy: Help marketing teams identify trending content themes that generate high-quality user engagement.\nOptimize Resource Allocation: Enable better prioritization for content promotion and further in-depth analysis of high-performing posts.\nWrite a DuckDB SQL query to find all posts IDs after 2025-02-06T03:01:57.854Z with at least 1 comment with 5 useful stars, sorted. The result should be a table with a single column called post_id, and the relevant post IDs should be sorted in ascending order."
},
{
"file": "E://data science tool//GA5//ninth.py",
"question": "Access the Video: Use the provided YouTube link to access the mystery story audiobook.\nConvert to Audio: Extract the audio for the segment between 397.2 and 456.1.\nTranscribe the Segment: Utilize automated speech-to-text tools as needed.\nBy producing an accurate transcript of this key segment, Mystery Tales Publishing will be able to:\n\nBoost Accessibility: Provide high-quality captions and text alternatives for hearing-impaired users.\nEnhance SEO: Improve the discoverability of their content through better keyword indexing.\nDrive Engagement: Use the transcript for social media snippets, summaries, and promotional materials.\nEnable Content Analysis: Facilitate further analysis such as sentiment analysis, topic modeling, and reader comprehension studies.\nWhat is the text of the transcript of this 'Mystery Story Audiobook'['https://youtu.be/NRntuOJu4ok?si=pdWzx_K5EltiPh0Z'] between 397.2 and 456.1 seconds?"
},
{
"file": "E://data science tool//GA5//tenth.py",
"question": "As a digital forensics analyst at PixelGuard Solutions, your task is to reconstruct the original image from its scrambled pieces. You are provided with:\n\nThe 25 individual image pieces (put together as a single image).\nA mapping file detailing the original (row, col) position for each piece and its current (row, col) location.\nYour reconstructed image will be critical evidence in the investigation. Once assembled, the image must be uploaded to the secure case management system for further analysis by the investigative team.\n\nUnderstand the Mapping: Review the provided mapping file that shows how each piece's original coordinates (row, col) relate to its current scrambled position.\nReassemble the Image: Using the mapping, reassemble the 5x5 grid of image pieces to reconstruct the original image. You may use an image processing library (e.g., Python's Pillow, ImageMagick, or a similar tool) to automate the reconstruction process.\nOutput the Reconstructed Image: Save the reassembled image in a lossless format (e.g., PNG or WEBP). Upload the reconstructed image to the secure case management system as required by PixelGuard\u2019s workflow.\nBy accurately reconstructing the scrambled image, PixelGuard Solutions will:\n\nReveal Critical Evidence: Provide investigators with a clear view of the original image, which may contain important details related to the case.\nEnhance Analytical Capabilities: Enable further analysis and digital enhancements that can lead to breakthroughs in the investigation.\nMaintain Chain of Custody: Ensure that the reconstruction process is documented and reliable, supporting the admissibility of the evidence in court.\nImprove Operational Efficiency: Demonstrate the effectiveness of automated image reconstruction techniques in forensic investigations.\nHere is the image. It is a 500x500 pixel image that has been cut into 25 (5x5) pieces:\n\n\n\nHere is the mapping of each piece:'E:\\data science tool\\GA5\\jigsaw.webp'\n\nOriginal Row\tOriginal Column\tScrambled Row\tScrambled Column\n2\t1\t0\t0\n1\t1\t0\t1\n4\t1\t0\t2\n0\t3\t0\t3\n0\t1\t0\t4\n1\t4\t1\t0\n2\t0\t1\t1\n2\t4\t1\t2\n4\t2\t1\t3\n2\t2\t1\t4\n0\t0\t2\t0\n3\t2\t2\t1\n4\t3\t2\t2\n3\t0\t2\t3\n3\t4\t2\t4\n1\t0\t3\t0\n2\t3\t3\t1\n3\t3\t3\t2\n4\t4\t3\t3\n0\t2\t3\t4\n3\t1\t4\t0\n1\t2\t4\t1\n1\t3\t4\t2\n0\t4\t4\t3\n4\t0\t4\t4\nUpload the reconstructed image by moving the pieces from the scrambled position to the original position:"
}
]