Lecture 2

2024-10-25 13:28:49 +02:00
parent 9ea256c27e
commit 71b9d91eeb
168 changed files with 1172650 additions and 33 deletions
--- a/Material/wise_24_25/lernmaterial/regex/Mensa
+++ b/Material/wise_24_25/lernmaterial/regex/Mensa
--- a/Material/wise_24_25/lernmaterial/regex/Regular
+++ b/Material/wise_24_25/lernmaterial/regex/Regular
@@ -0,0 +1,850 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "c850ea25-9bde-4feb-a1d0-056c5870d59e",
+   "metadata": {},
+   "source": [
+    "# Regular Expressions (Regex)\n",
+    "\n",
+    "In the early 1950s, the mathematician __Stephen Cole Kleene__ introduced the concept of _regular languages_, a concept from theoretical computer science for describing syntactic expressions. Building on it, specific expressions called _regular expressions_ can be used for various forms of _pattern matching_. One of the most important applications of _regular expressions_ by far is compiling source code into machine language: constructs such as _while_, _for_, _if_, etc. are formalized so that they can be recognized and translated (compiled) more easily.\n",
+    "\n",
+    "Another use of _regular expressions_ is translating code at run time (_just-in-time compiling_), which Python, as an interpreted language, also relies on. Here the source code is translated for the machine while the program runs (usually not the source code directly but an intermediate form called _bytecode_). Without this, using Jupyter notebooks would not be nearly as convenient.\n",
+    "\n",
+    "\n",
+    "A few facts about _regular expressions_:\n",
+    "\n",
+    "- _Regex_ comes in many dialects (cf. [Regular Expression Engine Comparison](https://gist.github.com/CMCDragonkai/6c933f4a7d713ef712145c5eb94a1816)).\n",
+    "- The regex engine of the programming language _Perl_ was derived from a regex library written by Henry Spencer.\n",
+    "- A free website (note: it shows ads) for testing and checking regular expressions in different dialects is [Regex101](https://regex101.com/).\n",
+    "- Every Unix(-like) system (Linux, macOS, BSD, etc.) ships with the program _grep (**G**lobal/**R**egular **E**xpression/**P**rint)_ for analyzing data streams and text files.\n",
+    "\n",
+    "\n",
+    "<p><a href=\"https://commons.wikimedia.org/wiki/File:Kleene.jpg#/media/File:Kleene.jpg\"><img src=\"https://upload.wikimedia.org/wikipedia/commons/1/1c/Kleene.jpg\" alt=\"Kleene.jpg\" width=\"10%\"></a><br>By Konrad Jacobs, Erlangen, Copyright is MFO - Mathematisches Forschungsinstitut Oberwolfach, <a href=\"https://opc.mfo.de/detail?photo_id=2122\">https://opc.mfo.de/detail?photo_id=2122</a>, <a href=\"https://creativecommons.org/licenses/by-sa/2.0/de/deed.en\" title=\"Creative Commons Attribution-Share Alike 2.0 de\">CC BY-SA 2.0 de</a>, <a href=\"https://commons.wikimedia.org/w/index.php?curid=12342617\">Link</a></p>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b689ee80",
+   "metadata": {
+    "nbgrader": {
+     "grade": false,
+     "grade_id": "cell-27269d9f8e03f3e9",
+     "locked": true,
+     "schema_version": 3,
+     "solution": false,
+     "task": false
+    }
+   },
+   "source": [
+    "## Introduction\n",
+    "\n",
+    "You can find _a lot_ of material on regular expressions (regex) online.\n",
+    "Therefore, we will not repeat the background but focus on some practical exercises in this notebook. Some tutorials/useful links can be found below.\n",
+    "\n",
+    "We use regular expressions to describe patterns of characters that should be matched in a given string.\n",
+    "\n",
+    "You can think of a regex as a string of characters that describes a certain pattern, e.g., \"four digits followed by a word of at least 5 characters\".  \n",
+    "Given strings/texts can then be tested against the pattern specified in the regex.\n",
+    "In Python, this is done with the [`re` module of the standard library](https://docs.python.org/3/library/re.html).\n",
+    "\n",
+    "\n",
+    "**Material on Regular Expressions:**\n",
+    "\n",
+    "- [RegEx Howto in Python](https://docs.python.org/3/howto/regex.html)\n",
+    "- [RegEx Tutorial](https://www.regular-expressions.info/tutorial.html)\n",
+    "- [Interactive RegEx Tutorial](https://regexone.com/)\n",
+    "- [WikiBook on RegEx](https://en.wikibooks.org/wiki/Regular_Expressions)\n",
+    "- [RegExr: Testing & Visualizing RegEx](https://regexr.com/)\n",
+    "- [Debuggex: Visualization of individual regex as finite state machine](https://www.debuggex.com/)\n",
+    "\n",
+    "**Testing with Regular Expressions:**\n",
+    "- [Regex101](https://regex101.com/)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "8a5d3654",
+   "metadata": {
+    "nbgrader": {
+     "grade": false,
+     "grade_id": "cell-168430a9112ab605",
+     "locked": true,
+     "schema_version": 3,
+     "solution": false,
+     "task": false
+    },
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "import re"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b6ccac77",
+   "metadata": {
+    "nbgrader": {
+     "grade": false,
+     "grade_id": "cell-4c79f2d5a1e62a04",
+     "locked": true,
+     "schema_version": 3,
+     "solution": false,
+     "task": false
+    }
+   },
+   "source": [
+    "## Example 1\n",
+    "The regular expression `Hello [A-Z][a-z]+` specifies a pattern that begins with the literal string `Hello `, followed by one capital letter (`[A-Z]`) and at least one lowercase letter (`[a-z]` describes the lowercase letters, and `+` requires one or more of them)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "7e25056b",
+   "metadata": {
+    "nbgrader": {
+     "grade": false,
+     "grade_id": "cell-98f2d91954c191a3",
+     "locked": true,
+     "schema_version": 3,
+     "solution": false,
+     "task": false
+    },
+    "tags": []
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Testing the string: 'Hello World'\n",
+      "Found pattern at characters: 0 to 11\n",
+      "---------------------------------------------\n",
+      "Testing the string: 'Hello You!'\n",
+      "Found pattern at characters: 0 to 9\n",
+      "---------------------------------------------\n",
+      "Testing the string: 'This does not match the pattern...'\n",
+      "Pattern not found in string.\n",
+      "---------------------------------------------\n",
+      "Testing the string: 'We can also have the Hello World pattern somewhere within the string.'\n",
+      "Found pattern at characters: 21 to 32\n",
+      "---------------------------------------------\n",
+      "Testing the string: 'Hello world does not match'\n",
+      "Pattern not found in string.\n",
+      "---------------------------------------------\n",
+      "Testing the string: 'Hello W does not match either'\n",
+      "Pattern not found in string.\n",
+      "---------------------------------------------\n"
+     ]
+    }
+   ],
+   "source": [
+    "example_re = r'Hello [A-Z][a-z]+'\n",
+    "test_strings = ['Hello World',\n",
+    "                'Hello You!',\n",
+    "                'This does not match the pattern...',\n",
+    "                'We can also have the Hello World pattern somewhere within the string.',\n",
+    "                'Hello world does not match',\n",
+    "                'Hello W does not match either']\n",
+    "\n",
+    "\n",
+    "for test_word in test_strings:\n",
+    "    print(f\"Testing the string: '{test_word}'\")\n",
+    "    match_object = re.search(example_re, test_word)\n",
+    "    if match_object:\n",
+    "        print(f\"Found pattern at characters: {match_object.span()[0]:d} to {match_object.span()[1]:d}\")\n",
+    "    else:\n",
+    "        print(\"Pattern not found in string.\")\n",
+    "    print(\"-\"*45)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5ec979b2",
+   "metadata": {
+    "nbgrader": {
+     "grade": false,
+     "grade_id": "cell-aca8488169bc0df9",
+     "locked": true,
+     "schema_version": 3,
+     "solution": false,
+     "task": false
+    }
+   },
+   "source": [
+    "_Note:_ Since regexes often contain special characters such as the backslash `\\`, it is helpful to define them in Python as raw strings, i.e., with a preceding `r` (see `example_re` above)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "820c31ae",
+   "metadata": {
+    "nbgrader": {
+     "grade": false,
+     "grade_id": "cell-4d3281e8922cd534",
+     "locked": true,
+     "schema_version": 3,
+     "solution": false,
+     "task": false
+    }
+   },
+   "source": [
+    "## Task 1\n",
+    "\n",
+    "Write a regular expression `r1` which matches the following words:\n",
+    "- hello\n",
+    "- yellow\n",
+    "- jello"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "id": "e7e426b0",
+   "metadata": {
+    "nbgrader": {
+     "grade": false,
+     "grade_id": "cell-c48986402655ab08",
+     "locked": false,
+     "schema_version": 3,
+     "solution": true,
+     "task": false
+    },
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "### BEGIN SOLUTION\n",
+    "r1 = r'[hjy]ellow?'\n",
+    "### END SOLUTION"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "id": "223fa54c",
+   "metadata": {
+    "nbgrader": {
+     "grade": true,
+     "grade_id": "cell-0a761cfdabd44f1b",
+     "locked": true,
+     "points": 1,
+     "schema_version": 3,
+     "solution": false,
+     "task": false
+    },
+    "tags": []
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "<re.Match object; span=(0, 5), match='hello'>\n",
+      "<re.Match object; span=(0, 6), match='yellow'>\n",
+      "<re.Match object; span=(0, 5), match='jello'>\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Test Cell\n",
+    "\n",
+    "test_words = ['hello', 'yellow', 'jello']\n",
+    "for _word in test_words:\n",
+    "    match = re.match(r1, _word)\n",
+    "    print(match)\n",
+    "    assert match is not None\n",
+    "    assert match[0] == _word"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c3086449",
+   "metadata": {
+    "nbgrader": {
+     "grade": false,
+     "grade_id": "cell-bea454dd22c7499a",
+     "locked": true,
+     "schema_version": 3,
+     "solution": false,
+     "task": false
+    }
+   },
+   "source": [
+    "## Example 2\n",
+    "\n",
+    "In the first example, we used the `[A-Z]` and `[a-z]` patterns to specify capital and lowercase letters, respectively.\n",
+    "There are many more such character classes, e.g., `[0-9]` or `\\d` for matching a single digit.\n",
+    "\n",
+    "A list of these special characters can be found in the [`re` documentation](https://docs.python.org/3/library/re.html#regular-expression-syntax).\n",
+    "\n",
+    "\n",
+    "The following regex matches a word with at least 3 letters (both capital and lowercase are accepted), followed by a space, a two-digit number, a comma, a space, and a four-digit number whose first digit is either 1 or 2."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "id": "5a02b00a",
+   "metadata": {
+    "nbgrader": {
+     "grade": false,
+     "grade_id": "cell-1a01734fc48cc488",
+     "locked": true,
+     "schema_version": 3,
+     "solution": false,
+     "task": false
+    },
+    "tags": []
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Testing the string: 'November 21, 2022'\n",
+      "Found pattern at characters: 0 to 17\n",
+      "---------------------------------------------\n",
+      "Testing the string: 'Jan 01, 1970'\n",
+      "Found pattern at characters: 0 to 12\n",
+      "---------------------------------------------\n",
+      "Testing the string: 'JuNE 45, 4521'\n",
+      "Pattern not found in string.\n",
+      "---------------------------------------------\n",
+      "Testing the string: 'Abc 1, 2020'\n",
+      "Pattern not found in string.\n",
+      "---------------------------------------------\n",
+      "Testing the string: 'July 02, 90'\n",
+      "Pattern not found in string.\n",
+      "---------------------------------------------\n"
+     ]
+    }
+   ],
+   "source": [
+    "example_re2 = r'[A-Za-z]{3,} \\d{2}, [12]\\d{3}'\n",
+    "\n",
+    "test_strings = ['November 21, 2022',\n",
+    "                'Jan 01, 1970',\n",
+    "                'JuNE 45, 4521',\n",
+    "                'Abc 1, 2020',\n",
+    "                'July 02, 90']\n",
+    "\n",
+    "\n",
+    "for test_word in test_strings:\n",
+    "    print(f\"Testing the string: '{test_word}'\")\n",
+    "    match_object = re.search(example_re2, test_word)\n",
+    "    if match_object:\n",
+    "        print(f\"Found pattern at characters: {match_object.span()[0]:d} to {match_object.span()[1]:d}\")\n",
+    "    else:\n",
+    "        print(\"Pattern not found in string.\")\n",
+    "    print(\"-\"*45)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b565244d",
+   "metadata": {
+    "nbgrader": {
+     "grade": false,
+     "grade_id": "cell-0abe35e63e18f0d9",
+     "locked": true,
+     "schema_version": 3,
+     "solution": false,
+     "task": false
+    }
+   },
+   "source": [
+    "## Task 2\n",
+    "\n",
+    "Write a regular expression `r2` that only matches dates in the ISO format `YYYY-MM-DD`.\n",
+    "It should match a string _only_ if the whole string is a date. If the date is only part of the string, it should *not* match.\n",
+    "\n",
+    "_Hint:_ You can use `(a[0-9]|b[01])` to specify the pattern that matches either an `a` followed by a single digit **or** a `b` followed by either `0` or `1`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "id": "1e2bb2bd",
+   "metadata": {
+    "nbgrader": {
+     "grade": false,
+     "grade_id": "cell-c264d2e9cac73db0",
+     "locked": false,
+     "schema_version": 3,
+     "solution": true,
+     "task": false
+    },
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "### BEGIN SOLUTION\n",
+    "r2 = r'^(\\d{4})-(0[1-9]|1[012])-(0[1-9]|[12][0-9]|3[01])$'\n",
+    "### END SOLUTION"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "id": "5bbd62f5",
+   "metadata": {
+    "nbgrader": {
+     "grade": true,
+     "grade_id": "cell-c80282e7adcccb6a",
+     "locked": true,
+     "points": 1,
+     "schema_version": 3,
+     "solution": false,
+     "task": false
+    },
+    "tags": []
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "<re.Match object; span=(0, 10), match='1970-01-01'>\n",
+      "<re.Match object; span=(0, 10), match='1999-12-31'>\n",
+      "<re.Match object; span=(0, 10), match='2000-02-28'>\n",
+      "<re.Match object; span=(0, 10), match='2022-12-09'>\n",
+      "<re.Match object; span=(0, 10), match='4250-09-10'>\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Test Cell\n",
+    "\n",
+    "# The following strings should be matched\n",
+    "dates = [\"1970-01-01\", \"1999-12-31\", \"2000-02-28\", \"2022-12-09\", \"4250-09-10\"]\n",
+    "for _date in dates:\n",
+    "    match = re.match(r2, _date)\n",
+    "    print(match)\n",
+    "    assert match is not None\n",
+    "    assert match[0] == _date"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "id": "0d8e4b98",
+   "metadata": {
+    "nbgrader": {
+     "grade": true,
+     "grade_id": "cell-e46e8f78178eb2b7",
+     "locked": true,
+     "points": 1,
+     "schema_version": 3,
+     "solution": false,
+     "task": false
+    },
+    "tags": []
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "None\n",
+      "None\n",
+      "None\n",
+      "None\n",
+      "None\n",
+      "None\n",
+      "None\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Test Cell\n",
+    "\n",
+    "# The following strings should not be matched\n",
+    "no_dates = [\"1970-01-32\", \"abcd-12-31\", \"2000/02/28\", \"2022-14-20\", \"2002.12.02\", \"1234-2-1\", \"77-09-02\"]\n",
+    "for _date in no_dates:\n",
+    "    match = re.match(r2, _date)\n",
+    "    print(match)\n",
+    "    assert match is None"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "id": "b72e49ac",
+   "metadata": {
+    "nbgrader": {
+     "grade": true,
+     "grade_id": "cell-48f63facb72e517a",
+     "locked": true,
+     "points": 1,
+     "schema_version": 3,
+     "solution": false,
+     "task": false
+    },
+    "tags": []
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "None\n",
+      "None\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Test Cell\n",
+    "\n",
+    "# The following strings should not be matched\n",
+    "no_match = [\"This text contains the date 1999-12-31 but it should not be matched.\",\n",
+    "            \"2020-02-20 is a date in the beginning of the string\"]\n",
+    "for _text in no_match:\n",
+    "    match = re.match(r2, _text)\n",
+    "    print(match)\n",
+    "    assert match is None"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ce239065",
+   "metadata": {
+    "nbgrader": {
+     "grade": false,
+     "grade_id": "cell-31d99fd79761847d",
+     "locked": true,
+     "schema_version": 3,
+     "solution": false,
+     "task": false
+    }
+   },
+   "source": [
+    "## Example 3\n",
+    "\n",
+    "You can save parts of the found pattern in a group to have access to it later.\n",
+    "\n",
+    "In the following example, we modify the regex from [Example 2](#Example-2) to capture the individual parts into groups."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "id": "89ba4f51",
+   "metadata": {
+    "nbgrader": {
+     "grade": false,
+     "grade_id": "cell-7d320972e47ae922",
+     "locked": true,
+     "schema_version": 3,
+     "solution": false,
+     "task": false
+    },
+    "tags": []
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "('November', '21', '2022')\n"
+     ]
+    }
+   ],
+   "source": [
+    "example_re3 = r'([A-Za-z]{3,}) (\\d{2}), ([12]\\d{3})'\n",
+    "\n",
+    "test_string = 'November 21, 2022'\n",
+    "match = re.search(example_re3, test_string)\n",
+    "print(match.groups())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "393ff9c6",
+   "metadata": {
+    "nbgrader": {
+     "grade": false,
+     "grade_id": "cell-68cbff25c972809f",
+     "locked": true,
+     "schema_version": 3,
+     "solution": false,
+     "task": false
+    }
+   },
+   "source": [
+    "## Task 3\n",
+    "\n",
+    "Write a regular expression `r3` which matches text between `<li>...</li>` tags and adds the found text to a group. This should be the only capturing group!\n",
+    "\n",
+    "_Hint:_ You might want to check how to define non-capturing groups and non-greedy matching."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "id": "c93ee04d",
+   "metadata": {
+    "nbgrader": {
+     "grade": false,
+     "grade_id": "cell-420f01248c7eddeb",
+     "locked": false,
+     "schema_version": 3,
+     "solution": true,
+     "task": false
+    },
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "### BEGIN SOLUTION\n",
+    "r3 = r'<li>((?:.|\\n)*?)</li>'\n",
+    "### END SOLUTION"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 14,
+   "id": "37681e3d",
+   "metadata": {
+    "nbgrader": {
+     "grade": true,
+     "grade_id": "cell-488cd60d5bed2019",
+     "locked": true,
+     "points": 2,
+     "schema_version": 3,
+     "solution": false,
+     "task": false
+    },
+    "tags": []
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "['Item 1', '\\nItem 2', '\\n              Item 3\\n          ']\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Test Cell\n",
+    "\n",
+    "test_html = \"\"\"\n",
+    "<html>\n",
+    "    <head>\n",
+    "      <title>Test HTML</title>\n",
+    "    </head>\n",
+    "    <body>\n",
+    "      <h1>Heading 1</h1>\n",
+    "      <ol>\n",
+    "          <li>Item 1</li>\n",
+    "          <li>\n",
+    "Item 2</li>\n",
+    "          <li>\n",
+    "              Item 3\n",
+    "          </li>\n",
+    "      </ol>\n",
+    "    </body>\n",
+    "</html>\n",
+    "\"\"\"\n",
+    "\n",
+    "matches = re.findall(r3, test_html)\n",
+    "print(matches)\n",
+    "assert len(matches) == 3\n",
+    "assert matches == ['Item 1', '\\nItem 2', '\\n              Item 3\\n          ']"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4370f245",
+   "metadata": {
+    "nbgrader": {
+     "grade": false,
+     "grade_id": "cell-53152b78922af0b1",
+     "locked": true,
+     "schema_version": 3,
+     "solution": false,
+     "task": false
+    }
+   },
+   "source": [
+    "## Task 4\n",
+    "\n",
+    "Write a regular expression `r4` to find all words in a string that are acronyms, i.e., written in all capital letters, and all words that contain a capital letter anywhere other than the first position.\n",
+    "\n",
+    "Next, write a function `shield_acronyms` that uses this regular expression and adds curly brackets `{...}` around the found words and returns a new string.\n",
+    "\n",
+    "_Hint:_ You can use the [`re.sub` function](https://docs.python.org/3/library/re.html#re.sub) for this task."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 15,
+   "id": "ed6b99f1",
+   "metadata": {
+    "nbgrader": {
+     "grade": false,
+     "grade_id": "cell-545bc5786ee8e947",
+     "locked": false,
+     "schema_version": 3,
+     "solution": true,
+     "task": false
+    },
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "# Define r4 here\n",
+    "### BEGIN SOLUTION\n",
+    "r4 = r'([0-9A-Z]+\\b|[a-zA-Z]+[A-Z0-9]+[a-zA-Z]*)'\n",
+    "### END SOLUTION"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 16,
+   "id": "504cd6d3",
+   "metadata": {
+    "nbgrader": {
+     "grade": true,
+     "grade_id": "cell-900922b2243d5a55",
+     "locked": true,
+     "points": 1,
+     "schema_version": 3,
+     "solution": false,
+     "task": false
+    },
+    "tags": []
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "['MIMO']\n",
+      "['M2M']\n",
+      "['IN', 'mmWave']\n",
+      "['5G', 'SHIELded']\n",
+      "[]\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Test Cell\n",
+    "\n",
+    "test_words = [(\"MIMO\", [\"MIMO\"]),\n",
+    "              (\"M2M\", [\"M2M\"]),\n",
+    "              (r\"Acro IN mmWave Title\", [\"IN\", \"mmWave\"]),\n",
+    "              (r\"5G should be SHIELded\", [\"5G\", \"SHIELded\"]),\n",
+    "              (r\"Regular title with Names\", []),\n",
+    "             ]\n",
+    "for text, matches in test_words:\n",
+    "    result = re.findall(r4, text)\n",
+    "    print(result)\n",
+    "    assert result == matches"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 17,
+   "id": "f955d228",
+   "metadata": {
+    "nbgrader": {
+     "grade": false,
+     "grade_id": "cell-2c36d0ef19bac550",
+     "locked": false,
+     "schema_version": 3,
+     "solution": true,
+     "task": false
+    },
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "def shield_acronyms(text: str) -> str:\n",
+    "    ### BEGIN SOLUTION\n",
+    "    r4 = r'([0-9A-Z]+\\b|[a-zA-Z]+[A-Z0-9]+[a-zA-Z]*)'\n",
+    "    new_text = re.sub(r4, r'{\\g<0>}', text)\n",
+    "    return new_text\n",
+    "    ### END SOLUTION"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 18,
+   "id": "3b71b683",
+   "metadata": {
+    "nbgrader": {
+     "grade": true,
+     "grade_id": "cell-550110e95fccc717",
+     "locked": true,
+     "points": 2,
+     "schema_version": 3,
+     "solution": false,
+     "task": false
+    },
+    "tags": []
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{MIMO}\n",
+      "{M2M}\n",
+      "Acro {IN} {mmWave} Title\n",
+      "{5G} should be {SHIELded}\n",
+      "Regular title with Names\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Test Cell\n",
+    "\n",
+    "test_words = [(\"MIMO\", r\"{MIMO}\"),\n",
+    "              (\"M2M\", r\"{M2M}\"),\n",
+    "              (r\"Acro IN mmWave Title\", r\"Acro {IN} {mmWave} Title\"),\n",
+    "              (r\"5G should be SHIELded\", r\"{5G} should be {SHIELded}\"),\n",
+    "              (r\"Regular title with Names\", r'Regular title with Names'),\n",
+    "             ]\n",
+    "for text, expected in test_words:\n",
+    "    result = shield_acronyms(text)\n",
+    "    print(result)\n",
+    "    assert result == expected"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "41222440-923d-44a4-8dc7-d7a6309d4e0a",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "celltoolbar": "Create Assignment",
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.8.16"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
--- a/Material/wise_24_25/lernmaterial/regex/Web
+++ b/Material/wise_24_25/lernmaterial/regex/Web
@@ -0,0 +1,171 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "8f7ee9ed",
+   "metadata": {
+    "nbgrader": {
+     "grade": false,
+     "grade_id": "cell-fd19a00f47ad1a34",
+     "locked": true,
+     "schema_version": 3,
+     "solution": false,
+     "task": false
+    }
+   },
+   "source": [
+    "- [Beautiful Soup Documentation](https://beautiful-soup-4.readthedocs.io/en/latest/)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "ebaad76f",
+   "metadata": {
+    "nbgrader": {
+     "grade": false,
+     "grade_id": "cell-9138585fc343d8a7",
+     "locked": true,
+     "schema_version": 3,
+     "solution": false,
+     "task": false
+    }
+   },
+   "outputs": [],
+   "source": [
+    "from bs4 import BeautifulSoup"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1336423a",
+   "metadata": {
+    "nbgrader": {
+     "grade": false,
+     "grade_id": "cell-235041934d89cb33",
+     "locked": true,
+     "schema_version": 3,
+     "solution": false,
+     "task": false
+    }
+   },
+   "source": [
+    "## Example of Parsing a Website"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "id": "8bf54e3b",
+   "metadata": {
+    "nbgrader": {
+     "grade": false,
+     "grade_id": "cell-c6761d82e17018f0",
+     "locked": true,
+     "schema_version": 3,
+     "solution": false,
+     "task": false
+    }
+   },
+   "outputs": [],
+   "source": [
+    "with open(\"example.html\") as html_file:\n",
+    "    soup = BeautifulSoup(html_file, \"html.parser\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "id": "14566e25",
+   "metadata": {
+    "nbgrader": {
+     "grade": false,
+     "grade_id": "cell-93b2d5726c5469a8",
+     "locked": true,
+     "schema_version": 3,
+     "solution": false,
+     "task": false
+    }
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "<title>Test HTML</title>\n",
+      "Test HTML\n",
+      "------------------------------\n",
+      "Print all list elements on the website:\n",
+      "Item 1\n",
+      "\n",
+      "Item 2\n",
+      "\n",
+      "              Item 3\n",
+      "          \n"
+     ]
+    }
+   ],
+   "source": [
+    "print(soup.title)\n",
+    "print(soup.title.get_text())\n",
+    "\n",
+    "\n",
+    "print(\"-\"*30)\n",
+    "print(\"Print all list elements on the website:\")\n",
+    "\n",
+    "li = soup.find_all(\"li\")\n",
+    "for element in li:\n",
+    "    print(element.get_text()) # you can use .strip() to get rid of leading and trailing whitespace"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 14,
+   "id": "d64b13b5",
+   "metadata": {
+    "nbgrader": {
+     "grade": false,
+     "grade_id": "cell-3a99db5db1577717",
+     "locked": true,
+     "schema_version": 3,
+     "solution": false,
+     "task": false
+    }
+   },
+   "outputs": [],
+   "source": [
+    "import requests"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "4bdf24a4",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "celltoolbar": "Create Assignment",
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.8"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
--- a/Material/wise_24_25/lernmaterial/regex/example.html
+++ b/Material/wise_24_25/lernmaterial/regex/example.html
@@ -0,0 +1,16 @@
+<html>
+    <head>
+      <title>Test HTML</title>
+    </head>
+    <body>
+      <h1>Heading 1</h1>
+      <ol class="mylist">
+          <li>Item 1</li>
+          <li>
+Item 2</li>
+          <li>
+              Item 3
+          </li>
+      </ol>
+    </body>
+</html>