Category Archives: Regular Expressions

The Stack Overflow Regular Expressions FAQ

One of the best regular expression FAQs out there: The Stack Overflow Regex-Fu FAQ Mega Wiki

The full user-authentication lifecycle in Django, with testing at every step — The step-by-step tutorial I wish I had (part one)

This tutorial demonstrates the entire user-authentication lifecycle in Django, with testing at every step:

  1. Create an account, both with and without email confirmation
  2. Login, including a forgot-my-password reset (via email), and a keep-me-logged-in checkbox
  3. A page viewable when logged out, but containing extra information when logged-in
  4. A page viewable only when logged in: Your user “profile”.
  5. Change your password
  6. Logout
  7. Delete your account

[The chapters on creating and deleting an account (chapters nine and ten), and changing-your-password (chapter eight) are not yet written. Resetting your password is in chapter seven.]

[TOC: one, two, three, four, five, six, seven, eight, nine, ten]

The goal of this tutorial is to require as trivial a website as possible, before proceeding onto authentication. In contrast, the How To Tango With Django tutorial does not address authentication until chapter eight, and Mike Hibbert’s video tutorial doesn’t talk about it until chapter nine. It’s not possible to start in the middle, as they both expect that all previous steps were followed. (Other available tutorials.)

Warnings:

  • The testing code substantially increases both the amount of code and the number of steps in this tutorial, so creating the demo website is more work than I just implied. If you did skip the testing (which you shouldn’t), implementing the website itself would be much faster. Had the Tango or Hibbert tutorials included proper testing, they would be a whole lot longer as well.
  • The one type of testing I can’t demonstrate in this tutorial is end-user simulation/integration testing with Selenium. My webserver is text-only. This leaves the JavaScript portions of this tutorial, such as client-side make-sure-the-passwords-match verification, untested.

The trivial website: Screenshots

Before doing authentication, we need to create something. Let’s take a look at that something before actually building it. It will contain these two simple views:

screenshot

The main "aggregate" page which is publicly viewable, but contains extra information for logged-in users, including a link to their private profile page:

screenshot

The profile page, which displays every non-password field in the User model, plus the one extra field that makes up the entirety of our model: birth year.

Installation

Here is my setup. The only differences for this tutorial are:

  • The virtualenv base directory is
        /home/myname/django_auth_lifecycle/djauth_venv/
  • The project base directory is
        /home/myname/django_auth_lifecycle/djauth_root/

The reason for this structure is so that the entire django_auth_lifecycle directory can be a Git repository. It is also where I place non-Django files, such as some scripts, some personal-only files, the wordpress posts, including their java builders. In actuality, each post is its own repository. When one part is done–or if there’s a bug–all changes are copied over to all future parts. Attempting to have all posts in one monster repository is impossible, given my current beginner-level Git skills.

The steps I took, which you will need to tailor to your environment:

  1. mkdir -p /home/myname/django_auth_lifecycle/djauth_venv/
  2. sudo virtualenv -p /usr/bin/python3.4 /home/myname/django_auth_lifecycle/djauth_venv/
  3. source /home/myname/django_auth_lifecycle/djauth_venv/bin/activate
  4. sudo /home/myname/django_auth_lifecycle/djauth_venv/bin/pip install django
  5. sudo /home/myname/django_auth_lifecycle/djauth_venv/bin/pip install gunicorn
  6. sudo /home/myname/django_auth_lifecycle/djauth_venv/bin/pip install psycopg2     (this step has a lot of output)
  7. sudo chown -R myname /home/myname/django_auth_lifecycle/

Install a new Django project

  1. Start your virtualenv:
        source /home/myname/django_auth_lifecycle/djauth_venv/bin/activate     (exit it with deactivate)
  2. Create the project directory:
        mkdir /home/myname/django_auth_lifecycle/djauth_root/
  3. Create the project (this is a long command that belongs on a single line) :
        django-admin.py startproject django_auth_lifecycle /home/myname/django_auth_lifecycle/djauth_root

  4. Create the sub-application:
    1. cd /home/myname/django_auth_lifecycle/djauth_root/
    2. python manage.py startapp auth_lifecycle

     
    This and the previous command create the following (items unused by this tutorial are omitted):

    $ tree /home/myname/django_auth_lifecycle/djauth_root/
    +-- auth_lifecycle
    |   +-- admin.py
    |   +-- models.py
    |   +-- views.py
    +-- django_auth_lifecycle
    |   +-- settings.py
    |   +-- urls.py
    +-- manage.py
  5. In
        /home/myname/django_auth_lifecycle/djauth_root/django_auth_lifecycle/settings.py

    1. Add 'auth_lifecycle' to INSTALLED_APPS
    2. Configure your database by overwriting the current value with
      DATABASES = {
          'default': {
              'ENGINE': 'django.db.backends.postgresql_psycopg2',
              'NAME': 'database_name_here',
              'USER': 'database_username_here',
              'PASSWORD': 'database_user_password_goes_here',
              'HOST': "localhost",  # Empty for localhost through domain sockets or
                                    # '127.0.0.1' for localhost through TCP.
              'PORT': '',           # Set to empty string for default.
          }
      }
  6. If you were not yet prompted to create a superuser, do it now:
    1. cd /home/myname/django_auth_lifecycle/djauth_root/
    2. python manage.py createsuperuser

    The rest of this tutorial expects the superuser’s username and password to both be "admin".

The model

The only thing in our model is the user’s year-of-birth, which will be stored in a UserProfile model that is linked to the default Django User model. Although I’ve added in some validation, a simpler alternative without it is below.

Replace the contents of
    /home/myname/django_auth_lifecycle/djauth_root/auth_lifecycle/models.py
with

"""
Defines a single extra user-profile field for the user-authentication
lifecycle demo project:
    - Birth year, which must be between <link to MIN_BIRTH_YEAR> and
   <link to MAX_BIRTH_YEAR>, inclusive.
"""
from datetime                   import datetime
from django.contrib.auth.models import User
from django.core.exceptions     import ValidationError
from django.db                  import models

OLDEST_EVER_AGE     = 127  #:Equal to `127`
YOUNGEST_ALLOWED_IN_SYSTEM_AGE = 13   #:Equal to `13`
MAX_BIRTH_YEAR      = datetime.now().year - YOUNGEST_ALLOWED_IN_SYSTEM_AGE
"""Most recent allowed birth year for (youngest) users."""
MIN_BIRTH_YEAR      = datetime.now().year - OLDEST_EVER_AGE
"""Most distant allowed birth year for (oldest) users."""

def _validate_birth_year(birth_year_str):
    """Validator for <link to UserProfile.birth_year>, ensuring the
        selected year is between <link to MIN_BIRTH_YEAR> and
        <link to MAX_BIRTH_YEAR>, inclusive.
        Raises:
            ValidationError: When the selected year is invalid.

        - https://docs.djangoproject.com/en/1.7/ref/validators/

        I am a recovered Hungarian Notation junkie (I come from Java). I
        stopped using it long before I started with Python. In this
        particular function, however, because of the necessary cast, it's
        appropriate.
    """
    birth_year_int = -1
    try:
        birth_year_int = int(str(birth_year_str).strip())
    except (TypeError, ValueError):
        #int() raises ValueError for non-numeric text such as 'abc'.
        raise ValidationError(u'"{0}" is not an integer'.format(birth_year_str))

    if  not (MIN_BIRTH_YEAR <= birth_year_int <= MAX_BIRTH_YEAR):
        message = (u'{0} is an invalid birth year. '
                   u'Must be between {1} and {2}, inclusive')
        raise ValidationError(message.format(
            birth_year_str, MIN_BIRTH_YEAR, MAX_BIRTH_YEAR))
    #It's all good.

class UserProfile(models.Model):
    """Extra information about a user: Birth year.

    ---NOTES---

    Useful related SQL:
    - `select id from auth_user where username <> 'admin';`
    - `select * from auth_lifecycle_userprofile where user_id=(x,x,...);`
    """
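    # This line is required. Links UserProfile to a User model instance.
    # on_delete is mandatory from Django 2.0 on; CASCADE was the implicit
    # default before that.
    user = models.OneToOneField(
        User, related_name="profile", on_delete=models.CASCADE)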

    # The additional attributes we wish to include.
    birth_year = models.IntegerField(
        blank=True,
        verbose_name="Year you were born",
        validators=[_validate_birth_year])

    # Override the __str__() method to return out something meaningful
    def __str__(self):
        return self.user.username
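
The range check at the heart of `_validate_birth_year` can be tried out without a Django project. The following is only a sketch under a few assumptions: the function name `check_birth_year` is hypothetical, the current year is passed in as a parameter so the result is repeatable, and the built-in `ValueError` stands in for Django's `ValidationError`.

```python
OLDEST_EVER_AGE = 127
YOUNGEST_ALLOWED_IN_SYSTEM_AGE = 13


def check_birth_year(birth_year, current_year):
    """Return the birth year as an int, or raise ValueError.

    Hypothetical stand-in for the validator above: ValueError replaces
    Django's ValidationError so this runs with the standard library only.
    """
    min_year = current_year - OLDEST_EVER_AGE
    max_year = current_year - YOUNGEST_ALLOWED_IN_SYSTEM_AGE
    # int() raises ValueError for non-numeric text such as 'abc'.
    year = int(str(birth_year).strip())
    if not (min_year <= year <= max_year):
        raise ValueError(
            '{0} is an invalid birth year. Must be between {1} and {2}, '
            'inclusive'.format(birth_year, min_year, max_year))
    return year


print(check_birth_year(' 1980 ', current_year=2015))  # prints 1980
```

Passing the year in, instead of calling `datetime.now()` inside the function, is purely a convenience for the sketch: it keeps the boundaries fixed while experimenting.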

Register it into the admin app by replacing the contents of
    /home/myname/django_auth_lifecycle/djauth_root/auth_lifecycle/admin.py
with

from django.contrib import admin
from .models import UserProfile

admin.site.register(UserProfile)

and then sync it to the database:

  1. source /home/myname/django_auth_lifecycle/djauth_venv/bin/activate
  2. cd /home/myname/django_auth_lifecycle/djauth_root/
  3. python manage.py makemigrations
  4. python manage.py migrate

The same model with no validation:

"""
Defines a single extra user-profile field for the user-authentication
lifecycle demo project: Birth year. There is no validation on this field.
"""
from django.contrib.auth.models import User
from django.db                  import models

class UserProfile(models.Model):
    """
    Extra information about a user: Birth year.

    ---NOTES---

    Useful related SQL:
    - `select id from auth_user where username <> 'admin';`
    - `select * from auth_lifecycle_userprofile where user_id=(x,x,...);`
    """
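    # This line is required. Links UserProfile to a User model instance.
    # on_delete is mandatory from Django 2.0 on; CASCADE was the implicit
    # default before that.
    user = models.OneToOneField(
        User, related_name="profile", on_delete=models.CASCADE)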

    # The additional attributes we wish to include.
    birth_year = models.IntegerField(
        blank=True,
        verbose_name="Year you were born")

    # Override the __str__() method to return out something meaningful
    def __str__(self):
        return self.user.username

Utilities needed by future tests

There’s nothing to test yet. However, we can already create some very useful utilities for future tests: creating test-users in bulk, logging a user in, finding specific text in the html, and debugging. This is also a good spot to place some more generic testing documentation.

The tests require Factory Boy to create its demo data (instead of creating it manually). To install it:

  1. source /home/myname/django_auth_lifecycle/djauth_venv/bin/activate
  2. sudo /home/myname/django_auth_lifecycle/djauth_venv/bin/pip install factory_boy

Save the following as
    /home/myname/django_auth_lifecycle/djauth_root/auth_lifecycle/test__utilities.py

"""
Utilities used by testing code throughout the authentication-lifecycle and
testing tutorial.

DEPENDS ON TEST:  *nothing* (must not depend on any test_*.py file)
DEPENDED ON TEST: test__profile.py

--- Generic information on running tests ---

To run a single test:
    1. source /home/myname/django_auth_lifecycle/djauth_venv/bin/activate
    2. cd /home/myname/django_auth_lifecycle/djauth_root/
    3. python -Wall manage.py test auth_lifecycle.test__file_name

To run all tests:
    python -Wall manage.py test auth_lifecycle

Running tests documentation:
- https://docs.djangoproject.com/en/1.7/topics/testing/overview/#running-tests

Information on '-Wall' is at the bottom of that same section. If the
output is too verbose, try it again without '-Wall'.

If a test fails because the test database cannot be created, grant your
database user creation privileges:
- http://dba.stackexchange.com/questions/33285/how-to-i-grant-a-user-account-permission-to-create-databases-in-postgresql

pylint auth_lifecycle.test__utilities > pylint_output.txt
pylint auth_lifecycle.test__view_birth_stats > pylint_output.txt
pylint auth_lifecycle.test__view_user_profile > pylint_output.txt
"""
from .models                    import MIN_BIRTH_YEAR
from auth_lifecycle.models      import UserProfile
from django.contrib.auth.models import User
from django.test                import TestCase
import factory

TEST_USER_COUNT = 5
"""The number of test users to create. Equal to `5`."""
TEST_PASSWORD = 'password123abc'
"""The password shared by all test users. Equal to `'password123abc'`."""

class UserProfileFactory(factory.django.DjangoModelFactory):
    """
    Creates `UserProfile`-s, where each user has a unique birth year,
    starting with <link to .models.MIN_BIRTH_YEAR>.

    *Warning*: Creating more than
        MAX_BIRTH_YEAR - MIN_BIRTH_YEAR
     users will cause a ValidationError.
    """
    #Uncommenting this line would allow you to directly create a
    #UserProfile, which would then automatically create a User.
    #- Docs: http://factoryboy.readthedocs.org/en/latest/reference.html#subfactory
    #user = factory.SubFactory('auth_lifecycle.test__utilities.UserFactory', profile=None)
    class Meta:
        model = UserProfile

    #factory.Sequence always starts at one. This starts it at
    #MIN_BIRTH_YEAR.
    #http://factoryboy.readthedocs.org/en/latest/reference.html#sequence
    #http://stackoverflow.com/questions/15402256/how-to-pass-in-a-starting-sequence-number-to-a-django-factoryboy-factory
    birth_year = factory.Sequence(lambda n: n + MIN_BIRTH_YEAR - 1)

class UserFactory(factory.django.DjangoModelFactory):
    """
    Creates `User`-s and its corresponding `UserProfile`-s. Each user has
    the same attributes, but with a unique sequence number, starting with
    one.

    See <link to TEST_PASSWORD>.
    """
    class Meta:
        model = User
    #Automatically create a profile when the User is created.
    #- Docs: http://factoryboy.readthedocs.org/en/latest/reference.html?highlight=subfactory#relatedfactory
    profile = factory.RelatedFactory(UserProfileFactory, 'user')

    username = factory.Sequence(lambda n: 'test_username{}'.format(n))
    first_name = factory.Sequence(lambda n: 'test_first_name{}'.format(n))
    last_name = factory.Sequence(lambda n: 'test_last_name{}'.format(n))
    email = factory.Sequence(lambda n: 'test_email{}@example.com'.format(n))

    #http://factoryboy.readthedocs.org/en/latest/reference.html#postgenerationmethodcall
    #See Django mention at the bottom of that documentation section.
    password = factory.PostGenerationMethodCall('set_password', TEST_PASSWORD)

def create_insert_test_users():
    """
    Insert <link to TEST_USER_COUNT> test users into the database. Even
    though this is called for every test, via `setUp`, it does *not*
    accumulate more than `TEST_USER_COUNT` users: assuming the tests
    subclass `django.test.TestCase`, each test runs inside a transaction
    that is rolled back when the test ends. Use the debugging statements
    to confirm this.
    """

    #print('a User.objects.count()=' + str(User.objects.count()))

    #http://factoryboy.readthedocs.org/en/latest/reference.html?highlight=create#factory.create_batch
    UserFactory.create_batch(TEST_USER_COUNT)

    #print('b User.objects.count()=' + str(User.objects.count()))

def login_get_next_user(test_instance):
    """
    Log in the next test user, assert it succeeded, and return the `User`
    object.
    """
    test_instance.client.logout()

    test_user = UserFactory()
    #debug_test_user(test_user, prefix='Attempting to login:')

    did_login_succeed = test_instance.client.login(
        username=test_user.username,
        password=TEST_PASSWORD)
    test_instance.assertTrue(did_login_succeed)

    return  test_user

def assert_attr_val_in_content(
        test_instance, attribute_name, expected_value, page_content_str):
    """A specific attribute's value should be somewhere in the html."""
    #print('assert_attr_val_in_content: expected_value=' + str(expected_value))
    test_instance.assertIn(
        str(expected_value), page_content_str,
        msg='Value of "{0}" not found in page content'.format(attribute_name))

def debug_test_user(test_user, prefix=''):
    """
    Print all user attributes to standard out, except password.

    Parameters:
    - prefix: Defaults to `''`. If not the empty string, printed before
    the user information
    """
    if  prefix != '':
        print(prefix)

    profile = test_user.profile
    print('test_user.id=' + str(test_user.id))
    print('   username=' + test_user.username + ', password=' + TEST_PASSWORD)
    print('   first_name=' + test_user.first_name + ', last_name=' + test_user.last_name)
    print('   email=' + test_user.email)
    print('   profile=' + str(profile))
    print('      profile.birth_year=' + str(profile.birth_year))

That’s it for now.

In the next post, we’ll implement and test the first of two views: The private user-profile page.

At this point, it would be a good idea to back up your files.

…to be continued…

(cue cliffhanger segue music)

Using regular expressions to validate a numeric range

To be clear: When a simple if statement will suffice

if(num < -2055  ||  num > 2055)  {
   throw  new IllegalArgumentException("num (" + num + ") must be between -2055 and 2055");
}

using regular expressions for validating numeric ranges is not recommended.

In addition, since regular expressions analyze strings, numbers must first be translated to a string before they can be tested (an exception is when the number happens to already be a string, such as when getting user input from the console).

(To ensure the string is a number to begin with, you could use org.apache.commons.lang3.math.NumberUtils#isNumber(s))

Despite this, figuring out how to validate number ranges with regular expressions is interesting and instructive.

A one number range

Rule: A number must be exactly 15.

The simplest range there is. A regex to match this is

\b15\b

Word boundaries are necessary to avoid matching the 15 inside of 8215242.
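
The effect of the word boundaries can be seen in a short sketch. It uses Python's `re` module, which is an assumption of convenience (the post itself is flavor-neutral); `\b` behaves the same way for these inputs.

```python
import re

EXACTLY_15 = re.compile(r'\b15\b')

# A standalone 15 is found.
print(bool(EXACTLY_15.search('room 15, floor 2')))  # True
# The 15 buried inside a longer number is not.
print(bool(EXACTLY_15.search('8215242')))           # False
# Without the boundaries, that buried 15 would match.
print(bool(re.search(r'15', '8215242')))            # True
```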

A two number range

The rule: The number must be between 15 and 16. Three possible regexes:

\b(15|16)\b
\b1(5|6)\b
\b1[5-6]\b

A number range "mirrored" around zero

The rule: The number must be between -12 and 12.

Here is a regex for 0 through 12, positive-only:

   \b(\d|1[0-2])\b

Free-spaced:

   \b(         //The beginning of a word (or number), followed by either
      \d       //   Any digit 0 through 9
   |           //Or
      1[0-2]   //   A 1 followed by any digit between 0 and 2.
   )\b         //The end of a word

Making this work for both negative and positive is as simple as adding an optional dash at the start:

-?\b(\d|1[0-2])\b

(This assumes no inappropriate characters precede the dash.)

To forbid negative numbers, a negative lookbehind is necessary:

(?<!-)\b(\d|1[0-2])\b

Leaving the lookbehind out would cause the 11 in -11 to match. (The first example in this post should have this added.)
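
The difference the lookbehind makes can be sketched with Python's `re` module (an assumption of convenience; the constant names are made up for this example):

```python
import re

SIGNED = re.compile(r'-?\b(\d|1[0-2])\b')
UNSIGNED = re.compile(r'(?<!-)\b(\d|1[0-2])\b')
UNSIGNED_NO_LOOKBEHIND = re.compile(r'\b(\d|1[0-2])\b')

# Without the lookbehind, the 11 inside -11 is found.
print(UNSIGNED_NO_LOOKBEHIND.search('-11').group())  # 11
# With it, nothing in -11 matches.
print(UNSIGNED.search('-11'))                        # None
# The signed version accepts the dash as part of the number.
print(SIGNED.search('-11').group())                  # -11
```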

Note: \d versus [0-9]

In order to be compatible with all regex flavors, all \d-s should be changed to [0-9]. For example, .NET considers non ASCII numbers, such as those in different languages, as legal values for \d. Except for in the last example, for brevity, it’s left as \d.

(With thanks to TimPietzcker at stackoverflow)
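
Python 3 behaves like .NET in this respect when matching `str` patterns, which makes it easy to see the difference. A small sketch, using the Arabic-Indic digit three as the test character:

```python
import re

arabic_indic_three = '\u0663'  # ARABIC-INDIC DIGIT THREE

# \d accepts any Unicode decimal digit by default.
print(bool(re.fullmatch(r'\d', arabic_indic_three)))            # True
# [0-9] accepts only the ten ASCII digits.
print(bool(re.fullmatch(r'[0-9]', arabic_indic_three)))         # False
# The re.ASCII flag restricts \d to ASCII digits as well.
print(bool(re.fullmatch(r'\d', arabic_indic_three, re.ASCII)))  # False
```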

Three digits, with all but the first digit equal to zero

Rule: Must be between 0 and 400.

A possible regex:

(?<!-)\b([1-3]?\d{1,2}|400)\b

Free spaced:

   (?<!-)          //Something not preceded by a dash
   \b(             //Word-start, followed by either
      [1-3]?       //   No digit, or the digit 1, 2, or 3
         \d{1,2}   //   Followed by one or two digits (between 0 and 9)
   |               //Or
      400          //   The number 400
   )\b             //Word-end

Another possibility that should never be used:

\b(0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|31|32|33|34|35|36|37|38|39|40|41|42|43|44|45|46|47|48|49|50|51|52|53|54|55|56|57|58|59|60|61|62|63|64|65|66|67|68|69|70|71|72|73|74|75|76|77|78|79|80|81|82|83|84|85|86|87|88|89|90|91|92|93|94|95|96|97|98|99|100|101|102|103|104|105|106|107|108|109|110|111|112|113|114|115|116|117|118|119|120|121|122|123|124|125|126|127|128|129|130|131|132|133|134|135|136|137|138|139|140|141|142|143|144|145|146|147|148|149|150|151|152|153|154|155|156|157|158|159|160|161|162|163|164|165|166|167|168|169|170|171|172|173|174|175|176|177|178|179|180|181|182|183|184|185|186|187|188|189|190|191|192|193|194|195|196|197|198|199|200|201|202|203|204|205|206|207|208|209|210|211|212|213|214|215|216|217|218|219|220|221|222|223|224|225|226|227|228|229|230|231|232|233|234|235|236|237|238|239|240|241|242|243|244|245|246|247|248|249|250|251|252|253|254|255|256|257|258|259|260|261|262|263|264|265|266|267|268|269|270|271|272|273|274|275|276|277|278|279|280|281|282|283|284|285|286|287|288|289|290|291|292|293|294|295|296|297|298|299|300|301|302|303|304|305|306|307|308|309|310|311|312|313|314|315|316|317|318|319|320|321|322|323|324|325|326|327|328|329|330|331|332|333|334|335|336|337|338|339|340|341|342|343|344|345|346|347|348|349|350|351|352|353|354|355|356|357|358|359|360|361|362|363|364|365|366|367|368|369|370|371|372|373|374|375|376|377|378|379|380|381|382|383|384|385|386|387|388|389|390|391|392|393|394|395|396|397|398|399|400)\b
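
A range regex is small enough to check exhaustively. The sketch below runs the compact `[1-3]?\d{1,2}|400` version against every integer from -50 through 999, using Python's `re` module (an assumption of convenience):

```python
import re

ZERO_TO_400 = re.compile(r'(?<!-)\b([1-3]?\d{1,2}|400)\b')

# Collect every integer whose string form matches in full.
in_range = [n for n in range(-50, 1000) if ZERO_TO_400.fullmatch(str(n))]
print(in_range == list(range(0, 401)))     # True

# Note that a leading zero is accepted as well.
print(bool(ZERO_TO_400.fullmatch('07')))   # True
```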

Final example: Four digits, mirrored around zero, that does not end with zeros.

Rule: Must be between -2055 and 2055

This is from a question on stackoverflow.

Regex:

(-?\b(?:20(?:5[0-5]|[0-4][0-9])|1[0-9]{3}|[1-9][0-9]{0,2}|(?<!-)0+))\b

Free-spaced:

(                            //Capture group for the entire number
   -?\b                      //Optional dash, followed by a word (number) boundary
   (?:20                     //Followed by "20", which is followed by one of
         (?:5[0-5]           //50 through 55
      |                      //or
         [0-4][0-9])         //00 through 49
      |                      //or
         1[0-9]{3}           //a one followed by any three digits
      |                      //or
         [1-9][0-9]{0,2}     //1-9 followed by 0 through 2 of any digit
      |                      //or
         (?<!-)0+            //one-or-more zeros *not* preceded by a dash
   )                         //end "or" non-capture group
)\b                          //End number capture group, followed by a word-bound

Here is a visual representation of this regex (Try it out yourself):


(With thanks to PlasmaPower and Casimir et Hippolyte on stackoverflow for the debugging assistance.)
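
As a sanity check, the regex can be run against every integer from -3000 through 3000. This sketch uses Python's `re` module (an assumption of convenience; the pattern itself is unchanged):

```python
import re

PATTERN = re.compile(
    r'(-?\b(?:20(?:5[0-5]|[0-4][0-9])|1[0-9]{3}|[1-9][0-9]{0,2}|(?<!-)0+))\b')

# Collect every integer whose string form matches in full.
matched = [n for n in range(-3000, 3001) if PATTERN.fullmatch(str(n))]
print(matched == list(range(-2055, 2056)))  # True

# Zero is accepted, but "negative zero" is not.
print(PATTERN.fullmatch('0').group())       # 0
print(PATTERN.fullmatch('-0'))              # None
```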

Final note

Depending on what you are capturing, it is likely that all sub-groups should be made into non-capture groups. For example, this:

(-?\b(?:20(?:5[0-5]|[0-4][0-9])|1?[0-9]{1,3})\b)

Instead of this:

-?\b(20(5[0-5]|[0-4][0-9])|1?[0-9]{1,3})\b
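
What each version captures can be compared directly. A brief sketch with Python's `re` module (an assumption of convenience; the constant names are made up):

```python
import re

CAPTURING = re.compile(r'-?\b(20(5[0-5]|[0-4][0-9])|1?[0-9]{1,3})\b')
NON_CAPTURING = re.compile(r'(-?\b(?:20(?:5[0-5]|[0-4][0-9])|1?[0-9]{1,3})\b)')

# Two groups: the digits without the sign, and the last two digits.
print(CAPTURING.fullmatch('-2048').groups())      # ('2048', '48')
# One group: the whole number, sign included.
print(NON_CAPTURING.fullmatch('-2048').groups())  # ('-2048',)
```

With the non-capturing version, group 1 is the only group, and it holds exactly the text that would be handed to an integer parser.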

Example Java implementation

      import  java.util.Scanner;
      import  java.util.regex.Matcher;
      import  java.util.regex.Pattern;
      import  org.apache.commons.lang.math.NumberUtils;
    /**
      <P>Confirm that a user-input number is valid by reading a string and testing that it is numeric before converting it to an int. This loops until a valid number is provided.</P>
   
      <P>{@code java UserInputNumInRangeWRegex}</P>
     **/
    public class UserInputNumInRangeWRegex  {
      public static final void main(String[] ignored)  {
   
         int num = -1;
         boolean isNum = false;
   
         int iRangeMax = 2055;
   
         //"": Dummy string, to reuse matcher
         Matcher mtchrNumNegThrPos = Pattern.compile("(-?\\b(?:20(?:5[0-5]|[0-4][0-9])|1[0-9]{3}|[1-9][0-9]{0,2}|(?<!-)0+))\\b").matcher("");
   
         //Create the scanner once, rather than once per loop iteration
         Scanner scanner = new Scanner(System.in);

         do  {
            System.out.print("Enter a number between -" + iRangeMax + " and " + iRangeMax + ": ");
            String strInput = scanner.next();
            if(!NumberUtils.isNumber(strInput))  {
               System.out.println("Not a number. Try again.");
            }  else if(!mtchrNumNegThrPos.reset(strInput).matches())  {
               System.out.println("Not in range. Try again.");
            }  else  {
               //Safe to convert
               num = Integer.parseInt(strInput);
               isNum = true;
            }
         }  while(!isNum);
   
         System.out.println("Number: " + num);
      }
   }

Output

[C:\java_code\]java UserInputNumInRangeWRegex
Enter a number between -2055 and 2055: tuhet
Not a number. Try again.
Enter a number between -2055 and 2055: 283837483
Not in range. Try again.
Enter a number between -2055 and 2055: -200000
Not in range. Try again.
Enter a number between -2055 and 2055: -300
Number: -300