Here is the data file we are interested in...
The data is about bank transactions conducted on accounts.
07/10/09,11:10:20 AM,63463284,credit,salary electronic deposit,$1000.00
07/11/09,11:10:20 AM,63463284,credit,CD interest,$22.00
07/12/09,11:10:20 AM,63463284,credit,tax refund,$300.00
07/13/09,11:10:20 AM,63463284,credit,ATM deposit,$500.00
07/14/09,11:10:20 AM,63463284,debit,credit card payment,$137.00
07/15/09,11:10:20 AM,63463284,credit,checking a/c interest,$20.00
07/16/09,11:10:20 AM,63463284,debit,power bill,$80.00
07/17/09,11:10:20 AM,63463284,credit,Check #8774,$350.00
07/18/09,11:10:20 AM,63463284,debit,overdraft fee,$5.00
07/19/09,11:10:20 AM,63463284,credit,Check #167,$1000.00
07/10/09,11:10:20 AM,77667529,credit,salary electronic deposit,$2000.00
07/11/09,11:10:20 AM,77667529,credit,CD interest,$22.00
07/12/09,11:10:20 AM,77667529,credit,tax refund,$300.00
07/13/09,11:10:20 AM,77667529,credit,ATM deposit,$500.00
07/14/09,11:10:20 AM,77667529,debit,credit card payment,$137.00
07/15/09,11:10:20 AM,77667529,credit,checking a/c interest,$20.00
07/16/09,11:10:20 AM,77667529,debit,power bill,$80.00
07/17/09,11:10:20 AM,77667529,credit,Check #7766,$350.00
07/18/09,11:10:20 AM,77667529,debit,Check copy fee,$10.00
07/19/09,11:10:20 AM,77667529,credit,Check #125,$1000.00
07/10/09,11:10:20 AM,33445398,credit,salary electronic deposit,$2000.00
07/11/09,11:10:20 AM,33445398,credit,CD interest,$22.00
07/12/09,11:10:20 AM,33445398,credit,tax refund,$300.00
07/13/09,11:10:20 AM,33445398,credit,ATM deposit,$500.00
07/14/09,11:10:20 AM,33445398,debit,credit card payment,$137.00
07/15/09,11:10:20 AM,33445398,credit,checking a/c interest,$20.00
07/16/09,11:10:20 AM,33445398,debit,power bill,$80.00
07/17/09,11:10:20 AM,33445398,credit,Check #1255,$350.00
07/18/09,11:10:20 AM,33445398,debit,overdraft fee,$5.00
07/19/09,11:10:20 AM,33445398,credit,Check #433,$1000.00
Let's take a look at the first few lines of data...
Date Time account # type description currency amount
------- ------- --------------- ------- ------------------ -------------- ------------
07/10/09,11:10:20 AM,63463284,credit,salary electronic deposit,$1000.00
07/11/09,11:10:20 AM,63463284,credit,CD interest,$22.00
07/12/09,11:10:20 AM,63463284,credit,tax refund,$300.00
07/13/09,11:10:20 AM,63463284,credit,ATM deposit,$500.00
07/14/09,11:10:20 AM,63463284,debit,credit card payment,$137.00
07/15/09,11:10:20 AM,63463284,credit,checking a/c interest,$20.00
07/16/09,11:10:20 AM,63463284,debit,power bill,$80.00
07/17/09,11:10:20 AM,63463284,credit,Check #8774,$350.00
07/18/09,11:10:20 AM,63463284,debit,overdraft fee,$5.00
07/19/09,11:10:20 AM,63463284,credit,Check #167,$1000.00
..
..
..
Say we want to give credit to all overdraft fees charged for all accounts by adding an adjustment(credit) for the same amount. That is we want to add a new data line for every line which contains 'overdraft fee'. For example if you look at the lines of data above. For every line like this...
07/18/09,11:10:20 AM,63463284,debit,overdraft fee,$5.00
we want to add a new line like this....
07/18/09,11:10:20 AM,63463284,credit,overdraft fee adjustment,$5.00
The regular expression .*overdraft fee.*\n will get us the line containing overdraft fee. But we also need the timestamp at the beginning of the line. So let's refine the search string by adding a group.
^(.*)debit,overdraft fee,(.*)\n
So everything from the start of line upto debit forms group 1, which is the timestamp and account number.
The replacement string should be the entire last line which is $0 plus the same timestamp and account number($1) plus new text (credit,overdraft fee adjustment, $2) plus platform independent line feed (\R). Putting them all together the replacement string should be
$0$1credit,overdraft fee adjustment,$2\R
This was tested in eclipse IDE. In case of text editors like Textpad, the search string will be
^\\(.*\\)debit,overdraft fee,\\(.*\\)\n
and the replace string will be \0\1credit,overdraft fee adjustment,\2\n
Using the search and replace sub menu item of eclipse
 

 
 
No comments:
Post a Comment